JP2008178087
Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2008178087
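A method of echo compensation in a loudspeaker-room-microphone system is provided. The present invention relates to a method of echo compensating at least one microphone signal that includes an echo signal contribution due to a loudspeaker signal in a loudspeaker-room-microphone system, said method comprising: converting at least a portion of the microphone signal into microphone sub-band signals downsampled by a first downsampling rate; converting the loudspeaker signal into loudspeaker sub-band signals downsampled by a second downsampling rate lower than the first, thereby obtaining first loudspeaker sub-band signals; storing the loudspeaker sub-band signals downsampled by the second downsampling rate; convolving the first loudspeaker sub-band signals with an estimate of the impulse response of the system and downsampling the result by a third downsampling rate, thereby obtaining echo signal contributions that are effectively downsampled by the first downsampling rate; and obtaining error sub-band signals by subtracting the estimated echo signal contributions from the respective microphone sub-band signals. [Selected figure] Figure 1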
Low complexity echo compensation
[0001]
The present invention relates to signal processing systems and methods, and more particularly to
audio signal processing with acoustic echo compensation. The present invention relates to echo
compensation through the processing of downsampled sub-band signals.
[0002]
In a communication system equipped with a microphone, echo compensation is a fundamental issue: the microphone detects not only the wanted signal, such as the voice of the user of a speech recognition system or a hands-free telephone set, but also disturbing signals output by loudspeakers of the same communication system installed in the same room as the microphone. In a hands-free set, for example, signals received from the remote party and output by the near-end loudspeakers must be prevented from being picked up again by the near-end microphone and transmitted back to the remote party. Microphone pickup of the loudspeaker output can result in unpleasant acoustic echoes, and can even lead to a complete breakdown of communication if the acoustic echoes are not sufficiently damped or substantially eliminated.
04-05-2019
1
[0003]
Similar problems occur with speech recognition systems used in noisy environments. It must be
prevented that a signal different from the user's voice signal is provided to the recognition unit.
However, the microphone of the speech recognition system may detect, for example, the output
of a loudspeaker representing an audio signal reproduced by an audio device such as a CD or
DVD player or a radio. If these signals are not sufficiently filtered out, the wanted signal representing the user's utterance may be buried in noise, possibly to the extent that adequate speech recognition is no longer possible. In recent years, several methods for echo compensation have been proposed and implemented in communication systems. For echo compensation of acoustic signals, an adaptive filter, typically an adaptive finite impulse response (FIR) filter, is used to model the transfer function (impulse response) of the loudspeaker-room-microphone (LRM) system (see, for example, Non-Patent Document 1).
[0004]
One of the known methods of adapting filters for echo compensation is based on a normalized
least mean square algorithm. However, in the case of speech signals, it is known that
convergence is usually somewhat slow, because successive signal samples are often correlated.
Accelerating the convergence, on the other hand, requires relatively high computational resources in terms of memory capacity and processor load. To keep the demand for high-performance computing means at a reasonable level, signal processing is usually performed in the downsampled sub-band domain, where the computational complexity is in principle lower than for full-band processing.
[0005]
If a higher downsampling rate is chosen for the sub-band signals processed for echo compensation, the computational cost is reduced accordingly. In the art, however, the choice of the downsampling factor is generally limited by the well-known aliasing problem. The Hann window and other filters that may be chosen exhibit different aliasing characteristics. The artifacts increase with increasing downsampling rate, and if the downsampling rate exceeds a certain threshold, the echo attenuation becomes insufficient.
[0006]
Thus, despite recent engineering advances, problems still exist with effective echo compensation of audio signals at an acceptable time delay, in particular with echo compensation in hands-free voice communication.
Non-Patent Document 1: E. Haensler and G. Schmidt, Acoustic Echo and Noise Control, John Wiley & Sons, New York, 2004
[0007]-[0009]
The above-mentioned problems are solved, or at least alleviated, by a method of echo compensation in a loudspeaker-room-microphone system in which at least one microphone signal y(n) that includes an echo signal contribution d(n) caused by the loudspeaker signal x(n) is echo compensated. The method provided in accordance with claim 1 comprises: converting at least a portion of the at least one microphone signal y(n) into microphone sub-band signals yμ(n) downsampled by a first downsampling rate r; converting the loudspeaker signal x(n) into loudspeaker sub-band signals downsampled by a second downsampling rate lower than the first downsampling rate, thereby obtaining first loudspeaker sub-band signals; storing the loudspeaker sub-band signals downsampled by the second downsampling rate; convolving the first loudspeaker sub-band signals with an estimate of the impulse response of the loudspeaker-room-microphone system in a predetermined number of sub-bands, and downsampling the convolved first loudspeaker sub-band signals by a third downsampling rate, thereby obtaining, for each sub-band into which the microphone signal is split, an estimated echo signal contribution that is effectively downsampled by the first downsampling rate r; and, for each sub-band, subtracting the estimated echo signal contribution from the respective microphone sub-band signal yμ(n) to obtain an error sub-band signal eμ(n).
[0010]
The error sub-band signals eμ(n), which represent the echo-compensated microphone sub-band signals, are then upsampled by a predetermined upsampling rate, in particular by the same factor as the first downsampling rate (which is the product of the second and third downsampling rates), and synthesized to obtain an enhanced microphone signal.
The enhanced microphone signal may be sent to the remote communication party.
It should be understood, however, that the error sub-band signals eμ(n) may undergo further processing before being upsampled and synthesized.
[0011]
The at least one microphone signal y(n) is detected by a microphone that is part of a loudspeaker-room-microphone (LRM) system. The loudspeaker signal x(n) (also referred to as the reference audio signal) is picked up after being modified by the actual impulse response of the LRM system and thereby gives rise to the echo contribution d(n) present in the at least one microphone signal y(n). The at least one microphone signal y(n) also contains the wanted signal, for example the speech signal of a local speaker, which is enhanced by the echo compensation. Instead of dividing the entire microphone signal into sub-band signals to be processed for echo compensation, it may be preferred to divide only a portion of the microphone signal, for example a portion covering only a predetermined frequency range. The impulse response of the loudspeaker-room-microphone system is estimated (modelled) by the adaptable filter coefficients of the echo compensation filtering means used.
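As a compact restatement of the signal model described in this paragraph, the relations can be sketched as follows. The symbols s(n) for the local speech, b(n) for background noise, h_i(n) for the coefficients of the LRM impulse response and N for its length are assumptions introduced for illustration; the text itself names only y(n), x(n) and d(n).

```latex
\begin{aligned}
d(n) &= \sum_{i=0}^{N-1} h_i(n)\, x(n-i), \\
y(n) &= s(n) + b(n) + d(n).
\end{aligned}
```

Echo compensation then amounts to estimating d(n) from x(n) and removing the estimate from y(n).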
[0012]-[0013]
According to the inventive method, the first loudspeaker sub-band signals, for the predetermined number of sub-bands into which both the at least one microphone signal y(n) and the loudspeaker signal x(n) are divided, are stored (e.g., in a ring buffer) for subsequent processing to estimate the echo signal contribution in the microphone sub-band signal yμ(n) for each sub-band.
This estimation is performed on the basis of the first loudspeaker sub-band signals, i.e. the estimate is generated using the stored loudspeaker sub-band signals (sampled at the second downsampling rate), but it is calculated only at the first downsampling rate (that of the microphone sub-band signals). Estimating the echo signal contribution and subtracting the estimated echo signal contribution from the respective microphone sub-band signal for each sub-band are both performed at the first downsampling rate. In this way the advantages of both downsampling rates are exploited: low aliasing terms for the lower downsampling rate (reference signal) and low computational complexity for the higher downsampling rate (microphone signal).
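The storage and estimation scheme described here can be sketched for a single sub-band as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the plain (non-conjugated) convolution and the example coefficients are all hypothetical. The ring buffer is filled with every loudspeaker sub-band sample (rate r1), while the convolution is evaluated only for every r2-th sample (rate r = r1 · r2).

```python
from collections import deque


def echo_estimates(x_sub, h_hat, r2):
    """Estimate the echo in one sub-band from a loudspeaker sub-band signal.

    x_sub : loudspeaker sub-band samples at the lower downsampling rate r1
    h_hat : estimated sub-band impulse response (hypothetical coefficients)
    r2    : third downsampling rate; one output per r2 input samples
    """
    # Ring buffer holding the most recent samples, newest first.
    ring = deque([0j] * len(h_hat), maxlen=len(h_hat))
    out = []
    for n, sample in enumerate(x_sub):
        ring.appendleft(sample)      # memory is filled at the r1 rate
        if n % r2 == 0:              # convolution only at the r = r1 * r2 rate
            out.append(sum(h * x for h, x in zip(h_hat, ring)))
    return out


def convolve_then_decimate(x_sub, h_hat, r2):
    """Reference: convolve at the r1 rate, then keep every r2-th sample."""
    full = []
    for n in range(len(x_sub)):
        full.append(sum(h_hat[i] * x_sub[n - i]
                        for i in range(len(h_hat)) if n - i >= 0))
    return full[::r2]


x_demo = [1 + 1j, 2, -1j, 0.5, 3, -2, 1j, 4]   # made-up sub-band samples
h_demo = [0.5, -0.25j, 0.1]                    # made-up filter estimate
fast = echo_estimates(x_demo, h_demo, r2=2)
slow = convolve_then_decimate(x_demo, h_demo, r2=2)
```

Both functions return the same values; the first simply skips the output samples that the subsequent downsampling by r2 would discard.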
[0014]-[0015]
Since the step of estimating the echo signal contribution is required only at the first downsampling rate, the computational complexity can be reduced considerably (compared with a setup that operates entirely at the second downsampling rate).
If, on the other hand, the step of estimating the echo signal contribution were performed entirely at the first downsampling rate, only a very low cancellation quality would be achieved, owing to the large aliasing terms in the loudspeaker sub-band signals.
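A rough operation count illustrates the saving. It assumes, purely for illustration, sub-band filters of N coefficients, M sub-bands and a full-band sampling frequency f_s; none of these symbols appear in the text, and the count covers only the convolution.

```latex
C_{r_1} \propto M \, N \, \frac{f_s}{r_1}, \qquad
C_{r} \propto M \, N \, \frac{f_s}{r_1 r_2}, \qquad
\frac{C_{r_1}}{C_{r}} = r_2 .
```

Under these assumptions, evaluating the convolution only at the first downsampling rate r = r1 · r2 costs about 1/r2 of evaluating it for every stored loudspeaker sub-band sample.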
[0016]-[0017]
Because the above-mentioned estimation step uses different sampling rates for the loudspeaker sub-band signals (second downsampling rate) and for the other signals, i.e. the microphone sub-band signals yμ(n), the estimated echo sub-band signals and the error sub-band signals eμ(n), it achieves good aliasing properties for the loudspeaker sub-band signals at low computational complexity.
[0018]
However, apart from the stored first loudspeaker sub-band signals, which are downsampled only by the second downsampling rate (lower than the first downsampling rate), the step of echo compensation uses the estimated echo signal contributions downsampled to the downsampling rate of the microphone sub-band signals yμ(n).
These estimated echo signal contributions (or second, filtered loudspeaker sub-band signals) are used for the generally more expensive operations involved in the echo compensation step.
Thus, compared with the art, computer resources are used more effectively, and the echo contribution is estimated faster and with lower memory demands.
[0019]
The second downsampling rate is chosen such that aliasing is (almost) completely avoided, whereas the subsequent downsampling by the third downsampling rate may introduce some aliasing components into the second loudspeaker sub-band signals. These second loudspeaker sub-band signals are used for those operations, within the step of estimating the impulse response of the LRM system and thereby the echo contributions in the microphone sub-band signals, that are less sensitive to aliasing.
[0020]-[0021]
According to one embodiment, the step of estimating the echo signal contribution for each sub-band comprises adapting the filter coefficients of the echo compensation filtering means based on the stored first loudspeaker sub-band signals, but at a rate corresponding to the first downsampling rate.
In other words, only a portion of the stored first loudspeaker sub-band signals is used for the adaptation step, which is performed at the first downsampling rate.
[0022]
Hence, the adaptation of the filter coefficients of the echo compensation filtering means, which is the most expensive operation in the overall signal processing for echo compensation, can be performed at the highest reasonable downsampling rate (e.g. 128 for 256 sub-bands), which saves memory and greatly reduces processor load compared with the prior art. The third downsampling rate may be chosen, for example, from the range of 2 to 4, e.g. as 2 or 3.
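The relation between the three rates, r = r1 · r2, can be checked with a few lines of arithmetic. The sketch below assumes integer downsampling rates, which the text does not state explicitly; the helper name `rate_pairs` is hypothetical.

```python
def rate_pairs(r, r2_candidates=(2, 3, 4)):
    """Return (r1, r2) pairs with r1 * r2 == r for the candidate third rates."""
    return [(r // r2, r2) for r2 in r2_candidates if r % r2 == 0]


num_subbands = 256
r = num_subbands // 2        # first downsampling rate from the example: 128
pairs = rate_pairs(r)
print(pairs)                 # [(64, 2), (32, 4)]
```

With r = 128, a third rate of 3 does not yield an integer second rate, so under the integer assumption only r2 = 2 (r1 = 64) and r2 = 4 (r1 = 32) remain.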
[0023]-[0024]
The filter coefficients of the echo compensation filtering means can, for example, be adapted efficiently for each sub-band by a normalized least mean square (NLMS) algorithm.
The quantity c(n) denotes the step size of the adaptation process.
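For orientation, a textbook NLMS update for one sub-band can be sketched as below. It is not taken from the source, whose update equation is not reproduced in this text; the function name, the regularization constant `eps`, the fixed step size `c` and the two-tap demo system are assumptions made for illustration.

```python
import random


def nlms_step(h_hat, x_vec, y, c=0.5, eps=1e-8):
    """One NLMS update for a single sub-band.

    h_hat : current filter coefficient estimates
    x_vec : most recent loudspeaker sub-band samples, newest first
    y     : current microphone sub-band sample
    c     : step size (the text calls this quantity c(n))
    Returns the updated coefficients and the error sample e = y - d_hat.
    """
    d_hat = sum(h * x for h, x in zip(h_hat, x_vec))     # echo estimate
    e = y - d_hat                                        # error sub-band sample
    norm = sum(abs(x) ** 2 for x in x_vec) + eps         # input power
    new_h = [h + c * e * x.conjugate() / norm for h, x in zip(h_hat, x_vec)]
    return new_h, e


# Demo: identify a hypothetical 2-tap sub-band echo path from noise-free data.
rng = random.Random(0)
h_true = [0.5 + 0.2j, -0.3j]
h_est = [0j, 0j]
prev = 0j
for _ in range(200):
    cur = complex(rng.uniform(-1, 1), rng.uniform(-1, 1))
    x_vec = [cur, prev]
    y = sum(h * x for h, x in zip(h_true, x_vec))
    h_est, e = nlms_step(h_est, x_vec, y)
    prev = cur
```

In the scheme described above, such an update would be run once per sample at the first downsampling rate, using samples read from the stored loudspeaker sub-band signal.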
[0025]-[0026]
According to an embodiment of the echo compensation method disclosed herein, at least a portion of the at least one microphone signal y(n) is converted to microphone sub-band signals and/or the loudspeaker signal x(n) is converted to loudspeaker sub-band signals using an analysis filter bank that includes, for example, square-root Hann window filters, with the first downsampling rate equal to half the number of sub-bands; the error sub-band signals eμ(n) are upsampled by a predetermined upsampling rate, preferably equal to the first downsampling rate mentioned above, and synthesized by a synthesis filter bank comprising square-root Hann window filters to obtain the enhanced microphone signal.
[0027]-[0028]
The use of square-root Hann windows is particularly efficient and robust in terms of stability, and the square root of the Hann window function is easily implemented.
The filter lengths of the analysis and synthesis filter banks may be chosen to be identical and equal to the number of sub-bands into which the at least one microphone signal and the reference audio signal are divided.
The filter bank of M parallel filters may comprise a prototype low-pass filter h0(n) and modulated band-pass filters derived from it. In this case, only one filter needs to be designed. It should also be noted that, with this modulation approach, a very efficient implementation based on the discrete Fourier transform in the form of a polyphase technique is available, which provides a nearly flat frequency response.
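Modulated band-pass filters of this kind are commonly written in the DFT-modulated form below. This is the usual textbook construction and an assumption here, since the corresponding expression is not reproduced in this text; μ indexes the sub-bands and M is their number.

```latex
h_\mu(n) = h_0(n)\, e^{\,j \frac{2\pi}{M} \mu n}, \qquad \mu = 0, 1, \dots, M-1 .
```

Each band-pass filter is thus the prototype low-pass filter shifted in frequency by a multiple of 2π/M, which is why a single filter design suffices.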
[0029]
Experiments show that good echo compensation results can be achieved in particular when the pure Hann window filter (without a square root) of the analysis filter bank is raised to the power of a predetermined first rational number, in particular 0.75, and the pure Hann window filter of the synthesis filter bank is raised to the power of a predetermined second rational number, in particular 0.25, such that the sum of the first rational number and the second rational number is 1. The first rational number is preferably chosen larger than the second, because the analysis filter bank influences the quality of the finally achieved enhanced microphone signal more than the synthesis filter bank does.
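Why the two exponents should sum to 1 can be illustrated with a simplified overlap-add check. The sketch assumes a periodic Hann window whose length equals the number of sub-bands M and a hop of M/2 (the first downsampling rate of the example), and it ignores all sub-band processing between analysis and synthesis; the function names are hypothetical.

```python
import math


def hann(M):
    """Periodic Hann window of length M."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / M) for n in range(M)]


def overlap_add_gain(M, a, b):
    """Sum of analysis * synthesis window products at a hop of M / 2.

    The analysis window is hann**a and the synthesis window is hann**b.
    Returns the overall gain at each of the M / 2 positions inside one hop.
    """
    w = hann(M)
    prod = [(v ** a) * (v ** b) for v in w]
    hop = M // 2
    return [prod[n] + prod[n + hop] for n in range(hop)]


results = {}
for a, b in [(0.5, 0.5), (0.75, 0.25)]:
    gains = overlap_add_gain(256, a, b)
    results[(a, b)] = (min(gains), max(gains))
```

For both exponent pairs the product of the two windows is the plain Hann window, whose copies shifted by M/2 add up to 1, so the gain is 1 at every position up to floating-point rounding.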
[0030]
As mentioned above, the error subband signals may be further processed before being
upsampled and combined. For example, the error subband signal (eμ (n)) is further filtered by
noise reduction filtering means and / or residual echo suppression filtering means to further
improve the quality of the processed signal. By means of noise reduction filtering, background
noise that may be present in the microphone signal (y (n)) and hence in the microphone subband signal and the error sub-band signal is suppressed. Some residual echo that may still be
present in the error sub-band signal is suppressed by residual echo suppression filtering means,
as known in the art.
[0031]
The inventive method according to one of the above embodiments can also be applied when more than one microphone signal is present. For example, a microphone array may be present in an LRM system, providing several microphone signals (channels) that are beamformed to improve the signal-to-noise ratio. For example, a delay-and-sum beamformer (or any other beamforming means known in the art) may be used.
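The idea of a delay-and-sum beamformer can be sketched in a few lines. The version below works on time-domain samples with integer delays, which is a simplification chosen for illustration; the method described here beamforms error sub-band signals instead, and the function name and example delays are hypothetical.

```python
def delay_and_sum(channels, delays):
    """Average several microphone channels after integer-sample alignment.

    channels : list of equal-length sample lists, one per microphone
    delays   : per-channel delay in samples (hypothetical, non-negative ints)
               by which the wanted signal arrives late at that microphone
    """
    length = len(channels[0])
    out = []
    for n in range(length):
        acc = 0.0
        for ch, d in zip(channels, delays):
            idx = n + d                  # advance late channels to align them
            if idx < length:
                acc += ch[idx]
        out.append(acc / len(channels))
    return out


source = [0, 1, 2, 3, 4, 0, 0, 0]
mic0 = source                            # wanted signal arrives without delay
mic1 = [0, 0, 0, 1, 2, 3, 4, 0]          # same signal, two samples late
aligned = delay_and_sum([mic0, mic1], delays=[0, 2])
```

After alignment the wanted signal adds coherently across channels, whereas uncorrelated noise adds incoherently, which is the source of the signal-to-noise improvement mentioned above.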
[0032]-[0034]
Thus, in one variation of the above embodiments, several microphone signals yk(n), each of which includes an echo signal contribution due to the loudspeaker signal x(n), are converted to microphone sub-band signals yμ,k(n) downsampled by the first downsampling rate; an echo signal contribution is estimated for each microphone sub-band signal yμ,k(n) of the several microphone signals yk(n) (i.e. for each microphone channel); for each sub-band, the respective estimated echo signal contribution is subtracted from the respective microphone sub-band signal yμ,k(n) of each of the several microphone signals yk(n) to obtain error sub-band signals eμ,k(n) for each of the several microphone signals; and the error sub-band signals eμ,k(n) of the several microphone signals are beamformed to obtain beamformed error sub-band signals.
[0035]-[0036]
The echo signal contributions are obtained by convolving, for each of the several microphone signals yk(n), the first loudspeaker sub-band signals with an estimate of the impulse response of the loudspeaker-room-microphone system in the predetermined number of sub-bands, and by downsampling the convolved first loudspeaker sub-band signals by the third downsampling rate.
[0037]
The invention also provides a computer program product comprising one or more computer-readable media having computer-executable instructions for performing the steps of one of the above embodiments of the echo compensation method disclosed herein.
[0038]-[0041]
The above-mentioned problems are also solved by a signal processing means for echo compensation of at least one microphone signal y(n) that includes an echo signal contribution d(n) caused by a loudspeaker signal x(n) in a loudspeaker-room-microphone system, the means comprising: a first analysis filter bank configured to convert at least a portion of the microphone signal y(n) into microphone sub-band signals yμ(n) downsampled by a first downsampling rate r; a second analysis filter bank configured to convert the loudspeaker signal x(n) into loudspeaker sub-band signals downsampled by a second downsampling rate r1 lower than the first downsampling rate r, thereby obtaining first loudspeaker sub-band signals; a memory, in particular a ring buffer, configured to store the first loudspeaker sub-band signals downsampled by the second downsampling rate r1; and echo compensation filtering means configured to convolve the first loudspeaker sub-band signals with an estimate of the impulse response of the loudspeaker-room-microphone system and to downsample the convolved first loudspeaker sub-band signals by a third downsampling rate r2, thereby obtaining echo signal contributions downsampled by the first downsampling rate (r = r1 · r2), and to echo compensate the microphone sub-band signals yμ(n) by means of these echo signal contributions to obtain echo-compensated microphone sub-band signals.
[0042]
The echo compensation by the echo compensation filtering means is performed on the basis of the stored first loudspeaker sub-band signals sampled at the second downsampling rate, but it is calculated only at the first rate, i.e. that of the microphone sub-band signals.
It is the filling of the memory that takes place at the higher rate (corresponding to r1), whereas the more expensive processing takes place at the lower rate.
[0043]-[0044]
The signal processing means may further comprise a synthesis filter bank configured to upsample and synthesize the echo-compensated microphone sub-band signals eμ(n) to obtain an enhanced microphone signal.
Upsampling may be performed by a synthesis filter bank that includes upsampling means that upsample at the same rate (by the same factor) as the first downsampling rate.
[0045]
According to one embodiment, the signal processing means further comprises residual echo suppression filtering means and/or noise reduction filtering means configured to filter the echo-compensated microphone sub-band signals eμ(n) in order to suppress background noise and/or residual echo contributions not removed by the echo compensation filtering means.
[0046]
Each of the first and second analysis filter banks and the synthesis filter bank may include a plurality of square-root Hann window filters.
Alternatively, Hann windows raised to the power of a predetermined first rational number, in particular 0.75, may preferably be used for the window filters of the first and second analysis filter banks, and Hann windows raised to the power of a predetermined second rational number, in particular 0.25, for the filters of the synthesis filter bank, such that the first rational number and the second rational number sum to 1.
The second rational number may be chosen lower than the first rational number.
[0047]
The signal processing means of one of the above embodiments may comprise a number of first analysis filter banks, each configured to convert at least a portion of one of several microphone signals yk(n) into microphone sub-band signals yμ,k(n) downsampled by the first downsampling rate (i.e. a predetermined number of microphone sub-band signals yμ,k(n) is generated for each microphone channel); the echo compensation filtering means may be configured to echo compensate each of the microphone sub-band signals yμ,k(n) to obtain error sub-band signals eμ,k(n) for each of the several microphone signals yk(n); and the signal processing means may further comprise beamforming means configured to beamform the error sub-band signals eμ,k(n) of the several microphone signals yk(n) to obtain beamformed error sub-band signals.
[0048]
The beamforming means may be a delay-and-sum beamformer or a generalized sidelobe canceller.
A generalized sidelobe canceller consists of two signal processing paths: a first (or lower) adaptive path with a blocking matrix and adaptive noise cancelling means, and a second (or upper) non-adaptive path with a fixed beamformer; see, for example, Griffiths, L. J. and Jim, C. W., "An alternative approach to linearly constrained adaptive beamforming", IEEE Transactions on Antennas and Propagation, vol. 30, 1982, p. 27.
[0049]
The above embodiments of the signal processing means of the present invention may be
advantageously incorporated into a system for electrically mediated communication and
automated speech recognition.
Thus, a hands-free telephone system and speech recognition means are provided, each
comprising signal processing means according to one of the above embodiments.
Furthermore, a speech dialog system or a voice control system is provided, comprising such
speech recognition means.
[0050]
Furthermore, the invention provides a vehicle communication system comprising at least one microphone, in particular a microphone array which may comprise one or more directional microphones, at least one loudspeaker, and either the signal processing means described above or the hands-free telephone system described above.
[0051]
Additional features and advantages of the present invention will be described with reference to the drawings.
In the description, reference is made to the accompanying drawings, which are intended to
illustrate preferred embodiments of the invention.
It is to be understood that such embodiments do not represent the full scope of the present
invention.
[0052]
The present invention further provides the following means.
[0053]-[0055]
(Item 1) A method of echo compensating at least one microphone signal y(n) that includes an echo signal contribution d(n) caused by a loudspeaker signal x(n) in a loudspeaker-room-microphone system, the method comprising: converting at least a portion of the at least one microphone signal y(n) into microphone sub-band signals yμ(n) downsampled by a first downsampling rate r; converting the loudspeaker signal x(n) into loudspeaker sub-band signals downsampled by a second downsampling rate r1 lower than the first downsampling rate r, thereby obtaining first loudspeaker sub-band signals; storing the loudspeaker sub-band signals downsampled by the second downsampling rate r1; convolving the first loudspeaker sub-band signals with an estimate of the impulse response of the loudspeaker-room-microphone system in a predetermined number of sub-bands, and downsampling the convolved first loudspeaker sub-band signals by a third downsampling rate r2, thereby obtaining, for each of the predetermined number of sub-bands, an estimated echo signal contribution that is effectively downsampled by the first downsampling rate r; and, for each sub-band, subtracting the estimated echo signal contribution from the respective microphone sub-band signal yμ(n) to obtain an error sub-band signal eμ(n).
[0056]-[0057]
(Item 2) The method according to item 1, wherein the step of obtaining the echo signal contribution includes adapting the filter coefficients of the echo compensation filtering means based on the stored first loudspeaker sub-band signals at a rate equal to the first downsampling rate r.
[0058]
(Item 3) The method according to item 2, wherein the filter coefficients of the echo compensation filtering means are adapted by a normalized least mean square algorithm.
[0059]
(Item 4) The method according to Item 2 or 3, wherein the third downsampling rate (r2) is
selected from the range of 2 to 4.
[0060]-[0061]
(Item 5) The method according to one of items 1 to 4, wherein at least a portion of the at least one microphone signal y(n) is converted to microphone sub-band signals yμ(n) and/or the loudspeaker signal x(n) is converted to loudspeaker sub-band signals using an analysis filter bank including square-root Hann window filters, and the error sub-band signals eμ(n) are upsampled at a predetermined upsampling rate and synthesized by a synthesis filter bank including square-root Hann window filters to obtain an enhanced microphone signal.
[0062]-[0063]
(Item 6) The method according to one of items 1 to 4, wherein at least a portion of the at least one microphone signal y(n) is converted to microphone sub-band signals yμ(n) and/or the loudspeaker signal x(n) is converted to loudspeaker sub-band signals using an analysis filter bank including Hann window filters, and the error sub-band signals eμ(n) are upsampled at a predetermined upsampling rate and synthesized by a synthesis filter bank including Hann window filters to obtain an enhanced microphone signal, the Hann window filters of the analysis filter bank being raised to the power of a predetermined first rational number, in particular 0.75, and the Hann window filters of the synthesis filter bank being raised to the power of a predetermined second rational number, in particular 0.25, such that the sum of the first rational number and the second rational number is 1.
[0064]
(Item 7) The method according to any one of items 1 to 6, further comprising filtering the error
subband signal (eμ (n)) by noise reduction filtering means and / or residual echo suppression
filtering means.
[0065]-[0067]
(Item 8) The method according to one of items 1 to 7, wherein each of several microphone signals yk(n) including an echo signal contribution caused by the loudspeaker signal x(n) is converted to microphone sub-band signals yμ,k(n) downsampled by the first downsampling rate r; an estimated echo signal contribution is obtained for each of the microphone sub-band signals yμ,k(n) of the several microphone signals yk(n) based on the first and second loudspeaker sub-band signals; for each sub-band, the respective estimated echo signal contribution is subtracted from the respective microphone sub-band signal yμ,k(n) of each of the several microphone signals yk(n) to obtain error sub-band signals eμ,k(n) for each of the several microphone signals yk(n); and the method further comprises obtaining beamformed error sub-band signals by beamforming the error sub-band signals eμ,k(n) of the several microphone signals yk(n).
[0068]
(Item 9) A computer program product comprising one or more computer-readable media having computer-executable instructions for performing the steps of the method according to one of items 1 to 8.
[0069]-[0071]
(Item 10) Signal processing means for echo compensation of at least one microphone signal y(n) that includes an echo signal contribution d(n) caused by a loudspeaker signal x(n) in a loudspeaker-room-microphone system, comprising: a first analysis filter bank (12, 12') configured to convert at least a portion of the at least one microphone signal y(n) into microphone sub-band signals yμ(n) downsampled by a first downsampling rate r; a second analysis filter bank (15, 15') configured to convert the loudspeaker signal x(n) into loudspeaker sub-band signals downsampled by a second downsampling rate r1 lower than the first downsampling rate, thereby obtaining first loudspeaker sub-band signals; a memory, in particular a ring buffer, configured to store the first loudspeaker sub-band signals downsampled by the second downsampling rate r1; and echo compensation filtering means (17, 17') configured to obtain echo signal contributions downsampled by the first downsampling rate r by convolving the first loudspeaker sub-band signals with an estimate of the impulse response of the loudspeaker-room-microphone system and downsampling the convolved first loudspeaker sub-band signals by a third downsampling rate r2, and to obtain echo-compensated microphone sub-band signals eμ(n) by echo compensating the microphone sub-band signals yμ(n) by means of these echo signal contributions.
[0072]-[0073]
(Item 11) Signal processing means according to item 10, further comprising a synthesis filter bank (19) configured to obtain an enhanced microphone signal by upsampling and synthesizing the echo-compensated microphone sub-band signals eμ(n).
[0074]
(Item 12) Signal processing means according to item 10 or 11, further comprising residual echo suppression filtering means (23) and/or noise reduction filtering means (23) configured to filter the echo-compensated microphone sub-band signals eμ(n).
[0075]
(Item 13) Signal processing means according to one of items 10 to 12, wherein each of the first and second analysis filter banks (12, 12', 15, 15') and the synthesis filter bank (19) includes a plurality of square-root Hann window filters.
[0076]
(Item 14) Signal processing means according to one of items 10 to 12, wherein each of the first and second analysis filter banks (12, 12', 15, 15') and the synthesis filter bank (19) includes a plurality of Hann window filters, the Hann window filters of the two analysis filter banks (12, 12', 15, 15') being raised to the power of a predetermined first rational number, in particular 0.75, and the Hann window filters of the synthesis filter bank (19) being raised to the power of a predetermined second rational number, in particular 0.25, such that the sum of the first rational number and the second rational number is 1.
[0077]
(Item 15) Signal processing means according to one of items 10 to 14, comprising a plurality of first analysis filter banks (15'), each configured to convert at least a portion of one of several microphone signals yk(n) into microphone sub-band signals yμ,k(n) downsampled by the first downsampling rate r, wherein the echo compensation filtering means (17, 17') is configured to obtain error sub-band signals eμ,k(n) for each of the several microphone signals yk(n) by echo compensating each of the microphone sub-band signals yμ,k(n), and further comprising beamforming means (22) configured to obtain beamformed error sub-band signals by beamforming the error sub-band signals eμ,k(n) of the several microphone signals yk(n).
[0078]
(Item 16) The signal processing means according to item 15, wherein the beamforming means (22) is a delay-and-sum beamformer or a generalized sidelobe canceller.
[0079]
(Item 17) A hands-free telephone system including the signal processing means according to one of items 10 to 16.
[0080]
(Item 18) A speech recognition means comprising the signal processing means according to one of items 10 to 16.
[0081]
(Item 19) A voice dialog system or a voice control system including the speech recognition means
according to item 18.
[0082]
(Item 20) A vehicle communication system including at least one microphone, in particular a
microphone array, at least one loudspeaker, and the signal processing means according to one of
items 10 to 16 or the hands-free telephone system according to item 17.
[0083]
The present invention relates to a method of echo compensating at least one microphone signal
that includes an echo signal contribution due to a loudspeaker signal in a
loudspeaker-room-microphone system, said method comprising the steps of: converting at least a
portion of the at least one microphone signal into microphone sub-band signals downsampled by a
first downsampling rate; converting at least a portion of the loudspeaker signal into loudspeaker
sub-band signals downsampled by a second downsampling rate less than the first downsampling
rate, thereby obtaining first loudspeaker sub-band signals; storing the loudspeaker sub-band
signals downsampled by the second downsampling rate; convolving the first loudspeaker sub-band
signals with an estimate of the impulse response of the loudspeaker-room-microphone system in a
predetermined number of sub-bands and downsampling them by a third downsampling rate, thereby
obtaining, for each of the predetermined number of sub-bands, an echo signal contribution that is
virtually downsampled by the first downsampling rate; and obtaining error sub-band signals by
subtracting, for each sub-band, the estimated echo signal contribution from the respective
microphone sub-band signal.
[0084]
The basic steps of the method for echo compensation of a microphone signal, as disclosed herein,
are shown in FIG. 1.
In step 1, the microphone signal is divided into sub-band signals, which are downsampled by a
downsampling factor r = r1 · r2.
The reference audio signal is downsampled in two stages; in the first stage, in step 2, it is
downsampled by the downsampling factor r1.
The reference audio signal represents the audio signal received from the remote communication
party and is input to the near-end loudspeaker.
The corresponding signal output by the loudspeaker is modified by the impulse response of the
near-end loudspeaker-room-microphone (LRM) system and is detected by the microphone of the LRM
system.
[0085]
The sub-band signals of the reference audio signal downsampled in this way are stored in a ring
buffer in step 3.
The first downsampling uses a downsampling rate r1 that ensures that aliasing is sufficiently
suppressed.
Next, in step 4, a second downsampling by the downsampling factor r2 is performed to reach the
downsampling rate r = r1 · r2, which corresponds to that of the downsampled microphone sub-band
signals.
In step 5, the filter coefficients of the echo compensation filtering means are adapted, and
thereby the echo present in the microphone sub-bands is estimated, at this relatively high
downsampling rate r = r1 · r2.
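Steps 3 to 5 can be sketched for a single sub-band as follows. This is a minimal illustration rather than the patented implementation: the function name, the fixed (non-adaptive) filter `h_hat`, and the assumption that the filter taps are spaced at the r1 rate are all hypothetical. Every reference sample at the r1 rate is written to the ring buffer, but the convolution is evaluated only for every r2-th sample, so the echo estimate appears at the overall rate r = r1 · r2.

```python
from collections import deque

def echo_estimates(ref_subband_r1, h_hat, r2):
    """Estimate the echo in one sub-band from reference samples at rate fs/r1.

    ref_subband_r1: reference sub-band samples, already downsampled by r1.
    h_hat: echo-path estimate for this sub-band, taps at the r1 rate (assumed).
    r2: second downsampling factor; outputs appear at rate fs/(r1*r2).
    """
    # Ring buffer holding the most recent samples, newest first.
    ring = deque([0j] * len(h_hat), maxlen=len(h_hat))
    out = []
    for n, x in enumerate(ref_subband_r1):
        ring.appendleft(x)                     # step 3: store every r1-rate sample
        if (n + 1) % r2 == 0:                  # step 4: keep only every r2-th instant
            out.append(sum(h * s for h, s in zip(h_hat, ring)))  # step 5: convolution
    return out
```

For example, `echo_estimates([1, 2, 3, 4, 5, 6], [1.0, 0.5], 2)` evaluates the two-tap filter at the second, fourth and sixth input sample only.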
[0086]
At this downsampling rate r = r1 · r2, the estimated echo is subtracted from each microphone
sub-band signal to obtain enhanced microphone sub-band signals.
These enhanced microphone sub-band signals are then combined to obtain an enhanced audio signal
that can be transmitted to the remote communication party.
[0087]
In the above-described embodiment of the method of the invention, the expensive processing step
of adapting the filter coefficients of the echo compensation filtering means, which involves
convolving the filter coefficients with the appropriately downsampled reference sub-band signals,
can be performed on signals downsampled by the rate r = r1 · r2, which is higher than the
downsampling rate used in the prior art for generating the sub-band signals of the reference
signal.
Indeed, a downsampling rate of r = r1 · r2 = 128 combined with a total of M = 256 sub-bands, for
example, can still yield satisfactory echo compensation.
[0088]
FIG. 2 shows an example of the signal processing means disclosed herein, used to improve the
quality of the microphone signal y(n), where n denotes the discrete time index.
The microphone signal y(n) is acquired by a microphone that is part of the LRM system 10.
The microphone detects the audio signal s(n) of the local speaker as well as an echo contribution
d(n), which is due to the reference audio signal x(n) that is output by the loudspeaker and
detected by the microphone after modification by the actual LRM impulse response h(n).
[0089]
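The microphone signal y(n) is filtered by the analysis filter bank 12, whose filters have the
coefficient vectors
[0090]
$g_{\mu,\mathrm{ana}} = [g_{\mu,0,\mathrm{ana}}, g_{\mu,1,\mathrm{ana}}, \ldots, g_{\mu,N_{\mathrm{ana}}-1,\mathrm{ana}}]^T$
for the sub-bands μ = 0, ..., M−1, where the superscript T denotes transposition and N_ana
denotes the filter length.
The sub-band signals are downsampled by the downsampling means using a downsampling factor
of r = r1 · r2 with integers r1 and r2.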
The resulting downsampled microphone sub-band signal yμ (n) is further processed for echo
compensation.
[0091]
The reference audio signal x(n) is also input to an analysis filter bank 15 for echo compensation
of the downsampled microphone sub-band signals yμ(n).
According to this embodiment, the reference audio signal x(n) is filtered by the filtering means
13 with the same filter coefficients gμ,ana as are used for the microphone signal y(n). The
sub-band signals of the reference audio signal, however, are downsampled by the downsampling
means 16 with a downsampling factor of r1, for example r1 = 64 for M = 256 sub-bands (the
sampling rate of the microphone signal being, for example, 11025 Hz).
[0092]
In principle, the adaptation of the filter coefficients of the echo compensation filtering means 17
could be performed after the first downsampling by the downsampling factor r1.
However, according to this embodiment of the invention, the sub-band signals downsampled by
r1 are stored in a ring buffer (not shown), and the adaptation of the filter coefficients and the
actual echo compensation are performed after a second downsampling by a downsampling factor
r2, for example r2 = 2 when r1 = 64 is chosen for M = 256 sub-bands.
In particular, the most expensive operation of the overall echo compensation, namely the
adaptation of the filter coefficients, is performed on signals downsampled by r = r1 · r2, which
very effectively reduces the processor load and accelerates the overall signal processing.
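The size of the saving can be illustrated with a rough operation count. This is a sketch under stated assumptions: the sampling rate, M, r1 and r2 are the example values from the text, while the filter length per sub-band (`TAPS`) is a hypothetical figure, and the same number of taps is assumed at both rates. Only multiply-accumulates of the sub-band convolutions are counted.

```python
def macs_per_second(fs_hz, n_subbands, downsampling, taps_per_subband):
    """Complex multiply-accumulates per second for the sub-band convolutions."""
    frames_per_second = fs_hz / downsampling
    return frames_per_second * n_subbands * taps_per_subband

FS_HZ = 11025   # sampling rate from the example in the text
M = 256         # number of sub-bands from the text
TAPS = 20       # hypothetical filter length per sub-band

single_stage = macs_per_second(FS_HZ, M, 64, TAPS)    # filtering at r1 = 64
two_stage = macs_per_second(FS_HZ, M, 128, TAPS)      # filtering at r = r1 * r2 = 128
ratio = single_stage / two_stage                      # equals r2 under these assumptions
```

Under these assumptions the convolution load scales inversely with the downsampling factor, so the second stage reduces it by the factor r2.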
[0093]
In the frequency (Ω) domain, the analysis filter bank 15 outputs a sub-band signal (short-time
spectrum).
[0094]
These short-time spectra are echo compensated by the echo compensation filtering means 17.
[0095]
The sub-band echo estimates are obtained using the filter coefficients (in the frequency domain)
of the echo compensation filtering means 17.
[0096]
These coefficients
[0097]
represent temporally adapted estimates of the corresponding impulse response of the LRM system
[0098]
(corresponding to the coefficients of h(n) in the time domain).
[0099]
For μ = 0, the aliasing terms of the analysis filter bank can be eliminated by the choice
[0100]
Here, all M sub-bands have the same bandwidth.
The other filters Gμ,ana(e^{jΩ}) for μ = 1, ..., M−1 can be derived from the filter described
above for the sub-band μ = 0 by a simple frequency shift.
Therefore, only one filter needs to be designed.
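The frequency-shift relation can be sketched as follows. The code assumes a uniform, DFT-modulated filter bank in which sub-band μ is centred at 2πμ/M; this centre frequency and the function names are assumptions for illustration, since the text only states that the filters follow from the prototype by a frequency shift.

```python
import cmath
import math

def modulated_filter(prototype, mu, n_subbands):
    """Shift a low-pass prototype (sub-band 0) to the centre of sub-band mu."""
    return [g * cmath.exp(2j * math.pi * mu * n / n_subbands)
            for n, g in enumerate(prototype)]

def frequency_response(taps, omega):
    """Evaluate G(e^{j omega}) = sum_n g[n] * e^{-j omega n}."""
    return sum(g * cmath.exp(-1j * omega * n) for n, g in enumerate(taps))
```

With this modulation, the response of the filter for sub-band μ at frequency Ω equals the prototype response at Ω − 2πμ/M, which is why a single prototype design suffices.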
[0101]
The sub-band estimates thus obtained for the echo contribution d(n), which is detected by the
microphone of the LRM system and is hence present in the microphone signal y(n),
[0102]
are subtracted from the downsampled microphone sub-band signals yμ(n) to obtain the sub-band
error signals eμ(n).
It must be emphasized that the sub-band estimates
[0103]
are generated using the stored loudspeaker sub-band signals (sampled at the second downsampling
rate) but are calculated only at the first rate (that of the microphone sub-band signals).
It may be preferable to filter the sub-band error signals eμ(n) in order to reduce residual echo
and background noise, which is also always present in the microphone signal y(n).
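The subtraction and the coefficient adaptation for one sub-band can be sketched as below. The text does not name the adaptation algorithm; the normalized least-mean-squares (NLMS) update used here is a common choice that is swapped in purely as an assumption, and the function name, step size and regularization constant are hypothetical.

```python
def nlms_step(h_hat, x_buf, y, step=0.5, eps=1e-8):
    """One echo-compensation step in a single sub-band.

    h_hat: current complex filter estimate (list, modified in place).
    x_buf: most recent reference sub-band samples, newest first, same length.
    y: current microphone sub-band sample.
    Returns the error sample e = y - d_hat.
    """
    d_hat = sum(h * x for h, x in zip(h_hat, x_buf))   # echo estimate
    e = y - d_hat                                      # error sub-band sample
    norm = sum(abs(x) ** 2 for x in x_buf) + eps       # input power, regularized
    for i, x in enumerate(x_buf):
        # NLMS update (assumed algorithm, not stated in the text)
        h_hat[i] += step * e * x.conjugate() / norm
    return e
```

In a noise-free simulation with a fixed echo path, repeated calls drive the error towards zero and `h_hat` towards the true sub-band response.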
[0104]
As shown in FIG. 2, the sub-band error signals eμ(n) are input to a synthesis filter bank 19
comprising an upsampling means 20 with an upsampling factor of r = r1 · r2 and a filtering means
21 comprising high-pass, band-pass and low-pass filters, which removes the imaging terms as is
known in the art.
The resulting synthesized speech signal
[0105]
is characterized by a significantly reduced acoustic echo.
[0106]
FIG. 3 illustrates the incorporation of the echo compensation according to the invention into a
communication system comprising a microphone array of directional microphones and beamforming
means 22.
A plurality of microphone signals yk(n) is obtained from the microphone array.
Each microphone channel k of the microphone array is connected to a respective analysis filter
bank 12', which operates as described above with reference to FIG. 2.
[0107]
Thus, the echo compensation filtering means 17' includes, for each microphone channel, a filter
[0108]
Downsampled estimates of each channel's echo contribution
[0109]
are subtracted from the microphone sub-band signals yμ,k(n).
This yields the error signals eμ,k(n) that are input to the beamforming means 22.
The estimates
[0110]
are obtained by convolving the sub-band signals obtained from the reference audio signal x(n) by
the analysis filter bank 15' with the filter coefficients
[0111]
Again, the convolution of the filter coefficients of the echo compensation filtering means 17'
with the sub-band reference signals is performed at the same downsampling rate as that of the
analysis filter bank 12' receiving the microphone signals yk(n) (see also the description above).
[0112]
The multi-channel system of this example may use adaptive or non-adaptive beamforming; see, for
example, H. L. Van Trees, "Optimum Array Processing, Part IV of Detection, Estimation, and
Modulation Theory", Wiley & Sons, New York, 2002.
The beamforming means 22 combines the error signals eμ,k(n) of the microphone channels to
obtain beamformed sub-band signals, which are input to the filtering means 23. The filtering
means 23 improves the quality of the beamformed sub-band signals by suppressing residual echo
and reducing noise, as is known in the art.
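As an illustration of the non-adaptive case, a delay-and-sum beamformer can be applied per sub-band to the error samples of all channels. This is a minimal narrowband sketch, assuming that a delay of τ seconds in channel k appears in sub-band μ as a phase factor e^{−jωτ} at the sub-band centre frequency ω; the function name and the steering delays are hypothetical.

```python
import cmath

def delay_and_sum(errors, delays_s, omega_rad_s):
    """Combine one sub-band error sample per microphone into one beamformed sample.

    errors: complex error samples e_{mu,k}(n), one per microphone k.
    delays_s: steering delay of each microphone in seconds (hypothetical values).
    omega_rad_s: centre frequency of sub-band mu in rad/s.
    """
    # Undo each channel's delay by the opposite phase rotation, then average.
    aligned = [e * cmath.exp(1j * omega_rad_s * tau)
               for e, tau in zip(errors, delays_s)]
    return sum(aligned) / len(aligned)
```

A signal arriving from the steered direction adds coherently across channels, while contributions from other directions are attenuated by the averaging.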
[0113]
The filtering means 23 may, for example, comprise a Wiener filter for reducing background noise,
with a filter characteristic in the frequency domain given by
[0114]
where
[0115]
and
[0116]
denote the estimated short-time power density of the background noise and the short-time power
density of the (full-band) error signal, respectively.
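The filter characteristic itself is not reproduced in this text. A commonly used form of such a Wiener characteristic is one minus the ratio of the noise power density to the error-signal power density; the sketch below uses that form as an assumption, together with a hypothetical lower limit (`floor`) that keeps the gain from becoming negative when the noise estimate exceeds the measured power.

```python
def wiener_gain(noise_psd, error_psd, floor=0.1):
    """Wiener-type gain for one sub-band (assumed form: 1 - S_nn / S_ee).

    noise_psd: estimated short-time power density of the background noise.
    error_psd: short-time power density of the error signal.
    floor: hypothetical lower limit on the gain.
    """
    if error_psd <= 0.0:
        return floor
    return max(1.0 - noise_psd / error_psd, floor)
```

The gain approaches 1 where the error signal is much stronger than the noise estimate and falls to the floor where noise dominates.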
[0117]
The enhanced sub-band signals
[0118]
are input to a synthesis filter bank similar to the one described with reference to FIG. 2.
After upsampling by r = r1 · r2 by the upsampling means 20 and filtering by the filtering means
21, which comprises high-pass, band-pass and low-pass filters for removing the imaging terms,
the resulting synthesized speech signal
[0119]
is obtained.
[0120]
It should be understood that some or all of the features described above may also be combined in
different ways.
[0121]
FIG. 1 is a flow chart illustrating the essential steps of an embodiment of the inventive method of
echo compensating a microphone signal, comprising two stages of downsampling of a reference
audio signal.
FIG. 2 illustrates an embodiment of signal processing means according to the invention, wherein
the reference audio signal is downsampled and filtered by echo compensation filtering means.
FIG. 3 illustrates a further embodiment of signal processing means according to the invention,
comprising a microphone array and beam forming means.
Explanation of reference signs
[0122]
10 LRM system
12 First analysis filter bank
15 Second analysis filter bank
17 Echo compensation filtering means
20 Upsampling means
21 Filtering means