Patent Translate
Powered by EPO and Google
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a method of reducing directional noise in speech and to a device
for implementing the method.
2. Description of the Related Art
Known methods for reducing noise in speech use linear filtering. FIG. 1 shows a standard
apparatus for implementing such a method. This device essentially comprises a source 1 of noisy
signal (speech signal) connected to the (+) input of a subtractor 2. A noise-only source 3 is
connected to the subtraction input of the subtractor 2 via an adaptive filter 4. The output of the
subtractor 2 constitutes the output of the noise reduction device and is also connected to the
control input of the filter 4, to which it sends the residual error signal ε.
The source 3 constitutes a noise model in the sense of a given criterion, for example a least
mean squares criterion, and this noise is adaptively subtracted from the noisy signal. The
operating principle of this device relies on the assumptions that the useful signal s, the noise
n0 affecting this signal, the noise model n1 and the output signal y of the filter 4 are
stationary, that there is no correlation between s and n0, and that there is a high degree of
correlation between n0 and n1.
The output signal is ε = s + n0 − y, that is, ε² = s² + (n0 − y)² + 2s(n0 − y), and the
power values are as follows:
Since the signal power is not affected by the adaptive filtering, it follows that:
The output of the filter 4 is adjusted so as to reach the minimum Emin.
This minimization of the total output power leads to a minimization of the noise power and thus
to a maximization of the S/N ratio.
In the best case, we get:
Conversely, if the signal of source 3 is not correlated with the signal of source 1, then:
Because the output power is minimized, the adaptive weights of the filter 4 tend toward zero,
which causes E[y²] to tend toward zero.
This method is well known to those skilled in the art.
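The equations referred to in the passage above are not reproduced in this text. As a hedged reconstruction (not the patent's own typesetting), they can be written along the lines of the standard adaptive noise cancelling analysis, assuming that s is uncorrelated with both n0 and y and writing E[·] for the expected value (power):

```latex
\begin{align}
\varepsilon &= s + n_0 - y \\
\varepsilon^2 &= s^2 + (n_0 - y)^2 + 2s(n_0 - y) \\
% the cross term vanishes because s is assumed uncorrelated with n_0 and y
E[\varepsilon^2] &= E[s^2] + E[(n_0 - y)^2] + 2E[s(n_0 - y)]
                  = E[s^2] + E[(n_0 - y)^2] \\
% E[s^2] does not depend on the filter, so only the second term is minimized
E_{\min}[\varepsilon^2] &= E[s^2] + E_{\min}[(n_0 - y)^2] \\
% best case: the filter output reproduces the noise exactly
E_{\min}[(n_0 - y)^2] = 0 &\;\Rightarrow\; y = n_0,\quad \varepsilon = s \\
% reference uncorrelated with the primary input, so E[y(s + n_0)] = 0
E[\varepsilon^2] &= E[(s + n_0)^2] - 2E[y(s + n_0)] + E[y^2]
                  = E[(s + n_0)^2] + E[y^2]
\end{align}
```

In the last case the output power can only be reduced by reducing E[y²], which is why the adaptive weights are driven toward zero.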
The adaptive filter 4 is conveniently of the LMS (least mean squares) type, or alternatively of
the RLS (recursive least squares) type.
The main disadvantage of this known method is that it absolutely requires the noise source 3 to
be physically available.
This source may contain a variable proportion of signal that does not have noise-only
characteristics. The performance of the noise reduction method is then considerably degraded,
as standard theoretical calculations, not reproduced here, show.
The first possible way to overcome this drawback is to use "frequency diversity". This solution
consists essentially in processing the noisy signal by means of a DFT (discrete Fourier
transform) and, based on its power value, in generating the signal y to be subtracted from it by
an inverse discrete Fourier transform of this value. This processing operation consists in
dividing the noisy useful signal into separate sub-bands, for example by Fourier analysis, and
then processing each sub-band individually in order to increase the size of the vector space of
observations. This type of segmentation cannot be used for speech processing, because speech
signals are known not to be stationary in frequency and do not statistically occupy the same
frequency bands (as is the case, for example, with voiced structures).
Another way is to use temporal diversity. This method is not usable either, because stationarity
of the speech emission is not physically realistic. At most, a degree of stationarity can be
observed over some tens of frames of 25.6 ms (corresponding to 256 points of a signal sampled at
10 kHz) for stable vocal cores, whereas this stationarity drops to a duration of 1 to 3 frames
for plosives (sounds such as "t").
The third method is a "spatial diversity" method in which several signal pick-up (vector
information pick-up) points are distributed in space. Filtering is then performed as
schematically shown in FIG.
A set 5 of (L + 1) microphones, which may for example be equally spaced, is placed in front of
the speaker, and their output signals are referenced x0, x1, ..., xL. Each of these microphones
is followed by a narrowband adaptive filter; the entire set of filters is referenced 6, and the
individual filters are referenced W0, W1, ..., WL respectively. Their outputs are connected to
an adder 7, whose output constitutes the output of the device.
Let Xk denote any one of the input vectors, Wk^T the transposed vector of the weights applied to
the filters, and gk the output scalar. Then:
gk = Wk^T Xk
At a given instant (determined, for example, by a sample-and-hold operation), L input signals
are available.
The speech emission affects all the output signals of the microphones 5, and the differences
between these signals are mainly due to the differences in propagation time between the speaker
and the individual microphones.
In a manner known per se, spatial processing consists in forming an antenna by conventional
channel formation (generally by linear combination of the microphone signals) and in steering
the directivity lobe of the antenna thus formed by phase shifting (or by pure delay). The
limitations stated above for the other known methods apply here as well.
The object of the present invention is a method for the reduction of directional noise in speech
that makes it possible to obtain a reference noise source showing the greatest possible
correlation with the noise affecting the noisy speech, whatever the vocal content of that speech,
and without particular limitations on the frequency or duration of the speech signal.
The object of the invention is furthermore a simple and inexpensive device for the
implementation of this method.
SUMMARY OF THE INVENTION
The method of the present invention consists in picking up sound directionally with at least
four microphones equally spaced in front of the source of the signal to be processed, and in
forming linear combinations, by addition and subtraction, of the signals from the microphones so
that a main sensitivity lobe is obtained in the direction of the source. The sum of all the
output signals of the microphones represents the noisy signal. Each of the other combinations
comprises the same number of subtractions as additions and is used as a noise-only source; these
are processed with a directional adaptive filter to give an estimate of the noise, which is
subtracted from the noisy signal.
The invention will become more apparent from the following detailed description of an
embodiment, which is given as a non-limiting example and illustrated in the accompanying
drawings.
The present invention is described below with reference to the reduction of noise in the speech
of a speaker placed in a noisy environment, but it is not limited to such an application: it can
be implemented to reduce the noise affecting the useful signal from any sound source, possibly
mobile, that is assumed to be a point source and is located in a noisy environment.
In this example, two sets of microphones are placed in front of the speaker at a usual distance
from the speaker (for example, a distance of 1 to 4 m).
Each of these two sets comprises, for example, eight microphones arranged in a straight line and
equally spaced from one another.
The distance between adjacent microphones is, for example, from a few centimeters to about
10 cm. For example, if the spacing is d = 10 cm, the maximum frequency to be processed is
fmax = c / 2d, which is about 1.7 kHz, where c is the speed of sound in air. These two sets are
arranged, for example, in a cross. However, the number of microphones and their arrangement can
vary. The number of microphones can be, for example, 4 or a multiple of 4 for each set, and the
sets can be arranged, for example, in the form of a "V" or a circle. Preferably, these two sets
are coplanar.
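The relation fmax = c / 2d quoted above can be checked numerically. The following is a minimal sketch; the value 340 m/s for the speed of sound and the function name are assumptions chosen to be consistent with the 1.7 kHz figure, not values stated in the text.

```python
# Assumed speed of sound in air (m/s); the text only says "c is the speed of sound in air".
SPEED_OF_SOUND_M_S = 340.0


def max_processed_frequency(spacing_m, c=SPEED_OF_SOUND_M_S):
    """Return f_max = c / (2 d) in Hz for a microphone spacing d given in meters."""
    if spacing_m <= 0:
        raise ValueError("spacing must be positive")
    return c / (2.0 * spacing_m)


if __name__ == "__main__":
    # d = 10 cm gives roughly 1.7 kHz, as stated in the description.
    print(round(max_processed_frequency(0.10)))
```

Halving the spacing doubles the maximum frequency, which is the trade-off behind the "few centimeters to about 10 cm" range.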
The eight microphones of each of these two sets are referenced M1 to M8. Their output signals
are combined with one another in eight different combinations. One of these combinations (row 1
of the table of FIG. 3) gives the resulting signal S1; it corresponds to the sum of all these
signals. Each of the other seven combinations comprises four additions and four subtractions,
and they give the resulting signals S2 to S8 respectively. Thus, for example, the combination in
the second row of the table of FIG. 3 corresponds to adding the signals of M1 to M4 and
subtracting those of M5 to M8, so that S2 = (M1 + M2 + M3 + M4) - (M5 + M6 + M7 + M8).
It should be noted that these eight combinations are orthogonal (that is, they belong to a
family of orthonormal functions such as a Haar decomposition basis). The signal S1 corresponds
to the main directivity lobe of a conventional channel. The other signals S2 to S8 can be
regarded as "noise only" channels, because they have a zero in the direction of the main lobe.
FIG. 4 is a graph showing, in arbitrary units U, the directivity lobes at the different outputs
S1 to S8 defined above, for each of the two sets of microphones, as a function of the angle
formed between the perpendicular to the plane of the two sets of microphones and the different
directions of measurement. The X axis is graduated in values of λ/d, where λ is the wavelength
and d is the distance between the microphones.
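The table of FIG. 3 is not reproduced in this text, so the exact sign patterns of rows 3 to 8 are unknown. As a sketch under that caveat, the following uses a Sylvester-Hadamard matrix as a stand-in: it contains the all-plus row (S1) and the ++++---- row (S2) described above, every other row has four additions and four subtractions, and all rows are mutually orthogonal. The row order and the function names are assumptions.

```python
def hadamard(n):
    """Sylvester construction of an n x n matrix of +1/-1 entries (n a power of two)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = [[1]]
    while len(h) < n:
        # H_2n = [[H, H], [H, -H]]
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def combine(mic_samples, signs):
    """Apply each +/- sign pattern to one simultaneous snapshot of microphone samples."""
    return [sum(s * m for s, m in zip(row, mic_samples)) for row in signs]


if __name__ == "__main__":
    rows = hadamard(8)
    # A wavefront from the pointing direction reaches all microphones in phase,
    # so it appears only in the all-plus combination and cancels in the others.
    print(combine([1.0] * 8, rows))
```

This illustrates why the balanced combinations can serve as "noise only" channels once the array is pointed at the speaker: the in-phase speech component cancels, while sound from other directions does not.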
In order to simplify the explanation, it is assumed that the microphones used are
omnidirectional. If their directivity patterns are taken into account, the explanation given
here remains valid, of course, provided that the microphones are all identical or that their
differences are compensated (by complex weighting in amplitude and phase).
In order to practice the method of the present invention, the microphone sets are first of all
pointed toward the sound source (the speaker). For this pointing, any suitable known method can
be used, for example a wideband super-resolution method such as AR, Capon or MUSIC, which
determines the elevation angle and relative orientation of the sound source.
It should be noted that the method of the present invention does not require any computation to
form the "noise only" channels. The combinations described above with reference to the table of
FIG. 3 in fact involve only simple series connections (for the "+" combinations) and the use of
inverters (for the "-" combinations).
The output signals from all the microphones are sampled simultaneously. That is, signals all
taken at the same instant are available. This ensures the stationarity of the noise affecting
all of these signals.
From a spatial point of view, there is a left/right ambiguity relative to the speaker (the lobes
of the microphones are symmetrical about the center of the array, as are the image lobes of an
antenna array). However, this ambiguity does not affect the operation of the noise reduction
device, because the signals produced are, first, the sum signal of all the microphone signals
and, second, the sum signals of the "noise only" channels. The position of the speaker is
determined independently of this ambiguity. The weights of the various filters can be calculated
by one of the super-resolution methods shown in the following table:
In this table, W is the vector of the filter weights; D(θ) and D+(θ) are the normal vector and
its conjugate transpose, respectively; Γx is the interspectral matrix; Γx+ and Γx^-1 are
respectively the conjugate transpose and the inverse (or pseudo-inverse) of Γx; the parameters
λk and V denote the smallest eigenvalue of Γx and the associated eigenvector; and E is a vector
whose first component is the noise power, the other components being equal to zero.
The general term "super-resolution" simply indicates that the methods concerned can resolve
(distinguish) two sound sources separated by a distance smaller than λ/L, where L is the total
length of the physical antenna formed by a row of eight aligned microphones and λ is the
wavelength.
The processing of the various channels is broadband processing.
The covariance matrix is an interspectral matrix (a matrix of correlations of observation vectors).
The two-dimensional array of microphones arranged in a cross makes it possible to determine the
elevation and relative orientation of the sound source by detecting the maximum value of the
following:
This makes it possible to slave the array to the position of the speaker. This servo-control is
achieved by electronic pointing of the microphone array (as with a tracking radar antenna), by
varying the phase shifts applied to the respective microphone signals.
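Electronic pointing by phase shift can be illustrated for the simplest case. The sketch below is an assumption-laden simplification, not the patent's procedure: it considers a single line of equally spaced microphones, a far-field plane wave, an angle measured from the perpendicular to the array, and a speed of sound of 340 m/s. The function name and the sign convention are hypothetical. At a frequency f, a delay τ corresponds to a phase shift of 2πfτ.

```python
import math


def steering_delays(num_mics, spacing_m, angle_deg, c=340.0):
    """Per-microphone delays (s) steering a uniform linear array toward angle_deg.

    The angle is measured from the perpendicular to the array (0 = straight ahead).
    Delays are shifted so that the smallest one is zero, which keeps them causal.
    """
    tau = spacing_m * math.sin(math.radians(angle_deg)) / c  # delay between neighbors
    delays = [i * tau for i in range(num_mics)]
    shift = min(delays)
    return [t - shift for t in delays]


if __name__ == "__main__":
    # Eight microphones 10 cm apart, lobe steered 20 degrees off the perpendicular.
    for i, t in enumerate(steering_delays(8, 0.10, 20.0)):
        print(f"microphone {i + 1}: {t * 1e6:.1f} microseconds")
```

When the speaker is straight ahead all delays are zero, which is the situation assumed when the sum and difference combinations are formed.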
Once this pointing has been performed, the binary conversion (FIG. 3) is applied to the signals
coming from the microphones. This yields, first, a noisy signal channel and, second, seven
"noise only" channels (in the case of eight microphones). Vector processing (for example of the
LMS or RLS type) filters these channels to give an estimated value of the noise. Subtracting
this estimated value from the above-mentioned noisy signal performs an adaptive noise reduction.
Vector processing by the least mean squares method and, similarly, the search for the optimum
filtering are generally carried out by iterative gradient-search methods.
When the microphone signals are sampled with a period Te, so that s(k) = s(kTe), the
deterministic-gradient adaptive algorithm is as follows:
In these equations, μ denotes the incremental step of the algorithm, which is a positive
constant. The speed of convergence, the residual variance and the stringency of the algorithm
depend on the value of this step.
An advantageous algorithm for performing the vector processing is the stochastic-gradient
algorithm, also referred to as the LMS (least mean squares) algorithm. This algorithm uses a
stochastic estimate of the gradient, so that the filter vector is adjusted as follows:
Since the speech signal is considered in baseband, with no need for complex demodulation, the
expressions involved here are real.
The LMS algorithm minimizes the root mean square error of the adaptive device with respect to a
reference ("noise only") signal whose value is known at each sampling instant. On average, this
algorithm makes it possible to obtain the optimum solution:
with, however, a variance proportional to the incremental step μ and to the power of the noise
picked up by the microphones. This algorithm is advantageous because it is highly stable, but
other algorithms can of course be used, for example a recursive least squares algorithm, which
has a higher convergence rate. In any case, a compromise must be found between the convergence
speed and the residual variance of the estimation error.
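The update equations themselves are missing from this text. The following is a minimal sketch of a stochastic-gradient (LMS) canceller in its common textbook form, W(k+1) = W(k) + 2με(k)X(k). The factor 2, the one-weight-per-channel simplification, the synthetic signals and all names are assumptions for illustration, not the patent's equations.

```python
import math
import random


def lms_step(weights, refs, primary, mu):
    """Run one LMS iteration in place and return the error (de-noised) sample."""
    estimate = sum(w * x for w, x in zip(weights, refs))  # noise estimate y(k)
    error = primary - estimate                            # epsilon(k) = primary - y(k)
    for i, x in enumerate(refs):
        weights[i] += 2.0 * mu * error * x                # stochastic-gradient update
    return error


def demo(seed=0, num_samples=5000, mu=0.002):
    """Cancel synthetic noise; returns (weights, noise power in, residual noise power)."""
    rng = random.Random(seed)
    mixing = [0.5, -0.3, 0.8]   # hypothetical coupling of the references into the primary
    weights = [0.0] * len(mixing)
    tail = num_samples // 5     # powers are averaged over the last 20 % of the run
    power_in = power_out = 0.0
    for k in range(num_samples):
        speech = math.sin(2.0 * math.pi * 0.01 * k)        # stand-in for the useful signal
        refs = [rng.gauss(0.0, 1.0) for _ in mixing]       # "noise only" channels
        noise = sum(m * r for m, r in zip(mixing, refs))   # noise reaching the primary
        error = lms_step(weights, refs, speech + noise, mu)
        if k >= num_samples - tail:
            power_in += noise ** 2 / tail
            power_out += (error - speech) ** 2 / tail
    return weights, power_in, power_out


if __name__ == "__main__":
    w, p_in, p_out = demo()
    print("weights:", [round(x, 2) for x in w])
    print("noise power before / after:", round(p_in, 3), round(p_out, 4))
```

A larger μ shortens convergence but increases the residual fluctuation of the weights, which is the compromise mentioned above.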
FIG. 6 is a schematic view of an example of an apparatus for carrying out the method of the
present invention.
Reference numeral 8 in this figure designates a set of eight equally spaced microphones M'1 to
M'8.
The other set 9, indicated by a dashed line, is arranged crosswise to the first set and is
coplanar therewith. The connections of the microphones of this second set are identical to those
of the microphones of the set 8 up to the servo-control device and are not shown, for clarity of
the drawing.
Each of the microphones of the set 8 is followed by a preamplifier (the set of preamplifiers is
referenced 10), a sampling device (the set of sampling devices is referenced 11) and a
non-recursive delay-line filter (the set of these filters is referenced 12). The outputs of the
microphones M'2, M'4, M'6 and M'8 are connected to an adder 13. The outputs of the filters of
the set 12 associated with the microphones M'3 to M'6 are connected to another adder 14.
The output of the adder 13 is connected, via a compensation filter 15 (for example of the FIR
type), to a first input of an adder 16, and the output of the adder 14 is connected, via a
filter 17 identical to the filter 15, to a second input of the adder 16.
The output of the adder 16 is connected to a device 18. This device 18, which provides
servo-control in elevation and relative orientation of the two microphone sets 8, 9, is of the
super-resolution type and works according to one of the principles described above.
The sampling devices 11 of the microphones M'1 to M'8 are furthermore connected to a binary
conversion combiner device 19 of the type described with reference to FIG. The "main lobe"
output of the combiner device 19 (the combination comprising only additions) is connected to the
(+) input of a subtractor 20. The other seven "noise only" outputs are connected through an
adaptive filter 21 to the (-) input of the subtractor 20. The filter 21 is, for example, a
delay-line filter whose coefficients are recomputed for each speech frame (for example, 1 frame
= 25.6 ms) so as to minimize the noise at the output of the subtractor 20.
The two sets of microphones 8, 9 perform a dual spatial sampling of the speaker's speech
emission. At the outputs of the adders 13, 14, two physical antennas are formed, having two
identical main lobes at two frequencies separated by one octave. The compensation filters 15, 17
correct the dispersion between the frequencies at the ends of the received acoustic spectrum.
The signals coming from the two filters 15, 17 are summed at 16. A constant directivity lobe
(corresponding to the main lobes of the sets 8 and 9) is therefore obtained at the output of 16.
This holds when the normal to the plane formed by the sets 8 and 9 (the microphones being
treated as points in the same plane), passing through the intersection C of the sets 8 and 9, is
directed toward the speaker (also regarded as a point). Otherwise, at least one of the two main
lobes is offset with respect to the center (C) of the corresponding directivity pattern, and the
output signal of 16 is affected thereby (loss of power). The servo-control device 18 then
orients the plane of the microphones in elevation and relative orientation, in a known manner,
so as to recover the maximum value of a functional at the output of 16.
FIG. 5 shows a schematic view of one representative example of the broadband adaptive filter 21
of FIG. For each of the (L-1) microphone channels x0 to xL, this filter comprises an elementary
filter, referenced 22_0 to 22_L respectively. Each of these elementary filters comprises, in
known manner, k cascaded processing blocks. Each processing block essentially comprises a delay
cell Z^-1 (for example in the form of a buffer register) and a convolution cell Wi,j (for the
channels 22_0 to 22_L, i runs from 0 to L and, for each channel, j runs from 0 to k-1). The
first processing block (W0,0 to WL,0) has no delay cell. The outputs of the channels 22_0 to
22_L are connected to a parallel adder 23. The weights of the various processing blocks (the
coefficients applied to the delayed discrete values processed by each block) are slaved to the
output signal of the subtractor 20.
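The structure described for FIG. 5 (one tapped delay line per channel, weights Wi,j, outputs summed in a parallel adder) can be sketched as follows. This is an illustrative reading of the description rather than the patented circuit: the class and method names are hypothetical, and the adaptation of the weights from the subtractor output is deliberately left out.

```python
from collections import deque


class MultichannelDelayLineFilter:
    """Bank of tapped-delay-line filters whose outputs are summed (illustrative sketch)."""

    def __init__(self, num_channels, num_taps):
        # weights[i][j] plays the role of W(i, j): channel i, delay of j samples.
        self.weights = [[0.0] * num_taps for _ in range(num_channels)]
        # One delay line per channel; position 0 holds the current, undelayed sample.
        self.lines = [deque([0.0] * num_taps, maxlen=num_taps)
                      for _ in range(num_channels)]

    def push(self, samples):
        """Shift one new sample into each channel; the oldest sample drops out."""
        for line, x in zip(self.lines, samples):
            line.appendleft(x)

    def output(self):
        """Parallel adder: sum over channels i and taps j of W(i, j) * x_i(k - j)."""
        return sum(w * x
                   for ws, line in zip(self.weights, self.lines)
                   for w, x in zip(ws, line))


if __name__ == "__main__":
    bank = MultichannelDelayLineFilter(num_channels=2, num_taps=3)
    bank.weights = [[1.0, 0.5, 0.25], [2.0, 0.0, 0.0]]
    for samples in ([1.0, 1.0], [0.0, 0.0], [0.0, 0.0]):
        bank.push(samples)
        print(bank.output())
```

Tap j = 0 uses the current sample directly, which matches the statement that the first processing block has no delay cell.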
Since the present invention is configured as described above, a method is provided for the
reduction of directional noise in speech that makes it possible to obtain a reference noise
source exhibiting the greatest possible correlation with the noise affecting the noisy speech,
whatever the vocal content of that speech and without any particular restriction on the
frequency or duration of the speech signal.
Furthermore, a simple and inexpensive apparatus for the implementation of this method is
provided.