Patent Translate
Powered by EPO and Google

Notice: This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate, complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or financial decisions, should not be based on machine-translation output.

DESCRIPTION JPH06245291

[0001] BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a method of reducing directional noise in speech and to a device for implementing the method.

[0002] 2. Description of the Related Art

Known methods for reducing noise in speech use linear filtering. FIG. 1 shows a standard device for implementing such a method. This device essentially comprises a source 1 of noisy signal (speech signal) connected to the + input of a subtractor 2. A noise-only source 3 is connected to the subtraction input of the subtractor 2 via an adaptive filter 4. The output of the subtractor 2 constitutes the output of the noise reduction device and is also connected to the control input of the filter 4 in order to send it the residual error signal ε.

[0003] The source 3 constitutes a noise model in the sense of a given criterion, for example a least mean squares criterion, and this noise is adaptively subtracted from the noisy signal. The operating principle of this device relies on the assumptions that the useful signal s, the noise n0 affecting this signal, the noise model n1 and the output signal y of the filter 4 are stationary, that there is no correlation between s and n0, and that a high degree of correlation exists between n0 and n1.

03-05-2019 1

[0004] The output signal is equal to ε = s + n0 − y, so that ε² = s² + (n0 − y)² + 2s(n0 − y). The cross term has zero mean when s is uncorrelated with both n0 and y, and the power values are then as follows:

E[ε²] = E[s²] + E[(n0 − y)²]

[0006] Since the power of the useful signal s is not affected by the adaptive filtering, it follows that:

Emin[ε²] = E[s²] + Emin[(n0 − y)²]

[0008] The output of the filter 4 is adjusted so as to minimize E[ε²].
This minimization of the total output power leads to a reduction of the noise power and thus to a maximization of the signal-to-noise ratio.

[0009] In the best case, we get:

E[(n0 − y)²] = 0, that is, Emin[ε²] = E[s²], with y = n0 and ε = s.

[0011] Conversely, if the signal of source 3 is uncorrelated with the signal of source 1, then:

E[ε²] = E[(s + n0)²] + 2E[−y(s + n0)] + E[y²] = E[(s + n0)²] + E[y²]

[0013] Minimizing the output power then drives the adaptive weights of the filter 4 toward zero, which causes E[y²] to approach zero. This method is well known to those skilled in the art. The adaptive filter 4 is, for example, of the LMS (least mean squares) type or of the RLS (recursive least squares) type.

[0014] The main disadvantage of this known method is that the noise-only source 3 must be physically available. In practice, this source may contain a variable proportion of signal that is not purely noise, and the performance of the noise reduction method is then considerably degraded. This is not demonstrated here, but it follows from standard theoretical calculations.

[0015] A first possible way to overcome this drawback is to use "frequency diversity". This solution essentially consists of processing the noisy signal by means of a DFT (discrete Fourier transform) and, from its output values, producing by inverse discrete Fourier transform the signal y to be subtracted from it. The processing consists of dividing the noisy useful signal into separate sub-bands, for example by Fourier analysis, and then processing each sub-band individually in order to increase the dimension of the vector space of observations. This type of segmentation cannot be used for speech processing, since speech signals are known not to be stationary in frequency and do not statistically occupy the same frequency bands (as is the case, for example, with voiced structures).

[0016] Another way is to use temporal diversity. This method is also unusable, because stationarity of the speech emission is not physically realistic.
At most, a degree of stationarity can be observed over some tens of 25.6 ms frames (each corresponding to 256 points of a signal sampled at 10 kHz) for stable vowel cores, whereas this stationarity falls to a duration of 1 to 3 frames for plosives (sounds such as "t").

[0017] The third method is a "spatial diversity" method in which several signal pick-up points (vector information pick-up points) are distributed in space. Filtering is then performed as schematically shown in FIG.

[0018] A set 5 of (L + 1) microphones, which may for example be equally spaced, is placed in front of the speaker; their output signals are referenced x0, x1, ..., xL. Each of these microphones is followed by a narrowband adaptive filter; the whole set of filters is referenced 6 and the individual filters are referenced W0, W1, ..., WL. Their outputs are connected to an adder 7, whose output constitutes the output of the device.

[0019] X_k denotes any one of the input vectors, W_k^T the transposed vector of the weights applied by the filters, and g_k the output scalar.

[0020] This gives:

g_k = W_k^T X_k

[0022] At a given instant (determined, for example, by a sample-and-hold operation), (L + 1) input signals are available. The speech emission affects all the output signals of the microphones 5, and the differences between these signals are mainly due to the differences in propagation time between the speaker and the individual microphones. In a manner known per se, spatial processing consists of forming an antenna by forming a conventional channel (generally by linear combination of the microphone signals), and the directivity lobe of the antenna thus formed is steered by phase shifts (or pure delays). The limitations stated above for the other known methods also apply here.
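By way of illustration only, the weighted sum giving the output scalar of [0019] and the propagation-time differences mentioned in [0022] can be sketched in Python as follows. The function names, the far-field plane-wave model and the value c = 343 m/s are assumptions made for this sketch; they do not come from the description.

```python
import math


def steering_delays(num_mics, spacing_m, angle_deg, c=343.0):
    """Relative arrival delays (seconds) along a uniform line of microphones.

    Assumes a far-field plane wave arriving at `angle_deg` from the normal
    to the line; c is the assumed speed of sound in air (m/s).
    """
    s = math.sin(math.radians(angle_deg))
    return [m * spacing_m * s / c for m in range(num_mics)]


def weighted_sum(weights, snapshot):
    """Output scalar for one snapshot: sum of weight * microphone sample."""
    if len(weights) != len(snapshot):
        raise ValueError("weights and snapshot must have the same length")
    return sum(w * x for w, x in zip(weights, snapshot))


# A source on the array normal reaches every microphone at the same instant,
# so no delay compensation is needed and uniform weights return the sample.
delays_on_axis = steering_delays(4, 0.10, 0.0)
output_on_axis = weighted_sum([0.25] * 4, [2.0] * 4)

# A source along the line of microphones gives the largest delay step,
# spacing / c between adjacent microphones.
delays_endfire = steering_delays(4, 0.10, 90.0)
```

Compensating the delays returned by `steering_delays` before summing is one way of steering the lobe by pure delays; the sketch leaves that step out to stay short.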
[0023] The object of the present invention is a method for reducing directional noise in speech that makes it possible to obtain a reference noise source showing the greatest possible correlation with the noise affecting the noisy speech, whatever the vocal content of the noisy speech and without particular constraints on the frequency or duration of the speech signal.

[0024] A further object of the invention is a simple and inexpensive device for implementing this method.

[0025] SUMMARY OF THE INVENTION

The method of the present invention consists of performing directional sound pick-up with at least four microphones equally spaced in front of the source of the signal to be processed, and of forming linear combinations, by additions and subtractions, of the signals from these microphones so that a main sensitivity lobe is obtained in the direction of the source. The sum of all the output signals of the microphones represents the noisy signal. Each of the other combinations comprises as many subtractions as additions and is used as a noise-only source; these combinations are processed by a directional adaptive filter to produce an estimate of the noise, which is subtracted from the noisy signal. The invention will become more apparent from the following detailed description of embodiments, which are given as non-limiting examples and illustrated in the accompanying drawings.

[0026] The present invention is described below with reference to the reduction of noise in the speech emitted by a speaker placed in a noisy environment, but it is not limited to such an application: it can be implemented to reduce noise in the useful signal from any sound source that is assumed to be a point source, is located in a noisy environment and may be mobile. In this example, two sets of microphones are placed in front of the speaker at a usual distance from the speaker (for example, 1 to 4 m).
Each of these two sets comprises, for example, eight microphones arranged in a straight line and equally spaced from one another. The spacing between adjacent microphones is, for example, of the order of a few centimeters to about ten centimeters. For example, for a spacing d = 10 cm, the maximum frequency to be processed is fmax = c/2d, which is about 1.7 kHz, where c is the speed of sound in air. These two sets are arranged, for example, in the form of a cross. However, the number of microphones and their arrangement can vary. The number of microphones can be, for example, a multiple of four for each set, and they can be arranged, for example, in the form of a "V" or a circle. Preferably, the two sets are coplanar.

[0027] The eight microphones of each of these two sets are referenced M1 to M8. Their output signals are combined with one another in eight different combinations. One of these combinations (row 1 of the table of FIG. 3) provides the resultant signal S1; it is the combination corresponding to the sum of all these signals. Each of the other seven combinations comprises four additions and four subtractions, and these combinations provide the resultant signals S2 to S8 respectively. Thus, for example, the combination in the second row of the table of FIG. 3 corresponds to adding the signals of M1 to M4 and subtracting the signals of M5 to M8, so that S2 = (M1 + M2 + M3 + M4) − (M5 + M6 + M7 + M8).

[0028] It should be noted that these eight combinations are orthogonal (that is, they belong to a base of orthonormal functions such as a Haar decomposition base). The signal S1 corresponds to the main directivity lobe of a conventional channel. The other signals S2 to S8 can be regarded as "noise only" channels because they have a null in the direction of the main lobe. FIG.
4 is a graph showing, for each of the two sets of microphones, an arbitrary value U for the directivity lobes at the different outputs S1 to S8 defined above, as a function of the angle formed between the normal to the plane of the two sets of microphones and the different directions of measurement. The values on the X axis are scaled in units of λ/d, where λ is the acoustic wavelength and d is the spacing between the microphones.

[0029] To simplify the explanation, the microphones used are assumed to be omnidirectional. If their directivity patterns are taken into account, the explanation given here remains valid provided, of course, that the microphones are all identical or that their differences are compensated (by complex weighting in amplitude and phase).

[0030] To implement the method of the present invention, the sets of microphones are first aimed at the sound source (the speaker). For this aiming, any suitable known method can be used, for example a wideband super-resolution method such as AR, Capon or MUSIC, to determine the elevation and relative orientation of the sound source.

[0031] It should be noted that the method of the present invention requires no computation to form the "noise only" channels. The combinations described above with reference to the table of FIG. 3 amount to simple series connections (for the "+" combinations) and the use of inverters (for the "−" combinations).

[0032] The output signals of all the microphones are sampled simultaneously, so that all the signals taken at the same instant are available. The stationarity of the noise affecting all of these signals is thus ensured.
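By way of illustration only, the eight combinations of [0027] and [0028] can be sketched in Python as follows. The table of FIG. 3 is not reproduced in this text, so the sketch assumes a Sylvester-type Hadamard matrix: its rows are mutually orthogonal, one row contains only additions, and each of the others contains four additions and four subtractions, but the row order may differ from that of FIG. 3. The function names are hypothetical.

```python
def sylvester_hadamard(order):
    """Return a Sylvester-type Hadamard matrix with entries +1 and -1.

    `order` must be a power of two (8 for a set of eight microphones).
    """
    if order < 1 or order & (order - 1):
        raise ValueError("order must be a power of two")
    h = [[1]]
    while len(h) < order:
        # Doubling step: [[H, H], [H, -H]]
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h


def combine(matrix, mic_samples):
    """Apply every +/- combination to one simultaneous snapshot of samples."""
    return [sum(sign * x for sign, x in zip(row, mic_samples))
            for row in matrix]


H = sylvester_hadamard(8)

# Once the set is aimed at the speaker, the speech reaches all eight
# microphones in phase: every microphone delivers the same sample.
speech_snapshot = [0.5] * 8
outputs = combine(H, speech_snapshot)

# The all-additions row keeps the speech (8 * 0.5 = 4.0); the seven other
# rows cancel it exactly, which is why they can serve as "noise only".
print(outputs)
```

The row [1, 1, 1, 1, -1, -1, -1, -1] of this matrix is the combination S2 = (M1 + M2 + M3 + M4) − (M5 + M6 + M7 + M8) given in [0027].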
[0033] From a spatial point of view, there is a left/right ambiguity relative to the speaker (the lobes of the microphones are symmetrical about the center of the array, as is the case with the image lobes of antenna arrays). However, this ambiguity does not affect the operation of the noise reduction device, because the signals generated are, on the one hand, the sum signal of all the microphone signals and, on the other hand, the sum signal of the "noise only" channels. The position of the speaker is determined independently of this ambiguity. The weights of the various filters can be calculated by one of the super-resolution methods shown in the following table:

[0035] In this table: W is the vector of the filter weights; D(θ) and D⁺(θ) are, respectively, the conventional (steering) vector and its conjugate transpose; Γx is the interspectral matrix; Γx⁺ and Γx⁻¹ are, respectively, the conjugate transpose of Γx and the inverse (or pseudo-inverse) of Γx; the parameters λk and V denote the smallest eigenvalue of Γx and the associated eigenvector; and E is the vector whose first component is the noise power (the other components being zero).

[0036] The general term "super-resolution" simply indicates that the methods concerned can resolve (distinguish) two sound sources separated by less than λ/L (L being the total length of the physical antenna formed by the series of eight aligned microphones and λ the wavelength).

[0037] The processing of the various channels is wideband processing. The covariance matrix is then an interspectral matrix (a matrix of correlations of the observation vectors). The two-dimensional array of microphones arranged in a cross makes it possible to determine the elevation and relative orientation of the sound source by detecting the maximum of the following function:

[0039] This makes it possible to servo-link the array to the position of the speaker.
This servo link is achieved by electronic pointing of the microphone array (as with tracking radar antennas), by varying the phase shifts applied to the respective microphone signals.

[0040] Once this pointing has been carried out, the binary transformation (FIG. 3) is applied to the signals coming from the microphones. This yields, on the one hand, a noisy signal channel and, on the other hand, seven "noise only" channels (in the case of eight microphones). Vector processing (for example of the LMS or RLS type) filters these channels to produce an estimate of the noise. Subtracting this estimate from the noisy signal performs the adaptive noise reduction.

[0041] The least mean squares vector processing and, likewise, the search for the optimal filtering are generally carried out by iterative gradient-search methods.

[0042] When the signals from the microphones are sampled with a period Te, with s(k) = s[kTe], the deterministic gradient adaptive algorithm is as follows:

[0044] In these equations, μ denotes the incremental step of the algorithm, which is a positive constant. The speed of convergence, the residual variance and the accuracy of convergence of the algorithm depend on the value of this step. An algorithm that is advantageous for the vector processing is the stochastic gradient algorithm, also known as the LMS (least mean squares) algorithm. This algorithm uses a stochastic estimate of the gradient, so that the filtering vector is adjusted according to:

[0046] Since the speech signal is considered in baseband, with no need for complex demodulation, the expressions involved here are real.

[0047] The LMS algorithm minimizes the mean square error of an adaptive device with respect to a reference ("noise only") signal whose value is known at each sampling instant.
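By way of illustration only, the adaptive subtraction of [0040] with a stochastic gradient (LMS) update can be sketched in Python as follows. The update equations of the description are not reproduced in this text, so the convention used here (weights += μ × error × sample), the function name and the synthetic test signals are all assumptions made for this sketch. For brevity the demonstration uses a single "noise only" channel, whereas the description uses seven.

```python
import math
import random


def lms_noise_canceller(primary, references, taps=4, mu=0.01):
    """Subtract an adaptively filtered noise estimate from `primary`.

    primary    : samples of the noisy channel (the all-additions sum)
    references : list of "noise only" channels, each as long as `primary`
    Returns (cleaned samples, final weights[channel][tap]).
    """
    weights = [[0.0] * taps for _ in references]
    cleaned = []
    for k in range(len(primary)):
        # Delay-line contents: current and (taps - 1) past samples per channel.
        lines = [[ref[k - j] if k >= j else 0.0 for j in range(taps)]
                 for ref in references]
        # Noise estimate: sum over channels and taps of weight * sample.
        y = sum(w * x for w_row, x_row in zip(weights, lines)
                for w, x in zip(w_row, x_row))
        err = primary[k] - y  # output of the subtractor
        # Stochastic gradient update (assumed convention: w += mu * err * x).
        for w_row, x_row in zip(weights, lines):
            for j in range(taps):
                w_row[j] += mu * err * x_row[j]
        cleaned.append(err)
    return cleaned, weights


# Synthetic demonstration: a sine wave stands in for the speech and the
# noise reaching the primary channel is a filtered copy of the reference.
rng = random.Random(0)
n_samples = 4000
noise = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
speech = [math.sin(2.0 * math.pi * 0.01 * k) for k in range(n_samples)]
primary = [speech[k] + 0.8 * noise[k] + (0.3 * noise[k - 1] if k > 0 else 0.0)
           for k in range(n_samples)]

cleaned, weights = lms_noise_canceller(primary, [noise], taps=4, mu=0.01)

# Noise power before and after cancellation, over the last 1000 samples.
tail = range(n_samples - 1000, n_samples)
noise_in = sum((primary[k] - speech[k]) ** 2 for k in tail) / 1000
noise_out = sum((cleaned[k] - speech[k]) ** 2 for k in tail) / 1000
```

A larger μ converges faster but leaves a larger residual fluctuation of the weights, which is the dependence on the step described in [0044].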
On average, this algorithm makes it possible to obtain the optimal solution:

[0049] with, however, a variance proportional to the incremental step μ and to the power of the noise picked up by the microphones. This algorithm is advantageous because it exhibits good stability, but other algorithms can of course be used, for example the recursive least squares algorithm, which converges faster. In any case, a compromise must be found between the speed of convergence and the residual variance of the estimation error.

[0050] FIG. 6 is a schematic view of an example of a device for implementing the method of the present invention.

[0051] Reference numeral 8 in this figure designates a set of eight equally spaced microphones M'1 to M'8. The other set 9, indicated by a dashed line, is arranged crosswise to the first set and is coplanar with it. The connections of the microphones of this second set are identical to those of the microphones of set 8 up to the locating device and are not shown, for clarity of the drawing.

[0052] Each of the microphones of set 8 is followed by a preamplifier (the set of preamplifiers is referenced 10), a sampling device (the set of sampling devices is referenced 11) and a non-recursive delay-line filter (the set of these filters is referenced 12). The outputs associated with the microphones M'2, M'4, M'6 and M'8 are connected to an adder 13. The outputs of the filters of set 12 associated with the microphones M'3 to M'6 are connected to another adder 14.

[0053] The output of the adder 13 is connected to the first input of an adder 16 via a compensation filter 15 (for example of the FIR type), and the output of the adder 14 is connected to the second input of the adder 16 via a filter 17 identical to the filter 15.

[0054] The output of the adder 16 is connected to a device 18 for locating the speaker.
This device 18, which servo-controls the two microphone sets 8, 9 in elevation and relative orientation, is of the super-resolution type and works according to one of the principles described above.

[0055] The sampling devices 11 of the microphones M'1 to M'8 are also connected to a binary transformation combiner device 19 of the type described above with reference to FIG. The "main lobe" output of the combiner device 19 (the combination comprising only additions) is connected to the (+) input of a subtractor 20. The other seven "noise only" outputs are connected via an adaptive filter 21 to the (−) input of the subtractor 20. The filter 21 is, for example, a delay-line filter whose coefficients are recomputed for each speech frame (for example, 1 frame = 25.6 ms) so as to minimize the noise at the output of the subtractor 20.

[0056] The two sets of microphones 8, 9 perform a double spatial sampling of the speaker's speech emission. At the outputs of the adders 13 and 14, two physical antennas are formed which have the same main lobe at two frequencies one octave apart. The compensation filters 15, 17 correct the dispersion between the frequencies at the extremes of the received acoustic spectrum. The signals coming from the two filters 15, 17 are summed at 16. A constant directivity lobe (corresponding to the main lobes of sets 8 and 9) is therefore obtained at the output of 16. This holds when the normal to the plane formed by sets 8 and 9 (the microphones being treated as coplanar points), passing through the intersection C of sets 8 and 9, is directed toward the speaker (also regarded as a point). Otherwise, at least one of the two main lobes is offset with respect to the center (C) of the corresponding directivity pattern, and the output signal of 16 is affected thereby (loss of power).
The servo-control and locating device 18 then orients the plane of the microphones in elevation and relative orientation, in a known manner, so as to recover the maximum of the function at the output of 16.

[0057] FIG. 5 shows a schematic view of a representative example of the wideband adaptive filter 21 of FIG. For each of the (L + 1) channels x0 to xL, this filter comprises an elementary filter; these elementary filters are referenced 22_0 to 22_L respectively. Each of these elementary filters comprises, in a known manner, k cascaded processing blocks. Each processing block essentially comprises a delay cell z⁻¹ (for example in the form of a buffer register) and a convolution cell W_{i,j} (i running from 0 to L over the channels 22_0 to 22_L, and j from 0 to k − 1 within each channel). The first processing block of each channel (W_{0,0} to W_{L,0}) has no delay cell. The outputs of the channels 22_0 to 22_L are connected to a parallel adder 23. The weights of the various processing blocks (the coefficients applied to the delayed discrete values processed by each block) are servo-linked to the output signal of the subtractor 20.

[0058] Since the present invention is configured as described above, it provides a method for reducing directional noise in speech that makes it possible to obtain a reference noise source showing the greatest possible correlation with the noise affecting the noisy speech, whatever the vocal content of the noisy speech and without particular constraints on the frequency or duration of the speech signal.

[0059] Furthermore, a simple and inexpensive device for implementing this method is provided.
