Patent Translate Powered by EPO and Google
Notice: This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate, complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2017516126
Abstract
A noise suppressor has first (401) and second (403) converters that generate first and second frequency domain signals from frequency transforms of first and second microphone signals. A gain unit (405, 407, 409) determines time frequency tile gains in response to a difference measure for absolute value time frequency tile values of the first frequency domain signal and absolute value time frequency tile values of the second frequency domain signal. A scaler (411) generates a third frequency domain signal by scaling the time frequency tile values of the first frequency domain signal by the time frequency tile gains. The resulting signal is converted to the time domain by a third converter (413). A designator (405, 407, 415) designates the time frequency tiles of the first frequency domain signal as speech tiles or noise tiles, and the gain unit (409) determines the gains in response to the designation of the time frequency tiles as speech tiles or noise tiles.
Noise suppression
[0001] The present invention relates to noise suppression, and more particularly, but not exclusively, to the suppression of non-stationary diffuse noise based on signals captured from two microphones.
[0002] The capture of audio, especially speech, has become increasingly important in recent decades. In fact, speech capture has become increasingly important for a variety of applications including telecommunications, teleconferencing, gaming and the like. However, a problem in many scenarios and applications is that the desired speech source is typically not the only audio source in the environment.
Rather, in a typical audio environment, there are many other audio/noise sources that are captured by the microphone. One of the key issues facing many speech capture applications is how best to extract speech in a noisy environment. Several different approaches to noise suppression have been proposed to address this problem.
[0003] One of the most difficult tasks in speech enhancement is the suppression of non-stationary diffuse noise. Diffuse noise is, for example, an acoustic (noise) sound field in a room where the noise comes from all directions. A typical example is the so-called "babble" noise in cafeterias and restaurants, where many noise sources are distributed throughout the room.
[0004] When a desired speaker is recorded in a room using a microphone or microphone array, the desired speech is captured together with background noise. Speech enhancement can be used in an attempt to modify the microphone signal so that the background noise is reduced while the desired speech is affected as little as possible. When the noise is diffuse, one proposed approach is to estimate the spectral amplitude of the background noise and to modify the spectral amplitude such that the resulting spectral amplitude of the enhanced signal is as similar as possible to the spectral amplitude of the desired speech signal. In this approach, the phase of the captured signal is not changed.
[0005] FIG. 1 shows an example of a noise suppression system according to the prior art. In this example, input signals are received from two microphones. One microphone is considered to be the reference microphone and the other is the main microphone capturing the desired audio source, in particular the speech. Thus, the reference microphone signal x(n) and the main microphone signal are received.
These signals are transformed to the frequency domain in converters 101, 103, and the absolute values in the individual time frequency tiles are generated by absolute value units 105, 107. The resulting absolute values are fed to unit 109, which calculates the gains. The resulting gains are multiplied by the frequency domain values of the main signal in multiplier 111, thereby producing a spectrally compensated output signal, which is transformed to the time domain in a further transform unit 113.
[0006] This approach is best described in the frequency domain. First, the frequency domain signals are generated by calculating the short time Fourier transform (STFT) of, for example, overlapping Hanning windowed blocks of the time domain signal. The STFT is generally a function of both time and frequency and is expressed by the two arguments $t_k$ and $\omega_l$, where $t_k = kB$ is the discrete time, $k$ is the frame index, $B$ is the frame shift, $\omega_l = l\omega_0$ is the (discrete) frequency, $l$ is the frequency index, and $\omega_0$ denotes the elementary frequency spacing.
[0007] Let $Z(t_k, \omega_l)$ be the (complex) microphone signal to be enhanced. It consists of the desired speech signal $Z_s(t_k, \omega_l)$ and the noise signal $Z_n(t_k, \omega_l)$:
$$Z(t_k, \omega_l) = Z_s(t_k, \omega_l) + Z_n(t_k, \omega_l).$$
This microphone signal is input to a post-processor. The post-processor performs noise suppression by modifying the spectral amplitude of the input signal while leaving the phase unchanged. The operation of the post-processor can be described by a gain function. For spectral amplitude subtraction, the gain function typically has the following form:
$$G(t_k, \omega_l) = \frac{|Z(t_k, \omega_l)| - |Z_n(t_k, \omega_l)|}{|Z(t_k, \omega_l)|}$$
[0008] Here, $|\cdot|$ is the absolute value operation. The output signal is calculated as $Q(t_k, \omega_l) = Z(t_k, \omega_l) \cdot G(t_k, \omega_l)$.
After conversion back to the time domain, the time domain signal is reconstructed by combining the current and previous frames, taking into account that the original time signal was windowed and time overlapped (i.e. an overlap-and-add procedure is performed).
[0009] The gain function can be generalized as follows:
$$G(t_k, \omega_l) = \left( \frac{|Z(t_k, \omega_l)|^{\alpha} - |Z_n(t_k, \omega_l)|^{\alpha}}{|Z(t_k, \omega_l)|^{\alpha}} \right)^{1/\alpha}$$
[0010] For $\alpha = 1$, this describes the gain function for spectral amplitude subtraction. For $\alpha = 2$, it describes the gain function for spectral power subtraction, which is also often used. The following description focuses on spectral amplitude subtraction, but it will be understood that the principles given may also be applied to other approaches, in particular to spectral power subtraction. The amplitude spectrum of the noise $|Z_n(t_k, \omega_l)|$ is generally unknown. Therefore, an estimate $|\hat{Z}_n(t_k, \omega_l)|$ must be used instead. Since the estimate is not always accurate, an oversubtraction factor $\gamma_n$ is used for the noise (i.e. the noise is scaled by a factor greater than 1). However, this can lead to negative values for $|Z(t_k, \omega_l)| - \gamma_n |\hat{Z}_n(t_k, \omega_l)|$, which is undesirable. For that reason, the gain function is limited to zero or to some small positive value.
[0014] For the above gain function, this results in the following:
$$G(t_k, \omega_l) = \mathrm{MAX}\left( \frac{|Z(t_k, \omega_l)| - \gamma_n |\hat{Z}_n(t_k, \omega_l)|}{|Z(t_k, \omega_l)|},\ \theta \right), \qquad \theta \geq 0$$
[0015] For stationary noise, $|Z_n(t_k, \omega_l)|$ can be estimated by measuring and averaging the amplitude spectrum $|Z(t_k, \omega_l)|$ during silence. However, for non-stationary noise, an estimate of $|Z_n(t_k, \omega_l)|$ cannot be derived in this way, because the noise characteristics change with time. This tends to prevent an accurate estimate from being generated from a single microphone signal. Instead, it has been proposed to use an additional microphone in order to be able to estimate $|Z_n(t_k, \omega_l)|$.
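The gain functions described above can be illustrated with a small sketch. It is not taken from the patent: the function name `subtraction_gain`, the default parameter values, and the way the oversubtraction factor is combined with the exponent `alpha` for values other than 1 are assumptions made for illustration only.

```python
def subtraction_gain(z_mag, noise_mag, alpha=1.0, gamma_n=1.0, theta=0.0):
    """Gain for one time frequency tile (illustrative sketch).

    z_mag     : |Z(t_k, w_l)|, amplitude of the microphone signal tile
    noise_mag : estimate of the noise amplitude |Z_n(t_k, w_l)|
    alpha     : 1 -> spectral amplitude subtraction, 2 -> power subtraction
    gamma_n   : oversubtraction factor (assumed default of 1)
    theta     : lower limit of the gain (assumed default of 0)
    """
    if z_mag <= 0.0:
        return theta
    # Assumption: for alpha != 1 the scaled noise estimate is raised to alpha.
    numerator = z_mag ** alpha - (gamma_n * noise_mag) ** alpha
    if numerator <= 0.0:
        # Oversubtraction would give a negative value, so limit the gain.
        return theta
    return max((numerator / z_mag ** alpha) ** (1.0 / alpha), theta)


def apply_gain(z, gain):
    """Scale a complex tile value; a real gain leaves the phase unchanged."""
    return z * gain
```

For example, `apply_gain(3 + 4j, subtraction_gain(5.0, 2.0))` scales the tile amplitude from 5 to about 3 while keeping its phase, which is the behaviour the text attributes to the post-processor.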
As a specific example, consider a scenario in which there are two microphones in a room, with one microphone positioned close to the desired speaker (the main microphone) and the other microphone positioned further away from the speaker (the reference microphone). In this scenario, it can be assumed that the main microphone signal contains the desired speech component and a noise component, and that the reference microphone signal contains no speech at all, but only the noise signal recorded at the position of the reference microphone. The microphone signals for the main and reference microphones are then, respectively:
$$Z(t_k, \omega_l) = Z_s(t_k, \omega_l) + Z_n(t_k, \omega_l)$$
$$X(t_k, \omega_l) = X_n(t_k, \omega_l).$$
[0017] In order to relate the noise components in the microphone signals, a so-called coherence term $C(t_k, \omega_l)$ is defined in terms of the expectation operator $E\{\cdot\}$. The coherence term is a measure of the average correlation between the amplitude of the noise component in the main microphone signal and the amplitude of the reference microphone signal.
[0019] Since $C(t_k, \omega_l)$ does not depend on the instantaneous audio at the microphones but on the spatial characteristics of the noise field, the variation of $C(t_k, \omega_l)$ as a function of time is much smaller than the time variation of $Z_n$ and $X_n$.
[0020] As a result, $C(t_k, \omega_l)$ can be estimated relatively accurately by averaging $|Z_n(t_k, \omega_l)|$ and $|X_n(t_k, \omega_l)|$ over time. An approach for doing so is disclosed in a US patent. The document specifically describes a method in which explicit speech detection is not required to determine $C(t_k, \omega_l)$. As in the case of stationary noise, the equation for the gain function for two microphones can be derived as follows:
$$G(t_k, \omega_l) = \mathrm{MAX}\left( \frac{|Z(t_k, \omega_l)| - \gamma_n C(t_k, \omega_l) |X(t_k, \omega_l)|}{|Z(t_k, \omega_l)|},\ \theta \right)$$
[0022] Since $X$ contains no speech, the absolute value of $X$ multiplied by the coherence term $C(t_k, \omega_l)$ is considered to give an estimate of the noise component in the main microphone signal.
As a consequence, the equation given above can be used to shape the spectrum of the first microphone signal to correspond to the (estimated) speech component by scaling the frequency domain signal, i.e. by $Q(t_k, \omega_l) = Z(t_k, \omega_l) \cdot G(t_k, \omega_l)$.
[0023] However, while the described approach may provide advantageous performance in many scenarios, it may provide less than optimal performance in some scenarios. In particular, the noise suppression may not be optimal. Specifically, for diffuse noise, the improvement in signal-to-noise ratio (SNR) may be limited, and the so-called SNR improvement (SNRI) is often limited in practice to the order of 6-9 dB. While this may be acceptable for some applications, in many scenarios significant noise components tend to remain, resulting in degraded perceived speech quality. Furthermore, although other noise suppression techniques can be used, these also tend to be suboptimal, e.g. they tend to be complex, inflexible, impractical or computationally demanding, to require complex hardware (e.g. multiple microphones), and/or to provide non-optimal noise suppression.
[0024] Thus, improved noise suppression would be advantageous. In particular, noise suppression that allows reduced complexity, increased flexibility, facilitated implementation, reduced cost (e.g. by not requiring multiple microphones), improved noise suppression and/or improved performance would be advantageous.
[0025] U.S. Pat. No. 7,602,926; U.S. Pat. No. 7,146,012
[0026] Accordingly, the present invention seeks preferably to mitigate, alleviate or eliminate one or more of the above mentioned disadvantages, singly or in any combination.
[0027] According to an aspect of the invention, a noise suppressor is provided for suppressing noise in a first microphone signal.
The noise suppressor comprises: a first converter for generating a first frequency domain signal from a frequency transform of the first microphone signal, the first frequency domain signal being represented by time frequency tile values; a second converter for generating a second frequency domain signal from a frequency transform of a second microphone signal, the second frequency domain signal being represented by time frequency tile values; a gain unit for determining time frequency tile gains as a non-negative monotonic function of a difference measure indicative of a difference between a first monotonic function of an absolute value time frequency tile value of the first frequency domain signal and a second monotonic function of an absolute value time frequency tile value of the second frequency domain signal; and a scaler for generating an output frequency domain signal by scaling the time frequency tile values of the first frequency domain signal by the time frequency tile gains. The noise suppressor further comprises a designator for designating time frequency tiles of the first frequency domain signal as speech tiles or noise tiles, and the gain unit is configured to determine the time frequency tile gains in response to the designation of the time frequency tiles of the first frequency domain signal as speech tiles or noise tiles, such that a lower gain value is determined for the time frequency tile gain of a time frequency tile when the time frequency tile is designated as a noise tile than when it is designated as a speech tile.
[0028] The present invention may provide improved and/or facilitated noise suppression in many embodiments. In particular, the present invention may allow improved suppression of non-stationary and/or diffuse noise. An increased signal-to-noise or speech-to-noise ratio can often be achieved.
In particular, the approach may in practice raise the upper bound on the achievable SNR improvement. Indeed, in many practical scenarios, the present invention may allow the SNR improvement of the noise-suppressed signal to be increased from about 6-8 dB to more than 20 dB.
[0029] The approach can typically provide improved noise suppression and, in particular, can allow improved suppression of noise without a corresponding suppression of speech. An improved signal-to-noise ratio of the suppressed signal can often be achieved.
[0030] The gain unit is configured to separately determine different time frequency tile gains for at least two time frequency tiles. In many embodiments, the time frequency tiles may be divided into multiple sets of time frequency tiles, and the gain unit may be configured to determine the gains independently and/or separately for each set of time frequency tiles. In many embodiments, the gains for the time frequency tiles of a set may depend only on properties of the first frequency domain signal and the second frequency domain signal in the time frequency tiles belonging to that set.
[0031] The gain unit may determine a different gain for a time frequency tile when the tile is designated as a speech tile than when it is designated as a noise tile. The gain unit may in particular be arranged to calculate the gain for a time frequency tile by evaluating a function that depends on the designation of the time frequency tile. In some embodiments, the gain unit may be configured to calculate the gain for a time frequency tile by evaluating a different function when the time frequency tile is designated as a speech tile than when it is designated as a noise tile. The functions, equations, algorithms and/or parameters used in determining the time frequency tile gain may be different when the time frequency tile is designated as a speech tile than when it is designated as a noise tile.
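A designation-dependent gain of the kind described above could be sketched as follows. This is only one possible reading, assuming that the designation selects a scale value applied to the reference amplitude; the function name `tile_gain` and the values `scale_speech=1.0` and `scale_noise=4.0` are hypothetical and chosen purely for illustration.

```python
def tile_gain(z_mag, x_mag, is_noise_tile, coherence=1.0,
              scale_speech=1.0, scale_noise=4.0, theta=0.0):
    """Gain for one time frequency tile, depending on its designation.

    z_mag         : |Z|, amplitude of the first (main) frequency domain signal
    x_mag         : |X|, amplitude of the second (reference) signal
    is_noise_tile : True if the tile is designated as a noise tile
    coherence     : noise coherence estimate C (assumed to be given)
    scale_speech  : hypothetical scale value used for speech tiles
    scale_noise   : hypothetical, larger scale value used for noise tiles
    theta         : lower limit of the gain
    """
    if z_mag <= 0.0:
        return theta
    scale = scale_noise if is_noise_tile else scale_speech
    # Difference measure between the two (scaled) amplitudes.
    difference = z_mag - scale * coherence * x_mag
    return max(difference / z_mag, theta)
```

With these assumed values, a tile with `z_mag=10.0` and `x_mag=1.0` receives a gain of about 0.9 when designated as a speech tile and about 0.6 when designated as a noise tile, so the noise designation yields the lower gain.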
[0032] A time frequency tile may in particular correspond to one bin of the frequency transform in one time segment/frame. In particular, the first and second converters may use block processing to transform successive segments of the first and second signals. A time frequency tile may correspond to a set (typically one) of transform bins in one segment/frame.
[0033] The designation as speech or noise (time frequency) tiles may in some embodiments be performed individually for each time frequency tile. However, the designation may often be applied to a group of time frequency tiles. In particular, the designation may apply to all time frequency tiles in a given time segment. Thus, in some embodiments, the first microphone signal may be segmented into time segments/frames that are individually converted to the frequency domain, and the designation of the time frequency tiles as speech or noise tiles may be common to all time frequency tiles of one segment/frame.
[0034] In some embodiments, the noise suppressor may further comprise a third converter for generating an output signal from a frequency-to-time transform of the output frequency domain signal. In other embodiments, the output frequency domain signal may be used directly. For example, speech recognition or speech enhancement may be performed in the frequency domain, so that the output frequency domain signal can be used directly without conversion to the time domain.
[0035] According to an optional feature of the invention, the gain unit is configured to determine a gain value for the time frequency tile gain of a time frequency tile as a function of the difference measure for the time frequency tile.
[0036] This may provide efficient noise suppression and/or facilitated implementation.
In particular, in many embodiments it may result in efficient noise suppression that adapts well to the signal characteristics, without requiring a high computational load or extremely complex processing.
[0037] The function may in particular be a monotonic function of the difference measure, and the gain value may in particular be proportional to the difference value.
[0038] According to an optional feature of the invention, at least one of the first monotonic function and the second monotonic function depends on whether the time frequency tile is designated as a speech tile or a noise tile.
[0039] This may provide efficient noise suppression and/or facilitated implementation. In particular, in many embodiments it may result in efficient noise suppression that adapts well to the signal characteristics, without requiring a high computational load or extremely complex processing.
[0040] The at least one of the first monotonic function and the second monotonic function provides, for the same absolute value time frequency tile value of the first or second frequency domain signal respectively, a different output value when the time frequency tile is designated as a speech tile than when it is designated as a noise tile.
[0041] According to an optional feature of the invention, the second monotonic function comprises a scaling of the absolute value time frequency tile value of the second frequency domain signal for the time frequency tile, using a scale value that depends on whether the time frequency tile is designated as a speech tile or a noise tile.
[0042] This may provide efficient noise suppression and/or facilitated implementation.
In particular, in many embodiments it may result in efficient noise suppression that adapts well to the signal characteristics, without requiring a high computational load or extremely complex processing.
[0043] According to an optional feature of the invention, the gain unit is configured to generate a noise coherence estimate indicative of a correlation between the amplitude of the second microphone signal and the amplitude of a noise component of the first microphone signal, and at least one of the first monotonic function and the second monotonic function depends on the noise coherence estimate.
[0044] This may provide efficient noise suppression and/or facilitated implementation. The noise coherence estimate may in particular be an estimate of the correlation between the amplitude of the first microphone signal and the amplitude of the second microphone signal in the absence of speech, i.e. when the speech source is inactive. In some embodiments, the noise coherence estimate may be determined based on the first and second microphone signals and/or the first and second frequency domain signals. In some embodiments, the noise coherence estimate may be generated based on a separate calibration or measurement process.
[0045] According to an optional feature of the invention, the first monotonic function and the second monotonic function are such that the expected value of the difference measure is negative if the amplitude relationship between the first microphone signal and the second microphone signal corresponds to the noise coherence estimate and the time frequency tile is designated as a noise tile.
[0046] According to an optional feature of the invention, the gain unit is configured to vary at least one of the first monotonic function and the second monotonic function such that the expected value of the difference measure, for an amplitude relationship between the first microphone signal and the second microphone signal corresponding to the noise coherence estimate, is different for time frequency tiles designated as noise tiles than for time frequency tiles designated as speech tiles.
[0047] According to an optional feature of the invention, the gain difference for a time frequency tile being designated as a speech tile or as a noise tile depends on at least one value from the group consisting of: a signal level of the first microphone signal; a signal level of the second microphone signal; and a signal-to-noise estimate for the first microphone signal.
[0048] This may provide efficient noise suppression and/or facilitated implementation. In particular, in many embodiments it may result in efficient noise suppression that adapts well to the signal characteristics, without requiring a high computational load or extremely complex processing.
[0049] According to an optional feature of the invention, the difference measure for a time frequency tile depends on whether the time frequency tile is designated as a noise tile or a speech tile.
[0050] This may provide efficient noise suppression and/or facilitated implementation.
[0051] According to an optional feature of the invention, the designator is configured to designate time frequency tiles of the first frequency domain signal as speech tiles or noise tiles in response to difference values generated in response to the difference measure for noise tiles, for the absolute value time frequency tile values of the first frequency domain signal and the absolute value time frequency tile values of the second frequency domain signal.
[0052] This may allow a particularly advantageous designation. In particular, a reliable designation can be achieved while at the same time allowing reduced complexity.
In particular, corresponding, or typically the same, functionality may be used for both the tile designation and the gain determination.
[0053] In many embodiments, the designator is configured to designate a time frequency tile as a noise tile if the difference value is below a threshold.
[0054] According to an optional feature of the invention, the designator is configured to filter the difference values across a plurality of time frequency tiles. The filtering includes time frequency tiles that differ in both time and frequency.
[0055] This may provide an improved designation of time frequency tiles in many scenarios and applications, and may as a result provide improved noise suppression.
[0056] According to an optional feature of the invention, the gain unit is configured to filter gain values across a plurality of time frequency tiles. The filtering includes time frequency tiles that differ in both time and frequency.
[0057] This can provide substantially improved performance, and can typically allow a substantially improved signal-to-noise ratio. The approach may improve noise suppression by applying filtering to the gain values for the time frequency tiles, the filtering being performed in both frequency and time.
[0058] According to an optional feature of the invention, the gain unit is configured to filter at least one of the absolute value time frequency tile values of the first frequency domain signal and the absolute value time frequency tile values of the second frequency domain signal. The filtering includes time frequency tiles that differ in both time and frequency.
[0059] This can provide substantially improved performance, and can typically allow a substantially improved signal-to-noise ratio. The approach may improve noise suppression by applying filtering to the signal values for the time frequency tiles, the filtering being performed in both frequency and time.
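The filtering across tiles and the threshold-based designation described above might be sketched as follows. The text does not specify the filter, so the rectangular moving average, its size, the zero threshold and the names `smooth_tiles` and `designate_noise_tiles` are all assumptions made for illustration.

```python
def smooth_tiles(values, half_t=1, half_f=1):
    """Moving average over neighbouring tiles.

    values : list of frames (time), each a list of per-bin values (frequency)
    half_t : neighbours included on each side in time (assumed size)
    half_f : neighbours included on each side in frequency (assumed size)
    """
    n_t, n_f = len(values), len(values[0])
    out = [[0.0] * n_f for _ in range(n_t)]
    for k in range(n_t):
        for l in range(n_f):
            total, count = 0.0, 0
            # Average over tiles that differ in both time and frequency;
            # the window is truncated at the edges of the grid.
            for kk in range(max(0, k - half_t), min(n_t, k + half_t + 1)):
                for ll in range(max(0, l - half_f), min(n_f, l + half_f + 1)):
                    total += values[kk][ll]
                    count += 1
            out[k][l] = total / count
    return out


def designate_noise_tiles(difference_values, threshold=0.0):
    """Return True for tiles designated as noise (filtered value below threshold)."""
    smoothed = smooth_tiles(difference_values)
    return [[value < threshold for value in row] for row in smoothed]
```

The same `smooth_tiles` helper could equally be applied to gain values or to absolute tile values before the gain is computed. Note that this centred window also uses the following frame, which adds one frame of delay; a causal window would be an alternative design choice.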
[0060] In many embodiments, the gain unit is configured to filter both the absolute value time frequency tile values of the first frequency domain signal and the absolute value time frequency tile values of the second frequency domain signal. Here, the filtering includes time frequency tiles that differ in both time and frequency.
[0061] According to an optional feature of the invention, the noise suppressor further comprises an audio beamformer configured to generate the first microphone signal and the second microphone signal from signals from a microphone array.
[0062] This can improve performance and can allow an improved signal-to-noise ratio of the suppressed signal. In particular, the approach may allow a reference signal with a reduced contribution from the desired source to be processed by the algorithm, providing improved designation and/or noise suppression.
[0063] According to an optional feature of the invention, the noise suppressor further comprises an adaptive canceller for cancelling, from the first microphone signal, a signal component of the first microphone signal that is correlated with the second microphone signal.
[0064] This can improve performance and can allow an improved signal-to-noise ratio of the suppressed signal. In particular, the approach may allow a reference signal with a reduced contribution from the desired source to be processed by the algorithm, providing improved designation and/or noise suppression.
[0065] According to an optional feature of the invention, the difference measure is determined as a difference between a first value given as a monotonic function of the absolute value time frequency tile value of the first frequency domain signal and a second value given as a monotonic function of the absolute value time frequency tile value of the second frequency domain signal.
[0066] According to an aspect of the present invention, there is provided a method of suppressing noise in a first microphone signal, the method comprising: generating a first frequency domain signal from a frequency transform of the first microphone signal, the first frequency domain signal being represented by time frequency tile values; generating a second frequency domain signal from a frequency transform of a second microphone signal, the second frequency domain signal being represented by time frequency tile values; determining time frequency tile gains in response to a difference measure for absolute value time frequency tile values of the first frequency domain signal and absolute value time frequency tile values of the second frequency domain signal; and generating an output frequency domain signal by scaling the time frequency tile values of the first frequency domain signal by the time frequency tile gains; the method further comprising designating time frequency tiles of the first frequency domain signal as speech tiles or noise tiles, wherein the time frequency tile gains are determined in response to the designation of the time frequency tiles of the first frequency domain signal as speech tiles or noise tiles.
[0067] In some embodiments, the method may further comprise generating an output signal from a frequency-to-time transform of the output frequency domain signal.
[0068] These and other aspects, features and advantages of the present invention will be apparent from, and elucidated with reference to, the embodiments described hereinafter.
[0069] Embodiments of the invention will be described, by way of example only, with reference to the drawings.
FIG. 1 shows an example of a noise suppressor according to the prior art.
FIG. 2 shows an example of the noise suppression performance of a prior art noise suppressor.
FIG. 3 shows an example of the noise suppression performance of a prior art noise suppressor.
FIG. 4 shows an example of a noise suppressor in accordance with some embodiments of the present invention.
FIG. 5 shows an example of a noise suppressor configuration in accordance with some embodiments of the present invention.
FIG. 6 shows an example of a converter from the time domain to the frequency domain.
FIG. 7 shows an example of a converter from the frequency domain to the time domain.
FIG. 8 shows an example of elements of a noise suppressor in accordance with some embodiments of the present invention.
FIG. 9 shows an example of elements of a noise suppressor in accordance with some embodiments of the present invention.
FIG. 10 shows an example of a noise suppressor configuration in accordance with some embodiments of the present invention.
FIG. 11 shows an example of a noise suppressor configuration in accordance with some embodiments of the present invention.
[0070] The inventors of the present application have realized that the prior art approach of FIG. 1 tends to give suboptimal performance for non-stationary/diffuse noise, and that improvements are possible by introducing specific concepts that can mitigate or eliminate the performance limitations experienced by the system of FIG. 1 for non-stationary/diffuse noise.
[0071] Specifically, the inventors have realized that the approach of FIG. 1 has a limited signal-to-noise ratio improvement (SNRI) range for diffuse noise. In particular, the inventors have realized that increasing the oversubtraction factor $\gamma_n$ in the conventional gain function described above can introduce other adverse effects, in particular an increase of the speech attenuation while speech is present.
[0072] This can be understood by looking at the characteristics of an ideal spherically isotropic diffuse noise field.
When two microphones are placed in such a field a distance $d$ apart, providing the microphone signals $X_1(t_k, \omega_l)$ and $X_2(t_k, \omega_l)$ respectively, the following holds, with the wavenumber $k = \omega / c$ ($c$ being the speed of sound) and with $\sigma^2$ the variance of the real and imaginary parts of $X_1(t_k, \omega_l)$ and $X_2(t_k, \omega_l)$, which are Gaussian distributed:
$$E\{|X_1(t_k, \omega_l)|^2\} = E\{|X_2(t_k, \omega_l)|^2\} = 2\sigma^2, \qquad E\{X_1(t_k, \omega_l) X_2^*(t_k, \omega_l)\} = 2\sigma^2 \frac{\sin(kd)}{kd}$$
[0073] The coherence function between $X_1(t_k, \omega_l)$ and $X_2(t_k, \omega_l)$ is given by
$$\frac{E\{X_1(t_k, \omega_l) X_2^*(t_k, \omega_l)\}}{\sqrt{E\{|X_1(t_k, \omega_l)|^2\}\, E\{|X_2(t_k, \omega_l)|^2\}}} = \frac{\sin(kd)}{kd}$$
[0074] It follows from this coherence function that $X_1(t_k, \omega_l)$ and $X_2(t_k, \omega_l)$ are uncorrelated for higher frequencies and large distances. For example, if the distance is greater than 3 meters, $X_1(t_k, \omega_l)$ and $X_2(t_k, \omega_l)$ are substantially uncorrelated for frequencies above 200 Hz.
[0075] Using these characteristics, $C(t_k, \omega_l) = 1$, and the gain function then reduces to:
$$G(t_k, \omega_l) = \mathrm{MAX}\left( \frac{|Z(t_k, \omega_l)| - \gamma_n |X(t_k, \omega_l)|}{|Z(t_k, \omega_l)|},\ \theta \right)$$
[0076] Assume that no speech is present, i.e. $Z(t_k, \omega_l) = Z_n(t_k, \omega_l)$, and consider the numerator. $|Z(t_k, \omega_l)|$ and $|X(t_k, \omega_l)|$ are then Rayleigh distributed, because the real and imaginary parts are Gaussian and independent. Assume that $\gamma_n = 1$ and $\theta = 0$, and consider the variable $d = |Z(t_k, \omega_l)| - |X(t_k, \omega_l)|$.
[0077] The mean of the difference of two random variables is equal to the difference of the means: $E\{d\} = 0$.
[0078] The variance of the difference of two random signals is equal to the sum of the individual variances: $\mathrm{var}(d) = (4 - \pi)\sigma^2$.
[0079] If $d$ is limited at 0 (i.e. negative values are set to 0), the power of $d$ is half the variance of $d$, because the distribution of $d$ is symmetric around 0: $E\{d^2\} = (4 - \pi)\sigma^2 / 2$.
[0080] Comparing the power of the residual signal with the power of the input signal ($2\sigma^2$), the following is obtained for the suppression due to the post-processor: $A = -10 \log_{10}(1 - \pi/4) = 6.68$ dB.
[0081] Thus, the attenuation is limited to a relatively low value of less than 7 dB for the case where only background noise is present.
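The noise-only derivation above can be checked numerically with a small Monte Carlo sketch. It is not part of the patent: the function name `simulated_attenuation_db`, the number of tiles and the random seed are arbitrary choices. It draws independent complex Gaussian tile values for the main and reference signals, applies the limited amplitude subtraction, and compares the residual power with the input power.

```python
import math
import random


def simulated_attenuation_db(num_tiles=200_000, sigma=1.0, gamma_n=1.0, seed=0):
    """Estimate the noise-only attenuation of limited amplitude subtraction.

    Real and imaginary parts of Z and X are independent Gaussians with
    standard deviation sigma, so |Z| and |X| are Rayleigh distributed.
    """
    rng = random.Random(seed)
    input_power = 0.0
    residual_power = 0.0
    for _ in range(num_tiles):
        z = complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
        x = complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
        # Limited difference: negative values are set to 0 (theta = 0).
        limited = max(abs(z) - gamma_n * abs(x), 0.0)
        input_power += abs(z) ** 2
        residual_power += limited ** 2
    return -10.0 * math.log10(residual_power / input_power)
```

With `gamma_n=1.0` the estimate should land close to the 6.68 dB derived above, up to sampling noise. The `gamma_n` argument is included only so that the same sketch can be reused for other oversubtraction factors.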
[0082] If it is desired to increase the noise suppression by increasing γ_n, consider the bounded variable d_b = MAX(|Z(t_k, ω_l)| - γ_n |X(t_k, ω_l)|, 0). For the attenuation of the post-processor it can then be derived that A = -10 log10{(γ_n / 2)(-π + (2 / γ_n) + 2 arctan(γ_n))}.
[0083] The attenuation is thus a function of the oversubtraction factor γ_n. Evaluating the expression above gives, for example, A = 6.68 dB for γ_n = 1 and approximately 10.6 dB for γ_n = 1.8.
[0084] As can be seen, a large oversubtraction factor is required to reach a noise suppression of, for example, 10 dB or more.
[0085] Next, consider the influence of the noise subtraction on the remaining speech amplitude. It holds that |Z(t_k, ω_l)| ≤ |Z_s(t_k, ω_l)| + |Z_n(t_k, ω_l)|.
[0086] Thus, subtracting the noise component from |Z(t_k, ω_l)| easily leads to oversubtraction, even for γ_n as small as one.
[0087] The powers of |Z(t_k, ω_l)| and of (|Z(t_k, ω_l)| - |Z_s(t_k, ω_l)|) can be calculated as a function of the speech amplitude v = |Z_s(t_k, ω_l)| and the noise power (2σ^2) (or can be determined by simulation or numerical analysis). FIG. 2 shows the result for 2σ^2 = 1.
[0088] As can be seen from FIG. 2, for large v the powers of |Z(t_k, ω_l)| and |Z_s(t_k, ω_l)| are close to each other. As a result, subtracting the noise estimate |X(t_k, ω_l)| leads to oversubtraction.
[0089] Speech attenuation.
[0090] For v > 2, the speech attenuation is about 2 dB. For smaller v, especially v < 1, not all noise is suppressed because of the large variance of d_s = |Z(t_k, ω_l)| - |X(t_k, ω_l)|. For those values d_s can be negative and, as in the noise-only case, the values are clipped so that d_s ≥ 0. For larger v, d_s is not negative and bounding to 0 does not affect the performance.
[0091] If the oversubtraction factor γ_n is increased, the speech attenuation increases, as shown in FIG. 3. FIG. 3 corresponds to FIG. 2, but shows E{(|Z(t_k, ω_l)| - γ_n |X(t_k, ω_l)|)^2} for γ_n = 1 and γ_n = 1.8 respectively, compared to the desired output.
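The dependence of the noise-only attenuation on the oversubtraction factor in paragraph [0082] can be tabulated directly. The Python sketch below simply evaluates that closed-form expression; the function name and the list of γ_n values are illustrative choices, not values taken from the original document.

```python
import math


def attenuation_db(gamma_n: float) -> float:
    """Evaluate A = -10*log10{(g/2) * (-pi + 2/g + 2*atan(g))} for g > 0."""
    ratio = (gamma_n / 2.0) * (-math.pi + 2.0 / gamma_n
                               + 2.0 * math.atan(gamma_n))
    return -10.0 * math.log10(ratio)


if __name__ == "__main__":
    # Illustrative sweep over oversubtraction factors.
    for gamma_n in (1.0, 1.2, 1.5, 1.8, 2.0, 3.0):
        print(f"gamma_n = {gamma_n:.1f}: A = {attenuation_db(gamma_n):.2f} dB")
```

For γ_n = 1 the expression reduces to -10 log10(1 - π/4), which matches the 6.68 dB figure of paragraph [0080].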
[0092] For v > 2, an increase in speech distortion in the range of 4 to 5 dB is seen. For v < 2, the output increases for γ_n = 1.8. This can be prevented by bounding to 0 as discussed above.
[0093] The 4 dB gain in noise suppression obtained when going from γ_n = 1 to γ_n = 1.8 is offset by 2 to 3 dB more speech attenuation, leading to an SNR improvement of only about 1 to 2 dB. This is typical for diffuse-like noise fields. The overall SNR improvement is limited to about 12 dB.
[0094] Thus, while the approach may provide effective noise suppression with an improved SNR in practice, the suppression is still limited to a relatively modest SNR improvement of not much more than 10 dB.
[0095] FIG. 4 illustrates an example of a noise suppressor in accordance with some embodiments of the present invention. The noise suppressor of FIG. 4 may provide a substantially higher SNR improvement for diffuse noise than is typically possible with the system of FIG. 1. Indeed, simulations and practical tests have shown that SNR improvements of more than 20 to 30 dB are typically possible.
[0096] The noise suppressor comprises a first converter 401 that receives a first microphone signal from a microphone (not shown). The first microphone signal may be captured, filtered, amplified, etc. as known in the art. Additionally, the first microphone signal may be a digital time domain signal generated by sampling an analog signal.
[0097] The first converter 401 is configured to generate a first frequency domain signal by applying a frequency transform to the first microphone signal. In particular, the first microphone signal is divided into time segments/intervals. Each time segment/interval contains a set of samples, which are converted to a set of frequency domain samples, for example by an FFT.
Thus, the first frequency domain signal is represented by frequency domain samples, each of which corresponds to a particular time interval and a particular frequency interval. Each such combination of a frequency interval and a time interval is typically known in the art as a time frequency tile. Thus, the first frequency domain signal is represented by a value for each of a plurality of time frequency tiles, i.e. by time frequency tile values.
[0098] The noise suppressor further comprises a second converter 403 that receives a second microphone signal from a microphone (not shown). The second microphone signal may be captured, filtered, amplified, etc. as known in the art. Additionally, the second microphone signal may be a digital time domain signal generated by sampling an analog signal.
[0099] The second converter 403 is configured to generate a second frequency domain signal by applying a frequency transform to the second microphone signal. In particular, the second microphone signal is divided into time segments/intervals. Each time segment/interval contains a set of samples, which are converted to a set of frequency domain samples, for example by an FFT. Thus, the second frequency domain signal is represented by a value for each of a plurality of time frequency tiles, i.e. by time frequency tile values.
[0100] The first and second microphone signals are hereinafter referred to as z(n) and x(n) respectively, and the first and second frequency domain signals are each referred to by a vector of time frequency tile values. [0101] (Each vector contains all M frequency tile values for a given processing/transform time segment/frame.) In use, z(n) is assumed to contain noise and speech, while x(n) is assumed to contain noise only. Furthermore, the noise components of z(n) and x(n) are assumed to be uncorrelated. (These components are assumed to be uncorrelated in time.
However, it is typically assumed that there is a relationship between the average amplitudes, which is represented by the coherence term.) Such assumptions tend to be valid in a scenario where the first microphone (which captures z(n)) is located close to the speaker while the second microphone is located at some distance from the speaker, and where the noise is, for example, distributed in the room. Such a scenario is illustrated in FIG. 5, where the noise suppressor is depicted as a SUPP unit.
[0102] Following the transformation to the frequency domain, the real and imaginary components of the time frequency values are assumed to be Gaussian distributed. This assumption is typically accurate, for example, for scenarios where the noise originates from diffuse sound fields, for sensor noise, and for a number of other noise sources experienced in many practical scenarios.
[0103] FIG. 6 shows an example of functional elements of a possible implementation of the first and second converters 401, 403. In this example, a serial to parallel converter generates overlapping blocks (frames) of 2B samples, which are then Hanning windowed and converted to the frequency domain by a fast Fourier transform (FFT).
[0104] The first converter 401 is coupled to a first magnitude unit 405. The first magnitude unit 405 determines the magnitudes of the time frequency tile values, thereby generating magnitude time frequency tile values for the first frequency domain signal.
[0105] Likewise, the second converter 403 is coupled to a second magnitude unit 407. The second magnitude unit 407 determines the magnitudes of the time frequency tile values, thereby generating magnitude time frequency tile values for the second frequency domain signal.
[0106] The outputs of the first and second magnitude units 405, 407 are fed to a gain unit 409.
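The conversion chain of paragraphs [0103] to [0105] (overlapping frames of 2B samples, Hanning window, transform, magnitude) might be sketched in Python as follows. This is a minimal illustration rather than the implementation of the converters 401, 403: the function names are hypothetical, a periodic Hann window is assumed, and a naive DFT stands in for the FFT so that the example needs only the standard library.

```python
import cmath
import math


def hann_window(length: int) -> list[float]:
    """Periodic Hann (Hanning) window of the given length."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / length)
            for n in range(length)]


def dft(frame: list[float]) -> list[complex]:
    """Naive O(N^2) DFT; a practical system would use an FFT instead."""
    size = len(frame)
    return [sum(frame[n] * cmath.exp(-2j * math.pi * k * n / size)
                for n in range(size))
            for k in range(size)]


def to_tiles(signal: list[float], block: int) -> list[list[complex]]:
    """Split the signal into overlapping frames of 2*block samples (hop of
    block samples), window each frame and transform it.

    Each row holds the time frequency tile values of one time segment.
    """
    window = hann_window(2 * block)
    tiles = []
    for start in range(0, len(signal) - 2 * block + 1, block):
        frame = [signal[start + n] * window[n] for n in range(2 * block)]
        tiles.append(dft(frame))
    return tiles


def magnitudes(tiles: list[list[complex]]) -> list[list[float]]:
    """Magnitude time frequency tile values, as produced by units 405/407."""
    return [[abs(value) for value in row] for row in tiles]
```

Applying `to_tiles` to both microphone signals with the same block size B yields tile grids that can be compared tile by tile.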
Gain unit 409 is configured to determine gains for the time frequency tiles based on the magnitude time frequency tile values of the first frequency domain signal and the magnitude time frequency tile values of the second frequency domain signal. [0107] The gain unit 409 thus calculates time frequency tile gains, which are referred to by a gain vector.
[0108] More specifically, the gain unit 409 determines a difference measure indicative of a difference between time frequency tile values of the first frequency domain signal and predicted time frequency tile values of the first frequency domain signal generated from the time frequency tile values of the second frequency domain signal. Thus, the difference measure may be a prediction difference measure. In some embodiments, the prediction may simply be that the time frequency tile value of the second frequency domain signal is a direct prediction of the time frequency tile value of the first frequency domain signal.
[0109] The gain is then determined as a function of the difference measure. Specifically, a difference measure may be determined for each time frequency tile, and the gain may be set such that the higher the difference measure (i.e. the stronger the indication of a difference), the higher the gain. Thus, the gain may be determined as a monotonically increasing function of the difference measure.
[0110] As a result, time frequency tile gains are determined which are lower for time frequency tiles for which the difference measure is relatively low, i.e. for which the value of the first frequency domain signal can be predicted relatively accurately from the value of the second frequency domain signal, than for time frequency tiles for which the difference measure is relatively high, i.e. for which the value of the first frequency domain signal cannot be effectively predicted from the value of the second frequency domain signal.
Thus, the gain for time frequency tiles with a high probability that the first frequency domain signal contains a significant speech component is determined to be higher than the gain for time frequency tiles with a low probability that the first frequency domain signal contains a significant speech component. The generated time frequency tile gains are scalar values in this example.
[0111] Gain unit 409 is coupled to a scaler 411, which is fed the gains and proceeds to scale the time frequency tile values of the first frequency domain signal by these time frequency tile gains. In particular, in the scaler 411 the signal vector [0112] is multiplied element by element by the gain vector [0113] to give the resulting signal vector [0114].
[0115] The scaler 411 thus generates a third frequency domain signal, also referred to as the output frequency domain signal. This corresponds to the first frequency domain signal but with a spectral shaping corresponding to the expected speech component. Because the gain values are scalar values, the individual time frequency tile values of the first frequency domain signal are scaled in magnitude, while the time frequency tile values of the third frequency domain signal have the same phase as the corresponding values of the first frequency domain signal.
[0116] The scaler 411 is coupled to an optional third converter 413 which receives the third frequency domain signal. The third converter 413 is configured to generate an output signal from a frequency to time conversion of the third frequency domain signal. Specifically, the third converter 413 may perform the inverse of the conversion applied to the first microphone signal by the first converter 401. In some embodiments, the third (output) frequency domain signal may be used directly, for example by frequency domain speech recognition or speech enhancement. In such embodiments, there is no need for the third converter 413.
[0117] Specifically, as shown in FIG.
7, the third frequency domain signal [0118] may be converted back to the time domain. Then, because of the overlap and windowing applied to the first microphone signal by the first converter 401, the time domain signal may be reconstructed by adding the last B samples of the previous frame to the first B samples of the current (latest) frame (transformed segment). Finally, the resulting block [0119] can be converted to a continuous output signal stream q(n) by a parallel to serial converter.
[0120] However, the noise suppressor of FIG. 4 does not calculate the time frequency tile gains based solely on the difference measure. Rather, the noise suppressor is configured to designate time frequency tiles as being speech (time frequency) tiles or noise (time frequency) tiles, and to determine the gains in dependence on the designation. Specifically, the function for determining the gain for a given time frequency tile as a function of the difference measure is different when the time frequency tile is designated as belonging to a speech frame than when it is designated as belonging to a noise frame.
[0121] The noise suppressor of FIG. 4 specifically includes a designator 415 configured to designate time frequency tiles of the first frequency domain signal as speech tiles or noise tiles.
[0122] It will be appreciated that many different approaches and techniques exist for determining whether signal components correspond to speech, and that any such approach may be used as appropriate. For example, time frequency tiles belonging to a signal portion may be designated as speech time frequency tiles if the signal portion is estimated to contain speech components, and as noise tiles otherwise.
[0123] Thus, in many embodiments, the designation of time frequency tiles is a designation into speech and non-speech tiles.
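The reconstruction step of paragraphs [0117] to [0119] can be illustrated with a short Python sketch. It is a simplified assumption-laden example, not the implementation of the third converter 413: a periodic Hann window and a hop of B samples are assumed, the function names are hypothetical, and the forward/inverse transform pair is omitted because with unit gains it leaves each windowed frame unchanged.

```python
import math


def hann_window(length: int) -> list[float]:
    """Periodic Hann window; halves spaced length/2 apart sum to one."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / length)
            for n in range(length)]


def windowed_frames(signal: list[float], block: int) -> list[list[float]]:
    """Overlapping frames of 2*block samples with a hop of block samples."""
    window = hann_window(2 * block)
    return [[signal[start + n] * window[n] for n in range(2 * block)]
            for start in range(0, len(signal) - 2 * block + 1, block)]


def overlap_add(frames: list[list[float]], block: int) -> list[float]:
    """Rebuild the output stream: each output block is the first `block`
    samples of the current frame plus the last `block` samples of the
    previous frame; the blocks are then serialised."""
    output = []
    for previous, current in zip(frames, frames[1:]):
        output.extend(current[n] + previous[block + n] for n in range(block))
    return output
```

With M frames the sketch returns the input samples from index B up to, but not including, index M·B; the first B samples have no previous frame to overlap with.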
Indeed, noise tiles may be considered equivalent to non-speech tiles (since the desired signal component is the speech component, all non-speech can be considered noise).
[0124] In many embodiments, the designation of time frequency tiles as speech or noise (time frequency) tiles may be based on a comparison of the first and second microphone signals and/or a comparison of the first and second frequency domain signals. In particular, the closer the correlation between the amplitudes of the signals, the less likely it is that the first microphone signal contains significant speech components.
[0125] The designation of time frequency tiles as speech or noise tiles (where each category may, in some embodiments, be further divided into subcategories) may in some embodiments be performed individually for each time frequency tile, but in many embodiments may be performed for groups of time frequency tiles.
[0126] In particular, in the example of FIG. 4, the designator 415 is configured to generate one designation for each time segment/transform block. Thus, for each time segment, it may be estimated whether the first microphone signal contains a significant speech component. If so, all time frequency tiles of that time segment are designated as speech time frequency tiles; otherwise they are designated as noise time frequency tiles.
[0127] In the example of FIG. 4, the designator 415 is coupled to the first and second magnitude units 405, 407 and is configured to designate time frequency tiles based on the magnitude values of the first and second frequency domain signals. However, it will be appreciated that in many embodiments the designation may alternatively or additionally be based on, for example, the first and second microphone signals and/or the first and second frequency domain signals.
[0128] Designator 415 is coupled to gain unit 409, which is fed the designations of the time frequency tiles.
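A frame-based designation of the kind described in paragraphs [0124] to [0127] might look as follows in Python. The decision rule is purely an assumption made for illustration (the text leaves the detection method open): a frame is called speech when the summed magnitude of the first signal exceeds the coherence-scaled summed magnitude of the second signal by a hypothetical `threshold` factor. All names and default values are hypothetical.

```python
def designate_frame(z_mags: list[float], x_mags: list[float],
                    coherence: float = 1.0, threshold: float = 1.5) -> str:
    """Return "speech" or "noise" for one time segment.

    Assumed rule (not from the source): compare the total magnitude of the
    first signal with the noise level predicted from the second signal.
    """
    predicted_noise = coherence * sum(x_mags)
    if predicted_noise == 0.0:
        return "speech" if sum(z_mags) > 0.0 else "noise"
    ratio = sum(z_mags) / predicted_noise
    return "speech" if ratio > threshold else "noise"


def designate_tiles(z_frames: list[list[float]],
                    x_frames: list[list[float]],
                    **kwargs) -> list[list[str]]:
    """Give every tile of a time segment the designation of that segment."""
    return [[designate_frame(z, x, **kwargs)] * len(z)
            for z, x in zip(z_frames, x_frames)]
```

Because one label is produced per time segment, all tiles of a frame share the same designation, matching the segment-based example of FIG. 4.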
That is, gain unit 409 receives information about which time frequency tiles are designated as speech tiles and which time frequency tiles are designated as noise tiles.
[0129] The gain unit 409 is configured to calculate the time frequency tile gains in response to the designation of the time frequency tiles of the first frequency domain signal as speech tiles or noise tiles.
[0130] Thus, the gain calculation depends on the designation, and the resulting gain is different for time frequency tiles designated as speech tiles than for time frequency tiles designated as noise tiles. This difference or dependency may, for example, be implemented by the gain unit 409 having two alternative algorithms or functions for calculating a gain value from the difference measure, and being configured to select between these two functions for a time frequency tile based on the designation. Alternatively or additionally, gain unit 409 may use different parameter values for a single function, with the parameter values depending on the designation.
[0131] Gain unit 409 is configured to determine a lower gain value for a time frequency tile gain when the corresponding time frequency tile is designated as a noise tile than when it is designated as a speech tile. Thus, if all other parameters used to determine the gain are unchanged, gain unit 409 calculates lower gain values for noise tiles than for speech tiles.
[0132] In the example of FIG. 4, the designation is segment/frame based, i.e. the same designation applies to all time frequency tiles of a time segment/frame. Accordingly, the gains for a time segment/frame estimated to contain sufficient speech are set higher (all other parameters being equal) than for a time segment estimated not to contain sufficient speech.
[0133] In many embodiments, the difference measure for a given time frequency tile may depend on whether the time frequency tile is designated as a noise tile or a speech tile.
Thus, in some embodiments, the same function may be used to calculate the gain from the difference measure, but the calculation of the difference measure itself may depend on the designation of the time frequency tile.
[0134] In many embodiments, the difference measure may be determined as a function of the magnitude time frequency tile values of the first and second frequency domain signals respectively.
[0135] Indeed, in many embodiments, the difference measure may be determined as a difference between a first value and a second value, where the first value is generated as a function of at least one time frequency tile value of the first frequency domain signal and the second value is generated as a function of at least one time frequency tile value of the second frequency domain signal. Moreover, the first value need not depend on the at least one time frequency tile value of the second frequency domain signal, and the second value need not depend on the at least one time frequency tile value of the first frequency domain signal.
[0136] The first value for a first time frequency tile may in particular be generated as a monotonically increasing function of the magnitude time frequency tile value of the first frequency domain signal in the first time frequency tile. Similarly, the second value for the first time frequency tile may in particular be generated as a monotonically increasing function of the magnitude time frequency tile value of the second frequency domain signal in the first time frequency tile.
[0137] At least one of the functions for calculating the first and second values may depend on whether the time frequency tile is designated as a speech time frequency tile or a noise time frequency tile. For example, the first value may be higher if the time frequency tile is a speech tile than if it is a noise tile.
Alternatively or additionally, the second value may be lower if the time frequency tile is a speech tile than if it is a noise tile.
[0138] A specific example of a function for calculating the gain may be the following: G(t_k, ω_l) = (|Z(t_k, ω_l)| - γ_n C(t_k, ω_l) |X(t_k, ω_l)|) / |Z(t_k, ω_l)| for noise frames, and G(t_k, ω_l) = (|Z(t_k, ω_l)| - α γ_n C(t_k, ω_l) |X(t_k, ω_l)|) / |Z(t_k, ω_l)| for speech frames,
[0139] where α is a factor less than 1, C(t_k, ω_l) is an estimated coherence term representing the correlation between the amplitude of the first frequency domain signal and the amplitude of the second frequency domain signal, and the oversubtraction factor γ_n is a design parameter. For some applications, C(t_k, ω_l) can be approximated as one. The oversubtraction factor γ_n is typically in the range of 1 to 2.
[0140] Typically, the gain function is limited to positive values, and typically a minimum gain value is set. Thus, the above function may be determined as [0141] G(t_k, ω_l) = MAX((|Z(t_k, ω_l)| - γ_n C(t_k, ω_l) |X(t_k, ω_l)|) / |Z(t_k, ω_l)|, θ) for noise frames, and correspondingly with α γ_n in place of γ_n for speech frames.
[0142] The maximum attenuation of the noise suppression can thereby be set by θ, which must be greater than or equal to zero. For example, if the minimum gain value is set to θ = 0.1, the maximum attenuation is 20 dB. Since the unconstrained gain function would be lower (in practice corresponding to an attenuation of between 30 and 40 dB), this results in more natural sounding background noise, which is particularly appreciated in communication applications.
[0143] In this example, the gain is thus determined as a function of the numerator, which is a difference measure. Furthermore, the difference measure is determined as the difference between two terms (values). The first term/value is a function of the magnitude of the time frequency tile value of the first frequency domain signal. The second term/value is a function of the magnitude of the time frequency tile value of the second frequency domain signal.
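A gain of the kind discussed in paragraphs [0138] to [0143] can be sketched in Python as follows. The sketch assumes that the gain is the difference measure divided by |Z|, that the factor α (less than 1) scales the oversubtraction factor for speech tiles, and that θ is the minimum gain; the function names and all default parameter values are hypothetical illustrations.

```python
import math


def tile_gain(z_mag: float, x_mag: float, is_speech: bool,
              gamma_n: float = 1.5, alpha: float = 0.5,
              coherence: float = 1.0, theta: float = 0.1) -> float:
    """Assumed form: G = max((|Z| - k*C*|X|) / |Z|, theta), where
    k = gamma_n for noise tiles and k = alpha * gamma_n for speech tiles."""
    if z_mag <= 0.0:
        return theta          # nothing to scale; fall back to the floor
    factor = alpha * gamma_n if is_speech else gamma_n
    return max((z_mag - factor * coherence * x_mag) / z_mag, theta)


def max_attenuation_db(theta: float) -> float:
    """Largest attenuation permitted by a minimum gain theta > 0."""
    return -20.0 * math.log10(theta)
```

With all other inputs equal, the speech branch subtracts less and therefore never yields a lower gain than the noise branch, and θ = 0.1 corresponds to a maximum attenuation of 20 dB.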
Furthermore, the function for calculating the second value depends on whether the time frequency tile is designated as a noise or a speech time frequency tile (i.e. on whether the time frequency tile is part of a noise frame or a speech frame).
[0144] In this example, gain unit 409 is configured to determine a noise coherence estimate C(t_k, ω_l) indicative of a correlation between the amplitude of the second microphone signal and the amplitude of the first microphone signal. The function for determining the second value (or possibly the first value) depends in this case on the noise coherence estimate. This allows a more accurate determination of a suitable gain value, because the second value more accurately reflects the expected or estimated noise component in the first frequency domain signal.
[0145] It will be appreciated that any suitable technique for determining the noise coherence estimate C(t_k, ω_l) may be used. For example, a calibration may be performed in which the speaker is instructed not to speak while the first and second frequency domain signals are compared, and the noise coherence estimate C(t_k, ω_l) for each time frequency tile may be determined simply as the average ratio between the time frequency tile values of the first frequency domain signal and those of the second frequency domain signal.
[0146] In many embodiments, the dependence of the gain on whether a time frequency tile is designated as a speech tile or a noise tile is not constant, but itself depends on one or more parameters. For example, the factor α may in some embodiments not be constant, but may rather be a function of characteristics of the received signals (whether direct or derived characteristics).
[0147] In particular, the gain difference may depend on at least one of: a signal level of the first microphone signal; a signal level of the second microphone signal; and a signal to noise estimate for the first microphone signal.
These values may be averages over time frequency tiles, in particular over frequency values and over segments. In particular, they may be (relatively long-term) measures for the signals as a whole.
[0148] In some embodiments, the factor α may be given as α = f(-v^2 / 2σ^2), where v is the amplitude of the first microphone signal and σ^2 is the energy/variance of the second microphone signal. Thus, in this example, α depends on the signal to noise ratio for the first microphone signal. This may provide improved perceived noise suppression. In particular, for low signal to noise ratios, strong noise suppression is performed, thereby improving e.g. the intelligibility of the resulting signal. For higher signal to noise ratios, however, the effect is reduced, thereby reducing distortion.
[0149] Thus, the function f(-v^2 / 2σ^2) can be determined and used to adapt the calculation of the gain for speech signals. The function depends on v^2 / 2σ^2, which corresponds to the SNR, i.e. the energy v^2 of the speech signal relative to the noise energy 2σ^2.
[0150] It will be appreciated that different functions and approaches for determining gains based on the difference between the magnitudes of the first and second microphone signals and on the designation of the tiles as speech or noise may be used in different embodiments.
[0151] Indeed, while the specific approaches described above may provide particularly advantageous performance in many embodiments, many other functions and approaches may be used in other embodiments depending on the specific characteristics of the application.
[0152] The difference measure may be calculated as d(t_k, ω_l) = f_1(|Z(t_k, ω_l)|) - f_2(|X(t_k, ω_l)|), where f_1(x) and f_2(x) can be selected as any monotonic functions that suit the preferences and requirements of the individual embodiment. Typically, the functions f_1(x) and f_2(x) are monotonically increasing functions.
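The general difference measure of paragraph [0152] maps naturally onto a function that takes the two monotonic functions as arguments. The Python sketch below is illustrative only; the helper names `identity` and `scaled` are hypothetical, and the pairing f_1(x) = x, f_2(x) = γ·C·x is just one possible choice of monotonically increasing functions.

```python
from typing import Callable

MonotonicFunction = Callable[[float], float]


def difference_measure(z: complex, x: complex,
                       f1: MonotonicFunction,
                       f2: MonotonicFunction) -> float:
    """d = f1(|Z|) - f2(|X|) for one time frequency tile."""
    return f1(abs(z)) - f2(abs(x))


def identity(magnitude: float) -> float:
    """f1(x) = x."""
    return magnitude


def scaled(gamma: float, coherence: float = 1.0) -> MonotonicFunction:
    """Build f2(x) = gamma * C * x for a fixed gamma and coherence C."""
    def f2(magnitude: float) -> float:
        return gamma * coherence * magnitude
    return f2
```

For example, `difference_measure(3 + 4j, 2j, identity, scaled(1.5))` evaluates 5 - 1.5 · 2.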
[0153] Thus, the difference measure is indicative of the difference between a first monotonic function f_1(x) of the magnitude time frequency tile value of the first frequency domain signal and a second monotonic function f_2(x) of the magnitude time frequency tile value of the second frequency domain signal. In some embodiments, the first and second monotonic functions may be the same function. However, in most embodiments, the two functions are different.
[0154] Furthermore, one or both of the functions f_1(x) and f_2(x) may depend on various other parameters and measures, such as the overall averaged power level of the microphone signals, the frequency, etc.
[0155] In many embodiments, one or both of the functions f_1(x) and f_2(x) may depend on signal values for other frequency tiles, for example on one or more averages of Z(t_k, ω_l), |Z(t_k, ω_l)|, f_1(|Z(t_k, ω_l)|), X(t_k, ω_l), |X(t_k, ω_l)| or f_2(|X(t_k, ω_l)|) (i.e. averages of the values for different indices k and/or l). In many embodiments, averaging over a neighborhood extending in both the time and frequency dimensions may be performed. Specific examples based on the particular difference measure equations given above will be described later, but it will be appreciated that corresponding approaches may be applied to other algorithms or functions for determining the difference measure.
[0156] Examples of possible functions for determining the difference measure include d(t_k, ω_l) = |Z(t_k, ω_l)|^α - γ · |X(t_k, ω_l)|^β, where α and β are design parameters, typically with α = β.
[0157] A further example uses a suitable weighting function σ(ω_l) to provide desired spectral characteristics of the noise suppression.
(For example, it may be used to increase the noise suppression for higher frequencies, which tend to contain a relatively large amount of noise energy but relatively little speech energy, and to reduce the noise suppression for mid frequencies, which tend to contain a relatively large amount of speech energy but possibly relatively little noise energy.) In particular, σ(ω_l) may be used to provide the desired spectral characteristics of the noise suppression while keeping the spectral shaping of the speech at a low level.
[0158] It will be appreciated that these functions are merely exemplary, and that many other equations and algorithms can be envisaged for calculating a difference measure indicative of the difference between the magnitudes of the two microphone signals.
[0159] In the above equations, the factor γ is introduced to bias the difference measure towards negative values. While these examples introduce this bias by a simple scale factor applied to the time frequency tile values of the second microphone signal, it will be appreciated that many other approaches are possible.
[0160] Indeed, any suitable way of arranging the first and second functions f_1(x) and f_2(x) so as to provide a bias towards negative values, at least for noise tiles, may be used. The bias is specifically one that results in an expected value of the difference measure that is negative when there is no speech, as in the previous examples. Indeed, if both the first and second microphone signals contain only random noise (e.g. with sample values distributed symmetrically and randomly around a mean value), the expected value of the difference measure is negative rather than zero. In the previous example, this was achieved by the oversubtraction factor γ, which leads to negative values when there is no speech.
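The negative bias described in paragraphs [0159] and [0160] can be illustrated numerically. The Python sketch below is an assumption-based illustration: both tile values are modeled as independent complex Gaussian noise of equal level (so the magnitudes are Rayleigh distributed with mean σ·sqrt(π/2)), the difference measure is d = |Z| - γ|X|, and the function names are hypothetical.

```python
import math
import random


def expected_difference(gamma: float, sigma: float = 1.0) -> float:
    """E{|Z| - gamma*|X|} for Rayleigh magnitudes: sigma*sqrt(pi/2)*(1-gamma)."""
    return sigma * math.sqrt(math.pi / 2.0) * (1.0 - gamma)


def mean_difference(gamma: float, num_tiles: int = 100_000,
                    sigma: float = 1.0, seed: int = 1) -> float:
    """Monte Carlo mean of d = |Z| - gamma*|X| for noise-only tiles."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_tiles):
        z = abs(complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)))
        x = abs(complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)))
        total += z - gamma * x
    return total / num_tiles


if __name__ == "__main__":
    for gamma in (1.0, 1.5, 2.0):
        print(f"gamma = {gamma:.1f}: simulated {mean_difference(gamma):+.3f}, "
              f"expected {expected_difference(gamma):+.3f}")
```

Under these assumptions the mean is zero for γ = 1 and becomes increasingly negative as γ grows beyond one.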
[0161] In order to compensate for differences in the signal levels of the first and second microphone signals when speech is present, the gain unit may, as mentioned earlier, determine a noise coherence estimate indicative of a correlation between the amplitude of the second microphone signal and the amplitude of the noise component of the first microphone signal. The noise coherence estimate may, for example, be generated as an estimate of the ratio between the amplitudes of the first and second microphone signals. Noise coherence estimates may be determined for individual frequency bands, and in particular for each time frequency tile. Various techniques for estimating the amplitude/magnitude relationship between two microphone signals are known to those skilled in the art and will not be described in further detail. For example, average amplitude estimates for different frequency bands may be determined during time periods in which there is no speech (e.g. by a dedicated manual measurement or by automatic detection of speech pauses).
[0162] In the present system, at least one of the first and second monotonic functions f_1(x) and f_2(x) may compensate for the amplitude difference. In the previous example, the second monotonic function compensated for the amplitude difference by scaling the magnitude of the second microphone signal by the value C(t_k, ω_l). In other embodiments, the compensation may alternatively or additionally be performed by the first monotonic function, for example by scaling the magnitude of the first microphone signal by 1 / C(t_k, ω_l).
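One simple way to obtain the noise coherence estimate described in paragraph [0161] is to average the magnitude ratio per frequency bin over frames known to contain no speech. The Python sketch below follows that idea under stated assumptions: the frames passed in are already known to be speech-free, the function name is hypothetical, and bins without usable data fall back to 1.0, which is a design choice of this sketch rather than something prescribed by the text.

```python
def estimate_noise_coherence(z_mag_frames: list[list[float]],
                             x_mag_frames: list[list[float]],
                             floor: float = 1e-12) -> list[float]:
    """Per-bin average of |Z| / |X| over noise-only frames.

    Tiles where |X| is at or below `floor` are skipped to avoid division
    by (near) zero.
    """
    num_bins = len(z_mag_frames[0])
    sums = [0.0] * num_bins
    counts = [0] * num_bins
    for z_row, x_row in zip(z_mag_frames, x_mag_frames):
        for index, (z_mag, x_mag) in enumerate(zip(z_row, x_row)):
            if x_mag > floor:
                sums[index] += z_mag / x_mag
                counts[index] += 1
    # Hypothetical fallback: assume C = 1 where no ratio could be formed.
    return [total / count if count else 1.0
            for total, count in zip(sums, counts)]
```

The resulting list holds one estimate per frequency bin and can be used as C(t_k, ω_l) when the noise field is reasonably stationary between calibration and use.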
[0163] Furthermore, in most embodiments, the first and second monotonic functions are such that a negative expected value of the difference measure results when the amplitude relationship between the first and second microphone signals corresponds to the noise coherence estimate and the time frequency tile is designated as a noise tile.
[0164] Specifically, the noise coherence estimate may indicate that the estimated or expected magnitude difference between the first microphone signal and the second microphone signal (in particular for a specific frequency band) corresponds to the ratio given by C(t_k, ω_l). In that case, the first and second monotonic functions are selected such that the generated difference measure is negative when the ratio between the corresponding time frequency tile magnitudes equals C(t_k, ω_l) (and the time frequency tile is designated as a noise tile).
[0165] For example, the noise coherence estimate may be determined as [0166] C(t_k, ω_l) = |Z_n(t_k, ω_l)| / |X(t_k, ω_l)|. (In practice, the values may be generated, for example, by averaging a suitable number of values in different time frames.) In such cases, the first and second monotonic functions f_1(x) and f_2(x) are selected with the property that if [0167] |Z(t_k, ω_l)| = C(t_k, ω_l) |X(t_k, ω_l)|, then the difference measure d(t_k, ω_l) has a negative value. That is, for noise tiles the first and second monotonic functions f_1(x) and f_2(x) are selected such that [0168] f_1(C(t_k, ω_l) |X(t_k, ω_l)|) - f_2(|X(t_k, ω_l)|) < 0.
[0169] In the previous example, this was achieved by the difference measure d(t_k, ω_l) = |Z(t_k, ω_l)| - γ_n C(t_k, ω_l) |X(t_k, ω_l)| having an oversubtraction factor γ_n with a value greater than one.
[0170] In this example, f_1(x) = x and f_2(x) = γ_n C(t_k, ω_l) x, but it will be appreciated that an infinite number of other monotonic functions exist and may be used instead.
Furthermore, in this example, the compensation for the noise level difference between the first and second microphone signals and the bias towards negative difference measure values are both achieved by including compensation factors in the second monotonic function $f_2(x)$. However, it will be appreciated that in other embodiments, this may alternatively or additionally be achieved by including a compensation factor in the first monotonic function $f_1(x)$.
[0171] Furthermore, in the described approach, the gain depends on whether the time-frequency tile is designated as a speech tile or a noise tile. In many embodiments, this may be achieved by making the difference measure depend on whether the time-frequency tile is designated as a speech tile or a noise tile.
[0172] Specifically, the gain unit may be configured to modify at least one of the first monotonic function and the second monotonic function such that the expected value of the difference measure, when the time-frequency tile absolute values actually correspond to the noise coherence estimate, differs depending on whether the time-frequency tile is designated as a speech tile or as a noise tile.
[0173] As an example, the expected value of the difference measure, when the relative noise level between the two microphone signals is as expected according to the noise coherence estimate, may be negative when the tile is designated as a noise tile and may take a different value when the tile is designated as a speech tile.
[0174] In many embodiments, the expected value may be negative for both speech and noise tiles, but more negative (i.e., larger in absolute value) for noise tiles than for speech tiles.
[0175] In many embodiments, the first and second monotonic functions $f_1(x)$ and $f_2(x)$ may also include bias values that are modified depending on whether the tile is a speech tile or a noise tile.
As a specific example, the previous example used the difference measure given by $|Z(t_k, \omega_l)| - \gamma_n C(t_k, \omega_l) |X(t_k, \omega_l)|$ for noise frames and $|Z(t_k, \omega_l)| - \gamma_s \cdot \alpha \cdot C(t_k, \omega_l) |X(t_k, \omega_l)|$ for speech frames, where $\gamma_n > \gamma_s$.
[0176] Alternatively, in this example, the difference measure may be written as $d(t_k, \omega_l) = |Z(t_k, \omega_l)| - \gamma(D(t_k, \omega_l)) \cdot C(t_k, \omega_l) |X(t_k, \omega_l)|$. Here, $D(t_k, \omega_l)$ is a value indicating whether the tile is a noise tile or a speech tile.
[0177] For completeness, note that the requirement that the calculated difference measure have specific properties for specific values/properties of the input signal values provides an objective criterion for the functions actually used, and that this criterion does not depend on the actual signal values or on the actual signals being processed. In particular, the requirement [equation omitted in source] [0178] provides a limiting criterion for the functions used.
[0179] It will be appreciated that many different functions and techniques may be used to determine the gain based on the difference measure. The gain is generally constrained to non-negative values to avoid phase inversion and the associated degradation. In many embodiments, it may be advantageous to constrain the gain not to fall below a certain minimum gain (thereby ensuring that no particular frequency band/tile is completely attenuated).
[0180] For example, in many embodiments, the gain may be determined simply by scaling the difference measure while ensuring that the gain is kept at or above a minimum gain (which may be zero, to ensure that the gain is not negative), e.g., $G(t_k, \omega_l) = \mathrm{MAX}(\varphi \cdot d(t_k, \omega_l), \theta)$. Here, $\varphi$ is a scale factor suitably selected for the particular embodiment (e.g., determined by trial and error) and $\theta$ is a non-negative value.
[0181] In many embodiments, the gain may be a function of other parameters. For example, in many embodiments, the gain may depend on a property of at least one of the first and second microphone signals.
In particular, scale factors may be used to normalize the difference measure. As a specific example, the gain may be determined as $G(t_k, \omega_l) = \mathrm{MAX}\left(d(t_k, \omega_l) / |Z(t_k, \omega_l)|, \theta\right)$ [0182], that is, with $\varphi(t_k, \omega_l) = 1/|Z(t_k, \omega_l)|$. (For example, with $d(t_k, \omega_l) = |Z(t_k, \omega_l)| - \gamma(D(t_k, \omega_l)) \cdot C(t_k, \omega_l) |X(t_k, \omega_l)|$, this corresponds to the above example by setting $d(t_k, \omega_l) = |Z(t_k, \omega_l)| - \gamma_n C(t_k, \omega_l) |X(t_k, \omega_l)|$ for noise frames and $d(t_k, \omega_l) = |Z(t_k, \omega_l)| - \gamma_s \cdot \alpha \cdot C(t_k, \omega_l) |X(t_k, \omega_l)|$ for speech frames.)
[0183] Thus, the gain calculation may include normalization.
[0184] In other embodiments, more complex functions may be used. For example, a non-linear function may be used to determine the gain as a function of the difference measure, for example $G(t_k, \omega_l) = \mathrm{MAX}(\delta \cdot \log d(t_k, \omega_l), \theta)$. Here, $\delta$ may be a constant.
[0185] In general, the gain can be determined as a non-negative function of the difference measure: $G(t_k, \omega_l) = f_3(d(t_k, \omega_l))$.
[0186] Typically, the gain is determined as a monotonic function of the difference measure, in particular as a monotonically increasing function. Thus, typically, a higher gain results when the difference measure indicates a larger difference between the first and second microphone signals, reflecting an increased probability that the time-frequency tile contains a large amount of speech (which is captured mainly by the first microphone signal, i.e., by the microphone located near the speaker).
[0187] As with the algorithm or function for determining the difference measure, the function for determining the gain may further depend on other parameters or characteristics. In fact, in many embodiments, the gain function may depend on characteristics of one or both of the first and second microphone signals. For example, as noted above, the function may include normalization based on the absolute value of the first microphone signal.
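The gain determination of [0176] to [0182] can be illustrated by the following minimal Python sketch for a single time-frequency tile. It uses the form $d = |Z| - \gamma(D) \cdot C \cdot |X|$ of [0176] with a normalized and floored gain. The function names, the numerical values of `gamma_n`, `gamma_s` and `theta`, and the `eps` guard against division by zero are assumptions chosen only for illustration; the description only requires that $\gamma_n > \gamma_s$ and that $\theta$ be non-negative.

```python
def difference_measure(z_mag, x_mag, c, is_speech, gamma_n=1.8, gamma_s=1.0):
    """d = |Z| - gamma(D) * C * |X|, with gamma selected by the tile designation."""
    gamma = gamma_s if is_speech else gamma_n  # gamma_n > gamma_s
    return z_mag - gamma * c * x_mag


def tile_gain(z_mag, x_mag, c, is_speech, theta=0.1,
              gamma_n=1.8, gamma_s=1.0, eps=1e-12):
    """Normalized gain max(d / |Z|, theta); theta is the minimum gain."""
    d = difference_measure(z_mag, x_mag, c, is_speech, gamma_n, gamma_s)
    return max(d / max(z_mag, eps), theta)


# Noise-like tile: |Z| equals C * |X|, so d is negative and the floor applies.
print(tile_gain(1.0, 1.0, 1.0, is_speech=False))  # 0.1
# Same tile values, designated as speech versus noise.
print(tile_gain(4.0, 1.0, 1.0, is_speech=True))   # 0.75
print(tile_gain(4.0, 1.0, 1.0, is_speech=False))  # approximately 0.55
```

For identical tile values, the noise designation yields the lower gain, which is the behavior the description asks of the designation-dependent gain.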
[0188] Another example of a possible function for calculating the gain from the difference measure is [equation omitted in source] [0189], where $\sigma(\omega_l)$ is a suitable weighting function.
[0190] It will be understood that the exact approach for determining the gain depending on the time-frequency tile values and on the designation as speech or noise tiles may be chosen to provide the desired operating characteristics and performance for the particular embodiment and application.
[0191] Thus, the gain may be determined as $G(t_k, \omega_l) = f_4(\alpha(t_k, \omega_l), d(t_k, \omega_l))$. Here, $\alpha(t_k, \omega_l)$ reflects whether the tile is designated as a speech tile or a noise tile, and $f_4$ may be any suitable function or algorithm that includes a component reflecting the difference between the absolute values of the time-frequency tile values of the first and second microphone signals.
[0192] Thus, the gain value for a time-frequency tile depends on whether the tile is designated as a speech time-frequency tile or a noise time-frequency tile. In fact, for a given time-frequency tile, the gain is determined such that a lower gain value results when the tile is designated as a noise tile than when it is designated as a speech tile.
[0193] The gain value may be determined by first determining the difference measure and then determining the gain value from the difference measure. The dependence on the noise/speech designation may be included in the determination of the difference measure, in the determination of the gain from the difference measure, or in both.
[0194] Thus, in many embodiments, the difference measure may depend on whether the time-frequency tile is designated as a noise tile or a speech tile. For example, one or both of the above functions $f_1(x)$ and $f_2(x)$ may depend on a value indicating whether the time-frequency tile is designated as noise or speech.
The dependency may be such that (for the same microphone signal values) a larger difference measure is calculated when the time-frequency tile is designated as a speech tile than when it is designated as a noise tile.
[0195] For example, in the example given above for the calculation of the gain $G(t_k, \omega_l)$, the numerator may be considered to be the difference measure, and the difference measure therefore differs depending on whether the tile is designated as a speech tile or as a noise tile.
[0196] More generally, the difference measure may be expressed as $d(t_k, \omega_l) = f_5\left(\alpha(t_k, \omega_l),\, f_1(|Z(t_k, \omega_l)|) - f_2(|X(t_k, \omega_l)|)\right)$. Here, $\alpha(t_k, \omega_l)$ depends on whether the tile is designated as a speech tile or a noise tile, and $f_5$ depends on $\alpha$ such that the difference measure is larger when $\alpha$ indicates that the tile is a speech tile than when it indicates a noise tile.
[0197] Alternatively or additionally, the function for determining the gain value from the difference measure may depend on the speech/noise designation. Specifically, the following function may be used: $G(t_k, \omega_l) = f_6(d(t_k, \omega_l), \alpha(t_k, \omega_l))$, where $\alpha(t_k, \omega_l)$ depends on whether the tile is designated as a speech tile or a noise tile, and $f_6$ depends on $\alpha$ such that the gain is larger when $\alpha$ indicates that the tile is a speech tile than when it indicates a noise tile.
As mentioned earlier, any suitable approach may be used to designate the time-frequency tiles as speech or noise tiles. However, in some embodiments, the designation may advantageously also be based on a difference value determined by calculating the difference measure under the assumption that the time-frequency tile is a noise tile. Thus, the difference measure function for noise time-frequency tiles may be evaluated.
If this difference value is sufficiently low, it indicates that the time-frequency tile value of the first frequency domain signal is predictable from the time-frequency tile value of the second frequency domain signal. This is typically the case if the first frequency domain signal does not contain significant speech components. Thus, in some embodiments, a tile is designated as a noise tile if the difference measure calculated using the noise tile calculation is less than a threshold. Otherwise, the tile is designated as a speech tile.
[0198] An example of such an approach is shown in FIG. As shown, the designator 415 of FIG. 4 may comprise a unit that calculates a difference value for the time-frequency tile by evaluating the difference measure under the assumption that the time-frequency tile is in fact a noise tile. The resulting difference value is fed to the tile designator 803. The tile designator 803 designates the tile as a noise tile if the difference value is below a given threshold, and otherwise designates it as a speech tile.
[0199] This approach provides very efficient and accurate detection and designation of tiles as speech or noise tiles. Furthermore, implementation and operation are facilitated by reusing the function for calculating gains as part of the designator. For example, for all time-frequency tiles designated as noise tiles, the calculated difference measure can be used directly to determine the gain. Recalculation of the difference measure by the gain unit 409 is required only for time-frequency tiles designated as speech tiles.
[0200] In some embodiments, low-pass filtering/smoothing (averaging) may be included in the designation based on the difference values. The filtering may in particular be across different time-frequency tiles in both the frequency domain and the time domain.
Thus, filtering may be performed over the difference values of multiple time-frequency tiles within at least one time segment, as well as over the difference values of time-frequency tiles belonging to different (neighboring) time segments/frames. The inventors have realized that such filtering can provide substantial performance improvement and substantially improved designation, and thus substantially improved noise suppression.
[0201] In some embodiments, low-pass filtering/smoothing (averaging) may be included in the gain calculation. The filtering may in particular be across different time-frequency tiles in both the frequency domain and the time domain. Thus, filtering may be performed over time-frequency tile values belonging to different (neighboring) time segments/frames and over a plurality of time-frequency tiles within at least one of said time segments. The inventors have realized that such filtering can provide substantial performance improvement and substantially improved perceived noise suppression.
[0202] Smoothing (i.e., low-pass filtering) may in particular be applied to the calculated gain values. Alternatively or additionally, filtering may be applied to the first and second frequency domain signals prior to the gain calculation. In some embodiments, filtering may be applied to parameters of the gain calculation, e.g., to the difference measure.
[0203] Specifically, in some embodiments, the gain unit 409 may be configured to filter gain values across multiple time-frequency tiles, where the filtering includes time-frequency tiles that differ in both time and frequency.
[0204] Specifically, the output value may be calculated using an averaged/smoothed version of the unclipped gain: [equation omitted in source] [0205] In some embodiments, the lower gain limit may be applied after gain averaging, for example by calculating [equation omitted in source] [0206]. Here, $G(t_k, \omega_l)$ is calculated as a monotonic function of the difference measure, but is not restricted to non-negative values.
In fact, the unclipped gain may take negative values when the difference measure is negative.
[0207] In some embodiments, the gain unit may be configured to filter at least one of the absolute value time-frequency tile values of the first frequency domain signal and the absolute value time-frequency tile values of the second frequency domain signal before they are used to determine the gain values. Thus, in effect, in this example the filtering is performed on the input to the gain calculation rather than on its output.
[0208] An example of this approach is shown in FIG. This example corresponds to the example of FIG. 8, but with an added low-pass filter 901 which performs low-pass filtering of the absolute values of the time-frequency tile values of the first and second frequency domain signals. In this example, the absolute time-frequency tile value [equation omitted in source] [0209] is filtered to give the smoothed vector [equation omitted in source] [0210] (given in the figure as [equation omitted in source] [0211]).
[0212] In this example, the previously described functions for determining the gain values may be replaced, for noise and speech tiles respectively, by [equation omitted in source] [0213], where the overbar denotes smoothing (averaging) over neighboring values in the $(t, \omega)$ plane.
[0214] The filtering may in particular use a uniform window, such as a rectangular window in time and frequency, or a window based on characteristics of human hearing. In the latter case, the filtering may in particular follow so-called critical bands. A critical band refers to the frequency bandwidth of the "auditory filter" created by the cochlea. For example, octave bands or Bark-scale critical bands may be used.
[0215] The filtering may be frequency dependent. In particular, at low frequencies the averaging may extend over only a few frequency bins, whereas more frequency bins may be used at higher frequencies.
[0216] Smoothing/filtering may be performed by averaging over neighboring values.
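The designation by thresholding of [0197] to [0200], combined with averaging over neighboring tiles, might be sketched as follows. This is an illustrative Python sketch under stated assumptions: a uniform rectangular window, edge tiles averaged over whichever neighbors exist, the noise-tile difference measure $|Z| - \gamma_n C |X|$, and hypothetical names (`smooth_tiles`, `designate_tiles`) and parameter values (`gamma_n`, `threshold`) that do not appear in the description.

```python
def smooth_tiles(values, n=1):
    """Average each tile with its neighbors in a (2n+1) x (2n+1) window.

    values[k][l] holds the value for time frame k and frequency bin l.
    Tiles at the edges are averaged over the neighbors that exist.
    """
    n_frames, n_bins = len(values), len(values[0])
    out = [[0.0] * n_bins for _ in range(n_frames)]
    for k in range(n_frames):
        for l in range(n_bins):
            total, count = 0.0, 0
            for kk in range(max(0, k - n), min(n_frames, k + n + 1)):
                for ll in range(max(0, l - n), min(n_bins, l + n + 1)):
                    total += values[kk][ll]
                    count += 1
            out[k][l] = total / count
    return out


def designate_tiles(z_mag, x_mag, c, gamma_n=1.8, threshold=0.0, n=1):
    """Return a grid of booleans, True where a tile is designated as speech.

    The difference measure is evaluated as if every tile were a noise tile,
    smoothed over neighboring tiles, and compared against the threshold.
    c[l] is the noise coherence estimate for frequency bin l.
    """
    d_noise = [
        [z_mag[k][l] - gamma_n * c[l] * x_mag[k][l] for l in range(len(c))]
        for k in range(len(z_mag))
    ]
    d_smooth = smooth_tiles(d_noise, n)
    # Below the threshold: noise tile (False). Otherwise: speech tile (True).
    return [[d >= threshold for d in row] for row in d_smooth]
```

Because the noise-assumption difference values are already available after this step, they could be reused directly for the gain of tiles designated as noise, as [0199] points out.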
For example: [equation omitted in source] [0217] Here, for example, $N = 1$ and $W(m, n)$ is a $3 \times 3$ matrix with each weight equal to 1/9. $N$ can also depend on the critical band, in which case it depends on the frequency index $l$. For higher frequencies, $N$ will typically be larger than for lower frequencies.
[0218] In some embodiments, the filtering is performed on the difference measure, for example by calculating [equation omitted in source] [0219].
[0220] As discussed below, filtering/smoothing may provide substantial performance improvement.
[0221] Specifically, when filtering in the $(t_k, \omega_l)$ plane, the variance of the noise components, in particular in $|Z(t_k, \omega_l)|$ and $|X(t_k, \omega_l)|$, is substantially reduced.
[0222] If there is no speech, i.e., $|Z(t_k, \omega_l)| = |Z_n(t_k, \omega_l)|$, and assuming $C(t_k, \omega_l) = 1$, the result is [equation omitted in source] [0223]. Here, $|Z(t_k, \omega_l)|$ and $|X(t_k, \omega_l)|$ are smoothed over $L$ independent values.
[0224] Smoothing does not change the mean. Thus: [equation omitted in source] [0225].
[0226] The variance of the difference of two independent stochastic signals is equal to the sum of the individual variances: [equation omitted in source] [0227] If $\bar{d}$ is bounded from below at 0, then, because the distribution of $\bar{d}$ is symmetric around 0, the power of the bounded $\bar{d}$ is half the variance of $\bar{d}$: [equation omitted in source] [0228] Comparing the power of the residual signal with the power of the input signal ($2\sigma^2$), the noise suppression provided by the post-processor is obtained as $A = -10 \log_{10}\left(\frac{4 - \pi}{4L}\right) = 6.68 + 10 \log_{10} L$ dB.
[0229] As an example, when averaging over nine independent values, an additional 9.5 dB of suppression is obtained.
[0230] Oversubtraction combined with smoothing further increases the attenuation. Considering the variable [equation omitted in source] [0231], smoothing, as compared with the unsmoothed value [equation omitted in source] [0232], causes a decrease in [equation omitted in source] [0233]. The distribution will be more concentrated around the expected value.
The expected value is negative and is given by [equation omitted in source] [0234].
[0235] Closed-form expressions for the sum (or difference) of independent Rayleigh random variables cannot be obtained when three or more variables are involved. However, simulation results for the attenuation in dB for various smoothing factors $L$ and oversubtraction factors $\gamma_n$ are presented in the table below. The table includes the case without smoothing. In this table, the rows show different oversubtraction factors (whose values are given in the first column) and the columns show different averaging areas (the number of tiles averaged is given in the first row). [table omitted in source]
[0236] As can be seen, very high attenuation is achieved.
[0237] For speech, the effect of filtering/smoothing is very different from the effect for noise.
[0238] First, $|X(t_k, \omega_l)|$ contains no speech information, and hence the difference measure $d$ does not include a "negative" speech contribution. Furthermore, the speech components in neighboring time-frequency tiles in the $(t_k, \omega_l)$ plane will not be independent. As a result, smoothing will have little effect on the speech energy in $d$. Thus, the overall effect of smoothing is an increase in SNR, since the filtering substantially reduces the variance of the noise but has much less impact on the speech components. This may be used for gain value determination and/or for designation of time-frequency tiles as described above.
[0239] As an example, in many embodiments, the difference measure may be determined as [equation omitted in source] [0240]. Here, $f_a$ and $f_b$ are monotonic functions, and $K_1$ through $K_8$ are integer values that define the averaging neighborhood for the time-frequency tile. Typically, the values $K_1$ to $K_8$, or at least the total number of time-frequency tile values summed in each sum, may be identical. However, where the number of values differs between the two sums, the corresponding functions $f_a(x)$ and $f_b(x)$ may include compensation for the difference in the number of values.
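The attenuation figure of [0228] and the kind of simulation referred to in [0235] can be reproduced approximately with a small Monte Carlo experiment. The following Python sketch assumes speech-free input, $C = 1$, a minimum gain of zero, and independent Rayleigh-distributed magnitudes (generated as the modulus of a complex Gaussian with standard deviation $\sigma$ per component). The function name, trial count and seed are arbitrary choices for this sketch, and it does not reproduce the table of the original document.

```python
import math
import random


def simulated_attenuation_db(L, gamma_n=1.0, trials=20000, sigma=1.0, seed=0):
    """Estimate noise attenuation in dB for smoothing over L independent tiles.

    Residual: max(mean|Z| - gamma_n * mean|X|, 0), with |Z| and |X| drawn as
    independent Rayleigh variables. Input power of one tile is 2 * sigma**2.
    """
    rng = random.Random(seed)

    def rayleigh():
        # Modulus of a complex Gaussian sample is Rayleigh distributed.
        return math.hypot(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))

    residual_power = 0.0
    for _ in range(trials):
        z_bar = sum(rayleigh() for _ in range(L)) / L
        x_bar = sum(rayleigh() for _ in range(L)) / L
        d = max(z_bar - gamma_n * x_bar, 0.0)  # gain floor of zero
        residual_power += d * d
    residual_power /= trials
    input_power = 2.0 * sigma ** 2
    return -10.0 * math.log10(residual_power / input_power)


for L in (1, 9):
    analytic = 6.68 + 10.0 * math.log10(L)
    print(L, round(simulated_attenuation_db(L), 2), round(analytic, 2))
```

With $\gamma_n = 1$ the estimates should fall close to $6.68 + 10 \log_{10} L$ dB, up to Monte Carlo error. Passing `gamma_n` greater than 1 illustrates the additional attenuation from oversubtraction described in [0230].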
[0241] The functions $f_a(x)$ and $f_b(x)$ may, in some embodiments, include a weighting of the values in the sums, i.e., they may depend on the summation index. Equivalently: [equation omitted in source] [0242] Thus, in this example, the time-frequency tile values of both the first and second frequency domain signals are averaged/filtered over the neighborhood of the current tile.
[0243] Examples of the functions include the exemplary functions given above. In many embodiments, $f_1(x)$ or $f_2(x)$ may further depend on a noise coherence estimate indicating an average difference between the noise levels of the first and second microphone signals. One or both of the functions $f_1(x)$ and $f_2(x)$ may specifically include scaling by a scale factor that reflects the estimated average noise level difference between the first and second microphone signals. One or both of the functions $f_1(x)$ and $f_2(x)$ may in particular depend on the coherence term $C(t_k, \omega_l)$ described above.
[0244] As mentioned earlier, the difference measure is calculated as the difference between a first value, generated as a monotonic function of the absolute value of the time-frequency tile value of the first microphone signal, and a second value, generated as a monotonic function of the absolute value of the time-frequency tile value of the second microphone signal, that is, $d(t_k, \omega_l) = f_1(|Z(t_k, \omega_l)|) - f_2(|X(t_k, \omega_l)|)$. Here, $f_1(x)$ and $f_2(x)$ are monotonic functions of $x$ (typically monotonically increasing functions). In many embodiments, the functions $f_1(x)$ and $f_2(x)$ may simply be scalings of the absolute value.
[0245] A particular advantage of such an approach is that a difference measure based on subtraction of absolute values can take both positive and negative values when only noise is present. This makes it particularly suitable for averaging/smoothing/filtering, since fluctuations around the zero mean then tend to cancel each other.
However, when speech is present, it is present mainly only in the first microphone signal, i.e., mainly in $|Z(t_k, \omega_l)|$. Thus, for example, smoothing or filtering over neighboring time-frequency tiles tends to reduce the noise contribution to the difference measure but not the speech component. A particularly advantageous synergy can therefore be achieved by combining averaging with the absolute-value-based difference measure.
[0246] The above description has addressed a scenario in which only one of the microphones captures speech while the other microphone captures only diffuse noise without speech components (e.g., as illustrated in FIG. 5, with the speaker relatively close and the reference microphone picking up (almost) no speech).
[0247] Thus, in this example, it is assumed that the reference microphone signal $x(n)$ contains almost no speech, and that the noise components in $z(n)$ and $x(n)$ originate from a diffuse sound field. The distance between the microphones is relatively large, and the coherence between the noise components of the microphones is approximately zero.
[0248] However, in practice, the microphones are often placed in close proximity, and as a result two effects can become more significant. Namely, both microphones may begin to capture elements of the desired speech, and the coherence between the microphone signals at low frequencies can no longer be ignored.
[0249] In some embodiments, the noise suppressor may further comprise an audio beamformer configured to generate the first microphone signal and the second microphone signal from the signals of a microphone array. This example is shown in FIG.
[0250] The microphone array may have only two microphones in some embodiments, but typically has more. A beamformer, depicted as a BMF unit, may generate a plurality of different beams directed in different directions, each of which may generate one of the first and second microphone signals.
[0251] The beamformer may in particular be an adaptive beamformer in which one beam can be directed towards the speech source using a suitable adaptive algorithm. At the same time, other beams can be adapted to generate a notch (or in particular a null) in the direction of the speech source.
[0252] For example, U.S. Pat. Nos. 5,985,859 and 6,086,095 disclose examples of adaptive beamformers that focus on speech while also providing an (almost) speech-free reference signal. Such an approach may be used to generate the first microphone signal as the beamformer's primary output and the second microphone signal as the beamformer's secondary output.
[0253] This may address the problem of speech being present in more than one microphone of the system. Noise components are obtained in both beamformer signals and are also Gaussian distributed for diffuse noise. The coherence function between the noise components in $z(n)$ and $x(n)$ also depends on $\mathrm{sinc}(kd)$ as described above. That is, at higher frequencies the coherence is nearly zero, and the noise suppressor of FIG. 4 can be used effectively.
[0254] Because of the smaller distance between the microphones, $\mathrm{sinc}(kd)$ will not be zero at lower frequencies, and as a result the coherence between $z(n)$ and $x(n)$ will not be zero.
[0255] In some embodiments, the noise suppressor may further comprise an adaptive canceller for canceling from the first microphone signal a signal component of the first microphone signal that is correlated with the second microphone signal.
[0256] An example of a noise suppressor with both the suppressor of FIG. 4 and the beamformer and adaptive canceller of FIG. 10 is shown in FIG.
[0257] In this example, the adaptive canceller implements an additional adaptive noise cancellation algorithm that removes from $z(n)$ the noise that is correlated with the noise in $x(n)$. With such an approach, the coherence between $x(n)$ and the residual signal $r(n)$ is (by definition) zero.
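The description does not specify which adaptive noise cancellation algorithm the adaptive canceller uses. Purely as an illustration, the following Python sketch uses a time-domain normalized LMS (NLMS) filter, a common choice that is assumed here and is not taken from the source. The function names, the number of taps, and the step size `mu` are likewise hypothetical.

```python
def nlms_cancel(z, x, n_taps=8, mu=0.1, eps=1e-8):
    """Remove from z(n) the component that is linearly predictable from x(n).

    z: primary signal samples, x: reference signal samples.
    Returns the residual r(n) = z(n) - w^T [x(n), x(n-1), ...].
    """
    w = [0.0] * n_taps    # adaptive filter coefficients
    buf = [0.0] * n_taps  # most recent reference samples, newest first
    r = []
    for z_n, x_n in zip(z, x):
        buf = [x_n] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))  # estimate of correlated part
        e = z_n - y                                  # residual sample
        norm = sum(b * b for b in buf) + eps
        step = mu * e / norm                         # normalized step
        w = [wi + step * bi for wi, bi in zip(w, buf)]
        r.append(e)
    return r


def correlation(a, b):
    """Normalized cross-correlation of two equal-length sequences at lag 0."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    den = (sum((p - ma) ** 2 for p in a) * sum((q - mb) ** 2 for q in b)) ** 0.5
    return num / den
```

After convergence, the correlation between the residual and the reference is expected to be much smaller than the correlation between the original primary signal and the reference, which is the property [0257] relies on.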
[0258] It will be appreciated that the above description has, for clarity, described embodiments of the invention with reference to various functional circuits, units and processors. However, it will be apparent that any suitable distribution of functionality between different functional circuits, units or processors may be used without detracting from the invention. For example, functionality illustrated as being performed by separate processors or controllers may be performed by the same processor or controller. Thus, references to specific functional units or circuits should be viewed only as references to suitable means for providing the described functionality, rather than as indicating an exact logical or physical structure or organization.
[0259] The invention can be implemented in any suitable form, including hardware, software, firmware or any combination of these. The invention may optionally be implemented at least partially as computer software running on one or more data processors and/or digital signal processors. The elements and components of an embodiment of the invention may be implemented physically, functionally and logically in any suitable manner. In fact, functions may be implemented in a single unit, in multiple units, or as part of other functional units. Thus, the invention may be implemented in a single unit, or may be physically and functionally distributed between different units, circuits and processors.
[0260] Although the present invention has been described in the context of some embodiments, it is not intended to be limited to the specific form set forth herein. Rather, the scope of the present invention is limited only by the appended claims. Further, while certain features may appear to be described in the context of particular embodiments, one skilled in the art will appreciate that various features of the described embodiments may be combined in accordance with the present invention.
In the claims, the term "comprising" does not exclude the presence of other elements or steps.
[0261] Furthermore, although individually listed, a plurality of means, elements, circuits or method steps may be implemented by, e.g., a single circuit, unit or processor. Furthermore, even if individual features are included in different claims, they may possibly be advantageously combined, and their inclusion in different claims does not imply that a combination of features is not feasible and/or advantageous. Also, the inclusion of a feature in one category of claims does not imply a limitation to this category, but rather indicates that the feature is equally applicable to other claim categories as appropriate. Furthermore, the order of features in the claims does not imply any particular order in which the features must be operated. In particular, the order of the individual steps in a method claim does not imply that the steps have to be performed in that order. Rather, the steps may be performed in any suitable order. In addition, singular references do not exclude a plurality. References to "a", "first", "second", etc. do not exclude a plurality. Reference signs in the claims are provided merely as a clarifying example and shall not be construed as limiting the scope of the claims in any way.
