Patent Translate
Powered by EPO and Google
Notice: This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate, complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2010271411
[Object] To achieve both a reduction of musical noise and effective suppression of noise components. A first noise suppression unit 32 subtracts the spectrum Nw[j] of stationary noise from the spectrum X[j] of the acoustic signal V[j] of each channel to a degree according to a subtraction coefficient α. A coefficient setting unit 44 generates a filter coefficient W for emphasizing the target sound component from the non-stationary noise spectra Nd[1] to Nd[J]. A second noise suppression unit 42 generates a spectrum Z by performing directional array processing in which the filter coefficient W is applied to the spectra Y[1] to Y[J] processed by the first noise suppression unit 32. An index calculation unit 62 calculates a kurtosis change index KR indicating the degree to which the kurtosis of the frequency distribution of the signal intensity changes between before the processing by the first noise suppression unit 32 and after the processing by the second noise suppression unit 42. A coefficient adjustment unit 64 variably controls the subtraction coefficient α such that the kurtosis change index KR approaches a target value K0.
[Selected figure] Figure 1
Noise suppression device and program
[0001] The present invention relates to techniques for suppressing noise components in acoustic signals.
[0002] Techniques for suppressing a noise component in a mixed sound of a target sound component and a noise component have been proposed in the past.
For example, Patent Document 1 discloses a technique in which the spectrum of a noise component estimated by independent component analysis is subtracted from the spectrum of an acoustic signal in which the target sound component has been emphasized by a delay-and-sum beamformer.
[0003] JP 2007-248534 A
[0004] However, in techniques that suppress noise components in the frequency domain, as in Patent Document 1, components that remain scattered on the time axis and the frequency axis after the suppression are perceived by the listener as artificial musical noise. If the degree of subtraction of the noise component is reduced, the musical noise decreases, but the noise component can then no longer be sufficiently suppressed (the SN ratio after processing is low). In view of the above circumstances, the present invention aims to achieve both a reduction of musical noise and effective suppression of noise components.
[0005] In order to solve the above problem, a noise suppression device according to the present invention suppresses noise components in acoustic signals of a plurality of channels generated by a plurality of sound collection devices, and comprises: noise extraction means for extracting the noise component of the acoustic signal of each channel; stationary noise estimation means for estimating stationary noise included in the noise component; first noise suppression means for subtracting the spectrum of the stationary noise from the spectrum of the acoustic signal of each channel to a degree according to a subtraction coefficient; non-stationary noise estimation means for estimating the spectrum of non-stationary noise by subtracting the spectrum of the stationary noise from the spectrum of the noise component of each channel; coefficient setting means for generating, from the spectrum of the non-stationary noise, a filter coefficient for emphasizing the target sound component; second noise suppression means for performing processing in which the filter coefficient is applied to the acoustic signals of the plurality of channels processed by the first noise suppression means; index calculation means for calculating a kurtosis change index indicating the degree to which the kurtosis of the frequency distribution of the intensity of the acoustic signal changes between before the processing by the first noise suppression means and after the processing by the second noise suppression means; and coefficient adjustment means for variably controlling the subtraction coefficient in accordance with the kurtosis change index.
[0006] In the above configuration, the subtraction coefficient used by the first noise suppression means is variably controlled according to the kurtosis change index, which indicates the degree to which the kurtosis of the frequency distribution of the intensity of the acoustic signal changes between before the processing by the first noise suppression means and after the processing by the second noise suppression means. It is therefore possible to suppress the noise component effectively while suppressing the musical noise caused by the processing of the first noise suppression means.
[0007] In a preferred aspect of the present invention, the coefficient adjustment means sets the subtraction coefficient such that the kurtosis change index approaches a predetermined value. This aspect has the advantage that the noise component can be suppressed effectively while the musical noise resulting from the processing of the first noise suppression means is held to a desired degree determined by the predetermined value.
[0008] The noise suppression device according to each of the above aspects is realized by hardware (an electronic circuit) such as a DSP (Digital Signal Processor) dedicated to noise suppression, and is also realized by cooperation between a general-purpose arithmetic processing device such as a CPU (Central Processing Unit) and a program. A program according to the present invention causes a computer to execute: noise extraction processing of extracting the noise component of the acoustic signal of each channel generated by a plurality of sound collection devices; stationary noise estimation processing of estimating stationary noise included in the noise component; first noise suppression processing of subtracting the spectrum of the stationary noise from the spectrum of the acoustic signal of each channel to a degree according to a subtraction coefficient; non-stationary noise estimation processing of estimating the spectrum of non-stationary noise by subtracting the spectrum of the stationary noise from the spectrum of the noise component of each channel; coefficient setting processing of generating, from the spectrum of the non-stationary noise, a filter coefficient for emphasizing the target sound component; second noise suppression processing of applying the filter coefficient to the acoustic signals of the plurality of channels after execution of the first noise suppression processing; index calculation processing of calculating a kurtosis change index indicating the degree to which the kurtosis of the frequency distribution of the intensity of the acoustic signal changes between before execution of the first noise suppression processing and after execution of the second noise suppression processing; and coefficient adjustment processing of variably controlling the subtraction coefficient according to the kurtosis change index. The above program achieves the same operation and effects as the noise suppression device according to each aspect of the present invention.
The program according to the present invention is provided to the user stored in a computer-readable recording medium and installed in a computer, or is provided from a server device by distribution via a communication network and installed in a computer.
[0009] A block diagram of a noise suppression device according to an embodiment. A conceptual diagram explaining the change in kurtosis of the frequency distribution of the intensity of an acoustic signal. A conceptual diagram explaining the effect of directional array processing. A graph showing the relationship between the subtraction coefficient and the kurtosis change index. A graph showing the relationship between the subtraction coefficient and the noise suppression rate. A flowchart of the operation of the noise suppression device. A graph explaining the effect of the embodiment. A graph explaining the effect of the embodiment. A block diagram of a noise extraction unit according to a modification. A block diagram of a noise extraction unit according to a modification.
[0010] FIG. 1 is a block diagram of a noise suppression device 100 according to an embodiment of the present invention. J sound collection devices 12[1] to 12[J] (a microphone array; J is a natural number of 2 or more), arranged at predetermined intervals in a plane PL, are connected to the noise suppression device 100. The sound collection device 12[j] (j = 1 to J) generates a time-domain acoustic signal V[j] that represents the waveform of the sound arriving from the surroundings. The symbol j is the channel number of the acoustic signal V[j].
[0011] A mixed sound of the target sound component and the noise component arrives at the sound collection devices 12[1] to 12[J] from the surroundings.
The target sound component is the sound (voice or musical tone) that is the object of sound collection. The target sound component arrives at the sound collection devices 12[1] to 12[J] from a direction that forms a known angle ξ with the normal to the plane PL. For example, if the noise suppression device 100 is mounted on an electronic device that receives the user's voice (for example, a mobile phone), the voice arriving from the front of the main body of the electronic device (ξ = 0°) corresponds to the target sound component.
[0012] The noise component, on the other hand, is any component other than the target sound component, and may include stationary noise and non-stationary noise. Stationary noise is a component whose acoustic characteristics (for example, sound pressure) change little or not at all over time. For example, the operating noise of air conditioning equipment and the noise of a crowd correspond to stationary noise. Non-stationary noise is a component whose acoustic characteristics change from moment to moment (sporadic noise). For example, speech (uttered sound) and musical tones other than the target sound component correspond to non-stationary noise.
[0013] The noise suppression device 100 generates a time-domain acoustic signal VOUT by executing, on the acoustic signals V[1] to V[J], processing that suppresses the noise components (stationary noise and non-stationary noise). The acoustic signal VOUT generated by the noise suppression device 100 is supplied to a sound emission device 14 (for example, a speaker or headphones) and reproduced as sound. For convenience, the A/D converter that converts the acoustic signals V[1] to V[J] into digital signals and the D/A converter that converts the acoustic signal VOUT into an analog signal are not illustrated.
[0014] The noise suppression device 100 is realized by an arithmetic processing unit that executes a program stored in a storage device (not shown) and thereby implements a plurality of functions (a frequency analysis unit 22, a noise extraction unit 24, a stationary noise estimation unit 26, a first noise suppression unit 32, a non-stationary noise estimation unit 34, a filter processing unit 40, a waveform synthesis unit 52, and a suppression control unit 60). However, a configuration in which an electronic circuit (DSP) dedicated to noise suppression realizes the elements of FIG. 1, or a configuration in which the elements of FIG. 1 are distributed over a plurality of integrated circuits, may also be adopted.
[0015] The frequency analysis unit 22 generates, for each channel of the acoustic signals V[1] to V[J], the spectrum (power spectrum) X[j] (X[1] to X[J]) of each frame obtained by dividing the acoustic signal V[j] on the time axis. The spectrum X[j] is a series of intensities (powers), one for each of a predetermined number of frequencies set discretely on the frequency axis. Any known technique (for example, the short-time Fourier transform) may be adopted to generate the spectrum X[j].
[0016] The noise extraction unit 24 extracts, for each frame, the noise component included in the acoustic signal V[j] of each channel. Specifically, the noise extraction unit 24 generates the spectrum (power spectrum) N[j] (N[1] to N[J]) of the noise component for each frame. In a noise section, in which the acoustic signal V[j] contains no target sound component, the spectrum X[j] coincides with the spectrum N[j] of the noise component. The noise extraction unit 24 therefore divides the acoustic signal V[j] (the time series of the spectrum X[j]) into target sound sections and noise sections on the time axis, and identifies the spectrum X[j] of each frame in a noise section as the spectrum N[j] of the noise component.
Any known voice activity detection (VAD) technique may be adopted to distinguish between the target sound sections and the noise sections.
[0017] The stationary noise estimation unit 26 estimates the stationary noise included in the noise component of each channel extracted by the noise extraction unit 24. As described above, stationary noise is the temporally stationary part of the noise component. The stationary noise estimation unit 26 therefore generates the spectrum (power spectrum) Nw[j] (Nw[1] to Nw[J]) of the stationary noise by averaging (time averaging) the spectrum N[j] of the noise component generated by the noise extraction unit 24 over a plurality of frames in the noise section. Averaging the spectrum N[j] removes the non-stationary noise from the spectrum Nw[j]. The stationary noise spectrum Nw[j] is updated for each noise section in turn. That is, during a target sound section, the spectrum Nw[j] estimated in the immediately preceding noise section is maintained.
[0018] The first noise suppression unit 32 suppresses, in the frequency domain, the stationary noise included in the acoustic signal V[j] of each channel. As shown in FIG. 1, the first noise suppression unit 32 comprises J subtraction units SA[1] to SA[J], corresponding to the total number of channels of the acoustic signals V[1] to V[J]. The subtraction unit SA[j] corresponding to the j-th channel generates a spectrum (power spectrum) Y[j] (Y[1] to Y[J]) for each frame by subtracting, in the frequency domain, the spectrum Nw[j] of the stationary noise from the spectrum X[j] of the acoustic signal V[j] (spectral subtraction). Specifically, the subtraction unit SA[j] calculates the spectrum Y[j] by the following equations (1a) and (1b).
$$Y[j] = X[j] - \alpha \cdot N_w[j] \quad (X[j] > Th_1) \qquad (1a)$$
$$Y[j] = \beta \cdot X[j] \quad (X[j] \le Th_1) \qquad (1b)$$
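The averaging performed by the stationary noise estimation unit 26 and the per-frequency subtraction performed by the subtraction unit SA[j] can be sketched as follows. This is a minimal illustration in Python, not the implementation of the embodiment. It assumes that power spectra are held as plain lists, that the scaled noise spectrum α·Nw[j] is subtracted above a threshold Th1 = α·Nw[j], and that X[j] is scaled by a flooring coefficient β otherwise (the threshold and the flooring coefficient are described in paragraph [0019]). The function names and the numerical values are hypothetical.

```python
def estimate_stationary_noise(noise_frames):
    """Average the noise spectra N[j] over the frames of a noise section.

    noise_frames: list of power spectra, one list of floats per frame.
    Returns the time-averaged spectrum, used here as Nw[j].
    """
    n_frames = len(noise_frames)
    n_bins = len(noise_frames[0])
    return [sum(frame[q] for frame in noise_frames) / n_frames
            for q in range(n_bins)]


def spectral_subtract(x, nw, alpha, beta):
    """Subtract alpha * Nw from X at each frequency, with flooring.

    Assumption: Th1 = alpha * Nw. Above the threshold the scaled noise
    spectrum is subtracted; otherwise X is multiplied by beta.
    """
    y = []
    for xq, nq in zip(x, nw):
        th1 = alpha * nq
        y.append(xq - alpha * nq if xq > th1 else beta * xq)
    return y


# Two hypothetical noise frames with three frequency bins each.
noise_frames = [[1.0, 2.0, 0.5], [3.0, 2.0, 1.5]]
nw = estimate_stationary_noise(noise_frames)  # [2.0, 2.0, 1.0]

# Bins 0 and 2 exceed the threshold and are subtracted; bin 1 is floored.
y = spectral_subtract([10.0, 3.0, 4.0], nw, alpha=2.0, beta=0.1)
```

With Th1 tied to α·Nw[j], the subtracted branch can never produce a negative power value, which is one reason a threshold of this form is convenient.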
[0019] That is, for frequencies at which the spectrum X[j] of the acoustic signal V[j] exceeds the threshold Th1, the spectrum Y[j] is calculated by subtracting the product of the stationary noise spectrum Nw[j] and the subtraction coefficient α from the spectrum X[j], as shown in equation (1a). For frequencies at which the spectrum X[j] falls below the threshold Th1, on the other hand, the spectrum Y[j] is calculated by multiplying the spectrum X[j] by the flooring coefficient β, as shown in equation (1b). The threshold Th1 is set, for example, to the product of the subtraction coefficient α and the spectrum Nw[j]. As equations (1a) and (1b) show, the subtraction coefficient α is the numerical value that determines the degree to which the noise component (stationary noise) is suppressed. That is, the larger the subtraction coefficient α, the greater the effect of suppressing stationary noise (the noise suppression performance).
[0020] The non-stationary noise estimation unit 34 estimates, for each frame, the spectrum (power spectrum) Nd[j] (Nd[1] to Nd[J]) of the non-stationary noise included in the acoustic signal V[j] of each channel. As shown in FIG. 1, the non-stationary noise estimation unit 34 comprises J subtraction units SB[1] to SB[J], corresponding to the total number of channels of the acoustic signals V[1] to V[J].
[0021] The noise component is a mixture of stationary noise and non-stationary noise. The subtraction unit SB[j] corresponding to the j-th channel therefore generates the spectrum Nd[j] (Nd[1] to Nd[J]) of the non-stationary noise for each frame in the noise section by subtracting, in the frequency domain, the spectrum Nw[j] of the stationary noise from the spectrum N[j] of each frame in the noise section identified by the noise extraction unit 24 (spectral subtraction).
For each frame in a target sound section, the subtraction unit SB[j] continues to output the spectrum Nd[j] of the last frame of the immediately preceding noise section.
[0022] As described above, the non-stationary noise of each frame in a target sound section is not extracted directly from that target sound section. However, when the target sound component is, for example, the voice of a single speaker, noise sections and target sound sections alternate within a time that is sufficiently short relative to the speed at which the non-stationary noise fluctuates. Consequently, using the spectrum Nd[j] extracted from the frames of a noise section as the non-stationary noise spectrum Nd[j] in the target sound section does not excessively reduce the accuracy of the noise suppression.
[0023] The following equations (2a) and (2b) are applied to the calculation of the spectrum Nd[j] by the subtraction unit SB[j].
$$N_d[j] = N[j] - N_w[j] \quad (N[j] > Th_2) \qquad (2a)$$
$$N_d[j] = \varepsilon \quad (N[j] \le Th_2) \qquad (2b)$$
[0024] That is, for frequencies at which the spectrum N[j] of the noise component exceeds the threshold Th2 (for example, the product of a coefficient δ and the spectrum Nw[j]), the spectrum Nd[j] is calculated by subtracting the stationary noise spectrum Nw[j] from the spectrum N[j] of the noise component, as shown in equation (2a). For frequencies at which the spectrum N[j] falls below the threshold Th2, on the other hand, the non-stationary noise spectrum Nd[j] is set to a predetermined value ε, as shown in equation (2b). The predetermined value ε is set, for example, to the product of the spectrum N[j] of the noise component and a predetermined coefficient.
[0025] Since the target sound component, stationary noise and non-stationary noise are mixed in the acoustic signal V[j], the spectrum Y[j] obtained after the first noise suppression unit 32 has suppressed the stationary noise contains the target sound component and the non-stationary noise.
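The estimation of the non-stationary noise spectrum Nd[j] described in paragraphs [0021] to [0024] can be sketched as follows. This is an illustrative Python sketch under stated assumptions rather than the embodiment's implementation: the threshold is taken as Th2 = δ·Nw[j], the predetermined value as ε = (a small coefficient)·N[j], and the last estimate from a noise section is simply held during target sound sections. The names `estimate_nonstationary_noise`, `NonstationaryNoiseTracker` and `floor_coeff`, and all default values, are hypothetical.

```python
def estimate_nonstationary_noise(n, nw, delta=1.0, floor_coeff=0.01):
    """Per-frequency estimate of the non-stationary noise spectrum Nd.

    n:  noise spectrum N[j] of one frame in a noise section
    nw: stationary noise spectrum Nw[j]
    Assumptions: Th2 = delta * Nw, epsilon = floor_coeff * N.
    """
    nd = []
    for nq, nwq in zip(n, nw):
        th2 = delta * nwq
        eps = floor_coeff * nq  # nonzero whenever N is nonzero
        nd.append(nq - nwq if nq > th2 else eps)
    return nd


class NonstationaryNoiseTracker:
    """Holds the latest Nd from a noise section during target sound sections."""

    def __init__(self, initial_nd):
        self.nd = list(initial_nd)

    def update(self, n, nw, is_noise_frame):
        # Only frames in a noise section provide a noise spectrum N[j];
        # in a target sound section the previous estimate is kept.
        if is_noise_frame:
            self.nd = estimate_nonstationary_noise(n, nw)
        return self.nd


nd = estimate_nonstationary_noise([5.0, 1.0], [2.0, 2.0])  # [3.0, 0.01]
tracker = NonstationaryNoiseTracker(nd)
held = tracker.update(None, None, is_noise_frame=False)    # unchanged
fresh = tracker.update([4.0, 4.0], [2.0, 2.0], is_noise_frame=True)
```

Choosing δ of at least 1 in this sketch keeps the subtracted branch positive, so every element of Nd stays nonzero.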
The filter processing unit 40 sequentially generates, from the spectra Y[1] to Y[J] obtained after suppression of the stationary noise, the spectrum (power spectrum) Z of the acoustic signal VOUT in which the target sound component is emphasized (the non-stationary noise is suppressed). The waveform synthesis unit 52 generates the acoustic signal VOUT by converting the spectrum Z of each frame generated by the filter processing unit 40 into a time-domain signal by the inverse Fourier transform and joining the converted signals of successive frames on the time axis. The phase spectrum of any one of the acoustic signals V[1] to V[J] is used to generate the acoustic signal VOUT.
[0026] As shown in FIG. 1, the filter processing unit 40 includes a second noise suppression unit 42 and a coefficient setting unit 44. The second noise suppression unit 42 generates the spectrum Z for each frame by performing signal processing (filter processing) that emphasizes the target sound component on the spectra Y[1] to Y[J] processed by the first noise suppression unit 32. The signal processing performed by the second noise suppression unit 42 is directional array processing to which a filter coefficient W, set so as to emphasize the target sound component, is applied. Filter processing that forms a beam (a region of high sound collection sensitivity) directed toward the direction (angle ξ) from which the target sound component arrives, or filter processing that forms a beam whose null (blind spot) is set in the direction from which the noise component (non-stationary noise) arrives, is preferably adopted as the directional array processing. Specifically, the second noise suppression unit 42 performs delay-and-sum array processing, in which delays corresponding to the filter coefficient W are applied to the spectra Y[1] to Y[J] and the results are then summed.
[0027] The coefficient setting unit 44 generates the filter coefficient W applied to the processing of the second noise suppression unit 42.
Specifically, the coefficient setting unit 44 generates the filter coefficient W for emphasizing the target sound component by means of an adaptive beamformer that uses the non-stationary noise spectra Nd[1] to Nd[J] generated by the non-stationary noise estimation unit 34. For example, the MVDR (minimum variance distortionless response) method, which determines the filter coefficient W so as to minimize the intensity of the noise component (non-stationary noise) while maintaining the intensity of the target sound component arriving from the direction of the angle ξ, is suitably adopted as the adaptive beamformer.
[0028] Specifically, the coefficient setting unit 44 calculates the filter coefficient W(fq) for each frequency fq (q = 1, 2, ...) by the following equation (3). The filter coefficient W(fq) is generated sequentially, for example for each frame.
$$W(f_q) = \frac{R_{NN}^{-1}(f_q)\, d_\xi(f_q)}{d_\xi^{H}(f_q)\, R_{NN}^{-1}(f_q)\, d_\xi(f_q)} \qquad (3)$$
[0029] The symbol RNN(fq) in equation (3) is the covariance matrix of the intensities of the components at the frequency fq in the spectra Nd[1] to Nd[J]. That is, the covariance matrix RNN(fq) is defined by the following equation (4) using the vector vN(fq) whose elements are the intensities Nd[1](fq) to Nd[J](fq) at the frequency fq in the spectra Nd[1] to Nd[J] (vN(fq) = [Nd[1](fq), Nd[2](fq), ..., Nd[J](fq)]^T, where the symbol T denotes transposition).
$$R_{NN}(f_q) = E\left[ v_N(f_q)\, v_N(f_q)^{H} \right] \qquad (4)$$
The symbol H in equations (3) and (4) denotes the conjugate transpose (Hermitian transpose) of a matrix. The symbol E[ ] in equation (4) denotes the mean (expected value) or the sum over a predetermined number of frames including the current frame (for example, a predetermined number of frames extending from the current frame into the past). The predetermined value ε of equation (2b) is preferably set to a nonzero value so that the covariance matrix RNN(fq) used to calculate the filter coefficient W(fq) of equation (3) has an inverse.
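The MVDR calculation can be sketched for a single frequency fq as follows. This is a minimal Python sketch that assumes the standard MVDR form W = R⁻¹d / (dᴴR⁻¹d), a uniform linear array for the steering vector (the array geometry is not specified here), and complex per-channel noise values when forming the covariance matrix. The function names, the microphone spacing and the sample frames are hypothetical.

```python
import cmath
import math


def steering_vector(freq_hz, angle_rad, n_mics, spacing_m, c=343.0):
    """Steering vector d for a uniform linear array (assumed geometry)."""
    return [cmath.exp(-2j * math.pi * freq_hz * m * spacing_m
                      * math.sin(angle_rad) / c) for m in range(n_mics)]


def covariance(frames):
    """Average of v v^H over frames: covariance matrix R_NN at one frequency."""
    n = len(frames[0])
    r = [[0j] * n for _ in range(n)]
    for v in frames:
        for i in range(n):
            for k in range(n):
                r[i][k] += v[i] * v[k].conjugate() / len(frames)
    return r


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(a[i]) + [b[i]] for i in range(n)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    x = [0j] * n
    for row in range(n - 1, -1, -1):
        acc = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - acc) / m[row][row]
    return x


def mvdr_weights(r_nn, d):
    """W = R^-1 d / (d^H R^-1 d), computed without forming R^-1 explicitly."""
    u = solve(r_nn, d)                                        # R^-1 d
    denom = sum(dk.conjugate() * uk for dk, uk in zip(d, u))  # d^H R^-1 d
    return [uk / denom for uk in u]


# Hypothetical non-stationary noise vectors for J = 2 channels, 3 frames.
frames = [[1 + 0j, 0.5j], [0.2 + 0j, 1 + 0j], [1j, 1 + 0.3j]]
d = steering_vector(1000.0, 0.0, n_mics=2, spacing_m=0.05)
w = mvdr_weights(covariance(frames), d)

# Distortionless constraint: W^H d should be very close to 1.
gain = sum(wk.conjugate() * dk for wk, dk in zip(w, d))
```

Solving R·u = d instead of inverting R is a design choice of the sketch; it fails in the same situation the text warns about, namely when the covariance matrix is singular.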
[0030] The symbol dξ(fq) in equation (3) is the direction control vector (steering vector), a column vector whose elements represent the differences in the times at which a sound wave (plane wave) of frequency fq arriving from the direction of the angle ξ reaches each of the sound collection devices 12[1] to 12[J]. The coefficient setting unit 44 generates the direction control vector dξ(fq) of equation (3) according to the known angle ξ from which the target sound component arrives. When the angle ξ is unknown, the coefficient setting unit 44 generates the direction control vector dξ(fq) after estimating the angle ξ of the target sound component. Any known technique, such as the MUSIC method or the ESPRIT method, may be adopted to estimate the angle ξ. Also suitable is the beamformer method, in which beams are formed in a plurality of directions by directional array processing (delay-and-sum array processing) and the direction of the beam for which the volume of the acoustic signals V[1] to V[J] is maximum is identified as the angle ξ. By applying the filter coefficient W(fq) generated by the above procedure to the directional array processing of the second noise suppression unit 42, the spectrum Z in which the target sound component is emphasized is generated for each frame in turn.
[0031] Incidentally, the processing in which the first noise suppression unit 32 subtracts the stationary noise spectrum Nw[j] from the spectrum X[j] of the acoustic signal V[j] in the frequency domain (spectral subtraction) generates high-intensity components (isolated points) scattered on the time axis and the frequency axis, and these cause artificial, unpleasant musical noise. The generation of musical noise by spectral subtraction is described in more detail below.
[0032] Part (A) of FIG.
2 is a graph of the frequency distribution FA (a probability density function with intensity as the random variable) of the intensity of the spectrum X[j] over a predetermined number of frames before processing by the first noise suppression unit 32. As shown in part (A) of FIG. 2, before spectral subtraction the frequency (probability) of each intensity is distributed non-linearly, decreasing as the intensity increases from zero. Part (B) of FIG. 2, on the other hand, is a graph of the frequency distribution FB of the intensity (for example, the intensity of the spectrum Y[j] or the spectrum Z) over a predetermined number of frames after processing by the first noise suppression unit 32. Because the subtraction by the first noise suppression unit 32 increases the frequency (probability) of intensities close to zero, the frequency distribution FB after spectral subtraction is steeper, in the range where the intensity is close to zero, than the frequency distribution FA before spectral subtraction.
[0033] If kurtosis is introduced as a measure of the shape of the frequency distribution (the steepness of its slope), the kurtosis KB of the frequency distribution FB of the signal intensity after spectral subtraction is larger than the kurtosis KA of the frequency distribution FA of the signal intensity before spectral subtraction (KB > KA). Given that kurtosis is a measure of Gaussianity, this can be understood as follows: the first noise suppression unit 32 suppresses the stationary noise, whose intensity distribution is highly Gaussian, and the non-Gaussianity of the acoustic signal V[j] therefore increases.
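The increase in kurtosis caused by spectral subtraction can be reproduced numerically. The following Python sketch is an illustration under stated assumptions, not a measurement from the embodiment: kurtosis is taken as the moment ratio μ4/μ2² with moments about zero, the input intensities are synthetic exponentially distributed values, and the subtraction uses hypothetical settings α = 2, β = 0.05 and a flat noise level Nw = 1.

```python
import random


def moment(values, n):
    """n-th moment about zero of a list of intensities."""
    return sum(v ** n for v in values) / len(values)


def kurtosis(values):
    """Kurtosis taken here as mu4 / mu2**2 (moments about zero; an assumption)."""
    return moment(values, 4) / moment(values, 2) ** 2


random.seed(0)
# Synthetic intensities: the power of a complex Gaussian spectral
# component is exponentially distributed (mean power 1.0 here).
x = [random.expovariate(1.0) for _ in range(20000)]

alpha, beta, nw = 2.0, 0.05, 1.0  # hypothetical settings
# Spectral subtraction with flooring, threshold alpha * nw.
y = [v - alpha * nw if v > alpha * nw else beta * v for v in x]

ka = kurtosis(x)  # before spectral subtraction
kb = kurtosis(y)  # after spectral subtraction
kr = kb / ka      # ratio of the two; greater than 1 in this experiment
```

Most values of `y` are pushed close to zero while a few large values remain, which is the steepening near zero described for the distribution FB.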
Because musical noise is strongly non-Gaussian (it is noise in which intensities near zero occur with high frequency), musical noise tends to become more apparent the more the kurtosis increases from before to after spectral subtraction.
[0034] The degree to which the kurtosis of the frequency distribution of the signal intensity changes from before to after spectral subtraction (hereinafter the "kurtosis change index") KR therefore serves as a quantitative index of the degree to which spectral subtraction produces musical noise. In the following, the ratio of the kurtosis KB after spectral subtraction to the kurtosis KA before spectral subtraction (the kurtosis ratio) is used as an example of the kurtosis change index KR (KR = KB / KA). As this definition shows, the larger the kurtosis change index KR (the larger the change in kurtosis), the more noticeable the musical noise.
[0035] Parts (A) and (B) of FIG. 3 are graphs (distribution diagrams) illustrating the kurtosis change index KR at each frequency (vertical axis). The denser the shading, the larger the kurtosis change index KR (the more likely musical noise is to occur). The kurtosis change index KR in part (A) of FIG. 3 is the ratio (Ky / Kx) of the kurtosis Ky of the frequency distribution of the intensity of the spectrum Y[j] immediately after processing by the first noise suppression unit 32 (the average over the spectra Y[1] to Y[J]) to the kurtosis Kx of the frequency distribution of the intensity of the spectrum X[j] before processing by the first noise suppression unit 32 (the average over the spectra X[1] to X[J]). The kurtosis change index KR in part (B) of FIG. 3, on the other hand, is the ratio (Kz / Kx) of the kurtosis Kz of the frequency distribution of the intensity of the spectrum Z after the directional array processing by the second noise suppression unit 42 to the kurtosis Kx of the frequency distribution of the intensity of the spectrum X[j] before processing by the first noise suppression unit 32. That is, the directional array processing by the second noise suppression unit 42 changes the kurtosis change index KR from that of part (A) of FIG. 3 to that of part (B) of FIG. 3.
[0036] The kurtosis change index KR in FIG. 3 was measured with a noise component (white Gaussian noise) in which directional noise and diffuse noise were mixed. Directional noise is a noise component that arrives at the sound collection devices 12[1] to 12[J] directionally, from a single direction (a narrow range), and diffuse noise is a noise component that arrives at the sound collection devices 12[1] to 12[J] diffusely, from a plurality of directions. The horizontal axis in parts (A) and (B) of FIG. 3 is the ratio of the intensity of the directional noise to the intensity of the diffuse noise (hereinafter the "direction index") D. The larger the direction index D, the more the directional noise dominates (the stronger the directionality), and the smaller the direction index D, the more the diffuse noise dominates (the stronger the diffusivity).
[0037] The directional array processing (delay-and-sum array processing) of the filter processing unit 40 in FIG. 1 acts to reduce the non-Gaussianity of the signal (by the central limit theorem). Consequently, as shown in FIG. 3, when the diffusivity of the noise component is strong, the directional array processing that follows spectral subtraction sufficiently reduces the kurtosis change index KR. That is, when the diffusivity of the noise component is strong, the directional array processing sufficiently suppresses the musical noise. On the other hand, when the directionality of the noise component is strong, as shown in FIG.
3, the kurtosis change index KR tends to remain, even after the directional array processing, at a value as high as that immediately after spectral subtraction. That is, when the directionality of the noise component is strong, the directional array processing contributes little to the suppression of musical noise. As shown in FIG. 3, this tendency appears similarly over a wide range of frequencies.
[0038] FIG. 4 is a graph illustrating, for each direction index D, the relationship between the subtraction coefficient α of equation (1a) (horizontal axis) and the kurtosis change index KR (vertical axis). FIG. 5 is a graph illustrating, for each direction index D, the relationship between the subtraction coefficient α of equation (1a) (horizontal axis) and the noise suppression rate NRR (vertical axis). FIG. 4 and FIG. 5 each assume three cases: the noise component consists only of diffuse noise (D = −∞), diffuse noise and directional noise are mixed in equal proportion (D = 0), and directional noise dominates (D = 20).
[0039] As in part (B) of FIG. 3, the kurtosis change index KR of FIG. 4 is the ratio (Kz / Kx) of the kurtosis Kz after the directional array processing by the second noise suppression unit 42 (spectrum Z) to the kurtosis Kx before processing by the first noise suppression unit 32 (spectrum X[j]). However, the kurtosis change index KR in FIG. 4 is the average over the entire frequency range. The noise suppression rate NRR in FIG. 5 is the difference between the SN ratio ROUT of the acoustic signal VOUT after processing by the noise suppression device 100 and the SN ratio RIN of the acoustic signal V[j] before processing (NRR = ROUT − RIN). The higher the noise suppression rate NRR, therefore, the greater the effect (performance) of the noise suppression. As shown in FIGS.
4 and 5, as the subtraction coefficient α increases, musical noise becomes more likely to occur (the kurtosis change index KR increases in FIG. 4) and the noise suppression effect increases (the noise suppression rate NRR increases in FIG. 5).
[0040] As understood from FIG. 4, when the directionality of the noise component is strong (for example, D = 20), increasing the subtraction coefficient α raises the kurtosis change index KR far more than when the diffusivity of the noise component is strong (for example, D = −∞). On the other hand, as understood from FIG. 5, when the directionality of the noise component is strong, the noise suppression rate NRR is sufficiently high even when the subtraction coefficient α is small, compared with the case where the diffusivity of the noise component is strong. That is, under the configuration of FIG. 1, when the directionality of the noise component is strong, the noise suppression rate NRR is maintained at a high level even when the subtraction coefficient α is set to a small value so that musical noise is suppressed.
[0041] Further, as understood from FIG. 5, when the diffusivity of the noise component is strong (for example, D = −∞), the noise suppression rate NRR is low compared with the case where the directionality of the noise component is strong. On the other hand, when the diffusivity of the noise component is strong, the musical noise is effectively reduced by the directional array processing by the second noise suppression unit 42, as described with reference to FIG. 3. Consequently, as shown in FIG. 4, the kurtosis change index KR remains small (that is, musical noise is less likely to occur) even when the subtraction coefficient α is set to a large value. That is, under the configuration of FIG.
1, when the diffusivity of the noise component is strong, musical noise is effectively suppressed even when the subtraction coefficient α is set to a large value in order to keep the noise suppression rate NRR high.
[0042] In consideration of the above tendencies, the suppression control unit 60 of FIG. 1 variably controls the subtraction coefficient α according to the kurtosis change index KR. As shown in FIG. 1, the suppression control unit 60 includes an index calculating unit 62 and a coefficient adjustment unit 64. The index calculating unit 62 calculates the kurtosis change index KR for each frame. The calculation of the kurtosis change index KR is described in detail below.
[0043] The kurtosis is a higher-order statistic calculated from the n-th order moment μn by the following equation (5).
[0044] The frequency distribution (probability density function) of the M intensities x1 to xM is approximated by the function Ga (x; k, θ) of the following equation (6). The coefficient C of equation (6) is defined as follows using the gamma function Γ (k).
[0045] The following equation (7) is derived by replacing the distribution function (probability density function) P (x) in the definition of the second-order moment μ2 with the function Ga (x; k, θ) of the equation (6).
[0046] Similarly to the derivation of the second-order moment μ2, the following equation (8) is derived by replacing the distribution function P (x) in the definition of the fourth-order moment μ4 with the function Ga (x; k, θ) of the equation (6).
[0047] Substituting the second-order moment μ2 of the equation (7) and the fourth-order moment μ4 of the equation (8) into the equation (5) yields the following equation (9), which defines the kurtosis.
[0048] The index calculating unit 62 in FIG.
1 calculates the kurtosis Kx before spectral subtraction by executing the operation of equation (9) on the M intensities x1 to xM of the spectra X [1] to X [J] over a predetermined number of frames (the frame for which the kurtosis change index KR is calculated and a predetermined number of past frames). It likewise calculates the kurtosis Kz after directional array processing by executing the operation of equation (9) on the M intensities x1 to xM of the spectrum Z over a predetermined number of frames including the frame to be calculated. Then, the index calculating unit 62 calculates the relative ratio of the kurtosis Kz to the kurtosis Kx as the kurtosis change index KR (KR = Kz / Kx).
[0049] The coefficient adjustment unit 64 of FIG. 1 variably sets the subtraction coefficient α in accordance with the kurtosis change index KR calculated by the index calculating unit 62. Specifically, the coefficient adjustment unit 64 sets the subtraction coefficient α such that the kurtosis change index KR approaches the target value K0. As shown in FIG. 4, the kurtosis change index KR increases as the subtraction coefficient α is increased. The coefficient adjustment unit 64 therefore increases the subtraction coefficient α (increases the degree of noise suppression) until the kurtosis change index KR exceeds the target value K0. That is, the target value K0 corresponds to a numerical value (allowable value) indicating the degree to which musical noise due to spectral subtraction is tolerated. The target value K0 is variably set, for example, according to an instruction from the user (the degree to which the user can tolerate musical noise). However, the target value K0 may be set to a predetermined fixed value.
[0050] FIG. 6 is a flowchart of the operation of the noise suppression device 100 focusing on the adjustment of the subtraction coefficient α. The process of FIG.
6 is executed once every predetermined period (for example, every predetermined number of frames). When the process of FIG. 6 starts, the coefficient adjustment unit 64 initializes the subtraction coefficient α to a predetermined value (for example, zero) (S1). Next, for the m-th frame (the current frame), the first noise suppression unit 32 generates spectra Y [1] to Y [J] by spectral subtraction to which the subtraction coefficient α is applied (S2), and the second noise suppression unit 42 generates a spectrum Z by directional array processing of the spectra Y [1] to Y [J] (S3). The spectrum Z generated in step S3 is output to the waveform synthesis unit 52. The index calculating unit 62 calculates the kurtosis change index KR from the spectra X [1] to X [J] of the m-th frame and the spectrum Z (S4).
[0051] Next, the coefficient adjustment unit 64 determines whether the kurtosis change index KR calculated in step S4 exceeds the target value K0 (S5). If the kurtosis change index KR is below the target value K0 (S5: NO), the coefficient adjustment unit 64 calculates the sum of the current subtraction coefficient α and the predetermined value Δα as the updated subtraction coefficient α (S6). In step S2 following step S6, spectral subtraction to which the updated subtraction coefficient α is applied is performed on the next, (m + 1)-th, frame. That is, the first noise suppression unit 32 subtracts the spectrum Nw [j] of the stationary noise from each spectrum X [j] of the (m + 1)-th frame according to the updated subtraction coefficient α.
[0052] As described above, the update of the subtraction coefficient α (S6), the spectral subtraction to which the updated subtraction coefficient α is applied (S2), the directional array processing after spectral subtraction (S3), and the calculation of the kurtosis change index KR (S4) are repeated sequentially.
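The loop of steps S1 to S6 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the device: the function names (`kurtosis`, `spectral_subtract`, `adapt_alpha`) are hypothetical, the kurtosis is taken as the sample ratio μ4 / μ2² of moments about zero rather than the gamma-distribution form of equation (9), the intensity is taken to be the power |X|², the subtraction is a power-domain subtraction floored at zero (the exact form of equation (1a) is not reproduced here), and the kurtosis is computed over the frequency bins of a single frame instead of over several frames.

```python
import numpy as np

def kurtosis(intensity):
    """Sample kurtosis mu4 / mu2**2 with moments about zero (assumed form)."""
    x = np.asarray(intensity, dtype=float).ravel()
    mu2 = np.mean(x ** 2)
    mu4 = np.mean(x ** 4)
    return mu4 / mu2 ** 2

def spectral_subtract(X, Nw, alpha):
    """Assumed power-domain subtraction: |Y|^2 = max(|X|^2 - alpha * Nw, 0)."""
    power = np.maximum(np.abs(X) ** 2 - alpha * Nw, 0.0)
    return np.sqrt(power) * np.exp(1j * np.angle(X))  # keep the phase of X

def adapt_alpha(frames, Nw, W, K0, d_alpha=0.1, alpha0=0.0):
    """Raise alpha by d_alpha per frame until KR exceeds the target K0.

    frames: iterable of complex arrays X with shape (J, F)
    Nw:     stationary-noise power, broadcastable to (J, F)
    W:      filter coefficients, shape (J, 1) or (J, F)
    """
    alpha = alpha0                                   # S1: initialize
    history = []
    for X in frames:
        Y = spectral_subtract(X, Nw, alpha)          # S2: spectral subtraction
        Z = np.sum(np.conj(W) * Y, axis=0)           # S3: directional array processing
        Kx = kurtosis(np.abs(X) ** 2)
        Kz = kurtosis(np.abs(Z) ** 2)
        KR = Kz / Kx                                 # S4: kurtosis change index
        history.append((alpha, KR))
        if KR > K0:                                  # S5: YES, stop updating
            break
        alpha += d_alpha                             # S6: update for the next frame
    return alpha, history
```

With the moments taken about zero, a gamma distribution with shape parameter k has μ4 / μ2² = (k + 2)(k + 3) / (k (k + 1)), so a gamma fit could replace the sample moments in `kurtosis` if that form is preferred.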
Therefore, the subtraction coefficient α is increased by the predetermined value Δα frame by frame so that the kurtosis change index KR progressively approaches the target value K0. Then, when the kurtosis change index KR exceeds the target value K0 (S5: YES), the process of FIG. 6 ends. That is, the subtraction coefficient α as updated in the most recent step S6 is maintained until the process of FIG. 6 next starts.
[0053] FIG. 7 is a graph showing the relationship between the directivity index D (horizontal axis) and the kurtosis change index KR (vertical axis), and FIG. 8 is a graph showing the relationship between the directivity index D (horizontal axis) and the noise suppression rate NRR (vertical axis). FIGS. 7 and 8 show together the case where the subtraction coefficient α is controlled by the processing of FIG. 6 so that the kurtosis change index KR approaches the target value K0 (K0 = 1.4) (solid line), the case where the subtraction coefficient α is fixed to 1 (dotted line), and the case where the subtraction coefficient α is fixed to 2 (dotted line).
[0054] In the above embodiment, the coefficient adjustment unit 64 variably controls the subtraction coefficient α so that the musical noise caused by the spectral subtraction of the first noise suppression unit 32 is suppressed to the degree given by the target value K0 (that is, so that the kurtosis change index KR approaches the target value K0). When the noise component is rich in diffuse noise (the directivity index D is small), as described with reference to FIG. 4, the kurtosis change index KR does not readily increase even when the subtraction coefficient α is increased (musical noise does not easily occur), so the subtraction coefficient α is automatically adjusted to a large value. Therefore, as shown in FIG.
8, it is possible to achieve a noise suppression rate NRR as high as when the subtraction coefficient α is fixed to 2 while suppressing the musical noise to the degree given by the target value K0.
[0055] On the other hand, when the noise component is rich in directional noise (the directivity index D is large), as described with reference to FIG. 4, the kurtosis change index KR readily increases as the subtraction coefficient α increases (musical noise is likely to occur), so the subtraction coefficient α is automatically adjusted to a small value. However, when directional noise is dominant, as described with reference to FIG. 5, a high noise suppression rate NRR can be achieved even when the subtraction coefficient α is small. Therefore, musical noise can be effectively suppressed as shown in FIG. 7 while maintaining a noise suppression rate NRR equivalent to the case where the subtraction coefficient α is fixed to 1. That is, compared with the case where the subtraction coefficient α is fixed to a predetermined value, the present embodiment has the advantage of achieving both suppression of musical noise (improvement of sound quality) and improvement of the noise suppression rate NRR (improvement of the SN ratio), both in an environment rich in directional noise and in an environment rich in diffuse noise.
[0056] For example, assume that a mobile phone equipped with the noise suppression device 100 is used in a space such as station premises or an exhibition hall. The operating noise of the air conditioning equipment reaches the mobile phone as diffuse noise. In addition, sound emitted from a source located far from the mobile phone (for example, the voice of another user, footsteps, or announcements from a loudspeaker) is reflected by the walls and floor of the space and also reaches the mobile phone as diffuse noise.
On the other hand, the voices and footsteps of other users near the mobile phone arrive intermittently at the mobile phone as directional noise. That is, a space such as station premises or an exhibition hall is a typical environment in which directional noise and diffuse noise alternate within a short time. Even in such an environment, the noise suppression device 100 of FIG. 1 can effectively suppress noise components (stationary noise and non-stationary noise) while achieving both suppression of musical noise and a high noise suppression rate NRR, in periods in which directional noise is dominant as well as in periods in which diffuse noise is dominant.
[0057] <Modifications> Each of the embodiments illustrated above can be modified in various ways. Specific modifications are illustrated below. Two or more aspects arbitrarily selected from the following examples may be combined as appropriate.
[0058] (1) Modification 1 For the calculation of the filter coefficient W, any known adaptive beamformer other than the MVDR beamformer may be used. For example, an SNR maximizing beamformer, which determines the filter coefficient W such that the SN ratio of the acoustic signal VOUT after directional array processing is maximized, is preferably employed. Specifically, the coefficient setting unit 44 calculates, as the filter coefficient W (fq), the eigenvector with the largest eigenvalue in the eigenvalue problem expressed by the following equation (10). β · SNN (fq) K (fq) = SXX (fq) K (fq) (10)
[0059] The symbol SXX (fq) of equation (10) denotes the covariance matrix of the intensity of the component of frequency fq among the target sound components, and the symbol SNN (fq) of equation (10) denotes the covariance matrix of the intensity of the component of frequency fq among the noise components.
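Equation (10) is a generalized eigenvalue problem, and the choice of the filter coefficient for one frequency fq can be sketched as below. This is an illustrative sketch only: the function name `snr_max_weights` is hypothetical, SNN (fq) is assumed to be invertible (in practice a small diagonal loading may be needed), and the eigenvector is normalized to unit length, which the text does not specify.

```python
import numpy as np

def snr_max_weights(Sxx, Snn):
    """Principal eigenvector K of  beta * Snn @ K = Sxx @ K  at one frequency.

    Sxx, Snn: (J, J) covariance matrices of the target sound and of the noise.
    Returns (w, beta): unit-norm eigenvector and its (largest) eigenvalue.
    """
    # Rewrite the generalized problem as the ordinary one (Snn^-1 Sxx) K = beta K.
    A = np.linalg.solve(Snn, Sxx)
    eigvals, eigvecs = np.linalg.eig(A)
    idx = np.argmax(eigvals.real)          # largest eigenvalue = largest output SNR
    w = eigvecs[:, idx]
    return w / np.linalg.norm(w), eigvals[idx].real
```

For spatially white noise (Snn equal to the identity) and a single target with steering vector d, the returned vector is parallel to d, that is, the sketch reduces to a matched (delay-and-sum-like) beamformer.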
The covariance matrix SXX (fq) of the target sound component is calculated, for example, from the intensity at the frequency fq in each of the spectra X [1] to X [J] in the target sound section detected by the noise extraction unit 24, in the same manner as the covariance matrix of equation (4). Further, for example, the covariance matrix RNN (fq) calculated by the equation (4) from the spectra Nd [1] to Nd [J] of non-stationary noise is applied as the covariance matrix SNN (fq) of the equation (10). The SNR maximizing beamformer has the advantage that the direction (angle ξ) of the target sound component need not be specified.
[0060] (2) Modification 2 In the above embodiment, as described with reference to FIG. 6, the subtraction coefficient α is updated sequentially for each frame (that is, the subtraction coefficient α approaches its optimum value gradually over a plurality of frames). However, a configuration may also be adopted in which the subtraction coefficient α is set to an optimal value for each frame by repeating the process from step S2 to step S6 of FIG. 6 a plurality of times for one frame. Nevertheless, the method of updating the subtraction coefficient α stepwise for each frame as shown in FIG. 6 has the advantage that the processing amount of the noise suppression device 100 is significantly smaller than with the method of optimizing the subtraction coefficient α individually for each frame.
[0061] Further, in the above embodiment, the subtraction coefficient α is controlled so that the kurtosis change index KR approaches the target value K0 while the spectral subtraction by the first noise suppression unit 32 and the filter processing (directional array processing) by the second noise suppression unit 42 are actually performed.
However, it is also possible to calculate the subtraction coefficient α analytically so that the kurtosis change index KR approaches the target value K0 (that is, to calculate the subtraction coefficient α without actually operating the first noise suppression unit 32 and the second noise suppression unit 42). Specifically, a formula (iteration formula) is defined that expresses the relationship between the intensity (second-order statistic) of the noise component remaining in the spectrum Z after the spectral subtraction to which the subtraction coefficient α is applied and the filter processing to which the filter coefficient W is applied, and the kurtosis change index KR (fourth-order statistic) after the spectral subtraction and the filter processing. The subtraction coefficient α that optimizes the intensity of the noise component of the spectrum Z is then calculated under the condition that the kurtosis change index KR is maintained at the target value K0 (optimization of a second-order statistic under a fourth-order statistic constraint). This configuration also achieves the same effects as the embodiment described above.
[0062] (3) Modification 3 In the above embodiment, the spectrum Nd [j] of non-stationary noise estimated from the noise section is used as the spectrum Nd [j] of non-stationary noise in the target sound section. A configuration in which the spectrum Nd [j] of non-stationary noise is specified directly from each frame in the target sound section may also be adopted. For example, a configuration in which the noise extraction unit 24 of FIG. 1 is replaced with the noise extraction unit 24B of FIG. 9 or the noise extraction unit 24C of FIG. 10 is employed.
[0063] The noise extraction unit 24B in FIG. 9 functions as a blind-spot control (null-steering) beamformer that forms a blind spot (a region with low sensitivity) of sound collection in the direction (angle ξ) from which the target sound component arrives.
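For the simplest case, a target arriving from ξ = 0, such a blind spot can be sketched by subtracting adjacent channels. The sketch below is an illustration under assumptions, not the circuit of FIG. 9: the function name `null_toward_broadside` is hypothetical, and the target is assumed to reach all sound collection devices simultaneously and with equal gain, so that its component is identical in every channel and cancels in each difference.

```python
import numpy as np

def null_toward_broadside(X):
    """Noise spectra N[j] = X[j] - X[j+1] for adjacent channel pairs.

    X: complex array of shape (J, F), one spectrum per channel.
    Returns an array of shape (J - 1, F).
    """
    X = np.asarray(X)
    return X[:-1] - X[1:]
```

A component common to all channels is removed exactly, while components that differ between channels (noise from other directions) remain in the output.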
For example, when the angle ξ of the target sound component is zero, as shown in FIG. 9, the noise extraction unit 24B includes (J-1) subtractors 72 [1] to 72 [J-1], one for each combination of two adjacent sound collection devices 12 among the J sound collection devices 12 [1] to 12 [J] (J channels). The subtractor 72 [j] subtracts the acoustic signal V [j + 1] (spectrum X [j + 1]) from the acoustic signal V [j] (spectrum X [j]) to suppress the target sound component arriving from the angle ξ. Therefore, the spectra N [1] to N [J-1] of the noise component are output from the noise extraction unit 24B.
[0064] The noise extraction unit 24C of FIG. 10 includes (J-1) separating units 74 [1] to 74 [J-1], one for each combination of two adjacent sound collection devices 12 among the J sound collection devices 12 [1] to 12 [J]. The separating unit 74 [j] generates a spectrum N [j] of the noise component by independent component analysis (ICA) using the acoustic signal V [j] (spectrum X [j]) and the acoustic signal V [j + 1] (spectrum X [j + 1]). Specifically, the separating unit 74 [j] extracts the noise component by filter processing (source separation) that applies, to the acoustic signal V [j] and the acoustic signal V [j + 1], a separation matrix set so that the target sound component and the noise component are statistically independent. Therefore, the spectra N [1] to N [J-1] of the noise component are output from the noise extraction unit 24C.
[0065] In either of the configurations shown in FIGS. 9 and 10, the stationary noise estimation unit 26 generates (J-1) spectra Nw [1] to Nw [J-1] by time-averaging each of the spectra N [1] to N [J-1]. Therefore, the first noise suppression unit 32 generates (J-1) spectra Y [1] to Y [J-1] by subtracting the spectrum Nw [j] from (J-1) acoustic signals V [j] (for example, the acoustic signals V [1] to V [J-1]) among the acoustic signals V [1] to V [J] of the J channels. On the other hand, the non-stationary noise estimation unit 34 subtracts the spectrum Nw [j] of stationary noise from each of the spectra N [1] to N [J-1] to generate (J-1) spectra Nd [1] to Nd [J-1]. Therefore, the filter coefficient W generated by the coefficient setting unit 44 by the calculation of equation (3) is a matrix of (J-1) rows and 1 column. The second noise suppression unit 42 performs filter processing in which the filter coefficient W is applied to the (J-1) spectra Y [1] to Y [J-1] generated by the first noise suppression unit 32.
[0066] According to the configurations of FIGS. 9 and 10, the spectra Nd [1] to Nd [J-1] of non-stationary noise are extracted directly from each frame in the target sound section. Therefore, compared with the configuration of FIG. 1, in which the spectrum Nd [j] estimated from the noise section is reused for the target sound section, a filter coefficient W capable of suppressing non-stationary noise with high accuracy can be set.
[0067] (4) Modification 4 The definition of the kurtosis change index KR is not limited to the above example (the relative ratio between the kurtosis Kx and the kurtosis Kz). For example, a configuration in which the difference between the kurtosis Kz and the kurtosis Kx is calculated as the kurtosis change index KR (KR = Kz - Kx), or a configuration in which the value of a predetermined function of the kurtosis Kx and the kurtosis Kz is calculated as the kurtosis change index KR (for example, the logarithm of the relative ratio or of the difference between the kurtosis Kx and the kurtosis Kz), is also suitable. In the above embodiment, the kurtosis Kx is calculated from the acoustic signals V [1] to V [J], but a configuration in which the kurtosis Kx is calculated from only one acoustic signal V [j] selected from among the J channels may also be adopted.
[0068] In the above embodiment, the kurtosis change index KR increases as the kurtosis Kz increases relative to the kurtosis Kx, but a configuration in which the kurtosis change index KR is defined so as to decrease as the kurtosis Kz increases relative to the kurtosis Kx may also be adopted. As understood from the above examples, the kurtosis change index KR is any index of the degree to which the kurtosis in the frequency distribution of the signal intensity has changed between before the processing by the first noise suppression unit 32 and after the processing by the second noise suppression unit 42, and the specific calculation method (definition) is optional.
[0069] (5) Modification 5 In the above-described embodiment, the processing from the frequency analysis unit 22 to the waveform synthesis unit 52 is performed in the frequency domain, but processing other than the spectral subtraction by the first noise suppression unit 32 may be changed to time-domain signal processing as appropriate. For example, a configuration may be adopted in which the index calculating unit 62 calculates the kurtosis Kx from the intensities of the acoustic signal V [j] in the time domain, or calculates the kurtosis Kz from the intensities of the acoustic signal VOUT in the time domain. The processing of the noise extraction unit 24 and the stationary noise estimation unit 26 may also be performed in the time domain.
[0070] (6) Modification 6 In the above embodiments, the spectrum Nw [j] of stationary noise is generated for each channel of the acoustic signal V [j], but a configuration that generates a spectrum Nw common to a plurality of channels (for example, from the spectra Nw [1] to Nw [J]) can also be adopted.
The first noise suppression unit 32 subtracts the common spectrum Nw of stationary noise from each of the spectra X [1] to X [J] to generate spectra Y [1] to Y [J], and the non-stationary noise estimation unit 34 subtracts the common spectrum Nw from each of the spectra N [1] to N [J] of the noise component to generate spectra Nd [1] to Nd [J] of non-stationary noise.
[0071] 100: noise suppression device, 12: sound collection device, 14: sound emission device, 22: frequency analysis unit, 24: noise extraction unit, 26: stationary noise estimation unit, 32: first noise suppression unit, 34: non-stationary noise estimation unit, 40: filter processing unit, 42: second noise suppression unit, 44: coefficient setting unit, 52: waveform synthesis unit, 60: suppression control unit, 62: index calculating unit, 64: coefficient adjustment unit.
