JP2010271411

Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2010271411
[Object] To achieve both a reduction of musical noise and effective suppression of noise
components. A first noise suppression unit 32 subtracts a spectrum Nw [j] of stationary noise
from a spectrum X [j] of an acoustic signal V [j] of each channel to a degree determined by a
subtraction coefficient α. A coefficient setting unit 44 generates a filter coefficient W for
emphasizing the target sound component from the non-stationary noise spectra Nd [1] to Nd [J].
A second noise suppression unit 42 generates a spectrum Z by performing directional array
processing in which the filter coefficient W is applied to the spectra Y [1] to Y [J] processed by
the first noise suppression unit 32. An index calculation unit 62 calculates a kurtosis change
index KR indicating the degree to which the kurtosis in the frequency distribution of the signal
intensity changes between before the processing of the first noise suppression unit 32 and after
the processing of the second noise suppression unit 42. A coefficient adjustment unit 64 variably
controls the subtraction coefficient α such that the kurtosis change index KR approaches a
target value K0. [Selected figure] Figure 1
Noise suppressor and program
[0001]
The present invention relates to techniques for suppressing noise components from acoustic
signals.
[0002]
Conventionally, a technique for suppressing a noise component from a mixed sound of a target
08-05-2019
sound component and a noise component has been proposed.
For example, Patent Document 1 discloses a technique in which the spectrum of a noise
component estimated by independent component analysis is subtracted from the spectrum of an
acoustic signal in which a target sound component has been emphasized by a delay-and-sum
beamformer.
[0003]
Patent Document 1: JP 2007-248534 A
[0004]
However, in techniques that suppress noise components in the frequency domain, as in Patent
Document 1, the components that remain scattered along the time axis and the frequency axis
after suppression are perceived by the listener as artificial musical noise.
Reducing the degree of subtraction of the noise component reduces the musical noise, but the
noise component is then not sufficiently suppressed (the SN ratio after processing is low). In
view of the above circumstances, the present invention aims to achieve both reduction of musical
noise and effective suppression of noise components.
[0005]
In order to solve the above problems, a noise suppression device according to the present
invention is a device that suppresses noise components from acoustic signals of a plurality of
channels generated by a plurality of sound collection devices, and comprises: noise extraction
means for extracting a noise component of the acoustic signal of each channel; stationary noise
estimation means for estimating stationary noise included in the noise component; first noise
suppression means for subtracting the spectrum of the stationary noise from the spectrum of the
acoustic signal of each channel to a degree according to a subtraction coefficient;
non-stationary noise estimation means for estimating the spectrum of non-stationary noise by
subtracting the spectrum of the stationary noise from the spectrum of the noise component of
each channel; coefficient setting means for generating, from the spectrum of the non-stationary
noise, a filter coefficient for emphasizing the target sound component; second noise suppression
means for performing processing in which the filter coefficient is applied to the acoustic
signals of the plurality of channels processed by the first noise suppression means; index
calculation means for calculating a kurtosis change index indicating the degree to which the
kurtosis in the frequency distribution of the intensity of the acoustic signal changes between
before the processing by the first noise suppression means and after the processing by the
second noise suppression means; and coefficient adjustment means for variably controlling the
subtraction coefficient in accordance with the kurtosis change index.
[0006]
In the above configuration, the subtraction coefficient used in the processing by the first
noise suppression means is variably controlled according to the kurtosis change index, which
indicates the degree to which the kurtosis in the frequency distribution of the intensity of the
acoustic signal changes between before the processing by the first noise suppression means and
after the processing by the second noise suppression means. It is therefore possible to suppress
the noise component effectively while suppressing the musical noise caused by the processing by
the first noise suppression means.
[0007]
In a preferred aspect of the present invention, the coefficient adjustment means sets the
subtraction coefficient such that the kurtosis change index approaches a predetermined value.
This aspect has the advantage that the noise component can be effectively suppressed while the
musical noise resulting from the processing by the first noise suppression means is held to a
desired degree determined by the predetermined value.
[0008]
The noise suppression device according to each of the above aspects is realized by hardware (an
electronic circuit) dedicated to noise suppression, such as a DSP (Digital Signal Processor), and
is also realized by cooperation between a general-purpose arithmetic processing device, such as a
CPU (Central Processing Unit), and a program.
A program according to the present invention causes a computer to execute: noise extraction
processing for extracting noise components of the acoustic signals of respective channels
generated by a plurality of sound collection devices; stationary noise estimation processing for
estimating stationary noise included in the noise components; first noise suppression processing
of subtracting the spectrum of the stationary noise from the spectrum of the acoustic signal of
each channel to a degree according to a subtraction coefficient; non-stationary noise estimation
processing of estimating the spectrum of non-stationary noise by subtracting the spectrum of the
stationary noise from the spectrum of the noise component of each channel; coefficient setting
processing for generating, from the non-stationary noise spectrum, filter coefficients for
emphasizing the target sound component; second noise suppression processing of applying the
filter coefficients to the acoustic signals of the plurality of channels after execution of the
first noise suppression processing; index calculation processing of calculating a kurtosis change
index indicating the degree to which the kurtosis in the frequency distribution of the intensity
of the acoustic signal changes between before execution of the first noise suppression
processing and after execution of the second noise suppression processing; and coefficient
adjustment processing for variably controlling the subtraction coefficient according to the
kurtosis change index. The above program achieves the same operation and effect as the noise
suppression device according to each aspect of the present invention. The program according to
the present invention is provided to the user stored in a computer-readable recording medium and
installed in a computer, or is distributed from a server device via a communication network and
installed in a computer.
[0009]
The drawings show, in order: a block diagram of a noise suppression device according to an
embodiment; a conceptual diagram for explaining the change in kurtosis in the frequency
distribution of the intensity of an acoustic signal; a conceptual diagram for explaining the
effect of directional array processing; a graph showing the relationship between the subtraction
coefficient and the kurtosis change index; a graph showing the relationship between the
subtraction coefficient and the noise suppression rate; a flowchart of the operation of the
noise suppression device; two graphs for explaining the effect of the embodiment; and two block
diagrams of a noise extraction unit according to modifications.
[0010]
FIG. 1 is a block diagram of a noise suppression device 100 according to an embodiment of the
present invention. J (J is a natural number of 2 or more) sound collection devices 12 [1] to 12
[J] (a microphone array), arranged in a plane PL at predetermined intervals from one another,
are connected to the noise suppression device 100. The sound collection device 12 [j] (j = 1 to
J) generates an acoustic signal V [j] in the time domain that represents the waveform of the
sound arriving from the surroundings. The symbol j is the channel number of the acoustic signal
V [j].
[0011]
A mixture of the target sound component and the noise component arrives at the sound collection
devices 12 [1] to 12 [J] from the surroundings. The target sound component is the sound (voice or
musical tone) that is the object of sound collection. The target sound component arrives at the
sound collection devices 12 [1] to 12 [J] from a direction forming a known angle ξ with respect
to the normal to the plane PL. For example, assuming that the noise suppression device 100 is
mounted on an electronic device that receives the user's voice (for example, a mobile phone),
the voice arriving from the front direction (ξ = 0°) with respect to the main body of the
electronic device corresponds to the target sound component.
[0012]
The noise component, on the other hand, is any component other than the target sound component
and may include stationary noise and non-stationary noise. Stationary noise is a component whose
acoustic characteristics (for example, sound pressure) change little or not at all over time.
For example, the operating noise of air conditioning equipment and the hubbub of a crowd
correspond to stationary noise. Non-stationary noise, in contrast, is a component (transient
noise) whose acoustic characteristics change from moment to moment. For example, speech
(utterances) and musical tones other than the target sound component correspond to
non-stationary noise.
[0013]
The noise suppression device 100 generates an acoustic signal VOUT in the time domain by
executing processing that suppresses the noise components (stationary noise and non-stationary
noise) in the acoustic signals V [1] to V [J]. The acoustic signal VOUT generated by the noise
suppression device 100 is supplied to a sound emission device 14 (for example, a speaker or
headphones) and reproduced as sound. An A/D converter that converts the acoustic signals V [1]
to V [J] into digital signals and a D/A converter that converts the acoustic signal VOUT into an
analog signal are omitted from the illustration for convenience.
[0014]
The noise suppression device 100 is realized by an arithmetic processing unit that executes a
program stored in a storage device (not shown) and thereby implements a plurality of functions
(a frequency analysis unit 22, a noise extraction unit 24, a stationary noise estimation unit 26,
a first noise suppression unit 32, a non-stationary noise estimation unit 34, a filter
processing unit 40, a waveform synthesis unit 52, and a suppression control unit 60). However, a
configuration in which an electronic circuit (DSP) dedicated to noise suppression realizes the
respective elements in FIG. 1, or a configuration in which the respective elements in FIG. 1 are
distributed over a plurality of integrated circuits, may also be adopted.
[0015]
The frequency analysis unit 22 generates, for each channel of the acoustic signals V [1] to V
[J], the spectrum (power spectrum) X [j] (X [1] to X [J]) of each frame obtained by dividing the
acoustic signal V [j] on the time axis. The spectrum X [j] is a series of intensities (powers)
at each of a predetermined number of frequencies set discretely on the frequency axis. Any known
technique (for example, the short-time Fourier transform) may be adopted to generate the
spectrum X [j].
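
The framing and power-spectrum step for one channel can be sketched as follows. This is only an illustration under assumptions: the frame length, hop size, and Hann window are not specified in the text, and the function name `power_spectra` is hypothetical.

```python
import numpy as np

def power_spectra(v, frame_len=512, hop=256):
    """Return the power spectrum of each frame of a 1-D time-domain signal.

    frame_len and hop are illustrative values, not taken from the source.
    Output shape: (num_frames, frame_len // 2 + 1).
    """
    v = np.asarray(v, dtype=float)
    if len(v) < frame_len:
        raise ValueError("signal is shorter than one frame")
    window = np.hanning(frame_len)  # taper each frame to limit spectral leakage
    num_frames = 1 + (len(v) - frame_len) // hop
    frames = np.stack([v[i * hop:i * hop + frame_len] * window
                       for i in range(num_frames)])
    # Squared magnitude of the one-sided FFT gives the intensity (power) per bin.
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2

# One second of a 1 kHz tone sampled at 16 kHz, standing in for one channel V [j].
fs = 16000
t = np.arange(fs) / fs
X = power_spectra(np.sin(2 * np.pi * 1000 * t))
peak_bin = int(np.argmax(X[0]))  # index of the strongest bin in the first frame
```

With these values each bin spans fs / frame_len = 31.25 Hz, so the tone falls in bin 32.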
[0016]
The noise extraction unit 24 extracts, for each frame, the noise component included in the
acoustic signal V [j] of each channel. Specifically, the noise extraction unit 24 generates the
spectrum (power spectrum) N [j] (N [1] to N [J]) of the noise component for each frame. In a
noise section, in which the target sound component is absent from the acoustic signal V [j], the
spectrum X [j] matches the spectrum N [j] of the noise component. The noise extraction unit 24
therefore divides the acoustic signal V [j] (the time series of the spectrum X [j]) into target
sound sections and noise sections on the time axis, and identifies the spectrum X [j] of each
frame in a noise section as the spectrum N [j] of the noise component. Any well-known voice
activity detection (VAD) technique may be adopted to distinguish the target sound sections from
the noise sections.
[0017]
The stationary noise estimation unit 26 estimates the stationary noise included in the noise
component of each channel extracted by the noise extraction unit 24. As described above,
stationary noise is the temporally stationary part of the noise component. The stationary noise
estimation unit 26 therefore generates the spectrum (power spectrum) Nw [j] (Nw [1] to Nw [J])
of the stationary noise by averaging (time-averaging) the spectrum N [j] of the noise component
generated by the noise extraction unit 24 over a plurality of frames in the noise section.
Averaging the spectrum N [j] removes non-stationary noise from the spectrum Nw [j]. The
stationary noise spectrum Nw [j] is updated sequentially for each noise section. That is, the
spectrum Nw [j] estimated in the immediately preceding noise section is maintained during the
target sound section.
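
The averaging-and-hold behaviour described above can be sketched for one channel as follows. It is a sketch under assumptions: the text does not say how many frames are averaged or how the average is restarted, so a running mean that restarts at each new noise section is used here, and the per-frame noise/target flags are taken as given by some VAD. The function name `stationary_noise` is hypothetical.

```python
import numpy as np

def stationary_noise(X, is_noise):
    """Estimate Nw for every frame of one channel.

    X: (num_frames, num_bins) power spectra.
    is_noise: per-frame flags, True where the frame belongs to a noise section.
    Within a noise section Nw is the running mean of the frames seen so far in
    that section (an assumed averaging rule); in a target sound section the
    last estimate is held unchanged.
    """
    X = np.asarray(X, dtype=float)
    Nw = np.zeros_like(X)
    current = np.zeros(X.shape[1])  # held estimate (zero until noise is seen)
    acc = np.zeros(X.shape[1])
    count = 0
    for t, noise in enumerate(is_noise):
        if noise:
            acc += X[t]
            count += 1
            current = acc / count
        else:
            acc[:] = 0.0            # the next noise section starts a fresh mean
            count = 0
        Nw[t] = current
    return Nw

X_demo = np.array([[2.0, 4.0], [4.0, 8.0], [100.0, 100.0], [6.0, 6.0]])
Nw_demo = stationary_noise(X_demo, [True, True, False, True])
```

In the demo, the third frame is a target sound frame, so the estimate from the first two frames is carried through it.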
[0018]
The first noise suppression unit 32 suppresses, in the frequency domain, the stationary noise
included in the acoustic signal V [j] of each channel. As shown in FIG. 1, the first noise
suppression unit 32 includes J subtraction units SA [1] to SA [J], one for each channel of the
acoustic signals V [1] to V [J]. The subtraction unit SA [j] corresponding to the j-th channel
generates a spectrum (power spectrum) Y [j] (Y [1] to Y [J]) for each frame by subtracting the
spectrum Nw [j] of the stationary noise from the spectrum X [j] of the acoustic signal V [j] in
the frequency domain (spectral subtraction). Specifically, the subtraction unit SA [j]
calculates the spectrum Y [j] by the following equations (1a) and (1b).
[0019]
That is, for frequencies at which the spectrum X [j] of the acoustic signal V [j] exceeds the
threshold Th1, the spectrum Y [j] is calculated, as shown in equation (1a), by subtracting the
product of the stationary noise spectrum Nw [j] and the subtraction coefficient α from the
spectrum X [j]. On the other hand, for frequencies at which the spectrum X [j] of the acoustic
signal V [j] falls below the threshold Th1, the spectrum Y [j] is calculated, as shown in
equation (1b), by multiplying the spectrum X [j] by the flooring coefficient β. The threshold
Th1 is set to, for example, the product of the subtraction coefficient α and the spectrum Nw
[j]. As understood from equations (1a) and (1b), the subtraction coefficient α determines the
degree to which the noise component (stationary noise) is suppressed. That is, the effect of
suppressing stationary noise (the noise suppression performance) increases as the subtraction
coefficient α increases.
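
Equations (1a) and (1b) are not reproduced in this text, so the following is only a sketch reconstructed from the verbal description. It assumes Th1 = α·Nw [j] (the example given above) and a floored branch of β·X [j]; the function name and the default values of α and β are hypothetical.

```python
import numpy as np

def spectral_subtraction(X, Nw, alpha=2.0, beta=0.05):
    """Per-bin spectral subtraction with flooring, for one channel and frame.

    Assumed form (reconstructed from the prose, not copied from the source):
      Y = X - alpha * Nw   where X > Th1 = alpha * Nw   (cf. equation (1a))
      Y = beta * X         otherwise                    (cf. equation (1b))
    """
    X = np.asarray(X, dtype=float)
    Nw = np.asarray(Nw, dtype=float)
    th1 = alpha * Nw
    return np.where(X > th1, X - alpha * Nw, beta * X)

# First bin is above the threshold and is subtracted; second bin is floored.
Y_demo = spectral_subtraction([10.0, 1.0], [2.0, 2.0], alpha=2.0, beta=0.1)
```

Because the threshold equals the amount subtracted, the output stays positive in both branches as long as β and X [j] are positive.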
[0020]
The non-stationary noise estimation unit 34 estimates, for each frame, the spectrum (power
spectrum) Nd [j] (Nd [1] to Nd [J]) of the non-stationary noise included in the acoustic signal
V [j] of each channel. As shown in FIG. 1, the non-stationary noise estimation unit 34 includes
J subtraction units SB [1] to SB [J], one for each channel of the acoustic signals V [1] to V
[J].
[0021]
The noise component is a mixture of stationary noise and non-stationary noise. The subtraction
unit SB [j] corresponding to the j-th channel therefore generates the spectrum Nd [j] (Nd [1] to
Nd [J]) of the non-stationary noise for each frame in the noise section by subtracting, in the
frequency domain, the spectrum Nw [j] of the stationary noise from the spectrum N [j] of each
frame in the noise section identified by the noise extraction unit 24 (spectral subtraction).
For each frame in the target sound section, the subtraction unit SB [j] continues to output the
spectrum Nd [j] of the last frame in the immediately preceding noise section.
[0022]
As described above, the non-stationary noise in each frame of the target sound section is not
extracted directly from the target sound section. However, when the target sound component is,
for example, the voice of a single speaker, the noise section and the target sound section
alternate within a time that is sufficiently short relative to the speed at which the
non-stationary noise fluctuates. Therefore, even though the spectrum Nd [j] extracted from a
frame in the noise section is used as the spectrum Nd [j] of the non-stationary noise in the
target sound section, the accuracy of noise suppression is not excessively reduced.
[0023]
The following equations (2a) and (2b) are applied to the calculation of the spectrum Nd [j] by
the subtraction unit SB [j].
[0024]
That is, for frequencies at which the noise component spectrum N [j] exceeds the threshold Th2
(for example, the product of the coefficient δ and the spectrum Nw [j]), the spectrum Nd [j] is
calculated, as shown in equation (2a), by subtracting the stationary noise spectrum Nw [j] from
the spectrum N [j] of the noise component.
On the other hand, for frequencies at which the spectrum N [j] falls below the threshold Th2,
the spectrum Nd [j] of the non-stationary noise is set to a predetermined value ε, as shown in
equation (2b). The predetermined value ε is set to, for example, the product of the spectrum N
[j] of the noise component and a predetermined coefficient.
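
Equations (2a) and (2b) are likewise missing from this text. The sketch below follows the verbal description only, assuming Th2 = δ·Nw [j] and ε equal to a small coefficient times N [j]; the function name, `delta`, and `eps_coef` are hypothetical, and whether the subtracted term in (2a) carries its own coefficient cannot be determined from the prose.

```python
import numpy as np

def nonstationary_noise(N, Nw, delta=1.0, eps_coef=0.01):
    """Per-bin estimate of the non-stationary noise spectrum Nd.

    Assumed form (reconstructed from the prose, not copied from the source):
      Nd = N - Nw          where N > Th2 = delta * Nw   (cf. equation (2a))
      Nd = eps_coef * N    otherwise                    (cf. equation (2b))
    delta >= 1 keeps the subtracted branch positive.
    """
    N = np.asarray(N, dtype=float)
    Nw = np.asarray(Nw, dtype=float)
    th2 = delta * Nw
    return np.where(N > th2, N - Nw, eps_coef * N)

# First bin exceeds the threshold; second bin falls back to the small value.
Nd_demo = nonstationary_noise([5.0, 1.0], [2.0, 2.0], delta=1.5, eps_coef=0.1)
```

Keeping the fallback value nonzero matters later, because the covariance matrix built from Nd [1] to Nd [J] has to be invertible.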
[0025]
Since the target sound component, stationary noise, and non-stationary noise are mixed in the
acoustic signal V [j], the spectrum Y [j] obtained after the first noise suppression unit 32
suppresses the stationary noise contains the target sound component and non-stationary noise.
The filter processing unit 40 sequentially generates the spectrum (power spectrum) Z of the
acoustic signal VOUT, in which the target sound component is emphasized (non-stationary noise is
suppressed), from the spectra Y [1] to Y [J] after suppression of the stationary noise. The
waveform synthesis unit 52 converts the spectrum Z of each frame generated by the filter
processing unit 40 into a time-domain signal by the inverse Fourier transform and generates the
acoustic signal VOUT by connecting the converted signals of successive frames to one another on
the time axis. The phase spectrum of any one of the acoustic signals V [1] to V [J] is used in
generating the acoustic signal VOUT.
[0026]
As shown in FIG. 1, the filter processing unit 40 includes a second noise suppression unit 42 and
a coefficient setting unit 44. The second noise suppression unit 42 generates a spectrum Z for
each frame by performing signal processing (filter processing) for emphasizing the target sound
component on the spectra Y [1] to Y [J] processed by the first noise suppression unit 32. The
signal processing performed by the second noise suppression unit 42 is directional array
processing to which a filter coefficient W, set so as to emphasize the target sound component,
is applied. Filter processing that forms a beam (a region of high sound collection sensitivity)
directed in the direction (angle ξ) from which the target sound component arrives, or filter
processing that forms a null (blind spot) in the direction from which the noise component
(non-stationary noise) arrives, is preferably employed as the directional array processing.
Specifically, the second noise suppression unit 42 performs delay-and-sum array processing in
which delays corresponding to the filter coefficient W are applied to the spectra Y [1] to Y
[J], which are then summed.
[0027]
The coefficient setting unit 44 generates the filter coefficient W applied to the processing of
the second noise suppression unit 42. Specifically, the coefficient setting unit 44 generates
the filter coefficient W for emphasizing the target sound component by means of an adaptive
beamformer that uses the non-stationary noise spectra Nd [1] to Nd [J] generated by the
non-stationary noise estimation unit 34. For example, MVDR (minimum variance distortionless
response), which determines the filter coefficient W so as to minimize the intensity of the
noise component (non-stationary noise) while maintaining the intensity of the target sound
component arriving from the direction of the angle ξ, is suitably employed as the adaptive
beamformer.
[0028]
Specifically, the coefficient setting unit 44 calculates the filter coefficient W (fq) of each
frequency fq (q = 1, 2, ...) by the calculation of the following equation (3). The filter
coefficient W (fq) is generated sequentially, for example, for each frame.
[0029]
The symbol RNN (fq) in equation (3) is the covariance matrix of the intensities of the
components at the frequency fq in the spectra Nd [1] to Nd [J]. That is, with vN (fq) denoting
the vector whose elements are the intensities Nd [1] (fq) to Nd [J] (fq) at the frequency fq in
the spectra Nd [1] to Nd [J] (vN (fq) = [Nd [1] (fq), Nd [2] (fq), ..., Nd [J] (fq)]^T, where
the symbol T denotes the transpose), the covariance matrix RNN (fq) is defined by the following
equation (4).
RNN (fq) = E [vN (fq) vN (fq)^H] (4)
The symbol H in equations (3) and (4) denotes the conjugate transpose (Hermitian transpose) of a
matrix. The symbol E [ ] in equation (4) denotes the average (expected value) or the sum over a
predetermined number of frames including the current frame (for example, the predetermined
number of frames from the current frame back into the past). The predetermined value ε of
equation (2b) is preferably set to a nonzero value so that the inverse of the covariance matrix
RNN (fq) used to calculate the filter coefficient W (fq) of equation (3) exists.
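
Equation (3) is not reproduced in this text. The sketch below assumes the standard MVDR form W = R⁻¹d / (dᴴR⁻¹d), which is consistent with the symbols described above (RNN (fq), the steering vector, and the Hermitian transpose) but is not confirmed by the source. The covariance follows equation (4) as an average over frames; the diagonal loading term and the function name `mvdr_weights` are additions for illustration.

```python
import numpy as np

def mvdr_weights(nd_frames, d, loading=1e-6):
    """MVDR weights for one frequency fq (standard textbook form, assumed).

    nd_frames: shape (K, J), the values Nd[1](fq)..Nd[J](fq) over K recent frames.
    d: shape (J,), steering vector toward the target direction.
    loading: relative diagonal loading, a numerical safeguard not in the source.
    """
    V = np.asarray(nd_frames, dtype=complex)
    d = np.asarray(d, dtype=complex)
    K, J = V.shape
    # Equation (4): average of the outer products vN vN^H over K frames.
    R = V.T @ V.conj() / K
    R = R + loading * (np.trace(R).real / J) * np.eye(J)
    r_inv_d = np.linalg.solve(R, d)        # R^{-1} d without forming the inverse
    return r_inv_d / (d.conj() @ r_inv_d)  # normalise so that W^H d = 1

rng = np.random.default_rng(0)
nd_frames = rng.random((20, 3)) + 0.1        # synthetic non-stationary noise powers
d = np.exp(-1j * np.array([0.0, 0.3, 0.6]))  # hypothetical steering vector
W = mvdr_weights(nd_frames, d)
gain_toward_target = np.vdot(W, d)           # W^H d
```

The normalisation makes the response toward the target direction exactly one (the "distortionless" constraint), while the inverse covariance steers attenuation toward the noise. Averaging over fewer frames than channels would make the matrix singular, which is one practical reason for the loading term.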
[0030]
The symbol dξ (fq) in equation (3) is a steering vector (direction control vector): a column
vector that represents the time differences with which a sound wave (plane wave) of frequency fq
arriving from the direction of the angle ξ reaches each of the sound collection devices 12 [1]
to 12 [J]. The coefficient setting unit 44 generates the steering vector dξ (fq) of equation (3)
according to the known angle ξ at which the target sound component arrives. When the angle ξ is
unknown, the coefficient setting unit 44 generates the steering vector dξ (fq) after estimating
the angle ξ of the target sound component. Any known technique, such as the MUSIC method or the
ESPRIT method, may be adopted to estimate the angle ξ. A beamformer method is also suitable, in
which beams are formed in a plurality of directions by directional array processing
(delay-and-sum array processing) and the direction of the beam for which the volume of the
acoustic signals V [1] to V [J] is maximum is identified as the angle ξ. By applying the filter
coefficient W (fq) generated by the above procedure to the directional array processing by the
second noise suppression unit 42, the spectrum Z in which the target sound component is
emphasized is generated sequentially for each frame.
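
A steering vector and the beam-scan estimate of the angle ξ can be sketched as follows. The array geometry is an assumption: the text only says the devices are arranged in a plane at predetermined intervals, so a uniform linear array with spacing `spacing_m` and a sound speed of 343 m/s are used here, and both function names are hypothetical.

```python
import numpy as np

def steering_vector(freq_hz, angle_deg, num_mics, spacing_m=0.05, c=343.0):
    """Phase terms for a plane wave reaching a uniform linear array (assumed)."""
    # Arrival delay of microphone j relative to microphone 0.
    delays = np.arange(num_mics) * spacing_m * np.sin(np.deg2rad(angle_deg)) / c
    return np.exp(-2j * np.pi * freq_hz * delays)

def estimate_angle(x, freq_hz, candidates_deg, spacing_m=0.05, c=343.0):
    """Delay-and-sum beam scan: return the candidate angle with maximum power.

    x: shape (J,), complex spectral values of the J channels at one frequency.
    """
    x = np.asarray(x, dtype=complex)
    powers = [abs(np.vdot(steering_vector(freq_hz, a, len(x), spacing_m, c), x)) ** 2
              for a in candidates_deg]
    return candidates_deg[int(np.argmax(powers))]

# A 1 kHz plane wave arriving from 30 degrees at a four-microphone array.
x_demo = steering_vector(1000.0, 30.0, 4)
candidates = np.arange(-90, 91, 5)
estimated = estimate_angle(x_demo, 1000.0, candidates)
```

The scan compensates the assumed delays for each candidate direction and sums the channels; the sum is largest when the candidate matches the true arrival direction.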
[0031]
Incidentally, the processing (spectral subtraction) in which the first noise suppression unit 32
subtracts the spectrum Nw [j] of the stationary noise from the spectrum X [j] of the acoustic
signal V [j] in the frequency domain generates scattered high-intensity components (isolated
points), which cause artificial and unpleasant musical noise. The generation of musical noise by
spectral subtraction is described in more detail below.
[0032]
Part (A) of FIG. 2 is a graph of the frequency distribution FA of the intensity of the spectrum
X [j] (a probability density function with the intensity as the random variable) over a
predetermined number of frames before processing by the first noise suppression unit 32. As
shown in part (A) of FIG. 2, before spectral subtraction the frequency of occurrence
(probability) of each intensity is distributed non-linearly, decreasing as the intensity
increases from zero. Part (B) of FIG. 2, on the other hand, is a graph of the frequency
distribution FB of the intensity (for example, the intensity of the spectrum Y [j] or the
spectrum Z) over a predetermined number of frames after processing by the first noise
suppression unit 32. Because the subtraction by the first noise suppression unit 32 increases
the frequency of occurrence (probability) of intensities close to zero, the frequency
distribution FB after spectral subtraction is steeper in the region where the intensity is close
to zero than the frequency distribution FA before spectral subtraction.
[0033]
When kurtosis is introduced as a measure of the shape of a frequency distribution (the
steepness of its slope), the kurtosis KB of the frequency distribution FB of the signal
intensity after spectral subtraction is larger than the kurtosis KA of the frequency
distribution FA of the signal intensity before spectral subtraction (KB > KA). Considering that
kurtosis is a measure of Gaussianity, this can be understood as an increase in non-Gaussianity
that results when the first noise suppression unit 32 suppresses the stationary noise in the
acoustic signal V [j], whose intensity has a highly Gaussian frequency distribution. Because
musical noise is strongly non-Gaussian (noise for which intensities near zero occur with high
frequency), musical noise tends to become more noticeable as the kurtosis increases from before
to after spectral subtraction.
[0034]
Therefore, the degree to which the kurtosis in the frequency distribution of the signal
intensity changes before and after spectral subtraction (hereinafter referred to as the
"kurtosis change index") KR functions as a quantitative index of the degree to which spectral
subtraction produces musical noise. In the following, the ratio of the kurtosis KB after
spectral subtraction to the kurtosis KA before spectral subtraction (the kurtosis ratio) is used
as an example of the kurtosis change index KR (KR = KB / KA). As understood from this
definition, musical noise becomes more noticeable as the kurtosis change index KR becomes larger
(as the change in kurtosis becomes larger).
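
The ratio KR = KB / KA can be illustrated numerically. The text does not give a formula for kurtosis in this passage, so the sketch assumes the usual fourth standardized moment (which equals 3 for a Gaussian). The test signal, the values of α and β, and the function names are also assumptions; the subtraction rule is the one assumed for equations (1a) and (1b).

```python
import numpy as np

def kurtosis(x):
    """Fourth standardized moment of the samples (assumed definition)."""
    x = np.asarray(x, dtype=float).ravel()
    c = x - x.mean()
    return np.mean(c ** 4) / np.mean(c ** 2) ** 2

def kurtosis_change_index(before, after):
    """KR = KB / KA, the kurtosis after processing over the kurtosis before."""
    return kurtosis(after) / kurtosis(before)

rng = np.random.default_rng(0)
# Synthetic intensities: the squared magnitude of complex Gaussian noise is
# exponentially distributed, so this stands in for a noise-only power spectrum.
X = rng.exponential(1.0, size=100_000)
alpha, beta, Nw = 2.0, 0.05, 1.0
Y = np.where(X > alpha * Nw, X - alpha * Nw, beta * X)  # spectral subtraction
KR = kurtosis_change_index(X, Y)
```

With these settings most bins are pushed close to zero and only a sparse tail survives, so the kurtosis rises and KR comes out well above 1, which matches the qualitative behaviour described for FIG. 2.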
[0035]
Parts (A) and (B) of FIG. 3 are graphs (distribution diagrams) illustrating the kurtosis change
index KR for each frequency (vertical axis). The denser the shading, the larger the kurtosis
change index KR (the more likely musical noise is to occur). The kurtosis change index KR in
part (A) of FIG. 3 is the ratio (Ky / Kx) of the kurtosis Ky in the frequency distribution of
the intensity of the spectrum Y [j] immediately after the processing by the first noise
suppression unit 32 (the average over the spectra Y [1] to Y [J]) to the kurtosis Kx in the
frequency distribution of the intensity of the spectrum X [j] before the processing by the first
noise suppression unit 32 (the average over the spectra X [1] to X [J]). The kurtosis change
index KR in part (B) of FIG. 3, on the other hand, is the ratio (Kz / Kx) of the kurtosis Kz in
the frequency distribution of the intensity of the spectrum Z after the directional array
processing by the second noise suppression unit 42 to the kurtosis Kx in the frequency
distribution of the intensity of the spectrum X [j] before the processing by the first noise
suppression unit 32. That is, the kurtosis change index KR changes from part (A) to part (B) of
FIG. 3 as a result of the directional array processing by the second noise suppression unit 42.
[0036]
The kurtosis change index KR in FIG. 3 is a measured value obtained when a noise component
(white Gaussian noise) in which directional noise and diffuse noise are mixed is generated.
Directional noise is a noise component that arrives at the sound collection devices 12 [1] to 12
[J] directionally, from one direction (a narrow range), whereas diffuse noise is a noise
component that arrives at the sound collection devices 12 [1] to 12 [J] diffusely, from a
plurality of directions. The horizontal axis in parts (A) and (B) of FIG. 3 represents the ratio
D of the intensity of the directional noise to the intensity of the diffuse noise (hereinafter
referred to as the "directivity index"). The larger the directivity index D, the more the
directional noise dominates (the stronger the directionality); the smaller the directivity index
D, the more the diffuse noise dominates (the stronger the diffuseness).
[0037]
Since the directional array processing (delay-and-sum array processing) of the filter processing
unit 40 in FIG. 1 acts to reduce the non-Gaussianity of the signal (central limit theorem), the
kurtosis change index KR is sufficiently reduced by the directional array processing after
spectral subtraction when the diffuseness of the noise component is strong, as shown in FIG. 3.
That is, when the diffuseness of the noise component is strong, the musical noise is
sufficiently suppressed by the directional array processing. On the other hand, when the
directionality of the noise component is strong, as shown in FIG. 3, the kurtosis change index
KR tends to remain at the same high value as immediately after spectral subtraction even after
the directional array processing. That is, when the directionality of the noise component is
strong, the directional array processing hardly contributes to the suppression of the musical
noise. As shown in FIG. 3, this tendency appears similarly over a wide range of frequencies.
[0038]
Next, FIG. 4 is a graph illustrating, for each directivity index D, the relationship between the
subtraction coefficient α of equation (1a) (horizontal axis) and the kurtosis change index KR
(vertical axis). FIG. 5 is a graph illustrating, for each directivity index D, the relationship
between the subtraction coefficient α of equation (1a) (horizontal axis) and the noise
suppression rate NRR (vertical axis). FIGS. 4 and 5 each assume three cases: the noise component
consists only of diffuse noise (D = −∞), diffuse noise and directional noise are mixed in equal
proportion (D = 0), and directional noise dominates (D = 20).
[0039]
Similar to the portion (B) of FIG. 3, the kurtosis change index KR of FIG. 4 is the relative ratio
(Kz / Kx) of the kurtosis Kz after the directional array processing by the second noise
suppression unit 42 (spectrum Z) to the kurtosis Kx before the processing by the first noise
suppression unit 32 (spectrum X [j]). However, the kurtosis change index KR in FIG. 4 is an average
value over the entire frequency range. Further, the noise suppression rate NRR in FIG. 5 is the
difference between the SN ratio ROUT of the acoustic signal VOUT after processing by the noise
suppression device 100 and the SN ratio RIN of the acoustic signal V [j] before processing (NRR =
ROUT − RIN). Therefore, the higher the noise suppression rate NRR, the higher the effect
(performance) of the noise suppression can be evaluated to be. As shown in FIGS. 4 and 5, as the
subtraction coefficient α increases, musical noise becomes more likely to occur (the kurtosis
change index KR increases in FIG. 4) and the noise suppression effect increases (the noise
suppression rate NRR increases in FIG. 5).
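As a minimal sketch of the evaluation measure used in FIG. 5, the following Python fragment
computes NRR = ROUT − RIN. It assumes that the SN ratios are expressed in decibels and obtained
from signal and noise powers; the function names, the decibel convention, and the sample powers
are illustrative assumptions and are not taken from the specification.

```python
import math

def sn_ratio_db(signal_power, noise_power):
    """SN ratio in decibels (assumed convention: 10 * log10 of a power ratio)."""
    return 10.0 * math.log10(signal_power / noise_power)

def noise_suppression_rate(r_out_db, r_in_db):
    """NRR = ROUT - RIN, i.e. the improvement of the SN ratio by the device."""
    return r_out_db - r_in_db

# Hypothetical powers: the noise power drops from 1.0 to 0.1 while the
# target sound power stays at 10.0.
r_in = sn_ratio_db(10.0, 1.0)
r_out = sn_ratio_db(10.0, 0.1)
nrr = noise_suppression_rate(r_out, r_in)
```

With these hypothetical values the noise power falls by a factor of ten, so the sketch yields an
NRR of about 10 dB.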
[0040]
As understood from FIG. 4, when the directionality of the noise component is strong (for example,
D = 20), the kurtosis change index KR increases greatly as the subtraction coefficient α is
increased, compared with the case where the diffusivity of the noise component is strong (for
example, D = −∞). On the other hand, as understood from FIG. 5, when the directionality of the
noise component is strong, the noise suppression rate NRR is sufficiently high even when the
subtraction coefficient α is small, compared with the case where the diffusivity of the noise
component is strong. That is, under the configuration of FIG. 1, when the directionality of the
noise component is strong, the noise suppression rate NRR is maintained at a high level even when
the subtraction coefficient α is set to a small value so that musical noise is suppressed.
[0041]
Further, as understood from FIG. 5, when the diffusibility of the noise component is strong (for
example, D = −∞), the noise suppression rate NRR is low as compared with the case where the
directivity of the noise component is strong. On the other hand, when the diffusivity of the noise
component is strong, the musical noise is effectively reduced by the directional array processing
by the second noise suppression unit 42 as described with reference to FIG. 3, as shown in FIG.
Even when the subtraction coefficient α is set to a large value, the kurtosis change index KR is
small (that is, musical noise is less likely to occur). That is, under the configuration of FIG. 1,
when the diffusivity of the noise component is strong, musical noise is effectively suppressed
even when the subtraction coefficient α is set to a large value in order to keep the noise
suppression rate NRR high.
[0042]
In consideration of the above tendency, the suppression control unit 60 of FIG. 1 variably
controls the subtraction coefficient α according to the kurtosis change index KR. As shown in
FIG. 1, the suppression control unit 60 includes an index calculating unit 62 and a coefficient
adjusting unit 64. The index calculating unit 62 calculates the kurtosis change index KR for each
frame. The calculation of the kurtosis change index KR will be described in detail below.
[0043]
The kurtosis is a higher-order statistic calculated from the n-th order moment μn by the
following equation (5).
[0044]
The frequency distribution (probability density function) of the M intensities x1 to xM is
approximated by the function Ga (x; k, θ) of the following equation (6).
The coefficient C of equation (6) is defined as follows using the gamma function Γ (k).
[0045]
The following equation (7) is derived by replacing the distribution function (probability density
function) P (x) in the definition equation of the second-order moment μ2 with the function
Ga (x; k, θ) of the equation (6).
[0046]
Similar to the derivation of the second-order moment μ2, the following equation (8) is derived by
replacing the distribution function P (x) in the definition equation of the fourth-order moment
μ4 with the function Ga (x; k, θ) of the equation (6).
[0047]
Substituting the second-order moment μ2 of the equation (7) and the fourth-order moment μ4 of the
equation (8) into the equation (5), the following equation (9) defining the kurtosis is derived.
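The equations (5) to (9) referred to above are not reproduced in this text. As a hedged
reconstruction, assuming that μn denotes the n-th order moment about zero, that the kurtosis is
the ratio of μ4 to the square of μ2, and that Ga (x; k, θ) is the gamma distribution with shape k
and scale θ, the derivation would run as follows. The tags mirror the equation numbers cited in
the text, but the exact form of the original equations may differ; in particular, the original
equation (9) may additionally estimate k from the intensities x1 to xM.

```latex
\begin{align}
\mathrm{kurt} &= \frac{\mu_4}{\mu_2^{2}}, \qquad
  \mu_n = \int_0^{\infty} x^{n} P(x)\,dx \tag{5}\\
\mathrm{Ga}(x; k, \theta) &= C\, x^{k-1} e^{-x/\theta}, \qquad
  C = \frac{1}{\Gamma(k)\,\theta^{k}} \tag{6}\\
\mu_2 &= \frac{\Gamma(k+2)}{\Gamma(k)}\,\theta^{2} = k(k+1)\,\theta^{2} \tag{7}\\
\mu_4 &= \frac{\Gamma(k+4)}{\Gamma(k)}\,\theta^{4} = k(k+1)(k+2)(k+3)\,\theta^{4} \tag{8}\\
\mathrm{kurt} &= \frac{\Gamma(k)\,\Gamma(k+4)}{\Gamma(k+2)^{2}}
  = \frac{(k+2)(k+3)}{k(k+1)} \tag{9}
\end{align}
```

Under these assumptions the scale θ cancels, so the kurtosis depends only on the shape parameter
k.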
[0048]
The index calculating unit 62 in FIG. 1 calculates the kurtosis Kx before spectral subtraction by
executing the operation of equation (9) on the M intensities x1 to xM of the spectra X [1] to
X [J] over a predetermined number of frames (a predetermined number of past frames) including the
frame for which the kurtosis change index KR is to be calculated. Likewise, it calculates the
kurtosis Kz after directional array processing by executing the operation of equation (9) on the
M intensities x1 to xM of the spectrum Z over a predetermined number of frames including that
frame.
Then, the index calculating unit 62 calculates the relative ratio of the kurtosis Kz to the kurtosis
Kx as the kurtosis change index KR (KR = Kz / Kx).
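The calculation by the index calculating unit 62 can be sketched as follows. Because equation (9)
is not reproduced in this text, the sketch substitutes the plain sample estimate μ4 / μ2² (moments
about zero) for it; that substitution, the function names, and the sample intensities are
assumptions made for illustration only.

```python
def kurtosis(intensities):
    """Sample kurtosis mu4 / mu2**2 with moments about zero.

    Assumed stand-in for equation (9), which is not reproduced in the text.
    """
    m = len(intensities)
    mu2 = sum(x ** 2 for x in intensities) / m
    mu4 = sum(x ** 4 for x in intensities) / m
    return mu4 / (mu2 ** 2)

def kurtosis_change_index(x_intensities, z_intensities):
    """KR = Kz / Kx: kurtosis after processing relative to kurtosis before."""
    kx = kurtosis(x_intensities)
    kz = kurtosis(z_intensities)
    return kz / kx

# Hypothetical intensities: a flat distribution before processing and a
# single isolated peak (the signature of musical noise) after processing.
before = [1.0, 1.0, 1.0, 1.0]
after = [0.0, 0.0, 0.0, 2.0]
kr = kurtosis_change_index(before, after)
```

In this toy input the isolated peak raises the kurtosis from 1 to 4, so KR is 4, which illustrates
why a large KR is read as a sign of musical noise.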
[0049]
The coefficient adjusting unit 64 of FIG. 1 variably sets the subtraction coefficient α in
accordance with the kurtosis change index KR calculated by the index calculating unit 62.
Specifically, the coefficient adjusting unit 64 sets the subtraction coefficient α such that the
kurtosis change index KR approaches the target value K0.
As shown in FIG. 4, the kurtosis change index KR increases as the subtraction coefficient α is
increased. The coefficient adjustment unit 64 increases the subtraction coefficient α (increases
the degree of noise suppression) until the kurtosis change index KR exceeds the target value K0.
That is, the target value K0 corresponds to a numerical value (allowable value) indicating the
degree to which musical noise due to spectral subtraction should be allowed. The target value K0
is variably set, for example, according to an instruction from the user (the degree to which the
user can tolerate musical noise). Alternatively, the target value K0 may be set to a
predetermined fixed value.
[0050]
FIG. 6 is a flowchart of the operation of the noise suppression apparatus 100 focusing on the
adjustment of the subtraction coefficient α. The process of FIG. 6 is sequentially performed
every predetermined period (for example, predetermined number of frames). When the process
of FIG. 6 starts, the coefficient adjusting unit 64 initializes the subtraction coefficient α to a
predetermined value (for example, zero) (S1). Next, for the m-th frame (the current frame), the
first noise suppressing unit 32 generates spectra Y [1] to Y [J] by spectrum subtraction to which
the subtraction coefficient α is applied (S2), and the second noise suppressing unit 42 generates
a spectrum Z by directional array processing on the spectra Y [1] to Y [J] (S3). The spectrum
Z generated in step S3 is output to the waveform synthesis unit 52. The index calculating unit 62
calculates the kurtosis change index KR from the spectra X [1] to X [J] of the m-th frame and the
spectrum Z (S4).
[0051]
Next, the coefficient adjusting unit 64 determines whether the kurtosis change index KR
calculated in step S4 exceeds the target value K0 (S5). If the kurtosis change index KR is less than
the target value K0 (S5: NO), the coefficient adjusting unit 64 calculates the sum of the current
subtraction coefficient α and the predetermined value Δα as the updated subtraction
coefficient α (S6). In step S2 following step S6, spectrum subtraction to which the updated
subtraction coefficient α is applied is performed on the next ((m + 1)-th) frame. That is, the first
noise suppressing unit 32 subtracts the spectrum Nw [j] of the stationary noise from each
spectrum X [j] of the (m + 1)-th frame according to the updated subtraction coefficient α.
[0052]
As described above, the updating of the subtraction coefficient α (S6), the spectral subtraction
(S2) to which the updated subtraction coefficient α is applied, the directional array processing
(S3) after the spectral subtraction, and the calculation of the kurtosis change index KR (S4) are
repeated sequentially. Therefore, the subtraction coefficient α is increased by the predetermined
value Δα for each frame so that the kurtosis change index KR gradually approaches the target
value K0. Then, when the kurtosis change index KR exceeds the target value K0 (S5: YES), the
process of FIG. 6 ends. That is, the subtraction coefficient α after the update in the immediately
preceding step S6 is maintained until the process of FIG. 6 is next started.
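The loop of FIG. 6 described above can be sketched as follows. The callback `process_frame`, which
stands for steps S2 to S4 (spectral subtraction, directional array processing, and calculation of
KR), and the toy relationship between α and KR used in the example are hypothetical; the sketch
also assumes that a KR exactly equal to K0 is treated like a KR below K0.

```python
def adjust_subtraction_coefficient(frames, process_frame, k0, delta_alpha,
                                   alpha_init=0.0):
    """Raise alpha by delta_alpha per frame until KR exceeds the target K0."""
    alpha = alpha_init                       # S1: initialize alpha
    for frame in frames:
        kr = process_frame(frame, alpha)     # S2-S4: process frame, get KR
        if kr > k0:                          # S5: YES -> stop, keep alpha
            break
        alpha += delta_alpha                 # S6: update for the next frame
    return alpha

# Hypothetical stand-in for S2-S4: KR grows linearly with alpha.
def toy_process_frame(frame, alpha):
    return 1.0 + 0.25 * alpha

alpha = adjust_subtraction_coefficient(
    frames=range(10), process_frame=toy_process_frame,
    k0=1.4, delta_alpha=1.0)
```

In this toy run KR is 1.0, 1.25, and 1.5 for α = 0, 1, and 2, so the loop stops at the third frame
and α = 2 is retained until the next run.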
[0053]
FIG. 7 is a graph showing the relationship between the directivity index D (horizontal axis) and
the kurtosis change index KR (vertical axis), and FIG. 8 is a graph showing the relationship
between the directivity index D (horizontal axis) and the noise suppression rate NRR (vertical
axis). FIGS. 7 and 8 show together the case where the subtraction coefficient α is controlled by
the processing of FIG. 6 so that the kurtosis change index KR approaches the target value K0
(K0 = 1.4) (solid line), the case where the subtraction coefficient α is fixed to 1 (dotted line),
and the case where the subtraction coefficient α is fixed to 2 (dotted line).
[0054]
In the above embodiment, the coefficient adjustment unit 64 variably controls the subtraction
coefficient α so that the musical noise caused by the spectrum subtraction of the first noise
suppression unit 32 is suppressed to the degree according to the target value K0 (so that the
kurtosis change index KR approaches the target value K0). When the noise component is rich in
diffuse noise (the directivity index D is small), as described with reference to FIG. 4, the
kurtosis change index KR hardly increases even when the subtraction coefficient α is increased
(musical noise does not easily occur), so the subtraction coefficient α is automatically adjusted
to a large value. Therefore, as shown in FIG. 8, it is possible to achieve a noise suppression
rate NRR as high as when the subtraction coefficient α is fixed to 2 while suppressing the musical
noise to a degree according to the target value K0.
[0055]
On the other hand, when the noise component is rich in directional noise (the directivity index D
is large), as described with reference to FIG. 4, the kurtosis change index KR increases readily
with the increase of the subtraction coefficient α (musical noise is likely to occur), so the
subtraction coefficient α is automatically adjusted to a small value. However, in the case of rich
directional noise, as described with reference to FIG. 5, a high noise suppression rate NRR can be
achieved even when the subtraction coefficient α is small. Therefore, musical noise can be
effectively suppressed as shown in FIG. 7 while maintaining a noise suppression rate NRR
equivalent to the case where the subtraction coefficient α is fixed to 1. That is, compared with
the case where the subtraction coefficient α is fixed to a predetermined value, the present
embodiment has the advantage that suppression of musical noise (improvement of sound quality) and
improvement of the noise suppression rate NRR (improvement of the SN ratio) are both achieved,
whether the environment is rich in directional noise or rich in diffuse noise.
[0056]
For example, it is assumed that a mobile phone equipped with the noise suppression device 100
is used in a space such as a station yard or an exhibition hall. The operating noise of the air
conditioning equipment reaches the mobile phone as diffuse noise. In addition, the sound emitted
from a sound source located far from the mobile phone (for example, the voice of another user, a
walking sound, or the sound from a speaker for broadcasting) is reflected by the wall surfaces or
floor in the space and also reaches the mobile phone as diffuse noise. On the other hand, the
voices and walking sounds of other users who are near the mobile phone intermittently arrive at
the mobile phone as directional noise. That is, a space such as a station yard or an exhibition
hall is a typical environment in which directional noise and diffuse noise alternate within a
short time. Even in such an environment, the noise suppression device 100 of FIG. 1 can
effectively suppress noise components (stationary noise and nonstationary noise) while achieving
both the suppression of the musical noise and the noise suppression rate NRR, both in the period
in which the directional noise is dominant and in the period in which the diffuse noise is
dominant.
[0057]
<Modifications> Each of the embodiments illustrated above may be modified in various ways.
Specific modifications are illustrated below. Two or more aspects arbitrarily selected from
the following exemplifications may be combined as appropriate.
[0058]
(1) Modification 1 For the calculation of the filter coefficient W, any known adaptive beamformer
other than MVDR may be used. For example, an SNR maximizing beamformer, which determines the
filter coefficient W such that the SN ratio of the acoustic signal VOUT after directional array
processing is maximized, is preferably employed. Specifically, the coefficient setting unit 44
calculates, as the filter coefficient W (fq), the eigenvector having the maximum eigenvalue in the
eigenvalue problem expressed by the following equation (10).
β · SNN (fq) K (fq) = SXX (fq) K (fq) (10)
[0059]
The symbol SXX (fq) of Equation (10) means the covariance matrix of the intensity of the
component of frequency fq among the target sound components, and the symbol SNN (fq) of
Equation (10) means the covariance matrix of the intensity of the component of frequency fq
among the noise components. The covariance matrix SXX (fq) of the target sound component can be
calculated, for example, from the intensity at the frequency fq in each of the spectra X [1] to X [J]
in the target sound section detected by the noise extraction unit 24 Calculated in the same way
as Further, for example, the covariance matrix RNN (fq) calculated by the equation (4) from the
spectra Nd [1] to Nd [J] of nonstationary noise is applied as the covariance matrix SNN (fq) of the
equation (10). When the SNR maximizing beamformer is used, there is an advantage that it is not
necessary to specify the direction (angle ξ) of the target sound component.
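One way to obtain the eigenvector with the maximum eigenvalue of equation (10) is sketched below
for a single frequency fq. The specification does not say how the eigenvalue problem is solved;
the power iteration used here, the helper names, and the 2 × 2 example matrices are assumptions
chosen only to keep the sketch dependency-free. The method presumes that SNN (fq) is invertible,
that the largest eigenvalue is unique, and that the start vector is not orthogonal to the wanted
eigenvector.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0j] * n
    for i in reversed(range(n)):
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def matvec(a, v):
    """Matrix-vector product a @ v."""
    return [sum(a[i][j] * v[j] for j in range(len(v))) for i in range(len(a))]

def quad(v, a):
    """Quadratic form v^H a v (real when a is Hermitian)."""
    av = matvec(a, v)
    return sum(v[i].conjugate() * av[i] for i in range(len(v))).real

def max_snr_filter(sxx, snn, iterations=200):
    """Dominant eigenvector of beta * snn @ k = sxx @ k by power iteration."""
    k = [1 + 0j] * len(sxx)                      # arbitrary start vector
    for _ in range(iterations):
        k = solve(snn, matvec(sxx, k))           # k <- snn^-1 sxx k
        norm = sum(abs(c) ** 2 for c in k) ** 0.5
        k = [c / norm for c in k]                # renormalize each step
    beta = quad(k, sxx) / quad(k, snn)           # eigenvalue estimate
    return k, beta

# Hypothetical covariance matrices for one frequency bin and two channels.
sxx = [[3.0, 0.0], [0.0, 1.0]]
snn = [[1.0, 0.0], [0.0, 1.0]]
w, beta = max_snr_filter(sxx, snn)
```

For these example matrices the largest eigenvalue is 3 and the corresponding eigenvector points
along the first channel, which is what the iteration converges to.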
[0060]
(2) Modification 2 In the above embodiment, as described with reference to FIG. 6, a method of
sequentially updating the subtraction coefficient α for each frame (that is, a method of bringing
the subtraction coefficient α gradually toward its optimum value over a plurality of frames) is
adopted. However, a configuration may also be adopted in which the subtraction coefficient α is
set to an optimal value for each frame by repeating the process from step S2 to step S6 of FIG. 6
a plurality of times for one frame. Nevertheless, the method of updating the subtraction
coefficient α stepwise for each frame as shown in FIG. 6 has the advantage that the processing
amount of the noise suppression device 100 is significantly reduced compared with the method of
optimizing the subtraction coefficient α individually for each frame.
[0061]
Further, in the above embodiment, the subtraction coefficient α is controlled so that the
kurtosis change index KR approaches the target value K0 while the spectrum subtraction by the
first noise suppression unit 32 and the filter processing (directional array processing) by the
second noise suppression unit 42 are actually performed. However, it is also possible to
calculate the subtraction coefficient α analytically so that the kurtosis change index KR
approaches the target value K0 (that is, to calculate the subtraction coefficient α without
actually operating the first noise suppression unit 32 and the second noise suppression unit 42).
Specifically, a formula (iteration formula) is defined that expresses the relationship between
the intensity (second-order statistic) of the noise component remaining in the spectrum Z
calculated by the spectrum subtraction to which the subtraction coefficient α is applied and the
filter processing to which the filter coefficient W is applied, and the kurtosis change index KR
(fourth-order statistic) after the spectrum subtraction and filter processing. Under the
condition that the kurtosis change index KR is maintained at the target value K0, the subtraction
coefficient α that optimizes the intensity of the noise component of the spectrum Z is then
calculated (second-order statistics optimization under
fourth-order statistical constraints). The above-described configuration also achieves the same
effects as the configuration of FIG.
[0062]
(3) Modification 3 In the above embodiment, the spectrum Nd [j] of non-stationary noise
estimated from the noise section is used as the spectrum Nd [j] of non-stationary noise in the
target sound section. However, a configuration in which the spectrum Nd [j] of non-stationary
noise is specified directly from each frame in the target sound section may also be adopted. For
example, a configuration may be employed in which the noise extraction unit 24 of FIG. 1 is
replaced with the noise extraction unit 24B of FIG. 9 or the noise extraction unit 24C of FIG. 10.
[0063]
The noise extraction unit 24B in FIG. 9 functions as a blind spot control type beamformer that
forms a blind spot (a region with low sensitivity) of sound collection in the direction (angle ξ)
from which the target sound component arrives. For example, when the angle ξ of the target sound
component is zero, as shown in FIG. 9, the noise extraction unit 24B is configured to include
(J-1) subtractors 72 [1] to 72 [J-1], one for each combination of two adjacent sound collection
devices 12 among the J sound collection devices 12 [1] to 12 [J] (J channels). The subtractor
72 [j] subtracts the acoustic signal V [j + 1] (spectrum X [j + 1]) from the acoustic signal V [j]
(spectrum X [j]), thereby suppressing the target sound component arriving from the angle ξ.
Therefore, the spectra N [1] to N [J-1] of the noise component are output from the
noise extraction unit 24B.
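The subtractors 72 [1] to 72 [J-1] can be sketched as follows. The sketch assumes that a target
sound arriving from the angle ξ = 0 appears identically (same phase and gain) in every channel, so
that it cancels in each adjacent-channel difference; the function name and the sample spectra are
hypothetical.

```python
def extract_noise_spectra(spectra):
    """N[j] = X[j] - X[j+1] for each adjacent pair (J inputs, J-1 outputs)."""
    return [
        [a - b for a, b in zip(spectra[j], spectra[j + 1])]
        for j in range(len(spectra) - 1)
    ]

# Hypothetical 3-channel example with 2 frequency bins per channel.
target = [1 + 1j, 2 - 1j]                        # identical in all channels
noise = [[0.5, 0.0], [0.0, 0.25], [-0.5, 0.0]]   # differs per channel
x = [[t + n for t, n in zip(target, ch)] for ch in noise]
n_spectra = extract_noise_spectra(x)
```

In this example the target component cancels exactly and each output contains only differences of
the per-channel noise terms.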
[0064]
The noise extraction unit 24C of FIG. 10 is configured to include (J-1) separating units 74 [1] to
74 [J-1], one for each combination of two adjacent sound collection devices 12 among the J sound
collection devices 12 [1] to 12 [J]. The separating unit 74 [j] generates a spectrum N [j] of the
noise component by independent component analysis (ICA) using the acoustic signal V [j]
(spectrum X [j]) and the acoustic signal V [j + 1] (spectrum X [j + 1]). Specifically, the
separating unit 74 [j] extracts the noise component by filter processing (sound source
separation) in which a separation matrix, set so that the target sound component and the noise
component are statistically independent, is applied to the acoustic signal V [j] and the acoustic
signal V [j + 1]. Therefore, the spectra N [1] to N [J-1] of the noise
component are output from the noise extraction unit 24C.
[0065]
In any of the configurations shown in FIGS. 9 and 10, the stationary noise estimating unit 26
generates (J-1) channels of spectra Nw [1] to Nw [J-1] by time-averaging each of the spectra
N [1] to N [J-1]. Therefore, the first noise suppressing unit 32 subtracts the spectrum Nw [j]
from each of (J-1) acoustic signals V [j] (for example, the acoustic signals V [1] to V [J-1])
among the acoustic signals V [1] to V [J] of the J channels to generate (J-1) channels of spectra
Y [1] to Y [J-1]. On the other hand, the non-stationary noise estimation unit 34 subtracts the
spectrum Nw [j] of stationary noise from each of the spectra N [1] to N [J-1] to generate (J-1)
channels of spectra Nd [1] to Nd [J-1]. Therefore, the filter coefficient W generated by the
coefficient setting unit 44 by the calculation of Equation (3) is a matrix of (J-1) rows and 1
column. The second noise suppression unit 42 performs filter processing in which the filter
coefficient W is applied to the (J-1) channels of spectra Y [1] to Y [J-1] generated by the first
noise suppression unit 32.
[0066]
According to the configurations of FIGS. 9 and 10, spectra Nd [1] to Nd [J-1] of non-stationary
noise are directly extracted from each frame in the target sound section. Therefore, compared
with the configuration of FIG. 1, in which the spectrum Nd [j] estimated from the noise section is
diverted to the target sound section, it is possible to set a filter coefficient W capable of
suppressing non-stationary noise with high accuracy.
[0067]
(4) Modification 4 The definition of the kurtosis change index KR is not limited to the above
example (relative ratio between kurtosis Kx and kurtosis Kz).
For example, a configuration in which the difference between the kurtosis Kz and the kurtosis Kx
is calculated as the kurtosis change index KR (KR = Kz - Kx), or a configuration in which the value
of a predetermined function with the kurtosis Kx and the kurtosis Kz as variables is calculated as
the kurtosis change index KR (for example, a configuration in which the logarithm of the relative
ratio or of the difference between the kurtosis Kx and the kurtosis Kz is used as the kurtosis
change index KR), is also suitable. In the above embodiment, the kurtosis Kx is calculated from
the acoustic signals V [1] to V [J], but a configuration in which the kurtosis Kx is calculated
from only one acoustic signal V [j] selected from among the J channels may also be adopted.
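The alternative definitions listed in this modification can be sketched side by side as follows.
The function name and the `definition` keyword are hypothetical, and the natural logarithm is an
assumed choice because the text does not specify a base.

```python
import math

def kurtosis_change_index(kx, kz, definition="ratio"):
    """Kurtosis change index KR under one of several possible definitions."""
    if definition == "ratio":
        return kz / kx                 # KR = Kz / Kx (the embodiment)
    if definition == "difference":
        return kz - kx                 # KR = Kz - Kx
    if definition == "log_ratio":
        return math.log(kz / kx)       # logarithm of the relative ratio
    raise ValueError("unknown definition: " + definition)

# Hypothetical kurtosis values before (Kx) and after (Kz) processing.
kr_ratio = kurtosis_change_index(2.0, 3.0, "ratio")
kr_diff = kurtosis_change_index(2.0, 3.0, "difference")
kr_log = kurtosis_change_index(2.0, 3.0, "log_ratio")
```

Note that the value meaning "no change" differs between definitions (1 for the ratio, 0 for the
difference and the logarithm), so the target value K0 would have to be chosen on the same scale.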
[0068]
In the above embodiment, the kurtosis change index KR increases as the kurtosis Kz increases
relative to the kurtosis Kx, but a configuration in which the kurtosis change index KR is defined
so that it decreases as the kurtosis Kz increases relative to the kurtosis Kx may also be adopted.
As understood from the above examples, the kurtosis change index KR may be any index indicating
the degree to which the kurtosis in the frequency distribution of the signal intensity changes
between before the processing by the first noise suppression unit 32 and after the processing by
the second noise suppression unit 42, and the specific calculation method (definition) is
optional.
[0069]
(5) Modification 5 In the above-described embodiment, the processing from the frequency
analysis unit 22 to the waveform synthesis unit 52 is performed in the frequency domain, but
processing other than the spectral subtraction by the first noise suppression unit 32 may be
changed, as appropriate, to signal processing in the time domain. For example, a configuration may
be adopted in which the index calculating unit 62 calculates the kurtosis Kx from each intensity
of the acoustic signal V [j] in the time domain, or calculates the kurtosis Kz from each intensity
of the acoustic signal VOUT in the time domain. Also, the processing of the
noise extraction unit 24 and the stationary noise estimation unit 26 may be performed in the
time domain.
[0070]
(6) Modification 6 In the above embodiments, the spectrum Nw [j] of stationary noise is
generated for each channel of the acoustic signal V [j], but a configuration that generates a
spectrum Nw common to a plurality of channels (for example, from the spectra Nw [1] to Nw [J])
can also be adopted. In that case, the first noise suppression unit 32 subtracts the common
spectrum Nw of stationary noise from each of the spectra X [1] to X [J] to generate spectra Y [1]
to Y [J], and the non-stationary noise estimation unit 34 subtracts the common spectrum Nw from
each of the spectra N [1] to N [J] of the noise component to generate spectra Nd [1] to Nd [J] of
nonstationary noise.
[0071]
100: noise suppression device, 12: sound collection device, 14: sound emission device, 22:
frequency analysis unit, 24: noise extraction unit, 26: stationary noise estimation unit, 32: first
noise suppression unit, 34: non-stationary noise estimation unit, 40: second noise suppression
unit, 42: second noise suppression unit, 44: coefficient setting unit, 52: waveform synthesis
unit, 60: suppression control unit, 62: index calculating unit, 64: coefficient adjusting unit.