JP2014128013

Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2014128013
PROBLEM TO BE SOLVED: To provide a noise removing device capable of removing only noise
from an acoustic signal with high accuracy by extracting only non-directional noise from an
acoustic signal without mixing in directional target sound. SOLUTION: A noise removal apparatus
selectively uses an adaptive beamformer and a fixed beamformer for each frequency. At this
time, the direction of the null of the fixed beamformer is determined from the direction of the
null automatically formed by the adaptive beamformer. Also, the filter coefficient of the adaptive
beamformer based on the criterion of output power minimization is calculated by the minimum
norm method with the norm of the filter coefficient as the constraint. The selection is made, for
example, on the basis of the depth of the null automatically formed by the adaptive beamformer.
[Selected figure] Figure 3
NOISE REMOVAL DEVICE AND ITS CONTROL METHOD
[0001]
The present invention relates to a noise removal technique for removing noise from an acoustic
signal.
[0002]
A technique for removing unnecessary noise from an acoustic signal is an important technique
for improving the audibility of a target sound included in the acoustic signal and for enhancing
the recognition rate in speech recognition.
[0003]
03-05-2019
1
A beamformer is a typical technique for removing noise from an acoustic signal.
In this method, a plurality of microphone signals collected by a plurality of microphones are
respectively filtered and then added to obtain a single output signal.
The above-described filtering and adding process is called a beamformer because it corresponds
to forming, with a plurality of microphones, a spatial beam pattern having directivity, that is,
directional selectivity.
[0004]
The portion where the gain of the beam pattern peaks is called the main lobe. If the
beamformer is configured so that the main lobe points in the direction of the target sound, the
target sound is emphasized and noise arriving from directions other than that of the target sound
can be suppressed.
[0005]
However, the main lobe of the beam pattern has a wide width, especially when the number of
microphones is small.
Also, non-directional sound sources such as outdoor wind noise can be regarded as noise sources
distributed spatially in all directions. For this reason, non-directional noise such as wind noise
cannot be sufficiently removed even by using the broad main lobe of the beam pattern.
[0006]
Therefore, a noise removal method has been proposed that uses not the main lobe but a null, the
portion where the gain of the beam pattern dips.
[0007]
FIG. 2(a) shows, in polar coordinates, an example of the horizontal beam pattern at about 3.3 kHz
when the number of microphones is two.
The two microphones are arranged at an interval on the line segment connecting the -90° and 90°
directions. The beam patterns in the semicircle on the 0° side and the semicircle on the 180° side
of this line segment are symmetrical.
[0008]
As shown in FIG. 2(a), the main lobe in the direction of 90° is very wide, but the null in the
direction of -30° shows a sharp drop in gain, so almost no sound from this direction is output.
A typical target sound included in the microphone signal is voice, and a human voice is a
directional sound source whose power is spatially concentrated at one point. Therefore, it has
been proposed to remove noise by a two-step process: first, non-directional noise is extracted by
directing the null of the beam pattern at the directional target sound, and then the extracted
noise is subtracted from the microphone signal (e.g., Patent Document 1).
[0009]
In FIG. 2(a), a non-directional noise source such as wind noise is represented schematically by
marks distributed spatially in all directions. The human voice, which is the directional target
sound located in the -30° direction, is represented by a face mark.
Because the power per angle of the non-directional noise source is smaller than that of the human
voice, configuring the beamformer to minimize its output power automatically forms a null in the
target sound direction of -30°. A beamformer in which the nulls of the beam pattern are formed
automatically by a criterion such as output power minimization is called an "adaptive
beamformer". With the adaptive beamformer, as shown in FIG. 2(a), a beam pattern whose null is
directed at the target sound is obtained automatically, without knowing the direction of the
target sound in advance, which makes it suitable for extracting the noise.
[0010]
However, adaptive beamformers have the following problems.
[0011]
For example, wind noise is non-directional, but its power in the low band is very strong, so in
the low band its power per angle is comparable to that of the directional target sound, as
schematically shown in FIG. 2(b).
The beam pattern in that figure is the beam pattern of the adaptive beamformer formed for a human
voice under wind noise at a relatively low frequency of about 470 Hz. At this frequency, the power
in the direction of the target sound is not particularly large compared to the other directions,
so the null is far less sharp than the one at about 3.3 kHz in FIG. 2(a).
For this reason, the target sound cannot be sufficiently removed and is mixed into the extracted
noise, so that the target sound itself is eroded in the subsequent noise subtraction.
[0012]
In contrast to adaptive beamformers, in which the nulls of the beam pattern are formed
automatically, beamformers that form nulls fixedly in a particular direction are referred to as
"fixed beamformers". Patent Document 1 discloses a method in which, when noise is extracted by a
beamformer from a microphone signal collected by a microphone array, an adaptive beamformer and a
fixed beamformer are used in combination and one of them is selected for each frequency.
[0013]
Patent Document 1: Japanese Unexamined Patent Application Publication No. 2003-271191
[0014]
However, the method of Patent Document 1 has the following problems.
[0015]
First, as the adaptive beamformer, a method using the Griffiths-Jim adaptive beamformer is
disclosed.
This method is based on the criterion of output power minimization, and although the nulls of the
beam pattern are formed automatically, the direction of the main lobe must be specified as the
constraint that keeps the filter coefficient vector of the beamformer from becoming the zero
vector.
However, what is essentially needed for extracting non-directional noise is only a null directed
at the target sound. Explicitly specifying the direction of the main lobe therefore affects the
beam pattern, and the ability to remove the target sound may decline.
[0016]
Also, as the fixed beamformer, a method based on a simple channel-to-channel difference of the
microphone signals is disclosed. However, this method creates a null in the direction of the
perpendicular bisector of the line connecting the microphones, and this null does not necessarily
point in the direction of the target sound, so there is a high possibility that the target sound
is mixed into the extracted noise.
[0017]
Furthermore, as the method of selecting between the adaptive beamformer and the fixed beamformer,
a method is disclosed that selects whichever has the smaller output power in each frequency band.
However, as described above, the null of the fixed beamformer does not necessarily point in the
direction of the target sound, and the selection looks only at the output power, so it cannot be
said that this selection method is necessarily suitable for extracting only the noise without the
target sound.
[0018]
The present invention has been made to solve the above-mentioned problems. That is, the
present invention provides a noise removal apparatus capable of extracting only non-directional
noise from an acoustic signal without mixing in directional target sound and removing only noise
from the acoustic signal with high accuracy.
[0019]
According to one aspect of the present invention, a noise removal apparatus comprises: acquisition
means for acquiring a plurality of microphone signals collected by a plurality of microphones; an
adaptive beamformer, in which nulls of the beam pattern are automatically formed in the direction
of a target sound, and a fixed beamformer, which forms nulls of the beam pattern in a designated
direction, each serving as a beamformer for extracting non-directional noise from the plurality of
microphone signals to obtain a noise extraction signal; and selection means for selecting, for
each frequency, the adaptive beamformer or the fixed beamformer as the beamformer to be used,
wherein the designated direction is determined from the direction of a null automatically formed
by the adaptive beamformer.
[0020]
According to the present invention, it is possible to extract only non-directional noise from the
acoustic signal without mixing in the directional target sound, and to remove only the noise from
the acoustic signal with high accuracy.
[0021]
FIG. 1 is a block diagram of a noise removal apparatus according to an embodiment.
FIG. 2 is a diagram explaining beam patterns.
FIG. 3 is a flowchart showing noise removal processing according to the first embodiment.
FIG. 4 is a diagram for explaining the depth and direction of the null according to the first
embodiment.
FIG. 5 is a flowchart showing noise removal processing according to the second embodiment.
FIG. 6 is a diagram showing an example of the relationship between the correlation coefficient
between a plurality of microphone signals and the switching frequency according to the second
embodiment.
FIG. 7 is a flowchart showing noise removal processing according to the third embodiment.
FIG. 8 is a diagram showing an example of the relationship between the amplitude spectrum of noise
and the switching frequency according to the third embodiment.
FIG. 9 is a flowchart showing noise removal processing according to the fourth embodiment.
FIG. 10 is a diagram showing an example of the relationship between a fundamental frequency and a
switching frequency according to the fourth embodiment.
[0022]
Hereinafter, the present invention will be described in detail based on preferred embodiments
thereof with reference to the attached drawings. In addition, the structure shown in the following
embodiment is only an example, and this invention is not limited to the illustrated structure.
[0023]
As described above, the present invention provides a noise removal device capable of extracting
only non-directional noise from an acoustic signal without mixing in a directional target sound
and removing only noise from the acoustic signal with high accuracy. The noise removal
apparatus in the embodiment selectively uses an adaptive beamformer and a fixed beamformer
for each frequency. At this time, the direction of the null of the fixed beamformer is determined
from the direction of the null automatically formed by the adaptive beamformer. Furthermore,
the filter coefficients of the adaptive beamformer based on the criterion of output power
minimization are calculated by the minimum norm method with the norm of the filter coefficients
as the constraint.
[0024]
First Embodiment FIG. 1 is a block diagram showing an embodiment of the present invention. The
noise removal apparatus shown in FIG. 1 includes, in a main system controller 100, a system
control unit 101 that controls all the components, a storage unit 102 that stores various data,
and a signal analysis processing unit 103 that performs signal analysis processing.
[0025]
As elements for realizing the function of the sound collection system, a sound collection unit 111
and an acoustic signal input unit 112 are provided. In the present embodiment, the sound
collection unit 111 is configured by a two-channel stereo microphone in which two microphone
elements 111a and 111b are arranged at an interval. Note that the storage unit 102 holds the
arrangement coordinates of each microphone element in advance. Alternatively, the data may be
externally input via a data input / output unit (not shown) mutually connected to the storage unit
102. The acoustic signal input unit 112 performs amplification and AD conversion on the analog
acoustic signal from each microphone element of the sound collection unit 111, and generates a
2ch microphone signal which is a digital acoustic signal at a cycle corresponding to a
predetermined sampling rate. The number of microphone elements need only be plural and may be
three or more. That is, the present invention is not limited to the case where the number of
microphone elements is two.
[0026]
In this embodiment, it is assumed that a human voice in the direction of -30°, as the directional
target sound, and wind noise, as the non-directional noise, are mixed and input to the stereo
microphone. The 2ch microphone signal acquired by the sound collection system is sequentially
recorded in the storage unit 102, and the noise removal processing of the present embodiment is
performed, mainly by the signal analysis processing unit 103, according to the flowchart of FIG. 3.
The sampling rate of the sound is assumed to be 48 kHz.
[0027]
The unit of signal samples over which the beamformer filters the microphone signal is referred to
as a time block, and in the present embodiment the time block length is 1024 samples (about
21 ms). Microphone signal filtering is performed in a time block loop while shifting the signal
sample range by 512 samples (about 11 ms), which is half the time block length. That is, the 1st
to 1024th samples of the microphone signal are filtered in the first time block, and the 513th to
1536th samples in the second time block.
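The sample ranges stated above follow directly from the block length and the shift. A minimal
sketch, with function and constant names that are illustrative and not taken from the patent:

```python
BLOCK_LEN = 1024  # time block length in samples (about 21 ms at 48 kHz)
BLOCK_HOP = 512   # shift between consecutive time blocks (half the block length)


def time_block_range(block_index):
    """Return the 1-based, inclusive (first, last) sample numbers of a time block.

    block_index is 1 for the first time block, matching the numbering in the text.
    """
    first = (block_index - 1) * BLOCK_HOP + 1
    return first, first + BLOCK_LEN - 1


print(time_block_range(1))
print(time_block_range(2))
```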
[0028]
The flowchart of FIG. 3 represents processing in one time block in a time block loop.
[0029]
First, in step S301, the 2ch microphone signals are Fourier transformed to acquire Fourier
coefficients.
Because the spatial correlation matrix calculated in the next step, S302, is a statistic and
therefore requires averaging, a unit called a time frame is introduced, defined relative to the
current time block. The time frame length is 1024 samples, the same as the time block length, and
each time frame is the signal sample range obtained by shifting the sample range of the current
time block by a multiple of a predetermined time frame shift length. In this embodiment, the time
frame shift length is 32 samples, and the number of time frames, which equals the number of
averaging operations, is 128. That is, in the first time block, the first time frame covers the
1st to 1024th samples of the microphone signal, the same range as the first time block, and the
second time frame covers the 33rd to 1056th samples. Since the 128th time frame covers the 4065th
to 5088th samples, the spatial correlation matrix of the first time block is calculated from the
microphone signal spanning the 1st to 5088th samples (about 106 ms).
The time frames may instead be signal sample ranges preceding the current time block.
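The frame ranges and the 106 ms figure can be checked with a few lines. This is only a sketch of
the arithmetic; the helper names are hypothetical.

```python
FRAME_LEN = 1024      # time frame length in samples (same as the time block)
FRAME_SHIFT = 32      # time frame shift length in samples
NUM_FRAMES = 128      # number of time frames averaged per time block
SAMPLE_RATE = 48_000  # Hz


def time_frame_range(block_first, frame_index):
    """1-based, inclusive sample range of time frame `frame_index` (1..NUM_FRAMES)
    for a time block whose first sample is `block_first`."""
    first = block_first + (frame_index - 1) * FRAME_SHIFT
    return first, first + FRAME_LEN - 1


def averaging_span_ms(block_first=1):
    """Duration of the signal covered by all time frames of one time block."""
    start, _ = time_frame_range(block_first, 1)
    _, end = time_frame_range(block_first, NUM_FRAMES)
    return (end - start + 1) / SAMPLE_RATE * 1000.0


print(time_frame_range(1, 2), time_frame_range(1, NUM_FRAMES), averaging_span_ms())
```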
[0030]
Based on the above, in S301, the Fourier coefficients of the microphone signal of the i-th channel
at frequency f and time frame k of the current time block are acquired as Z_i(f, k) (i = 1, 2;
k = 1 to 128). It is preferable to window the microphone signal before the Fourier transform, and
windowing is also performed after the signal is restored to a time signal by the inverse Fourier
transform. Therefore, for time blocks overlapping by 50%, a sine window or the like is used as
the window function so that the reconstruction condition is satisfied over the two windowings.
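The reconstruction condition mentioned here can be illustrated numerically. The sketch below
assumes the common half-sample-offset form of the sine window, which the text does not specify:
applying the window twice gives a squared window, and with 50% overlap the squared windows of
adjacent blocks should sum to 1.

```python
import math


def sine_window(length):
    """Sine window with a half-sample offset (an assumed, commonly used form)."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def overlap_gain(window, hop):
    """Sum of the squared window and its copy shifted by `hop` samples.

    A value of 1 everywhere means analysis plus synthesis windowing with
    50% overlap reconstructs the signal exactly.
    """
    return [window[n] ** 2 + window[n + hop] ** 2 for n in range(hop)]


gains = overlap_gain(sine_window(1024), 512)
print(min(gains), max(gains))
```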
[0031]
S302 to S307 are processes for each frequency, and are performed in a frequency loop.
[0032]
In S302, a spatial correlation matrix, which is a statistic representing the spatial property of the
microphone signal, is calculated.
The Fourier coefficients of the channels obtained in S301 are collected into the vector
z(f, k) = [Z_1(f, k) Z_2(f, k)]^T. Using z(f, k), the matrix R_k(f) at frequency f and time
frame k is determined as in equation (1). Here, superscript T represents transposition and
superscript H represents complex conjugate transposition.
[0033]
[0034]
The spatial correlation matrix R(f) is obtained by averaging R_k(f) over all time frames, that is,
by summing R_1(f) through R_128(f) and dividing by 128.
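Equation (1) itself is not reproduced in this text. A minimal sketch of S302, assuming the usual
outer-product form R_k(f) = z(f, k) z^H(f, k), which is consistent with the definition of
superscript H above; the function name and the array layout are illustrative.

```python
import numpy as np


def spatial_correlation(Z):
    """Average of z z^H over time frames at a single frequency.

    Z: complex array of shape (num_frames, num_channels), one row per time frame.
    Returns an array of shape (num_channels, num_channels).
    """
    Z = np.asarray(Z, dtype=complex)
    # Element (i, j) is the frame average of Z_i * conj(Z_j).
    return np.einsum("ki,kj->ij", Z, Z.conj()) / Z.shape[0]


R = spatial_correlation([[1, 1j], [2, 0]])
print(R)
```

The result is Hermitian by construction, which is what the eigenvalue-based step that follows
relies on.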
[0035]
At S303, filter coefficients of the adaptive beamformer are calculated.
Let W_i(f) (i = 1, 2) be the filter coefficient for filtering the microphone signal of the i-th
channel, and let the filter coefficient vector of the beamformer be w(f) = [W_1(f) W_2(f)]^T.
[0036]
In the present embodiment, the filter coefficient of the adaptive beamformer is calculated by the
minimum norm method.
This method is based on the criterion of output power minimization, and the constraint that keeps
w(f) from becoming the zero vector is expressed not by specifying the main lobe direction but by
specifying the norm of the filter coefficients. This makes it unnecessary to specify the main lobe
direction, which is essentially unnecessary for extracting non-directional noise. Since the
average output power of the beamformer at frequency f is w^H(f) R(f) w(f), the filter coefficients
of the minimum-norm adaptive beamformer are obtained as the solution of the constrained
optimization problem of equation (2).
[0037]
[0038]
This is a minimization problem for a quadratic form whose coefficient matrix is the Hermitian
matrix R(f).
Therefore, the eigenvector corresponding to the minimum eigenvalue of R(f) is the filter
coefficient vector w_adapt(f) of the adaptive beamformer calculated by the minimum norm
method.
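Equation (2) is not reproduced in this text. The sketch below assumes it has the form "minimize
w^H R w subject to w^H w = 1", for which the minimizer is the unit-norm eigenvector of the
smallest eigenvalue, as stated above. The example matrix R is synthetic: one directional source
with a hypothetical steering vector a plus weak uncorrelated noise.

```python
import numpy as np


def min_norm_adaptive_weights(R):
    """Unit-norm filter vector minimizing w^H R w for a Hermitian matrix R."""
    R = np.asarray(R, dtype=complex)
    # eigh returns eigenvalues in ascending order and orthonormal eigenvectors.
    eigenvalues, eigenvectors = np.linalg.eigh(R)
    return eigenvectors[:, 0]


# Synthetic check: a strong directional source with steering vector `a`.
a = np.exp(1j * np.array([0.0, 0.8]))
R = np.outer(a, a.conj()) + 0.01 * np.eye(2)
w = min_norm_adaptive_weights(R)
print(abs(np.vdot(w, a)))  # response toward the source; vdot computes w^H a
```

With two channels the resulting vector is orthogonal to a, so the response in the source
direction is essentially zero, which is the null the text describes.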
[0039]
In S304, the beam pattern of the adaptive beamformer is calculated. Using the filter coefficient
vector w_adapt(f) of the adaptive beamformer calculated in S303, the value Ψ(f, θ) of the beam
pattern in the direction of azimuth angle θ is obtained by equation (3).
[0040]
[0041]
a (f, θ) is an array manifold vector represented by equation (4).
[0042]
[0043]
Here, j represents the imaginary unit.
Also, τ(θ) = [τ_1(θ) τ_2(θ)]^T is the vector collecting the propagation delay times τ_i(θ)
(i = 1, 2) from the point at azimuth θ on the unit sphere, centered on the origin of the
coordinate system describing the microphone arrangement coordinates, to each microphone element.
[0044]
By calculating Ψ(f, θ) while changing θ from -180° to 180°, the beam pattern in the horizontal
direction is obtained.
Taking advantage of the symmetry of the beam pattern, only the beam pattern from -90° through 0°
to 90° may be calculated.
Further, in order to grasp accurately the null depth of the beam pattern, which is checked in the
next step, S305, Ψ may be calculated at finer intervals of θ in the vicinity of the null, where Ψ
becomes small. Furthermore, Ψ(f, θ, φ) may be calculated while varying not only the azimuth angle
θ but also the elevation angle φ over -90° to 90°, rather than only at 0°, so that beam patterns
in all directions, vertical as well as horizontal, are covered.
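Equations (3) and (4) are not reproduced in this text. The following sketch assumes
Ψ(f, θ) = |w^H a(f, θ)|^2 and a(f, θ) with elements exp(-j 2π f τ_i(θ)); the sign convention,
the speed of sound, the 5 cm microphone spacing and the 1 kHz test frequency are all
assumptions made for illustration.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed value, not given in the text


def array_manifold(freq_hz, theta_deg, mic_xy):
    """Array manifold vector a(f, theta) for microphones at mic_xy (metres).

    tau_i is the travel time from the point at azimuth theta on the unit
    circle around the origin to microphone i, as described in the text.
    """
    theta = np.deg2rad(theta_deg)
    source = np.array([np.cos(theta), np.sin(theta)])
    tau = np.linalg.norm(mic_xy - source, axis=1) / SPEED_OF_SOUND
    return np.exp(-2j * np.pi * freq_hz * tau)


def beam_pattern_db(w, freq_hz, thetas_deg, mic_xy):
    """Beam pattern in dB over the given azimuth angles."""
    gains = [abs(np.vdot(w, array_manifold(freq_hz, t, mic_xy))) for t in thetas_deg]
    return 20.0 * np.log10(np.maximum(gains, 1e-12))  # floor avoids log(0)


# Two microphones on the axis joining -90 and 90 degrees (hypothetical spacing).
mics = np.array([[0.0, -0.025], [0.0, 0.025]])
thetas = np.arange(-90, 91)
a_target = array_manifold(1000.0, -30.0, mics)
# Filter vector constructed so that w^H a_target = 0, i.e. a null at -30 degrees.
w = np.conj(np.array([a_target[1], -a_target[0]])) / np.sqrt(2.0)
pattern = beam_pattern_db(w, 1000.0, thetas, mics)
print(thetas[np.argmin(pattern)])
```

Only -90° to 90° is scanned here, using the symmetry noted above.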
[0045]
In S305, the null depth of the beam pattern formed by the adaptive beamformer is checked.
[0046]
FIG. 4(a) shows, in orthogonal coordinates, an example of the beam pattern at a certain frequency
calculated in S304; it corresponds to the beam pattern of FIG. 2(a), which is shown in polar
coordinates.
As shown in FIG. 4(a), the adaptive beamformer automatically forms a deep null in the target sound
direction, so it is considered that only wind noise can be extracted at this frequency without
mixing in the target sound. Here, the difference between the maximum value and the minimum value
of the beam pattern is defined as the null depth, as indicated by the double-headed arrow in
FIG. 4(a). If the null depth is equal to or greater than a predetermined value, for example
20 dB, the process proceeds to S306 and the adaptive beamformer is selected at this frequency.
[0047]
On the other hand, FIG. 4(b) shows an example of the beam pattern at another frequency calculated
in S304; it corresponds to the beam pattern of FIG. 2(b), which is shown in polar coordinates. As
shown in FIG. 4(b), the null automatically formed by the adaptive beamformer is shallow and
gradual, so it is considered that the target sound is mixed in when wind noise is extracted at
this frequency. Therefore, if the null depth is less than the predetermined value, for example
less than 20 dB, the process proceeds to S307, and at this frequency the fixed beamformer, which
forms a null fixedly in the designated direction, is selected.
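The selection rule of S305 to S307 reduces to a threshold test on the null depth. A minimal
sketch, assuming the beam pattern is already available in dB; the function names are
illustrative.

```python
NULL_DEPTH_THRESHOLD_DB = 20.0  # example threshold given in the text


def null_depth_db(pattern_db):
    """Null depth: difference between the maximum and minimum of the beam pattern."""
    return max(pattern_db) - min(pattern_db)


def select_beamformer(pattern_db, threshold_db=NULL_DEPTH_THRESHOLD_DB):
    """Return 'adaptive' for a deep null, otherwise 'fixed'."""
    return "adaptive" if null_depth_db(pattern_db) >= threshold_db else "fixed"


print(select_beamformer([0.0, -5.0, -30.0, -4.0]))
print(select_beamformer([0.0, -3.0, -8.0, -2.0]))
```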
[0048]
In S308, the null direction (designated direction), which must be designated when a fixed
beamformer is used, is determined. In the present invention, the null direction of the fixed
beamformer is determined from the beam patterns at the frequencies where the null checked in S305
was deep and the adaptive beamformer was therefore selected in S306.
[0049]
When the null automatically formed by the adaptive beamformer is shallow, the direction of the
null may deviate from the target sound direction (-30°), as shown in FIG. 4(b). On the other
hand, when a deep null is automatically formed by the adaptive beamformer, the direction of the
null is considered to indicate the target sound direction, as shown in FIG. 4(a). Therefore, the
beam patterns at the frequencies for which the adaptive beamformer was selected in S306 are
averaged, and the direction in which the average beam pattern takes its minimum value is set as
the null direction θ_null designated for the fixed beamformer. That is, the averaging consolidates
the null directions, which differ slightly from frequency to frequency, into a representative
value for use in the fixed beamformer. Note that θ_null need not be obtained from the beam
patterns of all frequencies for which the adaptive beamformer was selected; for example, only the
frequencies for which the adaptive beamformer was selected within the main frequency band of the
voice that is the target sound may be used.
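The averaging step of S308 can be sketched as follows. The text does not say whether the beam
patterns are averaged in decibels or in linear power; averaging the dB values is an assumption
made here, and the function name and array layout are hypothetical.

```python
import numpy as np


def fixed_null_direction(thetas_deg, patterns_db, adaptive_selected):
    """Direction of the minimum of the average beam pattern.

    patterns_db: shape (num_freqs, num_angles), adaptive beam patterns in dB.
    adaptive_selected: shape (num_freqs,), True where the adaptive beamformer
    was selected (deep null).
    """
    patterns_db = np.asarray(patterns_db, dtype=float)
    adaptive_selected = np.asarray(adaptive_selected, dtype=bool)
    if not adaptive_selected.any():
        raise ValueError("no frequency with a deep null to average over")
    mean_pattern = patterns_db[adaptive_selected].mean(axis=0)
    return float(np.asarray(thetas_deg)[np.argmin(mean_pattern)])


thetas = [-60, -30, 0, 30]
patterns = [[-2, -30, -3, -1], [-1, -10, -25, -2], [-5, -2, -1, -40]]
print(fixed_null_direction(thetas, patterns, [True, True, False]))
```

Restricting the average to a band, such as the main band of the voice, only requires a
different mask.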
[0050]
FIG. 4 (c) shows an example of beam pattern averaging in this step. Thin lines in the figure are
beam patterns of several frequencies for which the adaptive beamformer is selected, and the
average beam pattern obtained by averaging them is represented by thick lines. From the
direction of the null of this average beam pattern, the null direction θ_null designated for the
fixed beamformer is determined as -30°.
[0051]
Steps S309 to S312 are again processing for each frequency, and are performed in the frequency
loop.
[0052]
In S309, if the adaptive beamformer was not selected at the frequency of the current loop, the
fixed beamformer is selected, so the process proceeds to S310 to calculate the filter
coefficients of the fixed beamformer.
[0053]
In S310, the filter coefficient wfix (f) of the fixed beamformer is calculated using the null
direction θ null designated by the fixed beamformer determined in S308.
[0054]
First, in the beam pattern of the fixed beamformer, the condition for forming a null in the null
direction θ null is expressed as in equation (5) using the array manifold vector a (f, θ null).
[0055]
[0056]
However, equation (5) alone admits the zero vector as its solution, so equation (6) is added as a
condition for forming the main lobe in the main lobe direction θ_main.
Here, the main lobe direction θ_main is chosen, for example, as the direction opposite to the null
direction θ_null.
[0057]
[0058]
Equations (5) and (6) can be written together as equation (7) by using the matrix
A(f) = [a(f, θ_null) a(f, θ_main)].
[0059]
[0060]
Therefore, multiplying both sides of equation (7) from the left by the inverse matrix of A^H(f)
yields the filter coefficient vector w_fix(f) of the fixed beamformer.
Since the norm of w_fix(f) varies from frequency to frequency, it is preferable to normalize it
to 1, as in the adaptive beamformer.
When the number of elements of the filter coefficient vector w_fix(f), that is, the number of
microphone elements of the sound collection unit 111, differs from the number of control points
on the beam pattern given by conditions such as equations (5) and (6), A(f) is not a square
matrix, so a generalized inverse matrix is used.
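Equations (5) to (7) are not reproduced in this text. The sketch below assumes they read
w^H a(f, θ_null) = 0, w^H a(f, θ_main) = 1 and, combined, A^H(f) w(f) = [0 1]^T, which matches
the description of multiplying by the inverse of A^H(f). The two manifold vectors in the example
are hypothetical values.

```python
import numpy as np


def fixed_beamformer_weights(a_null, a_main):
    """Filter vector with a null toward a_null and unit response toward a_main,
    normalized afterwards to unit norm."""
    A = np.column_stack([a_null, a_main])
    target = np.array([0.0, 1.0], dtype=complex)
    # pinv equals the ordinary inverse when A^H is square and invertible, and
    # acts as the generalized inverse otherwise.
    w = np.linalg.pinv(A.conj().T) @ target
    return w / np.linalg.norm(w)


a_null = np.exp(1j * np.array([0.0, 0.5]))
a_main = np.exp(1j * np.array([0.0, -0.5]))
w_fix = fixed_beamformer_weights(a_null, a_main)
print(abs(np.vdot(w_fix, a_null)), abs(np.vdot(w_fix, a_main)))
```

After normalization the response toward θ_main is no longer exactly 1, but the null is
preserved because scaling does not change where w^H a vanishes.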
[0061]
As described for this step, this embodiment uses a fixed beamformer that forms a null in the
θ_null direction.
As a result, even at a frequency where the adaptive beamformer yields a beam pattern like that of
FIG. 2(b), a beam pattern with a sharp null in the target sound direction, as shown in FIG. 2(c),
is obtained.
Therefore, only wind noise can be extracted in the next step, S311, without mixing in the target
sound.
[0062]
In S311, the microphone signal is filtered as shown in equation (8) to obtain the Fourier
coefficient Y (f) of the noise extraction signal.
Here, z (f) = z (f, 1).
[0063]
[0064]
The beamformer filter coefficient w (f) uses wadapt (f) at the frequency at which the adaptive
beamformer is selected and wfix (f) at the frequency at which the fixed beamformer is selected.
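Equation (8) is not reproduced in this text. A minimal sketch of S311, assuming it has the form
Y(f) = w^H(f) z(f), with the filter vector taken per frequency from whichever beamformer was
selected; the array layout and names are illustrative.

```python
import numpy as np


def extract_noise(Z, w_adapt, w_fix, use_adaptive):
    """Noise extraction signal Y(f) for all frequencies at once.

    Z, w_adapt, w_fix: arrays of shape (num_freqs, num_channels).
    use_adaptive: shape (num_freqs,), True where the adaptive beamformer is used.
    """
    Z = np.asarray(Z, dtype=complex)
    use_adaptive = np.asarray(use_adaptive, dtype=bool)
    w = np.where(use_adaptive[:, None], w_adapt, w_fix)
    return np.sum(np.conj(w) * Z, axis=1)  # w^H z for each frequency


Y = extract_noise(
    Z=[[1, 1], [2, 0]],
    w_adapt=[[1, -1], [1, -1]],
    w_fix=[[0, 1], [1j, 0]],
    use_adaptive=[True, False],
)
print(Y)
```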
[0065]
In S312, the noise extracted in S311 is subtracted from each microphone signal in the frequency
domain to obtain the Fourier coefficients Xi (f) (i = 1, 2) of the noise removal microphone signal
from which the noise has been removed.
The noise subtraction is performed by spectral subtraction as expressed by equation (9) or the
like.
[0066]
[0067]
Here, Zi (f) = Zi (f, 1) (i = 1, 2), the amplitude spectrum is represented by an absolute value
symbol, and the phase spectrum is represented by arg.
Further, β is a subtraction coefficient for adjusting the subtraction strength, and η is a flooring
coefficient for securing a minute output when the subtraction result is not positive.
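Equation (9) is not reproduced in this text. The sketch below shows one common form of
amplitude spectral subtraction that uses the symbols described above (β, η, the amplitude and
arg of Z_i); the exact expression in the patent may differ, and the default parameter values
are hypothetical.

```python
import numpy as np


def spectral_subtract(Z, Y, beta=1.0, eta=0.01):
    """Subtract the noise amplitude |Y| from |Z| and keep the phase of Z.

    Where the subtraction result is not positive, the flooring coefficient
    eta keeps a small fraction of the original amplitude instead.
    """
    Z = np.asarray(Z, dtype=complex)
    Y = np.asarray(Y, dtype=complex)
    magnitude = np.abs(Z) - beta * np.abs(Y)
    floored = np.where(magnitude > 0, magnitude, eta * np.abs(Z))
    return floored * np.exp(1j * np.angle(Z))


X = spectral_subtract([3 + 4j, 1j], [1, 2], beta=1.0, eta=0.1)
print(X)
```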
[0068]
In S311, only the wind noise can be extracted without mixing in the target sound, so it is possible
to remove only the wind noise with high accuracy without cutting the target sound in the noise
subtraction of this step.
[0069]
In S313, the Fourier coefficients of the noise removal microphone signal acquired in S312 are
inverse Fourier transformed to obtain the noise removal microphone signal in the current time
block.
This is windowed and overlap-added to the noise removal microphone signal up to the previous
time block, and the obtained noise removal microphone signal is sequentially recorded in the
storage unit 102.
The noise removal microphone signal obtained as described above can be output to the outside
through the data input / output unit or reproduced by an unshown sound reproduction system
such as an earphone.
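The block-wise flow from S301 to S313 can be outlined as a windowed overlap-add loop. This is a
single-channel sketch with a placeholder for the per-frequency processing; the sine window form
and all names are assumptions, and the beamforming and subtraction steps are omitted.

```python
import numpy as np

BLOCK_LEN = 1024
BLOCK_HOP = 512


def overlap_add_process(x, process=lambda spectrum: spectrum):
    """Window, transform, process, inverse-transform, window again, overlap-add."""
    x = np.asarray(x, dtype=float)
    window = np.sin(np.pi * (np.arange(BLOCK_LEN) + 0.5) / BLOCK_LEN)
    out = np.zeros(len(x))
    for start in range(0, len(x) - BLOCK_LEN + 1, BLOCK_HOP):
        spectrum = np.fft.rfft(window * x[start:start + BLOCK_LEN])
        block = np.fft.irfft(process(spectrum), n=BLOCK_LEN)
        out[start:start + BLOCK_LEN] += window * block
    return out


x = np.sin(0.01 * np.arange(4096))
y = overlap_add_process(x)
print(np.max(np.abs(y[512:-512] - x[512:-512])))
```

With the identity as the processing step, the interior of the signal is reconstructed; the
first and last half blocks are covered by only one window and are therefore attenuated.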
[0070]
Second Embodiment In the above embodiment, whether to select the adaptive beamformer or the fixed
beamformer is determined individually for each frequency.
In the following embodiments, a switching frequency of the beamformer is introduced, based on the
fact that wind noise, assumed here as a specific example of non-directional noise, tends to have
stronger power at lower frequencies.
[0071]
That is, at frequencies higher than the switching frequency, the power of wind noise is smaller
than that of the target sound, as shown in FIG. 2(a), and it is considered that the adaptive
beamformer automatically forms a sharp null in the target sound direction, so the adaptive
beamformer is selected.
On the other hand, at frequencies below the switching frequency, the power of wind noise is
comparable to that of the target sound, as shown in FIG. 2(b), and the null automatically formed
by the adaptive beamformer is considered to be gradual, so the fixed beamformer is selected.
[0072]
The switching frequency may be fixed at a predetermined value such as 1 kHz, but in the present
embodiment it is determined from the correlation coefficient between the microphone signals, and
the noise removal processing is performed according to the flowchart of FIG. 5.
[0073]
First, in S501, the correlation coefficient between microphone signals is calculated from each
microphone signal of the signal sample range of the current time block.
Since a correlation coefficient is calculated for each combination of two channels of microphone
signals, if the number of microphone elements is M, M(M-1)/2 correlation coefficients (the number
of combinations MC2) are obtained. In the case of a stereo microphone, there is one correlation
coefficient.
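A minimal sketch of S501, assuming the ordinary Pearson correlation coefficient over the samples
of the current time block; the text does not state which definition is used, and the function
name is illustrative.

```python
from itertools import combinations

import numpy as np


def pairwise_correlations(signals):
    """Correlation coefficient for every pair of channels.

    signals: array of shape (num_channels, num_samples).
    Returns a list with M(M-1)/2 entries for M channels.
    """
    signals = np.asarray(signals, dtype=float)
    pairs = combinations(range(signals.shape[0]), 2)
    return [float(np.corrcoef(signals[i], signals[j])[0, 1]) for i, j in pairs]


stereo = [[1, 2, 3, 4], [2, 4, 6, 8]]
three_channels = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1]]
print(pairwise_correlations(stereo), len(pairwise_correlations(three_channels)))
```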
[0074]
In S502, the switching frequency is determined from the correlation coefficient calculated in
S501, using the relationship represented by the graph of FIG. 6. When there are three or more
microphone elements and several correlation coefficients are obtained, their average value may be
used. When the correlation coefficient is negative, either its absolute value is taken or it is
treated as 0.
[0075]
The shape of the graph of FIG. 6 is determined by the following reasoning. First, the directional
target sound is highly correlated between the microphones, so its correlation coefficient is close
to 1. Non-directional wind noise, on the other hand, has low correlation between the microphones,
so its correlation coefficient is close to 0. Therefore, as the correlation coefficient moves from
1 toward 0, wind noise is considered to become stronger relative to the target sound, and the
switching frequency is raised to increase the proportion of frequencies at which the fixed
beamformer is selected. In particular, when the correlation coefficient is close to 1, the
switching frequency is set to 0 Hz and only the adaptive beamformer is used. The switching
frequency when the correlation coefficient is 0 is set to 1 kHz in consideration of the main
frequency band of wind noise.
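The graph of FIG. 6 is not available in this text, so the sketch below only reproduces the two
stated end points (1 kHz at a correlation of 0, 0 Hz near 1) and joins them with a straight
line. The linear shape, the saturation point `full_corr` and the handling of negative values
before averaging are all assumptions.

```python
def switching_frequency_hz(correlations, max_hz=1000.0, full_corr=0.9, negative="abs"):
    """Map correlation coefficients to a switching frequency.

    correlations: one coefficient per microphone pair; they are averaged.
    full_corr: hypothetical correlation at and above which only the adaptive
    beamformer is used (switching frequency 0 Hz).
    negative: 'abs' takes the absolute value, 'zero' treats negatives as 0.
    """
    if negative == "abs":
        values = [abs(c) for c in correlations]
    else:
        values = [max(c, 0.0) for c in correlations]
    mean = sum(values) / len(values)
    if mean >= full_corr:
        return 0.0
    return max_hz * (1.0 - mean / full_corr)


print(switching_frequency_hz([0.0]), switching_frequency_hz([0.95]))
```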
[0076]
The processing of step S503 is the same as that of step S301, and thus the description thereof is
omitted.
[0077]
S504 to S506 are processes for each frequency and are performed in the frequency loop, but since
they relate to the adaptive beamformer, they may be performed only at frequencies higher than the
switching frequency determined in S502.
Note that the processing of S504 to S506 is the same as that of S302 to S304.
[0078]
The processing of step S507 is the same as that of step S308, and thus the description thereof is
omitted.
[0079]
Steps S508 to S511 are processes for each frequency again, and are performed in the frequency
loop.
In S508, if the frequency of the current loop is less than the switching frequency, the fixed
beamformer is selected, so the process proceeds to S509 to calculate the filter coefficients of
the fixed beamformer. The processing of S509 to S511 is the same as that of S310 to S312.
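The test in S508 compares the frequency of each bin with the switching frequency. A minimal
sketch, assuming the bin spacing implied by the 48 kHz sampling rate and the 1024-sample time
block; the function names are illustrative.

```python
SAMPLE_RATE = 48_000  # Hz
BLOCK_LEN = 1024      # samples per time block


def bin_frequency_hz(bin_index):
    """Centre frequency of a Fourier bin (46.875 Hz spacing for these settings)."""
    return bin_index * SAMPLE_RATE / BLOCK_LEN


def choose_beamformer(bin_index, switching_hz):
    """'fixed' below the switching frequency, otherwise 'adaptive'."""
    return "fixed" if bin_frequency_hz(bin_index) < switching_hz else "adaptive"


print(bin_frequency_hz(10), choose_beamformer(10, 1000.0), choose_beamformer(70, 1000.0))
```

A switching frequency of 0 Hz selects the adaptive beamformer at every bin, matching the case
of a correlation coefficient close to 1.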
[0080]
Since the process of the last S512 is the same as S313, the description is omitted.
[0081]
Third Embodiment In this embodiment, the switching frequency is determined from the noise
extracted by the adaptive beamformer, and the noise removal processing is performed according to
the flowchart of FIG. 7.
[0082]
Since the process of S701 is the same as that of S301, the description is omitted.
[0083]
Steps S702 to S705 are processes for each frequency and are performed in the frequency loop.
The processes of S702 to S704 are the same as S302 to S304.
[0084]
In step S705, the microphone signal is filtered as shown in equation (8) to obtain the Fourier
coefficient Y (f) of the noise extraction signal.
However, since the filter coefficient of the beamformer calculated at this point is only wadapt,
noise extraction is performed only by the adaptive beamformer.
[0085]
In S706, the switching frequency is determined from the Fourier coefficient of the noise
extraction signal acquired in S705.
[0086]
FIG. 8 represents a spectrogram in which an amplitude spectrum obtained from Fourier
coefficients of the noise extraction signal is displayed over a plurality of time blocks.
The amplitude spectrum values, expressed in decibels, are binarized with a threshold of a
predetermined level for display: white indicates the higher level and black indicates the
lower level.
From the figure, it can be seen that an amplitude spectrum envelope of wind noise is obtained.
[0087]
At frequencies above the amplitude spectrum envelope, it is considered that only wind noise can
be extracted by the adaptive beamformer, since the fringe pattern due to the harmonic structure
of the voice as the target sound is hardly visible. However, since wind noise becomes
considerably strong at frequencies below the amplitude spectrum envelope, there is a high
possibility that voice is mixed in, although it is not visible due to the large amplitude spectrum of
wind noise.
[0088]
Therefore, in the present embodiment, the switching frequency of the beamformer is determined
from the amplitude spectrum envelope of the noise extracted by the adaptive beamformer, and
the fixed beamformer is used at frequencies below the switching frequency.
[0089]
As a specific process of this step, for example, assuming that the current time block is the one
indicated by the dotted line in FIG. 8, the switching frequency is the maximum frequency at which
the level of the noise amplitude spectrum is equal to or higher than the threshold, which is
about 710 Hz as indicated by the arrow in the figure.
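A minimal sketch of this step, assuming the noise extraction signal for the current time block is available as an array of Fourier coefficients: the function name, the small offset that avoids log(0), and the 0 Hz return value when no bin reaches the threshold are assumptions, not details stated in the text.

```python
import numpy as np

def switching_frequency_from_noise(Y, freqs_hz, threshold_db):
    """Return the highest frequency whose noise level reaches the threshold.

    Y: (n_bins,) Fourier coefficients of the noise extraction signal.
    freqs_hz: (n_bins,) bin center frequencies in ascending order.
    """
    level_db = 20.0 * np.log10(np.abs(Y) + 1e-12)    # amplitude spectrum in dB
    above = np.nonzero(level_db >= threshold_db)[0]  # bins at or above the threshold
    if above.size == 0:
        return 0.0  # assumption: fall back to the adaptive beamformer only
    return float(np.asarray(freqs_hz)[above[-1]])
```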
[0090]
The processing of step S707 is the same as that of step S308, so the description will be omitted.
[0091]
Steps S708 to S711 are processes for each frequency again, and are performed in the frequency
loop.
In S708, if the frequency of the current loop is less than the switching frequency, the fixed
beamformer is to be selected, so it is necessary to proceed to S709 to calculate the filter
coefficients of the fixed beamformer.
Note that the process of S709 is the same as S310.
[0092]
In S710, the Fourier coefficient Y (f) of the noise extraction signal, which was acquired in S705
using the adaptive beamformer filter coefficient wadapt (f), is updated to the one acquired using
the fixed beamformer filter coefficient wfix (f).
Note that the process of S711 is the same as S312.
[0093]
Since the process of the last S712 is the same as S313, the description is omitted.
[0094]
Fourth Embodiment
In this embodiment, the switching frequency is determined from the fundamental frequency
detected from the microphone signal, and the noise removal processing is performed along the
flowchart of FIG.
[0095]
Since the process of S901 is the same as that of S301, the description is omitted.
[0096]
In S902, the fundamental frequency of the voice that is the target sound is detected from the
Fourier coefficients Zi (f, 1) (i = 1, 2) in the current time block of each microphone signal
acquired in S901.
[0097]
FIG. 10 shows the real cepstrum calculated from Z1 (f, 1) of ch1 over a plurality of time blocks.
The value of the real cepstrum expressed in decibels is displayed by being binarized by a
threshold of a predetermined level, white indicates the larger level and black indicates the
smaller level.
The vertical axis of the graph is the reciprocal of the quefrency, which has the dimension of
frequency, and represents the fundamental frequency when the amplitude spectrum has a harmonic
structure.
[0098]
The horizontal line (about 285 Hz) surrounded by a solid circle in the same figure is the
frequency at which the level of the real cepstrum is equal to or higher than the threshold value,
and it is considered to represent the fundamental frequency of the voice included in the
microphone signal.
As described above, in a time block in which the fundamental frequency is detected in this step,
the process proceeds to S903, and the fundamental frequency is set as the switching frequency
of the beamformer.
This is based on the following idea: the stronger the wind noise is relative to the voice, the
harder the fundamental frequency is to detect, but if the fundamental frequency is detected
above a predetermined level, only the wind noise can be extracted by the adaptive beamformer at
frequencies above it.
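Fundamental-frequency detection from the real cepstrum can be sketched as below. This is an illustrative version under stated assumptions, not the procedure of the embodiment: it works on a time-domain frame rather than on precomputed Fourier coefficients, it thresholds the raw cepstrum value rather than a decibel value, and the Hann window, the search range `f0_min_hz` to `f0_max_hz`, and the function name are all hypothetical choices.

```python
import numpy as np

def detect_fundamental(frame, fs_hz, threshold, f0_min_hz=60.0, f0_max_hz=500.0):
    """Return the fundamental frequency in Hz, or None if no clear peak exists."""
    spectrum = np.fft.rfft(frame * np.hanning(len(frame)))
    log_mag = np.log(np.abs(spectrum) + 1e-12)      # log amplitude spectrum
    cepstrum = np.fft.irfft(log_mag, n=len(frame))  # real cepstrum
    q_lo = int(fs_hz / f0_max_hz)                   # shortest period searched (samples)
    q_hi = int(fs_hz / f0_min_hz)                   # longest period searched (samples)
    peak = q_lo + int(np.argmax(cepstrum[q_lo:q_hi + 1]))
    if cepstrum[peak] < threshold:
        return None                                 # no harmonic structure detected
    return fs_hz / peak                             # reciprocal of the peak quefrency
```

A returned value would be used directly as the switching frequency (S903), while `None` corresponds to the case in which no fundamental frequency is detected.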
[0099]
When the threshold for binarization is lowered in FIG. 10, a horizontal line (about 142 Hz)
surrounded by a dotted circle in FIG. 10 appears. Thus, even frequencies lower than the
fundamental frequency (about 285 Hz) detected above the predetermined level may contain the
voice that is the target sound, so it is considered meaningful for the fixed beamformer to form
a null in the target sound direction at those frequencies.
[0100]
When a plurality of fundamental frequencies are detected in one channel in one time block, it is
preferable to set the lowest one as the fundamental frequency. In addition, when the fundamental
frequency is different for each channel, it is preferable to select the highest one.
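The two preferences in this paragraph (lowest candidate within a channel, highest value across channels) can be combined as in the sketch below. The input layout and the function name are assumptions for illustration.

```python
def combine_fundamentals(candidates_per_channel):
    """Combine fundamental-frequency candidates from several channels.

    candidates_per_channel: one list of detected frequencies (Hz) per channel;
    an empty list means nothing was detected in that channel.
    """
    # Within each channel, keep the lowest detected candidate.
    per_channel = [min(c) for c in candidates_per_channel if c]
    if not per_channel:
        return None  # nothing detected in any channel
    # Across channels, keep the highest of the per-channel values.
    return max(per_channel)
```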
[0101]
If the fundamental frequency cannot be detected above the predetermined level in S902, the time
block is regarded as an unvoiced section in which only wind noise is present, and the process
proceeds to S904 to set the switching frequency to 0 Hz. That is, noise is extracted using only
the adaptive beamformer. If there is no directional target sound and only non-directional wind
noise, the beamformer does not need directivity for noise extraction. The adaptive beamformer is
therefore preferred when only non-directional noise is present, because its beam pattern shown
in polar coordinates is then approximately circular.
[0102]
In addition, when the fundamental frequency was detected in any of a predetermined number of
preceding time blocks, the current time block may be regarded not as an unvoiced section but as
a consonant section in which the harmonic structure is unclear, and the fundamental frequency
detected in that earlier time block may be used as the switching frequency.
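The look-back rule can be sketched as follows. The text does not state explicitly which value is reused, so the choice of the most recently detected fundamental frequency is an assumption, as are the history layout and the names `f0_history` and `hangover_blocks`.

```python
def switching_frequency_with_hangover(f0_history, hangover_blocks):
    """Decide the switching frequency for the current time block.

    f0_history: detected fundamental frequency (Hz) or None for each time
    block, oldest first, with the current block last.
    hangover_blocks: how many preceding blocks to look back (hypothetical name).
    """
    current = f0_history[-1]
    if current is not None:
        return current  # voiced block: use the detected fundamental frequency
    # Look back over the preceding blocks, most recent first.
    for f0 in reversed(f0_history[-(hangover_blocks + 1):-1]):
        if f0 is not None:
            return f0   # treated as a consonant section (assumption: reuse this value)
    return 0.0          # unvoiced section: adaptive beamformer only
```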
[0103]
The subsequent processes of S905 to S913 are the same as S504 to S512 of the second
embodiment, and thus the description thereof is omitted.
[0104]
In the third embodiment, the switching frequency is determined from the amplitude spectrum
envelope of wind noise.
In the present embodiment, by contrast, the switching frequency is set to the fundamental
frequency whenever it is detected, even if that frequency is lower than the above-mentioned
amplitude spectrum envelope, so the proportion of frequencies at which the adaptive beamformer
is selected tends to be higher than in the third embodiment.
[0105]
Note that the microphone signal need not be acquired by the noise removing device of the present
invention itself; the multi-channel microphone signal and the arrangement coordinates of the
corresponding microphone elements may instead be acquired from the outside via the data
input/output unit.
[0106]
According to the present invention described above, the adaptive beamformer and the fixed
beamformer are selected for each frequency, and the direction of the null of the fixed
beamformer is determined from the direction of the null automatically formed by the adaptive
beamformer.
Furthermore, the filter coefficients of the adaptive beamformer based on the criterion of output
power minimization are calculated by the minimum norm method with the norm of the filter
coefficients as the constraint.
In addition, the depth of the null automatically formed by the adaptive beamformer is checked in
the above selection. By these processes, only non-directional noise can be extracted from the
acoustic signal without mixing in the directional target sound, and only noise can be removed
from the acoustic signal with high accuracy.
[0107]
Other Embodiments
The present invention can also be realized by executing the following processing: software (a
program) that realizes the functions of the above-described embodiments is supplied to a system
or apparatus via a network or various storage media, and a computer (or CPU, MPU, or the like)
of the system or apparatus reads and executes the program. In this case, the program and the
storage medium storing the program constitute the present invention.