JP2011123370
Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2011123370
PROBLEM TO BE SOLVED: To maintain the quality of the sound after separation processing, in a
process of separating a target sound from an interference sound coming from an arbitrary
direction other than the arrival direction of the target sound, even when the arrival direction of
the target sound deviates. SOLUTION: The present invention relates to a sound source separation
device. The sound source separation device has: a means for generating a target sound
suppression spectrum from the spectra of the sound reception signals of two microphones,
among the sound reception signals of a plurality of microphones arranged at intervals, using a
plurality of target sound suppressors whose directivity of signal suppression is directed in
different directions within a predetermined range including the arrival direction of the target
sound; a means for generating a target sound dominant spectrum; and a means for separating
an interference sound component and a target sound component of the received signal by using
the target sound suppression spectrum and the target sound dominant spectrum.
[Selected figure] Figure 1
Sound source separation apparatus, program and method
[0001]
The present invention relates to a sound source separation device, a program and a method, and
can be applied to noise removal in voice capture of, for example, a telephone device or a voice
recognition device.
[0002]
In a telephone device or a speech recognition device, a microphone captures the user's speech,
but ambient noise may severely degrade speech recognition accuracy, or the recorded speech may
04-05-2019
1
be difficult to hear due to noise.
[0003]
For this reason, attempts have been made in the past to selectively capture only a desired target
sound by controlling the directional characteristics with a microphone array or the like. However,
with such directional characteristic control alone, it was difficult to separate the desired voice
from the background noise.
[0004]
Conventional microphone array technologies include, for example, the delay-and-sum array (DSA:
Delayed Sum Array), a directional pattern control technique called BF (Beam-Forming), and
directional characteristic control techniques using a Directionally Constrained Minimization of
Power (DCMP) adaptive array.
[0005]
On the other hand, as a technology for separating speech captured at a distance, Patent
Document 1 describes a technique (SAFIA) that performs narrow-band spectrum analysis of the
output signals of a plurality of fixed microphones and, for each frequency band, assigns the
sound of that band to the microphone giving the largest amplitude.
In the speech separation technology by band selection (BS: Band Selection) described in Patent
Document 1, in order to obtain a desired speech, the microphone closest to the sound source
emitting the desired speech is selected, and the speech is synthesized using the sounds in the
frequency bands assigned to that microphone.
[0006]
In addition, as a further technology, Patent Document 2 describes a technology in which the
method of band selection is improved.
[0007]
In the technology described in Patent Document 2, signals input to two microphones arranged in
a direction perpendicular or substantially perpendicular to the target sound arrival direction are
used to suppress interference sound and capture the target sound. A target sound superior
signal emphasizing the target sound and a target sound inferior signal emphasizing the
disturbance sound by suppressing the target sound are created, and the two types of signals are
used to separate the target sound and the disturbance sound.
[0008]
In Patent Document 2, generation of a target sound dominant signal and a target sound inferior
signal is realized using a filter called a "spatial filter".
[0009]
FIG. 3 is an explanatory view showing the characteristics of the spatial filter.
[0010]
Hereinafter, the plane perpendicular to the line connecting the two microphones M1 and M2 is
referred to as the direction of 0 degrees, a clockwise direction is represented as a positive
angle, and a counterclockwise direction is represented as a negative angle.
That is, directions are expressed in the range of -180 degrees to 180 degrees (-180 degrees is
the same direction as 180 degrees).
[0011]
FIG. 3 illustrates the case where sound arrives from a sound source in the direction of the angle
θ at the two microphones M1 and M2 arranged at the distance d from each other.
In this case, a path difference of d × sin θ arises between the sound source in the direction of
the angle θ and the two microphones M1 and M2, and as a result, the arrival times of the sound
at the microphones M1 and M2 differ by the time difference τ represented by the following
equation (1).
[0012]
Then, when the output of the microphone M1 delayed by the time difference τ is subtracted from
the output of the microphone M2, the two cancel each other and the sound from the θ direction
is suppressed.
Hereinafter, the angle at which the spatial filter suppresses sound (θ in the above example) will
be referred to as the “suppression angle”.
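The delay-and-subtract operation described above can be sketched numerically. The following is a minimal illustration and not the filter of Patent Document 2 itself: it assumes a far-field source, a speed of sound of 340 m/s, and the relation τ = d × sin θ / c implied by the path difference d × sin θ. The function names, the 3 cm spacing, and the choice that M2 receives the sound later than M1 are hypothetical.

```python
import cmath
import math

SPEED_OF_SOUND = 340.0  # m/s, assumed value (not given in the text)


def arrival_delay(d, theta_deg, c=SPEED_OF_SOUND):
    """Time difference between the two microphones for a source at angle theta."""
    return d * math.sin(math.radians(theta_deg)) / c


def delay_and_subtract(freq_hz, d, source_deg, suppression_deg):
    """Residual magnitude for a unit-amplitude tone after the spatial filter.

    M2 is modeled as receiving the tone later than M1 by the source delay.
    M1 is delayed by the suppression-angle delay and subtracted from M2.
    """
    omega = 2.0 * math.pi * freq_hz
    m1 = 1.0 + 0.0j
    m2 = cmath.exp(-1j * omega * arrival_delay(d, source_deg))
    m1_delayed = m1 * cmath.exp(-1j * omega * arrival_delay(d, suppression_deg))
    return abs(m2 - m1_delayed)


# Source exactly on the suppression angle: the two signals cancel.
on_null = delay_and_subtract(1000.0, 0.03, 30.0, 30.0)
# Source far from the suppression angle: a clear residual remains.
off_null = delay_and_subtract(1000.0, 0.03, -60.0, 30.0)
```

In this model the residual vanishes only when the source direction equals the suppression angle, which is the behavior the term "suppression angle" refers to.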
[0013]
FIG. 4 is an explanatory view showing directivity characteristics in the spatial filter.
[0014]
In FIG. 4, the curve L represents the directivity when the suppression angle of the spatial filter is
set to θ. The farther the curve is from the middle point of the line connecting the microphones
M1 and M2, the higher the gain (the smaller the magnitude of suppression), and the closer the
curve is to that point, the lower the gain (the larger the magnitude of suppression).
[0015]
In FIG. 4, since the suppression angle of the spatial filter is set in the direction of θ, it is shown
that the suppression strength in that direction is set to be the largest.
[0016]
Patent Document 1: JP-A-10-313497
Patent Document 2: JP-A-2006-197552
[0017]
However, the technology described in Patent Document 1 can separate two overlapping sounds
well, but when there are three or more sound sources, the separation performance is severely
degraded, although separation is theoretically possible.
Therefore, in the presence of multiple noise sources, it is difficult to accurately separate the
target sound from the multiple noises.
[0018]
Moreover, in the technique described in Patent Document 2, the target sound is separated using
a spatial filter, but when the arrival direction of the target sound shifts during the separation
process, the characteristics of the spatial filter may affect the quality of the target sound after
separation.
Hereinafter, the influence of the characteristics of the spatial filter on the target sound after
separation in the method described in Patent Document 2 will be described.
[0019]
FIG. 5 is an explanatory view showing a change characteristic of gain in the direction close to the
suppression angle in the spatial filter.
[0020]
In FIG. 5, the suppression angle of the spatial filter is θ. The gain when the target sound comes
from the direction of 0 ° (front) is denoted G1, and the gain when the target sound comes from
a direction slightly shifted counterclockwise from 0 ° is denoted G2.
[0021]
In the spatial filter, when the rate of change of the gain with respect to the angle is large near
the suppression angle, as shown in FIG. 5, the difference between G1 and G2 may become large
even if the angular deviation between the direction of gain G1 and the direction of gain G2 is
slight.
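How sharply the gain changes near a suppression angle can be checked with a small calculation. This sketch assumes an idealized two-microphone delay-and-subtract filter (far-field tone, speed of sound 340 m/s); the function name, spacing, frequency and angles are illustrative values and are not taken from FIG. 5.

```python
import math


def null_gain_db(freq_hz, d, source_deg, suppression_deg, c=340.0):
    """Gain in dB of an ideal delay-and-subtract filter for a far-field tone."""
    tau_src = d * math.sin(math.radians(source_deg)) / c
    tau_null = d * math.sin(math.radians(suppression_deg)) / c
    # |1 - exp(-j*w*dt)| equals 2*|sin(w*dt/2)|, with w = 2*pi*f.
    mag = 2.0 * abs(math.sin(math.pi * freq_hz * (tau_src - tau_null)))
    # Floor the magnitude so that an exact null does not produce log10(0).
    return 20.0 * math.log10(max(mag, 1e-12))


# Suppression angle fixed at 0 degrees; the source moves slightly away from it.
g_1deg = null_gain_db(1000.0, 0.03, 1.0, 0.0)
g_5deg = null_gain_db(1000.0, 0.03, 5.0, 0.0)
```

Under these assumptions, moving the source from 1 degree to 5 degrees off the suppression angle raises the gain by more than 10 dB, which illustrates why a small directional deviation can change the output noticeably.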
[0022]
In the target sound inferior signal generation means described in Patent Document 2 above, the
suppression angle of the spatial filter is directed in the direction from which the target sound is
supposed to arrive, so that the target sound component is suppressed and the disturbance sound
component is extracted. However, as described above, if the arrival direction of the target sound
deviates during the process of separating the target sound and the disturbance sound, even a
slight deviation may produce a large change in the output sound.
[0023]
Therefore, what is needed is a sound source separation device, program and method that can
maintain the quality of the sound after separation processing even if the arrival direction of the
target sound deviates, in the process of separating the target sound and the disturbance sound
coming from any direction other than the arrival direction of the target sound.
[0024]
The sound source separation apparatus according to the first aspect of the present invention
has: (1) target sound suppression spectrum generating means for generating, from the spectra of
the sound reception signals of two microphones among a plurality of microphones arranged at
intervals, a target sound suppression spectrum in which the target sound component is
suppressed, using a plurality of target sound suppression units that direct the directivity of
component suppression in mutually different directions within a predetermined range including
an assumed arrival direction from which the target sound is assumed to arrive; (2) target sound
dominant spectrum generating means for generating, from the spectra of the sound reception
signals, a target sound dominant spectrum in which an interference sound coming from any
direction other than the predetermined range is suppressed; and (3) separating means for
separating the interference sound component and the target sound component of the sound
reception signals using the target sound suppression spectrum and the target sound dominant
spectrum.
[0025]
A sound source separation program according to a second aspect of the present invention causes
a computer mounted on a sound source separation device to function as: (1) target sound
suppression spectrum generation means for generating, from the spectra of the sound reception
signals of two microphones among a plurality of microphones disposed at intervals, a target
sound suppression spectrum in which the component of the target sound is suppressed, using a
plurality of target sound suppressing units that direct the directivity of component suppression
in mutually different directions within a predetermined range including an assumed arrival
direction from which the target sound is assumed to arrive; (2) target sound dominant spectrum
generating means for generating, from the spectra of the sound reception signals, a target sound
dominant spectrum in which an interference sound coming from an arbitrary direction other than
the predetermined range is suppressed; and (3) separating means for separating the component
of the interference sound and the component of the target sound in the sound reception signals
using the target sound suppression spectrum and the target sound dominant spectrum.
[0026]
A third aspect of the present invention is a sound source separation method performed by a
sound source separation device, characterized in that: (1) the device comprises target sound
suppression spectrum generation means, target sound dominant spectrum generation means,
and separation means; (2) the target sound suppression spectrum generation means generates,
from the spectra of the sound reception signals of two microphones among a plurality of
microphones spaced apart, a target sound suppression spectrum in which the component of the
target sound is suppressed, using a plurality of target sound suppression units that direct the
directivity of component suppression in mutually different directions within a predetermined
range including an assumed arrival direction from which the target sound is assumed to arrive;
(3) the target sound dominant spectrum generation means generates, from the spectra of the
sound reception signals, a target sound dominant spectrum in which the interference sound
coming from any direction other than the predetermined range is suppressed; and (4) the
separation means separates the component of the interference sound and the component of the
target sound in the sound reception signals using the target sound suppression spectrum and
the target sound dominant spectrum.
[0027]
According to the present invention, in the process of separating the target sound and the
interference sound coming from any direction other than the arrival direction of the target
sound, the quality of the sound after the separation processing can be maintained even if the
arrival direction of the target sound deviates.
[0028]
FIG. 1 is a block diagram showing the functional configuration of the sound source separation
device according to the first embodiment.
FIG. 2 is a block diagram showing the functional configuration of the sound source separation
device according to the second embodiment.
FIG. 3 is an explanatory view showing the characteristics of the conventional spatial filter.
FIG. 4 is an explanatory view showing the directivity characteristics of the conventional spatial
filter.
FIG. 5 is an explanatory view showing the change characteristic of the gain in directions close to
the suppression angle of the conventional spatial filter.
[0029]
(A) First Embodiment Hereinafter, a first embodiment of a sound source separation device,
program and method according to the present invention will be described in detail with reference
to the drawings.
[0030]
(A-1) Configuration and Operation of First Embodiment FIG. 1 is a block diagram showing a
functional configuration of a sound source separation device 10 according to the first
embodiment.
[0031]
The sound source separation device 10 separates the target sound from the disturbance sound
coming from any direction other than the arrival direction of the target sound.
Although the application of the sound source separation device 10 is not limited, for example, the
sound source separation device 10 may be mounted on a voice recognition device or a telephone
device such as a mobile phone and used for voice capture.
Specifically, for example, the sound source separation device 10 may be mounted on a
teleconference device and used to separate the voice of any one speaker, as the target sound,
from the mixed voices of a plurality of speakers speaking remotely, or to separate the voice of a
speaker speaking remotely, as the target sound, from a mixture of that speaker's voice and other
sounds.
In addition, for example, it may be used to separate the user's voice as the target sound in voice
dialogue with a robot, in voice operation of on-vehicle equipment such as a car navigation
system, and in voice recognition such as the creation of minutes of a meeting.
[0032]
The sound source separation device 10 mainly includes an input unit 20, an analysis unit 30, a
separation unit 40, a removal unit 50, and a generation unit 60.
[0033]
For the components other than hardware such as the microphones, the sound source separation
device 10 may be realized by installing the sound source separation program according to the
embodiment in a device having a processor (CPU or the like). Alternatively, all the components
may be realized using dedicated hardware (for example, a semiconductor chip).
[0034]
The input means 20 has two microphones 21 and 22 spaced apart from each other. It converts
the sound reception signals of these two microphones 21 and 22 into digital signals using an
analog / digital converter (not shown) and gives the digital signals to the analysis means 30.
[0035]
Hereinafter, as in FIGS. 3 to 5 described above, the plane perpendicular to the line connecting
the two microphones 21 and 22 is referred to as the direction of 0 degrees, a clockwise direction
is represented as a positive angle, and a counterclockwise direction is represented as a negative
angle.
That is, directions are expressed in the range of -180 degrees to 180 degrees (-180 degrees is
the same direction as 180 degrees).
[0036]
Also, in the following, as an example, the sound source separation device 10 will be described as
a configuration that assumes that the target sound arrives from a direction of approximately 0
degrees.
[0037]
In the following description, it is assumed that the digital audio signal output from the
microphone 21 is x1 (n).
Similarly, let the digital audio signal output from the microphone 22 be x2 (n).
Here, n represents the n-th data (sample).
[0038]
The digital audio signals x1 (n) and x2 (n) are obtained, for example, by analog / digital
conversion of an analog audio signal input from an audio input device such as a microphone,
sampled at every sampling period T.
It is desirable that the sampling period T be, for example, about 31.25 microseconds to about
125 microseconds.
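As a quick check of the arithmetic, the stated range of sampling periods corresponds to sampling frequencies of 8 kHz to 32 kHz. The helper name below is hypothetical.

```python
def sampling_frequency_hz(period_us):
    """Convert a sampling period in microseconds to a frequency in hertz."""
    return 1.0e6 / period_us


f_high = sampling_frequency_hz(31.25)  # shortest period in the stated range
f_low = sampling_frequency_hz(125.0)   # longest period in the stated range
```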
[0039]
N consecutive samples of x1 (n) and x2 (n) in the same time interval form one analysis unit
(frame), and the processing of the analysis means 30, separation means 40, removal means 50
and generation means 60 described later is performed for each analysis unit.
[0040]
In the following description, N = 1024 is used in the sound source separation device 10 as an
example.
In the sound source separation apparatus 10, when the series of sound source separation
processing for the analysis unit being processed is finished, the last 3N / 4 samples of x1 (n)
and x2 (n) are shifted to the front, and the N / 4 consecutive samples newly input are appended
at the end.
Thus, the sound source separation device 10 generates new N consecutive samples of x1 (n)
and x2 (n) and processes them as a new analysis unit.
The sound source separation device 10 repeats this processing for each analysis unit.
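The frame update described above can be sketched as follows. This is a minimal illustration for one channel; the function name and the use of integer stand-ins for samples are assumptions made for the example.

```python
N = 1024        # analysis unit length used in the example
SHIFT = N // 4  # number of new samples appended per step


def next_frame(frame, new_samples):
    """Keep the last 3N/4 samples at the front and append N/4 new samples."""
    new_samples = list(new_samples)
    if len(frame) != N or len(new_samples) != SHIFT:
        raise ValueError("expected N samples and N/4 new samples")
    return frame[SHIFT:] + new_samples


frame = list(range(N))                          # stand-in for x1(n)
frame = next_frame(frame, range(N, N + SHIFT))  # N/4 newly input samples
```

Each call discards the oldest N / 4 samples, so consecutive analysis units share 3N / 4 samples.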
[0041]
Note that the digital voice signal input to the analysis means 30 is not limited to one captured by
the microphone and converted to analog / digital.
For example, it may be read from a recording medium or the like, or may be given by
communication from another device.
That is, in the sound source separation device 10, as long as x1 (n) and x2 (n) can be held, the
input unit 20 may be omitted.
[0042]
When the digital voice signals x1 (n) and x2 (n) mixed with noise are supplied from the input
means 20, the analysis means 30 performs frequency analysis of x1 (n) in the frequency analysis
unit 31 and frequency analysis of x2 (n) in the frequency analysis unit 32, each by FFT (Fast
Fourier Transform) processing or the like, and gives the results to the separating means 40.
In the FFT processing, the analysis means 30 applies a window function to the N consecutive
samples of x1 (n) and x2 (n).
Although various window functions can be applied as the window function w (n), for example, a
Hanning window as shown in the following equation (2) may be applied.
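Equation (2) is not reproduced in this text. As a sketch only, the following assumes the common Hanning form w(n) = 0.5 − 0.5 cos(2πn / N), which may differ in detail from the patent's equation, and checks how shifted copies of the window add up when the analysis units are shifted by N / 4.

```python
import math

N = 1024


def hanning(n, size=N):
    """Assumed Hanning window: w(n) = 0.5 - 0.5*cos(2*pi*n/size)."""
    return 0.5 - 0.5 * math.cos(2.0 * math.pi * n / size)


window = [hanning(n) for n in range(N)]

# With a shift of N/4, four shifted copies of the window overlap at every sample.
overlap_sum = [
    sum(hanning((n + k * N // 4) % N) for k in range(4))
    for n in range(N // 4)
]
```

Under this assumed form the overlapping copies sum to a constant, a property that makes such a window convenient when overlapping analysis units are connected.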
[0043]
The above-mentioned window processing by the analysis means 30 is a process performed in
consideration of the connection process of the analysis unit in the generation means 60
described later.
However, although it is preferable to apply the above-mentioned window function, it is not
essential.
[0044]
Hereinafter, the outputs of the frequency analysis units 31 and 32 will be represented as D1 (m)
and D2 (m), respectively.
Note that D1 (m) and D2 (m) are complex numbers.
[0045]
In addition, the analysis method in the analysis means 30 is not limited to the FFT; other
frequency analysis methods, such as the DFT (discrete Fourier transform), may be applied.
[0046]
Further, depending on the device on which the sound source separation device 10 is mounted,
the analysis configuration of a processing unit provided for another purpose may be reused as
part of the configuration of the sound source separation device 10.
For example, such reuse is possible when the device on which the sound source separation
device 10 is mounted is an IP telephone device.
In the case of the IP telephone apparatus, the encoded result of the FFT output is inserted into
the payload of the IP packet, and this FFT output can be reused as the output of the analysis
means 30 described above.
[0047]
Further, the processing of the separating means 40 described later uses the spectral property
D (m) = D * (N−m) (where 1 ≦ m ≦ N / 2-1, and D * (N−m) is the complex conjugate of
D (N−m)), so that processing is performed only in the range of 0 ≦ m ≦ N / 2.
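The conjugate symmetry D (m) = D * (N−m) holds for the discrete Fourier transform of any real-valued signal, which is why only the bins 0 ≦ m ≦ N / 2 need to be processed. The following minimal check uses a naive DFT and arbitrary sample values chosen for the example.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) DFT, sufficient for a small demonstration."""
    n_pts = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * math.pi * m * n / n_pts) for n in range(n_pts))
        for m in range(n_pts)
    ]


x = [0.3, -1.2, 0.7, 2.0, -0.5, 0.1, 0.9, -0.4]  # arbitrary real samples
D = dft(x)
N = len(x)

# Largest deviation from D(m) = conj(D(N - m)) over 1 <= m <= N/2 - 1.
max_error = max(abs(D[m] - D[N - m].conjugate()) for m in range(1, N // 2))
```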
[0048]
The separation unit 40 includes the interference sound suppression unit 41 and the target sound
suppression unit 42.
[0049]
The interference sound suppression unit 41 suppresses the component of the interference sound
by using D1 (m) and D2 (m), and generates a spectrum in which the component of the target
sound is emphasized.
Then, the target sound suppression unit 42 suppresses the component of the target sound using
D1 (m) and D2 (m), and generates a spectrum in which the component of the interference sound
is emphasized.
[0050]
Next, the configuration of the interference sound suppression unit 41 will be described.
[0051]
The interference sound suppression unit 41 includes two spatial filters 411 and 412 and a
minimum selection unit 413.
[0052]
The suppression angles of the spatial filters 411 and 412 are set to 90 degrees and -90 degrees,
respectively.
[0053]
As described above, in the sound source separation device 10, it is assumed that the target sound
comes from the direction of approximately 0 degrees, so in the interference sound suppression
unit 41, the suppression angles of the spatial filters are directed in directions different from the
direction from which the target sound arrives. However, the number of spatial filters and the
combination of suppression angles may be changed according to the direction from which the
target sound is supposed to arrive.
[0054]
As a specific process of the spatial filter 411, E1 (m) is obtained using the following equation (3).
Further, the spatial filter 412 obtains E2 (m) using the following equation (4).
In the following equations (3) and (4), f is a sampling frequency, and for example, 1600 Hz may
be applied.
[0055]
Then, as shown in the following equation (5), the minimum selection unit 413 obtains M (m) as
the smaller of the absolute value of the output E1 (m) of the spatial filter 411 and the absolute
value of the output E2 (m) of the spatial filter 412.
The output M (m) is supplied from the minimum selecting unit 413 to the removing unit 50 as
the extracted component of the target sound.
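The minimum selection step can be sketched as a per-bin operation on complex spectra. Equation (5) is not reproduced in this text, so the following only implements what the prose states (the smaller absolute value for each m). The function name and the sample values for E1 (m) and E2 (m) are hypothetical.

```python
def minimum_selection(*spectra):
    """Per-bin minimum of the absolute values of several complex spectra."""
    lengths = {len(s) for s in spectra}
    if len(lengths) != 1:
        raise ValueError("all spectra must have the same number of bins")
    return [min(abs(value) for value in bins) for bins in zip(*spectra)]


E1 = [3 + 4j, 0.5 + 0j, -2j]   # hypothetical output of spatial filter 411
E2 = [1 + 0j, 6 - 8j, 1 + 1j]  # hypothetical output of spatial filter 412
M = minimum_selection(E1, E2)
```

Because the function accepts any number of spectra, the same sketch also covers a unit that selects among three filter outputs.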
[0056]
Next, the configuration of the target sound suppression unit 42 will be described.
[0057]
The target sound suppression unit 42 includes three spatial filters 421, 422, 423 and a minimum
selection unit 424.
[0058]
The suppression angles of the spatial filters 421, 422, 423 are set to 0 degrees, 5 degrees, and -5
degrees, respectively.
[0059]
As described above, in the sound source separation apparatus 10, the target sound is assumed to
come from the direction of approximately 0 degrees, so the target sound suppression unit 42
sets the suppression angle of the spatial filter 421 to 0 degrees and sets the suppression angles
of the spatial filter 422 and the spatial filter 423 in directions slightly shifted (about ± 5
degrees) from the direction of 0 degrees.
In the sound source separation apparatus 10, it is desirable to set the suppression angles of the
spatial filters in pairs that are symmetrical about the direction from which the target sound is
supposed to arrive, as in the above-mentioned example.
[0060]
The target sound suppression unit 42 uses three spatial filters, but the number of spatial filters
and the combination of the suppression angles are not limited, as long as a plurality of spatial
filters direct different suppression angles within a predetermined range (-5 degrees to +5
degrees in the sound source separation device 10) including the direction from which the target
sound is supposed to arrive (0 degrees in the sound source separation device 10).
[0061]
As a specific process of the spatial filter 421, F0 (m) is obtained using the following equation (6).
[0062]
The spatial filter 422 obtains F1 (m) using the following equation (7).
In equation (7), τ5 is a delay corresponding to the suppression angle = + 5 degrees.
[0063]
The spatial filter 423 obtains F2 (m) using the following equation (8).
In equation (8), τ-5 is a delay corresponding to the suppression angle = -5 degrees.
[0064]
Then, the minimum selection unit 424 calculates N (m) as the minimum of the absolute values
of F0 (m), F1 (m), and F2 (m), as shown in the following equation (9).
The output N (m) is supplied from the minimum selection unit 424 to the removal means 50 as
the extracted component of the interference sound.
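The effect of combining three suppression angles with minimum selection can be illustrated for a single frequency bin. Equations (6) to (9) are not reproduced in this text, so this sketch substitutes an idealized delay-and-subtract filter. The speed of sound, the microphone spacing, the test frequency, and the choice that microphone 22 receives the sound later than microphone 21 are all assumptions made for the example.

```python
import cmath
import math

C = 340.0     # speed of sound in m/s (assumed)
D_MIC = 0.03  # microphone spacing in m (hypothetical)


def delay(angle_deg):
    """Inter-microphone delay for a far-field source at angle_deg."""
    return D_MIC * math.sin(math.radians(angle_deg)) / C


def filter_output(freq_hz, source_deg, suppression_deg):
    """Delay-and-subtract residual for a unit tone arriving from source_deg."""
    omega = 2.0 * math.pi * freq_hz
    d1 = 1.0 + 0.0j                                  # bin at microphone 21
    d2 = cmath.exp(-1j * omega * delay(source_deg))  # same bin at microphone 22
    return d1 * cmath.exp(-1j * omega * delay(suppression_deg)) - d2


def target_suppressed(freq_hz, source_deg, angles=(0.0, 5.0, -5.0)):
    """N(m) for one bin: smallest absolute value over the filter group."""
    return min(abs(filter_output(freq_hz, source_deg, a)) for a in angles)


single = abs(filter_output(2000.0, 4.0, 0.0))  # one filter at 0 degrees only
group = target_suppressed(2000.0, 4.0)         # filters at 0, +5 and -5 degrees
```

For a target arriving from 4 degrees, the group leaves a smaller target residual than the single 0 degree filter, because the filter at +5 degrees is closer to the actual arrival direction.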
[0065]
Next, the configuration of the removing means 50 will be described.
[0066]
The removing means 50 uses M (m) and N (m) given from the separating means 40 to obtain an
interference sound removal spectrum H (m) for removing the interference sound in D1 (m), and
supplies it to the generation means 60.
[0067]
An example of how the removal means 50 obtains the interference sound removal spectrum H (m)
is described below.
[0068]
The removing means 50 obtains S (m) from the output M (m) of the minimum selecting unit 413
and the output N (m) of the minimum selecting unit 424 using the following equation (10).
Furthermore, the removing means 50 applies the following equation (11) to S (m) obtained in
the range of 0 ≦ m ≦ N / 2 to obtain the interference sound removal spectrum H (m), which is
the output of the removing means 50.
In the equations (10) and (11), D1 may be replaced with D2.
[0069]
H (m) = S (m) D1 (m) (11)
Further, using the property H (m) = H * (N−m) (where N / 2 + 1 ≦ m ≦ N−1), the removing
unit 50 determines the interference sound removal spectrum H (m) in the range of
0 ≦ m ≦ N−1 and gives it to the generation means 60.
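Equation (11) and the conjugate extension to the full range 0 ≦ m ≦ N−1 can be sketched as follows. Equation (10), which defines S (m), is not reproduced in this text, so S (m) is treated here as a given real-valued gain per bin. The function name and the sample values are hypothetical.

```python
def removal_spectrum(S_half, D1_half, N):
    """Build H(m) for 0 <= m <= N-1 from values given on 0 <= m <= N/2."""
    if len(S_half) != N // 2 + 1 or len(D1_half) != N // 2 + 1:
        raise ValueError("expected N/2 + 1 bins")
    H = [0j] * N
    for m in range(N // 2 + 1):
        H[m] = S_half[m] * D1_half[m]  # equation (11)
    for m in range(N // 2 + 1, N):
        H[m] = H[N - m].conjugate()    # H(m) = H*(N - m)
    return H


N = 8
S_half = [1.0, 0.5, 0.0, 0.25, 1.0]              # hypothetical gains S(m)
D1_half = [2 + 0j, 1 + 1j, 3 - 2j, -4j, 5 + 0j]  # hypothetical D1(m)
H = removal_spectrum(S_half, D1_half, N)
```

Filling the upper bins by conjugation means that an inverse transform of H (m) yields a real-valued signal, provided the bins at m = 0 and m = N / 2 are themselves real.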
[0070]
The generation means 60 performs N-point inverse FFT processing on the interference noise
removal spectrum H (m) to obtain a sound source separation signal h (n).
Then, as shown in the following equation (12), the generation means 60 adds the current sound
source separation signal h (n) and the latter 3N / 4 samples of the sound source separation
signal h ′ (n) for the immediately preceding analysis unit to obtain an output y (n).
[0071]
y (n) = h (n) + h ′ (n + N / 4) (12)
In the sound source separation device 10, an example has been described in which the
above-described processing is performed while shifting by N / 4 samples at a time, so that data
(samples) overlap between consecutive analysis units. This overlap is intended to connect the
waveforms smoothly and is not essential; the processing may instead be performed while
shifting by N samples at a time.
When the processing is performed while shifting by N / 4 samples, it is desirable that the time
required for the series of processes from the analysis means 30 to the generation means 60 for
one analysis unit be at most NT / 4.
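Equation (12) and the NT / 4 time budget can be illustrated with a short sketch. The index range 0 ≦ n < 3N / 4 is an assumption, chosen because h ′ (n + N / 4) is only defined there. The sampling period of 62.5 microseconds is a hypothetical value within the range stated earlier, and the function name is likewise hypothetical.

```python
N = 1024
T = 62.5e-6  # sampling period in seconds (hypothetical, within the stated range)


def connect(h_current, h_previous):
    """Equation (12): y(n) = h(n) + h'(n + N/4), for 0 <= n < 3N/4 (assumed)."""
    size = len(h_current)
    if len(h_previous) != size:
        raise ValueError("both analysis units must have the same length")
    shift = size // 4
    return [h_current[n] + h_previous[n + shift] for n in range(size - shift)]


h_prev = [1.0] * N  # stand-in for h'(n), the preceding analysis unit
h_cur = [2.0] * N   # stand-in for h(n), the current analysis unit
y = connect(h_cur, h_prev)

# New samples arrive every N/4 sampling periods, so one analysis unit
# should be processed within N*T/4 seconds to keep up with the input.
time_budget_s = N * T / 4
```

With these example values the budget is 16 milliseconds per analysis unit.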
[0072]
(A-2) Effects of the First Embodiment According to the first embodiment, the following effects
can be achieved.
[0073]
In the sound source separation device 10, the three spatial filters of the target sound suppression
unit 42 have suppression angles of 0 degrees, 5 degrees, and -5 degrees, respectively, and the
minimum selection unit 424 applies to N (m) the output value with the smallest absolute value
among the outputs of the three spatial filters.
That is, in the target sound suppression unit 42, when the target sound comes from the vicinity
of the 0 degree direction, the absolute value of the output value of the spatial filter 421 is the
smallest for the components from the vicinity of the 0 degree direction, and that output value is
reflected in N (m).
On the other hand, when the target sound comes from the vicinity of the 5 degree direction, the
output value of the spatial filter 422 is reflected in N (m) for the components from the vicinity
of the 5 degree direction.
As described above, by providing the target sound suppression unit 42 with a spatial filter group
that is selected and applied according to the arrival direction of the target sound, the target
sound component is prevented from being mixed into N (m), and degradation of the sound
quality of the output of the sound source separation device 10 is prevented.
[0074]
Therefore, as described above, by configuring the target sound suppression unit 42 using the
spatial filter group selected and applied according to the arrival direction of the target sound,
the sound quality of the target sound after separation can be improved and the sound made
easier to hear, even when the arrival direction of the target sound deviates.
[0075]
(B) Second Embodiment Hereinafter, a second embodiment of the sound source separation
device, program and method according to the present invention will be described in detail with
reference to the drawings.
[0076]
(B-1) Configuration and Operation of Second Embodiment FIG. 2 is a block diagram showing the
overall configuration of a sound source separation device 10A of the second embodiment.
[0077]
The sound source separation apparatus 10 according to the first embodiment has a configuration
including one each of the input unit 20, the analysis unit 30, and the separation unit 40.
The sound source separation apparatus 10A according to the second embodiment differs in that
it includes a plurality of sets of the input unit 20, the analysis means 30 and the separation
means 40.
Further, the sound source separation device 10A of the second embodiment is different from the
first embodiment in that the removing means 50 is replaced with the removing means 50A.
[0078]
In the sound source separation device 10A, as shown in FIG. 2, there are two sets of the input
means 20, the analysis means 30, and the separation means 40.
That is, it has two input means 20 (20-1, 20-2), two analysis means 30 (30-1, 30-2), and two
separation means 40 (40-1, 40-2).
Further, the input unit 20-1 has two microphones 21-1 and 22-1, and the input unit 20-2 also
has two microphones 21-2 and 22-2.
[0079]
The processing of the input means 20-1 and 20-2, the analysis means 30-1 and 30-2, and the
separation means 40-1 and 40-2 is the same as that of the input means 20, the analysis means
30, and the separating means 40 of the first embodiment, so a detailed explanation is omitted.
[0080]
Furthermore, in the following, the output of the interference sound suppression unit in the
separation unit 40-1 is represented by MA (m), and the output of the target sound suppression
unit is represented by NA (m).
Further, the output of the interference sound suppression unit in the separation unit 40-2 is
denoted by MB (m), and the output of the target sound suppression unit is denoted by NB (m).
Moreover, the spectrum obtained by processing the signal from the microphone 21-1 in the
analysis means 30-1 is represented as D1 (m).
[0081]
Next, the configuration of the removing means 50A will be described.
[0082]
The removing means 50A uses MA (m) and NA (m) given from the separating means 40-1 and
MB (m) and NB (m) given from the separating means 40-2 to obtain the interference sound
removal spectrum H (m) for removing the interference sound in D1 (m), and gives it to the
generation means 60.
[0083]
An example of how the removal means 50A obtains the interference sound removal spectrum
H (m) is described below.
[0084]
The removing unit 50A obtains S (m) by applying MA (m) and NA (m) given from the separating
unit 40-1 and MB (m) and NB (m) given from the separating unit 40-2 to the following
equation (13).
Furthermore, the removing means 50A applies the following equation (14) to S (m) obtained in
the range of 0 ≦ m ≦ N / 2 to obtain the interference sound removal spectrum H (m), which is
the output of the removing means 50A.
In the equations (13) and (14), D1 may be replaced with a spectrum based on signals from other
microphones.
[0085]
H (m) = S (m) D1 (m) (14)
Further, using the property H (m) = H * (N−m) (where N / 2 + 1 ≦ m ≦ N−1), the
interference sound removal spectrum H (m) in the range of 0 ≦ m ≦ N−1 is determined and
given to the generation means 60.
[0086]
The processing of the generation unit 60 is the same as that of the first embodiment, and thus
the description thereof is omitted.
[0087]
(B-2) Effects of Second Embodiment The sound source separation device 10A of the second
embodiment can achieve the same effects as the first embodiment even when more than two
microphones are used in the input means.
[0088]
(C) Other Embodiments The present invention is not limited to the above embodiments, and may
include modified embodiments as exemplified below.
[0089]
(C-1) In the first embodiment, depending on the application of the sound source separation
device 10, the generation means 60 can be omitted, or the generation portion of another device
can be reused.
For example, if the sound source separation apparatus is used for a speech recognition
apparatus, the generation means 60 can be omitted by using the separated spectrum H (m) as
the recognition feature quantity.
Further, for example, in the case where the sound source separation apparatus is used for an IP
telephone, since the IP telephone has a unit corresponding to the generation means, that unit
may be reused.
[0090]
(C-2) In the second embodiment, an example in which four microphones 21-1, 22-1, 21-2, and
22-2 are used has been described. However, the input unit 20-1 and the input unit 20-2 may
share one microphone, so that the device is configured with three microphones.
In this case, since the processing of the signal received by the shared microphone can be
performed in common, the amount of computation can be reduced.
Also, even when the number of microphones used is further increased, a microphone may be
shared among the input means.
[0091]
DESCRIPTION OF SYMBOLS 10 ... Sound source separation apparatus, 20 ... Input means, 21, 22
... Microphone, 30 ... Analysis means, 31, 32 ... Frequency analysis unit, 40 ... Separation means,
41 ... Interference sound suppression unit, 411, 412 ... Spatial filter, 413 ... Minimum selection
unit, 42 ... Target sound suppression unit, 421, 422, 423 ... Spatial filter, 424 ... Minimum
selection unit, 50 ... Removal means, 60 ... Generation means.