JP2009134102
Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2009134102
An object of the present invention is to ensure high target-sound extraction performance (noise removal performance) when a target sound and other noise (non-target sound) are mixed in the acoustic signals obtained through a plurality of microphones and the mixing state may change. A reference sound separation signal corresponding to a reference sound other than the target sound is separated and generated based on a main acoustic signal and a sub acoustic signal, and its signal level is detected. When the detected signal level is within a predetermined range, the frequency spectrum of the reference-sound-corresponding signal is compression-corrected with a compression ratio that increases as the detected signal level decreases, and the frequency spectrum obtained by the compression correction is subtracted from the frequency spectrum of the target-sound-corresponding signal corresponding to the main acoustic signal. In this way, an acoustic signal corresponding to the target sound is extracted from the target-sound-corresponding signal, and that acoustic signal is output. [Selected figure] Figure 1
Target sound extraction device, target sound extraction program, target sound extraction method
[0001]
The present invention relates to a target sound extraction apparatus which extracts and outputs an acoustic signal corresponding to a target sound from a predetermined target sound source, based on acoustic signals obtained through microphones, and to a program and a method therefor.
[0002]
04-05-2019
1
In devices equipped with a function to input sound emitted by a sound source such as a speaker, for example teleconference systems, video conference systems, ticket vending machines, and car navigation systems, the sound to be picked up (hereinafter referred to as the target sound) is collected through a microphone. Depending on the environment in which the sound source is present, however, the acoustic signal obtained through the microphone contains noise components other than the signal component corresponding to the target sound.
If the proportion of the noise component in the acoustic signal obtained through the microphone is large, the clarity of the target sound is lost, and problems such as deterioration of the speech quality and of the automatic speech recognition rate occur. Conventionally, as shown for example in Non-Patent Document 1, it is known to use a main microphone (voice microphone) that mainly inputs the voice (an example of a target sound) emitted by a speaker and a sub microphone that mainly inputs the noise around the speaker, and to apply two-input spectral subtraction processing that removes the noise components, based on the acoustic signal obtained through the sub microphone, from the acoustic signal obtained through the main microphone. Here, the two-input spectral subtraction process is a process of extracting the acoustic signal corresponding to the voice (the target sound) emitted by the speaker (that is, of removing the noise components) by subtracting the time-series feature vectors of the input signal from the sub microphone from those of the input signal from the main microphone.
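As a rough illustration only (the code and its parameters are not taken from this document), two-input spectral subtraction can be sketched as follows, assuming time-aligned frames from the main and sub microphones and a hypothetical over-subtraction factor `alpha` and spectral floor `floor`:

```python
import numpy as np

def two_input_spectral_subtraction(main_frame, sub_frame, alpha=1.0, floor=0.01):
    """Subtract the sub-microphone (noise) magnitude spectrum from the
    main-microphone spectrum, keeping the main signal's phase.
    alpha and floor are illustrative parameters, not from the patent."""
    main_spec = np.fft.rfft(main_frame)
    sub_spec = np.fft.rfft(sub_frame)
    # magnitude subtraction; negative bins are clamped to a small floor
    mag = np.abs(main_spec) - alpha * np.abs(sub_spec)
    mag = np.maximum(mag, floor * np.abs(main_spec))
    phase = np.angle(main_spec)
    return np.fft.irfft(mag * np.exp(1j * phase), n=len(main_frame))
```

In practice such processing is applied frame by frame with windowing and overlap-add; the clamping of negative magnitude bins is one source of the musical noise discussed later in this document.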
[0003]
Further, Patent Document 1 shows a noise eliminator that uses a plurality of sub microphones (noise microphones) and performs the two-input spectral subtraction process based on the acoustic signal input through the main microphone and either a signal selected, according to the situation, from among the acoustic signals input through the sub microphones, or a signal obtained by weighted-averaging those signals with predetermined weights. It is said that this enables effective noise removal even in an acoustic space where nonstationary noise, whose properties change temporally and spatially, is generated. Further, Patent Document 2 shows a technique for a camera-integrated VTR device in which correlation coefficients of a plurality of audio signals, obtained by collecting sound from a plurality of directions within the imaging range, are computed, and the audio signal from a person present in a given direction is enhanced based on those correlation coefficients. Patent Documents 3 to 5 show techniques in which a signal, obtained by processing with an adaptive filter the acoustic signal input through a microphone (corresponding to the sub microphone) that mainly inputs reference sound (non-target sound) other than the target sound, is removed from the acoustic signal (hereinafter referred to as the main acoustic signal) obtained through a microphone (corresponding to the main microphone) that mainly inputs the target sound, so as to obtain an extracted signal of the target sound, and the adaptive filter is adjusted so that the power of the extracted signal is minimized.
[0004]
On the other hand, when a plurality of sound sources and a plurality of microphones (sound input means) exist in a predetermined acoustic space, each microphone receives an acoustic signal (hereinafter referred to as a mixed acoustic signal) in which the individual acoustic signals from the plurality of sound sources (hereinafter referred to as sound source signals) are superimposed. A sound source separation method that identifies (separates) each of the sound source signals based only on the plurality of mixed acoustic signals input in this manner is called a blind source separation method (hereinafter referred to as the BSS method). Furthermore, as one type of BSS sound source separation processing, there is BSS sound source separation processing based on independent component analysis (hereinafter referred to as the ICA method). The BSS method based on the ICA method is a processing method in which, exploiting the fact that the sound source signals are statistically independent within the plurality of mixed acoustic signals input through the plurality of microphones, a predetermined separation matrix (demixing matrix) is optimized, and the sound source signals are identified (separated) by filter processing that applies the optimized separation matrix to the plurality of input mixed acoustic signals. In that optimization, the separation matrix to be used subsequently is computed by sequential calculation (learning calculation) based on the signals (separated signals) identified (separated) by filter processing using the separation matrix set at a certain point in time. Here, in BSS sound source separation processing based on the ICA method, the separated signals are output through the same number of output terminals (which may be referred to as output channels) as the number of input mixed acoustic signals (= the number of microphones). Such BSS sound source separation processing based on the ICA method is described in detail in, for example, Non-Patent Document 2 and Non-Patent Document 3. Further, as sound source separation processing, sound source separation processing by binary masking processing (an example of binaural signal processing) is also known. Binary masking processing compares the levels (powers) of a plurality of divided frequency components (frequency bins) among the mixed speech signals input through a plurality of directional microphones, and removes from each mixed speech signal the signal components other than those of its main sound source; it is a sound source separation process that can be realized with a relatively low computational load. This is described in detail in, for example, Non-Patent Document 4 and Non-Patent Document 5.
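Purely as an illustration of the idea (the function and its names are mine, not the document's), the bin-wise level comparison of binary masking can be sketched as:

```python
import numpy as np

def binary_mask(spec_a, spec_b):
    """Keep, in each channel, only the frequency bins where that channel
    is dominant; the weaker channel's bins are zeroed.
    spec_a, spec_b: spectra of equal shape from two directional microphones."""
    a_dominant = np.abs(spec_a) >= np.abs(spec_b)   # bin-wise power comparison
    masked_a = np.where(a_dominant, spec_a, 0.0)    # source facing microphone A
    masked_b = np.where(~a_dominant, spec_b, 0.0)   # source facing microphone B
    return masked_a, masked_b
```

The hard 0/1 decision per bin is what keeps the computational load low, at the cost of discontinuities in the output spectrum.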
[0005]
Also, when various kinds of signal processing are performed on the frequency spectrum of an acoustic signal to remove noise and the like, offensive musical noise (artificial noise) is generated in the processed acoustic signal. Sound containing such musical noise can cause great discomfort to the listener even at a low sound level, provided that the sound level (volume) reaches the human audible range. Therefore, in a device such as a hearing aid or a mobile phone that performs signal processing on an acoustic signal and outputs sound to be heard by human beings, it is very important that musical noise not be generated, as far as possible, in the acoustic signal (output signal) after signal processing. For example, Non-Patent Document 6, Patent Document 6, Patent Document 7, and the like disclose techniques for suppressing musical noise by estimating a noise section in an acoustic signal and subtracting, from the frequency spectrum of the original acoustic signal, the frequency spectrum of the noise signal estimated from the signal of the noise section, or by processing that attenuates the signal level by changing the gain for each noise section.
JP-A-6-67691, JP-A-2001-8285, JP-A-6-83372, JP-A-6-90493, JP-A-6-165286, JP-A-2005-195955, JP-A-2007-27897
Kashimura et al., "In-vehicle speech recognition using a two-input noise removal method", Technical Report of the Institute of Electronics, Information and Communication Engineers, SP-81, pp. 41-48, 1989.
Hiroshi Saruwatari, "Fundamentals of blind source separation using array signal processing", Technical Report of IEICE, EA 2001-7, pp. 49-56, April 2001.
Tomoya Takatani et al., "High-fidelity blind source separation using ICA based on SIMO model", Technical Report of IEICE, US 2002-87, EA 2002-108, January 2003.
R. F. Lyon, "A computational model of binaural localization and separation", In Proc. ICASSP, 1983.
M. Bodden, "Modeling human sound-source localization and the cocktail-party-effect", Acta Acoustica, vol. 1, pp. 43-55, 1993.
Yukihiro Nomura et al., "Musical Noise Reduction by Spectral Subtraction Using Morphological Filter", In Proceedings of NCSP '05, pp. 415-418, 2005.
[0006]
However, with the technique shown in Non-Patent Document 1 and the techniques shown in Patent Documents 3 to 5, when the target sound mixes into the sub microphone at a relatively large volume, the component of the acoustic signal corresponding to the target sound is erroneously removed as a noise component, so there is a problem that high noise removal performance cannot be obtained. Further, when, as disclosed in Patent Document 1, an integrated signal obtained by weighted-averaging, with predetermined weights, a plurality of acoustic signals input through a plurality of sub microphones (noise microphones) is adopted as the input signal of the two-input spectral subtraction process, there is a problem that the noise removal performance is degraded by a mismatch, caused by changes in the acoustic environment, between the weighted-average weights and the degree of mixing of the target sound into each of the plurality of sub microphones. Further, when, as disclosed in Patent Document 1, a signal selected from among a plurality of acoustic signals input through a plurality of sub microphones (noise microphones) is adopted as the input signal of the two-input spectral subtraction process, noise components based on the acoustic signals leaking into microphones other than the selected one are not removed in a situation where different noises arrive at each microphone from a plurality of directions, and there is also a problem that the noise removal performance is degraded. With the technique disclosed in Patent Document 2, although the audio signal from the person at the center of the shooting range is emphasized, the other audio signals also remain, and the signal of the target sound alone is not extracted.
[0007]
Further, if BSS sound source separation processing based on the ICA method or binary masking processing is performed based on the main acoustic signal and the sub acoustic signals, a separated signal corresponding to the target sound can be obtained; however, depending on the acoustic environment, there is a problem that the separated signal may contain signal components of noise other than the target sound at a relatively high ratio. For example, in BSS sound source separation processing based on the ICA method, the sound source separation performance is degraded in an environment where the number of sound sources of the target sound and other noises exceeds the number of microphones, or where the noise is reflected and reverberates. In addition, when signal processing for removing the signal components of noise other than the target sound is performed on the separated signal (acoustic signal) corresponding to the target sound obtained by the sound source separation processing, musical noise occurs in the signal after that processing, which causes great discomfort to the listener. Further, the musical noise suppression techniques shown in Non-Patent Document 6, Patent Document 6, Patent Document 7, and the like require accurate estimation of the noise section in the acoustic signal; when the background noise level in the acoustic signal to be processed is high, or when there are many kinds of background noise, there is a problem that accurate estimation of the noise section becomes difficult and sufficient noise removal performance cannot be obtained. Therefore, the present invention has been made in view of the above circumstances, and its object is to provide a target sound extraction device, a target sound extraction program, and a target sound extraction method capable of extracting (reproducing) the acoustic signal corresponding to the target sound as faithfully as possible (with high removal performance for non-target sound) when the target sound and other noise (non-target sound) are mixed in the acoustic signals obtained through a plurality of microphones and the mixing state can change, and further capable of suppressing, in the extracted signal, musical noise that gives discomfort to the listener.
[0008]
In order to achieve the above object, a target sound extraction apparatus according to the present invention extracts an acoustic signal corresponding to a target sound based on a main acoustic signal obtained through a main microphone that mainly inputs the sound (hereinafter referred to as the target sound) output from a predetermined target sound source (specific sound source), and on one or more sub acoustic signals obtained through one or more other sub microphones (arranged at positions different from the main microphone, or having directivity in directions different from that of the main microphone), and outputs the extracted signal; it comprises the following components (1-1) to (1-3).
(1-1) Sound source separation means for executing sound source separation processing that separates and generates, based on the main acoustic signal and the sub acoustic signals, one or more reference sound separation signals corresponding to reference sounds other than the target sound (which may be called noise or non-target sound).
(1-2) Signal level detection means for detecting the signal level of a reference-sound-corresponding signal, which is either each of the reference sound separation signals or a signal obtained by integrating a plurality of the reference sound separation signals.
(1-3) Spectral subtraction processing means for, when the detected signal level detected by the signal level detection means is within a predetermined range, compression-correcting the frequency spectrum of the reference-sound-corresponding signal with a compression ratio that increases as the detected signal level decreases, subtracting the frequency spectrum obtained by the compression correction from the frequency spectrum of the target-sound-corresponding signal, which is the main acoustic signal or a signal obtained by subjecting the main acoustic signal to predetermined signal processing, thereby extracting the acoustic signal corresponding to the target sound from the target-sound-corresponding signal, and outputting that acoustic signal.
Here, the compression ratio is the ratio of the signal value before compression correction to the signal value after compression. Then, for example, it is also conceivable that the target sound extraction device according to the present invention further includes the component shown in the following (1-4).
(1-4) Target-sound-corresponding signal output means for outputting the target-sound-corresponding signal as the acoustic signal corresponding to the target sound when the signal level detected by the signal level detection means does not reach a predetermined lower limit level.
In this case, when the signal level detected by the signal level detection means is equal to or higher than the lower limit level, the spectral subtraction processing means outputs the signal obtained by the subtraction of the frequency spectrum as the acoustic signal corresponding to the target sound. Further, as a specific example of the sound source separation processing executed by the sound source separation means, sound source separation processing by the blind source separation method based on independent component analysis performed on acoustic signals in the frequency domain (the FDICA method described later) is conceivable.
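Components (1-2) to (1-4) can be given a minimal numerical sketch. The exact level-to-compression mapping is left by this document to its figures (FIG. 4 and FIG. 6), so the linear ramp and the threshold values below are my own assumptions, as are all names:

```python
import numpy as np

# Illustrative thresholds; the actual mapping is given by the patent's figures.
LEVEL_MIN, LEVEL_MAX = 0.05, 1.0

def compression_coefficient(level):
    """Coefficient (= 1 / compression ratio) applied to the reference spectrum.
    It grows with the reference-signal level, so the compression ratio
    (before/after) grows as the level decreases, as in component (1-3)."""
    if level < LEVEL_MIN:          # component (1-4): below the lower limit,
        return 0.0                 # output the target signal without subtraction
    t = min((level - LEVEL_MIN) / (LEVEL_MAX - LEVEL_MIN), 1.0)
    return t                       # linear ramp, an assumed shape

def extract_target(target_spec, reference_spec):
    """Subtract the compression-corrected reference magnitude spectrum
    from the target-sound-corresponding magnitude spectrum."""
    level = float(np.sqrt(np.mean(np.abs(reference_spec) ** 2)))  # (1-2): RMS level
    coeff = compression_coefficient(level)
    mag = np.abs(target_spec) - coeff * np.abs(reference_spec)
    mag = np.maximum(mag, 0.0)     # clamp negative bins
    return mag * np.exp(1j * np.angle(target_spec))
```

With this shape, a loud reference sound is subtracted almost in full, while a quiet one is subtracted only weakly or not at all, which is the noise-removal/musical-noise trade-off the following paragraph describes.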
[0009]
In the present invention, the target-sound-corresponding signal is a signal mainly containing the signal component of the target sound; however, depending on the position of the target sound source with respect to the plurality of microphones (the main microphone and the sub microphones) and on how the noise is generated, a relatively large amount of signal components of noise other than the target sound may remain in the target-sound-corresponding signal. On the other hand, the reference-sound-corresponding signal obtained by the processing of the sound source separation means is a signal mainly containing the signal components of the noise sound sources (reference sounds, that is, sounds other than the target sound) within the respective sound collection ranges of the sub microphones, which differ in position and in direction of directivity. Thus, even if the target-sound-corresponding signal contains components of noise (reference sound) other than the target sound, the signal components of that noise are substantially removed from the target-sound-corresponding signal by the subtraction of the frequency spectrum performed by the spectral subtraction processing means. Moreover, even in a situation where different noises (reference sounds) arrive at the main microphone from a plurality of directions, the extracted signal produced by the spectral subtraction processing means is a signal from which the signal components of all the reference sound separation signals corresponding to each of the plurality of noises have been removed. The frequency spectrum subtracted from the frequency spectrum of the target-sound-corresponding signal in the processing of the spectral subtraction processing means is the frequency spectrum of the reference-sound-corresponding signal compression-corrected with a compression ratio that is larger the smaller the level (volume) of the reference-sound-corresponding signal. Therefore, in the present invention, when the level of the reference-sound-corresponding signal is large (that is, when the volume of the noise sound is large), the signal components that are offensive to the listener are actively removed from the target-sound-corresponding signal, and the acoustic signal corresponding to the target sound is extracted as faithfully as possible. At this time, although the extracted signal (the acoustic signal corresponding to the target sound) may contain some musical noise, it is an acoustic signal that is easier for the listener to hear than one in which the noise signal components remain. Furthermore, in the present invention, when the level of the reference-sound-corresponding signal is small (that is, when the volume of the noise sound is small), the processing that removes signal components from the target-sound-corresponding signal is not actively performed, which suppresses musical noise that could be offensive to the listener. In that case, although the acoustic signal corresponding to the target sound contains signal components of the noise sound, the listener is hardly bothered by the noise because its signal level (volume) is small.
That is, in the present invention, removal of the signal components of the noise sound is prioritized when the volume of the noise sound is large, and suppression of musical noise is prioritized over removal of the signal components of the noise sound when the volume of the noise sound is small. Therefore, according to the present invention, even when a specific noise sound (non-target sound), or a plurality of noise sounds arriving from different directions, reaches the main microphone at relatively high levels, the acoustic signal corresponding to the target sound can be extracted (reproduced) as faithfully as possible, and musical noise that makes the listener uncomfortable can be suppressed.
[0010]
Also, as an example of the specific processing contents executed by the means provided in the target sound extraction device according to the present invention, the combination of the processing shown in the following (1-5) to (1-7) is conceivable.
(1-5) For each combination of the main acoustic signal and each of the plurality of sub acoustic signals, the sound source separation means performs sound source separation processing that separates and generates, based on both acoustic signals of the combination, a target sound separation signal corresponding to the target sound and the reference sound separation signal, so that a plurality of target sound separation signals and a plurality of reference sound separation signals are generated.
(1-6) The signal level detection means detects the signal level of each of the plurality of reference sound separation signals.
(1-7) The spectral subtraction processing means performs the compression correction on each of the plurality of reference sound separation signals, and subtracts the plurality of frequency spectra obtained by the compression correction from the frequency spectrum of the target-sound-corresponding signal obtained by integrating the plurality of target sound separation signals.
Also, as another example of the specific processing contents executed by the means provided in the target sound extraction device according to the present invention, the combination of the processing shown in the following (1-8) to (1-10) is conceivable.
(1-8) For each combination of the main acoustic signal and each of the plurality of sub acoustic signals, the sound source separation means performs sound source separation processing that separates and generates, based on both acoustic signals of the combination, a target sound separation signal corresponding to the target sound and the reference sound separation signal.
(1-9) The signal level detection means detects the signal level of a signal obtained by integrating the plurality of reference sound separation signals.
(1-10) The spectral subtraction processing means subtracts the frequency spectrum obtained by performing the compression correction on the signal obtained by integrating the plurality of reference sound separation signals from the frequency spectrum of the target-sound-corresponding signal obtained by integrating the plurality of target sound separation signals.
In the present invention, it is also conceivable that the detection of the signal level by the signal level detection means and the compression correction by the spectral subtraction processing means are performed for each of a plurality of predetermined frequency bands. As a result, the compression correction can be performed with a different compression ratio for each of the frequency band divisions, and the target sound extraction performance and the musical noise suppression performance can be enhanced by more detailed signal processing.
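The per-frequency-band variant just described could be sketched as follows; the band edges and the level-to-coefficient function are placeholders of my own choosing, since this passage does not specify them:

```python
import numpy as np

def per_band_coefficients(reference_spec, band_edges, coeff_fn):
    """Compute one compression coefficient per frequency band from the
    band-wise level of the reference spectrum.
    band_edges: bin indices delimiting the bands, e.g. [0, 2, 4].
    coeff_fn: assumed mapping from band level to coefficient."""
    coeffs = np.empty(len(reference_spec))
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        band = reference_spec[lo:hi]
        level = float(np.sqrt(np.mean(np.abs(band) ** 2)))  # per-band level detection
        coeffs[lo:hi] = coeff_fn(level)   # different compression ratio per band
    return coeffs
```

The resulting coefficient array would then scale the reference spectrum bin by bin before the subtraction, so a band with loud noise is subtracted strongly while a quiet band is left nearly untouched.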
[0011]
Furthermore, the present invention can also be understood as a target sound extraction program that causes a computer to execute the processing executed by each means of the target sound extraction device described above. That is, the target sound extraction program according to the present invention is a program that causes a computer to execute processing for extracting an acoustic signal corresponding to the target sound, and outputting the extracted signal, based on the main acoustic signal obtained through the main microphone that mainly inputs the target sound output from the predetermined target sound source, and on one or more sub acoustic signals obtained through one or more sub microphones arranged at positions different from the main microphone or having directivity in directions different from that of the main microphone; it causes the computer to execute the processing described in the following (2-1) to (2-3).
(2-1) Sound source separation processing for separating and generating, based on the main acoustic signal and the sub acoustic signals, one or more reference sound separation signals corresponding to reference sounds other than the target sound.
(2-2) Signal level detection processing for detecting the signal level of a reference-sound-corresponding signal, which is either each of the reference sound separation signals or a signal obtained by integrating a plurality of the reference sound separation signals.
(2-3) Spectral subtraction processing for, when the detected signal level detected by the signal level detection processing falls within a predetermined range, compression-correcting the frequency spectrum of the reference-sound-corresponding signal with a compression ratio that increases as the detected signal level decreases, subtracting the frequency spectrum obtained by the compression correction from the frequency spectrum of the target-sound-corresponding signal, which is the main acoustic signal or a signal obtained by subjecting the main acoustic signal to predetermined signal processing, thereby extracting the acoustic signal corresponding to the target sound from the target-sound-corresponding signal, and outputting that acoustic signal.
A computer that executes the target sound extraction program described above provides the same effects as the target sound extraction device according to the present invention described above. The present invention can also be understood as a target sound extraction method in which each process in the target sound extraction program according to the present invention described above is executed by a computer.
[0012]
According to the present invention, high noise removal performance can be ensured under an acoustic environment in which different noises arrive at each microphone from a plurality of directions, or under an acoustic environment in which the target sound mixes into any of the sub microphones at a relatively large volume, and even when such an acoustic environment changes. Furthermore, according to the present invention, removal of the signal components of the noise sound is prioritized when the volume of the noise sound is large, and suppression of musical noise is prioritized over removal of the signal components of the noise sound when the volume of the noise sound is small; therefore, musical noise that makes the listener feel uncomfortable can be suppressed.
[0013]
Hereinafter, embodiments of the present invention will be described with reference to the accompanying drawings for an understanding of the present invention. The following embodiments are examples embodying the present invention and are not of a nature that limits the technical scope of the present invention. FIG. 1 is a block diagram showing a schematic configuration of a target sound extraction apparatus X1 according to a first embodiment of the present invention; FIG. 2 is a block diagram showing a schematic configuration of a target sound extraction apparatus X2 according to a second embodiment of the present invention; FIG. 3 is a block diagram showing a schematic configuration of a target sound extraction apparatus X3 according to a third embodiment of the present invention; FIG. 4 is a diagram showing an example of the relationship between the level of the reference-sound-corresponding signal and the compression coefficient in the target sound extraction apparatuses X1 to X3; FIG. 5 is a diagram showing an example of the relationship between the level of the reference-sound-corresponding signal and the subtraction amount of the spectral subtraction processing in the target sound extraction apparatuses X1 to X3; FIG. 6 is a diagram showing an example of the relationship between the level of the reference-sound-corresponding signal and the compression ratio of the reference-sound-corresponding signal spectrum in the apparatuses X1 to X3; and FIG. 7 is a block diagram showing a further configuration.
[0014]
First Embodiment
First, a target sound extraction device X1 according to a first embodiment of the present invention will be described with reference to the block diagram shown in FIG. 1. As shown in FIG. 1, the target sound extraction device X1 includes an acoustic input device V1 comprising a plurality of microphones, a plurality of (three in FIG. 1) sound source separation processing units 10 (10-1 to 10-3), a target sound separation signal integration processing unit 20, a spectral subtraction processing unit 31, and a level detection / coefficient setting unit 32. Here, the acoustic input device V1 includes one main microphone 101 and a plurality of (three in FIG. 1) sub microphones 102 (102-1 to 102-3). The main microphone 101 and the plurality of sub microphones 102 are arranged at a plurality of different positions, or have directivity in a plurality of different directions. The main microphone 101 is an acoustic input means that mainly inputs the sound (hereinafter referred to as the target sound) emitted by a predetermined target sound source (for example, a speaker capable of moving within a predetermined range). The plurality of sub microphones 102-1 to 102-3 are acoustic input means that are disposed at a plurality of positions different from that of the main microphone 101, or have directivity in a plurality of directions different from each other, and that mainly input reference sounds (noise) other than the target sound. The term sub microphone 102 refers generically to the plurality of sub microphones 102-1 to 102-3. The main microphone 101 and the sub microphones 102 shown in FIG. 1 are each microphones having directivity, and the sub microphones 102 are disposed so as to have directivity in a plurality of directions different from that of the main microphone 101.
[0015]
When the main microphone 101 and the sub microphones 102 are each microphones having directivity, it is desirable that, with the directional center direction (front direction) of the main microphone 101 as the reference (0°), the directional center directions (front directions) of the sub microphones 102 be set in directions of less than +180° on one side (for example, the +90° direction) and less than −180° on the other side (for example, the −90° direction). Further, it is also conceivable that the pointing directions of the microphones 101 and 102 are set to directions different from each other within the same plane, or to three-dimensionally different directions.
[0016]
Then, the target sound extraction device X1 extracts an acoustic signal corresponding to the
target sound based on the main sound signal obtained through the main microphone 101 and
the sub sound signals obtained through the plurality of sub microphones 102, and outputs the
extracted signal (hereinafter referred to as the target sound extraction signal). In the target
sound extraction device X1, the sound source separation processing unit 10, the target sound
separation signal integration processing unit 20, the spectrum subtraction processing unit 31,
and the level detection / coefficient setting unit 32 are embodied by, for example, a DSP (Digital
Signal Processor), which is an example of a computer, and a ROM, ASIC, or the like storing the
program executed by the DSP. In this case, the ROM stores in advance a program for causing
the DSP to execute the processing (described later) performed by the sound source separation
processing unit 10, the target sound separation signal integration processing unit 20, the
spectrum subtraction processing unit 31, and the level detection / coefficient setting unit 32.
[0017]
The sound source separation processing units 10 (10-1 to 10-3) are provided one for each
combination of the main audio signal and one of the plurality of sub audio signals. Based on the
main audio signal and the sub audio signal of that combination, each unit performs sound source
separation processing that separates and generates a target sound separation signal, which is
the separation signal (identification signal of the target sound) corresponding to the target
sound, and a reference sound separation signal (identification signal of the reference sound)
corresponding to the reference sound (which may be called noise) other than the target sound
(an example of the sound source separation means). Hereinafter, in the first embodiment of the
present invention, the reference sound separation signal may also be referred to as the
reference sound corresponding signal; in the first embodiment, the two terms denote the same
signal. An A/D converter (not shown) is provided between each of the microphones 101 and
102 and the sound source separation processing unit 10, and the acoustic signal converted into
a digital signal by the A/D converter is transmitted to the sound source separation processing
unit 10. For example, when the target sound is a human voice, it may be digitized at a sampling
rate of about 8 kHz. Here, the sound source separation processing unit 10 (10-1 to 10-3)
performs, for example, sound source separation processing by the blind source separation (BSS)
method based on the independent component analysis (ICA) method shown in Non-Patent
Document 2 and Non-Patent Document 3.
[0018]
A sound source separation device Z, which is an example of a device that can be adopted as the
sound source separation processing unit 10, will be described below with reference to the block
diagram shown in FIG. 14. The sound source separation device Z operates in a state in which a
plurality of sound sources and the plurality of microphones 101 and 102 exist in a
predetermined acoustic space. When a plurality of mixed speech signals (signals in which the
individual sound source signals from the respective sound sources are superimposed) are
sequentially input, the device performs BSS sound source separation based on the ICA method
on the mixed speech signals in the frequency domain, that is, sound source separation based on
the FDICA (Frequency Domain ICA) method, and sequentially generates a plurality of separated
signals (signals identifying the sound source signals) corresponding to the sound source signals.
[0019]
In the FDICA method, the input mixed speech signal x(t) is first divided by the ST-DFT
processing unit 13 into frames of a predetermined period, and a short-time discrete Fourier
transform (hereinafter, ST-DFT processing) is performed on each frame to analyze the observed
signal over short time windows. Then, the separation operation processing unit 11f performs
separation operation processing based on the separation matrix W(f) on the signal of each
channel (the signal of each frequency component) after the ST-DFT processing, thereby
identifying the sound sources (identifying the sound source signals). Here, with f denoting the
frequency bin and m the analysis frame number, the separated signal (identification signal)
y(f, m) can be expressed as the following equation (1). The update equation of the separation
filter W(f) can be expressed, for example, as the following equation (2). According to this FDICA
method, sound source separation is treated as an instantaneous mixing problem in each narrow
band, and the separation filter (separation matrix) W(f) can be updated relatively easily and
stably. In FIG. 14, the separation signal y1(f) corresponding to the main microphone 101 is the
target sound separation signal, and the separation signal y2(f) corresponding to the sub
microphone 102 is the reference sound separation signal. The reference sound separation signal
(separation signal y2(f)) is an acoustic signal in the frequency domain. Although FIG. 14 shows
an example in which the number of channels of the input mixed audio signals x1 and x2 (that is,
the number of microphones) is two, the same configuration can be realized with three or more
channels as long as (number of channels n) ≥ (number of sound sources m).
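Equations (1) and (2) referenced above are not reproduced in this machine translation. In the standard FDICA formulation (a sketch based on the cited BSS/ICA literature, not necessarily the patent's exact notation), they take the following forms:

```latex
% Separation per frequency bin f and analysis frame m (the form equation (1) takes):
y(f, m) = W(f)\, x(f, m)

% A common natural-gradient update of the separation matrix (the form equation (2)
% typically takes), with step size \eta, nonlinear function \varphi(\cdot),
% unit matrix I, Hermitian transpose H, and time averaging \langle \cdot \rangle_m:
W_{i+1}(f) = W_i(f)
  + \eta \left[ I - \left\langle \varphi\!\left(y(f, m)\right)\, y(f, m)^{H} \right\rangle_m \right] W_i(f)
```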
[0020]
Further, the level detection / coefficient setting unit 32 executes a process of detecting the
signal level (the magnitude of the signal value, that is, the volume) of each of the plurality of
reference sound separation signals (reference sound corresponding signals), and a process of
setting, according to the detected level, the compression coefficient used in the processing of the
spectrum subtraction processing unit 31 (an example of the signal level detection means). For
example, the level detection / coefficient setting unit 32 detects, as the signal level, an average
value or a total value of the signal values of the frequency spectrum of each of the plurality of
reference sound separation signals (the signal value of each frequency bin in the reference
sound separation signal in the frequency domain), or a value obtained by normalizing such a
value based on a predetermined reference value. It is also conceivable for the level detection /
coefficient setting unit 32 to detect, as the signal level, an average value or total value of the
signal values of the frequency bins belonging to each of a plurality of predetermined frequency
bands in the frequency spectrum of each reference sound separation signal, or a value obtained
by normalizing such values based on a predetermined reference value. The division into
frequency bands may be, for example, a division by individual frequency bins in the frequency
spectrum of the reference sound separation signal, or a division into frequency bands each
defined by a combination of a plurality of frequency bins.
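As a concrete illustration of the level detection described in this paragraph, the following sketch computes a signal level as the mean or total of the frequency-bin magnitudes, optionally normalized against a reference value or split into bands. Function names and arguments are hypothetical, not taken from the patent:

```python
import numpy as np

def detect_level(ref_spectrum, mode="mean", ref_value=None):
    """Detect the signal level of a reference sound separation signal.

    ref_spectrum: spectrum values, one per frequency bin (may be complex).
    mode: "mean" or "sum" of the bin magnitudes, as in paragraph [0020].
    ref_value: optional predetermined reference value for normalization.
    """
    mags = np.abs(np.asarray(ref_spectrum, dtype=complex))
    level = mags.mean() if mode == "mean" else mags.sum()
    if ref_value is not None:
        level /= ref_value  # normalize against the reference value
    return level

def detect_band_levels(ref_spectrum, band_edges):
    """Per-band levels: mean magnitude of the bins in each (lo, hi) band."""
    mags = np.abs(np.asarray(ref_spectrum, dtype=complex))
    return [mags[lo:hi].mean() for lo, hi in band_edges]
```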
[0021]
For each of the plurality of reference sound separation signals, when the detected signal level L
is within a predetermined range, the level detection / coefficient setting unit 32 sets the
compression coefficient α such that the smaller the detected level L, the smaller the value of α.
The compression coefficient α (0 ≤ α ≤ 1) is a coefficient used in the spectrum subtraction
processing described later; its details are described below. The subscript i of the compression
coefficient α in FIG. 1 is an identification number corresponding to each of the plurality of
reference sound separation signals. FIG. 4 shows an example of the relationship between the
detection level L (vertical axis) and the compression coefficient α (horizontal axis) for the
reference sound corresponding signal (the reference sound separation signal in the first
embodiment). The graph line g1 in FIG. 4 represents a situation in which the compression
coefficient α is set in positive proportion to the detection level L when the detection signal level
L is in the range from 0 to Ls2. The graph line g2 in FIG. 4 represents a situation in which the
compression coefficient α is set in positive proportion to the detection level L when the
detection signal level L is in the range from a predetermined lower limit Ls1 (> 0) to an upper
limit Ls2; when the compression coefficient α of the graph line g2 is used and the detection
signal level L does not reach the lower limit level Ls1, the compression coefficient α is set to 0
(zero). The level detection / coefficient setting unit 32 sets the compression coefficient α
according to the detection signal level L as indicated by the graph line g1 or g2 in FIG. 4. For
comparison with the compression coefficient α set by the level detection / coefficient setting
unit 32, FIG. 4 also shows a graph line g0 (dashed line) representing a situation in which the
compression coefficient α is constant regardless of the detection signal level L.
[0022]
Further, in the target sound extraction device X1, the target sound separation signal integration
processing unit 20 executes a process of integrating the plurality of target sound separation
signals separately generated by the sound source separation processing units 10, and outputs
the resulting integrated signal. Hereinafter, in the first embodiment, the integrated signal
obtained by integrating the plurality of target sound separation signals is referred to as the
target sound corresponding signal. For example, the target sound separation signal integration
processing unit 20 synthesizes the plurality of target sound separation signals by executing
averaging processing or weighted averaging processing for each of the divided frequency
components (frequency bins) of the plurality of target sound separation signals. Further, in the
target sound extraction device X1, the spectrum subtraction processing unit 31 performs
spectrum subtraction processing on the target sound corresponding signal (integrated signal)
obtained by the target sound separation signal integration processing unit 20 and the plurality
of reference sound separation signals separately generated by the sound source separation
processing units 10, thereby extracting an acoustic signal corresponding to the target sound
from the target sound corresponding signal and outputting the extracted signal (the target
sound extraction signal).
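The bin-wise averaging described here can be sketched as follows (a hypothetical helper, assuming all separation signals share the same frequency grid):

```python
import numpy as np

def integrate_separation_signals(signals, weights=None):
    """Integrate several target sound separation signals per frequency bin.

    signals: list of spectra of equal length, one per sound source
             separation unit.
    weights: optional weights for weighted averaging; plain averaging
             is used otherwise.
    """
    spectra = np.asarray(signals, dtype=complex)   # shape (n_signals, n_bins)
    if weights is None:
        return spectra.mean(axis=0)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                                # normalize the weights
    return (w[:, None] * spectra).sum(axis=0)      # weighted average per bin
```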
[0023]
Hereinafter, a specific example of the processing by the spectrum subtraction processing unit 31
will be described. Let Y(f, m) be the spectral value (the signal value of each frequency bin in the
frequency spectrum) of the observed signal, which is an acoustic signal in the frequency domain
(in the first embodiment, the signal obtained by integrating the target sound separation signals);
let S(f, m) be the spectral value of the target sound signal; and let N(f, m) be the spectral value of
the noise signal (a signal of sound other than the target sound). The spectral value Y(f, m) of the
observed signal is then expressed by the following equation (3). In the target sound extraction
device X1, it is assumed that there is no correlation between the target sound signal and the
noise signal, the spectral value of the reference sound corresponding signal is used as the
spectral value N(f, m) of the noise signal, and the spectral estimate of the target sound signal
(that is, the spectral value of the target sound extraction signal) is calculated (extracted) based
on the following equation (4). The compression coefficient α in equation (4) is the coefficient
set by the level detection / coefficient setting unit 32 according to the detection signal level L.
The term in equation (4) in which the spectral value of the reference sound corresponding
signal is multiplied by the compression coefficient α can be regarded as the term that
compresses and corrects the spectral value of the reference sound corresponding signal based
on the compression coefficient α. The suppression coefficient β in equation (4) is usually set to
0 (zero) or a very small value close to 0.
[0024]
FIG. 5 shows an example of the relationship between the detection level L (vertical axis) for the
reference sound separation signal (denoted as the reference sound corresponding signal in the
drawing), which is the signal corresponding to the reference sound, and the subtraction amount
in the spectrum subtraction processing based on equation (4). The subtraction amount is the
spectral value after compression correction, assuming that the spectral value of the reference
sound corresponding signal is proportional to the detection signal level L. The graph line g1' in
FIG. 5 shows the subtraction amount when the compression coefficient α indicated by the
graph line g1 in FIG. 4 is set; the graph line g2' in FIG. 5 shows the subtraction amount when
the compression coefficient α indicated by the graph line g2 in FIG. 4 is set; and the graph line
g0' in FIG. 5 shows the subtraction amount when the compression coefficient α is constant
(graph line g0 in FIG. 4). Further, FIG. 6 shows an example of the relationship between the
detection level L (vertical axis) for the reference sound separation signal (denoted as the
reference sound corresponding signal in the drawing) and the compression ratio R in the
compression correction of the spectrum of the reference sound corresponding signal (the
reference sound separation signal) performed in the spectrum subtraction processing. The
compression ratio is the ratio of the signal value before compression correction to the signal
value after compression correction (the compression amount in FIG. 4), that is, R = 1/α. As
shown in FIG. 6, in the target sound extraction device X1, when the detection signal level is
within a predetermined range (for example, 0 to Ls2 or Ls1 to Ls2), the compression coefficient
α is set to a smaller value as the detection signal level L decreases (see FIG. 4); therefore, within
that predetermined range, the spectrum subtraction processing unit 31 compresses and corrects
the frequency spectrum of the reference sound corresponding signal with a larger compression
ratio R as the detection signal level L decreases. The predetermined range may be taken to be
the entire range that the detection signal level can assume.
[0025]
The processing of the spectrum subtraction processing unit 31 based on the compression
coefficient α described above can be summarized as follows. That is, when the detection signal
level L is within a predetermined range (for example, 0 to Ls2 or Ls1 to Ls2), the spectrum
subtraction processing unit 31 (an example of the spectrum subtraction processing means)
compresses and corrects the frequency spectrum of each of the plurality of reference sound
corresponding signals with a larger compression ratio R as the detection signal level L
decreases, and subtracts the plurality of frequency spectra obtained by the compression
correction from the frequency spectrum of the target sound corresponding signal obtained by
performing the sound source separation processing and the integration processing on the main
sound signal, thereby extracting an acoustic signal corresponding to the target sound from the
target sound corresponding signal and outputting that acoustic signal (the target sound
extraction signal). When the compression coefficient α indicated by the graph line g2 in FIG. 4
is set, the spectrum subtraction processing unit 31 outputs the result of the frequency spectrum
subtraction processing as the target sound extraction signal when the detection signal level L is
equal to or higher than the lower limit level Ls1; when the detection signal level is less than the
lower limit level Ls1, the compression coefficient α is set to 0, so the target sound
corresponding signal is output unchanged as the target sound extraction signal (the acoustic
signal corresponding to the target sound) (an example of the target sound corresponding signal
output unit).
[0026]
By the processing of the spectrum subtraction processing unit 31 described above, when the
level L of the reference sound corresponding signal is large (that is, when the volume of the
noise sound is large), the signal components that grate on the listener's ear are actively removed
from the target sound corresponding signal, and an acoustic signal corresponding to the target
sound is extracted as faithfully as possible. In that case, although the extracted signal (the target
sound extraction signal) may include some musical noise, it is an acoustic signal that is easier
for the listener to hear than one in which the signal components of the noise sound remain.
Here, in spectrum subtraction processing in which the compression coefficient α is a constant
value (graph line g0 shown in FIG. 4), musical noise is likely to occur in the output signal (the
target sound extraction signal). In contrast, in the processing of the spectrum subtraction
processing unit 31, when the level L of the reference sound corresponding signal is small (that
is, when the volume of the noise sound is small), the compression coefficient α is set small, and
the process of removing the signal components of the reference sound corresponding signal
from the target sound corresponding signal is not performed aggressively; thereby, the musical
noise that may be offensive to the listener is suppressed. In this case, although the target sound
extraction signal includes signal components of the noise sound, the listener is hardly bothered
by the noise sound because its signal level (volume) is small. That is, in the present invention,
removal of the signal components of the noise sound is prioritized when the volume of the noise
sound is large, and suppression of musical noise is prioritized over removal of the signal
components of the noise sound when the volume of the noise sound is small. Therefore,
according to the target sound extraction device X1, in a situation where a specific noise sound
(non-target sound) or a plurality of noise sounds arriving from different directions reach the
main microphone at relatively high levels, the acoustic signal corresponding to the target sound
can be extracted (reproduced) as faithfully as possible while suppressing musical noise that
causes discomfort to the listener.
[0027]
Second Embodiment Next, a target sound extraction device X2 according to a second
embodiment of the present invention will be described with reference to the block diagram
shown in FIG. 2. In FIG. 2, among the components of the target sound extraction device X2,
components that execute the same processing as those of the target sound extraction device X1
are assigned the same reference numerals as in FIG. 1. As shown in FIG. 2, the target sound
extraction device X2 includes, like the target sound extraction device X1, the sound input device
V1 including a plurality of microphones, the plurality of (three in FIG. 2) sound source
separation processing units 10 (10-1 to 10-3), and the target sound separation signal
integration processing unit 20, all of which are the same as those provided in the target sound
extraction device X1. Furthermore, the target sound extraction device X2 includes a spectrum
subtraction processing unit 31', a level detection / coefficient setting unit 32', and a reference
sound separation signal integration processing unit 33. In the target sound extraction device
X2, the sound source separation processing unit 10, the target sound separation signal
integration processing unit 20, the spectrum subtraction processing unit 31', and the level
detection / coefficient setting unit 32' are embodied by, for example, a DSP, which is an example
of a computer, and a ROM, ASIC, or the like storing the program executed by the DSP. In this
case, the ROM stores in advance a program for causing the DSP to execute the processing
performed by the sound source separation processing unit 10, the target sound separation
signal integration processing unit 20, the spectrum subtraction processing unit 31', and the
level detection / coefficient setting unit 32'.
[0028]
The target sound extraction device X2 also extracts an acoustic signal corresponding to the
target sound based on the main sound signal obtained through the main microphone 101 and
the sub sound signals obtained through the plurality of sub microphones 102, and outputs the
extracted signal (the target sound extraction signal). In the target sound extraction device X2,
the reference sound separation signal integration processing unit 33 executes a process of
integrating the plurality of reference sound separation signals separately generated by the
sound source separation processing units 10, and outputs the resulting integrated signal.
Hereinafter, in the second embodiment, the integrated signal obtained by integrating the
plurality of reference sound separation signals is referred to as the reference sound
corresponding signal. For example, the reference sound separation signal integration processing
unit 33 synthesizes the plurality of reference sound separation signals by performing averaging
processing or weighted averaging processing for each of the divided frequency components
(frequency bins). Further, the level detection / coefficient setting unit 32' in the target sound
extraction device X2 executes a process of detecting the signal level (the magnitude of the signal
value, that is, the volume) of the reference sound corresponding signal (integrated signal)
obtained by the reference sound separation signal integration processing unit 33, and a process
of setting, according to the detected level, the compression coefficient α used in the processing
of the spectrum subtraction processing unit 31' (an example of the signal level detection
means). The processing content is the same as that of the level detection / coefficient setting
unit 32. Further, the spectrum subtraction processing unit 31' in the target sound extraction
device X2 performs spectrum subtraction processing on the target sound corresponding signal
(integrated signal) obtained by the target sound separation signal integration processing unit 20
and the reference sound corresponding signal (integrated signal) obtained by the reference
sound separation signal integration processing unit 33, thereby extracting an acoustic signal
corresponding to the target sound from the target sound corresponding signal and outputting
the extracted signal (the target sound extraction signal). The processing content is the same as
that of the spectrum subtraction processing unit 31. The target sound extraction device X2
described above achieves the same effects as the target sound extraction device X1. Such a
target sound extraction device X2 is also an example of an embodiment of the present
invention.
[0029]
Third Embodiment Next, a target sound extraction device X3 according to a third embodiment
of the present invention will be described with reference to the block diagram shown in FIG. 3.
In FIG. 3, among the components of the target sound extraction device X3, components that are
the same as those of the target sound extraction device X1 are denoted by the same reference
numerals as in FIG. 1. As shown in FIG. 3, the target sound extraction device X3 includes the
sound input device V1 including a plurality of microphones, the plurality (three in FIG. 3) of
sound source separation processing units 10 (10-1 to 10-3), a spectrum subtraction processing
unit 31', and the level detection / coefficient setting unit 32. Here, the sound input device V1,
the sound source separation processing units 10, and the level detection / coefficient setting
unit 32 are the same as those included in the target sound extraction device X1. However, the
sound source separation processing units 10 in the target sound extraction device X3 need not
output the target sound separation signal. The target sound extraction device X3 also extracts
an acoustic signal corresponding to the target sound based on the main sound signal obtained
through the main microphone 101 and the sub sound signals obtained through the plurality of
sub microphones 102, and outputs the extracted signal (the target sound extraction signal). In
the target sound extraction device X3, the sound source separation processing unit 10, the
spectrum subtraction processing unit 31', and the level detection / coefficient setting unit 32
are embodied by, for example, a DSP, which is an example of a computer, and a ROM, ASIC, or
the like storing the program executed by the DSP. In this case, a program for causing the DSP to
execute the processing performed by the sound source separation processing unit 10 and the
spectrum subtraction processing unit 31' is stored in advance in the ROM.
[0030]
In the target sound extraction device X3, the spectrum subtraction processing unit 31' performs
spectrum subtraction processing on the main sound signal (corresponding to the target sound
corresponding signal) obtained through the main microphone 101 and the plurality of reference
sound separation signals (corresponding to the reference sound corresponding signals)
separately generated by the sound source separation processing units 10, thereby extracting an
acoustic signal corresponding to the target sound from the target sound corresponding signal
and outputting the extracted signal (the target sound extraction signal). That is, although the
spectrum subtraction processing unit 31' in the target sound extraction device X3 performs the
same frequency spectrum subtraction processing as the spectrum subtraction processing unit
31 in the target sound extraction device X1, it differs in that the frequency spectrum obtained
by the compression correction of each reference sound separation signal is subtracted from the
frequency spectrum of the main sound signal (an example of the target sound corresponding
signal). In the target sound extraction device X3, the target sound corresponding signal
subjected to the spectrum subtraction is the main sound signal, which has not undergone sound
source separation processing and therefore contains relatively large noise sound components.
Accordingly, the compression coefficient α in the target sound extraction device X3 is usually
set to a larger value (a value close to 1) than the compression coefficient α in the target sound
extraction device X1. The target sound extraction device X3 described above also achieves the
same effects as the target sound extraction device X1. Such a target sound extraction device X3
is also an example of an embodiment of the present invention.
[0031]
When the detection signal level L is within a predetermined range (0 to Ls2 or Ls1 to Ls2), the
compression coefficient α indicated by the graph lines g1'' and g2'' in FIG. 6 is in positive
proportion to the detection signal level L (a relation expressed by a linear expression); however,
the relation between the detection signal level L and the compression coefficient α may instead
be a nonlinear relation, such as one expressed by a quadratic or cubic expression. Further, the
sound source separation processing unit 10 (for example, one performing sound source
separation processing based on the FDICA method) may receive three or more sound signals as
input, for example one main sound signal and three sub sound signals, and separately generate
one target sound separation signal and three reference sound separation signals. It is therefore
also conceivable, in the target sound extraction devices X1 to X3, to separately generate one
target sound separation signal and a plurality of reference sound separation signals with a
single sound source separation processing unit 10. In the embodiments described above, the
target sound extraction devices X1 to X3 include a plurality of the sub microphones 102, but
embodiments (hereinafter referred to as target sound extraction devices X1', X2', X3') including
one main microphone 101 and one sub microphone 102 differing from it in position or in
direction of directivity are also conceivable. For example, the target sound extraction device X1'
according to the first embodiment is obtained from the configuration of the target sound
extraction device X1 shown in FIG. 1 by omitting the two sub microphones 102-2 and 102-3,
the two sound source separation processing units 10-2 and 10-3, and the target sound
separation signal integration processing unit 20. In this case, the target sound separation signal
obtained by the sound source separation processing unit 10-1 is the target sound
corresponding signal to be processed by the spectrum subtraction processing unit 31. The
target sound extraction device X2' according to the second embodiment is obtained from the
configuration of the target sound extraction device X2 shown in FIG. 2 by omitting the two sub
microphones 102-2 and 102-3, the two sound source separation processing units 10-2 and
10-3, the target sound separation signal integration processing unit 20, and the reference sound
separation signal integration processing unit 33. In this case, the target sound separation signal
and the reference sound separation signal obtained by the sound source separation processing
unit 10-1 are the target sound corresponding signal and the reference sound corresponding
signal to be processed by the spectrum subtraction processing unit 31. Further, the target sound
extraction device X3' according to the third embodiment is obtained from the configuration of
the target sound extraction device X3 shown in FIG. 3 by omitting the two sub microphones
102-2 and 102-3 and the two sound source separation processing units 10-2 and 10-3. The
target sound extraction devices X1' to X3' described above are also regarded as embodiments of
the present invention.
[0032]
Further, in the embodiments described above, in the target sound extraction apparatuses X1 and
X2 (FIG. 1 and FIG. 2), the signal obtained by performing sound source separation processing
based on the main sound signal and the plurality of sub sound signals and then integrating the
plurality of resulting target sound separation signals serves as the target sound corresponding
signal subjected to the spectral subtraction processing. Alternatively, it is also conceivable to
obtain the target sound corresponding signal (the target of the spectral subtraction processing)
by integrating the main sound signal and the plurality of sub sound signals themselves, for
example by a weighted synthesis process. In that weighted synthesis process, it is conceivable to
make the weight for the main sound signal larger than the weights for the sub sound signals. In
addition, in the embodiment described above, an example was shown in which, in the target
sound extraction apparatus X2 (FIG. 2), the level detection / coefficient setting unit 32' detects
the level of a signal obtained by integrating the plurality of reference sound separation signals.
However, in the target sound extraction apparatus X2, it is also conceivable that the level
detection / coefficient setting unit 32' detects the signal level of each of the plurality of reference
sound separation signals and sets the compression coefficient α based on the detected plurality
of signal levels (for example, based on their average level or total level).
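The two variants described in this paragraph can be sketched as follows. This is a hypothetical Python illustration: the specific weight values, the even split of the remaining weight across the sub signals, the use of the average of the per-signal levels, and the linear level-to-α mapping are assumptions for illustration, not values taken from the specification.

```python
import numpy as np

def integrate_signals(main_sig, sub_sigs, main_weight=0.7):
    """Weighted synthesis of the main acoustic signal with several sub
    acoustic signals. The main signal receives the larger weight, and the
    remaining weight is split evenly across the sub signals."""
    sub_weight = (1.0 - main_weight) / len(sub_sigs)
    return main_weight * main_sig + sub_weight * np.sum(sub_sigs, axis=0)

def alpha_from_levels(levels, lo=0.05, hi=0.5):
    """Set the compression coefficient alpha from the per-signal reference
    levels, here via their average (the total could be used instead), with
    the same saturating linear mapping over [lo, hi]."""
    avg = float(np.mean(levels))
    return float(np.clip((avg - lo) / (hi - lo), 0.0, 1.0))

# Example: combine a main signal with two sub signals, then derive alpha.
main = np.ones(4)
subs = [np.ones(4), np.ones(4)]
combined = integrate_signals(main, subs, main_weight=0.7)
alpha = alpha_from_levels([0.3, 0.4])
```

Giving the main sound signal the dominant weight reflects the idea that the main microphone is oriented toward the target sound source, so its signal should dominate the integrated target sound corresponding signal.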
[0033]
The present invention is applicable to a target sound extraction apparatus that extracts and
outputs an acoustic signal corresponding to a target sound from an acoustic signal containing
both a target sound component and a noise component.
[0034]
FIG. 1 is a block diagram showing a schematic configuration of a target sound extraction
apparatus X1 according to a first embodiment of the present invention. FIG. 2 is a block diagram
showing a schematic configuration of a target sound extraction apparatus X2 according to a
second embodiment of the present invention. FIG. 3 is a block diagram showing a schematic
configuration of a target sound extraction apparatus X3 according to a third embodiment of the
present invention. FIG. 4 is a diagram showing an example of the relation between the level of the
reference sound corresponding signal in the target sound extraction apparatuses X1 to X3 and
the compression coefficient of the spectral subtraction processing. FIG. 5 is a diagram showing
an example of the relation between the level of the reference sound corresponding signal in the
target sound extraction apparatuses X1 to X3 and the subtraction amount of the spectral
subtraction processing. FIG. 6 is a diagram showing an example of the relation between the level
of the reference sound corresponding signal in the target sound extraction apparatuses X1 to X3
and the compression ratio of the reference sound corresponding signal spectrum. FIG. 8 is a
block diagram showing a schematic configuration of a sound source separation device Z that
performs BSS sound source separation processing based on the FDICA method.
Explanation of Reference Signs
[0035]
X1: Target sound extraction apparatus according to the first embodiment
X2: Target sound extraction apparatus according to the second embodiment
X3: Target sound extraction apparatus according to the third embodiment
V1: Acoustic input device
10 (10-1 to 10-3): Sound source separation processing unit
20: Target sound separation signal integration processing unit
31, 31′: Spectrum subtraction processing unit
32, 32′: Level detection / coefficient setting unit
33: Reference sound separation signal integration processing unit
101: Main microphone
102: Sub microphone