Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2008054071
PROBLEM TO BE SOLVED: Conventional sound source separation methods cannot remove the kinds of noise that are problematic in a video conference, such as paper rubbing sound, which occurs suddenly and whose elevation-angle difference from the voice is small. SOLUTION: The present invention holds a calculation unit that, by using a plurality of microphone intervals and a plurality of sub microphone arrays, can estimate the arrival direction with high accuracy even when the direction difference between sound sources is small. In addition, since the phase difference histogram calculation unit creates a histogram using only one frame of data, localization can be performed even for noise that occurs suddenly. [Selected figure] Figure 2
[0001]
The present invention relates to a high-speed, high-resolution sound source localization technique intended for application to voice communication apparatuses such as video conference systems.
[0002]
Sound source localization technology, which estimates the direction of arrival of a sound source, is an important technology applicable to the learning of sound source separation filters and to speaker-direction identification for robots, and it has been actively studied since the 1980s.
The simplest sound source localization method is the delay-and-sum array (see, for example, Non-Patent Document 1). The delay-and-sum array is a very lightweight, fast method because it consists only of multiplying the input signals by weighting factors and adding them. However, its localization performance is low, so when there are multiple sound sources the multiple sound source directions cannot be localized accurately. High-accuracy sound source localization techniques such as the MUSIC (MUltiple SIgnal Classification) method (see, for example, Non-Patent Document 2) have therefore been proposed, but MUSIC requires heavy processing such as eigenvalue calculation and has difficulty localizing from a single frame of data, so it cannot localize the direction of suddenly occurring noise. A sound source localization method is therefore required that consists of lightweight processing that runs even on an embedded CPU and that can localize from only one frame of data. Moreover, in the MUSIC method the amount of processing increases in proportion to the search resolution of the sound source direction. The DUET method (see, for example, Non-Patent Document 3) has been proposed as a sound source localization method whose processing amount is not proportional to the resolution and which does not require heavy processing such as eigenvalue calculation. With the conventional DUET method, however, high-accuracy source localization becomes difficult when multiple sound sources are physically close to each other.
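The delay-and-sum array can be illustrated with a short sketch. The following Python/NumPy code is not part of the patent; the array geometry, sampling rate and test signal are assumptions chosen only to show why the method is lightweight and why its cost grows with the angular search resolution.

```python
import numpy as np

def delay_and_sum_scan(x, fs, d, c=340.0, n_angles=181):
    """x: (M, T) microphone signals, fs: sample rate, d: element spacing [m].
    Returns candidate azimuths (deg) and the steered output power."""
    M, T = x.shape
    f = np.fft.rfftfreq(T, 1.0 / fs)                  # analysis frequencies
    X = np.fft.rfft(x, axis=1)                        # (M, F)
    angles = np.linspace(-90.0, 90.0, n_angles)
    power = np.empty(n_angles)
    for k, theta in enumerate(np.deg2rad(angles)):
        delays = np.arange(M) * d * np.sin(theta) / c
        # weights that compensate the propagation delay for direction theta
        w = np.exp(2j * np.pi * np.outer(delays, f))  # (M, F)
        y = np.sum(w * X, axis=0)                     # weighted sum of channels
        power[k] = np.sum(np.abs(y) ** 2)
    return angles, power

if __name__ == "__main__":
    fs, d, M, T = 16000, 0.04, 4, 16000               # assumed setup
    t = np.arange(T) / fs
    src = np.sin(2 * np.pi * 440 * t)
    f = np.fft.rfftfreq(T, 1.0 / fs)
    true_theta = np.deg2rad(30.0)
    # simulate far-field arrival from 30 degrees by per-channel fractional delays
    X = np.array([np.fft.rfft(src) *
                  np.exp(-2j * np.pi * f * m * d * np.sin(true_theta) / 340.0)
                  for m in range(M)])
    x = np.fft.irfft(X, n=T, axis=1)
    angles, power = delay_and_sum_scan(x, fs, d)
    print("estimated direction:", angles[np.argmax(power)], "deg")
```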
[0003]
Jiro Oga, Yoshio Yamazaki, Yutaka Kaneda, "Sound Systems and Digital Processing," The Institute of Electronics, Information and Communication Engineers (IEICE), 1995. Nobuyoshi Kikuma, "Adaptive Signal Processing with Array Antennas," Science and Technology Publishing, 1998. O. Yilmaz and S. Rickard, "Blind Separation of Speech Mixtures via Time-Frequency Masking," IEEE Trans. Signal Processing, Vol. 52, No. 7, 2004. Shoko Araki, Hiroshi Sawada, Ryo Mukai, Shoji Makino, "Direction Estimation of Sparse Signals Based on Clustering of Observed Signal Vectors," Proceedings of the 2006 National Meeting of the Acoustical Society of Japan, pp. 615-616, 2006.
[0004]
The voice band of teleconferencing systems currently on the market has shifted from the conventional telephone band (4 kHz) to the wide band (7 kHz), and in the future it is expected to shift further toward a voice band comparable to CD quality. Wide-band voice carries the high-frequency components of unvoiced consonants clearly and is easy to converse with, but because the noise is also widened in bandwidth, there is the problem that once noise occurs the voice immediately becomes hard to hear.
[0005]
Therefore, with the spread of wide-band voice communication devices such as video conferencing systems, the demand for noise suppression technology is increasing. In particular, it is required to suppress the sound of paper rubbing and the sound of objects hitting a desk at the far end of the conference. A noise canceller is often used to suppress stationary noise such as the sound of air conditioners and projector fans. However, the conventional noise canceller has almost no suppression effect on sudden, high-power non-stationary noise such as paper rubbing noise and the sound of tapping on a desk. For the purpose of suppressing such sudden noise, we have been developing a sound source separation technology that captures only the target sound by exploiting the difference between the arrival directions of the noise and the target sound when those directions differ. The separation performance of such a source separation method depends on how well the arrival directions of the noise and the target sound are estimated. In other words, if the directions of arrival of the noise and the target sound can be identified accurately, the sound source separation performance is good; conversely, when the directions of arrival are difficult to distinguish, the sound source separation performance is poor. Sound sources such as paper rubbing noise and the sound of tapping the desk are usually located on a desk, so the difference between the arrival direction of the user's speech and that of these noises is typically only about 20°, which is extremely small. Also, since conversational delay must be minimized in a video conference, the input speech must be processed quickly to generate the output speech. It is therefore necessary to estimate the direction of suddenly occurring noise from a small number of frames.
[0006]
The outline of a representative invention disclosed in the present application is as follows: an acoustic signal processing apparatus having a phase difference histogram calculation unit that, by using a plurality of microphone pairs with different microphone intervals, successively improves the localization accuracy and localizes the directions of a plurality of sound sources from a single frame of data.
[0007]
In a wide-band video conference, noise generated on the desk, such as paper rubbing noise, no longer impairs the ease of listening to the voice, and the conference can be held with easy-to-hear voice.
[0008]
The hardware configuration of this embodiment is shown in FIG. 1.
The central processing unit 1 carries out all the calculations in this embodiment. The storage device 2 is a working memory configured of, for example, RAM, and all variables used in the calculations are allocated in the storage device 2. All data and programs used in the calculations are assumed to be stored in the storage device 3, configured of, for example, ROM. The microphone array 4 is composed of at least two microphone elements. Each microphone element measures an analog sound pressure value. The number of microphone elements is M. The A/D conversion device 5 converts (samples) analog signals into digital signals and can synchronously sample M or more channels. The analog sound pressure value of each microphone element captured by the microphone array 4 is sent to the A/D conversion device 5, which converts the sound pressure value of each channel into digital data and outputs the quantized sound pressure values.
[0009]
The sound pressure values of each channel, converted into digital data, are processed by the central processing unit 1 via the storage device 2. Using the information on the existence ranges of the target sound and the noise stored in the storage device 3, the central processing unit 1 suppresses noise components such as paper rubbing noise in the per-channel sound pressure values and generates a signal in which the target voice is emphasized.
[0010]
A block diagram of the software of this embodiment is shown in FIG. 2. The microphone array 4 is arranged in a straight line. The analog sound pressure values detected by the microphone array 4 are sent to the A/D conversion means 6 and converted into digital data (Equation 1) for each channel, where i is an index representing the channel. A vector having the digital data of each channel as elements is written as (Equation 2). The signal (Equation 3) is sent to the Fourier transform unit 7. The Fourier transform unit 7 applies a Fourier transform to the digital data of each microphone channel and outputs a frequency-domain signal (Equation 4). (Equation 4) is a vector whose elements are the band-division signals of each channel and is defined by (Equation 5). The Fourier transform is a short-time Fourier transform; let τ be its frame index. The frame size L and the frame shift ST of the Fourier transform are set in advance. Hereinafter, the index τ representing the frame and the frequency f are omitted unless the frame or frequency needs to be made explicit, and the band-division signal is written as (Equation 6). The frequency-band signal output from the Fourier transform unit 7 is sent to the phase difference histogram calculation unit 8. The phase difference histogram calculation unit 8 calculates the phase differences between microphones of the frequency-band signal by (Equation 7), and, starting from the phase difference of the microphone pair with the shortest spacing, successively uses the phase differences calculated for the plurality of microphone pairs to improve the accuracy of the phase difference, and generates a histogram of the phase difference after this accuracy improvement. The method of calculating the estimated phase difference and the method of generating the histogram are described later.
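As a rough illustration of the front end described in this paragraph, the following hedged Python sketch performs a short-time Fourier transform per channel and computes the inter-microphone phase difference of each pair as the angle of the cross-spectrum, one plausible reading of (Equation 7); the frame size, window and pairing are assumptions.

```python
import numpy as np

def stft_frame(x, start, L, window=None):
    """Band-division of one frame of the multichannel signal x of shape (M, T)."""
    if window is None:
        window = np.hanning(L)
    frame = x[:, start:start + L] * window
    return np.fft.rfft(frame, axis=1)           # (M, F) band-division signal

def pair_phase_difference(X, pair):
    """Phase difference of one microphone pair for every frequency bin,
    taken as the angle of the cross-spectrum (cf. Equation 7)."""
    i0, i1 = pair
    return np.angle(X[i1] * np.conj(X[i0]))     # values in (-pi, pi]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((4, 4096))           # 4 channels of dummy audio
    X = stft_frame(x, start=0, L=512)
    pairs = [(0, 1), (0, 2), (0, 3)]              # spacing assumed to grow with index
    deltas = [pair_phase_difference(X, p) for p in pairs]
    print([d.shape for d in deltas])
```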
[0011]
[0012]
[0013]
[0014]
[0015]
[0016]
[0017]
[0018]
The obtained histogram of the phase difference is sent to the paper rubbing sound power calculation unit 11.
A physical space in which paper rubbing noise is highly likely to occur is set in advance.
Writing the azimuth angle of the sound source as θ, the set physical space is expressed as (Equation 8).
Although only the azimuth angle range is specified here, the elevation angle and the distance may also be restricted by a range.
Within this physical space, the possible values of the phase difference between the microphones are given by (Equation 9), where di is the microphone interval of the i-th microphone pair.
The paper rubbing sound power calculation unit 11 adds up P(δ) in the range of (Equation 9) and outputs the result as the paper rubbing sound power.
Further, from the estimated phase difference of each frequency, the paper rubbing sound power calculation unit 11 identifies the frequency bands satisfying (Equation 9) as bands in which the paper rubbing sound is dominant, and outputs the indices of those frequency bands. In the target sound power calculation unit 12, as in the paper rubbing sound power calculation unit 11, a physical space in which the target sound is highly likely to occur is set in advance as (Equation 10). Again, although only the azimuth angle range is specified here, the elevation angle and the distance may also be restricted by a range. Within this physical space, the possible values of the phase difference between the microphones are given by (Equation 11). In addition to computing the paper rubbing sound power from the frequencies of the entire band, the band may be divided into a plurality of band groups, for example every 1000 Hz, and the paper rubbing sound power may be calculated for each divided band group. Dividing into a plurality of band groups in this manner makes it possible to estimate the paper rubbing sound power of each band group more accurately when the paper rubbing noise is concentrated in part of the band groups. The target sound power calculation unit 12 adds up P(δ) in the range of (Equation 11) and outputs the result as the target sound power. Further, from the estimated phase difference of each frequency, the target sound power calculation unit 12 identifies the frequency bands satisfying (Equation 11) and outputs the indices of those frequency bands. As with the paper rubbing sound power, in addition to computing the target sound power from the frequencies of the entire band, the band may be divided into a plurality of band groups, for example every 1000 Hz, and the target sound power may be calculated for each band group.
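A minimal sketch of this power computation, under the assumption that the admissible phase-difference interval for an azimuth range follows the far-field relation used later in (Equation 16): the histogram mass P(δ) falling inside that interval is summed. All numeric values are illustrative, not values from the patent.

```python
import numpy as np

def phase_diff_bounds(f, d, theta_min_deg, theta_max_deg, c=340.0):
    """Phase-difference interval implied by an azimuth range for spacing d at frequency f."""
    lo = 2 * np.pi * f * d * np.sin(np.deg2rad(theta_min_deg)) / c
    hi = 2 * np.pi * f * d * np.sin(np.deg2rad(theta_max_deg)) / c
    return min(lo, hi), max(lo, hi)

def power_in_range(hist_values, bin_centers, lo, hi):
    """Sum of histogram mass whose phase difference lies in [lo, hi]."""
    mask = (bin_centers >= lo) & (bin_centers <= hi)
    return hist_values[mask].sum(), mask

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # dummy histogram of estimated phase differences for one frame
    deltas = rng.normal(loc=0.4, scale=0.1, size=1000)
    hist, edges = np.histogram(deltas, bins=64, range=(-np.pi, np.pi))
    centers = 0.5 * (edges[:-1] + edges[1:])
    lo, hi = phase_diff_bounds(f=1000.0, d=0.03, theta_min_deg=20, theta_max_deg=90)
    p_noise, band_mask = power_in_range(hist, centers, lo, hi)
    print("paper-rubbing-sound power (histogram mass in range):", p_noise)
```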
[0019]
[0020]
[0021]
[0022]
[0023]
The paper rubbing sound presence determination unit 10 calculates the value of (Equation 12) from the target sound power Psubject calculated by the target sound power calculation unit 12 and the paper rubbing sound power Pnoise calculated by the paper rubbing sound power calculation unit 11.
If the calculated ratio exceeds a predetermined threshold, it is determined that paper rubbing noise is present.
The paper rubbing sound presence determination unit 10 outputs the determination result as to whether or not the paper rubbing sound exists, and the result is sent to the sound source separation unit 9.
When the band is divided into a plurality of band groups and the paper rubbing sound power and the target sound power are calculated for each divided band group, the presence or absence of the paper rubbing sound is determined for each band group, and the determination result is output for each band group.
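A hedged sketch of this presence decision: the exact form of (Equation 12) is not reproduced here, so a simple power ratio and an assumed threshold stand in for it.

```python
def paper_rubbing_present(p_noise, p_subject, threshold=2.0, eps=1e-12):
    """True if the paper-rubbing-sound power sufficiently exceeds the target-sound power."""
    ratio = p_noise / (p_subject + eps)
    return ratio > threshold, ratio

# per-band-group use: one decision for each divided band group (dummy powers)
decisions = [paper_rubbing_present(pn, ps)[0] for pn, ps in [(3.0, 1.0), (0.2, 1.5)]]
print(decisions)   # e.g. [True, False]
```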
[0024]
The sound source separation unit 9 performs the process of removing the paper rubbing sound using the band-division signal, which is the output signal of the Fourier transform unit, and the presence determination result of the paper rubbing sound.
The details of the paper rubbing noise removal process are described later. The signal after the paper rubbing noise removal processing and the determination result as to whether or not the paper rubbing noise is present are sent to the dereverberation unit. Based on the determination result of the paper rubbing sound presence determination unit 10, the dereverberation unit removes the reverberation component of the paper rubbing sound from the signal S^(f, τ) obtained after the paper rubbing sound removal processing. Dereverberation is performed by a spectral-subtraction-based method such as (Equation 13). Pecho is the power of the reverberation component of the paper rubbing sound. Floor is a function that returns 0 if its argument is less than or equal to 0 and returns the argument otherwise. Pecho is updated according to (Equation 14). |N| is the amplitude spectrum of the paper rubbing sound for each frequency; when the paper rubbing sound power calculation unit 11 has identified the corresponding frequency as a band in which the paper rubbing sound is dominant, |N| = |X|, and otherwise |N| = 0.
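The following is a hedged Python sketch of spectral-subtraction-style dereverberation in the spirit of (Equation 13) and the |N| rule above; the recursive smoothing used to update Pecho is an assumption standing in for (Equation 14), not the patent's formula.

```python
import numpy as np

def floor_fn(v):
    """Return 0 where the argument is non-positive, otherwise the argument itself."""
    return np.maximum(v, 0.0)

def dereverberate(S_hat, P_echo):
    """Per-frequency power subtraction keeping the phase of S_hat (cf. Equation 13)."""
    mag = np.sqrt(floor_fn(np.abs(S_hat) ** 2 - P_echo))
    return mag * np.exp(1j * np.angle(S_hat))

def update_p_echo(P_echo, X, rub_band_mask, alpha=0.7):
    """|N| = |X| in paper-rubbing-dominant bands, 0 elsewhere; alpha is an assumed decay."""
    N_mag = np.where(rub_band_mask, np.abs(X), 0.0)
    return alpha * P_echo + (1.0 - alpha) * N_mag ** 2

if __name__ == "__main__":
    F = 257
    rng = np.random.default_rng(2)
    X = rng.standard_normal(F) + 1j * rng.standard_normal(F)
    P_echo = np.zeros(F)
    mask = np.zeros(F, dtype=bool); mask[100:] = True    # assumed rub-dominant bands
    P_echo = update_p_echo(P_echo, X, mask)
    print(dereverberate(X, P_echo).shape)
```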
[0025]
[0026]
[0027]
[0028]
The speech (Equation 15) after removal of the reverberation component is sent to the inverse Fourier transform unit 14.
The inverse Fourier transform unit 14 applies an inverse Fourier transform to the speech after removal of the reverberation component and outputs a time-domain signal y(t).
The frame size of the inverse Fourier transform is equal to the frame size in the Fourier transform unit.
The time-domain signal output from the inverse Fourier transform unit is sent to the superimposing and adding unit, where it is superimposed and added according to the frame shift, and the superimposed time-domain signal y^(t) is output.
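A minimal overlap-add sketch of the inverse transform and superimposing/adding units; the frame size and shift correspond to the analysis parameters L and ST, with illustrative values.

```python
import numpy as np

def overlap_add(frames_freq, L, ST):
    """frames_freq: (n_frames, F) one-sided spectra; returns the time signal y^(t)."""
    n_frames = frames_freq.shape[0]
    y = np.zeros(ST * (n_frames - 1) + L)
    for tau in range(n_frames):
        y_frame = np.fft.irfft(frames_freq[tau], n=L)   # y(t) of this frame
        y[tau * ST:tau * ST + L] += y_frame             # superimpose and add
    return y

if __name__ == "__main__":
    L, ST, n_frames = 512, 256, 10
    rng = np.random.default_rng(3)
    spectra = rng.standard_normal((n_frames, L // 2 + 1)).astype(complex)
    print(overlap_add(spectra, L, ST).shape)
```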
[0029]
[0030]
FIG. 3 is a block diagram of the phase difference histogram calculation unit 8.
The frequency-domain signal output from the Fourier transform unit 7 is sent to the phase difference calculation unit 8-1. The phase difference calculation unit 8-1 first calculates the phase differences of a plurality of microphone pairs. Let i be the index of a microphone pair, di the microphone interval of the pair with index i, and δi the phase difference of the pair with index i. Let θ be the incoming azimuth of the sound source. If there are no reflections, reverberation, or background noise and there is only one sound source, then θ and δi satisfy the relation (Equation 16). The phase difference calculation unit 8-1 calculates an estimate of the phase difference for each microphone pair according to (Equation 17). arctan is the inverse function of tan and takes values from -π to +π, so δ^i also takes values from -π to +π. On the other hand, the true phase difference takes values in the range of (Equation 18). Therefore, in the case of (Equation 19), δ^i cannot cover the possible range of δi, and θ cannot be determined. When δi takes a value in a range that δ^i cannot cover, an ambiguity of an integer multiple of 2π arises between δi and δ^i, so δi and δ^i are related by (Equation 20). The phase difference calculation unit 8-1 first obtains n using a short microphone interval and then obtains δ^i using a long microphone interval. In this way n can be obtained from the short microphone intervals, so the problem of the integer-multiple-of-2π ambiguity is eliminated; moreover, since the phase difference of omnidirectional noise between microphones does not depend on the microphone interval, the variance of δ^i does not depend on the microphone interval either. Therefore, the longer the microphone interval, the smaller the deviation from the true value of sin θ obtained by (Equation 16).
[0031]
[0032]
[0033]
[0034]
[0035]
[0036]
Therefore, a more accurate phase difference can be obtained than the δi obtained using only a short microphone interval.
Here, the linear microphone arrangement shown in FIG. 4 is assumed.
L microphone pairs are selected from the M microphone elements and arranged in ascending order of microphone interval.
Equation (23) is executed recursively from i = 0 to L−1 to obtain the estimated phase difference δ^L−1.
The initial value of the microphone interval is (Equation 21), and the initial value of the phase difference is (Equation 22).
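The recursion of (Equation 23) is not reproduced literally here, but the following hedged sketch shows the idea: the shortest, alias-free pair fixes the integer ambiguity n of each longer pair, whose measured (folded) phase then becomes the refined estimate. The spacings and frequency are assumptions for illustration.

```python
import numpy as np

def refine_phase_difference(measured, spacings, f, c=340.0):
    """measured[i]: phase of pair i folded into (-pi, pi]; spacings in ascending order.
    Returns the unwrapped phase difference of the longest pair."""
    delta = measured[0]                    # pair 0 assumed alias-free (d0 <= c / 2f)
    d_prev = spacings[0]
    for i in range(1, len(spacings)):
        predicted = delta * spacings[i] / d_prev          # scale by the spacing ratio
        n = np.round((predicted - measured[i]) / (2 * np.pi))
        delta = measured[i] + 2 * np.pi * n               # ambiguity resolved, more accurate
        d_prev = spacings[i]
    return delta

if __name__ == "__main__":
    f, c = 2000.0, 340.0
    spacings = [0.02, 0.08, 0.24]                          # metres, ascending (assumed)
    theta = np.deg2rad(40.0)
    true = [2 * np.pi * f * d * np.sin(theta) / c for d in spacings]
    folded = [np.angle(np.exp(1j * t)) for t in true]      # what arctan would give
    delta = refine_phase_difference(folded, spacings, f)
    est_theta = np.degrees(np.arcsin(delta * c / (2 * np.pi * f * spacings[-1])))
    print("estimated azimuth:", round(est_theta, 2), "deg")
```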
[0037]
The phase difference obtained by the above processing is sent to the histogram calculation unit 8-2, and the histogram represented by (Equation 24) is calculated.
[0038]
[0039]
[0040]
[0041]
[0042]
The paper rubbing noise that is a problem during video conferencing usually arises on the desk.
Human voice, on the other hand, arises at positions whose elevation angle is higher than the desk surface.
When a microphone array arranged in a straight line in the vertical direction, as shown in FIG. 5, is placed on a desk, a sound source whose elevation angle (with the vertically upward direction taken as 0°) is 90° or more can be assumed to be paper rubbing sound, and a sound source whose elevation angle is 90° or less can be assumed to be human speech.
Therefore, when a peak of the histogram calculated by the phase difference histogram calculation unit 8 using the estimated phase difference δ^L−1 appears in the range of phase differences corresponding to an elevation angle of 90° or more, the peak can be considered to indicate the power of the paper rubbing noise.
By setting θnoise_min = 90 and θnoise_max = 180, the paper rubbing sound power calculation unit 11 can calculate the paper rubbing sound power.
[0043]
FIG. 6 shows the data structure of the noise presence range and the speech presence range set by the user through the user interface.
"No." indicates the index of the registered data. "Type" designates noise or speech, and, in the case of noise, whether it is sudden noise such as paper rubbing noise or stationary noise such as the operating noise of an air conditioner. "Range" specifies the range in which the sound source is present, given as a range of the azimuth angle θ and the elevation angle φ.
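For illustration only, the table of FIG. 6 could be represented as follows; the field names are assumptions modelled on the "No.", "type" and "range" columns described above.

```python
from dataclasses import dataclass

@dataclass
class SourceRange:
    no: int            # "No." - index of the registered data
    kind: str          # "sudden_noise", "stationary_noise" or "speech"
    theta_min: float   # azimuth range [deg]
    theta_max: float
    phi_min: float     # elevation range [deg]
    phi_max: float

registered = [
    SourceRange(0, "sudden_noise", -180.0, 180.0, 90.0, 180.0),   # on the desk
    SourceRange(1, "speech",       -180.0, 180.0,  0.0,  90.0),   # above the desk
]
```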
[0044]
In the paper rubbing sound power calculation unit 11, the existence range of the paper rubbing sound may be a range designated in advance as in (Equation 8), or it may be set from the data designated by the user through the user interface in the structure of FIG. 6. When the user registers two or more sudden noises, the paper rubbing sound power calculation unit 11 calculates Pnoise for each noise, and also identifies and outputs the frequency indices within the source range of each noise.
[0045]
Similarly, the paper rubbing sound presence determination unit 10 calculates a ratio for each noise and performs a separate presence determination for each sudden noise.
[0046]
The sound source separation unit 9 calculates how the sound of each sound source propagates (the steering vector) according to (Equation 25) from the frequency components included in the source range of each target sound and noise.
Here i is the index of a sound source and corresponds one-to-one with "No." in the data structure of FIG. 6. Whether a frequency component is included in the source range of a sudden noise or of speech can be known from the frequency indices output by the paper rubbing sound power calculation unit 11 and the target sound power calculation unit 12; for stationary noise, it is determined for each frequency whether the condition of (Equation 11) is satisfied, and the frequency components determined to satisfy it are regarded as frequency components included in the source range of the stationary noise.
[0047]
[0048]
If the sound source direction of X falls within the range of the i-th sound source, the steering vector of the i-th sound source is updated by (Equation 25).
The steering vectors of the sound sources other than the i-th one are not updated. The magnitude of the steering vector is then normalized to 1 by (Equation 26). A matrix whose elements are the steering vectors normalized to magnitude 1 is defined as A(f, τ) by (Equation 27), and the generalized inverse matrix of A(f, τ) is calculated by (Equation 28). The sound source separation unit 9 generates three types of separated sound using A(f, τ) and its generalized inverse.
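A hedged sketch of this bookkeeping: the steering vector of the matched source is updated from the observation and normalized to unit magnitude, the vectors are stacked into A(f, τ), and its generalized (pseudo) inverse is taken. The recursive update rule and smoothing constant are assumptions, since (Equation 25) is not reproduced here.

```python
import numpy as np

def update_steering(A, X, i, alpha=0.9):
    """A: (M, N) steering matrix, X: (M,) observation assigned to source i."""
    a = alpha * A[:, i] + (1.0 - alpha) * X        # assumed recursive update
    A[:, i] = a / (np.linalg.norm(a) + 1e-12)      # normalize magnitude to 1 (cf. Eq. 26)
    return A

def separate(A, X):
    """Generalized-inverse separation: estimated source spectra at this bin."""
    W = np.linalg.pinv(A)                          # (N, M) generalized inverse (cf. Eq. 28)
    return W @ X

if __name__ == "__main__":
    rng = np.random.default_rng(4)
    M, N = 4, 2                                    # microphones, sources
    A = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))
    A /= np.linalg.norm(A, axis=0, keepdims=True)
    X = rng.standard_normal(M) + 1j * rng.standard_normal(M)
    A = update_steering(A, X, i=0)
    print(np.abs(separate(A, X)))
```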
[0049]
[0050]
[0051]
[0052]
From these three types of separated sounds, an appropriate separated sound is selected and
output for each time-frequency.
[0053]
The first separated sound is calculated by (Equation 29).
In (Equation 30), each time-frequency point is classified according to which sound source it belongs to.
From the result of this assignment, a steering vector for separation is selected by (Equation 31), and the separated sound is obtained by (Equation 32).
This separated sound assigns the output signal at each time-frequency point to only one sound source, so when two or more sound sources are active the output sound may be distorted, but its noise suppression performance is higher than that of S1.
In (Equation 33), the power at each time-frequency point is assigned to one sound source, and the separated sound is obtained by subtracting the power of that sound source from the input signal. This is less likely to distort the output sound than S2, but its noise suppression performance is lower; this separated sound is used only when the sound source to which the component is assigned at a given time-frequency is noise. Alternatively, the output signal of a null beamformer that forms a null in the noise direction and a beam in the target sound direction may be used as the separated sound. Further, noise removal processing by spectral subtraction may be applied to the separated sound. In this case, the subtraction coefficient of the spectral subtraction may be linked to the ratio defined by (Equation 12), being set larger as the ratio becomes larger. With such a configuration, the paper rubbing noise can be strongly suppressed only when it is actually present.
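As a small illustration of the last point, one possible way to couple the spectral-subtraction coefficient to the ratio of (Equation 12) is sketched below; the mapping and its limits are assumptions, not values from the patent.

```python
def subtraction_coefficient(ratio, beta_min=1.0, beta_max=3.0, ratio_max=10.0):
    """Monotonically increasing subtraction coefficient, clipped at ratio_max."""
    r = min(max(ratio, 0.0), ratio_max)
    return beta_min + (beta_max - beta_min) * r / ratio_max
```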
[0054]
[0055]
[0056]
[0057]
[0058]
[0059]
FIG. 7 is a processing flow diagram covering the paper rubbing sound presence determination unit 10, the sound source separation unit 9, and the dereverberation unit 13.
In S1, using the paper rubbing sound power and the target sound power, it is determined that the paper rubbing sound is present if the ratio defined in (Equation 12) exceeds the predetermined threshold, and that it is absent if the ratio falls below the threshold.
When the paper rubbing sound is determined to be present, paper rubbing sound removal is performed.
In the paper rubbing noise removal, the three separated sounds calculated by the sound source separation unit 9 are switched according to the result of the presence determination of the paper rubbing sound.
For frequency components assigned to the paper rubbing sound direction in (Equation 30), (Equation 32) is taken as the separated sound when the paper rubbing sound is present.
For frequency components not assigned to the paper rubbing sound direction in (Equation 30), (Equation 29) is taken as the separated sound when the paper rubbing sound is present. That is, when paper rubbing noise is present it must be removed as much as possible, so strong suppression processing is performed. If there is no paper rubbing noise, the interference noise suppression processing is not performed and the input signal is output without processing; this makes it less likely that the target sound is distorted when there is no paper rubbing noise. In addition, even when it is determined that paper rubbing noise is not present, weak suppression processing based on (Equation 33) may be performed when the ratio exceeds a certain value. Further, when the presence of stationary noise can be assumed, the configuration may be such that stationary noise is always suppressed using the separated sound of (Equation 29), even when no paper rubbing noise is present.
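A hedged sketch of this switching rule, assuming S1, S2 and S3 denote the separated spectra of (Equation 29), (Equation 32) and (Equation 33) at one frame and `assigned_to_rub` marks the bins assigned to the paper rubbing sound direction by (Equation 30); the weak-suppression rule in the "no rubbing" branch is an assumption for illustration.

```python
import numpy as np

def select_output(X, S1, S2, S3, assigned_to_rub, rub_present, ratio,
                  weak_threshold=1.0):
    """X, S1, S2, S3, assigned_to_rub: arrays over frequency bins of one frame."""
    if rub_present:
        # strong suppression: mask-based sound for rub-assigned bins,
        # pseudo-inverse sound for the remaining bins
        return np.where(assigned_to_rub, S2, S1)
    if ratio > weak_threshold:
        return S3           # weak suppression even without a positive detection
    return X                # pass the input through unprocessed
```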
[0060]
In the paper rubbing sound reverberation determination, if the current frame is within a predetermined number of frames after the paper rubbing sound was detected, it is determined that dereverberation is to be performed; if more than the predetermined number of frames have elapsed since the paper rubbing sound was present, it is determined that dereverberation is not performed. When dereverberation is to be performed, dereverberation processing based on (Equation 13) is carried out and the signal after dereverberation is output. FIG. 8 shows a typical example of the temporal change of the amplitude of the paper rubbing sound. Because small paper rubbing sounds, echoes, and reverberation follow the direct sound, the amplitude does not decay for a while. It is therefore effective to detect the direct sound of the paper rubbing noise and then continue the dereverberation processing for a while to strongly suppress the noise.
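The frame-counting logic of this determination can be sketched as a small gate; the number of frames over which dereverberation is kept active is an assumption, not a value from the patent.

```python
class RubReverbGate:
    """Keeps dereverberation active for a fixed number of frames after a detection."""

    def __init__(self, hold_frames=10):
        self.hold_frames = hold_frames
        self.since_rub = None          # frames since the last detected rubbing sound

    def step(self, rub_present: bool) -> bool:
        if rub_present:
            self.since_rub = 0
        elif self.since_rub is not None:
            self.since_rub += 1
        # dereverberate while still within the hold window after a detection
        return self.since_rub is not None and self.since_rub <= self.hold_frames
```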
[0061]
FIG. 9 shows a comparison of the power spectra of human voice and paper rubbing sound.
[0062]
Whereas the paper rubbing noise has almost uniform power at all frequencies, the voice power is concentrated in a relatively low band, roughly 1000 Hz and below.
Therefore, even if the paper rubbing sound power calculated from the signals of the entire band exceeds the target sound power calculated from the entire band, the target sound power of the voice may exceed the paper rubbing sound power at frequencies below 1000 Hz. In such a case, applying strong interference noise suppression below 1000 Hz may distort the voice and make it hard to hear. The paper rubbing sound power calculation unit 11, the target sound power calculation unit 12, and the paper rubbing sound presence determination unit 10 therefore calculate the paper rubbing sound power and the target sound power for each of a plurality of band groups and determine the presence of the paper rubbing sound for each band group. By switching the separation method for each band group according to these determination results, a weaker separated sound is selected for band groups in which the voice is dominant, and a voice with less distortion can be output.
[0063]
Next, regarding the processing when a microphone arrangement other than a linear arrangement is used, the changes in the processing of the phase difference calculation unit 8-1 are described. As an example of a non-linear arrangement, the method using a plurality of concentric equilateral-triangle arrangements of different sizes, shown in FIG. 10, is described. The equilateral-triangle microphone array 16 is used instead of the microphone array 4 and has a plurality of concentric equilateral-triangle sub microphone arrays 16-1 to 16-U of different sizes. With a linear arrangement, localization is possible only in the range of -90 degrees to 90 degrees, but with the equilateral-triangle arrangement, localization in all directions from -180 degrees to 180 degrees is possible.
[0064]
For the U sub microphone arrays arranged at the vertices of the regular triangles, indices are assigned in order of increasing size. L microphone pairs are selected from each sub microphone array. The physical position vector of a microphone element is written as P.
[0065]
For the i-th microphone pair of the l-th sub microphone array, let the two microphone elements be i0 and i1. The difference between the position vectors of the microphone pair is then calculated by (Equation 34). A matrix whose elements are the position-vector differences of these microphone pairs is defined by (Equation 35), and the pseudo inverse matrix of Dl is obtained by (Equation 36) and (Equation 37). A vector whose elements are the phase differences of the L microphone pairs of the l-th sub microphone array is obtained from the input signal at each time-frequency by (Equation 38). If the microphone spacings of all the microphone pairs are c/2f or less, the position vector of the sound source, normalized to magnitude 1, can be obtained by (Equation 39). The wider the distance between the microphones, the more accurate the estimate of the sound source position vector.
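A hedged sketch of this least-squares direction estimate for a planar sub array: the matrix D of microphone position differences maps the unit direction vector r to the pair-wise phase differences, so r is recovered with the pseudo inverse of D when every spacing is at most c/2f. The triangle geometry below is an assumption for illustration.

```python
import numpy as np

def direction_from_phases(positions, pairs, delta, f, c=340.0):
    """positions: (M, 2) mic coordinates; pairs: list of (i0, i1);
    delta: (L,) phase differences; returns a unit vector toward the source."""
    D = np.array([positions[i1] - positions[i0] for i0, i1 in pairs])   # (L, 2)
    r = np.linalg.pinv(D) @ (delta * c / (2 * np.pi * f))               # least squares
    return r / (np.linalg.norm(r) + 1e-12)                              # normalize to 1

if __name__ == "__main__":
    f, c = 1000.0, 340.0
    positions = np.array([[0.0, 0.0], [0.05, 0.0], [0.025, 0.0433]])    # small triangle
    pairs = [(0, 1), (1, 2), (0, 2)]
    true_dir = np.array([np.cos(np.deg2rad(120)), np.sin(np.deg2rad(120))])
    delta = np.array([2 * np.pi * f * (positions[i1] - positions[i0]) @ true_dir / c
                      for i0, i1 in pairs])
    print(direction_from_phases(positions, pairs, delta, f))
```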
[0066]
[0067]
[0068]
[0069]
[0070]
[0071]
[0072]
However, if even one microphone spacing exceeds c/2f, the 2π ambiguity of the phase arises just as in the linear arrangement, and the relationship between the sound source direction and r becomes (Equation 40) with an indeterminate term n.
Therefore, as in the linear arrangement, the indeterminate term n is obtained with a sub microphone array having a short microphone interval, and the phase difference is then calculated more accurately with a sub microphone array having a long microphone interval.
The initial value of the phase indeterminate term is set to (Equation 41).
The initial value of the vector r consisting of the phase differences of the microphone pairs is set to (Equation 42).
nl is a vector whose elements are the integer-valued indeterminate terms shown in (Equation 43).
For each sub microphone array, nl satisfying (Equation 44) is calculated.
1 is a vector all of whose elements are 1, as shown in (Equation 45). The phase vector after the indeterminate term n has been found is defined by (Equation 46).
[0073]
The phase vectors after resolving the indeterminate term n are calculated for all the sub microphone arrays, and the estimated value of the sound source direction is obtained by (Equation 47) using the phase vector of the largest sub microphone array. The histogram calculation unit 8-2 then calculates a histogram of the obtained sound source directions. When the obtained sound source direction satisfies (Equation 48), the frequency component can be determined to belong to the i-th sound source.
[0074]
[0075]
[0076]
[0077]
[0078]
[0079]
[0080]
[0081]
[0082]
[0083]
Next, the processing when using a plurality of sub microphone arrays arranged on the same circumference is described.
[0084]
Consider placing the microphones on the same circumference, as shown in FIG. 11.
[0085]
In the same-circumference microphone array 17, the microphone spacing between microphone elements 1 and 2, that between microphone elements 4 and 5, and that between microphone elements 7 and 8 are all equal to d0; these three microphone pairs are taken as the microphone pairs of the 0th sub microphone array.
Similarly, the spacing between microphone elements 2 and 3, that between elements 5 and 6, and that between elements 8 and 9 are all equal to d1, and these three microphone pairs are taken as the microphone pairs of the first sub microphone array.
Likewise, the spacing between microphone elements 1 and 3, that between elements 4 and 6, and that between elements 7 and 9 are all equal to d2, and these three microphone pairs are taken as the microphone pairs of the second sub microphone array.
It is assumed that d0 < d1 < d2.
[0086]
For these three sub microphone arrays, as in the regular-triangle arrangement, a phase vector with the ambiguity resolved is obtained based on (Equation 44), and the sound source direction is obtained from that phase vector based on (Equation 47), which makes sound source localization possible.
[0087]
FIG. 1 shows the hardware configuration of the present invention.
FIG. 2 is a block diagram of the software of the present invention.
FIG. 3 is a block diagram of the phase difference histogram calculation unit of the present invention.
FIG. 4 shows the layout of a linear microphone array.
FIG. 5 shows an example of arranging a microphone array on a desk.
FIG. 6 shows the structure of the data set by the user regarding the type of noise in the present invention.
FIG. 7 is a processing flow diagram of the paper rubbing noise removal of the present invention.
FIG. 8 shows the temporal change of the amplitude of the paper rubbing sound.
FIG. 9 compares the power spectrum of a voice with the power spectrum of a paper rubbing sound.
FIG. 10 shows an example of the equilateral-triangle arrangement that can be used as the microphone array of the present invention.
FIG. 11 shows an example of the same-circumference arrangement that can be used as the microphone array of the present invention.
Description of Reference Numerals
[0088]
1: central processing unit; 2: storage device composed of RAM or the like; 3: storage device composed of ROM or the like; 4: microphone array composed of at least two microphone elements; 5: A/D conversion device that converts analog sound pressure values into digital data; 6: A/D conversion means that converts analog sound pressure values into digital data; 7: band division means that converts time-domain digital data into frequency-domain digital data; 8: signal processing means that calculates the phase difference of the band-divided signals for each band and generates a histogram of the phase difference; 9: sound source separation means that separates and extracts the target sound component from the band-division signals; 10: paper rubbing sound presence determination means that determines for each frame whether paper rubbing noise is present; 11: means for estimating the power in the range where the predetermined noise exists; 12: means for estimating the power in the range where the predetermined target sound exists; 13: dereverberation means for suppressing the reverberation component of the noise from the signal after sound source separation; 14: inverse Fourier transform means for performing an inverse Fourier transform on the dereverberated signal and converting it into a time-domain signal; 15: superposition and addition means for superimposing the inverse-Fourier-transformed signals at each frame shift; 16: equilateral-triangle microphone array having a plurality of equilateral-triangle sub microphone arrays; 17: microphone array having a plurality of sub microphone arrays on the same circumference; S1: processing to determine whether or not a paper rubbing sound is present; S2: processing to determine whether or not reverberation is present, based on whether the frame is within several frames after the paper rubbing sound was present.