Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2010175431
An object of the present invention is to improve the estimation accuracy of a sound source direction.
A sound source direction estimation apparatus according to the present invention includes a
microphone array consisting of three microphones arranged at the vertices of an equilateral
triangle, frequency conversion units that convert the signals received by the microphones of the
microphone array into signals in the frequency domain, arrival time difference calculation units
that calculate an arrival time difference for each combination of two different microphones, and a
sound source direction estimation unit that obtains sound source direction candidates from the
arrival time differences and classifies them. The sound source direction estimation unit includes a
sparsity determination unit that determines, for each frequency bin of the arrival time differences,
whether sparsity can be assumed, obtains sound source direction candidates only from the arrival
time differences of frequency bins for which sparsity can be assumed, and classifies those
candidates. [Selected figure] Figure 1
Sound source direction estimation device and method thereof, and program
[0001]
The present invention relates to a sound source direction estimation device, used in videophone
calls, audio conferences, and the like, for detecting the direction of a speaker.
[0002]
For example, Non-Patent Document 1 discloses a sound source direction estimation method used
in a conventional audio conference or the like.
The method estimates the directions of N (N ≥ 2) different sound sources Sn using a microphone
array consisting of three microphones 1, 2, 3 arranged at the vertices of an equilateral triangle, as
shown in FIG. 13. The operation of the corresponding sound source direction estimation apparatus
300 will be described with reference to FIG. 14.
[0003]
The sound source direction estimation device 300 includes three microphones 1, 2, 3 arranged at
the vertices of an equilateral triangle, frequency conversion units 11, 12, and 13, arrival time
difference calculation units 21, 22, and 23, and a sound source direction estimation unit 150. The
signal xi(n) at time sample n received by each of the microphones 1, 2, 3 is input to the frequency
conversion units 11, 12, 13 and converted into a frequency-domain signal Xi(ω, m), obtained for
each frame, a frame being a set of a plurality of time samples. Here, m and ω respectively denote
the index of the frame subjected to frequency conversion and the frequency of the converted
signal. The frequency-converted microphone reception signals are input to the arrival time
difference calculation units 21, 22, 23. The arrival time difference calculation units 21, 22, and 23
evaluate Equation (1) for each of the three combinations of different microphone pairs and output
the arrival time differences τij(ω, m) (i, j = 1, 2, 3, i ≠ j), where i and j denote the microphone
numbers.
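Equation (1) itself is not reproduced in this translation. A common realization of this step, offered
here only as a plausible sketch, computes the arrival time difference of each microphone pair from
the phase of the cross-spectrum divided by the angular frequency; the STFT parameters and the
function names below are illustrative assumptions, not the patent's notation.

```python
import numpy as np

def stft_frames(x, frame_len=256, hop=128):
    """Frequency conversion (units 11-13): frame the signal and apply an FFT.
    The Hann window and hop size are illustrative assumptions."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[m * hop:m * hop + frame_len] * window
                       for m in range(n_frames)])
    return np.fft.rfft(frames, axis=1)        # X_i(ω, m): shape (frames m, bins ω)

def arrival_time_difference(Xi, Xj, fs):
    """τij(ω, m) for one microphone pair (assumed reading of Equation (1):
    phase of the cross-spectrum divided by the angular frequency)."""
    n_bins = Xi.shape[1]
    omega = 2 * np.pi * np.fft.rfftfreq(2 * (n_bins - 1), d=1.0 / fs)
    cross_phase = np.angle(Xi * np.conj(Xj))  # phase of X_i X_j*
    with np.errstate(divide="ignore", invalid="ignore"):
        tau = cross_phase / omega             # seconds
    tau[:, 0] = 0.0                           # the DC bin carries no usable phase
    return tau
```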
[0004]
[0005]
The arrival time differences τij are input to the sound source direction estimation unit 150, which
outputs the estimated sound source directions θn^.
(The notation in the figures, with the circumflex written directly above the symbol, is the correct
one.) An example of the functional configuration of the sound source direction estimation unit 150
is shown in FIG. 15. The sound source direction estimation unit 150 includes a vectorization unit
151, a sound source direction calculation unit 152, and a histogram calculation unit 153. The
vectorization unit 151 receives the arrival time differences τ12(ω, m), τ23(ω, m), and τ31(ω, m)
output by the arrival time difference calculation units 21, 22 and 23 and outputs the arrival time
difference vector t(ω, m) given by Expression (2). That is, the vectorization unit 151 simply
arranges the input arrival time differences τij(ω, m) into a vector.
[0006]
[0007]
The sound source direction calculation unit 152 multiplies the input arrival time difference vector
t(ω, m) from the left by the coordinate transformation matrix D given by Equation (4), as in
Equation (3), and obtains the sound source direction candidate θ′(ω, m) from the first and second
elements of the result by the calculation of Equation (5).
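Equations (3) to (5) and the matrix D of Equation (4) are likewise not reproduced in the translation.
The sketch below is only an illustrative reading under that caveat: D projects the three pairwise
delays onto two orthogonal axes in the array plane, and θ′(ω, m) is the two-argument arctangent
of the resulting components; the particular matrix used here is an assumption tied to one possible
microphone layout, not the patent's Equation (4).

```python
import numpy as np

# Illustrative stand-in for the coordinate transformation matrix D of Equation (4)
# (an assumption; it maps the three pairwise delays of an equilateral-triangle
# array onto two orthogonal axes in the array plane).
D = np.array([[1.0, -0.5,              -0.5],
              [0.0,  np.sqrt(3) / 2.0, -np.sqrt(3) / 2.0]])

def direction_candidate(t_vec):
    """Sound source direction candidate θ'(ω, m) in degrees from one arrival time
    difference vector t(ω, m) = [τ12, τ23, τ31] (assumed reading of Eqs. (3), (5))."""
    u = D @ t_vec                                      # Equation (3): left-multiply by D
    return np.degrees(np.arctan2(u[1], u[0])) % 360.0  # Equation (5): angle of (u1, u2)
```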
[0008]
[0009]
The histogram calculation unit 153 obtains a histogram from the input sound source direction
candidates θ′(ω, m), and outputs the directions giving the peaks of the histogram as the sound
source direction estimates θa^ (a = 1, ..., A′), where A′ is the maximum number of simultaneous
sound sources, given in advance.
[0010]
Here, the histogram is calculated by classifying all the sound source direction candidates θ′(ω, m)
obtained in the frequency bins of a plurality of consecutive frames into angle bins of a
predetermined width.
As the number of frames used when obtaining the histogram, a number corresponding to the
length of time over which the sound source can be assumed not to move is selected.
For example, when the frame length is 16 ms and the sound source is considered not to move for
about 0.5 seconds, the histogram is calculated using the sound source direction candidates
θ′(ω, m) obtained in each of about 30 frames. Assuming that the sampling frequency of the signal
is 16 kHz and that the frequency conversion is, for example, a short-time Fourier transform using
256-point data, the number of sound source direction candidates θ′(ω, m) is 3840, that is, the
number of frequency bins (128) times the number of frames (30).
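The counts quoted above follow from the stated parameters; the small sketch below merely
re-derives them (the assumption of non-overlapping frames is mine).

```python
# Re-deriving the candidate count quoted above (non-overlapping frames assumed).
fs = 16_000                          # sampling frequency [Hz]
frame_len = 256                      # short-time Fourier transform length [samples]
frame_ms = 1000 * frame_len / fs     # 16.0 ms per frame
n_frames = 30                        # about 0.5 s / 16 ms, the 30 frames used in the text
n_bins = frame_len // 2              # 128 frequency bins per frame
print(frame_ms, n_frames * n_bins)   # 16.0, 3840 sound source direction candidates
```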
[0011]
Masao Matsuo, Yusuke Hioka and Nozomu Hamada, “Estimating DOA of multiple speech signals
by improved histogram mapping method,” Proceedings of IWAENC 2005, pp. 129-132.
[0012]
In the conventional method, when the sound source signal is a non-stationary signal such as
speech, whose components concentrate at specific frequencies, the processing is performed under
the assumption that any frequency bin at any time contains the component of only one of the
plurality of sound sources, that is, under the assumption of sparsity in the time-frequency domain.
[0013]
[What is sparsity] Here, a signal is said to be sparse when its energy is concentrated in a small
part of some region (in many cases the time-frequency region) and is close to zero in most other
parts of that region.
[0014]
However, in general, as the number of sound sources increases, the assumption of signal sparsity
breaks down, so the conventional technique cannot estimate the sound source directions with
sufficient accuracy.
For example, when speakers located in different directions speak at the same time, the estimation
accuracy of the sound source direction is degraded.
Further, in a real environment, sounds other than voice are often present, and many of them, such
as the sound of an air conditioner or of a personal computer fan, are stationary signals whose
components spread over a wide frequency range.
Since these sounds cannot be assumed to be sparse, superimposing them on the target sound
degrades the estimation accuracy of the sound source direction even further.
[0015]
The present invention has been made in view of this point, and an object of the present invention
is to provide a sound source direction estimation device, a method thereof, and a program therefor
that can accurately estimate the directions of speakers even when speakers located in different
directions speak at the same time.
[0016]
The sound source direction estimation apparatus according to the present invention comprises a
microphone array consisting of three microphones arranged at the vertices of an equilateral
triangle, frequency conversion units that convert the signals received by the microphones of the
microphone array into signals in the frequency domain, arrival time difference calculation units
that calculate an arrival time difference for each combination of two different microphones, and a
sound source direction estimation unit that obtains sound source direction candidates from the
arrival time differences and classifies them.
The sound source direction estimation unit includes a sparsity determination unit that determines,
for each frequency bin of the arrival time differences, whether sparsity can be assumed, obtains
sound source direction candidates only from the arrival time differences of frequency bins for
which sparsity can be assumed, and classifies those candidates.
[0017]
According to the present invention, the sparsity determination unit removes the arrival time
differences of the frequency bins for which the sparsity of the sound sources cannot be assumed,
and sound source direction candidates are obtained only from the arrival time differences of the
remaining frequency bins, for which sparsity can be assumed. Therefore, even if speakers at
different positions speak at the same time, the sound source direction estimation apparatus
according to the present invention excludes the arrival time differences of the frequency bins in
which both voices are mixed and estimates each direction based on arrival time differences
originating from a single sound source. Consequently, the sound source directions can be
estimated with high accuracy.
[0018]
FIG. 1 is a diagram showing an example of the functional configuration of a sound source direction estimation apparatus 100 according to the present invention.
FIG. 2 is a diagram showing the operation flow of the sound source direction estimation apparatus 100.
FIG. 3 is a diagram showing an example of the functional configuration of a sound source direction estimation unit 30.
FIG. 4 is a diagram showing an example of the functional configuration of a sparsity determination unit 34.
FIG. 5 is a diagram showing the operation flow of the sparsity determination unit 34.
FIG. 6 is a diagram showing an example of an arrival time difference vector and its arrival time difference orthonormal vectors.
FIG. 7 is a diagram showing an example of the vector orthogonality P(θ) when there are multiple sound sources.
FIG. 8 is a diagram showing an example of the vector orthogonality P(θ) when there is one sound source.
FIG. 9 is a diagram showing an example of the functional configuration of a sparsity determination unit 34′.
FIG. 10 is a diagram showing the operation flow of the sparsity determination unit 34′.
FIG. 11 is a diagram showing an example of the result of estimating the sound source direction with the conventional sound source direction estimation apparatus 300.
FIG. 12 is a diagram showing an example of the result of estimating the sound source direction with the sound source direction estimation apparatus 100 of the present invention.
FIG. 13 is a diagram showing the plane of the microphone array.
FIG. 14 is a diagram showing an example of the functional configuration of the conventional sound source direction estimation apparatus 300.
FIG. 15 is a diagram showing an example of the functional configuration of the conventional sound source direction estimation unit 150.
[0019]
Hereinafter, embodiments of the present invention will be described with reference to the
drawings. The same reference numerals are given to the same parts throughout the drawings, and
duplicate description is omitted.
[0020]
FIG. 1 shows an example of the functional configuration of a sound source direction estimation
apparatus 100 according to the present invention. The sound source direction estimation
apparatus 100 includes a microphone array consisting of three microphones, frequency conversion
units 11, 12, and 13, arrival time difference calculation units 21, 22, and 23, and a sound source
direction estimation unit 30. The sound source direction estimation apparatus 100 differs from
the sound source direction estimation apparatus 300 described in the prior art only in that the
sound source direction estimation unit 30 includes a sparsity determination unit 34 and in the
processing procedure that uses its determination result.
[0021]
The parts that operate in the same way as in the conventional sound source direction estimation
apparatus 300 will be described briefly with reference to the operation flow of FIG. 2. The
frequency conversion units 11, 12, 13 convert the signals received by the microphones 1, 2, 3 into
signals in the frequency domain (step S11). The arrival time difference calculation units 21, 22, 23
calculate the arrival time differences τij(ω, m), that is, τ12(ω, m), τ23(ω, m), and τ31(ω, m), for
each combination of two different microphones among microphones 1, 2, 3 (step S21). The sound
source direction estimation unit 30 obtains sound source direction candidates θ′(ω, m) from the
arrival time differences τij(ω, m) and classifies them (step S30).
[0022]
The novel feature of the sound source direction estimation apparatus 100 according to the present
invention is that the sound source direction estimation unit 30 includes a sparsity determination
unit 34 that determines, for each frequency bin of the arrival time differences τij(ω, m), whether
sparsity can be assumed. The sound source direction estimation unit 30 obtains sound source
direction candidates only from the arrival time differences τij(ω, m) of the frequency bins for
which the sparsity determination unit 34 has determined that sparsity can be assumed, and
classifies those candidates (step S30). The sparsity determination is performed for each frame m
and for each frequency bin ω. Therefore, even if speakers at different positions speak
simultaneously, the arrival time differences τij(ω, m) of the frequency bins in which both voices
are mixed are excluded, so that each sound source direction can be estimated with high accuracy.
[0023]
FIG. 3 shows an example of the functional configuration of the sound source direction estimation
unit 30. The sound source direction estimation unit 30 includes a vectorization unit 151, a sparsity
determination unit 34, a sound source direction calculation unit 152′, and a histogram calculation
unit 153. As is apparent from a comparison with the functional configuration example of the
conventional sound source direction estimation unit 150 (FIG. 15), the sound source direction
estimation unit 30 differs from the conventional unit in that the sparsity determination unit 34 is
inserted between the vectorization unit 151 and the sound source direction calculation unit 152′,
and in that the sound source direction calculation unit 152′ calculates the sound source directions
with reference to the determination result.
[0024]
FIG. 4 shows an example of the functional configuration of the sparsity determination unit 34 of
this embodiment, and FIG. 5 shows its operation flow. The sparsity determination unit 34 includes
an orthogonal matrix calculation unit 35, a vector orthogonality calculation unit 36, and an
orthogonality determination unit 38. The orthogonal matrix calculation unit 35 receives the arrival
time difference vector t(ω, m) output from the vectorization unit 151 as input, and outputs two
arrival time difference orthonormal vectors t⊥1(ω, m) and t⊥2(ω, m) that are orthogonal to the
arrival time difference vector t(ω, m) (step S35). These orthonormal vectors can be determined, for
example, by Gram-Schmidt orthonormalization (see G. Strang, "Linear Algebra and Its
Applications," Industrial Books, pp. 141-143).
[0025]
The arrival time difference orthonormal vectors t⊥1(ω, m) and t⊥2(ω, m) are input to the vector
orthogonality calculation unit 36, which evaluates their orthogonality to the theoretical arrival
time difference vector te(θ) (step S36). The theoretical arrival time difference vector te(θ) can be
calculated by Equation (6).
[0026]
[0027]
Here, d is the length of one side of the equilateral triangle formed by the microphones 1, 2, 3
arranged at its vertices (see FIG. 13), and c is the speed of sound. Thus, te(θ) is a theoretical value
that can be calculated independently of any measured value. As the theoretical arrival time
difference vector te(θ), values recorded in the recording unit 37 may be read out sequentially as
shown in FIG. 4, or values recorded in advance in the vector orthogonality calculation unit 36 may
be used.
[0028]
Here, the purpose of obtaining the two arrival time difference orthonormal vectors t⊥1(ω, m) and
t⊥2(ω, m) orthogonal to the arrival time difference vector t(ω, m) is explained. FIG. 6 shows the
arrival time difference orthonormal vectors t⊥1(ω, m) and t⊥2(ω, m) for an arbitrary arrival time
difference vector t(ω, m). To know the direction of this arrival time difference vector t(ω, m), it
suffices to check whether a vector whose direction is known is orthogonal to the arrival time
difference orthonormal vectors t⊥1(ω, m) and t⊥2(ω, m). If it is orthogonal to both, the direction of
the arrival time difference vector t(ω, m) is the same as the direction of the known vector.
[0029]
The vector orthogonality calculation unit 36 calculates the orthogonality P(θ) between the arrival
time difference orthonormal vectors t⊥1(ω, m), t⊥2(ω, m) and the theoretical arrival time
difference vector te(θ) using Equation (7) (step S36).
[0030]
[0031]
Equation (7) is evaluated, for the arrival time difference orthonormal vectors t⊥1(ω, m) and
t⊥2(ω, m) corresponding to each arrival time difference vector t(ω, m), against the theoretical
arrival time difference vectors te(θ) for all directions θ from 0 to 359 degrees.
Since the direction of the theoretical arrival time difference vector te(θ) used in Equation (7) is
known, when it is orthogonal to the arrival time difference orthonormal vectors t⊥1(ω, m) and
t⊥2(ω, m), the first and second terms of the denominator of Equation (7) both become 0, so the
orthogonality P(θ) takes a large value. Conversely, for an angle different from the true one, the
first and second terms of the denominator of Equation (7) take non-negligible values, so the
orthogonality P(θ) becomes small.
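Equation (7) is not reproduced either. Consistent with the description above (a denominator whose
two terms are inner products with t⊥1(ω, m) and t⊥2(ω, m) that vanish at the true direction), one
plausible form, used below purely as an assumption, is the reciprocal of the sum of the squared
inner products, with a small constant added to keep the value finite.

```python
import numpy as np

def vector_orthogonality(t_perp1, t_perp2, te_all, eps=1e-6):
    """Illustrative orthogonality measure P(θ) (assumed form of Equation (7)):
    large when te(θ) is orthogonal to both t⊥1(ω, m) and t⊥2(ω, m).
    te_all: array of shape (360, 3) holding te(θ) for θ = 0, ..., 359 degrees."""
    d1 = te_all @ t_perp1                    # first denominator term for every θ
    d2 = te_all @ t_perp2                    # second denominator term for every θ
    return 1.0 / (d1 ** 2 + d2 ** 2 + eps)   # peaks at the direction matching t(ω, m)
```

Here te_all can be built from the theoretical_tdoa_vector sketch above, for example as
np.stack([theoretical_tdoa_vector(a) for a in range(360)]).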
[0032]
In this way, by determining the arrival time difference orthonormal vectors t⊥1(ω, m) and
t⊥2(ω, m) orthogonal to the arrival time difference vector t(ω, m) and evaluating whether they are
orthogonal to the theoretical arrival time difference vector te(θ), it can be determined whether the
arrival time difference vector t(ω, m) was produced by a single sound source or by a mixture of
signals from several sound sources.
[0033]
Specific examples of the orthogonality P(θ) calculated by Equation (7) are shown in FIG. 7 and
FIG. 8.
The horizontal axis is the arrival direction of the signal in degrees, and the vertical axis is the
maximum vector orthogonality max P(θ). Here, the 0-degree direction is the direction of the
microphone 1 viewed from the center of the microphone array when the microphone array is
placed on a desk (FIG. 13). FIG. 7 shows the maximum vector orthogonality max P(θ) at each angle
when the angle between sound source 1, located at 10 degrees, and another sound source 2 is
changed from 0 to 360 degrees. The maximum vector orthogonality max P(θ) shows a large value
of about 32 only when the angles of sound source 1 and sound source 2 coincide, and a small
value of about 12 or less in the other directions.
[0034]
FIG. 8 shows the maximum vector orthogonality max P(θ) when there is only one sound source and
its angle is changed from 0 to 360 degrees. In this case the maximum vector orthogonality
max P(θ) takes the same magnitude (about 32) in all arrival directions as at the 10-degree
coincidence in FIG. 7.
[0035]
The orthogonality determination unit 38 determines the orthogonality of the arrival time
difference vector t(ω, m) by comparing the orthogonality P(θ) with a threshold value Th
(step S38). An arrival time difference vector t(ω, m) with high orthogonality is a vector produced
by a sound source at a single fixed position, that is, an arrival time difference vector t(ω, m) for
which sparsity can be assumed. Conversely, sparsity cannot be assumed for an arrival time
difference vector t(ω, m) whose orthogonality P(θ) is small.
[0036]
As shown in Equation (8), whether or not sparsity can be assumed is determined using a threshold
value Th of, for example, 15 (step S380).
[0037]
[0038]
If the orthogonality P(θ) is larger than Th = 15, the sparsity determination result NJ(ω, m) is set
to 1 (step S382); if it is smaller, NJ(ω, m) is set to 0 (step S381). The arrival time difference vector
t(ω, m) is updated (step S384) until the determination has been made for all arrival time
difference vectors t(ω, m) (Y in step S383).
In this way, the sparsity of the arrival time difference vectors t(ω, m) of all frames m and
frequency bins ω is determined.
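Putting the previous sketches together, the loop below illustrates steps S380 to S384: for every
frame m and frequency bin ω, the maximum of P(θ) over the candidate directions is compared with
Th = 15 to give NJ(ω, m). Taking the maximum over θ follows the use of max P(θ) in FIGS. 7 and 8;
orthonormal_complement and vector_orthogonality are the assumed helpers from the earlier
sketches.

```python
import numpy as np

def sparsity_decisions(t_vectors, te_all, Th=15.0):
    """Sparsity determination results NJ(ω, m) (steps S380-S384).
    t_vectors: array of shape (n_frames, n_bins, 3) holding t(ω, m).
    te_all:    array of shape (360, 3) holding the theoretical vectors te(θ)."""
    n_frames, n_bins, _ = t_vectors.shape
    NJ = np.zeros((n_frames, n_bins), dtype=int)
    for m in range(n_frames):
        for w in range(n_bins):
            t_p1, t_p2 = orthonormal_complement(t_vectors[m, w])  # step S35
            P = vector_orthogonality(t_p1, t_p2, te_all)          # step S36, Eq. (7)
            NJ[m, w] = 1 if P.max() > Th else 0                   # steps S380-S382
    return NJ
```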
[0039]
The sound source direction calculation unit 152′ refers to the sparsity determination result
NJ(ω, m) and calculates the sound source direction candidate θ′(ω, m) of Equation (5) only for the
arrival time difference vectors t(ω, m) with NJ(ω, m) = 1, and outputs the candidates to the
histogram calculation unit 153.
The calculation of the sound source direction candidates θ′(ω, m) and the operation of the
histogram calculation unit 153, which obtains a histogram and takes the angles giving its peaks as
the sound source directions, are the same as in the prior art.
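For completeness, a minimal sketch of the histogram step: candidates from frequency bins with
NJ(ω, m) = 1 are accumulated into angle bins of fixed width and the A′ largest peaks are reported.
The 5-degree bin width and the peak-picking rule are assumptions; the patent only states that the
candidates are classified by a predetermined angle width.

```python
import numpy as np

def estimate_directions(candidates_deg, max_sources=2, bin_width_deg=5):
    """Histogram-based direction estimates θa^ from the sound source direction
    candidates θ'(ω, m) that passed the sparsity check (NJ(ω, m) = 1).
    The bin width and peak rule are illustrative assumptions."""
    edges = np.arange(0, 360 + bin_width_deg, bin_width_deg)
    hist, _ = np.histogram(np.asarray(candidates_deg) % 360, bins=edges)
    centers = edges[:-1] + bin_width_deg / 2.0
    top = np.argsort(hist)[::-1][:max_sources]   # bins of the A' largest peaks
    return centers[np.sort(top)]                 # estimated directions [degrees]
```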
[0040]
As described above, since the sound source direction estimation apparatus 100 estimates the
sound source directions using only the arrival time difference vectors t(ω, m) of frequency bins for
which sparsity can be assumed, each sound source direction can be estimated accurately even
when speakers at different positions speak at the same time.
Although the sparsity determination has been described using the method of obtaining vectors
orthonormal to the arrival time difference vector, the present invention is not limited to this
method. Another embodiment of the sparsity determination method is described next.
[0041]
The sparsity determination method of the second embodiment evaluates the sparsity by
evaluating the difference between the direction of the arrival time difference vector t(ω, m) and
that of the theoretical arrival time difference vector te(θ). FIG. 9 shows an example of the
functional configuration of the sparsity determination unit 34′ of the second embodiment. The
sparsity determination unit 34′ includes an inter-vector distance calculation unit 90 and a vector
match determination unit 91.
[0042]
The inter-vector distance calculation unit 90 receives the arrival time difference vector t(ω, m) as
input, normalizes it by its own magnitude, and calculates, by Equation (9), the distance P′(θ),
which is the absolute value of the difference between this normalized vector and the theoretical
arrival time difference vector te(θ) normalized by its own magnitude.
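Equation (9) is not reproduced in the translation; the description above amounts to the difference
between the two direction-normalized vectors, so the sketch below uses the Euclidean norm of
that difference as the "absolute value", which is an assumption.

```python
import numpy as np

def inter_vector_distance(t_vec, te_all, eps=1e-12):
    """Illustrative distance P'(θ) (assumed reading of Equation (9)): the norm of the
    difference between t(ω, m) and te(θ), each normalized by its own magnitude.
    te_all: array of shape (360, 3) holding te(θ) for θ = 0, ..., 359 degrees."""
    t_hat = t_vec / (np.linalg.norm(t_vec) + eps)
    te_hat = te_all / (np.linalg.norm(te_all, axis=1, keepdims=True) + eps)
    return np.linalg.norm(te_hat - t_hat, axis=1)   # 0 where the directions coincide
```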
[0043]
[0044]
Here, te(θ) is the theoretical arrival time difference vector calculated by Equation (6), and its
magnitude is used for the normalization.
As the theoretical arrival time difference vector te(θ), values recorded in the recording unit 37′
may be read out sequentially as shown in FIG. 9, or values recorded in advance in the inter-vector
distance calculation unit 90 may be used.
[0045]
The distance P′(θ) becomes 0 when the direction of the arrival time difference vector t(ω, m)
matches the direction of the theoretical arrival time difference vector te(θ).
Therefore, from the magnitude of this value it can be determined whether the arrival time
difference vector t(ω, m) is a vector produced by a single sound source or a vector influenced by
other sound sources. That is, whether the arrival time difference vector t(ω, m) is one for which
sparsity can be assumed can be determined by the magnitude of the distance P′(θ).
[0046]
In the second embodiment, the magnitude of the distance P′(θ) is evaluated by the vector match
determination unit 91 (step S91). In contrast to the first embodiment, the smaller the distance
P′(θ), the more confidently sparsity can be assumed for the arrival time difference vector t(ω, m).
The other processes are the same as in the first embodiment. In this way, too, the presence or
absence of sparsity of the arrival time difference vector t(ω, m) can be determined.
[0047]
[Simulation Results] In order to confirm the effect of the present invention, the sound source
direction estimation performance of the conventional sound source direction estimation apparatus
300 and that of the sound source direction estimation apparatus 100 of the present invention were
compared. The simulation was carried out under the condition that the sound sources were a male
speaker located in the direction of 10 degrees and a female speaker located in the direction of
20 degrees, with white noise having no sparsity superimposed at an SN ratio of 10 dB.
[0048]
The resulting histograms are shown in FIG. 11 and FIG. 12. The horizontal axis is the arrival
direction of the signal in degrees, and the vertical axis is the frequency. FIG. 11 is the histogram
obtained by the conventional sound source direction estimation apparatus 300; its peaks are
shifted toward the directions of 5 degrees and 15 degrees. FIG. 12 is the histogram obtained by the
sound source direction estimation apparatus 100 of the present invention. Two distinct peaks
appear correctly in the directions of 10 degrees and 20 degrees, and they stand out more
prominently than in FIG. 11. Thus, it was confirmed that the sound source direction estimation
accuracy of the sound source direction estimation apparatus 100 of the present invention is high.
[0049]
The sound source direction estimation apparatus and method of the present invention described
above are not limited to the above-described embodiments, and can be modified as appropriate
without departing from the spirit of the present invention. For example, the processes described
for the above apparatus and method need not be executed in chronological order according to the
order of description; they may also be executed in parallel or individually, depending on the
processing capability of the apparatus executing them or as otherwise required.
[0050]
Further, when the processing means of the above-described apparatus are realized by a computer,
the processing content of the functions that each apparatus should have is described by a
program. By executing this program on the computer, the processing means of each apparatus are
realized on that computer.
[0051]
The program describing the processing content can be recorded on a computer-readable recording
medium. The computer-readable recording medium may be any medium such as a magnetic
recording device, an optical disc, a magneto-optical recording medium, or a semiconductor
memory. Specifically, for example, a hard disk drive, a flexible disk, or a magnetic tape can be used
as the magnetic recording device; a DVD (Digital Versatile Disc), a DVD-RAM (Random Access
Memory), a CD-ROM (Compact Disc Read Only Memory), or a CD-R (Recordable)/RW (Rewritable)
as the optical disc; an MO (Magneto-Optical Disc) as the magneto-optical recording medium; and a
flash memory or the like as the semiconductor memory.
[0052]
Further, this program is distributed, for example, by selling, transferring, or lending a portable
recording medium such as a DVD or CD-ROM on which the program is recorded. Furthermore, the
program may be stored in a storage device of a server computer and distributed by transferring it
from the server computer to another computer via a network.
[0053]
Further, each of the above means may be configured by executing a predetermined program on a
computer, or at least a part of the processing content may be realized by hardware.