JPH09261792

Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JPH09261792
[0001]
The present invention relates to a sound receiving method called a delay-and-sum array, and to an
apparatus for it. When a plurality of microphones are used to receive sound from a focal point (a
point of interest) in a reverberant sound field such as a concert hall or a loudspeaker
teleconference, the output signal of each microphone is given a time delay according to the
distance from the focal point to that microphone, and the sum of the microphone outputs is
extracted.
[0002]
2. Description of the Related Art FIG. 1 is a diagram for explaining the principle of a
delay-and-sum array having two sets of circumferentially arranged microphone arrays, and it also
shows the configuration of the present invention. In FIG. 1, 11 and 12 are circumferentially
arranged microphone arrays, 21 and 22 are microphone holding frames, 31, 32, ..., 3m, 3m + 1,
3m + 2, ..., 3M are microphones, 41, 42, ..., 4M are delay units, 51, 52, ..., 5M are multipliers,
and 61, 62, ..., 6N are adders.
[0003]
The delay units 41, 42, ..., 4M shown in FIG. 1 add a delay $D_{ik}$, given by the following
equations, to the received signal.
$$D_{ik} = D_0 - \tau_{ik}, \quad i = 1, 2, \dots, M;\ k = 1, 2, \dots, N \qquad (1)$$
$$\tau_{ik} = r_{ik} / c \qquad (2)$$
where $M$ is the number of microphones, $N$ is the number of focal points of the microphone array,
$r_{ik}$ is the distance from the k-th focal point to the i-th microphone, and $c$ is the speed of
sound. $D_0$ is a fixed delay added to prevent the loss of accuracy that occurs when the delay
characteristic is realized with a digital filter and the value of $D_{ik}$ is too small.
03-05-2019
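As an illustration of equations (1) and (2), the delay table can be computed directly from the geometry. The following is a minimal sketch, not part of the patent: the function name `delays`, the microphone and focal-point coordinates, the value of `d0`, and the speed of sound of 343 m/s are all assumptions chosen for the example.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value, the source does not state one


def delays(mic_positions, focal_points, d0):
    """Return D[i][k] = d0 - r_ik / c for every microphone i and focal point k."""
    table = []
    for mic in mic_positions:
        row = []
        for focus in focal_points:
            r_ik = math.dist(mic, focus)      # distance from focus k to microphone i
            tau_ik = r_ik / SPEED_OF_SOUND    # propagation time, equation (2)
            row.append(d0 - tau_ik)           # steering delay, equation (1)
        table.append(row)
    return table


# Hypothetical geometry: two microphones 1.1 m above the focal plane.
mics = [(0.0, 0.0, 1.1), (1.0, 0.0, 1.1)]
foci = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
D = delays(mics, foci, d0=0.02)
```

The closer a microphone is to the focal point, the larger its steering delay, so that all paths end up with the same total delay `d0`. `d0` has to exceed the longest propagation time for every entry to stay positive.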
[0004]
Here, the target signal from the target sound source is written as $s(t)$. If the target signal
received by the i-th microphone is written as $x_{si}(t)$, then, using the distance attenuation
$1 / r_{si}$ of the sound wave and the propagation time $\tau_{si}$,
$$x_{si}(t) = (1 / r_{si})\, s(t - \tau_{si}) \qquad (3)$$
Here, $r_{si}$ is the distance from the target sound source to the i-th microphone, and
$\tau_{si} = r_{si} / c$ is the propagation time.
[0005]
The delay unit 4i adds the delay $D_{ik}$ to each received signal $x_{si}$. The result is
$x_{si}(t - D_{ik})$, and from equations (1) and (3),
$$x_{si}(t - D_{ik}) = (1 / r_{si})\, s(t - \tau_{si} - D_0 + \tau_{ik}) \qquad (4)$$
If the target sound source is at the k-th focal position, then $r_{si} = r_{ik}$ and
$\tau_{si} = \tau_{ik}$, so
$$x_{si}(t - D_{ik}) = (1 / r_{ik})\, s(t - D_0) \qquad (5)$$
As this equation shows, the outputs $x_{si}(t - D_{ik})$, $i = 1, 2, \dots, M$, of the delay units
41, 42, ..., 4M are signals of the same phase regardless of the microphone number $i$. In other
words, the delay operation corrects the time differences between the signals arriving from the
focal position and brings them into phase. The multipliers 51, 52, ..., 5M and the adders 61, 62,
..., 6N then weight and add these in-phase signals, which enhances the sound coming from the focal
point. Sound coming from a position other than the focal point, on the other hand, is received
with a propagation time $\tau_{Ni}$ different from $\tau_{ik}$. The delay operation based on
$D_{ik}$ of equation (1) therefore does not bring these signals into phase: the outputs of the
delay units are shifted in time relative to one another, and adding them gives little enhancement.
As a result, the delay-and-sum array forms a directional pattern with high sensitivity only toward
the focal point.
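The in-phase addition described by equations (3) to (5) can be checked with a toy simulation. This is a sketch under simplifying assumptions that are not in the patent: delays are whole numbers of samples, the source emits a single click, and the distances and propagation times are invented values (2 samples per distance unit).

```python
def delay(signal, n):
    """Delay a list of samples by n >= 0 samples, keeping the original length."""
    return [0.0] * n + signal[: len(signal) - n]


def delay_and_sum(mic_signals, steering_delays):
    """Apply one steering delay per microphone, then add the signals sample by sample."""
    delayed = [delay(x, d) for x, d in zip(mic_signals, steering_delays)]
    return [sum(samples) for samples in zip(*delayed)]


# Source signal: a single click at sample 4.
s = [0.0] * 32
s[4] = 1.0

D0 = 10                                   # fixed delay in samples (assumed)
r_focus = [1.0, 2.0, 4.0]                 # hypothetical distances focus -> microphones
tau_focus = [2, 4, 8]                     # matching propagation times in samples
steering = [D0 - tau for tau in tau_focus]            # equation (1)

# Source at the focal point: each microphone receives (1/r) * s(t - tau), equation (3).
at_focus = [[v / r for v in delay(s, tau)] for tau, r in zip(tau_focus, r_focus)]
y_focus = delay_and_sum(at_focus, steering)

# Source somewhere else: other distances and propagation times, same steering delays.
r_other = [4.0, 2.0, 1.0]
tau_other = [8, 4, 2]
off_focus = [[v / r for v in delay(s, tau)] for tau, r in zip(tau_other, r_other)]
y_other = delay_and_sum(off_focus, steering)

peak_focus = max(abs(v) for v in y_focus)   # clicks line up at sample 4 + D0
peak_other = max(abs(v) for v in y_other)   # clicks land on different samples
```

For the source at the focal point the three clicks coincide and their amplitudes add; for the other source they fall on different samples, so the peak is no larger than the strongest single microphone.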
[0006]
In this first conventional delay-and-sum array, however, if a microphone is located near a point
away from the focal point, the sensitivity to that point increases, and the SN ratio is degraded
if a noise source is located near that point. This is the problem of increased sensitivity at
non-focal positions. A solution to this problem was attempted on the basis of the idea "do not use
the outputs of microphones far from the focal point, or reduce their weight when the microphone
outputs are added." As a result, it was shown that the increase in sensitivity at non-focal
positions can be prevented by weighting, in the multipliers 51, 52, ..., 5M, by the reciprocal of
the m-th power of the distance from the focal point to each microphone (Japanese Patent
Application No. 6-219941, "Sound receiving method and apparatus").
[0007]
However, this second conventional method has the following problem: the sensitivity to a sound
source at the focal position varies with the focal position. Specifically, the sensitivity to a
focal point near the microphone array increases, and the sensitivity to a distant focal point
decreases. This is described next on the basis of experimental data.
[0008]
In the experiment, the two microphone arrays (total number of microphones: M = 32) shown in FIG. 1
were suspended from the ceiling of a room with a volume of 86 m³ and a reverberation time of
0.2 seconds. Then, 28 focal points were set on a plane 1.1 m below the microphone arrays.
Specifically, a grid with a pitch of 0.67 m (horizontal) × 0.8 m (vertical) was drawn on a plane
2 m wide and 4.8 m long, and the focal points of the microphone array were set at its 4 × 7 = 28
grid points. The horizontal direction of this grid plane is taken as the X coordinate and the
vertical direction as the Y coordinate. Next, a loudspeaker was placed at the position of the
first focal point ps1 (X, Y coordinates = [2, 2]) or the second focal point ps2 (X, Y coordinates
= [3, 3]), and phos noise was generated at the same volume in both cases. The power of the output
signals $y_1(t), y_2(t), \dots, y_{28}(t)$ of the microphone array, obtained when focal points
were formed at the 28 locations, was then measured, and the results were plotted as contour lines
for each $y_k(t)$. Here, each microphone gain (multiplication coefficient) $g_{ik}$ of the
multipliers 51, 52, ..., 5M was weighted, as in the second conventional method (Japanese Patent
Application No. 6-219941, "Sound receiving method and apparatus"), by the reciprocal of the
distance $r_{ik}$ from each focal point to each microphone ($g_{ik} = 1 / r_{ik}$).
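The focal-point grid of this experiment can be written out as follows. This is only a sketch: the choice of origin at one corner of the plane and the row-by-row ordering are assumptions, since the text does not say how the grid points are numbered.

```python
X_PITCH = 0.67   # m, horizontal grid pitch from the text
Y_PITCH = 0.8    # m, vertical grid pitch from the text
NX, NY = 4, 7    # 4 x 7 = 28 focal points

# Focal points as (x, y) positions in metres on the plane 1.1 m below the arrays.
# Origin and ordering are assumed for illustration.
focal_points = [(ix * X_PITCH, iy * Y_PITCH) for iy in range(NY) for ix in range(NX)]

width = max(x for x, _ in focal_points)    # about 2 m
length = max(y for _, y in focal_points)   # about 4.8 m
```

Three horizontal steps of 0.67 m span roughly the 2 m width, and six vertical steps of 0.8 m span the 4.8 m length, which is consistent with the 4 × 7 layout.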
[0009]
FIG. 5 shows the power distribution determined from $y_1(t)$ when the sound source is placed at
the first focal point ps1, and FIG. 6 shows the power distribution determined from $y_2(t)$ when
the sound source is placed at the second focal point ps2. As is clear from both figures, the
output of the microphone array is highest at the focal point where the sound source is located.
However, the output power is about -24 dB when the sound source is placed at the first focal point
ps1 (FIG. 5) and about -28 dB when it is placed at the second focal point ps2 (FIG. 6). That is,
there is a level difference of about 4 dB between the sound source positions ps1 and ps2. As
described above, the conventional method has the problem that the output level of the array
differs depending on the focal point at which the sound source is placed, even if the magnitude of
the sound emitted from the source is the same. In practice, this problem causes (i) a change in
volume when focusing on talkers at different positions, and (ii) an obstacle to the detection of
the sound source position.
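To put the 4 dB level difference in linear terms, the two measured levels can be converted to ratios. This is plain decibel arithmetic on the figures quoted above; the variable names are chosen here for illustration.

```python
level_ps1_db = -24.0   # output power with the source at ps1 (FIG. 5)
level_ps2_db = -28.0   # output power with the source at ps2 (FIG. 6)

difference_db = level_ps1_db - level_ps2_db
power_ratio = 10 ** (difference_db / 10)       # ratio of output powers
amplitude_ratio = 10 ** (difference_db / 20)   # ratio of output amplitudes
```

A 4 dB gap corresponds to roughly a 2.5-fold difference in output power (about 1.6-fold in amplitude) for the same source loudness.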
[0010]
An object of the present invention is to solve the above-mentioned drawbacks of the conventional
delay-and-sum array device, and to pick up a signal with a high signal-to-noise ratio at the same
level regardless of the focal position where the sound source is present.
[0011]
(1) The invention according to claim 1 is a method of receiving sound from a focal point using a
plurality of microphones, wherein the output signal of the i-th microphone is given a time delay
according to the distance from the focal point to the i-th microphone and is multiplied by the
reciprocal of a power of that distance, the multiplied signals are added, and the addition result
is normalized according to the sum of the power of the direct sound component and the power of the
reverberant sound component of a sound source placed at the focal point, as contained in the added
signal, and is then output.
[0012]
(2) The sound receiving apparatus according to claim 2 comprises: a plurality of microphones
($i = 1, 2, \dots, M$) for receiving sound from a focal point; delay means for delaying the output
signal of each microphone in accordance with the distance ($r_i$) from the focal point to that
microphone; multiplication means for multiplying the output signal of each microphone by the
reciprocal ($r_i^{-m}$) of a power of the distance ($r_i$); adding means for adding the output
signals of the microphones processed by the multiplication means and the delay means; and
normalization means for normalizing the added signal according to the sum of the power of the
direct sound component and the power of the reverberation component, contained in that signal, of
a sound source placed at the focal position.
[0013]
(3) According to the invention of claim 3, in (2), the delay time $D_i$ ($i = 1, 2, \dots, M$) of
the delay means is set to $D_i = D_0 - \tau_i$, $\tau_i = r_i / c$ ($D_0$ is a fixed delay and $c$
is the speed of sound).
(4) The invention according to claim 4 is characterized in that, in (2) and (3) above, where the
critical distance in the room is $r_C$, the normalization coefficient $\sqrt{C}$ of the
normalization means is set to
$$\sqrt{C} = \sqrt{\left\{\sum_{j=1}^{M} (1 / r_j^{m+1})\right\}^2 + (1 / r_C^2) \left\{\sum_{j=1}^{M} (1 / r_j^{2m})\right\}}$$
[0014]
(5) The invention according to claim 5 is characterized in that, in (2), (3) and (4), it further
comprises means for determining a desired sound source position from a plurality of the normalized
signals, each corresponding to one of a plurality of focal points, and means for selecting and
outputting one or more of those signals based on the determination result.
[0015]
DESCRIPTION OF THE PREFERRED EMBODIMENTS (1) Analysis of Problem The above-mentioned problems are
analyzed quantitatively.
In FIG. 1, consider the case where M microphones are attached to two sets of circumferentially
arranged microphone arrays and a delay $D_{ik}$ is added to form a focal point at the k-th
position. When the sound source is placed at the k-th focal position, the output signal of the
microphone array is expressed, from equation (5), by the following equation.
[0016]
$$y_k(t) = \sum_{i=1}^{M} g_{ik}\, x_{si}(t - D_{ik}) = \sum_{i=1}^{M} (g_{ik} / r_{ik})\, s(t - D_0) \qquad (6)$$
In the second conventional method, in order to increase the signal-to-noise ratio of the output
signal of the microphone array, each microphone gain (multiplication coefficient) $g_{ik}$ in
equation (6) was weighted as the reciprocal $r_{ik}^{-m}$ of the m-th power of the distance
$r_{ik}$ from the k-th focal point to the i-th microphone ($1 \le m \le 3$).
Substituting $g_{ik} = 1 / r_{ik}^{m}$ into equation (6) therefore gives the following equation.
[0017]
$$y_k(t) = \sum_{i=1}^{M} (1 / r_{ik}^{m+1})\, s(t - D_0) \qquad (7)$$
Therefore, when the focal point is formed at the k-th position and the sound source is at that
focal position, the mean square value (power) of the output of the microphone array can be
expressed by the following equation.
$$|y_k(t)|^2_{AV} = \left|\sum_{i=1}^{M} (1 / r_{ik}^{m+1})\, s(t - D_0)\right|^2_{AV} = \left|\sum_{i=1}^{M} (1 / r_{ik}^{m+1})\right|^2 |s^2(t - D_0)|_{AV} \qquad (8)$$
The subscript AV in equation (8) denotes a time average. Equation (8) shows that the output power
of the microphone array changes with the focal position in proportion to
$\left|\sum_{i=1}^{M} (1 / r_{ik}^{m+1})\right|^2$.
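The focus-dependent factor in equation (8) is easy to evaluate for a concrete geometry. The sketch below uses a hypothetical line of four microphones and two hypothetical focal points; neither the layout nor the function name `direct_power_factor` comes from the patent.

```python
import math


def direct_power_factor(mic_positions, focus, m=1):
    """Return |sum_i 1 / r_ik^(m+1)|^2, the focus-dependent factor of equation (8)."""
    total = sum(1.0 / math.dist(mic, focus) ** (m + 1) for mic in mic_positions)
    return total ** 2


# Hypothetical line of four microphones 1.1 m above the focal plane.
mics = [(x, 0.0, 1.1) for x in (0.0, 0.5, 1.0, 1.5)]

near = direct_power_factor(mics, (0.75, 0.0, 0.0))   # focus below the array centre
far = direct_power_factor(mics, (0.75, 3.0, 0.0))    # focus 3 m to the side
difference_db = 10 * math.log10(near / far)          # level gap for equal source power
```

With equal source power at both positions, the nearer focal point yields the larger output, which is the sensitivity difference the text describes.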
[0018]
(2) Solution to the Problem We therefore propose a method of obtaining a constant microphone array
output regardless of the focal position of the sound source. From equation (8), the output power
$|y_k(t)|^2_{AV}$ of the microphone array is the product of
$\left|\sum_{i=1}^{M} (1 / r_{ik}^{m+1})\right|^2$, which is a function of the distances from the
focal point to the microphones, and $|s^2(t - D_0)|_{AV}$, which is a function of the sound source
signal. Here, $|s^2(t - D_0)|_{AV}$ is the sound source signal power, which is constant regardless
of the focal position of the sound source. Therefore, to make $|y_k(t)|^2_{AV}$ constant
regardless of the focal position, equation (8) should be normalized by
$1 / \left|\sum_{i=1}^{M} (1 / r_{ik}^{m+1})\right|^2$. For that purpose, each microphone gain
(multiplication coefficient) $g_{ik}$ may be set as in the following equation.
[0019]
Here, $M$ is the number of microphones and $N$ is the number of focal points of the microphone
array. When the power of $y_k(t)$ is calculated by substituting the microphone gain $g_{ik}$ newly
defined by equation (9) into equation (6), the following equation is obtained.
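Equations (9) and (10) are not reproduced in this text. A reconstruction consistent with equations (6) and (8), and with the statement in paragraph [0023] that equation (19) reduces to equation (9) in free space, might read as follows. It is inferred from the surrounding derivation and is not the patent's own typesetting.

```latex
% Reconstructed free-space gain: the weight 1/r_ik^m divided by the
% focus-dependent sum that appears in equation (8).
g_{ik} = \frac{1 / r_{ik}^{m}}{\sum_{j=1}^{M} \left(1 / r_{jk}^{m+1}\right)},
\qquad i = 1, 2, \dots, M;\ k = 1, 2, \dots, N \qquad (9)

% Substituting (9) into (6): the sums cancel, leaving only the source power.
|y_k(t)|^2_{AV}
  = \left| \frac{\sum_{i=1}^{M} \left(1 / r_{ik}^{m+1}\right)}
                {\sum_{j=1}^{M} \left(1 / r_{jk}^{m+1}\right)} \right|^2
    |s^2(t - D_0)|_{AV}
  = |s^2(t - D_0)|_{AV} \qquad (10)
```

Under this reading, the normalization simply divides out the focus-dependent sum of equation (8), so the output power no longer depends on $k$.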
[0020]
As is clear from equation (10), regardless of the focal position of the sound source, the output
signal power of the microphone array becomes $|s^2(t - D_0)|_{AV}$, which is a function of the
sound source signal only, and is therefore constant.
(3) Application to a room with a lot of reverberation Following the solution in item (2) above,
sound source detection was carried out using the two sets of circumferentially arranged microphone
arrays shown in FIG. 1. When the sound source is at the k-th focal position, the microphone gain
$g_{ik}$ is set according to equation (9) so that the signal-to-noise ratio of the microphone
array is maximum and the output power is constant. As a result, in free space without
reverberation, the signal-to-noise ratio of the microphone array is maximum and the output power
is constant regardless of the focal position at which the sound source is located, and good sound
source detection can be realized. However, this method did not work well for sound source
detection in a reverberant room. Therefore, the idea of the critical distance $r_C$ of the room
was introduced, and the microphone gain $g_{ik}$ was devised as in the following equation.
[0021]
Here, the critical distance $r_C$ of the room is the distance from the sound source at which the
direct sound power of the source and the reverberant sound power become equal. It is given by
$r_C = \sqrt{0.0032\, V / T}$, where $V$ is the room volume and $T$ is the reverberation time of
the room. Taking the reverberation component into account, the received signal $x_{si}(t)$ of each
microphone, previously expressed by equation (3), is now expressed by the following equation.
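As a worked example, the critical distance of the experimental room (volume 86 m³, reverberation time 0.2 seconds, both taken from the text) can be computed from the formula above. The function name is illustrative, and the units (cubic metres and seconds in, metres out) are assumed from the usual form of this formula.

```python
import math


def critical_distance(room_volume_m3, reverberation_time_s):
    """r_C = sqrt(0.0032 * V / T); assumes V in cubic metres and T in seconds."""
    return math.sqrt(0.0032 * room_volume_m3 / reverberation_time_s)


r_c = critical_distance(86.0, 0.2)   # about 1.17 m for the room in the experiment
```

Since the focal plane lies 1.1 m below the arrays, most focus-to-microphone distances in the experiment are comparable to or larger than this value, so the reverberant term is not negligible there.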
[0022]
$$x_{si}(t) = (1 / r_{si})\, s(t - \tau_{si}) + v_i(t) \qquad (11)$$
where $v_i(t)$ is the reverberation component received by the i-th microphone. The output signal
$y_k(t)$ obtained by applying the delay $D_{ik}$ and the weighting coefficient (multiplication
coefficient) $g_{ik}$ ($= 1 / r_{ik}^{m}$) to each received signal of equation (11) and summing
is, as in equation (6),
$$y_k(t) = \sum_{i=1}^{M} g_{ik}\, x_{si}(t - D_{ik}) = \sum_{i=1}^{M} (1 / r_{ik}^{m}) \left\{ (1 / r_{ik})\, s(t - D_0) + v_i(t - D_{ik}) \right\} \qquad (12)$$
Here, it is assumed that the direct sound component $s(t - D_0)$ of the target signal and the
reverberation components $v_i(t - D_{ik})$, $i = 1, 2, \dots, M$, are mutually uncorrelated. That
is,
$$[s(t - D_0) \cdot v_i(t - D_{ik})]_{AV} = 0, \quad i = 1, 2, \dots, M$$
$$[v_i(t - D_{ik}) \cdot v_j(t - D_{jk})]_{AV} = 0, \quad j = 1, 2, \dots, M;\ j \ne i \qquad (13)$$
Also, the power of each reverberation component is assumed to be equal and is written $P_q$, and
the power of the target signal is written $P_S$. That is,
$$P_q = |v_i^2(t - D_{ik})|_{AV}; \quad P_S = |s^2(t - D_0)|_{AV} \qquad (14)$$
Calculating the power (mean square) of the array output $y_k(t)$ then gives
$$|y_k(t)|^2_{AV} = \left|\sum_{i=1}^{M} \left\{ (1 / r_{ik}^{m+1})\, s(t - D_0) + (1 / r_{ik}^{m})\, v_i(t - D_{ik}) \right\}\right|^2_{AV}$$
$$= \left|\sum_{i=1}^{M} (1 / r_{ik}^{m+1})\right|^2 |s^2(t - D_0)|_{AV} + \left|\sum_{i=1}^{M} (1 / r_{ik}^{2m})\right| [v_i^2(t - D_{ik})]_{AV}$$
$$= \left|\sum_{i=1}^{M} (1 / r_{ik}^{m+1})\right|^2 P_S + \left|\sum_{i=1}^{M} (1 / r_{ik}^{2m})\right| P_q \qquad (15)$$
Since the power of the reverberant sound equals the power of the direct sound at the critical
distance $r_C$,
$$P_q = (1 / r_C^2)\, P_S \qquad (16)$$
holds. From this,
$$|y_k(t)|^2_{AV} = \left[ \left\{\sum_{i=1}^{M} (1 / r_{ik}^{m+1})\right\}^2 + (1 / r_C^2) \left\{\sum_{i=1}^{M} (1 / r_{ik}^{2m})\right\} \right] P_S \qquad (17)$$
That is, using the distance $r_{jk}$ between the focal point and each microphone and the critical
distance $r_C$, the power of the array output with the reverberation component taken into account
turns out to be proportional to
$$C_k = \left\{\sum_{j=1}^{M} (1 / r_{jk}^{m+1})\right\}^2 + (1 / r_C^2) \left\{\sum_{j=1}^{M} (1 / r_{jk}^{2m})\right\} \qquad (18)$$
This equation shows that the output power of the array depends on the values of $r_{jk}$, that is,
on the focal position. To prevent this, the weighting coefficient $g_{ik}$ may be normalized by
$\sqrt{C_k}$. That is, $g_{ik}$ may be newly defined as
$$g_{ik} = (1 / \sqrt{C_k})\, (1 / r_{ik}^{m}) \qquad (19)$$
The output power is then
$$|y_k(t)|^2_{AV} = \left| (1 / \sqrt{C_k}) \sum_{i=1}^{M} \left\{ (1 / r_{ik}^{m+1})\, s(t - D_0) + (1 / r_{ik}^{m})\, v_i(t - D_{ik}) \right\} \right|^2_{AV}$$
$$= (1 / C_k) \left[ \left\{\sum_{j=1}^{M} (1 / r_{jk}^{m+1})\right\}^2 + (1 / r_C^2) \left\{\sum_{j=1}^{M} (1 / r_{jk}^{2m})\right\} \right] P_S = P_S \qquad (20)$$
and becomes independent of the focal position.
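The effect of the normalization in equations (18) to (20) can be checked numerically. The sketch below evaluates the power model of equations (15) and (16) for two hypothetical sets of focus-to-microphone distances; the distances, the value of `r_c`, and the function names are assumptions for illustration, not values from the experiment.

```python
import math


def normalization_constant(distances, r_c, m=1):
    """C_k of equation (18) for the distances r_jk from focus k to every microphone."""
    direct = sum(1.0 / r ** (m + 1) for r in distances) ** 2
    reverberant = sum(1.0 / r ** (2 * m) for r in distances) / r_c ** 2
    return direct + reverberant


def normalized_gains(distances, r_c, m=1):
    """g_ik of equation (19): (1 / sqrt(C_k)) * (1 / r_ik^m)."""
    c_k = normalization_constant(distances, r_c, m)
    return [1.0 / (math.sqrt(c_k) * r ** m) for r in distances]


def output_power(gains, distances, r_c, source_power=1.0):
    """Array output power for a source at the focus, per equations (15) and (16)."""
    direct = sum(g / r for g, r in zip(gains, distances)) ** 2 * source_power
    reverberant = sum(g * g for g in gains) * source_power / r_c ** 2
    return direct + reverberant


r_c = 1.17                            # roughly sqrt(0.0032 * 86 / 0.2)
near_focus = [1.2, 1.3, 1.5, 1.8]     # hypothetical distances in metres
far_focus = [2.5, 2.8, 3.1, 3.6]      # hypothetical distances in metres

# Conventional weighting g = 1/r (m = 1): power depends on the focal position.
raw_near = output_power([1.0 / r for r in near_focus], near_focus, r_c)
raw_far = output_power([1.0 / r for r in far_focus], far_focus, r_c)

# Normalized weighting of equation (19): power equals the source power at both foci.
p_near = output_power(normalized_gains(near_focus, r_c), near_focus, r_c)
p_far = output_power(normalized_gains(far_focus, r_c), far_focus, r_c)
```

With the conventional weights the two output powers differ by about a factor of ten for these distances, while with the normalized gains both equal the source power, as equation (20) states.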
[0023]
The first term of the normalization coefficient $C_k$ in equation (18) represents the power of the
direct sound component of the sound source placed at the focal position, as contained in the array
output when $1 / r_{ik}^{m}$ is used as the weighting coefficient, and the second term represents
the power of the reverberation component. In a sound field with little reverberation, such as free
space, the critical distance $r_C$ is infinite and the second term of equation (18) becomes zero,
so the weighting coefficient of equation (19) coincides with the free-space weighting coefficient
of equation (9). Since $1 / \sqrt{C_k}$ is common to all $i$ in equation (19), this normalization
is entirely equivalent to dividing by $\sqrt{C_k}$ the output $y_k(t)$ synthesized with
$g_{ik} = 1 / r_{ik}^{m}$.
[0024]
(4) Embodiment FIGS. 1 and 4 show an embodiment of the present invention. In the present
invention, however, the multiplication coefficient $g_{ik}$ of FIG. 1 differs from the previously
proposed one and is $g_{ik} = (1 / \sqrt{C_k})\, (1 / r_{ik}^{m})$. Here, $C_k$ is a constant.
FIG. 4 is a block diagram of a delay-and-sum array (claim 5) having an automatic sound source
position detecting function. In FIG. 4, 8 is a microphone array having M microphones, 9 is a delay
unit and multiplication unit, 10 is an addition unit, 11 is a sound source position detection
unit, 12 is a signal selection unit, and 13 is the output signal of the microphone array. The
device of FIG. 4 operates as follows.
[0025]
First, the microphone array 8 receives the sound in the sound field. Next, signal processing
(delay-and-sum) that forms a focus at the k-th position ($k = 1, 2, \dots, N$) is applied to the
received signals. Specifically, the delay unit 9 gives each signal a time delay according to the
distance $r_{ik}$ from the sound source to each microphone ($i = 1, 2, \dots, M$) so that all the
received signals originating from the k-th focal position are added in phase. Furthermore, the
multiplication unit 9 applies the microphone gain $g_{ik}$ of equation (19) to each signal so that
the output power is constant regardless of the focal position. The addition unit 10 adds the M
signals to form the output $y_k(t)$. In the same way, the outputs $y_1(t), y_2(t), \dots, y_N(t)$
obtained when focusing on each of the N focal positions ps1, ps2, ..., psN are synthesized.
[0026]
These N output signals are input to both the sound source position detection unit 11 and the
signal selection unit 12. The sound source position detection unit 11 detects the sound source
position using the N output signals $y_1(t), y_2(t), \dots, y_N(t)$. Various methods of sound
source position detection are conceivable; here, the focal point number $k_{max}$ corresponding to
the signal $y_{k_{max}}(t)$ with the largest power among $y_1(t), y_2(t), \dots, y_N(t)$ is taken
as the focal point at which the sound source exists. Next, the focal point number $k_{max}$
detected by the sound source position detection unit 11 is input to the signal selection unit 12.
The signal selection unit 12 uses $k_{max}$ to select $y_{k_{max}}(t)$ from among the output
signals $y_1(t), y_2(t), \dots, y_N(t)$ and outputs it as the output of the sound receiving
device.
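The detection and selection steps performed by units 11 and 12 amount to an argmax over the mean power of the N focused outputs. The following is a minimal sketch with made-up sample values; the function names are illustrative, and the index returned is 0-based, whereas the text numbers focal points from 1.

```python
def mean_power(signal):
    """Time-averaged power (mean square) of one focused output y_k."""
    return sum(v * v for v in signal) / len(signal)


def select_focus(outputs):
    """Return (k_max, y_kmax): the 0-based index and signal with the largest mean power."""
    powers = [mean_power(y) for y in outputs]
    k_max = max(range(len(outputs)), key=powers.__getitem__)
    return k_max, outputs[k_max]


# Hypothetical focused outputs y_1, y_2, y_3 (already normalized by sqrt(C_k)).
outputs = [
    [0.1, -0.1, 0.1, -0.1],
    [0.8, -0.7, 0.9, -0.8],
    [0.3, -0.2, 0.3, -0.3],
]
k_max, y_selected = select_focus(outputs)
```

Because each output has been normalized, comparing powers across focal points compares source loudness rather than proximity to the array.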
[0027]
In the embodiment of FIG. 4 described above, the sound source emitting the largest sound is
regarded as the target sound source, and the array is focused on it to achieve sound reception
with a high SN ratio. In such a system, when several sounds, for example a target signal source
and a noise source, are present in the sound field, it is important to locate the target signal
source accurately. However, in the conventional delay-and-sum array method the sensitivity differs
with the focal position (the sensitivity to focal positions close to the array is higher), so the
output power becomes large when a noise source (such as an air conditioner) is close to the array.
As a result, even though the power of the noise source is actually small, it may be misjudged as
the target signal source. When the present invention is applied, on the other hand, the magnitude
of the sound emitted from a sound source can be measured accurately regardless of the focal
position, so the position of the source emitting the largest sound can be identified accurately,
and focusing on that position gives good sound reception with a high SN ratio.
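The misjudgement described above can be illustrated with the power model of equation (17). The sketch below is deliberately simplified: it considers only the output focused on each source's own position and ignores leakage between focal points, and the source powers, distances, and names are invented for the example.

```python
def c_k(distances, r_c, m=1):
    """Focus-dependent power factor C_k of equation (18)."""
    direct = sum(1.0 / r ** (m + 1) for r in distances) ** 2
    reverberant = sum(1.0 / r ** (2 * m) for r in distances) / r_c ** 2
    return direct + reverberant


r_c = 1.17  # assumed critical distance in metres

# Hypothetical scene: a quiet noise source near the array and a louder talker far away.
sources = {
    "air conditioner": {"power": 0.5, "distances": [1.2, 1.3, 1.5, 1.8]},
    "talker": {"power": 1.0, "distances": [2.5, 2.8, 3.1, 3.6]},
}

# Conventional weighting g = 1/r^m: output power is C_k times the source power.
conventional = {
    name: src["power"] * c_k(src["distances"], r_c) for name, src in sources.items()
}

# Normalized weighting of equation (19): the factor C_k is divided out again.
normalized = {
    name: conventional[name] / c_k(src["distances"], r_c) for name, src in sources.items()
}

picked_conventional = max(conventional, key=conventional.get)
picked_normalized = max(normalized, key=normalized.get)
```

In this example the conventional weighting picks the nearby air conditioner although it is the quieter source, while the normalized outputs rank the sources by their actual power and pick the talker.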
[0028]
(5) Experimental results In the experiment, the two sets of circumferentially arranged microphone
arrays (total number of microphones: M = 32) shown in FIG. 1 were suspended from the ceiling of a
room with a volume of 86 m³ and a reverberation time of 0.2 seconds. Then, 28 focal points were
set on a plane 1.1 m below the microphone arrays. Specifically, a grid with a pitch of 0.67 m
(horizontal) × 0.8 m (vertical) was drawn on a plane 2 m wide and 4.8 m long, and the focal points
of the microphone array were set at its 4 × 7 = 28 grid points. The horizontal direction of this
grid plane is taken as the X coordinate and the vertical direction as the Y coordinate. Next, a
loudspeaker was placed at the position of the first focal point ps1 (X, Y coordinates = [2, 2]) or
the second focal point ps2 (X, Y coordinates = [3, 3]), and phos noise was generated at the same
volume in both cases. The powers of the output signals $y_1(t), y_2(t), \dots, y_{28}(t)$ of the
microphone array, obtained when focal points were formed at the 28 locations, were then measured,
and the results were plotted as contour lines. Here, the microphone gains $g_{ik}$ of the
multipliers 51, 52, ..., 5M were weighted as in equation (19) according to the method of the
present invention (with $m = 1$).
[0029]
FIG. 2 shows the power distribution determined from $y_1(t)$ when the sound source is placed at
the first focal point ps1, and FIG. 3 shows the power distribution determined from $y_2(t)$ when
the sound source is placed at the second focal point ps2. As is clear from both figures, the
output of the microphone array is highest at the focal point where the sound source is located.
Moreover, the output power is about -22 dB when the sound source is placed at the first focal
point ps1, and likewise about -22 dB when it is placed at the second focal point ps2. Thus the
output power of the present array system is equal whether the sound source is at ps1 or at ps2.
This confirms that a microphone array output of the same level is obtained regardless of the focal
position of the sound source.
[0030]
The present invention is characterized in that the weighting coefficient of each microphone
output, namely the reciprocal $r_i^{-m}$ ($1 \le m \le 3$) of the m-th power of the distance from
the sound source to each microphone (Japanese Patent Application No. 6-219941, "Sound receiving
method and apparatus"), is normalized by the square root of $C_k$ in equation (18). As a result,
even in a reverberant room, a microphone array output with high sensitivity and the same volume
can be obtained regardless of the focal position of the sound source.
[0031]
The essential point is that this normalization is performed on the basis of the sum of the powers
of the direct sound component and the reverberation component of the sound source placed at the
focal position, as contained in the array output when $1 / r_{ik}^{m}$ is used as the weighting
coefficient. Therefore, an effect equivalent to that of the present invention can be obtained even
if $C_k$ is not equation (18) itself but an approximation with an equivalent value.
[0032]
As described above, when voices and musical tones are picked up using a plurality of microphones
in a reverberant sound field such as a concert hall or a loudspeaker teleconference, each
microphone output signal is given a time delay according to the distance $r_i$ from the sound
source to that microphone so that all the output signals are added in phase, is weighted by the
reciprocal $r_i^{-m}$ ($1 \le m \le 3$) of the m-th power of that distance, and is normalized by
the square root of $C_k$ in equation (18). Taking the sum of these outputs picks up the target
sound with high sensitivity and constant volume regardless of the focal position of the sound
source, even in a reverberant room.
[0033]
Brief description of the drawings
[0034]
FIG. 1 is a block diagram showing the configuration of an embodiment of the present invention and
of a conventional sound receiving device.
[0035]
FIG. 2 is a diagram showing an example of the power distribution obtained using the output signal
$y_1(t)$ when there is a sound source at the focal point ps1 in the inventive device of FIG. 1
using the multiplication coefficient of equation (19).
[0036]
FIG. 3 is a diagram showing the power distribution obtained using the output signal $y_2(t)$ when
there is a sound source at the focal point ps2 in the inventive device of FIG. 1 using the
multiplication coefficient of equation (19).
[0037]
FIG. 4 is a block diagram showing an embodiment of claim 5.
[0038]
FIG. 5 is a diagram showing the power distribution obtained using the output signal $y_1(t)$ when
there is a sound source at the focal point ps1 in the conventional device of FIG. 1 with the
multiplication coefficient $g_{ik} = 1 / r_{ik}^{m}$ (with $m = 1$).
[0039]
FIG. 6 is a diagram showing the power distribution obtained using the output signal $y_2(t)$ when
there is a sound source at the focal point ps2 in the conventional device of FIG. 1 with the
multiplication coefficient $g_{ik} = 1 / r_{ik}^{m}$ (with $m = 1$).