Patent Translate, powered by EPO and Google

Notice: This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate, complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or financial decisions, should not be based on machine-translation output.

DESCRIPTION JPH09261792

[0001] The present invention relates to a sound receiving method called a delay-and-sum array, and to an apparatus for it. When a plurality of microphones are used to pick up sound from a focal point (the point of interest) in a reverberant sound field such as a concert hall or a teleconference, the method gives the output signal of each microphone a time delay according to the distance from the focal point to that microphone and outputs the sum of the delayed signals.

[0002] 2. Description of the Related Art FIG. 1 is a diagram for explaining the principle of a delay-and-sum array having two sets of circumferentially arranged microphone arrays, and it also shows the configuration of the present invention. In FIG. 1, 11 and 12 are circumferentially arranged microphone arrays; 21 and 22 are microphone holding frames; 31, 32, ..., 3m, 3m+1, 3m+2, ..., 3M are microphones; 41, 42, ..., 4M are delay units; 51, 52, ..., 5M are multipliers; and 61, 62, ..., 6N are adders.

[0003] The delay units 41, 42, ..., 4M shown in FIG. 1 add to the received signal a delay $D_{ik}$ given by the following equations.

$$D_{ik} = D_0 - \tau_{ik}, \qquad i = 1, 2, \ldots, M;\quad k = 1, 2, \ldots, N \tag{1}$$

$$\tau_{ik} = r_{ik} / c \tag{2}$$

Here $M$ is the number of microphones, $N$ is the number of focal points of the microphone array, $r_{ik}$ is the distance from the k-th focal point to the i-th microphone, and $c$ is the speed of sound. $D_0$ is a fixed delay that is added to prevent the loss of accuracy that occurs, when the delay characteristic is realized with a digital filter, if the value of $D_{ik}$ is too small.
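Equations (1) and (2) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the microphone coordinates, the value of $D_0$, and the speed of sound (343 m/s) are hypothetical values chosen for the example and do not come from the patent.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for c, not given in the text


def steering_delays(mic_positions, focus, d0, c=SPEED_OF_SOUND):
    """Return D_ik = D0 - r_ik / c for every microphone and one focal point."""
    delays = []
    for mic in mic_positions:
        r = math.dist(mic, focus)  # r_ik: distance from focal point to microphone
        tau = r / c                # equation (2): propagation time
        delays.append(d0 - tau)    # equation (1): the farther the mic, the smaller the delay
    return delays


# Hypothetical layout: three microphones 1.1 m above the focal plane.
mics = [(0.0, 0.0, 1.1), (0.5, 0.0, 1.1), (0.0, 0.5, 1.1)]
focus = (0.0, 0.0, 0.0)
delays = steering_delays(mics, focus, d0=0.01)
```

The microphone nearest the focal point receives the largest added delay, so that all signals from the focal point end up with the same total delay $D_0$.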
[0004] Let the target signal from the target sound source be $s(t)$. If the target signal received by the i-th microphone is written $x_{si}(t)$, then, using the distance attenuation $1/r_{si}$ of the sound wave and the propagation time $\tau_{si}$,

$$x_{si}(t) = (1/r_{si})\, s(t - \tau_{si}) \tag{3}$$

Here $r_{si}$ is the distance from the target sound source to the i-th microphone, and $\tau_{si} = r_{si}/c$ is the propagation time.

[0005] The delay unit 4i adds the delay $D_{ik}$ to each received signal $x_{si}$. The result is $x_{si}(t - D_{ik})$, and from equations (1) and (3),

$$x_{si}(t - D_{ik}) = (1/r_{si})\, s(t - \tau_{si} - D_0 + \tau_{ik}) \tag{4}$$

If the target sound source is at the k-th focal position, then $r_{si} = r_{ik}$ and $\tau_{si} = \tau_{ik}$, so that

$$x_{si}(t - D_{ik}) = (1/r_{ik})\, s(t - D_0) \tag{5}$$

As this equation shows, the outputs $x_{si}(t - D_{ik})$, $i = 1, 2, \ldots, M$, of the delay units 41, 42, ..., 4M are in phase regardless of the microphone number $i$. In other words, the delay operation corrects the arrival-time differences between the signals coming from the focal position and brings them into phase. The multipliers 51, 52, ..., 5M and the adders 61, 62, ..., 6N then weight and add these signals, which enhances the sound coming from the focal point. Sound arriving from a position other than the focal point, on the other hand, is received with a propagation time $\tau_{Ni}$ different from $\tau_{ik}$. The delay operation based on $D_{ik}$ of equation (1) therefore does not bring these signals into phase: the outputs of the delay units are shifted in time relative to one another, and adding them gives little enhancement. As a result, the delay-and-sum array forms a directivity pattern with high sensitivity only toward the focal point.
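The in-phase addition described above can be illustrated with a small Python sketch. It makes several simplifying assumptions that are not in the patent: delays are whole numbers of samples, the distance attenuation $1/r$ and the multipliers are left out, and the propagation delays are hypothetical values.

```python
import math


def delay(signal, n):
    """Delay a sampled signal by n whole samples, padding the start with zeros."""
    return [0.0] * n + signal[:len(signal) - n]


def power(signal):
    """Mean square value of a sampled signal."""
    return sum(v * v for v in signal) / len(signal)


def delay_and_sum(received, delays):
    """Apply one steering delay per microphone signal and add the results."""
    delayed = [delay(x, d) for x, d in zip(received, delays)]
    return [sum(samples) for samples in zip(*delayed)]


source = [math.sin(2 * math.pi * n / 16) for n in range(256)]

tau_focus = [3, 5, 8, 11]            # hypothetical propagation delays from the focal point (samples)
D0 = 12                              # fixed delay, chosen so that every D_i is positive
steer = [D0 - t for t in tau_focus]  # equation (1) in samples

rx_focus = [delay(source, t) for t in tau_focus]  # source at the focal point
tau_other = [11, 3, 12, 4]                        # hypothetical delays from another position
rx_other = [delay(source, t) for t in tau_other]  # source away from the focal point

p_focus = power(delay_and_sum(rx_focus, steer))
p_other = power(delay_and_sum(rx_other, steer))
```

With the source at the focal point every channel carries the same total delay `D0`, so the four signals add coherently; with the source elsewhere the channels are misaligned and the summed power is much smaller.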
[0006] In this first conventional delay-and-sum array, however, the sensitivity to a point away from the focal point increases if a microphone is located near that point, and the SN ratio is degraded if a noise source is located near that point. This is the problem of increased sensitivity at non-focus positions. A solution was attempted on the basis of the idea "do not use the outputs of microphones far from the focal point, or reduce their weight when the microphone outputs are added." As a result, it was shown that the sensitivity increase at non-focus positions can be prevented by weighting, in the multipliers 51, 52, ..., 5M, by the reciprocal of the m-th power of the distance from the focal point to each microphone (Japanese Patent Application No. 6-219941, "Sound receiving method and apparatus").

[0007] However, this second conventional method has the following problem: the sensitivity to a sound source at the focal position differs from one focal position to another. Specifically, the sensitivity to a focal point near the microphone array increases, and the sensitivity to a distant focal point decreases. This is described below on the basis of experimental data.

[0008] In the experiment, the two microphone arrays shown in FIG. 1 (total number of microphones: M = 32) were suspended from the ceiling of a room with a volume of 86 m³ and a reverberation time of 0.2 seconds. Then 28 focal points were set on a plane 1.1 m below the microphone array. Specifically, a grid with cells 0.67 m wide and 0.8 m high was drawn on a plane 2 m wide and 4.8 m high, and the focal points of the microphone array were set at the 4 × 7 = 28 grid points. The horizontal direction of this grid plane is taken as the X coordinate and the vertical direction as the Y coordinate.
Next, a loudspeaker was placed at the first focal point ps1 (X, Y coordinates = [2, 2]) or at the second focal point ps2 (X, Y coordinates = [3, 3]), and Hoth noise was generated at the same volume in both cases. Then the power of each of the output signals y1(t), y2(t), ..., y28(t) of the microphone array, obtained when focal points were formed at the above 28 locations, was measured, and the results were plotted as contour lines. Here each microphone gain (multiplication coefficient) $g_{ik}$ of the multipliers 51, 52, ..., 5M was weighted, as in the second conventional method (Japanese Patent Application No. 6-219941, "Sound receiving method and apparatus"), by the reciprocal of the distance $r_{ik}$ from the focal point to each microphone ($g_{ik} = 1/r_{ik}$).

[0009] FIG. 5 shows the power distribution obtained from y1(t) when the sound source is placed at the first focal point ps1, and FIG. 6 shows the power distribution obtained from y2(t) when the sound source is placed at the second focal point ps2. As is clear from both figures, the output of the microphone array is highest at the focal point where the sound source is located. However, the output power is about −24 dB when the sound source is placed at the first focal point ps1 (FIG. 5) and about −28 dB when it is placed at the second focal point ps2 (FIG. 6). That is, there is a level difference of about 4 dB between the sound source positions ps1 and ps2. Thus, in the conventional method, the output level of the array differs depending on the focal point at which the sound source is placed, even if the loudness of the sound emitted by the source is the same. In practice, this problem causes (i) a change in volume when focusing on talkers at different positions, and (ii) an obstacle to detecting the sound source position.
[0010] An object of the present invention is to solve the above drawback of the conventional delay-and-sum array device and to pick up a signal with a high signal-to-noise ratio at the same level regardless of the focal position at which the sound source is located.

[0011] (1) The invention according to claim 1 is a method of receiving sound from a focal point using a plurality of microphones, wherein the output signal of the i-th microphone is given a time delay according to the distance from the focal point to the i-th microphone and is multiplied by the reciprocal of a power of that distance; the multiplied signals are added; and the addition result is normalized according to the sum of the power of the direct sound component and the power of the reverberant sound component, contained in the added signal, of a sound source placed at the focal point, and is then output.

[0012] (2) The sound receiving apparatus according to claim 2 comprises: a plurality of microphones (i = 1, 2, ..., M) for receiving sound from a focal point; delay means for delaying the output signal of each microphone according to the distance ($r_i$) from the focal point to that microphone; multiplication means for multiplying the output signal of each microphone by the reciprocal ($r_i^{-m}$) of a power of the distance ($r_i$); adding means for adding the output signals of the microphones processed by the multiplication means and the delay means; and normalization means for normalizing the added signal according to the sum of the power of the direct sound component and the power of the reverberant component, contained in that signal, of a sound source placed at the focal position.

[0013] (3) According to the invention of claim 3, in (2), the delay time $D_i$ (i = 1, 2, ..., M) of the delay means is set to $D_i = D_0 - \tau_i$, $\tau_i = r_i / c$ ($D_0$ is a fixed delay and $c$ is the speed of sound).
(4) The invention according to claim 4 is characterized in that, in (2) and (3) above, where $r_C$ is the critical distance of the room, the normalization coefficient $\sqrt{C}$ of the normalization means is set to

$$\sqrt{C} = \sqrt{\Big\{\sum_{j=1}^{M} \frac{1}{r_j^{\,m+1}}\Big\}^{2} + \frac{1}{r_C^{2}} \Big\{\sum_{j=1}^{M} \frac{1}{r_j^{\,2m}}\Big\}}$$

[0014] (5) The invention according to claim 5 is characterized in that, in (2), (3) and (4), it comprises means for determining a desired sound source position from a plurality of the normalized signals, each corresponding to one of a plurality of focal points, and means for selecting and outputting one or more of those signals on the basis of the determination result.

[0015] DESCRIPTION OF THE PREFERRED EMBODIMENTS (1) Analysis of the Problem The above problem is analyzed quantitatively. In FIG. 1, consider the case where M microphones are mounted on the two sets of circumferentially arranged microphone arrays and the delays $D_{ik}$ are applied to form a focal point at the k-th position. When the sound source is placed at the k-th focal position, the output signal of the microphone array is given, from equation (5), by the following equation.

[0016]

$$y_k(t) = \sum_{i=1}^{M} g_{ik}\, x_{si}(t - D_{ik}) = \sum_{i=1}^{M} (g_{ik}/r_{ik})\, s(t - D_0) \tag{6}$$

In the second conventional method, in order to increase the signal-to-noise ratio of the array output signal, each microphone gain (multiplication coefficient) $g_{ik}$ in equation (6) was weighted by the reciprocal of the m-th power of the distance $r_{ik}$ from the k-th focal point to the i-th microphone, $g_{ik} = 1/r_{ik}^{\,m}$ (where $1 \le m \le 3$). Substituting $g_{ik} = 1/r_{ik}^{\,m}$ into equation (6) gives the following equation.

[0017]

$$y_k(t) = \sum_{i=1}^{M} \frac{1}{r_{ik}^{\,m+1}}\, s(t - D_0) \tag{7}$$

Therefore, the mean square value (power) of the output of the microphone array, when the focal point is formed at the k-th position and the sound source is at that focal position, is given by the following equation.
$$|y_k(t)|^2_{AV} = \Big|\sum_{i=1}^{M} \frac{1}{r_{ik}^{\,m+1}}\, s(t - D_0)\Big|^2_{AV} = \Big|\sum_{i=1}^{M} \frac{1}{r_{ik}^{\,m+1}}\Big|^2\, |s^2(t - D_0)|_{AV} \tag{8}$$

The subscript AV in equation (8) denotes a time average. Equation (8) shows that the output power of the microphone array changes with the focal position in proportion to $\big|\sum_{i=1}^{M} (1/r_{ik}^{\,m+1})\big|^2$.

[0018] (2) Solution to the Problem We therefore propose a method of obtaining a constant array output regardless of the focal position of the sound source. From equation (8), the output power of the microphone array, $|y_k(t)|^2_{AV}$, is the product of the term $\big|\sum_{i=1}^{M} (1/r_{ik}^{\,m+1})\big|^2$, which is a function of the distances from the focal point to the microphones, and the term $|s^2(t - D_0)|_{AV}$, which is a function of the source signal. Here $|s^2(t - D_0)|_{AV}$ represents the source signal power, which is constant regardless of the focal position of the sound source. Therefore, to make $|y_k(t)|^2_{AV}$ constant regardless of the focal position, equation (8) should be normalized by $1/\big|\sum_{i=1}^{M} (1/r_{ik}^{\,m+1})\big|^2$. For that purpose, each microphone gain (multiplication coefficient) $g_{ik}$ may be set as in the following equation (9).

[0019] Here, M represents the number of microphones, and N represents the number of focal points of the microphone array. When the power of $y_k(t)$ is calculated by substituting the microphone gain $g_{ik}$ newly defined by equation (9) into equation (6), the following equation (10) is obtained.

[0020] As is clear from equation (10), regardless of the focal position of the sound source, the output signal power of the microphone array becomes $|s^2(t - D_0)|_{AV}$, which is a function of the source signal only, and is therefore constant.
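Equations (9) and (10) themselves do not survive in this text. A form consistent with equation (8), and with the free-field limit ($r_C \to \infty$) of equations (18) and (19), would be the following. It is offered as a reconstruction under that assumption, not as the original typesetting.

```latex
% Assumed reconstruction of equation (9): conventional weight 1/r_{ik}^m
% divided by the focus-dependent factor found in equation (8).
g_{ik} = \frac{1}{\sum_{j=1}^{M} \dfrac{1}{r_{jk}^{\,m+1}}} \cdot \frac{1}{r_{ik}^{\,m}},
\qquad i = 1, 2, \ldots, M;\quad k = 1, 2, \ldots, N \tag{9}

% Assumed reconstruction of equation (10): substituting (9) into (6).
|y_k(t)|^2_{AV}
  = \Big|\sum_{i=1}^{M} \frac{g_{ik}}{r_{ik}}\Big|^2 \, |s^2(t - D_0)|_{AV}
  = |s^2(t - D_0)|_{AV} \tag{10}
```

The second step follows because $\sum_i g_{ik}/r_{ik} = \big(\sum_i 1/r_{ik}^{\,m+1}\big) \big/ \big(\sum_j 1/r_{jk}^{\,m+1}\big) = 1$.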
(3) Application to a room with strong reverberation In accordance with the solution described in item (2) above, an experiment was conducted using the two sets of circumferentially arranged microphone arrays shown in FIG. 1. With the sound source at the k-th focal position, the microphone gain $g_{ik}$ was set according to equation (9) so that the signal-to-noise ratio of the microphone array is maximized and the output power is constant. As a result, in free space without reverberation, the signal-to-noise ratio of the microphone array is maximal and the output power is constant regardless of the focal position at which the sound source is located, and good sound source detection can be achieved. However, this method did not work well for sound source detection in a reverberant room. Therefore, the concept of the critical distance $r_C$ of the room was introduced, and the microphone gain $g_{ik}$ was devised as in equation (19) below.

[0021] Here, the critical distance $r_C$ of the room is the distance from the sound source at which the direct sound power and the reverberant sound power become equal. This critical distance is given by $r_C = \sqrt{0.0032\, V / T}$, where $V$ is the room volume and $T$ is the reverberation time of the room. Taking the reverberant component into account, the received signal $x_{si}(t)$ of each microphone, expressed by equation (3), becomes the following.

[0022]

$$x_{si}(t) = (1/r_{si})\, s(t - \tau_{si}) + v_i(t) \tag{11}$$

Here $v_i(t)$ is the reverberant component received by the i-th microphone. The output signal $y_k(t)$, obtained by applying the delay $D_{ik}$ and the weighting coefficient (multiplication coefficient) $g_{ik}$ ($= 1/r_{ik}^{\,m}$) to each received signal of equation (11) and summing, is, as with equation (6),

$$y_k(t) = \sum_{i=1}^{M} g_{ik}\, x_{si}(t - D_{ik}) = \sum_{i=1}^{M} \frac{1}{r_{ik}^{\,m}} \Big\{ \frac{1}{r_{ik}}\, s(t - D_0) + v_i(t - D_{ik}) \Big\} \tag{12}$$

Here it is assumed that the direct sound component $s(t - D_0)$ of the target signal and the reverberant components $v_i(t - D_{ik})$, $i = 1, 2, \ldots, M$, are mutually uncorrelated. That is,

$$[\,s(t - D_0)\, v_i(t - D_{ik})\,]_{AV} = 0,\quad i = 1, 2, \ldots, M; \qquad [\,v_i(t - D_{ik})\, v_j(t - D_{jk})\,]_{AV} = 0,\quad i \ne j,\ j = 1, 2, \ldots, M \tag{13}$$

It is also assumed that the reverberant components all have the same power, denoted $P_q$; the power of the target signal is denoted $P_S$. That is,

$$P_q = |v_i^2(t - D_{ik})|_{AV}; \qquad P_S = |s^2(t - D_0)|_{AV} \tag{14}$$

Calculating the power (mean square) of the array output $y_k(t)$ then gives

$$\begin{aligned}
|y_k(t)|^2_{AV} &= \Big|\sum_{i=1}^{M} \Big\{ \frac{1}{r_{ik}^{\,m+1}}\, s(t - D_0) + \frac{1}{r_{ik}^{\,m}}\, v_i(t - D_{ik}) \Big\}\Big|^2_{AV} \\
&= \Big|\sum_{i=1}^{M} \frac{1}{r_{ik}^{\,m+1}}\Big|^2 |s^2(t - D_0)|_{AV} + \Big|\sum_{i=1}^{M} \frac{1}{r_{ik}^{\,2m}}\Big|\, [\,v_i^2(t - D_{ik})\,]_{AV} \\
&= \Big|\sum_{i=1}^{M} \frac{1}{r_{ik}^{\,m+1}}\Big|^2 P_S + \Big|\sum_{i=1}^{M} \frac{1}{r_{ik}^{\,2m}}\Big|\, P_q
\end{aligned} \tag{15}$$

Since the power of the reverberant sound equals the power of the direct sound at the critical distance $r_C$,

$$P_q = (1/r_C^{2})\, P_S \tag{16}$$

holds. From this,

$$|y_k(t)|^2_{AV} = \Big[ \Big\{\sum_{i=1}^{M} \frac{1}{r_{ik}^{\,m+1}}\Big\}^{2} + \frac{1}{r_C^{2}} \Big\{\sum_{i=1}^{M} \frac{1}{r_{ik}^{\,2m}}\Big\} \Big] P_S \tag{17}$$

That is, the power of the array output, with the reverberant component taken into account, was found to be proportional to

$$C_k = \Big\{\sum_{j=1}^{M} \frac{1}{r_{jk}^{\,m+1}}\Big\}^{2} + \frac{1}{r_C^{2}} \Big\{\sum_{j=1}^{M} \frac{1}{r_{jk}^{\,2m}}\Big\} \tag{18}$$

which is expressed in terms of the distances $r_{jk}$ between the focal point and the microphones and the critical distance $r_C$. This equation shows that the output power of the array depends on the values of $r_{jk}$, that is, on the focal position. To prevent this, the weighting coefficient $g_{ik}$ may be normalized by $\sqrt{C_k}$. That is, $g_{ik}$ may be newly defined as

$$g_{ik} = \frac{1}{\sqrt{C_k}} \cdot \frac{1}{r_{ik}^{\,m}} \tag{19}$$

The output power is then

$$\begin{aligned}
|y_k(t)|^2_{AV} &= \Big|\frac{1}{\sqrt{C_k}} \sum_{i=1}^{M} \Big\{ \frac{1}{r_{ik}^{\,m+1}}\, s(t - D_0) + \frac{1}{r_{ik}^{\,m}}\, v_i(t - D_{ik}) \Big\}\Big|^2_{AV} \\
&= \frac{1}{C_k} \Big[ \Big\{\sum_{j=1}^{M} \frac{1}{r_{jk}^{\,m+1}}\Big\}^{2} + \frac{1}{r_C^{2}} \Big\{\sum_{j=1}^{M} \frac{1}{r_{jk}^{\,2m}}\Big\} \Big] P_S = P_S
\end{aligned} \tag{20}$$

and is independent of the focal position.
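The normalization of equations (16) to (20) can be checked numerically with a short Python sketch. The room volume and reverberation time are those of the experiment in the text; the two sets of microphone distances are hypothetical values chosen only to represent a near and a far focal point, and the function names are illustrative.

```python
import math


def critical_distance(volume_m3, reverb_time_s):
    """r_C = sqrt(0.0032 V / T), as given in paragraph [0021]."""
    return math.sqrt(0.0032 * volume_m3 / reverb_time_s)


def normalization_coefficient(distances, m, r_c):
    """C_k of equation (18) for one focal point."""
    direct = sum(r ** -(m + 1) for r in distances) ** 2
    reverb = sum(r ** (-2 * m) for r in distances) / r_c ** 2
    return direct + reverb


def normalized_gains(distances, m, r_c):
    """g_ik of equation (19)."""
    c_k = normalization_coefficient(distances, m, r_c)
    return [r ** -m / math.sqrt(c_k) for r in distances]


def predicted_output_power(gains, distances, r_c, source_power=1.0):
    """Equation (15) for arbitrary gains, with P_q = P_S / r_C**2 from equation (16)."""
    direct = sum(g / r for g, r in zip(gains, distances)) ** 2
    reverb = sum(g * g for g in gains) / r_c ** 2
    return (direct + reverb) * source_power


r_c = critical_distance(86.0, 0.2)  # room used in the experiment
near = [1.2, 1.3, 1.5, 1.8]         # hypothetical distances to a near focal point (m)
far = [2.0, 2.4, 2.9, 3.3]          # hypothetical distances to a far focal point (m)

# Conventional weighting g = 1/r (m = 1): power depends on the focal point.
p_conv_near = predicted_output_power([1 / r for r in near], near, r_c)
p_conv_far = predicted_output_power([1 / r for r in far], far, r_c)

# Weighting of equation (19): power equals the source power for both.
p_norm_near = predicted_output_power(normalized_gains(near, 1, r_c), near, r_c)
p_norm_far = predicted_output_power(normalized_gains(far, 1, r_c), far, r_c)
```

For the 86 m³ room with a 0.2 s reverberation time, the critical distance comes out at about 1.17 m. Under the stated assumptions the two normalized powers agree, while the conventionally weighted powers differ between the near and far focal points.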
[0023] The first term of the normalization coefficient $C_k$ in equation (18) represents the power of the direct sound component, contained in the array output when $1/r_{ik}^{\,m}$ is used as the weighting coefficient, of a sound source placed at the focal position; the second term represents the power of the reverberant component. In a sound field with little reverberation, such as free space, the critical distance $r_C$ is ∞ and the second term of equation (18) becomes zero, so the weighting coefficient of equation (19) coincides with the free-space weighting coefficient of equation (9). Since $1/\sqrt{C_k}$ is common to all $i$ in equation (19), it is entirely equivalent to synthesize $y_k(t)$ with $g_{ik} = 1/r_{ik}^{\,m}$ and then normalize it by dividing by $\sqrt{C_k}$.

[0024] (4) Embodiment FIGS. 1 and 4 show embodiments of the present invention. In the present invention, however, the multiplication coefficient $g_{ik}$ of FIG. 1 differs from that of the previously proposed method and is $g_{ik} = (1/\sqrt{C_k})(1/r_{ik}^{\,m})$. Here $C_k$ is a constant determined for each focal point k. FIG. 4 is a block diagram of a delay-and-sum array having an automatic sound source position detecting function (claim 5). In FIG. 4, 8 is a microphone array having M microphones, 9 is a delay and multiplication unit, 10 is an addition unit, 11 is a sound source position detection unit, 12 is a signal selection unit, and 13 is the output signal of the microphone array. The device of FIG. 4 operates as follows.

[0025] First, the microphone array 8 receives the sound in the sound field. Next, signal processing (delay and sum) for forming a focus at the k-th position (k = 1, 2, ..., N) is performed on the received signals. Specifically, the delay unit 9 gives each signal a time delay according to the distance $r_{ik}$ from the sound source to each microphone (i = 1, 2, ..., M), so that all the signals that originate from the k-th focal position are added in phase.
Furthermore, the multiplication unit 9 applies the microphone gain $g_{ik}$ of equation (19) to the input signals so that the output power is constant regardless of the focal position. The addition unit 10 adds the M input signals to form the output $y_k(t)$. In the same way, the outputs y1(t), y2(t), ..., yN(t) for focusing on the N focal positions ps1, ps2, ..., psN are formed.

[0026] These N output signals are input to both the sound source position detection unit 11 and the signal selection unit 12. The sound source position detection unit 11 detects the sound source position using the N output signals y1(t), y2(t), ..., yN(t). Various methods of sound source position detection are conceivable; here, the focal point number kmax corresponding to the signal ykmax(t) having the largest power among y1(t), y2(t), ..., yN(t) is taken as the focal point at which the sound source is located. Next, the focal point number kmax detected by the sound source position detection unit 11 is input to the signal selection unit 12. The signal selection unit 12 uses kmax to select ykmax(t) from among the output signals y1(t), y2(t), ..., yN(t) and delivers it as the output of the sound receiving device.

[0027] In the embodiment of FIG. 4 described above, the sound source emitting the loudest sound is regarded as the target sound source, and the array is focused on it to achieve sound reception with a high SN ratio. In such a system, when a plurality of sound sources, for example a target signal source and a noise source, are present in the target sound field, it is important to locate the target signal source accurately.
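The selection rule of paragraph [0026], picking the focal point whose array output has the largest power, can be sketched as follows. The function names and the toy signals are hypothetical, and the index returned is zero-based, whereas the text numbers focal points from 1.

```python
def mean_power(signal):
    """Mean square value of a sampled signal."""
    return sum(v * v for v in signal) / len(signal)


def select_loudest(outputs):
    """Return (k_max, y_kmax): the zero-based index and the signal with the largest power.

    `outputs` holds one sampled array output y_k per focal point, each already
    weighted with normalized gains so that powers are comparable across focal points.
    """
    powers = [mean_power(y) for y in outputs]
    k_max = max(range(len(outputs)), key=powers.__getitem__)
    return k_max, outputs[k_max]


# Toy outputs for three focal points (hypothetical values).
outputs = [
    [0.1, -0.1, 0.1, -0.1],
    [0.5, -0.5, 0.5, -0.5],
    [0.2, -0.2, 0.2, -0.2],
]
k_max, y_selected = select_loudest(outputs)
```

This comparison is only meaningful if the output power does not depend on the focal position for a given source level, which is the property the normalization by $\sqrt{C_k}$ is intended to provide.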
However, in the conventional delay-and-sum array method, the sensitivity differs depending on the focal position (the sensitivity to focal positions close to the array is higher), so the output power becomes large when a noise source (such as an air conditioner) is close to the array. As a result, even though the power of the noise source is actually small, it may be misjudged as the target signal source. When the present invention is applied, by contrast, the loudness of the sound emitted by a source can be measured accurately regardless of the focal position, so the position of the source emitting the loudest sound can be identified accurately, and focusing on that position makes good sound reception with a high SN ratio possible.

[0028] (5) Experimental results In the experiment, the two sets of circumferentially arranged microphone arrays shown in FIG. 1 (total number of microphones: M = 32) were suspended from the ceiling of a room with a volume of 86 m³ and a reverberation time of 0.2 seconds. Then 28 focal points were set on a plane 1.1 m below the microphone array. Specifically, a grid with cells 0.67 m wide and 0.8 m high was drawn on a plane 2 m wide and 4.8 m high, and the focal points of the microphone array were set at the 4 × 7 = 28 grid points. The horizontal direction of this grid plane is taken as the X coordinate and the vertical direction as the Y coordinate. Next, a loudspeaker was placed at the first focal point ps1 (X, Y coordinates = [2, 2]) or at the second focal point ps2 (X, Y coordinates = [3, 3]), and Hoth noise was generated at the same volume in both cases. Then the powers of the output signals y1(t), y2(t), ..., y28(t) of the microphone array, obtained when focal points were formed at the above 28 locations, were measured, and the results were plotted as contour lines. Here the microphone gains $g_{ik}$ of the multipliers 51, 52, ..., 5M were weighted as in equation (19), according to the method of the present invention (with m = 1).

[0029] FIG. 2 shows the power distribution obtained from y1(t) when the sound source is placed at the first focal point ps1, and FIG. 3 shows the power distribution obtained from y2(t) when the sound source is placed at the second focal point ps2. As is clear from both figures, the output of the microphone array is highest at the focal point where the sound source is located. Moreover, the output power is about −22 dB when the sound source is placed at the first focal point ps1, and likewise about −22 dB when it is placed at the second focal point ps2. Thus the output power of the present array system is the same whether the sound source is at ps1 or at ps2. This confirms that a microphone output of the same level is obtained regardless of the focal position of the sound source.

[0030] The present invention is characterized in that the weighting coefficient of each microphone output, namely the reciprocal of the m-th power of the distance from the sound source to each microphone, $r_i^{-m}$ (where $1 \le m \le 3$) [Japanese Patent Application No. 6-219941, "Sound receiving method and apparatus"], is normalized by the square root of $C_k$ in equation (18). As a result, even in a reverberant room, a microphone output with high sensitivity and the same volume is obtained regardless of the focal position of the sound source.

[0031] It is essential that the above normalization is based on the sum of the powers of the direct sound component and the reverberant component, contained in the array output when $1/r_{ik}^{\,m}$ is used as the weighting coefficient, of a sound source placed at the focal position.
Therefore, an effect equivalent to that of the present invention can be obtained even if $C_k$ is not equation (18) itself but an approximation of equivalent value.

[0032] As described above, when voices or musical tones are picked up using a plurality of microphones in a reverberant sound field such as a concert hall or a loudspeaking teleconference, the output signal of each microphone is given a time delay according to the distance $r_i$ from the sound source to that microphone so that all the outputs are added in phase, is weighted by the reciprocal of the m-th power of that distance, $r_i^{-m}$ (where $1 \le m \le 3$), and is normalized by the square root of $C_k$ in equation (18). Taking the sum of the outputs then makes it possible to pick up the target sound with high sensitivity and constant volume regardless of the focal position of the sound source, even in a reverberant room.

[0033] Brief description of the drawings

[0034] FIG. 1 is a block diagram showing the configuration of an embodiment of the present invention and of a conventional sound receiving device.

[0035] FIG. 2 is a diagram showing an example of the power distribution obtained using the output signal y1(t) when the sound source is at the focal point ps1, in the inventive device of FIG. 1 using the multiplication coefficient of equation (19).

[0036] FIG. 3 is a diagram showing the power distribution obtained using the output signal y2(t) when the sound source is at the focal point ps2, in the inventive device of FIG. 1 using the multiplication coefficient of equation (19).

[0037] FIG. 4 is a block diagram showing the embodiment of claim 5.

[0038] FIG. 5 is a diagram showing the power distribution obtained using the output signal y1(t) when the sound source is at the focal point ps1, in the conventional device of FIG. 1 with the multiplication coefficient $g_{ik} = 1/r_{ik}^{\,m}$ (with m = 1).
[0039] FIG. 6 is a diagram showing the power distribution obtained using the output signal y2(t) when the sound source is at the focal point ps2, in the conventional device of FIG. 1 with the multiplication coefficient $g_{ik} = 1/r_{ik}^{\,m}$ (with m = 1).
