Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JPH1118192
[0001]
The present invention relates to a sound collection method and apparatus that collect sound by
processing the output signals of a microphone array composed of a plurality of microphones. In
particular, it relates to a sound collection method and apparatus that, when applied to a
communication conference such as a video conference, detect received speech and eliminate the
effect of the received speech radiated from a reception speaker, so that the directivity of the
microphone array is directed correctly toward the target speech and sound can be collected.
[0002]
2. Description of the Related Art In recent years, with the advancement of multimedia technology,
communication conferences such as video conferences held as hands-free (loud-speaking) calls
using microphones and speakers have become possible. Such conferences require a sound collecting
device that allows natural conversation without participants being aware of microphones and that
collects only the target sound, such as voice, without installing one microphone per speaker on
the conference desk.
[0003]
As an example of such a sound collection device, there is a sound collection device that installs a
03-05-2019
1
plurality of microphones (a microphone array) and processes the outputs of the microphones to
extract a target sound. Many signal processing methods are known for suppressing noise and
extracting a target sound with such a microphone array, such as the delay-sum method and AMNOR
(for example, Oga, Yamazaki, and Kanada, "Acoustic Systems and Digital Processing", The
Institute of Electronics, Information and Communication Engineers, 1995, pp. 173-197). In the
delay-sum method, for example, the target sound is extracted as follows.
[0004]
FIG. 10 is a diagram for explaining the principle of target sound extraction by the
delay-and-sum method. In FIG. 10, 1 is a sound collection unit (microphone array); 21, 22, ...,
2M are microphones (M is the number of microphones); 31, 32, ..., 3M are delay devices; 4 is an
adder; 5 is an output signal; 6 is a noise suppression unit; d is the microphone interval; s(t)
is a sound wave arriving at the sound collection unit 1 (t represents time); θ is the angle at
which the sound wave s(t) arrives at the sound collection unit 1; and τi is the time difference
(delay time) with which the sound wave reaches each microphone.
[0005]
Assume that the microphones 21, 22, ..., 2M in FIG. 10 are arranged in a straight line at equal
intervals d, and that the sound wave s(t) arrives from a distance at an angle θ to the line of
microphones. The distance that the sound wave travels between reaching the microphone 21 and
reaching the microphone 22 is then d sin θ, from the microphone interval d and the arrival angle
θ. Similarly, the distance traveled to the i-th microphone 2i (i = 2, ..., M) is (i - 1) d sin θ.
Dividing this propagation distance by the speed of sound c gives the delay time τi until the
wave reaches the microphone 2i (i = 2, ..., M), relative to the microphone 21, as the following
equation (1).
    τi = (i - 1) d sin θ / c    (1)
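The delay relation described above, τi = (i - 1) d sin θ / c, can be sketched numerically as follows. This is only an illustrative sketch: the function name `arrival_delays`, the 0.05 m spacing, the 30 degree angle, and the 340 m/s speed of sound are assumptions chosen for the example and do not appear in the original text.

```python
import math

def arrival_delays(num_mics, spacing_m, theta_rad, sound_speed=340.0):
    """Delay of a far-field wave at each microphone relative to the first one.

    Index 0 is the reference microphone, so index k corresponds to the
    (k + 1)-th microphone and a delay of k * d * sin(theta) / c.
    The default sound speed of 340 m/s is an assumed value.
    """
    return [k * spacing_m * math.sin(theta_rad) / sound_speed
            for k in range(num_mics)]

# Four microphones, 5 cm apart, wave arriving at 30 degrees (assumed values).
delays = arrival_delays(num_mics=4, spacing_m=0.05, theta_rad=math.radians(30))
```

The delays grow linearly along the array, which is what makes a single steering angle sufficient to describe them.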
[0006]
Here, when the output signal of each microphone 2i (i = 1, ..., M) is denoted Xi(t), the sound
wave s(t) is delayed by τi, so Xi(t) is given by the following equation (2).
    Xi(t) = s(t - τi)    (2)
[0007]
Here, it is shown below that if the delay amount Di of the delay device 3i (i = 1, 2,..., M) is
appropriately set, only the incoming sound wave from the θ direction can be emphasized and
output to the output signal 5 .
[0008]
The delay amount Di of the delay device 3i (i = 1, 2, ..., M) is set as in the following
equation (3).
    Di = D0 - τi    (3)
[0009]
D0 is a fixed delay amount that is added so that accuracy is not lost when the delay
characteristic is realized with a digital filter and the delay value would otherwise be too
small.
[0010]
At this time, the output of the delay device 3i (i = 1, 2, ..., M) is the signal of equation (2)
delayed by Di of equation (3), and is given by the following equation (4).
    Xi(t - Di) = s(t - Di - τi) = s(t - D0)    (4)
[0011]
That is, regardless of the microphone number i, s (t) becomes the same signal delayed by D0.
[0012]
When the signals are added by the adder 4 after their phases have been aligned in this way, the
sound wave coming from the θ direction is emphasized in proportion to the number of signals
added.
On the other hand, a sound wave coming from a direction θN different from θ is received with
delay times τN different from τi, so the delay amounts of equation (3) do not align its phases,
and it is not emphasized even when the adder 4 adds the signals.
[0013]
Thus, in the delay-sum method, the sound wave coming from the target direction θ is
emphasized, and the noise coming from the other direction θN is relatively suppressed.
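The emphasis and suppression described above can be illustrated for a single tone. The sketch below is a simplified narrowband model, not the patent's implementation: the function name `delay_sum_gain`, the 2 kHz test frequency, the array geometry, the 340 m/s sound speed, and the 1 ms fixed delay are all assumptions made for the example.

```python
import cmath
import math

def delay_sum_gain(num_mics, spacing_m, steer_rad, source_rad, freq_hz,
                   sound_speed=340.0, fixed_delay_s=0.001):
    """Normalized output magnitude of a delay-and-sum array for one tone.

    A value of 1.0 means all microphone signals add exactly in phase.
    """
    total = 0j
    for k in range(num_mics):
        # Arrival delay of the actual wave at microphone k (0-based index).
        tau_source = k * spacing_m * math.sin(source_rad) / sound_speed
        # Delay expected for the steering direction; the delay device
        # applies the fixed delay minus this value.
        tau_steer = k * spacing_m * math.sin(steer_rad) / sound_speed
        applied_delay = fixed_delay_s - tau_steer
        total += cmath.exp(-2j * math.pi * freq_hz * (tau_source + applied_delay))
    return abs(total) / num_mics

# Array steered to 30 degrees; one wave from 30 degrees, one from -20 degrees.
on_target = delay_sum_gain(8, 0.05, math.radians(30), math.radians(30), 2000.0)
off_target = delay_sum_gain(8, 0.05, math.radians(30), math.radians(-20), 2000.0)
```

With these assumed values the on-target gain is essentially 1, while the off-target wave is strongly attenuated because its phases no longer line up at the adder.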
[0014]
At this time, if the target direction θ is scanned while the output signal of the microphone
array is monitored, the output signal becomes large when θ points toward the target speaker, so
the direction of the target speaker can be found.
Then, by aligning the phases according to equation (4) and adding the signals so as to emphasize
the sound wave from the direction θ of the target speaker, that is, by aiming the directivity of
the microphone array in the direction θ, the target sound can be picked up with a high SN ratio.
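The direction search described above (scan θ, monitor the output, take the direction with the largest output) can be sketched with the same single-tone model. Everything named here is an assumption for illustration: the helper names, the 1 degree scan step, the 2 kHz tone, and the 340 m/s sound speed. The fixed delay D0 is omitted because a delay common to all channels does not change the output power.

```python
import cmath
import math

SOUND_SPEED = 340.0  # m/s, assumed value

def mic_phasors(num_mics, spacing_m, source_rad, freq_hz):
    """Complex amplitude of one tone at each microphone (far-field model)."""
    return [cmath.exp(-2j * math.pi * freq_hz
                      * k * spacing_m * math.sin(source_rad) / SOUND_SPEED)
            for k in range(num_mics)]

def steered_power(phasors, spacing_m, steer_rad, freq_hz):
    """Output power after compensating the delays expected for steer_rad."""
    total = 0j
    for k, x in enumerate(phasors):
        tau = k * spacing_m * math.sin(steer_rad) / SOUND_SPEED
        total += x * cmath.exp(2j * math.pi * freq_hz * tau)
    return abs(total) ** 2

def find_direction_deg(phasors, spacing_m, freq_hz):
    """Scan -90..90 degrees in 1-degree steps and return the strongest one."""
    return max(range(-90, 91),
               key=lambda deg: steered_power(phasors, spacing_m,
                                             math.radians(deg), freq_hz))

# Simulated observation: a single source at 25 degrees, no noise.
observed = mic_phasors(8, 0.05, math.radians(25), 2000.0)
estimated_deg = find_direction_deg(observed, 0.05, 2000.0)
```

In this noise-free setup the scan recovers the 25 degree source. A practical system would average power over time and over a band of frequencies rather than use one tone.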
[0015]
For convenience of explanation, the microphones have been described as arranged in a straight
line at equal intervals d, but they may also be arranged at irregular intervals, and the
arrangement may be two-dimensional or three-dimensional.
[0016]
Also, as shown in FIG. 11, when a point sound source S is located relatively close to the array,
the sound-collection SN ratio can be improved by exploiting the spherical-wave nature of the
sound from the source S and providing gains 71, 72, ... after the delay devices 31, 32, .... One
way of assigning these weights is expressed by the following formulas (5), (6) and (7) (Nomura,
Kanada, Kojima, "Near Field Microphone Array", Journal of the Acoustical Society of Japan,
Vol. 53, No. 2 (1997), pp. 110-116).
[0017]
Here, r1, r2, ..., rM are the distances from the sound source S to the respective microphones
21, 22, ..., 2M, and rC is the critical distance of the room, that is, the distance at which the
direct sound power and the reverberant sound power of the sound source become equal. It is given
by rC = √(0.0032 V / T) for a room volume V [m³] and a room reverberation time T [seconds]
(H. Kuttruff, "Room Acoustics (Third Edition)", Elsevier Applied Science, pp. 100-132 (1991)).
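As a worked example of the critical distance formula rC = √(0.0032 V / T), the short sketch below evaluates it for a hypothetical room. The function name `critical_distance` and the room values (100 m³, 0.5 s) are assumptions chosen only to give round numbers.

```python
import math

def critical_distance(volume_m3, reverb_time_s):
    """Critical distance rC = sqrt(0.0032 * V / T), in metres."""
    return math.sqrt(0.0032 * volume_m3 / reverb_time_s)

# Hypothetical meeting room: V = 100 m^3, T = 0.5 s.
# 0.0032 * 100 / 0.5 = 0.64, and sqrt(0.64) = 0.8 m.
r_c = critical_distance(100.0, 0.5)
```

For such a room the direct sound dominates only within roughly 0.8 m of the source, which is why array processing is attractive when the microphones are farther away.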
At this time, the microphone array is most sensitive to the "point" at the position of the sound
source S; so to speak, a "focus" of sensitivity is formed.
The delay devices 31, 32, ... are given the delays D0 - ri / c (c: speed of sound) according to
the distances ri (i = 1, 2, ..., M) to the respective microphones. If the sensitivity focus is
scanned by changing these settings while the array output is monitored, the array output becomes
large when the focus is directed at the point where the target speaker is located, so the
position of the target speaker can be found.
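The near-field delays D0 - ri / c can be sketched as follows for microphones and a focus point given as 2-D coordinates. This is a minimal sketch under stated assumptions: the function name `focus_delays`, the coordinates, the 10 ms fixed delay, and the 340 m/s sound speed are illustrative, and the gain weights of formulas (5) to (7) are deliberately left out because they are not reproduced in the text.

```python
import math

def focus_delays(mic_positions, focus_point, sound_speed=340.0,
                 fixed_delay_s=0.01):
    """Delay D0 - r_i / c for each microphone, focusing on one point.

    r_i is the distance from the focus point to microphone i, so the
    propagation time plus the applied delay equals D0 for every channel.
    """
    delays = []
    for mic in mic_positions:
        r_i = math.dist(mic, focus_point)
        delays.append(fixed_delay_s - r_i / sound_speed)
    return delays

# Three microphones on the x axis, focus point 1 m in front of the centre.
mics = [(-0.1, 0.0), (0.0, 0.0), (0.1, 0.0)]
focus = (0.0, 1.0)
delays = focus_delays(mics, focus)
```

Scanning the focus then amounts to recomputing these delays for each candidate point and comparing the resulting array output power.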
[0018]
Thus, the target sound can be picked up with a high sound collection SN ratio by finding the
existence area of the target speaker as the direction or position and directing the directivity of
the array to the existence area.
[0019]
Attempts have been made to apply this microphone array to communication conferences such as
video conferences.
Using a microphone array as the sound collection unit of a teleconference has the following
advantages: because the array collects sound with a high SN ratio, it can be installed at a
position distant from the speakers, so there is no need to install multiple microphones on the
desk, and participants can communicate naturally without being aware of microphones.
[0020]
An example of a communication conference apparatus that uses this microphone array as its sound
collection unit is shown in FIG. 12.
In this figure, 10A and 10B are communication conference rooms, 11A and 11B are microphone
arrays, 12A and 12B are microphone array main apparatuses, 13 is a communication line, and 14A
and 14B are reception speakers.
The target voice uttered in the communication conference room 10A is picked up by the microphone
array 11A, processed in the microphone array main apparatus 12A to emphasize the target voice,
transmitted through the communication line 13 to the communication conference room 10B at the
communication destination, and emitted from the reception speaker 14B as reception voice.
The signal flow for the target voice uttered in the communication conference room 10B is the
same. As described above, the microphone array main apparatuses 12A and 12B scan the directivity
of the microphone arrays 11A and 11B to find the presence area of the target speaker, and direct
the directivity of the microphone arrays 11A and 11B to that area so as to pick up the target
voice at a high SN ratio.
[0021]
As described above, the microphone arrays 11A and 11B have conventionally detected the presence
area of the target speaker, directed their directivity to that area, and collected the target
sound at a high SN ratio. However, it has been found that when reception voice from the
communication destination is radiated from the reception speaker 14A or 14B, the reception
speaker is often erroneously detected as a target speaker, and the directivity of the microphone
array 11A or 11B turns toward the reception speaker 14A or 14B. It has also been found that, in
this case, the sound radiated from the reception speaker 14A or 14B is picked up by the
microphone array 11A or 11B and returned to the communication conference room 10A or 10B where
the original speaker is present, where it is perceived as an echo or causes howling, so that call
quality may deteriorate.
[0022]
To solve these problems, it might seem sufficient simply not to direct the directivity of the
microphone array toward the reception speaker. Even then, however, the directivity is directed
near the reception speaker (at a position slightly away from it), and as a result the sound
reflected from the wall or floor near the reception speaker is picked up. One could avoid this by
also keeping the directivity away from the vicinity of the reception speaker. In that case,
however, when the target speaker speaks at a position near the reception speaker, the directivity
is not directed toward the target speaker, and good sound collection cannot be performed.
[0023]
Therefore, in the present invention, a reception detection unit that detects the reception
state, in which reception voice is present, is first provided in order to distinguish the target
voice uttered by the target speaker from the reception voice radiated from the reception speaker.
A directivity control unit is then provided that, based on the reception state detected by the
reception detection unit, prevents the directivity of the microphone array from being directed to
the reception speaker, the area near the reception speaker, or a set specific area. Because the
reception speaker is avoided only during the reception state, this control does not affect sound
collection from the target speaker.
[0024]
The most basic method of reception detection is to monitor the electrical reception signal sent
from the other party. Reception can be detected, for example, by calculating the power of the
electrical reception signal and determining whether that power exceeds a threshold th1. More
elaborate reception detection can use the detection techniques employed in, for example, voice
switch technology and acoustic echo canceller technology.
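The basic power-threshold test described above can be sketched in a few lines. This is an illustrative sketch only: the function names, the frame contents, and the threshold value 0.01 standing in for th1 are assumptions, and a practical detector would typically add smoothing or hangover time.

```python
def frame_power(samples):
    """Mean square value of one frame of the reception signal."""
    return sum(x * x for x in samples) / len(samples)

def is_receiving(received_frame, threshold_th1):
    """Reception state is declared when the frame power exceeds th1."""
    return frame_power(received_frame) > threshold_th1

# Hypothetical frames: one with speech-level signal, one with only faint noise.
active_frame = [0.5, -0.5, 0.5, -0.5]
quiet_frame = [0.01, -0.01, 0.01, -0.01]
```

With an assumed threshold of 0.01, `is_receiving(active_frame, 0.01)` reports the reception state and `is_receiving(quiet_frame, 0.01)` does not.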
[0025]
If this method cannot be used because the electrical reception signal cannot be extracted, the
directivity of the microphone array is directed at the reception speaker and the output signal
of the microphone array is monitored. Reception can then be detected, for example, by determining
whether the power of this output signal exceeds a threshold th2.
[0026]
When the delay-and-sum method shown in FIG. 10 or 11 is used as the signal processing for the
microphone array output, the directivity is generally sharper in the high frequency region.
Therefore, if high frequency components are used, the sound waves emitted from the reception
speaker can be separated from those of a talker or noise source located near the reception
speaker, so the reception voice emitted from the reception speaker can be distinguished and
reception can be detected. The high frequency components can be extracted by passing the output
of the microphone array, with its directivity directed at the reception speaker, through a
high-pass filter. Furthermore, even with electrical reception detection, the reception state
sometimes cannot be detected well, for example because unwanted low-frequency electrical noise is
present in the electrical reception signal. In that case, removing the low-frequency electrical
noise with a high-pass filter allows the reception state to be detected well.
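The effect of high-pass filtering before power detection can be sketched as follows. The filter here is a generic first-order high-pass (a discrete RC approximation), chosen for brevity and not taken from the patent; the 300 Hz cutoff, 8 kHz sampling rate, 50 Hz hum, and 2 kHz test tone are all assumed values.

```python
import math

def high_pass(samples, cutoff_hz, sample_rate_hz):
    """First-order high-pass filter (discrete RC approximation)."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate_hz
    alpha = rc / (rc + dt)
    out = []
    prev_x = samples[0]
    prev_y = 0.0
    for x in samples:
        y = alpha * (prev_y + x - prev_x)
        out.append(y)
        prev_x, prev_y = x, y
    return out

def frame_power(samples):
    """Mean square value of a frame."""
    return sum(x * x for x in samples) / len(samples)

RATE = 8000  # Hz, assumed sampling rate
hum = [math.sin(2 * math.pi * 50 * n / RATE) for n in range(800)]
tone = [math.sin(2 * math.pi * 2000 * n / RATE) for n in range(800)]
hum_power = frame_power(high_pass(hum, 300.0, RATE))
tone_power = frame_power(high_pass(tone, 300.0, RATE))
```

Unfiltered, the hum alone would exceed a power threshold of, say, 0.1 and be mistaken for reception; after filtering, its power falls well below that threshold while the high-frequency tone still passes.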
[0027]
The most basic method of directivity control is to exclude the position of the reception speaker
from the directivity scanning range used for target speaker detection while the reception
detection unit determines the reception state. However, this is insufficient for practical use,
because first reflections of the sound emitted from the reception speaker arise from the floor
and walls in the area near the reception speaker. Since these first reflections generally have
high energy, the microphone array may erroneously detect the reflected sound as a target. To
prevent this, a directivity control unit is provided that prevents the directivity of the
microphone array from being directed not only at the position of the reception speaker but at
the whole area near the reception speaker, including that position. The area near the reception
speaker is an area with a radius of about 0.5 to 2 m centered on the reception speaker; the
actual radius is decided according to conditions such as the sound collection application, the
reflectivity of the room used, and the noise. It is desirable, however, that the area be as large
as possible without overlapping the target speaker's presence area.
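The exclusion of the area near the reception speaker from the scan can be sketched with candidate focus points given as 2-D coordinates. The function name `scan_candidates`, the coordinates, and the 1 m radius (a value inside the 0.5 to 2 m range mentioned above) are assumptions for illustration.

```python
import math

def scan_candidates(candidates, speaker_pos, exclusion_radius_m, receiving):
    """Return the points the directivity scan may visit.

    While receiving, points within the exclusion radius of the reception
    speaker are skipped; otherwise the whole target area is scanned.
    """
    if not receiving:
        return list(candidates)
    return [p for p in candidates
            if math.dist(p, speaker_pos) > exclusion_radius_m]

points = [(0.0, 0.0), (0.5, 0.0), (3.0, 0.0)]
speaker = (0.0, 0.0)
during_reception = scan_candidates(points, speaker, 1.0, receiving=True)
during_silence = scan_candidates(points, speaker, 1.0, receiving=False)
```

Because the restriction applies only during the reception state, a talker sitting close to the reception speaker can still be found whenever no voice is being received.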
[0028]
In this way, the reception speaker position is not erroneously detected as the target speaker
position. In addition, since the entire target area is scanned when no voice is being received,
even if the target speaker is located near the reception speaker, the speaker's position can be
detected and the directivity directed toward the target speaker, so that good sound collection
can be realized.
[0029]
If the sound pressure of a specific area of the room other than the area near the reception
speaker is raised by room reflections of the reception voice radiated from the reception
speaker, or by noise coming from the air conditioning, a window, or a wall, this specific area is
also excluded from the directivity scanning range, together with the reception speaker or the
area near the reception speaker.
[0030]
The same principle can also be realized by the following method.
When the presence area of the target speaker is detected by directing directivity to the area
where the average power of the array output is high, an area-specific power calculation unit,
which calculates the power for each area as the directivity is scanned, calculates the power for
each area excluding the reception speaker, the area near the reception speaker, or the
designated specific area, and the high-power area is detected as the sound source area from the
powers calculated for the respective areas. In this way, it is possible to avoid the problem
that the reception speaker, the area near the reception speaker, or the designated specific area
is erroneously detected as the target speaker's presence area.
[0031]
Furthermore, the same principle can also be realized by the following method.
That is, from the output of the area-specific power calculation unit, which calculates the power
for each area as the directivity is scanned, the high-power area may be detected as the sound
source area after excluding the power of the area of the reception speaker, the area near the
reception speaker, or the designated specific area.
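Selecting the highest-power area while ignoring the excluded areas can be sketched as a masked maximum. The function name `detect_source_area`, the area labels, and the power values are hypothetical and serve only to illustrate the selection step.

```python
def detect_source_area(area_powers, excluded_areas, receiving):
    """Return the highest-power area, skipping excluded ones while receiving.

    area_powers maps an area label to its measured output power.
    Returns None if every area is excluded.
    """
    candidates = {
        area: power for area, power in area_powers.items()
        if not (receiving and area in excluded_areas)
    }
    if not candidates:
        return None
    return max(candidates, key=candidates.get)

# Hypothetical per-area powers measured during one directivity scan.
powers = {"seat_A": 0.2, "near_speaker": 0.9, "seat_B": 0.4}
excluded = {"near_speaker"}
```

With these values, `detect_source_area(powers, excluded, True)` selects `seat_B`, whereas without the reception-state mask the loud area near the reception speaker would win.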
[0032]
In addition, as a measure that reliably prevents false detection in the area near the reception
speaker during the reception state, the directivity control unit fixes the directivity of the
microphone array on the area to which it was directed immediately before the reception state
was detected, for as long as the reception detection unit determines the reception state.
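Holding the directivity during the reception state can be sketched as a small piece of state. The class name `DirectivityHold` and the area labels are hypothetical; the sketch only illustrates the hold logic, not the surrounding signal processing.

```python
class DirectivityHold:
    """Keeps the last area chosen before reception started (hypothetical helper)."""

    def __init__(self, initial_area=None):
        self.current_area = initial_area

    def update(self, detected_area, receiving):
        # Only follow new detections while no reception voice is present.
        if not receiving:
            self.current_area = detected_area
        return self.current_area

hold = DirectivityHold()
steps = [("seat_A", False), ("speaker", True), ("speaker", True), ("seat_B", False)]
trace = [hold.update(area, rx) for area, rx in steps]
```

In this trace the directivity stays on `seat_A` throughout the two reception frames and moves to `seat_B` only once reception has ended.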
[0033]
The method described above, which detects the sound wave radiated from the reception speaker and
prevents the directivity of the array from being directed to the reception speaker, the area
near the reception speaker, or a set specific area, can also be applied to in-venue sound
reinforcement, in which a speaker's voice is picked up and amplified through loudspeakers in the
same room where the speaker is present.
For example, when a listener asks the lecturer a question at a lecture in a relatively large
venue, the question is amplified through a loudspeaker so that its content is easy to hear. The
microphone array described above can be used to direct directivity toward the listener when
picking up the question, but if the energy of the sound wave emitted from the reinforcement
loudspeaker is large, the microphone array may erroneously detect the loudspeaker, rather than
the listener who is asking the question, as the speaker. To prevent this, the directivity can be
controlled so as to exclude the reinforcement loudspeaker, the area near it, or a set specific
area.
[0034]
According to the present invention as described above, providing a reception detection unit that
detects the reception state, in which reception voice is present, makes it possible to
distinguish the target voice uttered by the target speaker from the reception voice radiated
from the reception speaker. Providing a directivity control unit that, based on the reception
state detected by the reception detection unit, prevents the directivity of the microphone array
from being directed to the reception speaker, the area near the reception speaker, or a set
specific area makes it possible to eliminate the influence of the radiated reception voice and to
prevent the directivity of the microphone array from being erroneously directed to the reception
speaker, the area near the reception speaker, or the designated specific area, where there is no
target voice.
[0035]
Embodiments of the present invention will be described below with reference to the drawings.
[0036]
FIG. 1 is a block diagram showing the configuration of the first embodiment of the present
invention.
In this figure, reference numeral 20 denotes a communication conference apparatus, which
includes a microphone array 21, a microphone array main apparatus 22, a reception line 23-1
serving as receiving means, a transmission line 23-2 serving as transmitting means, a reception
speaker 24, and the like, together with a reception detection unit 30 and a directivity control
unit 40.
[0037]
In operation, the signal picked up by the microphone array 21 is processed by the microphone
array main apparatus 22; the directivity of the microphone array 21 is directed to the presence
area of the target speaker so that the target voice is picked up at a high SN ratio, and the
target voice is transmitted to the communication destination through the transmission line 23-2.
A reception signal received from the communication destination through the reception line 23-1
is emitted as reception voice by the reception speaker 24. To prevent the directivity of the
microphone array 21 from being erroneously directed to the reception speaker 24, or to the area
near the reception speaker 24, where no target speaker is speaking, the reception detection unit
30 detects the reception state. Based on the detected reception state, the directivity control
unit 40 controls the signal processing of the microphone array main apparatus 22 and thereby the
directivity of the microphone array 21.
[0038]
FIG. 2 shows a second embodiment of the present invention. In this embodiment, the reception
detection unit 30 consists of a reception signal power calculation unit 31, which calculates the
power of the reception signal, and a reception state determination unit 32, which determines
that the reception state exists when the power calculated by the reception signal power
calculation unit 31 exceeds a set threshold th1. The reception state is thus determined by
calculating the power of the reception signal.
[0039]
FIG. 3 shows a third embodiment of the present invention. In this embodiment, the reception
detection unit 30 consists of a reception speaker output sound collection unit 33, which directs
the directivity of the microphone array 21 to the reception speaker 24, the area near the
reception speaker 24, or a specific area; a reception speaker output power calculation unit 34,
which calculates the power of the output signal of the reception speaker output sound collection
unit 33; and a reception state determination unit 32, which determines that the reception state
exists when the power calculated by the reception speaker output power calculation unit 34
exceeds the set threshold th2.
[0040]
FIG. 4 shows a fourth embodiment of the present invention. In this embodiment, a band-pass
filter unit 50 that extracts a specific band component of the reception signal is provided, and
the reception detection unit 30 detects the reception state using the output signal of the
band-pass filter unit 50.
[0041]
FIG. 5 shows a fifth embodiment of the present invention. In this embodiment, the directivity
control unit 40 consists of a directivity scanning unit 41, which scans the directivity of the
microphone array 21, and a sound source presence area detection unit 42, which detects the
target speaker area from the output signal of the directivity scanning unit 41. While the
reception detection unit 30 determines the reception state, the reception speaker 24, the area
near the reception speaker, or a set specific area is excluded from the detection of the sound
source presence area.
[0042]
FIG. 6 shows a sixth embodiment of the present invention. In this embodiment, the sound source
presence area detection unit 42 consists of an area-specific power calculation unit 421, which
calculates the power of the output signal of the directivity scanning unit 41, and an
area-specific maximum power area detection unit 422, which detects the area where the power
calculated by the area-specific power calculation unit 421 is maximum.
[0043]
FIG. 7 shows a seventh embodiment of the present invention. In this embodiment, a directivity
scanning unit 41 scans the directivity of the microphone array 21, and the sound source presence
area detection unit 42 consists of an area-specific power calculation unit 421 and an
area-specific maximum power area detection unit 422. While the reception detection unit 30
determines the reception state, the area-specific power calculation unit 421 calculates the power
of the output signal of the directivity scanning unit 41 for each area excluding the reception
speaker 24, the area near the reception speaker, or the set specific area, and the area-specific
maximum power area detection unit 422 detects the area where the calculated power is maximum.
[0044]
FIG. 8 shows an eighth embodiment of the present invention. In this embodiment, the directivity
control unit 40 consists of the directivity scanning unit 41, which scans the directivity of the
microphone array 21, with scanning toward the reception speaker 24, the area near the reception
speaker, or the designated specific area restricted while the reception detection unit 30
determines the reception state; a directivity scan limited output power calculation unit 45,
which calculates the power of the output signal of the directivity scanning unit 41; and a
directivity scan limited output power maximum area detection unit 46, which detects the area
where the power calculated by the directivity scan limited output power calculation unit 45 is
largest.
[0045]
FIG. 9 shows a ninth embodiment of the present invention. In this embodiment, the directivity
control unit 40 is provided with a directivity hold unit 47 that fixes the directivity of the
microphone array 21 while the reception detection unit 30 determines the reception state.
[0046]
As described above, in the sound collection method of the present invention, which collects
sound using a microphone array consisting of a plurality of microphones and a microphone array
main apparatus that processes the output signals of the microphone array, a reception signal
from a communication destination is received, the reception signal is emitted as a reception
sound wave from a reception speaker, a reception state is detected from the reception signal or
the reception sound wave, and the directivity of the microphone array is controlled accordingly.
The directivity of the microphone array can therefore be controlled accurately.
[0047]
In the sound collection apparatus according to the present invention, providing a reception
detection unit that detects the reception state, in which reception voice is present, makes it
possible to distinguish the target voice uttered by the target speaker from the reception voice
emitted from the reception speaker.
Providing a directivity control unit that, based on the reception state detected by the
reception detection unit, prevents the directivity of the array from being directed to the
reception speaker, the area near the reception speaker, or the designated specific area makes it
possible to eliminate the influence of the reception voice radiated from the reception speaker
and to prevent the directivity of the microphone array from being erroneously directed to the
reception speaker, the area near the reception speaker, or the designated specific area, where
there is no target voice.
[0048]
As described above, according to the present invention, the sound emitted from the reception
speaker can be prevented from being collected by the microphone array. This prevents the voice
uttered by a speaker from passing through the line and returning as an echo to the room where
that speaker is present, and also prevents howling, giving the excellent effect of preventing the
deterioration in speech quality caused by such echoes and howling.
[0049]
Brief description of the drawings
[0050]
FIG. 1 is a block diagram showing the configuration of a first embodiment of the sound
collection device of the present invention.
[0051]
FIG. 2 is a block diagram showing the configuration of a second embodiment of the sound
collection device of the present invention.
[0052]
FIG. 3 is a block diagram showing the configuration of a third embodiment of the sound
collection device of the present invention.
[0053]
FIG. 4 is a block diagram showing the configuration of a fourth embodiment of the sound
collection device of the present invention.
[0054]
FIG. 5 is a block diagram showing the configuration of a fifth embodiment of the sound
collection device of the present invention.
[0055]
FIG. 6 is a block diagram showing the configuration of a sixth embodiment of the sound
collection device of the present invention.
[0056]
FIG. 7 is a block diagram showing the configuration of a seventh embodiment of the sound
collection device of the present invention.
[0057]
FIG. 8 is a block diagram showing the configuration of an eighth embodiment of the sound
collection device of the present invention.
[0058]
FIG. 9 is a block diagram showing the configuration of a ninth embodiment of the sound
collection device of the present invention.
[0059]
FIG. 10 is a diagram for explaining the principle of noise suppression and sound pickup by the
conventional delay-and-sum method.
[0060]
FIG. 11 is a diagram for explaining how appropriately setting the gain weights in the stage
after the delay devices improves the sound collection SN ratio when the sound source is located
close to the microphone array.
[0061]
FIG. 12 is a block diagram for explaining a communication conference using the conventional
microphone array.
[0062]
Explanation of reference signs
[0063]
20 communication conference apparatus
21 microphone array
22 microphone array main apparatus
23-1 reception line
23-2 transmission line
24 reception speaker
30 reception detection unit
31 reception signal power calculation unit
32 reception state determination unit
33 reception speaker output sound collection unit
34 reception speaker output power calculation unit
40 directivity control unit
41 directivity scanning unit
42 sound source presence area detection unit
421 area power calculation unit
422 area power maximum area detection unit
43 sound source presence area detection limitation unit
44 scan limitation unit
45 directivity scan limited output power calculation unit
46 directivity scan limited output power maximum area detection unit
47 directivity hold unit
50 band filter unit