Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JPH1155784
[0001]
BACKGROUND OF THE INVENTION The present invention relates to an in-field loud-speaking method
and apparatus in which speech uttered by a talker is picked up by a microphone array composed of
a plurality of microphones, the output signals of the microphone array are processed so that only
the talker's speech is picked up with a high SN ratio, and the picked-up output signal is radiated
as a sound wave from one or more loudspeakers into the same room where the talker is present.
More particularly, it relates to an in-field loud-speaking method and apparatus that detect the
talker accurately by removing the influence of the loudspeakers, and that control the
amplification of the loudspeakers in the vicinity of the talker to eliminate adverse effects such
as howling.
[0002]
2. Description of the Related Art Conventionally, in in-field loud-speaking, a microphone is
installed close to the talker, or the talker speaks while holding a microphone; the output signal
of the microphone is amplified and radiated from a loudspeaker as a sound wave into the same room
as the talker.
[0003]
In recent years, however, installing a microphone close to the talker has come to be regarded as
a hindrance, and handing a microphone to a questioner when taking questions from the audience in
a lecture hall is cumbersome. There is therefore a need for a sound collection device that does
not require a microphone close to the talker or the questioner, that lets people speak naturally
without being aware of the microphone, and that picks up only the target sound, such as speech,
with a high SN ratio.
03-05-2019
1
[0004]
One example of such a sound collection device installs a plurality of microphones (a microphone
array) and processes the outputs of the microphones to extract the target sound.
Many signal processing methods are known for suppressing noise and extracting target speech with
such a microphone array, such as the delay-and-sum method and AMNOR (see, for example, Ogata,
Yamazaki, Kanada, "Acoustic System and Digital Processing", The Institute of Electronics,
Information and Communication Engineers, 1995, pp. 173-197). In the delay-and-sum method, for
example, the target sound is extracted as follows.
[0005]
FIG. 5 is a diagram for explaining the principle of target sound extraction by the delay-and-sum
method.
In FIG. 5, 1 is a sound pickup unit (microphone array); 2_1, 2_2, ..., 2_M are microphones (M is
the number of microphones); 3_1, 3_2, ..., 3_M are delay devices; 4 is an adder; 5 is an output
signal; 6 is a noise suppression unit; d is the microphone spacing; s(t) is a sound wave arriving
at the sound pickup unit 1 (t denotes time); θ is the angle at which the sound wave s(t) arrives
at the sound pickup unit 1; and τ, described later, is the time difference (delay time) with
which the sound wave reaches each microphone.
[0006]
It is assumed that the microphones 2_1, 2_2, ..., 2_M in FIG. 5 are arranged on a straight line
at equal intervals d, and that the sound wave s(t) arrives from a distant source at an angle θ to
the line of microphones. The extra distance the sound wave s(t) travels after reaching the
microphone 2_1 before it reaches the microphone 2_2 is then d sin θ, from the microphone spacing
d and the arrival angle θ. Similarly, the extra distance to the i-th microphone 2_i (i = 2, ...,
M) is (i-1) d sin θ. Therefore, taking the microphone 2_1 as the reference, the delay time τ_i
until the wave reaches the microphone 2_i is obtained by dividing this propagation distance by
the speed of sound c, as in equation (1):
$$\tau_i = \frac{(i-1)\, d \sin\theta}{c} \qquad (1)$$
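As an illustration of the delay-time relation of equation (1), the following Python sketch computes the delay of each microphone relative to the first. The function name, the zero-based microphone index and the default speed of sound of 343 m/s are assumptions made for this example; they do not come from the patent text.

```python
import math

def arrival_delays(num_mics, spacing_m, theta_rad, c=343.0):
    """Delay (seconds) of each microphone relative to the first one.

    Index 0 corresponds to microphone 2_1, so the factor (i-1) of the
    1-based formula becomes i here. c = 343 m/s is an assumed value.
    """
    return [i * spacing_m * math.sin(theta_rad) / c for i in range(num_mics)]

# Hypothetical array: 4 microphones, 5 cm apart, wave arriving at 30 degrees.
delays = arrival_delays(4, 0.05, math.radians(30))
```

For a wave arriving broadside (θ = 0) every delay is zero, which matches the intuition that all microphones then receive the wavefront simultaneously.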
[0007]
Here, denoting the output signal of each microphone 2_i (i = 1, ..., M) by x_i(t), since the
sound wave s(t) is delayed by τ_i, x_i(t) is given by equation (2):
$$x_i(t) = s(t - \tau_i) \qquad (2)$$
[0008]
It is shown below that, if the delay amount D_i of each delay device 3_i (i = 1, 2, ..., M) is
set appropriately, only the sound wave arriving from the θ direction is emphasized in the output
signal 5.
[0009]
The delay amount D_i of the delay device 3_i (i = 1, 2, ..., M) is set as in equation (3):
$$D_i = D_0 - \tau_i \qquad (3)$$
[0010]
D_0 is a fixed delay amount that is added to prevent a loss of accuracy when the delay
characteristic is realized with a digital filter and the value of τ_i is too small.
[0011]
At this time, the output of the delay device 3_i (i = 1, 2, ..., M) is the signal of equation (2)
delayed by the delay amount D_i of equation (3), as in equation (4):
$$x_i(t - D_i) = s(t - \tau_i - D_i) = s(t - D_0) \qquad (4)$$
[0012]
That is, regardless of the microphone number i, every delay device outputs the same signal,
namely s(t) delayed by D_0.
[0013]
When the signals are added by the adder 4 after their phases have been aligned in this way, the
sound wave coming from the θ direction is emphasized by the addition.
On the other hand, a sound wave coming from a direction θ_N different from θ is received with
delay times τ_N different from τ_i, so the delay amounts of equation (3) do not align its phases,
and it is not emphasized even when the signals are added by the adder 4.
[0014]
In this manner, in the delay-and-sum method, the sound wave arriving from the target direction θ
is enhanced by setting the delay amounts D_i as in equation (3), and noise coming from other
directions θ_N is relatively suppressed.
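The alignment-then-addition step can be sketched in Python as follows. This is a minimal illustration assuming delays that are whole numbers of samples and a single-pulse test signal; the function names and the test values are hypothetical and are not taken from the patent.

```python
def delay_and_sum(channels, delays):
    """Delay each channel by a non-negative integer number of samples,
    then average the channels (the role of the delay devices and adder)."""
    n = len(channels[0])
    out = [0.0] * n
    for x, d in zip(channels, delays):
        for t in range(d, n):
            out[t] += x[t - d]
    return [v / len(channels) for v in out]

def shifted(signal, k):
    """Return `signal` delayed by k samples, zero-padded to the same length."""
    return [0.0] * k + signal[: len(signal) - k]

# Hypothetical source: a single pulse reaching three microphones
# with arrival delays of 0, 1 and 2 samples.
s = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
tau = [0, 1, 2]
x = [shifted(s, t) for t in tau]

d0 = 2                                              # fixed delay D_0
aligned = delay_and_sum(x, [d0 - t for t in tau])   # D_i = D_0 - tau_i
unaligned = delay_and_sum(x, [0, 0, 0])             # steering at a wrong direction
```

With matching delays the three pulses coincide and the averaged output keeps the full pulse amplitude; with mismatched delays the pulses stay spread over three samples, each at one third of the amplitude.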
[0015]
At this time, if the target direction θ is scanned by changing the delay amounts D_i while the
output signal of the microphone array 1 is monitored, the output signal becomes large when θ
points toward the target talker, so the direction of the target talker can be found.
Then, by aligning the phases according to equation (4) and adding them so as to emphasize the
sound wave from the direction θ of the target talker, that is, by pointing the directivity of the
microphone array 1 in the direction θ, the target sound can be picked up with a high SN ratio.
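The scanning procedure can be illustrated with a small Python sketch that tries several candidate steering delays and keeps the one with the largest output power. Everything specific here is an assumption for the example: the per-microphone lag is restricted to whole samples, the test signal is pseudo-random noise, and the sampling rate, spacing and speed of sound used to convert the lag to an angle are hypothetical values.

```python
import math
import random

def steered_power(channels, lag):
    """Mean power of the delay-and-sum output when the array is steered
    with an inter-microphone lag of `lag` samples."""
    m, n = len(channels), len(channels[0])
    raw = [-i * lag for i in range(m)]     # -tau_i in samples
    d0 = -min(raw)                         # fixed delay D_0 keeps every D_i >= 0
    delays = [d0 + r for r in raw]         # D_i = D_0 - tau_i
    start = max(delays)
    total = 0.0
    for t in range(start, n):
        y = sum(ch[t - d] for ch, d in zip(channels, delays)) / m
        total += y * y
    return total / (n - start)

def scan_direction(channels, candidate_lags):
    """Return the candidate lag that maximizes the array output power."""
    return max(candidate_lags, key=lambda lag: steered_power(channels, lag))

# Hypothetical test signal: noise arriving with 1 sample of lag per microphone.
rng = random.Random(0)
source = [rng.gauss(0.0, 1.0) for _ in range(400)]
true_lag = 1
channels = [[0.0] * (i * true_lag) + source[: len(source) - i * true_lag]
            for i in range(4)]

best_lag = scan_direction(channels, range(-2, 3))

# Convert the lag to an angle using assumed values for fs, d and c.
fs, d, c = 16000.0, 0.05, 343.0
theta_deg = math.degrees(math.asin(best_lag * c / (fs * d)))
```

A practical system would scan a finer grid of directions using fractional delays; the integer-lag grid is used here only to keep the sketch short.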
[0016]
Here, for convenience of explanation, the microphones 2_1, 2_2, ..., 2_M have been described as
arranged on a straight line at equal intervals d, but the spacing of the microphones may also be
non-uniform, and the microphones may be arranged two-dimensionally or three-dimensionally.
[0017]
Also, as shown in FIG. 6, when a point sound source S is located relatively close to the
microphone array 1, gains 7_1, 7_2, ..., 7_M are provided in the stage after the delay devices
3_1, 3_2, ..., 3_M, and applying appropriate weights to these gains is important for improving
the sound collection SN ratio.
One way of assigning the weights is expressed by formulas (5), (6) and (7) (Nomura, Kanada,
Kojima, "Near Field Microphone Array", Journal of the Acoustical Society of Japan, Vol. 53, No. 2
(1997), pp. 110-116).
[0018]
Here, r_1, r_2, ..., r_M are the distances from the sound source S to the respective microphones
2_1, 2_2, ..., 2_M, and r_C is the critical distance of the room, that is, the distance at which
the direct sound power and the reverberant sound power of the sound source become equal. For a
room volume V [m^3] and a room reverberation time T [seconds], it is given by
r_C = √(0.0032 V / T) (H. Kuttruff, "Room Acoustics (Third Edition)", Elsevier Applied Science,
pp. 100-132 (1991)).
At this time, the microphone array 1 is most sensitive at the "point" where the sound source S is
located; a "focus" of sensitivity is formed, so to speak.
The focus of sensitivity can be scanned by changing the delay amounts D_0 - r_i/c (c: speed of
sound) of the delay devices 3_1, 3_2, ..., 3_M and the gains according to the distances r_i
(i = 1, 2, ..., M). If the array output is monitored during the scan, it becomes large when the
sensitivity is focused on the point where the target talker is present, so the position of the
target talker can be found.
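The two quantities stated explicitly in this paragraph, the critical distance r_C = √(0.0032 V / T) and the focusing delays D_0 - r_i/c, can be computed as in the following Python sketch. The gain weights of formulas (5) to (7) are not reproduced in this text and are therefore left out. The function names, the coordinates and the default speed of sound of 343 m/s are assumptions for illustration.

```python
import math

def critical_distance(volume_m3, reverb_time_s):
    """Critical distance r_C = sqrt(0.0032 * V / T) in metres."""
    return math.sqrt(0.0032 * volume_m3 / reverb_time_s)

def focus_delays(mic_positions, focus_point, d0, c=343.0):
    """Delay D_i = D_0 - r_i / c for each microphone, where r_i is the
    distance from the focus point to microphone i (c is an assumed value)."""
    return [d0 - math.dist(p, focus_point) / c for p in mic_positions]

# Hypothetical room: 200 m^3 with a reverberation time of 0.5 s.
r_c = critical_distance(200.0, 0.5)

# Hypothetical geometry: two microphones 10 cm apart, focus 1 m in front of the first.
delays = focus_delays([(0.0, 0.0), (0.1, 0.0)], (0.0, 1.0), d0=0.01)
```

The microphone farther from the focus receives the sound later and therefore gets the smaller added delay, so that all channels line up at the adder.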
[0019]
Thus, the target sound can be picked up with a high sound collection SN ratio by finding the
region where the target talker is present, as a direction or a position, and pointing the
directivity of the array at that region.
[0020]
An attempt is made to apply this microphone array 1 to in-field loud-speaking.
The advantage of using the microphone array 1 as the sound pickup unit for in-field
loud-speaking is that, as described above, the microphone array 1 can pick up sound with a high
SN ratio even when placed at a distance from the talker. No microphone needs to be installed
close to the talker, so the talker is not aware of the microphone and can communicate naturally.
A further advantage is that no microphone needs to be handed to a questioner when a question from
a listener in the audience of a lecture is to be amplified into the hall.
[0021]
An example of an in-field loud-speaking apparatus that uses this microphone array as its sound
pickup unit is shown in FIG. 7.
In this figure, 11 is the entire in-field loud-speaking apparatus, 12 is a microphone array, 13
is a microphone array processing device, 14 is an amplifier, and 15 is a loudspeaker.
The target speech is picked up by the microphone array 12, processed by the microphone array
processing device 13 to emphasize the target speech, amplified by the amplifier 14, and radiated
into the hall as sound waves from the loudspeaker 15. The microphone array processing device 13
scans the directivity of the microphone array 12 as described above to find the region where the
target talker is present, and points the directivity of the array at that region so that sound is
picked up with a high SN ratio.
[0022]
As described above, detecting the region where the target talker is present with the microphone
array 12, pointing the directivity of the array at this region, and picking up the target speech
with a high SN ratio was effective. However, when the talker's speech is radiated loudly from the
loudspeaker 15, the loudspeaker position or its vicinity is often detected as the region where
the target talker is present, and it was found that the directivity of the microphone array 12
then turns toward the loudspeaker.
[0023]
If, to avoid this problem, the directivity of the microphone array 12 is prohibited from pointing
toward the loudspeaker, a new problem was found to arise: when the talker speaks at a position
close to the loudspeaker 15, the directivity of the microphone array 12 cannot be pointed at the
region where the talker is present, and the target speech cannot be picked up with a sufficiently
high SN ratio. Further, even if the directivity of the microphone array 12 can be pointed at this
talker, a closed loop is formed in which the target speech radiated from the loudspeaker 15 is
picked up again by the microphone array 12 and amplified again. It was found that this causes
howling, so that loud-speaking within the room becomes impossible.
[0024]
SUMMARY OF THE INVENTION In order to solve the above problems, the present invention controls the
amplifiers as follows. It provides talker position detection means for detecting the position of
the talker; determination means for determining, among a plurality of loudspeakers installed in
the room, the talker-proximity loudspeaker located near the talker position; and control means
for controlling the gain of the amplification means connected to the talker-proximity
loudspeaker so as to lower that gain when the talker speaks.
[0025]
Talker detection is performed as follows. The talker position is estimated using talker sound
pickup means, which scans the directivity of the microphone array over positions excluding the
loudspeaker positions; comparison sound pickup means, which points the directivity of the
microphone array at the vicinity of the region where the talker sound pickup means forms its
directivity; the talker pickup output signal output from the talker sound pickup means; and the
comparison output signal output from the comparison sound pickup means.
[0026]
This position estimation is performed, for example, as follows. There are provided talker power
calculation means for calculating the power of the output signal of the talker sound pickup
means; comparison power calculation means for calculating the power of the output signal of the
comparison sound pickup means; talker power gradient determination means for determining whether
the talker power calculated by the talker power calculation means exceeds the comparison power
calculated by the comparison power calculation means by a set value; and talker power
determination means for determining whether the talker power exceeds a set threshold th. When the
determinations of the talker power determination means and the talker power gradient
determination means are both true, it is judged that the talker has uttered speech, and the
talker position estimation means estimates the talker position.
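The two-condition decision described above can be sketched in Python as follows. The text only says that the talker power must exceed the comparison power "by a set value" and must exceed a threshold; treating the set value as an additive margin on mean-square power, as well as all function and parameter names, is an assumption made for this illustration.

```python
def frame_power(samples):
    """Mean-square power of one frame of samples."""
    return sum(x * x for x in samples) / len(samples)

def talker_detected(talker_frame, comparison_frames, margin, threshold):
    """True when the talker-focus power exceeds every comparison power by
    `margin` (gradient check) and also exceeds `threshold` (level check)."""
    p_talker = frame_power(talker_frame)
    gradient_ok = all(p_talker > frame_power(f) + margin
                      for f in comparison_frames)
    level_ok = p_talker > threshold
    return gradient_ok and level_ok

# Hypothetical frames of array output for the talker focus and comparison foci.
loud = [0.5, -0.5] * 4      # power 0.25
quiet = [0.01] * 8          # power 0.0001
decision = talker_detected(loud, [quiet, quiet], margin=0.05, threshold=0.1)
```

The level check guards against the case where all powers are tiny and the gradient condition happens to hold only because of background noise.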
[0027]
In addition, when the delay-and-sum method shown in FIG. 5 or 6 is used as the signal processing
on the microphone array output, the directivity is generally sharper at high frequencies.
Therefore, if high-frequency components are used, the sound wave uttered by the talker can be
separated from the sound wave radiated by the loudspeaker even when the talker is in the vicinity
of the loudspeaker, and the talker position can be detected by distinguishing the target speech
uttered by the person. The high-frequency components can be extracted by passing the microphone
array outputs for the talker focus and the comparison foci through a high-pass filter. Thus,
band-pass filter means for a specific band is provided for the output signals of the talker
sound pickup means and the comparison sound pickup means, and the talker position estimation
means estimates the talker position using the output signal of the band-pass filter means.
[0028]
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS In the present invention, the talker position
detection means scans the directivity of the microphone array as described above to find the
position of the talker. The loudspeaker positions are excluded from the scanning target. However,
even when the loudspeaker positions are excluded, the sound pressure in the region near a
loudspeaker rises when sound is radiated from it, so it is necessary to identify whether a
detected position was detected because of speech from the talker or because of the increase in
sound pressure from the loudspeaker. For this reason, the position of the talker is detected
accurately as follows. In addition to the directivity of the microphone array pointed at the
region Fsp of the detected talker, directivities for comparison are pointed at regions Fi
(i = 1, 2, ..., L) near the talker position, and the talker position is detected by comparing the
array output Esp for Fsp with the array outputs Ei (i = 1, 2, ..., L) for the regions Fi. Here,
the regions Fi are assumed to be closer to the loudspeaker position than the region Fsp. When
speech is uttered by the talker, the output Esp is the highest, and thus Esp > Ei
(i = 1, 2, ..., L). On the other hand, when sound is radiated from the loudspeaker and the sound
pressure is high, the regions Fi are closer to the loudspeaker than the region Fsp, and thus
Esp < Ei (i = 1, 2, ..., L). Therefore, when Esp > Ei (i = 1, 2, ..., L), the detected position
can be taken as the position at which the talker uttered speech. In addition to the condition
Esp > Ei (i = 1, 2, ..., L), in order to remove the influence of noise in the room, it may be
judged that the talker has uttered speech only when Esp also exceeds a certain threshold value
th.
[0029]
Next, the determination means determines, among the loudspeakers, the talker-proximity
loudspeaker located near the talker position, and the control means lowers the gain of the
amplification means connected to the talker-proximity loudspeaker when the talker speaks. In
general, the directivity of the delay-and-sum array of FIG. 5 or 6 is sharp at high frequencies
and broad at low frequencies. In other words, when the microphone array is used for
loud-speaking, the amount of acoustic coupling in the low-frequency range is large, which is a
problem. Therefore, the low-frequency components of the signal for the loudspeaker near the
talker are attenuated.
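The text does not say how the low-frequency components are attenuated. As one possible illustration only, the following Python sketch applies a first-order high-pass filter to the signal fed to the loudspeaker near the talker; the filter type, the cutoff frequency and the function name are assumptions, not the method of the patent.

```python
import math

def high_pass(samples, cutoff_hz, sample_rate_hz):
    """First-order RC-style high-pass filter for a non-empty sample list:
    y[n] = alpha * (y[n-1] + x[n] - x[n-1])."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate_hz
    alpha = rc / (rc + dt)
    out = [samples[0]]
    for n in range(1, len(samples)):
        out.append(alpha * (out[-1] + samples[n] - samples[n - 1]))
    return out

# Hypothetical check with an assumed 300 Hz cutoff at 16 kHz sampling:
# a constant (0 Hz) input decays away, a fast alternating input passes.
dc_out = high_pass([1.0] * 2000, 300.0, 16000.0)
fast_out = high_pass([1.0, -1.0] * 1000, 300.0, 16000.0)
```

A steeper filter or a shelving equalizer could be substituted without changing the idea, which is to reduce the loop gain in the band where the array's directivity is too broad to reject the nearby loudspeaker.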
[0030]
Embodiments of the present invention will be described hereinbelow with reference to the
drawings.
[0031]
FIG. 1 shows a first embodiment of the present invention.
In this figure, reference numeral 21 denotes an in-field loud-speaking apparatus, which includes
a microphone array 22, a microphone array processing device 23, an amplifier (amplifying means)
24, loudspeakers 25, a talker position detection unit (talker position detection means) 26, a
determination unit (determination means) 27, and a control unit (control means) 28.
[0032]
Next, the operation will be described. The speech uttered by the talker is picked up by the
microphone array 22; the output signals of the microphone array 22 are processed by the
microphone array processing device 23 so that only the target sound is extracted; and the result
is amplified by the amplifier 24 and radiated into the hall from the loudspeakers 25. At this
time, the talker position detection unit 26 detects the talker position from the output signals
of the microphone array 22; based on the detected talker position, the determination unit 27
determines which of the plurality of loudspeakers 25 are located near the talker; and, based on
this determination, the control unit 28 lowers the gain of the amplifier 24 for the loudspeakers
located near the talker. Howling is thus prevented even when the directivity of the microphone
array 22 is pointed at a talker located near a loudspeaker 25, so that the target sound can
always be amplified into the hall with a high SN ratio.
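The determination and control steps can be sketched in Python as follows. The text does not define "near" or state by how much the gain is lowered, so the distance radius, the two gain values and the function name are hypothetical choices for illustration.

```python
import math

def loudspeaker_gains(talker_pos, loudspeaker_positions, near_radius_m,
                      normal_gain=1.0, reduced_gain=0.25):
    """Return one amplifier gain per loudspeaker: the reduced gain for
    loudspeakers within `near_radius_m` of the talker, the normal gain
    otherwise. All numeric values are assumed, not taken from the source."""
    return [
        reduced_gain if math.dist(talker_pos, pos) <= near_radius_m
        else normal_gain
        for pos in loudspeaker_positions
    ]

# Hypothetical layout: talker at the origin, loudspeakers 1 m and 5 m away.
gains = loudspeaker_gains((0.0, 0.0), [(1.0, 0.0), (5.0, 0.0)], near_radius_m=2.0)
```

Only the loudspeaker within the radius is turned down, so listeners far from the talker still receive the amplified speech at the normal level.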
[0033]
FIG. 2 shows a second embodiment of the present invention. In this embodiment, the talker
position detection unit (talker position detection means) 26 is composed of a talker sound
pickup unit (talker sound pickup means) 31 that points the directivity of the microphone array
22 at the talker, a comparison sound pickup unit (comparison sound pickup means) 32 that points
the directivity of the microphone array 22 at the vicinity of the talker, and a talker position
estimation unit (talker position estimation means) 33 that estimates the talker position using
the output of the talker sound pickup unit 31 and the output of the comparison sound pickup unit
32. According to this embodiment, the talker position can be estimated more accurately.
[0034]
FIG. 3 shows a third embodiment of the present invention. In this embodiment, the talker position
estimation unit 33 is composed of a talker power calculation unit (talker power calculation
means) 34 that calculates the power of the output signal of the talker sound pickup unit 31; a
comparison power calculation unit (comparison power calculation means) 35 that calculates the
power of the output signal of the comparison sound pickup unit 32; a talker power gradient
determination unit (talker power gradient determination means) 36 that determines whether the
talker power calculated by the talker power calculation unit 34 exceeds the comparison power
calculated by the comparison power calculation unit 35 by a set value; and a talker power
determination unit (talker power determination means) 37 that determines whether the talker
power exceeds the set threshold th. The talker position is estimated when the determinations of
the talker power gradient determination unit 36 and the talker power determination unit 37 are
both true.
[0035]
FIG. 4 shows a fourth embodiment of the present invention. In this embodiment, a band filter unit
(band filtering means) 41 that passes a specific band of the outputs of the talker sound pickup
unit 31 and the comparison sound pickup unit 32 is provided in the stage after these units, and
the talker position estimation unit 33 estimates the talker position using the output signal of
the band filter unit 41.
[0036]
As described above, according to the present invention, even when the talker is positioned close
to a loudspeaker, the directivity of the array can be pointed at the region where the talker is
present and the target speech can be picked up. In addition, since the gain of the amplification
means connected to the loudspeaker near the talker is lowered when the talker speaks in the
vicinity of that loudspeaker, howling can be avoided even when the directivity of the array is
pointed at the talker, which is an outstanding effect.
[0037]
Further, since the low-frequency components of the signal for the loudspeaker near the talker
are attenuated, howling can be effectively suppressed.
[0038]
In addition to pointing the directivity of the microphone array at the region Fsp where the
talker is present, the talker position detection means points directivities for comparison at
regions Fi (i = 1, 2, ..., L; L is the number of comparison foci) that are near the talker and
closer to the loudspeaker than Fsp, compares the array output Esp for Fsp with the array outputs
Ei (i = 1, 2, ..., L) for Fi, and detects the talker position by determining whether Esp > Ei.
It is therefore possible to prevent an erroneous determination of the talker position due to the
influence of the loudspeaker even when the target talker speaks in the vicinity of the
loudspeaker.
In addition, since the talker position is detected only when Esp also exceeds the set threshold
th, an erroneous determination of the talker position is prevented even if Esp > Ei
(i = 1, 2, ..., L) happens to hold because of background noise in the room. These are excellent
effects not previously available.
[0039]
Further, since the powers of the output signals of the talker sound pickup unit and the
comparison sound pickup unit are calculated by the talker power calculation unit and the
comparison power calculation unit and used to determine the talker position, a more accurate
determination can be made.
[0040]
Brief description of the drawings
[0041]
FIG. 1 is a block diagram showing the configuration of a first embodiment of the present
invention.
[0042]
FIG. 2 is a block diagram showing the configuration of a second embodiment of the present
invention.
[0043]
FIG. 3 is a block diagram showing the configuration of a third embodiment of the present
invention.
[0044]
FIG. 4 is a block diagram showing the configuration of a fourth embodiment of the present
invention.
[0045]
FIG. 5 is a diagram for explaining the principle of noise suppression and sound collection by
the delay-and-sum method.
[0046]
FIG. 6 is a diagram for explaining that, in the conventional microphone array, the sound pickup
SN ratio is improved by appropriately setting the weights of the gains in the stage after the
delay devices when the sound source is located near the microphone array.
[0047]
FIG. 7 is a diagram for explaining in-field loud-speaking using the conventional microphone
array.
[0048]
Explanation of reference signs
[0049]
Reference Signs List
21 in-field loud-speaking apparatus
22 microphone array
23 microphone array processing device
24 amplifier
25 loudspeaker
26 talker position detection unit
27 determination unit
28 control unit
31 talker sound pickup unit
32 comparison sound pickup unit
33 talker position estimation unit
34 talker power calculation unit
35 comparison power calculation unit
36 talker power gradient determination unit
37 talker power determination unit
41 band filter unit