Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JPH1118187
[0001]
[Technical Field of the Invention] The present invention relates to a speaker tracking type in-field loud-speaking device and a voice input method. In a conference, lecture, or the like, the device picks up the speaker's voice with microphones and amplifies it within the same venue, making it easier for the other participants in the venue to hear the speaker.
[0002]
[Description of the Related Art] In recent years, with advances in multimedia technology, communication conferences such as video conferences have become possible in a hands-free form that uses microphones and loudspeakers. Such conferences call for a sound collecting device that allows natural conversation without the participants being aware of microphones, and that collects only the target sound, such as a voice, without a microphone being installed on the conference desk for each speaker.
[0003]
As an example of such a sound collection device, there is a sound collection device that installs a
plurality of microphones (microphone arrays) and processes the outputs of the microphones to
extract a target sound. There are many known signal processing methods for suppressing noise
and extracting a target sound using such a microphone array, such as the delay-sum method,
03-05-2019
1
AMNOR, etc. (see, for example, Oga, Yamazaki, and Kanada, "Acoustic Systems and Digital Processing", The Institute of Electronics, Information and Communication Engineers, 1995, pp. 173-197). In the delay-sum method, for example, the target sound is extracted as follows.
[0004]
FIG. 2 is a diagram for explaining the principle of target sound extraction by the delay-sum method. In FIG. 2, 1 is a sound pickup unit (microphone array); 21, 22, ..., 2M are microphones (M is the number of microphones); 31, 32, ..., 3M are delay units; 4 is an adder; 5 is an output signal; 6 is a noise suppression unit; d is the microphone interval; s(t) is a sound wave arriving at the sound pickup unit 1 (t represents time); θ is the arrival angle of the sound wave s(t) at the sound pickup unit 1; and τ is the time difference (delay time) with which the sound wave reaches each microphone.
[0005]
It is assumed that the microphones 21, 22, ..., 2M in FIG. 2 are arranged on a straight line at equal intervals d, and that the sound wave s(t) arrives at the array from a distant source at an angle θ. The extra distance that the sound wave travels after reaching the microphone 21 until it reaches the microphone 22 is then d sin θ, from the microphone interval d and the arrival angle θ (FIG. 2). Similarly, the extra distance travelled to the i-th microphone 2i (i = 2, ..., M) is (i-1) d sin θ. Therefore, taking the microphone 21 as the reference, the delay time τi until the sound wave reaches the microphone 2i (i = 2, ..., M) is obtained by dividing this propagation distance by the speed of sound c, as in the following equation (1).

$$\tau_i = \frac{(i-1)\, d \sin\theta}{c} \tag{1}$$
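As a small worked illustration of equation (1), the sketch below computes the arrival delays for a uniform linear array. The function name `arrival_delays`, the speed-of-sound value, and the example geometry (4 microphones, 5 cm spacing, 30 degrees) are assumptions chosen for illustration and do not come from the patent.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at roughly 20 degrees C


def arrival_delays(num_mics, spacing_m, theta_rad, c=SPEED_OF_SOUND):
    """Return tau_i for i = 1..M following equation (1).

    Microphone 1 is the reference, so its delay is 0.
    """
    return [(i - 1) * spacing_m * math.sin(theta_rad) / c
            for i in range(1, num_mics + 1)]


# Hypothetical example geometry: 4 microphones, 5 cm apart, wave arriving at 30 degrees.
delays = arrival_delays(num_mics=4, spacing_m=0.05, theta_rad=math.radians(30))
for index, tau in enumerate(delays, start=1):
    print(f"microphone {index}: tau = {tau * 1e6:.1f} microseconds")
```

The delays grow linearly with the microphone index, which is what lets a single angle θ describe the whole set.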
[0006]
Here, when the output signal from each microphone 2i (i = 2, ..., M) is represented by xi(t), the sound wave s(t) is delayed by τi, so xi(t) is given by the following equation (2).

$$x_i(t) = s(t - \tau_i) \tag{2}$$
[0007]
Here, it will be shown that if the delay amount Di of the delay unit 3i (i = 2,..., M) is appropriately
set, only the sound wave arriving from the θ direction can be emphasized and output to the
output signal 5.
[0008]
The delay amount Di of the delay unit 3i (i = 2, ..., M) is set as in the following equation (3).

$$D_i = D_0 - \tau_i \tag{3}$$
[0009]
D0 is a fixed delay amount that is added so that the accuracy with which the delay characteristic is realized by a digital filter does not deteriorate when the value of τi is too small.
[0010]
At this time, since the delay amount Di of equation (3) is applied to the signal of equation (2), the output of the delay unit 3i (i = 2, ..., M) is as shown by the following equation (4).

$$x_i(t - D_i) = s(t - D_i - \tau_i) = s(t - D_0) \tag{4}$$
[0011]
That is, regardless of the microphone number i, every delay unit outputs the same signal, namely s(t) delayed by D0.
[0012]
When the signals are added by the adder 4 after their phases have been aligned in this way, the sound wave arriving from the θ direction is emphasized in proportion to the number of signals added.
On the other hand, a sound wave arriving from a direction θN different from θ reaches the microphones with delay times τN different from τi, so the delay amounts of equation (3) do not align its phases, and it is not emphasized even when the signals are added by the adder 4.
[0013]
Thus, in the delay-sum method, the sound wave coming from the target direction θ is
emphasized, and the noise coming from the other direction θN is relatively suppressed.
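The emphasis and suppression described above can be checked numerically. The following is a minimal sketch, assuming a single 1 kHz tone as s(t), 8 microphones at 10 cm spacing, and a speed of sound of 343 m/s; all of these values and the function names are illustrative assumptions. It applies the delays of equation (3) and sums the channels, then compares the mean output power for a source in the steered direction with that for a source elsewhere.

```python
import math

C = 343.0  # speed of sound in m/s (assumed)


def tau(i, d, theta):
    """Arrival delay at microphone i relative to microphone 1, equation (1)."""
    return (i - 1) * d * math.sin(theta) / C


def output_power(theta_src, theta_steer, M=8, d=0.1, freq=1000.0, D0=0.005):
    """Mean power of the adder output when the array is steered to theta_steer."""
    fs, n = 48000, 480  # 10 ms window, exactly 10 periods of the 1 kHz tone

    def s(t):
        return math.sin(2.0 * math.pi * freq * t)

    total = 0.0
    for k in range(n):
        t = k / fs
        y = 0.0
        for i in range(1, M + 1):
            D_i = D0 - tau(i, d, theta_steer)        # equation (3)
            y += s(t - D_i - tau(i, d, theta_src))   # delayed microphone signal
        total += y * y
    return total / n


on_target = output_power(math.radians(30), math.radians(30))
off_target = output_power(math.radians(-40), math.radians(30))
print(f"on-target power:  {on_target:.2f}")
print(f"off-target power: {off_target:.2f}")
```

With these assumed values the on-target channels add coherently, while the off-target channels largely cancel. Real speech is broadband, so the suppression in practice depends on frequency and on the array geometry.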
[0014]
At this time, if the target direction θ is scanned while the output signal of the microphone array is monitored, the output signal becomes large when θ points toward the target speaker, so the direction of the target speaker can be found.
Then, by aligning the phases according to equation (4) and adding so as to emphasize the sound wave from the direction θ of the target speaker, that is, by steering the directivity of the microphone array toward θ, the target sound can be picked up with a high SN ratio.
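A direction scan of this kind can be sketched as follows. To keep the example short it uses a narrowband (single-frequency) phasor model rather than time-domain signals, which is a simplification introduced here and not part of the patent. The names `steered_power` and `scan_direction`, and the array parameters, are likewise hypothetical.

```python
import cmath
import math

C = 343.0  # speed of sound in m/s (assumed)


def steered_power(theta_src_deg, theta_steer_deg, M=8, d=0.1, freq=1000.0):
    """Output power of a delay-sum array for one tone, as a phasor sum.

    Each channel contributes a unit phasor whose phase is the residual
    delay left after the steering delays have been applied.
    """
    diff = (math.sin(math.radians(theta_src_deg))
            - math.sin(math.radians(theta_steer_deg)))
    total = sum(cmath.exp(-2j * math.pi * freq * (i - 1) * d * diff / C)
                for i in range(1, M + 1))
    return abs(total) ** 2


def scan_direction(theta_src_deg, candidates_deg):
    """Return the candidate steering angle with the largest output power."""
    return max(candidates_deg, key=lambda th: steered_power(theta_src_deg, th))


# Hypothetical source at 20 degrees; scan in 1 degree steps.
estimate = scan_direction(20, range(-90, 91))
print("estimated direction (degrees):", estimate)
```

The spacing of 10 cm at 1 kHz is below half a wavelength, so the scan has a single main peak. With wider spacing or higher frequencies, additional peaks (grating lobes) could make the maximum ambiguous.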
[0015]
Here, for convenience of explanation, the microphones have been described as arranged on a straight line at equal intervals d. However, the microphones may be arranged at irregular intervals, and the arrangement may also be two-dimensional or three-dimensional.
[0016]
Also, as shown in FIG. 3, when the point sound source S is positioned relatively close to the array, it is important to improve the sound collection SN ratio by providing gains 71, 72, ..., 7M in the stage after the delay units 31, 32, ..., 3M and applying appropriate weights to these gains.
One way of assigning the weights is expressed by the following formulas (5), (6), and (7) (Nomura, Kanada, and Kojima, "Near Field Microphone Array", Journal of the Acoustical Society of Japan, Vol. 53, No. 2 (1997), pp. 110-116).
[0017]
Here, r1, r2, ..., rM are the distances from the sound source S to the respective microphones 21, 22, ..., 2M, and rC is the critical distance of the room, that is, the distance at which the direct sound power and the reverberant sound power of the sound source become equal. For a room volume V [m^3] and a room reverberation time T [s], it is expressed as $r_C = \sqrt{0.0032\,V/T}$ (H. Kuttruff, "Room Acoustics (Third Edition)", Elsevier Applied Science, pp. 100-132 (1991)).
At this time, the microphone array is most sensitive at the "point" where the sound source S is located; a "focus" of sensitivity, so to speak, is formed there.
For the distance ri (i = 1, 2, ..., M) to each microphone, the delay amounts of the delay units 31, 32, ..., 3M are set to D0 − ri / c (c: speed of sound). By changing these delay amounts and the gain g0, that is, a, to scan the focus of sensitivity while monitoring the array output, the position of the target speaker can be found.
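The two quantities in this paragraph that can be computed directly are the critical distance and the per-microphone focusing delays D0 − ri / c. The sketch below assumes that the critical-distance expression is a square root, r_C = sqrt(0.0032 V / T), which is the reading consistent with Sabine-based textbook forms. The gain weights of formulas (5) to (7) are not modelled. Function names, the microphone layout, and the numeric values are hypothetical.

```python
import math

C = 343.0  # speed of sound in m/s (assumed)


def critical_distance(volume_m3, reverb_time_s):
    """Critical distance r_C in metres, assuming r_C = sqrt(0.0032 * V / T)."""
    return math.sqrt(0.0032 * volume_m3 / reverb_time_s)


def focus_delays(mic_positions, focus_point, D0):
    """Delay for each delay unit so that sound from focus_point lines up.

    Implements D_i = D0 - r_i / c, where r_i is the distance from the
    focus point to microphone i. D0 must be large enough to keep every
    delay non-negative (a negative delay cannot be realized).
    """
    delays = []
    for mic in mic_positions:
        r_i = math.dist(mic, focus_point)
        delay = D0 - r_i / C
        if delay < 0:
            raise ValueError("D0 is too small for this geometry")
        delays.append(delay)
    return delays


# Hypothetical room: 200 m^3 with a 0.5 s reverberation time.
print(f"critical distance: {critical_distance(200.0, 0.5):.2f} m")

# Hypothetical layout: two ceiling microphones, focus at a seat below them.
mics = [(1.0, 1.0, 2.5), (3.0, 1.0, 2.5)]
print(focus_delays(mics, focus_point=(1.5, 1.0, 1.2), D0=0.02))
```

Microphones farther from the focus receive the sound later and therefore get a smaller added delay, so all channels line up at D0 after the delay units.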
[0018]
Thus, the target sound can be picked up with a high sound collection SN ratio by locating the region where the target speaker is present, as a direction or a position, and steering the directivity of the array toward that region.
[0019]
Conventionally, as a method of in-field loud-speaking in a conference, a microphone is generally placed near each participant, with its direction of sensitivity set opposite to the output direction of the loudspeaker.
In a lecture hall, likewise, the setup is generally arranged so that the direction of sensitivity of the microphone near the speaker does not coincide with the output direction of the loudspeaker used for in-field loud-speaking.
This prevents the loop (howling) in which sound from the loudspeaker is picked up by the microphone, amplified, and output from the loudspeaker again.
[0020]
On the other hand, to eliminate the loss of space (on the desk) caused by placing a microphone near each speaker, a scheme has been proposed that uses a plurality of microphones arranged at a place away from the speaker (for example, on the ceiling) and picks up the speaker's voice at a high SN ratio by applying an appropriate gain and an appropriate delay to each microphone input and adding the results.
(Nomura et al., "Considerations of near-field microphone array" Proceedings of the Acoustical
Society of Japan, March 1996)
[0021]
However, because such a system focuses on the position that gives the largest input, a loudspeaker present in the venue may itself become the focus, which has made in-field loud-speaking difficult with this scheme.
[0022]
By hearing their own amplified voice in the venue, speakers can confirm that the microphones are picking up their voice. For speech within a venue, therefore, loud-speaking into the venue is indispensable.
[0023]
An object of the present invention is to provide a speaker following type in-field loud-speaking device and a voice input method capable of correctly estimating the position of a speaker even if a loudspeaker is installed in the venue.
[0024]
SUMMARY OF THE INVENTION In order to achieve the above object, according to the present invention, the number and positions of the loudspeakers in the venue are given to the system in advance, and processing that excludes the loudspeaker positions from the focus position candidates in the venue is incorporated.
[0025]
Further, the number and positions of the loudspeakers are either given in advance, prior to a meeting or the like, or found by playing learning sounds from the loudspeakers and having the system perform sound source detection.
[0026]
BEST MODE FOR CARRYING OUT THE INVENTION In the speaker following type in-field loud-speaking device and voice input method according to the present invention, the focus position is scanned and set on the speaker according to the magnitude of the signal level from the sound pickup system. By including in this process a step that excludes the loudspeaker positions, the focus is prevented from settling on a loudspeaker.
[0027]
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS The embodiments of the present
invention will be described in detail with reference to the drawings.
[0028]
FIG. 1 is a block diagram showing the configuration of an embodiment of a speaker following type in-field loud-speaking system according to the present invention.
In FIG. 1, reference numeral 11 denotes microphones, which are arranged at a place away from the speaker (for example, on the ceiling).
Reference numeral 12 denotes a sound collection processing unit, which applies the required gain and delay to each sound signal input from the microphones 11, converts the signals to a predetermined sound input signal level, and adds them.
Reference numeral 13 denotes a control / determination processing unit, which supplies the sound collection processing unit 12 with control signals that specify operation patterns such as the required gains and delays, and compares the addition signals obtained from the sound collection processing unit 12 for each operation pattern to determine the position of the speaker.
Reference numeral 14 denotes a storage unit, which stores, for each of the microphones 11, the operation patterns that give the gain, delay time, and the like to be applied to the audio signal output from that microphone.
Reference numeral 15 denotes an output unit, which outputs the audio signal that was input from the microphones 11 and subjected to the above processing.
Reference numeral 16 denotes a loudspeaker, which is driven by the output of the output unit 15.
Reference numeral 17 denotes a control unit, which performs control so that the focus does not settle on the position of the loudspeaker 16.
Reference numeral 18 denotes a line interface, through which connection with an external line is made.
[0029]
Next, the operation will be described. In the sound collection processing unit 12, delay processing and signal amplification processing are applied to the audio signal of each input channel from the plurality of microphones 11 according to the operation pattern given by the control / determination processing unit 13, and the signals of the channels are added. The control / determination processing unit 13 estimates the position of the speaker by comparing the added signals obtained for the respective operation patterns. In general, the focus position corresponding to the operation pattern that yields the largest added signal is taken as the position of the speaker.
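The selection step described above can be sketched as follows. This is a simplified illustration, assuming integer sample delays, sampled channel data held in plain lists, and mean-square value as the "level" of the added signal. The `OperationPattern` type and both function names are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperationPattern:
    focus: tuple    # focus position candidate, e.g. (x, y) in metres
    delays: tuple   # delay per input channel, in samples
    gains: tuple    # gain per input channel


def added_signal_level(channels, pattern):
    """Mean-square level of the delayed, weighted, and added channels."""
    n = len(channels[0])
    total = 0.0
    for t in range(n):
        y = 0.0
        for samples, delay, gain in zip(channels, pattern.delays, pattern.gains):
            if 0 <= t - delay < n:
                y += gain * samples[t - delay]
        total += y * y
    return total / n


def estimate_position(channels, patterns):
    """Return the focus of the pattern that yields the largest added signal."""
    best = max(patterns, key=lambda p: added_signal_level(channels, p))
    return best.focus


# Toy input: a pulse reaches channel 0 at sample 3 and channel 1 at sample 5.
ch0 = [0.0] * 16
ch1 = [0.0] * 16
ch0[3] = 1.0
ch1[5] = 1.0
channels = [ch0, ch1]

patterns = [
    OperationPattern(focus=(1.0, 2.0), delays=(2, 0), gains=(1.0, 1.0)),  # aligns the pulses
    OperationPattern(focus=(3.0, 0.5), delays=(0, 2), gains=(1.0, 1.0)),  # misaligns them
]
print("estimated position:", estimate_position(channels, patterns))
```

Only the first pattern brings the two pulses to the same sample, so its added signal has the larger level and its focus is returned.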
[0030]
Here, the operation patterns are prepared as follows. The target room is divided into a mesh with a pitch of about several centimeters to 1 m, each grid point is taken as a focus position candidate, and an operation pattern that applies the delay processing and signal amplification processing appropriate to that grid point and then adds the signals is prepared for each grid point.
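Preparing one operation pattern per grid point might look like the sketch below. It assumes a rectangular room, a single horizontal plane of candidates at an assumed mouth height of 1.2 m, delays of the form D0 − ri / c, and unit gains. The patent does not fix any of these details, and the function name and dictionary layout are hypothetical.

```python
import math

C = 343.0  # speed of sound in m/s (assumed)


def build_patterns(mic_positions, room_x, room_y, pitch, height=1.2, D0=0.05):
    """Return {grid index: pattern} for every focus position candidate.

    Each pattern holds the focus point and one delay per microphone.
    Gains are left at 1.0 because the weighting formulas are not modelled.
    """
    patterns = {}
    nx = int(round(room_x / pitch)) + 1
    ny = int(round(room_y / pitch)) + 1
    for ix in range(nx):
        for iy in range(ny):
            point = (ix * pitch, iy * pitch, height)
            delays = tuple(D0 - math.dist(mic, point) / C for mic in mic_positions)
            gains = tuple(1.0 for _ in mic_positions)
            patterns[(ix, iy)] = {"focus": point, "delays": delays, "gains": gains}
    return patterns


# Hypothetical 4 m x 3 m room with four ceiling microphones and a 0.5 m mesh.
mics = [(1.0, 1.0, 2.5), (3.0, 1.0, 2.5), (1.0, 2.0, 2.5), (3.0, 2.0, 2.5)]
patterns = build_patterns(mics, room_x=4.0, room_y=3.0, pitch=0.5)
print("number of focus position candidates:", len(patterns))
```

Because the patterns depend only on the room and microphone geometry, they can be computed once and stored, which matches the role the text assigns to the storage unit 14. A finer pitch improves resolution at the cost of more patterns to evaluate.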
[0031]
At this time, the operation patterns that the control / determination processing unit 13 gives to the sound collection processing unit 12 are stored in advance in the storage unit 14, having been set according to the arrangement of the microphones 11.
[0032]
The input voice signal is output from the output unit 15 to the loudspeaker 16 for in-field loud-speaking and, in the case of a communication conference, is also sent to the other party on the network through the line interface 18.
Here, if the loudspeaker 16 for in-field loud-speaking is in the same room as the speaker, the addition signal in the control / determination processing unit 13 may be largest when the focus is on the position of the loudspeaker 16, and errors in position estimation may occur.
[0033]
Therefore, the positional information of the loudspeaker 16 stored in the storage unit 14 is given to the control unit 17, and the operation pattern corresponding to that position is excluded.
As a result, the position of the speaker, as distinct from the loudspeaker position, can be correctly estimated.
[0034]
Although FIG. 1 shows the case of a single loudspeaker 16 for in-field loud-speaking, when there are a plurality of loudspeakers 16, the positional information of each loudspeaker 16 is stored in the storage unit 14 and the operation patterns corresponding to those positions are excluded.
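The exclusion step can be illustrated with a short sketch. It assumes that a pattern "corresponds to" a loudspeaker when its grid point lies within a chosen radius of the stored loudspeaker position. That radius, the function names, and the level values are assumptions made for illustration; the patent only states that the corresponding operation patterns are excluded.

```python
import math


def excluded_candidates(grid_points, loudspeaker_positions, exclusion_radius):
    """Grid indices whose focus point lies near any stored loudspeaker position."""
    return {
        key
        for key, point in grid_points.items()
        if any(math.dist(point, ls) <= exclusion_radius for ls in loudspeaker_positions)
    }


def estimate_speaker(levels, excluded):
    """Pick the candidate with the largest added signal, skipping excluded ones."""
    allowed = {key: level for key, level in levels.items() if key not in excluded}
    return max(allowed, key=allowed.get)


# Toy data: three candidates on a line, one loudspeaker close to the third.
grid_points = {(0, 0): (0.0, 0.0), (1, 0): (1.0, 0.0), (2, 0): (2.0, 0.0)}
loudspeakers = [(2.0, 0.1)]
levels = {(0, 0): 0.2, (1, 0): 0.6, (2, 0): 0.9}  # made-up added-signal levels

excluded = excluded_candidates(grid_points, loudspeakers, exclusion_radius=0.3)
print("without exclusion:", max(levels, key=levels.get))
print("with exclusion:   ", estimate_speaker(levels, excluded))
```

Without exclusion the loudest candidate is the one at the loudspeaker. With exclusion the estimate moves to the next-strongest candidate, which in this toy setup stands for the person speaking.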
[0035]
Next, another embodiment of the present invention will be described.
If a learning sound is played from the loudspeaker 16 before the conference and the sound source position is estimated at that time, the estimated position is the position of the loudspeaker 16, so the corresponding operation pattern is stored in the storage unit 14. If the operation pattern corresponding to the position of the loudspeaker 16 is then excluded during the conference, the position of the speaker, as distinct from the loudspeaker position, can be correctly estimated.
[0036]
That is, even if the position of the loudspeaker 16 is not known, or the position of the loudspeaker 16 has changed, using the learning sound allows the position of the speaker, as distinct from the loudspeaker position, to be correctly estimated.
[0037]
In this case, when there are a plurality of loudspeakers 16, the position of each loudspeaker 16 can be found by playing the learning sound from one loudspeaker at a time.
Alternatively, when the learning sound is played through a plurality of loudspeakers 16 simultaneously, the resulting operation pattern itself is stored and used for exclusion.
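The learning procedure with one loudspeaker playing at a time can be sketched as below. The callable `measure_levels` stands in for playing the learning sound and evaluating every operation pattern; it is a hypothetical hook, and the level values are made-up toy numbers. The simultaneous-playback variant is not covered by this sketch.

```python
def learn_loudspeaker_patterns(loudspeaker_ids, measure_levels):
    """Play the learning sound from one loudspeaker at a time.

    For each loudspeaker, the operation pattern with the largest added
    signal is taken to correspond to that loudspeaker and is stored.
    """
    learned = set()
    for ls_id in loudspeaker_ids:
        levels = measure_levels(ls_id)  # levels per pattern, this loudspeaker only
        learned.add(max(levels, key=levels.get))
    return learned


def estimate_speaker(levels, learned):
    """Largest-level pattern among those not learned as loudspeaker patterns."""
    allowed = {key: level for key, level in levels.items() if key not in learned}
    return max(allowed, key=allowed.get)


# Toy measurements standing in for real learning runs (made-up values).
canned = {
    "left": {"p0": 0.9, "p1": 0.2, "p2": 0.1, "p3": 0.1},
    "right": {"p0": 0.1, "p1": 0.2, "p2": 0.1, "p3": 0.8},
}
learned = learn_loudspeaker_patterns(["left", "right"], lambda ls_id: canned[ls_id])

# During the meeting both loudspeakers reproduce the speaker's voice.
meeting_levels = {"p0": 0.7, "p1": 0.5, "p2": 0.3, "p3": 0.6}
speaker_pattern = estimate_speaker(meeting_levels, learned)
print("learned loudspeaker patterns:", sorted(learned))
print("speaker pattern:", speaker_pattern)
```

Because the learned set is derived from measurements rather than from entered coordinates, rerunning the learning step after a loudspeaker is moved is enough to update the exclusion.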
[0038]
According to the present invention, in a speaker tracking type in-field loud-speaking device and voice input method that take, as the optimum pattern, the operation pattern that maximizes the speech signal among the speech signal processing results for the respective operation patterns, the position information of the loudspeaker is stored in the storage unit and the loudspeaker position is excluded from the operation patterns. The focus therefore does not settle on the loudspeaker, and the position of the speaker can be estimated correctly.
[0039]
In addition, a learning sound is output from the loudspeaker in advance, the optimum operation pattern at that time is stored in the storage unit, and that operation pattern is excluded from subsequent processing. Thus, even if the position of the loudspeaker is unknown, the operation pattern corresponding to the loudspeaker position can be excluded and the position of the speaker can be estimated correctly.
[0040]
Brief description of the drawings
[0041]
FIG. 1 is a block diagram showing the configuration of an embodiment of a speaker following type in-field loud-speaking system to which the present invention is applied.
[0042]
FIG. 2 is a diagram for explaining the principle of target sound extraction by the conventional delay-sum method.
[0043]
FIG. 3 is a diagram for explaining the principle of another form of target sound extraction by the conventional delay-sum method.
[0044]
Explanation of sign
[0045]
DESCRIPTION OF SYMBOLS
11 Microphone(s) for voice input
12 Sound collection processing unit
13 Control / determination processing unit
14 Storage unit
15 Output unit
16 Loudspeaker for in-field loud-speaking
17 Control unit
18 Line interface