# JP2006304124

```Patent Translate
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2006304124
PROBLEM TO BE SOLVED: To provide an apparatus and a method for determining the input direction
of an audio signal that can identify the speaker direction over the full 360° and that can be
implemented with a general-purpose personal computer and a commonly used sound device.
SOLUTION: The device receives the input signals of a first microphone group 21, consisting of a
pair of microphones 21a and 21b, and of a second microphone group 22, consisting of a pair of
microphones 22a and 22b provided so as to intersect the first microphone group 21. The phase
difference between the input signals within each microphone group is detected. The direction of
the sound source is determined from the phase difference of the first microphone group 21 and
from the phase difference of the second microphone group 22, and among the four directions
indicated by the first microphone group 21 and the second microphone group 22, the direction on
which both groups agree is determined as the direction of the sound source. [Selected figure]
Figure 1
Sound source direction determining device and sound source direction determining method
[0001]
The present invention relates to an apparatus and method for determining the input direction of
an audio signal, and more particularly to an apparatus and method capable of determining the
direction of an audio signal input to sound pressure acquisition means such as a microphone.
[0002]
04-05-2019
1
The direction from which an audio signal is input is important for sound reproduction through
speakers when listening to music.
That is, when music is played back and the sound from the speakers is, for example, a vocal, the
result depends on where the vocalist was positioned relative to the pair of left and right
microphones at the time of recording. A person listening to the speakers senses the sound
pressure (loudness) arriving at his or her left and right ears and judges the direction from
which the vocal comes. The direction of sound from such a source is generally determined using
stereo microphones, and can be calculated from the magnitude of the input signal (sound
pressure) at each microphone. Determining the direction of the sound source generating such
sound pressure is also convenient for automatically identifying the speaker among a plurality of
participants, for example when a video conference or a meeting is captured and relayed.
Identifying the speaker and automatically controlling the imaging direction of the imaging
apparatus has the effect that the speaker can be distinguished from the other participants and
displayed.
[0003]
To determine the direction of a sound source, the simplest approach is to measure the difference
between the sound pressures input to the left and right microphones and to calculate an angle,
toward the microphone showing the larger sound pressure, that corresponds to the magnitude of
that difference. In this case, the direction of the sound source can be determined within a
range of 180° with respect to the position where the microphone group is installed. However,
with measurement by a pair of left and right microphones, if the left-right direction of the
microphone group is taken as 0° to 180°, it was impossible to determine whether the sound source
lies in the first and second quadrants or on the opposite side of that axis. Also, the closer
the sound source was to 0° or 180°, that is, the closer to the installation direction of the
microphone group, the smaller the change in the sound pressure difference for a given change in
angle, and the less reliable the direction determination.
[0004]
In addition, to determine the direction of the sound source with microphones that cover the
entire circumference from the first quadrant to the fourth quadrant, it is conceivable to
prepare at least three microphones, measure the sound pressure of the same sound source input to
the three microphones, and determine the direction of the sound source from the differences in
sound pressure. Alternatively, the direction can be determined in the same way with a microphone
array comprising still more microphones, by measuring the sound pressure input to each of them.
As the number of microphones increases, the accuracy of sound source direction determination
improves. However, as the number of microphones increases, the processing needed to measure the
input sound pressures becomes enormous and the determination takes a long time, which is
unsuitable for a video conference or the like.
[0005]
Next, a conventional example of a video conference system is described.
Conventionally, in a video conference system performed using the Internet, communication of
image data and audio data using the Internet is performed. Only one or three people could use it
per person. Therefore, it was difficult to capture the expressions of the conference participants, it
was difficult to identify the speaker, and it was difficult to communicate the intention properly.
Therefore, there is a system in which the speaker is identified, the image of the speaker is
captured widely, and the intention of the speaker can be accurately transmitted to each
conference participant. One example is “TV conference system, camera control device in TV
conference system, and camera control method” (Japanese Patent Application Laid-Open No.
2001-339703, hereinafter referred to as Conventional Example 1).
[0006]
Conventional Example 1 describes “(Claim 1) A video conference system comprising a plurality of
sound collecting means, at least one speaker imaging means, image display means, and imaging
control means for changing the imaging direction of the imaging means that images the speaker on
the basis of voice direction information of the speaker obtained from the sound collecting
means, characterized in that the imaging control means directs the imaging means toward the
location of the speaker estimated by the sound collecting means, extracts motion pixels from the
captured image, determines the distribution of the motion pixels to specify the position of the
speaker in the image area, and further controls the imaging means on the basis of the position
information of the speaker so that the speaker is displayed in a predetermined part of the image
area.” More specifically, as shown in FIG.
10, a plurality of microphones 100 are prepared, and the voice signal input to each microphone
100 is passed to the speaker position detection means 101, which identifies the direction of the
speaker from the phase difference between the signals of the respective microphones 100.
However, the details of how the speaker direction is identified from the number of installed
microphones 100 and the phase difference are not disclosed.
[0007]
Therefore, the “camera imaging control apparatus” (Japanese Patent Application Laid-Open No.
7-140527), cited as prior art in Conventional Example 1, is hereinafter referred to as
Conventional Example 2. Conventional Example 2 consists of “(Claim 1) A camera imaging control
apparatus comprising a plurality of microphones for detecting the voice of a speaker, a phase
difference detection circuit for detecting the phase difference of the sound between these
microphones, and a camera direction control device that is controlled by the phase difference
detection circuit and controls the direction of the camera and the plurality of microphones so
that the phase difference between the microphones becomes zero” and other claims. In the second
conventional example, as shown in FIG. 11, the directions of the speakers A and B at different
positions with respect to the microphone 117A and the microphone 117B are determined. That
is, the phase difference detection circuit 118 determines the phase difference that is generated
from the difference in time required for the voices from the speakers A and B to reach the
microphone 117A or the microphone 117B. For example, speaker A is close to the microphone 117A
and far from the microphone 117B, so the voice of speaker A is detected at the microphone 117A
with a waveform as shown in FIG. The waveform detected at the microphone 117B has a similar
shape, but because the sound arrives later it appears slightly delayed along the horizontal
(time) axis compared with that figure; that is, a so-called phase shift occurs. This phase shift
determines the direction of the speaker A. In this case, since the arrival at the microphone
117B is delayed, the speaker A is found to be closer to the microphone 117A than to the
microphone 117B, and the direction can be determined from
the magnitude of the phase difference. Then, the camera direction control device 119 controls
the direction of the camera 120 based on the result calculated by the phase difference detection
circuit 118. In addition, since speaker B is at the same distance from the microphone 117A and
the microphone 117B, the same waveform is formed at both microphones, as shown in FIG. The phase
difference detection circuit 118 therefore calculates that speaker B is in the middle position,
and the camera
direction control device 119 controls the direction of the camera 120 based on the result
calculated by the phase difference detection circuit 118.
[0008]
Further, in Conventional Example 2, as shown in FIG. 14, a microphone 117D is provided to enable
the distance of the speaker B and the speaker C to be calculated, so that the camera direction
control device 119 can control the camera 120 according to the speaker distance. That is,
although the voices of the speaker B and the speaker C appear as the same waveform at the
microphone 117A and the microphone 117B, the voice of the speaker C arrives at the microphone
117D earlier than that of the speaker B, so a phase difference occurs. The camera direction
control device 119 can calculate the distance from this phase difference. However, the inventor
of the present invention found the following when using three or more microphones as described
above and processing the voice input from each microphone on a commonly used personal computer.
If the monaural microphone input and the stereo line input with which such a computer is
generally provided are used for the three microphones, the input signals from the monaural
microphone input and the stereo line input are mixed on the sound board, so the inputs from the
three microphones cannot be processed independently. Based on this finding, the inventor next
used the universal serial bus (hereinafter abbreviated as USB), an interface generally provided
in commonly used personal computers whose ports can be used independently. The inputs from the
multiple microphones were converted to the USB specification by USB conversion adapters and
input to the personal computer through the USB ports for processing. However, when voice signals
input simultaneously to a plurality of microphones were fed to the personal computer and the
phase was actually measured, an average phase shift of 780 μsec was found between the
microphones despite the simultaneous input. With phase difference detection as in the
above-described conventional examples, this error appears as a deviation of several degrees to
several tens of degrees in the recognized position of the speaker, and the inventor found that
it is difficult to detect the speaker position accurately. These errors are considered to be due
to the time differences that occur when the operating system of the personal computer processes
the signal from each USB device, the individual differences between USB devices, and the like.
Japanese Patent Application Publication No. 2001-339703 (Conventional Example 1)
Japanese Patent Application Publication No. 7-140527 (Conventional Example 2)
[0009]
However, Conventional Example 1 does not clearly show a specific example of detecting the phase
difference from the voice signals input by the microphones 100. As for the speaker position
identification by the microphones 117A to 117D shown in Conventional Example 2, when the
direction of the microphone 117A and the microphone 117B is taken as the X axis and the
direction of the camera 120 and the speaker B as the Y axis, directions in the first quadrant
and the second quadrant can be determined, but it cannot be determined whether the speaker is
located in the first quadrant or in the fourth quadrant. In other words, Conventional Example 2
can identify where within a 180° range the speaker is located, but cannot do so over the full
360°.
[0010]
Therefore, it cannot be used at all in, for example, a meeting room where participants sit
around a round table, and its range of application is restricted.
Also, adding the microphone 117C and the microphone 117D to the microphones 117A and 117B makes
it possible to calculate both direction and distance, and to calculate the direction with higher
accuracy, but the direction still cannot be determined over the full 360°. Furthermore, at
angles close to the line extending through the two microphones, that is, around 0° or 180°, the
phase difference of the input signals changes only slightly for a small change in angle, so
errors in the speaker direction arise around 0° or 180°. Around 90°, by contrast, even a small
change in angle produces a large change in phase difference, so the speaker direction is
relatively accurate there. Conventional Example 2 thus has the problem that the direction cannot
be determined accurately around 0° or 180°.
[0011]
Furthermore, according to the findings of the inventor, Conventional Examples 1 and 2 give no
example of processing with functions generally provided in commonly used personal computers or
with a generally available sound device such as a sound card. The input signals from the
microphones are presumably processed with a dedicated conversion device or the like, so the
processing cannot be performed on a widely used personal computer. On the other hand, the
inventor has found that no delay error occurs between the left and right input signals of a
stereo line input that is standard equipment on personal computers. Furthermore, when the phase
difference was detected using a stereo-type sound card, a so-called sound device, no error such
as a delay arose between the sound signals from the left and right microphones input to the
sound card. The inventor therefore focused on the fact that no delay error occurs in the
processing of input signals between the left and right channels of the stereo line input
normally provided in a personal computer, or between the left and right channels of a sound
card.
[0012]
In view of the above problems, the present invention aims to provide an apparatus and a method
for determining the input direction of an audio signal that can be implemented with a commonly
used general-purpose personal computer and a generally used sound device, and that can identify
the speaker direction over the full 360°.
[0013]
Therefore, so that the speaker direction can be identified over the full 360°, unlike in the
conventional examples, the inventor provides the following.
[0014]
A sound source direction determining device comprising: a first microphone group consisting of a
pair of microphones; a second microphone group consisting of a pair of microphones provided so
as to intersect the first microphone group; phase difference detection means for receiving the
input signals of the first microphone group and detecting the phase difference between the
respective microphone input signals, and for receiving the input signals of the second
microphone group and detecting the phase difference between the respective microphone input
signals; and direction determining means for determining the direction of the sound source from
the phase difference of the first microphone group, determining the direction of the sound
source from the phase difference of the second microphone group, and determining, among the four
directions indicated by the first microphone group and the second microphone group, the
direction on which both agree as the direction of the sound source; the device being
characterized in that the direction of the sound source can be detected over 360° from the phase
differences detected from the first microphone group and the second microphone group,
[0015]
is provided.
According to this sound source direction determining device, an audio signal from a sound
source is input to the first microphone group and the second microphone group.
The phase difference detection means then detects the phase difference between the audio signals
from the sound source input to the pair of microphones of the first microphone group, and
likewise the phase difference between the audio signals from the sound source input to the pair
of microphones of the second microphone group.
The detected phase differences are input to the direction determining means. The direction
determining means determines the direction of the sound source with respect to the first
microphone group from the phase difference of the first microphone group. In this case, when
the direction of the first microphone group is taken as an axis, the direction is determined on
both sides of the axis; that is, candidates for the direction of the sound source exist on both
sides of the axis. Similarly, the direction determining means determines the direction of the
sound source with respect to the second microphone group from the phase difference of the second
microphone group. Here too, when the direction of the second microphone group is taken as an
axis, candidates for the direction of the sound source exist on both sides of the axis, as with
the first microphone group. Of the two directions obtained from the phase difference of the
first microphone group and the two directions obtained from the phase difference of the second
microphone group, the direction determining means determines the direction common to both as the
direction of the sound source. Therefore, if the sound source is a speaker and, for example, the
position control device of a video camera rotates the video camera to the determined direction,
the video camera can capture the speaker. In the present invention, in order to improve the
accuracy of the direction of the sound source,
[0016]
A sound source direction determining device comprising: a first microphone group consisting of a
pair of microphones; a second microphone group consisting of a pair of microphones provided so
as to intersect the first microphone group; phase difference detection means for receiving the
input signals of the first microphone group and detecting the phase difference between the
respective microphone input signals, and for receiving the input signals of the second
microphone group and detecting the phase difference between the respective microphone input
signals; and direction determining means for determining the direction of the sound source from
the phase difference of the first microphone group, determining the direction of the sound
source from the phase difference of the second microphone group, taking the direction on which
both agree among the four directions indicated by the first microphone group and the second
microphone group as the direction of the sound source, and adopting as that direction the one
obtained from the phase difference of whichever microphone group has the smaller phase
difference; the device being characterized in that the direction of the sound source can be
detected over 360° from the phase differences detected from the first microphone group and the
second microphone group,
[0017]
is provided.
Therefore, according to the present invention, an audio signal from a sound source is input to
the first microphone group and the second microphone group. The phase difference detection means
then detects the phase difference between the audio signals from the sound source input to the
pair of microphones of the first microphone group, and likewise the phase difference between the
audio signals from the sound source input to the pair of microphones of the second microphone
group. The detected phase differences are input to the direction determining means.
The direction determining means determines the direction of the sound source with respect to
the first microphone group from the phase difference of the first microphone group. In this
case, when the direction of the first microphone group is taken as an axis, candidates for the
direction of the sound source exist on both sides of the axis. Similarly, the direction
determining means determines the direction of the sound source with respect to the second
microphone group from the phase difference of the second microphone group. Here too, when the
direction of the second microphone group is taken as an axis, candidates for the direction of
the sound source exist on both sides of the axis, as with the first microphone group. Of the two
directions obtained from the phase difference of the first microphone group and the two
directions obtained from the phase difference of the second microphone group, the direction
determining means finds the direction common to both. The closer the sound source is to the
installation direction of a pair of microphones, the smaller the change in the phase difference
for a given change in angle (change in the direction of the sound source), and the error in the
determined direction increases accordingly. For this reason, the direction determined by the
microphone group with the smaller phase difference is adopted as the direction of the sound
source, so the sound source direction is determined from the more accurate data. Furthermore, in
the present invention, in order to further increase the accuracy in determining the direction of
the sound source,
[0018]
A sound source direction determining device comprising: a first microphone group consisting of a
pair of microphones; a second microphone group consisting of a pair of microphones provided so
as to intersect the first microphone group; phase difference detection means for receiving the
input signals of the first microphone group and detecting the phase difference between the
respective microphone input signals, and for receiving the input signals of the second
microphone group and detecting the phase difference between the respective microphone input
signals; and direction determining means for determining the direction of the sound source from
the phase difference of the first microphone group, determining the direction of the sound
source from the phase difference of the second microphone group, taking the direction on which
both agree among the four directions indicated by the first microphone group and the second
microphone group as the direction of the sound source, adopting the phase differences of both
microphone groups to determine the direction of the sound source when both phase differences are
within a predetermined threshold, and, when a phase difference exceeds the predetermined
threshold, determining as the direction of the sound source the direction obtained from the
microphone group whose phase difference does not exceed the threshold; the device being
characterized in that the direction of the sound source can be detected over 360° from the phase
differences detected from the first microphone group and the second microphone group,
[0019]
is provided.
Therefore, according to this sound source direction determining device, use is made of the fact
that the larger the phase difference between the audio signals from a pair of microphones, the
larger the error in the direction determined by the direction determining means. When the phase
differences are small (more accurate), the values from both the first microphone group and the
second microphone group can be adopted and, for example, averaged to increase the accuracy of
the determined direction. On the other hand, when the phase difference of one microphone group
is larger (less accurate) than the threshold value, the value from that group is used only to
decide which of the two directions indicated by the other microphone group applies, and the
phase difference of the other microphone group is adopted to determine the direction of the
sound source. In this way, both sets of data are adopted when the phase differences are within
the threshold and accuracy is therefore high, while a phase difference exceeding the threshold
is used only to choose between the two directions indicated by the phase difference that does
not exceed it. The sound source direction can thus be decided from the more accurate data,
without degrading the direction indicated by the high-accuracy phase difference. Moreover,
because the phase difference of each microphone group is obtained using a stereo line input that
is standard equipment on a personal computer, or a commonly used stereo-type sound card, the
processing introduces no error into the phase difference. Furthermore, in the present invention,
[0020]
A sound source direction determining method comprising: receiving the input signals of a first
microphone group consisting of a pair of microphones and detecting the phase difference between
the respective microphone input signals; receiving the input signals of a second microphone
group consisting of a pair of microphones arranged so as to intersect the first microphone group
and detecting the phase difference between the respective microphone input signals; determining
the direction of the sound source from the detected phase difference of the first microphone
group and from the detected phase difference of the second microphone group; and determining,
among the four directions indicated by the first microphone group and the second microphone
group, the direction on which both agree as the direction of the sound source, the method being
characterized in that the direction of the sound source can be detected over 360°,
[0021]
a sound source direction determining method comprising: receiving the input signals of a first
microphone group consisting of a pair of microphones and detecting the phase difference between
the respective microphone input signals; receiving the input signals of a second microphone
group consisting of a pair of microphones provided so as to intersect the first microphone group
and detecting the phase difference between the respective microphone input signals; determining
the direction of the sound source from the detected phase difference of the first microphone
group and from the detected phase difference of the second microphone group; taking, among the
four directions indicated by the first microphone group and the second microphone group, the
direction on which both agree as the direction of the sound source; and determining as the
direction of the sound source the direction obtained from the phase difference of whichever
microphone group has the smaller phase difference, the method being characterized in that the
direction of the sound source can be detected over 360°,
[0022]
and a sound source direction determining method comprising: receiving the input signals of a
first microphone group consisting of a pair of microphones and detecting the phase difference
between the respective microphone input signals; receiving the input signals of a second
microphone group consisting of a pair of microphones provided so as to intersect the first
microphone group and detecting the phase difference between the respective microphone input
signals; determining the direction of the sound source from the detected phase difference of the
first microphone group and from the detected phase difference of the second microphone group;
taking, among the four directions indicated by the first microphone group and the second
microphone group, the direction on which both agree as the direction of the sound source;
adopting the phase differences of both microphone groups to determine the direction of the sound
source when both phase differences are within a predetermined threshold; and, when a phase
difference exceeds the predetermined threshold, determining as the direction of the sound source
the direction obtained from the microphone group whose phase difference does not exceed the
threshold,
[0023]
are provided.
Therefore, according to this method, the direction of the sound source can be determined by the
first microphone group and the second microphone group, as in the above-described devices.
[0024]
Therefore, according to the present invention, the direction of the sound source can be
determined over the full 360° from the phase differences of the first microphone group and the
second microphone group. The phase difference of each of the two microphone groups can be
processed without delay errors or the like using a stereo line input that is standard equipment
on a personal computer or a commonly used stereo-type sound card. The direction of the sound
source can therefore be determined over the full 360° at low cost without using a special
device.
[0025]
Furthermore, according to the present invention, of the phase differences of the first
microphone group and the second microphone group, the data of the microphone group with the
smaller phase difference can be adopted to determine the direction of the sound source. That is,
the phase difference of the microphone group with the larger error is used only to select the
direction of the sound source from the two directions determined by the other microphone group,
which makes it possible to reduce the error in the determined sound source direction.
[0026]
Further, in the present invention, when the phase differences of both the first microphone group
and the second microphone group are, judged against the predetermined threshold, appropriate
for determining the direction of the sound source, a more accurate sound source direction can
be determined by averaging the directions obtained from both phase differences.
In addition, when the phase difference of one microphone group falls on the side of the
threshold where the error in determining the direction of the sound source becomes large, that
phase difference is used only to choose between the two directions determined by the other
microphone group, so the error in the determined direction of the sound source can be reduced.
[0027]
The left and right microphones of the first microphone group are separated by an appropriate
distance, and the left and right microphones of the second microphone group are separated by
the same distance as those of the first microphone group. The center of the first microphone
group and the center of the second microphone group are aligned, the two microphone groups are
placed at right angles to each other, and a microphone array consisting of the first microphone
group and the second microphone group is thus provided.
[0028]
For the purpose of explanation, the direction of the left and right microphones of the first
microphone group in the microphone array is taken as the X axis, and the direction of the left
and right microphones of the second microphone group, orthogonal to the X axis, as the Y axis.
The regions delimited by these axes are referred to as the first to fourth quadrants.
Meanwhile, two sound cards are installed in the personal computer so that the output signals
from the sound cards can be processed by the central processing unit (hereinafter simply
referred to as the CPU) of the personal computer.
Each sound card converts the audio signals input from the microphone array into audio data that
can be processed by the personal computer, and has a pair of left and right inputs so that each
input signal can be processed by the CPU of the personal computer.
[0029]
The microphones of the microphone array are connected as follows. The left and right pair of the
first microphone group is connected to the left and right inputs of one sound card, which
processes their audio signals as a left and right pair. The left and right pair of the second
microphone group is connected to the left and right inputs of the other sound card, which
likewise processes their audio signals as a left and right pair.
The personal computer comprises an interface capable of receiving the audio data converted by
the sound cards, and further comprises the memory, buffer, CPU, and input/output means such as
a keyboard and a mouse with which a personal computer is normally equipped, so that it can
process the audio data converted by the sound cards.
[0030]
In the personal computer, the CPU, buffer, memory and other means constitute phase difference
detection means, which receives the audio signals of the first microphone group input from one
sound card and detects the phase difference between the left and right audio signals, and
receives the audio signals of the second microphone group input from the other sound card and
detects the phase difference between the left and right audio signals. Furthermore, in the
personal computer, the CPU, buffer, memory and other means constitute direction determining
means, which detects the direction of the sound source from the phase difference of the first
microphone group detected by the phase difference detection means, detects the direction of the
sound source from the phase difference of the second microphone group, and, of the four
directions detected from the two microphone groups, selects the sound source direction that
appears in the same quadrant for both microphone groups.
[0031]
Then, of the sound source direction detected from the first microphone group and the sound
source direction detected from the second microphone group that appear in the same quadrant,
the direction determining means adopts the result that is farther from the axial direction of
its own microphone group. For example, when the direction of the sound source appears in the
second quadrant, the direction is taken from the microphone group that constitutes whichever of
the two axes bounding the second quadrant, 90° and 180°, is farther from the calculated angle.
For a calculated angle of 110°, the 180° axis is farther than the 90° axis, so the direction
determined from the first microphone group is determined as the direction of the audio signal.
For this determination, an average value of a plurality of samplings performed within a
predetermined time is adopted.
[0032]
Embodiments of the present invention will be described below based on the drawings. FIG. 1 is
an explanatory view showing an embodiment of the present invention, FIG. 2 is an explanatory
view showing a state of a conference room, FIG. 3 is an explanatory view showing details of the
embodiment of the present invention, FIG. 4 is an explanatory view showing the same waveform
input as signals with a phase difference, in which (a) represents the waveform at a near
distance and (b) represents the waveform at a long distance, FIG. 5 is an explanatory view for
explaining a method of determining the direction of the sound source, FIG. 6 is an explanatory
view for determining the phase difference, FIG. 7 is an explanatory view for explaining the
phase difference, FIG. 8 shows an equation for calculating the phase difference, and FIG. 9 is
an explanatory view explaining the direction of the phase difference.
[0033]
Reference numeral 1 denotes a video conference system using the sound source direction
determining apparatus according to the present invention. The video conference system 1
includes a microphone array 2, a stereo microphone mixer 3 for adjusting the volume of the
signals from the microphone array, and a personal computer 4 constituting phase difference
detection means and direction determining means.
[0034]
The microphone array 2 comprises a first microphone group 21 and a second microphone group
22. The first microphone group 21 comprises a pair of microphones 21a and 21b, and the
second microphone group 22 comprises a pair of microphones 22a and 22b.
The microphones 21a and 21b of the first microphone group 21 are installed and fixed at a
predetermined distance from each other, and the microphones 22a and 22b of the second
microphone group 22 are likewise installed and fixed at a predetermined distance from each
other. Furthermore, the first microphone group 21 and the second microphone group 22 are
disposed so as to intersect orthogonally at the midpoint between the microphones 21a and 21b
and the midpoint between the microphones 22a and 22b. Therefore, as shown in FIG. 1, the
microphone 21a is installed in the 0° direction, the microphone 21b in the 180° direction, the
microphone 22a in the 90° direction, and the microphone 22b in the 270° direction.
[0035]
The microphone mixer 3 is a stereo microphone mixer with two channels of microphone input. It
can adjust the volume of each input audio signal and output each volume-adjusted audio signal.
The outputs of the mixer 3 are input to the personal computer 4 as the audio signals of the
microphones 21a, 21b, 22a, and 22b. Accordingly, two microphone mixers 3 are provided: one
receives the pair of audio signals output from the first microphone group 21, and the other
receives the pair of audio signals output from the second microphone group 22. Although a
stereo microphone mixer 3 is used in this embodiment, a mixer having four or more channels may
be used instead. In the microphone mixer 3 to which the first microphone group 21 is connected,
the volumes of the microphones 21a and 21b are adjusted to be uniform. That is, the adjustment
is such that, when a sound source at the center of the first microphone group 21 (the
intersection of the first microphone group 21 and the second microphone group 22) emits an
audio signal, the input volumes of the microphone 21a and the microphone 21b are the same.
Similarly, in the microphone mixer 3 to which the second microphone group 22 is connected, the
input volumes of the microphones 22a and 22b are adjusted to be the same. Desirably, the input
volumes of all of the microphones 21a, 21b, 22a, and 22b are adjusted to be the same.
[0036]
The personal computer 4 is a commonly used computer as shown in FIG. 1. Although not
described in detail, it comprises a CPU at its core, a memory and a buffer for arithmetic
processing, a hard disk as a storage device, and input/output devices such as a display, a
keyboard, a mouse, and a printer. As shown in FIG. 3, the personal computer 4 includes a sound
card 41 as a sound device. The sound card 41 is a sound device that takes audio signals on two
channels, left and right, and makes them available for digital processing by the personal
computer 4. Because the signals of the four microphones 21a, 21b, 22a, and 22b used in the
first microphone group 21 and the second microphone group 22 are processed, two sound cards
41a and 41b are provided. Audio signals from the microphones 21a and 21b of the first
microphone group 21 are input to the first sound card 41a via the microphone mixer 3, and
audio signals from the microphones 22a and 22b of the second microphone group 22 are input to
the second sound card 41b via the microphone mixer 3. The analog audio signals are converted
into digital audio signals by the sound card 41, which enables audio processing by the personal
computer 4.
[0037]
The personal computer 4 further includes phase difference detection means 42. The phase
difference detecting means 42 is substantially constituted by a CPU, a memory, a buffer and the
like provided in the personal computer 4. The phase difference detection means 42 is a means
for detecting the phase difference between two input audio signals, and a specific example of the
detection will be described below.
[0038]
FIG. 4 shows the input waveforms when the distance from the sound source S is 50 cm for the
microphone 21a and 100 cm for the microphone 21b. The waveform S1 of FIG. 4(a) represents the
signal obtained when the audio signal of the sound source S is input by the microphone 21a, and
the waveform S2 of FIG. 4(b) represents the signal obtained when it is input by the microphone
21b. The horizontal axis represents time, and the vertical axis represents the amplitude of the
audio signal. The waveform shown in FIG. 4 is the audio signal of a hand clap at the sound
source S. The input sensitivities of the microphones 21a and 21b are adjusted in advance by the
mixer 3 so that they show the same amplitude when the distances from the sound source are the
same. As can be seen from FIG. 4, the amplitude of the signal attenuates as the distance from
the sound source increases. However, the waveforms have substantially similar shapes, and the
peak point S3 of each waveform is at the same position within the waveform. Further, the peak
is detected slightly later by the microphone 21b, which is farther from the sound source S,
than by the microphone 21a. This difference between the arrival times is the phase difference.
[0039]
In obtaining this phase difference, a time difference calculation method based on pattern
matching is adopted. FIG. 6 plots the magnitude (amplitude) of the signal on the vertical axis
and, on the horizontal axis, the sample number of the digital data obtained by sampling at the
sampling frequency (samples per second). The input waveform S1 from the microphone 21a appears
as shown in (a), and the input waveform S2 from the microphone 21b appears as shown in (b). A
portion of the input waveform S1 taken over a predetermined time is shifted in the sample
number direction (time axis direction) and compared repeatedly, and the number of shifted
samples N at which it overlaps the input waveform S2 in a similar shape is obtained. The number
of shifted samples N is converted into a delay time by dividing it by the sampling frequency,
and this delay time is treated as the phase difference. In the example shown in FIG. 6, if the
number of shifted samples N is 22 and the sampling frequency is 44.1 kHz, as used for music
data and the like, the delay time is about 0.000499 seconds.
[0040]
In this embodiment, the delay time is calculated from the difference between the input
waveforms of the microphone 21a and the microphone 21b and treated as the phase difference.
However, the phase difference need not be represented by a delay time. When the signal is
handled as a waveform, the phase difference can also be represented as an angular difference,
so another quantity such as a delay angle may be adopted instead of a time difference. The
calculation of the phase difference described above is performed in the same way for the
microphones 22a and 22b of the second microphone group 22. The phase difference detection
means 42 calculates the phase difference every 0.05 seconds. Therefore, the phase differences
of the first microphone group 21 and the second microphone group 22 are calculated and output
every 0.05 seconds. Of course, the calculation does not have to be performed every 0.05
seconds; the sampling frequency and the sampling time may be set as appropriate according to
the quality of the phase difference required.
[0041]
Incidentally, in the calculation of the time difference by pattern matching as shown in FIG. 6,
the sound waveforms input to the first microphone group 21 and the second microphone group 22
also contain the sound of paper documents being turned during the meeting, echoes caused by the
room, standing waves, and background noise from the conference room and its surroundings.
These are superimposed on the voice signal of the speaker and yield waveforms that are not
suitable for calculating its phase difference. By detecting the phase difference as described
above from a waveform from which these unnecessary components have been removed in advance, the
quality of the phase difference can be further improved. Specifically, the inventor analyzed
the frequency characteristics of the voice waveform obtained by the microphone 21a and the like
in the conference room by fast Fourier transform or the like, and found that a frequency band
of 90 Hz to 3500 Hz can be treated as the frequency band of the human voice. Within this band,
power supply noise was found in the low frequency band, and the noise of paper being turned
during the meeting was found to be a component of the high frequency band. Since a good phase
difference can be detected when the frequency band used to obtain the phase difference is
approximately 300 Hz to 2500 Hz, frequency components outside this band are removed. However,
these frequency components need not always be removed: only the low frequency band or only the
high frequency band may be removed as appropriate, or the processing may be omitted.
[0042]
Furthermore, the personal computer 4 is provided with direction determining means 43. The
direction determining means 43 is substantially constituted by the CPU, memory, buffer and the
like provided in the personal computer 4. The direction determining means 43 receives the phase
differences of the first microphone group 21 and the second microphone group 22 detected by
the phase difference detection means 42, and determines the direction of the sound source
relative to the microphone array 2 from the input phase differences. The direction
determination performed by the direction determining means 43 is based on the principle shown
in FIG. 7. FIG. 7 shows the principle of determining the direction of the audio signal input to
the microphones 21a and 21b of the first microphone group 21, where the distance between the
microphones 21a and 21b is d and the sound wave from the sound source is S0.
[0043]
The sound wave S0 generated from the sound source reaches the microphone 21a and the
microphone 21b at substantially the same angle when the sound source is sufficiently far from
the first microphone group 21 compared with the distance between the microphone 21a and the
microphone 21b. The arrival angle at this time is denoted θs, measured from the direction
perpendicular to the direction of the first microphone group 21. When a line segment is drawn
from each of the microphones 21a and 21b perpendicular to S0, the angle between the line
segment d in the direction of the first microphone group 21 and the line segment drawn
perpendicularly from the microphone 21a to S0 is also θs. The distance ξ between the lines
drawn perpendicular to S0 from the microphones 21a and 21b is the extra distance the sound
travels, after reaching the microphone 21a, to reach the microphone 21b, that is, the delay
distance.
[0044]
The delay distance ξ is given by “ξ = d sin θs”. Although the sound velocity v varies
slightly with temperature, it is constant under the same conditions, so the delay time τ is
given by “τ = ξ / v” using the delay distance ξ. The direction θs of the sound source can
therefore be obtained from the delay time τ or the delay distance ξ (the distance d and the
sound velocity v are known quantities) by the equation shown in the drawings; combining the two
relations above gives “θs = arcsin(v τ / d)”. The direction determining means 43 thus detects
the direction θs from the delay time, which is the phase difference of the first microphone
group 21 input from the phase difference detecting means 42. As shown in FIG. 5, of the areas
M1 to M4 divided by the first microphone group 21 and the second microphone group 22, the
direction of the sound source detected in this way lies in either the area M1 or the area M2,
and is detected as the direction θ1. This is because, even if the sound source S exists in the
area M2, only the angle can be determined from the equation.
[0045]
Further, the direction determining means 43 performs the same direction detection for the
second microphone group 22. The direction of the sound source detected from the phase
difference of the second microphone group 22 is detected as the direction θ2 in the same way as
for the first microphone group 21, and lies in either the area M2 or the area M3 in FIG. 5.
Therefore, the direction determining means 43 determines that the sound source is present in
the area indicated by both results, which in the example shown in FIG. 5 is the area M2,
according to whether θ1 detected from the phase difference of the first microphone group 21 is
0° to 90° or 90° to 180°, and whether θ2 detected from the phase difference of the second
microphone group 22 is 90° to 180° or 180° to 270°.
[0046]
Furthermore, the direction determining means 43 compares the angle that the direction θ1
detected from the first microphone group 21 forms with the direction of the first microphone
group 21 (the 0° or 180° direction) with the angle that the direction θ2 detected from the
second microphone group 22 forms with the direction of the second microphone group 22 (the 90°
or 270° direction), and determines the direction with the larger angle as the direction θs of
the sound source. The direction determining means 43 adopts one of θ1 and θ2 in this way
because the closer the sound source is to the installation direction of a pair of microphones,
the smaller the change in phase difference for a given change in angle (change in the direction
of the sound source). In other words, the larger the phase difference between the audio signals
from a pair of microphones, the larger the error in determining the direction. Taking the
direction determined by the microphone group with the smaller phase difference as the sound
source direction therefore reduces the error in the detected direction θs of the sound source.
[0047]
When the direction determining means 43 determines the direction θs of the sound source, both
of the directions θ1 and θ2 calculated from the phase differences of the microphone groups 21
and 22 are relatively accurate when the angle from each of the microphone groups 21 and 22 is
about 45°. Therefore, in another embodiment, a threshold is set in advance, for example 30°
around 45°. If both detected directions θ1 and θ2 are farther from the direction of their own
microphone group than this threshold, the average of θ1 and θ2 is used. If either one is closer
to the direction of its microphone group than this threshold (in which case the other is
necessarily farther than the threshold), the other direction is used. This improves the
accuracy of the direction θs of the sound source.
[0048]
Further, the personal computer 4 is provided with a sound source direction output device 44,
which converts the sound source direction θs determined by the direction determining means 43
into a numerical value for the video camera control device 5 so that the video camera control
device 5 can control the video camera 6.
The sound source direction output device 44 converts the direction θs of the sound source
into a direction instruction signal for control by the video camera control device 5 and
outputs it. The personal computer 4 further includes an audio output device 45. The audio
output device 45 amplifies and outputs the audio signal input from the sound card 41, and is
connected to the speaker system 8 through an interface (not shown) provided in the personal
computer 4. The speaker system 8 outputs the voice of the speaker.
[0049]
The video camera control device 5 is connected to the sound source direction output device 44
via an interface (not shown) provided in the personal computer 4. It can capture images with a
video camera and can change the imaging direction based on the input direction θs of the sound
source. Accordingly, the video camera control device 5 images the direction θs of the sound
source and can thereby capture video of the speaker. Furthermore, the video camera control
device 5 is connected so as to output the captured video signal to the video communication PC
6. Since a device that rotates through the full 360° to capture images is expensive, the video
camera control device 5 may instead consist of a plurality of video cameras and a rotation
control device capable of rotating through a predetermined angle. Alternatively, the imaging
range of the video conference room may be allocated among the cameras in advance, and the
camera whose image is used may be switched according to the direction θs of the sound source
output from the sound source direction output device 44. Further, the sound source direction
output device 44 may be configured to output a signal for switching the camera according to the
direction θs of the sound source.
[0050]
The video communication PC 6 is configured to be connectable to the Internet, can communicate
over the Internet with a video conference system located elsewhere, and can transmit the video
captured by the video camera control device 5 to that video conference system. At the same
time, it can receive from the video conference system at the other location a video signal of
the video conference room there. Further, the video communication PC 6 is connected to the
display devices 7 installed for the participants in the video conference system 1, and can
output signals so that each display device 7 switches as appropriate between the video of the
conference at the other location and the video captured by the video camera control device 5.
In the above embodiment, the microphone array 2 consists of two pairs of microphones, the first
microphone group 21 and the second microphone group 22, but the number of microphone groups
need not be two. The number may be increased for the purpose of improving accuracy. When there
are three microphone pairs (six microphones) and three sound cards, the centers of the
microphone pairs are aligned and the pairs are arranged at angles of 60 degrees to one another.
Of the phase differences of the audio input signals obtained by the three microphone pairs, the
phase difference of the pair with the smallest value is adopted. Also, in the above embodiment,
processing focuses on the peak value of the audio signal input in each 0.05 seconds, but this
division time is not limited to 0.05 seconds; depending on the performance of the device and
the like, a shorter or longer division time may be used.
[0051]
The present invention can be used to capture images of speakers in a teleconferencing system
connecting remote locations; by determining the direction of the speaker, the direction of an
imaging device such as a video camera can be controlled. It can also be used in systems that
need to identify the direction of a sound source over the full 360° range, for example when
there are many sound source targets, such as customers, users of an area, or animals, and a
plurality of video camera devices capture images while following each sound source, or when the
rotation range of a video camera device is limited and imaging of the range it cannot cover is
switched to another video camera device that can capture it.
[0052]
Explanatory drawing showing the embodiment of this invention
Explanatory drawing showing the state of the conference room
Explanatory drawing showing the details of the embodiment of this invention
Explanatory drawing showing the same waveform input as signals with a phase difference; (a)
represents the waveform at a short distance and (b) the waveform at a long distance
Explanatory drawing explaining the method of determining the direction of the sound source
Explanatory drawing for determining the phase difference
Explanatory drawing for describing the phase difference
Equation for calculating the phase difference
Explanatory drawing explaining the direction of the phase difference
Explanatory drawing representing the configuration of conventional example 1
Explanatory drawing representing the configuration of prior art 2
Explanatory drawing representing the state in which the waveform of prior art 2 is delayed
Explanatory drawing showing a non-existing state
Explanatory drawing showing the other structure of prior art example 2
Explanation of symbols
[0053]
1 Video conference system
2 Microphone array
21 First microphone group
22 Second microphone group
3 Microphone mixer
4 Personal computer
41 Sound card
41a First sound card
41b Second sound card
42 Phase difference detection means
43 Direction determination means
44 Sound source direction output device
45 Audio output device
5 Video camera control device
6 PC for image communication
7 Display device
```
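The procedure described in paragraphs [0039] to [0047] can be sketched roughly as follows. This is only an illustrative sketch under stated assumptions, not the implementation of the patent: the function names, the dot-product similarity score used for "pattern matching", the sign convention of the delays, the sound velocity of 343 m/s, and the reading of the threshold as 30° measured from each pair's own axis are all assumptions introduced here. The angle is expressed as an azimuth φ in the 0° to 360° frame of FIG. 1, so the relation “ξ = d sin θs” appears as cos φ for the X-axis pair and sin φ for the Y-axis pair.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value, the text only says v is known


def delay_in_samples(s1, s2, max_shift):
    """Shift s1 along the sample axis and return the shift N at which it best
    overlaps s2. The dot-product score is an assumption; the text only says
    'pattern matching'. Positive N means s2 lags s1."""
    best_shift, best_score = 0, float("-inf")
    for shift in range(-max_shift, max_shift + 1):
        score = sum(
            a * s2[i + shift]
            for i, a in enumerate(s1)
            if 0 <= i + shift < len(s2)
        )
        if score > best_score:
            best_shift, best_score = shift, score
    return best_shift


def _clamp(x):
    """Keep a ratio inside [-1, 1] so asin/acos never fail on rounding noise."""
    return max(-1.0, min(1.0, x))


def _signed_diff(a, b):
    """Smallest signed difference b - a in degrees, in [-180, 180)."""
    return (b - a + 180.0) % 360.0 - 180.0


def resolve_direction(tau_x, tau_y, d, v=SPEED_OF_SOUND, threshold_deg=30.0):
    """Combine the delays of the two orthogonal pairs into one azimuth (deg).

    Assumed sign convention: tau_x > 0 when the sound reaches 21a (0 deg)
    before 21b (180 deg); tau_y > 0 when it reaches 22a (90 deg) before 22b.
    """
    cos_phi = _clamp(v * tau_x / d)  # delay distance / d for the X-axis pair
    sin_phi = _clamp(v * tau_y / d)  # delay distance / d for the Y-axis pair
    a = math.degrees(math.acos(cos_phi))
    b = math.degrees(math.asin(sin_phi))
    cand_x = (a % 360.0, (360.0 - a) % 360.0)  # mirror images about the X axis
    cand_y = (b % 360.0, (180.0 - b) % 360.0)  # mirror images about the Y axis
    # Of the four candidates, keep the two (one per pair) that agree.
    phi_x, phi_y = min(
        ((x, y) for x in cand_x for y in cand_y),
        key=lambda p: abs(_signed_diff(p[0], p[1])),
    )
    # Angle between the source direction and each pair's own axis.
    off_x = math.degrees(math.acos(abs(cos_phi)))
    off_y = math.degrees(math.acos(abs(sin_phi)))
    if off_x >= threshold_deg and off_y >= threshold_deg:
        # Both results are far enough from their axes: average them.
        return (phi_x + 0.5 * _signed_diff(phi_x, phi_y)) % 360.0
    # Otherwise use the pair whose axis is farther from the source direction.
    return phi_x if off_x > off_y else phi_y
```

A delay in seconds is obtained as `delay_in_samples(...) / fs`; with N = 22 and fs = 44100 this gives the value of about 0.000499 seconds quoted in [0039]. For a source at 110°, the Y-axis pair lies only 20° from the source direction, so the sketch returns the result of the X-axis pair (the first microphone group), in line with the example in [0031].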