JP2011091851
Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2011091851
To provide a robot, and a sound collecting device, capable of collecting sound with a high S/N ratio. A sound collection device according to one aspect of the present invention includes a microphone unit having: a first channel CH1 formed of a plurality of microphones arranged on a first virtual straight line L1; a second channel CH2 formed of a plurality of microphones arranged on a second virtual straight line L2 inclined approximately 60 degrees with respect to L1; a third channel CH3 formed of a plurality of microphones arranged on a third virtual straight line L3 inclined approximately 60 degrees with respect to each of L1 and L2; and means for selectively using the first to third channels according to the direction of the sound source of the sound acquired by the microphone unit. The first to third virtual straight lines L1 to L3 extend in the horizontal direction and do not intersect at one point. [Selected figure] Figure 4
Robot and sound collection device
[0001]
The present invention relates to a robot and a sound collecting device, and more particularly to a
robot using a plurality of microphones and a sound collecting device.
[0002]
There is disclosed a technology using a plurality of microphones (hereinafter also referred to as a
microphone or a microphone element) to detect an audio signal emitted by a user (Patent
Documents 1 to 8).
04-05-2019
1
In the patent documents 1, 4 and 8, the sound source direction is specified by a plurality of
microphones. In Patent Document 2, at least three microphones are arranged on each spatial axis
(X, Y, Z). Further, in Patent Document 3, a directional microphone is attached to the head of the
robot, and an array microphone is attached to the body. Further, in the microphone system of
Patent Document 5, a sensor for determining the direction of the speaker is provided. Then,
among the plurality of microphones, only the microphone closest to the sound source is turned
on, and the other microphones are turned off. In Patent Document 6, voices are collected by
microphones of a plurality of channels to determine an utterance period.
[0003]
JP-A 2006-181651; JP-A 12-134688; JP-A 2007-221300; JP-A 2002-286084; JP-A 2006-245725; International Publication WO 2004/071130; JP-A 2004-274763; Japanese Patent Publication No. 2006-245725
[0004]
In the microphone array, sounds collected by a plurality of microphones can be added.
This enables clearer sound recording than a single microphone. Specifically, addition processing
is performed after DSP delay control is performed on audio data collected by the microphone. As
a result, the S / N ratio can be improved, so that clearer and more audible sound can be realized.
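The delay-and-add processing described above can be sketched as follows. This is a minimal Python illustration only: the function name `delay_and_sum` and the integer-sample delays are assumptions for the sketch, whereas the patent performs the delay control on a DSP, typically with finer-grained delays.

```python
def delay_and_sum(signals, delays):
    """Shift each channel by its integer sample delay, then add the aligned samples.

    signals: list of per-microphone sample lists; delays: per-channel delays in samples.
    """
    length = min(len(s) - d for s, d in zip(signals, delays))
    return [sum(s[d + i] for s, d in zip(signals, delays)) for i in range(length)]

# A pulse arriving at three microphones with successive one-sample delays:
m1 = [0, 0, 1, 0, 0, 0]
m2 = [0, 0, 0, 1, 0, 0]
m3 = [0, 0, 0, 0, 1, 0]
out = delay_and_sum([m1, m2, m3], [2, 3, 4])
# After alignment the pulse adds coherently: out[0] == 3
```

After alignment the signal components add coherently (amplitude x3, power x9), while uncorrelated noise adds only in power, which is the origin of the S/N improvement.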
[0005]
The sound collecting ability can be enhanced by forming a sound collecting beam from the
microphone to the speaker in the horizontal direction. At the same time, it eliminates excess
ambient noise, making it less likely to catch looping noises that cause echo.
[0006]
However, when the direction of the sound source is not determined, it is necessary to provide a
large number of microphones in the above microphone array. There is a problem that the
microphone array becomes large. In particular, in the case of a humanoid robot, the mounting
position of the microphone array is limited. For example, when the microphone array is mounted
on the front side of the robot, as shown in FIG. 15, the microphone pointing area is limited to the
front of the robot. In addition, the microphone position is not level with the lip height of the speaker 52. For this reason, noise in the direction toward which the sound collection beam is steered is amplified, while the target sound is collected relatively weakly. That is, performance drops when the microphone is not level with the speaker's lips. Thus, the S/N ratio (signal-to-noise ratio) is lowered by the relationship between the direction of the sound source and the position of the microphone array.
[0007]
The present invention has been made in view of the above problems, and an object of the present
invention is to provide a robot capable of collecting sound with a high S / N ratio, and a sound
collecting device.
[0008]
A first aspect of the present invention is a sound collection device comprising a microphone unit having a plurality of microphones, wherein the microphone unit includes: a first channel composed of a plurality of microphones arranged on a first virtual straight line; a second channel composed of a plurality of microphones arranged on a second virtual straight line inclined approximately 60 degrees with respect to the first virtual straight line; a third channel composed of a plurality of microphones arranged on a third virtual straight line inclined approximately 60 degrees with respect to each of the first and second virtual straight lines; and means for selectively using the first to third channels according to the direction of the sound source of the sound acquired by the microphone unit. The first to third virtual straight lines extend in the horizontal direction and do not intersect at one point.
Thus, sound can be collected at a high S / N ratio.
[0009]
A sound collection apparatus according to a second aspect of the present invention is the above-described sound collection apparatus, wherein at least one microphone is common to the first channel and the second channel. This can reduce the number of microphones.
[0010]
A sound pickup apparatus according to a third aspect of the present invention is the above sound pickup apparatus, further comprising means for estimating the direction of the sound source and selecting, from the first to third channels, the channel whose direction is closest to the direction of the sound source, and means for delaying and adding the signals detected by the plurality of microphones constituting the selected channel. Thus, sound can be collected at a high S/N ratio.
[0011]
A robot according to a fourth aspect of the present invention has the above-described sound
collecting device on the head.
[0012]
According to the present invention, it is possible to provide a robot capable of collecting sound
with a high S / N ratio, and a sound collecting device.
[0013]
FIG. 1 is a front view schematically showing a robot according to an embodiment of the present invention. FIG. 2 is a view showing the head of the robot according to the embodiment. FIG. 3 is a view showing the structure of the sound collection device provided in the robot. FIG. 4 is a top view showing the structure of the microphone unit provided in the sound collection device. FIG. 5 is a diagram showing the configuration of channel 1 in the microphone unit. FIG. 6 is a diagram showing the configuration of channel 2 in the microphone unit. FIG. 7 is a diagram showing the configuration of channel 3 in the microphone unit. FIG. 8 is a view showing the polar pattern of channel 1. FIG. 9 is a view showing the polar pattern of channel 2. FIG. 10 is a view showing the polar pattern of channel 3. FIG. 11 is a graph showing the microphone directivity characteristics according to the sound source direction. FIG. 12 is a circuit block diagram showing a configuration for performing analog processing on an audio signal. FIG. 13 is a block diagram showing a configuration for performing speech recognition processing on an audio signal. FIG. 14 is a graph showing a comparison of performance by microphone type and mounting position. FIG. 15 is a view showing the microphone directivity area of a microphone array mounted on a robot.
[0014]
Hereinafter, an embodiment of a robot according to the present invention will be described in
detail based on the drawings. However, the present invention is not limited to the following
embodiments. Further, in order to clarify the explanation, the following description and the
drawings are simplified as appropriate.
[0015]
An overall configuration of a robot 10 according to an embodiment of the present invention will be described with reference to FIG. 1. FIG. 1 is a front view schematically showing the entire configuration of the robot, and shows how the robot 10 picks up a voice uttered by a speaker. The robot 10 shown in FIG. 1 is a humanoid robot, and includes a body 11 and a head 12 provided on the body 11. The head 12 may be rotatable relative to the body 11. The body 11 incorporates a motor and a battery for driving the joints of the arms and legs. The robot 10 is not limited to a complete humanoid robot; for example, it may move on wheels instead of legs, so that only a part of it is humanoid. Further, a proximity sensor 41 is provided in the body 11. The proximity sensor 41 determines whether the speaker 52 is in proximity. That is, the proximity sensor 41 outputs a sensor signal indicating that the speaker 52 is within a predetermined distance of the robot 10.
[0016]
The head 12 has a built-in sound pickup device 13. The sound pickup device 13 is provided
above the head 12 as shown in FIG. 2. That is, the sound collection device 13 is attached to the top
of the head. The sound collection device 13 is provided with a plurality of microphone elements
as described later. Each microphone element is facing upward. Thereby, a microphone directivity
area 51 as shown in FIG. 1 can be obtained. The microphone directivity area 51 is disposed on
the head 12. Thus, the lip height of the speaker 52 can be adjusted to the microphone directivity
area 51, and the voice generated by the speaker 52 can be efficiently collected.
[0017]
Next, the configuration of the sound collection device 13 will be described with reference to FIG.
FIG. 3 is a view schematically showing the entire configuration of the sound collection device 13.
As shown in FIG. 3, the sound collection device 13 is provided with a plurality of microphone elements 20, which are fixed to the
microphone substrate 23. The microphone substrate 23 is disposed horizontally. The
microphone substrate 23 and the plurality of microphone elements 20 constitute a microphone
unit 24.
[0018]
A cover 22 is provided to cover the plurality of microphone elements 20. The cover 22 is
provided above the microphone unit 24. The cover 22 is attached to the top of the head of the
robot 10. Thus, the cover 22 constitutes the top of the head. Covering the microphone
element 20 and the microphone substrate 23 with the cover 22 prevents the microphone
element 20 and the microphone substrate 23 from being exposed. The microphone unit 24 is
thereby housed in the head 12. In addition, an EMC shield material 21 is provided between the
cover 22 and the microphone element 20. The EMC shield material 21 is electrically grounded in
order to reduce EMC noise generated in the microphone element 20 and the microphone
substrate 23 and the like. Therefore, the S / N ratio can be improved. In addition, since the
microphone element 20 is directed upward, it is directed in the direction of the cover 22.
[0019]
Each microphone element 20 is connected to the amplifier substrate 25 via the microphone
substrate 23. That is, the audio signal collected by the microphone element 20 is amplified by the
amplifier provided on the amplifier substrate 25. Here, the amplifier board 25 is provided with
the same number of amplifiers as the microphone elements 20. Then, the audio signal amplified
by the amplifier is converted from an analog signal to a digital signal by the A / D conversion
substrate 28. Then, the audio signal converted into the digital signal is input to a PC (Personal
Computer) 40. The PC 40 performs speech recognition processing on the input speech signal.
Note that these processes will be described later. For example, the amplifier board 25 and the
microphone unit 24 are housed in the head 12. The PC 40 may be housed in the body portion
11. The amplifier board 25, the A / D conversion board 28, the PC 40, and the like each become
a processing unit that performs processing on the audio signal collected by the microphone unit.
In FIG. 3, a beam width calculation substrate to be described later is omitted.
[0020]
Next, the arrangement of the microphone elements 20 in the microphone unit 24 will be described with reference to FIG. 4. FIG. 4 is a top view showing the configuration of the microphone unit 24. In FIG. 4, the six microphone elements 20 are identified as the
microphone elements M1 to M6. That is, the microphone element M1, the microphone element
M2, the microphone element M3, the microphone element M4, the microphone element M5, and
the microphone element M6 are disposed on the microphone substrate 23. The microphone
elements M <b> 1 to M <b> 6 face upward and are fixed to the microphone substrate 23. That is,
the microphone elements M1 to M6 are respectively directed to the top of the head of the head.
Six microphone elements M1 to M6 are disposed on the same plane. That is, the six microphone
elements M1 to M6 are arranged horizontally.
[0021]
Here, the six microphone elements M1 to M6 are arranged such that the centers thereof are
arranged on three sides of an equilateral triangle. In FIG. 4, each side of the regular triangle is
shown as a virtual straight line L1, a virtual straight line L2, and a virtual straight line L3. That is,
virtual straight lines L1 to L3 constitute a triangle. Each of the virtual straight line L1, the virtual
straight line L2, and the virtual straight line L3 is in the horizontal plane. The center points of the
microphone elements M1 to M3 are disposed on the virtual straight line L1. In addition, the
microphone elements M1 to M3 are arranged at equal intervals. The microphone element M2 is
disposed between the microphone element M1 and the microphone element M3. Therefore, the
center point of the microphone element M1 and the center point of the microphone element M3
are respectively disposed at the vertices of an equilateral triangle. Thus, the microphone
elements M1 to M3 are arranged in a line on the virtual straight line L1. The microphone
elements M1 to M3 on the virtual straight line L1 constitute CH (channel) 1. The
direction of the virtual straight line L1 is the front-rear direction of the robot 10.
[0022]
The virtual straight line L2 is inclined by 60 ° from the virtual straight line L1. Then, the virtual
straight line L1 and the virtual straight line L2 intersect at the center point of the microphone
element M1. Microphone elements M1, M4, and M5 are arranged on the virtual straight line L2.
That is, the center points of the microphone elements M1, M4, and M5 are disposed on the
imaginary straight line L2. The microphone elements M1, M4 and M5 are arranged at equal
intervals. The microphone element M4 is disposed between the microphone element M1 and the
microphone element M5. Therefore, the center point of the microphone element M5 is disposed
at the vertex of the regular triangle. Thus, the microphone elements M1, M4, and M5 are
arranged in a line on the virtual straight line L2. The microphone elements M1, M4 and M5 on the virtual straight line L2 constitute CH (channel) 2.
[0023]
The virtual straight line L3 is inclined 60 ° from the virtual straight line L1 and the virtual
straight line L2. The virtual straight line L1 and the virtual straight line L3 intersect at the center
point of the microphone element M3. The virtual straight line L2 and the virtual straight line L3
intersect at the center point of the microphone element M5. Microphone elements M3, M6 and
M5 are arranged on the virtual straight line L3. That is, the center points of the microphone
elements M3, M6, and M5 are disposed on the imaginary straight line L3. The microphone
elements M3, M6 and M5 are arranged at equal intervals. The microphone element M6 is
disposed between the microphone element M3 and the microphone element M5. Therefore, the
center points of the microphone elements M3 and M5 are disposed at the vertices of an
equilateral triangle. Thus, the microphone elements M3, M6 and M5 are arranged in a line on the
virtual straight line L3. The microphone elements M3, M6 and M5 on the virtual straight line L3
constitute a CH (channel) 3.
[0024]
Each channel is composed of three microphone elements. The virtual straight line L1, the virtual
straight line L2, and the virtual straight line L3 are on the outermost circumferences of the
plurality of microphone elements M1 to M6. The virtual straight line L1, the virtual straight line
L2, and the virtual straight line L3 constitute an equilateral triangle which is an outer shape of
the microphone unit. Therefore, when the microphone elements M1 to M6 at the outermost
periphery are connected, they form an equilateral triangle. Thus, one microphone element is disposed in the first row (the row parallel to the virtual straight line L1 passing through the center of M5), two in the second row (the row parallel to L1 passing through the centers of M4 and M6), and three in the third row (on the virtual straight line L1 itself).
[0025]
The virtual straight line L1 extends in the front-rear direction of the robot 10. Here, the
microphone element M1 is disposed on the front side of the robot, and the microphone element
M3 is disposed on the rear side of the robot. Then, according to the voice arrival direction, as
shown in FIGS. 5 to 7, the channel to be used is selected. That is, depending on the direction of
the sound source, the optimal channel is used. 5 to 7 are top views showing the arrangement of
the microphone unit 24. The upper side in the drawing is the front side of the robot 10, and the
lower side is the rear side.
[0026]
For example, audio from the front or back is detected on channel CH1. As shown by the arrows in
FIG. 5, when voice arrives from behind, the microphone elements M1 to M3 of CH1 are used.
That is, after performing the DSP delay control on the sound data collected by using the
microphone elements M1 to M3, the addition processing is performed. If the direction of the
sound source is parallel to the virtual straight line L1, CH1 is used.
[0027]
On the other hand, sound from an oblique direction is detected using CH2 or CH3. For example,
as shown by the arrow in FIG. 6, the sound from the rear left is detected at CH2. In this case,
microphone elements M1, M4 and M5 of CH2 are used. That is, the DSP delay control is
performed on the sound data collected using the microphone elements M1, M4, and M5, and
then the addition processing is performed. When the direction of the sound source is parallel to
the virtual straight line L2, CH2 is used. Also, as shown by the arrows in FIG. 7, the sound from
the rear right is detected at CH3. In this case, microphone elements M3, M6 and M5 of CH3 are
used. After DSP delay control is performed on audio data collected using the microphone
elements M3, M6, and M5, addition processing is performed. When the direction of the sound
source is parallel to the virtual straight line L3, CH3 is used.
[0028]
The polar patterns of the channels are shown in FIGS. 8 to 10: FIG. 8 shows the polar pattern of CH1, FIG. 9 that of CH2, and FIG. 10 that of CH3. Thus, each channel has a different polar pattern; that is, the directional region differs from channel to channel. Sound is picked up using one of the channels. If the orientation of the sound source is not parallel to any channel direction, the channel with the closest orientation is used. In this way, sound can be picked up with a high S/N ratio regardless of the direction from which the voice arrives.
[0029]
FIG. 11 is a graph showing the relationship between each channel and the direction of the sound
source. The left part shows the case where the sound source direction is 0° (CH1), the center the case of 60° to the left (CH2), and the right the case of 60° to the right (CH3). In each case, three measurements are shown.
That is, for each case, the measurement results when the microphone pointing direction is
changed are shown. In FIG. 11, the vertical axis indicates the S / N ratio. As shown in FIG. 11,
when the direction of the sound source is 0 °, the S / N ratio is high when the microphone
pointing direction is the front. When the direction of the sound source is 60 ° to the left, the S /
N ratio is high when the microphone directivity direction is 60 ° to the left. When the sound
source direction is 60 ° to the right, the S / N ratio is high when the microphone directivity
direction is 60 ° to the right. Therefore, depending on the direction of the sound source, it is
possible to pick up sound with a high S / N ratio in any direction by using different channels.
That is, a virtual straight line in the direction closest to the direction of the sound source is
estimated. Then, the microphone element 20 on a virtual straight line close to the direction of the
sound source is used. Here, audio processing is performed on the audio signal from the
microphone element 20 on the selected virtual straight line. That is, the channel to be used is
estimated based on the direction of the sound source.
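Choosing the channel whose line orientation is nearest to an estimated source azimuth could be sketched as below. The axis angles of 0° and ±60° follow the text; the function name and the sign convention for degrees are assumptions for illustration.

```python
def select_channel(source_deg):
    """Pick the channel whose virtual straight line is closest in orientation
    to the estimated source direction (degrees; 0 = front-rear axis L1)."""
    axes = {"CH1": 0.0, "CH2": 60.0, "CH3": -60.0}  # assumed sign convention

    def line_distance(a, b):
        # A line's orientation has 180-degree periodicity, so compare mod 180.
        diff = abs(a - b) % 180.0
        return min(diff, 180.0 - diff)

    return min(axes, key=lambda ch: line_distance(source_deg, axes[ch]))

# Sound from directly behind (180 degrees) still maps to the front-rear channel CH1,
# consistent with CH1 handling voice from the front or the back.
```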
[0030]
Next, the speech processing method according to the present embodiment will be described
using FIG. 12 and FIG. FIG. 12 is a circuit block diagram showing a configuration for performing
analog processing on audio data. FIG. 13 is a diagram showing the configuration of the PC 40 for
performing speech recognition processing.
[0031]
As shown in FIG. 12, the amplifier substrate 25 is provided with an amplifier 31 and a buffer 32.
The amplifier 31 amplifies the audio signal from each of the microphone elements 20. The buffer
32 buffers the audio signal amplified by the amplifier 31.
[0032]
The buffered audio signal is input to the beam width calculation board 26. The beam width
calculation board 26 computes the beam width of each channel to select a channel to be used.
Then, delay processing and addition processing are performed on the audio signals detected by
the three microphone elements 20 included in the channel. Therefore, the directivity calculation
unit 34 and the filter 35 are provided on the beam width calculation board 26. Here, the
directivity calculation unit 34 and the filter 35 are shown as one circuit.
[0033]
The filter 35 cuts an input of 300 Hz or less as a measure against power supply noise. As the
filter 35, a low pass filter can be used. Of course, a filter other than a low pass filter may be used
as the filter 35. For example, a high pass filter or a band pass filter may be used as the filter 35
to cut the input of a predetermined frequency band. The filter 35 may perform the filtering
process on the audio signal before the addition process, or may perform the filtering process on
the audio signal after the addition process.
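One simple way to attenuate content at or below 300 Hz is a first-order high-pass section. This is a sketch under assumed parameters (the cutoff placement, sample rate, and one-pole structure are illustrative; the patent leaves the exact filter type open).

```python
import math

def one_pole_high_pass(samples, fs=16000.0, fc=300.0):
    """First-order RC-style high-pass: attenuates content below fc
    (e.g. power-supply hum), passing the speech band above it."""
    rc = 1.0 / (2.0 * math.pi * fc)
    dt = 1.0 / fs
    alpha = rc / (rc + dt)
    out = [samples[0]]
    for i in range(1, len(samples)):
        out.append(alpha * (out[-1] + samples[i] - samples[i - 1]))
    return out

# A constant (0 Hz) input decays toward zero, while fast changes pass through.
dc = one_pole_high_pass([1.0] * 200)
```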
[0034]
The directivity calculating unit 34 compares the audio signals stored in the buffer 32 to estimate
the direction of the sound source. Then, a channel close to the direction of the sound source is
selected as the used channel. Specifically, the directivity calculation unit 34 extracts a channel
with a large phase delay among CH1 to CH3. That is, the direction in which the time delay of the
audio signal detected by the three microphone elements 20 included in each channel is large is
estimated. For example, the delay time of the audio signal stored in the buffer 32 is calculated.
Here, the delay time is calculated for every three channels. The arrangement direction of the
microphone elements 20 provided in the channel where the delay time becomes large is
estimated as the sound source direction. In other words, the virtual straight line closest to the
direction of the sound source is selected from the virtual straight lines L1 to L3. As described
above, it is estimated whether the direction of the sound source is closer to front (CH1), left 60
° (CH2), or right 60 ° (CH3). Select the channel with the closest orientation.
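The delay comparison can be illustrated with a brute-force lag search over buffered samples. This is a stand-in for the board's phase-delay extraction, not its actual algorithm; the function name and window sizes are assumptions.

```python
def estimate_delay(ref, other, max_lag):
    """Return the integer lag (in samples) that maximizes the correlation of
    `other` against `ref`; a positive result means `other` arrives later."""
    def score(lag):
        return sum(ref[i] * other[i + lag]
                   for i in range(len(ref)) if 0 <= i + lag < len(other))
    return max(range(-max_lag, max_lag + 1), key=score)

# A pulse reaching the second microphone two samples after the first:
a = [0, 0, 1, 2, 1, 0, 0, 0]
b = [0, 0, 0, 0, 1, 2, 1, 0]
# estimate_delay(a, b, 3) -> 2
```

Comparing such delays across the three channels indicates which virtual straight line is closest to the source direction.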
[0035]
Then, the directivity calculation unit 34 performs DSP delay control on the audio signal of the
selected channel. Thereby, the audio signals from the three microphone elements 20 included in
the used channel coincide in time and overlap. The delay time has a value corresponding to the
distance between the adjacent microphone elements 20. Furthermore, the directivity calculation
unit 34 performs addition processing on the audio signal delayed by the DSP delay control.
Thereby, human voice can be emphasized in the selected channel. It is considered that noise is
generated randomly in the audio signal of each microphone element 20. Therefore, the noise is not emphasized when the three time-shifted audio signals are delayed and added, whereas the speech component is emphasized by adding the three signals. That is, the audio signals are delayed and then added so that the speech of the speaker 52 overlaps. It is
possible to emphasize the speech signal of the speaker's voice against the noise generated by the
noise source. By combining the direction estimation of the speaker and the superdirective
microphone, it is possible to improve the separation performance of the target voice and the
ambient noise.
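The claim that aligned speech adds coherently while random noise does not can be checked numerically. The seed and sample count below are arbitrary choices for the demonstration.

```python
import random

random.seed(0)
n = 5000
# Three independent unit-variance noise channels: summing them grows power only
# about 3x (incoherent), while an aligned signal's amplitude grows 3x, i.e. 9x
# in power (coherent) -- roughly a 3x S/N improvement for three microphones.
noise = [[random.gauss(0.0, 1.0) for _ in range(n)] for _ in range(3)]
summed_noise = [a + b + c for a, b, c in zip(*noise)]

def power(x):
    return sum(v * v for v in x) / len(x)

signal_power_gain = 3.0 ** 2            # coherent sum: (3 * amplitude)^2
noise_power_gain = power(summed_noise)  # incoherent sum: close to 3
```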
[0036]
The beam width calculation substrate 26 is provided with a gain switching unit 37 and a
switching switch 38. Here, the gain switching unit 37 and the switching switch 38 are configured
as one circuit. The gain switching unit 37 switches the gain in accordance with the level of the
added audio signal. That is, an appropriate gain is selected to raise the voice signal after addition
to a predetermined level. The switch SW 38 switches the output signal so that the channel signal
selected by the directivity calculation unit 34 is output. As a result, the audio signal of the
selected channel is output from the beam width calculation board 26. That is, by means of the switching switch 38, the beam width calculation board 26 outputs the added audio signal of one channel.
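A gain-switching step like the one described might be sketched as choosing the largest step that keeps the amplified level at or below a target. The discrete gain steps and target level here are hypothetical, not values from the patent.

```python
def select_gain(peak_level, target=1.0, gain_steps=(1, 2, 4, 8)):
    """Return the largest available gain that does not push the peak past target,
    raising the added audio signal toward a predetermined level."""
    chosen = gain_steps[0]
    for g in gain_steps:
        if peak_level * g <= target:
            chosen = g
    return chosen
```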
[0037]
As shown in FIG. 13, the audio signal from the beam width calculation board 26 is input to the A
/ D conversion board 28. The A / D conversion board 28 converts an analog voice signal into a
digital signal and outputs the digital signal to the PC 40. Thus, the audio signal after the addition
process becomes digital data. The PC 40 is a data processing device and performs voice
recognition processing on a digital voice signal.
[0038]
Further, a sensor signal from the proximity sensor 41 is input to the PC 40. As described above,
the proximity sensor 41 determines whether the speaker 52 approaches. That is, the proximity
sensor 41 outputs a sensor signal to the PC 40 when the speaker 52 is within the predetermined range from the robot 10.
[0039]
The PC 40 performs voice recognition processing only when a sensor signal indicating that the
utterer 52 is approaching is input. That is, when the speaker 52 is separated from the robot 10
by a predetermined distance or more, the speech recognition process is not performed. When the
speaker 52 is separated from the robot 10 by a predetermined distance or more, the microphone
unit 24, the amplifier board 25, the beam width calculation board 26, the A / D conversion board
28 and the like may be turned off. In this way, the on/off state of the microphone unit 24 and the like is controlled according to the signal from the proximity sensor 41, so that audio processing is performed only when necessary. Note that, instead of the proximity sensor 41, a distance sensor, a camera, or the like may be used to determine whether the speaker is approaching. As described above, when the speaker 52 is far away, the sound collection device 13 turns off the microphone elements 20 and the like; the input from the microphone elements 20 is switched on and off according to the output of the proximity sensor 41.
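The sensor-gated processing can be summarized as a small control sketch. The `recognize` stub and the distance threshold are placeholders for illustration, not the patent's implementation.

```python
PROXIMITY_THRESHOLD_M = 1.5  # hypothetical "predetermined distance"

def recognize(frame):
    # Stand-in for the recognition pipeline running on the PC 40.
    return "recognized" if frame else None

def handle_frame(speaker_distance_m, frame):
    """Run recognition only when the proximity sensor reports a nearby speaker;
    otherwise the microphone unit and boards could be powered down."""
    if speaker_distance_m > PROXIMITY_THRESHOLD_M:
        return None  # speaker is far away: skip all audio processing
    return recognize(frame)
```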
[0040]
Then, the PC 40 controls the A / D conversion substrate 28 using the A / D control driver.
Thereby, an audio signal from the selected channel is input. Then, the noise suppression module
removes noise. Control up to this point is performed by device control. That is, processing is
performed by hardware control.
[0041]
The recognition engine performs speech recognition processing on the noise-suppressed speech
signal. The recognition engine performs each process by software control. The segmenter
performs, for example, framing processing on an audio signal of continuous speech. In addition,
the segmenter discards unnecessary data before and after the speech section after noise
processing. Thereby, only the section (frame) where the speaker 52 utters is extracted. That is,
the speech recognition process is not performed at a portion where the input level of the speech
signal is low. Therefore, it is possible to reduce misrecognition (misinsertion).
[0042]
Then, using the acoustic model, it is determined which word the pattern of the audio signal
corresponds to. Thereby, a word string for continuous speech is obtained. For example, using a
feature vector or the like, words of a pattern corresponding to the audio signal are determined.
Then, noise is removed using a noise model. After this, the speech dictionary is referenced to
determine whether the sentence is meaningful. For example, using a language model, it is
determined whether it is grammatically correct. If the sentence is grammatically correct, this
recognition result is output to the robot control application. This will produce the most
grammatically correct sentences. The voice may be recognized by processing other than the
voice recognition processing described above. That is, the speech recognition processing is not
particularly limited, and a known speech recognition processing method can be used.
[0043]
Thus, the plurality of microphone elements 20 are arranged upward. Then, among the plurality
of arranged microphone elements 20, the microphone elements 20 at the outermost periphery
are arranged along the three sides of the triangle. Speech is recognized using microphone
elements arranged on one side close to the direction of the sound source. That is, a delay process
and an addition process are performed on the audio signal detected by the microphone element
20 on one side. By this, the S / N ratio can be improved. Thereby, the speech recognition rate can
be improved. Further, by arranging the microphone elements 20 in a triangular shape in top
view, the number of the microphone elements 20 can be reduced as compared with the case
where they are arrayed vertically and horizontally. Furthermore, even if the direction of the
sound source is in any direction, the S / N ratio can be improved with a small number of
microphone elements.
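The delay processing and addition processing on one side's elements amount to delay-and-sum beamforming. A minimal sketch, assuming integer-sample delays and three elements on the selected side (the waveform, noise level, and delay values are illustrative, not taken from the embodiment):

```python
import numpy as np

def delay_and_sum(signals, delays):
    """Align each element's signal by its delay in samples, then average.

    signals: array of shape (n_elements, n_samples)
    delays:  per-element arrival delays in samples
    """
    n, _ = signals.shape
    out = np.zeros(signals.shape[1])
    for sig, d in zip(signals, delays):
        out += np.roll(sig, -d)  # advance each channel so wavefronts align
    return out / n

# Three elements on one side; the same waveform arrives 0, 2, 4 samples late,
# each channel with its own independent noise.
rng = np.random.default_rng(0)
wave = np.sin(2 * np.pi * np.arange(64) / 16)
signals = np.stack([np.roll(wave, d) + 0.3 * rng.standard_normal(64)
                    for d in (0, 2, 4)])
aligned = delay_and_sum(signals, delays=[0, 2, 4])
# Averaging aligned copies keeps the wave but attenuates independent noise,
# which is the S / N improvement described above.
```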
[0044]
Here, FIG. 14 shows the difference in S / N ratio depending on the type and mounting position of the microphone unit 24. Microphone B is the triangularly arranged microphone unit shown in FIG. Microphone A is a microphone unit of a conventional arrangement. As shown in FIG. 14, the S / N ratio can be increased by attaching the microphone unit 24 in a triangular arrangement to the head 12. Microphone B also has a higher S / N ratio than microphone A, and the S / N ratio is higher when the microphone unit 24 is provided at the head than at the chest (body portion).
[0045]
Thus, a sound collection beam is formed with each side of the triangle serving as one channel. Furthermore, the upward-facing microphone elements 20 are arranged along the horizontal direction, so sound from any direction can be collected at a high S / N ratio. Also, by making the height of the microphone elements 20 equal to or lower than the height of the speaker's lips, voice can be collected within the microphone directivity area 51, so sound can be picked up reliably and clearly. Furthermore, since the microphone elements 20 at the outermost periphery are arranged on the three sides of a triangle, the number of microphone elements 20 can be reduced. That is, a high S / N ratio can be realized even with a reduced number of microphone elements 20.
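Selecting the side (channel) nearest the sound-source direction can be sketched as follows, assuming three channels whose facing directions are 120 degrees apart, consistent with a triangular layout (the specific azimuth values are illustrative assumptions):

```python
def select_channel(source_azimuth_deg, channel_azimuths=(0.0, 120.0, 240.0)):
    """Return the index of the channel whose facing direction is closest
    to the sound-source azimuth, measured on the circle."""
    def angular_distance(a, b):
        d = abs(a - b) % 360.0
        return min(d, 360.0 - d)
    return min(range(len(channel_azimuths)),
               key=lambda i: angular_distance(source_azimuth_deg,
                                              channel_azimuths[i]))

print(select_channel(10.0))   # → 0
print(select_channel(130.0))  # → 1
print(select_channel(250.0))  # → 2
```

The selected channel's elements would then feed the delay-and-addition processing.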
[0046]
For example, the body portion 11 houses motors for driving the joints, a cooling fan of the PC 40, and the like. Each of these devices generates noise when it operates; in other words, a mechanically operating device becomes a noise source. If the input level of the noise generated by these noise sources is higher than the input signal level of the target voice, the voice signal is buried in the noise. Therefore, in the present embodiment, the sound collection device 13 is housed in the head 12, where noise sources are absent or few. The head 12 has fewer mounted devices acting as noise sources than the body portion 11, so the microphone unit 24 can be kept away from the noise sources. This reduces the noise mixed into the microphone elements 20 and increases the S / N ratio, so voice can be collected clearly and the decrease in the speech recognition rate can be reduced. Furthermore, by installing the microphone unit 24 on the top of the head 12, the space can be used efficiently.
[0047]
Although six microphone elements 20 are arranged in the microphone unit 24 in the present embodiment, the number of microphone elements 20 may be other than six. For example, when the number of microphone elements 20 is ten, four microphone elements 20 can be provided on one side: one is arranged in the first row, two in the second row, three in the third row, and four in the fourth row. In this case, relative to the configuration shown in FIG., four microphone elements 20 are arranged close together next to the virtual straight line L1, in the direction parallel to the virtual straight line L1; that is, four microphone elements 20 are added on the left side of the configuration of FIG. Also, in this arrangement the second microphone element in the third row (microphone element M2 of the present configuration) is no longer an outermost element, so the remaining nine microphone elements 20 are arranged along the three sides of the triangle. Increasing the number of microphone elements 20 increases the number of audio signals to be added, so the S / N ratio can be further increased. Compared with arranging the elements in a vertical and horizontal array, the S / N ratio can be increased with a smaller number of microphone elements 20; therefore, a high S / N ratio can be achieved with a simple configuration. Further, the S / N ratio can be improved still further by arranging the microphone elements 20 at equal intervals on the three sides.
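The row-by-row layout described here (one element in the first row up to four in the fourth) follows the triangular numbers. A small helper, assuming that layout, reproduces the counts given above: six elements all on the sides for three rows, and ten elements with nine on the sides for four rows:

```python
def triangle_counts(rows):
    """For a triangular grid with 1, 2, ..., rows elements per row, return
    (total elements, elements lying on the outermost three sides)."""
    total = rows * (rows + 1) // 2
    # Interior elements form a triangle three rows smaller
    # (one border row removed on each of the three sides).
    inner = max(rows - 3, 0) * (rows - 2) // 2
    return total, total - inner

print(triangle_counts(3))  # → (6, 6)  : the embodiment, all six on the sides
print(triangle_counts(4))  # → (10, 9) : ten elements, nine on the sides
```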
[0048]
In the above description, the triangle on which the outermost microphone elements 20 are arranged is an equilateral triangle, but other triangles may be used: for example, an isosceles triangle or a right triangle. Of course, still other triangles are possible. Further, it is preferable that the apex angles of the triangle be acute. This prevents the angle formed by two sides from becoming too small, so that the voice signal can be picked up at a high S / N ratio no matter which direction the sound source lies in.
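The preference for acute apex angles can be checked with the law of cosines; a sketch for screening candidate triangle shapes by their side lengths (the example triangles are illustrative, not from the embodiment):

```python
def all_angles_acute(a, b, c):
    """Check whether a triangle with side lengths a, b, c has only acute
    angles, via the law of cosines (cosine > 0 at every vertex)."""
    sides = (a, b, c)
    for i in range(3):
        opp = sides[i]
        s1, s2 = sides[(i + 1) % 3], sides[(i + 2) % 3]
        cos_angle = (s1 ** 2 + s2 ** 2 - opp ** 2) / (2 * s1 * s2)
        if cos_angle <= 0:  # right (== 0) or obtuse (< 0) angle
            return False
    return True

print(all_angles_acute(1, 1, 1))  # → True  : equilateral, all 60 degrees
print(all_angles_acute(3, 4, 5))  # → False : right triangle
```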
[0049]
DESCRIPTION OF SYMBOLS 10 robot, 11 body portion, 12 head, 13 sound collection device, 15 microphone unit, 16 processing unit, 21 shield material, 22 cover, 23 microphone board, 24 microphone unit, 25 amplifier board, 26 beam width operation board, 28 A / D conversion board, 31 amplifier, 32 buffer, 34 directivity arithmetic unit, 35 filter, 37 gain switching unit, 38 switch SW, 40 PC, 41 proximity sensor, 51 microphone directivity area, 52 speaker, M1 to M6 microphone elements