Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2008205896
The present invention provides a sound emission and collection device that makes it easy to perform settings such as directing an audio beam in a specific direction or reducing the volume in a specific direction. A control unit 4 performs processing for detecting the sound source position. In addition, it analyzes the content of the voice collected by the microphone and extracts a command; for example, it recognizes specific spoken content by speech recognition and extracts it as a command. The control unit 4 then performs directivity setting processing that sets the delay amounts of beam control units 7A and 7B based on the detected sound source position and the command content. Accordingly, simply by uttering a predetermined command, the user can easily direct the sound beam toward himself or herself, or toward another direction. [Selected figure] Figure 1
Sound emission and collection device
[0001]
The present invention relates to a sound emitting and collecting apparatus which picks up sound
and outputs a sound beam having strong directivity in a specific direction.
[0002]
Conventionally, sound output devices are known that output a sound beam having strong directivity in a specific direction by delay-controlling the audio signals supplied to the respective units of a speaker array.
[0003]
For example, Patent Document 1 proposes a device that, in order to set parameters for controlling directivity such as the delay amount of each speaker unit, identifies the position of the talker using a microphone array and directs the audio beam toward the talker.
Japanese Unexamined Patent Application Publication No. 2006-270876
[0004]
However, the device of Patent Document 1 has low versatility because it only directs the audio beam toward the talker.
For example, when used at home, one user may want to listen to movie sound at a high volume while another user wants the movie sound lowered in order to make a phone call. In addition to directing the audio beam, there are therefore cases where one wants to lower the volume only in a specific direction.
[0005]
Accordingly, it is an object of the present invention to provide a sound emitting and collecting apparatus with which settings such as directing the sound beam in a particular direction or lowering the volume only in a certain direction can be made easily.
[0006]
The sound emission and collection device according to the present invention comprises: a sound collection unit that collects sound and outputs a collected sound signal; a sound source position detection unit that detects the sound source position; a sound emitting unit that emits sound with directivity in a specific direction; a voice analysis unit that receives the collected sound signal and extracts a command instructing directivity contained in the collected sound signal; and a control unit that sets the directivity pattern of the sound emitting unit based on the sound source position detected by the sound source position detection unit and the content of the directivity-instructing command extracted by the voice analysis unit.
[0007]
In this configuration, a command instructing directivity is extracted from the collected sound
signal.
For example, words such as “here” and “sound louder” are extracted by speech recognition.
In addition, the position of the sound source that produced the collected sound signal is detected.
The sound source position is detected, for example, by linear prediction from the output audio signal of each microphone unit of the microphone array. The directivity is then controlled based on the result of command extraction and the detected sound source position. Various forms of directivity setting are conceivable; for example, when the word "here" is extracted, a sound beam with strong directivity is directed in that direction.
[0008]
Further, according to the present invention, the voice analysis unit further extracts a command for selecting a source contained in the collected sound signal, the control unit sets the directivity pattern of the voice of the source selected by that command, and the sound emitting unit simultaneously emits the voices of different sources in a plurality of directions based on the directivity patterns set by the control unit.
[0009]
In this configuration, sounds from different sources are given directivity and emitted in a plurality of directions.
For example, by separately delaying the audio signals input to each speaker unit of the speaker array, directivity can be given simultaneously in a plurality of directions. Furthermore, in this configuration, a command for selecting a source is extracted from the collected sound signal. For example, when the emitted sound consists of two sources, "source A" and "source B", the words "source A" and "source B" are extracted by speech recognition. Only the directivity pattern of the voice of the selected source is set. Thus, for example, when the word "here" is extracted after the utterance "source A", only the voice of source A is directed in that direction.
[0010]
Further, according to the present invention, the voice analysis unit further extracts a command serving as a trigger contained in the collected sound signal, and only when the voice analysis unit has extracted the trigger command does the control unit set the subsequent directivity pattern based on the content of the directivity-instructing command.
[0011]
In this configuration, a command serving as a trigger is extracted from the collected sound signal.
The trigger command is, for example, the word "command input". The directivity pattern is set only when this word is recognized. For example, when the word "here" is extracted after the utterance "command input", the voice of source A is directed in that direction. If only the word "here" is extracted by itself, it is ignored. By ignoring speech uttered without conscious intent, the user's setting intention is correctly reflected.
[0012]
Further, the invention is characterized in that the voice analysis unit extracts a specific rhythm
pattern included in the collected sound signal as a command.
[0013]
In this configuration, a specific rhythm pattern is extracted as a command.
For example, short single sounds at or above a predetermined level (such as hand claps) are counted, and a command is extracted according to the number of inputs within a predetermined time period (for example, 3 seconds). For example, a single clap is interpreted as "louder" and two claps as "quieter".
[0014]
Further, the invention is characterized in that the control unit sets a directivity pattern so that the
volume is reduced only in a predetermined direction.
[0015]
In this configuration, as one aspect of the directivity pattern, the volume is lowered only in a predetermined direction.
In a speaker array, the sound emitted from the speaker units is reinforced in regions where the phases coincide and weakened in regions where the phases differ. Therefore, by controlling the delay amount of the audio signal input to each speaker unit, the directivity can be set so that the volume is reduced only in the predetermined direction. In this case, a phrase such as "mute only here" may be extracted as the command instructing directivity. As a result, simply by uttering a specific phrase, the volume can be reduced only in a specific area that one wishes to keep quiet.
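As an illustration only (not part of the patent disclosure), the following minimal sketch evaluates the far-field response of a delayed and gain-controlled linear speaker array at a given angle and frequency, which can be used to check that a chosen delay/gain set does produce a dip (reduced volume) in the intended direction. It assumes a free-field, far-field model with uniformly spaced units; the spacing, the example values and the function name array_response are assumptions.

import numpy as np

C = 343.0  # speed of sound in air [m/s]

def array_response(delays_s, gains, spacing_m, angle_deg, freq_hz):
    # Far-field magnitude response of a uniformly spaced linear speaker array.
    # delays_s, gains: per-unit delay [s] and gain applied to the source signal.
    num = len(delays_s)
    x = (np.arange(num) - (num - 1) / 2) * spacing_m   # unit positions around the array centre
    prop = -x * np.sin(np.radians(angle_deg)) / C      # propagation-time differences toward that angle
    phase = -2j * np.pi * freq_hz * (np.asarray(delays_s) + prop)
    return abs(np.sum(np.asarray(gains) * np.exp(phase)))

# Example: compare the response straight ahead with the response 30 degrees off axis.
delays, gains = [0.0] * 8, [1.0] * 8
print(array_response(delays, gains, 0.05, 0.0, 1000.0))
print(array_response(delays, gains, 0.05, 30.0, 1000.0))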
[0016]
Further, the present invention further includes an echo canceller that removes an echo component from the collected sound signal, and the voice analysis unit extracts a command from the collected sound signal from which the echo canceller has removed the echo component.
[0017]
In this configuration, echo components are removed from the collected signal.
Performing speech recognition or similar processing on the collected sound signal after the echo component has been removed improves the accuracy of command extraction.
[0018]
According to the present invention, by extracting a command instructing directivity contained in a collected sound signal, it is possible to direct an audio beam in a specific direction, or to lower the volume in a specific direction, in response to the user's speech.
[0019]
The sound emission and collection device of this embodiment controls its emission directivity based on the sound collected by its microphones, and emits sound input from another device with directivity in a predetermined direction. This sound emission and collection device can be used as a speaker device that emits various audio sources when connected to a television or audio equipment, and can also be used as an audio conference apparatus by outputting the sound collected by the microphones to another device. A sound emission and collection device according to an
embodiment of the present invention will be described below with reference to the drawings. FIG.
1 is a block diagram showing the configuration of the sound emission and collection device.
[0020]
The sound emission and collection device 1 includes a microphone array 2, an input/output interface (I/F) 3, a control unit 4, a speaker array 5, an echo canceller 6, a beam control unit 7A, a beam control unit 7B, a mixer 8, D/A converters 11 to 18, amplifiers (AMPs) 31 to 38, amplifiers (AMPs) 41 to 48, A/D converters 51 to 58, a sound collection beam generation unit 61, and a sound collection beam selection unit 71.
[0021]
The microphone array 2 has a plurality of (eight in the example shown) microphone units 21 to
28 linearly arranged, and outputs voices (sound collection signals) collected by the microphone
units 21 to 28, respectively.
The speaker array 5 has a plurality of (eight in the example shown) speaker units 51 to 58 arranged in a straight line, each of which emits the audio signal input to it.
[0022]
The sound signals picked up by the microphone units 21 to 28 are amplified by the front-end amplifiers 41 to 48 and digitized by the A/D converters 51 to 58. The sound pickup signals digitized by the A/D converters 51 to 58 are input to the echo canceller 6.
[0023]
The echo canceller 6 comprises a filter processing unit 60, and the audio signals corresponding to the speaker units 51 to 58 output from the mixer 8 are input to the filter processing unit 60. The filter processing unit 60 filters the audio signals corresponding to the speaker units 51 to 58 and generates a pseudo echo signal that simulates the return audio signal travelling from the speaker array 5 back to the microphone array 2. The filter processing unit 60 removes the echo component by subtracting this pseudo echo signal from the collected sound signal, and outputs the result to the sound collection beam generation unit 61. By removing the echo component with the echo canceller 6, the accuracy of the sound source position detection processing and the command analysis processing described later is improved.
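The following is a minimal single-channel sketch of such an echo canceller, using a normalized LMS (NLMS) adaptive filter to estimate the echo path and subtract the pseudo echo signal from the microphone signal. The patent's filter processing unit 60 operates on the signals of all eight speaker units; the mono simplification, tap count and step size here are assumptions for illustration.

import numpy as np

def nlms_echo_cancel(far_end, mic, taps=256, mu=0.5, eps=1e-8):
    # far_end: signal sent to the loudspeaker (reference).
    # mic:     signal picked up by the microphone (near-end speech + echo).
    # Returns the echo-suppressed microphone signal.
    w = np.zeros(taps)                  # adaptive filter estimating the echo path
    buf = np.zeros(taps)                # most recent far-end samples, newest first
    out = np.zeros(len(mic))
    for i in range(len(mic)):
        buf = np.roll(buf, 1)
        buf[0] = far_end[i]
        echo_est = w @ buf              # pseudo echo signal
        e = mic[i] - echo_est           # echo-removed sample (passed onwards)
        out[i] = e
        w += mu * e * buf / (buf @ buf + eps)   # NLMS weight update
    return out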
[0024]
The sound collection beam generation unit 61 beam-forms the sound collection directivity of the
entire microphone array 2 by delaying and synthesizing the sound collection signals from which
the echo components have been removed by the echo canceller 6. Owing to this beam-formed sound collection directivity, sound generated in a specific area is collected with high gain. A signal collected with such directivity is called a sound collection beam. In the present embodiment, sound collection beams MB11 to MB14 corresponding to four areas around the microphone array 2 are generated.
[0025]
FIG. 2 is a diagram showing an example of a sound collection beam. In the figure, the sound collection beam generation unit 61 forms a sound collection beam focused on the position to be picked up, so that sound in a narrow range is collected with high gain. Here, the sound collection areas P1 to P4 are set, for example, in front of the microphone array. The sound collection beam generation unit 61 delays the sound signals collected by the microphone units 21 to 28 so as to compensate for their different distances from the focal point (F3 in the figure) and then sums them, whereby the voice generated at the focal point (in sound collection area P3) can be extracted with high gain.
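As a rough illustration of this focusing operation (a sketch, not the device's actual implementation), the following delay-and-sum routine aligns the microphone signals on a chosen focal point and sums them; the sample rate, the integer-sample delay approximation and the function name focused_collection_beam are assumptions.

import numpy as np

C = 343.0   # speed of sound [m/s]
FS = 16000  # sample rate [Hz]; assumed for illustration

def focused_collection_beam(mic_signals, mic_positions, focal_point):
    # mic_signals:   equal-length sample arrays, one per microphone unit.
    # mic_positions: (x, y) coordinates of the microphone units [m].
    # focal_point:   (x, y) coordinate of the focal point [m].
    fx, fy = focal_point
    dists = [np.hypot(fx - mx, fy - my) for (mx, my) in mic_positions]
    max_d = max(dists)
    out = np.zeros(len(mic_signals[0]))
    for sig, d in zip(mic_signals, dists):
        lag = int(round((max_d - d) / C * FS))            # delay so all paths appear equally long
        out += np.concatenate([np.zeros(lag), sig])[:len(sig)]
    return out / len(mic_signals)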
[0026]
In FIG. 1, the four sound collection beams MB11 to MB14 generated by the sound collection
beam generation unit 61 are input to the sound collection beam selection unit 71. The sound
collection beam selection unit 71 selects a signal with the highest level among the four sound
collection beams MB11 to MB14, and outputs the selected sound collection beam to the input /
output I / F 3 as a main sound collection beam.
[0027]
FIG. 3 is a block diagram showing the main configuration of the sound collection beam selection unit 71. The sound collection beam selection unit 71 includes a BPF (band pass filter) 171, a full-wave rectification circuit 172, a peak detection circuit 173, a level comparator 174, and a signal selection circuit 175.
[0028]
The BPF 171 is a band pass filter that uses a main component band of human voice as a pass
band, performs band pass filtering on the collected beams MB11 to MB14, and outputs the result
to the full wave rectification circuit 172. The full-wave rectifier circuit 172 full-wave rectifies
(takes the absolute value of) the sound collection beams MB11 to MB14. The peak detection circuit 173 performs peak detection on the full-wave rectified sound collection beams MB11 to MB14 and outputs peak value data Ps11 to Ps14. The level comparator 174 compares the peak value data
Ps11 to Ps14 and provides selection instructing data for selecting the sound collection beam
corresponding to the highest level peak value data to the signal selection circuit 175. The level
comparator 174 also gives selection instruction data to the control unit 4. The control unit 4 uses
the selection instruction data for sound source position detection processing described later. The
signal selection circuit 175 selects the sound collection beam indicated by the selection
instruction data, and outputs it as the main sound collection beam to the input / output I / F 3.
Further, the signal selection circuit 175 selects the sound collection beam indicated by the
selection instruction data, and outputs it to the control unit 4 as the main sound collection beam.
The control unit 4 uses the main sound collecting beam for command analysis processing
described later. This utilizes the fact that the signal level of the sound collection beam
corresponding to the sound collection area where the sound source is present is higher than the
signal level of the sound collection beam corresponding to other areas.
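The selection logic described above can be sketched as follows; this is an illustrative reconstruction, assuming a 16 kHz sample rate, a 300 to 3400 Hz voice band and a Butterworth band-pass filter in place of the unspecified BPF 171.

import numpy as np
from scipy.signal import butter, lfilter

FS = 16000  # sample rate [Hz]; assumed

def select_main_beam(beams, lo_hz=300.0, hi_hz=3400.0):
    # beams: list of sample arrays MB11..MB14.
    # Returns (index of the selected beam, the selected beam itself).
    b, a = butter(4, [lo_hz / (FS / 2), hi_hz / (FS / 2)], btype="band")  # BPF 171
    peaks = []
    for beam in beams:
        rectified = np.abs(lfilter(b, a, beam))   # full-wave rectification 172
        peaks.append(rectified.max())             # peak detection 173
    best = int(np.argmax(peaks))                  # level comparison 174
    return best, beams[best]                      # signal selection 175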
[0029]
The main sound collection beam input to the input / output I / F 3 (output I / F 30 C) is output to
another device when the sound emitting and collecting device 1 is used as an audio conference
device. When output via the network, it is output as audio information of a predetermined
protocol.
[0030]
The input / output I / F 3 functionally includes an input I / F 30A, an input I / F 30B, and an
output I / F 30C, and inputs / outputs an audio signal (or audio information) to / from another
device. The audio signal input to the input I / F 30A is output to the beam control unit 7A, and
the audio signal input to the input I / F 30B is output to the beam control unit 7B. When audio
information is input, it is converted to an audio signal and output.
[0031]
The beam control units 7A and 7B perform delay processing and gain control on the audio
signals input to the speaker units 51 to 58 of the speaker array 5, so that an audio beam having
strong directivity in a predetermined direction can be formed. Conversely, a pattern whose volume is reduced only in a predetermined direction (hereinafter referred to as an audio dip) can also be formed. The delay amount and gain of the audio signal corresponding to each
of the speaker units 51 to 58 are set by the control unit 4. The sound emitted by each of the
speaker units 51 to 58 is intensified in the region where the phase is common, and conversely is
weakened in the region where the phase is different. Therefore, by controlling the delay amount
of the audio signal input to each speaker unit, the audio beam can be directed in a specific
direction or the audio dip can be directed.
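As an illustrative sketch of how such steering delays could be computed for a far-field beam (the patent does not disclose concrete formulas, and an audio dip requires a different delay/gain solution), the following assumes a uniformly spaced linear array and plane-wave propagation:

import numpy as np

C = 343.0  # speed of sound [m/s]

def steering_delays(num_units, spacing_m, angle_deg):
    # Per-unit delays [s] that steer a linear speaker array beam toward angle_deg,
    # where 0 degrees is broadside (straight ahead from the array).
    x = (np.arange(num_units) - (num_units - 1) / 2) * spacing_m   # unit positions [m]
    tau = x * np.sin(np.radians(angle_deg)) / C                    # align wavefronts in that direction
    return tau - tau.min()                                         # keep all delays non-negative

# Example: delay set that beam control unit 7A might use to aim a beam
# 20 degrees off broadside (spacing and angle are illustrative values).
print(steering_delays(8, 0.05, 20.0))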
[0032]
The audio signals output from the beam control units 7A and 7B are input to the mixer 8. The
mixer 8 mixes audio signals corresponding to the speaker units 51 to 58 output by the beam
control units 7A and 7B, respectively, and outputs the mixed audio signals to the echo canceller
6. The echo canceller 6 generates the pseudo echo signal from the audio signals
corresponding to the speaker units 51 to 58 as described above. Also, the echo canceller 6
outputs audio signals corresponding to the speaker units 51 to 58 to the D / A converters 11 to
18. Audio signals corresponding to the speaker units 51 to 58 are converted to analog audio
signals by the D / A converters 11 to 18, respectively, amplified by the amplifiers 31 to 38, and
then emitted by the speaker units 51 to 58.
[0033]
Here, by performing delay processing so that the beam control units 7A and 7B output audio beams to different areas, users at different places can listen to the sound of different sources. For
example, as shown in FIG. 4, the user h1 in the position of the sofa in the living room can listen
to the movie sound (source A), and the user h2 in the position of the dining table can listen to the
music (source B). Also, even if the movie sound is the same, the user h1 can listen to the
Japanese voice and the user h2 can listen to the English voice. The source and direction of each
voice beam (voice dip) are set by the control unit 4.
[0034]
The control unit 4 includes a CPU, and performs sound source position detection processing for
detecting the position of the sound source based on the selection instruction data input from the
level comparator 174. In the simplest case, it is determined that the sound source is present in
the sound collection area of the sound beam indicated by the selection instruction data, and this
sound collection area is used as the sound source position. Alternatively, although not shown, the sound pickup signals picked up by the microphone units 21 to 28 (the sound pickup signals output from the echo canceller 6) may each be input to the control unit 4, and other general methods such as linear prediction or the minimum variance method may be used to detect the sound source position.
[0035]
The control unit 4 also performs command analysis processing to analyze the main sound
collection beam input from the signal selection circuit 175. The command analysis process is a
process of performing speech recognition and extracting a command from the speech content of
the main sound collection beam. Specifically, the control unit 4 compares the input audio signal
with the pattern of the audio signal stored in advance in a memory (not shown) or the like. The
comparison method uses, for example, a probabilistic model such as a hidden Markov model.
When the control unit 4 recognizes a specific voice content from the content of the input voice
signal, the control unit 4 extracts this as a command. The contents of the command are classified
into trigger, source selection, and beam setting.
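The recognition step itself (for example, a hidden Markov model engine) is beyond the scope of a short example, but the classification of recognized speech into the three command types can be sketched as follows; the keyword tables and the assumption that a recognizer returns lower-case text are illustrative, not taken from the patent.

# Hypothetical keyword tables; the actual vocabulary is defined by the control unit 4.
TRIGGER_WORDS = {"command input"}
SOURCE_WORDS = {"source a": "A", "source b": "B"}
BEAM_WORDS = {"here", "sound louder", "sound smaller", "reverse direction",
              "direction 1", "direction 2", "direction 3", "mute only here"}

def classify_command(recognized_text):
    # recognized_text: output of a speech recognizer, assumed to be plain text.
    # Returns (trigger_found, selected_source, beam_setting_command).
    text = recognized_text.lower()
    trigger = any(w in text for w in TRIGGER_WORDS)
    source = next((s for w, s in SOURCE_WORDS.items() if w in text), None)
    # prefer longer phrases so "mute only here" is not mistaken for "here"
    beam = next((w for w in sorted(BEAM_WORDS, key=len, reverse=True) if w in text), None)
    return trigger, source, beam

# e.g. classify_command("command input source A here") -> (True, "A", "here")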
[0036]
The control unit 4 predetermines the speech to be extracted as a trigger command (for example, the utterance "command input"), and performs command extraction processing that extracts source selection and beam setting commands from the voice signal input after the trigger voice has been recognized; if the trigger voice is not recognized, the command extraction processing is not executed.
[0037]
Similarly, the control unit 4 predetermines voice content to be extracted as a source selection
command.
The audio contents extracted as the source selection command are, for example, "source A",
"source B" and the like. Further, the control unit 4 also predetermines voice contents to be
extracted as a beam setting command. The audio content extracted as a command for setting the
beam is, for example, “sound louder”, “sound smaller” or the like. Note that source selection
and beam setting command extraction are not essential in the present invention.
[0038]
In addition to speech recognition, for example, a specific rhythm pattern can be extracted as a
command. The control unit 4 counts short single sounds at or above a predetermined level (for example, hand claps), and extracts a command according to the number of inputs within a predetermined time (for example, 3 seconds). For example, one clap is interpreted as "louder" and two claps as "quieter".
[0039]
The control unit 4 performs directivity setting processing for setting the delay amount and gain
of the beam control units 7A and 7B based on the sound source position detected in the sound
source position detection processing and the command content analyzed in the command
analysis processing. Hereinafter, specific examples of the directivity setting process will be
described with reference to the drawings. In each of the examples, it is assumed that the user first utters a trigger voice such as "command input".
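Purely as an illustration of how the detected area and the extracted commands could be combined in the directivity setting processing (the mapping from areas to angles and the controller interface set_beam_angle / set_dip_angle / change_gain_db are assumptions, not the patent's actual interfaces):

# Illustrative mapping from detected sound collection areas to beam angles [deg];
# the real device would derive this from the geometry of areas P1..P4.
AREA_ANGLE = {"P1": -45.0, "P2": -15.0, "P3": 15.0, "P4": 45.0}

def directivity_setting(source_area, source_cmd, beam_cmd, beam_controllers):
    # source_area: area found by sound source position detection (e.g. "P3").
    # source_cmd:  selected source ("A" or "B"), mapped to beam controller 7A or 7B.
    # beam_cmd:    beam setting command extracted by command analysis.
    # beam_controllers: dict like {"A": ctrl_7a, "B": ctrl_7b} of assumed controller objects.
    ctrl = beam_controllers.get(source_cmd)
    if ctrl is None or beam_cmd is None:
        return
    angle = AREA_ANGLE.get(source_area, 0.0)
    if beam_cmd == "here":
        ctrl.set_beam_angle(angle)            # audio beam toward the talker
    elif beam_cmd == "mute only here":
        ctrl.set_dip_angle(angle)             # audio dip toward the talker
    elif beam_cmd == "reverse direction":
        ctrl.set_beam_angle(-angle)           # mirror across the array centre axis
    elif beam_cmd == "sound louder":
        ctrl.change_gain_db(+3.0)
    elif beam_cmd == "sound smaller":
        ctrl.change_gain_db(-3.0)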
[0040]
FIG. 5 is a diagram showing an example of controlling an audio beam as an example of directivity
setting processing. FIG. 5(A) shows an example in which an audio beam is directed toward the user. In the figure, when the user h1 says "source A, here", the control unit 4 extracts "source A" as a source selection command and extracts "here" as a beam setting
command. Further, the control unit 4 detects the position of the user h1. Then, the control unit 4
sets the delay amount of the beam control unit 7A so that the sound of the source A (in the
example, the movie sound) is directed to the position of the user h1. Thus, the user h1 can direct
the audio beam toward himself or herself simply by saying "source A, here" wherever he or she is.
[0041]
Next, FIG. 5(B) shows an example in which the volume of the sound beam directed toward the user is changed. In the figure, when the user h1 says "source A, sound louder", the control unit 4 extracts "source A" as a source selection command and extracts "sound louder" as a beam setting command. Further, the control unit 4 detects the position of the user h1. Then, the control unit 4 sets the gain of the beam control unit 7A so that the volume of the sound beam of source A increases. If the position of the user h1 detected at this time deviates from the direction of the sound beam, the delay amount of the beam control unit 7A may also be set so as to direct the sound beam toward the position of the user h1. Thus, the user h1 can increase the volume of source A at his or her own position simply by saying "source A, sound louder" wherever he or she is. The directivity settings shown in FIG. 5(A) and FIG. 5(B) are suitable, for example, when enjoying television or music at night, or when other sounds in the home are loud and it is difficult to hear the movie sound.
[0042]
Next, FIG. 6 shows an example in which an audio dip is directed toward the user. In the figure, when the user h1 says "source A, mute only here", the control unit 4 extracts "source A" as a source selection command and extracts "mute only here" as a beam setting command. Further, the control unit 4 detects the position of the user h1. Then, the control unit 4 sets the delay amount of the beam control unit 7A so that the volume of the sound of source A decreases at the position of the user h1 (so that the audio dip indicated by the two-dot chain line in the figure is directed there). As a result, the user h1 can direct the audio dip toward himself or herself simply by saying "source A, mute only here" wherever he or she is. The example shown in the figure is suitable, for example, when the user is enjoying television or music and receives a telephone call and wants to temporarily reduce the volume.
[0043]
Next, FIG. 7 shows an example in which the audio beam is directed in a direction (specific direction) other than toward the user. In FIG. 7(A), when the user h1 says "source A, reverse direction", the control unit 4 extracts "source A" as a source selection command and extracts "reverse direction" as a beam setting command. Further, the control unit 4 detects the position of the user h1. Then, the control unit 4 sets the delay amount of the beam control unit 7A so that the audio beam of source A is directed in the direction opposite to the user. Note that the opposite direction here means the position that is symmetrical to the user's position with respect to the central axis O of the speaker array 5 along its longitudinal axis Y. In the example of the figure, the user h2 is located in the direction opposite to the position of the user h1, so the audio beam of source A is directed toward user h2. As described above, the user h1 can direct the audio beam in a direction different from his or her own simply by saying "source A, reverse direction" wherever he or she is. It is also possible to set in advance a plurality of directions in which the audio beam can be directed and to direct the audio beam in one of those directions.
[0044]
In FIG. 7(B), the control unit 4 sets in advance a plurality of directions 1 to 3 as the directions in which the audio beam can be directed. The number of directions to be set is not limited to this example. Here, when the user h1 says "source A, direction 1", the control unit 4 extracts "source
A" as a source selection command, and extracts "direction 1" as a beam setting command. Then,
the control unit 4 sets the delay amount of the beam control unit 7A so that the voice beam of
the source A is directed in the direction 1 set in advance. The example of FIG. 7 is suitable, for
example, when the user is enjoying music and wants other people to listen to it. Further, as
described above, it is also suitable when the user is enjoying television or music and, for example, receives a telephone call and wants to temporarily direct the audio beam in another direction.
[0045]
Next, FIG. 8 shows an example in which the audio dip is directed in a direction (specific direction) other than toward the user. In FIG. 8(A), when the user h1 says "source A, mute only in the reverse direction", the control unit 4 extracts "source A" as a source selection command and extracts "mute only in the reverse direction" as a beam setting command. Further, the control unit 4 detects the position of the user h1. Then, the control unit 4 sets the delay amount of the beam control unit 7A so that the volume of the sound of source A decreases in the direction opposite to the user (so that the audio dip indicated by the two-dot chain line in the figure is directed there). In the example of the figure, the user h2 is located in the direction opposite to the position of the user h1. Therefore, the volume of the sound of source A is reduced at the position of the user h2.
[0046]
As described above, the user h1 can direct the audio dip in a direction different from his or her own simply by saying "source A, mute only in the reverse direction" wherever he or she is. Note that a plurality of directions in which the audio dip can be directed may be set in advance, and the audio dip may be directed in one of those directions.
[0047]
In FIG. 8(B), the control unit 4 sets in advance a plurality of directions 1 to 3 as the directions in which the audio dip can be directed. In this example too, the number of directions to be set is not limited to this example. Here, when the user h1 says "source A, mute only in direction 1", the control unit 4 extracts "source A" as a source selection command and extracts "mute only in direction 1" as a beam setting command. Then, the control unit 4 sets the delay amount of the beam control unit
7A so that the audio dip of the source A is directed in the direction 1 set in advance. The example
shown in FIG. 8 is suitable for reducing the volume only in the direction in which the baby is
sleeping. Also, if the direction of the telephone in the home is set in advance, it is possible to
lower the volume only in the direction of the telephone when a call is received.
[0048]
As described above, according to the sound emission and collection device of the present invention, the audio beam and the audio dip can be controlled easily simply by speaking, without the user needing to operate the main body or a remote control to perform complicated settings.
[0049]
FIG. 1 is a block diagram showing the configuration of the sound emission and collection device.
FIG. 2 is a diagram showing the concept of forming a sound collection beam.
FIG. 3 is a block diagram showing the main configuration of the sound collection beam selection unit 71.
FIG. 4 is a diagram showing an example in which users listen to the sound of different sources at different places.
FIG. 5 is a diagram showing an example of controlling an audio beam as an example of directivity setting processing.
FIG. 6 is a diagram showing an example in which an audio dip is directed toward the user.
FIG. 7 is a diagram showing an example in which an audio beam is directed in a direction (specific direction) other than toward the user.
FIG. 8 is a diagram showing an example in which an audio dip is directed in a direction (specific direction) other than toward the user.
Explanation of Reference Numerals
[0050]
1: sound emission and collection device, 2: microphone array, 3: input/output interface, 4: control unit, 5: speaker array, 6: echo canceller, 7A, 7B: beam control units, 8: mixer