JP2009089133

Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2009089133
The present invention provides a sound emission and collection device capable of accurately
detecting a speaker direction based on a sound collection signal even if sound is emitted from a
speaker. SOLUTION: Microphone units MU1 to MU8 pick up sound from sound collection areas
MA1 to MA8, which are formed so as to be rotationally symmetrical about the arrangement
positions of speakers SP1 and SP2, and output combined signals SA1 to SA8 (hereinafter, SAk).
The logarithmic calculators L1 to L8 calculate the logarithmic value P of the power level of each
combined signal SAk. The amplifier unit 11 calculates the power level average value AV from the
logarithmic values P of the power levels, and the subtracting units subtract the power level
average value AV from each logarithmic value P to generate a difference signal level D.
The maximum value detection unit 12 compares the difference signal levels D to detect the
maximum value. The control unit 20 detects the azimuth of the sound collection area
corresponding to the difference signal level D indicating the maximum value as the speaker
azimuth. [Selected figure] Figure 2
Sound emission and collection device
[0001]
The present invention relates to a sound emission and collection device that detects a speaker
direction based on a sound collection signal.
[0002]
Generally, there is a sound emission and collection device that changes the directivity of a
microphone array formed by a plurality of microphones and detects the sound collection
direction in which the output of the microphone array is maximum as the arrival direction of the
04-05-2019
1
sound source.
[0003]
However, in the above sound emission and collection device, when the speaker is emitting sound,
the emitted sound is collected by the microphones, and the sound collection direction (azimuth)
of a microphone located in the vicinity of the speaker may be falsely detected as the voice
arrival direction.
[0004]
Japanese Patent Laid-Open No. 2008-112118 discloses a sound emission and collection device that,
when a reception signal from the communication destination is detected, prevents the directivity
of the microphone array from being directed to a sound collection area located in the vicinity of
the speaker emitting sound based on the reception signal.
Japanese Patent Application Laid-Open No. 11-18192
[0005]
However, the sound emission and collection device shown in the above-mentioned Patent
Document 1 has a problem in that the speaker direction cannot be accurately detected
while the speaker is emitting sound based on the reception signal (sound emission signal).
[0006]
Therefore, it is an object of the present invention to provide a sound emission and collection
device capable of accurately detecting a speaker direction based on a sound collection signal
even if sound is emitted from a speaker.
[0007]
The sound emission and collection device of the present invention includes a sound emission
unit, a plurality of sound collection units, a difference level calculation unit, and a speaker
direction detection unit. The device emits sound based on a sound emission signal, collects sound
from its surroundings to generate sound collection signals, and detects the speaker direction
based on the sound collection signals.
The sound emitting unit performs sound emission.
The plurality of sound collection units respectively form sound collection regions set so that the
sound emitted from the sound emission unit wraps around to them equally, and generate
sound collection signals by collecting sound from the sound collection regions.
The difference level calculation unit calculates the logarithmic value of the power of each
sound collection signal from the plurality of sound collection units and the average of those
logarithmic values, and subtracts the average value from the logarithmic value of the power of
each sound collection signal to generate a difference level signal corresponding to each sound
collection unit.
The speaker direction detection unit compares the level values of the difference level signal to
detect the maximum value, and detects the direction of the sound collection unit corresponding
to the difference level signal indicating the maximum value as the speaker direction.
[0008]
In this configuration, sound is collected from each of the sound collection regions, which are set
so that the sound emitted from the sound emission unit wraps around to all of them equally, to
generate sound collection signals. The logarithmic value of the power of each sound collection
signal and the average of those logarithmic values are then obtained. Next, the average value is
subtracted from the logarithmic value of the power of each sound collection signal to generate a
difference level signal. Furthermore, the sound collection direction of the sound collection unit
corresponding to the difference level signal indicating the maximum value is detected as the
speaker direction. As a result, even while the sound emission unit is emitting sound, the speaker
direction can be detected by comparing the difference level signals and selecting the sound
collection region that shows the maximum value.
[0009]
In this configuration, the speaker direction detection unit may preset a speaker sound detection
threshold for the level value of the difference level signal and, when the maximum value is larger
than the speaker sound detection threshold, take the direction of the sound collection unit
corresponding to the difference level signal indicating the maximum value as the speaker
direction. Thus, the speaker direction can be detected based on the speaker sound detection
threshold.
[0010]
Furthermore, the difference level calculation unit may use only the low frequency range
component of the collected signal.
[0011]
As a result, it is possible to detect the speaker direction by using a low frequency range
component including a large amount of frequency components of human voice among the
frequency components in the audible range included in the collected sound signal.
[0012]
According to the present invention, in a sound emission and collection device in which the
speaker and the plurality of microphones are installed in one case, the speaker direction can be
accurately detected based on the collected signal even if the speaker emits sound.
[0013]
Hereinafter, a sound emission and collection device 1 according to an embodiment of the present
invention will be described with reference to the drawings.
[0014]
The sound emission and collection device 1 has a cylindrical casing (not shown) that is circular in
top view.
FIG. 1 is a diagram schematically showing, in top view, the positional relationship between the
speakers SP1 and SP2 and the microphone units MU1 to MU8 of the sound emission and collection
device 1, and the sound collection areas MA1 to MA8 formed around the sound emission and
collection device 1.
FIG. 2 is a diagram schematically showing the flow of the speaker direction detection in the
sound emission and collection device 1.
[0015]
As shown in FIGS. 1 and 2, the sound emission and collection device 1 includes microphone units
MU1 to MU8, logarithmic calculators L1 to L8, an adder 10, an amplifier unit 11, subtractors SR1
to SR8, a maximum value detection unit 12, a comparator 14, a control unit 20, speakers SP1 and
SP2, an echo canceller (not shown), and the like.
[0016]
The speakers SP1 and SP2 are disposed inside the casing substantially at the center of the sound
emission and collection device 1 in top view, and emit sound based on the sound emission signal S,
with the regions on the upper surface side and the lower surface side of the casing as their sound
emission regions.
[0017]
The microphone units MU1 to MU8 are arranged so as to be 45° rotationally
symmetrical with each other about the arrangement position of the speakers SP1 and SP2
in top view.
Here, 45-degree rotational symmetry means that when a figure is rotated 45 degrees about the
center of rotational symmetry, it coincides with the original figure.
The 45-degree rotational symmetry can also be expressed as eight-fold rotational symmetry.
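The symmetry condition can be checked numerically. The following is a minimal sketch, assuming
the eight units sit at equal angles on a circle around the speakers; the radius, the starting
angle, and the function names are illustrative and are not taken from the specification.

```python
import math

def unit_positions(n_units=8, radius=1.0):
    """Return (x, y) positions of n_units points equally spaced on a circle."""
    step = 2.0 * math.pi / n_units
    return [(radius * math.cos(k * step), radius * math.sin(k * step))
            for k in range(n_units)]

def rotate(points, degrees):
    """Rotate every point about the origin (the assumed speaker position)."""
    a = math.radians(degrees)
    c, s = math.cos(a), math.sin(a)
    return [(c * x - s * y, s * x + c * y) for x, y in points]

def same_point_set(a, b, tol=1e-9):
    """True if every point of a lies within tol of some point of b."""
    return all(any(math.hypot(ax - bx, ay - by) < tol for bx, by in b)
               for ax, ay in a)

positions = unit_positions(8)
print(same_point_set(rotate(positions, 45.0), positions))  # True
print(same_point_set(rotate(positions, 30.0), positions))  # False
```

A 45-degree rotation maps each position onto its neighbour, so the layout is unchanged, whereas
a 30-degree rotation does not.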
[0018]
Also, in the microphone units MU1 to MU8, sound collection directivity is set such that the
sounds of the sound collection areas MA1 to MA8 are collected.
Here, the sound collection areas MA1 to MA8 are formed so as to be eight-fold rotational
symmetric with the arrangement positions of the speakers SP1 and SP2 as a center.
[0019]
With such an arrangement, the length of the wraparound transmission path along which sound
emitted from the speakers SP1 and SP2 passes through the sound collection areas MA1 to MA8 and
is collected by the microphone units MU1 to MU8 is almost the same for all of the microphone
units MU1 to MU8. As a result, the level of the wraparound sound, that is, the sound emitted from
the speakers SP1 and SP2 that reaches the microphone units MU1 to MU8 and is collected, can be
equalized.
[0020]
Here, the configuration of the microphone units MU1 to MU8 will be described below, taking the
microphone unit MU1 as an example. The microphone units MU1 to MU8 have the
same configuration except for different sound collecting areas.
[0021]
The microphone unit MU1 includes microphones MIC1 to MIC4, linear filters F1 to F4, and an
adder SU1.
[0022]
The microphones MIC1 to MIC4 are arranged in a line along a predetermined reference plane,
and each have predetermined sound collecting directivity.
[0023]
The linear filters F1 to F4 perform delay processing on the collected sound signals collected by
the microphones MIC1 to MIC4.
The adder SU1 synthesizes the collected sound signals subjected to delay processing in the linear
filters F1 to F4.
With this configuration and processing, the microphone unit MU1 as a whole is given the sound
collection directivity that realizes the sound collection area MA1.
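The delay-then-add structure described above resembles delay-and-sum beamforming. The sketch
below assumes whole-sample delays and made-up signals; the specification does not give the
filter coefficients or delay values, so every number and function name here is hypothetical.

```python
def delay(signal, n_samples):
    """Delay a sample list by n_samples, zero-padding the front (length preserved)."""
    n = min(max(n_samples, 0), len(signal))
    return [0.0] * n + list(signal[:len(signal) - n])

def delay_and_sum(mic_signals, delays):
    """Apply one delay per microphone signal, then add the results sample by sample."""
    delayed = [delay(sig, d) for sig, d in zip(mic_signals, delays)]
    return [sum(samples) for samples in zip(*delayed)]

# A pulse that reaches MIC4 first and MIC1 last, one sample apart (assumed geometry).
mics = [
    [0.0, 0.0, 0.0, 1.0, 0.0, 0.0],  # MIC1
    [0.0, 0.0, 1.0, 0.0, 0.0, 0.0],  # MIC2
    [0.0, 1.0, 0.0, 0.0, 0.0, 0.0],  # MIC3
    [1.0, 0.0, 0.0, 0.0, 0.0, 0.0],  # MIC4
]
print(delay_and_sum(mics, [0, 1, 2, 3]))  # [0.0, 0.0, 0.0, 4.0, 0.0, 0.0]
print(delay_and_sum(mics, [0, 0, 0, 0]))  # [1.0, 1.0, 1.0, 1.0, 0.0, 0.0]
```

When the delays match the arrival offsets, the pulses line up and add coherently; with
mismatched delays the same energy is spread out, which is what gives the unit its directivity.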
[0024]
The adder SU1 outputs the synthesized signal SA1 subjected to the above synthesis processing to
the logarithmic calculator L1 (see FIG. 2).
[0025]
The logarithmic calculators L1 to L8 calculate the logarithmic value (logarithmic power) of the
bass component included in the synthesized signal SAk output from the microphone units MU1
to MU8 according to the equation (1).
k is a subscript of 1 to 8 indicating the microphone units MU1 to MU8.
[0026]
Here, in general, the human audible range is 20 Hz to 20000 Hz, whereas the human voice is
concentrated in the band of 400 Hz to 4000 Hz, which is a relatively low-frequency (bass) part
of the audible range.
[0027]
Therefore, in the sound emission and collection device 1, for example, logarithmic values of
signal power in the frequency band of 400 Hz to 4000 Hz, which is the above-described bass
component, are used in the logarithmic calculation units L1 to L8.
In this way, it is possible to use frequency components containing many human voices for
speaker orientation detection. Therefore, the speaker direction can be detected more accurately.
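The text does not say how the 400 Hz to 4000 Hz band is isolated. One possible approach,
sketched here purely as an assumption, is to sum the squared DFT magnitudes of the bins that
fall inside the band. The sample rate, window length, and function names are illustrative.

```python
import cmath
import math

def band_power(samples, sample_rate, f_lo=400.0, f_hi=4000.0):
    """Sum |X[m]|^2 over one-sided DFT bins whose frequency lies in [f_lo, f_hi]."""
    n = len(samples)
    total = 0.0
    for m in range(n // 2 + 1):
        freq = m * sample_rate / n
        if f_lo <= freq <= f_hi:
            x_m = sum(s * cmath.exp(-2j * math.pi * m * i / n)
                      for i, s in enumerate(samples))
            total += abs(x_m) ** 2
    return total

def tone(freq, sample_rate, n):
    """Generate n samples of a unit-amplitude sine wave."""
    return [math.sin(2.0 * math.pi * freq * i / sample_rate) for i in range(n)]

fs, n = 16000, 160  # assumed values; bin spacing is fs / n = 100 Hz
print(band_power(tone(1000.0, fs, n), fs))  # large: 1000 Hz is inside the band
print(band_power(tone(100.0, fs, n), fs))   # close to zero: 100 Hz is outside the band
```

A direct DFT is slow for long windows; a real device would more likely use a band-pass filter
or an FFT, but the selection of in-band energy is the same idea.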
[0028]
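Equation (1) is not reproduced in this text. One form consistent with the symbol definitions
given for it (x k, P k, t and T), assuming a base-10 logarithm and a plain sum of squared
samples over the window, neither of which the text confirms, would be:

```latex
P_k = \log_{10} \sum_{t=0}^{T-1} x_k(t)^2 \qquad (1)
```

Any scaling factor or normalisation by T in the original equation would only add a constant
common to all k, which cancels when the average is subtracted later.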
[0029]
Here, x k indicates the signal level of the combined signal SAk (SA1 to SA8), and P k indicates the
logarithmic value of the signal level (power level) of the power signal SBk (SB1 to SB8) derived
from the combined signal SAk.
The subscript k, from 1 to 8, indicates which of the microphone units MU1 to MU8 output the
combined signal. t denotes time, and T is determined by the sampling time length of the
combined signal SAk.
[0030]
Then, the logarithmic calculators L1 to L8 output the logarithmic value P k of the power level
calculated by the above equation (1) (see FIG. 2).
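As a rough illustration of what a logarithmic calculator might compute, the sketch below takes
the base-10 logarithm of the summed squared samples in one window. The exact form of
equation (1) is not available in this text, so the base, the lack of normalisation, and the
small floor that avoids log(0) are all assumptions.

```python
import math

def log_power(samples, floor=1e-12):
    """Return log10 of the summed squared samples (an assumed form of P k).

    The floor keeps the logarithm finite when the window is silent.
    """
    power = sum(s * s for s in samples)
    return math.log10(max(power, floor))

print(log_power([1.0] * 100))  # 100 unit samples: power 100
print(log_power([0.0] * 100))  # silent window: clamped to the floor
```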
[0031]
The adder 10 and the amplifier unit 11 calculate the power level average value AV from the
logarithmic value P k of the power level based on the equation (2).
More specifically, the adder 10 calculates the sum of the logarithmic values P k of the power
levels and outputs the sum to the amplifier unit 11. The amplifier unit 11 calculates the power
level average value AV by dividing the sum of the logarithmic values P k of the power levels by
the number N (N = 8 in the present embodiment) of the combined signals SAk.
[0032]
$$ AV = \frac{1}{N} \sum_{k=1}^{N} P_k \qquad (2) $$
[0033]
The subtractors SR1 to SR8 respectively subtract the power level average value AV from the
logarithmic value P k of the power level to generate a differential signal level D k (see the
following equation (3)).
[0034]
$$ D_k = P_k - AV \qquad (3) $$
[0035]
Here, D k indicates a differential signal level.
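The averaging and subtraction steps can be sketched as follows. The function name and the
example values are illustrative; only the arithmetic (divide the sum by N, then subtract the
average from each P k) comes from the description.

```python
def difference_levels(log_powers):
    """Subtract the average of the logarithmic power levels from each level."""
    average = sum(log_powers) / len(log_powers)  # adder 10 and amplifier unit 11
    return [p - average for p in log_powers]     # subtractors SR1 to SR8

# Hypothetical levels: one area is 1.0 above the other seven.
levels = [2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
print(difference_levels(levels))
# [0.875, -0.125, -0.125, -0.125, -0.125, -0.125, -0.125, -0.125]
```

By construction the difference levels always sum to zero, so a single loud area stands out as
the only clearly positive value.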
[0036]
The maximum value detection unit 12 detects a difference signal level D kM indicating the
maximum value from among the difference signal levels D k and outputs the difference signal
level D kM to the comparator 14 (see FIG. 2).
[0037]
The comparator 14 compares the threshold value Th with the differential signal level D kM
indicating the maximum value output from the maximum value detection unit 12.
Then, when the difference signal level D kM is larger than the threshold value Th, the difference
signal level D kM is output to the control unit 20.
Note that the threshold Th is a level at which it can be determined that a user of the device
has spoken and the utterance has been picked up. It is set from the difference signal level
obtained when the picked-up voice level is higher than the emitted sound level by a
predetermined amount.
On the other hand, when the difference signal level D kM is equal to or less than the threshold
value Th, the comparator 14 does not output the difference signal level D kM to the control unit
20.
As a result, when a speaker in any one of the sound collection areas MA1 to MA8 utters a voice
that is louder than the emitted sound by a certain amount, the difference signal level D kM of
the sound collection area in which the speaker spoke can be used for speaker direction detection.
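The maximum value detection and the comparison against Th might look like the sketch below.
The threshold of 0.5 and the sample levels are hypothetical, and returning None stands in for
"the comparator outputs nothing".

```python
def detect_maximum(diff_levels):
    """Return (index, value) of the largest difference signal level."""
    k_max = max(range(len(diff_levels)), key=lambda k: diff_levels[k])
    return k_max, diff_levels[k_max]

def comparator(diff_levels, threshold):
    """Pass the maximum on only when it is larger than the threshold."""
    k_max, d_max = detect_maximum(diff_levels)
    return (k_max, d_max) if d_max > threshold else None

TH = 0.5  # hypothetical threshold value
print(comparator([0.875, -0.125, -0.125, -0.125, -0.125, -0.125, -0.125, -0.125], TH))
# (0, 0.875)
print(comparator([0.25, -0.25, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], TH))
# None
```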
[0038]
When the control unit 20 receives the difference signal level D kM from the comparator 14, it
outputs, as speaker direction information, the direction information associated with the
microphone unit, among the microphone units MU1 to MU8, that output the difference signal
level D kM.
The control unit 20 then maintains the detected speaker direction until it newly receives a
difference signal level D kM exceeding the threshold value Th from the comparator 14.
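The hold behaviour of the control unit can be sketched as a small state holder. The class name,
the (index, level) tuple format, and the mapping of unit index k to an azimuth of k times 45
degrees are assumptions made for illustration only.

```python
class ControlUnit:
    """Keeps the last detected speaker direction until a new detection arrives."""

    def __init__(self, azimuths):
        self.azimuths = azimuths  # direction information per microphone unit
        self.direction = None     # nothing detected yet

    def update(self, comparator_output):
        """comparator_output is (unit_index, level), or None when nothing was output."""
        if comparator_output is not None:
            unit_index, _level = comparator_output
            self.direction = self.azimuths[unit_index]
        return self.direction

unit = ControlUnit([k * 45.0 for k in range(8)])
print(unit.update(None))        # None
print(unit.update((2, 0.875)))  # 90.0
print(unit.update(None))        # 90.0 (held)
print(unit.update((5, 0.75)))   # 225.0
```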
[0039]
Thus, even if the speakers SP1 and SP2 emit sound based on the sound emission signal S, it is
possible to accurately detect the speaker direction based on the combined signal SAk output
from the microphone units MU1 to MU8.
[0040]
In the sound emission and collection device 1 according to the present embodiment, the example
in which the difference signal level D kM and the threshold value Th are compared in the
comparator 14 has been described.
However, the present invention is not limited to this. For example, instead of using the
comparator 14, it is also conceivable to output the difference signal level D kM indicating the
maximum value directly to the control unit 20 at predetermined time intervals to detect the
speaker direction.
[0041]
Here, as another method of detecting the speaker direction, it is also conceivable to compare the
signal level of the sound emission signal S with the signal levels x k of the combined signals SA1
to SA8 and detect the speaker position based on the difference between the two. In this case,
however, the value of the sound emission signal S is 0 when no sound is being emitted, so a
calculation that uses the level of the sound emission signal, which is 0, as the reference level
produces a large calculation error and may cause problems in signal processing.
In addition, since the noise characteristics differ between the sound emission signal and
the sound collection signal, it is difficult to detect the speaker direction accurately by simply
comparing the two.
[0042]
On the other hand, in the sound emission and collection device 1, as shown in equation (3), the
power level average value AV of the logarithmic values is subtracted from the logarithmic value P
k of the power level to calculate the difference signal level D k. Therefore, it is possible to
calculate the difference signal level D k without directly using the signal level of the sound
emission signal S in the calculation formula, and the speaker direction can be detected
with high accuracy based only on the signal levels x k of the combined signals SA1 to SA8.
Further, in Equation (3), by using the logarithmic value, the difference signal level D k can be
calculated as the difference between the logarithmic value P k of the power level and the power
level average value AV. Therefore, the threshold value Th can be set as a fixed value, and there is
also an effect that the speaker direction can be detected using the threshold value Th which is
the fixed value.
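One way to see why a fixed threshold is plausible: suppose, as an illustrative assumption, that
the power of every combined signal is scaled by the same factor g > 0 (for example, the overall
sound level in the room rises uniformly). Each logarithmic value then shifts by the same
constant, and the shift cancels in the subtraction:

```latex
\begin{aligned}
P_k' &= P_k + \log g \\
AV'  &= \frac{1}{N}\sum_{k=1}^{N}\left(P_k + \log g\right) = AV + \log g \\
D_k' &= P_k' - AV' = P_k - AV = D_k
\end{aligned}
```

The difference signal level therefore depends only on how the areas compare with one another,
not on the absolute level, so the same Th can be applied at any overall loudness.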
[0043]
In the present embodiment, an example in which the threshold value Th is fixed has been described.
However, the present invention is not limited to this. For example, it is also conceivable to store a
plurality of threshold values in the comparator 14. In this case, the threshold Th can be switched
according to the use environment of the sound emission and collection device 1.
[0044]
Next, a specific example of the speaker direction detection of the sound emission and collection
device 1 will be described with reference to FIG.
[0045]
FIG. 3A is a diagram showing the level of the sound emission signal S and the change in the level
W k of the utterance sound (speaker speech) in each sound collection region.
Further, FIG. 3B is a diagram showing changes in the logarithmic value P k of the power level
and the power level average value AV. FIG. 3C is a view schematically showing the threshold
value Th and the differential signal level D k. Note that, in FIG. 3, the subscript i indicates a sound
collection area where the logarithmic value P k of the power level is the largest value among the
sound collection areas MA1 to MA8. On the other hand, the subscript j indicates a sound
collection area other than the subscript i. In FIG. 3, P j shows only one output for simplicity.
[0046]
Time zone I in FIG. 3 schematically shows the state of each signal level when no sound is
emitted from the speakers SP1 and SP2 and no one in the sound collection areas MA1 to MA8 is
speaking. In this case, as shown in FIG. 3C, since both of the
difference signal levels D i and D j are smaller than the threshold value Th, the control unit 20
does not set a new speaker direction.
[0047]
Time zone II in FIG. 3 schematically shows the state of each signal level when a speaker utters
in one of the sound collection areas MA1 to MA8 (the area corresponding to i) and no sound is
emitted from the speakers SP1 and SP2.
[0048]
In this case, as shown in FIG. 3C, the differential signal level D i becomes larger than the
threshold value Th, and the other differential signal levels D j become smaller than the threshold
value Th.
Therefore, the control unit 20 sets the speaker orientation to the orientation of the microphone
unit indicated by the subscript i.
[0049]
Time zone III in FIG. 3 schematically shows the state of each signal level when a speaker utters
in one of the sound collection areas MA1 to MA8 (the area corresponding to i), sound is emitted
from the speakers SP1 and SP2, and the utterance level is substantially the same as the collected
level of the wraparound emitted sound. In this case, as shown in FIG. 3C, the difference signal
level D i is smaller than the threshold value Th. Therefore, the control unit 20 does not update
the speaker direction. That is, the speaker direction set in the immediately preceding time zone II
is maintained.
[0050]
Time zone IV in FIG. 3 shows the state of each signal level when sound is emitted from the
speakers SP1 and SP2 but a speaker in one of the sound collection areas MA1 to MA8 (the area
corresponding to i) is speaking in a voice that is somewhat louder than the emitted sound.
[0051]
In this case, as shown in FIG. 3C, the differential signal level D i becomes larger than the
threshold value Th, and the other differential signal levels D j become smaller than the threshold
value Th.
Therefore, the control unit 20 sets the speaker orientation to the orientation of the microphone
unit indicated by the subscript i.
[0052]
By performing such processing, it is possible to reliably detect the speaker direction regardless of
the sound emission state of the speakers SP1 and SP2. Also, even when the speaker direction
cannot be detected because of the level of the sound emitted from the speakers, the previous
speaker direction is maintained, so the speaker direction is neither lost nor changed at random,
and the direction in which the speaker is most likely to be located can be retained.
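The four time zones can be walked through with made-up numbers. This sketch assumes that
background noise, wraparound sound, and speech powers simply add in each area, that the
wraparound is identical in all eight areas, and that Th is 0.5 on a base-10 logarithmic scale.
None of these values come from the specification.

```python
import math

def difference_levels(powers):
    """Log10 of each power minus the average of those logarithms."""
    logs = [math.log10(p) for p in powers]
    average = sum(logs) / len(logs)
    return [p - average for p in logs]

def run_zones(zones, threshold, n_units=8, noise=1.0):
    """Return one (detected, held_direction_index) pair per time zone."""
    direction = None
    history = []
    for talker, speech, wraparound in zones:
        powers = [noise + wraparound] * n_units  # equal wraparound in every area
        if talker is not None:
            powers[talker] += speech             # speech only in the talker's area
        diffs = difference_levels(powers)
        k_max = max(range(n_units), key=lambda k: diffs[k])
        detected = diffs[k_max] > threshold
        if detected:
            direction = k_max                    # otherwise the old value is kept
        history.append((detected, direction))
    return history

zones = [
    (None, 0.0, 0.0),    # I:   no emission, nobody speaking
    (2, 100.0, 0.0),     # II:  speech in area 2, no emission
    (2, 100.0, 100.0),   # III: speech about as loud as the wraparound
    (2, 1000.0, 100.0),  # IV:  speech clearly louder than the wraparound
]
print(run_zones(zones, threshold=0.5))
# [(False, None), (True, 2), (False, 2), (True, 2)]
```

In zone III the maximum difference level is about 0.26, below the assumed threshold, so the
direction found in zone II is held rather than replaced.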
[0053]
In the above embodiment, an example has been described in which the microphone units MU1 to MU8
are arranged in an octagon so as to have eight-fold rotational symmetry about the speakers SP1
and SP2. However, the present invention is not limited to this. It is sufficient that the
wraparound of the emitted sound from the speakers is equal for all the microphone units. For
example, as long as the respective sound collection areas are formed in rotational symmetry about
the speakers SP1 and SP2, the microphone units may be arranged in an equilateral triangle. In
this case, the sound collection areas picked up by the microphone units can be formed in
three-fold rotational symmetry, so the same effect as in the above embodiment can be obtained.
[0054]
In the above embodiment, an example in which the sound collection areas MA1 to MA8 are
formed to be rotationally symmetric around the speakers SP1 and SP2 has been described.
However, the present invention is not limited to this. For example, as long as the wraparound of
the emitted sound from the speakers within a predetermined sampling time width is equal for all
the microphone units that collect sound, the microphone units that collect sound may be switched
on and off every predetermined sampling time width, or the shape of each sound collection area
may be changed. Also in this case, the same effect as that of the above embodiment can be
obtained.
[0055]
Also, when the sound emission characteristics (directivity) of the speakers SP1 and SP2 are
variable, the sound collection directivity of each microphone unit may be controlled in
accordance with the change so that the same level of wraparound is obtained for all the
microphone units. That is, the mechanical positional relationship is not particularly limited as
long as the wraparound sound levels in all the microphone units are the same.
[0056]
It should be understood that the above description of the embodiment is illustrative in all points
and not restrictive. The scope of the present invention is indicated not by the embodiments
described above but by the claims. Further, the scope of the present invention is intended to
include all modifications within the scope and meaning equivalent to the claims.
[0057]
FIG. 1 is a diagram schematically showing, in plan view, the positional relationship between the
speakers and the microphone units of the sound emission and collection device according to an
embodiment of the present invention, together with each sound collection area.
FIG. 2 is a diagram schematically showing the flow of speaker direction detection in the sound
emission and collection device shown in FIG. 1.
FIG. 3A is a diagram showing changes in the level of the sound emission signal S and the level
Wk of the utterance sound (speaker speech) in each sound collection area, FIG. 3B is a diagram
showing changes in the logarithmic value Pk of the power level and the power level average value
AV, and FIG. 3C is a diagram schematically showing the threshold value Th and the difference
signal level Dk.
Explanation of symbols
[0058]
1 sound emission and collection device
10 adder
11 amplifier unit
12 maximum value detection unit
14 comparator
20 control unit
AV power level average value
Dk, DkM difference signal level
F1 to F32 linear filters
L1 to L8 logarithmic calculators
MA1 to MA8 sound collection areas
MU1 to MU8 microphone units
P k logarithmic value of power level
S sound emission signal
SAk combined signal
SBk power signal
SP1, SP2 speakers
SR1 to SR8 subtractors
SU1 to SU8 adders