JP2008259022

Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2008259022
The present invention provides a sound emission and collection device that accurately
determines only the voice of a speaker even if it has an integrated speaker and microphone
structure, and that does not erroneously determine a non-stationary sound from a distance to be
the voice of the speaker. A signal difference circuit outputs difference signals MS1 to MS4
between sound collection beams MB11 to MB14 and sound collection beams MB21 to MB24. The level
comparator 195 selects the difference signal with the maximum level. The waveform shaping
circuit 197 outputs a valid sound detection signal to the control unit 10 if the difference
signal at the maximum level is equal to or higher than a predetermined threshold. The control
unit 10 sets the voice signal to be output to the network only while the valid sound detection
signal is input. [Selected figure] Figure 5
Sound emission device
[0001]
The present invention relates to a sound emission and collection device used for an audio
conference or the like held between a plurality of points via a network or the like, and more
particularly to a sound emission and collection device in which a microphone and a speaker are
arranged relatively close to each other.
[0002]
Description of the Related Art
Conventionally, as a method of conducting an audio conference
between remote places, a method of installing a sound emitting and collecting device at each
04-05-2019
1
point where an audio conference is to be performed and connecting these devices by a network
to communicate audio signals is widely used.
Such an audio conference apparatus is equipped with a function of determining the presence or
absence of sound from the signal level of the sound collected by the microphone and not
transmitting the audio information when there is no sound. By not transmitting voice information
at the time of silence, the amount of information to be sent can be reduced.
[0003]
However, in the above-described voice conference apparatus, when there is a high-level
background sound (noise of an air conditioner or the like), sound is always determined to be
present. Also, when the speaker's speech level is low and the level of the input speech signal
is therefore low, the input is always judged to be silent. Therefore, an apparatus has been
proposed that evaluates the power of the input speech signal relative to the level of the
background sound, so that the speaker's speech input is accurately detected even if the
background sound is loud or the speaker's speech level is low (see, for example, Patent
Document 1).
Patent Document 1: Japanese Patent Application Laid-Open No. 11-205460
[0004]
However, in the device of Patent Document 1, there is a problem that when the wraparound
voice from the speaker is input to the microphone, this is erroneously determined as the voice of
the speaker. Further, in the device of Patent Document 1, although it is possible to exclude a
steady background sound, for example, there is a problem that a non-stationary sound input from
a distance is erroneously determined as the voice of the speaker.
[0005]
An object of the present invention is to provide a sound emission and collection device that
accurately determines only the voice of a speaker even if it has an integrated speaker and
microphone structure, and that does not erroneously determine a non-stationary sound from a
distance to be the voice of the speaker.
[0006]
The sound emission and collection device of the present invention comprises: a speaker that
emits sound at a sound pressure symmetrical with respect to a predetermined reference plane; a
first microphone group that collects sound on one side of the predetermined reference plane,
and a second microphone group that collects sound on the other side; sound collection beam
generating means for generating the sound collection beams of a first sound collection beam
group, based on the collected signals of the first microphone group, and the sound collection
beams of a second sound collection beam group, based on the collected signals of the second
microphone group, symmetrically with respect to the predetermined reference plane; difference
signal generating means for generating a difference signal between each sound collection beam
of the first sound collection beam group and the sound collection beam of the second sound
collection beam group symmetrical to it; signal comparison means for comparing the signal
levels of the difference signals and selecting the difference signal of maximum level;
difference signal level detection means for outputting a valid sound detection signal when the
absolute level of the maximum-level difference signal is equal to or higher than a
predetermined threshold; and sound collection beam output means for outputting the
maximum-level difference signal selected by the signal comparison means only when the
difference signal level detection means outputs the valid sound detection signal.
[0007]
In this configuration, each sound collection beam of the first sound collection beam group and
the corresponding sound collection beam of the second sound collection beam group are
symmetrical with respect to the reference plane, so the wraparound sound components contained
in a plane-symmetrical pair of sound collection beams have the same magnitude in the direction
perpendicular to the reference plane.
The sound collection beams of each plane-symmetrical pair are subtracted from each other to
generate a difference signal, and the signal levels of these difference signals are compared.
As a result, the difference signal with the maximum level is selected as the output signal.
Furthermore, it is detected whether or not the absolute level of the maximum-level difference
signal is equal to or higher than a predetermined threshold, and the output signal is output to
the outside only when the absolute level is equal to or higher than the predetermined
threshold.
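As an illustrative sketch only (it is not part of the patent text), the cancellation described
above can be mimicked in Python with made-up sample values. The function names
difference_signal and max_abs_level, the sample values, and the threshold of 0.1 are all
assumptions introduced for illustration, and the wraparound component is idealized as exactly
identical in both beams.

```python
def difference_signal(beam_a, beam_b):
    """Sample-wise difference of two plane-symmetrical sound collection beams."""
    return [a - b for a, b in zip(beam_a, beam_b)]


def max_abs_level(signal):
    """Largest absolute sample value, used here as the 'absolute level'."""
    return max(abs(s) for s in signal)


# Hypothetical samples: the wraparound sound reaches both beams equally,
# while the talker's voice is present only on one side of the reference plane.
wraparound = [0.5, -0.4, 0.3, -0.2]
voice = [0.2, 0.1, -0.3, 0.25]
beam_talker_side = [v + w for v, w in zip(voice, wraparound)]
beam_opposite_side = list(wraparound)

# The common wraparound component cancels, leaving (approximately) the voice.
diff = difference_signal(beam_talker_side, beam_opposite_side)

THRESHOLD = 0.1  # assumed value for the predetermined threshold
is_valid = max_abs_level(diff) >= THRESHOLD
```

In a real device the two wraparound components would match only approximately, so the
difference would contain a small residual rather than cancelling exactly.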
[0008]
Further, in the sound emission and collection device according to the present invention, the
first microphone group and the second microphone group are microphone arrays in which a
plurality of microphones are arranged in a straight line along the predetermined reference
plane.
[0009]
In this configuration, because the microphone arrays are configured along the predetermined
reference plane, generating a sound collection beam from the sound collection signals of the
microphones requires only simple signal processing, such as delay processing of each sound
collection signal.
[0010]
Further, in the sound emission and collection device of the present invention, the difference
signal level detection means stops outputting the valid sound detection signal when a state in
which the absolute level of the maximum-level difference signal is less than the predetermined
threshold continues for a predetermined time or more.
[0011]
In this configuration, the output signal is no longer output to the outside when a state in
which the absolute level of the maximum-level difference signal is less than the predetermined
threshold continues for a predetermined time or more.
Because signal output is stopped not immediately when the level falls below the predetermined
threshold but only when that state has continued for the predetermined time or more, a series
of conversation containing short pauses can be output continuously.
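The hold behaviour described above could be sketched as follows. This is a minimal
illustration under stated assumptions, not the patent's implementation: the class name
ValidSoundDetector, the frame-based counting, and the parameter values are hypothetical, and
the predetermined time is expressed as a number of consecutive frames.

```python
class ValidSoundDetector:
    """Keeps reporting valid sound until the level has stayed below the
    threshold for hold_frames consecutive frames (assumed frame-based timing)."""

    def __init__(self, threshold, hold_frames):
        self.threshold = threshold
        self.hold_frames = hold_frames
        # Start in the "no valid sound" state.
        self.below_count = hold_frames

    def update(self, level):
        """Feed one level value; return True while valid sound is reported."""
        if abs(level) >= self.threshold:
            self.below_count = 0
        else:
            self.below_count = min(self.below_count + 1, self.hold_frames)
        return self.below_count < self.hold_frames


detector = ValidSoundDetector(threshold=0.1, hold_frames=3)
levels = [0.5, 0.0, 0.0, 0.0, 0.0, 0.5]
flags = [detector.update(x) for x in levels]
# flags == [True, True, True, False, False, True]
```

The two frames of silence after the first loud frame do not interrupt the output; only the
third consecutive quiet frame switches it off.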
[0012]
The sound emission and collection device of the present invention comprises: a speaker that
emits sound at a sound pressure symmetrical with respect to a predetermined reference plane;
first sound collecting means for collecting the sound of a predetermined area on one side of
the predetermined reference plane as a first sound collection signal; second sound collecting
means for collecting, as a second sound collection signal, the sound of the area on the other
side that is symmetrical to the predetermined area with respect to the predetermined reference
plane; difference signal generating means for generating a difference signal between the first
sound collection signal and the second sound collection signal; difference signal level
detection means for outputting a valid sound detection signal when the absolute level of the
difference signal is equal to or higher than a predetermined threshold; and control means for
outputting the difference signal only while the difference signal level detection means is
outputting the valid sound detection signal.
[0013]
In this configuration, the first sound collection signal and the second sound collection signal
are audio signals collected from areas symmetrical with respect to the reference plane, so the
wraparound sound components of the two plane-symmetrical sound collection signals have the
same magnitude in the direction perpendicular to the reference plane.
A difference signal is generated by subtracting one of the plane-symmetrical sound collection
signals from the other.
Furthermore, it is detected whether or not the absolute level of the maximum-level difference
signal is equal to or higher than a predetermined threshold, and the difference signal is
output to the outside only when the absolute level is equal to or higher than the predetermined
threshold.
To pick up sound from areas symmetrical with respect to the reference plane, it is sufficient
to use two unidirectional microphones whose directivity axes are directed symmetrically with
respect to the reference plane. Alternatively, microphone arrays each composed of a plurality
of microphones may be arranged symmetrically with respect to the reference plane to generate
two sound collection beams symmetrical with respect to the reference plane.
[0014]
Further, in the sound emission and collection device of the present invention, the first sound
collection means and the second sound collection means each comprise a microphone array in
which a plurality of microphones are arranged in a straight line along the predetermined
reference plane, and sound collection beam generation means that sets, for each microphone
array, a virtual focus on the side opposite to the area to be collected, and delays and
synthesizes the audio signals collected by the respective microphones so that their distances
to the virtual focus become equal, thereby collecting the first sound collection signal and the
second sound collection signal from areas symmetrical with respect to the predetermined
reference plane.
[0015]
In the present invention, the microphone array is configured along a predetermined reference
plane.
A focus is set behind the microphone array, and sound arriving as a wavefront that converges on
this focus is picked up. Here, “delaying so that the distances to the virtual focus become
equal” refers to the following process: the microphones of the microphone array are at
different distances from the virtual focus, so the collected sound signals of the microphones
farther from the virtual focus are delayed, and the signals are synthesized with the timing
they would have if all the microphones were arranged equidistant from the virtual focus. By
performing such processing, the area sandwiched between the two half lines extending from the
virtual focus through both ends of the microphone array becomes the sound collection area. By
making these sound collection areas symmetrical with respect to the reference plane, the first
sound collection signal and the second sound collection signal are collected from areas
symmetrical with respect to the predetermined reference plane.
[0016]
Further, in the sound emission and collection device of the present invention, the difference
signal level detection means stops outputting the valid sound detection signal when a state in
which the absolute level of the maximum-level difference signal is less than the predetermined
threshold continues for a predetermined time or more.
[0017]
According to the present invention, difference signals are generated by subtracting the
plane-symmetrical sound collection signals from each other, and it is detected whether or not
the absolute level of the maximum-level difference signal is equal to or higher than a
predetermined threshold. By outputting the difference signal to the outside only when it is
equal to or higher than the threshold, only the voice of the speaker (valid voice) is
accurately determined even in a configuration in which the speaker and microphones are
integrated, and only the valid voice can be output to the outside.
[0018]
A sound emission and collection device according to a first embodiment of the present invention
will be described with reference to the drawings.
FIG. 1 is a plan view showing the arrangement of microphones and speakers of the sound
emission and collection device 1 according to the present embodiment.
The sound emission and collection device 1 of the present embodiment includes a plurality of
speakers SP1 to SP3 and a plurality of microphones MIC11 to MIC17 and MIC21 to MIC27 in a
housing 101.
[0019]
The housing 101 has a substantially rectangular shape elongated in one direction. Legs (not
shown) are installed at both ends of the long sides (faces) of the housing 101 so that the
lower surface of the housing 101 is separated from the installation surface by a predetermined
distance. In the following description, of the four side surfaces of the housing 101, the
longer ones are referred to as long surfaces and the shorter ones as short surfaces.
[0020]
On the lower surface of the housing 101, omnidirectional single speakers SP1 to SP3 having the
same shape are installed. The single speakers SP1 to SP3 are installed in a line at regular
intervals along the longitudinal direction. The straight line connecting the centers of the
single speakers SP1 to SP3 runs along the long surfaces of the housing 101, and its horizontal
position is set to coincide with a central axis 100 connecting the centers of the short
surfaces. That is, the straight line connecting the centers of the speakers SP1 to SP3 lies on
a vertical reference plane including the central axis 100. The speaker array SPA10 is
configured by arranging the single speakers SP1 to SP3 in this way. In this state, when sound
is emitted from the single speakers SP1 to SP3 of the speaker array SPA10, the emitted sound
propagates equally toward the two long surfaces. At this time, the emitted sound propagating
toward the two opposing long surfaces travels in mutually symmetrical directions orthogonal to
the reference plane.
[0021]
Microphones MIC11 to MIC17 of the same specification are installed on one long surface of the
housing 101. The microphones MIC11 to MIC17 are linearly arranged at regular intervals along
the longitudinal direction, and thereby the microphone array MA10 is configured. Further, on the
other long surface of the housing 101, the microphones MIC21 to MIC27 of the same
specification are installed. The microphones MIC21 to MIC27 are also linearly arranged at
regular intervals along the longitudinal direction, and the microphone array MA20 is thus
configured. The microphone array MA10 and the microphone array MA20 are arranged such that the
vertical positions of their arrangement axes coincide with each other. Furthermore, the
microphones MIC11 to MIC17 of the microphone array MA10 and the microphones MIC21 to MIC27 of
the microphone array MA20 are disposed at mutually symmetrical positions with respect to the
reference plane. Specifically, for example, the microphone MIC11
and the microphone MIC21 are symmetrical with respect to the reference plane, and similarly,
the microphone MIC17 and the microphone MIC27 are symmetrical.
[0022]
In the present embodiment, the number of speakers in the speaker array SPA10 is three and the
number of microphones in each of the microphone arrays MA10 and MA20 is seven. However, the
numbers of speakers and microphones are not limited to these and may be set as appropriate. In
addition, the speaker intervals of the speaker array and the microphone intervals of the
microphone arrays need not be constant. For example, the elements may be arranged densely in
the central portion along the longitudinal direction and more sparsely toward both ends.
[0023]
The sound emission and collection device of the present embodiment forms the sound collection
directivity of the entire microphone array into a beam by delaying and synthesizing the sounds
collected by the respective microphones. In the direction in which the sound collection beam is
aimed, the voice of the speaker generated in a specific spot or area is collected with high
gain, and sound other than the speaker's voice (noise) is suppressed. Directivity formed into a
beam in this way is called a sound collection beam.
[0024]
In the sound emission and collection device of this embodiment, the above-described beam
forming of sound collection directivity can be performed in two modes. FIG. 2 is a diagram for
explaining these two modes. FIG. 2A is a diagram explaining the spot sound collection mode,
which is the first mode. FIG. 2B is a diagram explaining the area sound collection mode, which
is the second mode.
[0025]
In FIG. 2A, the spot sound collection mode is a mode in which a sound collection beam is formed
so as to focus on a point (sound collection spot) to be collected, and sound in a narrow range
is collected with high gain. Here, the sound collection spots P1 to P4 are set to, for example,
the seating positions of the meeting attendees. The audio signals collected by the microphones
MIC11 to MIC17 (or MIC21 to MIC27) are delayed so as to be equidistant from the focal point (F4
in the figure) and then synthesized, so that sound generated around the focal point can be
extracted with high gain.
[0026]
Here, the equal distance from the focal point means that the sum of the physical distance from
the focal point to the microphone and the distance obtained by multiplying the delay time of the
sound signal collected by the microphone by the speed of sound is equal for each of the
microphones.
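The equal-distance condition above can be written as d_i + c * t_i = constant, where d_i is
the physical distance from the focal point to microphone i, t_i is its delay, and c is the
speed of sound. The following Python sketch computes such delays; the function name
spot_delays, the microphone coordinates, the focal point, and the value used for the speed of
sound are assumptions chosen for illustration and do not come from the patent.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def spot_delays(mic_positions, focus):
    """Delay (in seconds) for each microphone so that physical distance plus
    delay * speed of sound is the same for every microphone."""
    distances = [math.dist(m, focus) for m in mic_positions]
    farthest = max(distances)
    # The farthest microphone gets no delay; nearer ones wait for it.
    return [(farthest - d) / SPEED_OF_SOUND for d in distances]


# Hypothetical geometry: seven microphones 10 cm apart on the x axis,
# with a sound collection spot 1 m in front of the array.
mics = [(0.1 * i, 0.0) for i in range(7)]
focus = (0.2, 1.0)
delays = spot_delays(mics, focus)

# Equivalent distance (physical distance + delay * c) for each microphone.
equivalent = [math.dist(m, focus) + SPEED_OF_SOUND * t for m, t in zip(mics, delays)]
```

All entries of equivalent come out equal (up to rounding), which is the condition stated in
the paragraph above.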
[0027]
In the spot sound collection mode, a plurality of sound collection spots are set in accordance
with the seating of the conference room and the like.
For example, as shown in FIG. 3A, microphone array MA10 forms sound collection beams MB11 to
MB14 directed to the respective sound collection spots in parallel, and microphone array MA20
likewise forms sound collection beams MB21 to MB24 directed to the respective sound collection
spots in parallel.
[0028]
Next, FIG. 2B is a diagram for explaining the area sound collection mode. In this mode, a
virtual focal point F10 is set behind the microphone array, and sound heading for the focal
point F10 is picked up by the microphone array. In this mode, the area sandwiched between the
two half lines R10 and R11, which extend from the virtual focal point F10 through the
microphones MIC11 and MIC17 at both ends of the microphone array, is the sound collection area.
Note that the virtual focal point in the area sound collection mode is not limited to the
position of F10, and is set according to the area where sound collection is desired. In FIG.
2B, the microphone closest to the virtual focal point F10 is the microphone MIC11, and its
distance is L11. The distances L12 to L17 from the other microphones MIC12 to MIC17 to the
focal point F10 are longer than L11. Therefore, a delay corresponding to the difference between
each of L12 to L17 and L11 is added to the signals collected by the microphones MIC12 to MIC17,
so that the virtual distance from each of the microphones MIC12 to MIC17 to the focal point
becomes equal to the distance L11 between the microphone MIC11 and F10. As a result, the sounds
arriving from the sound collection area, after being picked up by the respective microphones,
are time-aligned by these delays and synthesized at substantially the same timing, which raises
their level.
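A minimal Python sketch of this delay-and-synthesize step is given below. It is illustrative
only: the function names, the sample rate, the speed of sound, the coordinates, and the
rounding of delays to whole samples are all assumptions, not details from the patent.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed
SAMPLE_RATE = 16000     # Hz, assumed


def area_delays_in_samples(mic_positions, virtual_focus):
    """Delay per microphone, in whole samples, proportional to how much
    farther it is from the virtual focus than the nearest microphone."""
    distances = [math.dist(m, virtual_focus) for m in mic_positions]
    nearest = min(distances)
    return [round((d - nearest) / SPEED_OF_SOUND * SAMPLE_RATE) for d in distances]


def delay_and_sum(signals, delays):
    """Delay each microphone signal by its sample count and average them."""
    length = len(signals[0])
    out = [0.0] * length
    for sig, delay in zip(signals, delays):
        for n in range(delay, length):
            out[n] += sig[n - delay]
    return [v / len(signals) for v in out]


# Hypothetical geometry: seven microphones on the x axis, virtual focus
# behind the array (negative y) and closest to the first microphone.
mics = [(0.1 * i, 0.0) for i in range(7)]
virtual_focus = (-0.1, -0.2)
delays = area_delays_in_samples(mics, virtual_focus)

# Simulated wavefront converging on the focus: it reaches the microphones
# farthest from the focus first, so their impulses occur earlier.
max_delay = max(delays)
length = max_delay + 8
signals = []
for d in delays:
    sig = [0.0] * length
    sig[max_delay - d] = 1.0
    signals.append(sig)

beam = delay_and_sum(signals, delays)
```

After the delays are applied, the seven impulses line up at sample index max_delay, so the
averaged beam has a single full-height peak there.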
[0029]
In the area sound collection mode, the gain is not as high as in the above-mentioned spot sound
collection mode because the sound collection range is wider, but a wide area can be collected
at one time, and sound can be picked up properly without tracking the speaker when the speaker
moves. In this area sound collection mode, for example, as shown in FIG. 3B, the microphone
array MA10 forms a sound collection beam MB101 directed to the sound collection area in front
of the microphones MIC11 to MIC17, and the microphone array MA20 forms a sound collection beam
MB201 directed to the sound collection area in front of the microphones MIC21 to MIC27.
[0030]
Next, FIG. 4 is a block diagram showing the configuration of the sound emission and collection
device 1. As shown in FIG. 4, the sound emission and collection device 1 of the present
embodiment includes an operation unit 4, a control unit 10, an input/output connector 11, an
input/output I/F 12, a sound emission directivity control unit 13, a D/A converter 14, a sound
emission amplifier 15, the aforementioned speaker array SPA10 (speakers SP1 to SP3), the
aforementioned microphone arrays MA10 and MA20 (microphones MIC11 to MIC17, MIC21 to MIC27), a
sound collection amplifier 16, an A/D converter 17, sound collection beam generation units 181
and 182, a sound collection beam selection/correction unit 19, and an echo cancellation unit
20.
[0031]
The control unit 10 controls the sound emission and collection device 1 in an integrated
manner, and instructs the sound collection beam generation units 181 and 182 and the sound
collection beam selection/correction unit 19 to switch between the two sound collection modes
described above. The operation unit 4 receives an operation input from a user and outputs it to
the control unit 10. The user can use the operation unit 4 to issue an instruction to switch
between the two sound collection modes. Further, the control unit 10 sets whether or not to
output an audio signal to the input/output I/F 12 based on the signal level of each sound
collection beam detected by the sound collection beam selection/correction unit 19. That is,
the sound collection beam selection/correction unit 19 distinguishes valid sound from invalid
sound based on the signal levels of the respective sound collection beams, and the control unit
10 sets the audio signal to be output only when a determination result of valid sound is
obtained. Details will be described later.
[0032]
The input/output I/F 12 converts an input audio signal from another sound emission and
collection device, received through the input/output connector 11, from the data format
(protocol) corresponding to the network, and supplies it to the sound emission directivity
control unit 13 through the echo cancellation unit 20. Further, the input/output I/F 12
converts the output audio signal generated by the echo cancellation unit 20 into the data
format (protocol) corresponding to the network, and transmits it to the network through the
input/output connector 11.
[0033]
The sound emission directivity control unit 13 simultaneously supplies a sound emission signal
based on the input audio signal to the speakers SP1 to SP3 of the speaker array SPA10 if no
sound emission directivity is set. When a sound emission directivity, such as a virtual point
sound source setting, is designated, the sound emission directivity control unit 13 generates
individual sound emission signals by applying to the input audio signal the delay processing,
amplitude processing, and the like specific to each of the speakers SP1 to SP3 of the speaker
array SPA10, based on the designated sound emission directivity. The sound emission directivity
control unit 13 outputs these individual sound emission signals to the D/A converter 14
installed for each of the speakers SP1 to SP3. Each D/A converter 14 converts the individual
sound emission signal into analog form and outputs it to the corresponding sound emission
amplifier 15, and each sound emission amplifier 15 amplifies the individual sound emission
signal and supplies it to the corresponding one of the speakers SP1 to SP3.
[0034]
The speakers SP1 to SP3 emit the supplied individual sound emission signals to the outside.
Since the speakers SP1 to SP3 are installed on the lower surface of the housing 101, the
emitted sound is reflected by the surface of the desk on which the sound emission and
collection device 1 is installed, and propagates from the sides of the device obliquely upward
toward where the conference participants are. In addition, part of the emitted sound wraps
around from the bottom of the sound emission and collection device 1 to the sides where the
microphone arrays MA10 and MA20 are installed.
[0035]
The microphones MIC11 to MIC17 and MIC21 to MIC27 of the microphone arrays MA10 and MA20 may
be omnidirectional or directional, but are preferably directional. They pick up sound from
outside the sound emission and collection device 1 and output sound pickup signals to the
respective sound pickup amplifiers 16.
[0036]
At this time, owing to the configuration of the speaker array SPA10 and of the microphone
arrays MA10 and MA20, the wraparound sound from the single speakers SP1 to SP3 of the speaker
array SPA10 is collected equally by each microphone MIC1n (n = 1 to 7) of the microphone array
MA10 and the microphone MIC2n (n = 1 to 7) of the microphone array MA20 located at the position
symmetrical to it with respect to the reference plane.
[0037]
Each sound pickup amplifier 16 amplifies the sound pickup signal and supplies it to the A/D
converter 17, and the A/D converter 17 converts the sound pickup signal into a digital signal
and outputs it to the sound collection beam generation units 181 and 182.
The sound collection signals from the microphones MIC11 to MIC17 of the microphone array MA10
installed on one long surface are input to the sound collection beam generation unit 181, and
the sound collection signals from the microphones MIC21 to MIC27 of the microphone array MA20
installed on the other long surface are input to the sound collection beam generation unit 182.
[0038]
The sound collection beam generation unit 181 and the sound collection beam generation unit
182 perform delay processing on the sound signals collected by the microphones in order to
form either the spot type sound collection beams or the area type sound collection beam shown
in FIGS. 2 and 3, according to the sound collection mode specified by the control unit 10.
[0039]
The sound collection beam generation unit 181 performs predetermined delay processing and the
like on the sound collection signals of the microphones MIC11 to MIC17. In the spot sound
collection mode, it generates the sound collection beams MB11 to MB14, which are signals
emphasizing the sound coming from specific spots as described above.
In the area sound collection mode, it generates the sound collection beam MB101, which is a
signal emphasizing the sound coming from a specific area.
As shown in FIG. 3A, the sound collection beams MB11 to MB14 have, as their sound collection
beam regions (the specific spaces or directions emphasized by the sound collection beams),
mutually different regions of predetermined width along the long surface on the side where the
microphones MIC11 to MIC17 are installed. As shown in FIG. 3B, the sound collection beam MB101
has, as its sound collection beam region, an area (wide area) of predetermined width along the
long surface on the side where the microphones MIC11 to MIC17 are installed.
[0040]
The sound collection beam generation unit 182 performs predetermined delay processing and the
like on the sound collection signals of the microphones MIC21 to MIC27, and generates the sound
collection beams MB21 to MB24 in the spot sound collection mode. In the area sound collection
mode, it generates the sound collection beam MB201. As shown in FIG. 3A, the sound collection
beams MB21 to MB24 have, as their sound collection beam regions, mutually different regions of
predetermined width along the long surface on the side where the microphones MIC21 to MIC27 are
installed. As shown in FIG. 3B, the sound collection beam MB201 has, as its sound collection
beam region, an area (wide area) of predetermined width along the long surface on the side
where the microphones MIC21 to MIC27 are installed.
[0041]
At this time, the sound collection beam MB11 and the sound collection beam MB21 are formed
as beams symmetrical with respect to a vertical plane (reference plane) having the central axis
100. Similarly, the collected beam MB12 and the collected beam MB22, the collected beam
MB13 and the collected beam MB23, and the collected beam MB14 and the collected beam
MB24 are formed as symmetrical beams with respect to the reference plane. Also, the sound
collection beam MB101 and the sound collection beam MB201 are also formed as symmetrical
beams with respect to the reference plane.
[0042]
The sound collection beam selection/correction unit 19 calculates the difference signals
between the sound collection beams MB11 to MB14 and the sound collection beams MB21 to MB24
input in the spot sound collection mode, selects the difference signal with the highest level
among them, and outputs that difference signal to the echo cancellation unit 20 as a corrected
sound collection beam MB. In the area sound collection mode, the sound collection beam
selection/correction unit 19 calculates the difference signal between the input sound
collection beam MB101 and sound collection beam MB201, and outputs this difference signal to
the echo cancellation unit 20 as the corrected sound collection beam MB.
[0043]
Further, the sound collection beam selection/correction unit 19 determines, from the signal
level of the selected difference signal, whether the sound is valid or invalid, and outputs the
determination result to the control unit 10. It determines the sound to be valid if the
absolute value of the signal level of the difference signal is equal to or higher than a
predetermined threshold, and invalid if the state of being below the threshold continues for a
predetermined time or more.
[0044]
FIG. 5 is a block diagram showing the main configuration of the sound collection beam
selection/correction unit 19. The sound collection beam selection/correction unit 19 includes a
signal difference circuit 191, a BPF (band pass filter) 192, a full wave rectification circuit
193, a peak detection circuit 194, a level comparator 195, a signal selection circuit 196, a
waveform shaping circuit 197, and a subtractor 199.
[0045]
The signal difference circuit 191 calculates a difference between sound collection beams
symmetrical to the reference plane from the sound collection beams MB11 to MB14 and MB21
to MB24. Specifically, the difference between the sound collection beams MB11 and MB21 is
calculated to generate a difference signal MS1, and the difference between the sound collection
beams MB12 and MB22 is calculated to generate a difference signal MS2. Further, a difference
signal MS3 is generated by calculating a difference between the sound collection beams MB13
and MB23, and a difference signal MS4 is generated by calculating a difference between the
sound collection beams MB14 and MB24. Each of the difference signals MS1 to MS4 generated in
this manner is thus formed from a pair of sound collection beams that are symmetrical with
respect to the axis of the speaker array on the reference plane.
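As a non-limiting illustration, the pairing performed by the signal difference circuit 191 may be sketched as follows. The sketch assumes that each sound collection beam is available as a list of samples of equal length. The function name `difference_signals` and the sample values are hypothetical and do not appear in the embodiment.

```python
def difference_signals(beams_ma10, beams_ma20):
    """Return MS1..MS4 as sample-wise differences MB1x - MB2x.

    beams_ma10: beams MB11..MB14, each a list of samples (assumed layout)
    beams_ma20: beams MB21..MB24 in the same order, so that index x pairs
                the two beams that are symmetrical to the reference plane
    """
    if len(beams_ma10) != len(beams_ma20):
        raise ValueError("each beam needs a symmetrical counterpart")
    return [
        [a - b for a, b in zip(mb1x, mb2x)]
        for mb1x, mb2x in zip(beams_ma10, beams_ma20)
    ]


# Hypothetical samples: every beam picks up the same component,
# and only MB13 additionally contains speech.
common = [0.2, -0.1, 0.3]
speech = [0.5, 0.4, -0.6]
mb1 = [common, common, [c + s for c, s in zip(common, speech)], common]
mb2 = [common, common, common, common]
ms = difference_signals(mb1, mb2)  # ms[2] corresponds to MS3
```

In this example the component common to both beams of a pair cancels, so only the entry corresponding to MS3 retains the speech samples.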
[0046]
The BPF 192 is a band pass filter whose pass band covers the band in which the sound collection
beams mainly exhibit their beam characteristics and the main component band of the human
voice. It performs band pass filtering on the differential
signals MS1 to MS4, and outputs the result to the full wave rectification circuit 193. The full
wave rectification circuit 193 performs full wave rectification (absolute value conversion) on the
differential signals MS1 to MS4, and the peak detection circuit 194 performs peak detection on
the full wave rectified differential signals MS1 to MS4 and outputs peak value data Ps1 to Ps4.
The level comparator 195 compares the peak value data Ps1 to Ps4 and provides the signal
selection circuit 196 with selection instruction data for selecting the differential signal MS
corresponding to the peak value data Ps of the highest level. The level comparator 195 also
outputs the peak value data Ps of the highest level among the peak value data Ps1 to Ps4 to the
waveform shaping circuit 197.
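The chain from full wave rectification to level comparison may be sketched, by way of example only, as follows. The band pass filtering of the BPF 192 is omitted, and peak detection is simplified to the maximum over one block of samples. Both simplifications, as well as all function names and sample values, are assumptions made for this sketch.

```python
def full_wave_rectify(signal):
    """Absolute value conversion, as performed by circuit 193."""
    return [abs(x) for x in signal]


def peak_value(rectified):
    """Simplified peak detection: the largest sample in the block."""
    return max(rectified, default=0.0)


def select_difference_signal(diff_signals):
    """Return (index, highest peak, all peaks); index 0 stands for MS1."""
    peaks = [peak_value(full_wave_rectify(ms)) for ms in diff_signals]
    best = max(range(len(peaks)), key=peaks.__getitem__)
    return best, peaks[best], peaks


# Hypothetical blocks for MS1..MS4; MS3 carries the largest excursion.
example = [[0.0, 0.01], [0.02, -0.01], [0.5, -0.7], [0.0, 0.0]]
idx, peak, peaks = select_difference_signal(example)
```

Here `idx` plays the role of the selection instruction data given to the signal selection circuit 196, and `peak` plays the role of the peak value data Ps given to the waveform shaping circuit 197.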
[0047]
The waveform shaping circuit 197 determines whether the peak value data Ps input from the
level comparator 195 is equal to or more than a predetermined threshold or less than the
threshold. If the peak value data Ps is equal to or more than the threshold, it outputs a valid
sound detection signal to the control unit 10. If the value remains below the threshold for a
predetermined time (for example, 50 msec) or more, it outputs an invalid sound detection signal
to the control unit 10 or stops outputting the valid sound detection signal. Until the
predetermined time has elapsed, the waveform shaping circuit 197 continues to output the valid
sound detection signal even if the peak value data Ps is less than the threshold. The control unit
10 sets the input / output I / F 12 to output the voice signal only while the valid sound detection
signal is input, and sets it not to output the voice signal while the invalid sound detection
signal is input (that is, while the valid sound detection signal is not input).
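The threshold decision with a hold time described above may be sketched as follows. This is a minimal illustration under stated assumptions: the peak value data is delivered once per frame of known duration, and the detector starts in the invalid state. The class name `ValidSoundDetector`, the frame length of 10 msec and the peak values are hypothetical. Only the 50 msec hold time is taken from the example in the text.

```python
class ValidSoundDetector:
    """Frame-based sketch of the decision made by circuit 197."""

    def __init__(self, threshold, hold_time_ms=50.0):
        self.threshold = threshold
        self.hold_time_ms = hold_time_ms
        self.below_ms = 0.0   # time spent continuously below the threshold
        self.valid = False    # assumed initial state: no valid sound yet

    def update(self, peak, frame_ms):
        """Process one frame and return True while the sound is valid."""
        if peak >= self.threshold:
            self.below_ms = 0.0
            self.valid = True
        else:
            self.below_ms += frame_ms
            # The valid state is held until the hold time has elapsed.
            if self.below_ms >= self.hold_time_ms:
                self.valid = False
        return self.valid


detector = ValidSoundDetector(threshold=5.0)
peaks = [6.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]   # one value per 10 msec frame
flags = [detector.update(p, frame_ms=10.0) for p in peaks]
```

With these values the valid state persists for the four frames following the loud frame and is dropped once 50 msec below the threshold have accumulated.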
[0048]
This utilizes the fact that the signal level of the sound collection beam corresponding to the
sound collection area where the speaker is present is higher than the signal level of the sound
collection beams corresponding to other areas. That is, when one of a pair of sound collection
beams symmetrical with respect to the reference plane corresponds to the sound collection area
in which the speaker is present, the difference signal has a certain signal level due to the
speaker's voice. However, when both beams correspond to areas where no speaker is present, the
wraparound sound components cancel each other out, and the signal level of the difference signal
becomes extremely low. For this reason, the difference signal that includes the sound collection
beam corresponding to the sound collection area where the speaker is present has a higher signal
level than the other difference signals. Therefore, by
selecting the difference signal with the highest signal level, the direction of the speaker can be
detected, and if the signal level of this difference signal is equal to or higher than the
predetermined threshold, it can be determined that the speaker is uttering a sound.
[0049]
FIG. 6 is a diagram showing situations in which two conferees A and B hold a conference with the
sound emission and collection device 1 of the present embodiment placed on a desk C. (A) shows
the situation where the conferee A is speaking, (B) shows the situation where the conferee B is
speaking, and (C) shows the situation where a sudden sound is generated from a distant sound
source D while neither conferee A nor B is speaking.
[0050]
For example, as shown in FIG. 6A, when the conferee A in the area corresponding to the sound
collection beam MB13 speaks, the signal level of the sound collection beam MB13 becomes higher
than the signal levels of the other sound collection beams MB11, MB12, MB14 and MB21 to
MB24.
Therefore, the signal level of the difference signal MS3 obtained by subtracting the sound
collection beam MB23 from the sound collection beam MB13 becomes higher than the signal
levels of the difference signals MS1, MS2, and MS4. As a result, the peak value data Ps3 of the
difference signal MS3 becomes higher than the other peak value data Ps1, Ps2 and Ps4, and the
level comparator 195 detects the peak value data Ps3 and supplies selection instruction data for
selecting the difference signal MS3 to the signal selection circuit 196. Further, since the peak
value data Ps3 is at a high level (equal to or above the predetermined threshold), the waveform
shaping circuit 197 outputs a valid sound detection signal to the control unit 10.
[0051]
FIG. 7 is a diagram showing the signal level (average energy) of each sound collection beam and
of the difference signal. The horizontal axis of each graph in the figure represents time. (A) of
the figure shows the signal level of each sound collection beam, and (B) shows the signal level of
one of the difference signals. (C) shows the valid sound / invalid sound detection signal that the
waveform shaping circuit 197
outputs.
[0052]
As shown in (A) of the figure, each sound collection beam shows an average energy of about 20
dB when no sound is generated in the sound collection area, and an average energy of about 40 to
60 dB when sound is generated. Here, each sound collection beam shows an average energy of
about 40 to 60 dB in the section of 400 to 700 msec. Since sound collection beams symmetrical
with respect to the reference plane have the same average energy as each other in this section,
the average energy of the difference signal is around 0 dB, as shown in (B) of the figure.
Therefore, in this case, the peak value data Ps is less than the predetermined threshold, and
when this situation continues for a predetermined time (for example, 50 msec) or more, the
waveform shaping circuit 197 outputs an invalid sound detection signal to the control unit 10. In
this case, the level comparator 195 provides the signal selection circuit 196 with selection
instruction data based on the peak value data Ps that was at the highest level immediately
before.
[0053]
On the other hand, in the section of 700 to 900 msec, each sound collection beam again shows an
average energy of about 40 to 60 dB, but one of the sound collection beams shows an average
energy about 10 dB higher than the other sound collection beams. This is the state in which a
speaker in one of the sound collection areas is speaking, as shown in FIG. 6 (A). Therefore, as
shown in (B) of the figure, the average energy of the difference signal MS3 is about 10 dB. In this
case, the peak value data Ps3 is equal to or greater than the predetermined threshold (for
example, 5 dB), so the waveform shaping circuit 197 outputs a valid sound detection signal to the
control unit 10.
[0054]
In the subsequent section of 900 to 1100 msec, each sound collection beam shows an average
energy of about 40 to 60 dB, but sound collection beams symmetrical with respect to the reference
plane have the same average energy as each other. Therefore, as described above, the average
energy of the difference signal becomes about 0 dB, and the waveform shaping circuit 197 outputs
the invalid sound detection
signal to the control unit 10.
[0055]
On the other hand, as shown in FIG. 6B, when the conferee B in the area corresponding to the
sound collection beam MB21 speaks, the level comparator 195 detects the peak value data Ps1 and
supplies selection instruction data for selecting the difference signal MS1 to the signal
selection circuit 196. In this case, as shown in the section of 1100 to 1200 msec in FIG. 7A, each
sound collection beam shows an average energy of about 40 to 60 dB, but one of the sound
collection beams (the sound collection beam MB21) shows an average energy about 10 dB higher
than the other sound collection beams.
[0056]
Therefore, as shown in (B) of the figure, the average energy of the difference signal MS1 is
about 10 dB in magnitude. In FIG. 7B, for ease of explanation, the average energy of the
difference signal MS3 is shown in the 700 to 900 msec section and that of the difference signal
MS1 is shown in the 1100 to 1200 msec section, but the device is assumed to calculate every
difference signal. Here, the difference signal takes a negative value because the signal
difference circuit 191 subtracts the signal level of the sound collection beam on the microphone
array MA20 side from that of the sound collection beam on the microphone array MA10 side. In
this case, the absolute value of the peak value data Ps1 is equal to or greater than the
predetermined threshold (for example, 5 dB), so the waveform shaping circuit 197 outputs a valid
sound detection signal to the control unit 10.
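The decisions described for the sections of FIG. 7 may be reproduced, as a worked example only, with the following sketch. The section boundaries, the approximate difference levels and the 5 dB threshold follow the description above. The negative sign for the 1100 to 1200 msec section is a reading of the statement that the difference signal indicates minus, and the 50 msec hold time is ignored. The names `intervals` and `classify` are hypothetical.

```python
THRESHOLD_DB = 5  # example threshold given in the description

# (start msec, end msec, approximate difference-signal level in dB)
intervals = [
    (400, 700, 0),      # symmetrical beams have equal energy
    (700, 900, 10),     # conferee A speaks: MS3 is about 10 dB
    (900, 1100, 0),     # equal energy again
    (1100, 1200, -10),  # conferee B speaks: MS1, sign assumed negative
]


def classify(level_db, threshold_db=THRESHOLD_DB):
    """Compare the absolute value of the level with the threshold."""
    return "valid" if abs(level_db) >= threshold_db else "invalid"


results = [(start, end, classify(level)) for start, end, level in intervals]
```

Because the comparison uses the absolute value, the section in which the conferee B speaks is judged valid regardless of the sign of the difference.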
[0057]
Further, as shown in FIG. 6 (C), when a sound is emitted from a distant sound source D while
neither conferee A nor B is speaking, the sound source D does not lie in the sound collection
area of any of the sound collection beams, so the average energy of each sound collection beam
does not become high.
[0058]
As described above, the selection instruction data selected by level comparator 195 is output to
signal selection circuit 196, and the signal selection circuit 196 selects the two sound
collection beams MB1x and MB2x (x = 1 to 4) that form the difference signal MS indicated by the
selection instruction data.
For example, in the situation of FIG. 6 (A), the sound collection beams MB13 and MB23
constituting the difference signal MS3 are selected, and in the situation of FIG. 6 (B), the sound
collection beams MB11 and MB21 constituting the difference signal MS1 are selected.
[0059]
The subtractor 199 subtracts the sound collection beam MB2x from the sound collection beam
MB1x input from the signal selection circuit 196 and applies the corrected sound collection
beam MB to the echo cancellation unit 20.
[0060]
For example, in the case of FIG. 6A, the sound collection beam MB23 is subtracted from the
sound collection beam MB13 and the result is given to the echo cancellation unit 20 as the
corrected sound collection beam MB, and in the case of FIG. 6B, the sound collection beam MB21
is subtracted from the sound collection beam MB11 and the result is given to the echo
cancellation unit 20 as the corrected sound collection beam MB.
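The cooperation of the signal selection circuit 196 and the subtractor 199 may be sketched as follows, again only as an illustration. The sketch assumes that the selection instruction data is a zero-based index x into the two lists of beams. The function name `corrected_beam` and the sample values are hypothetical.

```python
def corrected_beam(beams_ma10, beams_ma20, selected_index):
    """Select MB1x and MB2x and return the corrected beam MB1x - MB2x."""
    mb1x = beams_ma10[selected_index]   # selection by circuit 196
    mb2x = beams_ma20[selected_index]
    return [a - b for a, b in zip(mb1x, mb2x)]   # subtractor 199


# Hypothetical samples for the situation of FIG. 6 (B): speech in MB21.
ma10 = [[1.0, 2.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
ma20 = [[4.0, 6.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
mb = corrected_beam(ma10, ma20, selected_index=0)
```

In this example the corrected beam has a negative sign because the louder beam is the one being subtracted.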
[0061]
The echo cancellation unit 20 includes an adaptive filter 201 and a post processor 202.
The adaptive filter 201 generates, for the input sound signal, a pseudo-regression sound signal
based on the sound collection directivity of the selected corrected sound collection beam MB.
The post processor 202 subtracts the pseudo-regression sound signal from the corrected sound
collection beam MB output from the sound collection beam selection / correction unit 19 and
outputs the result as an output sound signal to the input / output I / F 12. Such echo
cancellation processing suppresses the wraparound sound that the sound collection beam
selection / correction unit 19 could not suppress, so that the speaker's voice is collected and
output with a higher S / N ratio.
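The embodiment does not state which adaptation algorithm the adaptive filter 201 uses. Purely as an assumption, the following sketch uses a normalized LMS (NLMS) update to estimate the pseudo-regression sound signal from the input sound signal and subtracts it from the collected signal. The function name `nlms_echo_cancel`, the number of taps, the step size and the simulated echo path are all hypothetical.

```python
import random


def nlms_echo_cancel(far_end, mic, taps=4, mu=0.5, eps=1e-8):
    """Return the collected signal minus an adaptively estimated echo."""
    w = [0.0] * taps        # adaptive filter coefficients
    x_hist = [0.0] * taps   # recent input sound samples, newest first
    out = []
    for x, d in zip(far_end, mic):
        x_hist = [x] + x_hist[:-1]
        y = sum(wi * xi for wi, xi in zip(w, x_hist))   # pseudo echo
        e = d - y                                       # echo-reduced output
        norm = sum(xi * xi for xi in x_hist) + eps
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, x_hist)]
        out.append(e)
    return out


# Simulated case: the collected signal is only a delayed, attenuated copy
# of the emitted signal (no near-end speech).
rng = random.Random(0)
far_end = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
mic = [0.0] + [0.6 * x for x in far_end[:-1]]
residual = nlms_echo_cancel(far_end, mic)
```

In this noise-free simulation the residual becomes very small once the coefficients have adapted to the simulated echo path.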
[0062]
Finally, the input / output I / F 12 outputs the output sound signal input from the echo
cancellation unit 20 according to the setting made by the control unit 10. That is, in the
situations shown in FIG. 6A and FIG. 6B, the control unit 10 sets the input / output I / F 12 to
output the audio signal, and only then does the input / output I / F 12 convert the output sound
signal into a data format (protocol) corresponding to the network and transmit it to the network
via the input / output connector 11. In the situation shown in FIG. 6C, the control unit 10 sets
the input / output I / F 12 not to output an audio signal, and the input / output I / F 12
discards the output sound signal without converting it.
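The gating of the output by the setting of the control unit 10 may be sketched as follows. The sketch assumes that the output sound signal is handled in frames and that one valid / invalid flag is available per frame. The function name `gate_output` and the frame labels are hypothetical, and the conversion into a network data format is not modeled.

```python
def gate_output(frames, valid_flags):
    """Keep frames flagged valid; all other frames are discarded."""
    return [frame for frame, ok in zip(frames, valid_flags) if ok]


frames = ["frame0", "frame1", "frame2", "frame3"]
valid_flags = [True, True, False, False]   # e.g. FIG. 6A followed by FIG. 6C
sent = gate_output(frames, valid_flags)
```

Only the frames that remain in `sent` would be converted and transmitted, which is how the amount of transmitted information is reduced.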
[0063]
Although the above description shows an example in which the sound collection beam MB2x is
subtracted from the sound collection beam MB1x, the sound collection beam MB1x may conversely be
subtracted from the sound collection beam MB2x. In either case, the difference signal with the
maximum level is output (although its sign is inverted), so the speaker's voice (voice based on
the above-mentioned corrected sound collection beam MB) is emitted and heard on the side that
receives this signal.
[0064]
The above description concerns the operation performed when the control unit 10 specifies the
spot sound collection mode to the sound collection beam selection / correction unit 19, but the
same operation is performed in the area sound collection mode. In the area sound collection mode
as well, if the signal level of the difference signal between the sound collection beam MB101 and
the sound collection beam MB201 is equal to or higher than a predetermined threshold, the valid
sound detection signal is output from the sound collection beam selection / correction unit 19 to
the control unit 10, and the voice signal is output to the network. On the other hand, if the
signal level of this difference signal remains below the predetermined threshold for a
predetermined time or more, the invalid sound detection signal is output from the sound
collection beam selection / correction unit 19 to the control unit 10, and no voice signal is
output to the network.
[0065]
As described above, valid sound and invalid sound are distinguished on the basis of the
difference signal between the sound collected by one microphone array and the sound collected
by the other microphone array, and only the valid sound is output to the network, which reduces
the amount of information to be transmitted.
[0066]
When the sound emission and collection device of this embodiment operates only in the area
sound collection mode, two unidirectional microphones whose directivity axes are directed
symmetrically with respect to the reference plane may be used instead of the microphone arrays
MA10 and MA20.
In this case, the sound collection beam generation unit 181 and the sound collection beam
generation unit 182 may output the sound signals collected by the respective unidirectional
microphones to the subsequent stage without performing delay control.
[0067]
A plan view showing the microphone and speaker arrangement of the sound emission and
collection device according to the present embodiment.
A diagram showing the sound collection beam areas formed by the sound emission and collection
device.
A diagram showing a setting example of the sound collection modes on the front and back sides
of the sound emission and collection device.
A block diagram showing the configuration of the sound emission and collection device.
A block diagram showing the configuration of the sound collection beam selection / correction
unit 19 shown in FIG. 4.
A diagram showing situations in which the conferees A and B hold a conference with the sound
emission and collection device 1 of this embodiment placed on a desk C.
A diagram showing the signal level (average energy) of each sound collection beam and of the
difference signal.
Explanation of reference signs
[0068]
1 - Sound emission and collection device
101 - Case
11 - Input / output connector
12 - Input / output I / F
13 - Sound emission directivity control unit
14 - D / A converter
15 - Sound emission amplifier
16 - Sound collection amplifier
17 - A / D converter
181, 182 - Sound collection beam generation unit
19 - Sound collection beam selection / correction unit
20 - Echo cancellation unit
201 - Adaptive filter
202 - Post processor
SP1 to SP3 - Speaker
SPA10 - Speaker array
MIC11 to MIC17, MIC21 to MIC27 - Microphone
MA10, MA20 - Microphone array