Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2010221945
An object of the present invention is to effectively suppress the reproduced sound from a speaker, which becomes an interfering sound during voice recognition and hands-free calling.
Kind Code: A1
A signal processing apparatus performs array processing on a sound reception signal, obtained by receiving the reproduced sound of a plurality of reproduction channels with a microphone array, so that the sensitivity differs according to the reception direction of the reproduced sound. A sound source signal generated from the audio signals of a plurality of channels is pre-filtered so that the sound source is localized in a direction of relatively low sensitivity, and the filtered signal is supplied to the reproduction channels. [Selected figure] Figure 1
Signal processing method, apparatus and program
[0001]
The present invention relates to a signal processing method, apparatus and program using
microphone array technology.
[0002]
Techniques that use a plurality of arranged microphones, a so-called microphone array, to emphasize sound arriving from a specific direction while suppressing other sounds, and to detect the direction of a sound source, are being actively researched.
For example, when voice recognition is used in a car to understand what the driver desires, a microphone array makes it possible to suppress traveling noise and the reproduced sound of the car audio and to extract only the driver's utterance.
04-05-2019
1
[0003]
There are various methods for microphone arrays; a representative one is the delay-and-sum array disclosed in Non-Patent Document 1. The delay-and-sum array delays the signals from the respective microphones and then adds them. It is based on the principle that an acoustic signal arriving from a predetermined direction (for example, the direction of the driver in an automobile) is added in phase and thus emphasized, while acoustic signals arriving from other directions interfere destructively because they are out of phase. By this principle, acoustic signals from a specific direction can be emphasized; that is, directivity can be formed in a specific direction.
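As a concrete illustration of this principle (not code from the patent), the following sketch implements a delay-and-sum beamformer with integer-sample delays; the function name, geometry, and parameters are illustrative assumptions.

```python
import math

def delay_and_sum(signals, spacing_m, angle_deg, fs, c=343.0):
    """Delay-and-sum beamformer sketch (integer-sample delays for simplicity).

    signals   : list of per-microphone sample lists, mics at equal spacing.
    angle_deg : steering direction measured from broadside (0 = straight ahead).
    fs        : sampling rate in Hz; c is the speed of sound in m/s.
    """
    n_mics = len(signals)
    length = len(signals[0])
    # Inter-microphone delay for a plane wave arriving from angle_deg.
    tau = spacing_m * math.sin(math.radians(angle_deg)) / c  # seconds
    out = [0.0] * length
    for n, x in enumerate(signals):
        d = int(round(n * tau * fs))  # delay in samples for microphone n
        for t in range(length):
            if 0 <= t - d < length:
                out[t] += x[t - d]
    return [v / n_mics for v in out]  # average to keep the amplitude scale
```

A signal from the steered direction is summed in phase and preserved; signals from other directions are misaligned and partially cancel.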
[0004]
Another example of a microphone array is the Griffiths-Jim array disclosed in Non-Patent Document 2. The Griffiths-Jim array is characterized in that an adaptive filter is used to form directivity with a low-sensitivity characteristic (hereinafter also referred to as a null, or zero point) in the direction of the interfering sound, so that the interfering sound is selectively removed. The number of interfering-sound directions that can be removed is generally N-1 (where N is the number of microphones).
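A minimal two-microphone sketch of this null-forming idea, in the spirit of the Griffiths-Jim structure (fixed beamformer plus blocking branch plus NLMS adaptive filter); all names and parameters are illustrative assumptions, not the patent's implementation.

```python
def griffiths_jim_2mic(x1, x2, mu=0.5, taps=8, eps=1e-8):
    """Two-microphone generalized sidelobe canceller sketch.

    Fixed beamformer: the sum of the mics (passes the broadside target).
    Blocking branch: the difference of the mics (cancels the broadside
    target, leaving interference), which an NLMS adaptive filter then
    subtracts from the fixed-beamformer output.
    """
    w = [0.0] * taps                    # adaptive filter coefficients
    buf = [0.0] * taps                  # recent blocking-branch samples
    out = []
    for a, b in zip(x1, x2):
        d = 0.5 * (a + b)               # fixed beamformer output
        u = a - b                       # blocking matrix output
        buf = [u] + buf[:-1]
        y = sum(wi * ui for wi, ui in zip(w, buf))  # interference estimate
        e = d - y                       # enhanced output sample
        norm = sum(ui * ui for ui in buf) + eps
        w = [wi + mu * e * ui / norm for wi, ui in zip(w, buf)]  # NLMS update
        out.append(e)
    return out
```

When the target arrives from broadside, the blocking branch is zero and the target passes through unchanged; correlated off-axis interference drives the adaptive filter and is cancelled.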
[0005]
The sound reproduced from the car audio, which is an interfering sound for speech recognition in a car, differs from the traveling noise of the car and the like in that both the source signal and the position of the sound source are known. Therefore, the reproduced sound can be suppressed by performing so-called array processing on the signals from the microphone array. For example, by directing the low-sensitivity direction of the microphone array toward the car-audio speaker through array processing, the reproduced sound of the car audio can be suppressed. Since the position of the speaker is known, it can be suppressed more effectively than ambient interfering sounds whose source positions are unknown.
[0006]
J. L. Flanagan, J. D. Johnston, R. Zahn and G. W. Elko, "Computer-steered microphone arrays for sound transduction in large rooms," J. Acoust. Soc. Am., Vol. 78, No. 5, pp. 1508-1518, 1985.
L. J. Griffiths and C. W. Jim, "An alternative approach to linearly constrained adaptive beamforming," IEEE Trans. Antennas and Propagation, Vol. AP-30, No. 1, Jan. 1982.
[0007]
When the reproduced sound of the car audio is stereo, the localization position differs depending on the sound source. For example, vocals are often localized at the center, while the accompaniment is often dispersed to the left and right for each instrument. In this case, it is desirable for the microphone array to have a blind spot in the direction of every sound source, that is, to give the directions of all sound sources low sensitivity; however, this is difficult when the number of microphones is small. As a result, the suppression performance for the interfering sound is reduced. In addition, when part of the sound source of the reproduced sound is localized in the same direction as the target sound, it is very difficult to remove the reproduced sound, which is an interfering sound.
[0008]
An object of the present invention is to provide a signal processing method, apparatus and program for effectively suppressing the reproduced sound from a speaker, which becomes an interfering sound during voice recognition and hands-free calling.
[0009]
According to an aspect of the present invention, there is provided a signal processing apparatus comprising: an array processing unit that performs array processing on a reception signal, obtained by receiving the reproduced sounds of a plurality of reproduction channels with a plurality of microphones, so that the sensitivity differs according to the reception direction of the reproduced sound; a sound source signal generating unit that generates a sound source signal of at least one channel from the audio signals of a plurality of channels; and a filtering unit that filters the sound source signal, for each of the plurality of reproduction channels, so that the sound source is localized in a direction of relatively low sensitivity, and generates a plurality of filtered signals to be supplied to the plurality of reproduction channels.
[0010]
According to the present invention, even when a microphone array composed of a small number of microphones is used, it is possible to effectively suppress the reproduced sound of multi-channel audio signals, which becomes an interfering sound during voice recognition and hands-free calling.
[0011]
FIG. 1 is a block diagram showing a signal processing apparatus according to a first embodiment of the present invention. FIG. 2 is a diagram showing the relationship between sound source positions and microphones for explaining the operation of the first embodiment. FIG. 3 is a block diagram showing a signal processing apparatus according to a second embodiment of the present invention. FIG. 4 is a block diagram showing a signal processing apparatus according to a third embodiment. FIG. 5 is a block diagram showing a signal processing apparatus according to a fourth embodiment. FIG. 6 is a block diagram showing a signal processing apparatus according to a fifth embodiment. A block diagram shows a signal processing apparatus according to a sixth embodiment, and a block diagram shows an electronic device according to a seventh embodiment.
[0012]
Hereinafter, embodiments of the present invention will be described.
First Embodiment
As shown in FIG. 1, the signal processing apparatus according to the first embodiment includes an adder 103, filtering units 104-1 and 104-2, selectors 105-1 and 105-2, an array processing unit 108, and a phase control unit 109.
[0013]
The adder 103, the filtering units 104-1 and 104-2, and the selectors 105-1 and 105-2 are disposed between the audio input terminals 101-1 and 101-2, which receive the audio signals of a plurality of channels, and the speakers 106-1 and 106-2 of the plurality of reproduction channels.
The array processing unit 108 performs signal processing on the sound reception signals from the plurality of microphones 107-1 to 107-N forming the microphone array so as to form a predetermined directivity, and outputs a processed voice signal.
[0014]
Next, the operation of the signal processing apparatus according to the present embodiment will
be described.
Audio signals of a plurality of channels, two stereo channels in this example, are input to the audio input terminals 101-1 and 101-2.
The selectors 105-1 and 105-2 are, for example, changeover switches controlled by the control signal from the control input terminal 102. When they are switched to the upper side, the audio signals from the audio input terminals 101-1 and 101-2 are selected and emitted as sound from the speakers 106-1 and 106-2, which form the plurality of reproduction channels. Hereinafter, the sound radiated from the speakers 106-1 and 106-2 is referred to as reproduced sound. The audio signals supplied to the speakers 106-1 and 106-2 are actually amplified by an audio amplifier, but the amplifier is omitted in the figure.
[0015]
The audio signals from the audio input terminals 101-1 and 101-2 are also input to the adder 103, where they are added together to form a one-channel, that is, monaural, signal. Hereinafter, the one-channel signal output from the adder 103 is referred to as the sound source signal. The sound source signal output from the adder 103 is filtered by the filtering units 104-1 and 104-2, whose filter characteristics will be described later. When the selectors 105-1 and 105-2 are switched to the lower side, the filtered signals from the filtering units 104-1 and 104-2 are selected and emitted as reproduced sound from the speakers 106-1 and 106-2, respectively.
[0016]
The reproduced sounds from the speakers 106-1 and 106-2 are received by the plurality of microphones 107-1 to 107-N, which output electric signals called sound reception signals. The array processing unit 108 performs signal processing called array processing on the sound reception signals from the microphones 107-1 to 107-N, and outputs a processed voice signal from which the interfering sound has been removed.
[0017]
In the present embodiment, the array processing unit 108 performs array processing such that the sensitivity differs according to the direction from which the reproduced sound from the speakers 106-1 and 106-2 reaches the microphones 107-1 to 107-N. According to this embodiment, the processed voice signal from the array processing unit 108 is a voice signal mainly composed of the driver's utterance, from which the interfering sound has been removed. The interfering sound is, for example, an utterance from the front passenger seat or the reproduced sound from the speakers 106-1 and 106-2, as described later. The processed voice signal from the array processing unit 108 is input to, for example, a voice recognition device (not shown).
[0018]
FIG. 2 shows a specific usage example of the signal processing device according to the present
embodiment, and shows a state in which the signal processing device is mounted in a car. In this
example, two microphones and two speakers are used, and the microphones 107-1 and 107-2
are installed, for example, at a central portion of a rearview mirror or an instrument panel. The
speakers 106-1 and 106-2 are often arranged, for example, in the front seat door. In addition to
the front seat door, a speaker is often arranged at the rear.
[0019]
When the driver operates a car navigation system by voice using voice recognition technology, for example, directivity is formed with a high-sensitivity characteristic in the direction of the driver 210 and a low-sensitivity characteristic (null) in the direction of the passenger 211. By controlling the characteristics of the microphone array in this way, only the driver's utterance can be captured accurately. Specifically, when a delay-and-sum array as described in Non-Patent Document 1 is used, the array processing unit 108 delays the sound reception signals from the microphones 107-1 to 107-N and then adds them.
[0020]
Here, by appropriately controlling the delay time in the array processing unit 108 with the phase control unit 109, array processing can be performed such that, for example, only the acoustic signal arriving from the direction of the driver is added in phase and emphasized, while acoustic signals from other directions cancel out because their phases are not aligned. That is, this array processing forms directivity in the direction of the driver.
[0021]
Assuming that the sound reception signals from the N microphones 107-1 to 107-N are represented by Xn(t) (n = 1, ..., N), the processed voice signal Y(t) from the array processing unit 108 is expressed by the following equation.
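The equation itself did not survive conversion. From the surrounding description (N equally spaced microphones, a delay of τ per microphone index), equation (1) is presumably the standard delay-and-sum:

```latex
Y(t) \;=\; \sum_{n=1}^{N} X_n\!\left(t - (n-1)\tau\right) \tag{1}
```

(A normalization factor of 1/N may also be present in the original.)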
[0022]
Here, the microphones 107-1 to 107-N are arranged at equal intervals in the order of the subscript n. Also, τ is the delay time that brings the sound reception signals Xn(t) from the microphones 107-1 to 107-N into phase for the arrival direction of the target sound.
[0023]
As another example, as in the Griffiths-Jim array described in the above-mentioned Non-Patent Document 2, an adaptive filter may be used to form nulls in the direction of the interfering sound so as to remove it selectively. In that case, the number of interfering-sound directions that can be removed is generally N-1 (where N is the number of microphones). Various other methods have been proposed, but this embodiment does not depend on the microphone array method itself, and any method can be used.
[0024]
As a design policy for such a microphone array, it is natural to design it to suppress the speech of a passenger sitting in the front passenger seat in order to capture only the driver's speech accurately. In this case, when the interfering sound is only the speech from the front passenger seat, it can be suppressed effectively.
[0025]
However, when reproduced sound, which is also an interfering sound, is output from the speakers at the same time, it is difficult to suppress. This is because the directions of arrival of the speech from the front passenger seat and of the reproduced sound from the speakers are generally different. In particular, when the number of microphones is small, the number of directions in which a null can be formed also decreases. For example, with two microphones, once a null is formed in the direction of the passenger seat, no null can be formed in any other direction, so the reproduced sound from the speakers cannot be suppressed.
[0026]
Furthermore, when the reproduced sound is output in stereo from the speakers, several sound source directions may exist simultaneously, and there is a limit to what array processing with a small number of microphones can do. Even for a monaural signal, nulls would be needed in two directions, one for the speaker and one for the front passenger seat, which is difficult with two microphones.
[0027]
In this embodiment, the sound source signal from the adder 103 is filtered in advance by the filtering units 104-1 and 104-2 so that, when the reproduced sound from the speakers 106-1 and 106-2 is received by the microphones 107-1 to 107-N, the sound source is localized in a direction in which the sensitivity is relatively low. Specifically, the filtering units 104-1 and 104-2 filter the signal so that, for example, the reproduced sounds from the speakers 106-1 and 106-2 are observed as if they came from the direction of the front passenger seat.
[0028]
By doing this, one null in a single direction can cover both the utterance from the passenger seat and the reproduced sound from the speakers 106-1 and 106-2. That is, even when many nulls cannot be formed because the number of microphones 107-1 to 107-N is small, the reproduced sound from the speakers 106-1 and 106-2 can be suppressed simultaneously with the utterance from the front passenger seat. By suppressing not only the speech from the front passenger seat but also the reproduced sound from the speakers 106-1 and 106-2 in the processed voice signal in this way, speech recognition can be performed accurately.
[0029]
The above description covers the case where the array processing unit 108 performs array processing so that the sensitivity in a predetermined direction (for example, the direction of the front passenger seat) is at a minimum, and the filtering units 104-1 and 104-2 filter the signal so that the sound source of the reproduced sound from the speakers 106-1 and 106-2 is localized in that predetermined direction. However, depending on the microphone array method, it is not always necessary to filter so that the reproduced sound from the speakers 106-1 and 106-2 appears to come from the same direction as the speech from the front passenger seat.
[0030]
For example, Japanese Patent Laid-Open No. 2007-10897 discloses a method of designing the directivity of an array so as to generate a desired response (target signal) for each of the sound source positions used for learning. In this case, the array can be designed to emphasize signals emitted from the directional range in which the driver may be present and to suppress other signals. With such an array, the direction in which the reproduced sound is localized need only be outside the range in which the driver may be present, and does not necessarily have to match the direction of the speech from the passenger seat.
[0031]
As described above, the array processing unit 108 may perform array processing so that the sensitivity within a predetermined directional range is higher than the sensitivity outside that range, and the filtering units 104-1 and 104-2 may filter the signal so that the sound source is localized in a direction outside that range.
[0032]
When the filtered signals from the filtering units 104-1 and 104-2 are supplied to the speakers 106-1 and 106-2, the reproduced sound from the speakers sounds different from the reproduction of the original audio signal, and the sense of stereo is lost.
That is, the reproduced sound corresponding to the filtered signal sounds unnatural to the listener.
[0033]
Therefore, in the present embodiment, the filtered signals from the filtering units 104-1 and 104-2 are selected by the selectors 105-1 and 105-2 only while the driver is speaking, and supplied to the speakers 106-1 and 106-2. Otherwise, the audio signals from the audio input terminals 101-1 and 101-2 are selected and supplied to the speakers 106-1 and 106-2 for normal reproduction, thereby minimizing such discomfort.
[0034]
Next, a design method for the filtering units 104-1 and 104-2 will be described.
The sound source signal supplied to the speakers 106-1 and 106-2 is a monaural signal obtained by adding the stereo signals in the adder 103. When the sound source signal is filtered by the filtering units 104-1 and 104-2, reproduced by the speakers 106-1 and 106-2, and the reproduced sound is received by the two microphones 107-1 and 107-2, the transfer functions (y1, y2) from the sound source (the output terminal of the adder 103) to the microphones 107-1 and 107-2 are expressed by the following equation (2).
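Equation (2) did not survive conversion; from the definitions in the following paragraph (hxy the transfer function from speaker y to microphone x, g1 and g2 the filter responses), it is presumably:

```latex
\begin{pmatrix} y_1 \\ y_2 \end{pmatrix}
= \begin{pmatrix} h_{11} & h_{12} \\ h_{21} & h_{22} \end{pmatrix}
  \begin{pmatrix} g_1 \\ g_2 \end{pmatrix} \tag{2}
```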
[0035]
Here, hxy is a transfer function from the speaker y to the microphone x, and g1 and g2 are
transfer functions of the filtering units 104-1 and 104-2.
[0036]
In order for the filtering units 104-1 and 104-2 to filter the signal as if the reproduced sound from the speakers 106-1 and 106-2 came from the direction of the front passenger seat, the transfer functions (g1, g2) of the filtering units 104-1 and 104-2 may be designed so that (y1, y2) equals the transfer functions (a1, a2) from the front passenger seat to the microphones 107-1 and 107-2.
For this purpose, (a1, a2) may be substituted for (y1, y2) in equation (2), which is then solved for (g1, g2).
[0037]
In the case of two microphones and two speakers, the matrix representing the transfer functions between the speakers and the microphones is square, so an inverse matrix can usually be calculated. If the numbers of microphones and speakers differ, an inverse matrix cannot be defined, and it is common to use a generalized inverse matrix.
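The two-by-two case can be written in closed form. The sketch below solves H g = a for one frequency bin; the function name and argument layout are illustrative assumptions, not from the patent.

```python
def design_filters(h11, h12, h21, h22, a1, a2):
    """Solve H g = a for one frequency bin (two speakers, two microphones).

    hxy      : transfer function from speaker y to microphone x.
    (a1, a2) : desired transfer functions from the surrogate direction
               (e.g. the front passenger seat) to microphones 1 and 2.
    Returns (g1, g2), the filter responses at this frequency, using the
    closed-form inverse of the 2x2 matrix H.
    """
    det = h11 * h22 - h12 * h21
    if abs(det) < 1e-12:
        raise ValueError("transfer-function matrix is (near) singular")
    g1 = (h22 * a1 - h12 * a2) / det
    g2 = (h11 * a2 - h21 * a1) / det
    return g1, g2
```

For mismatched speaker and microphone counts, a generalized inverse (for example `numpy.linalg.pinv`) would replace this closed form, matching the remark above.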
[0038]
The transfer functions (g1, g2) of the filtering units 104-1 and 104-2 can differ greatly from frequency to frequency, so the spectrum of the reproduced sound corresponding to the filtered signal output from the speakers 106-1 and 106-2 may differ greatly from the spectrum of the reproduced sound corresponding to the audio signal before filtering. In such a case, the magnitudes of the transfer functions (g1, g2) can be adjusted so that the spectrum of the reproduced sound corresponding to the filtered signal matches the spectrum of the reproduced sound corresponding to the audio signal before filtering. This is because the magnitude of the sound is irrelevant to sound source localization: even if the transfer functions of the filtering units 104-1 and 104-2 are scaled to (A × g1, A × g2) with an arbitrary constant A, the localization is unaffected.
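This per-frequency scaling can be sketched as follows; the function and its arguments are illustrative assumptions.

```python
def normalize_gains(g1, g2, target_mag, current_mag, eps=1e-12):
    """Scale both filter responses at one frequency by the same constant A
    so the reproduced spectrum magnitude matches the unfiltered one.

    Localization is preserved because the ratio g1/g2 is unchanged,
    corresponding to the scaling (A x g1, A x g2) in the text above.
    """
    A = target_mag / (current_mag + eps)
    return A * g1, A * g2
```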
[0039]
A control signal is given to the control input terminal 102 of FIG. 1 to switch between reproducing the audio signals from the audio input terminals 101-1 and 101-2 as they are through the speakers 106-1 and 106-2 and reproducing them after filtering by the filtering units 104-1 and 104-2; the selectors 105-1 and 105-2 are controlled by this control signal.
[0040]
As a method of giving the control signal, for example, the driver's speech period may be detected, and the selectors 105-1 and 105-2 controlled so that the filtered signals from the filtering units 104-1 and 104-2 are supplied to the speakers 106-1 and 106-2 only during that speech period.
A voice activity detector is used to detect the speech period. As detection methods, approaches based on signal power, on the signal-to-noise ratio derived from estimated noise, on spectral information, and so on have been proposed, as well as methods based on statistical models (see Sohn J., Kim N. S., and Sung W., "A statistical model-based voice activity detection," IEEE Signal Processing Letters, Vol. 6, No. 1, pp. 1-3, 1999). As another way of giving the control signal, the driver may explicitly designate the speech period.
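The simplest of these approaches, power-based detection, can be sketched as follows. This is only an illustration of the idea, not the patent's detector; the assumption that the first frame is noise-only is part of the sketch.

```python
import math

def detect_speech(frames, threshold_db=10.0, noise_floor=1e-6):
    """Energy-based voice activity detection over fixed frames.

    frames : list of sample lists.  The first frame is assumed to be
    noise-only and sets the reference level (an assumption of this sketch).
    Returns a list of booleans, True where speech is detected.
    """
    def power(frame):
        return sum(s * s for s in frame) / len(frame) + noise_floor

    ref = power(frames[0])
    flags = []
    for f in frames:
        snr_db = 10.0 * math.log10(power(f) / ref)  # level relative to noise
        flags.append(snr_db > threshold_db)
    return flags
```

A practical detector would add hangover smoothing and adaptive noise tracking, but the thresholding principle is the same.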
[0041]
When speech recognition is performed, the speaker often presses a button to indicate the start of speech. By using this start-of-speech information as speech-period information, the speech period can be identified with certainty and used as the control signal. The button may be pressed once at the start of speech, with the program determining the end of speech; in this case, the start of speech is given by the button and the end by a voice activity detector.
[0042]
Second Embodiment
FIG. 3 shows a signal processing apparatus according to a second embodiment, which is a modification of the first embodiment; the control input terminal 102 and the selectors 105-1 and 105-2 of FIG. 1 are removed, and the filtered signals are always supplied from the filtering units 104-1 to 104-M to the speakers 106-1 to 106-M. When it is acceptable to always reproduce the filtered signals from the speakers 106-1 to 106-M, the configuration of the present embodiment is also desirable from the viewpoint of simplifying the apparatus. Although the audio input terminals, filtering units, and speakers all number M in FIG. 3, it is clear from the description so far that M need only be two or more.
[0043]
Third Embodiment
FIG. 4 shows a signal processing apparatus according to a third embodiment, which is a modification of the first embodiment; the filtered signals from the filtering units 104-1 and 104-2 are temporarily stored in storage units 110-1 and 110-2, and the filtered signals read from the storage units 110-1 and 110-2 are supplied to the speakers 106-1 and 106-2. That is, in the present embodiment, filtering is not performed online: the waveforms of the filtered signals, obtained by filtering the sound source signal in advance with the filtering units 104-1 and 104-2, are stored in the storage units 110-1 and 110-2, read out as appropriate at the time of actual use, and reproduced by the speakers 106-1 and 106-2.
[0044]
For example, among the signals to be reproduced by the speakers 106-1 and 106-2, it is desirable to filter a predetermined signal, such as a fixed car-navigation message, in advance and store it in the storage units 110-1 and 110-2, rather than filtering it each time it is reproduced. At reproduction time, the filtered signal waveforms are simply read from the storage units 110-1 and 110-2 and supplied to the speakers 106-1 and 106-2.
[0045]
The fixed message is often monaural; in this case, the same data may simply be stored in both storage units 110-1 and 110-2, and no problem arises even with the same configuration as for stereo data.
[0046]
Fourth Embodiment
FIG. 5 shows a signal processing apparatus according to a fourth embodiment. In the present embodiment, a correlation reduction unit 303 is provided instead of the adder 103 of FIG. 1, and the audio signals from the audio input terminals 101-1 and 101-2 are input to the correlation reduction unit 303. The correlation reduction unit 303 generates sound source signals of a plurality of channels (two in this example) in which the correlation between the audio signals of the plurality of channels (two in this example) is reduced. Of the two-channel sound source signals from the correlation reduction unit 303, the first-channel signal is input to the filtering units 304-1 and 304-3, and the second-channel signal is input to the filtering units 304-2 and 304-4.
[0047]
The filtered signals from the filtering units 304-1 and 304-2 are added by the adder 311-1 and then supplied to the speaker 106-1 via the selector 105-1. Similarly, the filtered signals from the filtering units 304-3 and 304-4 are added by the adder 311-2 and then supplied to the speaker 106-2 via the selector 105-2.
[0048]
In this way, each of the two-channel sound source signals from the correlation reduction unit 303 is filtered for each of the speakers 106-1 and 106-2, which are the reproduction channels, and the sum of the two filtered signals for each channel is supplied to the corresponding speaker.
[0049]
When the two microphones 107-1 and 107-2 receive the reproduced sound from the speakers 106-1 and 106-2, the transfer functions (y1, y3) and (y2, y4) from the sound sources (the two output terminals of the correlation reduction unit 303) to the microphones 107-1 and 107-2 are expressed by the following equations (3) and (4).
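Equations (3) and (4) did not survive conversion; from the definitions in the following paragraph (hxy from speaker y to microphone x, g1 and g3 for the first source channel, g2 and g4 for the second), they are presumably:

```latex
\begin{pmatrix} y_1 \\ y_3 \end{pmatrix}
= \begin{pmatrix} h_{11} & h_{12} \\ h_{21} & h_{22} \end{pmatrix}
  \begin{pmatrix} g_1 \\ g_3 \end{pmatrix} \tag{3}
\qquad
\begin{pmatrix} y_2 \\ y_4 \end{pmatrix}
= \begin{pmatrix} h_{11} & h_{12} \\ h_{21} & h_{22} \end{pmatrix}
  \begin{pmatrix} g_2 \\ g_4 \end{pmatrix} \tag{4}
```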
[0050]
[0051]
Here, hxy is the transfer function from speaker y to microphone x, g1 and g3 are the transfer functions of the filtering units 304-1 and 304-3, and g2 and g4 are the transfer functions of the filtering units 304-2 and 304-4.
As with the transfer functions (g1, g2) of the filtering units 104-1 and 104-2 obtained from equation (2) in the first embodiment, the transfer functions (g1, g3) and (g2, g4) can be obtained from equations (3) and (4) by assigning different directions to (y1, y3) and (y2, y4).
[0052]
Next, the operating principle of the present embodiment will be described.
In the first embodiment, the input audio signals of a plurality of channels are added by the adder 103 into one channel, that is, a monaural sound source signal, whereas in the present embodiment sound source signals of a plurality of channels (two channels, a first and a second) are output.
The sound source signals of the plurality of channels are filtered for each of the speakers 106-1 and 106-2 as described above, and the sum of the two filtered signals for each channel is supplied to the corresponding speaker. In this case, it should be noted that the reproduced sounds from the speakers 106-1 and 106-2 interfere with each other, and when received by the microphones 107-1 to 107-N, the sound could end up localized in a direction different from the design.
[0053]
In the present embodiment, the correlation reduction unit 303 reduces the correlation between the two-channel sound source signals in order to avoid this adverse effect. That is, while the audio signals from the audio input terminals 101-1 and 101-2 are stereo signals with large correlation, the two-channel sound source signals output from the correlation reduction unit 303 are signals with small correlation. Here, the number of channels of the input audio signal and the number of channels of the decorrelated sound source signal are both two, but they need not be the same.
[0054]
More specifically, the correlation reduction unit 303, for example, adds the two input audio channels to form a monaural signal and then separates it by frequency band, outputting the low-frequency signal as the first-channel sound source signal and the high-frequency signal as the second-channel sound source signal. Alternatively, the monaural signal obtained by adding the two audio channels may be separated into a large number of frequency components, so-called sub-bands, with each frequency component distributed alternately or randomly between the two sound source channels. Also, when the individual signals (parts) constituting the signal are known, as with an audio signal generated from MIDI (Musical Instrument Digital Interface) data, the two-channel sound source signals can be generated by dividing the parts between them.
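The first variant above, a low/high frequency split of the downmixed signal, can be sketched as follows. The moving-average low-pass is a deliberately crude stand-in for a proper band-splitting filter; all names are illustrative.

```python
def decorrelate(left, right, win=8):
    """Correlation-reduction sketch: mix the stereo input down to mono,
    then split it into complementary low/high bands (moving-average
    low-pass and its residual), used as the two decorrelated channels.
    """
    mono = [0.5 * (l + r) for l, r in zip(left, right)]
    low = []
    for t in range(len(mono)):
        seg = mono[max(0, t - win + 1): t + 1]
        low.append(sum(seg) / len(seg))          # crude low-pass
    high = [m - lo for m, lo in zip(mono, low)]  # complementary high band
    return low, high                             # ch1 = low, ch2 = high
```

Because the two outputs occupy complementary frequency content, their correlation is low, while their sum reconstructs the mono downmix exactly.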
[0055]
As described above, when the filtered signals are supplied to the speakers 106-1 and 106-2, the stereo feeling of the reproduced sound from the speakers 106-1 and 106-2 is lost. According to this embodiment, it is possible to compensate for this loss of stereo feeling. That is, when the array processing unit 108 gives the microphone array local sensitivity minima in a plurality of directions, the generated multi-channel sound source signals can be distributed among those minima to compensate for the loss of stereo feeling. Likewise, if the microphone array has high sensitivity within a specific directional range, the generated multi-channel sound source signals can be localized in directions outside that high-sensitivity range to compensate for the loss of stereo feeling.
[0056]
Fifth Embodiment
FIG. 6 shows a signal processing apparatus according to a fifth embodiment. The differences from the signal processing apparatus of the first embodiment are as follows: signal separation units 410-1 and 410-2 are added immediately after the audio input terminals 101-1 and 101-2; the selectors 105-1 and 105-2 of FIG. 1 are replaced by adders 312-1 and 312-2; and the destination of the control signal from the control input terminal 102 is changed from the selectors 105-1 and 105-2 to the signal separation units 410-1 and 410-2.
[0057]
The two-channel audio signals from the audio input terminals 101-1 and 101-2 are separated by the signal separation units 410-1 and 410-2 into a component to be added by the adder 403 into a monaural sound source signal (referred to as a monaural component) and a component to be output directly, as it is, as the two stereo channels (referred to as a stereo component).
[0058]
The former monaural component is added by the adder 403, as in the first embodiment, to become a monaural sound source signal.
This sound source signal is filtered by the filtering units 304-1 and 304-2, and the filtered signals are input to the adders 312-1 and 312-2, respectively. The latter stereo component is input as it is to the adders 312-1 and 312-2. The sound source signal and the stereo component are added by the adders 312-1 and 312-2 and output as reproduced sound from the speakers 106-1 and 106-2.
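The signal path just described (filtering of the monaural source signal, pass-through of the stereo component, then addition) can be sketched as follows. The per-channel filters here are stand-ins (simple gains) for the filtering units 304-1 and 304-2, whose actual direction-dependent pre-filters are outside the scope of this sketch:

```python
def mix_channel(mono_source, stereo_component, filtering):
    """Model of one adder 312: filtered monaural source + pass-through stereo."""
    filtered = filtering(mono_source)
    return [f + s for f, s in zip(filtered, stereo_component)]

# Stand-in for a filtering unit: a plain gain applied sample by sample.
def gain(g):
    return lambda sig: [g * x for x in sig]

mono = [1.0, 2.0, 3.0]       # monaural sound source signal from adder 403
stereo_l = [0.1, 0.1, 0.1]   # stereo component, left channel
stereo_r = [0.2, 0.2, 0.2]   # stereo component, right channel

out_l = mix_channel(mono, stereo_l, gain(0.5))  # output of adder 312-1
out_r = mix_channel(mono, stereo_r, gain(0.5))  # output of adder 312-2
# out_l is approximately [0.6, 1.1, 1.6]
```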
[0059]
According to the present embodiment, the signals filtered by the filtering units 304-1 and 304-2 are only part of the components of the input audio signals, as separated by the signal separation units 410-1 and 410-2. For example, when the sound source direction of each component of the audio signal is known in advance, or when each component can be separated using a sound source separation method or the like, the filtering units 304-1 and 304-2 need only filter the components within the direction range, for example that of the front passenger seat, from which an utterance acting as a disturbing sound may arrive. Components localized outside this direction range can be suppressed by the array processing unit 108 even if they are output as they are from the speakers 106-1 and 106-2.
[0060]
As described above, according to the present embodiment, only the components of the input audio signals in the direction range from which utterances acting as disturbing sounds may arrive need to be filtered by the filtering units 304-1 and 304-2. Therefore, in the reproduced sound from the speakers 106-1 and 106-2 based on the input audio signals, the reduction in stereo feeling caused by the filtering can be minimized.
[0061]
Although the signal separation units 410-1 and 410-2 are controlled by the control signal from
the control input terminal 102 in FIG. 6, the input of the control signal can be omitted when the
control is not necessary.
[0062]
As another specific example of the signal separation units 410-1 and 410-2, components of a specific frequency band can be selected.
For example, when the band of the input audio signal is 20 kHz and the band in which the array processing unit 108 performs the array processing is 8 kHz, the components of the audio signal above 8 kHz are irrelevant to the array processing. Therefore, the signal separation units 410-1 and 410-2 separate, as the monaural component to be added by the adder 403 into the monaural sound source signal, only the components of 8 kHz or less that are subject to the array processing. The filtering units 304-1 and 304-2 filter the sound source signal consisting only of the components of 8 kHz or less output from the adder 403, and the filtered signals are input to the adders 312-1 and 312-2.
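As an illustrative sketch (the patent does not specify the splitting filter), a complementary band split can be modeled with a moving average as the "low" band and its residual as the "high" band, so that the two components sum back to the input; only the low band would then pass through the filtering units while the high band is output as-is:

```python
def split_band(signal, kernel=3):
    """Crude complementary band split.

    The moving average stands in for the component at or below the
    array-processing band (routed to adder 403); the residual stands in
    for the component above it (passed through as a stereo component).
    low[i] + high[i] reconstructs signal[i] up to rounding.
    """
    half = kernel // 2
    low = []
    for i in range(len(signal)):
        window = signal[max(0, i - half):i + half + 1]
        low.append(sum(window) / len(window))
    high = [s - l for s, l in zip(signal, low)]
    return low, high

sig = [0.0, 1.0, 0.0, -1.0, 0.0]
low, high = split_band(sig)
# Complementary by construction: low + high reconstructs the input.
assert all(abs(l + h - s) < 1e-12 for l, h, s in zip(low, high, sig))
```

A real implementation would use a proper low-pass filter with an 8 kHz cutoff; the complementary structure (input = filtered component + residual) is the point of the sketch.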
[0063]
On the other hand, the components of the input audio signals in the band above 8 kHz are separated by the signal separation units 410-1 and 410-2 as stereo components to be output directly as the two stereo channels, and are input to the adders 312-1 and 312-2, respectively. The sound source signal and the stereo component are added by the adders 312-1 and 312-2 and output as reproduced sound from the speakers 106-1 and 106-2. In this way, the reduction in stereo feeling caused by the filtering can be minimized in the reproduced sound from the speakers 106-1 and 106-2 based on the input audio signals.
[0064]
Sixth Embodiment FIG. 7 shows a signal processing apparatus according to a sixth embodiment. In the present embodiment, relative to the fourth embodiment shown in FIG. 5, signal separation units 410-1 and 410-2 are added immediately after the audio input terminals 101-1 and 101-2, the selectors 105-1 and 105-2 are replaced by adders 312-1 and 312-2, and the input destination of the control signal from the control input terminal 102 is changed from the selectors 105-1 and 105-2 to the signal separation units 410-1 and 410-2.
[0065]
In the present embodiment, filtering is performed on some components of the input signals, as in the fourth embodiment. In the fourth embodiment, filtering is performed in the same manner as in the first embodiment, but in the present embodiment it is performed, as in the third embodiment, on the two-channel sound source signals output from the correlation reduction unit 303.
[0066]
Therefore, according to the present embodiment, the reduction in stereo feeling caused by the filtering can be minimized in the reproduced sound from the speakers 106-1 and 106-2 based on the input audio signals, as in the fifth embodiment. Although the signal separation units 410-1 and 410-2 are controlled by the control signal from the control input terminal 102 in FIG. 7, the input of the control signal can be omitted when such control is not necessary.
[0067]
Seventh Embodiment FIG. 8 shows an electronic device 501 according to a seventh embodiment of the present invention, which includes a signal processing apparatus as described in any of the preceding embodiments. Here, the electronic device 501 is, for example, a personal computer or a portable communication terminal, and includes a display unit 502. The speakers 106-1 and 106-2 and the microphones 107-1 and 107-2 are installed, for example, around the display unit 502 of the electronic device 501, and a speaker 503 operating the electronic device 501 can input voice toward the microphones 107-1 and 107-2.
[0068]
Next, the operation principle of the present embodiment will be described. The reproduced sound from the speakers 106-1 and 106-2, based on the output signal generated by the electronic device 501, is emitted toward the speaker 503. This reproduced sound is, for example, the voice of the other party of a call, or an audio signal including music. The speaker 503 inputs, for example, speech to the other party of the call or an instruction to the terminal toward the microphones 107-1 to 107-N provided in the electronic device 501.
[0069]
If the sound source direction observed when the microphones 107-1 and 107-2 receive the reproduced sound from the speakers 106-1 and 106-2 overlaps the sound source direction observed when the speech of the speaker 503 is received, the voice of the speaker 503 and the reproduced sound from the speakers 106-1 and 106-2 will be mixed in the received sound signals output from the microphones 107-1 and 107-2. This causes an echo at the other end of the call, and causes recognition errors in speech recognition.
[0070]
When the reproduced sound from the speakers 106-1 and 106-2 is received by the microphones 107-1 and 107-2, such problems can be avoided by using the signal processing apparatus described in the first to sixth embodiments to pre-filter the reproduced sound from the speakers 106-1 and 106-2 so that its observed sound source direction falls outside the direction range in which the speaker 503 may exist.
[0071]
The signal processing according to the embodiments of the present invention described above can be realized not only by hardware but also by software using a computer such as a personal computer.
Therefore, according to the present invention, it is possible to provide a program for causing a computer to function as the above-described signal processing apparatus, or a computer-readable storage medium storing the program.
[0072]
The present invention is not limited to the above embodiments as they are; at the implementation stage, the constituent elements can be modified and embodied without departing from the scope of the invention. In addition, various inventions can be formed by appropriate combinations of the plurality of constituent elements disclosed in the above embodiments. For example, some components may be deleted from all the components shown in an embodiment. Furthermore, components in different embodiments may be combined as appropriate.
[0073]
108 ... array processing unit; 109, 109-1, 109-2 ... phase control unit; 110-1, 110-2 ... accumulation unit; 303 ... correlation reduction unit; 304-1 to 304-4 ... filtering unit; 312-1, 312-2, 403 ... adder; 410-1, 410-2 ... signal separation unit; 501 ... electronic device; 502 ... display unit