Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2004064584
[Objective] To improve the function of separating and extracting signals. In an apparatus that forms a directional characteristic using a plurality of sensors (microphones, for example) and separates and extracts the input signals, there are provided a linear prediction analysis circuit that calculates prediction residuals for one of the sensor outputs, a resynthesis filter that receives the output of that circuit, and a subtractor that calculates the difference between the output of the resynthesis filter and the other sensor output. The signal separation and extraction apparatus is characterized in that, by updating the coefficients of the resynthesis filter so that the output of the subtractor is minimized, part of the signal contained in the other sensor output is removed. [Selected figure] Figure 1
Signal separation and extraction device
TECHNICAL FIELD The present invention relates to the improvement of a signal separation and extraction apparatus that realizes the function of installing a plurality of sensors in a system on which a plurality of signals are incident, performing appropriate processing on each sensor output, and separating and extracting the incident signals. One possible application of the present invention is the emergency telephone installed in highway tunnels. Such a telephone is used amid the noise of cars traveling through the tunnel and, in addition, the noise of the jet fans that expel exhaust gas. There is therefore a problem that the voice of a speaker using the telephone is buried in noise, making the content of the speech difficult to hear. To solve this problem, there is strong demand for establishing a method of electronically removing the noise superimposed on the speaker's voice. The many techniques closely related to the present invention can, when classified by construction method, be roughly divided into the following two. One is a method that uses only a single microphone as the sensor and obtains speech with a high signal-to-noise ratio by applying various algorithms to suppress the noise superimposed on the speech picked up by that microphone. The other is a method that installs a plurality of microphones to form a directional microphone and directs its dead angle toward the direction of noise incidence, thereby suppressing the noise and extracting the speech at a high signal-to-noise ratio. The present invention relates to the improvement of the latter type of device. [0004] Trends in microphone array technology, one of the well-known fields to which the present invention belongs, are surveyed in (1) Osamu Takazou and Akihiko Sugiyama, "Research Trend and Realization Technology of Microphone Array," IEICE Technical Report, DSP 99-122 (1999-12), and (2) Masato Abe, "Summary and Recent Trends of Microphone Array Technology," 1998 Spring Meeting of the Acoustical Society of Japan, 5-12 (1998-03). A closer look at these shows that they share a common principle: the various methods introduced there all constitute apparatuses that suppress noise by directing a dead angle toward the incident direction of the noise, and they differ only in how the dead angle is steered. Therefore, to explain the principle of steering the dead angle, the microphone array consisting of two microphones shown in FIG. 2 will be taken as an example of the simplest possible structure.
In this figure, the two microphones a and b are separated by a distance d, and noise is incident from the direction θ on the right at sound velocity v. The noise therefore reaches the left microphone b (d/v)cos θ seconds after it reaches the right microphone a. If a delay of the same magnitude, (d/v)cos θ seconds, is applied electronically to the output of the right microphone a, the electronically delayed noise and the noise obtained at the output of the left microphone b become identical. Assuming that the distance d between the microphones is small and the noise propagates without attenuation over that distance, subtracting the electronically delayed output of the right microphone a from the output of the left microphone b removes the noise completely from the output of microphone b. That is, a directional microphone having a dead angle in the direction of the noise is formed. Next, to simplify the explanation of the problem with this system, assume that speech is incident from the direction opposite to the noise, at the angle φ shown in FIG. 2. In this case the speech entering the right microphone a, together with the electronic delay inserted before the output of the left microphone b, acquires a time difference of (d/v)(cos φ + cos θ) seconds; that is, this microphone array forms the difference of two speech signals separated by a time difference of (d/v)(cos φ + cos θ) seconds. Clearly, frequency components for which this delay (d/v)(cos φ + cos θ) corresponds to a phase difference of 180 degrees are doubled, while components for which it corresponds to 360 degrees are cancelled. The problem is that the phase difference caused by the delay (d/v)(cos φ + cos θ) depends on frequency in this way, so the microphone array imposes amplitude distortion on the frequency characteristics of the speech. For example, if the microphone spacing is d = 17 cm, the noise and speech arrive at a sound velocity of 340 m/s, and the angles are θ = φ = π/3, then when the above difference is taken, frequency components at integer multiples of 2 kHz acquire a phase difference that is an integer multiple of the wavelength. That is, frequency components at integer multiples of 2 kHz are not output at all. To avoid this problem, the microphone spacing must be narrowed so that the difference output does not become zero within the required band.
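As a check on the arithmetic above, the null frequencies of this delay-and-subtract array can be computed directly. The following is a minimal sketch, not part of the patent; the function names are our own. It reproduces the worked example: with d = 17 cm, v = 340 m/s, and θ = φ = π/3, the effective delay is 0.5 ms, so the components at integer multiples of 2 kHz cancel.

```python
import math

def diff_delay_seconds(d, v, theta, phi):
    """Time difference, in seconds, between the two speech signals that the
    delay-and-subtract array combines: (d / v) * (cos(phi) + cos(theta))."""
    return (d / v) * (math.cos(phi) + math.cos(theta))

def null_frequencies(delay, f_max):
    """Frequencies up to f_max (Hz) at which the difference output vanishes,
    i.e. where the delay spans a whole number of signal periods."""
    f0 = 1.0 / delay  # first cancelled frequency; nulls repeat at its multiples
    return [k * f0 for k in range(1, int(f_max // f0) + 1)]

# Worked example from the text: d = 17 cm, v = 340 m/s, theta = phi = pi/3.
delay = diff_delay_seconds(0.17, 340.0, math.pi / 3, math.pi / 3)
nulls = null_frequencies(delay, 8000.0)  # nulls near 2, 4, 6, and 8 kHz
```

Narrowing d pushes the first null, 1/delay, above the required band, which is exactly the restriction on installation position that the text describes.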
This is equivalent to placing a severe restriction on the installation position of the microphones. Moreover, even if the amplitude distortion of the frequency characteristic is ignored, the microphone array has other serious practical problems: in actual use, simultaneous input of speech and noise cannot be avoided, and the incident angles of the noise and speech fluctuate. These two facts are difficult to deal with. Against the fluctuation of the noise incident angle, the method generally adopted is to adjust the delay so as to minimize the difference output during intervals in which only noise is present. In that case, as a matter of course, a means of detecting without error that only noise is incident is required; it is necessary to identify whether the input signal is noise, speech, or both. Naturally, this identification is difficult for a microphone array whose very purpose is to separate and extract speech and noise, and at present no identification method that achieves sufficient performance has been found. The present invention has been made in view of these points, and its object is to improve the function of separating and extracting signals. First, the principle of the invention will be explained. The present invention can be viewed as an application, to the microphone array, of the former single-microphone method described above. The principle is described in A. Kawamura, K. Fujii, Y. Itoh, and Y. Fukui, "A new noise reduction method using estimated noise spectrum," IEICE Trans. Fundamentals, vol. E85-A, no. 4, pp. 784-789, and in JP-A 2001-175298; here the explanation is limited to the parts necessary for understanding the principle of the present invention. FIG. 3 shows the structure of the one-microphone system. The speech and noise picked up by the microphone in FIG. 3 are subjected to linear prediction analysis in a linear prediction analysis circuit (210) and thereby whitened; that is, the prediction residuals of the speech and noise are obtained at the output of this circuit (210). The prediction residual is then input to a noise resynthesis filter (220), and the difference between its output and the microphone output is calculated in a subtractor (230). If the coefficients of the noise resynthesis filter (220) are updated with an appropriate algorithm, such as the learning identification method, so that the output of the subtractor (230) is minimized, both the noise and the speech will be resynthesized in the noise resynthesis filter (220). However, if the convergence speed of the coefficients is lowered during the update (specifically, if the constant called the step size in the learning identification method is set small), the noise resynthesis filter (220) can no longer follow the rapid phonemic changes of the speech, and the speech acts merely as a disturbance to the coefficient update. That is, if the convergence speed of the coefficients is reduced to the point where the signal change (the phonemic change) cannot be followed, the coefficients of the noise resynthesis filter (220) are updated so as to resynthesize only the noise. As a result, the noise resynthesis filter (220) resynthesizes only the noise, and the subtractor (230) outputs only the speech.
DETAILED DESCRIPTION OF THE INVENTION The present invention applies this principle to the construction of a microphone array. FIG. 1 illustrates the principle of the invention. First, to simplify the description, assume that noise and speech are incident on microphones A (101) and B (102) at the angles shown in FIG. 1. The present invention feeds the noise and speech thus acquired by microphone A (101) to a linear prediction analysis circuit (110) for linear prediction analysis. As a result, the whitened residuals of the speech and noise are obtained at the output of the circuit (110). Next, the residual is input to a noise resynthesis filter (120), and the difference between the output thus obtained and the output of microphone B (102) is calculated by a subtractor (130). The coefficients of the noise resynthesis filter (120) are updated with the learning identification method or the like so that this difference is minimized. It should be noted that, in the present invention, noise and speech demand different operations of the noise resynthesis filter (120). Since the noise reaches microphone A (101) earlier than microphone B (102), for noise the noise resynthesis filter (120) identifies the acoustic system from microphone A to microphone B at the same time as it resynthesizes the noise. On the other hand, since the speech reaches microphone B (102) earlier than microphone A (101), for speech the noise resynthesis filter (120) would be required to operate as a linear prediction filter.
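The two-microphone operation just described can be sketched numerically. The following is a minimal illustration under stated assumptions, not the patent's implementation: NLMS, a common form of the learning identification method, stands in for both the adaptive linear predictor (110) and the noise resynthesis filter (120), and only the noise-only case is simulated, with stationary white noise reaching microphone A two samples before microphone B. All function names and parameter values are our own.

```python
import random

def lpc_residual(x, order=8, mu=0.05):
    """Adaptive linear predictor standing in for circuit (110): whitens x
    and returns the prediction residual e[n] = x[n] - prediction."""
    w = [0.0] * order          # predictor coefficients
    buf = [0.0] * order        # past samples x[n-1] ... x[n-order]
    res = []
    for s in x:
        pred = sum(wi * bi for wi, bi in zip(w, buf))
        e = s - pred
        norm = sum(b * b for b in buf) + 1e-8
        w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]  # NLMS step
        buf = [s] + buf[:-1]
        res.append(e)
    return res

def resynthesis_subtract(ref, desired, taps=16, mu=0.5):
    """Noise resynthesis filter (120) plus subtractor (130): adapts an FIR
    filter on the whitened reference so that its output tracks `desired`,
    and returns the subtractor output desired - filter(ref)."""
    w = [0.0] * taps
    buf = [0.0] * taps
    out = []
    for r, d in zip(ref, desired):
        buf = [r] + buf[:-1]   # ref[n] ... ref[n-taps+1]
        y = sum(wi * bi for wi, bi in zip(w, buf))
        e = d - y
        norm = sum(b * b for b in buf) + 1e-8
        w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]  # NLMS step
        out.append(e)
    return out

# Noise-only simulation: the same white noise reaches microphone A first
# and microphone B two samples later (the acoustic system A -> B).
random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(4000)]
mic_a = noise
mic_b = [0.0, 0.0] + noise[:-2]
out = resynthesis_subtract(lpc_residual(mic_a), mic_b)
# Once the filter has identified the A -> B system, the tail of `out`
# carries much less noise energy than microphone B itself.
```

Here the step size mu is set large enough to converge within this short run; in the invention it is kept small so that, once speech arriving at microphone B first is added, the same slow update cannot also act as a linear predictor for the speech. That asymmetry is what the invention exploits.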
In this state, for linear prediction of the speech to be performed, the convergence speed of the coefficients of the noise resynthesis filter (120) would have to be fast enough to follow the phonemic changes. Conversely, since the acoustic system from microphone A to microphone B can be assumed to change slowly, the convergence speed of the coefficients can be kept slow for the noise. Therefore, if the coefficients of the noise resynthesis filter (120) are updated slowly so as to identify the acoustic system, linear prediction of the speech is not established, and the noise resynthesis filter (120) performs noise resynthesis only. As a result, only the speech appears at the output of the subtractor (130). This means that a directional microphone is formed in the present invention. (Claim 1) Thus, whereas the system of FIG. 3 requires only one operation of the filter, in the present invention the two different operations of linear prediction for speech and system identification for noise are demanded simultaneously, and it is difficult to realize both at once. It is therefore expected that noise resynthesis will be performed all the more reliably and that the noise suppression performance will improve. FIG. 4 shows a speech waveform on which the noise input to the microphone is superimposed. The horizontal axis shows time, in units of microseconds; the same units apply to FIGS. 5 to 7. In this example the signal-to-noise ratio of the speech to the noise is 0 dB. FIG. 5 shows the waveform of the speech extracted from the input of FIG. 4 by suppressing the noise with the circuit of FIG. 1, which embodies the principle of the present invention. This result clearly shows that, according to the present invention, the noise is suppressed and the speech is separated and extracted; that is, it is confirmed that a directional microphone is formed. Here, the linear prediction analysis circuit and the noise resynthesis filter of the present invention operate while speech and noise are simultaneously present, and the coefficients of the noise resynthesis filter are updated automatically so that the subtractor output is minimized. It is therefore also confirmed that the "simultaneous presence of noise and speech" and the "change in the incident angle of the noise," which are problems in the conventional microphone array, are solved. FIG. 6 shows the waveform of the speech whose noise was suppressed by the one-microphone system of FIG. 3 for the same input signal of FIG. 4, and FIG. 7 shows the original waveform of the speech used in the above processing. Comparing these waveforms, it can be seen that the speech waveform obtained by the present invention, shown in FIG. 5, reproduces the original with higher fidelity, particularly at the beginning and end, than the waveform obtained by the one-microphone system of FIG. 3.
Further, although it is not obvious from the waveforms, when the processed speech is actually played through a loudspeaker for comparison, it can be confirmed that the high-frequency components are reproduced more faithfully by the present invention. This means that the noise is reproduced more faithfully at the output of the noise resynthesis filter (120), and it can be said that the noise and the speech are better separated in the present invention. As can be understood from the above description, the present invention, unlike the conventional microphone array, can perform the operation of separating noise and speech even while speech and noise are simultaneously present. Moreover, since the noise resynthesis filter faithfully reproduces the noise and that reproduction is used only to subtract the noise, no distortion is imposed on the frequency characteristics of the speech as in the prior art. It can also be seen that the spacing of the microphones is unrestricted. Thus all of the problems of existing microphone arrays are solved. As an example, FIG. 8 shows one variation that can further enhance the effect of the present invention. In this embodiment, the separation performance is improved by reusing the speech obtained as the output of the subtractor (130). That is, the noise contained in the extracted speech obtained as the output of the subtractor (130) is naturally small. Therefore, using this speech as the reference signal, system identification of the acoustic system from microphone B to microphone A can be configured. However, so that the adaptive filter (240) shown in FIG. 8 does not come to constitute a linear prediction analysis filter for the speech, the convergence speed of its coefficients must be set low. When the convergence is slowed in this way, the adaptive filter (240) identifies the acoustic system from microphone B to microphone A, and the subtractor (250) outputs only the noise, with the speech removed. This output contains less speech and a higher proportion of noise than the output of microphone A (101). Therefore, if the output of the subtractor (250) is subjected to linear prediction analysis, the prediction residual obtained at the output of the linear prediction analysis circuit (260) has a higher ratio of noise than the residual predicted by the linear prediction analysis circuit (110), into which a large amount of speech is mixed. That is, if noise resynthesis is performed using this residual, the noise can be reproduced with higher fidelity than by the noise resynthesis filter (120), and as a result the speech at the output of the subtractor (251) is obtained with higher fidelity.
(Claim 2) FIG. 9 shows the result of a simulation. Although the difference is slight, comparing FIG. 9 with FIGS. 5 and 6 confirms that the speech is reproduced more faithfully in this embodiment; it is also confirmed, by actually playing the sound, that the sound quality is improved. Here, since the output of the subtractor (250) exceeds the output of the subtractor (130) in speech fidelity, an adaptive filter that performs system identification of the acoustic system from microphone B to A using this output as the reference signal will achieve better identification performance than the adaptive filter (240). Therefore, if the difference between the output of this newly provided adaptive filter and the output of microphone A is again subjected to linear prediction analysis, and noise resynthesis is performed using the resulting prediction residual, the fidelity of the reproduced speech is enhanced further. That is, the speech separation and extraction performance can be improved by repeatedly configuring the circuit consisting of (240), (250), (260), (270), and (251) shown in FIG. 8. FIG. 10 shows a structure in which the acoustic system from microphone B to A is identified using the speech obtained as the output of the subtractor (130) in FIG. 8, and the result is used at the input of the linear prediction analysis circuit (110). In this structure as well, the speech component is clearly suppressed at the input of the linear prediction analysis circuit (110), and the same effect as the configuration of FIG. 8 is obtained; in this case, moreover, the circuit scale is saved. (Claim 4) Here, if the one-microphone system shown in FIG. 3 is applied to the speech extracted according to the present invention, the noise remaining in it can be reduced further. As described above, according to the present invention, a microphone array can be configured that operates even when the signals to be separated are mixed, which was conventionally difficult, and that places few restrictions on the installation position of the microphones. Although the above description has been given for a microphone array, it should be noted that the same applies to radio waves, as long as the system forms directivity. It should also be noted that the number of sensors need not be two; there is no inconvenience in setting it to three or more.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a diagram showing the principle of the present invention.
FIG. 2 is a structural diagram for explaining the principle of the microphone array.
FIG. 3 is a diagram showing a noise suppression system with one microphone.
FIG. 4 is a diagram showing a speech waveform on which the input noise is superimposed.
FIG. 5 is a diagram showing a speech waveform obtained by the apparatus of the present invention.
FIG. 6 is a diagram showing a speech waveform obtained by the one-microphone system shown in FIG. 3.
FIG. 7 is a diagram showing the original waveform of the speech used in the processing.
FIG. 8 is a diagram showing an embodiment of the present invention that reuses the extracted speech.
FIG. 9 is a diagram showing a speech waveform obtained by the embodiment of the present invention that reuses the extracted speech.
FIG. 10 is a diagram showing an embodiment of the present invention in which the circuit scale is reduced when reusing the extracted speech.
[Description of Reference Numerals]
101 Microphone A
102 Microphone B
110 Linear prediction analysis circuit
120 Noise resynthesis filter
130 Subtractor
210 Linear prediction analysis circuit
220 Noise resynthesis filter
230 Subtractor
240 Adaptive filter
250 Subtractor
251 Subtractor
252 Subtractor
260 Linear prediction analysis circuit
270 Noise resynthesis filter