JP2008060635

Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2008060635
A target signal is extracted from a mixed signal using a beamformer, even without prior knowledge of the impulse response vector of the target signal. Signals emitted from N signal sources are observed by M sensors 4m, and one or more of them are extracted, where N and M are integers of 2 or more. The observed signals $x(t)$ from the sensors 4m are transformed into frequency-domain signals $x(f, \tau)$ (5); $x(f, \tau)$ is normalized (22); the normalized $x(f, \tau)$ is clustered into N clusters C (24); the correlation matrix $R(f)$ of the observed signal containing only the unnecessary signals is estimated from C (25); the beamformer $w(f)$ is calculated from the cluster information C and $R(f)$ (28); the target signal $y(f, \tau)$ is extracted from $x(f, \tau)$ using $w(f)$ (30); and $y(f, \tau)$ is converted into a time-domain signal (32). [Selected figure] Figure 3
Blind signal extraction device, method thereof, program thereof, and recording medium
recording the program
[0001]
The present invention relates to a blind signal extraction device, method, program, and recording medium that estimate and extract a target signal in a situation where the required source signal (target signal) cannot be observed in isolation and is instead observed with other noise, interference signals, and the like superimposed on it.
[0002]
Here, first, modeling of the observation signal and definition of the frequency domain of the
signal are performed, and then, the prior art will be briefly described.
04-05-2019
1
[Observed signal] All signals are sampled at a sampling frequency $f_s$ and expressed in discrete time. It is assumed that N source signals (N is an integer of 2 or more) are mixed and observed by M sensors (M is an integer of 2 or more). The present invention deals with situations in which the transmission path distorts the signals through attenuation and delay that depend on the distance from the signal source to the sensor, and through reflections from walls and the like. In such a situation, the source signals $s_n(t)$ (n = 1, ..., N) from the signal sources are observed at the sensors as $x_m(t)$ (m = 1, ..., M). Let $h_{mn}(u)$ be the impulse response from signal source n to sensor m, where u represents time. The observation signal $x_m(t)$ at sensor m is the sum of the source signals $s_n(t)$, each convolved with the corresponding impulse response $h_{mn}(u)$:
$$x_m(t) = \sum_{n=1}^{N} \sum_{u=0}^{\infty} h_{mn}(u)\, s_n(t-u) \qquad (1)$$
Consider the situation where no information about the source signals $s_1(t), \ldots, s_N(t)$ or the impulse responses $h_{11}(u), \ldots, h_{1N}(u), \ldots, h_{M1}(u), \ldots, h_{MN}(u)$ can be obtained in advance. Estimating the source signals $s_1(t), \ldots, s_N(t)$ from only the observed signals $x_1(t), \ldots, x_M(t)$ in this situation is the broad object of this invention.
[Frequency Domain Representation] In the present invention, each operation is performed in the frequency domain. Therefore, a known technique such as the L-point short-time Fourier transform (L is an arbitrary integer) is applied to the observation signal $x_m(t)$ at each sensor to obtain a time series for each frequency:
$$x_m(f, \tau) = \sum_{u=-L/2}^{L/2-1} x_m(\tau+u)\, g(u)\, e^{-i 2\pi f u} \qquad (2)$$
Here f is the frequency, $f = 0, f_s/L, \ldots, f_s(L-1)/L$; $\tau$ is an arbitrary time; and $f_s$ is the sampling frequency as described above. $g(u)$ is a window function such as a Hanning window.
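Equation (2) can be sketched numerically as follows. This is a minimal illustration, assuming NumPy, an even frame length L, and a frame hop chosen by the caller. The names `stft_frame`, `stft`, and `hop` are illustrative and do not appear in the original. Frequencies are indexed by bin k, which corresponds to f = k fs / L.

```python
import numpy as np

def stft_frame(x, tau, L):
    """Evaluate eq. (2) at frame centre tau for all L frequency bins (L even)."""
    u = np.arange(-L // 2, L // 2)      # u = -L/2, ..., L/2 - 1
    g = np.hanning(L)                   # window g(u); a Hanning window is one choice
    frame = x[tau + u] * g
    # ifftshift moves u = 0 to index 0, so the FFT phase equals exp(-i 2 pi k u / L)
    return np.fft.fft(np.fft.ifftshift(frame))

def stft(x, L, hop):
    """Stack frames for tau = L/2, L/2 + hop, ... into a (bins, frames) array."""
    taus = range(L // 2, len(x) - L // 2 + 1, hop)
    return np.stack([stft_frame(x, t, L) for t in taus], axis=1)
```

Applying `stft` to each sensor signal and stacking the results over m gives the vector x(f, τ) used in the rest of the description.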
[0003]
The convolutive mixture in the time domain given by equation (1) is approximated in the frequency domain by a simple (instantaneous) mixture at each frequency:
$$x_m(f, \tau) = \sum_{n=1}^{N} h_{mn}(f)\, s_n(f, \tau) \qquad (3)$$
Here $h_{mn}(f)$ is the frequency response from signal source n to sensor m at frequency f, and $s_n(f, \tau)$ is obtained by applying the short-time Fourier transform to the source signal $s_n(t)$ in the same way as in equation (2). Collecting the observed signals $x_1(f, \tau), \ldots, x_M(f, \tau)$ of sensors 1 to M into a vector, equation (3) becomes
$$x(f, \tau) = \sum_{n=1}^{N} h_n(f)\, s_n(f, \tau) \qquad (4)$$
Here $x(f, \tau) = [x_1(f, \tau), \ldots, x_m(f, \tau), \ldots, x_M(f, \tau)]^T$, and $h_n(f) = [h_{1n}(f), \ldots, h_{mn}(f), \ldots, h_{Mn}(f)]^T$ is the vector collecting the frequency responses from signal source n to each sensor. $[A]^T$ denotes the transpose of A. The same applies to the following description.
[Typical prior art] The adaptive beamformer (ABF), described in Non-Patent Document 1 and elsewhere, is widely known as a typical method for extracting a target signal from a mixed signal.
[0004]
A functional configuration example of a conventional adaptive beamformer (hereinafter referred
to as a conventional beamformer) is shown in FIG. The signal xm (t) observed by the plurality of
sensors 4m is input to the frequency domain conversion unit 5. The frequency domain
conversion unit 5 converts the signal xm (t) into a frequency domain signal xm (f, τ). All x m (f,
τ) (m = 1,..., M) are input to the conventional beamformer 6.
[0005]
In a system using a plurality of sensors, the conventional beamformer 6 is realized as a filter vector $w_n(f) = [w_{1n}(f), \ldots, w_{mn}(f), \ldots, w_{Mn}(f)]^T$ that emphasizes the target signal $s_n(t)$ and suppresses the unnecessary signals $s_1(t), \ldots, s_{n-1}(t), s_{n+1}(t), \ldots, s_N(t)$ as much as possible.
[0006]
When designing the conventional beamformer 6, it is assumed that the impulse response vector $h_n(f)$ from the target signal source to each sensor, or the steering vector that approximates it,
$$a_n(f) = [\exp(-i 2\pi f \tau_{1n}), \ldots, \exp(-i 2\pi f \tau_{Mn})]^T \qquad (5)$$
is known. Here $\tau_{mn}$ is the difference between the time at which the signal from source n reaches sensor m and the time at which it reaches the origin. Conventionally, as shown in FIG. 2, a sensor system arranged in a straight line is often used. If the direction of signal source n is $\theta_n$ and $d_m$ is the coordinate of sensor 4m measured from sensor 41, then $\tau_{mn} = d_m \cos\theta_n / c$, where c is the propagation speed of the signal.
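The steering vector of equation (5) for a linear array can be sketched as below. This is an illustration only, assuming NumPy and a default propagation speed of 340 m/s (sound in air); the function name and argument names are not from the original.

```python
import numpy as np

def steering_vector(f, d, theta, c=340.0):
    """Steering vector a_n(f) of eq. (5) for a linear array.

    f     : frequency in Hz
    d     : sensor coordinates d_m along the array axis, in metres
    theta : direction theta_n of the source, in radians
    c     : propagation speed (assumed 340 m/s here)
    """
    tau = np.asarray(d) * np.cos(theta) / c     # tau_mn = d_m cos(theta_n) / c
    return np.exp(-2j * np.pi * f * tau)
```

For a source at broadside (theta = π/2) all delays vanish and every element of the vector is 1.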
[0007]
Referring back to FIG. 1, the conventional beamformer 6 suppresses the unnecessary signals by estimating the filter vector $w_n(f)$ that minimizes the output power $A'(w_n(f))$ given by
$$A'(w_n(f)) = E\{|y_n(f,\tau)|^2\} = E\{y_n(f,\tau)\, y_n^{*}(f,\tau)\} = E\{w_n^{H}(f)\, x(f,\tau)\, x^{H}(f,\tau)\, w_n(f)\} = w_n^{H}(f)\, R_x(f)\, w_n(f) \qquad (6)$$
where $E\{\cdot\}$ is the averaging operation over time $\tau$, $A^{*}$ is the complex conjugate of A, $R_x(f) = E\{x(f,\tau)\, x^{H}(f,\tau)\}$ is the correlation matrix of the observed signal, $[A]^{H}$ denotes the conjugate transpose of the matrix (vector) A, and $y_n(f,\tau)$ is the output of the conventional beamformer 6, given by
$$y_n(f,\tau) = w_n^{H}(f)\, x(f,\tau) \qquad (7)$$
To avoid the meaningless solution $w_n(f) = 0 = [0, \ldots, 0]^T$, the constraint that the target signal is obtained without distortion is imposed:
$$w_n^{H}(f)\, h_n(f) = 1 \qquad (8)$$
The problem of finding the $w_n(f)$ that satisfies equation (8) and minimizes $A'(w_n(f))$ in equation (6) can be expressed using Lagrange's undetermined multiplier p as
$$A(w_n(f)) = A'(w_n(f)) + p\,(w_n^{H}(f)\, h_n(f) - 1) \qquad (9)$$
Solving equation (9) yields the conventional beamformer 6, given by equation (10).
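Equation (10) itself does not survive in this text. As a hedged reconstruction, the standard closed-form solution of the constrained problem (8)-(9), assuming $R_x(f)$ is invertible, follows from setting the derivative of (9) with respect to $w_n^{*}$ to zero and then fixing p with the constraint (8):

```latex
\frac{\partial A}{\partial w_n^{*}} = R_x(f)\,w_n(f) + p\,h_n(f) = 0
\;\Rightarrow\;
w_n(f) = -p\,R_x^{-1}(f)\,h_n(f),
\qquad
w_n^{H}(f)\,h_n(f) = 1
\;\Rightarrow\;
w_n(f) = \frac{R_x^{-1}(f)\,h_n(f)}{h_n^{H}(f)\,R_x^{-1}(f)\,h_n(f)}
```

This is the well-known minimum-variance distortionless-response form. It is consistent with the later statements that equation (10) contains $h_n(f)$ and $R_x(f)$, but the exact notation of the original equation is an assumption.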
[0008]
In the conventional beamformer 6 (conventional adaptive beamformer), it is ideal to measure the impulse response vector $h_n(f)$, store it in the impulse response storage unit 10, and read it out for use as $h_n(f)$ in equation (10). In practice, however, the steering vector $a_n(f)$ of equation (5) is widely stored in the steering vector storage unit 12 and read out for use in place of the impulse response vector $h_n(f)$ in equation (10).
[0009]
However, in a real environment the impulse response vector $h_n(f)$ and the steering vector $a_n(f)$ are rarely given correctly, and minimizing $A(w_n(f))$ in equation (9) often fails to minimize the unnecessary signals alone. For this reason, as described in Non-Patent Document 2 and elsewhere, it is common to use, instead of the correlation matrix $R_x(f)$ of the mixed (observed) signal, the correlation matrix $R_J(f) = E\{\xi(f,\tau)\, \xi^{H}(f,\tau)\}$ of the signal $\xi(f,\tau)$ observed in time intervals that contain only the unnecessary signals. This is known to achieve higher performance than using $R_x(f)$. That is, in the conventional beamformer, it is desirable to estimate accurately the correlation matrix in time intervals containing only the unnecessary signals (intervals in which the target sound is absent).
[0010]
The conventional beamformer 6 computes the output signal $y_n(f,\tau)$ by equation (7) from $w_n(f)$ of equation (10) and the observation signal vector $x(f,\tau)$. The output signal $y_n(f,\tau)$ is input to the time domain conversion unit 8 and converted from the frequency domain to the time domain to generate $y_n(t)$.
[Non-Patent Document 1] Haykin, S., Adaptive Filter Theory, Science and Technology Publishing, 2001, pp. 690-693.
[Non-Patent Document 2] Yoshio Oga, Yoshio Yamazaki, Yutaka Kanada, Acoustics System and Digital Processing, ed.
[0011]
As described above, the conventional beamformer requires the impulse response vector $h_n(f)$ from the target signal source to each sensor, or the steering vector $a_n(f)$ that approximates it. That is, it has the disadvantage of requiring prior knowledge about the target signal. Furthermore, these vectors are difficult to obtain correctly in a real environment, and the performance of the conventional beamformer degrades significantly if the prior knowledge deviates from the actual impulse response vector $h_n(f)$ in the usage environment. In addition, obtaining high performance requires estimating the correlation matrix $R_J(f)$ of the signal in time intervals containing only the unwanted signals, which is very difficult when the unwanted signals are nonstationary.
[0012]
The signal extraction apparatus observes signals emitted from N signal sources with M sensors and extracts one or more of the observed signals, where N and M are integers of 2 or more. It converts the observed signals from the M sensors into signals in the frequency domain; normalizes the frequency-domain signals to calculate normalized observed signal vectors; clusters the normalized observed signal vectors into N clusters; estimates, from the cluster information, an unnecessary signal correlation matrix, which is the correlation matrix of the observation signal containing only the unnecessary signals; calculates a beamformer from the cluster information and the unnecessary signal correlation matrix; extracts the target signal from the frequency-domain signal using the beamformer; and converts the extracted target signal into a signal in the time domain.
[0013]
According to the above configuration, it is possible to separate and extract the target signal with
high accuracy without measuring the impulse response vector or the steering vector in advance
and even if the unnecessary signal is a non-stationary signal.
[0014]
The best mode for carrying out the invention will be shown below.
[0015]
An example of the functional configuration of the present invention is shown in FIG. 3, and the
main processing flow of the present invention is shown in FIG.
The same functional components as those in FIG.
The same applies to the following.
[0016]
The present invention also assumes that the signals are sparse. Sparse means that the signal is zero at most times $\tau$. Sparsity has been confirmed, for example, in speech signals. Under the sparsity assumption, even if there are multiple source signals, they can be assumed not to overlap at any time-frequency point $(f, \tau)$, so that at most one source is present at each point. That is, equation (4) can be expressed by the following equation.
[0017]
$$x(f, \tau) = h_n(f)\, s_n(f, \tau) \qquad (11)$$
where $h_n(f)$ is the impulse response vector and $s_n(f, \tau)$ is the source signal present at $(f, \tau)$.
[0018]
Each observation signal $x_m(t)$ (m = 1, ..., M) collected by sensor 4m is input to the frequency domain conversion unit 5. The frequency domain conversion unit 5 transforms each observed signal $x_m(t)$ from the time domain into the frequency-domain signal $x_m(f, \tau)$ by, for example, the short-time Fourier transform described above, which is a known technique (step S2). The signals $x_m(f, \tau)$ are output together as the observation signal vector $x(f, \tau)$. The observation signal vector $x(f, \tau)$ is input to the normalization unit 22, which calculates the normalized observation signal vector (step S4). Specifically, for the observed signal vector $x(f, \tau) = [x_1(f, \tau), \ldots, x_M(f, \tau)]^T$, the argument (phase) of each element is normalized by equation (12), and the norm is then normalized by equation (13).
[0019]
$$\bar{x}(f, \tau) \leftarrow \bar{x}(f, \tau) / \|\bar{x}(f, \tau)\| \qquad (13)$$
where $\bar{x}(f, \tau)$ represents the normalized observed signal vector, $\arg(r)$ the argument of r, i the imaginary unit, $|r|$ the absolute value of r, $\|r\|$ the norm of r, Q the reference sensor number ($Q \in \{1, \ldots, M\}$), c the propagation speed of the signal, and $\alpha$ an arbitrary positive constant. The value $\alpha = 4 d_{max}$ is most preferred, where $d_{max}$ is the maximum distance between the sensor Q selected as the reference and any other sensor. Other values of $\alpha$ may also be used.
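The two normalization steps can be sketched as follows. Equation (12) is not reproduced in this text, so the phase normalization below is an assumption: it divides the phase relative to the reference sensor Q by α f / c, which is consistent with the symbols defined above and with the 2π/α factor in equation (16). The norm step is equation (13) as given. NumPy, the function name, and the default c = 340 m/s are also assumptions.

```python
import numpy as np

def normalize_observations(X, f, Q=0, alpha=1.0, c=340.0):
    """Normalize observation vectors at one frequency.

    X : (M, T) complex array, column t holds x(f, tau_t); X[Q] must be nonzero
    f : frequency in Hz (f > 0)
    """
    phase = np.angle(X / X[Q])                          # arg[x_m / x_Q]
    # ASSUMED form of eq. (12): keep the magnitude, rescale the relative phase
    Xbar = np.abs(X) * np.exp(1j * phase / (alpha * f / c))
    # eq. (13): scale every column to unit norm
    return Xbar / np.linalg.norm(Xbar, axis=0)
```

Under the sparse model of equation (11), multiplying a column of `X` by any complex source value leaves the output unchanged, which is why the normalized vectors form one cluster per source.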
[0020]
From equations (11) to (13), the normalized observation signal vector $\bar{x}(f, \tau)$ can be expressed by equation (14), where $A_n = (\sum_{m=1}^{M} |h_{mn}|^2)^{1/2}$. It follows that $\bar{x}(f, \tau)$ depends only on the impulse response information for the signal $s_n(f, \tau)$.
[0021]
The normalized observed signal vectors $\bar{x}(f, \tau)$ at all time-frequency points are input to the clustering unit 24 and clustered into N clusters (step S6). This clustering can be performed effectively using, for example, the k-means method, which is described in detail in R. O. Duda, P. E. Hart, and D. G. Stork, "Pattern Classification," Wiley Interscience, 2nd edition, 2000. The clustering method is described specifically below.
[0022]
The storage unit 26 stores the normalized observation signal vectors $\bar{x}(f, \tau)$ given by equation (14) and the initial centroid values $c_n^{j}$ (j = 0; n = 1, ..., N). The clustering unit 24 reads the normalized observation signal vectors $\bar{x}(f, \tau)$ from the storage unit 26 and clusters them to generate N clusters $C_1, \ldots, C_N$. That is, the normalized observation signal vectors $\bar{x}(f, \tau)$, which are M-dimensional complex vectors, are clustered directly in the M-dimensional complex space by the following procedure.
[0023]
1. The initial centroid value $c_n^{j}$ of each cluster is read from the storage unit 26. It is a vector of the same dimension as the normalized observation signal vector $\bar{x}(f, \tau)$ (an M-dimensional complex vector). How to select the initial value $c_n^{0}$ is described later.
2. Let j + 1 be the new j.
3. Each normalized observed signal vector $\bar{x}(f, \tau)$, over all time-frequency points $(f, \tau)$, is assigned to the cluster $C_n$ represented by the nearest centroid $c_n^{j-1}$. That is, for each $\bar{x}(f, \tau)$, the n that minimizes $\|\bar{x}(f, \tau) - c_n^{j-1}\|$ is selected.
4. The centroid is updated by averaging the normalized observation signal vectors $\bar{x}(f, \tau)$ assigned to each cluster $C_n$ and scaling the result to norm 1:
$$c_n^{j} = E\{\bar{x}(f, \tau)\}_n / \|E\{\bar{x}(f, \tau)\}_n\| \qquad (15)$$
Here $E\{\cdot\}_n$ represents the averaging operation over the members of the cluster $C_n$.
5. Steps 2 to 4 are repeated until the centroids $c_n^{j}$ converge. The converged centroids are stored in the storage unit 26 as $c_n$ (n = 1, ..., N).
The above is the clustering procedure.
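The five steps above can be sketched as a k-means loop with unit-norm centroids. This is a minimal illustration assuming NumPy; the function name, the iteration cap `max_iter`, the tolerance `tol`, and the handling of empty clusters are choices made here, not taken from the original.

```python
import numpy as np

def cluster_normalized(Xbar, c0, max_iter=100, tol=1e-8):
    """k-means on unit-norm complex vectors, following steps 1 to 5.

    Xbar : (M, P) array, one normalized observation vector per column
    c0   : (M, N) array of initial centroids (step 1)
    Returns (labels, centroids): labels has shape (P,), centroids (M, N).
    """
    c = c0 / np.linalg.norm(c0, axis=0)
    for _ in range(max_iter):                       # step 2: next iteration j
        # step 3: Euclidean distance of every vector to every centroid, shape (N, P)
        dist = np.linalg.norm(Xbar[:, None, :] - c[:, :, None], axis=0)
        labels = dist.argmin(axis=0)
        # step 4: mean of the members, rescaled to norm 1 (eq. (15))
        new_c = c.copy()
        for n in range(c.shape[1]):
            members = Xbar[:, labels == n]
            if members.shape[1]:                    # an empty cluster keeps its centroid
                mean = members.mean(axis=1)
                new_c[:, n] = mean / np.linalg.norm(mean)
        converged = np.linalg.norm(new_c - c) < tol
        c = new_c
        if converged:                               # step 5: stop once centroids settle
            break
    return labels, c
```

The returned `labels` array records which cluster each time-frequency point belongs to, which is the cluster information used by the later processing stages.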
[0024]
Next, examples of how to select the initial centroid values are described.
<< Initial Value Setting Method 1 >> N vectors are selected at random from the normalized observation signal vectors $\bar{x}(f, \tau)$ and used as the initial centroid values $c_n^{0}$ (n = 1, ..., N).
<< Initial Value Setting Method 2 >> Assuming that $|h_{mn}(f)| = 1$ for all m and n in equations (11) to (15), the centroid can be written as equation (16), which is used as the initial value:
$$[c_n]_m = E[\bar{x}_m(f, \tau)]_n = \exp[i 2\pi (d_m - d_Q)^T v_n / \alpha] / M^{1/2} = \exp[i 2\pi \|d_m - d_Q\| \cos\Theta_n^{mQ} / \alpha] / M^{1/2} \qquad (16)$$
Here $d_m$ represents the position vector of the sensor 4m, $v_n$ is the direction-of-arrival vector of the signal $s_n(t)$, and $\Theta_n^{mQ}$ is the angle of that direction with respect to the axis connecting the sensor 4m and the sensor 4Q selected as the reference. The vector $v_n$ is indicated by a thick vector in FIG. Also, $v_n$ is a unit vector, $\|v_n\| = 1$.
[0025]
Suitable values are given for the sensor positions $d_m$ (m = 1, ..., M), the azimuths $\theta_n$, and the elevation angles $\varphi_n$ (n = 1, ..., N), and are held in the storage unit 26. Because the sensor positions $d_m$, the azimuths $\theta_n$, and the elevation angles $\varphi_n$ serve only as initial values, approximate values suffice. For example, $\theta_n = 2\pi n / N$ and $\varphi_n = 0$ give spatially dispersed initial values.
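Initial Value Setting Method 2 can be sketched as below, using equation (16) with the example angles θn = 2πn/N and φn = 0. The conversion from azimuth and elevation to a unit vector (x = cos θ cos φ, y = sin θ cos φ, z = sin φ) is an assumed convention, as are NumPy and the function name.

```python
import numpy as np

def initial_centroids(d, Q, N, alpha):
    """Initial centroids from eq. (16) for spatially dispersed directions.

    d     : (M, 3) array of (approximate) sensor positions d_m
    Q     : index of the reference sensor
    N     : number of sources
    alpha : the positive constant used in the normalization
    Returns an (M, N) complex array whose column n is the initial centroid of cluster n.
    """
    theta = 2 * np.pi * np.arange(1, N + 1) / N     # theta_n = 2 pi n / N
    phi = np.zeros(N)                               # phi_n = 0
    # unit direction vectors v_n, shape (3, N) (assumed angle convention)
    v = np.stack([np.cos(theta) * np.cos(phi),
                  np.sin(theta) * np.cos(phi),
                  np.sin(phi)])
    D = d - d[Q]                                    # rows are d_m - d_Q
    return np.exp(2j * np.pi * (D @ v) / alpha) / np.sqrt(d.shape[0])
```

Each column has unit norm by construction, so the result can be used directly as the starting centroids of the clustering step.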
[0026]
Referring back to FIG. 3, each cluster $C_n$ obtained by the clustering unit 24 corresponds to a source signal $s_n(f, \tau)$. Also, as can be seen from equation (14), the centroid $\bar{c}_n = E\{\bar{x}(f, \tau)\}_{\bar{x}(f, \tau) \in C_n}$ represents the impulse response information for the signal $s_n(f, \tau)$.
[0027]
Each cluster $C_n$ is input to the unnecessary signal correlation matrix estimation unit 25. From the information in each cluster $C_n$, the unnecessary signal correlation matrix estimation unit 25 estimates the correlation matrix of the unnecessary-signal sections with respect to the source signal $s_n(f, \tau)$, that is, the unnecessary signal correlation matrix $R_J^{n}(f)$, which is the correlation matrix of the observation signal containing only the unnecessary signals, by equations (17) and (18) (step S8). The estimated unnecessary signal correlation matrix $R_J^{n}(f)$ and the centroid information $\bar{c}_n$ of the cluster from the clustering unit 24 are input to the beamformer calculation unit 28. The beamformer calculation unit 28 calculates the beamformer $w_n(f)$ from the cluster information $C_n$ and the unnecessary signal correlation matrix $R_J^{n}(f)$ (step S10). A specific method of calculating the beamformer $w_n(f)$ is described in detail in the second embodiment.
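Equations (17) and (18) are not reproduced in this text. One plausible reading, stated here as an assumption, is that the observation is set to zero at every time-frequency point assigned to the target cluster and the correlation matrix is then averaged over time. The sketch below implements that reading for one frequency bin with NumPy; the function and argument names are illustrative.

```python
import numpy as np

def unnecessary_correlation(X, labels, n):
    """Estimate R_J^n(f) at one frequency from cluster membership.

    X      : (M, T) complex observations x(f, tau) at this frequency
    labels : (T,) cluster index assigned to each frame
    n      : index of the target cluster C_n
    """
    # ASSUMED masking: frames belonging to the target cluster are zeroed out
    xi = np.where(labels == n, 0, X)
    # time average of xi xi^H over all T frames
    return xi @ xi.conj().T / X.shape[1]
```

Because the mask follows the clustering result frame by frame, the estimate does not rely on finding long intervals in which the target is silent.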
[0028]
The calculated beamformer $w_n(f)$ is input to the target signal extraction unit 30. The target signal extraction unit 30 uses the beamformer $w_n(f)$ to extract the target signal $y_n(f, \tau)$ from the frequency-domain observation signal $x(f, \tau)$ by equation (19) (step S12).
$$y_n(f, \tau) = w_n^{H}(f)\, x(f, \tau) \qquad (19)$$
Performing equations (17) to (19) for all n (n = 1, ..., N) extracts all N signals. All $y_n(f, \tau)$ are input to the time domain conversion unit 32, which converts each target signal $y_n(f, \tau)$ extracted by the target signal extraction unit 30 into a time-domain signal $y_n(t)$ by, for example, the short-time inverse Fourier transform, which is a known technique (step S14).
[0029]
Next, a second embodiment of the present invention will be described. The second embodiment is
an example in which the beamformer calculation unit 28 described in the first embodiment is
configured in more detail. FIG. 6 shows a functional configuration example of the beamformer
calculation unit 28 of the second embodiment and a portion related to this. The parts not
illustrated in FIG. 6 perform the same processing as that described in the first embodiment, and
the same applies to the following embodiments. The beamformer calculation unit 28 includes an
impulse response estimation unit 40 and an adaptive beamformer calculation unit 42.
[0030]
The impulse response estimation unit 40 estimates an impulse response vector hn (f) of the
target signal from the centroid information c <-> n of the cluster Cn. Specifically, the impulse
response vector for the source signal sn (f, τ) is estimated by performing inverse normalization
on the centroid information c <−> n from the clustering unit 24.
[0031]
First, the centroid of the cluster $C_n$ is given by equation (15), and $\bar{x}(f, \tau)$ can be expressed by equation (14). Assuming that $|h_{mn}| = 1$ for all m and n, the m-th component $\bar{c}_{mn}$ of the centroid $\bar{c}_n$ satisfies equation (20). Solving equation (20) for $h_{mn}(f)$ gives equation (21).
[0032]
The right side of equation (20) is the normalized impulse response $h_{mn}(f)$, whereas equation (21) recovers the impulse response vector $h_n(f)$ from equation (20). The operation of equation (21) is therefore called inverse normalization.
[0033]
Returning to FIG. 6, the estimated impulse response vector $h_n(f)$ and $R_J^{n}(f)$ from the unnecessary signal correlation matrix estimation unit 25 are input to the adaptive beamformer calculation unit 42.
[0034]
The adaptive beamformer calculation unit 42 calculates an adaptive beamformer wn (f) using the
impulse response vector hn (f) and the unnecessary signal correlation matrix R <n> J (f).
Specifically, the adaptive beamformer wn (f) can be calculated by the following equation (22).
The above equation (22) can be obtained by replacing Rx (f) of the above equation (10) with R
<n> J (f).
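Equation (22) is not reproduced in this text. Assuming it has the standard minimum-variance form obtained by solving (8)-(9) with the unnecessary signal correlation matrix in place of Rx(f), the beamformer and the extraction of equation (19) can be sketched as below. The small diagonal loading term is a numerical safeguard added here, not part of the original, and the function names are illustrative.

```python
import numpy as np

def adaptive_beamformer(R_J, h, loading=1e-6):
    """w = R^{-1} h / (h^H R^{-1} h), an ASSUMED form of eq. (22).

    R_J     : (M, M) unnecessary signal correlation matrix at one frequency
    h       : (M,) estimated impulse response (or steering) vector
    loading : relative diagonal loading that keeps R invertible (added safeguard)
    """
    M = len(h)
    R = R_J + loading * np.trace(R_J).real / M * np.eye(M)
    Rinv_h = np.linalg.solve(R, h)          # R^{-1} h without forming the inverse
    return Rinv_h / (h.conj() @ Rinv_h)

def extract(w, X):
    """Eq. (19): y(f, tau) = w^H x(f, tau) for every column of X (shape (M, T))."""
    return w.conj() @ X
```

By construction the filter satisfies the distortionless constraint of equation (8), so a component arriving through h passes with unit gain while components that dominate R_J are attenuated.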
[0035]
The target signal extraction unit 30 applies the adaptive beamformer $w_n(f)$ of equation (22) and the frequency-domain signal $x(f, \tau)$ to equation (19) to extract the target signal $y_n(f, \tau)$.
[0036]
Next, a modification of the second embodiment is described. Instead of having the impulse response estimation unit 40 estimate and output the impulse response vector $h_n(f)$ using equation (21), it is also conceivable to estimate and output the steering vector $a_n(f)$. Let the steering vector of the signal $s_n(f, \tau)$ be, as in equation (5),
$$a_n(f) = [\exp(-i 2\pi f \tau_{1n}), \ldots, \exp(-i 2\pi f \tau_{Mn})]^T \qquad (23)$$
Since the steering vector $a_n(f)$ approximates the impulse response vector $h_n(f)$, comparing the phase terms of equations (21) and (23) shows that $\tau_{mn}$ (m = 1, ..., M) can be estimated by equation (24):
$$\hat{\tau}_{mn} = \alpha c^{-1} \arg[\bar{c}_{mn} / \bar{c}_{Qn}] / 2\pi \qquad (24)$$
The impulse response estimation unit 40' performs the calculation of equation (24).
[0037]
The steering vector $a_n(f)$ built from $\hat{\tau}_{mn}$ is output as the estimate of the impulse response vector $h_n(f)$. That is, from equations (23) and (24), the steering vector $a_n(f)$ can be expressed by equation (25) and is output from the impulse response estimation unit 40' as the impulse response estimate $\hat{h}_n(f)$:
$$a_n(f) = [\exp(-i 2\pi f \hat{\tau}_{1n}), \ldots, \exp(-i 2\pi f \hat{\tau}_{Mn})]^T \approx \hat{h}_n(f) \qquad (25)$$
The adaptive beamformer is then calculated by equation (22) from $\hat{h}_n(f)$ of equation (25) and $R_J^{n}(f)$.
[0038]
Further, when the adaptive beamformer $w_n(f)$ is calculated by equation (22) using the unnecessary signal correlation matrix $R_J^{n}(f)$ estimated by the unnecessary signal correlation matrix estimation unit 25, the acoustic transfer characteristics, that is, the impulse response vector $h_n(f)$ or the steering vector $a_n(f)$, may be used directly if they are known. Also, the correlation matrix $R_x(f)$ of the observation signal may be used in equation (22) instead of the unnecessary signal correlation matrix $R_J^{n}(f)$. The same applies to the following embodiments.
[0039]
The adaptive beamformer shown in the second embodiment achieves high performance when N ≦ M, but its performance is limited when N > M. Specifically, the adaptive beamformer can extract the target signal $y_n(f, \tau)$ effectively if the number of unnecessary signals is M-1 or less, but it is known to be insufficiently effective if there are M or more. The third embodiment therefore shows that the target signal can be extracted even when N > M, that is, when there are N-1 (> M-1) unnecessary signals. An example of the functional configuration of the third embodiment is shown in FIG. The third embodiment differs from the second embodiment in that an unnecessary signal selection unit 49 and an input signal estimation unit 50 are added, and in that the processing of the unnecessary signal correlation matrix estimation unit 25, the impulse response estimation unit 40, and the adaptive beamformer calculation unit 42 is changed.
[0040]
The unnecessary signal selection unit 49 selects the K unnecessary signals for which the unnecessary signal correlation matrix $R_J^{n}(f)$ is estimated. Here, K is an integer satisfying K ≦ M-1. That is, K clusters are selected from the clusters $C_L$ (L = 1, ..., n-1, n+1, ..., N) corresponding to the unnecessary signals, which are the clusters other than the cluster $C_n$ corresponding to the target signal $s_n(f, \tau)$, and the selected clusters are denoted $C_J$. Possible selection methods are to take K clusters in descending order of the number of cluster members in $C_L$, or in descending order of the power of $\xi_L(f, \tau)$ given by equation (26).
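The first selection rule, choosing the K unnecessary-signal clusters with the most members, can be sketched as follows. This is an illustration assuming NumPy and zero-based cluster indices; the function name is not from the original.

```python
import numpy as np

def select_unnecessary_clusters(labels, n, N, K):
    """Pick the K non-target clusters with the most members.

    labels : (P,) cluster index of every time-frequency point
    n      : index of the target cluster C_n
    N      : total number of clusters
    K      : number of unnecessary signals to keep (K <= M - 1)
    """
    counts = np.bincount(labels, minlength=N)        # members per cluster
    candidates = [L for L in range(N) if L != n]     # every cluster except the target
    candidates.sort(key=lambda L: counts[L], reverse=True)
    return candidates[:K]
```

The power-based rule would replace `counts[L]` with the summed power of the observations assigned to cluster L.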
[0041]
Using the K clusters $C_J$ selected by the unnecessary signal selection unit 49, the unnecessary signal correlation matrix $R_J^{n}(f)$ for the K unnecessary signals is calculated by equations (27) and (28).
[0042]
Further, the input signal estimation unit 50 estimates a beamformer input signal vector $x(f, \tau)$ in which the K unnecessary signals selected by the unnecessary signal selection unit 49 and the target signal are mixed. This is obtained by equation (29) using the unnecessary signal clusters $C_J$ and the target signal cluster $C_n$. The adaptive beamformer calculation unit 42 calculates the adaptive beamformer $w_n(f)$ by equation (22), using the unnecessary signal correlation matrix $R_J^{n}(f)$ obtained by equations (27) and (28) and the impulse response vector $h_n(f)$ from the impulse response estimation unit 40 or 40'.
[0043]
The target signal extraction unit 30 receives the beamformer input signal vector $x(f, \tau)$ from the input signal estimation unit 50. It calculates equation (19) using the adaptive beamformer $w_n(f)$ from the adaptive beamformer calculation unit 42 and the beamformer input signal vector $x(f, \tau)$ to extract the target signal $y_n(f, \tau)$.
[0044]
In the third embodiment, as described above, the case of N> M has been described. However, it
can be implemented even in the case of N ≦ M. In this case, as compared with the second
embodiment, extra processing is required for the processing of the input signal estimation unit
50 and the unnecessary signal selection unit 49.
[0045]
The fourth embodiment is an example where the sensor position information is known. A part of
the functional configuration example of the fourth embodiment is shown in FIG. The impulse
response estimation unit 40 described in the second or third embodiment is composed of an
arrival direction estimation unit 60 and an impulse response calculation unit 62. The sensor
position information storage unit 64 stores sensor position information indicating the positions
of M sensors.
[0046]
The arrival direction estimation unit 60 estimates the arrival direction of the signal. It receives the centroid information $\bar{c}_n$ from the clustering unit 24 and the three-dimensional vectors $d_m$ (m = 1, ..., M) representing the position of each sensor from the sensor position information storage unit 64. Let $v_n$ (n = 1, ..., N) be the three-dimensional vector of length 1 representing the direction of arrival of the signal $s_n$. The estimate of the direction of arrival of the signal $s_n$ can then be calculated from the centroid information $\bar{c}_n$ by equation (30).
[0047]
$$v_n = \alpha D^{+} \arg[\bar{c}_n] / 2\pi \qquad (30)$$
where $D = [d_1 - d_Q, \ldots, d_m - d_Q, \ldots, d_M - d_Q]^T$, $d_Q$ is the three-dimensional vector representing the position of the arbitrarily selected reference sensor 4Q, and $D^{+}$ is the generalized inverse matrix of D.
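Equation (30) can be sketched with a pseudo-inverse as follows. This is a minimal illustration assuming NumPy, with `numpy.linalg.pinv` standing in for the generalized inverse; the function name is illustrative. Recovering a full three-dimensional direction requires sensor positions that do not all lie in one plane.

```python
import numpy as np

def estimate_direction(c_bar, d, Q, alpha):
    """Direction-of-arrival estimate v_n of eq. (30).

    c_bar : (M,) centroid of cluster n
    d     : (M, 3) sensor positions d_m
    Q     : index of the reference sensor
    alpha : the positive constant used in the normalization
    """
    D = d - d[Q]                                    # rows are d_m - d_Q
    # arg[.] is applied element-wise to the centroid, then mapped through D^+
    return alpha * np.linalg.pinv(D) @ np.angle(c_bar) / (2 * np.pi)
```

With α = 4 dmax the phases of the centroid stay within ±π/2, so the element-wise argument is free of wrap-around.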
[0048]
Next, in the impulse response calculation unit 62, an impulse response is calculated using the
arrival direction of the signal sn and the sensor position information.
The impulse response calculation unit 62 receives the estimated value qn of the arrival direction
of the signal sn from the arrival direction estimation unit 60 and the sensor position information
dQ from the sensor position information storage unit 64. The impulse response calculation unit
62 obtains an estimated value of the steering vector an ^ (f) for the signal sn represented by the
following equation (31). This steering vector an ^ (f) is calculated as an estimated value hn (f) of
the impulse response vector. Then, the steering vector an ^ (f) (estimated value hn (f) of the
impulse response vector) is output from the impulse response calculator 62 and is input to the
adaptive beamformer calculator 42.
[0049]
This embodiment shows a configuration in which a maximum gain beamformer is used instead of the adaptive beamformer. The maximum gain beamformer uses as its beamformer a filter $w_n(f)$ that minimizes the unwanted signal component in the sensor array output while maximizing the target signal in the sensor array output (D. H. Johnson and D. E. Dudgeon, "Array Signal Processing: Concepts and Techniques", Prentice Hall, 1993). The key point of the maximum gain beamformer is estimating the target signal component and the unwanted signal component in the sensor array output, but estimating the unwanted signal is very difficult when it is a non-stationary signal. The fifth embodiment solves this problem by using the sparsity assumption. That is, the problem is solved by (1) estimating the target signal correlation matrix $R_T^{n}(f)$, which is the correlation matrix of the observed signal containing only the target signal, and the unnecessary signal correlation matrix $R_J^{n}(f)$, which is the correlation matrix of the observed signal containing only the unnecessary signals, and (2) estimating the maximum gain beamformer $w_n(f)$ from $R_T^{n}(f)$ and $R_J^{n}(f)$ in the maximum gain beamformer calculation unit.
[0050]
In addition, since the maximum gain beamformer does not have the constraint of equation (8), "minimizing distortion of the target signal", beamformers $w_n(f)$ with different gain characteristics are formed at each frequency f. This means that, for example, when the maximum gain beamformer is applied to a wideband signal such as a speech signal, the output is distorted by the frequency characteristic of $w_n(f)$. For this reason, it has conventionally been difficult to use a maximum gain beamformer for wideband signals. The fifth embodiment solves this by correcting the maximum gain beamformer $w_n(f)$ so that the error between the observed signal vector $x(f, \tau)$ and the output signal of the maximum gain beamformer $w_n(f)$ is minimized.
[0051]
First, the principle of the maximum gain beamformer will be briefly described. As described
above, under the condition of “maximize the target signal in the sensor array output and
minimize the unnecessary signal component in the sensor array output”, the evaluation function
is given by the following equation (32).
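Equation (32) itself did not survive in this text. The following is a reconstruction, offered as an assumption, based on the description that the numerator is the output power of the target signal and the denominator is the output power of the unnecessary signal:

```latex
g(w_n) = \frac{w_n^{H} \, R_T^{n}(f) \, w_n}{w_n^{H} \, R_J^{n}(f) \, w_n} \qquad (32)
```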
[0052]
Here, the denominator is the output power of the unnecessary signal, the numerator is the output
power of the target signal, $R_T^n(f)$ is the correlation matrix of the observed signal containing
only the target signal, and $R_J^n(f)$ is the correlation matrix of the observed signal containing
only the unnecessary signals. The matrix square root can be expressed as
$(R_J^n(f))^{1/2} = E F^{1/2} E^H$, where $E = [e_1, \ldots, e_M]$, $e_i$ is an eigenvector of
$R_J^n(f)$, $F = \mathrm{diag}(\lambda_1, \ldots, \lambda_M)$, and $\lambda_i$ is the eigenvalue of
$R_J^n(f)$ corresponding to $e_i$. Setting $\tilde{w} = (R_J^n(f))^{1/2} w_n$, equation (32) above
can be rewritten as the following equation (33).
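Equation (33) is also missing from this text. Assuming the ratio form of output powers described for equation (32), and using the fact that the square root of the Hermitian matrix is itself Hermitian, substituting the inverse square root of the unnecessary signal correlation matrix applied to the transformed filter would give the following sketch of equation (33):

```latex
g(\tilde{w}) =
  \frac{\tilde{w}^{H} \, (R_J^{n}(f))^{-1/2} \, R_T^{n}(f) \, (R_J^{n}(f))^{-1/2} \, \tilde{w}}
       {\tilde{w}^{H} \tilde{w}} \qquad (33)
```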
[0053]
Here, the maximum value of $g(\tilde{w})$ follows from the Rayleigh quotient theorem described in
Kodama and Suda, "Matrix theory for system control", Corona Co., 1995. Given the maximum
eigenvalue $\lambda$ of $(R_J^n(f))^{-1/2} R_T^n(f) (R_J^n(f))^{-1/2}$ and the corresponding
eigenvector $e$, the maximum value is $\max g(\tilde{w}) = \lambda = g(e)$. That is, the
maximum gain beamformer $w_n$ to be obtained can be expressed by the following equations (34)
and (35).
$\tilde{w} = e$ (34)
$w_n = (R_J^n(f))^{-1/2} e$ (35)
An example of a functional configuration of the fifth embodiment is shown in FIG. Compared with
the first embodiment, the observed signal correlation matrix estimation unit 72 is added, and the
beamformer calculation unit 28 includes the target signal correlation matrix estimation unit 70,
the eigenvector calculation unit 74, the maximum gain beamformer calculation unit 76, the
correction vector calculation unit 78, and the correction unit 80.
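The eigenvector construction of equations (34) and (35) can be illustrated with a minimal NumPy sketch. The function names `max_gain_beamformer` and `gain`, the random test matrices, and the small diagonal loading that keeps the unnecessary signal correlation matrix invertible are assumptions for illustration, not part of the described device.

```python
import numpy as np

def max_gain_beamformer(R_T, R_J):
    """Return (w, lam) with w = R_J^{-1/2} e, where e is the eigenvector for the
    largest eigenvalue lam of R_J^{-1/2} R_T R_J^{-1/2}.
    R_J is assumed Hermitian positive definite."""
    # Eigendecomposition R_J = E F E^H, so R_J^{-1/2} = E F^{-1/2} E^H.
    F, E = np.linalg.eigh(R_J)
    R_J_inv_sqrt = (E / np.sqrt(F)) @ E.conj().T
    K = R_J_inv_sqrt @ R_T @ R_J_inv_sqrt
    K = (K + K.conj().T) / 2          # remove rounding asymmetry
    lam, V = np.linalg.eigh(K)        # eigenvalues in ascending order
    e = V[:, -1]                      # eigenvector of the largest eigenvalue
    return R_J_inv_sqrt @ e, lam[-1]

def gain(w, R_T, R_J):
    """Ratio of target output power to unnecessary-signal output power."""
    return np.real(w.conj() @ R_T @ w) / np.real(w.conj() @ R_J @ w)

# Hypothetical correlation matrices for M = 3 sensors at one frequency bin.
rng = np.random.default_rng(0)
M = 3
A = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
B = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
R_T = A @ A.conj().T
R_J = B @ B.conj().T + 0.1 * np.eye(M)   # loading keeps R_J invertible

w, lam = max_gain_beamformer(R_T, R_J)
random_gains = [
    gain(rng.standard_normal(M) + 1j * rng.standard_normal(M), R_T, R_J)
    for _ in range(200)
]
```

In this sketch the gain achieved by `w` equals the largest eigenvalue `lam`, and no randomly drawn filter exceeds it, which is the property the Rayleigh quotient argument relies on.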
[0054]
The target signal correlation matrix estimation unit 70 estimates the correlation matrix
$R_T^n(f)$ over the time sections containing only the target signal $s_n(f, \tau)$ from the
cluster information according to the following equations (36) and (37). Here, $C_n$ is the
cluster corresponding to the target signal. The unnecessary signal correlation matrix $R_J^n(f)$
from the unnecessary signal correlation matrix estimation unit 25 and the target signal
correlation matrix $R_T^n(f)$ are input to the eigenvector calculation unit 74. Based on the
Rayleigh quotient theorem described above, the eigenvector calculation unit 74 calculates the
eigenvector $e_n(f)$ corresponding to the maximum eigenvalue of
$(R_J^n(f))^{-1/2} R_T^n(f) (R_J^n(f))^{-1/2}$.
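Equations (36) and (37) are not reproduced in this text, so the following is only a sketch of one plausible estimator under the sparsity assumption: frames assigned to the target cluster are averaged to estimate the target signal correlation matrix, and all remaining frames are averaged to estimate the unnecessary signal correlation matrix. The function name, the array layout, and the plain sample average are assumptions.

```python
import numpy as np

def cluster_correlation_matrices(X, labels, n):
    """Estimate (R_T, R_J) at one frequency bin.

    X      : (M, T) complex array, column tau is the observation vector x(f, tau)
    labels : (T,) integer array, cluster index assigned to each frame
    n      : index of the cluster C_n treated as the target
    """
    in_target = labels == n
    if not in_target.any() or in_target.all():
        raise ValueError("both the target and the non-target frame sets must be non-empty")
    X_T = X[:, in_target]            # frames dominated by the target signal
    X_J = X[:, ~in_target]           # frames dominated by unnecessary signals
    R_T = X_T @ X_T.conj().T / X_T.shape[1]
    R_J = X_J @ X_J.conj().T / X_J.shape[1]
    return R_T, R_J

# Tiny hand-checkable example: two sensors, three frames.
X = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 0.0]], dtype=complex)
labels = np.array([0, 1, 0])
R_T, R_J = cluster_correlation_matrices(X, labels, n=0)
```

Here frames 0 and 2 belong to the target cluster, so `R_T` averages their outer products, while `R_J` is built from frame 1 alone.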
[0055]
The maximum gain beamformer calculation unit 76 receives $R_J^n(f)$ from the unnecessary
signal correlation matrix estimation unit 25 and $e_n(f)$ from the eigenvector calculation unit 74.
The maximum gain beamformer calculation unit 76 calculates the maximum gain beamformer
$w_n(f)$ according to the following equation (38).
$w_n(f) = (R_J^n(f))^{-1/2} e_n(f)$ (38)
This equation (38) is based on equation (35) above.
[0056]
On the other hand, the observation signal correlation matrix estimation unit 72 estimates the
observation signal correlation matrix $R_x(f)$, which is the correlation matrix of the observation
signal vector $x(f, \tau)$, using the following equation (39).
$R_x(f) = E\{x(f, \tau) x^H(f, \tau)\}$ (39)
The correction vector calculation unit 78 receives the maximum gain beamformer $w_n(f)$ from the
maximum gain beamformer calculation unit 76 and the observation signal correlation matrix
$R_x(f)$ from the observation signal correlation matrix estimation unit 72. The correction vector
calculation unit 78 generates a correction vector $\alpha_n(f)$ for correcting the maximum gain
beamformer $w_n(f)$. This correction transforms the maximum gain beamformer so that the
distortion it imparts to the output is minimized. For example, a correction vector $\alpha_n(f)$
is calculated that minimizes the error $A$ between the observed signal vector $x(f, \tau)$ and the
output signal $y_n(f, \tau)$, expressed by the following equation (40).
$A(\alpha_n(f)) = E\{\|x(f, \tau) - \alpha_n(f) y_n(f, \tau)\|^2\}$ (40)
where $y_n(f, \tau) = w_n^H(f) x(f, \tau)$ is the output of the maximum gain beamformer $w_n(f)$.
Expanding the right side of equation (40) gives equation (41).
$A(\alpha_n(f)) = E[\|x(f, \tau)\|^2] - E[x^H(f, \tau) y_n(f, \tau)] \alpha_n(f) - \alpha_n^H(f) E[y_n^*(f, \tau) x(f, \tau)] + \alpha_n^H(f) \alpha_n(f) E[|y_n(f, \tau)|^2]$ (41)
Partial differentiation of both sides of equation (41) with respect to $\alpha_n^H(f)$ results in
the following equation (42).
$\partial A(\alpha_n(f)) / \partial \alpha_n^H(f) = -E[y_n^*(f, \tau) x(f, \tau)] + \alpha_n(f) E[|y_n(f, \tau)|^2]$ (42)
Setting the left side of equation (42) to 0 and solving for $\alpha_n(f)$ yields the following
equation (43).
$\alpha_n(f) = E[y_n^*(f, \tau) x(f, \tau)] / E[|y_n(f, \tau)|^2]$ (43)
From equations (19) and (39) above, equation (43) becomes the following equation (44).
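Equation (44) is missing from this text. Assuming the beamformer output is the conjugate-transposed filter applied to the observation vector, as in equation (47), substituting it into equation (43) and using the definition of the observation signal correlation matrix in equation (39) would give the following reconstruction:

```latex
\alpha_n(f) = \frac{R_x(f) \, w_n(f)}{w_n^{H}(f) \, R_x(f) \, w_n(f)} \qquad (44)
```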
[0057]
Here, as described above, $R_x(f)$ is the correlation matrix of the observation signal vector
$x(f, \tau)$. As understood from equation (44) above, the correction vector calculation unit 78
calculates the correction vector $\alpha_n(f)$ using the maximum gain beamformer $w_n(f)$ and
the observation signal vector $x(f, \tau)$.
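The correction vector calculation can be checked numerically. The sketch below assumes the closed form in which the observation correlation matrix applied to the filter is divided by the filter's output power (a reconstruction, since equation (44) is not reproduced here), replaces expectations with sample averages over hypothetical random data, and compares the result with the sample version of equation (43) and with the error of equation (40).

```python
import numpy as np

def correction_vector(w, R_x):
    """Closed form assumed for equation (44): alpha = R_x w / (w^H R_x w)."""
    return (R_x @ w) / (w.conj() @ R_x @ w)

def correction_error(alpha, X, y):
    """Sample version of the error A in equation (40)."""
    residual = X - np.outer(alpha, y)
    return np.mean(np.sum(np.abs(residual) ** 2, axis=0))

# Hypothetical data: M = 3 sensors, T frames at one frequency bin.
rng = np.random.default_rng(1)
M, T = 3, 2000
X = rng.standard_normal((M, T)) + 1j * rng.standard_normal((M, T))
w = rng.standard_normal(M) + 1j * rng.standard_normal(M)

R_x = X @ X.conj().T / T          # sample estimate of equation (39)
y = w.conj() @ X                  # y(tau) = w^H x(tau)

# Sample version of equation (43): E[y^* x] / E[|y|^2].
alpha_sample = (X * y.conj()).mean(axis=1) / np.mean(np.abs(y) ** 2)
alpha_closed = correction_vector(w, R_x)

# The minimizer should not be beaten by a perturbed correction vector.
perturbed = alpha_closed + 0.05 * (rng.standard_normal(M) + 1j * rng.standard_normal(M))
err_opt = correction_error(alpha_closed, X, y)
err_perturbed = correction_error(perturbed, X, y)
```

With sample averages in place of expectations, the two forms agree up to rounding, because the sample average of the conjugated output times the observation vector is exactly the sample correlation matrix applied to the filter.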
[0058]
The correction unit 80 corrects the frequency distortion of the maximum gain beamformer
$w_n(f)$ using the correction vector $\alpha_n(f)$ and calculates a corrected beamformer.
Specifically, the corrected beamformer $w_n'(f)$ can be determined by the following equation (45).
$w_n'(f) = [\alpha_n(f)]_B w_n(f)$ (45)
where $B$ is an arbitrary sensor number, $B \in \{1, \ldots, M\}$, and $[q]_B$ denotes the
$B$-th element of the vector $q$.
[0059]
The target signal extraction unit 30 extracts the target signal $y_n(f, \tau)$ by the following
equation (46) using the corrected beamformer $w_n'(f)$.
$y_n(f, \tau) = {w_n'}^H(f) x(f, \tau)$ (46)
Further, a functional configuration example of a modification of the fifth embodiment is shown
in FIG. The beamformer calculation unit 28 includes a target signal correlation matrix estimation
unit 70, an eigenvector calculation unit 74, and a maximum gain beamformer calculation unit 76.
The target signal extraction unit 30 includes a signal extraction unit 81 and a distortion
correction unit 82.
[0060]
The maximum gain beamformer $w_n(f)$ from the maximum gain beamformer calculation unit 76
and the observation signal vector $x(f, \tau)$ from the frequency domain conversion unit 5 are
input to the signal extraction unit 81. The signal extraction unit 81 calculates the following
equation (47) to extract the target signal $y_n(f, \tau)$ including distortion.
$y_n(f, \tau) = w_n^H(f) x(f, \tau)$ (47)
The target signal $y_n(f, \tau)$ including distortion is input to the distortion correction unit 82.
The correction vector $\alpha_n(f)$ from the correction vector calculation unit 78 is also input
to the distortion correction unit 82. The distortion correction unit 82 corrects the distortion by
converting the output signal using the following equation (48), and outputs a corrected output
signal $y_n'(f, \tau)$.
$y_n'(f, \tau) = [\alpha_n(f)]_B y_n(f, \tau)$ (48)
In the first to fifth embodiments described above, signals are extracted for all $n$. However, the
beamformer may be configured only for the target signal (a single $n$). The target signal can be
selected, for example, by comparing the impulse response vector $h_d$ of the target signal held
in a database with the impulse response vectors $h_n$ estimated for all sound sources $n$ by the
method of the invention, and selecting the sound source $n$ whose $h_n$ is closest to $h_d$. For
example, an algorithm such as $\min_n (h_1 \cdot h_n)$ can be considered. If the beamformer
calculation unit 28 described in the second to fifth embodiments configures a beamformer, using
equation (24) above or the like, only for the selected $n$, an adaptive beamformer for the target
signal is obtained.
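The two-step extraction of equations (47) and (48) can be illustrated with a single-source toy case. The sketch assumes the correction vector takes the closed form of the observation correlation matrix applied to the filter, divided by the filter's output power (a reconstruction, not quoted from the text), and uses hypothetical random values for the impulse response vector, the source, and the filter. In this idealized case the corrected output reproduces the source as observed at sensor B, whatever gain the uncorrected filter applies.

```python
import numpy as np

rng = np.random.default_rng(2)
M, T = 3, 500
B = 0   # zero-based index of the reference sensor (sensor 1 in the text's numbering)

# Hypothetical single-source observation at one frequency bin: x(tau) = h s(tau).
h = rng.standard_normal(M) + 1j * rng.standard_normal(M)
s = rng.standard_normal(T) + 1j * rng.standard_normal(T)
X = np.outer(h, s)

# Any filter with w^H h != 0 stands in for the maximum gain beamformer here.
w = rng.standard_normal(M) + 1j * rng.standard_normal(M)

R_x = X @ X.conj().T / T                       # sample correlation matrix, equation (39)
alpha = (R_x @ w) / (w.conj() @ R_x @ w)       # assumed closed form of the correction vector

y = w.conj() @ X                               # equation (47): output including distortion
y_corrected = alpha[B] * y                     # equation (48): distortion-corrected output
```

The uncorrected output `y` is the source scaled by an arbitrary, frequency-dependent complex gain, whereas `y_corrected` matches the row of `X` for sensor B.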
[0061]
[Experimental Results] Experiments were conducted to show the effects of the above examples. A
mixed signal was simulated by convolutively mixing impulse responses measured in the room
shown in FIG. 11 with a plurality of speech signals. The experimental conditions are as shown in
FIG. In a room with a long side of 880 cm, a short side of 375 cm, a height of 240 cm, and a
reverberation time of 120 ms, three sensors $4_1$, $4_2$ and $4_3$ were arranged at a position
200 cm from the lower long side and 282 cm from the short side. The axis parallel to the long
side is x, and the axis parallel to the short side is y. As shown in FIG. 12, the three sensors
$4_1$, $4_2$ and $4_3$ were arranged two-dimensionally, two on the y axis and one on the x axis,
at the vertices of an equilateral triangle with 4 cm sides. Microphones were used as the sensors.
The signal to unwanted signal ratio (SIR) and the signal to distortion ratio (SDR) were
evaluated for the four speech combinations. The unit is dB.
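The simulation of mixed signals by convolutive mixing can be sketched as follows. The function name `convolutive_mix`, the array layout, and the toy impulse responses are assumptions for illustration; the experiment itself used impulse responses measured in the room.

```python
import numpy as np

def convolutive_mix(sources, impulse_responses):
    """Simulate sensor signals x_m(t) = sum_n (h_mn * s_n)(t).

    sources           : (N, T) real array, one source signal per row
    impulse_responses : (M, N, L) real array, response from source n to sensor m
    returns           : (M, T + L - 1) array of mixed sensor signals
    """
    N, T = sources.shape
    M, N_h, L = impulse_responses.shape
    if N != N_h:
        raise ValueError("number of sources and impulse responses must match")
    x = np.zeros((M, T + L - 1))
    for m in range(M):
        for n in range(N):
            # Full linear convolution of source n with its path to sensor m.
            x[m] += np.convolve(impulse_responses[m, n], sources[n])
    return x

# Toy example: one sensor, two sources, pure-delay impulse responses.
sources = np.array([[1.0, 2.0, 3.0],
                    [4.0, 5.0, 6.0]])
impulse_responses = np.array([[[1.0, 0.0],     # source 0 arrives with no delay
                               [0.0, 1.0]]])   # source 1 arrives one sample late
mixed = convolutive_mix(sources, impulse_responses)
```

With these pure-delay responses, the single sensor observes the first source unchanged plus the second source delayed by one sample.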
[0062]
The positions of the four sound sources are specified with the sensor position taken as the
intersection of the x and y axes, the positive direction of the x axis as 0 degrees, and angles
measured counterclockwise. Sound sources were placed at the intersections of the directions of
30 degrees and 315 degrees with a circle of radius 50 cm centered on the sensor position, and
at the intersections of the directions of 225 degrees and 315 degrees with a circle of radius
80 cm. In the experiment to confirm the effect of the second embodiment, sound sources in the
directions of 120 degrees, 225 degrees, and 315 degrees were used, with N (number of source
signals) = M (number of sensors) = 3. In the experiment to confirm the effect of the third
embodiment, N = 4 and M = 3.
[0063]
FIG. 13 shows the results of this experiment for the conventional method and for the second,
third, fourth, and fifth embodiments. The third embodiment is the case in which the input signal
estimation unit 50 shown in FIG. 7 is provided. In the conventional method, a known steering
vector $a_n(f)$ is given as $h_n(f)$ in equation (10) above, which represents the adaptive
beamformer 6 shown in FIG. 1. In this case, high performance was not obtained for either N = M
or N > M. Since this is an experiment in a reverberant environment, the main reason is
considered to be that the given steering vector $a_n(f)$ does not account for the effect of
reverberation. In addition, the fact that a sufficient SIR cannot be obtained when N > M
reflects the limitation of the adaptive beamformer, namely that only M-1 unnecessary signals
can be effectively suppressed.
[0064]
Comparing the SIR and SDR values shows that the above embodiments achieve higher performance
than the conventional method, both when N = M and when N > M.
[0065]
The blind signal extraction device according to the present invention is not limited to the
above-described embodiments, and various modifications can be made without departing from the
scope of the present invention. Also, the processing described for the blind signal extraction
device need not be performed in chronological order according to the order of description; it
may be performed in parallel or individually depending on the processing capability of the device
performing the processing, or as needed.
[0066]
Further, when the processing in the blind signal extraction device of the present invention is
realized by a computer, the processing content of the function that the blind signal extraction
device should have is described by a program. Then, by executing this program on a computer,
the processing function of the blind signal extraction device is realized on the computer.
[0067]
The program describing the processing content can be recorded on a computer-readable
recording medium. Any computer-readable recording medium may be used, such as a magnetic
recording device, an optical disc, a magneto-optical recording medium, or a semiconductor
memory. Specifically, for example, a hard disk device, a flexible disk, or a magnetic tape can be
used as the magnetic recording device; a DVD (Digital Versatile Disc), a DVD-RAM (Random
Access Memory), a CD-ROM (Compact Disc Read Only Memory), or a CD-R (Recordable) / RW
(Rewritable) as the optical disc; an MO (Magneto-Optical disc) as the magneto-optical recording
medium; and an EEP-ROM (Electronically Erasable and Programmable Read Only Memory) as the
semiconductor memory.
[0068]
Further, this program is distributed, for example, by selling, transferring, lending, etc. a portable
recording medium such as a DVD, a CD-ROM or the like in which the program is recorded.
Furthermore, this program may be stored in a storage device of a server computer, and the
program may be distributed by transferring the program from the server computer to another
computer via a network.
[0069]
For example, a computer that executes such a program first temporarily stores a program
recorded on a portable recording medium, or a program transferred from a server computer, in
its own storage device. Then, at the time of execution of the process, the computer reads the
program stored in its own recording medium and executes the process according to the read
program. As another execution form of this program, the computer may read the program
directly from the portable recording medium and execute processing according to the program.
Furthermore, each time the program is transferred from the server computer to this computer,
processing according to the received program may be executed sequentially. In addition, the
above-described processing may be executed by a so-called ASP (Application Service Provider)
type service, which realizes the processing functions only through execution instructions and
result acquisition, without transferring the program from the server computer to the computer.
Note that the program in the present embodiment includes information that is provided for
processing by a computer and is equivalent to a program (such as data that is not a direct
command to the computer but has a property that defines the processing of the computer).
[0070]
Further, in this embodiment, the blind signal extraction device is configured by executing a
predetermined program on a computer, but at least a part of the processing contents may be
realized as hardware.
[0071]
In application to the audio field, the present invention separates and extracts the target voice
even in a situation where the input microphone of a speech recognition machine is at a distance
from the speaker and therefore may pick up sounds other than the target speaker's voice. This
makes it possible to construct a speech recognition system with a high recognition rate.
[0072]
BRIEF DESCRIPTION OF THE DRAWINGS
A block diagram showing an example functional configuration of a prior-art system.
A diagram for explaining the time difference $\tau$ between the time at which the signal from sound source n reaches an arbitrary sensor j and the time at which it reaches the origin 0, when linearly arranged sensors are used.
A block diagram showing an example functional configuration of the system of Example 1 of the present invention.
A flowchart showing the flow of the main processing of Example 1 of the present invention.
A diagram for explaining $\cos \epsilon_n^{mQ}$ used in equation (16) above for two arbitrary sensors, sensor m and sensor Q.
A block diagram showing part of an example functional configuration of the system of Example 2 of the present invention.
A block diagram showing part of an example functional configuration of the system of Example 3 of the present invention.
A block diagram showing part of an example functional configuration of the system of Example 4 of the present invention.
A block diagram showing part of an example functional configuration of the system of Example 5 of the present invention.
A block diagram showing part of an example functional configuration of the system of the modification of Example 5 of the present invention.
A top view of the setup for the experiment comparing the prior art with the technique of the present invention.
A diagram showing details of the positional relationship of the three sensors $4_1$, $4_2$ and $4_3$ of FIG.
A diagram showing the experimental results comparing the effect of the prior art with that of the technique of the present invention.