Patent Translate, powered by EPO and Google

Notice: This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate, complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or financial decisions, should not be based on machine-translation output.

DESCRIPTION JP2008060635

A target signal is extracted from a mixed signal with a beamformer, without prior knowledge of the impulse response vector of the target signal. Signals emitted from N signal sources are observed by M sensors 4m, and one or more signals are extracted, where N and M are integers of 2 or more. The observed signals $x(t)$ from the sensors 4m are transformed into frequency-domain signals $x(f,\tau)$ (5); $x(f,\tau)$ is normalized (22); the normalized $x(f,\tau)$ is clustered into N clusters C (24); the correlation matrix $R(f)$ of the observed signal containing only the unnecessary signals is estimated from C (25); the beamformer $w(f)$ is calculated from the cluster information C and $R(f)$ (28); the target signal $y(f,\tau)$ is extracted from $x(f,\tau)$ using $w(f)$ (30); and $y(f,\tau)$ is converted into a time-domain signal (32).

[Selected figure] Figure 3

Blind signal extraction device, method thereof, program thereof, and recording medium recording the program

[0001] The present invention relates to a blind signal extraction device, method, program, and recording medium that estimate and extract a target signal in a situation where the required source signal (target signal) cannot be observed on its own, but is observed with other noise, interference signals, and the like superimposed on it.

[0002] First, the observation signal is modeled and its frequency-domain representation is defined; the prior art is then briefly described.

04-05-2019

[Observed signal] All signals are sampled at a sampling frequency $f_s$ and expressed in discrete time.
It is assumed that N source signals (N is an integer of 2 or more) are mixed and observed by M sensors (M is an integer of 2 or more). The present invention deals with situations in which transmission-path distortion occurs because of attenuation and delay that depend on the distance from the signal source to the sensor, and because of reflections from walls and the like. In such a situation, the source signals $s_n(t)$ ($n = 1, \ldots, N$) from the signal sources are observed by the sensors as $x_m(t)$ ($m = 1, \ldots, M$). Let $h_{mn}(u)$ be the impulse response from signal source n to sensor m, where u represents time. The observation signal $x_m(t)$ at sensor m is the sum of the source signals, each convolved with the corresponding impulse response:

$$x_m(t) = \sum_{n=1}^{N} \sum_{u=0}^{\infty} h_{mn}(u)\, s_n(t-u) \qquad (1)$$

Consider the situation where no information about the source signals $s_1(t), \ldots, s_N(t)$ or the impulse responses $h_{11}(u), \ldots, h_{1N}(u), \ldots, h_{M1}(u), \ldots, h_{MN}(u)$ can be obtained in advance. Estimating the source signals $s_1(t), \ldots, s_N(t)$ from only the observed signals $x_1(t), \ldots, x_M(t)$ in this situation is the broad object of this invention.

[Frequency Domain Representation] In the present invention, each operation is performed in the frequency domain. A known technique such as an L-point short-time Fourier transform (L is an arbitrary integer) is therefore applied to the observation signal $x_m(t)$ of each sensor, to obtain a time sequence for each frequency:

$$x_m(f,\tau) = \sum_{u=-L/2}^{L/2-1} x_m(\tau+u)\, g(u)\, e^{-i 2\pi f u} \qquad (2)$$

where f is the frequency, $f = 0, f_s/L, \ldots, f_s(L-1)/L$, $\tau$ is an arbitrary time, and $f_s$ is the sampling frequency as above. $g(u)$ is a window function such as a Hanning window.
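As an illustration only, equation (2) can be sketched in Python with NumPy. The function name `stft_eq2`, the frame shift `hop`, and the default sizes are assumptions of this sketch and do not appear in the description; L is assumed to be even.

```python
import numpy as np

def stft_eq2(x, L=256, hop=64):
    """Short-time Fourier transform in the form of equation (2).

    x   : 1-D real array, one sensor signal x_m(t)
    L   : frame length and number of frequency bins (assumed even)
    hop : frame shift in samples (hypothetical choice, not given in the text)
    Returns an array of shape (L, n_frames); row k is frequency f = k * fs / L.
    """
    g = np.hanning(L)                                   # window function g(u)
    centers = np.arange(L // 2, len(x) - L // 2 + 1, hop)   # frame centers tau
    X = np.empty((L, len(centers)), dtype=complex)
    for j, tau in enumerate(centers):
        # x(tau + u) g(u) for u = -L/2 .. L/2 - 1
        frame = x[tau - L // 2 : tau + L // 2] * g
        # ifftshift moves the u = 0 sample to index 0, so the FFT phase
        # matches the sum over u = -L/2 .. L/2 - 1 in equation (2)
        X[:, j] = np.fft.fft(np.fft.ifftshift(frame))
    return X
```

Applied to each of the M sensor signals, this yields the components $x_m(f,\tau)$ that the following paragraphs collect into the vector $x(f,\tau)$.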
[0003] The convolutive mixture in the time domain of equation (1) is approximated in the frequency domain by a simple (instantaneous) mixture at each frequency:

$$x_m(f,\tau) = \sum_{n=1}^{N} h_{mn}(f)\, s_n(f,\tau) \qquad (3)$$

Here $h_{mn}(f)$ is the frequency response from signal source n to sensor m at frequency f (referred to below simply as the impulse response), and $s_n(f,\tau)$ is the short-time Fourier transform of the source signal $s_n(t)$, obtained in the same way as in equation (2). Writing the observed signals $x_1(f,\tau), \ldots, x_M(f,\tau)$ of sensors 1 to M as a vector, equation (3) becomes

$$x(f,\tau) = \sum_{n=1}^{N} h_n(f)\, s_n(f,\tau) \qquad (4)$$

where $x(f,\tau) = [x_1(f,\tau), \ldots, x_m(f,\tau), \ldots, x_M(f,\tau)]^T$, and $h_n(f) = [h_{1n}(f), \ldots, h_{mn}(f), \ldots, h_{Mn}(f)]^T$ is the vector that collects the frequency responses from signal source n to each sensor. $[A]^T$ denotes the transpose of A. The same applies to the following description.

[Typical prior art] The adaptive beamformer (ABF), described in Non-Patent Document 1 and elsewhere, is widely known as a typical method for extracting a target signal from a mixed signal.

[0004] FIG. 1 shows a functional configuration example of a conventional adaptive beamformer (hereinafter referred to as a conventional beamformer). The signals $x_m(t)$ observed by the sensors 4m are input to the frequency domain conversion unit 5, which converts them into frequency-domain signals $x_m(f,\tau)$. All $x_m(f,\tau)$ ($m = 1, \ldots, M$) are input to the conventional beamformer 6.

[0005] In a system using a plurality of sensors, the conventional beamformer 6 is realized as a filter group $w_n(f) = [w_{1n}(f), \ldots, w_{mn}(f), \ldots, w_{Mn}(f)]^T$ that emphasizes the target signal $s_n(t)$ and suppresses the unnecessary signals $s_1(t), \ldots, s_{n-1}(t), s_{n+1}(t), \ldots, s_N(t)$ as much as possible.
[0006] Designing the conventional beamformer 6 requires that the impulse response vector $h_n(f)$ from the target signal source to each sensor, or its approximation, the steering vector

$$a_n(f) = [\exp(-i 2\pi f \tau_{1n}), \ldots, \exp(-i 2\pi f \tau_{Mn})]^T \qquad (5)$$

be known. Here $\tau_{mn}$ is the difference between the time at which the signal from source n reaches sensor m and the time at which it reaches the origin. Conventionally, a sensor array arranged in a straight line, as shown in FIG. 2, is often used. If the direction of signal source n is $\theta_n$ and the coordinate of sensor 4m relative to sensor 41 is $d_m$, then $\tau_{mn} = d_m \cos\theta_n / c$, where c is the propagation speed of the signal.

[0007] Returning to FIG. 1, the conventional beamformer 6 suppresses the unnecessary signals by estimating the filter group (vector) $w_n(f)$ that minimizes the output power $A'(w_n(f))$:

$$A'(w_n(f)) = E\{|y_n(f,\tau)|^2\} = E\{y_n(f,\tau)\, y_n^{*}(f,\tau)\} = E\{w_n^H(f)\, x(f,\tau)\, x^H(f,\tau)\, w_n(f)\} = w_n^H(f)\, R_x(f)\, w_n(f) \qquad (6)$$

where $E\{\cdot\}$ is the averaging operation over time $\tau$, $A^{*}$ is the complex conjugate of A, $R_x(f) = E\{x(f,\tau)\, x^H(f,\tau)\}$ is the correlation matrix of the observed signal, $[A]^H$ is the conjugate transpose of the matrix (vector) A, and $y_n(f,\tau)$ is the output of the conventional beamformer 6:

$$y_n(f,\tau) = w_n^H(f)\, x(f,\tau) \qquad (7)$$

To avoid the meaningless solution $w_n(f) = 0 = [0, \ldots, 0]^T$, the constraint that the target signal is obtained without distortion is imposed:

$$w_n^H(f)\, h_n(f) = 1 \qquad (8)$$

The problem of finding the $w_n(f)$ that satisfies equation (8) and minimizes $A'(w_n(f))$ of equation (6) can be expressed by the following equation (9) using the Lagrange undetermined multiplier p.
$$A(w_n(f)) = A'(w_n(f)) + p\,(w_n^H(f)\, h_n(f) - 1) \qquad (9)$$

Solving equation (9) gives the conventional beamformer 6 as

$$w_n(f) = \frac{R_x^{-1}(f)\, h_n(f)}{h_n^H(f)\, R_x^{-1}(f)\, h_n(f)} \qquad (10)$$

[0008] In the conventional beamformer 6 (conventional adaptive beamformer), it is ideal to measure the impulse response vector $h_n(f)$ of equation (10) in advance, store it in the impulse response storage unit 10, and read it out for use. In practice, however, the steering vector $a_n(f)$ of equation (5) is often stored in the steering vector storage unit 12 and used in place of $h_n(f)$ in equation (10).

[0009] In a real environment, however, the impulse response vector $h_n(f)$ and the steering vector $a_n(f)$ are rarely given correctly, and minimizing $A(w_n(f))$ of equation (9) often does not minimize the unnecessary signals alone. For this reason, instead of the correlation matrix $R_x(f)$ of the mixed (observed) signal, the correlation matrix $R_J(f) = E\{\xi(f,\tau)\, \xi^H(f,\tau)\}$ of the signal $\xi(f,\tau)$ in a time interval that contains only the unnecessary signals is widely used (Non-Patent Document 2 and elsewhere). This is known to achieve higher performance than using $R_x(f)$. That is, for the conventional beamformer it is desirable to estimate accurately the correlation matrix in the time interval that contains only the unnecessary signals (the interval in which the target sound is absent).

[0010] The conventional beamformer 6 outputs the output signal $y_n(f,\tau)$ by applying $w_n(f)$ of equation (10) to the observation signal vector $x(f,\tau)$ according to equation (7). The output signal $y_n(f,\tau)$ is input to the time domain conversion unit 8 and converted from the frequency domain to the time domain to generate $y_n(t)$.

Haykin, S., Adaptive Filter Theory, Science and Technology Publishing, 2001, pp. 690-693
Yoshio Oga, Yoshio Yamazaki, Yutaka Kanada, Acoustics System and Digital Processing, ed.

[0011] As described above, the conventional beamformer requires the impulse response vector $h_n(f)$ from the target signal source to each sensor, or the steering vector $a_n(f)$ that approximates it. That is, it has the disadvantage of requiring prior knowledge about the target signal. Moreover, these vectors are difficult to obtain accurately in a real environment, and the performance of the conventional beamformer degrades significantly if the prior knowledge deviates from the impulse response vector $h_n(f)$ in the usage environment. High performance also requires estimating the correlation matrix $R_J(f)$ of the signal in the time interval that contains only the unnecessary signals, which is very difficult when the unnecessary signals are non-stationary.

[0012] A signal extraction apparatus observes signals emitted from N signal sources with M sensors and extracts one or more of them, where N and M are integers of 2 or more. The apparatus converts the observed signals from the M sensors into frequency-domain signals; normalizes the frequency-domain signals to calculate normalized observed signal vectors; clusters the normalized observed signal vectors into N clusters; estimates, from the cluster information, an unnecessary signal correlation matrix, which is the correlation matrix of the observation signal containing only the unnecessary signals; calculates a beamformer from the cluster information and the unnecessary signal correlation matrix; extracts the target signal from the frequency-domain signal using the beamformer; and converts the extracted target signal into a time-domain signal.
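For reference, the conventional design of equations (5) to (10) can be sketched as below. This is a minimal illustration, not the patent's implementation: the function names, the sound speed of 340 m/s, the example sensor coordinates, and the small diagonal loading added before inversion are all assumptions of the sketch.

```python
import numpy as np

def steering_vector(f, d, theta, c=340.0):
    """Equation (5) for a linear array, with tau_m = d_m * cos(theta) / c."""
    tau = np.asarray(d) * np.cos(theta) / c
    return np.exp(-2j * np.pi * f * tau)

def mvdr_weights(R, h, loading=1e-6):
    """w = R^{-1} h / (h^H R^{-1} h): minimizes w^H R w subject to w^H h = 1.

    `loading` adds a small multiple of the identity before solving; this
    regularization is an assumption of the sketch, not part of the text.
    """
    M = len(h)
    R = R + loading * np.trace(R).real / M * np.eye(M)
    Rinv_h = np.linalg.solve(R, h)
    return Rinv_h / (h.conj() @ Rinv_h)

# Example values (hypothetical): 3 sensors 4 cm apart, 1 kHz,
# target at 60 degrees and one interferer at 120 degrees.
d = np.array([0.0, 0.04, 0.08])
h_target = steering_vector(1000.0, d, np.deg2rad(60))
h_interf = steering_vector(1000.0, d, np.deg2rad(120))
R_J = np.outer(h_interf, h_interf.conj()) + 0.01 * np.eye(3)  # interferer plus weak noise
w = mvdr_weights(R_J, h_target)
print(abs(w.conj() @ h_target), abs(w.conj() @ h_interf))
```

Passing a correlation matrix estimated from unnecessary-signal-only data, as in the example, corresponds to the use of $R_J(f)$ discussed in paragraph [0009]; passing $R_x(f)$ gives the design of equation (6).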
[0013] With the above configuration, the target signal can be separated and extracted with high accuracy without measuring the impulse response vector or the steering vector in advance, even when the unnecessary signals are non-stationary.

[0014] The best mode for carrying out the invention is described below.

[0015] An example of the functional configuration of the present invention is shown in FIG. 3, and the main processing flow is shown in a separate figure. Functional components shared with the conventional configuration (such as the sensors 4m and the frequency domain conversion unit 5) carry the same reference numerals. The same applies to the following.

[0016] The present invention also assumes that the signals are sparse. A sparse signal is one that is zero at most times $\tau$. Sparsity has been confirmed, for example, for speech signals. Under the sparsity assumption, even when there are several source signals, they can be assumed not to overlap at any time-frequency point $(f,\tau)$, so that at most one source is present at each point. Equation (4) can then be written as follows.

[0017]

$$x(f,\tau) = h_n(f)\, s_n(f,\tau) \qquad (11)$$

where $h_n(f)$ is the impulse response vector and $s_n(f,\tau)$ is the source signal present at $(f,\tau)$.

[0018] Each observation signal $x_m(t)$ ($m = 1, \ldots, M$) collected by the sensor 4m is input to the frequency domain conversion unit 5, which transforms it from the time domain to the frequency domain, giving $x_m(f,\tau)$, by, for example, the short-time Fourier transform described above, a known technique (step S2). The $x_m(f,\tau)$ are output together as the observation signal vector $x(f,\tau)$. The observation signal vector $x(f,\tau)$ is input to the normalization unit 22, which calculates the normalized observation signal vector (step S4). Specifically, for the observed signal vector $x(f,\tau) = [x_1(f,\tau), \ldots, x_M(f,\tau)]^T$, the argument of each element is normalized by equation (12), and the norm is then normalized by equation (13).

[0019]

$$\bar{x}(f,\tau) \leftarrow \bar{x}(f,\tau) / \|\bar{x}(f,\tau)\| \qquad (13)$$

where $\bar{x}(f,\tau)$ is the normalized observed signal vector, $\arg(r)$ is the argument of r, i is the imaginary unit, $|r|$ is the absolute value of r, $\|r\|$ is the norm of r, Q is the reference sensor number ($Q \in \{1, \ldots, M\}$), c is the propagation speed of the signal, and $\alpha$ is an arbitrary positive constant. $\alpha = 4 d_{max}$ is most preferred, where $d_{max}$ is the maximum distance between the reference sensor Q and any other sensor. Other values of $\alpha$ may also be used.

[0020] From equations (11) to (13), the normalized observation signal vector $\bar{x}(f,\tau)$ can be expressed by equation (14), where $A_n = (\sum_{m=1}^{M} |h_{mn}|^2)^{1/2}$. It can be seen that $\bar{x}(f,\tau)$ depends only on the impulse response information for the signal $s_n(f,\tau)$.

[0021] The normalized observed signal vectors $\bar{x}(f,\tau)$ at all time-frequency points are input to the clustering unit 24 and clustered into N clusters (step S6). This clustering can be performed effectively using, for example, the k-means method, details of which are given in R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, Wiley Interscience, 2nd edition, 2000. The clustering method is described concretely below.

[0022] The storage unit 26 stores the normalized observation signal vectors $\bar{x}(f,\tau)$ represented by equation (14) and the initial centroid values $c_n^{j}$ ($j = 0$, $n = 1, \ldots, N$). The clustering unit 24 reads the normalized observation signal vectors $\bar{x}(f,\tau)$ from the storage unit 26 and clusters them to generate N clusters $C_1, \ldots, C_N$.
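The normalization of step S4 can be sketched as follows. The argument normalization in this sketch, which divides $\arg(x_m / x_Q)$ by $\alpha f / c$, is an assumption chosen to agree with the symbols listed for equations (12) and (13); it is not copied from the source. The function name, array layout, and default sound speed are likewise hypothetical.

```python
import numpy as np

def normalize_observations(X, freqs, alpha, c=340.0, Q=0):
    """Normalize observation vectors at every time-frequency point.

    X     : complex array of shape (M, F, T), x_m(f, tau) for M sensors
    freqs : array of shape (F,), frequencies in Hz (all assumed > 0)
    alpha : positive constant, e.g. 4 * d_max
    Q     : index of the reference sensor
    Returns an array of shape (M, F, T) whose vectors have unit norm.
    """
    # arg(x_m / x_Q), computed as the angle of x_m * conj(x_Q)
    phase = np.angle(X * X[Q:Q + 1].conj())
    # assumed form of the argument normalization: divide by alpha * f / c
    scale = alpha * freqs[None, :, None] / c
    Xbar = np.abs(X) * np.exp(1j * phase / scale)
    # norm normalization, equation (13)
    norm = np.linalg.norm(Xbar, axis=0, keepdims=True)
    return Xbar / np.maximum(norm, 1e-12)
```

Under this assumed form, a single source with pure propagation delays produces the same normalized vector at every frequency and every frame, which is the property that lets all time-frequency points be clustered together.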
The normalized observation signal vectors $\bar{x}(f,\tau)$, which are M-dimensional complex vectors, are clustered directly in the M-dimensional complex space by the following procedure.

[0023]
1. The initial centroid value $c_n^{j}$ of each cluster is read from the storage unit 26. It is a vector with the same dimension as the normalized observation signal vector $\bar{x}(f,\tau)$ (an M-dimensional complex vector). How to select the initial value $c_n^{0}$ is described later.
2. Let j + 1 be the new j.
3. The normalized observed signal vector $\bar{x}(f,\tau)$ at every time-frequency point $(f,\tau)$ is assigned to the cluster $C_n$ represented by the nearest centroid $c_n^{j-1}$. That is, for each $\bar{x}(f,\tau)$, n is selected so that $\|\bar{x}(f,\tau) - c_n^{j-1}\|$ is smallest.
4. Each centroid is updated by averaging the normalized observation signal vectors $\bar{x}(f,\tau)$ assigned to the cluster $C_n$ and scaling the result to norm 1:

$$c_n^{j} = E\{\bar{x}(f,\tau)\}_n / \|E\{\bar{x}(f,\tau)\}_n\| \qquad (15)$$

where $E\{\cdot\}_n$ is the average over the members of the cluster $C_n$.
5. Steps 2 to 4 are repeated until the centroids $c_n^{j}$ converge. The converged centroids are stored in the storage unit 26 as $c_n$ ($n = 1, \ldots, N$).

The above is the clustering procedure.

[0024] Next, examples of how to select the initial centroid values are described.

<< Initial Value Setting Method 1 >> N vectors are selected at random from the normalized observation signal vectors $\bar{x}(f,\tau)$ and used as the initial centroid values $c_n^{0}$ ($n = 1, \ldots, N$).

<< Initial Value Setting Method 2 >> Assuming $|h_{mn}(f)| = 1$ for all m and n in equations (11) to (15), the centroid can be written as the following equation (16), which is used as the initial value.
$$[c_n]_m = E[\bar{x}(f,\tau)]_n = \exp[i 2\pi (d_m - d_Q)^T v_n / \alpha] / M^{1/2} = \exp[i 2\pi \|d_m - d_Q\| \cos\Theta_n^{mQ} / \alpha] / M^{1/2} \qquad (16)$$

Here $d_m$ is the position vector of the sensor 4m, $v_n$ is the direction-of-arrival vector of the signal $s_n(t)$, drawn as a thick vector in the figure, and $\Theta_n^{mQ}$ is its angle relative to the axis connecting the sensor 4m and the reference sensor 4Q. $v_n$ is a unit vector, $\|v_n\| = 1$.

[0025] Appropriate values are given for the sensor positions $d_m$ ($m = 1, \ldots, M$), the azimuths $\theta_n$ and the elevation angles $\varphi_n$ ($n = 1, \ldots, N$), which are held in the storage unit 26. Because these are used only to form initial values, any reasonable values will do. For example, $\theta_n = 2\pi n / N$ and $\varphi_n = 0$ give spatially dispersed initial values.

[0026] Returning to FIG. 3, each cluster $C_n$ obtained by the clustering unit 24 corresponds to a source signal $s_n(f,\tau)$. As can be seen from equation (14), the centroid $\bar{c}_n = E\{\bar{x}(f,\tau)\}_{\bar{x}(f,\tau) \in C_n}$ represents the impulse response information for the signal $s_n(f,\tau)$.

[0027] Each cluster $C_n$ is input to the unnecessary signal correlation matrix estimation unit 25. From the cluster information, the unit 25 estimates, by equations (17) and (18), the correlation matrix of the unnecessary-signal section for the source signal $s_n(f,\tau)$, that is, the unnecessary signal correlation matrix $R_J^{n}(f)$, which is the correlation matrix of the observation signal containing only the unnecessary signals (step S8). The estimated unnecessary signal correlation matrix $R_J^{n}(f)$ and the centroid information $\bar{c}_n$ from the clustering unit 24 are input to the beamformer calculation unit 28.
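The clustering procedure of paragraph [0023] and the estimation of step S8 can be sketched as follows. Two points are assumptions of this sketch rather than statements of the source: the optional `init` argument, and the form used for equations (17) and (18), which here zeroes the frames assigned to the target cluster and averages $x x^H$ over all frames at one frequency. Function and parameter names are hypothetical.

```python
import numpy as np

def cluster_normalized(Xbar, N, init=None, n_iter=50, seed=0):
    """k-means on unit-norm complex vectors, following steps 1 to 5.

    Xbar : (M, P) array of normalized vectors, P time-frequency points
    init : optional (M, N) initial centroids; if None, N points are drawn
           at random (Initial Value Setting Method 1)
    Returns labels of shape (P,) and centroids of shape (M, N).
    """
    rng = np.random.default_rng(seed)
    M, P = Xbar.shape
    if init is None:
        cent = Xbar[:, rng.choice(P, N, replace=False)].astype(complex)
    else:
        cent = np.array(init, dtype=complex)
    for _ in range(n_iter):
        # step 3: assign each vector to the nearest centroid
        dist = np.linalg.norm(Xbar[:, :, None] - cent[:, None, :], axis=0)
        labels = dist.argmin(axis=1)
        # step 4: mean of the members, rescaled to norm 1 (equation (15))
        new = cent.copy()
        for n in range(N):
            members = Xbar[:, labels == n]
            if members.shape[1] > 0:
                mean = members.mean(axis=1)
                if np.linalg.norm(mean) > 0:
                    new[:, n] = mean / np.linalg.norm(mean)
        if np.allclose(new, cent):      # step 5: stop when centroids settle
            break
        cent = new
    return labels, cent

def interference_correlation(X_f, labels_f, n):
    """Assumed form of equations (17)-(18) at one frequency.

    X_f      : (M, T) observation vectors x(f, tau) for all frames
    labels_f : (T,) cluster index of each frame at this frequency
    n        : index of the target cluster
    """
    xi = X_f * (labels_f != n)[None, :]     # keep only non-target frames
    return xi @ xi.conj().T / X_f.shape[1]
```

Because the labels change from frame to frame, this estimate follows unnecessary signals that switch on and off, without requiring an interval in which the target is silent.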
The beamformer calculation unit 28 calculates the beamformer $w_n(f)$ from the cluster information $C_n$ and the unnecessary signal correlation matrix $R_J^{n}(f)$ (step S10). A specific calculation method for the beamformer $w_n(f)$ is described in detail in the second embodiment.

[0028] The calculated beamformer $w_n(f)$ is input to the target signal extraction unit 30, which uses it to evaluate the following equation (19) and thereby extract the target signal $y_n(f,\tau)$ from the frequency-domain observation signal $x(f,\tau)$ (step S12).

$$y_n(f,\tau) = w_n^H(f)\, x(f,\tau) \qquad (19)$$

Equations (17) to (19) are evaluated for all n ($n = 1, \ldots, N$) to extract all N signals. All $y_n(f,\tau)$ are input to the time domain conversion unit 32, which converts each extracted target signal $y_n(f,\tau)$ into a time-domain signal $y_n(t)$ by, for example, the inverse short-time Fourier transform, a known technique (step S14).

[0029] Next, a second embodiment of the present invention is described. The second embodiment is an example in which the beamformer calculation unit 28 of the first embodiment is configured in more detail. FIG. 6 shows a functional configuration example of the beamformer calculation unit 28 of the second embodiment and the portions related to it. Parts not illustrated in FIG. 6 perform the same processing as described in the first embodiment; the same applies to the following embodiments. The beamformer calculation unit 28 comprises an impulse response estimation unit 40 and an adaptive beamformer calculation unit 42.

[0030] The impulse response estimation unit 40 estimates the impulse response vector $h_n(f)$ of the target signal from the centroid information $\bar{c}_n$ of the cluster $C_n$.
Specifically, the impulse response vector for the source signal $s_n(f,\tau)$ is estimated by applying inverse normalization to the centroid information $\bar{c}_n$ from the clustering unit 24.

[0031] First, the centroid of the cluster $C_n$ is given by equation (15), and $\bar{x}(f,\tau)$ can be expressed by equation (14). Assuming $|h_{mn}| = 1$ for all m and n, the m-th component $\bar{c}_{mn}$ of the centroid $\bar{c}_n$ satisfies equation (20). Solving equation (20) for $h_{mn}(f)$ gives equation (21).

[0032] The right side of equation (20) is the normalized impulse response $h_{mn}(f)$, and equation (21) recovers the impulse response vector $h_n(f)$ from equation (20); equation (21) is therefore called inverse normalization.

[0033] Returning to FIG. 6, the estimated impulse response vector $h_n(f)$ and $R_J^{n}(f)$ from the unnecessary signal correlation matrix estimation unit 25 are input to the adaptive beamformer calculation unit 42.

[0034] The adaptive beamformer calculation unit 42 calculates the adaptive beamformer $w_n(f)$ from the impulse response vector $h_n(f)$ and the unnecessary signal correlation matrix $R_J^{n}(f)$. Specifically, $w_n(f)$ is calculated by equation (22), which is obtained by replacing $R_x(f)$ in equation (10) with $R_J^{n}(f)$:

$$w_n(f) = \frac{(R_J^{n}(f))^{-1}\, h_n(f)}{h_n^H(f)\, (R_J^{n}(f))^{-1}\, h_n(f)} \qquad (22)$$

[0035] The target signal extraction unit 30 applies the adaptive beamformer $w_n(f)$ of equation (22) and the frequency-domain signal $x(f,\tau)$ to equation (19) to extract the target signal $y_n(f,\tau)$.

[0036] Next, a modification of the second embodiment is described. Instead of having the impulse response estimation unit 40 output an estimate of the impulse response vector $h_n(f)$ for use in equation (22), it is also conceivable to estimate and output the steering vector $a_n(f)$.
As in equation (5), let the steering vector of the signal $s_n(f,\tau)$ be

$$a_n(f) = [\exp(-i 2\pi f \tau_{1n}), \ldots, \exp(-i 2\pi f \tau_{Mn})]^T \qquad (23)$$

Because the steering vector $a_n(f)$ is an approximation of the impulse response vector $h_n(f)$, comparing the phase terms of equations (21) and (23) allows $\tau_{mn}$ ($m = 1, \ldots, M$) to be estimated by the following equation (24).

$$\hat{\tau}_{mn} = \alpha c^{-1} \arg[\bar{c}_{mn} / \bar{c}_{Qn}] / 2\pi \qquad (24)$$

The impulse response estimation unit 40' performs the calculation of equation (24).

[0037] The steering vector $a_n(f)$ built from $\hat{\tau}_{mn}$ is output as the estimate of the impulse response vector $h_n(f)$. That is, from equations (23) and (24) the steering vector $a_n(f)$ can be expressed by the following equation (25), and it is output from the impulse response estimation unit 40' as the impulse response estimate $\hat{h}_n(f)$.

$$a_n(f) = [\exp(-i 2\pi f \hat{\tau}_{1n}), \ldots, \exp(-i 2\pi f \hat{\tau}_{Mn})]^T \simeq \hat{h}_n(f) \qquad (25)$$

The adaptive beamformer is calculated by equation (22) from $\hat{h}_n(f)$ of equation (25) and $R_J^{n}(f)$.

[0038] In calculating the adaptive beamformer $w_n(f)$ by equation (22), if the acoustic transfer characteristics, that is, the impulse response vector $h_n(f)$ or the steering vector $a_n(f)$, are known, they may be used instead of the estimates. Also, the correlation matrix $R_x(f)$ of the observation signal may be used in place of the unnecessary signal correlation matrix $R_J^{n}(f)$ in equation (22). The same applies to the following embodiments.

[0039] The adaptive beamformer of the second embodiment achieves high performance when N ≤ M, but its performance is limited when N > M.
Specifically, the adaptive beamformer can extract the target signal $y_n(f,\tau)$ effectively if the number of unnecessary signals is M−1 or fewer, but is known to be insufficiently effective when there are M or more. The third embodiment therefore shows that the target signal can be extracted even when N > M, that is, when there are N−1 (> M−1) unnecessary signals. An example of the functional configuration of the third embodiment is shown in a figure. The third embodiment differs from the second embodiment in that an unnecessary signal selection unit 49 and an input signal estimation unit 50 are added, and in that the processing of the unnecessary signal correlation matrix estimation unit 25, the impulse response estimation unit 40, and the adaptive beamformer calculation unit 42 is changed.

[0040] The unnecessary signal selection unit 49 selects the K unnecessary signals for which the unnecessary signal correlation matrix $R_J^{n}(f)$ is estimated, where K is an integer satisfying K ≤ M−1. That is, K clusters are chosen from the clusters $C_L$ ($L = 1, \ldots, n-1, n+1, \ldots, N$) that correspond to the unnecessary signals, namely all clusters other than the cluster $C_n$ corresponding to the target signal $s_n(f,\tau)$; the chosen clusters are denoted $C_J$. Possible selection methods are to select K clusters in descending order of the number of cluster members in $C_L$, or in descending order of the power of $\xi_L(f,\tau)$ given by equation (26).

[0041] Using the K clusters $C_J$ selected by the unnecessary signal selection unit 49, the unnecessary signal correlation matrix $R_J^{n}(f)$ for the K unnecessary signals is calculated by equations (27) and (28).

[0042] Further, the input signal estimation unit 50 estimates the beamformer input signal vector $x(f,\tau)$, in which the K unnecessary signals selected by the unnecessary signal selection unit 49 and the target signal of the cluster $C_n$ are mixed.
This is obtained by equation (29) using the unnecessary signal clusters $C_J$ and the target signal cluster $C_n$. The adaptive beamformer calculation unit 42 calculates the adaptive beamformer $w_n(f)$ by equation (22), using the unnecessary signal correlation matrix $R_J^{n}(f)$ obtained by equations (27) and (28) and the impulse response vector $h_n(f)$ from the impulse response estimation unit 40 or 40'.

[0043] The target signal extraction unit 30 receives the beamformer input signal vector $x(f,\tau)$ from the input signal estimation unit 50. It evaluates equation (19) with the adaptive beamformer $w_n(f)$ from the adaptive beamformer calculation unit 42 and the beamformer input signal vector $x(f,\tau)$, and extracts the target signal $y_n(f,\tau)$.

[0044] The third embodiment has been described for the case N > M, but it can also be applied when N ≤ M. In that case, compared with the second embodiment, extra processing is required for the input signal estimation unit 50 and the unnecessary signal selection unit 49.

[0045] The fourth embodiment is an example in which the sensor position information is known. Part of the functional configuration of the fourth embodiment is shown in a figure. The impulse response estimation unit 40 described in the second or third embodiment is composed of an arrival direction estimation unit 60 and an impulse response calculation unit 62. The sensor position information storage unit 64 stores sensor position information indicating the positions of the M sensors.

[0046] The arrival direction estimation unit 60 estimates the direction of arrival of the signal. It receives the centroid information $\bar{c}_n$ from the clustering unit 24 and the three-dimensional vectors $d_m$ ($m = 1, \ldots, M$) representing the position of each sensor from the sensor position information storage unit 64. Let $v_n$ ($n = 1, \ldots, N$) be the three-dimensional unit vector representing the direction of arrival of the signal $s_n$; its estimate is calculated from the centroid information $\bar{c}_n$ by the following equation (30).

[0047]

$$v_n = \alpha D^{+} \arg[\bar{c}_n] / 2\pi \qquad (30)$$

where $D = [d_1 - d_Q, \ldots, d_m - d_Q, \ldots, d_M - d_Q]^T$, $d_Q$ is the three-dimensional vector representing the position of the arbitrarily selected reference sensor 4Q, and $D^{+}$ is the generalized inverse of D.

[0048] Next, the impulse response calculation unit 62 calculates the impulse response from the direction of arrival of the signal $s_n$ and the sensor position information. It receives the estimate $v_n$ of the direction of arrival of the signal $s_n$ from the arrival direction estimation unit 60 and the sensor position information $d_m$ from the sensor position information storage unit 64. The impulse response calculation unit 62 obtains the estimate $\hat{a}_n(f)$ of the steering vector for the signal $s_n$ given by equation (31). This steering vector $\hat{a}_n(f)$ is used as the estimate $\hat{h}_n(f)$ of the impulse response vector; it is output from the impulse response calculation unit 62 and input to the adaptive beamformer calculation unit 42.

[0049] The fifth embodiment uses a maximum gain beamformer instead of the adaptive beamformer. The maximum gain beamformer uses, as the beamformer, the filter $w_n(f)$ that maximizes the target signal at the sensor array output while minimizing the unnecessary signal component at the sensor array output.
(D. H. Johnson and D. E. Dudgeon, "Array Signal Processing: Concepts and Techniques", Prentice Hall, 1993.) A key point in the maximum gain beamformer is estimating the target signal component and the unnecessary signal component in the sensor array output, but estimating the unnecessary signal component is very difficult when the unnecessary signal is non-stationary. The fifth embodiment solves this problem by using the sparsity assumption. That is, (1) the target signal correlation matrix $R_T^{n}(f)$, which is the correlation matrix of the observed signal containing only the target signal, and the unnecessary signal correlation matrix $R_J^{n}(f)$, which is the correlation matrix of the observed signal containing only the unnecessary signals, are estimated, and (2) the maximum gain beamformer calculation unit estimates the maximum gain beamformer $w_n(f)$ from $R_T^{n}(f)$ and $R_J^{n}(f)$.

[0050] In addition, because the maximum gain beamformer does not have the constraint of equation (8), that the target signal is obtained without distortion, beamformers $w_n(f)$ with different gain characteristics are formed at each frequency f. This means that when the maximum gain beamformer is applied to a wideband signal such as a speech signal, the output is distorted by the frequency characteristic of $w_n(f)$. For this reason it has conventionally been difficult to use a maximum gain beamformer for wideband signals. The fifth embodiment solves this by correcting the maximum gain beamformer $w_n(f)$ so that the error between the observed signal vector $x(f,\tau)$ and the output signal of the maximum gain beamformer $w_n(f)$ is minimized.

[0051] First, the principle of the maximum gain beamformer is briefly described.
As described above, under the condition "maximize the target signal component in the sensor array output and minimize the unnecessary signal component in the sensor array output", the evaluation function is given by the following equation (32).
[0052] Here, the denominator is the output power of the unnecessary signal, the numerator is the output power of the target signal, $R_T^{n}(f)$ is the correlation matrix of the observed signal containing only the target signal, and $R_J^{n}(f)$ is the correlation matrix of the observed signal containing only unnecessary signals. The matrix square root can be written as $(R_J^{n}(f))^{1/2} = E F^{1/2} E^{H}$, where $E = [e_1, \ldots, e_M]$, $e_i$ is an eigenvector of $R_J^{n}(f)$, $F = \mathrm{diag}(\lambda_1, \ldots, \lambda_M)$, and $\lambda_i$ is the eigenvalue of $R_J^{n}(f)$ corresponding to $e_i$. Setting $\tilde{w} = (R_J^{n}(f))^{1/2} w_n$, equation (32) can be rewritten as the following equation (33).
[0053] By the Rayleigh quotient theorem (Kodama and Suda, "Matrix theory for system control", Corona Co., 1995), the maximum value of $g(\tilde{w})$ is the maximum eigenvalue $\lambda$ of $(R_J^{n}(f))^{-1/2} R_T^{n}(f) (R_J^{n}(f))^{-1/2}$, and it is attained at the corresponding eigenvector $e$: $\max g(\tilde{w}) = \lambda = g(e)$. That is, the maximum gain beamformer $w_n$ to be obtained can be expressed by the following equations (34) and (35).
$\tilde{w} = e$ (34)
$w_n = (R_J^{n}(f))^{-1/2} e$ (35)
An example of a functional configuration of the fifth embodiment is shown in FIG. Compared with the first embodiment, the observation signal correlation matrix estimation unit 72 is added, and the beamformer calculation unit 28 includes the target signal correlation matrix estimation unit 70, the eigenvector calculation unit 74, the maximum gain beamformer calculation unit 76, the correction vector calculation unit 78, and the correction unit 80.
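The Rayleigh quotient argument of equations (32) to (35) can be sketched numerically. Equations (32) and (33) are not reproduced in this text, so the sketch assumes the form implied by paragraph [0052], $g(w) = (w^{H} R_T w) / (w^{H} R_J w)$, with the target power in the numerator and the unnecessary signal power in the denominator. The function names and the synthetic matrices are hypothetical and chosen only for illustration; this is a minimal NumPy sketch for a single frequency bin, not the patent's implementation.

```python
import numpy as np

def inv_sqrt_hermitian(R, eps=1e-12):
    """Return R^{-1/2} for a Hermitian positive-definite R using R = E F E^H."""
    lam, E = np.linalg.eigh(R)
    lam = np.maximum(lam, eps)  # assumption: clip tiny eigenvalues for stability
    # Scaling the columns of E by 1/sqrt(lam) gives E F^{-1/2}.
    return (E / np.sqrt(lam)) @ E.conj().T

def max_gain_beamformer(R_T, R_J):
    """Return (w, lam_max) with w = R_J^{-1/2} e, as in equations (34) and (35)."""
    R_J_inv_sqrt = inv_sqrt_hermitian(R_J)
    A = R_J_inv_sqrt @ R_T @ R_J_inv_sqrt
    A = 0.5 * (A + A.conj().T)   # remove rounding asymmetry before eigh
    lam, V = np.linalg.eigh(A)   # eigenvalues are returned in ascending order
    e = V[:, -1]                 # eigenvector of the largest eigenvalue
    return R_J_inv_sqrt @ e, lam[-1]

def gain(w, R_T, R_J):
    """Assumed evaluation function: target output power over unnecessary output power."""
    num = np.real(w.conj() @ R_T @ w)
    den = np.real(w.conj() @ R_J @ w)
    return num / den

# Synthetic example with M = 3 sensors (values are illustrative only).
rng = np.random.default_rng(0)
M = 3
h = rng.standard_normal(M) + 1j * rng.standard_normal(M)
R_T = np.outer(h, h.conj()) + 0.01 * np.eye(M)
G = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
R_J = G @ G.conj().T + 0.1 * np.eye(M)

w, lam_max = max_gain_beamformer(R_T, R_J)
print("gain at w:", gain(w, R_T, R_J), "largest eigenvalue:", lam_max)
```

In this sketch the gain evaluated at the returned filter equals the largest eigenvalue, which is the statement $\max g(\tilde{w}) = \lambda = g(e)$. An equivalent route is to solve the generalized eigenvalue problem $R_T w = \lambda R_J w$ directly; the explicit $R_J^{-1/2}$ form is used here only because it mirrors the derivation in the text.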
[0054] The target signal correlation matrix estimation unit 70 estimates, from the cluster information, the correlation matrix over the time sections containing only the target signal $s_n(f, \tau)$ according to the following equations (36) and (37). Here, $C_n$ is the cluster corresponding to the target signal. The unnecessary signal correlation matrix $R_J^{n}(f)$ from the unnecessary signal correlation matrix estimation unit 25 and the target signal correlation matrix $R_T^{n}(f)$ are input to the eigenvector calculation unit 74. Following the Rayleigh quotient theorem described above, the eigenvector calculation unit 74 calculates the eigenvector $e_n(f)$ corresponding to the maximum eigenvalue of $(R_J^{n}(f))^{-1/2} R_T^{n}(f) (R_J^{n}(f))^{-1/2}$.
[0055] The maximum gain beamformer calculation unit 76 receives $R_J^{n}(f)$ from the unnecessary signal correlation matrix estimation unit 25 and $e_n(f)$ from the eigenvector calculation unit 74. The maximum gain beamformer calculation unit 76 calculates the maximum gain beamformer $w_n(f)$ according to the following equation (38).
$w_n(f) = (R_J^{n}(f))^{-1/2} e_n(f)$ (38)
Equation (38) is based on equation (35) above.
[0056] Meanwhile, the observation signal correlation matrix estimation unit 72 estimates the observation signal correlation matrix $R_x(f)$, which is the correlation matrix of the observed signal vector $x(f, \tau)$, using the following equation (39).
$R_x(f) = E\{x(f, \tau) x^{H}(f, \tau)\}$ (39)
The correction vector calculation unit 78 receives the maximum gain beamformer $w_n(f)$ from the maximum gain beamformer calculation unit 76 and the observation signal correlation matrix $R_x(f)$ from the observation signal correlation matrix estimation unit 72. The correction vector calculation unit 78 generates a correction vector $\alpha_n(f)$ for correcting the maximum gain beamformer $w_n(f)$. This correction transforms the maximum gain beamformer $w_n(f)$ so that the distortion it imparts to the output is minimized.
For example, a correction vector $\alpha_n(f)$ is calculated that minimizes the error $A$ between the observed signal vector $x(f, \tau)$ and the output signal $y_n(f, \tau)$, given by the following equation (40).
$A(\alpha_n(f)) = E\{\| x(f, \tau) - \alpha_n(f) y_n(f, \tau) \|^{2}\}$ (40)
where $y_n(f, \tau) = w_n^{H}(f) x(f, \tau)$ is the output of the maximum gain beamformer $w_n(f)$. Expanding the right side of equation (40) gives
$A(\alpha_n(f)) = E[\| x(f, \tau) \|^{2}] - E[y_n(f, \tau) x^{H}(f, \tau)] \alpha_n(f) - \alpha_n^{H}(f) E[y_n^{*}(f, \tau) x(f, \tau)] + \alpha_n^{H}(f) \alpha_n(f) E[|y_n(f, \tau)|^{2}]$ (41)
Partially differentiating both sides of equation (41) with respect to $\alpha_n^{H}(f)$ yields the following equation (42).
$\partial A(\alpha_n(f)) / \partial \alpha_n^{H}(f) = -E[y_n^{*}(f, \tau) x(f, \tau)] + \alpha_n(f) E[|y_n(f, \tau)|^{2}]$ (42)
Setting the left side of equation (42) to 0 and solving for $\alpha_n(f)$ gives the following equation (43).
$\alpha_n(f) = E[y_n^{*}(f, \tau) x(f, \tau)] / E[|y_n(f, \tau)|^{2}]$ (43)
Using equations (19) and (39) above, equation (43) becomes the following equation (44).
[0057] Here, as described above, $R_x(f)$ is the correlation matrix of the observed signal vector $x(f, \tau)$. As can be seen from equation (44), the correction vector calculation unit 78 calculates the correction vector $\alpha_n(f)$ using the maximum gain beamformer $w_n(f)$ and the observed signal vector $x(f, \tau)$.
[0058] The correction unit 80 corrects the frequency distortion of the maximum gain beamformer $w_n(f)$ using the correction vector $\alpha_n(f)$ and calculates a corrected beamformer. Specifically, the corrected beamformer $w_n'(f)$ is obtained by the following equation (45).
$w_n'(f) = [\alpha_n(f)]_B w_n(f)$ (45)
where $B$ is an arbitrary sensor number, $B \in \{1, \ldots, M\}$, and $[q]_B$ denotes the $B$-th element of the vector $q$.
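The least-squares step of equations (40) to (43) can be checked numerically. Equation (44) is not reproduced in this text; the sketch below assumes that, with $y_n = w_n^{H} x$ and $R_x = E\{x x^{H}\}$, equation (43) reduces to $\alpha_n = R_x w_n / (w_n^{H} R_x w_n)$, which follows from $E[y_n^{*} x] = R_x w_n$ and $E[|y_n|^{2}] = w_n^{H} R_x w_n$. The function names and the random test data are hypothetical, and expectations are replaced by sample averages over frames.

```python
import numpy as np

def correction_vector(w, R_x):
    """alpha = E[y^* x] / E[|y|^2] for y = w^H x, written with R_x = E{x x^H}."""
    num = R_x @ w                         # E[x x^H] w = E[y^* x]
    den = np.real(w.conj() @ R_x @ w)     # w^H R_x w = E[|y|^2]
    return num / den

def mean_squared_error(alpha, w, X):
    """Sample estimate of E{ ||x - alpha y||^2 }; the columns of X are frames."""
    y = w.conj() @ X                      # y[t] = w^H x[:, t]
    resid = X - np.outer(alpha, y)        # x[:, t] - alpha * y[t]
    return np.mean(np.sum(np.abs(resid) ** 2, axis=0))

# Synthetic single-bin data: M sensors, T frames (illustrative values only).
rng = np.random.default_rng(0)
M, T = 3, 2000
X = rng.standard_normal((M, T)) + 1j * rng.standard_normal((M, T))
w = rng.standard_normal(M) + 1j * rng.standard_normal(M)

R_x = X @ X.conj().T / T                  # sample version of equation (39)
alpha = correction_vector(w, R_x)

B = 0                                     # arbitrary reference sensor index
scale = alpha[B]                          # the scalar [alpha]_B used in equation (45)
print("error at alpha:", mean_squared_error(alpha, w, X))
print("error at 1.1 * alpha:", mean_squared_error(1.1 * alpha, w, X))
```

Because the sample correlation matrix is built from the same frames, `alpha` is the exact minimizer of the sample error, so any perturbation of it increases the printed error. Only the single element `alpha[B]` is needed downstream, which is why the choice of reference sensor `B` matters in practice.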
[0059] The target signal extraction unit 30 extracts the target signal $y_n(f, \tau)$ by the following equation (46) using the corrected beamformer $w_n'(f)$.
$y_n(f, \tau) = w_n'^{H}(f) x(f, \tau)$ (46)
A functional configuration example of a modification of the fifth embodiment is shown in FIG. The beamformer calculation unit 28 includes a target signal correlation matrix estimation unit 70, an eigenvector calculation unit 74, and a maximum gain beamformer calculation unit 76. The target signal extraction unit 30 includes a signal extraction unit 81 and a distortion correction unit 82.
[0060] The maximum gain beamformer $w_n(f)$ from the maximum gain beamformer calculation unit 76 and the observed signal vector $x(f, \tau)$ from the frequency domain conversion unit 5 are input to the signal extraction unit 81. The signal extraction unit 81 evaluates the following equation (47) to extract the target signal $y_n(f, \tau)$, which still includes distortion.
$y_n(f, \tau) = w_n^{H}(f) x(f, \tau)$ (47)
The target signal $y_n(f, \tau)$ including distortion is input to the distortion correction unit 82, together with the correction vector $\alpha_n(f)$ from the correction vector calculation unit 78. The distortion correction unit 82 corrects the distortion by converting the output signal with the following equation (48) and outputs the corrected output signal $y_n'(f, \tau)$.
$y_n'(f, \tau) = [\alpha_n(f)]_B y_n(f, \tau)$ (48)
In the first to fifth embodiments described above, signals are extracted for all $n$. However, the beamformer may be configured only for the target signal (a single $n$). The target signal can be selected, for example, by comparing the impulse response vector $h_d$ of the target signal held in a database with the impulse response vectors $h_n$ estimated for all sound sources $n$ by the method of the invention, and selecting the sound source $n$ whose $h_n$ is closest to $h_d$. For example, an algorithm such as $\min_n (h_1 \cdot h_n)$ can be considered.
If the beamformer calculation unit 28 described in the second to fifth embodiments configures a beamformer using equation (24) above or the like only for the selected $n$, an adaptive beamformer for the target signal is obtained.
[0061] [Experimental Results] Experiments were conducted to show the effects of the above embodiments. Mixed signals were simulated by convolving impulse responses measured in the room shown in FIG. 11 with a plurality of speech signals. The experimental conditions are as shown in FIG. The room had a long side of 880 cm, a short side of 375 cm, a height of 240 cm, and a reverberation time of 120 ms. Three sensors $4_1$, $4_2$, and $4_3$ were placed 200 cm from the bottom long side and 282 cm from the short side. The axis parallel to the long side is x, and the axis parallel to the short side is y. As shown in FIG. 12, the three sensors $4_1$, $4_2$, and $4_3$ were arranged two-dimensionally at the vertices of an equilateral triangle with 4 cm sides, two on the y axis and one on the x axis. Microphones were used as the sensors. The signal to unwanted signal ratio (SIR) and the signal to distortion ratio (SDR) were evaluated for four speech combinations. The unit is dB.
[0062] The four sound sources were positioned as follows. The origin is the intersection of the x and y axes at the sensor position, the + direction of the x axis is 0 degrees, and angles are measured counterclockwise. One sound source each was placed at the intersections of the 30-degree and 315-degree directions with a circle of radius 50 cm around the sensor position, and one each at the intersections of the 225-degree and 315-degree directions with a circle of radius 80 cm. In the experiment to confirm the effect of the second embodiment, the sound sources in the directions of 120 degrees, 225 degrees, and 315 degrees were used, with N (number of source signals) = M (number of sensors) = 3.
In the experiment to confirm the effect of the third embodiment, N = 4 and M = 3.
[0063] FIG. 13 shows the results of this experiment. The third embodiment here refers to the cases in which the input signal estimation unit 50 shown in FIG. 7 is added to the conventional method, the second embodiment, the fourth embodiment, and the fifth embodiment. In the conventional method, a known steering vector $a_n(f)$ was given as $h_n(f)$ in equation (10) above, which represents the adaptive beamformer 6 shown in FIG. 1. In this case, high performance was not obtained for either N = M or N > M. Because the experiment was carried out in a reverberant environment, the main reason is considered to be that the given steering vector $a_n(f)$ does not account for the effect of reverberation. In addition, the insufficient SIR for N > M reflects the limit of the adaptive beamformer, namely that it can effectively suppress only M-1 unnecessary signals.
[0064] Comparing the SIR and SDR values shows that the above embodiments achieve higher performance than the conventional method both when N = M and when N > M.
[0065] The blind signal extraction device according to the present invention is not limited to the above-described embodiments, and various modifications can be made without departing from the scope of the present invention. Also, the processing described for the blind signal extraction device need not be executed only in chronological order following the order of description; it may be executed in parallel or individually depending on the processing capability of the device performing the processing or as needed.
[0066] Further, when the processing in the blind signal extraction device of the present invention is realized by a computer, the processing content of the functions that the blind signal extraction device should have is described by a program. By executing this program on a computer, the processing functions of the blind signal extraction device are realized on the computer.
[0067] The program describing the processing content can be recorded on a computer readable recording medium. Any computer readable recording medium may be used, such as a magnetic recording device, an optical disc, a magneto-optical recording medium, or a semiconductor memory. Specifically, for example, a hard disk device, a flexible disk, a magnetic tape, or the like can be used as the magnetic recording device; a DVD (Digital Versatile Disc), a DVD-RAM (Random Access Memory), a CD-ROM (Compact Disc Read Only Memory), a CD-R (Recordable) / RW (Rewritable), or the like as the optical disc; an MO (Magneto-Optical disc) or the like as the magneto-optical recording medium; and an EEP-ROM (Electronically Erasable and Programmable Read Only Memory) or the like as the semiconductor memory.
[0068] This program is distributed, for example, by selling, transferring, or lending a portable recording medium such as a DVD or a CD-ROM on which the program is recorded. Furthermore, the program may be stored in a storage device of a server computer and distributed by transferring it from the server computer to another computer via a network.
[0069] A computer that executes such a program, for example, first temporarily stores in its own storage device the program recorded on the portable recording medium or the program transferred from the server computer. Then, when executing the processing, the computer reads the program stored in its own recording medium and executes processing according to the read program.
As another execution form of this program, the computer may read the program directly from the portable recording medium and execute processing according to it. Alternatively, each time the program is transferred from the server computer to this computer, the computer may sequentially execute processing according to the received program. The above-described processing may also be executed by a so-called ASP (Application Service Provider) type service, which realizes the processing functions only through execution instructions and result acquisition, without transferring the program from the server computer to the computer. Note that the program in this embodiment includes information that is provided for processing by a computer and is equivalent to a program (such as data that is not a direct command to the computer but has a property that defines the processing of the computer).
[0070] Further, in this embodiment, the blind signal extraction device is configured by executing a predetermined program on a computer, but at least a part of the processing contents may be realized in hardware.
[0071] As an application in the audio field, the present invention makes it possible to construct a speech recognition system with a high recognition rate by separating and extracting the target speech, even in a situation where the input microphone of a speech recognizer is distant from the speaker and the microphone may pick up sounds other than the target speaker's voice.
[0072]
BRIEF DESCRIPTION OF THE DRAWINGS
A block diagram showing a functional configuration example of a prior-art system.
A diagram for explaining, for a linearly arranged sensor system, the time difference $\tau$ between the time at which the sound from source n reaches an arbitrary sensor j and the time at which it reaches the origin 0.
A block diagram showing a functional configuration example of the system of the first embodiment of the present invention.
A flowchart showing the flow of the main processing of the first embodiment of the present invention.
A diagram for explaining $\cos \epsilon_n^{mQ}$, used in equation (16) above, for two arbitrary sensors, sensor m and sensor Q.
A block diagram showing a part of a functional configuration example of the system of the second embodiment of the present invention.
A block diagram showing a part of a functional configuration example of the system of the third embodiment of the present invention.
A block diagram showing a part of a functional configuration example of the system of the fourth embodiment of the present invention.
A block diagram showing a part of a functional configuration example of the system of the fifth embodiment of the present invention.
A block diagram showing a part of a functional configuration example of the system of the modification of the fifth embodiment of the present invention.
A top view of the setup of the experiment comparing the prior art with the technique of the present invention.
A diagram showing details of the positional relationship of the three sensors $4_1$, $4_2$, and $4_3$ of FIG.
A diagram showing the experimental results comparing the effects of the prior art and the technique of the present invention.
