JP2010212818

код для вставкиСкачать
Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2010212818
An object of the present invention is to estimate the spatial positions of sound sources and microphones and the time origin of each received signal using only the observation signals, starting from a state in which all of these are unknown. Disclosed is a method of processing multi-channel signals with unknown recording start times received by a plurality of microphones, comprising: provisionally synchronizing the multi-channel signals received by the microphones; detecting, among the provisionally synchronized received signals, the time difference of the received signal between two channels; and estimating the unknowns among the recording start times, the sound source positions, and the microphone positions so as to minimize the error between the detected time difference and the estimated time difference theoretically derived from the sound source position, the microphone position, and the recording start time. [Selected figure] Figure 1
Method of processing multi-channel signals received by a plurality of microphones
[0001]
The present invention relates to microphone array signal processing.
[0002]
In microphone array signal processing, sound source localization, sound source separation, noise suppression, and the like are performed by arranging a plurality of microphones spatially and processing the signals they receive. In general, this requires that the microphone positions be known and that the received signals be synchronized in time.
04-05-2019
1
Therefore, it has conventionally been common to use a system in which the microphones are fixed to a frame or mount and the received signals are synchronized by an A/D converter.
[0003]
Meanwhile, recording devices such as microphones and IC recorders built into PCs and mobile phones are ubiquitous. If an array could be configured from such distributed recording devices, the range of application of array signal processing technology would expand significantly. However, when such independent recording devices are used, their positional relationship is unknown, and they are usually not synchronized on the time axis.
[0004]
Kobayashi et al. proposed a method of simultaneously estimating the spatial positions of the microphones and the sound source from the observation signals even when the microphone positions are unknown (Patent Documents 1 and 2, Non-Patent Document 1). More specifically, Patent Document 1 discloses a sound source and receiving position estimation method comprising: an inter-channel time difference estimation step of obtaining measured values of the inter-channel reception time difference between two channels among a plurality of channels from the received signals of those channels; and a position estimation step of estimating the sound source position and the receiving positions by minimizing the error between the estimated inter-channel reception time differences derived from the estimated sound source and receiving positions and the measured inter-channel reception time differences. However, that method assumed that the received signals of the recording devices were synchronized.
[0005]
In other words, although array signal processing is actively researched and put to practical use for acquiring positional information of sound sources and for separating mixed sounds, the time difference between channels is important information for both sound source localization and separation, and to obtain it accurately the prior art requires that the recording start times of the multi-channel signals be synchronized.
[0006]
Japanese Patent No. 3720795; JP 2007-81455 A
[0007]
Kobayashi Kazunori, Furuya Kenichi, Kataoka Akitoshi, "Blind sound source position estimation using multiple microphones whose positions are unknown", Transactions of the Institute of Electronics, Information and Communication Engineers, Vol. 86, No. 6, pp. 619-627, 2003.
[0008]
An object of the present invention is to estimate the unknowns among the sound source positions, the microphone positions, and the recording start times from the observation signals alone, even when the recording start times of the observed multi-channel signals are unknown. An object of one embodiment of the present invention is to estimate the sound source positions, the microphone positions, and the recording start times from the observation signals alone, even when the recording start times of the multi-channel signals are unknown.
[0009]
The technical means adopted by the present invention is a signal processing method for multi-channel signals whose recording start times, received by a plurality of microphones, are unknown, comprising: provisionally synchronizing the multi-channel signals received by the microphones; detecting the time difference between the received signals of two channels among the plurality of provisionally synchronized received signals; and estimating the unknowns among the recording start times, the sound source positions, and the microphone positions so as to minimize the error between the detected time difference and the estimated time difference theoretically derived from the sound source position, the microphone position, and the recording start time.
[0010]
In one aspect, the recording start times, the sound source positions, and the microphone positions are all unknown, and only the observation signals are used to estimate the sound source positions, the microphone positions, and the recording start times simultaneously.
[0011]
"To provisionally synchronize multi-channel signals" means an operation that reduces the difference in the recording start times of the multi-channel signals to such an extent that sound components arriving from the same sound source can be associated among the multi-channel signals. The purpose of the provisional synchronization step is to roughly compensate for the differences in recording start time between the observation signals, so that sounds arriving from the same sound source can be associated between the observation signals in the subsequent time difference detection step. That is, the provisional synchronization step reduces the differences between the recording start times of the multi-channel signals received by the microphones so that sound components arriving from the same sound source can be associated among the multi-channel signals, and the time difference detection step then detects the time difference between the two channels for the associated sound components.

In one aspect, provisional synchronization is performed by shifting the time axes of the signals using a time difference (referred to as the average time difference) obtained from the peak of the cross-correlation function, thereby roughly aligning the time origins of the observation signals. This is referred to herein as average time synchronization. That is, in one aspect, the synchronization step determines the average time difference between the received signals by peak detection of the cross-correlation function, and time-synchronizes the signals by shifting each received signal on the time axis so that the average time difference becomes zero. The requirement on the time interval used for computing the cross-correlation function in the synchronization step is that it contain a sound source signal that is sufficiently long and significant for the peak of the cross-correlation function to appear clearly (silent sections may be included, however long). This time interval need not be the entire signal, need not include the recording start time, and need not include all of the sound source signals. Those skilled in the art will understand that provisional synchronization can be achieved not only by detecting the cross-correlation peak of the time-domain waveforms, but also by peak detection of the cross-correlation between signal envelopes, between spectrograms, and so on, and that the means for provisional synchronization is not limited to those using the cross-correlation function.
[0012]
In one aspect, the time difference detection step divides the received signals into short time frames, selects frames in which only a single sound source signal is considered to be observed, and detects the time difference between the received signals frame by frame. When obtaining a time difference for each frame, in one aspect the information of all frequencies within a given time frame is used to obtain one time difference; in another aspect, the time difference may be detected per time-frequency component, that is, one time difference is estimated from a single component at a given time frame and frequency. In other words, the time difference may be determined for one or for a plurality of time-frequency components; the per-frame case corresponds to taking all frequency components within one time frame, but other groupings of components may also be used.
[0013]
The cross-correlation function used in the present invention is not limited to the ordinary cross-correlation function; for example, a "generalized cross-correlation method" that finds the peak of a filtered cross-correlation function may be used. Furthermore, the means for obtaining the time difference is not limited to those using a cross-correlation function. Those skilled in the art will understand that, for example, a "voting method", in which candidate time-difference values are obtained from the phase difference of each frequency component and the value receiving the largest vote in a histogram is selected, or a "maximum likelihood estimation method", in which a probability model of the errors contained in the observation signals is assumed and the most likely time difference is obtained by maximizing a likelihood criterion, can also be used. These means for obtaining the time difference without using the cross-correlation function are applicable not only to the time difference detection step but also to the provisional synchronization step.
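As an illustration of the generalized cross-correlation idea mentioned above, the following is a minimal sketch of one common filtered variant (PHAT weighting, which whitens the cross-spectrum so that only phase remains before the inverse FFT); the function name and the weighting choice are illustrative, not taken from the patent:

```python
import numpy as np

def gcc_phat(x, y, fs=1.0):
    """Estimate the delay of x relative to y (in seconds) from the peak
    of the PHAT-weighted (phase-only) cross-correlation function."""
    n = len(x) + len(y) - 1
    X = np.fft.rfft(x, n)
    Y = np.fft.rfft(y, n)
    R = X * np.conj(Y)
    R /= np.abs(R) + 1e-12              # PHAT weighting: keep phase only
    cc = np.fft.irfft(R, n)
    max_shift = n // 2
    # reorder so index 0 corresponds to lag -max_shift
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    lag = int(np.argmax(cc)) - max_shift
    return lag / fs
```

For example, two impulse trains offset by four samples yield a delay estimate of exactly four samples.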
[0014]
In one aspect, the step of estimating the unknowns is performed by preparing an evaluation function J(Θ) that defines the error, and estimating the parameters that minimize J(Θ) by optimization means. In one aspect, the error is

εimn = τimn − ((|si − rm| − |si − rn|)/c + (tn − tm))

where εimn is the error between the detected time difference and the estimated value of the time difference; s is the position vector of a sound source; r is the position vector of a microphone; t is the time origin (recording start time) of a microphone; i is the ordinal number of a sound source; m and n are ordinal numbers of microphones; τimn is the time delay of the m-th signal relative to the n-th signal, detected when the signal arriving from sound source i is observed by microphones m and n; and c is the speed of sound.
[0015]
In one aspect, the unknown parameter set Θ = {si, rn, tn | 1 ≤ i ≤ K, 1 ≤ n ≤ L} is determined by minimizing the evaluation function J(Θ), the sum of the squared errors εimn, by optimization calculations. Here, K is the number of sound sources and L is the number of microphones.
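A minimal sketch of evaluating J(Θ) for given parameters follows; the data layout (arrays of position vectors and a dictionary of detected time differences) is an illustrative choice not specified in the patent, and the error definition assumed is εimn = τimn − ((|si − rm| − |si − rn|)/c + (tn − tm)):

```python
import numpy as np

def eval_J(s, r, t, tau, c=340.0):
    """J(Theta): sum of squared errors eps_imn over the detected time
    differences.  s: (K, 3) source positions, r: (L, 3) microphone
    positions, t: (L,) recording start times, tau: dict mapping
    (i, m, n) -> detected delay of signal m relative to signal n."""
    J = 0.0
    for (i, m, n), t_obs in tau.items():
        est = (np.linalg.norm(s[i] - r[m]) - np.linalg.norm(s[i] - r[n])) / c
        est += t[n] - t[m]
        J += (t_obs - est) ** 2
    return J
```

With noise-free time differences generated from known positions and start times, J evaluates to zero at the true parameters and becomes positive as soon as any parameter is perturbed.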
[0016]
In one aspect, the so-called auxiliary function method is used as the optimization calculation for performing the third step. An auxiliary function is applied to the evaluation function, and the parameters are estimated iteratively by update equations derived from the auxiliary function. Here, μ and e are auxiliary variables.
[0017]
By using the auxiliary function method, the unknown parameters si, rn, and tn can be solved for efficiently, but the optimization calculation used in the present invention is not limited to the auxiliary function method. For example, the parameters Θ may be determined iteratively by reducing the evaluation function J(Θ) using a gradient method (including the conjugate gradient method and the steepest descent method), the Newton method, the quasi-Newton method, approximations or variants of these, or other optimization algorithms.
[0018]
The present invention is also provided as a processing device for multi-channel signals with unknown recording start times received by a plurality of microphones, comprising: means for provisionally synchronizing the multi-channel signals received by the microphones; means for detecting the time difference between the received signals of two channels; and means for estimating the unknowns among the recording start times, the sound source positions, and the microphone positions so as to minimize the error between the detected time difference and the estimated time difference theoretically derived from the sound source position, the microphone position, and the recording start time. The hardware configuration for executing each step or each means of the present invention can be composed of a computer such as a personal computer, specifically, an input unit, an output unit (which may include a display unit), a CPU, a storage device (ROM, RAM, etc.), and a bus connecting these. Therefore, the present invention also provides a computer program for estimating the unknowns among the recording start times, the sound source positions, and the microphone positions from multi-channel signals with unknown recording start times received by a plurality of microphones, the program causing a computer to function as: means for provisionally synchronizing the multi-channel signals received by the microphones; means for detecting the time difference between the received signals of two channels among the plurality of provisionally synchronized received signals; and means for estimating the unknowns among the recording start times, the sound source positions, and the microphone positions so as to minimize the error between the detected time difference and the estimated time difference theoretically derived from the sound source position, the microphone position, and the recording start time.
[0019]
According to the present invention, even if the recording start times of multi-channel signals are unknown, the sound source positions, the microphone positions, and the recording start times can be estimated simultaneously from the observation signals alone. Although microphone arrays have many potential applications such as sound source localization, sound source separation, and robot audition, they have until now been greatly restricted by the premise of acquiring time-synchronized multi-channel signals. The present invention is a basic technology for configuring independent recording devices, such as PC built-in microphones and IC recorders, as a microphone array, and greatly expands the range of application of sound source localization and sound source separation by microphone arrays. It also leads to new applications such as acoustic security and monitoring linked to networks.
[0020]
The figures are as follows: a diagram showing an outline of the present invention; a diagram showing the correspondence between the time origin and the time axis of each microphone; a diagram showing observation signal 1 of a microphone; a diagram showing observation signal 2 of a microphone; a diagram showing the observation signals with shifted time origins; a diagram showing the observation signals after average time synchronization; a diagram showing the observation signals divided into frames; a diagram (FIG. 5) showing the normalized cross-correlation function of a single-sound frame; an enlarged view (FIG. 5) of a frame selected from FIG. 4, in which the left figure shows three points near the peak of the normalized cross-correlation function and the right figure shows the interpolated peak; a diagram showing the principle of the auxiliary function method; a diagram showing the estimation result of the microphone positions; and a diagram showing the estimation result of the sound source positions.
[0021]
In one embodiment of the present invention, the spatial positions of the sound sources and the microphones and the recording time origin of each recording device, all of which are unknown, are estimated simultaneously using only the observation signals. In this embodiment, simultaneous estimation of the sound source positions, the microphone positions, and the recording start times is realized by the following three steps. (1) Average time synchronization of the observation signals: the average time difference between the observation signals is obtained by peak detection of the cross-correlation function, and time synchronization is performed by shifting the signals on the time axis so that the average time difference becomes zero. (2) Single-source detection and time difference detection for each frame: the observation signals are divided into short time frames, frames in which only a single source signal is considered to be observed are selected, and the time difference between the observation signals is detected for each such frame. (3) Iterative estimation of the unknowns: the sound source positions, the microphone positions, and the recording start times are estimated iteratively so that the detected time differences satisfy the theoretical formula. The present embodiment is described in detail below.
[0022]
[A] Approach of the Present Embodiment In the following, it is assumed that acoustic signals from K sound sources are observed by L microphones, and that the positions of the sound sources and the microphones are si = (xi, yi, zi)^T (1 ≤ i ≤ K) and rn = (un, vn, wn)^T (1 ≤ n ≤ L), where ^T denotes transposition. Further, tn denotes the recording start time (time origin) of the n-th microphone measured by a certain reference clock, and the rate of time on each recording device is assumed to be equal. The purpose is to estimate all of these parameters using only the observation signal of each microphone, starting from the situation where si, rn, and tn are all unknown.
[0023]
In position estimation of sound sources or microphones, it is important to acquire the arrival time differences between the observation signals. Suppose that the signal arriving from sound source i is observed by microphones m and n, and let the arrival times on the time axes of the respective microphones be t^m_i and t^n_i, as shown in FIG. 1A. t^m_i and t^n_i are observable quantities, and by taking their difference, the apparent time delay of the m-th signal relative to the n-th signal with respect to sound source i, that is, the time difference τimn between the observation signals, is obtained as τimn = t^m_i − t^n_i. Consider expressing the observed quantity τimn using the previously defined parameters si, rn, and tn. The arrival times of the signal from sound source i at microphones m and n, measured by the reference clock, are (tm + t^m_i) and (tn + t^n_i), respectively, which gives

τimn = (|si − rm| − |si − rn|)/c + (tn − tm) ... (3)

The right side of equation (3) is the theoretical expression: the first term is the true time difference, and the second term is the deviation of the recording start times.
[0024]
For the case where the observation signals are synchronized and the second term of equation (3) is zero, a method has been proposed that simultaneously estimates the sound source positions and the microphone positions by enforcing the consistency of these time differences (Patent Documents 1 and 2, Non-Patent Document 1). However, when the time difference obtained from the observation signals also includes the difference between the unknown recording start times, it may appear that no effective information can be obtained. Therefore, the relationship between the number of observed quantities and the number of unknowns is first described as the condition under which a solution can be obtained.
[0025]
The observed quantities are the time differences τimn between the observation signals, of which L − 1 are independent for each sound source, giving K(L − 1) independent observations in total. The unknowns are the three-dimensional positions (xi, yi, zi) and (un, vn, wn) of the sound sources and the microphones, and the recording start times tn (where 1 ≤ i ≤ K, 1 ≤ n ≤ L), that is, 3K + 4L unknowns. Since the estimation is based only on time differences, the solution is determined only relatively: one degree of freedom in the choice of the reference clock and six degrees of freedom of translation and rotation in the choice of the absolute coordinate system remain undetermined. For the unknowns to be determined, it is therefore necessary that at least K(L − 1) ≥ 3K + 4L − 7 be satisfied.
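Counting K(L − 1) independent time differences against 3K + 4L unknowns, minus the seven indeterminate degrees of freedom noted above (one reference-clock shift plus three translations and three rotations of the coordinate system), the necessary condition can be written as a one-line check:

```python
def solvable(K: int, L: int) -> bool:
    """Necessary condition for the unknowns to be determinable:
    K*(L-1) independent observations must be at least the 3K + 4L
    unknowns minus the 7 indeterminate degrees of freedom
    (1 clock shift + 3 translations + 3 rotations)."""
    return K * (L - 1) >= 3 * K + 4 * L - 7
```

For example, the evaluation experiment below with K = 8 sources and L = 9 microphones gives 64 observations against 53 effective unknowns, so the condition is satisfied, while two sources and three microphones (4 observations against 11 effective unknowns) are insufficient.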
[0026]
[B] Acquisition of the observed quantities As shown in equation (3), the detected time difference includes 1) a time difference due to the difference in recording start times and 2) an arrival time difference due to the positional relationship between the sound source and the microphones.
[0027]
It is difficult to predict in advance what value 1) will take. On the other hand, 2) takes a different value for each sound source, and if the maximum distance between a sound source and a microphone is D, its absolute value does not exceed D/c. For example, when recording a discussion in a meeting room or the like, D = 10 [m] is sufficient, in which case D/c = 3.0 × 10^−2 [s]. Since the deviation of the recording start times is expected to range from several seconds to several minutes, 1) and 2) can be considered to be of different orders of magnitude. Therefore, in order to detect the time differences efficiently, it is first considered to roughly compensate 1) and roughly align the observation signals, and then to calculate 2). When obtaining 2), it is considered effective to divide the observation signals into frames. For estimating the sound source positions, the microphone positions, and the time origins, 1) may be compensated later, and 2) may be used as the observed quantity. Therefore, in this embodiment, the following time difference detection algorithm is used.
[0028]
Step 1: Average time synchronization of the observation signals. Calculate the cross-correlation function between one observation signal and each of the other observation signals using the entire time interval, and roughly align the observation signals based on the average time differences found from the peaks. Step 2: Frame division of the observation signals. Select a frame length sufficiently large relative to D/c, and divide the observation signals into frames. Step 3: Single-sound detection and time difference detection for each frame. Calculate the normalized cross-correlation function between the observation signals for each frame; if the peak value exceeds a certain threshold, the frame is judged to contain a significant single sound, and the time difference between the observation signals is detected from the peak position. Each step is described in detail below.
[0029]
[B-1] Average time synchronization of the observation signals Suppose that L waveforms such as those shown in FIG. 2A, FIG. 2B, ... are obtained as the observation signals, and denote them wi(n) (1 ≤ i ≤ L). Further, assume that the sampling frequency of each observation signal is fi and that the number of sampling points has been made N in advance by appending zeros to the end of each observation signal. When these are superimposed and plotted, as in FIG. 3A, it is almost impossible to see the correspondence of the signal from each sound source between the observation signals. We therefore consider using the cross-correlation function between the observation signals, defined in the usual way for wi(n) and wj(n). Taking w1(n) as the reference observation signal, the cross-correlation function Rj1(m) of wj(n) and w1(n) (1 ≤ j ≤ L) is calculated, and with mj denoting its peak position, (mj − N − 1)/fj is the average time difference of wj(n) with respect to w1(n). Based on this, the observation signals are roughly aligned. The result is as shown in FIG. 3B, where the correspondence of each sound source signal between the observation signals becomes clear.
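The average time synchronization of this step can be sketched as follows; the circular shift used for alignment is a crude illustrative choice (a real implementation would pad and trim rather than wrap), and equal signal lengths are assumed:

```python
import numpy as np

def average_time_sync(signals):
    """Align each observation w_j to the reference w_1 using the peak of
    their cross-correlation (the average time difference), shifting each
    signal so that this difference becomes zero."""
    ref = signals[0]
    N = len(ref)
    aligned, lags = [ref.copy()], [0]
    for w in signals[1:]:
        R = np.correlate(w, ref, mode="full")  # lags -(N-1) .. (N-1)
        lag = int(np.argmax(R)) - (N - 1)      # average time difference in samples
        lags.append(lag)
        aligned.append(np.roll(w, -lag))       # roughly undo the delay
    return aligned, lags
```

Dividing each lag by the sampling frequency gives the average time difference in seconds.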
[0030]
[B-2] Frame division of the observation signals Next, in order to detect the time difference between the microphones for each sound source, the signals of FIG. 3B are divided equally into frames. As described above, if the maximum distance between a microphone and a sound source is D, the arrival time difference between two microphones observing a given sound source does not exceed D/c. Therefore, the frame length needs to be larger than D/c. FIG. 4 shows the observation signals divided equally into frames with a frame length selected with this in mind. Assuming that each observation signal is divided into Q frames, the q-th (1 ≤ q ≤ Q) frame of wi(n) is denoted wi^(q)(n).
[0031]
[B-3] Single-sound detection and time difference detection for each frame In order to detect frames containing only a single sound, the normalized cross-correlation function is calculated for all pairs of observation signals in each frame. That is, for q = 1, 2, ..., Q, with the mean of wi^(q)(n) denoted w̄i^(q), the normalized cross-correlation function of the mean-removed frames is calculated.
[0032]
If wi^(q)(n) and wj^(q)(n) contain only a single sound, Rij^(q)(m) has a sharp peak as shown in FIG. 5. Conversely, if the frame contains no single sound, or contains multiple sounds, the peak is dull. Therefore, in order to detect frames containing a single sound, it is appropriate to make the judgment based on the peak value. A threshold I (0 < I < 1) on the peak value pij^(q) is set; then, if pij^(q) > I for any i, j, the q-th frame is detected as a frame containing only a single sound.
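A sketch of the per-frame single-sound test (mean-removed, normalized cross-correlation of a frame pair; the threshold value itself is application-dependent and the function name is illustrative):

```python
import numpy as np

def frame_peak(wi_q, wj_q):
    """Return (peak value, lag in samples) of the normalized
    cross-correlation of two mean-removed frames.  The lag is the delay
    of the first frame relative to the second; a peak near 1 indicates a
    clean single sound, a dull (low) peak silence or overlapping sounds."""
    a = wi_q - wi_q.mean()
    b = wj_q - wj_q.mean()
    R = np.correlate(a, b, mode="full")
    R = R / (np.sqrt(np.sum(a ** 2) * np.sum(b ** 2)) + 1e-12)
    m = int(np.argmax(R))
    return R[m], m - (len(b) - 1)
```

A frame is then kept for time difference detection when the peak value exceeds the threshold I (0 < I < 1).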
[0033]
Next, consider detecting the time difference for a frame in which a single sound has been detected. Examining three points near the peak of Rij^(q)(m) in order to detect the time difference more accurately, the situation is generally as shown in the left diagram of FIG. 7. In the neighborhood of the peak, an approximation by the second-order terms of the Taylor expansion is sufficient, so, as shown in the right diagram of FIG. 7, a quadratic function fij^(q)(m) passing through these three points is fitted. Its vertex mij^(q) is an estimate of the true m giving the peak of Rij^(q)(m), and (mij^(q) − N/Q − 1)/fi is the arrival time difference at microphone i relative to microphone j for the sound source contained in the q-th frame. The time difference can therefore be detected by setting this value as τqij.
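The three-point quadratic fit of this step can be sketched as follows, returning the sub-sample offset of the vertex relative to the discrete peak index:

```python
def parabolic_offset(y_prev, y_peak, y_next):
    """Vertex offset of the parabola through (-1, y_prev), (0, y_peak),
    (1, y_next): a second-order Taylor approximation around the discrete
    peak of the cross-correlation function."""
    denom = y_prev - 2.0 * y_peak + y_next
    if denom == 0.0:
        return 0.0   # degenerate (flat) case: keep the discrete peak
    return 0.5 * (y_prev - y_next) / denom
```

The interpolated peak position is then the discrete peak index plus this offset, giving the sub-sample resolution used for the time difference.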
[0034]
[C] Derivation of the Iterative Solution [C-1] Setting of the evaluation function Consider determining the unknown parameters Θ = {si, rn, tn | 1 ≤ i ≤ K, 1 ≤ n ≤ L} by minimizing the squared error of equation (3), the theoretical expression. That is, the position vectors s of the sound sources, the position vectors r of the microphones, and the time origins (recording start times) t of the microphones, all of which are unknown, are estimated using the observed quantities τimn (the arrival time difference at microphone m relative to microphone n with respect to sound source i).
[0035]
[C-2] Auxiliary Function Method In this embodiment, an optimization method called the auxiliary function method is used in order to efficiently obtain a solution that minimizes equation (11). For the evaluation function J(Θ), J+(Θ, Θ+) is defined to be an auxiliary function of J(Θ), with auxiliary variables Θ+, when J(Θ) is recovered by minimizing J+(Θ, Θ+) over Θ+. Then the following holds. (Theorem 1) The evaluation function J(Θ) can be monotonically decreased by alternately repeating the step of minimizing the auxiliary function J+(Θ, Θ+) with respect to Θ+ and the step of minimizing it with respect to Θ (see FIG. 8). For details of the auxiliary function method, reference can be made, for example, to the following document: H. Kameoka, N. Ono, and S. Sagayama, "Auxiliary functional approach to parameter estimation of constrained sinusoidal model for monaural speech separation," Proc. ICASSP, pp. 2932, 2008.
[0036]
[C-3] Decomposition of the unknowns by the auxiliary function method εimn includes terms whose subscripts differ with respect to rn and tn. The following theorem is used to decouple them. (Theorem 2) Under the constraint that a1 + ... + aN = B, the following inequality holds, with equality in the indicated case. According to Theorem 1, the following is then considered as an auxiliary function of J(Θ): J ≤ J1, with equality when the condition below holds. Further, by introducing the variables μimn^m and μimn^n, J1 can be rewritten accordingly. μimn^m and μimn^n are, so to speak, target values of |si − rm| and |si − rn| at the next update, and correspond to dividing the error εimn contained in the current estimate equally and correcting each value by that amount. In this case, the equality condition is as follows.
[0037]
[C-4] Auxiliary function for the absolute value function J1 can be solved analytically for tn, but this is difficult for si and rn because they still appear inside absolute value symbols. Therefore, in order to replace these terms with an easily differentiable form, attention is paid to the following theorem. (Theorem 3) The following holds for any vector x, any unit vector e, and any nonnegative real number a, with equality if and only if a = 0 or e = x/|x|. Applying this to the right side of equation (25), the resulting bound holds, so it can be regarded as an auxiliary function of J1(Θ, μ): J1 ≤ J2, with equality under the condition stated above. Since J2 can also be solved analytically for si and rn, the desired auxiliary function J2 of J is obtained.
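One form of Theorem 3 consistent with the stated equality condition (a = 0 or e = x/|x|) is the bound (|x| − a)² ≤ |x − a e|², which replaces the absolute value by a quadratic in x; this form is an assumption reconstructed here, and it can be checked numerically:

```python
import numpy as np

rng = np.random.default_rng(0)
# Check (|x| - a)^2 <= |x - a*e|^2 for random x, random unit e, a >= 0.
for _ in range(1000):
    x = rng.normal(size=3)
    e = rng.normal(size=3)
    e /= np.linalg.norm(e)            # random unit vector
    a = abs(rng.normal())             # nonnegative real number
    lhs = (np.linalg.norm(x) - a) ** 2
    rhs = np.linalg.norm(x - a * e) ** 2
    assert lhs <= rhs + 1e-12

# Equality case: e = x/|x|.
x = np.array([1.0, 2.0, 2.0])         # |x| = 3
e = x / np.linalg.norm(x)
a = 1.5
assert abs((np.linalg.norm(x) - a) ** 2
           - np.linalg.norm(x - a * e) ** 2) < 1e-12
```

The right-hand side is quadratic in x (hence in si and rn), which is what allows J2 to be minimized analytically.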
[0038]
[C-5] Derivation of the iterative update formulas For si and rn, the update formulas are derived by partially differentiating J2 and setting the derivatives to zero, yielding the respective update equations.
[0039]
For tn, partial differentiation of the expression of J1 in equation (20), together with adding equations (23) and (24) side by side, yields an iterative update equation.
[0040]
Here, denoting the parameters si, rn, tn after the p-th iteration by si^(p), rn^(p), tn^(p), the iterative update formulas obtained in the above discussion are summarized as follows.
[0041]
[C-6] Parameter estimation algorithm Regarding the order of the parameter calculations, the following are repeated: 1) calculate the auxiliary variables μ, 2) update t, 3) calculate the auxiliary variables e, 4) update s and r. More specifically, the parameter estimation is calculated in the following order, where ^(p) denotes the p-th result of the iterative calculation. Step 1: Calculate εimn^(p) according to equation (47). Step 2: Calculate μimn^m(p) and μimn^n(p) by equations (45) and (46). Step 3: Update tn^(p+1) by equation (43). Step 4: Calculate eim^(p) and ein^(p) by equation (44). Step 5: Update si^(p+1) by equation (41). Step 6: Update rn^(p+1) by equation (42). Step 7: Return to Step 1.
[0042]
[D] Evaluation Experiment [D-1] The results of a basic experiment conducted to verify whether simultaneous estimation of the sound source positions, the microphone positions, and the time origins is possible by minimizing equation (11) are shown. Assuming a room of 10 × 10 × 10 [m³], spherical wave propagation in a nearly anechoic environment was simulated on a computer. The number of sound sources was 8, the number of microphones was 9, and the positions were determined by random numbers. The sound source signals were recordings of single hand claps, under conditions in which each single source could be observed without overlapping the others. The sampling frequency was 44,100 Hz, the signal length was 5.0 s, and a random time difference within 1.0 s was given to each observation signal as the deviation of the time origin. The observation signals obtained by simulation were roughly aligned, then divided into frames with a frame length of 100 ms (> D/c ≈ 50 ms), and the time differences were detected from the frames containing significant acoustic signals. The initial value of each parameter was given by random numbers, and estimation by the iterative update formulas was performed. The number of iterations was 60,000.
[0043]
[D-2] FIG. 9 and FIG. 10 plot the xy coordinates of the position estimates of the microphones and the sound sources, respectively. It can be seen that the microphone positions and the sound source positions are estimated almost correctly from observation signals whose time origins are unknown. In addition, the standard deviation of the estimation error of the time origins was confirmed to be 1.0 [ms], so the time origins were also estimated approximately correctly.
[0044]
[E] Consideration of the relationship between the number of sound sources and the number of microphones In the above embodiment, it was argued that the spatial positions of the sound sources and the microphones and the recording time origin of each recording device can all be estimated simultaneously, using only the observation signals, from a state in which all of them are unknown. In a practical environment, some of these unknowns may be known, and the necessary conditions for the estimation of the present invention are discussed below for several cases.
[0045]
[E-1] Case 1 (classification by the heights of the sound sources and the microphones) Consider case classification based on height information of the sound sources and the microphones. Cases in which the heights of the sound sources and the microphones are equal, as in a meeting, can readily arise and are therefore considered practically relevant. With K the number of sound sources and L the number of microphones, the cases are summarized below.
[0046]
[E-2] Case 2 (when stereo microphones are used) The case of using one or more sets of stereo microphones, such as IC recorders, is also considered. Most IC recorders and PC built-in microphones are stereo microphones, so this case is considered very practical, and the conditions can be relaxed significantly. When q sets of stereo microphones are used, the distance between the two microphones of each stereo pair, such as in an IC recorder, is known in advance, and the two microphones are time-synchronized with each other, so two degrees of freedom can be removed per pair. The cases are summarized below.
[0047]
The present invention can be used as a basic technology for configuring independent recording devices, such as PC built-in microphones and IC recorders, as a microphone array. More specifically, it can be applied to sound source localization, sound source separation, and noise suppression by a microphone array. The invention also leads to new applications such as acoustic security and monitoring coupled with networking. More specific examples of sound source localization applications include GPS-like positioning systems and the localization of gunfire, explosions, and the like.