Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2016194657
Abstract: To effectively perform sound source separation even in a distributed microphone array environment. SOLUTION: A process of estimating a sound source presence posterior probability for each sound source at each microphone node based on the microphone node observation signals and correction information, a process of modeling the co-occurrence relation of the sound source presence posterior probabilities and estimating the parameters of the model so that the co-occurrence becomes large, and a process of calculating correction information that corrects the sound source presence posterior probability based on the parameters are repeated until the output values converge; finally, the sound source signal is estimated by filtering the microphone node observation signals using the sound source presence posterior probability or the update information. [Selected figure] Figure 1
Sound source separation device, sound source separation method and sound source separation
program
[0001]
The present invention relates to a sound source separation device, a sound source separation
method, and a sound source separation program.
[0002]
When an acoustic signal is picked up in an environment where there are multiple target sound
sources, mixed signals in which the target signals overlap each other are often observed.
At this time, if the target sound source of interest is a voice signal, the intelligibility of the target
sound is greatly reduced due to the influence of the other sound source signals superimposed on
the target signal.
[0003]
In addition, when the observed signal contains other sound source signals superimposed on the target voice signal (hereinafter, the target signal), it becomes difficult to accurately extract the properties of the target signal from the observation signal, and the recognition rate of speech recognition systems is also significantly reduced. Therefore, in order to prevent a decrease in the recognition rate, it is necessary to separate the plurality of target signals and recover the clarity of each target signal.
[0004]
The elemental technology that separates a plurality of target signals can be used in various acoustic signal processing systems: for example, a hearing aid that improves intelligibility by extracting a target signal from sound collected in a real environment, a TV conference system that improves speech intelligibility in the same way, a speech recognition system used in a real environment, a machine-human interaction device in a machine control interface, and the like.
[0005]
FIG. 7 shows the functional configuration of a conventional sound source separation apparatus (see, for example, Non-Patent Document 1); its operation will be briefly described. FIG. 7 is a diagram showing a conventional sound source separation apparatus. As shown in FIG. 7, the sound source separation device 50 includes an all-microphone-common sound source presence posterior probability estimation unit 51 and a filtering unit 52.
[0006]
The all-microphone-common sound source presence posterior probability estimating unit 51 receives observation signals of a plurality of channels, in which sound source signals emitted from a plurality of sound sources are collected by a plurality of microphones, calculates a feature vector characterizing each time-frequency bin of the observation signals, and classifies the feature vectors to calculate the sound source presence posterior probability, which is the presence probability of each sound source. The filtering unit 52 recovers each sound source signal by multiplying the observation signals of the plurality of channels collected by the plurality of microphones by the presence probability.
[0007]
H. Sawada, S. Araki, and S. Makino, “Underdetermined Convolutive Blind Source Separation via
Frequency Bin-Wise Clustering and Permutation Alignment,” IEEE Trans. Audio, Speech and
Lang. Process., Vol. 19, pp. 516-527, March 2011.
[0008]
However, the conventional sound source separation technology assumes that all the microphones are densely arranged, and does not assume a situation in which the microphones are spatially distributed (hereinafter, the distributed microphone array environment). That is, when a plurality of microphone nodes are arranged in a spatially widely dispersed form, the sound pressure of a certain sound source observed at each microphone node is not equal. In the extreme case, a certain sound source may be substantially unobservable at certain microphone nodes. In such a situation, it is appropriate to assume a different sound source presence probability (activity pattern) at each microphone node. Here, a microphone node refers to a microphone array composed of two or more microphones; for example, an IC recorder having a plurality of microphones corresponds to one microphone node.
[0009]
However, the conventional method, using all the observations obtained at all the microphone nodes at the recording site, can only calculate a sound source presence probability common to all the microphone nodes. The sound source presence probability could be calculated separately for each microphone node by applying the conventional method independently to each node, but in that case the beneficial information that exists between the microphone nodes is not effectively used; as a result, there is a problem that effective sound source separation cannot be performed in the distributed microphone array environment.
[0010]
The present invention has been made in view of such problems, and has an object of effectively
performing sound source separation even in a distributed microphone array environment.
[0011]
The sound source separation apparatus according to the present invention includes: a microphone-node-specific sound source presence posterior probability estimation unit that estimates the sound source presence posterior probability of each sound source at each microphone node based on microphone node observation signals, which are observation signals of a plurality of channels obtained by collecting sound source signals emitted from a plurality of sound sources with a plurality of microphones, and that updates the sound source presence posterior probability based on update information, which is information for updating the sound source presence posterior probability; an inter-microphone-node sound source presence posterior probability co-occurrence pattern detection unit that, assuming that the sound source presence posterior probabilities of each sound source co-occur between the microphone nodes, models the co-occurrence relation of the sound source presence posterior probabilities, estimates the parameters of the model so that the co-occurrence of the sound source presence posterior probabilities of the microphone nodes becomes large, and calculates the update information based on the parameters; a convergence determination unit that repeatedly executes the update of the sound source presence posterior probability in the microphone-node-specific sound source presence posterior probability estimation unit and the calculation of the update information in the inter-microphone-node sound source presence posterior probability co-occurrence pattern detection unit until the sound source presence posterior probability or the parameters converge; and an output sound estimation unit that estimates the sound source signal of each of the sound sources by filtering the microphone node observation signals using the sound source presence posterior probability or the update information.
[0012]
A sound source separation method according to the present invention is a sound source separation method executed by a sound source separation device, and includes: a microphone-node-specific sound source presence posterior probability estimation step of estimating the sound source presence posterior probability of each sound source at each microphone node based on microphone node observation signals, which are observation signals of a plurality of channels obtained by collecting sound source signals emitted from a plurality of sound sources with a plurality of microphones, and of updating the sound source presence posterior probability based on update information, which is information for updating the sound source presence posterior probability; an inter-microphone-node sound source presence posterior probability co-occurrence pattern detection step of, assuming that the sound source presence posterior probabilities of each sound source co-occur in the same time-frequency bin between the microphone nodes, modeling the co-occurrence relation of the sound source presence posterior probabilities, estimating the parameters of the model so that the co-occurrence of the sound source presence posterior probabilities of the microphone nodes becomes large, and calculating the update information based on the parameters; a convergence determination step of repeatedly executing the update of the sound source presence posterior probability in the microphone-node-specific sound source presence posterior probability estimation step and the calculation of the update information in the inter-microphone-node sound source presence posterior probability co-occurrence pattern detection step until the sound source presence posterior probability or the parameters converge; and an output sound estimation step of estimating the sound source signal of each of the sound sources by filtering the microphone node observation signals using the sound source presence posterior probability or the update information.
[0013]
According to the present invention, sound source separation can be performed effectively even in
a distributed microphone array environment.
[0014]
FIG. 1 is a diagram showing an outline of a configuration of a sound source separation device
according to the embodiment.
FIG. 2 is a block diagram showing a detailed configuration of the sound source separation device
according to the embodiment.
FIG. 3 is a diagram showing an acoustic environment in which the sound source separation
device according to the embodiment is used.
FIG. 4 is a diagram showing the sound source separation performance of the sound source
separation device according to the embodiment.
FIG. 5 is a flowchart showing processing of the sound source separation device according to the
embodiment. FIG. 6 is a diagram showing a computer that executes a sound source separation
program. FIG. 7 is a diagram showing a conventional sound source separation apparatus.
[0015]
Hereinafter, embodiments of a sound source separation device, a sound source separation
method, and a sound source separation program according to the present application will be
described in detail based on the drawings. Note that the sound source separation device, the
sound source separation method, and the sound source separation program according to the
present application are not limited by this embodiment. First, modeling of the observed signal
will be described.
[0016]
[Modeling of Observation Signal] In modeling the observation signal, the variables are first defined. I is the total number of microphone nodes; J is the number of clusters at each microphone node; K is the number of sound sources (J = K in this specification, but J and K may be different values); x_i is the observation feature of the i-th microphone node; x is the set of x_i collecting the observation features of all the microphone nodes; n_{i,j} is a binary variable representing the activity of the sound source corresponding to the j-th cluster of the i-th microphone node (1 means that the sound source is active, 0 means that it is not); n is the set of n_{i,j}; a_k is a binary variable representing latent sound source activity common to all the microphone nodes (1 means that the sound source is active, 0 means that it is not); and a is the set of a_k.
[0017]
Since all the processes in the following description are performed independently for each frequency bin, the frequency index is omitted for simplicity. When conventional clustering-based source separation (see, for example, Non-Patent Document 1) is applied to the i-th microphone node observation signal x_i (x_i corresponds to a normalized observation vector), the microphone node observation signal x_i is represented by a mixture-distribution-type stochastic model as shown in Equation (1).
[0018]
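The equation itself did not survive the text extraction; a plausible reconstruction of Equation (1) from the surrounding definitions (a mixture model over the J clusters, not verified against the original patent) is:

\[ p(x_i; \theta^{(n_i)}) = \sum_{j=1}^{J} p(n_{i,j}=1)\, p\big(x_i \mid n_{i,j}=1; \theta^{(n_i)}\big) \tag{1} \]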
[0019]
At this time, p(n_{i,j}) in Equation (1) represents the prior probability that the j-th sound source becomes active at the i-th node. Further, p(x_i | n_{i,j}; θ^{(n_i)}) in Equation (1) represents a distribution such as the Watson distribution, and θ^{(n_i)} is the parameter of that distribution (in the case of the Watson distribution it corresponds to the mean direction parameter and the concentration parameter; in the case of the Gaussian distribution it corresponds to the mean, the variance, and so on). The posterior p(n_{i,j} | x_i) obtained after adjusting the distribution parameter to maximize the likelihood represented by this equation is the sound source presence posterior probability of the j-th sound source at the i-th node obtained when information from microphone nodes other than the i-th is not used.
[0020]
On the other hand, in the embodiment, the probability model of the observation signal x (that is,
the likelihood p (x; θ) regarding the observation signal) is expressed as Expression (2).
[0021]
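A plausible reconstruction of Equation (2), inferred from the three stages described below (the last step uses the independence of the x_i):

\[ p(x; \theta) = \sum_{n}\sum_{a} p(x, n, a; \theta) = \sum_{n}\sum_{a} p(x \mid n)\, p(n, a; \theta^{(w)}) = \sum_{n}\sum_{a} p(n, a; \theta^{(w)}) \prod_{i=1}^{I} p(x_i \mid n_i; \theta^{(n_i)}) \tag{2} \]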
[0022]
The third stage of Equation (2) is obtained under the assumption that the observation values x_i of the microphone nodes are independent. As Equation (2) shows, the present invention newly adds, to the prior probability p(n, a; θ^{(w)}) indicating the activity of the sound sources, the variable a representing latent sound source activity common to all the microphone nodes, so that the prior probability is represented by the joint probability of the sound source activity information n at each node and the latent sound source activity information a common to all nodes.
[0023]
The prior probability p(n, a; θ^{(w)}) indicating the activity of the sound sources can take various forms; here, so as to obtain a model focusing on the co-occurrence of the sound source activities between the microphone nodes (that is, the co-occurrence of n_{1,j}, n_{2,j}, ...), it is expressed in the form of a restricted Boltzmann machine (RBM) as shown in Equations (3) to (5).
[0024]
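A plausible reconstruction of Equations (3) to (5), assuming the standard restricted Boltzmann machine parameterization with the parameters {W_i, b_i, c} named below:

\[ p(n, a; \theta^{(w)}) = \frac{1}{Z} \exp\{-E(n, a)\} \tag{3} \]
\[ E(n, a) = -\sum_{i=1}^{I} n_i^{\top} W_i a - \sum_{i=1}^{I} b_i^{\top} n_i - c^{\top} a \tag{4} \]
\[ Z = \sum_{n, a} \exp\{-E(n, a)\} \tag{5} \]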
[0025]
θ^{(w)} in Equation (3) represents the parameters {W_i, b_i, c} used in the RBM. The restricted Boltzmann machine, used for example in collaborative filtering, is a model capable of capturing the co-occurrence of signals between nodes (in the embodiment, the sound source presence posterior probabilities between the microphone nodes). In an RBM, one generally defines the posterior probability of the value a_k in the hidden layer given the input n to the input layer, and the posterior probability of the value n in the input layer given the value a_k in the hidden layer, and uses them in the parameter estimation algorithm. These posterior probabilities are defined as in Equations (6) to (8).
[0026]
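A plausible reconstruction of Equations (6) to (8): (6) and (7) are the standard RBM conditionals, and (8) is inferred from its later use together with the observation likelihood p(x_i | n_{i,j}); all three are assumptions consistent with the surrounding text:

\[ p(a_k = 1 \mid n) = \sigma\Big(c_k + \sum_{i=1}^{I} n_i^{\top} [W_i]_{:,k}\Big), \qquad \sigma(u) = \frac{1}{1 + e^{-u}} \tag{6} \]
\[ p(n_{i,j} = 1 \mid a) = \sigma\big(b_{i,j} + [W_i]_{j,:}\, a\big) \tag{7} \]
\[ p(n_{i,j} = 1 \mid a, x) = \frac{p(x_i \mid n_{i,j}=1)\, p(n_{i,j}=1 \mid a)}{\sum_{\nu \in \{0,1\}} p(x_i \mid n_{i,j}=\nu)\, p(n_{i,j}=\nu \mid a)} \tag{8} \]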
[0027]
Before the detailed description of the embodiments, an overview of the idea of the invention will
be given.
The present invention calculates the sound source presence posterior probability, which serves as a filter for sound source separation, at each of the microphone nodes. In the prior art, information from other microphone nodes could not be incorporated into the calculation of this value.
[0028]
In the proposed method, by contrast, the microphone nodes exchange information: if a sound source activity pattern that co-occurs with the sound source activity observed at a certain microphone node is observed at another microphone node, the parameter estimation in the inter-microphone-node sound source presence posterior probability co-occurrence pattern detection unit 12 proceeds so that this co-occurrence becomes large. As a result, if a certain sound source is observed by a plurality of microphone nodes, the parameters are adjusted so that the presence posterior probabilities of that sound source increase their mutual co-occurrence, and more accurate estimation becomes possible.
[0029]
For example, suppose that in the same time-frequency bin of microphone nodes 1, 2 and 3 the posterior probability for a certain sound source usually co-occurs. Under such circumstances, suppose that in a certain time-frequency bin a co-occurrence relation is confirmed only for microphone nodes 1 and 2 with respect to that sound source, and no co-occurrence occurs at microphone node 3. Then the estimated value at this time-frequency bin of microphone node 3 has a high probability of being erroneous.
[0030]
Such errors are eliminated by estimating the parameters so as to increase the co-occurrence of the presence posterior probability for this source at microphone nodes 1, 2 and 3. Conversely, if the same sound source is active only at microphone node 1 and not active at microphone nodes 2 and 3, it is highly likely that the sound source is in fact not active in that time-frequency bin. In such a case as well, the co-occurrence of "not active" is increased, whereby the error of microphone node 1 is corrected. Specific procedures for learning the parameters so as to increase the co-occurrence of the sound source presence posterior probabilities between the microphone nodes will be described in detail in the description of the embodiment.
[0031]
[Embodiment] The configuration of the sound source separation apparatus according to the embodiment will be described with reference to FIG. 1. FIG. 1 is a diagram showing an outline of the configuration of the sound source separation device according to the embodiment. The sound source separation device 10 includes a microphone-node-specific sound source presence posterior probability estimation unit 11, an inter-microphone-node sound source presence posterior probability co-occurrence pattern detection unit 12, a convergence determination unit 13, and an output sound estimation unit 14.
[0032]
As shown in FIG. 1, the sound source separation device 10 receives, as input, observation signals
of a plurality of channels in which sound source signals emitted from a plurality of sound sources
are collected by a plurality of microphones. In the following description of the processing in the sound source separation device 10, these input observation signals of a plurality of channels may be referred to as microphone node observation signals.
[0033]
The microphone-node-specific sound source presence posterior probability estimation unit 11 estimates the sound source presence posterior probability of each sound source at each microphone node based on the microphone node observation signals, which are observation signals of a plurality of channels in which sound source signals emitted from a plurality of sound sources are collected by a plurality of microphones, and updates the sound source presence posterior probability based on update information, which is information for updating the sound source presence posterior probability.
[0034]
For example, based on the observation feature amount x_i in time frame t of the microphone node observation signals, in which the sound source signals emitted from the plurality of sound sources are collected, the microphone-node-specific sound source presence posterior probability estimation unit 11 estimates p(x_i | n_i; θ^{(n_i)}) of Equation (2) and thereby estimates the posterior probability p(n_{i,j} | x_i) of the j-th sound source at the i-th microphone node, which is the sound source presence posterior probability. It then re-estimates p(x_i | n_i; θ^{(n_i)}) based on the update information so that the likelihood p(x; θ) of the observed signal is maximized, and thus updates the posterior probability p(n_{i,j} | x_i) of the j-th sound source at the i-th microphone node.
[0035]
The inter-microphone-node sound source presence posterior probability co-occurrence pattern detection unit 12 assumes that the sound source presence posterior probabilities of each sound source co-occur in the same time-frequency bin between the microphone nodes, models the co-occurrence relation of the sound source presence posterior probabilities, estimates the parameters of the model so that the co-occurrence of the sound source presence posterior probabilities of the microphone nodes becomes large, and calculates the update information based on the parameters.
[0036]
For example, using the set of sound source presence posterior probabilities p(n_{i,j} | x_{i,t}) for all i, all j and all time frames t, the inter-microphone-node sound source presence posterior probability co-occurrence pattern detection unit 12 estimates θ^{(w)}, the parameters of the model representing the co-occurrence of the sound source presence posterior probabilities, so that the prior probability p(n, a; θ^{(w)}), which is the joint probability of the sound source activity information n at each microphone node and the latent sound source activity information a common to all the nodes, is maximized, and calculates the update information.
[0037]
The convergence determination unit 13 causes the update of the sound source presence posterior probability in the microphone-node-specific sound source presence posterior probability estimation unit 11 and the calculation of the update information in the inter-microphone-node sound source presence posterior probability co-occurrence pattern detection unit 12 to be repeated until the sound source presence posterior probability or the parameters converge.
[0038]
The output sound estimation unit 14 estimates the sound source signal of each sound source by
filtering the microphone node observation signal using the sound source presence posterior
probability or the update information.
[0039]
Next, each part of the sound source separation device 10 will be described in detail with
reference to FIG.
FIG. 2 is a block diagram showing a detailed configuration of the sound source separation device
according to the embodiment.
The sound source separation device 10 receives microphone node observation signals from a
plurality of microphone nodes 20, estimates a sound source image of each sound source, and
outputs the image to the output device 21 or the like.
The sound source separation device 10 may output the estimated sound source image to an
output device such as a speaker, or may output and store the estimated sound source image in a
storage device or the like.
[0040]
As shown in FIG. 2, the sound source separation apparatus 10 includes the microphone-node-specific sound source presence posterior probability estimation unit 11, the inter-microphone-node sound source presence posterior probability co-occurrence pattern detection unit 12, the convergence determination unit 13, the output sound estimation unit 14, an input unit 15, and an output unit 16. The microphone-node-specific sound source presence posterior probability estimation unit 11 has a first sound source presence posterior probability initial value calculation unit 111 and a first sound source presence posterior probability update unit 112. The inter-microphone-node sound source presence posterior probability co-occurrence pattern detection unit 12 includes a co-occurrence relation model parameter calculation unit 121 and a second sound source presence posterior probability calculation unit 122.
[0041]
First, observation signals in which sound source signals emitted from a plurality of sound sources are collected by a plurality of microphone nodes 20 are input to the input unit 15. Then, using these observation signals, the first sound source presence posterior probability initial value calculation unit 111 calculates the first sound source presence posterior probability, which is the probability that each sound source is active, obtainable using only the information obtained from each microphone node individually.
[0042]
Next, the co-occurrence relation model parameter calculation unit 121 models the co-occurrence relation between the first sound source presence posterior probabilities of the microphone nodes and calculates the parameters of the model so as to increase the co-occurrence; if already-calculated parameters exist, it updates them to the latest values.
[0043]
Furthermore, using these parameters, the second sound source presence posterior probability calculation unit 122 calculates a second sound source presence posterior probability, which is a sound source presence posterior probability that uses the information obtained from the plurality of microphone nodes. Then, the first sound source presence posterior probability update unit 112 updates the first sound source presence posterior probability using the second sound source presence posterior probability.
[0044]
Here, the convergence determination unit 13 determines whether the update amounts in the first sound source presence posterior probability update unit 112 and the co-occurrence relation model parameter calculation unit 121 are equal to or less than a predetermined threshold; if not, the processing in the first sound source presence posterior probability update unit 112 and the co-occurrence relation model parameter calculation unit 121 is repeatedly executed until the update amounts become equal to or less than the threshold.
[0045]
Finally, when the convergence determination unit 13 determines that the update amounts are equal to or less than the predetermined threshold, the output sound estimation unit 14 filters the observation signals using the second sound source presence posterior probability and estimates the sound source image of each sound source. The processing in each unit will be described below.
[0046]
[Processing in Microphone-Node-Specific Sound Source Presence Posterior Probability Estimation Unit 11 (Calculation of Initial Values)] First, the processing in the microphone-node-specific sound source presence posterior probability estimation unit 11 before the processing in the inter-microphone-node sound source presence posterior probability co-occurrence pattern detection unit 12 will be described. At this stage, the microphone-node-specific sound source presence posterior probability estimation unit 11 calculates the initial value of the first sound source presence posterior probability. The process of updating the first sound source presence posterior probability using the update information output from the inter-microphone-node sound source presence posterior probability co-occurrence pattern detection unit 12 will be described later.
[0047]
First, using Equation (1), the first sound source presence posterior probability initial value calculation unit 111 of the microphone-node-specific sound source presence posterior probability estimation unit 11 calculates the presence posterior probability p(n_{i,j} | x_i) of the j-th sound source at the i-th microphone node from the observed feature amount x_i, in which the sound source signals emitted from the plurality of sound sources are collected by the i-th microphone node. Specifically, the first sound source presence posterior probability initial value calculation unit 111 estimates the distribution parameter θ^{(n_i)} by maximum likelihood estimation so as to maximize the value of Equation (1), and calculates the initial value. It is known that maximum likelihood estimation of the mixture distribution parameters of Equation (1) can be performed with an expectation-maximization algorithm, in the course of which p(n_{i,j} | x_i) is calculated.
[0048]
[Processing in Inter-Microphone-Node Sound Source Presence Posterior Probability Co-occurrence Pattern Detection Unit 12] Next, the co-occurrence relation model parameter calculation unit 121 of the inter-microphone-node sound source presence posterior probability co-occurrence pattern detection unit 12 uses the set of first sound source presence posterior probabilities obtained as described above, that is, p(n_{i,j} | x_{i,t}) for all i (microphone node index), all j (cluster index at each microphone node) and all time frames t (the time frame index t is added to x), to model (learn) the co-occurrence relation of the posterior probabilities. Specifically, the co-occurrence relation model parameter calculation unit 121 learns the RBM parameters {W_i, b_i, c} appearing in Equation (4) and elsewhere so that p(n, a; θ^{(w)}) is maximized. This learning is generally performed by the steepest descent method using contrastive divergence (Reference 1: G. E. Hinton, "A practical guide to training restricted Boltzmann machines," Univ. of Toronto, Toronto, ON, Canada, Tech. Rep., 2010). Here, the gradients with respect to the parameters {W_i, b_i, c} to be estimated by the steepest descent method are calculated by Equations (9) to (11).
[0049]
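A plausible reconstruction of the gradient Equations (9) to (11), assuming the standard contrastive-divergence statistics with n̂, â from the data-driven phase and ñ, ã from the model-driven phase (see the procedures below):

\[ \frac{\partial \mathcal{L}}{\partial W_i} \approx \sum_{t} \big( \hat{n}_{t,i}\, \hat{a}_t^{\top} - \tilde{n}_{t,i}\, \tilde{a}_t^{\top} \big) \tag{9} \]
\[ \frac{\partial \mathcal{L}}{\partial b_i} \approx \sum_{t} \big( \hat{n}_{t,i} - \tilde{n}_{t,i} \big) \tag{10} \]
\[ \frac{\partial \mathcal{L}}{\partial c} \approx \sum_{t} \big( \hat{a}_t - \tilde{a}_t \big) \tag{11} \]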
[0050]
Then, after calculating the gradients, the co-occurrence relation model parameter calculation unit 121 updates each parameter according to Equations (12) to (14), following the usual steepest descent method.
[0051]
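Correspondingly, Equations (12) to (14) are presumably the usual steepest-ascent updates with step size μ:

\[ W_i \leftarrow W_i + \mu\, \frac{\partial \mathcal{L}}{\partial W_i} \tag{12} \]
\[ b_i \leftarrow b_i + \mu\, \frac{\partial \mathcal{L}}{\partial b_i} \tag{13} \]
\[ c \leftarrow c + \mu\, \frac{\partial \mathcal{L}}{\partial c} \tag{14} \]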
[0052]
Here, μ is the step size for the parameter update, and a relatively small value such as 0.0001 is used. Further, as described later, the co-occurrence relation model parameter calculation unit 121, under the control of the convergence determination unit 13, repeats the calculations of Equations (12) to (14) until the parameter update amount becomes sufficiently small. The quantities n̂ and ñ in Equations (9) to (11), which express the gradient calculation for each parameter, are computed as follows each time the calculation is repeated.
[0053]
[Calculation of n̂_{t,i,j}] <Procedure a1> First, p(n_{i,j} = 1 | x_i) is calculated using conventional clustering-based sound source separation or the like. <Procedure a2> Next, the initial value of n̂_{t,i,j} is sampled from p(n_{i,j} | x_{i,t}). A specific processing example of this sampling is as follows (see also the Python sketch after this list). First, for clusters 1 to J, the posterior probability that the observation feature quantity x_{t,i} of node i belongs to cluster j at time t is calculated; these posterior probabilities over clusters 1 to J sum to 1. Next, the interval from 0 to 1 is divided based on these posterior probabilities. For example, if the attribution posterior probabilities calculated for clusters 1, 2 and 3 are respectively 0.1, 0.7 and 0.2, the interval is divided into the sections [0.0, 0.1), [0.1, 0.8) and [0.8, 1.0], and each section is associated with its cluster. After that, one random number in the range of 0 to 1 is generated, and the section to which the random number belongs is detected. The n̂_{t,i,j} of the cluster corresponding to that section is set to 1, and the other n̂_{t,i,j} at the same microphone node are set to 0. <Procedure a3> The following (a3.1) and (a3.2) are repeated a predetermined number of times (once in this embodiment). (a3.1) Based on the current n̂_{t,i,j}, â_{t,k} is calculated using Equation (6). (a3.2) Based on â_{t,k}, x_{t,i} (the observed feature of microphone node i at time t) and the p(x_i | n_{i,j} = 1) estimated by the microphone-node-specific sound source presence posterior probability estimation unit 11, n̂_{t,i,j} is calculated using Equations (7) and (8). <Procedure a4> Equations (9) to (11) are evaluated using the n̂_{t,i,j} calculated in Procedure a3.
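As a concrete illustration of the sampling in Procedure a2, here is a minimal Python sketch of the interval-division sampling; the function name and array layout are illustrative, not part of the patent:

```python
import numpy as np

def sample_activity(posteriors, rng):
    """Sample a one-hot activity vector n_hat from cluster posteriors.

    posteriors: length-J array of attribution posteriors that sum to 1,
    e.g. [0.1, 0.7, 0.2] as in the text, giving the sections
    [0.0, 0.1), [0.1, 0.8) and [0.8, 1.0].
    """
    edges = np.cumsum(posteriors)                     # section boundaries
    u = rng.random()                                  # one uniform number in [0, 1)
    j = int(np.searchsorted(edges, u, side="right"))  # section containing u
    j = min(j, len(posteriors) - 1)                   # guard against rounding in cumsum
    n_hat = np.zeros(len(posteriors))
    n_hat[j] = 1.0                                    # selected cluster active, the rest 0
    return n_hat

rng = np.random.default_rng(0)
print(sample_activity(np.array([0.1, 0.7, 0.2]), rng))
```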
[0054]
[Calculation of ñ_{t,i,j}] <Procedure b1> The initial value of ñ_{t,i,j} is sampled from p(n_{i,j} | x_{i,t}) (the specific processing is the same as in Procedure a2). <Procedure b2> The following (b2.1) and (b2.2) are repeated a predetermined number of times (once in this embodiment). (b2.1) Based on the current ñ_{t,i,j}, ã_{t,k} is calculated using Equation (6). (b2.2) ñ_{t,i,j} is calculated from ã_{t,k} using Equation (7). <Procedure b3> Equations (9) to (11) are evaluated using the ñ_{t,i,j} calculated in Procedure b2. A combined sketch of both procedures and the parameter update follows.
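Putting Procedures a and b together, the following Python sketch performs one contrastive-divergence update in the sense of Equations (6), (7) and (9) to (14). It is a simplified sketch under stated assumptions: all nodes' activity vectors are concatenated into a single visible layer (one W instead of per-node W_i), the positive-phase samples are taken as given, and the observation-likelihood term of Equation (8) is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def cd1_update(W, b, c, n_pos, mu=1e-4):
    """One contrastive-divergence (CD-1) step for the RBM (a sketch)."""
    T = n_pos.shape[0]
    # Positive phase: sample a_hat from p(a | n) of Eq. (6).
    a_hat = (rng.random((T, c.size)) < sigmoid(n_pos @ W + c)).astype(float)
    # Negative phase: one Gibbs step back to the visible layer via Eq. (7).
    n_neg = (rng.random(n_pos.shape) < sigmoid(a_hat @ W.T + b)).astype(float)
    a_neg = sigmoid(n_neg @ W + c)   # hidden probabilities suffice here
    # Gradients of Eqs. (9)-(11): data statistics minus model statistics
    # (averaged over frames; the sum over t differs only by the factor 1/T).
    dW = (n_pos.T @ a_hat - n_neg.T @ a_neg) / T
    db = (n_pos - n_neg).mean(axis=0)
    dc = (a_hat - a_neg).mean(axis=0)
    # Steepest-ascent updates of Eqs. (12)-(14) with step size mu = 0.0001.
    return W + mu * dW, b + mu * db, c + mu * dc

# Toy usage: V visible units (the I*J concatenated activities), H hidden units.
T, V, H = 100, 6, 6
W = 0.01 * rng.standard_normal((V, H))
b, c = np.zeros(V), np.zeros(H)
n_pos = (rng.random((T, V)) < 0.5).astype(float)
W, b, c = cd1_update(W, b, c, n_pos)
```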
[0055]
Then, using the obtained parameters {W_i, b_i, c}, the second sound source presence posterior probability calculation unit 122 of the inter-microphone-node sound source presence posterior probability co-occurrence pattern detection unit 12 calculates, based on Equation (8), the second sound source presence posterior probability p(n_{i,j} = 1 | â_t, x_t), which is the update information.
[0056]
[Processing in Microphone-Node-Specific Sound Source Presence Posterior Probability Estimation Unit 11 (Processing After Initial Value Calculation)] The first sound source presence posterior probability update unit 112 in the microphone-node-specific sound source presence posterior probability estimation unit 11 updates the distribution parameters of p(x_i | n_i; θ^{(n_i)}) so as to maximize Equation (2), using the second sound source presence posterior probability p(n_{i,j} = 1 | â_t, x_t), which is the update information obtained by the inter-microphone-node sound source presence posterior probability co-occurrence pattern detection unit 12. An example of the update method is shown below.
[0057]
First, the first sound source presence posterior probability update unit 112 expresses p(x_i | n_i; θ^{(n_i)}) in Equation (2) as Equation (15).
[0058]
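A plausible reconstruction of Equation (15) as a generic exponential-family density, with sufficient statistic T, log-partition function A and base measure h (this generic form is an assumption):

\[ p(x_i \mid n_i; \theta^{(n_i)}) = h(x_i) \exp\big\{ \theta^{(n_i)\top} T(x_i) - A(\theta^{(n_i)}) \big\} \tag{15} \]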
[0059]
Equation (15) represents p(x_i | n_i; θ^{(n_i)}) as a member of the general exponential family of distributions. Here, the gradient with respect to θ^{(n_i)} of the logarithm of the likelihood of Equation (15) (the log-likelihood function) is as shown in Equation (16) below.
[0060]
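A plausible reconstruction of Equation (16), the gradient of the log-likelihood with respect to θ^{(n_i)} (an assumption consistent with the EM-style reasoning here):

\[ \frac{\partial \log p(x; \theta)}{\partial \theta^{(n_i)}} = \sum_{t} \sum_{n, a} p(n, a \mid x_t)\, \frac{\partial}{\partial \theta^{(n_i)}} \log p(x_{t,i} \mid n_i; \theta^{(n_i)}) \tag{16} \]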
[0061]
At this time, the first sound source presence posterior probability update unit 112 approximates p(n, a | x) as shown in Equation (17) below.
[0062]
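Equation (17), reconstructed from paragraph [0063] below, which states that its value is the time average of the second sound source presence posterior probability:

\[ p(n_{i,j} = 1 \mid x) \approx \frac{1}{T} \sum_{t=1}^{T} p(n_{i,j} = 1 \mid \hat{a}_t, x_t) \tag{17} \]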
[0063]
The value of Equation (17) corresponds to the second sound source presence posterior probability p(n_{i,j} = 1 | â_t, x_t), calculated based on Equation (8) at the final stage of the processing in the preceding inter-microphone-node sound source presence posterior probability co-occurrence pattern detection unit 12, averaged over all time frames t. Finally, the first sound source presence posterior probability update unit 112 sets up the following Equation (18) so that the gradient becomes 0, and solves it to calculate the value of θ^{(n_i)}.
[0064]
[0065]
Once the value of θ^{(n_i)} in Equation (18) has been calculated, the first sound source presence posterior probability p(n_{i,j} | x_i) can be calculated again; when this value is output to the inter-microphone-node sound source presence posterior probability co-occurrence pattern detection unit 12, that unit performs the update processing of the parameters {W_i, b_i, c} again.
[0066]
[Processing in Convergence Determination Unit 13] The convergence determination unit 13 repeatedly executes the processing in the first sound source presence posterior probability update unit 112, the co-occurrence relation model parameter calculation unit 121 and the second sound source presence posterior probability calculation unit 122, determines that convergence has occurred when the update amount of the parameter θ^{(n_i)} of the microphone-node-specific sound source presence posterior probability estimation unit 11 in Equation (18), or of the parameter θ^{(w)} of the inter-microphone-node sound source presence posterior probability co-occurrence pattern detection unit 12 in Equation (2), becomes equal to or less than a predetermined threshold, and performs control so as to end the repetition. Alternatively, it may be determined that convergence has occurred when the likelihood shown in Equation (2) has reached a sufficiently large value.
[0067]
[Evaluation Experiment] An evaluation experiment was conducted for the purpose of evaluating
the performance of the sound source separation device according to the embodiment.
The experimental conditions were as follows.
FIG. 3 shows the acoustic environment used for the simulation.
FIG. 3 is a diagram showing an acoustic environment in which the sound source separation
device according to the embodiment is used.
The size of the room was 10 m (W) x 5 m (D) x 5 m (H), and four reverberation-time conditions of 0.2, 0.4, 0.6 and 0.8 seconds were used. This acoustic environment was simulated using the image method (Reference 2: J. B. Allen and D. A. Berkley, "Image method for efficiently simulating small-room acoustics," J. Acoust. Soc. Am., Vol. 65 (4), pp. 943-950, 1979).
[0068]
Also, in order to simulate an environment with background noise, white noise was generated on a computer and added to the signal so that the signal-to-noise ratio would be 10 dB, creating the observation signals. There were six speakers: three of the six were seated equally spaced on a circle of 80 cm radius on the left side of the room, and the other three were seated equally spaced on a circle of 80 cm radius on the right side of the room, and all were assumed to be talking at the same time. This simulates the conversation situation in a meeting room or restaurant. As the sound collecting devices, as shown in FIG. 3, two microphone nodes, each consisting of three microphones, were assumed.
[0069]
The conventional method compared with the present invention is the method shown in Non-Patent Document 1, in which sound source separation using a soft mask is performed assuming a sound source presence posterior probability common to all the microphones. The signal-to-interference ratio (SIR), which indicates sound source separation performance, was used as the evaluation index; the larger the SIR value, the better the performance. As evaluation speech, a total of 20 different mixtures were prepared in each acoustic environment using utterances randomly extracted from the TIMIT database (Reference 3: W. Fisher, G. R. Doddington, and K. M. Goudie-Marshall, "The DARPA speech recognition research database: specifications and status," in Proc. DARPA Workshop on Speech Recognition, 1986, pp. 96-99), and the results were calculated as their average.
[0070]
FIG. 4 shows the results of the evaluation experiment. FIG. 4 is a diagram showing the sound source separation performance of the sound source separation device according to the embodiment. The horizontal axis represents the reverberation time, and the vertical axis represents the SIR value, that is, the sound source separation performance (dB). Under all reverberation conditions, the present invention achieved higher performance than the prior art. As described above, it was confirmed that the sound source separation device of the present invention can perform sound source separation effectively even in a distributed microphone array environment.
[0071]
[Processing in Output Sound Estimation Unit 14] If the convergence determination unit 13 determines that the update amount has converged, the output sound estimation unit 14 performs filtering using the second sound source presence posterior probability and estimates the sound source image of each sound source.
[0072]
[Flow of Processing of Embodiment] The flow of processing of the sound source separation device 10 according to the embodiment will be described with reference to FIG. 5. FIG. 5 is a flowchart showing the processing of the sound source separation device according to the embodiment. First, the microphone-node-specific sound source presence posterior probability estimation unit 11 calculates the initial value of the first sound source presence posterior probability (step S101). Next, the inter-microphone-node sound source presence posterior probability co-occurrence pattern detection unit 12 calculates the parameters modeling the co-occurrence relation of the first sound source presence posterior probabilities, updating any already-calculated parameters (step S102). Then, the inter-microphone-node sound source presence posterior probability co-occurrence pattern detection unit 12 calculates the second sound source presence posterior probability based on the calculated parameters (step S103).
[0073]
Here, when the convergence determination unit 13 determines that the update amounts are not equal to or less than the threshold (No in step S104), the microphone-node-specific sound source presence posterior probability estimation unit 11 updates the first sound source presence posterior probability based on the second sound source presence posterior probability (step S105). Then, the inter-microphone-node sound source presence posterior probability co-occurrence pattern detection unit 12 performs its processing again using the updated first sound source presence posterior probability.
[0074]
On the other hand, when the convergence determination unit 13 determines that the update amounts are equal to or less than the threshold (Yes in step S104), the inter-microphone-node sound source presence posterior probability co-occurrence pattern detection unit 12 outputs the second sound source presence posterior probability to the output sound estimation unit 14 (step S106). Finally, the output sound estimation unit 14 performs sound source separation using the sound source presence posterior probability at each time as a filter (step S107).
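The filtering in step S107 amounts to applying the posterior as a time-frequency soft mask. A minimal sketch of one assumed form (multiplying one reference channel's STFT by each source's posterior; the function name and array shapes are illustrative):

```python
import numpy as np

def soft_mask_separation(X, posterior):
    """X: (T, F) complex STFT of a node's reference microphone.
    posterior: (T, F, J) sound source presence posteriors (soft mask).
    Returns a (J, T, F) array of estimated source images."""
    return np.stack([posterior[:, :, j] * X for j in range(posterior.shape[2])])
```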
[0075]
[Effects of the Embodiment] First, the sound source separation device 10 estimates the sound source presence posterior probability of each sound source at each microphone node based on the microphone node observation signals, which are observation signals of a plurality of channels obtained by collecting sound source signals emitted from a plurality of sound sources with a plurality of microphones, and updates the sound source presence posterior probability based on the update information, which is information for updating the sound source presence posterior probability. The sound source separation device 10 then models the co-occurrence relation of the sound source presence posterior probabilities, assuming that the sound source presence posterior probabilities of each sound source co-occur in the same time-frequency bin, estimates the parameters of the model so that the co-occurrence of the sound source presence posterior probabilities becomes large, and calculates the update information based on the parameters. Further, the sound source separation device 10 repeatedly executes the update of the sound source presence posterior probability in the microphone-node-specific sound source presence posterior probability estimation unit and the calculation of the update information in the inter-microphone-node sound source presence posterior probability co-occurrence pattern detection unit until the sound source presence posterior probability or the parameters converge. Finally, the sound source separation device 10 estimates the sound source signal of each sound source by filtering the microphone node observation signals using the sound source presence posterior probability or the update information.
[0076]
This makes it possible to model the sound source presence posterior probability in consideration of co-occurrence and to use information obtained from a plurality of microphone nodes for sound source separation. As a result, sound source separation can be performed effectively even in a distributed microphone array environment.
[0077]
[Device Configuration, Etc.] When the processing means in the sound source separation device 10 are implemented by a computer, the processing content of the functions that each device should have is described by a program. By executing this program on a computer, the processing means in each device are realized on the computer.
[0078]
Although a method using contrastive divergence has been described for efficiently estimating the RBM parameters, the present invention is not limited to this embodiment. Likewise, although the method of setting the value of Equation (16) to zero has been described for estimating the distribution parameters in the microphone-node-specific sound source presence posterior probability estimation unit 11, the present invention is not limited to this embodiment. For example, an exhaustive search over all combinations of all parameters to maximize the value of Equation (2) is also included in the scope of the technical idea of the present invention.
[0079]
The program describing the processing content can be recorded on a computer-readable recording medium. Any medium such as a magnetic recording device, an optical disc, a magneto-optical recording medium or a semiconductor memory may be used as the computer-readable recording medium. Specifically, for example, a hard disk device, a flexible disk or a magnetic tape can be used as the magnetic recording device; a DVD (Digital Versatile Disc), DVD-RAM (Random Access Memory), CD-ROM (Compact Disc Read Only Memory) or CD-R (Recordable)/RW (Rewritable) as the optical disc; an MO (Magneto-Optical Disc) as the magneto-optical recording medium; and an EEP-ROM (Electronically Erasable and Programmable Read Only Memory) as the semiconductor memory.
[0080]
Further, this program is distributed, for example, by selling, transferring or lending a portable recording medium such as a DVD or CD-ROM on which the program is recorded. Furthermore, the program may be stored in a storage device of a server computer and distributed by transferring it from the server computer to another computer via a network.
[0081]
Further, each means may be configured by executing a predetermined program on a computer, or at least a part of the processing content may be realized in hardware.
[0082]
The functions (devices) contained in the sound source separation device 10 can be physically or virtually distributed, and in that case the functions may be distributed in any units. Further, for example, the convergence determination unit 13 may be omitted as a separate unit and incorporated into the microphone-node-specific sound source presence posterior probability estimation unit 11 or the inter-microphone-node sound source presence posterior probability co-occurrence pattern detection unit 12. In addition, each unit in each device may be incorporated into another device to the extent that it still functions effectively.
[0083]
[Program] It is also possible to create a program in which the process executed by the sound
source separation device 10 according to the above embodiment is described in a language that
can be executed by a computer. In this case, when the computer executes the program, the same
effect as that of the above embodiment can be obtained. Furthermore, such a program may be
recorded in a computer readable recording medium, and the computer may read and execute the
program recorded in the recording medium to realize the same processing as that of the above
embodiment. Hereinafter, an example of a computer that executes a sound source separation
program that realizes the same function as that of the sound source separation device 10 will be
described.
[0084]
FIG. 6 is a diagram showing a computer that executes the sound source separation program. As shown in FIG. 6, the computer 1000 includes, for example, a memory 1010, a CPU (Central Processing Unit) 1020, a hard disk drive interface 1030, a disk drive interface 1040, a serial port interface 1050, a video adapter 1060, and a network interface 1070. These units are connected by a bus 1080.
[0085]
The memory 1010 includes a read only memory (ROM) 1011 and a random access memory
(RAM) 1012. The ROM 1011 stores, for example, a boot program such as a BIOS (Basic Input
Output System). The hard disk drive interface 1030 is connected to the hard disk drive 1090.
Disk drive interface 1040 is connected to disk drive 1100. For example, a removable storage
medium such as a magnetic disk or an optical disk is inserted into the disk drive 1100. For
example, a mouse 1110 and a keyboard 1120 are connected to the serial port interface 1050.
For example, a display 1130 is connected to the video adapter 1060.
[0086]
Here, as shown in FIG. 6, the hard disk drive 1090 stores, for example, an OS 1091, an
application program 1092, a program module 1093, and program data 1094. Each table
described in the above embodiment is stored, for example, in the hard disk drive 1090 or the
memory 1010.
[0087]
The sound source separation program is stored in the hard disk drive 1090, for example, as a program module in which the instructions to be executed by the computer 1000 are described. Specifically, a program module describing each process executed by the sound source separation device 10 of the above embodiment is stored in the hard disk drive 1090.
[0088]
Also, data used for information processing by the sound source separation program is stored as
program data in, for example, the hard disk drive 1090. Then, the CPU 1020 reads the program
module 1093 and the program data 1094 stored in the hard disk drive 1090 into the RAM 1012
as necessary, and executes the above-described procedures.
[0089]
The program module 1093 and the program data 1094 related to the sound source separation program are not limited to being stored in the hard disk drive 1090; for example, they may be stored in a removable storage medium and read by the CPU 1020 via the disk drive 1100 or the like. Alternatively, the program module 1093 and the program data 1094 related to the sound source separation program may be stored in another computer connected via a network such as a local area network (LAN) or a wide area network (WAN) and read by the CPU 1020.
[0090]
DESCRIPTION OF REFERENCE NUMERALS: 10 sound source separation device; 11 microphone-node-specific sound source presence posterior probability estimation unit; 12 inter-microphone-node sound source presence posterior probability co-occurrence pattern detection unit; 13 convergence determination unit; 14 output sound estimation unit; 15 input unit; 20 microphone node; 21 output device