JP2011139409

Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2011139409
PROBLEM TO BE SOLVED: To provide an acoustic signal processing device, an acoustic signal
processing method, and a computer program capable of obtaining the reliability of an estimated
sound source direction. An acoustic signal processing device (1) includes two microphones (2, 2),
amplifiers (3, 3) separately connected to the microphones (2, 2), A/D converters (4, 4) connected
to the respective amplifiers (3, 3), a CPU (5) connected to the A/D converters (4, 4), and a ROM
(51) and a RAM (52) connected to the CPU (5). The CPU (5) frames the A/D-converted acoustic
signal, acquires a sound space feature quantity from the framed acoustic signal, estimates the
sound source direction of the target sound based on the sound space feature quantity, and
estimates the reliability of that sound source direction by acquiring third-order or higher-order
statistics of the sound space feature quantity. [Selected figure] Figure 1
Acoustic signal processing apparatus, acoustic signal processing method, and computer program
[0001]
The present invention relates to an acoustic signal processing apparatus and an acoustic signal
processing method for estimating the reliability of a sound source direction estimated from an
observed sound signal, and a computer program for causing a computer to estimate the
reliability of a sound source direction.
[0002]
The sound source direction is important information in multi-channel acoustic signal processing.
04-05-2019
1
Conventionally, the sound source direction is estimated by various methods, and is used, for
example, in sound processing techniques such as separation of a plurality of sound sources, noise
removal, dereverberation, and voice section detection.
[0003]
There are many different noise sources and reverberations in a real environment, and they change
from moment to moment. These disturbances distort the observed signal and the sound space feature
quantity used for sound source direction estimation, thereby reducing the estimation accuracy of
the sound source direction. For these reasons, it is difficult to estimate the sound source
direction accurately. Therefore, methods that can estimate the sound source direction with high
accuracy in a real environment are being developed, such as a method of removing the noise
component from the observed signal before estimating the sound source direction (see Non-Patent
Document 1), and a method of estimating the sound source direction after enhancing the noise
resistance of the sound space feature quantity, which is information indicating the spatial
features of the sound, by using features of the target signal (the acoustic signal that is the
target of the sound source direction estimation) or features of the noise.
[0004]
Non-Patent Document 1: S. F. Boll, "Suppression of acoustic noise in speech using spectral
subtraction," IEEE Trans. Acoust., Speech, and Signal Process., Vol. 27, No. 2, pp. 113-120, 1979.
Non-Patent Document 2: M. Brandstein, "On the use of explicit speech modeling in microphone array
applications," Proc. Intl. Conf. on Acoust., Speech, and Signal Process. (ICASSP '98), pp. 613-616,
1998.
Non-Patent Document 3: M. Mizumachi and K. Niyada, "DOA Estimation Based on Cross-Correlation with
Frequency Selectivity," RISP Journal of Signal Process., Vol. 11, No. 1, pp. 43-50, 2007.
[0005]
However, these conventional sound source direction estimation methods are limited in that they
require some prior knowledge about the target signal or the noise. For example, in the methods
disclosed in Non-Patent Documents 1 and 3, the power spectrum of the noise must be known in
advance or be estimable. In the method disclosed in Non-Patent Document 2, the target signal must
be speech, and the fundamental frequency of the speech (the physical quantity corresponding to the
pitch of the voice) must be known or be estimable. Therefore, the sound source direction cannot be
estimated with high accuracy unless the environment allows such prior knowledge to be acquired.
Since the estimated sound source direction may or may not be correct, when sound processing such
as separation of multiple sound sources, noise removal, dereverberation, or voice section
detection is performed using a sound source direction estimated by a conventional method, a wrong
direction may be taken as the sound source direction, and an appropriate processing result may not
be obtained.
[0006]
The present invention has been made in view of such circumstances, and its main object is to
provide an acoustic signal processing apparatus, an acoustic signal processing method, and a
computer program capable of obtaining the reliability of the estimated sound source direction.
[0007]
In order to solve the problems described above, an acoustic signal processing apparatus
according to one aspect of the present invention includes: a plurality of microphones that capture
sound including a target sound emitted from a sound source and output acoustic signals indicating
the sound; sound space feature quantity acquisition means for acquiring a sound space feature
quantity related to the spatial features of the sound based on the acoustic signals output from
the plurality of microphones; sound source direction estimation means for estimating the sound
source direction of the target sound based on the sound space feature quantity acquired by the
sound space feature quantity acquisition means; high-order statistic acquisition means for
acquiring third-order or higher-order statistics of the sound space feature quantity acquired by
the sound space feature quantity acquisition means; and reliability estimation means for
estimating the reliability of the sound source direction estimated by the sound source direction
estimation means based on the high-order statistics acquired by the high-order statistic
acquisition means.
[0008]
In this aspect, the high-order statistic acquisition means is preferably configured to acquire a
high-order statistic indicating the kurtosis (peakedness) of a graph showing the spatial
distribution of the sound space feature quantity.
[0009]
Further, in the above aspect, the sound space feature quantity acquisition means is preferably
configured to acquire a frequency estimated to be little affected by noise in the sound, and to
extract the sound space feature quantity from the acoustic signal band-limited by a band-pass
filter centered on the acquired frequency.
[0010]
Further, in the above aspect, the sound space feature quantity acquisition means is preferably
configured to frame, at predetermined time intervals, the acoustic signals output over time from
the microphones, and to estimate the sound space feature quantity of the target frame from the
state of the sound space feature quantity in the frame one time step before the target frame,
using a particle filter that is based on a dynamic characteristic model of the sound space feature
quantity indicating the change of the sound source direction between adjacent frames and that
uses, as the likelihood, the sound space feature quantity extracted from the band-limited acoustic
signal.
[0011]
Further, in the above aspect, the sound space feature quantity acquisition means preferably
includes: initial particle distribution setting means for uniformly arranging a plurality of
particles having the same weight in the space; prior distribution acquisition means for acquiring
the prior distribution of particles at time k by generating particles {Θ_k^(l)}_{l=1}^M having
weights {w_k^(l)}_{l=1}^M represented by equation (2) according to the dynamic characteristic
model represented by equation (1); and sound space feature quantity estimation means for
estimating the sound space feature quantity at time k by dividing each particle in the prior
distribution whose weight is equal to or greater than a predetermined value into a number of
particles according to its weight, and setting to 0 the weight of each particle whose weight is
less than the predetermined value.
Here, x_k is the acoustic signal at time k, Θ_k is the sound source direction at time k, the noise
term at time k follows a Gaussian distribution with mean 0 and variance σ^2, N denotes a Gaussian
distribution, l is the particle index, and M is the number of particles.
[0012]
Further, in the above aspect, the high-order statistic acquisition means is preferably configured
to acquire high-order statistics of the weights w^(l) as the sound space feature quantity.
[0013]
In the above aspect, the high-order statistic acquisition means is preferably configured to acquire
the skewness shown in equation (3), which is a high-order statistic, for the sound space feature
quantity w^(l).
[0014]
In the above aspect, the high-order statistic acquisition means is preferably configured to acquire
the kurtosis shown in equation (4), which is a high-order statistic, for the sound space feature
quantity w^(l).
[0015]
Further, an acoustic signal processing method according to one aspect of the present invention
includes the steps of: capturing, with a plurality of microphones, sound including a target sound
emitted from a sound source and converting the sound into acoustic signals indicating the sound;
acquiring a sound space feature quantity related to the spatial features of the sound based on the
acoustic signals; estimating the sound source direction of the target sound based on the acquired
sound space feature quantity; acquiring third-order or higher-order statistics of the sound space
feature quantity; and estimating the reliability of the estimated sound source direction based on
the acquired high-order statistics.
[0016]
Further, a computer program according to one aspect of the present invention is a computer program
for causing a computer, which includes a CPU connected to a plurality of microphones that capture
sound including a target sound emitted from a sound source and convert the sound into acoustic
signals indicating the sound, to process the acoustic signals output from the plurality of
microphones. The program causes the computer to execute the steps of: acquiring a sound space
feature quantity related to the spatial features of the sound based on the acoustic signals output
from the plurality of microphones; estimating the sound source direction of the target sound based
on the sound space feature quantity; acquiring third-order or higher-order statistics of the sound
space feature quantity; and estimating the reliability of the estimated sound source direction
based on the acquired high-order statistics.
[0017]
According to the acoustic signal processing device, the acoustic signal processing method, and
the computer program according to the present invention, it is possible to obtain the reliability of
the estimated sound source direction.
[0018]
FIG. 1 is a block diagram showing the configuration of an acoustic signal processing device
according to a first embodiment.
FIG. 2 is a flowchart showing the flow of acoustic signal processing of the acoustic signal
processing device according to the first embodiment.
FIG. 3 is a flowchart showing the procedure of sound source direction estimation processing
according to the first embodiment.
FIG. 4 is a graph showing the relationship between the sound space feature quantity and the
estimated sound source direction.
FIG. 5 is a flowchart showing the flow of acoustic signal processing of the acoustic signal
processing device according to the third embodiment.
A graph showing the sound source direction estimation result and related information in an
environment where diffuse noise exists.
A graph showing the sound source direction estimation result and related information in an
environment where directional noise exists.
A graph showing the calculation results of ESS, crest factor, skewness, and kurtosis in each frame
in an environment where diffuse noise exists.
A graph showing the calculation results of ESS, crest factor, skewness, and kurtosis in each frame
in an environment where directional noise exists.
[0019]
Hereinafter, preferred embodiments of the present invention will be described with reference to
the drawings.
[0020]
First Embodiment
FIG. 1 is a block diagram showing the configuration of an acoustic signal processing device
according to the present embodiment.
As shown in FIG. 1, the acoustic signal processing device 1 includes two microphones 2, 2,
amplifiers 3, 3 connected to the respective microphones 2, 2, A/D converters 4, 4 connected to
the respective amplifiers 3, 3, a CPU 5 connected to the A/D converters 4, 4, and a ROM 51 and a
RAM 52 connected to the CPU 5.
[0021]
The two microphones 2, 2 are arranged 10 cm apart from each other.
These microphones 2, 2 capture the surrounding sound and output an acoustic signal, which is an
electrical signal corresponding to that sound.
Around the microphones 2, 2 there are the sound (target sound) emitted from a sound source 6, such
as a talker or a loudspeaker device, as well as noise, reverberation, and the like, and the
microphones 2, 2 capture sound that includes all of these.
[0022]
The acoustic signals output from the microphones 2 and 2 are separately provided to the
amplifiers 3 and 3, respectively. The amplifiers 3, 3 amplify the supplied acoustic signals by a
predetermined amplification factor, and output the amplified acoustic signals.
[0023]
The amplified acoustic signals output from the amplifiers 3, 3 are separately provided to the A/D
converters 4, 4. The A/D converters 4, 4 convert the acoustic signals, which are analog signals,
into digital signals, and store the converted acoustic data in built-in registers.
[0024]
The CPU 5 can execute a computer program stored in the ROM 51. By executing the computer program
51a for acoustic signal processing, the CPU 5 reads the acoustic data stored in the registers of
the A/D converters 4, 4 and performs the data processing described later.
[0025]
The ROM 51 is configured by a mask ROM, a PROM, an EPROM, an EEPROM, or the like, and
stores a computer program to be executed by the CPU 5 and data used for the computer
program. That is, the ROM 51 stores a computer program 51a for causing the CPU 5 to execute
acoustic signal processing to be described later, and data 51b used in the execution. The data
51b includes a noise model described later.
[0026]
The RAM 52 is configured by SRAM, DRAM or the like. The RAM 52 is used as a work area of the
CPU 5 when the CPU 5 executes a computer program.
[0027]
Next, the operation of the acoustic signal processing device 1 according to the present
embodiment will be described. When the acoustic signal processing device 1 is started, the CPU 5
executes the computer program 51a stored in the ROM 51. At this time, the noise model is loaded
from the ROM 51 into the RAM 52. In this state, the acoustic signal processing device 1 operates
as follows.
[0028]
FIG. 2 is a flowchart showing the flow of acoustic signal processing of the acoustic signal
processing device 1 according to the present embodiment. The sound captured by the microphones
2, 2 is converted into acoustic signals and output from the microphones 2, 2. The acoustic
signals, which are analog signals, are amplified by the amplifiers 3, 3, the amplified acoustic
signals are converted into digital signals by the A/D converters 4, 4, and the converted acoustic
data is stored in the registers built into the A/D converters 4, 4. This operation is repeated at
a predetermined sampling frequency.
[0029]
The CPU 5 reads the acoustic data from the registers of the A/D converters 4, 4 and divides the
sampled acoustic signal into frames (step S1). Next, the CPU 5 Fourier-transforms the acoustic
signal (step S2) and, based on the transformed data, determines whether the acoustic signal
contains a target signal, that is, a signal indicating the target sound emitted by the sound
source 6 (step S3). Since the energy density of the target signal is considered to be higher than
that of the noise component, this determination is made by checking whether the data contains a
frequency with high energy density.
[0030]
If the target signal does not exist in step S3 (NO in step S3), the acoustic signal of that frame
is considered to contain only noise. For the stationary component of the noise present in a real
environment, the frequency features can be obtained as the long-time average power spectrum
observed in sections where the target signal does not exist. Therefore, when the target signal
does not exist in the acoustic signal, the CPU 5 updates the noise model in the RAM 52 with the
acoustic signal of the frame (step S4), and the process returns to step S1.
[0031]
The noise model is given by the following equation (5). Here, the operator F(·) represents the
Fourier transform, the operator |·|^2 represents the power spectrum operation, and the noise
model is given as the average value of the power spectrum from time k_1 to time k_2.
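The averaging described for equation (5) can be sketched as follows. This is a minimal
illustration, not the patent's implementation: the function names `power_spectrum` and
`noise_model` are hypothetical, and a naive DFT is used only to keep the example self-contained.

```python
import cmath

def power_spectrum(frame):
    """Return |F(x)|^2 for one frame using a naive DFT (O(N^2), for illustration only)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * f * t / n) for t, x in enumerate(frame))) ** 2
        for f in range(n)
    ]

def noise_model(noise_frames):
    """Average the power spectra of the noise-only frames, bin by bin."""
    spectra = [power_spectrum(frame) for frame in noise_frames]
    return [sum(bins) / len(spectra) for bins in zip(*spectra)]
```

Here `noise_frames` stands for the frames observed from time k_1 to time k_2 in which step S3
found no target signal.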
[0032]
If the target signal exists in step S3 (YES in step S3), the CPU 5 estimates the frequency at
which the target signal is dominant by taking the difference, in the frequency domain, between
the acoustic signal (a mixture of the target signal and noise) and the noise model in the RAM 52
(step S5). This yields the frequency considered to be least affected by noise, that is, the
frequency with a high SN ratio.
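One possible reading of step S5 is sketched below, assuming that both spectra are given as lists
of per-bin power for the positive-frequency bins. The name `dominant_bin` and the clipping of
negative differences to zero are assumptions made for this illustration.

```python
def dominant_bin(observed_power, noise_power):
    """Index of the frequency bin where the observed power most exceeds the noise model."""
    # Assumed detail: negative differences are clipped to zero before the search.
    excess = [max(o - n, 0.0) for o, n in zip(observed_power, noise_power)]
    return max(range(len(excess)), key=excess.__getitem__)
```

For example, `dominant_bin([1, 5, 9, 2], [1, 1, 8, 1])` selects bin 1: bin 2 has the largest
observed power, but most of it is explained by the noise model.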
[0033]
Next, the CPU 5 extracts a narrow-band signal from the acoustic signal by passing the acoustic
signal through a band-pass filter having a predetermined bandwidth centered on the frequency
estimated in step S5 (step S6). Extracting the band in which the influence of noise is small in
this way improves the noise resistance of the sound space feature quantity. Because the noise
captured by the microphones 2, 2 changes from moment to moment, its power spectrum is considered
difficult to estimate exactly; the band-pass filter is therefore applied to the acoustic signal
read out from the A/D converters 4, 4 itself, not to the difference between the acoustic signal
and the noise model.
[0034]
Next, the CPU 5 executes a sound source direction estimation process (step S7). Although the noise
model is effective in stationary noise environments, the sound space feature quantity is more or
less distorted by noise even at frequencies where the target signal is dominant. Therefore, in
the present embodiment, the noise resistance of the sound space feature quantity is further
improved by introducing a sound source model. When the target signal is a speech signal, its
frequency features vary with time, and it is difficult to determine the statistical properties of
the frequency features uniquely because they differ between individual speakers. The present
embodiment therefore focuses on the temporal movement of the sound source. That is, the acoustic
signal is cut out in short time frames, and the movement of the sound source between frames is
modeled. Here, the random walk model (equation (6)), which has the highest versatility, is adopted
as the model describing the motion of the object. In equation (6), Θ_k represents the true sound
source direction at time k, and the additive noise term follows a Gaussian distribution with mean
0 and variance σ^2. Equation (6) expresses that the object moves smoothly in time; the smaller
the variance σ^2, the smoother the trajectory. For example, when the target object is a car or a
rocket, it is desirable to describe the sound source model as constant-velocity motion or
constant-acceleration motion. When the target sound source is a person, it is appropriate to
describe the temporal movement of the sound source with a random walk model by setting the frame
length as short as several tens of milliseconds.
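Equation (6) is not reproduced in this text. Based on the description above (a random walk driven
by zero-mean Gaussian noise with variance σ^2), it presumably has the following form, where the
noise symbol v_k is a name assumed here for illustration:

```latex
\Theta_k = \Theta_{k-1} + v_k, \qquad v_k \sim \mathcal{N}(0, \sigma^2)
```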
[0035]
Here, a method of realizing sound source direction estimation by combining the dynamic
characteristic model of the sound source and the frequency characteristic model of the noise will
be described. First, in order to consider time-series filtering, the time series up to time k of
the sound source direction Θ and of the observation signal (acoustic signal) x are written
Θ_{1:k} and x_{1:k}, respectively. As the sound space feature quantity, the cross-correlation
between two observed signals (C. H. Knapp and G. C. Carter, "The generalized correlation method
for estimation of time delay," IEEE Trans. Acoust., Speech, Signal Process., Vol. 24, pp. 320) or
the MUSIC algorithm (R. O. Schmidt, "Multiple emitter location and signal parameter estimation,"
IEEE Trans. Antennas Propagation, Vol. 34, No. 3, pp. 276-280, 1986) is often used. Here, the
sound space feature quantity is calculated based on the cross-correlation method. However, in
order to use the sound space feature quantity p(Θ | x) as the likelihood p(x | Θ), its range needs
to be [0, 1]; the cross-correlation value, whose range is [−1, 1], is therefore half-wave
rectified before use.
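The half-wave rectified cross-correlation can be illustrated with the sketch below. It assumes a
far-field source, integer-sample lags, and a plain time-domain correlation; the function names,
the sampling rate `fs`, and the speed of sound `c` are assumptions, while the 10 cm microphone
spacing `d` follows the first embodiment. The sign convention of the angle is also an assumption.

```python
import math

def rectified_xcorr(x1, x2, lag):
    """Normalized cross-correlation of two equal-length frames at an integer lag, clipped to [0, 1]."""
    n = len(x1)
    num = sum(x1[t] * x2[t + lag] for t in range(n) if 0 <= t + lag < n)
    den = math.sqrt(sum(v * v for v in x1) * sum(v * v for v in x2))
    r = num / den if den > 0 else 0.0
    return max(r, 0.0)  # half-wave rectification: negative correlation is mapped to 0

def likelihood(x1, x2, theta_deg, fs=16000, d=0.10, c=343.0):
    """Likelihood-like score for a source at theta_deg, for two microphones d metres apart."""
    lag = round(fs * d * math.sin(math.radians(theta_deg)) / c)
    return rectified_xcorr(x1, x2, lag)
```

With these assumed parameters the inter-microphone delay spans only about ±4.7 samples, so a
practical implementation would likely interpolate between lags or work in the frequency domain.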
[0036]
At this time, the posterior probability p(Θ_{1:k} | x_{1:k}) can be obtained from the posterior
probability p(Θ_{1:k-1} | x_{1:k-1}) of the sound space feature quantity one time step earlier,
the likelihood p(x_k | Θ_k) at time k, and the system model p(Θ_k | Θ_{k-1}) describing the motion
of the sound source shown in equation (6).
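The equation referred to here is not reproduced in this text. The standard sequential Bayesian
recursion that combines these three terms is given below as the presumed form; it is the textbook
relation, not a transcription of the patent's own equation:

```latex
p(\Theta_{1:k} \mid x_{1:k}) \propto
  p(x_k \mid \Theta_k)\, p(\Theta_k \mid \Theta_{k-1})\, p(\Theta_{1:k-1} \mid x_{1:k-1})
```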
[0037]
The state estimation of equation (8) can be performed using a bootstrap filter, which uses the
system model as the proposal distribution (A. Doucet, J. F. G. de Freitas, and N. J. Gordon,
Sequential Monte Carlo Methods in Practice, Springer-Verlag, New York, 2001). In real problems,
equation (8) cannot be solved analytically because a nonlinear, non-Gaussian likelihood is used in
the state estimation. Therefore, in the present embodiment, state estimation is performed using a
particle filter, which expresses an arbitrary probability distribution as a set of weighted
particles. At each time step, the particle filter performs one-step-ahead prediction, weight
updating, and redistribution of particles (resampling). The concrete procedure of the sound source
direction estimation algorithm, which uses state estimation by the particle filter and the sound
space feature quantity estimated from the resulting posterior distribution, is described below.
[0038]
FIG. 3 is a flowchart showing the procedure of the sound source direction estimation process
according to the present embodiment. First, the CPU 5 predicts the particle distribution and
updates the weights (step S71). In this process, when the frame to be processed is the first
frame, the sound source direction is unknown, so the particles {Θ_0^(l)}_{l=1}^M are arranged
uniformly in the one-dimensional space [−90 deg, 90 deg]. Here, l is the particle index and M is
the number of particles. In the first frame, all particles are assumed to have equal weights
{w_0^(l)}_{l=1}^M = 1/M. When the frame to be processed is the second or a subsequent frame, the
CPU 5 generates particles {Θ_k^(l)}_{l=1}^M according to the system model shown in equation (6).
The prior distribution of particles at time k is thus estimated as in equation (9). Then, as shown
in equation (10), the weight {w_k^(l)}_{l=1}^M of each particle is updated according to the
likelihood p(x_k | Θ_k). Here, the likelihood is calculated as the half-wave rectified value of
the cross-correlation function band-limited at the dominant frequency estimated using the noise
model.
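Step S71 can be sketched as follows. This is a minimal illustration under stated assumptions:
the function names, the value of `sigma`, the clipping of particles to [−90, 90] degrees, and the
fallback to uniform weights when every likelihood is zero are all choices made for this example.
The `likelihood` argument is any callable that maps an angle in degrees to a value in [0, 1].

```python
import random

def init_particles(m, lo=-90.0, hi=90.0):
    """First frame: M particles spread evenly over [lo, hi], each with weight 1/M (m >= 2)."""
    step = (hi - lo) / (m - 1)
    return [lo + i * step for i in range(m)], [1.0 / m] * m

def predict_and_weight(particles, likelihood, sigma=5.0, rng=random):
    """Later frames: random-walk prediction, then weights proportional to the likelihood."""
    # Assumed detail: predicted directions are clipped to the valid range.
    moved = [min(90.0, max(-90.0, th + rng.gauss(0.0, sigma))) for th in particles]
    raw = [likelihood(th) for th in moved]
    total = sum(raw)
    # Assumed detail: fall back to uniform weights if no particle has any likelihood.
    weights = [r / total for r in raw] if total > 0 else [1.0 / len(moved)] * len(moved)
    return moved, weights
```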
[0039]
Next, the CPU 5 redistributes (resamples) the particles so that every particle has an equal weight
(step S72). In this process, each particle whose weight is equal to or greater than a
predetermined value is divided into a number of particles proportional to its weight, and each
particle whose weight is less than the predetermined value is eliminated. In other words,
redistribution splits particles with large weights into many particles and annihilates particles
with small weights. The resampled set of weighted particles is used as the proposal distribution
at the next time step. The CPU 5 then reconstructs (estimates) the sound space feature quantity
from the set of weighted particles (step S73). Since this sound space feature quantity is
estimated in consideration of both the noise model and the sound source model, distortion due to
noise is expected to be greatly reduced.
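The redistribution in step S72 can be illustrated with a deterministic sketch. The default
threshold and the largest-remainder rule used to round the copy counts are assumptions made for
this example; practical particle filters often use stochastic schemes such as multinomial or
systematic resampling instead.

```python
def resample(particles, weights, threshold=None):
    """Split heavy particles in proportion to weight, drop light ones, return equal weights."""
    m = len(particles)
    if threshold is None:
        threshold = 0.5 / m  # assumed cut-off, not taken from the source
    kept = [(p, w) for p, w in zip(particles, weights) if w >= threshold]
    if not kept:  # nothing survives the cut-off: keep the set unchanged
        return list(particles), [1.0 / m] * m
    total = sum(w for _, w in kept)
    ideal = [m * w / total for _, w in kept]  # fractional number of copies per survivor
    counts = [int(c) for c in ideal]
    # Hand the remaining slots to the survivors with the largest fractional parts.
    order = sorted(range(len(kept)), key=lambda i: ideal[i] - counts[i], reverse=True)
    for i in order[: m - sum(counts)]:
        counts[i] += 1
    new = [p for (p, _), c in zip(kept, counts) for _ in range(c)]
    return new, [1.0 / m] * m
```

For example, with particles at −30, 0, 30, and 60 degrees, weights 0.05, 0.6, 0.3, and 0.05, and
a threshold of 0.1, the two light particles are dropped and the result is three particles at 0
degrees and one at 30 degrees, each with weight 0.25.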
[0040]
FIG. 4 is a graph showing the relationship between the sound space feature quantity and the
estimated sound source direction. In FIG. 4, the vertical axis represents the magnitude of the
sound space feature quantity, and the horizontal axis represents the angle. As shown in FIG. 4,
the sound space feature quantity can be considered to be proportional to the probability
distribution of the sound source direction obtained from the observation signal. Therefore, the
CPU 5 estimates the sound source direction at time k as the Θ that maximizes the sound space
feature quantity p(Θ_k | x_k), according to the following equation (11) (step S74). Thereafter,
the CPU 5 returns to the point in the main routine from which the sound source direction
estimation process was called.
[0041]
Next, the CPU 5 estimates the reliability of the sound source direction estimated in step S7 (step
S8). This process is described in detail below. First, the relationship between the second-order
statistic based on the number of effective samples and the reliability of the estimated sound
source direction will be described. The number of effective samples is a measure proposed for
determining whether resampling is necessary in a particle filter (J. S. Liu and R. Chen, "Blind
deconvolution via sequential imputations," J. Amer. Stat. Assoc., Vol. 90, pp. 567-576, 1995).
[0042]
The number of effective samples ESS is defined by the following equation (12) using the M
particle weights {w^(l)}_{l=1}^M. Equation (12) represents the degree of concentration of the
particles in the one-dimensional space. That is, if the particles are concentrated in a certain
direction, the ESS takes a large value, and if the particles are dispersed, the ESS takes a small
value. In the sound source direction estimation problem, it is desirable that the sound space
feature quantity be unimodal and that its main lobe be sharp. Therefore, the reliability of the
sound source direction estimate is considered to be higher as the ESS is larger.
[0043]
In fact, the inventor has confirmed that, for diffuse noise (when the direction of the noise
source is not clear), the ESS can be used to estimate the reliability of the sound source
direction estimate (see M. Mizumachi and K. Niyada, "Robust direction-of-arrival estimation by
particle filtering with confidence measure based on effective sample size under noisy
environments," Proc. Joint 4th Intl. Conf. on Soft Computing and Intelligent Systems and 9th Intl.
Sympo. on advanced Intelligent Systems (SCIS & ISIS 2008), CD-ROM, 2008). However, it was also
found that, for directional noise (when the noise source has sharp directivity), the ESS cannot
estimate the reliability of the sound source direction estimate. In general, when directional
noise is present, the sound space feature quantity has two local maxima, one in the target sound
source direction and one in the noise source direction. If the sound space feature quantity with
improved noise resistance estimated by the method of the present embodiment is used, the peak in
the noise source direction should be relatively small, but some particles may still be distributed
in the noise source direction. Because the ESS defined by equation (12) evaluates the weights of
all particles, it is affected not only by the weights of the particles near the target sound
source direction, which are the ones to be evaluated, but also by the weights of the particles
distributed in the noise source direction, which is irrelevant to the sound source direction
estimation result. Therefore, in a directional noise environment, estimating the reliability of
the sound source direction estimate by the ESS is not desirable.
[0044]
Therefore, in the present embodiment, the reliability of the sound source direction estimate is
estimated using high-order statistics of third or higher order. As the third-order statistic, the
skewness, which is the third-order moment (the expected value of the third power) shown in the
following equation (13), is used. That is, the CPU 5 calculates the skewness as the reliability of
the sound source direction estimate. Here, w_rms is the root-mean-square (effective) value of the
weights of all particles, and w_mean is the mean of the weights of all particles.
[0045]
Skewness is a measure of the asymmetry of a distribution, and takes a value closer to 0 the more
symmetrical the distribution is. In the sound source direction estimation problem, noise arrives
from all directions in a diffuse noise environment, so the particles that should be concentrated
in the target sound source direction may instead be dispersed symmetrically around it. Therefore,
when the skewness is low, it is suspected that the sound space feature quantity is distorted by
the presence of diffuse noise. The skewness can also be regarded as an index of the peakedness of
the sound space feature quantity. By using such an index, the degree of concentration of the
particles near the true target sound source direction can be determined, and it is considered
possible to estimate the reliability of the sound source direction estimate with high accuracy.
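As a rough illustration, the textbook skewness of the particle weights can be computed as shown
below. Note that this is the standard population form normalized by the standard deviation; the
patent's equation (13) refers to w_rms and w_mean and may be normalized differently, so this
sketch should not be read as a transcription of that equation.

```python
import math

def skewness(weights):
    """Third standardized moment of the particle weights (textbook population form)."""
    m = len(weights)
    mean = sum(weights) / m
    std = math.sqrt(sum((w - mean) ** 2 for w in weights) / m)
    if std == 0:
        return 0.0  # all weights equal: a flat distribution has no asymmetry
    return sum((w - mean) ** 3 for w in weights) / (m * std ** 3)
```

A weight set dominated by a single particle gives a clearly positive value, while weights spread
symmetrically around their mean give a value near 0, which matches the qualitative behavior
described above.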
[0046]
When the process of step S8 is completed, the CPU 5 returns the process to step S1.
[0047]
The CPU 5 can display the estimated value of the sound source direction thus obtained and its
reliability on a display unit (not shown).
In addition to or instead of this, the estimated value of the sound source direction and its
reliability can also be output as data to the outside of the acoustic signal processing
device 1.
[0048]
The CPU 5 can also use the estimated sound source direction for other applications. For example,
it can be used for sound processing techniques such as separation of a plurality of sound sources,
noise removal, dereverberation, and voice section detection. In this case, the estimated
reliability is compared with a predetermined reference value; when the reliability is equal to or
higher than the reference value, the estimated sound source direction is used for the application,
and when the reliability is less than the reference value, the estimated sound source direction is
not used for the application. Alternatively, when the reliability is equal to or higher than the
reference value, a narrow range of the acoustic signal centered on the estimated sound source
direction is extracted and used for the application, and when the reliability is less than the
reference value, either a wide range of the acoustic signal centered on the estimated sound source
direction is extracted, or the acoustic signal is used for the application without limiting its
observation direction. This prevents a wrong direction from being taken as the sound source
direction, and a more appropriate processing result can be expected than with the prior art.
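The narrow-versus-wide selection described above might be sketched as follows. The function name
`choose_beam` and the two beam widths are hypothetical values chosen for illustration; the source
does not specify concrete widths or a reference value.

```python
def choose_beam(direction_deg, reliability, reference, narrow_deg=10.0, wide_deg=60.0):
    """Return the (low, high) angular range to hand to a downstream application."""
    # Reliable estimate: narrow range; unreliable estimate: wide range (assumed widths).
    half = (narrow_deg if reliability >= reference else wide_deg) / 2.0
    return max(-90.0, direction_deg - half), min(90.0, direction_deg + half)
```

For an estimate of 20 degrees and a reference value of 2.0, a reliability of 3.0 yields the range
(15.0, 25.0), while a reliability of 1.0 yields (-10.0, 50.0).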
[0049]
Second Embodiment
In the present embodiment, the kurtosis, which is the fourth-order moment (the expected value of
the fourth power) shown in the following equation (14), is used as the third-order or higher-order
statistic in the process of estimating the reliability of the sound source direction. That is, the
CPU 5 calculates the kurtosis as the reliability of the sound source direction estimate.
[0050]
The kurtosis is a statistic that represents the degree of concentration of a distribution, and
can therefore be expected to be a suitable measure for evaluating the unimodality of the sound
space feature quantity. Although the ESS defined by equation (12) is also an index proposed for
gauging how the particles are distributed, the kurtosis is a fourth-order moment, that is, of
higher order than the ESS, and can therefore emphasize the degree of concentration of the
distribution more strongly.
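As an illustration of how kurtosis responds to concentration, the following sketch computes a standardized fourth moment over a set of particle weights. It assumes that the statistic is taken over the weight values themselves and that it is standardized by the squared variance; equation (14) is not reproduced here, so the exact form used in the embodiment may differ. The function name and the example weights are hypothetical.

```python
def kurtosis(values):
    """Standardized fourth central moment of a list of numbers (assumed form)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    if m2 == 0.0:
        return 0.0  # all values identical: no peak to measure
    return m4 / (m2 * m2)


# Hypothetical weights: one dominant particle versus evenly spread particles.
concentrated = [0.01] * 9 + [0.91]
spread = [0.05, 0.15] * 5

print(kurtosis(concentrated), kurtosis(spread))
```

With these example weights the concentrated set yields a clearly larger value than the spread set, which is the behavior the paragraph relies on.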
[0051]
The other configuration and operation of the acoustic signal processing device according to the
present embodiment are the same as the configuration and operation of the acoustic signal
processing device 1 according to the first embodiment, and thus the description thereof will be
omitted.
[0052]
Third Embodiment. The configuration of the acoustic signal processing device according to the
present embodiment is the same as the configuration of the acoustic signal processing device 1
according to the first embodiment. The same components are therefore denoted by the same
reference numerals, and their description is omitted.
[0053]
The operation of the acoustic signal processing device according to the present embodiment will
be described.
FIG. 5 is a flowchart showing a flow of acoustic signal processing of the acoustic signal
processing device according to the present embodiment.
First, the sound captured by the microphones 2 and 2 is converted into an acoustic signal and
output from the microphones 2 and 2. The acoustic signals, which are analog signals, are
amplified by the amplifiers 3 and 3, respectively. The amplified acoustic signals are converted
to digital signals by the A / D converters 4 and 4, and the converted acoustic data is stored in
registers built into the A / D converters 4 and 4. This operation is performed repeatedly at a
predetermined sampling frequency.
[0054]
The CPU 5 reads acoustic data (acoustic signals) from the registers of the A / D converters 4
and 4 (step S301). When the target sound source s (t) exists in the direction Θ, the acoustic
signals x (t) ≡ (x1 (t), x2 (t)) observed at two different positions can be represented by the
following equation (15). Here, h1 (t) and h2 (t) are the impulse responses from the target sound
source to the respective observation points (microphones 2 and 2), and n1 (t) and n2 (t) are the
noises at the respective observation points. The time difference τ = τ1−τ2 between the
arrivals of the target signal s (t) at the two observation points changes according to the sound
source direction Θ. If the sound source direction is limited to a one-dimensional direction, τ
corresponds one to one to the sound source direction Θ. That is, the sound source direction
estimation problem can be regarded as the problem of estimating the signal arrival time
difference τ inherent in the acoustic signal x (t).
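The one-to-one correspondence between τ and Θ can be illustrated with the following sketch. It assumes a far-field (plane wave) model in which τ = d·sin Θ / c; that model, the microphone spacing d, the speed of sound c, and the function name are assumptions made for illustration and are not stated in this description.

```python
import math


def delay_to_angle_deg(tau, mic_distance=0.10, sound_speed=340.0):
    """Map an arrival time difference tau (seconds) to a direction in degrees.

    Assumes the far-field relation tau = mic_distance * sin(theta) / sound_speed.
    mic_distance (meters) and sound_speed (m/s) are hypothetical defaults.
    """
    s = tau * sound_speed / mic_distance
    # Clamp so that rounding or noise cannot push the value outside asin's domain.
    s = max(-1.0, min(1.0, s))
    return math.degrees(math.asin(s))
```

Under this assumption a zero delay maps to the broadside direction (0 degrees), and the mapping is monotonic in τ, so estimating τ is equivalent to estimating Θ within the range of -90 to 90 degrees.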
[0055]
Next, the CPU 5 calculates the sound space feature quantity p (Θ | x) from the observation signal
x (t) (step S302). The sound space feature quantity can be considered to be proportional to the
probability distribution of the sound source direction Θ obtained from the observation signal x
(t) (see FIG. 4). In the present embodiment, cross-correlation between two observation signals is
adopted as the sound space feature quantity. Besides this, when the number of sound sources is
known, the sound space feature quantity may be determined using the MUSIC method.
[0056]
Next, the CPU 5 estimates the sound source direction Θ as the value that gives the maximum of
the sound space feature quantity p (Θ | x), according to the following equation (16) (step
S303).
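Steps S302 and S303 can be sketched as follows, assuming that the cross-correlation is evaluated over a range of integer sample lags and that the lag giving the maximum is taken as the estimate. The function names, the lag range, and the sign convention (a positive lag means the signal reaches the first microphone later) are assumptions made for this sketch.

```python
def cross_correlation(x1, x2, max_lag):
    """Return {lag: sum over t of x1[t] * x2[t - lag]} for lag in [-max_lag, max_lag].

    x1 and x2 are assumed to have the same length.
    """
    n = len(x1)
    feature = {}
    for lag in range(-max_lag, max_lag + 1):
        total = 0.0
        for t in range(n):
            u = t - lag
            if 0 <= u < n:  # skip samples that fall outside the frame
                total += x1[t] * x2[u]
        feature[lag] = total
    return feature


def estimate_lag(x1, x2, max_lag):
    """Return the lag that maximizes the cross-correlation feature."""
    feature = cross_correlation(x1, x2, max_lag)
    return max(feature, key=feature.get)


# Hypothetical frame: x1 is x2 delayed by three samples.
x2 = [0.0, 0.0, 1.0, 3.0, -2.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
x1 = [0.0, 0.0, 0.0] + x2[:-3]
print(estimate_lag(x1, x2, max_lag=5))
```

The estimated lag in samples, divided by the sampling frequency, gives the arrival time difference τ, which corresponds to a direction Θ.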
[0057]
Next, the CPU 5 estimates the reliability of the sound source direction estimated in step S303
(step S304).
In this process, the skewness of the sound space feature quantity p (Θ | x) is calculated as the
reliability of the sound source direction. The kurtosis of the sound space feature quantity p (Θ |
x) may be calculated as the reliability of the sound source direction.
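Step S304 can be sketched as follows, under the assumption that the statistic is computed over the sampled values of the sound space feature quantity (one value per candidate direction) and is standardized by the variance. The description does not give the formula in this section, so this is one possible reading; the function name and the example values are hypothetical.

```python
def standardized_moment(values, order):
    """Central moment of the given order divided by variance ** (order / 2)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    mk = sum((v - mean) ** order for v in values) / n
    if m2 == 0.0:
        return 0.0  # flat feature: no peak, treated as zero reliability
    return mk / m2 ** (order / 2.0)


# Hypothetical feature values over 20 candidate directions.
sharp_peak = [0.1] * 19 + [5.0]   # one clear maximum
no_peak = [1.0, 1.2] * 10         # no dominant direction

skew_sharp = standardized_moment(sharp_peak, 3)  # skewness
skew_flat = standardized_moment(no_peak, 3)
kurt_sharp = standardized_moment(sharp_peak, 4)  # kurtosis
kurt_flat = standardized_moment(no_peak, 4)
```

Passing `order=3` gives the skewness and `order=4` the kurtosis; in this example both are larger for the feature with a single sharp peak than for the feature without a dominant direction.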
[0058]
With this configuration, it is possible to easily obtain the reliability of the estimated value of the
sound source direction.
[0059]
(Evaluation Experiment) In order to verify the validity of the reliability evaluation measures
for the sound source direction estimation value in the acoustic signal processing methods
according to the first and second embodiments, the inventor of the present invention conducted
an experiment to investigate the relationship between the sound source direction estimation
results and the behavior of each reliability measure under a diffuse noise environment and a
directional noise environment.
The experimental results will be described below.
[0060]
The target signal is speech read aloud by a woman, extracted from a TI-digit voice database,
emitted from a speaker in a soundproof room, and re-recorded by two microphones arranged at an
interval of 10 cm. The target sound source was made to move continuously and smoothly. The noise
was white noise. As the diffuse noise, white noise that was uncorrelated between the observation
signals of the two microphones was added to the target signal. As the directional noise, white
noise emitted from a speaker arranged in the -15 ° direction was observed by the two
microphones.
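The difference between the two noise conditions can be illustrated with the following sketch. It is not the experimental setup itself: the Gaussian white noise, the integer inter-channel delay, the seed, and the function names are hypothetical choices made for illustration.

```python
import random


def diffuse_noise(n, rng):
    """Two channels of white noise drawn independently (uncorrelated channels)."""
    ch1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ch2 = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return ch1, ch2


def directional_noise(n, delay, rng):
    """One white-noise source seen by both channels with an inter-channel delay.

    delay is a non-negative number of samples; channel 2 is channel 1 delayed.
    """
    source = [rng.gauss(0.0, 1.0) for _ in range(n + delay)]
    ch1 = source[delay:]
    ch2 = source[:n]
    return ch1, ch2


rng = random.Random(0)  # fixed seed so the sketch is reproducible
d1, d2 = diffuse_noise(200, rng)
c1, c2 = directional_noise(200, 4, rng)
```

In the directional case the two channels carry the same waveform shifted in time, so the noise itself produces a peak in the cross-correlation at its own delay, whereas independent channels do not produce such a peak.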
[0061]
As parameters necessary for sound source direction estimation, the number of particles was fixed
at 500, and the variance σ^2 of the system noise in equation (6) was set to the optimum value
for each noise environment.
[0062]
FIG. 6 is a graph showing the sound source direction estimation result and related information
in the presence of diffuse noise, and FIG. 7 is a graph showing the sound source direction
estimation result and related information in the presence of directional noise.
The upper part of each figure shows the estimation result of the sound source direction. In
these upper graphs, for each frame, the true sound source direction is indicated by ○, the
sound source direction estimated by the method of the third embodiment is indicated by +, and
the sound source direction estimated by the method of the first embodiment is indicated by a
solid line. The middle part of each of FIG. 6 and FIG. 7 shows the difference between the true
sound source direction (○) and the estimated sound source direction (solid line), that is, the
sound source direction estimation error of the method of the first embodiment. The lower part of
each of FIG. 6 and FIG. 7 shows the signal-to-noise ratio in each frame. The smaller the
signal-to-noise ratio, the greater the energy of the noise. In this experiment, the sound source
direction was estimated in each of an environment in which diffuse noise exists (hereinafter
referred to as the "diffuse noise environment") and an environment in which directional noise
exists (hereinafter referred to as the "directional noise environment"), and the ESS, the crest
factor, the skewness, and the kurtosis in each frame were calculated as the reliability of the
estimated sound source direction. FIG. 8 corresponds to FIG. 6 and shows the calculation results
of the ESS, crest factor, skewness, and kurtosis in each frame under the diffuse noise
environment. FIG. 9 corresponds to FIG. 7 and shows the calculation results of the ESS, crest
factor, skewness, and kurtosis in each frame under the directional noise environment.
[0063]
Here, the crest factor will be described. The crest factor (CF), shown by the following equation
(17), is a second-order statistic that focuses on the particles near the maximum peak of the
sound space feature quantity, and is the ratio of the maximum value to the effective value.
Here, w^(max) is the weight of the particle having the largest weight among all particles.
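The two second-order measures can be sketched as follows. The crest factor follows the verbal definition above (maximum weight divided by the effective, that is, root-mean-square, value of the weights). For the ESS, equation (12) is not reproduced in this section, so the sketch assumes the definition commonly used with particle filters, 1 / Σ w_i^2 over normalized weights, which may differ from the form used in the embodiments. The function names and example weights are hypothetical.

```python
import math


def crest_factor(weights):
    """Maximum weight divided by the root-mean-square of all weights."""
    rms = math.sqrt(sum(w * w for w in weights) / len(weights))
    return max(weights) / rms


def effective_sample_size(weights):
    """Assumed common definition: 1 / sum of squared normalized weights."""
    total = sum(weights)
    normalized = [w / total for w in weights]
    return 1.0 / sum(w * w for w in normalized)


# Hypothetical weights: evenly spread versus dominated by one particle.
uniform = [0.25, 0.25, 0.25, 0.25]
peaked = [0.97, 0.01, 0.01, 0.01]

print(crest_factor(uniform), crest_factor(peaked))
print(effective_sample_size(uniform), effective_sample_size(peaked))
```

Both quantities involve only the squares of the weights, which is why they are described as second-order statistics in contrast to the skewness and kurtosis.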
[0064]
As shown in FIG. 6 and FIG. 7, under both the diffuse noise environment and the directional
noise environment, the error of the estimated sound source direction from the true sound source
direction is large between frame numbers 10 and 20, that is, in the region where the angle of
the sound source direction is large.
[0065]
As shown in FIG. 8, under the diffuse noise environment, the four reliability evaluation
measures (reliability estimation values) of the sound source direction estimation result behave
similarly.
That is, for all the measures, the numerical values drop in the range of frame numbers 10 to 15.
Since this range is included in the region where the above error is large, the results in FIG. 8
show that every measure indicates low reliability in the region where the error is large. In
other words, the reliability estimation works correctly for all of the measures. Comparing the
reliability evaluation measures in more detail, the dynamic range of the reliability is larger
(the drop is deeper) for the skewness and kurtosis, which are the third- and fourth-order
statistics. That is, although all four measures can be used as reliability evaluation measures
of the sound space feature quantity, it is desirable to use higher order statistics of third
order or higher.
[0066]
On the other hand, in the directional noise environment, as shown in FIG. 9, the skewness and
kurtosis, which are third or higher order statistics, behave completely differently from the ESS
and the crest factor, which are second-order statistics. That is, the numerical values of the
ESS and the crest factor rise in the range of frame numbers 10 to 13, whereas the numerical
values of the skewness and kurtosis drop at frame numbers 11 to 15. Since all of these ranges
are included in the region where the above error is large, the results in FIG. 9 show that the
ESS and the crest factor erroneously indicate high reliability in the region where the error is
large, whereas the skewness and kurtosis correctly indicate low reliability there. That is, the
second-order statistics (ESS and crest factor) fail in reliability estimation. For noise
arriving from a specific direction, it is inferred that the ESS, which expresses the degree of
variation of all the particles approximating the sound space feature quantity, and the crest
factor, which focuses on the maximum peak, are strongly affected by the noise and cannot express
the concentration of particles near the target sound source direction, which is what should be
evaluated. In contrast, the skewness and kurtosis, which are third or higher order statistics,
can correctly estimate the reliability of the sound source direction estimation result in the
directional noise environment in almost the same manner as in the diffuse noise environment.
From the above, it is understood that it is desirable to use a measure based on higher order
statistics of third or higher order as the reliability evaluation measure of the sound source
direction estimation result.
[0067]
(Other Embodiments) In the first to third embodiments described above, the configuration in
which the third or fourth order statistic of the sound space feature quantity is used as the
reliability of the sound source direction has been described, but the present invention is not
limited thereto. The order is not limited as long as the statistic is of third or higher order,
for example fifth or sixth order.
[0068]
Moreover, in the first to third embodiments described above, the configuration for calculating
the third or higher order statistic of the sound space feature quantity as the reliability of
the sound source direction has been described, but the present invention is not limited thereto.
A configuration may be adopted in which, instead of the statistic itself, a numerical value
obtained by appropriately processing the statistic, for example by normalizing the third or
higher order statistic of the sound space feature quantity, is used as the reliability of the
sound source direction. However, the reliability of the sound source direction needs to be a
numerical value that increases or decreases in accordance with the increase or decrease of the
statistic.
[0069]
Further, in the first and second embodiments described above, the configuration has been
described in which the sound space feature quantity is calculated after the acoustic signal is
band-limited by the band pass filter to a band where the influence of noise is small, but the
present invention is not limited thereto. The sound space feature quantity may be derived from
an acoustic signal that is not band-limited by the band pass filter.
[0070]
Further, in the first and second embodiments described above, the band pass filter is applied to
the acoustic signal read from the A / D converters 4 and 4 to limit its band, but the present
invention is not limited thereto. If the power spectrum of the noise is known or can be
estimated, it is preferable to apply the band pass filter to the difference between the acoustic
signal and a noise model and thereby band-limit that difference, in that the influence of the
noise can be eliminated.
[0071]
Further, in the first to third embodiments described above, the configuration in which the CPU 5
performs the acoustic signal processing by executing the computer program 51a has been
described, but the present invention is not limited to this. As long as equivalent processing
can be performed, the acoustic signal processing may be performed by hardware alone, such as an
application specific integrated circuit (ASIC) or a field programmable gate array (FPGA),
without executing a computer program. Alternatively, a computer program for acoustic signal
processing may be installed on a hard disk provided in a general-purpose personal computer, and
the CPU of the personal computer may execute the computer program to perform equivalent acoustic
signal processing.
[0072]
The acoustic signal processing apparatus, acoustic signal processing method, and computer
program according to the present invention are useful as an acoustic signal processing apparatus
and an acoustic signal processing method for estimating the reliability of a sound source
direction estimated from an observed acoustic signal, and as a computer program for causing a
computer to estimate the reliability of the sound source direction.
[0073]
DESCRIPTION OF SYMBOLS
1 acoustic signal processing apparatus
2 microphone
3 amplifier
4 A / D converter
5 CPU
6 sound source
51 ROM
51a computer program
51b data
52 RAM