JP2010066506

Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2010066506
PROBLEM TO BE SOLVED: To provide a sound pickup device capable of further suppressing noise.
SOLUTION: The device comprises a sound receiving unit 1 and a voice processing unit 2. The sound
receiving unit 1 outputs a pre-processing audio signal whose intensity corresponds to the sound
pressure of the input sound, a time differential signal whose intensity corresponds to the time
derivative of the sound pressure, and an x differential signal and a y differential signal
corresponding to the spatial derivatives of the sound pressure along each axis of a
two-dimensional orthogonal coordinate system. The voice processing unit 2 generates, as a
weighted sum of the pre-processing audio signal, the time differential signal, the x differential
signal and the y differential signal, a post-processing audio signal in which noise from positions
other than a predetermined target position is suppressed. The voice processing unit 2 updates, as
appropriate, the reference distance used to determine the weights of the weighted sum. Compared
with the case where the reference distance is always matched with the distance to the target
position, noise from positions other than the target position can be suppressed further.
[Selected figure] Figure 1
Sound pickup device
[0001]
The present invention relates to a sound collecting device.
[0002]
BACKGROUND ART Conventionally, there has been provided a sound collection device that
generates a processed audio signal in which a sound from a predetermined target position is
04-05-2019
1
selectively reflected by using an unprocessed audio signal output from a microphone or the like
(for example, Patent Literature 1 to 3).
As the target position, a position where it is considered that a sound source (such as a speaker) of
the target sound is present is selected.
[0003]
In addition, the space-time gradient method is known (see, for example, Non-Patent Document 1).
It uses sound receiving means that outputs a pre-processing audio signal whose intensity
corresponds to the sound pressure of the input sound, a time differential signal whose intensity
corresponds to the time derivative of the sound pressure, and an x differential signal and a y
differential signal corresponding to the spatial derivatives of the sound pressure along each axis
of a two-dimensional orthogonal coordinate system, and it generates a post-processing audio signal
as the weighted sum of these four signals. Suppose the weights used for the weighted sum are
determined by the MV (Minimum Variance) method, that is, so as to minimize the post-processing
power (the integral of the square of the intensity of the post-processing audio signal over a
predetermined weighted sum calculation time) under the constraint that the gain for sound from the
target position is constant. Then, in the resulting post-processing audio signal, sound from
positions other than the target position is suppressed and sound from the target position is
selectively reflected.
Non-Patent Document 1: Ono, Ando, "Measurement of sound field and directivity control", 22nd
Sensing Forum material, pp. 305-310, 2005.
SUMMARY OF THE INVENTION
[0004]
In this type of sound collection device, sounds in the post-processing audio signal that come from
positions other than the target position, such as ambient noise and reverberation (hereinafter
referred to as "noise"), should be as small as possible. For example, when the post-processing
audio signal output from the sound collection device is used for speech recognition, reducing the
noise in the post-processing audio signal improves recognition accuracy.
[0005]
The present invention has been made in view of the above, and an object thereof is to provide a
sound collection device capable of further suppressing noise.
[0006]
The invention according to claim 1 comprises: a sound receiving unit that outputs a pre-processing
audio signal whose intensity corresponds to the sound pressure of the input sound, a time
differential signal whose intensity corresponds to the time derivative of the sound pressure, and
an x differential signal and a y differential signal corresponding to the spatial derivatives of
the sound pressure along each axis of a two-dimensional orthogonal coordinate system; and a voice
processing unit that generates a post-processing audio signal as the weighted sum of the
pre-processing audio signal, the time differential signal, the x differential signal and the y
differential signal output from the sound receiving unit. The voice processing unit determines the
weights used for the weighted sum by the MV method, which minimizes the post-processing power (the
integral of the square of the intensity of the post-processing audio signal over a predetermined
weighted sum calculation time) under the constraint that the gain is constant for sound from a
reference position determined according to a predetermined target position where the target sound
is considered to be present. The reference position is selected on the straight line that starts at
the sound receiving unit and passes through the target position, and the voice processing unit
periodically performs an update operation in which the reference distance, which is the distance
between the sound receiving unit and the reference position, may be updated so as to minimize the
post-processing power.
[0007]
According to the present invention, compared to the case where the reference position used for
determining the load is always matched with the target position, it is possible to further suppress
noise from other than the target position in the processed audio signal.
[0008]
The invention of claim 2 is the invention according to claim 1, further comprising: a
pre-processing power calculation unit that calculates the pre-processing power, which is the
integral of the square of the intensity of the pre-processing audio signal over the weighted sum
calculation time; and a determination unit that compares the pre-processing power obtained by the
pre-processing power calculation unit with a predetermined sound volume threshold. During the
update operation, the voice processing unit does not change the reference distance if the
comparison in the determination unit shows that the pre-processing power is less than the sound
volume threshold.
[0009]
According to the present invention, it is possible to avoid that the reference distance is changed
when the sound pressure (volume) of the sound incident on the sound receiving unit is
insufficient.
[0010]
The invention of claim 3 is the invention according to claim 1 or 2, further comprising: a
pre-processing power calculation unit that calculates the pre-processing power, which is the
integral of the square of the intensity of the pre-processing audio signal over the weighted sum
calculation time; a post-processing power calculation unit that calculates the post-processing
power; and a determination unit that calculates the processing ratio, which is the ratio of the
post-processing power output by the post-processing power calculation unit to the pre-processing
power output by the pre-processing power calculation unit, and compares the obtained processing
ratio with a predetermined target sound threshold. During the update operation, the voice
processing unit does not change the reference distance if the processing ratio obtained by the
determination unit is equal to or higher than the target sound threshold.
[0011]
According to the present invention, it is possible to avoid that the reference distance is changed
when the noise is low.
[0012]
The invention according to claim 4 is the invention according to claim 1 or 2, further comprising:
an insensitive processing unit that generates, as a weighted sum of the pre-processing audio
signal, the time differential signal, the x differential signal and the y differential signal
output from the sound receiving unit, an insensitive audio signal in which sound from the target
position is not reflected; a pre-processing power calculation unit that calculates the
pre-processing power, which is the integral of the square of the intensity of the pre-processing
audio signal over the weighted sum calculation time; a post-insensitivity-processing power
calculation unit that calculates the post-insensitivity-processing power, which is the integral of
the square of the intensity of the insensitive audio signal over the weighted sum calculation
time; and a determination unit that calculates the insensitivity processing ratio, which is the
ratio of the post-insensitivity-processing power to the pre-processing power, and compares the
obtained insensitivity processing ratio with a predetermined noise threshold. During the update
operation, the voice processing unit does not change the reference distance if the insensitivity
processing ratio obtained by the determination unit is less than the noise threshold.
[0013]
According to the present invention, it is possible to avoid that the reference distance is changed
when the noise is low.
[0014]
According to the invention of claim 5, in the invention of claim 3, when the processing ratio
obtained by the determination unit during the update operation is less than the target sound
threshold, the voice processing unit updates the reference distance to a value that minimizes the
processing ratio.
[0015]
The invention according to claim 6 is the invention according to claim 4, further comprising a
post-processing power calculation unit that calculates the post-processing power. When the
insensitivity processing ratio obtained by the determination unit during the update operation is
equal to or higher than the noise threshold, the voice processing unit updates the reference
distance to a value that minimizes the ratio of the post-processing power output by the
post-processing power calculation unit to the post-insensitivity-processing power output by the
post-insensitivity-processing power calculation unit.
[0016]
According to the invention of claim 7, in the invention according to any one of claims 1 to 6,
when updating the reference distance, the voice processing unit determines, based on the
pre-processing audio signal, the time differential signal, the x differential signal and the y
differential signal output from the sound receiving unit, whether the number of sound sources is
one. When the number of sound sources is one, it estimates the distance to the sound source and
sets the estimated distance as the updated reference distance.
[0017]
The invention according to claim 8 is characterized in that, in the invention according to any one
of claims 5 to 7, the voice processing unit sets the change width of the reference distance in one
update operation to a predetermined upper limit width or less.
[0018]
According to the present invention, distortion of the post-processing audio signal caused by a
large change of the reference distance in a single update operation is suppressed, and the
discomfort caused by the sound converted from the post-processing audio signal is reduced.
[0019]
According to the invention of claim 9, in the invention according to any one of claims 5 to 7, the
operation that the voice processing unit may perform on the reference distance during the update
operation is one of the following: leaving the reference distance unchanged, increasing the
reference distance by a predetermined unit increment, or decreasing the reference distance by a
predetermined unit decrement.
[0020]
According to the present invention, distortion of the processed audio signal accompanying a
large change of the reference distance in one update operation is suppressed, and the discomfort
caused by the audio converted from the processed audio signal is reduced.
[0021]
According to the invention of claim 1, the voice processing unit determines the weights used for
the weighted sum by the MV method, which minimizes the post-processing power (the integral of the
square of the intensity of the post-processing audio signal over a predetermined weighted sum
calculation time) under the constraint that the gain is constant for sound from a reference
position determined according to the predetermined target position where the target sound is
considered to be present. The reference position is selected on a straight line starting from the
sound receiving unit and passing through the target position, and an update operation in which the
reference distance, which is the distance between the sound receiving unit and the reference
position, may be updated so as to minimize the post-processing power is performed periodically.
Therefore, noise from positions other than the target position can be suppressed further in the
post-processing audio signal than when the reference position used to determine the weights is
always matched with the target position.
[0022]
According to the invention of claim 2, the device includes a pre-processing power calculation unit
that calculates the pre-processing power, which is the integral of the square of the intensity of
the pre-processing audio signal over the weighted sum calculation time, and a determination unit
that compares the pre-processing power obtained by the pre-processing power calculation unit with
a predetermined sound volume threshold. The voice processing unit does not change the reference
distance if the comparison in the determination unit during the update operation shows that the
pre-processing power is less than the sound volume threshold. Therefore, changing the reference
distance can be avoided when the sound pressure (volume) of the sound incident on the sound
receiving unit is insufficient.
[0023]
According to the invention of claim 3, the device includes a pre-processing power calculation unit
that calculates the pre-processing power, which is the integral of the square of the intensity of
the pre-processing audio signal over the weighted sum calculation time, a post-processing power
calculation unit that calculates the post-processing power, and a determination unit that
calculates the processing ratio, which is the ratio of the post-processing power output by the
post-processing power calculation unit to the pre-processing power output by the pre-processing
power calculation unit, and compares the obtained processing ratio with a predetermined target
sound threshold. The voice processing unit does not change the reference distance if the
processing ratio obtained by the determination unit during the update operation is equal to or
higher than the target sound threshold. Therefore, changing the reference distance can be avoided
when the noise is low.
[0024]
According to the invention of claim 4, the device includes an insensitive processing unit that
generates, as a weighted sum of the pre-processing audio signal, the time differential signal, the
x differential signal and the y differential signal output from the sound receiving unit, an
insensitive audio signal in which sound from the target position is not reflected; a
pre-processing power calculation unit that calculates the pre-processing power, which is the
integral of the square of the intensity of the pre-processing audio signal over the weighted sum
calculation time; a post-insensitivity-processing power calculation unit that calculates the
post-insensitivity-processing power, which is the integral of the square of the intensity of the
insensitive audio signal over the weighted sum calculation time; and a determination unit that
calculates the insensitivity processing ratio, which is the ratio of the
post-insensitivity-processing power to the pre-processing power, and compares the obtained
insensitivity processing ratio with a predetermined noise threshold. The voice processing unit
does not change the reference distance if the insensitivity processing ratio obtained by the
determination unit during the update operation is less than the noise threshold. Therefore,
changing the reference distance can be avoided when the noise is low.
[0025]
According to the invention of claim 8, the voice processing unit keeps the change in the reference
distance in a single update operation at or below a predetermined upper limit width. According to
the invention of claim 9, the operation that the voice processing unit may perform on the
reference distance during the update operation is one of the following: leaving the reference
distance unchanged, increasing it by a predetermined unit increment, or decreasing it by a
predetermined unit decrement. In either case, distortion of the post-processing audio signal
caused by a large change of the reference distance in a single update operation is suppressed, and
the discomfort caused by the sound converted from the post-processing audio signal is reduced.
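The two ways of limiting the change of the reference distance described for claims 8 and 9 can be sketched as follows. This is only an illustration: the function names, the lower bound `r_min`, and the idea of passing the post-processing power as a caller-supplied function `post_power` are assumptions made for the example and do not appear in the description.

```python
def clamp_update(r_old, r_proposed, max_step):
    """Claim 8 style: limit the change of the reference distance per update."""
    delta = r_proposed - r_old
    # Keep the change within [-max_step, +max_step].
    delta = max(-max_step, min(max_step, delta))
    return r_old + delta


def step_update(r_old, post_power, inc, dec, r_min=0.05):
    """Claim 9 style: keep, increase by `inc`, or decrease by `dec`.

    `post_power` is a hypothetical caller-supplied function that returns the
    post-processing power for a candidate reference distance. The candidate
    with the lowest power is chosen. `r_min` is an assumed lower bound that
    keeps the distance positive.
    """
    candidates = [r_old, r_old + inc]
    if r_old - dec >= r_min:
        candidates.append(r_old - dec)
    return min(candidates, key=post_power)
```

With either rule, the reference distance moves by a bounded amount per update operation, which is the property the description relies on to limit distortion.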
[0026]
Hereinafter, the best mode for carrying out the present invention will be described with
reference to the drawings.
[0027]
As shown in FIG. 1, the present embodiment includes a sound receiving unit 1 that converts input
sound into an electric signal, and a voice processing unit 2 that uses the output of the sound
receiving unit 1 to generate a post-processing audio signal in which sound from a predetermined
target position is selectively reflected.
[0028]
The sound receiving unit 1 outputs a pre-processing audio signal whose intensity corresponds to
the sound pressure f of the input sound, a time differential signal whose intensity corresponds to
the time derivative ft of the sound pressure f, an x differential signal corresponding to the
spatial derivative fx of the sound pressure f in a predetermined x-axis direction, and a y
differential signal corresponding to the spatial derivative fy of the sound pressure f in a
predetermined y-axis direction orthogonal to the x-axis direction.
[0029]
Specifically, as shown in FIG. 2, the sound receiving unit 1 includes, for example, four
microphones 10A to 10D that are arranged at the vertices of a square and respectively generate
original audio signals of intensities fA to fD according to the sound pressure of the input sound,
and a space-time gradient processing unit (not shown) that generates the pre-processing audio
signal, the time differential signal, the x differential signal and the y differential signal from
the original audio signals output by the microphones 10A to 10D.
The microphones 10A to 10D and the space-time gradient processing unit can be realized by known
techniques, so detailed illustration is omitted.
[0030]
That is, the space-time gradient processing unit uses the intensities fA to fD of the outputs of
the microphones 10A to 10D and the length d of one side of the above-described square to obtain
the sound pressure f, the time derivative ft, and the spatial derivatives fx and fy by the
following equations (1) to (4).
[0031]
f = (fA + fB + fC + fD) / 4 (1)
ft = df / dt (2)
fx = {(fA + fB) - (fC + fD)} / (2d) (3)
fy = {(fA + fC) - (fB + fD)} / (2d) (4)
The voice processing unit 2 takes the weighted sum of the pre-processing audio signal, the time
differential signal, the x differential signal and the y differential signal output from the sound
receiving unit 1, thereby generating a post-processing audio signal in which sound from a
predetermined reference position is selectively reflected.
That is, the intensity of the processed audio signal is an intensity corresponding to the weighted
sum of the sound pressure f, the time differential value ft, and the space differential values fx and
fy.
The voice processing unit 2 can be realized by a known electronic circuit, so a detailed circuit
diagram and the like will be omitted.
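Equations (1) to (4) can be illustrated with a short sketch that works on sampled microphone signals. The function name, the use of Python lists, the sampling period `dt`, and the backward-difference approximation of the time derivative are assumptions made for this example; the description itself does not specify how ft is obtained in practice.

```python
def space_time_gradient(fA, fB, fC, fD, d, dt):
    """Return lists (f, ft, fx, fy) following equations (1) to (4).

    fA..fD: equally long lists of samples from microphones 10A to 10D.
    d: side length of the square [m]; dt: sampling period [s] (assumed).
    """
    n = len(fA)
    # Equation (1): mean of the four microphone signals.
    f = [(fA[i] + fB[i] + fC[i] + fD[i]) / 4.0 for i in range(n)]
    # Equations (3) and (4): finite differences across the square.
    fx = [((fA[i] + fB[i]) - (fC[i] + fD[i])) / (2.0 * d) for i in range(n)]
    fy = [((fA[i] + fC[i]) - (fB[i] + fD[i])) / (2.0 * d) for i in range(n)]
    # Equation (2), ft = df/dt, approximated by a backward difference.
    # The first sample has no predecessor, so its derivative is set to 0.
    ft = [0.0] + [(f[i] - f[i - 1]) / dt for i in range(1, n)]
    return f, ft, fx, fy
```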
[0032]
Here, if the input vector F is defined as F = (f, ft, fx, fy)^T and the weight vector W is defined
as W = (w, wt, wx, wy)^T, the intensity of the post-processing audio signal is expressed as W^H F.
[0033]
The well-known MV (Minimum Variance) method is used to determine the weight vector W used for the
above-mentioned weighted sum.
The MV method is used here to control directivity on the basis of the space-time gradient method.
The space-time gradient method was originally proposed as a method of determining the optical
flow, which is the apparent velocity field in a moving image (see Reference 1).
More specifically, the MV method performs a constrained optimization: under the constraint that
the gain of the post-processing audio signal is constant for sound from the reference position, it
minimizes the mean square (variance) E [(W^H F)^2] of the intensity of the post-processing audio
signal over a predetermined period.
As a result, sounds from positions other than the reference position are suppressed in the
post-processing audio signal.
[0034]
The method of determining the weight vector W is described concretely below (see References 2 to
4).
For simplicity, assume that there is one sound source, and consider a coordinate system whose
origin is the observation point, as shown in the figure.
Let the speed of sound be c, the coordinates of the sound source be (x, y, z), and the distance
between the sound source and the observation point be r = (x^2 + y^2 + z^2)^(1/2). Assuming that
the sound field formed at the position of the sound source is g, the sound field formed at the
observation point, that is, the sound pressure f, is expressed by the following equation.
[0035]
[0036]
The spatial derivatives (slopes) fx and fy in the x and y directions of the sound pressure f at the
observation point are
[0037]
[0038]
Here, ξx = x / r^2, ξy = y / r^2 (8) are called the intensity gradients, and τx = x / (cr),
τy = y / (cr) (9) are called the time gradients in the x and y directions.
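The equations referred to as (5) to (7) are not reproduced in this text. As a sketch only, assuming a point source that radiates a spherical wave (an assumption, chosen because it yields exactly the gradients defined in (8) and (9)), they would take the following form, where the derivatives are taken with respect to the position of the observation point:

```latex
\begin{align}
f(t)   &= \frac{1}{r}\, g\!\left(t - \frac{r}{c}\right) \\
f_x(t) &= \frac{x}{r^{2}}\, f(t) + \frac{x}{c\,r}\, f_t(t) = \xi_x f(t) + \tau_x f_t(t) \\
f_y(t) &= \frac{y}{r^{2}}\, f(t) + \frac{y}{c\,r}\, f_t(t) = \xi_y f(t) + \tau_y f_t(t)
\end{align}
```

The first term of each gradient comes from the 1/r amplitude decay and the second from the propagation delay r/c, which is why ξ and τ are called the intensity gradient and the time gradient.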
[0039]
When equations (6) and (7), which give the spatial gradients in the x and y directions of the
sound pressure f (t) at the observation point, are rewritten using the vector R = (-x, -y, -z)
from the sound source to the observation point (| R | = r),
[0040]
[0041]
is obtained.
Next, when f (t), ft (t) and ∇f (t) are observed, their weighted sum is
[0042]
[0043]
It is expressed as
Here, w and wt are real constants, and WS = (wx, wy, 0) is a unit vector.
Substituting equation (10) into equation (11),
[0044]
[0045]
is obtained.
Therefore, the weighted sum of the space-time gradients is expressed as the sum of f (t) and
ft (t) passed through filters having different directional characteristics H (R) and Ht (R),
respectively.
When H (R) = α, equation (13) can be transformed into
[0046]
[0047]
Here, in general, assuming that an angle formed by two vectors a and b is θ, the following
formula is established.
[0048]
[0049]
Using the formula of equation (18), equation (16) can be rewritten as the following equation.
[0050]
[0051]
Here, since | WS | = 1,
[0052]
[0053]
is obtained, which is the equation of a sphere.
In the case of w + α = 0, equation (15) becomes R · WS = 0 (22)
Also, when Ht (R) = α, equation (14) is
[0054]
[0055]
Since the angle between vector R and WS is θ (R),
[0056]
[0057]
holds.
Therefore, equation (23) becomes
[0058]
[0059]
[0060]
From equations (21), (22) and (25), the following properties are obtained for H (R) and Ht (R).
1) The two directional characteristics H (R) and Ht (R) are rotationally symmetric about the axis
WS.
2) When H (R) = 0, the distribution of R forms a spherical surface of diameter 1 / w (w ≠ 0) or a
plane (w = 0).
3) When Ht (R) = 0, the distribution of R forms a conical surface of apex angle 2 cwt (wt ≠ 0) or
a plane (wt = 0).
4) The intersection of the distributions of R for H (R) = 0 and Ht (R) = 0 forms a circle or a
plane.
The directional characteristics H (R) and Ht (R) described above are replaced by H1 (R1) and
H2 (R1), respectively, and defined as follows.
Here, R1 is the position vector of the reference position, r1 = | R1 | is the distance from the
observation point to the reference position (hereinafter referred to as the "reference
distance"), and n1x and n1y are the x component and the y component of the unit vector R1 / r1 in
the same direction as the vector R1, with n1x^2 + n1y^2 = 1.
[0061]
[0062]
Furthermore, the two constraints of the following equations (28) and (29) are imposed on H1 (R1)
and H2 (R1).
W^H H1 (R1) = p (28)
W^H H2 (R1) = q (29)
Here, p and q are positive real constants.
Then, for sound from a sound source at the reference position indicated by the vector R1, the gain
of the weighted sum using the weight vector W takes the fixed value p + jωq. To compensate for
this, the voice processing unit 2 shown in the figure is provided with a first-order low-pass
filter 22 having the characteristic (p + jωq)^-1 at the stage following the weighted sum
calculation unit 21, which takes the weighted sum.
As a result, the gain of the voice processing unit 2 as a whole is 1 for sound from the sound
source at the reference position indicated by the vector R1.
In this embodiment, p = 1 / r1 and q = 1 / c.
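The compensation by the first-order low-pass filter can be checked numerically with a small sketch. The function name and the default speed of sound of 340 m/s are assumptions made for the example; only the characteristic (p + jωq)^-1 with p = 1 / r1 and q = 1 / c comes from the description.

```python
def compensation_gain(omega, r1, c=340.0):
    """Complex gain of the filter (p + j*omega*q)^-1 with p = 1/r1, q = 1/c.

    omega: angular frequency [rad/s]; r1: reference distance [m];
    c: speed of sound [m/s] (340.0 is an assumed default).
    """
    p = 1.0 / r1
    q = 1.0 / c
    return 1.0 / complex(p, omega * q)
```

Multiplying this gain by the weighted-sum gain p + jωq gives 1 at every frequency, which is the overall unity gain stated above. The magnitude falls to r1 / sqrt(2) at ω = c / r1, so under these definitions the corner frequency of the filter moves with the reference distance.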
[0063]
Then, a weight vector W that suppresses sound (noise) from positions other than the reference
position as much as possible, without suppressing sound from the reference position in the
post-processing audio signal, is obtained by the Minimum Variance beamformer (MV method): under
the constraints of equations (28) and (29), W is chosen to minimize the power Pc of the
post-processing audio signal in the observation time interval Γ (hereinafter referred to as the
"post-processing power"), given by equation (30).
[0064]
[0065]
That is, the observation time interval Γ corresponds to the weighted sum calculation time in the
claims.
The solution is expressed by the following equations (31) and (32).
[0066]
[0067]
Here, Bij (i, j = a, x, y) in equation (32) is given by equation (33), and ba (t), bx (t) and
by (t) in equation (33) are given by equations (34) to (36), respectively.
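Equations (31) to (36) are not reproduced in this text, so the following sketch does not implement them. Instead it uses the standard closed form for a linearly constrained minimum variance problem, W = R^-1 C (C^T R^-1 C)^-1 d, on real-valued synthetic data. The constraint vectors `h1` and `h2`, the random stand-in samples, the numerical values of `r1`, `c`, `n1x` and `n1y`, and all function names are assumptions made for illustration.

```python
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            k = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= k * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mv_weights(R, h1, h2, p, q):
    """Minimize W^T R W subject to W.h1 = p and W.h2 = q."""
    u1, u2 = solve(R, h1), solve(R, h2)  # R^-1 h1 and R^-1 h2
    G = [[dot(h1, u1), dot(h1, u2)],
         [dot(h2, u1), dot(h2, u2)]]
    lam = solve(G, [p, q])  # multipliers that enforce both constraints
    return [lam[0] * a + lam[1] * b for a, b in zip(u1, u2)]


def covariance(samples):
    """Average of F F^T over the observation window (4x4 matrix)."""
    n = len(samples)
    return [[sum(F[i] * F[j] for F in samples) / n for j in range(4)]
            for i in range(4)]


# Synthetic stand-in data for F = (f, ft, fx, fy); not measurement data.
rng = random.Random(0)
samples = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(200)]
R = covariance(samples)

r1, c = 0.5, 340.0        # assumed reference distance [m] and speed of sound
n1x, n1y = 0.6, 0.8       # assumed direction components of the reference position
h1 = [1.0, 0.0, n1x / r1, n1y / r1]  # assumed form of H1(R1)
h2 = [0.0, 1.0, n1x / c, n1y / c]    # assumed form of H2(R1)
p, q = 1.0 / r1, 1.0 / c
W = mv_weights(R, h1, h2, p, q)
power = dot(W, [dot(row, W) for row in R])  # average post-processing power
```

The resulting `W` satisfies both constraints, and its output power is no larger than that of any other weight vector that satisfies them, for example the omnidirectional choice (p, q, 0, 0).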
[0068]
Here, according to the inventor's simulations, both when the reference position S is on the z axis
as shown in FIG. 2 and FIGS. 3 (a) and (b), and when the reference position S is off the z axis as
shown in FIG. 4 and FIG. 5, sound from a sound source on the straight line connecting the
observation point O (that is, the position of the sound receiving unit 1 and the origin of the
above coordinate system) and the reference position S is hardly suppressed in the post-processing
audio signal.
That is, even if the reference position S does not coincide with the target position, where the
source of the sound to be selectively retained in the post-processing audio signal is considered
to be, the goal of avoiding suppression of sound from the sound source at the target position is
achieved as long as the reference position S lies on the straight line connecting the observation
point O and the target position.
In other words, the reference distance r1 does not necessarily have to match the distance between
the observation point and the target position (hereinafter referred to as the "target distance").
Even if the reference distance r1 and the target distance differ, sound from the sound source at
the target position is not suppressed in the post-processing audio signal.
In each of the simulations shown in FIGS. 2 to 5, there is one sound source.
FIGS. 3 (a) and 3 (b) and FIG. 5 show the relationship between the amount by which sound is
suppressed in the post-processing audio signal (hereinafter referred to as the "noise suppression
amount") and the position of the sound source. The darker a position is drawn, the higher the
noise suppression amount for sound from a sound source assumed to be at that position.
In FIG. 3 (a) and FIG. 5, the reference distance r1 is 0.5 m, and in FIG. 3 (b), the reference
distance r1 is 1 m.
[0069]
Furthermore, according to the inventor's simulations, the noise suppression amount is relatively
high at points, other than the reference position S, on the spherical surface (white line in
FIG. 5) whose center is the observation point O and whose radius is the reference distance r1.
Consequently, the noise suppression amount in the post-processing audio signal when a noise source
is located off the straight line connecting the observation point and the reference position
depends on the reference distance r1.
For example, consider the case where the reference position S is taken on the z axis.
As shown in FIG. 6, when the noise source N lies in a direction at an angle φ of 30° to the z axis
and at a distance of 1.0 m, the noise suppression amount is, as shown in FIG. 7 and FIG. 8,
14.5 dB when the reference distance r1 is 0.5 m, and 25.6 dB when the reference distance r1 is
1 m, the same as the distance to the noise source N.
FIG. 8 (a) shows the waveform of the pre-processing audio signal, FIG. 8 (b) shows the waveform of
the post-processing audio signal when the reference distance r1 is 0.5 m, and FIG. 8 (c) shows the
waveform of the post-processing audio signal when the reference distance r1 is 1.0 m.
Further, as shown in FIG. 9, suppose there are two noise sources N1 and N2 in the direction at an
angle φ of 45° to the z axis, with the noise source N1 at 1.0 m and the noise source N2 at 0.5 m
from the observation point. Even when the two noise sources N1 and N2 are separated from each
other in this way, the noise suppression amount, as shown in the figure, peaks when the reference
distance r1 is set to the distance from the observation point to the midpoint between the two
noise sources N1 and N2.
That is, if the reference distance r1 is changed appropriately, sound (noise) from positions other
than the target position may be suppressed more than when the reference position is always
matched with the target position.
[0070]
Therefore, in the present embodiment, the reference distance r1 used for determining the load
vector W is updated as needed so that sounds (noises) from other than the target position can be
further suppressed.
[0071]
Specifically, the voice processing unit 2 performs an update operation of updating the load
vector W at predetermined time intervals, and may update the reference distance r1 at this time.
The specific content of the update operation will be described using the flowchart of FIG.
[0072]
When the voice processing unit 2 starts the update operation (S1), it first determines whether the
volume input to the sound receiving unit 1 is sufficient (S2).
That is, the present embodiment includes a pre-processing power calculation unit 31 that
calculates the power Pf of the pre-processing audio signal in the observation window determined
based on the start time of the update operation (hereinafter referred to as the "pre-processing
power")
[0073]
[0074]
and a determination unit 30 that compares the obtained pre-processing power Pf with a
predetermined silence threshold.
When the determination unit 30 determines that the pre-processing power Pf is less than the
silence threshold, the voice processing unit 2 judges that the input volume is insufficient and
unsuitable for updating the reference distance r1 or the weight vector W, and ends the update
operation (S3).
On the other hand, when the determination unit 30 determines that the pre-processing power Pf is
equal to or higher than the silence threshold, the voice processing unit 2 judges that the volume
is sufficient and first determines the weight vector W by the MV method using the reference
distance r1 before updating (S4).
[0075]
After determining the load vector W, the voice processing unit 2 determines whether noise is
present (S5).
That is, the present embodiment includes the post-processing power calculation unit 32, which
calculates, using Equation (30), the power (post-processing power) Pc of the post-processing
audio signal obtained using the load vector W determined in step S4. The determination unit 30
compares the ratio Pc/Pf of the post-processing power Pc output by the post-processing power
calculation unit 32 to the pre-processing power Pf output by the pre-processing power
calculation unit 31 (hereinafter referred to as the "processing ratio") with a predetermined
target sound threshold.
If the determination unit 30 determines that the processing ratio is less than the target sound
threshold, that is, if the sound pressure of the post-processing audio signal as a whole is lower
than that of the pre-processing audio signal, the voice processing unit 2 determines that noise is
present; if the determination unit 30 determines that the processing ratio is equal to or higher
than the target sound threshold, the voice processing unit 2 determines that noise is not present.
When it is determined that noise is not present because the determination unit 30 determines
that the processing ratio is equal to or higher than the target sound threshold, there is no
point in updating the reference distance, so the voice processing unit 2 proceeds directly to step
S3 and ends the update operation.
On the other hand, when it is determined that noise is present, the voice processing unit 2
determines whether the number of sound sources is one (S6).
The above target sound threshold is a positive constant smaller than one.
Since the determination unit 30, the pre-processing power calculation unit 31, and the
post-processing power calculation unit 32 can each be realized by well-known electronic circuits,
detailed illustration is omitted.
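The decisions of steps S2 and S5 can be illustrated by the following minimal Python sketch. It is not the circuit of the embodiment: the power is assumed here to be the mean squared sample value over the observation window (the defining equation for Pf is not reproduced in this text), and the function names and threshold values are hypothetical.

```python
def mean_power(samples):
    """Mean squared value of a signal over the observation window (assumed definition)."""
    return sum(s * s for s in samples) / len(samples)


def classify_window(pre, post, silence_threshold, target_sound_threshold):
    """Return 'silent', 'no_noise' or 'noise' following steps S2 and S5.

    pre  : samples of the pre-processing audio signal in the window
    post : samples of the post-processing audio signal in the same window
    """
    pf = mean_power(pre)            # pre-processing power Pf
    if pf < silence_threshold:      # S2: input too quiet, do not update
        return "silent"
    pc = mean_power(post)           # post-processing power Pc
    ratio = pc / pf                 # processing ratio Pc / Pf
    # S5: a ratio below the target sound threshold means that much of the
    # input power did not come from the target position, so noise is present.
    return "noise" if ratio < target_sound_threshold else "no_noise"
```

With a target sound threshold of 0.5, for example, a window whose power falls to a quarter after processing is classified as containing noise, whereas a window that keeps most of its power is not.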
[0076]
The specific method of the determination in step S6 will be described.
The voice processing unit 2 calculates the covariance matrix S estimated from the observation
window
[0077]
[0078]
and estimates the number of sound sources by computing its rank, rank(S).
Such estimation of the number of sound sources is described in Reference 4. That is, if
rank(S) is 2, the number of sound sources is 1; if rank(S) is 3, the number of sound sources
is 2; and if rank(S) is 4, the number of sound sources is 3 or more.
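As a rough illustration of the rank test of step S6, the following Python sketch builds a 4x4 matrix of time-averaged products from four sampled channels and counts its numerical rank by Gaussian elimination. The channel order (f, ft, fx, fy), the function names, and the relative tolerance are assumptions made for this sketch; with measured signals, how to choose the tolerance is a separate question that the sketch does not address.

```python
def covariance(channels):
    """Matrix of time-averaged products of the given channels, e.g. (f, ft, fx, fy)."""
    n = len(channels[0])
    return [[sum(a[k] * b[k] for k in range(n)) / n for b in channels]
            for a in channels]


def numerical_rank(matrix, rel_tol=1e-9):
    """Rank by Gaussian elimination with partial pivoting (tolerance is an assumption)."""
    m = [row[:] for row in matrix]
    size = len(m)
    scale = max(abs(v) for row in m for v in row) or 1.0
    rank = 0
    for col in range(size):
        pivot = max(range(rank, size), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) <= rel_tol * scale:
            continue                      # no usable pivot in this column
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, size):
            factor = m[r][col] / m[rank][col]
            for c in range(col, size):
                m[r][c] -= factor * m[rank][c]
        rank += 1
        if rank == size:
            break
    return rank


def sources_from_rank(rank):
    """Mapping stated in the description: rank 2 -> 1, rank 3 -> 2, rank 4 -> 3 or more."""
    return {2: "1", 3: "2", 4: "3 or more"}.get(rank)
```

When the two spatial-derivative channels are exact linear combinations of f and ft, as for a single source, the computed rank is 2.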
[0079]
When it is determined in step S6 that the number of sound sources is one, the voice processing
unit 2 calculates the distance to the sound source, updates the reference distance r1 with the
obtained distance as the new reference distance, calculates the load vector W again according to
the new reference distance r1 and updates it (S7), and then proceeds to step S3 and ends the
update operation. A specific method of calculating the distance to the sound source will be
described (see Reference 5). First, τx, τy, ξx and ξy in equations (8) and (9) are determined by
the method of least squares. Let the evaluation function in a short observation window Γ be
$$J = \int_{\Gamma} \left\{ (f_x + \xi_x f + \tau_x f_t)^2 + (f_y + \xi_y f + \tau_y f_t)^2 \right\} dt \qquad (39)$$
Partially differentiating equation (39) with respect to τx, τy, ξx and ξy and setting the
derivatives to 0 yields the following equations.
[0080]
[0081]
Using the matrix elements of the covariance matrix S of equation (21), equations (40) and
(41) can be rewritten as follows.
[0082]
[0083]
By solving the equations (42) and (43), τx, τy, ξx and ξy can be obtained as the following
equations.
[0084]
[0085]
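The least-squares step for one axis can be sketched in Python as follows. The sketch minimizes the sum over the window of (fx + ξx f + τx ft)^2 by solving the 2x2 normal equations directly; it is a generic least-squares solution written for illustration, not a reproduction of equations (44) and (45), which are not available in this text. The function name and the use of plain sums in place of covariance-matrix elements are assumptions.

```python
def solve_xi_tau(f, ft, fg):
    """Least-squares (xi, tau) minimizing sum((fg + xi*f + tau*ft)**2).

    f  : sound pressure samples
    ft : time-derivative samples
    fg : spatial-derivative samples along one axis (fx or fy)
    """
    s_ff = sum(a * a for a in f)
    s_ft = sum(a * b for a, b in zip(f, ft))
    s_tt = sum(b * b for b in ft)
    s_fg = sum(a * g for a, g in zip(f, fg))
    s_tg = sum(b * g for b, g in zip(ft, fg))
    # Normal equations:
    #   s_ff * xi + s_ft * tau = -s_fg
    #   s_ft * xi + s_tt * tau = -s_tg
    det = s_ff * s_tt - s_ft * s_ft
    if det == 0:
        raise ValueError("f and ft are linearly dependent in this window")
    xi = (s_ft * s_tg - s_tt * s_fg) / det
    tau = (s_ft * s_fg - s_ff * s_tg) / det
    return xi, tau
```

Calling the function once with the fx samples and once with the fy samples gives (ξx, τx) and (ξy, τy) respectively.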
The distance r to the sound source can be obtained by applying the method of least squares to
equations (8) and (9).
The evaluation function is
[0086]
[0087]
and differentiating it with respect to 1/r and setting the result to 0 gives
[0088]
[0089]
Solving this, the distance r to the sound source is obtained as follows.
[0090]
[0091]
In step S7, the voice processing unit 2 sets the distance r obtained by the equations (38), (44),
(45) and (48) as a new reference distance r1.
[0092]
When it is determined in step S6 that the number of sound sources is two or more, the voice
processing unit 2 searches for the reference distance r1 that maximizes the noise suppression
amount (that is, minimizes the post-processing power Pc and the processing ratio) (S8), updates
the reference distance r1 to the value obtained in step S8, calculates the load vector W again
according to the new reference distance r1 and updates it (S9), and then proceeds to step S3 to
end the update operation.
Specifically, the reference distance r1 is increased or decreased by a predetermined unit
width, the load vector W is determined, and the post-processing power Pc obtained from the
post-processing power calculation unit 32 is stored. This is repeated until the post-processing
power Pc increases (that is, the noise suppression amount decreases) whichever direction the
reference distance r1 is changed in, and the reference distance r1 is updated to the value
reached at that point.
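One way to picture the search of step S8 is the following Python sketch of a step-wise local search. The callable post_power, which stands for "determine the load vector W for a given reference distance and return the resulting post-processing power Pc", and the bounds r_min and r_max are hypothetical names introduced for the sketch; the description itself does not specify bounds.

```python
def search_reference_distance(post_power, r_start, step, r_min, r_max):
    """Move r by +/- step while the post-processing power keeps decreasing.

    post_power : callable mapping a reference distance to Pc (hypothetical)
    Returns the reference distance at which moving in either direction
    no longer lowers Pc.
    """
    r = r_start
    best = post_power(r)
    while True:
        # Try one unit width in each direction, staying inside the bounds.
        candidates = [c for c in (r - step, r + step) if r_min <= c <= r_max]
        if not candidates:
            return r
        power, candidate = min((post_power(c), c) for c in candidates)
        if power >= best:
            return r          # Pc rises (or stays equal) in both directions
        r, best = candidate, power
```

Such a search finds a local minimum of Pc only; if Pc has several minima as a function of the reference distance, the result depends on the starting value.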
[0093]
According to the above configuration, noise can be suppressed more than when the reference
distance is always set as the target distance.
[0094]
In addition, since this method is based on the space-time gradient method, it is possible to
miniaturize the sound receiving unit 1 as compared with a general beam former or superdirective
microphone, and since all calculations are performed in the time domain, computational costs can
be reduced compared to other schemes that require computation in the frequency domain.
[0095]
As shown in FIG. 12, an insensitivity processing unit 33 that generates an insensitive audio
signal in which the sound from the target position is not reflected (that is, for which the target
position is an insensitive point) and a post-insensitivity processing power calculation unit 34
that calculates the power Pz of the insensitive audio signal (hereinafter referred to as the
"post-insensitivity processing power") may be provided, and in step S5 the determination unit 30
may compare the ratio Pz/Pf of the post-insensitivity processing power Pz output from the
post-insensitivity processing power calculation unit 34 to the pre-processing power Pf output from
the pre-processing power calculation unit 31 (hereinafter referred to as the "insensitive
processing ratio") with a predetermined noise threshold.
In this case, if the determination unit 30 determines that the insensitive processing ratio is
equal to or higher than the noise threshold, sounds from other than the target position are
present to some extent, so the voice processing unit 2 determines that noise is present. If the
determination unit 30 determines that the insensitive processing ratio is less than the noise
threshold, the voice processing unit 2 determines that no noise is present.
The above noise threshold is a positive constant less than one.
The insensitivity processing unit 33 and the post-insensitivity processing power calculation
unit 34 can be realized by well-known electronic circuits in the same manner as the voice
processing unit 2 and the like, so detailed circuit diagrams and the like are omitted.
The operation of the insensitivity processing unit 33 will be described in detail. In H1(R1)
and H2(R1) defined by equations (26) and (27), the vector R1 indicating the reference position
is replaced with the vector R0 indicating the target position, and two constraints are imposed
as in the following equations (49) and (50).
$$W_Z^H H_1(R_0) = 0 \qquad (49)$$
$$W_Z^H H_2(R_0) = 0 \qquad (50)$$
The insensitive audio signal obtained as the weighted sum $W_Z^H F(t)$ using a load vector
$W_Z = (w' \; w'_t \; w'_x \; w'_y)^T$ satisfying these conditions (hereinafter referred to as the
"dead point load vector") forms an insensitive point at the target position. The
post-insensitivity processing power Pz is expressed by the following equation.
[0096]
[0097]
Further, in the example of FIG. 12, in step S8 the reference distance r1 may be searched for so
as to minimize the ratio Pc/Pz of the post-processing power Pc output from the post-processing
power calculation unit 32 to the post-insensitivity processing power Pz output from the
post-insensitivity processing power calculation unit 34.
[0098]
Further, in step S7 or step S9, an upper limit (hereinafter referred to as the "upper limit
width") may be set on the amount by which the reference distance r1 changes per update
operation, and when the absolute value of the difference between the reference distance obtained
in step S7 or step S8 and the reference distance r1 before updating exceeds the upper limit
width, the reference distance r1 may be changed by only the upper limit width.
By adopting this configuration, distortion of the processed audio signal due to a large change in
the reference distance r1 can be suppressed, and discomfort caused by the audio obtained by
converting (reproducing) the processed audio signal can be reduced.
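The upper-limit rule amounts to clamping the change of the reference distance, as in the following minimal Python sketch. The function and parameter names are hypothetical, and the numeric values in the usage note are chosen only for illustration.

```python
import math


def limit_update(r_old, r_new, upper_limit_width):
    """Limit the change of the reference distance in one update operation.

    r_old : reference distance r1 before updating
    r_new : reference distance obtained in step S7 or step S8
    """
    delta = r_new - r_old
    if abs(delta) > upper_limit_width:
        # Move toward r_new, but only by the upper limit width.
        return r_old + math.copysign(upper_limit_width, delta)
    return r_new
```

For instance, with r_old = 1.0 and an upper limit width of 0.25, a computed distance of 2.0 results in 1.25, while a computed distance of 0.9 is adopted unchanged.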
[0099]
Alternatively, in step S7 or step S9, the amount by which the reference distance r1 changes in
one update operation may be fixed. Specifically, for example, the voice processing unit 2
performs, in one update operation, whichever one of three operations (leaving the reference
distance r1 unchanged, increasing the reference distance r1 by a unit increase width, or
decreasing the reference distance r1 by a unit decrease width) brings the reference distance r1
after the end of the update operation closest to the reference distance obtained by the
calculation in step S7 or the search in step S8. In this case, the unit increase width and the
unit decrease width may be different from each other. If this configuration is adopted,
distortion of the processed audio signal due to a large change in the reference distance is
suppressed, and discomfort caused by the audio reproduced from the processed audio signal is
reduced.
<References list>
Reference 1: Shigeru Ando, "Velocity measurement system using spatio-temporal differentiation
of images", Proceedings of the Society of Measurement and Automatic Control, 22-12, 1330/1336
(1986).
Reference 2: N. Ono, T. Arita, Y. Senjo, and S. Ando, "Directivity steering principle for
biomimicry silicon microphone", Proc. Int. Conf. Solid State Sensors, Actuators, and
Microsystems (Transducers '05), pp. 792-795, 2005.
Reference 3: Ono, Ando, "Measurement of sound field and directivity control", The 22nd Sensing
Forum Material, pp. 305-310, 2005.
Reference 4: Ono, Arita, Chika, Ando, "Theories of directional control and source separation
based on spatiotemporal gradient measurement", Proceedings of the 2005 Acoustical Society of
Japan Acoustical Society Conference, 2-6-13, pp. 607-608, 2005.
Reference 5: Shigeru Ando, Hiroyuki Shibata, Katsuya Ogawa, Kun Mitsuyama, "3D Sound Source
Localization Sensor System Based on Spatio-temporal Gradient Method", Proceedings of the Society
of Measurement and Control Engineers, Vol. 1993
[0100]
FIG. 1 is a block diagram illustrating an embodiment of the present invention.
An explanatory drawing showing an example of the positional relationship between the reference
position and the sound receiving unit in the same embodiment.
(a) and (b) are explanatory drawings each showing an example of the distribution of the noise
suppression amount in the case of FIG. 2; (a) shows the case where the reference distance is
0.5 m, and (b) shows the case where the reference distance is 1.0 m.
An explanatory drawing showing another example of the positional relationship between the
reference position and the sound receiving unit in the same embodiment.
An explanatory drawing showing an example of the distribution of the noise suppression amount
in the case of FIG.
An explanatory drawing showing an example of the positional relationship between the position
of a noise source and the sound receiving unit in the same embodiment.
An explanatory drawing showing the relationship between the reference distance and the noise
suppression amount in the case of FIG.
(a) to (c) are explanatory drawings showing the operation of the same embodiment; (a) shows the
waveform of the audio signal before processing, (b) shows the waveform of the processed audio
signal when the reference distance is 0.5 m, and (c) shows the waveform of the processed audio
signal when the reference distance is 1.0 m.
An explanatory drawing showing another example of the positional relationship between the
position of a noise source and the sound receiving unit in the same embodiment.
An explanatory drawing showing the relationship between the reference distance and the noise
suppression amount in the case of FIG.
A flowchart showing the update operation in the same embodiment.
A block diagram showing another form of the same embodiment.
Explanation of symbols
[0101]
1 sound receiving unit
2 voice processing unit
30 determination unit
31 pre-processing power calculation unit
32 post-processing power calculation unit
33 insensitivity processing unit
34 post-insensitivity processing power calculation unit