Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2017059956
Abstract: To provide a sound source extraction system and the like that can reliably extract a
sound propagating from a target sound source regardless of the directivity of the target sound
source and of the disturbing sound sources present in an extraction region. In the sound source
extraction system according to the present invention, a matrix is calculated in advance, based on
output signals from a plurality of microphones (1a, 1b) that collect sound propagating from an
extraction area divided into a plurality of unit areas. The matrix includes as elements a
plurality of transfer functions of sound propagation from each unit area, corresponding to the
position of each sound source (30, 31a, 31b), to each microphone. One inverse matrix is obtained
from this matrix, and the sound propagating from the target sound source (30) is extracted using
the inverse matrix. [Selected figure] Figure 6
Sound source extraction system and sound source extraction method
[0001]
The present invention relates to a sound source extraction system and a sound source extraction
method for extracting a sound propagating from a target sound source using a plurality of
microphones.
[0002]
In general, various techniques are known for extracting only the sound propagating from a
specific target sound source in a space in which various sound sources exist (see Patent
Document 1).
03-05-2019
1
Among them, the beamforming method in particular collects the sound propagated from a target
sound source present at a predetermined position using a microphone array having a plurality of
microphones, and identifies by arithmetic processing the direction in which the target sound
source is located. This makes it possible to separate and extract the target sound source from
other sound sources. By applying such a beamforming method, the target sound source can be
followed not only when it is stationary but also when it is moving.
[0003]
Patent Document 1: JP 2013-183358
[0004]
When the above-described beamforming is applied, if a disturbing sound source is present in the
same direction as the target sound source as viewed from the microphone array, the target sound
source and the disturbing sound source may become impossible to separate.
In this case, narrowing the width of the beam from the microphone array toward the target sound
source helps separate the target sound source from the disturbing sound source and enhances the
spatial resolution of the sound. However, narrowing the beam width in the beamforming method is
not only restricted by the microphones and by various acoustic parameters, but also inevitably
increases the amount of calculation needed to enhance the spatial resolution. In addition,
although multiple microphone arrays could be installed to monitor the target sound source from
various directions, each microphone array would still extract the target sound source
independently, so the above problem is difficult to avoid if a disturbing sound source lies
within the beam width.
[0005]
The present invention has been made to solve these problems, and aims to provide a sound source
extraction system and the like capable of securing, by arithmetic processing, better performance
than the beamforming method when extracting a sound from a target sound source.
[0006]
In order to solve the above problems, a sound source extraction system according to the present
invention is a sound source extraction system for extracting a sound propagating from a target
sound source, comprising: a plurality of microphones that are distributed outside a predetermined
extraction area divided into a plurality of unit areas and that collect sounds propagating from
one or more sound sources including the target sound source; and arithmetic means that calculates
in advance, based on the output signal of each of the plurality of microphones, a matrix including
as elements a plurality of transfer functions of sound propagation from each of the plurality of
unit areas, associated with the positions of the one or more sound sources, to each of the
plurality of microphones, determines one inverse matrix from the matrix, and uses this inverse
matrix to extract the sound generated from the target sound source.
[0007]
According to the sound source extraction system of the present invention, the extraction area in
which the target sound source and the disturbing sound source are present is divided into a
plurality of unit areas, a transfer function of sound propagation is obtained in advance for the
position of each unit area based on the output signals of the plurality of microphones, and the
sound propagated from the target sound source is extracted using one inverse matrix that gives
the inverse characteristic of those transfer functions.
Therefore, even when the target sound source and the disturbing sound source lie in the same
direction as seen from each microphone, the target sound source can be reliably separated from
the disturbing sound source and extracted based on the output signals of the microphones
distributed around the entire extraction area, without complicated calculations such as the sound
source separation used with the conventional beamforming method.
[0008]
In the present invention, a plurality of microphone arrays, each provided with a predetermined
number of the microphones included in the plurality of microphones, may further be provided at
different positions outside the extraction area.
In this case, the plurality of microphone arrays in the sound source extraction system constitute
a so-called array of arrays. For example, if each of L microphone arrays has M microphones, a
total of L × M microphones will be installed. Even with such an arrangement, one inverse matrix
is generated for the entire system, so in addition to the effect of separating a target sound
source and a disturbing sound source that lie in the same direction as described above, the
system is highly useful in that it can be installed easily. Note that it is desirable that the
plurality of microphone arrays be distributed near the outer edge of the extraction region rather
than being concentrated unevenly outside it.
[0009]
In the present invention, a spherical microphone array in which the predetermined number of
microphones are arranged on the surface of a spherical baffle can be used as each of the
plurality of microphone arrays. The spherical microphone array is advantageous in that it can be
made small and in that the operation to generate the above-mentioned inverse matrix can be
performed relatively easily. In addition, the system becomes more robust.
[0010]
Further, in order to solve the above problems, the sound source extraction method according to
the present invention is a sound source extraction method for extracting a sound propagating
from a target sound source, comprising: a step of collecting sound propagating from one or more
sound sources including the target sound source with a plurality of microphones distributed
outside a predetermined extraction area divided into a plurality of unit areas; a step of
calculating in advance, based on output signals of the plurality of microphones, a matrix
including as elements a plurality of transfer functions of sound propagation from each of the
plurality of unit areas, associated with the positions of the one or more sound sources, to each
of the plurality of microphones; and a step of calculating one inverse matrix from the matrix and
extracting the sound from the target sound source using the inverse matrix.
[0011]
According to the sound source extraction method of the present invention, the same function and
effect as those of the above-described sound source extraction system can be realized.
The method is likewise applicable to the configuration that further provides a plurality of
microphone arrays. In the sound source extraction method of the present invention, it is
desirable that the target sound source be extracted using a spatial window function according to
the arrangement of the plurality of microphones in the calculation step.
[0012]
According to the present invention, a plurality of microphones are distributed outside an
extraction region divided into a plurality of unit regions, and the sound propagating from a
target sound source is extracted using one inverse matrix obtained from a plurality of transfer
functions of sound propagation. It is therefore possible to construct a highly reliable sound
source extraction system with simple arithmetic processing while avoiding the influence of the
directivity of the target sound source and the disturbing sound source, which is a problem in
the conventional beamforming method.
[0013]
FIG. 1 is a diagram showing the structure of the spherical microphone array, which is a main
component used in the sound source extraction system of this embodiment.
FIG. 2 is a diagram showing an example of the arrangement of the plurality of microphone arrays
in the sound source extraction system of this embodiment.
FIG. 3 is a diagram showing an example of the functional blocks of the sound source extraction
system corresponding to the arrangement example of FIG. 2.
FIG. 4 is a flowchart showing the flow of the processing, among the arithmetic processing
performed by the arithmetic processing unit, that mainly relates to the calculation of the
inverse matrix.
FIG. 5 is a diagram schematically showing the case where the conventional beamforming method is
applied, for comparison with the arithmetic processing of the sound source extraction system of
this embodiment.
FIG. 6 is a diagram schematically showing the case where the arithmetic processing of the sound
source extraction system of this embodiment is applied.
FIG. 7 is a diagram explaining the result of verifying the performance obtained with the sound
source extraction system of this embodiment.
[0014]
Hereinafter, an embodiment of a sound source extraction system to which the present invention
is applied will be described with reference to the attached drawings. However, the embodiment
described below is one example of a form to which the technical idea of the present invention is
applied, and the present invention is not limited by the contents of the present embodiment.
[0015]
FIG. 1 shows the structure of a spherical microphone array (hereinafter simply referred to as a
"microphone array") 1, which is a main component used in the sound source extraction system
according to the present embodiment. The microphone array 1 shown in FIG. 1 includes a
spherical baffle 10 made of a hard material, a plurality of microphones 11 disposed at
predetermined positions on the spherical surface of the baffle 10, and a wiring unit 12 housing
a plurality of wires that transmit the electric signals output from the plurality of
microphones 11.
[0016]
As shown in the lower part of FIG. 1, a position in the space containing the sound source
extraction system of the present embodiment is expressed in polar coordinates by azimuth angle
θ, elevation angle φ, and distance r, obtained by converting the X, Y, Z coordinates. For
example, the polar coordinates of any microphone 11 can be expressed as (θ_m, φ_m, r_m), and if
the center of the spherical baffle 10 is taken as the origin, all the microphones 11 attached to
one microphone array 1 are at the same distance r_m.
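The coordinate conversion described above can be sketched as follows. This is a minimal illustration only: the text does not state the angle conventions, so the sketch assumes that the azimuth θ is measured in the XY plane from the +X axis and that the elevation φ is measured upward from the XY plane. The function names are hypothetical.

```python
import math

def cartesian_to_polar(x, y, z):
    """Convert X, Y, Z coordinates to (azimuth theta, elevation phi, distance r).

    ASSUMED convention (not stated in the text): theta is measured in the XY
    plane from the +X axis, phi is measured upward from the XY plane.
    """
    r = math.sqrt(x * x + y * y + z * z)
    theta = math.atan2(y, x)
    phi = math.asin(z / r) if r > 0.0 else 0.0
    return theta, phi, r

def polar_to_cartesian(theta, phi, r):
    """Inverse of cartesian_to_polar under the same assumed convention."""
    x = r * math.cos(phi) * math.cos(theta)
    y = r * math.cos(phi) * math.sin(theta)
    z = r * math.sin(phi)
    return x, y, z
```

With the baffle center as origin, every microphone of one array would return the same r from `cartesian_to_polar`, which is the property the paragraph relies on.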
[0017]
The position of each of the plurality of microphones 11 included in one microphone array 1 is
not limited, and a configuration similar to that of a general beamforming method can be
employed. If the number of microphones 11 included in one microphone array 1 is too small, the
accuracy decreases, and if it is too large, the amount of calculation required for the
processing described later increases. For example, 64 microphones 11 are attached to one
microphone array 1.
[0018]
Here, the sound pressure p_m at each microphone 11 is expressed by the following equation (1),
where
k: wave number (= 2πf / c)
r_m: microphone position vector
r_s: sound source position vector
p_s: sound pressure of the sound source
h_n: spherical Hankel function
h'_n: derivative of h_n
P_n: Legendre polynomial of order n
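Equation (1) itself does not appear in this text. For orientation only, the symbols listed for it match the standard textbook expression for the pressure on the surface of a rigid sphere due to a point source, which can be written as below. This is an assumption about the general form, not a reproduction of the patent's equation: the constant C (sign, time convention, source normalization) and the angle Θ_ms between the two position vectors are notation introduced here.

```latex
p_m \;=\; C\, p_s \sum_{n=0}^{\infty} (2n+1)\,
      \frac{h_n\!\left(k\,\lvert\vec r_s\rvert\right)}
           {h_n'\!\left(k\,\lvert\vec r_m\rvert\right)}\,
      P_n\!\left(\cos\Theta_{ms}\right),
\qquad
\cos\Theta_{ms} \;=\; \frac{\vec r_m \cdot \vec r_s}
                          {\lvert\vec r_m\rvert\,\lvert\vec r_s\rvert}
```

In this form the origin is the center of the baffle, so |r_m| equals the baffle radius, and in practice the sum is truncated at a finite order.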
[0019]
In the conventional method, the above-described sound pressure p_m is acquired for each
microphone 11 and the target sound source is extracted using a so-called beamforming method,
whereas the sound source extraction system of this embodiment is characterized in that the
target sound source is extracted by a method different from the conventional beamforming method.
Details of this point will be described later.
[0020]
In the sound source extraction system of the present embodiment, a so-called array of arrays is
configured by using a plurality of the microphone arrays 1 of FIG. 1. Sampling must be
synchronized across every microphone 11 and every microphone array 1. FIG. 2 shows an
arrangement example of the plurality of microphone arrays 1 in the sound source extraction
system of the present embodiment. Although FIG. 2 assumes, for ease of understanding, a sound
source extraction system arranged in an area AA represented by XY coordinates, an actual sound
source extraction system is configured in a three-dimensional space including the Z direction.
The example of FIG. 2 shows, in the rectangular area AA including the extraction area A, four
microphone arrays 1(a), 1(b), 1(c) and 1(d) arranged symmetrically at the four corners outside
the extraction area A. In this case, when each microphone array 1 has N microphones 11, a total
of 4N microphones 11 exist. Although the installation positions of the respective microphone
arrays 1 can be chosen freely, it is desirable that the microphone arrays 1 be distributed near
the outer edge of the extraction area A so that their positions are as evenly spread as
possible.
[0021]
The extraction area A is divided into a large number of grids G (the unit areas of the present
invention) by groups of straight lines arranged at equal intervals along the X direction and the
Y direction. Each of the one or more sound sources, including the target sound source to be
extracted, is assumed to be a point sound source located in one of the grids G of the extraction
area A, and the sound source extraction is computed using the transfer function from each sound
source, based on the position of its grid G, to each microphone 11. The specific arithmetic
processing will be described later. The size and the number of the grids G in the extraction
area A are not particularly limited, but are set appropriately according to the amount of
calculation and the spatial resolution. That is, if the grids G are too small, the amount of
calculation increases, and if the grids G are too large, the spatial resolution becomes
insufficient and the sound sources become difficult to separate, so the grids G need to be set
to an appropriate size.
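The trade-off between grid size and amount of calculation can be made concrete with a small sketch that divides a rectangular extraction area into equally spaced cells and returns their centers. The area dimensions, the spacing values, and the function name `make_grid_centers` are illustrative assumptions, not values from the text.

```python
import numpy as np

def make_grid_centers(x_min, x_max, y_min, y_max, spacing):
    """Return an (n_cells, 2) array with the centre of every grid cell G."""
    nx = int(round((x_max - x_min) / spacing))
    ny = int(round((y_max - y_min) / spacing))
    xs = x_min + (np.arange(nx) + 0.5) * spacing
    ys = y_min + (np.arange(ny) + 0.5) * spacing
    gx, gy = np.meshgrid(xs, ys, indexing="ij")
    return np.column_stack([gx.ravel(), gy.ravel()])

# Hypothetical 2 m x 2 m extraction area with 0.5 m cells: 4 x 4 = 16 cells.
centers = make_grid_centers(-1.0, 1.0, -1.0, 1.0, 0.5)

# Halving the spacing quadruples the number of cells (and transfer functions).
finer = make_grid_centers(-1.0, 1.0, -1.0, 1.0, 0.25)
```

In two dimensions the number of cells grows with the inverse square of the spacing, which is why the grid size cannot simply be made arbitrarily small.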
[0022]
FIG. 3 shows an example of the functional blocks of the sound source extraction system
corresponding to the arrangement example of FIG. 2. The sound source extraction system shown in
FIG. 3 includes, in addition to the four microphone arrays 1(a), 1(b), 1(c) and 1(d) of FIG. 2,
an AD conversion unit 20, an arithmetic processing unit 21, and an output unit 22. Among them,
the AD conversion unit 20, the arithmetic processing unit 21, and the output unit 22 can be
implemented together by, for example, a personal computer or the like that can be connected to
the four wiring units 12 of the microphone arrays 1 described above.
[0023]
The 4N microphones 11 included in the four microphone arrays 1 collect the sound propagated
from each sound source, convert each to an analog signal Sa, and transmit it to the AD
conversion unit 20 via the corresponding wiring unit 12. The AD conversion unit 20 samples
each of the 4N analog signals Sa output from the 4N microphones 11 at a predetermined sampling
frequency, and converts the sampled signals into 4N digital signals Sd. That is, in the AD
conversion unit 20, at least as many AD converters as there are microphones 11 are arranged in
parallel. The arithmetic processing unit 21 uses the digital
signals Sd obtained by the AD conversion unit 20 to execute arithmetic processing described
later necessary for extracting a target sound source, and generates a signal S corresponding to
the arithmetic result. The output unit 22 outputs the signal S output from the arithmetic
processing unit 21 to a device outside the system or storage means or display means inside the
system.
[0024]
Next, an outline of the arithmetic processing in the sound source extraction system according to
the present embodiment will be described. FIG. 4 is a flowchart showing the flow of the
processing for calculating the inverse matrix in advance. Here, a reference sound source that
generates a known output sound is placed on a grid G in the predetermined extraction area A. As
shown in FIG. 4, based on the output signal of each microphone 11 (corresponding to the digital
signal Sd in FIG. 3), the sound pressures p_m corresponding to the output signals of all the
microphones 11 of the N microphone arrays 1 are acquired (step S1). For example, when there are
L microphones 11 in total, L sound pressures p_m, one for each microphone, are obtained.
[0025]
Here, the sound pressure p_m at each of the microphones 11 in the case of using the spherical
microphone array 1 is expressed by equation (1), as described above. The sound from the
reference sound source present in a grid G reaches each of the microphones 11 through various
paths. Therefore, for each microphone 11 of each microphone array 1, the transfer function of
sound propagation from the position of the reference sound source (corresponding to a grid G in
FIG. 2) is calculated sequentially (step S2). In step S2, the transfer function H_am of each of
the microphones 11 of a given microphone array 1 can be expressed, for example, by the following
equation (2), in relation to equation (1),
where
r_m: microphone position vector
r_a: microphone array position vector
r: sound source position vector
R_a: baffle radius
h_n: spherical Hankel function
h'_n: derivative of h_n
P_n: Legendre polynomial of order n
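As a rough illustration of step S2, the sketch below fills a matrix of transfer functions from every grid cell to every microphone. It deliberately swaps in a simpler model: a free-field point-source transfer function exp(-jkd) / (4πd), which ignores the rigid spherical baffle that equation (2) accounts for. The microphone and cell positions, the frequency, the speed of sound, and the function name are all illustrative assumptions.

```python
import numpy as np

def free_field_transfer_matrix(mic_pos, grid_pos, freq_hz, c=343.0):
    """Return H with H[m, g] = transfer function from grid cell g to microphone m.

    ASSUMPTION: free-field point-source propagation exp(-j k d) / (4 pi d).
    This is a simplified stand-in; it ignores the rigid spherical baffle that
    equation (2) of the text accounts for.
    """
    k = 2.0 * np.pi * freq_hz / c              # wave number k = 2 pi f / c
    diff = mic_pos[:, None, :] - grid_pos[None, :, :]
    d = np.linalg.norm(diff, axis=-1)          # (n_mics, n_cells) distances
    return np.exp(-1j * k * d) / (4.0 * np.pi * d)

# Hypothetical layout: 4 microphones at the corners, 2 grid cells inside.
mics = np.array([[-2.0, -2.0], [2.0, -2.0], [2.0, 2.0], [-2.0, 2.0]])
cells = np.array([[0.0, 0.0], [0.5, 0.0]])
H = free_field_transfer_matrix(mics, cells, freq_hz=1000.0)
```

Each column of `H` corresponds to one grid position of the reference sound source, so the matrix can be built column by column as the reference source is moved from grid to grid.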
[0026]
Here, the total sound pressure p_am at each of the microphones 11 included in a given
microphone array 1 is in fact represented by a volume integral over the extraction area A, as
expressed by the following equation (3),
where ψ: distribution of sound sources at each position
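Equation (3) does not appear in this text. Given the description (a volume integral over the extraction area A of the source distribution ψ weighted by the transfer function H_am), a plausible form, offered here as an assumption rather than as the patent's exact equation, is:

```latex
p_{am} \;=\; \int_{A} H_{am}(\vec r)\,\psi(\vec r)\,\mathrm{d}V
```

Here the position vector r ranges over the extraction area A, and H_am(r) denotes the transfer function from position r to microphone m of array a.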
[0027]
On the other hand, since the extraction area A of the present embodiment is divided into the
grids G as described above, the number of calculations of equation (3) increases or decreases
according to the chosen spatial resolution, and the calculation can be simplified through the
setting of the grids G. First, using an output vector S whose elements are the outputs of all
the microphones 11 of all the microphone arrays 1, the following equation (4) holds,
where
H: matrix consisting of all the transfer functions H_am
Λ: magnitude (distribution) of the sound generated by each sound source, assuming that a sound
source exists at every grid point
Each element of the distribution Λ in equation (4) represents the sum of the sound energy in the
corresponding grid G in the extraction area A.
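Equation (4) does not appear in this text. From the definitions of S, H and the distribution, the natural reading is a linear system obtained by replacing the volume integral with a sum over grid cells. The form below is a sketch under that assumption: the flattened microphone index 1..L and the cell count N_G are notation introduced here, not taken from the text.

```latex
S \;=\; H\,\Lambda,
\qquad
S = \begin{pmatrix} p_1 \\ \vdots \\ p_L \end{pmatrix},\quad
H = \begin{pmatrix}
      H_{11} & \cdots & H_{1 N_G} \\
      \vdots & \ddots & \vdots \\
      H_{L1} & \cdots & H_{L N_G}
    \end{pmatrix},\quad
\Lambda = \begin{pmatrix} \lambda_1 \\ \vdots \\ \lambda_{N_G} \end{pmatrix}
```

Row i of H holds the transfer functions from every grid cell to microphone i, so one matrix covers all microphones of all arrays at once.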
[0028]
In the sound source extraction system of the present embodiment, in order to obtain the
distribution Λ described above, the matrix H having all the transfer functions H_am as elements
is obtained from equation (4), and its inverse matrix H^-1 is obtained in advance. A
characteristic feature is that the operation is performed using this single inverse matrix
H^-1. The inverse matrix H^-1 is therefore generated based on all the transfer functions H_am
obtained by equation (2) (step S3). Then, when the sound pressure p_t of the target sound source
in the extraction area A is to be obtained, the output vector S is first determined based on the
output signals of all the microphones 11 of all the microphone arrays 1.
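Step S3 can be sketched as follows. The text speaks of an inverse matrix; since the number of microphones and the number of grid cells generally differ, this sketch assumes the Moore-Penrose pseudo-inverse (`numpy.linalg.pinv`) as a stand-in, which coincides with the ordinary inverse when H is square and invertible. The toy matrix and the function name are illustrative only.

```python
import numpy as np

def inverse_of_transfer_matrix(H, rcond=1e-10):
    """Return the one inverse matrix used for extraction.

    ASSUMPTION: H (n_mics x n_cells) is generally not square, so the
    Moore-Penrose pseudo-inverse is used in place of a plain inverse.
    """
    return np.linalg.pinv(H, rcond=rcond)

rng = np.random.default_rng(0)
# Toy stand-in for a measured transfer matrix: 8 microphones, 3 grid cells.
H = rng.standard_normal((8, 3)) + 1j * rng.standard_normal((8, 3))
H_inv = inverse_of_transfer_matrix(H)
```

Because the matrix depends only on the geometry and the frequency, it can be computed once in advance and reused for every block of microphone data.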
[0029]
Next, the sound pressure p_t of the target sound source is calculated based on the following
equation (5),
where W_t: spatial window function
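Equation (5) does not appear in this text. Based on the quantities named around it (the output vector S, the inverse matrix H^-1, and the spatial window function W_t), one plausible form, stated here as an assumption, is:

```latex
p_t \;=\; W_t\, H^{-1} S
```

Under this reading, H^-1 S estimates the source distribution over all grid cells and W_t weights that estimate so that only the cells around the target contribute.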
[0030]
The spatial window function W_t used in equation (5) is set appropriately depending on the
spatial resolution of the gridded extraction area A. As a result of the arithmetic processing
described above, the sound source extraction system according to the present embodiment can
extract the target sound source within the extraction area A and can reliably separate it even
when a disturbing sound source is present.
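The whole chain (transfer matrix, one inverse matrix, spatial window) can be illustrated at a single frequency with the sketch below. Several things in it are assumptions rather than the patent's method: the free-field propagation model, the pseudo-inverse, a one-hot window W_t that selects a single grid cell, and the geometry, frequency, and source amplitudes, which are invented for the example.

```python
import numpy as np

def transfer_matrix(mics, cells, freq_hz, c=343.0):
    # ASSUMPTION: free-field point-source model, standing in for equation (2).
    k = 2.0 * np.pi * freq_hz / c
    d = np.linalg.norm(mics[:, None, :] - cells[None, :, :], axis=-1)
    return np.exp(-1j * k * d) / (4.0 * np.pi * d)

# Hypothetical geometry: 8 microphones on a 3 m circle, 4 grid cells inside.
angles = np.arange(8) * (2.0 * np.pi / 8)
mics = 3.0 * np.column_stack([np.cos(angles), np.sin(angles)])
cells = np.array([[-0.5, -0.5], [0.5, -0.5], [-0.5, 0.5], [0.5, 0.5]])

H = transfer_matrix(mics, cells, freq_hz=1000.0)
H_inv = np.linalg.pinv(H)                  # computed once, in advance

# Source distribution: target in cell 0, disturbing source in cell 3.
lam_true = np.array([1.0 + 0.0j, 0.0, 0.0, 0.7 - 0.2j])
S = H @ lam_true                           # simulated microphone output vector

W_t = np.zeros(len(cells))
W_t[0] = 1.0                               # window that selects the target cell
p_t = W_t @ (H_inv @ S)                    # extracted target component
```

In this noiseless toy case `p_t` recovers the target amplitude even though the disturbing source is active, because the two sources sit in different grid cells. With measurement noise or a model mismatch, the conditioning of H would limit the achievable separation.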
[0031]
Here, the effects of applying the above-described arithmetic processing in the sound source
extraction system of the present embodiment will be described using FIGS. 5 and 6. FIG. 5
schematically illustrates the case where the conventional beamforming method is applied when
extracting the target sound source 30 with two microphone arrays 1a and 1b in a situation where
two disturbing sound sources 31a and 31b lie in the same directions as the target sound source.
FIG. 6 schematically shows the case where the method according to the present invention is
applied in the same situation as FIG. 5. In both cases, the target sound source 30 and one
disturbing sound source 31a lie in the direction of the beam Ba as viewed from one microphone
array 1a, the target sound source 30 and the other disturbing sound source 31b lie in the
direction of the beam Bb as viewed from the other microphone array 1b, and the two beams Ba and
Bb are mutually orthogonal.
[0032]
First, when the conventional method is applied as shown in FIG. 5, both the target sound source
30 and the disturbing sound source 31a lie within the beam Ba of one microphone array 1a, and
both the target sound source 30 and the disturbing sound source 31b lie within the beam Bb of
the other microphone array 1b. In the conventional method, each of the two microphone arrays 1a
and 1b independently calculates the inverse characteristic of acoustic diffusion to extract the
target sound source 30, so the limited directivity (beam width) of each of the beams Ba and Bb
makes it difficult to separate the target sound source 30 from the disturbing sound sources 31a
and 31b. Applying a complicated sound source separation algorithm to separate them is not
realistic, since this increases the amount of calculation.
[0033]
On the other hand, when the method according to the present invention is applied as in FIG. 6,
the target sound source 30 is extracted using the above-described one inverse matrix calculated
in advance based on the outputs of all the microphones 11 of the respective microphone arrays 1a
and 1b. Therefore, for example, when viewed from one microphone array 1a, the target sound
source 30 can be separated from the disturbing sound sources 31a and 31b as if by a virtual beam
Bc, shown in FIG. 6, along which their directions differ from one another. The same applies to
the relationship between the target sound source 30 and the disturbing sound sources 31a and 31b
when viewed from the other microphone array 1b. Therefore, in the sound source extraction system
as a whole, as long as the target sound source 30 and the many other disturbing sound sources
are located on different grids G, the target sound source 30 can be extracted easily without
being affected by any of the disturbing sound sources.
[0034]
Next, the result of verifying by simulation the performance obtained with the sound source
extraction system of the present embodiment will be described using FIG. 7. For comparison with
the present invention, FIG. 7(A) shows the result when no sound source extraction method is
applied, FIG. 7(B) the result when the conventional beamforming method is applied, and FIG. 7(C)
the result when the method according to the present invention is applied. In each figure, the
extraction performance for the target sound source 30 was compared in a situation where both the
target sound source 30 and the disturbing sound source 31 are present. The horizontal axis spans
a time range of 1 second, and the vertical axis is the sound pressure normalized to the range of
-1 to +1. As the sound source extraction system, two microphone arrays 1 were installed 3 m from
the center of the extraction area that is the target of sound source extraction, and 64
microphones 11 were attached to each of them. The target sound source 30 was set to repeat a
25 ms burst of white noise followed by a 25 ms silent period. The target sound source 30 and the
disturbing sound source 31 are independent.
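A test signal of the kind described (25 ms of white noise followed by 25 ms of silence, repeated over 1 second and normalized to the range -1 to +1) can be generated as in the sketch below. The sampling rate and the random seed are assumptions made for the example, since the text does not state them, and the function name is hypothetical.

```python
import numpy as np

def burst_noise(duration_s=1.0, on_s=0.025, off_s=0.025, fs=16000, seed=0):
    """White noise gated on for on_s seconds and off for off_s seconds, repeated.

    ASSUMPTION: fs (sampling rate) and seed are illustrative; the text does not
    state them.
    """
    rng = np.random.default_rng(seed)
    n = int(round(duration_s * fs))
    on_n = int(round(on_s * fs))               # samples per burst
    period_n = on_n + int(round(off_s * fs))   # samples per on/off cycle
    gate = (np.arange(n) % period_n) < on_n    # True while the burst is on
    x = rng.standard_normal(n) * gate
    return x / np.max(np.abs(x))               # normalise to the range -1 .. +1

x = burst_noise()
```

The gate is built from integer sample indices rather than from floating-point times, so the on and off segments have exactly the intended length.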
[0035]
As shown in FIG. 7(A), the result obtained when none of the above methods is applied, the target
sound source 30 and the disturbing sound source 31 were set so that the peaks of their sound
pressure levels are equal. Therefore, in FIG. 7(A), the sound pressure level of the target sound
source 30 is buried in that of the disturbing sound source 31. In FIG. 7(B), where the
conventional beamforming method is applied, the target sound source 30 can be separated from the
disturbing sound source 31, but the SN ratio is only about 10.4 dB. In contrast, in FIG. 7(C),
where the method according to the present invention is applied, the target sound source 30 can
be separated from the disturbing sound source 31 with an SN ratio of about 18.6 dB, a clear
improvement over FIG. 7(B).
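To put the two reported figures in perspective, the short sketch below computes the difference between them and converts it to a linear ratio. It assumes that the SN ratio is expressed in the usual power sense (10 log10 of a power ratio), which the text does not state explicitly. The function name is hypothetical.

```python
def db_to_power_ratio(db):
    """Convert a level difference in decibels to a power ratio."""
    return 10.0 ** (db / 10.0)

snr_beamforming_db = 10.4   # FIG. 7(B), value reported in the text
snr_proposed_db = 18.6      # FIG. 7(C), value reported in the text

improvement_db = snr_proposed_db - snr_beamforming_db
# ASSUMPTION: SN ratio is a power ratio, so the dB gain maps via 10^(dB/10).
power_ratio = db_to_power_ratio(improvement_db)
```

Under that assumption the difference of about 8.2 dB corresponds to a signal-to-noise power ratio roughly 6.6 times higher than with the conventional beamforming method.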
[0036]
As described above, by adopting the sound source extraction system (sound source extraction
method) to which the present invention is applied, one inverse matrix is generated from the
matrix of transfer functions from each sound source in the gridded extraction area A to all the
microphones 11, so that the target sound source can be extracted with good performance without
being influenced by the directivity of the disturbing sound source. In this case, the
complicated sound source separation algorithm used with the conventional beamforming method is
unnecessary, and the target sound source can be extracted by simple arithmetic processing.
Further, according to the sound source extraction system to which the present invention is
applied, the target sound source can be followed and extracted not only when it is stationary
but also when it is moving. Although the present invention can be applied to a plurality of
microphones 11 arranged for the extraction area A even when they are not configured as
microphone arrays 1, using a predetermined number of microphone arrays 1 at, for example, the
corners of the extraction area A provides the advantage of easy installation.
[0037]
As described above, the content of the present invention has been explained concretely based on
this embodiment, but the present invention is not limited to the above embodiment, and various
changes can be made without departing from its gist. The main components (FIG. 1, FIG. 2,
FIG. 3), the procedure of the arithmetic processing (FIG. 4), and so on are not limited to the
contents disclosed in the above embodiment and can be changed as appropriate as long as the
operation and effect of the present invention are obtained.
[0038]
DESCRIPTION OF SYMBOLS
1 Microphone array
10 Baffle
11 Microphone
12 Wiring unit
20 AD conversion unit
21 Arithmetic processing unit
22 Output unit
30 Target sound source
31 Disturbing sound source
A Extraction area
G Grid