Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2017046256
Abstract: The present invention provides a binaural signal generation technology capable of
generating a highly accurate binaural signal. A binaural signal generation apparatus includes a
filtering unit 1 that generates a binaural signal by convolving, in the space-time direction, a
signal observed by a circular microphone array with a filter. The filter is obtained by matching a
desired sound field with a synthesized sound field in consideration of a plurality of head-related
transfer functions based on sound reproduced from a circular speaker array.
[Selected figure] Figure 1
Binaural signal generation apparatus, method and program
[0001]
The present invention relates to a technology for collecting a sound signal with a microphone
installed in a certain sound field and reproducing the sound field using the sound signal.
[0002]
As a technique for generating a binaural signal that virtually reproduces the sound field of a
remote place using a plurality of microphones, the technique described in Non-Patent Document 1,
for example, is known.
In the technique described in Non-Patent Document 1, a binaural signal is generated by
convolving one head-related transfer function (HRTF) corresponding to one direction.
10-05-2019
1
[0003]
Hirahara Tatsuya, et al., "Problems concerning measurement of head related transfer functions
and binaural reproduction," IEICE Information and Frontier Society Fundamentals Review, vol. 2,
no. 4, pp. 68-85, 2009.
[0004]
In the technique described in Non-Patent Document 1, only one head related transfer function is
used. For this reason, the accuracy of the generated binaural signal was not necessarily high.
[0005]
An object of the present invention is to provide a binaural signal generation apparatus, method
and program for generating a more accurate binaural signal.
[0006]
A binaural signal generation device according to one aspect of the present invention includes a
filtering unit that generates a binaural signal by convolving, in the space-time direction, the
observation signal of a circular microphone array with a filter obtained by matching a desired
sound field with a synthesized sound field in consideration of a plurality of head-related
transfer functions based on sound reproduced from a circular speaker array.
[0007]
By using a plurality of head related transfer functions, a highly accurate binaural signal can be
generated.
[0008]
FIG. 2 is a functional block diagram showing an example of a binaural signal generation device
according to the first embodiment.
A flowchart showing an example of the binaural signal generation method of the first
embodiment.
A diagram for explaining the coordinate system.
FIG. 2 is a functional block diagram showing an example of a binaural signal generation device
according to the first embodiment. A flowchart showing an example of the binaural signal
generation method of the first embodiment.
[0009]
Hereinafter, embodiments of the present invention will be described with reference to the
drawings. In the following description, the symbols “~”, “<->”, “^”, etc. used in the text
should properly be written directly above the preceding character, but because of the limitations
of text notation they are written immediately after that character. In the formulas, these
symbols are written at their original positions. Moreover, processing performed on individual
elements of a vector or a matrix is applied to all elements of that vector or matrix unless
otherwise noted.
[0010]
First Embodiment The first embodiment relates to a binaural signal generation apparatus and
method for generating a binaural signal based on signals collected by a two-dimensionally
arranged circular microphone array. First, the technical background of the first embodiment will
be described.
[0011]
<Technical Background> A circular speaker array composed of three or more speakers is
disposed on a circle of radius R_s, and a head is assumed to be present at the center of the
circle. The speaker position is written as r_s = (R_s, φ_s, 0) in a cylindrical coordinate
system (see FIG. 3). Let the positions of the left and right ears be r_L = (R_L, φ_L, 0) and
r_R = (R_R, φ_R, 0), respectively, and let the HRTFs in the time-frequency domain be
G<L>(r_L - r_s) and G<R>(r_R - r_s). It is assumed that G<L>(r_L - r_s) and G<R>(r_R - r_s) are
known in advance. The HRTFs need not be measured using a speaker array; they may be measured
separately from each direction. Alternatively, the shape of the head may be modeled and the HRTFs
obtained by simulation. Hereinafter, only the time-frequency domain signal P<L>(r_L) of the left
ear is discussed, but the same argument holds for the right ear.
[0012]
When the left-ear signal P<L>(r_L) is synthesized by convolving the signal D(r_s) with the
transfer function from each speaker position, P<L>(r_L) can be written as follows.
[0013]
[0014]
Here, a variable marked with "~" denotes a cylindrical harmonic domain representation, and m
is the order in that representation.
Also, j is the imaginary unit, e is Napier's constant (the base of the natural logarithm), and π
is the ratio of a circle's circumference to its diameter.
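As a minimal sketch of the cylindrical harmonic domain representation mentioned above, the
following Python code assumes that the field is sampled at N equally spaced angles on a circle
and that the coefficient of order m is the angular Fourier coefficient
(1/N) Σ_i p(φ_i) e^{-jmφ_i}. The normalization, the sign of the exponent, and the function names
circular_harmonic_spectrum and synthesize_on_circle are assumptions made for illustration; they
are not taken from the equations of this description.

```python
import cmath
import math


def circular_harmonic_spectrum(samples, orders):
    """Estimate c_m = (1/N) * sum_i p(phi_i) * exp(-j*m*phi_i) (assumed convention)."""
    n = len(samples)
    return {
        m: sum(p * cmath.exp(-1j * m * 2.0 * math.pi * i / n)
               for i, p in enumerate(samples)) / n
        for m in orders
    }


def synthesize_on_circle(spectrum, n_points):
    """Evaluate sum_m c_m * exp(j*m*phi) at n_points equally spaced angles."""
    return [
        sum(c * cmath.exp(1j * m * 2.0 * math.pi * i / n_points)
            for m, c in spectrum.items())
        for i in range(n_points)
    ]


# Hypothetical example: 8 sampling points, field containing orders -1 and 2 only.
N = 8
field = [0.5 * cmath.exp(-1j * 2.0 * math.pi * i / N)
         + 2.0 * cmath.exp(2j * 2.0 * math.pi * i / N) for i in range(N)]
spec = circular_harmonic_spectrum(field, range(-3, 4))
rebuilt = synthesize_on_circle(spec, N)
```

With N sampling points, only orders that are distinct modulo N can be separated, which is why
the example restricts the orders to -3..3 for N = 8.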
[0015]
[0016]
Here, D(·) is to be a drive signal for synthesizing the desired sound field P<des>(r).
[0017]
Let r = (r, φ, 0) be an arbitrary position in the region inside the circle of radius R_s, and let
P<syn>(r) be the sound field synthesized by the speakers on the circle when no head is present.
In this case, the following equation holds.
[0018]
[0019]
Here, G(r - r_s) represents the transfer characteristic of each speaker and can be obtained by
measurement or by modeling the physical phenomenon (an example modeled as a line source is
described later).
Conventionally, the signal conversion was formulated on the assumption of plane wave propagation
from the speaker direction used when the HRTF was obtained; here, the following formulation
instead assumes spherical wave propagation from a point source.
[0020]
It is assumed that the cylindrical harmonic spectrum of the desired sound field,
~P<des>(r, m) = <->P<des>(m) J_m(kr), is estimated on the sound collection side. It may instead
be synthesized by simulation. Here, k is the wavenumber and J_m is the Bessel function of order
m. Also, <->P<des>(m) is the coefficient of the cylindrical harmonic expansion of the desired
sound field. Since it suffices for P<syn>(r) and P<des>(r) to match, the following is obtained.
[0021]
[0022]
If ~G(r - R_s, m) is obtained by measurement on a circle of radius R_ref, the following is
obtained.
[0023]
[0024]
Therefore, a binaural signal can be obtained by substituting equation (2) into equation (1).
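The expansion ~P<des>(r, m) = <->P<des>(m) J_m(kr) can be checked numerically for a simple field.
The sketch below assumes a unit-amplitude plane wave written as exp(j k r cos(φ - φ_0)), for
which the Jacobi-Anger expansion gives the coefficients j^m e^{-jmφ_0} J_m(kr). The plane-wave
form, the sign convention, and the helper names bessel_j and plane_wave_coefficients are
assumptions for illustration only. J_m is evaluated from Bessel's integral with the trapezoidal
rule so that the example needs only the standard library.

```python
import cmath
import math


def bessel_j(m, x, steps=200):
    """Integer-order J_m(x) via Bessel's integral
    J_m(x) = (1/pi) * integral_0^pi cos(m*t - x*sin(t)) dt, trapezoidal rule."""
    h = math.pi / steps
    # End points t = 0 and t = pi enter with weight 1/2.
    total = 0.5 * (1.0 + math.cos(m * math.pi))
    for i in range(1, steps):
        t = i * h
        total += math.cos(m * t - x * math.sin(t))
    return total * h / math.pi


def plane_wave_coefficients(kr, phi0, n_points, orders):
    """Angular Fourier coefficients of exp(j*kr*cos(phi - phi0)) sampled on a circle."""
    coeffs = {}
    for m in orders:
        acc = 0j
        for i in range(n_points):
            phi = 2.0 * math.pi * i / n_points
            acc += cmath.exp(1j * kr * math.cos(phi - phi0)) * cmath.exp(-1j * m * phi)
        coeffs[m] = acc / n_points
    return coeffs


# Hypothetical values: kr = 1.5, incidence angle 0.4 rad, 32 sampling points.
kr, phi0 = 1.5, 0.4
measured = plane_wave_coefficients(kr, phi0, 32, range(-3, 4))
expected = {m: (1j ** m) * bessel_j(m, kr) * cmath.exp(-1j * m * phi0)
            for m in range(-3, 4)}
max_error = max(abs(measured[m] - expected[m]) for m in expected)
```

The radial dependence of every order is carried by J_m(kr) alone, so the direction-dependent part
of the field is fully described by the coefficients <->P<des>(m).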
[0025]
[0026]
There are several methods for estimating the cylindrical harmonic spectrum of the desired sound
field, ~P<des>(r, m) = <->P<des>(m) J_m(kr); here, consider using a circular microphone array
mounted on a rigid cylindrical baffle of radius R_m.
The circular microphone array is composed of three or more microphones. Here, a baffle is the
body on which microphones or speakers are mounted. The presence of the baffle itself affects the
transfer function of the sound when it is picked up or played back. Further, whether the baffle
is a rigid body or a sound absorber also affects the relationship between the cylindrical
harmonic spectrum of the desired sound field and the cylindrical harmonic spectrum of the
observation signal.
[0027]
Let ~P<rcv>(R_m, m) be the cylindrical harmonic spectrum of the signal observed by the
microphone array. Then ~P<des>(r, m) is given as follows.
[0028]
[0029]
Here, H_m<(1)>' is the derivative of the m-th order Hankel function of the first kind, H_m<(1)>.
As another example, consider using a circular microphone array of radius R_m on a rigid
spherical baffle of radius R_m.
Again letting ~P<rcv>(R_m, m) be the cylindrical harmonic spectrum of the signal observed by the
microphone array, ~P<des>(r, m) is given as follows.
[0030]
[0031]
Note that ~P<des>(r, m) may also be defined by an equation other than the above, depending on
the arrangement of the microphone array and on the shape and properties of the baffle (rigid
body, sound absorber, or an array placed in free air).
[0032]
Here, P_n<m> is the associated Legendre function.
Substituting equation (4) into equation (3) gives the following.
[0033]
[0034]
From this equation, a binaural signal can be synthesized from the circular microphone array
signal by space-time convolution with the filter defined below in the time-frequency domain.
[0035]
[0036]
To be precise, the convolution uses the space-time domain filter obtained by the inverse Fourier
transform of F<L>.
That is, a binaural signal can be generated by convolving, in space and time, the observation
signal of the circular microphone array with the space-time domain filter obtained by the inverse
Fourier transform of F<L>.
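The space-time convolution described above can be sketched as a convolution that is circular
over the microphone index (the angle) and linear over time. The code below is a direct,
unoptimized illustration under the assumption that the filter and the observation signal have
the same number of channels; the function name space_time_convolve and the toy data are
hypothetical, and the filter coefficients stand in for the inverse Fourier transform of F<L>,
which is not reproduced here.

```python
def space_time_convolve(signal, filt):
    """Convolve circularly over channels (angle) and linearly over time.

    signal: list of channels, each a list of time samples (equal lengths).
    filt:   list with the same number of channels, each a list of filter taps.
    """
    n_ch = len(signal)
    if len(filt) != n_ch:
        raise ValueError("filter and signal must have the same number of channels")
    n_t = len(signal[0]) + len(filt[0]) - 1
    out = [[0.0] * n_t for _ in range(n_ch)]
    for i in range(n_ch):            # output angle index
        for k in range(n_ch):        # input angle index
            taps = filt[(i - k) % n_ch]
            for t, s in enumerate(signal[k]):
                for tau, c in enumerate(taps):
                    out[i][t + tau] += s * c
    return out


# Toy data: 3 channels, 2 samples each; the filter is a single unit tap that
# shifts by one channel in angle and delays by two samples in time.
observed = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
shift_filter = [[0.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 0.0]]
result = space_time_convolve(observed, shift_filter)
```

In this picture the signal for one ear would be the output row whose angle corresponds to that
ear; which row that is depends on the array geometry and is left open here.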
[0037]
Note that if the filter coefficients ~F<L>(m) of order m in the space-time frequency domain are
defined as
[0038]
[0039]
then the binaural signal can also be expressed in terms of the filter coefficients ~F<L>(m) and
the circular microphone array signal as follows.
[0040]
[0041]
Equation (5) assumes that ~G obtained by measurement is used; here, the case where the speaker
is modeled as a line source is considered.
[0042]
[0043]
Since the above holds, the following is obtained.
[0044]
[0045]
However, because an actual speaker has characteristics closer to a point source than to a line
source, the result must be multiplied by the correction term (2πj / k)<1/2>.
[0046]
Note that tracking of the listener's head movement can be realized by processing the
cylindrical harmonic spectrum acquired by the circular microphone array on the recording side.
For a simple rotation, it suffices to apply a simple phase shift.
When the rotation angle is φ_rot, the phase-shifted cylindrical harmonic spectrum
~P<rot>(R_m, m) is given as follows.
[0047]
[0048]
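The phase-shift idea for a simple rotation can be illustrated as follows. The sketch assumes
that the field on the circle is written as Σ_m c_m e^{jmφ}, so that multiplying c_m by
e^{-jmφ_rot} rotates the field by +φ_rot; the sign of the exponent depends on the convention
and on whether the field or the listener is taken to rotate, and is an assumption here rather
than the form of equation (C). The names rotate_spectrum and evaluate_on_circle are
hypothetical.

```python
import cmath
import math


def rotate_spectrum(spectrum, phi_rot):
    """Apply an order-dependent phase shift exp(-j*m*phi_rot) to each coefficient."""
    return {m: c * cmath.exp(-1j * m * phi_rot) for m, c in spectrum.items()}


def evaluate_on_circle(spectrum, n_points):
    """Evaluate sum_m c_m * exp(j*m*phi) at n_points equally spaced angles."""
    return [
        sum(c * cmath.exp(1j * m * 2.0 * math.pi * i / n_points)
            for m, c in spectrum.items())
        for i in range(n_points)
    ]


# Hypothetical spectrum with three non-zero orders, evaluated at 8 angles.
N = 8
spectrum = {-1: 0.5, 0: 1.0, 2: 2.0j}
original = evaluate_on_circle(spectrum, N)
# Rotating by exactly one angular step moves every sample to the next position.
rotated = evaluate_on_circle(rotate_spectrum(spectrum, 2.0 * math.pi / N), N)
```

Because only a per-order multiplication is needed, the rotation can be applied to the recorded
spectrum without recomputing the filter.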
It is also possible to estimate a cylindrical harmonic spectrum whose origin is at a position
different from the center of the circular microphone array.
For example, the cylindrical harmonic spectrum <->P<trans> at the position (R_t, φ_t) is
obtained as follows.
[0049]
[0050]
When cylindrical harmonic spectra at a plurality of positions can be obtained, estimation by the
least squares method is possible.
[0051]
The binaural signal may also be generated by performing a convolution in the space direction
using the filter of equation (A) or equation (B), as in the following equation.
When a binaural signal is generated by convolution in the space direction, the number of
loudspeakers constituting the circular loudspeaker array must be equal to the number of
microphones constituting the circular microphone array; N_L denotes the number of loudspeakers
(= the number of microphones).
φ_i is the direction corresponding to φ_L.
[0052]
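The requirement that the number of loudspeakers equal the number of microphones fits a circular
convolution over the angle index, which is equivalent to a per-order product in the harmonic
(angular DFT) domain. The following sketch demonstrates that equivalence for one frequency bin
with real toy data; it illustrates the general convolution theorem and is not the specific
filter of equation (A) or (B). All names and values are hypothetical.

```python
import cmath
import math


def circular_convolve(x, h):
    """Direct circular convolution over the angle index."""
    n = len(x)
    if len(h) != n:
        raise ValueError("both sequences must have the same length")
    return [sum(x[k] * h[(i - k) % n] for k in range(n)) for i in range(n)]


def dft(x):
    """Plain discrete Fourier transform over the angle index."""
    n = len(x)
    return [sum(x[i] * cmath.exp(-2j * math.pi * m * i / n) for i in range(n))
            for m in range(n)]


def idft(spectrum):
    """Inverse discrete Fourier transform."""
    n = len(spectrum)
    return [sum(spectrum[m] * cmath.exp(2j * math.pi * m * i / n) for m in range(n)) / n
            for i in range(n)]


# Toy data: 4 microphones and a 4-point angular filter (one frequency bin).
mic_values = [1.0, 2.0, 3.0, 4.0]
angular_filter = [0.5, 0.25, 0.0, 0.25]
direct = circular_convolve(mic_values, angular_filter)
via_dft = idft([a * b for a, b in zip(dft(mic_values), dft(angular_filter))])
```

Both routes give the same values up to rounding, so the choice between them is a matter of
computational convenience.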
[0053]
<Binaural Signal Generation Apparatus and Method> The binaural signal generation apparatus
according to the first embodiment includes a filtering unit 1 and a filter generation unit 2 as
shown in FIG.
The binaural signal generation device implements the binaural signal generation method by
executing the process of step S1 shown in FIG.
[0054]
The filtering unit 1 receives as input a filter obtained by matching a desired sound field with a
synthesized sound field in consideration of a plurality of head-related transfer functions based
on sound reproduced from a circular speaker array.
[0055]
This filter is a filter obtained by inverse Fourier transform of the filter defined by equation (A) or
equation (B).
This filter is generated in advance by the filter generation unit 2 prior to the processing of the
filtering unit 1.
[0056]
[0057]
Further, an observation signal from a circular microphone array arranged in a circular shape is
input to the filtering unit 1.
[0058]
The filtering unit 1 generates a binaural signal by convolving the input filter with the input
observation signal in the space-time direction (step S1).
[0059]
The processing of the filtering unit 1 described above is merely an example.
The filtering unit 1 may generate a binaural signal by the other filtering process or convolution
process described in the section of <Technical background>.
[0060]
That is, for example, the filtering unit 1 may perform the filtering process using a filter obtained
by the inverse Fourier transform of F <L> instead of the filter F <L>.
[0061]
In addition, for example, when the listener turns by φ_rot, the filtering unit 1 may convolve
the above filter with ~P<rot>(R_m, m) defined by equation (C), in place of the cylindrical
harmonic spectrum ~P<rcv>(R_m, m) of the above observation signal.
[0062]
[0063]
When the listener moves to the position (R_t, φ_t), the above filter may be convolved with
~P<trans>(R_m, m) defined by equation (D), instead of the above observation signal.
[0064]
[0065]
Thus, by considering a plurality of head related transfer functions, a highly accurate binaural
signal can be generated.
[0066]
Second Embodiment The second embodiment is a binaural signal generation apparatus and
method for generating a binaural signal based on a signal collected by a three-dimensionally
arranged spherical microphone array.
First, the technical background of the second embodiment will be described.
[0067]
<Technical Background> It is assumed that a spherical speaker array composed of three or more
speakers is disposed on a spherical surface of radius R_s, and that a head is present at the
center of the spherical speaker array.
The speaker position is written as r_s = (R_s, θ_s, φ_s) in a spherical coordinate system.
Let the positions of the left and right ears be r_L = (R_L, θ_L, φ_L) and
r_R = (R_R, θ_R, φ_R), respectively, and let the HRTFs in the time-frequency domain be written
as G<L>(r_L - r_s) and G<R>(r_R - r_s).
It is assumed that G<L>(r_L - r_s) and G<R>(r_R - r_s) are known in advance.
The HRTFs need not be measured using a speaker array; they may be measured separately from each
direction.
Alternatively, the shape of the head may be modeled and the HRTFs obtained by simulation.
Hereinafter, only the time-frequency domain signal P<L>(r_L) of the left ear is discussed, but
the same argument holds for the right ear.
[0068]
When the left-ear signal P<L>(r_L) is synthesized by convolving the signal D(r_s) with the
transfer function from each speaker position, P<L>(r_L) can be written as follows.
[0069]
[0070]
It should be noted that, in contrast to the first embodiment using a circular array, the
expression for the second embodiment using a spherical array cannot be written as a spatial
convolution.
[0071]
Here, D(·) is to be a drive signal for synthesizing a desired sound field.
Let r = (r, θ, φ) be an arbitrary position in the inner region, and let P<syn>(r) be the sound
field synthesized by the loudspeakers on the spherical surface when no head is present.
In this case, assuming that each speaker is axially symmetric and taking the north pole position
η = (0, 0, R_s), the following holds.
[0072]
[0073]
SO(3) is the three-dimensional rotation group (special orthogonal group).
Here, G(r - r_s) represents the transfer characteristic of each speaker and can be obtained by
measurement or modeling.
Conventionally, the signal conversion was formulated on the assumption of a plane wave arriving
from the sound source used when the HRTF was obtained; here, the following formulation instead
assumes a spherical wave from a point source.
Y_n<m> is the spherical harmonic function, and n and m are its orders.
Here, a variable marked with "~" denotes a representation in the spherical harmonic spectrum
domain.
[0074]
It is assumed that the spherical harmonic spectrum of the desired sound field,
~P<des>(r, n, m) = <->P<des>(n, m) j_n(kr), is estimated on the sound collection side.
It may instead be synthesized by simulation. Since it suffices for P<syn>(r) and P<des>(r) to
match, the following is obtained.
[0075]
[0076]
j_n is the n-th order spherical Bessel function of the first kind. If ~G(r - R_s, n, 0) is
obtained by measurement on a sphere of radius R_ref, the following is obtained.
[0077]
[0078]
Therefore, a binaural signal can be obtained by substituting equation (7) into equation (6).
[0079]
[0080]
There are several methods for estimating the spherical harmonic spectrum of the desired sound
field, ~P<des>(r, n, m) = <->P<des>(n, m) j_n(kr). Here, consider using a spherical microphone
array mounted on a spherical baffle of radius R_m.
The spherical microphone array is composed of three or more microphones.
[0081]
Here, a baffle is the body on which microphones or speakers are mounted. The presence of the
baffle itself affects the transfer function of the sound when it is picked up or played back.
Further, whether the baffle is a rigid body or a sound absorber affects the relationship between
the spherical harmonic spectrum of the desired sound field and the spherical harmonic spectrum
of the observation signal.
[0082]
Let ~P<rcv>(R_m, n, m) be the spherical harmonic spectrum of the signal observed by the
microphone array. Then ~P<des>(r, n, m) is given as follows. Here, h_n<(1)>' denotes the
derivative of the n-th order spherical Hankel function of the first kind, h_n<(1)>.
[0083]
[0084]
Substituting this into equation (8) gives the following.
[0085]
[0086]
S is the set of speakers constituting the spherical speaker array used for measuring the HRTFs.
Unlike the circular array case, the signal conversion cannot be performed with a single filter
convolution. Instead, the filter of equation (E) below is applied to the microphone array
signal, the result is convolved with each HRTF, and the convolved signals are summed over the
speakers used in the HRTF measurement. The resulting signal can be used as the binaural signal.
[0087]
That is, the filter of equation (E) is convolved in space and time with the observation signal
from the spherical microphone array, which yields a time signal for each speaker position of the
spherical speaker array. Each of these time signals is convolved in the time direction with the
time signal of the head-related transfer function for the corresponding speaker position, which
is based on sound reproduced from the spherical speaker array, and the results are summed over
the speakers constituting the spherical speaker array. A binaural signal in the time domain is
thereby generated.
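The last step described above, convolving each per-speaker time signal with the HRTF time
signal for the same speaker position and summing over the speakers, can be sketched as follows.
The per-speaker signals are taken as given (they would come from the filter of equation (E),
which is not reproduced here), and the function names and toy values are hypothetical.

```python
def convolve(x, h):
    """Linear convolution of two finite sequences in the time direction."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, a in enumerate(x):
        for j, b in enumerate(h):
            y[i + j] += a * b
    return y


def binaural_from_speaker_signals(speaker_signals, hrirs):
    """Sum over speakers of (per-speaker signal convolved with that speaker's HRTF
    time signal). Both arguments are lists with one sequence per speaker."""
    if len(speaker_signals) != len(hrirs):
        raise ValueError("one HRTF time signal is needed per speaker")
    length = max(len(s) + len(h) - 1 for s, h in zip(speaker_signals, hrirs))
    out = [0.0] * length
    for s, h in zip(speaker_signals, hrirs):
        for t, v in enumerate(convolve(s, h)):
            out[t] += v
    return out


# Toy data for one ear: two speakers, three samples each, two-tap HRTF time signals.
speaker_signals = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]]
hrirs_left = [[1.0, 0.5], [0.0, 2.0]]
left_ear = binaural_from_speaker_signals(speaker_signals, hrirs_left)
```

The same routine would be run a second time with the right-ear HRTF time signals to obtain the
other channel.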
[0088]
[0089]
Note that if the filter coefficients ~F<L>(r_s, n, m) of orders n and m are defined as
[0090]
[0091]
then the binaural signal is expressed in terms of these filter coefficients and the spherical
microphone array signal as follows.
[0092]
[0093]
Equation (F) assumes that ~G obtained by measurement is used; here, the case where the speaker
is modeled as a point source is considered.
[0094]
[0095]
Since the above holds, the following is obtained.
[0096]
[0097]
The filter of this equation (F) may be used instead of the filter of equation (E).
[0098]
<Binaural Signal Generation Apparatus and Method> The binaural signal generation apparatus
according to the second embodiment includes a filtering unit 1 and a filter generation unit 2 as
shown in FIG.
The binaural signal generation apparatus implements the process of each step shown in FIG. 5 to
realize the binaural signal generation method.
[0099]
A filter obtained by matching a desired sound field with a synthetic sound field in consideration
of a spherical speaker array disposed in a spherical shape is input to the filtering unit 1.
[0100]
This filter is a filter defined by equation (E) or equation (F).
This filter is generated in advance by the filter generation unit 2 prior to the processing of the
filtering unit 1.
[0101]
[0102]
Further, an observation signal from a spherical microphone array is input to the filtering
unit 1.
[0103]
The filtering unit 1 obtains a signal by performing space-time convolution of the input filter
with the input observation signal.
The obtained signal is output to the signal generation unit 3.
[0104]
For each speaker position of the spherical speaker array used to obtain the HRTFs, the signal
generation unit 3 convolves, in the time direction, the corresponding time signal of the signal
obtained by the filtering unit 1 with the time signal of the head-related transfer function for
that speaker position, which is based on sound reproduced from the spherical speaker array. It
then sums the results over the speakers constituting the spherical speaker array, thereby
generating a time-domain binaural signal.
[0105]
The processing of the filtering unit 1 described above is merely an example.
The filtering unit 1 may generate a binaural signal by the other filtering process or convolution
process described in the section of <Technical background>.
[0106]
That is, for example, the filtering unit 1 may generate a binaural signal by performing the
process defined by equation (9), using ~P<rcv>(R_m, m) as the spherical harmonic spectrum of the
observation signal.
[0107]
Thus, by considering a plurality of head related transfer functions, a highly accurate binaural
signal can be generated.
[0108]
[Function] Each function used in the above description will be described.
[0109]
The n-th order Hankel function of the first kind H_n<(1)>(·) and the n-th order Bessel function
J_n(·) are defined as follows, where · is an arbitrary real number.
Γ(z) is the gamma function and Y_n(z) is the Neumann function.
[0110]
[0111]
The n-th order spherical Hankel function of the first kind h_n<(1)>(·) and the n-th order
spherical Bessel function j_n(·) are defined as follows.
[0112]
[0113]
The associated Legendre function P_n<m>(·) is defined as follows.
P_n(·) denotes the Legendre polynomial.
[0114]
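For reference, the Legendre polynomial P_n named above can be evaluated with Bonnet's
recurrence, (k + 1) P_{k+1}(x) = (2k + 1) x P_k(x) - k P_{k-1}(x), starting from P_0(x) = 1 and
P_1(x) = x. The sketch below covers only P_n; the associated function P_n<m> and its
normalization depend on the convention of the defining equation, which is not reproduced here,
so they are deliberately left out. The function name legendre is hypothetical.

```python
def legendre(n, x):
    """Legendre polynomial P_n(x) via Bonnet's recurrence."""
    if n < 0:
        raise ValueError("degree n must be non-negative")
    p_prev, p = 1.0, x          # P_0(x) and P_1(x)
    if n == 0:
        return p_prev
    for k in range(1, n):
        # (k + 1) * P_{k+1} = (2k + 1) * x * P_k - k * P_{k-1}
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return p


# Hypothetical spot checks at x = 0.5 and x = 1.
p2_half = legendre(2, 0.5)
p3_half = legendre(3, 0.5)
p5_one = legendre(5, 1.0)
```

The recurrence is numerically stable for arguments in [-1, 1], which is the range that occurs
when the argument is the cosine of an angle.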
[0115]
The spherical harmonic function Y n <m> is defined by the following equation.
[0116]
[0117]
[Modification] The filtering process in the time frequency domain or the space-time frequency
domain may be performed in the space-time domain.
That is, the above-mentioned filtering process may be performed by the convolution in the time
direction and the convolution in the space direction.
[0118]
The binaural signal generation device can be realized by a computer.
In this case, the processing content of each part of this apparatus is described by a program.
Each part of this apparatus is then realized on the computer by executing this program on the
computer.
[0119]
The program describing the processing content can be recorded in a computer readable
recording medium.
Further, in this embodiment, these devices are configured by executing a predetermined program
on a computer, but at least a part of the processing contents may be realized as hardware.
[0120]
The present invention is not limited to the above-described embodiment, and various
modifications can be made without departing from the spirit of the present invention.
[0121]
1 filtering unit 2 filter generation unit 3 signal generation unit