Patent Translate
Powered by EPO and Google

Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate, complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or financial decisions, should not be based on machine-translation output.

DESCRIPTION JP2017046256

Abstract: The present invention provides a binaural signal generation technique capable of generating a highly accurate binaural signal. A binaural signal generation apparatus includes a filtering unit 1 that generates a binaural signal by convolving, in the space-time direction, a filter with a signal observed by a circular microphone array arranged in a circle, the filter being obtained by matching a desired sound field with a synthesized sound field in consideration of a plurality of head-related transfer functions based on sound reproduced from a circular speaker array arranged in a circle. [Selected figure] Figure 1

Binaural signal generation apparatus, method and program

[0001] The present invention relates to a technique for picking up sound signals with microphones installed in a sound field and reproducing that sound field using the sound signals.

[0002] As a technique for generating a binaural signal that virtually reproduces the sound field of a remote place using a plurality of microphones, the technique described in Non-Patent Document 1, for example, is known. In the technique described in Non-Patent Document 1, a binaural signal is generated by convolving one head-related transfer function (HRTF) corresponding to one direction.

10-05-2019 1

[0003] Hirahara Tatsuya, et al., "Problems concerning measurement of head related transfer functions and binaural reproduction," IEICE Information and Frontier Society Fundamentals Review, vol. 2, no. 4, pp. 68-85, 2009.

[0004] In the technique described in Non-Patent Document 1, only one head-related transfer function is used.
For this reason, the accuracy of the generated binaural signal is not necessarily high.

[0005] An object of the present invention is to provide a binaural signal generation apparatus, method and program that generate a more accurate binaural signal.

[0006] A binaural signal generation apparatus according to one aspect of the present invention includes a filtering unit that generates a binaural signal by convolving, in the space-time direction, a filter with an observation signal of a circular microphone array arranged in a circle, the filter being obtained by matching a desired sound field with a synthesized sound field in consideration of a plurality of head-related transfer functions based on sound reproduced from a circular speaker array arranged in a circle.

[0007] By using a plurality of head-related transfer functions, a highly accurate binaural signal can be generated.

[0008] FIG. 1 is a functional block diagram showing an example of the binaural signal generation apparatus of the first embodiment. FIG. 2 is a flowchart showing an example of the binaural signal generation method of the first embodiment. FIG. 3 is a diagram for explaining the coordinate system. FIG. 4 is a functional block diagram showing an example of the binaural signal generation apparatus of the second embodiment. FIG. 5 is a flowchart showing an example of the binaural signal generation method of the second embodiment.

[0009] Hereinafter, embodiments of the present invention will be described with reference to the drawings. In the following description, the symbols “~”, “<->”, “^”, etc. used in the text should properly be written directly above the character that follows them, but due to the limitations of text notation they are written immediately before that character. In the formulas, these symbols are written in their proper positions. Moreover, processing performed on individual elements of a vector or a matrix is applied to all elements of that vector or matrix unless otherwise noted.
[0010] First Embodiment
The first embodiment relates to a binaural signal generation apparatus and method for generating a binaural signal based on signals picked up by a two-dimensionally arranged circular microphone array. First, the technical background of the first embodiment will be described.

[0011] <Technical Background>
A circular speaker array composed of three or more speakers is arranged on a circle of radius R_s, and a head is assumed to be present at the center of the circle. The speaker position is written as r_s = (R_s, φ_s, 0) in a cylindrical coordinate system (see FIG. 3). Let the positions of the left and right ears be r_L = (R_L, φ_L, 0) and r_R = (R_R, φ_R, 0), respectively, and let the HRTFs in the time-frequency domain be G<L>(r_L - r_s) and G<R>(r_R - r_s). It is assumed that G<L>(r_L - r_s) and G<R>(r_R - r_s) are known in advance. The HRTFs do not necessarily have to be measured using a speaker array; they may be measured separately from each direction. Alternatively, the shape of the head may be modeled and the HRTFs obtained by simulation. Hereinafter, only the time-frequency domain signal P<L>(r_L) of the left ear is discussed, but the same argument holds for the right ear.

[0012] Consider synthesizing the left-ear signal P<L>(r_L) by convolving the signal D(r_s) with the transfer function from each speaker position. Then P<L>(r_L) can be written as follows.

[0013]

[0014] Here, a variable marked with "~" denotes a cylindrical harmonic domain representation, and m is the order in the cylindrical harmonic domain representation. Also, j is the imaginary unit, e is Napier's number, and π is the circle constant.

[0015]

[0016] Here, it suffices for D(·) to be a drive signal for synthesizing a desired sound field P<des>(r).

[0017] Let r = (r, φ, 0) be an arbitrary position in the region inside the circle of radius R_s, and let P<syn>(r) be the sound field synthesized, without the head, using the speakers on the circle.
In this case, the following holds.

[0018]

[0019] Here, G(r - r_s) represents the transfer characteristic of each speaker and can be obtained by measurement or by modeling the physical phenomenon (an example modeled as a line source is described later). Conventionally, the signal conversion was formulated assuming plane-wave propagation from the speaker direction used when the HRTF was obtained; here, in contrast, the following formulation assumes spherical-wave propagation from a point source.

[0020] It is assumed that the cylindrical harmonic spectrum of the desired sound field, ~P<des>(r, m) = <->P<des>(m) J_m(kr), is estimated on the sound pickup side. It may also be synthesized by simulation. Here, k is the wave number and J_m is the m-th order Bessel function. Also, <->P<des>(m) is a coefficient of the cylindrical harmonic expansion of the desired sound field. Since it suffices for P<syn>(r) and P<des>(r) to match, the following is obtained.

[0021]

[0022] If ~G(r - R_s, m) is obtained by measurement on a circle of radius R_ref, the following is obtained.

[0023]

[0024] Therefore, by substituting equation (2) into equation (1), a binaural signal can be obtained.

[0025]

[0026] There are several methods for estimating the cylindrical harmonic spectrum of the desired sound field, ~P<des>(r, m) = <->P<des>(m) J_m(kr); here, consider using a circular microphone array mounted on a rigid cylindrical baffle of radius R_m. The circular microphone array is composed of three or more microphones. Here, a baffle is the body on which microphones or speakers are mounted. The presence of the baffle itself affects the transfer function of the sound when it is picked up or reproduced. Further, whether the baffle is a rigid body or a sound absorber also affects the relationship between the cylindrical harmonic spectrum of the desired sound field and the cylindrical harmonic spectrum of the observation signal.
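As a hedged illustration of how a rigid baffle links the two spectra, the textbook scattering result for an infinitely long rigid cylinder can be worked out as below. The incident-field coefficient A_m (playing the role of the desired-field coefficient) and the scattered-field coefficient B_m are names introduced here for illustration, primes denote derivatives with respect to the argument, and the normalization and time convention are assumptions that may differ from the patent's own equation.

```latex
\begin{align}
p(r,\phi) &= \sum_m \left[ A_m J_m(kr) + B_m H_m^{(1)}(kr) \right] e^{jm\phi}, \\
\left.\frac{\partial p}{\partial r}\right|_{r=R_m} = 0
  \;&\Rightarrow\; B_m = -A_m \frac{J_m'(kR_m)}{H_m^{(1)\prime}(kR_m)}, \\
\tilde{P}^{\mathrm{rcv}}(R_m, m)
  &= A_m \frac{J_m(kR_m)\,H_m^{(1)\prime}(kR_m) - J_m'(kR_m)\,H_m^{(1)}(kR_m)}{H_m^{(1)\prime}(kR_m)}
   = A_m \frac{2j}{\pi k R_m\, H_m^{(1)\prime}(kR_m)}, \\
A_m &= \frac{\pi k R_m\, H_m^{(1)\prime}(kR_m)}{2j}\, \tilde{P}^{\mathrm{rcv}}(R_m, m).
\end{align}
```

The third line uses the Wronskian J_m(x) H_m^{(1)'}(x) - J_m'(x) H_m^{(1)}(x) = 2j/(πx). Under these assumptions the spectrum observed on the baffle determines the incident-field coefficient order by order, which is why the derivative of the Hankel function appears in the conversion.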
[0027] Letting ~P<rcv>(R_m, m) be the cylindrical harmonic spectrum of the signal observed by the microphone array, ~P<des>(r, m) is given as follows.

[0028]

[0029] Here, H_m<(1)>' is the derivative of the m-th order Hankel function of the first kind, H_m<(1)>. As another example, consider using a circular microphone array of radius R_m mounted on a rigid spherical baffle of radius R_m. Letting ~P<rcv>(R_m, m) be the cylindrical harmonic spectrum of the signal observed by the microphone array, ~P<des>(r, m) is given as follows.

[0030]

[0031] Note that ~P<des>(r, m) may be defined by an equation other than the above, depending on the arrangement of the microphone array and on the shape and properties of the baffle (rigid body, sound absorber, or an array placed in free air).

[0032] Here, P_n<m> is the associated Legendre function. Substituting equation (4) into equation (3) yields the following.

[0033]

[0034] By this equation, binaural signals can be synthesized from the circular microphone array signal by space-time convolution with the filter defined below in the time-frequency domain.

[0035]

[0036] To be precise, the convolution uses a space-time domain filter obtained by the inverse Fourier transform of F<L>. That is, a binaural signal can be generated by performing space-time convolution of the observation signal of the circular microphone array with the space-time domain filter obtained by the inverse Fourier transform of F<L>.

[0037] Note that if the filter coefficients ~F<L>(m) of order m in the space-time frequency domain are defined as follows,

[0038]

[0039] the binaural signal can also be expressed as follows by the filter coefficients ~F<L>(m) of order m in the space-time frequency domain and the circular microphone array signal.

[0040]

[0041] Equation (5) assumed the use of ~G obtained by measurement; here, the case of modeling the speaker as a line source is considered.
[0042]

[0043] Since the above holds, the following is obtained.

[0044]

[0045] However, since an actual speaker has characteristics closer to a point source than to a line source, the result must be multiplied by the correction term (2πj/k)^(1/2).

[0046] Note that following the movement of the listener's head can be realized by processing the cylindrical harmonic spectrum acquired by the circular microphone array on the recording side. For a simple rotation, it suffices to apply a phase shift. With the rotation angle denoted φ_rot, the phase-shifted cylindrical harmonic spectrum ~P<rot>(R_m, m) is given as follows.

[0047]

[0048] It is also possible to estimate a cylindrical harmonic spectrum whose origin is a position different from the center of the circular microphone array. For example, the cylindrical harmonic spectrum <->P<trans> at the position (R_t, φ_t) is obtained as follows.

[0049]

[0050] When cylindrical harmonic spectra at a plurality of positions can be obtained, estimation by the least squares method is also possible.

[0051] The binaural signal may also be generated by performing convolution in the spatial direction using the filter of equation (A) or equation (B), as in the following equation. When a binaural signal is generated by convolution in the spatial direction, the number of loudspeakers constituting the circular loudspeaker array must be equal to the number of microphones constituting the circular microphone array; N_L is the number of loudspeakers (= the number of microphones). φ_i is the direction corresponding to φ_L.

[0052]

[0053] <Binaural Signal Generation Apparatus and Method>
The binaural signal generation apparatus according to the first embodiment includes a filtering unit 1 and a filter generation unit 2, as shown in the functional block diagram. The binaural signal generation apparatus implements the binaural signal generation method by executing the process of step S1 shown in the flowchart.
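The head-rotation step described above can be sketched for a single frequency bin as follows. This is a minimal illustration under stated assumptions, not the patent's equation (C): the microphones are taken to be equally spaced on the circle, the cylindrical harmonic spectrum is approximated by a spatial DFT, the sign convention exp(-j m φ_rot) is assumed, and the function names are hypothetical.

```python
import cmath
import math


def circular_harmonic_spectrum(samples, orders):
    """Spatial DFT of equally spaced microphone samples (one frequency bin).

    Assumes microphone i sits at angle 2*pi*i/N on the circle.
    """
    n_mics = len(samples)
    return {
        m: sum(p * cmath.exp(-1j * m * 2 * math.pi * i / n_mics)
               for i, p in enumerate(samples)) / n_mics
        for m in orders
    }


def rotate_spectrum(spectrum, phi_rot):
    """Apply the per-order phase shift exp(-j*m*phi_rot) (assumed sign)."""
    return {m: c * cmath.exp(-1j * m * phi_rot) for m, c in spectrum.items()}


def resynthesize(spectrum, n_mics):
    """Evaluate the harmonic expansion back at the microphone angles."""
    return [
        sum(c * cmath.exp(1j * m * 2 * math.pi * i / n_mics)
            for m, c in spectrum.items())
        for i in range(n_mics)
    ]
```

With eight microphones and orders -3 to 4, rotating by one microphone spacing (2π/8) and resynthesizing returns the original samples shifted by one position around the circle, which is a convenient sanity check on the sign convention before the rotated spectrum is passed on to the filter.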
[0054] The filtering unit 1 receives a filter obtained by matching a desired sound field with a synthesized sound field in consideration of a plurality of head-related transfer functions based on sound reproduced from a circular speaker array arranged in a circle.

[0055] This filter is obtained by the inverse Fourier transform of the filter defined by equation (A) or equation (B). It is generated in advance by the filter generation unit 2, prior to the processing of the filtering unit 1.

[0056]

[0057] Further, an observation signal from a circular microphone array arranged in a circle is input to the filtering unit 1.

[0058] The filtering unit 1 generates a binaural signal by convolving the input filter and the input observation signal in the space-time direction (step S1).

[0059] The processing of the filtering unit 1 described above is merely an example. The filtering unit 1 may generate a binaural signal by any of the other filtering or convolution processes described in the <Technical Background> section.

[0060] That is, for example, the filtering unit 1 may perform the filtering process using a filter obtained by the inverse Fourier transform of F<L> instead of the filter F<L>.

[0061] In addition, for example, letting ~P<rcv>(R_m, m) be the cylindrical harmonic spectrum of the observation signal, when the listener turns by φ_rot the filtering unit 1 may convolve the filter with ~P<rot>(R_m, m) defined by equation (C) instead of the observation signal.

[0062]

[0063] When the listener moves to the position (R_t, φ_t), the filtering unit 1 may convolve the filter with ~P<trans>(R_m, m) defined by equation (D) instead of the observation signal.

[0064]

[0065] Thus, by considering a plurality of head-related transfer functions, a highly accurate binaural signal can be generated.
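The relationship between filtering in the cylindrical harmonic domain and filtering in the spatial direction can be illustrated with a small sketch for one frequency bin. It assumes, as a simplification not confirmed by the source equations, that the ear signal is the sum over orders m of a filter coefficient times the microphone array's harmonic coefficient, and that the microphones are equally spaced. The function names and the dictionary layout of `filter_coeffs` are hypothetical.

```python
import cmath
import math


def ear_signal_harmonic(filter_coeffs, mic_samples):
    """Sum over orders m of F(m) times the m-th spatial DFT coefficient."""
    n_mics = len(mic_samples)
    total = 0j
    for m, f in filter_coeffs.items():
        c_m = sum(p * cmath.exp(-1j * m * 2 * math.pi * i / n_mics)
                  for i, p in enumerate(mic_samples)) / n_mics
        total += f * c_m
    return total


def spatial_weights(filter_coeffs, n_mics):
    """Per-microphone weights equivalent to the order-wise filter."""
    return [
        sum(f * cmath.exp(-1j * m * 2 * math.pi * i / n_mics)
            for m, f in filter_coeffs.items()) / n_mics
        for i in range(n_mics)
    ]


def ear_signal_spatial(filter_coeffs, mic_samples):
    """Weighted sum over microphones (filtering in the spatial direction)."""
    weights = spatial_weights(filter_coeffs, len(mic_samples))
    return sum(w * p for w, p in zip(weights, mic_samples))
```

Both functions return the same value for any input, because the order-wise product and the per-microphone weighted sum are two groupings of the same double sum. In a full implementation this would be repeated for every frequency bin and followed by an inverse Fourier transform in time, once per ear.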
[0066] Second Embodiment
The second embodiment relates to a binaural signal generation apparatus and method for generating a binaural signal based on signals picked up by a three-dimensionally arranged spherical microphone array. First, the technical background of the second embodiment will be described.

[0067] <Technical Background>
A spherical speaker array composed of three or more speakers is arranged on a sphere of radius R_s, and a head is assumed to be present at the center of the spherical speaker array. The speaker position is written as r_s = (R_s, θ_s, φ_s) in a spherical coordinate system. Let the positions of the left and right ears be r_L = (R_L, θ_L, φ_L) and r_R = (R_R, θ_R, φ_R), respectively, and let the HRTFs in the time-frequency domain be written as G<L>(r_L - r_s) and G<R>(r_R - r_s). It is assumed that G<L>(r_L - r_s) and G<R>(r_R - r_s) are known in advance. The HRTFs do not necessarily have to be measured using a speaker array; they may be measured separately from each direction. Alternatively, the shape of the head may be modeled and the HRTFs obtained by simulation. Hereinafter, only the time-frequency domain signal P<L>(r_L) of the left ear is discussed, but the same argument holds for the right ear.

[0068] Consider synthesizing the left-ear signal P<L>(r_L) by convolving the signal D(r_s) with the transfer function from each speaker position. Then P<L>(r_L) can be written as follows.

[0069]

[0070] Note that, in contrast to the first embodiment using a circular array, the second embodiment using a spherical array cannot be described as a spatial convolution.

[0071] Here, it suffices for D(·) to be a drive signal for synthesizing a desired sound field. Let r = (r, θ, φ) be an arbitrary position in the interior region, and let P<syn>(r) be the sound field synthesized, without the head, using the loudspeakers on the sphere.
In this case, assuming that each speaker is axially symmetric and letting η = (0, 0, R_s) be the north pole position, the following holds.

[0072]

[0073] SO(3) is the three-dimensional rotation group (special orthogonal group). Here, G(r - r_s) represents the transfer characteristic of each speaker and can be obtained by measurement or modeling. Conventionally, the signal conversion was formulated assuming a plane wave from the source direction used when the HRTF was obtained; here, in contrast, the following formulation assumes a spherical wave from a point source. Y_n<m> is a spherical harmonic function, and n and m are its orders. Here, a variable marked with "~" denotes a spherical harmonic spectrum domain representation.

[0074] It is assumed that the spherical harmonic spectrum of the desired sound field, ~P<des>(r, n, m) = <->P<des>(n, m) j_n(kr), is estimated on the sound pickup side. It may also be synthesized by simulation. Since it suffices for P<syn>(r) and P<des>(r) to match, the following is obtained.

[0075]

[0076] j_n is the n-th order spherical Bessel function of the first kind. If ~G(r - R_s, n, 0) is obtained by measurement on a sphere of radius R_ref, the following is obtained.

[0077]

[0078] Therefore, by substituting equation (7) into equation (6), a binaural signal can be obtained.

[0079]

[0080] There are several methods for estimating the spherical harmonic spectrum of the desired sound field, ~P<des>(r, n, m) = <->P<des>(n, m) j_n(kr); here, consider using a spherical microphone array mounted on a spherical baffle of radius R_m. The spherical microphone array is composed of three or more microphones.

[0081] Here, a baffle is the body on which microphones or speakers are mounted. The presence of the baffle itself affects the transfer function of the sound when it is picked up or reproduced. Further, whether the baffle is a rigid body or a sound absorber affects the relationship between the spherical harmonic spectrum of the desired sound field and the spherical harmonic spectrum of the observation signal.
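For the rigid spherical baffle, the analogous textbook scattering result can be sketched as below. As in any such sketch, the coefficient names A_{nm} (incident field, playing the role of the desired-field coefficient) and B_{nm} (scattered field) are introduced here for illustration, j outside a subscripted Bessel symbol is the imaginary unit, primes denote derivatives with respect to the argument, and the normalization is an assumption that may differ from the patent's own equation.

```latex
\begin{align}
p(r,\theta,\phi) &= \sum_{n}\sum_{m=-n}^{n}
   \left[ A_{nm}\, j_n(kr) + B_{nm}\, h_n^{(1)}(kr) \right] Y_n^m(\theta,\phi), \\
\left.\frac{\partial p}{\partial r}\right|_{r=R_m} = 0
  \;&\Rightarrow\; B_{nm} = -A_{nm} \frac{j_n'(kR_m)}{h_n^{(1)\prime}(kR_m)}, \\
\tilde{P}^{\mathrm{rcv}}(R_m, n, m)
  &= A_{nm} \frac{j_n(kR_m)\,h_n^{(1)\prime}(kR_m) - j_n'(kR_m)\,h_n^{(1)}(kR_m)}{h_n^{(1)\prime}(kR_m)}
   = A_{nm} \frac{j}{(kR_m)^2\, h_n^{(1)\prime}(kR_m)}, \\
A_{nm} &= -j\,(kR_m)^2\, h_n^{(1)\prime}(kR_m)\, \tilde{P}^{\mathrm{rcv}}(R_m, n, m).
\end{align}
```

The third line uses the Wronskian j_n(x) h_n^{(1)'}(x) - j_n'(x) h_n^{(1)}(x) = j/x^2. Under these assumptions each spherical harmonic coefficient of the incident field follows from the corresponding coefficient observed on the baffle, scaled by the derivative of the spherical Hankel function.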
[0082] Letting ~P<rcv>(R_m, n, m) be the spherical harmonic spectrum of the signal observed by the microphone array, ~P<des>(r, n, m) is given as follows. h_n<(1)>' denotes the derivative of the n-th order spherical Hankel function of the first kind, h_n<(1)>.

[0083]

[0084] Substituting this into equation (8) yields the following.

[0085]

[0086] S is the set of speakers constituting the spherical speaker array used for measuring the HRTFs. Unlike the circular array case, the signal conversion cannot be performed with a single filter convolution. Instead, the filter of the following equation (E) is applied to the microphone array signal, the result is convolved with each HRTF, and the sum over the speakers used for the HRTF measurement can be used as the binaural signal.

[0087] That is, a signal is obtained by space-time convolution of the filter of equation (E) with the observation signal from the spherical microphone array arranged on a sphere. The time signal of the obtained signal corresponding to each speaker position of the spherical speaker array is convolved, in the time direction, with the time signal corresponding to the same speaker position of the plurality of head-related transfer functions based on sound reproduced from the spherical speaker array. The results are added together over the speakers constituting the spherical speaker array, whereby a binaural signal in the time domain is generated.

[0088]

[0089] Note that if the filter coefficients ~F<L>(r_s, n, m) of orders n and m are defined as follows,

[0090]

[0091] the binaural signal can be expressed as follows by the filter coefficients of orders n and m and the spherical microphone array signal.

[0092]

[0093] Formula (F) assumed the use of ~G obtained by measurement; here, the case of modeling the speaker as a point source is considered.

[0094]

[0095] Since the above holds, the following is obtained.

[0096]

[0097] The filter of this formula (F) may be used instead of the filter of formula (E).
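The final step of the spherical case, where each per-speaker signal is convolved in time with the HRTF for that speaker position and the results are summed over the speakers, can be sketched as follows. This is a minimal illustration that assumes the per-speaker time signals have already been produced by the filter stage and that the HRTFs are available as time-domain impulse responses; the function names and the list-of-lists layout are hypothetical.

```python
def convolve(x, h):
    """Direct-form linear convolution of two finite sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for k, hk in enumerate(h):
            y[i + k] += xi * hk
    return y


def binaural_from_speaker_signals(speaker_signals, hrirs):
    """Convolve each speaker signal with its impulse response and sum.

    speaker_signals[s] is the time signal for speaker position s and
    hrirs[s] is the impulse response from that position to one ear.
    """
    if len(speaker_signals) != len(hrirs):
        raise ValueError("one impulse response is needed per speaker")
    out_len = max(len(x) + len(h) - 1
                  for x, h in zip(speaker_signals, hrirs))
    out = [0.0] * out_len
    for x, h in zip(speaker_signals, hrirs):
        for t, v in enumerate(convolve(x, h)):
            out[t] += v
    return out
```

Calling the function once with the left-ear impulse responses and once with the right-ear ones gives the two channels. For realistic signal lengths an FFT-based convolution would normally replace the direct form used here for clarity.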
[0098] <Binaural Signal Generation Apparatus and Method>
The binaural signal generation apparatus according to the second embodiment includes a filtering unit 1 and a filter generation unit 2, as shown in the functional block diagram. The binaural signal generation apparatus implements the binaural signal generation method by executing the process of each step shown in FIG. 5.

[0099] A filter obtained by matching a desired sound field with a synthesized sound field in consideration of a spherical speaker array arranged on a sphere is input to the filtering unit 1.

[0100] This filter is the filter defined by equation (E) or equation (F). It is generated in advance by the filter generation unit 2, prior to the processing of the filtering unit 1.

[0101]

[0102] Further, an observation signal from a spherical microphone array arranged on a sphere is input to the filtering unit 1.

[0103] The filtering unit 1 obtains a signal by performing space-time convolution of the input filter and the input observation signal. The obtained signal is output to the signal generation unit 3.

[0104] The signal generation unit 3 generates a time-domain binaural signal as follows. The time signal of the signal obtained by the filtering unit 1 that corresponds to each speaker position of the spherical speaker array used to obtain the HRTFs is convolved, in the time direction, with the time signal corresponding to the same speaker position of the plurality of head-related transfer functions based on sound reproduced from the spherical speaker array. The results are then added together over the speakers constituting the spherical speaker array.

[0105] The processing of the filtering unit 1 described above is merely an example. The filtering unit 1 may generate a binaural signal by any of the other filtering or convolution processes described in the <Technical Background> section.
[0106] That is, for example, the filtering unit 1 may generate a binaural signal by performing the process defined by equation (9), using ~P<rcv>(R_m, m) as the spherical harmonic spectrum of the observation signal.

[0107] Thus, by considering a plurality of head-related transfer functions, a highly accurate binaural signal can be generated.

[0108] [Functions]
Each function used in the above description is described below.

[0109] The n-th order Hankel function of the first kind H_n<(1)>(·) and the n-th order Bessel function J_n(·) are defined as follows, where · is an arbitrary real number. Γ(z) is the gamma function and Y_n(z) is the Neumann function.

[0110]

[0111] The n-th order spherical Hankel function of the first kind h_n<(1)>(·) and the n-th order spherical Bessel function j_n(·) are defined as follows.

[0112]

[0113] The associated Legendre function P_n<m>(·) is defined as follows. P_n(·) represents a Legendre polynomial.

[0114]

[0115] The spherical harmonic function Y_n<m> is defined by the following equation.

[0116]

[0117] [Modifications]
The filtering process in the time-frequency domain or the space-time frequency domain may instead be performed in the space-time domain. That is, the above-mentioned filtering process may be performed by convolution in the time direction and convolution in the spatial direction.

[0118] The binaural signal generation apparatus can be realized by a computer. In this case, the processing content of each part of the apparatus is described by a program. Each part of the apparatus is then realized on the computer by executing this program on the computer.

[0119] The program describing the processing content can be recorded on a computer-readable recording medium. Further, although in this embodiment these apparatuses are configured by executing a predetermined program on a computer, at least part of the processing content may be realized in hardware.
[0120] The present invention is not limited to the above-described embodiments, and various modifications can be made without departing from the spirit of the present invention.

[0121]
1 filtering unit
2 filter generation unit
3 signal generation unit
