Patent Translate
Powered by EPO and Google
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
METHOD AND APPARATUS FOR PRODUCING DIRECTIONAL SOUND

BACKGROUND OF THE INVENTION

1. Field of the Invention

The field of the invention is methods and apparatus for detecting and reproducing sound.

2. Description of the Background Art

Extensive physical and behavioral research has shown that the outer ear (including the torso,
head, pinna and ear canal) plays an important role in spatial hearing. The outer ear is known to
change the spectrum of a sound according to the angle at which the sound arrives at the ear. For
binaural hearing, the spectral differences created by the outer ear are also known to provide
important cues for localizing sound, in addition to interaural time and intensity differences.
When the sound source lies in the sagittal plane, or in the case of monaural hearing, the spectral
cues provided by the outer ear are used almost exclusively by the auditory system to identify the
position of the sound source. In addition, the outer ear externalizes the sound image. A sound that
is heard with the original interaural time and intensity differences, but without the spectral cues
introduced by the outer ear, is generally perceived as originating inside the listener's head.

Functional models of the external ear transformation characteristics are of great interest for
simulating a realistic auditory image over headphones. The problem of reproducing sound as if it
were heard in three-dimensional space arises in auditory research, high-fidelity music reproduction
and voice communication.

In an article published in the Journal of the Acoustical Society of America (March 1992, pages
1637 to 1647), Kistler and Wightman describe a methodology based on free-field-to-eardrum transfer
functions (FETFs). This methodology analyzes the amplitude spectrum, and its results represent up
to 90% of the energy in the measured FETF amplitude. The methodology does not provide for
interpolation of the FETF between measurement points in the spherical auditory space around the
listener's head, nor does it represent the FETF phase.

For other background art in the relevant area of auditory research, reference is made to the
introduction of the article "External ear transfer function modeling: A beamforming approach" by
the present inventors et al., Journal of the Acoustical Society of America, Vol. 92, No. 4, Pt. 1
(October 1992), pages 1933 to 1944.

SUMMARY OF THE INVENTION

The present invention relates to a method and apparatus for recording and reproducing sound in
which nondirectional sound is processed so that it is heard as directional sound, and to sound
recordings made in this way. Measured data are used to determine a model of the external ear
transfer function in which frequency dependence is separated from spatial dependence. Multiple
frequency-dependent functions are weighted and summed to represent the ear transfer function, and
the weights are expressed as a function of direction. Sound that carries no directional cues is
perceived as coming from a specific direction once it has been processed according to the signal
processing techniques disclosed and claimed herein.

Auditory information processed according to the present invention takes on spatial,
three-dimensional characteristics. The method and apparatus can be applied where listeners such as
pilots, astronauts or sonar operators need directional information, or can be used to enhance the
enjoyment of listening to recorded music.

Other objects and advantages beyond those above will be apparent to those skilled in the art from
the description of the preferred embodiments that follows. In the following description, reference
is made to the accompanying drawings, which illustrate examples of the invention.
These examples, however, are not exhaustive of the various embodiments of the present invention,
and reference should therefore be made to the appended claims to determine the scope of the
invention.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a diagram showing how sound data is collected according to the present invention.
FIGS. 2a-2j are spectral graphs of sound collected as in FIG. 1 or interpolated from the data
collected as in FIG. 1.
FIG. 3 is a block diagram of the apparatus used to record the sound data shown in FIGS. 1 and 2.
FIG. 4 is a flow chart showing the steps of producing a sound according to the invention.
FIG. 5a is a functional circuit diagram showing how directional sound is synthesized by the
apparatus of FIG. 6.
FIG. 5b is a functional circuit diagram showing a second method of synthesizing sound with the
apparatus of FIG. 6.
FIG. 6 is a block diagram showing an apparatus for producing directional sound.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

As shown in FIG. 1, the present invention uses data measured in three-dimensional space for a
typical human ear. The measurements can be made on a human subject if a specific subject's ear is
required, or on a special manikin head 10, such as the KEMAR™ head, that represents a typical
human ear. The spherical space around the head is described in terms of the spherical coordinates
θ and φ. The variable θ represents the azimuth reading relative to the vertical midline plane
between the two ears defined by axes 11 and 12 (angles to the right of the midline plane in FIG. 1
are positive, and angles to the left are negative). The variable φ represents the elevation
reading relative to the horizontal plane passing through axes 12 and 13 and the centers of both
ears (angles above this horizontal plane are positive, and angles below it are negative). In
FIG. 1, isoazimuth and isoelevation lines 14 are shown in increments of 20°.

The loudspeaker 15 is moved to various positions and generates a broadband sound at each. Sound at
the ear is measured using the subject's ear or manikin head 10 by placing a microphone in one of
the ears, so that the sound is recorded as if it were heard by the listener. Data can be taken for
both ears. To develop a free-field-to-eardrum transfer function, the sound is also measured
without the effect of the ear, by removing the subject or manikin head 10 and detecting the sound
at the position previously occupied by the ear.
This measurement gives the "free-field" sound data. Both measurements are repeated for different
speaker positions. Standard signal processing methods are used to determine the transfer function
between the ear data and the free-field data at each location.

FIGS. 2a, 2c, 2e, 2g and 2i show a series of sound spectra (amplitude versus frequency) for a
series of readings in which the azimuth is changed from 0° to 36° at an elevation of 18.5°. As the
sound source is moved, the peaks and valleys of the spectrum are observed to shift. FIGS. 2b, 2d,
2f, 2h and 2j show interpolated values obtained using the data and the methodology described
herein.

FIG. 3 shows an apparatus for collecting sound data for both free-field and ear-canal recordings.
A subject 10 and a movable speaker 15 are placed in a recording chamber 16. A personal computer 20,
such as an IBM PC AT or AT-compatible computer, has a bulk memory 21 such as a CD-ROM or one or
more high-capacity hard drives. Microphones 23a, 23b are placed in the ears of the subject or
manikin. The microphone signal is processed through an amplifier and equalizer unit 24 and an
analog band-pass filter circuit 27 outside the computer 20, and is then fed to the A-to-D converter
portion 22a of a signal processing board in the computer chassis. There, analog signals of the type
seen in FIG. 2 are converted into a plurality of sampled and digitized readings. Readings are
taken at more than 2,000 locations on the sphere around manikin head 10.
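
As a rough illustration of the transfer-function step mentioned above, the sketch below estimates an FETF at one speaker position by dividing the spectrum of an ear-canal recording by the spectrum of the matching free-field recording. The function name `estimate_fetf`, the regularization constant `eps`, the FFT length and the synthetic test signals are all assumptions made for this example; the text says only that standard signal processing methods are used.

```python
import numpy as np

def estimate_fetf(ear, free, n_fft=256, eps=1e-8):
    """Estimate a free-field-to-eardrum transfer function H = E / R.

    ear, free: time-domain recordings of the same test sound from the
    same speaker position, with and without the head in place.
    eps is an assumed regularizer that guards against near-zero bins.
    """
    E = np.fft.rfft(ear, n_fft)
    R = np.fft.rfft(free, n_fft)
    # Regularized spectral division: E * conj(R) / (|R|^2 + eps)
    return E * np.conj(R) / (np.abs(R) ** 2 + eps)

# Synthetic check: the "ear" signal is the free-field pulse passed
# through a known short filter, so the estimate should recover it.
free = np.zeros(256)
free[:2] = [1.0, 0.5]                  # short test pulse (assumed shape)
true_h = np.array([1.0, 0.5, -0.25])   # hypothetical ear response
ear = np.convolve(free, true_h)[:256]

H = estimate_fetf(ear, free)
H_true = np.fft.rfft(true_h, 256)
max_err = np.max(np.abs(H - H_true))
print(max_err)
```

In practice the recordings contain noise and measurement artifacts, so averaging over repeated pulses or a larger `eps` may be appropriate; those choices are outside what the text specifies.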
Storing these readings requires about 70 megabytes of data storage capacity. The computer 20
generates a test sound via the sound generator portion 22b of the signal processing board. The
electrical signal is processed through the output amplifier circuit 25 and the attenuator circuit
26 to bring the generated sound to an appropriate output level. The resulting signal (typically a
square-wave pulse of 30 to 100 microseconds duration, or another broadband signal) is then applied
to the speaker 15 to generate the test sound. The speaker 15 is moved from point to point as shown
in FIG. 1. In another embodiment for recording spatial sound data, an ADQ-32 signal processing
board is used in a VAX 3200 computer.

In the method and apparatus for recording and reproducing simulated directional sound, an audio
input signal is passed through a filter whose frequency response models the free-field-to-eardrum
transfer function. This filter is obtained as a weighted combination of basic filters, where the
weights are a function of the selected spatial direction.

FIG. 4 illustrates how the sound data collected as in FIGS. 1-3 is processed to determine the basic
filters and the weights that impart spatial characteristics to sound in accordance with the
present invention. Sound data is input and stored for more than 2,000 specific speaker positions
for both the free-field recording R(ω, θ, φ) and the ear-canal recording E(ω, θ, φ). This is
represented by input block 31 of FIG. 4. In general, this data includes noise from sound detection,
measurement errors and artifacts. As represented by process block 32 in FIG. 4, conventional
signal processing techniques are used to derive the free-field-to-eardrum transfer function
H(ω, θ, φ), which is a function of frequency ω at azimuth θ and elevation φ. Block 32 is
implemented by a program written in MATLAB™ and the C programming language and run on a
SUN/SPARC 2 computer. MATLAB version 3.5 is commercially available from The MathWorks, Inc.
(Natick, Mass.). A similar program can be written for the AT-compatible computer 20 or other
computers to perform this block.

If H(ω, θ, φ) is the measured FETF at azimuth θ and elevation φ, the model separates the frequency
dependence, characterized by basic filters t_i(ω), i = 0, 1, ..., p (also called eigenfilters,
EFs), from the spatial dependence, represented by weights w_i(θ, φ), i = 1, ..., p:

$$H(\omega,\theta,\phi) = t_0(\omega) + \sum_{i=1}^{p} w_i(\theta,\phi)\, t_i(\omega)$$
These weights are called spatial transformation characteristic functions (STCFs). The invention
provides a two-step procedure for determining the EFs t_i(ω) and the STCFs w_i(θ, φ).

Let H(θ, φ) and t_i be N-dimensional vectors whose elements are N frequency samples of the
measured FETF H(ω, θ, φ) and of the eigenfilters {t_i(ω), i = 0, 1, ..., p}, respectively. A
typical value of N is 256, but larger or smaller values may be used. N should be large enough that
the eigenfilters are well described by their N samples. The eigenfilters {t_i(ω), i = 1, ..., p}
are selected as the eigenvectors corresponding to the p largest eigenvalues of the sample
covariance matrix Σ_H formed from spatial samples of the FETF frequency vector H(θ, φ). The
eigenfilter t_0(ω) is selected as the sample mean H̄ of the FETF frequency vectors. The covariance
matrix is

$$\Sigma_H = \sum_{j=1}^{L}\sum_{k=1}^{M} \alpha_{jk}\,[H(\theta_j,\phi_k)-\bar{H}]\,[H(\theta_j,\phi_k)-\bar{H}]^{H} \qquad (2)$$

where H(θ_j, φ_k) represents the measured FETF at the azimuth-elevation pair (θ_j, φ_k),
j = 1, ..., L, k = 1, ..., M. The total number of measured directions is about 2,000. In equation
(2), the superscript "H" denotes the complex conjugate transpose. The non-negative weighting
factors α_jk are used to emphasize the relative accuracy of the analysis in some directions over
others. If all directions are equally important, the same value is used for every α_jk,
j = 1, ..., L, k = 1, ..., M.

The EF frequency vectors {t_i, i = 1, ..., p} satisfy the following eigenvalue problem:

$$\Sigma_H\, t_i = \lambda_i\, t_i, \quad i = 1,\dots,p \qquad (4)$$

where the λ_i are the p largest eigenvalues of Σ_H. The fidelity of the sound reproduced using the
methodology of the present invention improves as p is increased. A typical value for p is sixteen.
The EF vector t_0 is set equal to H̄.

The STCF samples w_i(θ_j, φ_k), i = 1, ..., p, j = 1, ..., L, k = 1, ..., M, are chosen to
minimize the squared error between the modeled and the measured FETFs at the positions
(θ_j, φ_k), and are given by

$$w_i(\theta_j,\phi_k) = t_i^{H}\,[H(\theta_j,\phi_k)-\bar{H}] \qquad (5)$$

where i = 1, ..., p, j = 1, ..., L, and k = 1, ..., M. Each t_i is assumed here to have unit norm,
that is, t_i^H t_i = 1, i = 1, ..., p.

A spline model of the STCFs smooths the measurement noise and allows interpolation of the STCFs
(and hence of the FETFs) between the measured directions. The spline model is obtained by solving
the following regularization problem:

$$\min_{W_i}\; \sum_{j=1}^{L}\sum_{k=1}^{M} \left| w_i(\theta_j,\phi_k) - W_i(\theta_j,\phi_k) \right|^2 + \lambda\,\| P W_i \|^2, \quad i = 1,\dots,p$$

Here, W_i(θ, φ) is the functional representation of the i-th STCF, λ is the regularization
parameter, and P is a smoothing operator. The regularization parameter controls the tradeoff
between the smoothness of the solution and its fidelity to the data. The optimum value of λ is
determined by generalized cross validation.
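
To make the role of the regularization parameter concrete, the sketch below applies the same idea in a deliberately simplified setting: real-valued samples on a uniform one-dimensional grid, a second-difference matrix standing in for the smoothing operator P, and a grid search over λ scored by generalized cross validation. It is not the two-dimensional spline model of the text, and the function name `smooth_gcv`, the candidate grid and the test data are assumptions made for illustration.

```python
import numpy as np

def smooth_gcv(y, lambdas):
    """Penalized least squares on a uniform 1-D grid.

    Minimizes ||y - f||^2 + lam * ||D2 f||^2 for each candidate lam and
    keeps the one with the lowest generalized cross-validation score.
    """
    n = len(y)
    D2 = np.diff(np.eye(n), n=2, axis=0)   # second-difference operator, (n-2, n)
    penalty = D2.T @ D2
    best = None
    for lam in lambdas:
        A = np.linalg.inv(np.eye(n) + lam * penalty)   # influence matrix
        f = A @ y
        rss = np.sum((y - f) ** 2)
        gcv = n * rss / (n - np.trace(A)) ** 2
        if best is None or gcv < best[0]:
            best = (gcv, lam, f)
    return best[1], best[2]

# Noisy samples of a smooth function stand in for one STCF along one angle.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 2.0 * np.pi, 60)
truth = np.sin(x)
y = truth + 0.3 * rng.standard_normal(x.size)

lambdas = np.logspace(-2, 4, 25)
lam, f = smooth_gcv(y, lambdas)
print(lam, np.mean((f - truth) ** 2), np.mean((y - truth) ** 2))
```

A complex-valued STCF could be handled in this sketch by smoothing its real and imaginary parts separately; whether the actual spline software does so is not stated in the text.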
The smoothing operator P is defined by treating θ and φ as the coordinates of a two-dimensional
rectangular coordinate system. The regularized STCFs are combined with the EFs to synthesize a
regularized FETF at any given θ and φ.

Process block 33 of FIG. 4 represents the computation of Σ_H, which is performed by a MATLAB
language program run on a SUN/SPARC 2 computer. A similar program can be written to perform this
block on an AT-compatible computer 20 or other computer. An eigenvector expansion is then applied
to the Σ_H result to calculate the eigenvalues λ_i and the corresponding eigenvectors t_i, which
are the samples of the eigenfilters t_i(ω), as represented by process block 34 of FIG. 4. In this
example, the eigenanalysis is more specifically referred to as the Karhunen-Loève expansion. For
further explanation of this expansion, see Papoulis, "Probability, Random Variables, and
Stochastic Processes" (3rd Edition, McGraw-Hill, Inc., New York, New York, 1991), pp. 413-416 and
p. 425. Then, as represented by block 35 in FIG. 4, the eigenvectors are processed to calculate
the STCF samples w_i as a function of the spatial variables (θ, φ) for each measured direction, as
described in equation (5) above. This calculation is performed by a MATLAB language program run on
a SUN/SPARC 2 computer.
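
The computations of blocks 33 to 35 can be sketched in a few lines of NumPy. This is an illustrative reading of equations (2), (4) and (5), not the MATLAB program referred to above: the function name `eigenfilter_model`, the array layout (one FETF frequency vector per column), the use of an unweighted sample mean for t_0, and the default of equal direction weights are assumptions.

```python
import numpy as np

def eigenfilter_model(H, p, alpha=None):
    """H: (N, D) complex array, one FETF frequency vector per direction.

    Returns t0 (N,), eigenfilters T (N, p), STCF samples W (p, D) and the
    p largest eigenvalues of the sample covariance matrix.
    """
    N, D = H.shape
    if alpha is None:
        alpha = np.full(D, 1.0 / D)        # assumed: all directions equal
    t0 = H.mean(axis=1)                    # sample mean, used as t0
    Hc = H - t0[:, None]
    # Weighted sample covariance matrix, as in equation (2)
    cov = (Hc * alpha) @ Hc.conj().T
    # Hermitian eigenproblem of equation (4); eigh returns ascending order
    vals, vecs = np.linalg.eigh(cov)
    order = np.argsort(vals)[::-1][:p]
    T = vecs[:, order]                     # unit-norm eigenvectors
    # STCF samples of equation (5): w_i = t_i^H (H - t0)
    W = T.conj().T @ Hc
    return t0, T, W, vals[order]

# Synthetic data: a mean response plus three spatially varying components.
rng = np.random.default_rng(1)
N, D = 32, 50
basis = rng.standard_normal((N, 3)) + 1j * rng.standard_normal((N, 3))
coeff = rng.standard_normal((3, D)) + 1j * rng.standard_normal((3, D))
mean = rng.standard_normal(N) + 1j * rng.standard_normal(N)
H = mean[:, None] + basis @ coeff

t0, T, W, vals = eigenfilter_model(H, p=3)
H_model = t0[:, None] + T @ W              # model FETFs at the measured directions
print(np.max(np.abs(H_model - H)))
```

Because the synthetic data has exactly three varying components, p = 3 reproduces it to rounding error; with measured FETFs the residual shrinks as p grows, which matches the statement that fidelity improves with p.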
A similar program can be written to execute this block on an AT-compatible computer 20 or other
computer.

Next, as shown in process block 36 of FIG. 4, a generalized spline model is fitted to the STCF
samples using a publicly available software package known as RKpack, obtained by e-mail from
netlib@research.att.com. The spline model filters noise from each sampled STCF. As a result, each
spline-based STCF becomes a continuous function of the spatial variables (θ, φ). This surface
fitting and filtering yields data that allow interpolation of the STCFs between the measurement
points in the spherical space. As shown in process block 37, the EFs t_0(ω) and t_i(ω) and the
STCFs w_i(θ, φ), i = 1, ..., p, describe the complete FETF model. The FETF for a selected
direction is then synthesized by weighting the EFs with the smoothed and interpolated STCFs and
summing. As shown at process block 38, directional sound is synthesized by filtering the
nondirectional sound with the FETF. The synthesized sound is converted to an audio signal as shown
at process block 39 and converted to sound via a speaker as shown at output block 40.
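
The weighting, summing and filtering steps of blocks 37 and 38 can be sketched as follows. The sketch builds the FETF for one direction from the eigenfilters and a set of STCF values, filters a block of samples with it in the frequency domain, and checks that filtering through each eigenfilter separately and then weighting and summing gives the same result, as linearity requires. The function names, the block length and the random stand-in data are assumptions; real eigenfilters and STCF values would come from the analysis described above.

```python
import numpy as np

def synthesize_fetf(t0, T, w):
    """FETF samples for one direction: t0 + sum_i w_i * t_i."""
    return t0 + T @ w

def filter_single(x, H):
    """Filter a real block x with frequency response H (len(x)//2 + 1 bins).

    Uses circular convolution via the FFT; a streaming implementation
    would add overlap-add or overlap-save block handling.
    """
    return np.fft.irfft(np.fft.rfft(x) * H, n=len(x))

def filter_bank(x, t0, T, w):
    """Same result via parallel branches: one per eigenfilter plus t0."""
    X = np.fft.rfft(x)
    Y = X * t0                              # mean branch, gain of unity
    for i in range(T.shape[1]):
        Y = Y + w[i] * (X * T[:, i])        # EF branch scaled by its STCF value
    return np.fft.irfft(Y, n=len(x))

# Random stand-ins for eigenfilters and for the STCF values at one (theta, phi).
rng = np.random.default_rng(0)
n, p = 256, 16
bins = n // 2 + 1
t0 = rng.standard_normal(bins) + 1j * rng.standard_normal(bins)
T = rng.standard_normal((bins, p)) + 1j * rng.standard_normal((bins, p))
w = rng.standard_normal(p) + 1j * rng.standard_normal(p)
x = rng.standard_normal(n)                  # nondirectional input block

y_single = filter_single(x, synthesize_fetf(t0, T, w))
y_bank = filter_bank(x, t0, T, w)
print(np.max(np.abs(y_single - y_bank)))
```

The single-filter form needs one multiplication per frequency bin once the FETF has been formed, while the parallel form keeps the eigenfilters fixed and changes only the p weights when the direction changes.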
This completes the method of the present invention, as shown in block 41.

FIG. 5a is a block diagram showing how directional sound is synthesized according to the present
invention. A nondirectional sound, represented by the input signal 29 in FIG. 5a, is passed through
p filters 42 corresponding to the p EFs for the right ear and through p filters 43 for the left
ear. In this example, p = 16 is assumed for purposes of illustration.
For each ear, the signal coming through each of these sixteen filters 42, 43 is amplified
according to the STCF analysis of the data, represented by blocks 106, 107, as a function of the
spatial variables θ and φ, as outlined above. This is shown by the sixteen multiplying junctions
74 for the right ear and the sixteen multiplying junctions 75 for the left ear. The input signal 29
is also filtered by the FETF sample mean t_0(ω) and then amplified by a factor of unity (1), as
shown in blocks 51, 52 of FIG. 5a.
The amplified, EF-filtered component signals are then summed with each other and with the
t_0(ω)-filtered component signals at summing junctions 80, 81 for the right and left ears,
respectively, and played back through headphones to a listener at a remote location. By weighting
the EF-filtered signals with the STCF weights corresponding to the selected direction defined by θ
and φ, and summing the weighted filtered signals, a sound is produced that has the effect of
originating from the selected direction.

FIG. 5b illustrates another approach to synthesizing directional sound in accordance with the
present invention. Here, the nondirectional input signal 29 is filtered directly by the FETF for
the selected direction. The FETF for the selected direction is obtained by weighting the EFs 55,
56 at the p multiplying junctions 45, 46 with the STCFs 106, 107 for the selected direction. The
adjusted EFs are then summed at summing junctions 47, 48, together with the sample mean t_0(ω) of
the FETFs represented by elements 55, 56, to obtain the response characteristic for the selected
direction of sound. A single filter 49, 50 is thus formed for each respective ear.

Although the filtering of the components in the above examples takes place in the frequency
domain, equivalent examples in which the components are filtered in the time domain can evidently
be constructed without departing from the scope of the present invention.

Both FIGS. 5a and 5b show a final stage that accounts for the interaural time delay. Because the
interaural time delay is removed during the modeling process, it needs to be restored for binaural
listening. The interaural time delay ranges from 0 to about 700 microseconds. Blocks 132, 142 in
FIGS. 5a and 5b, respectively, represent interaural time delay controllers. These controllers
convert the given position variables θ and φ into time delay control signals and send these
control signals to the binaural channels. Blocks 130, 131, 140, 141 are delays controlled by the
interaural time delay controllers 132, 142. The actual interaural time delay can be calculated by
cross-correlating the binaural ear-canal recordings for each sound source position. These discrete
interaural time delay samples are then input to a spline model, which yields a continuous
interaural time delay function.

FIG. 6 is a block diagram showing an apparatus for producing directional sound according to the
present invention. Nondirectional sound is picked up by a microphone 82, and an amplifier 83 and
signal processing board 84-86 are used to digitize and record the sound.
The signal processing board has a data acquisition circuit 84 that includes an A-to-D converter, a
digital signal processor 85 and a D-to-A output circuit 86. The signal processor 85 and the other
sections 84, 86 are interfaced to the PC AT computer 20 or an equivalent computer, as described
above. The D-to-A output circuit 86 is connected to a stereo amplifier 87 and stereo headphones 88.
The measurement data for the FETF is stored in a mass storage device (not shown) connected to the
computer 20. Element 89 is an alternative arrangement in which audio signals are pre-recorded and
stored and then supplied to the digital signal processor 85 for producing directional sound.

The signal 29 in FIGS. 5a and 5b is received via the microphone 82. The filtering by the filters
42, 43 and the other operations seen in FIGS. 5a and 5b are executed in the digital signal
processor 85, using the EF and STCF data 106, 107 received from the AT-compatible computer 20 or
other suitable computer. The other elements 86-88 in FIG. 6 convert the audio signal seen in FIGS.
5a and 5b into a sound that the listener perceives as originating from the direction determined by
the selection of θ and φ. This selection is made at the AT-compatible computer 20 or other
suitable computer by inputting data for θ and φ.

The method of the present invention can be used to make recordings on various media, such as CD,
tape and digital sound recording media, in which nondirectional sound has been converted into
directional sound by inputting various sets of values for θ and φ. By using a series of changing
values, the sound can be made to "move" relative to the listener's ears, which is why the terms
"three-dimensional" sound and "virtual auditory environment" are applied to this effect.

The foregoing has illustrated and described how the present invention may be practiced. Those
skilled in the art will understand that changes in detail may be made to arrive at other detailed
embodiments, and that many of these embodiments fall within the scope of the present invention.
Accordingly, the following claims are set forth to apprise the public of the scope of the
invention and of the embodiments covered by the invention.