Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2011124723
The present invention provides an audio data processing apparatus and the like which speeds up correction processing by linearly interpolating the waveform distortion generated when a virtual sound source moves relative to a speaker. According to one embodiment, the apparatus is provided with: a calculation unit (sound wave propagation time data calculation unit 1202) for calculating a first distance and a second distance from the position of a speaker to the position of a virtual sound source at successive points in time; a specifying unit (output audio data generation unit 1207) for specifying, when the first distance and the second distance differ, the distorted portion in the audio data at the earlier and later time points; and a correction unit (output audio data generation unit 1207) for correcting the audio data of the specified portion by interpolation using a function. [Selected figure] Figure 12
AUDIO DATA PROCESSING DEVICE, AUDIO DEVICE, AUDIO DATA PROCESSING METHOD,
PROGRAM, AND RECORDING MEDIUM CONTAINING THE PROGRAM
[0001]
The present invention relates to an audio data processing device, an audio device, an audio data
processing method, a program, and a recording medium storing the program.
[0002]
In recent years, research on audio systems based on Wave Field Synthesis (WFS) as a basic
principle has been actively conducted mainly in Europe (see, for example, Non-Patent Document
1).
WFS is a technology that synthesizes the wavefront of the sound emitted from a plurality of speakers arranged in an array (hereinafter referred to as a "speaker array").
[0003]
A listener who is listening to sound while facing the speaker array in the acoustic space provided by WFS perceives the sound, which is actually emitted from the speaker array, as if it were emitted from a sound source located behind the speaker array (hereinafter referred to as a "virtual sound source") (see, e.g., FIG. 1).
[0004]
Devices to which the WFS system can be applied include movies, audio systems, televisions, AV racks, video conferencing systems, video games, and the like. For example, if the digital content is a movie, the presence of an actor is recorded on the medium in the form of a virtual sound source. Therefore, when the actor moves on the screen while talking, the virtual sound source can be localized to the left, right, front, back, or any other direction relative to the screen in accordance with the direction of the actor's movement on the screen. For example, Patent Document 1 describes a system that enables a virtual sound source to move.
[0005]
Japanese Patent Application Publication No. 2007-502590
[0006]
A. J. Berkhout, D. de Vries, and P. Vogel, "Acoustic control by wave field synthesis" (Netherlands), Journal of the Acoustical Society of America (J. Acoust. Soc. Am.), vol. 93, no. 5, May 1993, pp. 2764-2778.
[0007]
The Doppler effect is known as a physical phenomenon in which the frequency of a sound wave differs depending on the relative velocity between the sound source, which is the origin of the sound wave, and the listener. According to the Doppler effect, when the sound source approaches the listener, the sound wave is compressed and the frequency increases; conversely, when the sound source moves away from the listener, the sound wave is stretched and the frequency decreases. The total number of wave cycles arriving from the sound source does not change even if the sound source moves. However, the system described in Non-Patent Document 1 is premised on the virtual sound source being fixed and not moving, and the Doppler effect produced by the movement of the virtual sound source is not studied there. Therefore, when the virtual sound source moves toward or away from the speaker, the number of waves of the audio signal on which the sound emitted by the speaker is based changes, and this change in the number of waves produces distortion in the waveform. Since the listener perceives waveform distortion as noise, measures must be taken to eliminate the distortion of the waveform. The details of waveform distortion will be described later.
[0008]
On the other hand, the system described in Patent Document 1 takes into account the Doppler effect produced by the movement of the virtual sound source: it corrects the audio data by changing the weighting factor applied to the range from an appropriate sample datum in one segment of the audio data, which is the basis of the audio signal, to an appropriate sample datum in the next segment. Here, a "segment" is a processing unit of audio data. By correcting the audio data, extreme distortion of the audio signal waveform is eliminated to some extent, and the noise generated by the waveform distortion can be reduced. However, according to Patent Document 1, in order to correct the audio data of the current segment, the sound wave propagation time of the audio data of the next segment must be calculated in advance. That is, in the system described in Patent Document 1, the audio data of the current segment cannot be corrected until the calculation of the sound wave propagation time of the audio data of the next segment is complete, so there is a problem that a delay of one segment occurs before the data is output.
[0009]
The present invention has been made in view of such problems, and an object of the present invention is to provide an audio data processing apparatus and the like which identifies the distorted portion in the audio data, corrects the identified waveform distortion, and can output the audio data without delay.
[0010]
The audio data processing apparatus according to the present invention receives as input audio data corresponding to the sound emitted by a moving virtual sound source, the position of the virtual sound source, and the position of a speaker that emits sound based on the audio data, and corrects the audio data based on the position of the virtual sound source and the position of the speaker. The apparatus is provided with: calculation means for calculating a first distance and a second distance from the position of the speaker to the position of the virtual sound source at successive points in time; identification means for identifying, when the first distance and the second distance differ, the distorted portion in the audio data at the earlier and later time points; and correction means for correcting the audio data of the identified portion by interpolation using a function.
[0011]
In the audio data processing apparatus according to the present invention, the audio data includes sample data; the identification means identifies the repeated portions and missing portions of sample data caused by the separation and approach of the virtual sound source relative to the speaker; and the correction means corrects the identified repeated portions and missing portions by interpolation using a function.
[0012]
In the audio data processing apparatus according to the present invention, the interpolation using the function is linear interpolation.
[0013]
In the audio data processing apparatus according to the present invention, the portion subjected to the correction spans the difference between the propagation times of the sound wave over the first distance and the second distance, or a time width proportional to that difference.
[0014]
The audio apparatus according to the present invention receives audio data corresponding to the sound emitted by a moving virtual sound source, the position of the virtual sound source, and the position of a speaker that emits sound based on the audio data, and corrects the audio data based on those positions. The apparatus comprises: a digital content input unit for inputting digital content that includes the audio data and the position of the virtual sound source; a content information separation unit for analyzing the digital content input by the digital content input unit and separating the audio data and the virtual sound source position data contained in the digital content; an audio data processing unit that corrects the audio data separated by the content information separation unit based on the virtual sound source position data separated by the content information separation unit and the speaker position data; and an audio signal generation unit that converts the corrected audio data into an audio signal and outputs the audio signal to the speaker. The audio data processing unit comprises: calculation means for calculating a first distance and a second distance from the position of the speaker to the position of the virtual sound source at successive points in time; identification means for identifying, when the first distance and the second distance differ, the distorted portion in the audio data at the earlier and later time points; and correction means for correcting the audio data of the identified portion by interpolation using a function.
[0015]
In the audio apparatus of the present invention, the digital content input unit inputs the digital content from a recording medium that stores the digital content, a server that distributes the digital content via a network, or a broadcasting station that broadcasts the digital content.
[0016]
The audio data processing method of the present invention is a method in an audio data processing apparatus that receives as input audio data corresponding to the sound emitted by a moving virtual sound source, the position of the virtual sound source, and the position of a speaker emitting sound based on the audio data, and corrects the audio data based on the position of the virtual sound source and the position of the speaker. The method comprises: calculating a first distance and a second distance from the position of the speaker to the position of the virtual sound source at successive points in time; identifying, if the first distance and the second distance differ, the distorted portion in the audio data at the earlier and later time points; and correcting the audio data of the identified portion by interpolation using a function.
[0017]
A program according to the present invention causes a computer to correct audio data corresponding to the sound emitted by a moving virtual sound source, based on the position of the virtual sound source, which is formed by the sound emitted from the speaker, and the position of the speaker. The program causes the computer to perform the steps of: calculating a first distance and a second distance from the position of the speaker to the position of the virtual sound source at successive points in time; identifying, when the first distance and the second distance differ, the distorted portion in the audio data at the earlier and later time points; and correcting the audio data of the identified portion by interpolation using a function.
[0018]
A recording medium of the present invention is characterized in that the program described above is recorded on it.
[0019]
In the audio data processing apparatus of the present invention, the location of waveform distortion is identified according to the approach or separation of the virtual sound source relative to the speaker, and the identified waveform distortion is then corrected by interpolation using a function. Therefore, the audio data can be corrected and output without delay.
[0020]
In the audio data processing apparatus according to the present invention, the repeated portions and missing portions of sample data caused by the separation and approach of the virtual sound source relative to the speaker are identified, and the correction means corrects the identified repeated portions and missing portions by interpolation using a function, so the audio data can be corrected and output without delay.
[0021]
In the audio data processing apparatus according to the present invention, the location of waveform distortion is identified according to the approach or separation of the virtual sound source relative to the speaker, and the identified waveform distortion is then corrected by linear interpolation, so the audio data can be corrected and output without delay.
[0022]
In the audio apparatus of the present invention, the location of waveform distortion is identified according to the approach or separation of the virtual sound source relative to the speaker, and the identified waveform distortion is then corrected by interpolation using a function, so the audio data can be corrected and output without delay.
[0023]
In the audio data processing method of the present invention, the location of waveform distortion is identified according to the approach or separation of the virtual sound source relative to the speaker, and the identified waveform distortion is then corrected by interpolation using a function. Therefore, the audio data can be corrected and output without delay.
[0024]
According to the program of the present invention, the location of waveform distortion is identified according to the approach or separation of the virtual sound source relative to the speaker, and the identified waveform distortion is then corrected by interpolation using a function, so the audio data can be corrected and output without delay.
[0025]
In the recording medium on which the program of the present invention is recorded, the location of waveform distortion is identified according to the approach or separation of the virtual sound source relative to the speaker, and the identified waveform distortion is then corrected by interpolation using a function, so the audio data can be corrected and output without delay.
[0026]
According to the audio data processing apparatus and the like of the present invention, it is possible to correct, without delay, the distortion of the audio data caused by the approach or separation of the virtual sound source relative to the speaker, and to output the corrected audio data.
[0027]
FIG. 1 is an explanatory diagram of an example of the acoustic space provided by WFS.
FIG. 2 is an explanatory diagram generally explaining an audio signal.
FIG. 3 is an explanatory diagram of a part of an audio signal waveform formed from audio data.
FIG. 4 is an explanatory diagram of an example of the audio signal waveform formed from the audio data in a first segment.
FIG. 5 is an explanatory diagram of an example of the audio signal waveform formed from the audio data in a second segment.
FIG. 6 is an explanatory diagram of an example of the audio signal waveform obtained by joining the waveform formed from the audio data shown in FIG. 4 and the waveform formed from the audio data shown in FIG. 5.
FIG. 7 is an explanatory diagram of an example of the audio signal waveform formed from the audio data in a first segment.
FIG. 8 is an explanatory diagram of an example of the audio signal waveform formed from the audio data in a second segment.
FIG. 9 is an explanatory diagram showing the state in which a gap of four samples occurs between the audio signal waveform formed from the audio data of the last part of the first segment and the audio signal waveform formed from the audio data of the first part of the second segment.
FIG. 10 is an explanatory diagram of an example of the audio signal waveform obtained by joining the waveform formed from the audio data shown in FIG. 7 and the waveform formed from the audio data shown in FIG. 8.
FIG. 11 is a block diagram showing a configuration example of an audio apparatus provided with an audio data processing unit according to Embodiment 1.
FIG. 12 is a block diagram showing an example of the internal configuration of the audio data processing unit according to Embodiment 1.
FIG. 13 is an explanatory diagram of a configuration example of an input audio data buffer.
FIG. 14 is an explanatory diagram of a configuration example of a sound wave propagation time data buffer.
FIG. 15 is an explanatory diagram of an example of the audio signal waveform formed from the corrected audio data.
FIG. 16 is an explanatory diagram of an example of the audio signal waveform formed from the corrected audio data.
FIG. 17 is a flowchart showing the flow of data processing according to Embodiment 1.
FIG. 18 is a block diagram showing an example of the internal configuration of an audio device according to Embodiment 2.
[0028]
First Embodiment. First, an operation model that assumes the virtual sound source does not move in the acoustic space provided by WFS and an operation model that takes the movement of the virtual sound source into account will be described; the description then moves on to the embodiment.
[0029]
FIG. 1 is an explanatory diagram of an example of an acoustic space provided by WFS.
In the acoustic space shown in FIG. 1, there are a speaker array 103 composed of M speakers 103_1 to 103_M and a listener 102 who is listening to sound while facing the speaker array 103.
In this acoustic space, the wavefronts of the sound radiated from the M speakers 103_1 to 103_M are synthesized based on Huygens' principle and propagate through the acoustic space as a synthesized wavefront 104. The listener 102 then perceives the sound actually radiated from the speaker array 103 as if it were emitted from N virtual sound sources 101_1 to 101_N that are localized behind the speaker array 103 but are not actually present. The N virtual sound sources 101_1 to 101_N are collectively called the virtual sound source 101.
[0030]
FIG. 2, on the other hand, is an explanatory diagram generally explaining an audio signal. When an audio signal is treated theoretically, it is generally represented as a continuous signal S(t). FIG. 2(a) shows the continuous signal S(t), FIG. 2(b) shows an impulse train with sampling interval Δt, and FIG. 2(c) shows the data s(bΔt) (where b is a positive integer) obtained by sampling and quantizing the continuous signal S(t) at the sampling interval Δt. As shown in FIG. 2(a), the continuous signal S(t) is continuous both on the time axis t and on the amplitude axis S. Sampling aims at obtaining a temporally discrete signal from the continuous signal S(t); that is, it represents the continuous signal S(t) by data s(bΔt) at discrete time points bΔt. In theory the sampling interval may be variable, but a fixed interval is more practical. Assuming the sampling interval is Δt, sampling and quantization are performed by punching out the continuous signal S(t) with the impulse train at the sampling interval Δt (FIG. 2(b)) and quantizing the result, as shown in FIG. 2(c). In the following description, the quantized data s(bΔt) are referred to as "sample data".
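As a concrete illustration of the sampling just described, the short Python sketch below produces sample data s(bΔt) from a continuous signal S(t). The 44.1 kHz rate and the sinusoidal S(t) are illustrative assumptions, not values from the patent:

import numpy as np

dt = 1.0 / 44100.0                              # assumed fixed sampling interval (44.1 kHz)
b = np.arange(1, 29)                            # positive integers b = 1, 2, ..., 28
S = lambda t: np.sin(2.0 * np.pi * 440.0 * t)   # an assumed continuous signal S(t)
sample_data = S(b * dt)                         # s(b*dt): discrete sample data
print(sample_data[:5])                          # (quantization to a finite word length is omitted)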
[0031]
The operation model that does not consider the movement of the virtual sound source 101 is as follows. In this operation model, the audio signals to be supplied to the speaker array 103 are generated using the following equations (1) to (4).
[0032]
In this calculation model, equation (1) gives the sample data, at discrete time t, of the audio signal supplied to the m-th speaker of the speaker array 103 (hereinafter referred to as "speaker 103_m"). Here, as shown in FIG. 1, the number of virtual sound sources 101 is N, and the number of speakers constituting the speaker array 103 is M.
[0033]
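The original formula is not reproduced here; the following LaTeX form is a reconstruction inferred from the definitions in paragraph [0034], which describe equation (1) as the sum, over the virtual sound sources, of the contributions arriving at the speaker:

l_m(t) = \sum_{n=1}^{N} q_n(t) \quad (1)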
[0034]
where qn(t) is the sample data, at discrete time t, of the sound wave emitted from the n-th of the N virtual sound sources 101 (hereinafter referred to as "virtual sound source 101_n") and reaching the speaker 103_m, and lm(t) is the sample data, at discrete time t, of the audio signal supplied to the speaker 103_m.
[0035]
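Equation (2) applies the gain and the propagation delay to the source signal; the form below is a reconstruction inferred from the definitions in paragraph [0036], not a verbatim copy of the original formula:

q_n(t) = G_n \, s_n(t - \tau_{mn}) \quad (2)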
[0036]
where Gn is the gain coefficient for the virtual sound source 101_n, sn(t) is the sample data, at discrete time t, of the audio signal assigned to the virtual sound source 101_n, and τmn is the number of samples of the sound wave propagation time caused by the distance between the position of the virtual sound source 101_n and the position of the speaker 103_m.
[0037]
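Equation (3) is reconstructed from the statement in paragraph [0041] that the gain coefficient is inversely proportional to the square root of the distance; the exact form in the original may differ:

G_n = \frac{w}{\sqrt{\lvert r_n - r_m \rvert}} \quad (3)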
[0038]
where w is a weight constant, rn is the position vector of the virtual sound source 101_n (a fixed value), and rm is the position vector of the speaker 103_m (a fixed value).
[0039]
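Equation (4) is reconstructed from the surrounding statements that the propagation time (in samples) is proportional to the distance and involves the floor operation; the sound speed c and the sampling interval Δt are symbols assumed here, not taken from the original:

\tau_{mn} = \left\lfloor \frac{\lvert r_n - r_m \rvert}{c \, \Delta t} \right\rfloor \quad (4)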
[0040]
Here, the floor symbol denotes "the largest integer not exceeding the given value".
[0041]
As can be understood from equations (3) and (4), in the present operation model the gain coefficient Gn for the virtual sound source 101_n is inversely proportional to the square root of the distance from the virtual sound source 101_n to the speaker 103_m.
This is because the set of speakers 103_m is modeled as a line sound source.
The sound wave propagation time τmn, on the other hand, is proportional to the distance from the virtual sound source 101_n to the speaker 103_m.
[0042]
The above equations (1) to (4) are based on the assumption that the virtual sound source 101_n does not move and is at rest at a certain position.
In the real world, however, a person walks while talking, and a car travels with engine noise.
That is, in the real world a sound source may be stationary or may move.
Therefore, to handle such cases, a new operation model (the operation model according to the first embodiment) that considers a moving sound source is introduced.
The new calculation model is described below.
[0043]
When the case where the virtual sound source 101_n moves is taken into account, equations (2) to (4) are replaced with equations (5) to (7) shown below.
[0044]
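Equation (5), the moving-source counterpart of equation (2), is reconstructed from the definitions in paragraph [0045]:

q_n(t) = G_{n,t} \, s_n(t - \tau_{mn,t}) \quad (5)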
[0045]
where Gn,t is the gain coefficient for the virtual sound source 101_n at discrete time t, and τmn,t is the number of samples of the sound wave propagation time caused by the distance between the virtual sound source 101_n at discrete time t and the speaker 103_m.
[0046]
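Equation (6), the moving-source counterpart of equation (3) (a reconstruction under the same assumptions as above):

G_{n,t} = \frac{w}{\sqrt{\lvert r_{n,t} - r_m \rvert}} \quad (6)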
[0047]
where rn,t is the position vector of the virtual sound source 101_n at discrete time t.
[0048]
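Equation (7), the moving-source counterpart of equation (4) (a reconstruction; c and Δt are symbols assumed here as before):

\tau_{mn,t} = \left\lfloor \frac{\lvert r_{n,t} - r_m \rvert}{c \, \Delta t} \right\rfloor \quad (7)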
[0049]
Since the virtual sound source 101_n is moving, the gain coefficient for the virtual sound source 101_n, the position of the virtual sound source 101_n, and the sound wave propagation time all change with the discrete time t, as can be understood from equations (5) to (7).
[0050]
Audio data is generally signal-processed in units of segments.
A "segment" is a processing unit of audio data and is also called a "frame".
One segment is composed of, for example, 256 or 512 sample data.
Therefore, lm(t) in equation (1) (the sample data of the audio signal supplied to the speaker 103_m at discrete time t) is calculated in units of segments.
In the present operation model, the segment of audio data forming the audio signal to be supplied to the speaker 103_m, calculated at discrete time t, is therefore treated as a vector and written Lm,t.
Lm,t is vector data composed of the a sample data (for example, 256 or 512 sample data) contained in one segment from discrete time t-a+1 to discrete time t, and is expressed by equation (8).
[0051]
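Equation (8), reconstructed from the description of Lm,t as the vector of the a sample data of one segment:

L_{m,t} = \left[ \, l_m(t-a+1), \; l_m(t-a+2), \; \ldots, \; l_m(t) \, \right] \quad (8)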
[0052]
Since audio data is processed in units of segments, it is practical that rn,t is also determined for each segment.
However, the frequency of updating rn,t does not necessarily have to match the segment unit.
By comparing the virtual sound source position rn,t0 at discrete time t0 with the virtual sound source position rn,t0-a at discrete time (t0-a), it can be determined how the virtual sound source moved relative to the speaker 103_m between discrete time (t0-a) and discrete time t0.
Here, the case where the virtual sound source 101_n moves in a direction away from the speaker 103_m (the virtual sound source 101_n separates from the speaker 103_m) and the case where it moves in an approaching direction (the virtual sound source 101_n approaches the speaker 103_m) will be described.
[0053]
Gn,t and τmn,t also change according to the distance the virtual sound source 101_n has moved between discrete time (t0-a) and discrete time t0. Equations (9) and (10) shown below express the variation of the gain coefficient and the variation of the number of samples of the sound wave propagation time that result from the distance the virtual sound source 101_n has moved between discrete time (t0-a) and discrete time t0. For example, ΔGn,t0 represents the variation of the gain coefficient at discrete time t0, and Δτmn,t0 represents the variation of the number of samples of the sound wave propagation time at discrete time t0 from the number of samples at discrete time (t0-a) (also called the "time width"). When the virtual sound source moves between discrete time (t0-a) and discrete time t0, these variations take a positive or a negative value depending on the direction in which the virtual sound source 101_n moves.
[0054]
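Equation (9), reconstructed as the change in the gain coefficient between the two discrete times:

\Delta G_{n,t_0} = G_{n,t_0} - G_{n,t_0-a} \quad (9)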
[0055]
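Equation (10), reconstructed to match the difference given explicitly in paragraph [0082] (Δτmn,t = τmn,t − τmn,t−a):

\Delta \tau_{mn,t_0} = \tau_{mn,t_0} - \tau_{mn,t_0-a} \quad (10)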
[0056]
When the virtual sound source 101_n moves in a direction away from or toward the speaker 103_m, the variation ΔGn,t0 and the time width Δτmn,t0 arise, so that waveform distortion occurs at discrete time t0.
Here, the state in which "waveform distortion" has occurred means a state in which the audio signal waveform does not change continuously but changes discontinuously, so that the listener perceives that portion as noise.
[0057]
For example, if the sound wave propagation time increases because the virtual sound source 101_n moves away from the speaker 103_m, that is, if the time width Δτmn,t0 is positive, the audio data of the last part of the immediately preceding segment appears again, over the time width Δτmn,t0, in the first part of the segment starting at discrete time t0.
Hereinafter, the segment immediately before the segment starting at discrete time t0 is called the first segment, and the segment starting at discrete time t0 is called the second segment. As a result of this repeated appearance of audio data, distortion occurs in the waveform.
[0058]
On the other hand, if the sound wave propagation time decreases because the virtual sound source 101_n moves toward the speaker 103_m, that is, if the time width Δτmn,t0 is negative, a gap of time width Δτmn,t0 occurs between the audio data of the last part of the first segment and the audio data of the first part of the second segment. As a result, the audio signal waveform becomes discontinuous. This, too, is waveform distortion. Specific examples of waveform distortion are described below with reference to the drawings.
[0059]
FIG. 3 is an explanatory diagram of a part of an audio signal waveform formed from audio data. The audio data shown in FIG. 3 is assumed to be represented by a total of 28 sample data, sample data 301 to sample data 328. Based on the audio signal shown in FIG. 3, the reason why waveform distortion occurs when the virtual sound source 101_n moves away from or toward the speaker 103_m is described below.
[0060]
First, the case where the sound wave propagation time corresponding to the distance between the position of the virtual sound source 101_n and the position of the speaker 103_m increases because the virtual sound source 101_n moves away from the speaker 103_m, that is, the case where the time width Δτmn,t0 is positive, will be explained.
[0061]
FIG. 4 is an explanatory diagram of an example of an audio signal waveform formed from the audio data in the first segment.
The last part of the first segment contains sample data 301 to 312. FIG. 5 is an explanatory diagram of an example of an audio signal waveform formed from the audio data in the second segment. The first part of the second segment contains sample data 308' to 318. In this example, because the virtual sound source 101_n moves away from the speaker 103_m, the number of samples of the sound wave propagation time for the distance from the virtual sound source 101_n to the speaker 103_m in the second segment is assumed to be larger by, for example, 5 samples (= Δτmn,t) than in the first segment. As a result of this increase in the sound wave propagation time, the sample data 308, 309, 310, 311, and 312 of the last part of the first segment shown in FIG. 4 appear again as the sample data 308', 309', 310', 311', and 312' in the first part of the second segment shown in FIG. 5. Therefore, when the audio signal waveform formed from the audio data shown in FIG. 4 and the audio signal waveform formed from the audio data shown in FIG. 5 are joined, waveform distortion occurs at the junction. FIG. 6 is an explanatory diagram of an example of the audio signal waveform obtained by joining the two. As can be understood from FIG. 6, the audio data becomes discontinuous in the vicinity of sample data 308', and waveform distortion occurs. The listener perceives this waveform distortion as noise.
[0062]
Conversely, the case where the sound wave propagation time decreases because the virtual sound source 101_n moves toward the speaker 103_m, that is, the case where the time width Δτmn,t0 is negative, will be described. FIG. 7 is an explanatory diagram of an example of an audio signal waveform formed from the audio data in the first segment. The last part of the first segment contains sample data 301 to 312; the contents are the same as those shown in FIG. 4. FIG. 8 is an explanatory diagram of an example of an audio signal waveform formed from the audio data in the second segment. The first part of the second segment contains sample data 317 to 328. In this example, because the virtual sound source 101_n moves toward the speaker 103_m, the number of samples of the sound wave propagation time for the distance from the virtual sound source 101_n to the speaker 103_m in the second segment is assumed to be smaller by, for example, 4 samples (= Δτmn,t) than in the first segment.
[0063]
FIG. 9 is an explanatory diagram showing the state in which a gap of four samples occurs between the audio signal waveform formed from the audio data of the last part of the first segment and the audio signal waveform formed from the audio data of the first part of the second segment. As a result of the reduced sound wave propagation time, a missing portion of four samples (sample data 313 to 316) occurs between those waveforms, as shown in FIG. 9.
Therefore, when the audio signal waveform formed from the audio data shown in FIG. 7 and the audio signal waveform formed from the audio data shown in FIG. 8 are joined, waveform distortion occurs at the junction. FIG. 10 is an explanatory diagram of an example of the audio signal waveform obtained by joining the two. As can be seen from FIG. 10, the audio data becomes discontinuous in the vicinity of sample data 317, and waveform distortion occurs. The listener also perceives this waveform distortion as noise.
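The repetition and the gap described above can be reproduced with a small sketch: reading the source signal through an integer propagation delay repeats samples when the delay grows and skips samples when it shrinks. All names and values below are illustrative assumptions, not the patent's implementation; the stand-in samples 1..28 play the role of sample data 301..328:

import numpy as np

src = np.arange(1, 29)                # stand-ins for sample data 301..328
a = 12                                # assumed segment length

def segment(start, tau):
    # read one segment of length a, delayed by tau samples
    idx = np.clip(np.arange(start, start + a) - tau, 0, len(src) - 1)
    return src[idx]

first = segment(0, 0)                 # first segment, delay 0
away = segment(a, 5)                  # delay grows by 5 samples: Δτ = +5
closer = segment(a, -4)               # delay shrinks by 4 samples: Δτ = -4
print(first[-5:], away[:5])           # the last 5 samples reappear (cf. FIGS. 4-6)
print(first[-1], closer[0])           # samples 13..16 are skipped (cf. FIG. 9)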
[0064]
The reason why waveform distortion occurs when the virtual sound source 101_n moves has been described above. Next, an embodiment of the present invention that eliminates waveform distortion by correcting the audio data will be described concretely with reference to the drawings.
[0065]
FIG. 11 is a block diagram showing a configuration example of an audio apparatus provided with the audio data processing unit according to the first embodiment. The audio device 1100 includes the audio data processing unit 1101 according to the first embodiment, a content information separation unit 1102, an audio data storage unit 1103, a virtual sound source position data storage unit 1104, a speaker position data input unit 1105, a speaker position data storage unit 1106, a D/A conversion unit 1107, M amplifiers 1108_1 to 1108_M, a reproduction unit 1109, and a communication interface unit 1110. The audio device 1100 further includes a central processing unit (CPU) 1111 that centrally controls the above components, a read-only memory (ROM) 1112 that stores a computer program executed by the CPU 1111, and a random access memory (RAM) that stores the data and variables processed during execution of the computer program. The audio device 1100 outputs an audio signal corresponding to the corrected audio data to the speaker array 103.
[0066]
The reproduction unit 1109 reads digital content (a movie, a computer game, a music video, etc.) from a recording medium 1117 that stores the digital content, and outputs the digital content to the content information separation unit 1102. The recording medium 1117 is, for example, a compact disc recordable (CD-R), a digital versatile disc (DVD), or a Blu-ray Disc (registered trademark). In the digital content, a plurality of audio data files corresponding to the virtual sound sources 101_1 to 101_N and the virtual sound source position data corresponding to the virtual sound sources 101_1 to 101_N are recorded in association with each other.
[0067]
The communication interface unit 1110 acquires digital content from a server 1115 that distributes digital content via a communication network such as the Internet 1114, and outputs the digital content to the content information separation unit 1102. The communication interface unit 1110 also includes devices (not shown) such as an antenna and a tuner, receives a program broadcast by a broadcast station 1116, and outputs it as digital content to the content information separation unit 1102.
[0068]
The content information separation unit 1102 acquires the digital content from the reproduction unit 1109 or the communication interface unit 1110, analyzes it, and separates the audio data and the virtual sound source position data from the digital content. The content information separation unit 1102 then outputs the separated audio data and virtual sound source position data to the audio data storage unit 1103 and the virtual sound source position data storage unit 1104, respectively. The virtual sound source position data is, for example, position data corresponding to the relative positions of a singer and a plurality of musical instruments displayed on the video screen when the digital content is a music video. The virtual sound source position data is stored in the digital content together with the audio data.
[0069]
The audio data storage unit 1103 stores the audio data acquired from the content information separation unit 1102, and the virtual sound source position data storage unit 1104 stores the virtual sound source position data acquired from the content information separation unit 1102. The speaker position data storage unit 1106 acquires, from the speaker position data input unit 1105, speaker position data indicating the positions in the acoustic space at which the speakers 103_1 to 103_M of the speaker array 103 are arranged, and stores the speaker position data.
The speaker position data is information set by the user based on the position of each of the speakers 103_1 to 103_M constituting the speaker array 103. The information is represented, for example, by coordinates in a plane (an X-Y coordinate system) fixed to the audio device 1100 in the acoustic space. The user operates the speaker position data input unit 1105 to store the speaker position data in the speaker position data storage unit 1106. If the arrangement of the speaker array 103 is determined in advance by mounting restrictions, the speaker position data is set as a fixed value. On the other hand, if the user can determine the arrangement of the speaker array 103 more or less freely, the speaker position data is set as a variable value.
[0070]
The audio data processing unit 1101 reads the audio file corresponding to each of the virtual sound sources 101_1 to 101_N from the audio data storage unit 1103. The audio data processing unit 1101 also reads the virtual sound source position data corresponding to the virtual sound sources 101_1 to 101_N from the virtual sound source position data storage unit 1104. Further, the audio data processing unit 1101 reads the speaker position data corresponding to the speakers 103_1 to 103_M of the speaker array 103 from the speaker position data storage unit 1106. The audio data processing unit 1101 performs the processing according to the embodiment on the read audio data, based on the read virtual sound source position data and speaker position data. That is, the audio data processing unit 1101 generates the audio data that forms the audio signals to be supplied to the speakers 103_1 to 103_M by performing arithmetic processing based on the operation model described above, which takes the movement of the virtual sound sources 101_1 to 101_N into account. The audio data generated by the audio data processing unit 1101 is converted into an audio signal by the D/A conversion unit 1107 and output to the speakers 103_1 to 103_M through the amplifiers 1108_1 to 1108_M. The speaker array 103 generates sound based on this audio signal and radiates it into the acoustic space.
[0071]
FIG. 12 is a block diagram showing an example of the internal configuration of the audio data processing unit 1101 according to the first embodiment. The audio data processing unit 1101 includes a distance data calculation unit 1201, a sound wave propagation time data calculation unit 1202, a sound wave propagation time data buffer 1203, a gain coefficient data calculation unit 1204, a gain coefficient data buffer 1205, an input audio data buffer 1206, an output audio data generation unit 1207, an output audio data superposition unit 1208, and an output audio data buffer 1209. The distance data calculation unit 1201 is connected to the virtual sound source position data storage unit 1104 and the speaker position data storage unit 1106. The input audio data buffer 1206 is connected to the audio data storage unit 1103. The output audio data superposition unit 1208 is connected to the D/A conversion unit 1107. The output audio data buffer 1209 is connected to the output audio data generation unit 1207.
[0072]
The distance data calculation unit 1201 acquires the virtual sound source position data and the speaker position data from the virtual sound source position data storage unit 1104 and the speaker position data storage unit 1106, calculates from them the distance data (|rn,t - rm|) for the distance between the virtual sound source 101_n and each of the speakers 103_1 to 103_M, and outputs the distance data to the sound wave propagation time data calculation unit 1202 and the gain coefficient data calculation unit 1204. The sound wave propagation time data calculation unit 1202 calculates the sound wave propagation time data (the number of samples of the sound wave propagation time) τmn,t from the distance data (|rn,t - rm|) obtained from the distance data calculation unit 1201 (see equation (7)). The sound wave propagation time data buffer 1203 acquires the sound wave propagation time data τmn,t from the sound wave propagation time data calculation unit 1202 and temporarily stores the sound wave propagation time data of a plurality of segments. The gain coefficient data calculation unit 1204 calculates the gain coefficient data Gn,t from the distance data (|rn,t - rm|) obtained from the distance data calculation unit 1201 (see equation (6)).
[0073]
The input audio data buffer 1206 acquires the input audio data corresponding to each virtual sound source 101_n from the audio data storage unit 1103 and temporarily stores the input audio data of a plurality of segments. One segment consists of, for example, 256 or 512 sample data. The output audio data generation unit 1207 generates the output audio data corresponding to the input audio data temporarily stored in the input audio data buffer 1206, using the sound wave propagation time data τmn,t calculated by the sound wave propagation time data calculation unit 1202 and the gain coefficient data Gn,t calculated by the gain coefficient data calculation unit 1204. The output audio data superposition unit 1208 synthesizes the output audio data generated by the output audio data generation unit 1207 over the virtual sound sources 101_n.
[0074]
FIG. 13 is an explanatory diagram of a configuration example of the input audio data buffer 1206. The input audio data buffer 1206 temporarily stores data in FIFO (first-in, first-out) fashion and discards old data. The buffer size may generally be set based on the number of samples corresponding to the maximum distance between the virtual sound source and the speaker. For example, if the maximum distance is assumed to be 34 meters, the sampling frequency 44100 Hz, and the speed of sound 340 meters per second, then 44100 × 34 / 340 = 4410 samples or more may be prepared. The input audio data buffer 1206 reads the input audio data from the audio data storage unit 1103 according to its own buffer size, stores it, and outputs the data to the output audio data generation unit 1207; the data are not necessarily output to the output audio data generation unit 1207 in order from the oldest. In FIG. 13, each square block represents a sample data storage area, and one sample datum of a segment is temporarily stored in each sample data storage area. According to FIG. 13, for example, one sample datum of the leading part of the latest segment is temporarily stored in the sample data storage area 1300_1, and one sample datum of the last part of the latest segment, that is, the newest sample datum, is temporarily stored in the sample data storage area 1300_1+a-1. Here, a is the segment length, that is, the number of sample data contained in one segment.
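The sizing rule above translates directly into a one-line calculation; the variable names are illustrative:

fs = 44100            # sampling frequency in Hz
c = 340               # speed of sound in m/s
d_max = 34            # assumed maximum virtual-source-to-speaker distance in m
min_samples = fs * d_max // c
print(min_samples)    # 4410: prepare at least this many sample slots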
[0075]
FIG. 14 is an explanatory diagram of a configuration example of the sound wave propagation time data buffer 1203. The sound wave propagation time data buffer 1203 is also a temporary storage unit that performs data input/output in FIFO fashion. In FIG. 14, each square block represents a sound wave propagation time data storage area, and the sound wave propagation time data of one segment is temporarily stored in each. FIG. 14 shows that the sound wave propagation time data of two segments are temporarily stored in the sound wave propagation time data buffer 1203: the oldest sound wave propagation time data is temporarily stored in the storage area 1203_1, and the newest sound wave propagation time data is temporarily stored in the storage area 1203_2.
[0076]
The operation according to the embodiment will be described with reference to FIGS. 12 to 14. The input audio data buffer 1206 reads one segment of input audio data, from discrete time t1 to discrete time (t1+a-1), from the audio data storage unit 1103 and temporarily stores it. In terms of FIG. 13, the sample data from discrete time t1 to discrete time (t1+a-1) are stored in order from the sample data storage area 1300_1 to the sample data storage area 1300_1+a-1. Furthermore, the sample data storage areas other than 1300_1 to 1300_1+a-1 already store the input audio data of a plurality of segments preceding discrete time t1. The output audio data buffer 1209 already stores the sample data at discrete time (t1-1) of the output audio data corresponding to the immediately preceding segment. The sound wave propagation time data buffer 1203 also already stores the sound wave propagation time data of the preceding segment.
[0077]
At discrete time t1, the distance data calculation unit 1201 calculates the distance data (|r1,t1 - r1|) indicating the distance between the first virtual sound source (hereinafter "virtual sound source 101_1") and the first speaker (hereinafter "speaker 103_1"), and outputs it to the sound wave propagation time data calculation unit 1202 and the gain coefficient data calculation unit 1204.
[0078]
The sound wave propagation time data calculation unit 1202 calculates the sound wave propagation time data τ11,t1 from the distance data (|r1,t1 - r1|) acquired from the distance data calculation unit 1201 using equation (7), and outputs it to the sound wave propagation time data buffer 1203.
[0079]
The sound wave propagation time data buffer 1203 stores the sound wave propagation time data τ11,t1 acquired from the sound wave propagation time data calculation unit 1202. In terms of FIG. 14, the data already stored in the data storage area 1203_2 is moved to 1203_1, after which the sound wave propagation time data τ11,t1 is stored in the data storage area 1203_2. At this point, therefore, the sound wave propagation time data of the preceding segment is held in the storage area 1203_1. Sound wave propagation time data buffers are prepared in a number equal to the number of speakers × the number of virtual sound sources existing at time t1. That is, at least M × N sound wave propagation time data buffers are provided, each storing the sound wave propagation time data of the past segment and the current sound wave propagation time data.
[0080]
The gain coefficient data calculation unit 1204 calculates the gain coefficient data G1,t1 from the distance data (|r1,t1 - r1|) obtained from the distance data calculation unit 1201 using equation (6).
[0081]
The output audio data generation unit 1207 generates the output audio data using the new sound wave propagation time data stored in the sound wave propagation time data buffer 1203 and the gain coefficient data calculated by the gain coefficient data calculation unit 1204.
[0082]
When the virtual sound source 101_n moves away from the speaker 103_m between discrete time (t1-a) and discrete time (t1-1), waveform distortion as shown in FIG. 6 can occur, as described above.
That is, as shown by equation (7), since the sound wave propagation time data τmn,t1 is larger than the sound wave propagation time data τmn,t1-a, the first part of the segment starting at discrete time t1 is a repetition of the last part of the segment starting at discrete time (t1-a).
Specifically, the last part of the segment starting at discrete time (t1-a) reappears, in the first part of the segment starting at discrete time t1, over the time width Δτmn,t1 (the difference between the sound wave propagation time data, = τmn,t1 - τmn,t1-a). The waveform of the audio data therefore becomes discontinuous in the vicinity of discrete time t1. This is waveform distortion and causes noise. In this example, the time width Δτmn,t1 of the sound wave propagation time data is set to 5. As described above, FIG. 6 is an explanatory diagram of an example of the waveform before correction. The waveform before correction from discrete time t1 to discrete time (t1+Δτmn,t1) is the waveform connecting sample data 308', 309', 310', 311', and 312'. This waveform is the same as the waveform connecting sample data 308, 309, 310, 311, and 312 in the immediately preceding segment.
[0083]
First, the correction section width is set to 5, equal to the time width Δτmn,t1. The output audio data buffer 1209 already stores the sample data 312 at the last discrete time (t1-1) of the preceding segment. In the first embodiment, to eliminate the waveform distortion shown in FIG. 6, the five (Δτmn,t1 = 5) sample data between the sample data 312 at discrete time (t1-1) (see FIG. 6), that is, the sample data 312 stored in the output audio data buffer 1209, and the sample data 313 at discrete time (t1+Δτmn,t1) are interpolated using a function. Linear interpolation is used here as an example. Linear interpolation is a method of calculating approximate values on the assumption that the values vary linearly; thus, in FIG. 6, the values from sample data 312 to sample data 313 are regarded as lying on a straight line. FIG. 15 is an explanatory diagram of an example of the audio signal waveform formed from the corrected audio data. As can be seen from FIG. 15, in the corrected audio signal waveform the span from sample data 312 to sample data 313 has been linearized by linear interpolation (sample data 1500 to sample data 1504), which eliminates the waveform distortion shown in FIG. 6.
[0084]
To correct the waveform distortion near discrete time t1, it suffices to calculate the sound wave propagation time of the segment starting at discrete time (t1-a) and the sound wave propagation time of the segment starting at discrete time t1. That is, to correct the distortion of the audio data near the start of the current segment, there is no need to calculate the sound wave propagation time of the audio data of the segment starting at discrete time (t1+a), which is the next segment. Therefore, when the virtual sound source 101_n moves away from the speaker 103_m, no delay of one segment occurs, and the audio data can be corrected without delay even when the virtual sound source position is changed in real time.
[0085]
Next, when the virtual sound source 101_n approaches the speaker 103_m between discrete time (t1-a) and discrete time t1, the sound wave propagation time data τmn,t1 becomes smaller than the sound wave propagation time data τmn,t1-a. Since Δτmn,t1 = τmn,t1 - τmn,t1-a, the time width Δτmn,t1 is negative. In this case, audio data is dropped between the segment starting at discrete time (t1-a) and the segment starting at discrete time t1. FIG. 10 is an explanatory diagram of an example of the audio signal waveform obtained by joining the audio signal waveform formed from the audio data shown in FIG. 7 and the audio signal waveform formed from the audio data shown in FIG. 8. As can be seen from FIG. 10, the audio data changes abruptly in the vicinity of sample data 317, and as a result waveform distortion occurs. The listener also perceives this waveform distortion as noise.
[0086]
The output audio data buffer 1209 stores the sample data 312 at the last discrete time (t1-1) of the preceding segment. In the first embodiment, to eliminate the waveform distortion shown in FIG. 10, the four (|Δτmn,t1| = 4) sample data between the sample data 312 at discrete time (t1-1) and the sample data 321 at discrete time (t1+|Δτmn,t1|) are interpolated using a function. Linear interpolation is used here as an example; thus, in FIG. 10, the values from sample data 312 to sample data 321 are regarded as lying on a straight line. FIG. 16 is an explanatory diagram of an example of the audio signal waveform formed from the corrected audio data. As can be seen from FIG. 16, in the corrected audio signal waveform the span from sample data 312 to sample data 321 has been linearized by linear interpolation (sample data 1600 to sample data 1603), which eliminates the waveform distortion shown in FIG. 10. As in the case where the virtual sound source 101_n moves away from the speaker 103_m, to correct the waveform distortion near discrete time t1 it suffices to calculate the sound wave propagation time of the segment starting at discrete time (t1-a) and the sound wave propagation time of the segment starting at discrete time t1. That is, to correct the distortion of the audio data near the start of the current segment, there is no need to calculate the sound wave propagation time of the audio data of the segment starting at discrete time (t1+a), which is the next segment. Therefore, when the virtual sound source 101_n approaches the speaker 103_m, no delay of one segment occurs, and the audio data can be corrected without delay even when the virtual sound source position is changed in real time.
[0087]
FIG. 17 is a flowchart showing the flow of data processing according to the first embodiment. The data processing is executed by the audio data processing unit 1101 under the control of the CPU 1111. The audio data processing unit 1101 first substitutes 1 for the number n of the virtual sound source 101_n and substitutes 1 for the number m of the speaker 103_m; that is, the audio data processing unit 1101 designates the first virtual sound source 101_1 and the first speaker 103_1 (S10). The audio data processing unit 1101 inputs the audio file corresponding to the n-th virtual sound source 101_n from the audio data storage unit 1103 (S11). Furthermore, the audio data processing unit 1101 inputs the virtual sound source position data corresponding to the virtual sound source 101_n and the speaker position data from the virtual sound source position data storage unit 1104 and the speaker position data storage unit 1106, respectively (S12). Based on the input virtual sound source position data and speaker position data, the audio data processing unit 1101 calculates the first and second distance data (|rn,t - rm|) between the virtual sound source 101_n and the speaker 103_m at successive points in time (S13). The audio data processing unit 1101 calculates the sound wave propagation time data τmn,t and the gain coefficient data Gn,t from the calculated first and second distance data (|rn,t - rm|) (S14), and stores the sound wave propagation time data τmn,t and the gain coefficient data Gn,t in the sound wave propagation time data buffer 1203 and the gain coefficient data buffer 1205, respectively. Next, the audio data processing unit 1101 determines whether the first and second distance data differ (S15). Alternatively, it may be determined whether the sound wave propagation time data τmn,t-a of the preceding segment stored in the sound wave propagation time data buffer 1203 differs from the sound wave propagation time data τmn,t stored this time. That is, in this step the audio data processing unit 1101 determines whether the virtual sound source 101_n is moving or stationary relative to the speaker 103_m.
[0088]
If it is determined in S15 that the first and second distance data differ (S15: YES), that is, if it is determined that the virtual sound source 101_n has moved relative to the speaker 103_m, the audio data processing unit 1101 proceeds to the processing of S16. On the other hand, if it is determined in S15 that the first and second distance data are the same (S15: NO), that is, if it is determined that the virtual sound source 101_n is stationary, the audio data processing unit 1101 proceeds to the processing of S19. Based on the determination result of S15, the audio data processing unit 1101 identifies the repeated portion or missing portion of sample data caused by the separation or approach of the virtual sound source relative to the speaker (S16), and corrects the waveform by performing the linear interpolation described above on the distorted portion (S17).
[0089]
Next, the audio data processing unit 1101 performs gain control for the virtual sound source 101_n (S18). The audio data processing unit 1101 then adds 1 to the number n of the virtual sound source 101_n (S19) and determines whether the number n of the virtual sound source 101_n has reached the maximum value N (S20). If it is determined in S20 that the number n of the virtual sound source 101_n equals the maximum value N (S20: YES), the audio data is synthesized (S21). On the other hand, if it is determined in S20 that the number of the virtual sound source 101_n is not the maximum value N (S20: NO), the audio data processing unit 1101 returns to the processing of S11 and performs the processing of S11 to S18 for the next virtual sound source, for example the second virtual sound source 101_2, and the first speaker 103_1.
[0090]
After synthesizing the audio data in S21, the audio data processing unit 1101 substitutes 1 for the number n of the virtual sound source 101_n (S22) and adds 1 to the number m of the speaker 103_m (S23). Next, the audio data processing unit 1101 determines whether the number m of the speaker 103_m has reached the maximum value M (S24). If it is determined that the number m of the speaker 103_m equals the maximum value M (S24: YES), the processing ends. On the other hand, if it is determined that the number m of the speaker 103_m is not the maximum value M (S24: NO), the process returns to S11.
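The flow of FIG. 17 can be sketched as a nested loop. The sketch below is a simplified stand-in under assumed parameters: the positions are random, the propagation time follows the form assumed above for equation (7), the correction interpolates within the segment rather than against the previous segment's last sample, and the gain follows the inverse-square-root rule of equation (6). None of the names is the patent's API.

import numpy as np

fs, c, a = 44100, 340.0, 256      # assumed sample rate, sound speed, segment length
M, N = 2, 2                       # assumed speaker and virtual source counts
rng = np.random.default_rng(0)
speakers = rng.uniform(-1.0, 1.0, (M, 2))            # speaker positions (S12)
src_prev = rng.uniform(-5.0, 5.0, (N, 2))            # r_{n,t0-a}
src_now = src_prev + rng.uniform(-0.5, 0.5, (N, 2))  # r_{n,t0}
audio = rng.standard_normal((N, a))                  # one segment per source (S11)
out = np.zeros((M, a))

for m in range(M):                                   # speaker loop (S23/S24)
    for n in range(N):                               # source loop (S19/S20)
        d1 = np.linalg.norm(src_prev[n] - speakers[m])   # first distance (S13)
        d2 = np.linalg.norm(src_now[n] - speakers[m])    # second distance
        tau1, tau2 = int(fs * d1 / c), int(fs * d2 / c)  # propagation times (S14)
        seg = audio[n].copy()
        if tau1 != tau2:                             # source moved? (S15)
            w = min(abs(tau2 - tau1), a - 1)         # distorted width (S16)
            seg[:w] = np.linspace(seg[0], seg[w], w + 2)[1:-1]  # interpolate (S17)
            seg *= 1.0 / np.sqrt(d2)                 # gain control (S18), cf. eq. (6)
        out[m] += seg                                # synthesize (S21)
print(out.shape)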
[0091]
Second Embodiment. FIG. 18 is a block diagram showing an example of the internal configuration of an audio device 1100 according to the second embodiment. Whereas the first embodiment executes the program stored in the ROM 1112 of the audio device 1100, the second embodiment reads out and executes a program stored in an electrically erasable programmable read-only memory (EEPROM) 24 or an internal storage device 25. The audio device 1100 includes the EEPROM 24, the internal storage device 25, and a recording medium reading unit 23. The CPU 17 reads a program 231 from a recording medium 230, such as a CD (Compact Disc)-ROM or a DVD (Digital Versatile Disc)-ROM inserted into the recording medium reading unit 23, and stores the program 231 in the EEPROM 24 or the internal storage device 25. The CPU 17 loads the program 231 stored in the EEPROM 24 or the internal storage device 25 into the RAM 18 and executes it.
[0092]
The program 231 according to the present invention is not limited to being read from the recording medium 230 and stored in the EEPROM 24 or the internal storage device 25; it may also be stored in an external memory such as a memory card. In that case, the program 231 is read from an external memory (not shown) connected to the CPU 17 and stored in the EEPROM 24 or the internal storage device 25. Furthermore, communication may be established between a communication unit (not shown) connected to the CPU 17 and an external computer, and the program 231 may be downloaded into the EEPROM 24 or the internal storage device 25.
[0093]
DESCRIPTION OF SYMBOLS: 101 virtual sound source; 1100 audio device; 1101 audio data processing unit; 1102 content information separation unit; 1109 reproduction unit; 1110 communication interface unit; 1115 server; 1116 broadcasting station.