Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2007110451
An audio signal adjustment apparatus and related method and program are provided for adjusting sound quality easily, quickly, or accurately. A sub-band data input unit 1 acquires a sub-band data group representing the temporal change in the intensity of the fundamental frequency component or harmonic components of speech, while a calibration data generation unit 3 receives the voice reproduced by an audio reproduction unit 5 and generates a calibration sub-band data group representing this voice. A sound quality adjustment unit 4 adjusts the intensity of each item of sub-band data in the sub-band data group on the basis of the calibration sub-band data group, and, when a sound quality designation data input unit 2 acquires sound quality designation data, further adjusts the intensity of each item of sub-band data in accordance with the sound quality designation data. The audio reproduction unit 5 reproduces the audio represented by the adjusted sub-band data group. [Selected figure] Figure 1
Audio signal adjustment apparatus, audio signal adjustment method, and program
[0001]
The present invention relates to an audio signal adjustment device, an audio signal adjustment
method, and a program.
[0002]
When reproducing speech from speech data, the sound quality of the reproduced sound is generally adjusted for purposes such as correcting the difference between the original speech represented by the speech data and the speech that is actually reproduced, removing noise from the reproduced speech, or adding auditory special effects to the original speech.
[0003]
Conventionally, sound quality has been adjusted by causing an audio reproducing apparatus equipped with an equalizer to reproduce test voice data, receiving the reproduced voice, determining the frequency characteristic of the equalizer on the basis of the difference between the waveform of the received voice and the waveform represented by the test voice data, and then operating the equalizer so as to obtain the determined frequency characteristic (see, for example, Patent Document 1).
As the test voice data, data representing, for example, an impulse waveform or a sweep waveform has been used.
JP 2001-197585 A
[0004]
However, a conventional equalizer has a complicated structure and a high manufacturing cost. In particular, an equalizer whose frequency characteristic can be changed exactly as determined is complicated in configuration, and it has been technically and economically difficult to manufacture. Furthermore, if an impulse waveform is used as the test speech, the frequency band of the reproduced speech becomes extremely wide, making it difficult to identify the frequency characteristic accurately, so the determined frequency characteristic of the equalizer tends to be inadequate. If a sweep waveform is used as the test sound, it takes a long time to identify the frequency characteristic of the reproduced sound.
[0005]
The present invention has been made in view of the above circumstances, and it is an object of the present invention to provide an audio signal adjustment device, an audio signal adjustment method, and a program for adjusting sound quality easily, quickly, or accurately.
[0006]
In order to achieve the above object, an audio signal adjustment device according to a first aspect of the present invention comprises: audio signal acquisition means for acquiring, from the outside, an audio signal group consisting of audio signals each representing the temporal change in intensity of a fundamental frequency component or a harmonic component of audio; audio signal adjustment means for changing the intensity of the audio signals included in the audio signal group acquired by the audio signal acquisition means; and waveform generation means for generating, on the basis of the audio signal group whose audio signal intensities have been changed, a signal representing the waveform of the audio represented by that group.
[0007]
The audio signal adjustment device may further include specification data acquisition means for acquiring, from the outside, specification data that specifies how the intensity of the audio signals included in the audio signal group acquired by the audio signal acquisition means is to be changed.
In this case, the audio signal adjustment means may change the intensity of the audio signals included in the audio signal group acquired by the audio signal acquisition means in the manner specified by the specification data acquired by the specification data acquisition means.
[0008]
The audio signal adjustment device may further include calibration audio signal generation means that receives audio and generates a calibration audio signal group consisting of calibration audio signals representing the temporal changes in the intensity of the fundamental frequency component and the harmonic components of an audio signal to be processed that represents the waveform of the received audio.
In this case, the audio signal adjustment means may determine the post-change value of the intensity of each audio signal included in the audio signal group acquired by the audio signal acquisition means on the basis of the intensity of that audio signal and the intensity of the calibration audio signal representing a component of substantially the same frequency, and may change the intensity of the audio signal in accordance with the result of the determination.
[0009]
The calibration audio signal generation means may include: means for generating a signal representing the waveform of the received audio and processing that signal into a pitch waveform signal by making the time lengths of the sections corresponding to a unit pitch of the signal substantially equal; and means for generating, as the calibration audio signals, signals representing the temporal changes in the intensity of the fundamental frequency component and the harmonic components of the pitch waveform signal.
[0010]
An audio signal adjustment method according to a second aspect of the present invention comprises: acquiring an audio signal group consisting of audio signals each representing the temporal change in intensity of a fundamental frequency component or a harmonic component of audio; changing the intensity of the audio signals included in the acquired group; and generating, on the basis of the audio signal group whose audio signal intensities have been changed, a signal representing the waveform of the audio represented by that group.
[0011]
A program according to a third aspect of the present invention causes a computer to function as: audio signal acquisition means for acquiring an audio signal group consisting of audio signals each representing the temporal change in intensity of a fundamental frequency component or a harmonic component of audio; audio signal adjustment means for changing the intensity of the audio signals included in the audio signal group acquired by the audio signal acquisition means; and waveform generation means for generating, on the basis of the audio signal group whose audio signal intensities have been changed, a signal representing the waveform of the audio represented by that group.
[0012]
According to the present invention, an audio signal adjustment device, an audio signal adjustment method, and a program for adjusting sound quality easily, quickly, or accurately are realized.
[0013]
Hereinafter, an embodiment of the present invention will be described, taking a sound quality adjustment device as an example and referring to the drawings.
FIG. 1 is a diagram showing the configuration of this sound quality adjustment device.
As shown in the figure, this sound quality adjustment apparatus is composed of a sub-band data input unit 1, a sound quality designation data input unit 2, a calibration data generation unit 3, a sound quality adjustment unit 4, and an audio reproduction unit 5.
[0014]
The sub-band data input unit 1 comprises, for example, a recording-medium drive device (a flexible disk drive, an MO drive, or the like) for reading data recorded on a recording medium (for example, a flexible disk or an MO (Magneto-Optical) disk), or a communication control device composed of a USB (Universal Serial Bus) interface circuit or the like that controls data exchange with the outside.
[0015]
The sub-band data input unit 1 acquires a sub-band data group representing voice and supplies
the group to the sound quality adjustment unit 4.
The sub-band data group is data including 0th sub-band data representing the temporal change in the intensity of the fundamental frequency component of the speech, and 1st to n-th sub-band data (n items, where n is a natural number) representing the temporal changes in the intensities of the n harmonic components of the speech.
When there is no temporal change in the intensity of the fundamental frequency component (or harmonic component) of the speech, the corresponding sub-band data represents the intensity of that component in the form of a DC signal.
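As an illustration only (no code appears in the original patent), the sub-band data group described above can be pictured as a two-dimensional array in which row k holds the time series of intensities of the k-th component: row 0 for the fundamental frequency component and rows 1 to n for the harmonic components. The following Python sketch uses assumed, hypothetical names; it is not terminology from the document.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class SubBandDataGroup:
    """Hypothetical container for a sub-band data group.

    intensities[k, t] is the intensity of the k-th component at frame t
    (k = 0: fundamental frequency component, k = 1..n: harmonic components).
    A component whose intensity does not change over time appears as a DC
    (constant) row, as described in [0015].
    """

    intensities: np.ndarray  # shape (n + 1, number_of_frames)

    @property
    def n(self) -> int:
        # Number of harmonic components (the 0th row is the fundamental).
        return self.intensities.shape[0] - 1


# Example: a fundamental plus 3 harmonics, constant over 100 frames.
group = SubBandDataGroup(intensities=np.ones((4, 100)))
print(group.n)  # -> 3
```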
[0016]
Further, when the speech represented by the sub-band data group is speech whose sections, each corresponding to a unit pitch, have been phase-shifted so that the phases of the sections are aligned, and when pitch information on the voice represented by the sub-band data group can be acquired, the sub-band data input unit 1 also acquires this pitch information and supplies it to the audio reproduction unit 5.
The pitch information is information representing the original value of the length (pitch length) of each section of the speech represented by the sub-band data group.
[0017]
The sound quality designation data input unit 2 comprises, for example, an input device such as a keyboard or a pointing device and a processor such as a CPU (Central Processing Unit). When the operator performs an operation to input sound quality designation data, the sound quality designation data input unit 2 acquires the sound quality designation data according to the operation and supplies the acquired sound quality designation data to the sound quality adjustment unit 4.
[0018]
The sound quality designation data is data specifying how the intensity of each item of sub-band data constituting the sub-band data group acquired by the sub-band data input unit 1 is to be changed; for example, it consists of data representing a coefficient by which the intensity of the component represented by each item of sub-band data is to be multiplied.
[0019]
The calibration data generation unit 3 includes a calibration voice input unit 31, a pitch
extraction unit 32, and a sub-band analysis unit 33.
[0020]
The calibration voice input unit 31 is configured as a sound receiving device including a microphone, an AF (Audio Frequency) amplifier, a sampler, an A/D (Analog-to-Digital) converter, a PCM encoder, and the like.
The calibration voice input unit 31 amplifies the voice signal representing the voice received by its own microphone, samples and A/D converts it, generates calibration voice data representing the sampled voice signal, and supplies this data to the pitch extraction unit 32.
[0021]
The calibration audio data may be, for example, a digital signal in PCM (Pulse Code Modulation) format, and need only represent the result of sampling the audio received by the calibration voice input unit 31 at a constant period sufficiently shorter than the pitch of that audio.
[0022]
Each of the pitch extraction unit 32 and the sub-band analysis unit 33 includes a processor such
as a digital signal processor (DSP) or a CPU and a memory such as a random access memory
(RAM).
Note that a single processor or a single memory may perform some or all of the functions of the
pitch extraction unit 32 and the subband analysis unit 33.
Also, a processor that performs the function of the sound quality specification data input unit 2
may perform some or all of the functions of the pitch extraction unit 32 and the sub-band
analysis unit 33 in common.
[0023]
Functionally, as shown for example in FIG. 2, the pitch extraction unit 32 is composed of a cepstrum analysis unit 321, an autocorrelation analysis unit 322, a weight calculation unit 323, a BPF (Band Pass Filter) coefficient calculation unit 324, a band pass filter 325, a zero cross analysis unit 326, a waveform correlation analysis unit 327, a phase adjustment unit 328, and a resampling unit 329.
[0024]
Note that a single processor or a single memory may perform some or all of the functions of the cepstrum analysis unit 321, the autocorrelation analysis unit 322, the weight calculation unit 323, the BPF coefficient calculation unit 324, the band pass filter 325, the zero cross analysis unit 326, the waveform correlation analysis unit 327, the phase adjustment unit 328, and the resampling unit 329.
[0025]
The cepstrum analysis unit 321 performs cepstrum analysis on the calibration speech data
supplied from the calibration speech input unit 31 to specify the fundamental frequency and the
formant frequency of the speech represented by the calibration speech data.
Then, data indicating the identified fundamental frequency is generated and supplied to the
weight calculation unit 323, and data indicating the identified formant frequency is generated
and supplied to the subband analysis unit 33.
[0026]
Specifically, when the cepstrum analysis unit 321 is supplied with the calibration speech data from the calibration voice input unit 31, it first obtains the spectrum of the calibration speech data by the fast Fourier transform method (or any other method for generating data representing the result of a discrete-variable Fourier transform).
[0027]
Next, the cepstrum analysis unit 321 converts the intensity of each component of the obtained spectrum into a value substantially equal to the logarithm of its original value (the base of the logarithm is arbitrary; a common logarithm, for example, is sufficient).
The cepstrum analysis unit 321 then obtains the cepstrum, that is, the result of applying an inverse Fourier transform to the spectrum whose values have been converted, by the fast inverse Fourier transform method (or any other method for generating data representing the result of a discrete-variable inverse Fourier transform).
[0028]
Then, the cepstrum analysis unit 321 specifies the fundamental frequency of the voice on the basis of the obtained cepstrum, generates data indicating the specified fundamental frequency, and supplies the data to the weight calculation unit 323.
Specifically, for example, the cepstrum analysis unit 321 may extract, by filtering the obtained cepstrum (that is, by liftering), the components whose quefrency is at or above a predetermined value, and specify the fundamental frequency on the basis of the position of the peak of the extracted components.
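As an illustration of the cepstrum-based procedure in paragraphs [0026] to [0028], the following Python sketch (not part of the patent) estimates a fundamental frequency as the quefrency of the cepstral peak. The sampling rate, the search range, and the synthetic test tone are assumptions introduced for the example.

```python
import numpy as np


def cepstrum_f0(x: np.ndarray, fs: float, min_f0: float = 50.0,
                max_f0: float = 500.0) -> float:
    """Estimate the fundamental frequency of x by cepstrum analysis.

    Mirrors [0026]-[0028]: Fourier transform, logarithm of the magnitude
    spectrum, inverse Fourier transform (the cepstrum), then a peak search
    restricted to quefrencies above a lower limit (a simple liftering).
    The F0 search range is an assumption, not a value from the patent.
    """
    spectrum = np.fft.rfft(x)
    log_mag = np.log(np.abs(spectrum) + 1e-12)   # base of the log is arbitrary
    cepstrum = np.fft.irfft(log_mag)
    q_min = int(fs / max_f0)                     # quefrency range (in samples)
    q_max = int(fs / min_f0)
    peak_q = q_min + int(np.argmax(cepstrum[q_min:q_max]))
    return fs / peak_q


# Example: a synthetic 200 Hz tone with a few harmonics, sampled at 16 kHz.
fs = 16000.0
t = np.arange(int(0.1 * fs)) / fs
tone = sum(np.sin(2 * np.pi * 200.0 * k * t) / k for k in range(1, 5))
# The range is narrowed around the expected F0 to keep the toy example robust.
print(cepstrum_f0(tone, fs, min_f0=120.0, max_f0=400.0))  # approximately 200.0
```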
[0029]
When the autocorrelation analysis unit 322 is supplied with the calibration audio data from the calibration voice input unit 31, it analyzes the waveform of the calibration audio data on the basis of an autocorrelation function to specify the fundamental frequency of the audio represented by the calibration audio data, generates data indicating the specified fundamental frequency, and supplies the data to the weight calculation unit 323.
[0030]
Specifically, when the calibration speech data is supplied from the calibration speech input unit
31, the autocorrelation analysis unit 322 first specifies the autocorrelation function r (l)
represented by the right side of Formula 1.
[0031]
[0032]
Next, the autocorrelation analysis unit 322 specifies, as the fundamental frequency, the smallest value exceeding a predetermined lower limit among the frequencies that give maxima of the function (periodogram) obtained by Fourier transforming the autocorrelation function r(l), generates data indicating the specified fundamental frequency, and supplies the data to the weight calculation unit 323.
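Formula 1 is not reproduced in this machine translation, so the following Python sketch (not part of the patent) assumes the standard biased autocorrelation estimate and then, following [0032], searches the periodogram of r(l) for the lowest significant peak above a lower limit. The peak-significance threshold is an added assumption.

```python
import numpy as np


def autocorr_f0(x: np.ndarray, fs: float, min_f0: float = 50.0) -> float:
    """Estimate F0 from the periodogram of the autocorrelation function.

    The autocorrelation r(l) = (1/N) * sum_t x(t) * x(t + l) is assumed
    (Formula 1 itself is unavailable here).  Its Fourier transform is
    searched for local maxima, and the smallest peak frequency above
    min_f0 is returned, as described in [0032].
    """
    n = len(x)
    r = np.correlate(x, x, mode="full")[n - 1:] / n   # lags 0 .. n-1
    power = np.abs(np.fft.rfft(r))                    # periodogram of r(l)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    thresh = 0.1 * power.max()                        # added robustness threshold
    peaks = [i for i in range(1, len(power) - 1)
             if power[i] > thresh
             and power[i] >= power[i - 1] and power[i] >= power[i + 1]]
    candidates = [freqs[i] for i in peaks if freqs[i] > min_f0]
    return min(candidates) if candidates else 0.0


# Example with the synthetic harmonic tone used above.
fs = 16000.0
t = np.arange(int(0.1 * fs)) / fs
x = sum(np.sin(2 * np.pi * 200.0 * k * t) / k for k in range(1, 5))
print(autocorr_f0(x, fs))  # approximately 200.0
```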
[0033]
When the weight calculation unit 323 is supplied with one piece of data indicating a fundamental frequency from each of the cepstrum analysis unit 321 and the autocorrelation analysis unit 322 (two pieces in total), it obtains the average of the absolute values of the reciprocals of the fundamental frequencies indicated by these two pieces of data.
Then, it generates data indicating the obtained value (that is, the average pitch length) and supplies the data to the BPF coefficient calculation unit 324.
[0034]
When the BPF coefficient calculation unit 324 is supplied with data indicating the average pitch length from the weight calculation unit 323 and with a zero cross signal (described later) from the zero cross analysis unit 326, it determines, on the basis of the supplied data and zero cross signal, whether or not the average pitch length and the period of the zero crossings differ from each other by a predetermined amount or more.
When it determines that they do not differ, it controls the frequency characteristic of the band pass filter 325 so that the reciprocal of the zero crossing period becomes the center frequency (the center frequency of the pass band of the band pass filter 325).
On the other hand, when it determines that they differ by the predetermined amount or more, it controls the frequency characteristic of the band pass filter 325 so that the reciprocal of the average pitch length becomes the center frequency.
[0035]
The band pass filter 325 performs the function of an FIR (Finite Impulse Response) filter whose center frequency is variable.
Specifically, the band pass filter 325 sets its own center frequency to a value according to the control of the BPF coefficient calculation unit 324, filters the calibration speech data supplied from the calibration voice input unit 31, and supplies the filtered calibration speech data (the pitch signal) to the zero cross analysis unit 326 and the waveform correlation analysis unit 327. The pitch signal is assumed to be digital data having substantially the same sampling interval as the calibration audio data. The bandwidth of the band pass filter 325 is preferably such that the upper limit of its pass band always falls within twice the fundamental frequency of the sound represented by the calibration audio data.
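As an illustration of the variable-center-frequency FIR filtering of paragraphs [0034] and [0035], the following Python sketch uses SciPy's FIR design routine. The tap count and the rule that the pass band spans 0.5 to 1.5 times the center frequency (keeping its upper edge below twice the fundamental) are assumptions for the example, not values from the patent.

```python
import numpy as np
from scipy.signal import firwin, lfilter


def extract_pitch_signal(x: np.ndarray, fs: float, center_hz: float,
                         numtaps: int = 255) -> np.ndarray:
    """Band-pass filter x around center_hz with an FIR filter (cf. [0035]).

    The pass band [0.5, 1.5] * center_hz keeps its upper edge below twice
    the assumed fundamental frequency; this rule and the tap count are
    illustrative assumptions only.
    """
    low = 0.5 * center_hz
    high = min(1.5 * center_hz, 0.49 * fs)        # stay below Nyquist
    taps = firwin(numtaps, [low, high], pass_zero=False, fs=fs)
    return lfilter(taps, [1.0], x)                # same sampling interval as x


# Example: isolate the fundamental of the synthetic harmonic tone from above.
fs = 16000.0
t = np.arange(int(0.1 * fs)) / fs
x = sum(np.sin(2 * np.pi * 200.0 * k * t) / k for k in range(1, 5))
pitch_signal = extract_pitch_signal(x, fs, center_hz=200.0)
```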
[0036]
The zero cross analysis unit 326 identifies the times at which the instantaneous value of the pitch signal supplied from the band pass filter 325 becomes 0 (the times at which zero crossings occur), and supplies a signal representing the identified timing (the zero cross signal) to the BPF coefficient calculation unit 324. Alternatively, the zero cross analysis unit 326 may identify the times at which the instantaneous value of the pitch signal becomes a predetermined value other than 0, and supply a signal representing those times to the BPF coefficient calculation unit 324 instead of the zero cross signal.
[0037]
When the waveform correlation analysis unit 327 is supplied with the calibration speech data from the calibration voice input unit 31, it divides the calibration speech data at the timings at which the boundaries of unit periods (for example, one period) of the pitch signal supplied from the band pass filter 325 arrive. Then, for each of the resulting sections, it obtains the correlation between the pitch signal in the section and versions of the calibration speech data in the section whose phase has been variously shifted, and specifies the phase that gives the highest correlation as the phase of the calibration speech data in that section.
[0038]
Specifically, for each section, the waveform correlation analysis unit 327 obtains the value cor represented by the right side of Formula 2 for various values of φ (where φ is an integer of 0 or more) representing the phase. Then, the waveform correlation analysis unit 327 specifies the value Ψ of φ at which cor becomes maximum, generates data indicating Ψ, and supplies it to the phase adjustment unit 328 as phase data representing the phase of the calibration audio data in that section.
[0039]
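Formula 2, referenced in [0038], is not reproduced in this machine translation. The following Python sketch (not part of the patent) therefore uses an ordinary inner-product correlation between the circularly shifted section and the pitch signal as a stand-in for cor; the sign convention for Ψ and the use of circular shifts are assumptions made for the illustration.

```python
import numpy as np


def section_phase(section: np.ndarray, pitch_section: np.ndarray) -> int:
    """Return Psi, the phase (in samples) of one section (cf. [0038]).

    cor is taken here as the inner product of the pitch signal with the
    section circularly shifted by -phi; the phi giving the largest cor is
    reported as Psi.  Formula 2 itself is unavailable, so this correlation
    is an assumed stand-in.
    """
    cors = [float(np.dot(np.roll(section, -phi), pitch_section))
            for phi in range(len(section))]
    return int(np.argmax(cors))


def align_sections(sections, pitch_sections):
    """Shift each section by (-Psi) so all sections share the same phase ([0041])."""
    return [np.roll(sec, -section_phase(sec, pit))
            for sec, pit in zip(sections, pitch_sections)]
```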
[0040]
The time length of each section is preferably about one pitch.
The longer the section, the greater the number of samples in the section and hence the amount of calibration pitch waveform data (described later), or, if the sampling interval is instead increased, the less accurately the calibration pitch waveform data represents the voice.
[0041]
When the phase adjustment unit 328 is supplied with the calibration voice data from the calibration voice input unit 31 and with the data indicating the phase Ψ of each section of the calibration voice data from the waveform correlation analysis unit 327, it aligns the phases of the sections by shifting the phase of the calibration voice data in each section by (−Ψ). It then supplies the phase-shifted calibration voice data (the calibration pitch waveform data) to the resampling unit 329.
[0042]
The resampling unit 329 resamples each section of the calibration pitch waveform data supplied from the phase adjustment unit 328, and supplies the resampled calibration pitch waveform data to the sub-band analysis unit 33.
[0043]
The resampling unit 329 performs the resampling so that the number of samples in each section of the calibration pitch waveform data is substantially equal to a fixed, constant number, with the samples equally spaced within each section.
For a section in which the number of samples does not reach this fixed number, samples whose values interpolate between adjacent samples on the time axis are added according to a predetermined method (for example, Lagrange interpolation) so that the number of samples in the section equals the fixed number.
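The following Python sketch (not part of the patent) illustrates the per-section resampling of [0042] and [0043]. Plain linear interpolation is used instead of the Lagrange interpolation mentioned as an example in the text, and the fixed sample count of 128 is an assumed value.

```python
import numpy as np


def resample_section(section: np.ndarray, target_len: int) -> np.ndarray:
    """Resample one section to exactly target_len equally spaced samples.

    The patent mentions interpolation such as Lagrange interpolation;
    linear interpolation (np.interp) is used here as a simpler stand-in.
    """
    old_pos = np.linspace(0.0, 1.0, num=len(section))
    new_pos = np.linspace(0.0, 1.0, num=target_len)
    return np.interp(new_pos, old_pos, section)


def resample_sections(sections, target_len: int = 128):
    """Give every section of the calibration pitch waveform data the same,
    constant number of equally spaced samples ([0043]).  The value 128 is an
    illustrative choice, not a value from the patent."""
    return [resample_section(np.asarray(sec, dtype=float), target_len)
            for sec in sections]
```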
[0044]
The sub-band analysis unit 33 generates the calibration sub-band data group by applying an orthogonal transformation such as the DCT (Discrete Cosine Transform) to the calibration pitch waveform data supplied from the resampling unit 329, and supplies the calibration sub-band data group to the sound quality adjustment unit 4. The calibration sub-band data group is data including 0th calibration sub-band data representing the temporal change in the intensity of the fundamental frequency component of the speech represented by the calibration pitch waveform data supplied to the sub-band analysis unit 33, and 1st to n-th calibration sub-band data (n items, where n is the natural number described above) representing the temporal changes in the intensities of the n harmonic components of this speech.
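As an illustration of the sub-band analysis of [0044], the following Python sketch applies a DCT to each resampled pitch-waveform section and collects the coefficients over time. Treating the k-th DCT coefficient of a section directly as the intensity of the k-th component (fundamental for k = 0, harmonics for k = 1..n) is a simplifying assumption made for the example; it is not claimed to be the patent's exact mapping.

```python
import numpy as np
from scipy.fft import dct


def subband_analysis(aligned_sections, n: int) -> np.ndarray:
    """Build a calibration sub-band data group from pitch-waveform sections.

    Each section (one resampled pitch period) is transformed with the DCT,
    and the magnitude of its k-th coefficient is stored as R[k, t], the
    intensity of component k in section t.  Associating coefficient k with
    the fundamental (k = 0) and harmonics (k = 1..n) is schematic only.
    """
    coeffs = np.stack([dct(np.asarray(sec, dtype=float), type=2, norm="ortho")
                       for sec in aligned_sections])
    # coeffs has shape (num_sections, section_len); keep components 0..n and
    # transpose so that row k is the time series of component k.
    return np.abs(coeffs[:, :n + 1]).T


# Example (chaining the earlier sketches, which define the helper functions):
# R = subband_analysis(resample_sections(align_sections(sections, pitch_sections)), n=8)
```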
[0045]
When the sound quality adjustment unit 4 is supplied with the sub-band data group from the sub-band data input unit 1 and with the calibration sub-band data group from the sub-band analysis unit 33 of the calibration data generation unit 3, it changes each item of sub-band data in the sub-band data group so that the intensity of the k-th sub-band data (k is an integer from 0 to n) becomes the value Y(k) given by Equation 3. It then supplies the sub-band data group whose values have been changed to the sub-band synthesis unit 51.
[0046]
(Equation 3) Y(k) = {α·X(k)}² / R(k)
(where X(k) is the intensity, before the change, of the k-th sub-band data in the sub-band data group, R(k) is the intensity of the k-th sub-band data in the calibration sub-band data group, and α is a predetermined proportionality coefficient)
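The following Python sketch (not part of the patent) applies Equation 3 and numerically checks the statement of [0047]: when R(k) is proportional to Y(k), the fixed point of Equation 3 is proportional to α·X(k). The small epsilon guarding against division by zero is an added assumption.

```python
import numpy as np


def adjust_subbands(X: np.ndarray, R: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Equation 3: Y(k) = (alpha * X(k))**2 / R(k) for every k.

    X[k] is the current intensity of the k-th sub-band data and R[k] the
    intensity of the k-th calibration sub-band data.  The epsilon guard is
    an added assumption, not part of the patent's formula.
    """
    eps = 1e-12
    return (alpha * X) ** 2 / np.maximum(R, eps)


# Consistency check of [0047]: if the reproduced level satisfies R = c * Y,
# the fixed point of Equation 3 is Y = alpha * X / sqrt(c), i.e. Y is
# proportional to alpha * X.
c, alpha = 2.0, 1.0
X = np.array([1.0, 0.5, 0.25])
Y_fixed = alpha * X / np.sqrt(c)
print(np.allclose(adjust_subbands(X, c * Y_fixed, alpha), Y_fixed))  # -> True
```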
[0047]
Under the condition that the calibration data generation unit 3 receives the sound reproduced by the audio reproduction unit 5, and that the time from when the sound represented by the sub-band data group supplied from the sound quality adjustment unit 4 to the audio reproduction unit 5 is reproduced until the calibration sub-band data group representing that sound is generated and supplied to the sound quality adjustment unit 4 is so short that R(k) can be regarded as proportional to Y(k), the value of Y(k) is substantially adjusted to a value proportional to α·X(k).
(Here it is assumed that no sound quality designation data is supplied from the sound quality designation data input unit 2.)
[0048]
When sound quality designation data is supplied from the sound quality designation data input unit 2, however, the sound quality adjustment unit 4 adjusts the sound quality of the sound represented by the sub-band data group as a whole by further changing the intensity of each item of sub-band data in the value-changed sub-band data group to the intensity designated by the sound quality designation data. For example, if the sound quality designation data represents a coefficient by which the intensity of the component represented by each item of sub-band data is to be multiplied, the intensity of each item of sub-band data is changed so that the product of the component's intensity and the coefficient becomes the new intensity. The sub-band data group whose sound quality has been adjusted by this further intensity change is then supplied to the sub-band synthesis unit 51.
[0049]
The audio reproduction unit 5 is composed of a sub-band synthesis unit 51, a speech waveform restoration unit 52, and an audio output unit 53. Of these, each of the sub-band synthesis unit 51 and the speech waveform restoration unit 52 is configured by a processor such as a DSP or a CPU and a memory such as a RAM. Note that a single processor or a single memory may perform some or all of the functions of the sub-band synthesis unit 51 and the speech waveform restoration unit 52. Also, a processor that performs some or all of the functions of the sound quality designation data input unit 2, the pitch extraction unit 32, and the sub-band analysis unit 33 may also perform some or all of the functions of the sub-band synthesis unit 51 and the speech waveform restoration unit 52.
[0050]
When the sub-band synthesis unit 51 is supplied with the sub-band data group from the sound quality adjustment unit 4, it converts the sub-band data group so as to restore pitch waveform data in which the intensity of each frequency component is as represented by the sub-band data group (that is, voice data in which the phases of the sections have been aligned by phase-shifting each section corresponding to a unit pitch of the voice), or voice data that has not undergone the process of aligning the phases of the sections, and supplies the restored pitch waveform data or voice data to the speech waveform restoration unit 52.
[0051]
The transformation that the sub-band synthesis unit 51 applies to the sub-band data group is a transformation that is substantially the inverse of the transformation that was applied to the speech data to generate the sub-band data group acquired by the sub-band data input unit 1.
Therefore, for example, when the sub-band data group was generated by applying the DCT to pitch waveform data, the sub-band synthesis unit 51 may apply the IDCT (Inverse DCT) to the sub-band data group.
[0052]
If the data supplied from the sub-band synthesis unit 51 is pitch waveform data, the speech waveform restoration unit 52 changes the time length of each section of the pitch waveform data to the time length indicated by the pitch information supplied from the sub-band data input unit 1. The time length of a section may be changed, for example, by changing the sampling interval and/or the number of samples in the section. The speech waveform restoration unit 52 then supplies the pitch waveform data whose section time lengths have been changed (that is, voice data representing the restored voice) to the audio output unit 53. On the other hand, if the data supplied from the sub-band synthesis unit 51 is voice data that has not undergone the process of aligning the phases of the sections, the speech waveform restoration unit 52 supplies it to the audio output unit 53 as it is, as voice data representing the restored voice.
[0053]
The audio output unit 53 comprises, for example, a control circuit that performs the function of a PCM decoder, a D/A (Digital-to-Analog) converter, an AF (Audio Frequency) amplifier, a speaker, and the like. When the audio output unit 53 is supplied with the voice data representing the restored voice from the speech waveform restoration unit 52, it demodulates the voice data, performs D/A conversion and amplification, and reproduces the sound by driving the speaker with the resulting analog signal.
[0054]
By performing the above-described operation under the condition that the calibration data generation unit 3 receives the sound reproduced by the audio reproduction unit 5, this sound quality adjustment device adjusts the sound quality of the sound represented by the sub-band data group.
[0055]
The sound quality adjustment is performed by changing the intensity of the component represented by each item of sub-band data, and, unless the temporal change in the intensity of the fundamental frequency component or harmonic component of the voice is particularly steep, the sub-band data can be regarded as a direct-current signal. The configuration of the sound quality adjustment device can therefore be kept simple, making it easy to manufacture.
[0056]
In addition, unlike filtering with a finite-order filter, the process of changing the intensity of sub-band data that can be regarded as a direct-current signal yields the desired characteristics exactly, so the sound quality adjustment is performed accurately.
[0057]
Furthermore, since the sound quality adjustment apparatus can use, as the test sound, the sound it reproduces itself from arbitrary sub-band data acquired from the outside, there is no need to spend time adjusting the sound quality with a predetermined test signal; the sound quality can be adjusted while reproducing the audio that was originally intended to be reproduced.
[0058]
The configuration of the sound quality adjustment apparatus is not limited to the above.
For example, the sound quality adjustment apparatus need not include both the sound quality designation data input unit 2 and the calibration data generation unit 3.
When the sound quality adjustment apparatus does not include the calibration data generation unit 3 (or when no calibration sub-band data is supplied from the calibration data generation unit 3), the sound quality adjustment unit 4 may treat the sub-band data group supplied by the sub-band data input unit 1 as a sub-band data group whose values have already been changed, and may immediately change the intensity of each item of sub-band data in the group to the intensity designated by the sound quality designation data.
[0059]
Also, the sub-band data input unit 1 may acquire sub-band data from the outside via a
communication line such as a telephone line, a dedicated line, or a satellite line.
In this case, the sub-band data input unit 1 may be provided with a communication control
device including, for example, a modem. Similarly, the sound quality specification data input unit
2 may be provided with a communication control device, and sound quality specification data
may be acquired from the outside via a communication line. Note that one recording medium
drive device or communication control device may perform the functions of the sub-band data
input unit 1 and the sound quality designation data input unit 2 as well.
[0060]
In addition, the pitch extraction unit 32 need not include the cepstrum analysis unit 321 (or the autocorrelation analysis unit 322); in this case, the weight calculation unit 323 may treat the reciprocal of the fundamental frequency obtained by the remaining analysis unit as the average pitch length as it is. The waveform correlation analysis unit 327 may also supply the pitch signal supplied from the band pass filter 325 to the cepstrum analysis unit 321 as a zero cross signal as it is.
[0061]
Also, the sound quality adjustment unit 4 may remove noise from the sub-band data by filtering
the sub-band data and substantially removing the AC component.
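As an illustration of this noise-removal variation (not part of the patent text), the AC component of each sub-band time series could be suppressed with a simple moving average; the window length is an assumed value.

```python
import numpy as np


def denoise_subbands(intensities: np.ndarray, window: int = 9) -> np.ndarray:
    """Suppress the AC component of each sub-band time series (cf. [0061])
    with a moving average; the window length is an illustrative assumption."""
    kernel = np.ones(window) / window
    return np.vstack([np.convolve(row, kernel, mode="same")
                      for row in intensities])
```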
[0062]
An embodiment of the present invention has been described above, but the audio signal adjustment device according to the present invention can be realized using an ordinary computer system rather than a dedicated system.
For example, a sound quality adjustment device that executes the above-described processing can be configured by installing, on a personal computer provided with a microphone, a sampling circuit, an A/D converter, a D/A converter, a speaker, and the like, a program for executing the operations of the above-mentioned sub-band data input unit 1, sound quality designation data input unit 2, calibration data generation unit 3, sound quality adjustment unit 4, and audio reproduction unit 5, from a medium (such as a CD-ROM or a flexible disk) storing that program.
[0063]
The personal computer that executes this program is assumed to perform the processing shown in FIGS. 3 and 4 as the processing corresponding to the operation of the sound quality adjustment device in FIG. 1. FIGS. 3 and 4 are flowcharts showing the processing executed by this personal computer.
[0064]
That is, the personal computer first acquires the above-mentioned sub-band data group from the outside (FIG. 3, step S101). If the speech represented by the sub-band data group is speech whose sections, each corresponding to a unit pitch, have been phase-shifted so that their phases are aligned, and if pitch information on that speech can be acquired, this pitch information is also acquired (step S101). Further, when the operator performs an operation to input sound quality designation data, the personal computer acquires the sound quality designation data according to the operation (step S101).
[0065]
On the other hand, this personal computer receives the voice, samples it, and A / D converts it to
generate digital voice data for calibration (step S102). Then, the calibration speech data is filtered
to generate the filtered calibration speech data (pitch signal) (step S103).
[0066]
Note that this personal computer determines the characteristics of the filtering performed to generate the pitch signal by feedback processing based on the pitch length described later and on the times at which the instantaneous value of the pitch signal becomes 0 (the times at which zero crossings occur).
[0067]
That is, this personal computer specifies the fundamental frequency of the voice represented by the calibration voice data by performing, for example, the above-described cepstrum analysis or the above-described analysis based on an autocorrelation function on the voice data generated by receiving the sound, and then obtains the absolute value of the reciprocal of this fundamental frequency (that is, the pitch length) (step S104).
(Alternatively, this personal computer may specify two fundamental frequencies by performing both the cepstrum analysis and the analysis based on an autocorrelation function, and obtain the average of the absolute values of the reciprocals of these two fundamental frequencies as the pitch length.)
[0068]
Meanwhile, this personal computer identifies the times at which the pitch signal crosses zero (step S105). It then determines whether or not the pitch length and the zero crossing period of the pitch signal differ from each other by a predetermined amount or more (step S106). If it determines that they do not differ, the above-mentioned filtering is performed with band-pass-filter characteristics in which the reciprocal of the zero crossing period is the center frequency (step S107). If, on the other hand, it determines that they differ by the predetermined amount or more, the above-mentioned filtering is performed with band-pass-filter characteristics in which the reciprocal of the pitch length is the center frequency (step S108).
[0069]
Next, the personal computer divides the calibration voice data at the timings at which the boundaries of unit periods of the generated pitch signal arrive (specifically, the timings at which the pitch signal crosses zero) (step S109). Then, for each of the resulting sections, it obtains the correlation between the pitch signal in the section and versions of the voice data in the section whose phase has been variously shifted, and specifies the phase that gives the highest correlation as the phase of the voice data in that section (FIG. 4, step S110). Calibration pitch waveform data is then generated by phase-shifting each section of the voice data so that the phases of the sections become substantially the same (step S111). Specifically, for each section the personal computer specifies the above-mentioned value Ψ in step S110, and shifts the voice data of the section by (−Ψ) in step S111.
[0070]
Next, the personal computer resamples each section of the calibration pitch waveform data (step S112). The resampling is performed so that the number of samples in each section of the pitch waveform data is substantially equal to a fixed number, with the samples equally spaced within each section. If the number of samples in a section does not reach this fixed number, samples whose values interpolate between adjacent samples on the time axis are added according to a predetermined method so that the number of samples in the section equals the fixed number.
[0071]
Next, the personal computer generates a calibration sub-band data group by applying an orthogonal transformation to the calibration pitch waveform data (step S113). Then, with X(k) denoting the pre-change intensity of the k-th sub-band data in the sub-band data group acquired in step S101 and R(k) denoting the intensity of the k-th sub-band data in the calibration sub-band data group generated in step S113, the personal computer changes each item of sub-band data so that the intensity of the k-th sub-band data in the sub-band data group acquired in step S101 becomes the above-mentioned value Y(k) given by Formula 3 (step S114), and the process proceeds to step S116.
Note that while the calibration sub-band data group has not yet been created, the personal computer may treat the sub-band data group acquired in step S101 as having already undergone the processing of step S114.
[0072]
Next, if the personal computer has also acquired sound quality designation data, it adjusts the sound quality of the sound represented by the sub-band data group as a whole by further changing the intensity of each item of sub-band data in the group whose values were changed in step S114 to the intensity designated by the sound quality designation data (step S115), and the process proceeds to step S116.
[0073]
In step S116, the personal computer converts the sub-band data group that has undergone the processing up to step S114 or S115 so as to restore pitch waveform data or audio data in which the intensity of each frequency component is as represented by the sub-band data group.
The conversion applied to the sub-band data group in step S116 is a conversion that is substantially the inverse of the conversion that was applied to the audio data to generate the sub-band data group acquired in step S101.
[0074]
Next, if the data generated in step S116 is pitch waveform data, the personal computer changes the time length of each section of the pitch waveform data to the time length indicated by the pitch information acquired in step S101 (step S117), and the process proceeds to step S118. On the other hand, if the data generated in step S116 is audio data that has not undergone the process of aligning the phases of the sections, step S117 is omitted and the process proceeds immediately to step S118.
[0075]
In step S118, the personal computer demodulates the audio data obtained by the processing up to step S116 or step S117, performs D/A conversion and amplification, and reproduces the audio using the resulting analog signal.
[0076]
For example, this program may be uploaded to a bulletin board system (BBS) on a communication line and distributed via the communication line, or a carrier wave may be modulated by a signal representing this program and the resulting modulated wave transmitted, with a device that receives the modulated wave demodulating it to restore the program.
The above processing can then be executed by starting this program and executing it under the control of the OS in the same manner as other application programs.
[0077]
If the OS performs part of the processing, or if the OS constitutes part of one component of the present invention, the recording medium may store the program with that part excluded. Also in this case, in the present invention, the recording medium stores a program for executing the functions and steps executed by the computer.
[0078]
FIG. 1 is a block diagram showing the configuration of the sound quality adjustment apparatus according to the embodiment of the present invention. FIG. 2 is a block diagram showing the configuration of the pitch extraction unit. FIG. 3 is a flowchart showing the processing performed by a personal computer that performs the functions of the sound quality adjustment apparatus according to the embodiment of the present invention. FIG. 4 is a continuation of the flowchart showing the processing performed by that personal computer.
Explanation of Reference Signs
[0079]
Reference Signs List: 1 sub-band data input unit, 2 sound quality designation data input unit, 3 calibration data generation unit, 31 calibration voice input unit, 32 pitch extraction unit, 321 cepstrum analysis unit, 322 autocorrelation analysis unit, 323 weight calculation unit, 324 BPF coefficient calculation unit, 325 band pass filter, 326 zero cross analysis unit, 327 waveform correlation analysis unit, 328 phase adjustment unit, 329 resampling unit, 33 sub-band analysis unit, 4 sound quality adjustment unit, 5 audio reproduction unit, 51 sub-band synthesis unit, 52 speech waveform restoration unit, 53 audio output unit