Patent Translate
Powered by EPO and Google
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
BACKGROUND OF THE INVENTION 1. Field of the Invention The present invention relates to a method
and an apparatus for processing source sound by enhancing a desired sound with respect to
unwanted sound, comprising: distributing the source sound through multiple band pass filters
over many parallel channels; providing in each channel filter means for preferentially filtering
the desired sound against undesired sounds in the frequency band of that channel; and assembling
the output signals of the channels into the enhanced output sound.
2. Description of the Related Art First of all, the desired sound is voice or, in general, sound to
which a specific pitch can be attributed. Sounds that do not have such a pitch are excluded from
enhancement. Sound enhancement is the improvement of the signal-to-noise ratio, where the noise
may be produced by other voices or by sounds other than the sound to be enhanced, by music, by
identifiable sources such as machines, or by unknown or unidentifiable physical noise sources.
Such enhancement makes the desired sound clearer, more comfortable to listen to, or otherwise
more suitable for use. In the same way, the sound of a particular instrument can be enhanced
relative to other instruments. In some applications the result of the enhancement is used as
such. Other applications subtract the enhanced signal from the source signal and then use or
otherwise process the result of this subtraction.
SUMMARY OF THE INVENTION The simple method described above can separate desired from undesired
sound at low frequencies, where the signal is coupled to the pitch. Higher harmonics, however,
cause problems of various kinds. First of all, the phase of the higher harmonics is not
correctly coupled to the fundamental pitch period; in the extreme case the phase itself is
subject to noise phenomena. Accordingly, such a method attributes these noise phenomena to
certain harmonics. This then causes interference in the high frequency range of the desired
signal and will effectively attenuate the high frequency
It is an object of the present invention to provide a source sound processing method and device
of the kind described above, which can be easily adapted to the actual requirements and broaden
the field of application.
SUMMARY OF THE INVENTION The present invention is directed to a source sound processing method
for enhancing a desired sound with respect to unwanted sounds, comprising: distributing the
source sound through multiple band pass filters over many parallel channels; providing in each
channel filter means which preferentially filter the desired sound against undesired sounds in
the frequency band of that channel; and collecting the output signals of the channels into an
enhanced output sound. The method is characterized in that the output of each band pass filter
is supplied to envelope detection means which feed the filter means of that channel, and the
output of each filter means is supplied to envelope modulation means which generate the output
signal of that channel.
The invention is based on the fact that at high frequencies it is the phase of the envelope,
rather than the phase of the signal itself, that is coupled to the pitch frequency.
Thus, unwanted signals can be filtered out by adaptively filtering the envelope of each frequency
band rather than the signal itself.
Preferably the filter means comprises comb shaped filter means.
A single-channel comb filter, which filters the signal itself, is known from the paper
"Evaluation of an adaptive comb filtering method for enhancing speech degraded by white noise
addition", published by J. S. Lim et al. in IEEE Transactions on Acoustics, Speech and Signal
Processing, vol. The solution proposed here is to apply the filtering (not necessarily limited
to comb filters) in a plurality of parallel channels, operating on the signal envelope. Another,
slightly different solution is to replace the comb filter by harmonic selection. The two methods
are mathematically equivalent when the desired signal is stationary, and the technical term used
in the claims also covers the latter technique. In particular, the latter technique involves a
change from the time domain to the spectral frequency domain. The equivalence with harmonic
selection no longer holds, however, if the desired signal is not stationary, whereas the comb
filter approach remains correct for non-stationary signals. The rationale for the method is that
the signal is encoded according to the envelopes of the different frequency bands, and that
reconstruction from these envelopes can in practice generate the desired signal without audible
distortion. Multirate filtering for subband encoding / decoding is itself described in the
article "A theory of multirate filter banks" published by Martin Vetterli in IEEE Transactions
on Acoustics, Speech and Signal Processing, ASSP-35 (March 1987), pages 356-372.
Further, the source sound processing apparatus according to the present invention is a source
sound processing apparatus for enhancing a desired sound with respect to undesired sound in
the source sound, comprising: a first plurality of parallel channels assigned to respective
contiguous frequency bands; distribution means for distributing the source sound over the
channels, each channel comprising: band pass filter means for the frequency band of the
associated channel; envelope detection means fed by the band pass filter means of that channel;
comb filter means fed by the envelope detection means of that channel; and envelope modulation
means fed by the comb filter means of that channel; and output means fed by the outputs of all
the parallel channels.
Such devices are effective for speech and music processing, for example for real-time and recorded
reproduction, and can be used in information dissemination, education, entertainment,
psychology, music, linguistics, history, court investigations, etc.
In the practice of the invention, said filter means comprise comb-like filter means.
Further, the desired sound may be a voice sound.
Alternatively, a particular instrument is enhanced so that it can be separated or subtracted from
other instruments. The source sound processor according to the invention may have additional
channel means for a frequency band lower than the frequency bands of the first plurality of
channels. Each additional channel of these additional channel means is fed by said distribution
means and comprises band pass filter means for the associated additional channel frequency band
and comb-like filter means fed by those band pass filter means, which in turn feed said output
means. Furthermore, said envelope detection means may comprise downsampling means and said
envelope modulation means upsampling means. Also, the comb filter means have mutually uniform
characteristics, with a tooth spacing that corresponds to the instantaneous fundamental
frequency of the desired sound. In all cases the enhancement is relative: it can be combined
with amplification or attenuation of the desired signal itself.
An embodiment of the invention will be described with reference to the drawings. FIG. 1 is an
exemplary amplitude-versus-time signal of a speech sample. While speech is an important field of
use, time and amplitude are shown as relative quantities, since the invention is directed to
various types of signal sources. In all cases, however, the sources are physically more complex
than a generator of pure harmonics.
FIG. 2 shows the same signal as in FIG. 1 in a frequency domain representation. The frequency
range is 0 to 5000 Hz on a linear scale. The amplitudes are relative; a scale is shown but is
not calibrated. Curve 1b1 is the logarithm of the spectral amplitude as a function of
frequency f. At the lowest frequencies the amplitude is significantly lower. At intermediate
frequencies the amplitude goes up and down with considerable variation. At higher frequencies
the amplitude drops gradually but shows no further variation. The curve 1b2 is the spectral
envelope of the signal of curve 1b1, again as a function of frequency. For the sake of clarity,
the curve 1b2 is shifted somewhat upward relative to the curve 1b1. As is apparent from the
figure, curve 1b2 varies considerably more smoothly than curve 1b1. In general, the peaks of the
envelope correspond to the so-called formant frequencies of speech. For a description of the
formant phenomenon, refer to the standard textbooks on speech analysis. Curve 1b3 represents a
band pass filter for each of the five formant frequencies. The bandwidth is approximately
500 Hz. The flat part of the transmission curve represents essentially 100% transmission. In a
practical embodiment of the invention, a larger number of band pass filters is provided to
ensure that sufficient acoustic energy is transmitted. Also, the passbands are narrower and lie
close together (as do the two passbands associated with the two highest formant frequencies). In
fact, a width of 1/3 of an octave is the most logical for perceptual reasons. In any case, the
overall transfer curve of all the combined band pass filters has no holes and is nearly flat
with frequency.
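The layout of adjoining passbands of equal width in octaves can be sketched as follows. This is only an illustration: the function name `band_edges`, the 1250 Hz starting frequency and the count of six bands are assumptions chosen so that the bands span 1250 Hz to 5000 Hz, and are not values prescribed by the text.

```python
def band_edges(f_low, n_bands, fraction=1.0 / 3.0):
    """Return n_bands adjoining (low, high) edges, each `fraction` octave wide.

    Each upper edge is computed with the same expression as the next lower
    edge, so neighbouring passbands meet exactly and leave no holes.
    """
    return [
        (f_low * 2.0 ** (k * fraction), f_low * 2.0 ** ((k + 1) * fraction))
        for k in range(n_bands)
    ]


# Assumed example: six 1/3-octave bands starting at 1250 Hz cover two octaves.
bands = band_edges(1250.0, 6)
for low, high in bands:
    print(f"{low:8.1f} Hz to {high:8.1f} Hz")
```

Because the width is constant in octaves, the bandwidth in hertz grows with the centre frequency, which is why the upper bands are wider than the lower ones.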
FIG. 3 shows five curve pairs, each pair being associated with a particular one of the five formant
frequencies of curve 1b2. The lower curve of each pair shows the transmitted amplitude of the
signal itself. The vertically offset curve shows the amplitude envelope of the transmitted
signal. The upper curve pair relates to the basic pitch of the speech sound as passed by the
appropriate band pass filter. Lower voices occur, but the usual pitch frequency of an adult male
voice is 50-200 Hz. The voices of females and children are of considerably higher pitch, i.e.
150-300 Hz for females and up to 400 Hz for children, and the pitch of a soprano has been found
to rise to 1200 Hz. As shown, the signal itself is modulated with almost periodic changes. The
envelope is periodic at the pitch frequency. Any pitch changes that occur are slow with respect
to the pitch period. The next pair of curves represents the speech signal at the next higher
formant frequency (generally, the (1/2) th harmonic in this example) with respect to the pitch.
Here the phase relative to the pitch exhibits some variation with time, and the shape of the
signal is somewhat less sinusoidal than that of the first formant. This phenomenon is more
pronounced for the curve pairs associated with the highest frequency formants F3, F4, F5:
although the overall shape (that is, the envelope) is more nearly periodic, this does not hold
for the signal itself, which is highly non-periodic. At the highest frequency formants, the
envelope is also significantly non-periodic. This means that large phase changes occur. For this
reason, the present invention uses the envelopes of the high frequency bands for further
processing. In general, non-speech signals give rise to similar signal diagrams.
FIG. 4a illustrates the impulse response of the comb filter. The heights of the peaks add up to 1.
The output of the filter is the convolution of the input signal with the weights of the comb
filter teeth. The spacing between successive teeth is the known or measured pitch period of the
input signal. This requirement is not strict, but for a constant pitch the teeth of the comb
filter are generally arranged symmetrically. In general, the response factor decreases with
distance from the center. The number of coefficients is chosen here as the odd value 7, but
other values, including even values, can be applied. In general, the layout of FIG. 4a is
arbitrary. Also, the rate at which the comb filter is applied is arbitrary, but it is usually
higher than the pitch frequency itself.
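A minimal sketch of such a comb filter is given below. The triangular taper, the normalisation of the tooth heights to a sum of 1, and the names `comb_impulse_response` and `convolve` are assumptions made for illustration; the text only requires a symmetric response that decreases away from the center.

```python
def comb_impulse_response(period, n_teeth=7):
    """Symmetric comb with n_teeth impulses spaced `period` samples apart.

    Assumes an odd n_teeth. The tooth heights follow a triangular taper
    (an assumed shape) and are normalised so that they sum to 1.
    """
    half = n_teeth // 2
    weights = [half + 1 - abs(k - half) for k in range(n_teeth)]
    total = float(sum(weights))
    h = [0.0] * ((n_teeth - 1) * period + 1)
    for k, w in enumerate(weights):
        h[k * period] = w / total
    return h


def convolve(x, h):
    """Direct-form linear convolution of two sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            if hj != 0.0:
                y[i + j] += xi * hj
    return y


PITCH_PERIOD = 10  # assumed pitch period in samples
h = comb_impulse_response(PITCH_PERIOD)

# A pulse train at the pitch period, and one at an unrelated period of 13.
on_pitch = [1.0 if n % PITCH_PERIOD == 0 else 0.0 for n in range(200)]
off_pitch = [1.0 if n % 13 == 0 else 0.0 for n in range(200)]
y_on = convolve(on_pitch, h)
y_off = convolve(off_pitch, h)
```

Once all seven teeth overlap the input, the on-pitch pulse train passes at full height, whereas each output sample of the off-pitch train can coincide with at most one tooth and therefore never exceeds the center weight of 0.25.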
The left side of FIG. 4b shows an infinite pulse train whose horizontal axis is time. The right
side of FIG. 4b shows its Fourier transform: this is again an infinite train of identical
pulses, shown only for the positive half of the frequency axis.
The left side of FIG. 4c shows an example of a window function as a function of time. The
right side of FIG. 4c shows its Fourier transform on approximately the same scale as the Fourier
transform of FIG. 4b. The result is a relatively narrow peak that symmetrically surrounds the
zero of the frequency axis.
The left side of FIG. 4d shows the signal obtained when the window function of FIG. 4c
is applied to the pulse train of FIG. 4b. Similarly, the right side of FIG. 4d shows the result
of the convolution of the Fourier transform of the pulse train of FIG. 4b with the Fourier
transform of the window of FIG. 4c. The right side of FIG. 4d is the Fourier transform of the
left side of FIG. 4d.
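The relation between FIGS. 4b, 4c and 4d can be checked numerically on a short, discrete example. The sketch below assumes a 32-point record, a pulse spacing of 8 samples and a raised-cosine window; none of these values comes from the text. In the discrete case the convolution of the two transforms is circular and carries a factor 1/N.

```python
import cmath
import math


def dft(x):
    """Plain discrete Fourier transform (O(N^2), adequate for a short demo)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def circular_convolve(a, b):
    """Circular convolution of two equal-length sequences."""
    n = len(a)
    return [sum(a[m] * b[(k - m) % n] for m in range(n)) for k in range(n)]


N = 32
pulse_train = [1.0 if t % 8 == 0 else 0.0 for t in range(N)]             # cf. FIG. 4b, left
window = [0.5 - 0.5 * math.cos(2 * math.pi * t / N) for t in range(N)]   # cf. FIG. 4c, left
windowed = [p * w for p, w in zip(pulse_train, window)]                  # cf. FIG. 4d, left

spectrum = dft(pulse_train)  # again a pulse train, cf. FIG. 4b, right
lhs = dft(windowed)
rhs = [v / N for v in circular_convolve(spectrum, dft(window))]
max_error = max(abs(a - b) for a, b in zip(lhs, rhs))
print(max_error)  # limited only by floating-point rounding
```

The transform of the windowed pulse train is thus a train of copies of the window's narrow spectral peak, which is the frequency response of the comb filter.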
FIG. 5 shows a block diagram of the device according to the invention. In FIG. 5, the input means
20 receives a source sound which contains the desired sound to be enhanced and the undesired
sound to be suppressed. This input can represent a microphone or other transducer, a digital or
analog audio transmission channel, or another conventional device. The plurality of band pass
filters 22-30 have contiguous pass bands, so that collectively they pass all of the acoustic
energy within the desired frequency range. Such a range need not comprise all the energy
received at the input means 20, and the flatness of the overall transmission factor can be
selected according to the desired accuracy or other valid criteria. The number of filters is
arbitrary and can be, for example, 32 or 64. In that case, the width of the response curve at
half height can be, for example, 1/10 to 1/3 of an octave. These filters operate according to
digital or analog
The array 32 comprises envelope detection means which may for example be configured as
downsampling means. In practice, this acts as a demodulator. Downsampling can be carried out as
described in the Vetterli reference. Another simple procedure is full wave rectification
followed by a smoothing procedure. The smoothing time constant is chosen in relation to the
bandwidth of the band in question. The smoothed signal can then be sampled at a somewhat lower
repetition rate. In addition to the five channels discussed so far, two additional channels are
shown by way of illustration, having band pass filters 60, 62 and having no envelope detector in
the array 32. These latter channels apply to the part of the spectrum where the phase of the
signal is invariant. In practice, this depends on the type of sound being processed; for speech
it is the low frequency part, in any case below 1250 Hz. In practice, the widths of the band
pass filters are equal when measured in octaves.
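The rectify-and-smooth procedure mentioned above might look as follows. The moving average used for smoothing, the parameter names `smooth_len` and `step`, and the test tone are assumptions for illustration; the text does not specify the smoothing filter or the sampling rates.

```python
import math


def detect_envelope(x, smooth_len, step):
    """Full wave rectify, smooth with a moving average, then downsample.

    smooth_len: length of the moving average in samples (assumed smoother).
    step:       keep every step-th smoothed sample (lower repetition rate).
    """
    rectified = [abs(v) for v in x]
    smoothed = []
    acc = 0.0
    for i, v in enumerate(rectified):
        acc += v
        if i >= smooth_len:
            acc -= rectified[i - smooth_len]
        # Divide by the number of samples actually inside the window.
        smoothed.append(acc / min(i + 1, smooth_len))
    return smoothed[::step]


# Assumed test tone: a sinusoid with a period of 8 samples, 160 samples long.
tone = [math.sin(2 * math.pi * n / 8) for n in range(160)]
envelope = detect_envelope(tone, smooth_len=16, step=8)
print(len(envelope), envelope[5])
```

With the averaging length set to a whole number of carrier periods, the carrier ripple cancels and the envelope of a steady tone settles to a constant, so only the slower pitch-rate variation would remain for the comb filter to act on.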
The array 42 comprises the respective comb filters described with reference to FIG. All channels
are comb filtered, including those that have no envelope detection means. Furthermore, it is
preferable that all comb filters have a uniform configuration, in that the tooth spacing is equal
to the actual pitch period and the tooth heights follow the same pattern. Array 52, which
corresponds to array 32, modulates the filtered signal with each envelope detected earlier in
array 32. The interconnections providing modulation-control signals from array 32 to array 52
are omitted for the sake of simplicity. Channels that do not perform envelope detection are not
modulated by the envelope. All outputs of each channel are combined at output terminal 64.
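The chain of arrays 32, 42 and 52 for a single high-frequency channel can be sketched as below. This is one possible reading, not the circuit of the invention: the band pass filtering is assumed to have been done already, downsampling is omitted, the comb has only three teeth, and the re-modulation is assumed to rescale the band signal by the ratio of the comb-filtered envelope to the detected envelope. All function names and signal parameters are hypothetical.

```python
import math


def smooth_rectified(x, n):
    """Envelope estimate: full wave rectification plus an n-sample moving average."""
    out, acc = [], 0.0
    for i, v in enumerate(x):
        acc += abs(v)
        if i >= n:
            acc -= abs(x[i - n])
        out.append(acc / min(i + 1, n))
    return out


def comb_filter(env, period, weights=(0.25, 0.5, 0.25)):
    """Symmetric comb, teeth one pitch period apart; samples outside the record count as zero."""
    half = len(weights) // 2
    out = []
    for i in range(len(env)):
        acc = 0.0
        for k, w in enumerate(weights):
            j = i + (k - half) * period
            if 0 <= j < len(env):
                acc += w * env[j]
        out.append(acc)
    return out


def process_channel(band_signal, period, smooth_len):
    """Assumed re-modulation: scale the band signal by filtered / detected envelope."""
    env = smooth_rectified(band_signal, smooth_len)
    filtered = comb_filter(env, period)
    return [
        v * (f / e if e > 1e-12 else 0.0)
        for v, e, f in zip(band_signal, env, filtered)
    ]


# Hypothetical band signal: a carrier with a period of 4 samples whose
# envelope repeats every 80 samples (the assumed pitch period).
PERIOD = 80
signal = [
    (1.0 + 0.8 * math.cos(2 * math.pi * n / PERIOD)) * math.sin(math.pi * n / 2 + 0.3)
    for n in range(800)
]
enhanced = process_channel(signal, PERIOD, smooth_len=8)
```

Away from the ends of the record, a band signal whose envelope repeats at the pitch period passes unchanged, while an isolated burst shorter than one pitch period is scaled by the center weight of 0.5. The outputs of all channels would then be summed at the output.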
The above describes FIG. at the level of its basic blocks. The actual circuits at the electronic
level, such as those for synchronization, signal definition, electronic implementation, etc., are
not shown. The details of such circuits are omitted as they are known to those skilled in the art.