Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2018014711
Abstract: An audio processing apparatus and method for estimating the signal-to-noise ratio of an
electrical input signal representing sound. An input unit provides a time-frequency representation
Y(k, n) of an electrical input signal representing a time-varying acoustic signal composed of a
target speech signal component S(k, n) from a target sound source TS and a noise signal component
N(k, n) from sound sources other than the target sound source. A noise reduction system determines
an a posteriori signal-to-noise ratio estimate γ(k, n) of the electrical input signal and, from the
a posteriori estimate γ(k, n), determines an a priori target signal-to-noise ratio estimate
ζ(k, n) of the electrical input signal based on a recursive, decision-directed algorithm. The a
priori target signal-to-noise ratio estimate ζ(k, n) for the n-th time frame is determined from the
a priori target signal-to-noise ratio estimate ζ(k, n−1) for the (n−1)-th time frame and the a
posteriori signal-to-noise ratio estimate γ(k, n) for the n-th time frame. [Selected figure]
Figure 1A
Audio processing apparatus and method for estimating the signal-to-noise ratio of an acoustic signal
[0001]
The present disclosure relates to an audio processing device, for example a hearing aid, and to a
method of estimating the signal-to-noise ratio of an electrical input signal representing sound.
In particular, the present disclosure relates to a scheme for obtaining an a priori signal-to-noise
ratio estimate by nonlinear smoothing of a posteriori signal-to-noise ratio estimates (e.g.,
implemented as low-pass filtering with an adaptive low-pass cutoff frequency).
[0002]
The “a posteriori signal-to-noise ratio” (SNRpost) in this context means the ratio, at a given time
t, of the observed (available) noisy signal Y (target signal S plus noise N, Y(t) = S(t) + N(t)),
e.g., as picked up by one or more microphones, to a noise estimate N̂(t), e.g., an estimate of the
noise signal power: SNRpost(t) = Y(t)/N̂(t), or SNRpost(t) = Y(t)²/N̂(t)². The a posteriori
signal-to-noise ratio can, for example, be defined in the time-frequency domain with a value for
each frequency band (index k) and time frame (index n), i.e., SNRpost = SNRpost(k, n), for example
SNRpost(k, n) = |Y(k, n)|²/|N̂(k, n)|². Examples of a posteriori signal-to-noise ratio generation
are illustrated in FIGS. 1A and 1B for single- and multi-microphone configurations, respectively.
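A minimal sketch of this per-tile computation (assuming the STFT coefficients Y and a noise power
estimate are already available; the function name and shapes are illustrative only):

    import numpy as np

    def snr_post(Y, noise_psd, eps=1e-12):
        """A posteriori SNR per time-frequency tile:
        gamma(k, n) = |Y(k, n)|^2 / noise_psd(k, n).
        Y: complex STFT matrix (bands k x frames n); noise_psd: same shape."""
        return (np.abs(Y) ** 2) / np.maximum(noise_psd, eps)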
[0003]
The “a priori signal-to-noise ratio” (SNRprio) in this context means the ratio of an estimate Ŝ(t)
of the target signal amplitude S(t) (or target signal power S(t)²) at a given time t to an estimate
N̂(t) of the noise signal amplitude N(t) (or noise signal power N(t)²), for example SNRprio =
SNRprio(t) = Ŝ(t)²/N̂(t)², or SNRprio = SNRprio(k, n), for example SNRprio(k, n) =
|Ŝ(k, n)|²/|N̂(k, n)|².
[0004]
European Patent No. 2701145
[0005]
Ephraim, Y.; Malah, D., "Speech enhancement using a minimum mean-square error short-time spectral
amplitude estimator," IEEE Transactions on Acoustics, Speech and Signal Processing, December 1984,
Vol. 32, No. 6, pp. 1109-1121, URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=1164453&isnumber=26187
Martin, R., "Noise power spectral density estimation based on optimal smoothing and minimum
statistics," IEEE Transactions on Speech and Audio Processing, July 2001, Vol. 9, No. 5, pp. 504-512
Ephraim, Y.; Malah, D., "Speech enhancement using a minimum mean-square error log-spectral
amplitude estimator," IEEE Transactions on Acoustics, Speech and Signal Processing, April 1985,
Vol. 33, No. 2, pp. 443-445, URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=1164550&isnumber=26190
Breithaupt, C.; Martin, R., "Analysis of the decision-directed SNR estimator for speech enhancement
with respect to low-SNR and transient conditions," IEEE Transactions on Audio, Speech, and Language
Processing, February 2011, Vol. 19, No. 2, pp. 277-289, URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5444986&isnumber=5609232
Cappé, O., "Elimination of the musical noise phenomenon with the Ephraim and Malah noise
suppressor," IEEE Transactions on Speech and Audio Processing, April 1994, Vol. 2, No. 2,
pp. 345-349, URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=279283&isnumber=6926
Loizou, P., "Speech Enhancement: Theory and Practice," CRC Press, Boca Raton, Florida
[0006]
Audio processing device, for example a hearing device such as a hearing aid. In a first aspect of
the present application, an audio processing device is provided. This audio processing device, for
example a hearing aid, comprises: at least one input unit providing a time-frequency representation
Y(k, n) of an electrical input signal representing a time-varying acoustic signal composed of a
target speech signal component S(k, n) from a target sound source TS and a noise signal component
N(k, n), where k and n are the frequency band index and the time frame index, respectively; and a
noise reduction system. The noise reduction system is configured to determine a first, a posteriori
signal-to-noise ratio estimate γ(k, n) of the electrical input signal, and to determine, from the
a posteriori signal-to-noise ratio estimate γ(k, n) and based on a recursive algorithm, a second,
a priori target signal-to-noise ratio estimate ζ(k, n) of the electrical input signal; the a priori
target signal-to-noise ratio estimate for the n-th time frame is determined from the a priori
target signal-to-noise ratio estimate ζ(k, n−1) for the (n−1)-th time frame and the a posteriori
signal-to-noise ratio estimate γ(k, n) for the n-th time frame.
[0007]
In one embodiment, the recursive algorithm implements a first-order IIR low-pass filter with unit
DC gain and an adaptive time constant (or low-pass cutoff frequency).
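As an illustration of such a filter (a sketch, not the patented implementation; the function name
and the per-frame coefficient array are assumptions):

    import numpy as np

    def smooth_adaptive(x, alpha):
        """First-order IIR low-pass filter with unit DC gain and a
        time-varying coefficient: y[n] = alpha[n]*y[n-1] + (1-alpha[n])*x[n].
        The two coefficients sum to 1, so a constant input passes unchanged
        (unit DC gain); alpha[n] near 1 means a long time constant."""
        y = np.empty(len(x))
        state = x[0]
        for n in range(len(x)):
            state = alpha[n] * state + (1.0 - alpha[n]) * x[n]
            y[n] = state
        return y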
[0008]
In a second aspect of the present application, an audio processing device is provided.
This audio processing device, for example a hearing aid, comprises: at least one input unit
providing a time-frequency representation Y(k, n) of an electrical input signal representing a
time-varying acoustic signal composed of a target speech signal component S(k, n) from a target
sound source TS and a noise signal component N(k, n), where k and n are the frequency band index
and the time frame index, respectively; and a noise reduction system. The noise reduction system is
configured, for each frequency band, to determine a first, a posteriori signal-to-noise ratio
estimate γ(k, n) of the electrical input signal, and to determine, from the a posteriori
signal-to-noise ratio estimate γ(k, n), a second, a priori target signal-to-noise ratio estimate
ζ(k, n) of the electrical input signal based on a recursive algorithm implementing a low-pass
filter with an adaptive time constant or low-pass cutoff frequency.
[0009]
In other words, the second, a priori target signal-to-noise ratio estimate ζ(k, n) is determined by
low-pass filtering the first, a posteriori signal-to-noise ratio estimate γ(k, n).
[0010]
In one embodiment, the adaptive time constant or low-pass cutoff frequency of the low-pass filter
is determined in dependence on the first, a posteriori signal-to-noise ratio estimate and/or the
second, a priori signal-to-noise ratio estimate.
[0011]
In one embodiment, the adaptive time constant or low-pass cutoff frequency of the low-pass filter
for a given frequency index k (also called frequency channel k) is determined in dependence on the
first, a posteriori signal-to-noise ratio estimate and/or the second, a priori signal-to-noise
ratio estimate corresponding to this frequency index k only.
[0012]
In one embodiment, the adaptive time constant or low-pass cutoff frequency of the low-pass filter
for a given frequency index k (also referred to as frequency channel k) is determined according to
a predetermined (or adaptive) scheme in dependence on the first, a posteriori signal-to-noise ratio
estimates and/or the second, a priori signal-to-noise ratio estimates corresponding to a plurality
of frequency indices k′, for example including at least the adjacent frequency indices k−1, k, k+1.
[0013]
In certain embodiments, the adaptive time constant or low-pass cutoff frequency of the low-pass
filter for a given frequency index k (also referred to as frequency channel k) is determined in
dependence on one or more detectors (e.g., an onset indicator, a wind noise detector, a voice
detector, etc.).
[0014]
In one embodiment, the low pass filter is a first order IIR low pass filter.
In one embodiment, the first order IIR low pass filter has unit DC gain.
[0015]
In one embodiment, the adaptive time constant or low-pass cutoff frequency of the low-pass filter
at a given time instant n is determined in dependence on the maximum likelihood estimate of the
second, a priori target signal-to-noise ratio estimate at that time instant and/or on the second,
a priori target signal-to-noise ratio estimate at the previous time instant n−1.
[0016]
This can lead to improved noise reduction.
[0017]
The noise signal component N(k, n) can originate, for example, from one or more sound sources NSi
(i = 1, ..., NS) other than the target sound source TS.
In one embodiment, the noise signal component N(k, n) includes late reverberation from the target
signal (e.g., target signal components reaching the user more than 50 ms later than the dominant
peak of the target signal component of interest).
[0018]
In other words, ζ(k, n) = F(ζ(k, n−1), γ(k, n)).
Using the most recent frame power, via the a posteriori SNR, in the determination of the a priori
SNR (SNR = signal-to-noise ratio) can be advantageous for SNR estimation at speech onsets, which
typically exhibit a large increase in SNR over a short period of time.
[0019]
In one embodiment, the noise reduction system is configured to determine the a priori target
signal-to-noise ratio estimate ζ(k, n) for the n-th time frame under the assumption that γ(k, n) is
greater than or equal to one.
In one embodiment, the a posteriori signal-to-noise ratio estimate γ(k, n) of the electrical input
signal Y(k, n) is defined as the ratio of the signal power spectral density |Y(k, n)|² of the
current value of the electrical input signal to the current noise power spectral density estimate
σ̂²(k, n) of the electrical input signal, i.e., γ(k, n) = |Y(k, n)|²/σ̂²(k, n).
[0020]
In one embodiment, the noise reduction system is configured to determine the a priori target
signal-to-noise ratio estimate ζ(k, n) for the n-th time frame from the a priori target
signal-to-noise ratio estimate ζ(k, n−1) for the (n−1)-th time frame and the maximum likelihood SNR
estimate ζ^ML(k, n) of the a priori target signal-to-noise ratio for the n-th time frame.
[0021]
In one embodiment, the noise reduction system comprises a maximum likelihood SNR estimator
ζ^ML(k, n) and a maximum operator MAX, and the maximum likelihood SNR estimate is determined as
ζ^ML(k, n) = MAX{ζ^ML_min(k, n); γ(k, n) − 1}, where ζ^ML_min(k, n) is a minimum value.
In one embodiment, the minimum value ζ^ML_min(k, n) of the maximum likelihood SNR estimator
ζ^ML(k, n) may depend, for example, on the frequency band index.
In one embodiment, the minimum value ζ^ML_min(k, n) is independent of the frequency band index.
In one embodiment, the minimum value ζ^ML_min(k, n) is taken to be equal to "1" (i.e., 0 dB on a
logarithmic scale).
This applies, for example, if the target signal component S(k, n) can be ignored, i.e., if only the
noise component N(k, n) is present in the input signal Y(k, n).
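In code form, this maximum operator is a one-liner (a sketch; the default floor is the 0 dB value
mentioned above):

    import numpy as np

    def snr_ml(gamma, xi_ml_min=1.0):
        """Maximum likelihood a priori SNR estimate from the a posteriori SNR:
        xi_ML(k, n) = max(xi_ML_min, gamma(k, n) - 1).
        xi_ml_min = 1 (0 dB) corresponds to the noise-only case above."""
        return np.maximum(xi_ml_min, gamma - 1.0)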
[0022]
In one embodiment, the noise reduction system is configured to determine the a priori target
signal-to-noise ratio estimate ζ by nonlinear smoothing of the a posteriori signal-to-noise ratio
estimate γ, or of a parameter derived from it. The derived parameter can, for example, be the
maximum likelihood SNR estimate SNR^ML.
Nonlinear smoothing can be performed, for example, by low-pass filtering with an adaptive low-pass
cutoff frequency, for example by means of a first-order IIR low-pass filter with unit DC gain and
an adaptive time constant.
[0023]
In one embodiment, the noise reduction system is configured to provide SNR-dependent smoothing,
with more smoothing under low-SNR conditions than under high-SNR conditions.
This configuration can have the advantage of reducing musical noise. The terms "low-SNR condition"
and "high-SNR condition" are intended to indicate first and second conditions such that the true
SNR under the first condition is lower than the true SNR under the second condition. In one
embodiment, "low-SNR condition" and "high-SNR condition" are taken to mean less than 0 dB and more
than 0 dB, respectively. The dependence of the time constant controlling the smoothing preferably
exhibits a gradual change with SNR. In one embodiment, the lower the SNR, the higher the time
constant(s) involved in the smoothing. SNR estimates are generally poorer under "low-SNR
conditions" than under "high-SNR conditions" (they are thus less reliable at low SNR, which
motivates additional smoothing).
[0024]
In one embodiment, the noise reduction system is configured to provide a negative bias relative to
ζ^ML_n under low-SNR conditions. This arrangement can have the advantage of suppressing the
audibility of musical noise during noise-only periods. The term "bias" in this context is used to
indicate the difference between the expectation value E(ζ^ML_n) of the maximum likelihood SNR
estimator ζ^ML(k, n) and the expectation value E(ζ_n) of the a priori signal-to-noise ratio
ζ(k, n). In other words, under "low-SNR conditions" (e.g., when the true SNR < 0 dB),
E(ζ^ML_n) − E(ζ_n) < 0 (e.g., as shown in FIG. 3).
[0025]
In one embodiment, the noise reduction system is configured to provide a recursive bias that
enables configurable switching from low- to high-SNR conditions and from high- to low-SNR
conditions.
[0026]
In logarithmic representation, the a priori signal-to-noise ratio for the n-th time frame is
expressed as sₙ = s(k, n) = 10·log(ζ(k, n)), and the corresponding maximum likelihood SNR estimate
for the n-th time frame can be expressed as s^ML_n = s^ML(k, n) = 10·log(ζ^ML(k, n)).
[0027]
In one embodiment, the noise reduction system is configured to determine the a priori target
signal-to-noise ratio estimate ζ(k, n) for the n-th time frame from the a priori target
signal-to-noise ratio estimate ζ(k, n−1) for the (n−1)-th time frame and the maximum likelihood SNR
estimate ζ^ML(k, n) for the n-th time frame according to a recursive algorithm, where ρ(sₙ₋₁)
represents a bias function or parameter of the (n−1)-th time frame, and λ(sₙ₋₁) represents a
smoothing function or parameter of the (n−1)-th time frame.
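The recursion itself is given by an equation not reproduced in this text. Purely as an
illustration, one log-domain update consistent with the description in [0027]-[0029] (λ acting as a
slope and ρ as a 0 dB-crossing offset, both evaluated at the previous estimate) could look as
follows; this is an assumption, not the patented formula:

    def dbsa_step(s_prev, s_ml, rho, lam):
        """Hypothetical log-domain DBSA update:
        s_n = s_{n-1} + lam(s_{n-1}) * (s_ml_n - s_{n-1} - rho(s_{n-1})).
        lam(.) controls the amount of smoothing, rho(.) the SNR-dependent
        bias; both are functions of the previous a priori estimate."""
        return s_prev + lam(s_prev) * (s_ml - s_prev - rho(s_prev))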
[0028]
In one embodiment, ρ(sₙ₋₁) is chosen to be equal to the value of the nonlinear function defined in
equation (8) at the corresponding value of ξ.
[0029]
In one embodiment, the smoothing function λ(sₙ₋₁) is chosen to be equal to the slope (with respect
to s^ML_n) of the function at its 0 dB crossing (i.e., where s^ML_n − sₙ = ρ(sₙ₋₁)); see the curve
in FIG. 3.
[0030]
In one embodiment, the audio processing device includes a filter bank having an analysis filter
bank that provides the time-frequency representation Y(k, n) of the electrical input signal.
In one embodiment, the electrical input signal is available as multiple frequency subband signals
Y(k, n), k = 1, 2, ....
In one embodiment, the a priori signal-to-noise ratio estimate ζ(k, n) also depends on the a
posteriori signal-to-noise ratio estimates γ(k, n) of adjacent frequency subband signals (e.g.,
γ(k−1, n) and/or γ(k+1, n)).
[0031]
In one embodiment, the analysis filter bank of the audio processing device is oversampled.
In one embodiment, the analysis filter bank is a DFT-modulated analysis filter bank.
[0032]
In one embodiment, the recursive loop of the algorithm that determines the a priori target
signal-to-noise ratio estimate ζ(k, n) for the n-th time frame includes higher-order delay
elements, e.g., circular buffers.
In one embodiment, the higher-order delay elements are configured to compensate for oversampling of
the analysis filter bank.
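A D-frame delay element of this kind can be sketched with a circular buffer (illustrative only; the
class name and the choice of D are assumptions):

    from collections import deque

    class FrameDelay:
        """Circular buffer acting as a D-frame delay in the recursive loop;
        a larger D mimics the dynamics of a less oversampled filter bank
        (cf. the additional D-frame delay of FIG. 8)."""
        def __init__(self, D, initial=0.0):
            self.buf = deque([initial] * max(D, 1), maxlen=max(D, 1))
        def __call__(self, x):
            oldest = self.buf[0]   # value from D frames ago
            self.buf.append(x)     # store current value, drop the oldest
            return oldest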
[0033]
In one embodiment, the noise reduction system is configured to adapt the algorithm that determines
the a priori target signal-to-noise ratio estimate ζ(k, n) for the n-th time frame so as to
compensate for the oversampling of the analysis filter bank.
In one embodiment, the algorithm includes a smoothing parameter (λ) and/or a bias parameter (ρ).
[0034]
In one embodiment, two functions λ and ρ control the amount of smoothing and the amount of SNR bias
as recursive functions of the estimated SNR.
[0035]
In one embodiment, the smoothing parameter (λ) and/or the bias parameter (ρ) are adapted to
compensate for the (frame) sampling rate (see, e.g., FIG. 5).
In one embodiment, different oversampling rates are compensated by adapting the parameter α (see,
e.g., FIG. 8).
[0036]
In one embodiment, the audio processing device includes a hearing device such as a hearing aid, a
headset, an earphone, an ear protection device, or a combination thereof.
[0037]
In one embodiment, the audio processing device is adapted to provide frequency-dependent gain
and/or level-dependent compression, and/or a transposition (with or without frequency compression)
of one or more frequency ranges to one or more other frequency ranges, e.g., to compensate for a
hearing impairment of the user and/or for a harsh acoustic environment.
In one embodiment, the audio processing device includes a signal processing unit that enhances the
input signal and provides a processed output signal.
[0038]
In one embodiment, the audio processing device includes an output unit that provides a stimulus
perceived by the user as an acoustic signal based on the processed electrical signal. In one
embodiment, the output unit comprises a plurality of electrodes of a cochlear implant or a vibrator
of a bone-conduction hearing device. In one embodiment, the output unit comprises an output
transducer. In one embodiment, the output transducer comprises a receiver (speaker) that provides
the stimulus to the user as an acoustic signal. In one embodiment, the output transducer comprises
a vibrator that provides the stimulus to the user as mechanical vibration of the skull bone (e.g.,
in a bone-attached or bone-anchored hearing device).
[0039]
In one embodiment, the audio processing device includes an input unit providing an electrical
input signal representing a sound. In one embodiment, the input unit comprises an input
transducer, such as a microphone, which converts the input sound into an electrical input signal.
In one embodiment, the input unit includes a wireless receiver that receives a wireless signal
including sound and provides an electrical input signal representative of the sound.
[0040]
In one embodiment, the audio processing device is a portable device, such as a device that
includes a local energy source, such as a battery, such as a rechargeable battery.
[0041]
In one embodiment, the a priori SNR estimates of a given hearing aid forming part of a binaural
hearing aid system are based on the a posteriori SNR estimates from both hearing aids of the
binaural hearing aid system.
In one embodiment, the a priori SNR estimates of a given hearing aid forming part of a binaural
hearing aid system are based on the a posteriori SNR estimate of that hearing aid and on the a
posteriori SNR estimate of the other hearing aid of the binaural hearing aid system.
[0042]
In one embodiment, the audio processing device includes a forward (or signal) path between an input
transducer (a microphone system and/or a direct electrical input (e.g., a wireless receiver)) and
an output transducer. In one embodiment, the signal processing unit is located in the forward path.
In one embodiment, the signal processing unit is adapted to provide frequency-dependent gain
according to the specific needs of the user. In one embodiment, the audio processing device
includes an analysis (or control) path with functional elements that analyze the input signal
(e.g., determining signal level, modulation, signal type, acoustic feedback estimates, etc.) and
possibly control the processing of the forward path. In one embodiment, part or all of the signal
processing of the analysis path and/or the signal path is performed in the frequency domain. In one
embodiment, part or all of the signal processing of the analysis path and/or the signal path is
performed in the time domain.
[0043]
In one embodiment, the analysis (or control) path operates on fewer channels (or frequency
subbands) than the forward path. This can save power in the audio processing device, which matters,
for example, in portable audio processing devices such as hearing aids, where power consumption is
an important parameter.
[0044]
In one embodiment, an analog electrical signal representing an acoustic signal is converted to a
digital audio signal in an analog-to-digital (AD) conversion process, wherein the analog signal is
sampled at a predetermined sampling frequency or rate fs, fs being, e.g., in the range from 8 kHz
to 48 kHz (adapted to the specific needs of the application), to provide digital samples xn (or
x[n]) at discrete points in time tn (or n), each audio sample representing the value of the
acoustic signal at tn by a predetermined number of bits Ns, e.g., 16 bits. A digital sample x has a
time length of 1/fs, e.g., 50 μs for fs = 20 kHz. In one embodiment, multiple audio samples are
arranged in a time frame. In one embodiment, a time frame includes 64 or 128 audio data samples.
Other frame lengths may be used depending on the actual application. In one embodiment, the frames
are shifted every 1 ms or every 2 ms, corresponding to oversampling (e.g., where critical sampling
(without frame overlap) corresponds to a frame length of 3.2 ms, e.g., fs = 20 kHz and 64 samples
per frame). In other words, the frames overlap so that only a particular portion of the samples,
such as 25%, 50% or 75% of the samples, is new from a given frame to the next frame.
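With the example numbers above (fs = 20 kHz, 64 samples per frame, 1 ms frame shift), the
oversampling arithmetic works out as follows (a worked example, not part of the patent text):

    fs = 20_000                    # sampling rate [Hz]
    frame_len = 64                 # samples per frame -> 64/20000 = 3.2 ms
    hop = int(fs * 0.001)          # 1 ms frame shift -> 20 new samples
    overlap = 1 - hop / frame_len  # = 0.6875 -> ~69% of samples reused
    oversampling = frame_len / hop # = 3.2x relative to critical sampling
    print(hop, overlap, oversampling)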
[0045]
In one embodiment, the audio processing device includes an analog-to-digital (AD) converter that
digitizes the analog input at a predetermined sampling rate, e.g., 20 kHz. In one embodiment, the
audio processing device includes a digital-to-analog (DA) converter that converts a digital signal
into an analog output signal, e.g., for presentation to the user via an output transducer.
[0046]
In one embodiment, the audio processing device, e.g., a microphone unit and/or a transceiver unit
thereof, comprises a TF conversion unit providing a time-frequency representation of the input
signal. In one embodiment, the time-frequency representation comprises an array or map of
corresponding complex or real values of the signal of interest in a particular time and frequency
range. In one embodiment, the TF conversion unit includes a filter bank that filters the
(time-varying) input signal to provide multiple (time-varying) output signals, each comprising a
different frequency range of the input signal. In one embodiment, the TF conversion unit includes a
Fourier transform unit that transforms a time-varying input signal into a (time-varying) signal in
the frequency domain. In one embodiment, the frequency range from a minimum frequency fmin to a
maximum frequency fmax considered by the audio processing device includes part of the typical human
audible frequency range of 20 Hz to 20 kHz, e.g., part of the range from 20 Hz to 12 kHz. In one
embodiment, the signal of the forward path and/or the analysis path of the audio processing device
is split into NI frequency bands, where NI is, e.g., greater than 5, such as greater than 10, 50,
100, 500, etc., at least some of which are processed individually. In one embodiment, the audio
processing device is adapted to process the signal of the forward path and/or the analysis path in
NP different frequency channels (NP ≤ NI). The frequency channels can have uniform or non-uniform
widths (e.g., increasing with frequency), and can be overlapping or non-overlapping.
[0047]
In one embodiment, the audio processing device includes a number of detectors configured to provide
status signals regarding the current physical environment of the audio processing device (e.g., the
current acoustic environment), the current state of the user wearing the audio processing device,
and/or the current state or operating mode of the audio processing device. Alternatively or
additionally, one or more detectors may form part of an external device in communication (e.g.,
wirelessly) with the audio processing device. The external device can include, for example, another
hearing aid, a remote control, an audio distribution device, a telephone (e.g., a smartphone), an
external sensor, and the like.
[0048]
In one embodiment, one or more of the plurality of detectors operate based on the full band
signal (time domain). In one embodiment, one or more of the plurality of detectors operate based
on the band split signal ((time-) frequency domain).
[0049]
In one embodiment, the plurality of detectors include level detectors that estimate the current
signal level of the forward path. In one embodiment, the predetermined criteria include whether
the current signal level of the forward path is above or below a given (L-) threshold.
[0050]
In a particular embodiment, the audio processing device includes a voice detector (VD) that
determines whether the input signal contains a voice signal (at a given point in time). A voice
signal in this context is taken to include a speech signal from a human being. It may also include
other forms of vocalization (e.g., singing) generated by the human speech system. In one
embodiment, the voice detector unit is adapted to classify the user's current acoustic environment
into a "voice" environment or a "no-voice" environment. This has the advantage that time segments
of the electrical microphone signal containing human utterances (e.g., speech) in the user's
environment can be identified and thus separated from time segments containing only other sound
sources (e.g., artificially generated noise). In one embodiment, the voice detector is adapted to
also detect the user's own voice as "voice". Alternatively, the voice detector is adapted to
exclude the user's own voice from the detection of "voice".
[0051]
In one embodiment, the audio processing device includes an own-voice detector that detects whether
a given input sound (e.g., voice) originates from the voice of the user of the system. In one
embodiment, the microphone system of the audio processing device is adapted to distinguish between
the user's own voice and the voice of another person, and possibly from "non-voice" sounds.
[0052]
In one embodiment, the hearing aid includes a classification unit configured to classify the
current situation based on input signals from (at least some of) the detectors and possibly other
inputs. The "current situation" in this context is taken to be defined by one or more of the
following: a) the current electromagnetic environment, e.g., the presence of electromagnetic
signals (including audio signals and/or control signals) intended or not intended to be received by
the audio processing device, or other non-acoustic properties of the current environment; b) the
current acoustic situation (input level, feedback, etc.); c) the current mode or state of the user
(movement, temperature, etc.); and d) the current mode or state of the hearing aid (selected
program, elapsed time since last user interaction, etc.) and/or of another device in communication
with the audio processing device.
[0053]
In certain embodiments, the audio processing device further includes other functionality relevant
to the application in question, e.g., compression, amplification, feedback reduction, and the like.
[0054]
In one embodiment, the audio processing device includes a hearing device, e.g., a hearing aid, a
hearing instrument adapted to be located fully or partially at the user's ear or in the user's ear
canal, a headset, an earphone, an ear protection device, or a combination thereof.
[0055]
Applications In an aspect, there is further provided use of an audio processing device as described
above, in the "Detailed Description of the Embodiments" and in the claims.
One embodiment provides use in a system comprising audio distribution.
One embodiment provides use in a system comprising one or more hearing aids, e.g., hands-free
telephone systems, teleconferencing systems, public address systems, karaoke systems, classroom
amplification systems, headsets, earphones, active ear protection systems, etc.
[0056]
Method In an aspect of the present application, there is further provided a method of estimating
the a priori signal-to-noise ratio ζ(k, n) of a time-frequency representation Y(k, n) of an
electrical input signal representing a time-varying acoustic signal composed of a target speech
component and a noise component, where k and n are the frequency band index and the time frame
index, respectively. The method comprises: determining an a posteriori signal-to-noise ratio
estimate γ(k, n) of the electrical input signal Y(k, n); and determining, from the a posteriori
signal-to-noise ratio estimate γ(k, n), an a priori target signal-to-noise ratio estimate ζ(k, n)
of the electrical input signal based on a recursive algorithm, wherein the a priori target
signal-to-noise ratio estimate ζ(k, n) for the n-th time frame is determined from the a priori
target signal-to-noise ratio estimate ζ(k, n−1) for the (n−1)-th time frame and the a posteriori
signal-to-noise ratio estimate γ(k, n) for the n-th time frame.
[0057]
In a further aspect of the present application, there is further provided a method of estimating
the a priori signal-to-noise ratio ζ(k, n) of a time-frequency representation Y(k, n) of an
electrical input signal representing a time-varying acoustic signal composed of a target speech
component and a noise component, where k and n are the frequency band index and the time frame
index, respectively. The method comprises: determining an a posteriori signal-to-noise ratio
estimate γ(k, n) of the electrical input signal Y(k, n); and determining, from the a posteriori
signal-to-noise ratio estimate γ(k, n), an a priori target signal-to-noise ratio estimate ζ(k, n)
of the electrical input signal based on a recursive algorithm implementing a low-pass filter with
an adaptive time constant or low-pass cutoff frequency.
[0058]
Some or all of the structural features of the device described above, in the detailed description
and/or in the claims, may be combined with embodiments of the method, when appropriately
substituted by a corresponding process, and vice versa. Embodiments of the method have the same
advantages as the corresponding devices.
[0059]
In one embodiment, an estimate Â(k, n) of the amplitude of the target speech component is
determined from the electrical input signal Y(k, n) multiplied by a gain function G, the gain
function G being a function of the a posteriori signal-to-noise ratio estimate γ(k, n) and the a
priori target signal-to-noise ratio estimate ζ(k, n).
[0060]
In one embodiment, the method includes the step of providing SNR-dependent smoothing, with more
smoothing under low-SNR conditions than under high-SNR conditions.
[0061]
In one embodiment, the method uses a smoothing parameter (λ) and/or a bias parameter (ρ) and/or a
bypass parameter (κ).
[0062]
In one embodiment, the smoothing parameter (λ) and/or the bias parameter (ρ) depend on the a
posteriori SNR γ, or on the spectral density |Y|² of the electrical input signal and the noise
spectral density σ̂².
In one embodiment, the smoothing parameter (λ) and/or the bias parameter (ρ) and/or the parameter κ
are selected depending on the user's hearing loss, cognitive skills or speech intelligibility
score.
In one embodiment, the smoothing parameter (λ) and/or the bias parameter (ρ) and/or the parameter κ
are selected such that more smoothing is provided the lower the hearing, cognitive skills or speech
intelligibility score of the user in question.
[0063]
In one embodiment, the method includes adjusting the smoothing parameter (λ) to account for
oversampling of the filter bank.
[0064]
In one embodiment, the method includes the step of causing the smoothing parameter and / or
the bias parameter to depend on whether the input is increasing or decreasing.
[0065]
In one embodiment, the method comprises providing that the smoothing parameter (λ) and/or the bias
parameter (ρ) and/or the parameter κ are selectable from a user interface.
In one embodiment, the user interface is implemented as a smartphone app.
[0066]
In one embodiment, the method provides pre-smoothing of the maximum likelihood SNR estimate
ζ^ML(k, n) of the a priori target signal-to-noise ratio estimate ζ(k, n) for the n-th time frame,
limited below by a selected minimum value ζ^ML_min.
This step is used to address the following cases.
[0067]
In one embodiment, the recursive algorithm is configured to allow the maximum likelihood SNR
estimate to bypass the a priori estimate of the previous frame in the calculation of the bias and
smoothing parameters.
In one embodiment, the recursive algorithm is configured to allow the current maximum likelihood
SNR estimate s^ML_n to bypass the a priori estimate sₙ₋₁ of the previous frame if the current
maximum likelihood SNR estimate s^ML_n minus a parameter κ is greater than the previous a priori
SNR estimate sₙ₋₁ (see FIG. 4). In one embodiment, the value supplied to the mapping unit MAP of
FIG. 4 is s^ML_n − κ, as shown in FIG. 4, but in another embodiment s^ML_n is supplied directly to
the mapping unit MAP if the condition (s^ML_n − κ > sₙ₋₁) is fulfilled. In one embodiment, the
recursive algorithm includes a maximum operator located in the recursive loop, which, via the
parameter κ, allows the maximum likelihood SNR estimate to bypass the a priori estimate of the
previous frame in the computation of the bias and smoothing parameters. This allows immediate
detection of (large) SNR onsets (thus reducing the risk of over-attenuation of speech onsets).
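A sketch of this bypass via the maximum operator (illustrative only; it uses the s^ML_n − κ variant
of FIG. 4, and the function name is an assumption):

    def bypass_max(s_ml, s_prev, kappa):
        """Maximum operator in the recursive loop ([0067]): if the current
        ML estimate minus kappa exceeds the previous a priori estimate, it
        bypasses the recursion so that large SNR onsets are tracked
        immediately instead of being smoothed away."""
        return max(s_ml - kappa, s_prev)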
[0068]
In one embodiment, the a posteriori signal-to-noise ratio estimate γ(k, n) of the electrical input
signal Y(k, n) is provided as a combined a posteriori signal-to-noise ratio generated as a mixture
of a first a posteriori signal-to-noise ratio and a second a posteriori signal-to-noise ratio.
Combinations of other quantities than the a posteriori estimates (e.g., the noise variance
estimates σ̂²) can also be used.
[0069]
In one embodiment, the two a posteriori signal-to-noise ratios are generated from a
single-microphone configuration and a multi-microphone configuration, respectively. In one
embodiment, the first a posteriori signal-to-noise ratio is updated faster than the second a
posteriori signal-to-noise ratio. In one embodiment, the combined a posteriori signal-to-noise
ratio is generated as a weighted mixture of the first and second a posteriori signal-to-noise
ratios. In one embodiment, the first and second a posteriori signal-to-noise ratios combined to
produce the a posteriori signal-to-noise ratio of the ipsilateral hearing aid originate from the
ipsilateral hearing aid and the contralateral hearing aid, respectively, of a binaural hearing aid
system.
[0070]
Computer readable medium In an aspect of the present application, there is provided a tangible
computer-readable medium storing a computer program comprising program code means which, when the
computer program is run on a data processing system, causes the data processing system to perform
at least some (such as a majority or all) of the steps of the method described above, in the
detailed description of the embodiments and in the claims.
[0071]
By way of example, and not limitation, such computer-readable media can comprise RAM, ROM, EEPROM,
CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or
any other medium that can be used to carry or store desired program code in the form of
instructions or data structures and that can be accessed by a computer.
As used herein, disk and disc include compact disc (CD), laser disc, optical disc, digital
versatile disc (DVD), floppy disk and Blu-ray disc, where disks usually reproduce data
magnetically, while discs reproduce data optically with lasers. Combinations of the above should
also be included within the scope of computer-readable media. In addition to being stored on a
tangible medium, the computer program can also be transmitted via a transmission medium, such as a
wired or wireless link or a network, e.g., the Internet, and loaded into a data processing system
to be run at a location different from that of the tangible medium.
[0072]
Computer program The present application further provides a computer program (product) comprising
instructions which, when the program is executed by a computer, cause the computer to carry out
(the steps of) the method described above, in the "detailed description of the embodiments" and in
the claims.
[0073]
Data Processing System In an aspect of the present application, there is further provided a data
processing system comprising a processor and program code which causes the processor to perform at
least some (such as a majority or all) of the steps of the method described above, in the Detailed
Description of the Embodiments and in the claims.
[0074]
Hearing System In another aspect, there is further provided a hearing system comprising an audio
processing device as described above, in the Detailed Description of the Embodiments and in the
claims, and an auxiliary device.
[0075]
In one embodiment, the system is adapted to establish a communication link between the audio
processing device and the auxiliary device, providing that information (e.g., control and status
signals, possibly audio signals) can be exchanged between them or forwarded from one to the other.
[0076]
In one embodiment, the audio processing device is a hearing device such as a hearing aid or
includes such a device.
In one embodiment, the audio processing device is or includes a telephone.
[0077]
In one embodiment, the auxiliary device is or comprises an audio gateway device adapted to receive
a multitude of audio signals (e.g., from an entertainment device such as a TV or a music player,
from a telephone apparatus such as a mobile telephone, or from a computer such as a PC) and adapted
to select and/or combine an appropriate one (or combination) of the received audio signals for
transmission to the audio processing device.
In one embodiment, the auxiliary device is or comprises a remote control for controlling the
functionality and operation of the audio processing device(s) or hearing device(s).
In one embodiment, the function of the remote control is implemented in a smartphone, possibly
running an app allowing control of the functionality of the audio processing device via the
smartphone (the audio processing device(s) comprising an appropriate wireless interface to the
smartphone, e.g., based on Bluetooth or some other standardized or proprietary scheme).
[0078]
In one embodiment, the auxiliary device is another audio processing device, such as a hearing
device such as a hearing aid.
In one embodiment, the hearing system includes two hearing devices adapted to implement a
binaural hearing system, such as a binaural hearing aid system.
[0079]
App In a further aspect, the present disclosure provides a non-transitory application, termed an
app. The app comprises executable instructions configured to be executed on an auxiliary device to
implement a user interface for a hearing device or a hearing system described above, in the
"detailed description of the embodiments" and in the claims. In one embodiment, the app is
configured to run on a cellular phone, e.g., a smartphone, or on another portable device allowing
communication with the hearing device or the hearing system.
[0080]
Definitions "Hearing device" in the present context receives an acoustic signal from the user's
surroundings, generates a corresponding audio signal, possibly modifies the audio signal,
possibly modifies the audio signal to at least one of the user By devices such as hearing aids or
active soundproofing devices or other audio processing devices adapted to improve, enhance and
/ or protect the user's hearing by providing them as audible signals to the ear. Furthermore, the
03-05-2019
22
"hearing device" is adapted to receive the audio signal electronically, possibly to correct the audio
signal, and to provide the possibly corrected audio signal as an audible signal to at least one ear
of the user It also refers to devices such as earphones or headsets. Such an audible signal may be
provided, for example, in the form of an acoustic signal that extends into the user's outer ear,
which is a mechanical vibration in the user's inner ear through the bone structure and / or
middle ear portion of the user's head. The electrical signal is transmitted directly or indirectly to
the cochlear nerve of the user.
[0081]
The hearing device may be configured to be worn in any known way, e.g., as a unit arranged behind
the ear with a tube leading radiated acoustic signals into the ear canal or with a speaker arranged
close to or in the ear canal, as a unit entirely or partly arranged in the pinna and/or in the ear
canal, as a unit attached to a fixture implanted in the skull bone, or as an entirely or partly
implanted unit. The hearing device may comprise a single unit or several units communicating
electronically with each other.
[0082]
More generally, a hearing device comprises an input transducer for receiving an acoustic signal
from the user's surroundings and providing a corresponding input audio signal, and/or a receiver
for electronically (i.e., wired or wirelessly) receiving an input audio signal, a (usually
configurable) signal processing circuit for processing the input audio signal, and an output means
for providing an audible signal to the user in dependence on the processed audio signal. In some
hearing devices, an amplifier may constitute the signal processing circuit. The signal processing
circuit typically comprises one or more (integrated or separate) memory elements for executing
programs and/or for storing parameters used (or potentially used) in the processing, and/or for
storing information relevant to the function of the hearing device, and/or for storing information
used, e.g., in conjunction with an interface to a user and/or to a programming device (such as
processed information provided by the signal processing circuit). In some hearing devices, the
output means may comprise an output transducer, such as a speaker for providing an airborne
acoustic signal or a vibrator for providing a structure-borne or liquid-borne acoustic signal. In
some hearing devices, the output means may comprise one or more output electrodes for providing
electrical signals.
[0083]
In some hearing devices, the vibrator may be adapted to provide a structure-borne acoustic signal
percutaneously to the skull bone. In some hearing devices, the vibrator may be implanted in the
middle ear and/or in the inner ear. In some hearing devices, the vibrator may be adapted to provide
a structure-borne acoustic signal to a middle-ear bone and/or to the cochlea. In some hearing
devices, the vibrator may be adapted to provide a liquid-borne acoustic signal to the cochlear
liquid, e.g., through the oval window. In some hearing devices, the output electrodes may be
implanted in the cochlea or on the inside of the skull bone and may be adapted to provide
electrical signals to the hair cells of the cochlea, to one or more auditory nerves, to the
auditory cortex and/or to other parts of the cerebral cortex.
[0084]
"Hearing system" means a system that includes one or more hearing devices, and "binaural
hearing system" includes two hearing devices to cooperatively provide audible signals to the
user's ears Means a system adapted as. Hearing system or binaural hearing system is in
communication with the hearing device (s) to influence the functioning of the hearing device (s)
and / or benefit from this function And “including”. For example, the auxiliary device may be a
remote control device, an audio gateway device, a mobile phone (e.g. a smart phone), an onpremises announcement system, a car audio system or a music player. Hearing devices, hearing
systems or binaural hearing systems are used, for example, to compensate for the hearing loss of
hearing impaired persons, to enhance or protect the hearing of normal hearing persons and / or
to convey electronic audio signals to people be able to.
[0085]
Embodiments of the present disclosure may be useful, for example, in applications such as hearing
aids, headsets, earphones, active ear protection systems, hands-free telephone systems, mobile
phones and the like.
[0086]
Aspects of the present disclosure can be best understood from the following detailed description in
conjunction with the accompanying figures.
The figures are schematic and simplified for clarity, and show only details that are essential to
the understanding of the claims, while other details have been omitted. Throughout, the same
reference numerals are used for identical or corresponding parts. The individual features of each
aspect may be combined with any or all features of the other aspects. These and other aspects,
features and/or technical effects will be apparent from and elucidated with reference to the
figures described below.
[0087]
FIG. 1A shows a single-channel noise reduction unit in which a single microphone picks up a mixture
y(t) of a target sound (x) and noise (v).
FIG. 1B shows a multi-channel noise reduction unit in which several microphones (M1, M2) pick up
the mixture y(t) of the target sound (x) and the noise (v).
FIG. 2 shows the average value [dB] of the maximum likelihood estimator ξ^ML_n as a function of the
true SNR [dB], illustrating the bias introduced into the maximum likelihood a priori SNR estimator
ξ^ML_n = max(ξ^ML_min, γ_n − 1) by the one-sided rectification.
FIG. 3 shows the input-output relationship (Δoutput = f(Δinput)) of the DD algorithm, obtained by
numerical evaluation of equation (7) representing the STSA [1] gain function (for α = 0.98).
FIG. 4 is a diagram of an example implementation of the proposed directed bias and smoothing
algorithm (DBSA, e.g., implemented by the unit Po2Pr).
FIG. 5 illustrates how ρ and λ can be derived from the parameters given by the decision-directed
approach.
FIG. 6A shows the slope λ of the given function representing the STSA gain function for α = 0.98.
FIG. 6B shows the zero crossing ρ of the given function representing the STSA gain function for
α = 0.98.
FIG. 7 shows a comparison of the response of the DBSA algorithm according to the present disclosure
(marks x) with the response of the DD algorithm (lines), using the functions adapted as in FIGS. 6A
and 6B; the curves correspond to a priori SNR values from −30 dB to +30 dB in steps of 5 dB.
FIG. 8 shows the DBSA algorithm (of FIG. 4) modified to adapt to filter bank oversampling by
inserting an additional D-frame delay into the recursive loop, with the goal of mimicking the
dynamic behavior of a system with less oversampling.
FIG. 9A illustrates an embodiment of an audio processing device, e.g., a hearing aid, according to
the present disclosure.
FIG. 9B illustrates an embodiment of a noise reduction system according to the present disclosure
for use in the audio processing device of FIG. 9A (for M = 2).
FIG. 10 illustrates the generation of a combined a posteriori signal-to-noise ratio from two a
posteriori signal-to-noise ratios, one generated from a single microphone channel and the other
generated from a multi-microphone configuration.
FIG. 11 illustrates an embodiment of a hearing aid according to the present disclosure comprising a
BTE part located behind the user's ear and an ITE part located in the user's ear canal.
FIG. 12A is a diagram of a first further exemplary implementation of the proposed directed bias and
smoothing algorithm (DBSA, e.g., implemented by the unit Po2Pr of FIGS. 1A, 1B and 9B).
FIG. 12B is a diagram of a second further exemplary implementation of the proposed directed bias
and smoothing algorithm.
FIG. 12C is a diagram of a third further exemplary implementation of the proposed directed bias and
smoothing algorithm.
FIG. 13A illustrates a general example of providing onset flags for use in the embodiments of the
DBSA algorithm shown in FIGS. 12A, 12B and 12C.
FIG. 13B illustrates an exemplary embodiment of an onset detector (controller) based on input from
adjacent frequency bands, providing an onset flag that can be used in the embodiments of the DBSA
algorithm shown in FIGS. 12A, 12B and 12C.
[0088]
The figures are schematic and simplified for clarity; they show only details that are essential to
the understanding of the present disclosure, while other details have been omitted. Throughout, the
same reference numerals are used for identical or corresponding parts.
[0089]
Further areas of applicability of the present disclosure will become apparent from the following
detailed description. However, it should be understood that the detailed description and the
specific examples, while indicating preferred embodiments of the present disclosure, are merely
exemplary. Other embodiments will be apparent to those of ordinary skill in the art from the
following detailed description.
[0090]
The detailed description set forth below in connection with the appended drawings is intended as
a description of various configurations. The detailed description includes specific details for the
purpose of providing a thorough understanding of the various concepts. However, it will be
apparent to one skilled in the art that these concepts may be practiced without these specific
details. Aspects of the apparatus and method are described by various blocks, functional units,
modules, components, circuits, steps, processes, algorithms, etc. (collectively referred to as
"elements"). These elements may be implemented using electronic hardware, computer programs,
or any combination thereof, depending on the particular application, design constraints or other
reasons.
[0091]
Electronic hardware may include microprocessors, microcontrollers, digital signal processors
(DSPs), field programmable gate arrays (FPGAs), programmable logic devices (PLDs), gated logic,
discrete hardware circuits, and other suitable hardware configured to perform the various functions
described throughout this disclosure. A computer program, whether referred to as software,
firmware, middleware, microcode, hardware description language, or otherwise, shall be construed
broadly to mean instructions, instruction sets, code, code segments, program code, programs,
subprograms, software modules, applications, software applications, software packages, routines,
subroutines, objects, executables, threads of execution, procedures, functions, etc.
[0092]
The present application relates to the field of hearing devices, for example hearing aids.
[0093]
Speech enhancement and noise reduction can be obtained by applying a fast-varying gain in the
time-frequency domain.
The purpose of applying the fast-varying gain is to suppress noise-dominated time-frequency tiles
while leaving speech-dominated time-frequency tiles unchanged. This increases the modulation of the
resulting enhanced signal, which typically becomes more similar to the modulation of the original
speech signal, resulting in higher speech intelligibility.
[0094]
Assume that the observed signal y(t) is the sum of the target speech signal x(t) (e.g., as picked
up by one or more microphones) and the noise v(t), and that it is fed to an analysis filter bank
(FBA; FBA1, FBA2), producing the frequency subband signals Y(n, k) corresponding to frequency index
k and time frame n (in the following, the frequency index k is omitted to simplify the notation;
see, e.g., FIGS. 1A and 1B). For example, Yₙ can include (or consist of) complex coefficients
obtained from a DFT filter bank. Spectral speech enhancement relies on an estimate of the amount of
target signal (X) compared to the amount of noise (N) in each time-frequency tile, i.e., on the
signal-to-noise ratio (SNR). In spectral noise reduction, the SNR is generally described using two
different terms: 1) the a posteriori SNR, defined as γₙ = |Yₙ|²/σ̂ₙ², where σ̂ₙ² is an estimate of
the spectral density of the noise (the noise power spectral variance) in the n-th time frame; and
2) the a priori SNR, defined as ξₙ = |Xₙ|²/σ̂ₙ², where |Xₙ|² is the spectral density of the target
signal. The a posteriori SNR requires an estimate of the power spectral density of the noise, while
the a priori SNR requires access to the power spectral densities of both the speech (|Xₙ|²) and the
noise (σ̂ₙ²). If the a priori SNR is available, an estimate of the target signal can be found for
each time-frequency unit as X̂ₙ = (ξₙ/(1 + ξₙ))·Yₙ, which represents the Wiener gain approach;
however, other SNR-based gain functions can be used as well. The terms a posteriori and a priori
signal-to-noise ratio are used, for example, in [4].
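For reference, the Wiener gain step can be sketched as follows (standard textbook form, consistent
with the expression above; the function name is illustrative):

    import numpy as np

    def wiener_enhance(Y, xi):
        """Wiener gain G = xi / (1 + xi) applied per time-frequency tile:
        X_hat(k, n) = G(k, n) * Y(k, n), with xi the a priori SNR."""
        G = xi / (1.0 + xi)
        return G * Y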
[0095]
FIG. 1A shows a single-channel noise reduction unit in which a single microphone (M) receives the
mixture y(t) of target sound (x) and noise (v), and FIG. 1B shows a multi-channel noise reduction
unit in which a plurality of microphones (M1, M2) receive the mixture y(t) of the target sound (x)
and the noise (v).
[0096]
The present disclosure assumes that analog-to-digital conversion units are applied as needed to
provide digital electrical input signals from the microphone(s).
Similarly, where appropriate, it is assumed that digital-to-analog conversion units are applied to
the output signal (e.g., to a signal to be converted into an acoustic signal by a speaker).
[0097]
The mixture(s) are converted into the frequency domain by respective analysis filter banks (shown
as FBA (analysis) in FIG. 1A and FBA1 (analysis), FBA2 (analysis) in FIG. 1B), providing the signal
Y(n, k) in FIG. 1A and the signals Y(n, k)1 and Y(n, k)2 in FIG. 1B. In each case, the a posteriori
SNR γ (γₙ in FIGS. 1A and 1B) is obtained as the ratio between the power spectral density |Yₙ|² of
the signal containing the target (provided by the respective magnitude-squared unit |·|²) and the
power spectral density estimate of the noise in the mixture (provided by the respective noise
estimation unit NT, shown as σ̂² in FIGS. 1A and 1B); see the combination unit "//" in FIGS. 1A and
1B. In the multi-microphone case (e.g., FIG. 1B), as illustrated by the output signal of the
beamformer filtering unit BFU of FIG. 1B, a linear combination of the microphone signals,
Y(n, k) = w(k)1·Y(n, k)1 + w(k)2·Y(n, k)2, can reduce the noise in the mixture, and another linear
combination of the microphone signals, aimed at target signal cancellation,
N(n, k) = w(k)3·Y(n, k)1 + w(k)4·Y(n, k)2, can be used to better estimate the residual noise.
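A sketch of this two-microphone structure (the weights are assumed given per frequency band, e.g.,
from a beamformer design; the function name is illustrative):

    import numpy as np

    def two_mic_snr_post(Y1, Y2, w_enh, w_block, eps=1e-12):
        """Two-microphone a posteriori SNR as in FIG. 1B: one linear
        combination enhances the target, a second (target-cancelling)
        combination leaves mainly noise, whose power is used as the
        noise estimate."""
        Y = w_enh[0] * Y1 + w_enh[1] * Y2      # beamformed (target-enhanced)
        N = w_block[0] * Y1 + w_block[1] * Y2  # target-cancelled residual
        return (np.abs(Y) ** 2) / np.maximum(np.abs(N) ** 2, eps)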
[0098]
The a priori signal-to-noise ratio (a priori SNR, ζn in FIGS. 1A, 1B) is determined by the transform unit Po2Pr, which implements the algorithm according to the present disclosure and is further described below. The a priori SNR is supplied to an SNR-to-gain conversion unit SNR2G, which provides the resulting current noise reduction gain GNR (e.g., based on the Wiener gain function). This gain is applied to the signal Y(n,k) (the input signal of FIG. 1A, or the spatially filtered signal of FIG. 1B) in the combination unit "X", thereby providing the noise-reduced signal YNR(n,k).
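A minimal sketch of the SNR-to-gain step and its application (the minimum gain value is an assumed, tunable constant):

import numpy as np

def snr_to_gain(xi, g_min=0.1):
    """SNR-to-gain unit (SNR2G): Wiener gain, lower-bounded by a minimum
    gain g_min (here 0.1, i.e., about -20 dB) to limit over-attenuation."""
    g = xi / (1.0 + xi)
    return np.maximum(g, g_min)

xi = np.array([0.1, 1.0, 10.0])                 # a priori SNR per channel
Y = np.array([0.2 + 0.1j, 0.5 - 0.3j, 1.0 + 0.0j])  # (spatially filtered) signal
Y_nr = snr_to_gain(xi) * Y                      # noise-reduced signal YNR(n, k)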
[0099]
If an estimate of the noise power spectral density (shown as <σ²> in FIGS. 1A and 1B) is available, the inductive SNR can be determined directly (see the combination (here division) unit "·/·" of FIGS. 1A and 1B). Direct access to the a priori SNR is, however, not possible, since the target power spectral density An² (where An denotes an estimate of the unknown target amplitude |Xn|) can usually not be accessed. To estimate the a priori SNR, the decision-directed (DD) algorithm has been proposed [1]:
ξn = α·(An−1² / <σn²>) + (1 − α)·max(0, γn − 1)   (1)
where An−1 is the estimate of the target signal amplitude in the (n−1)th time frame, <σn²> is the noise spectral variance (power spectral density) at the frequency considered, and α is a weighting factor. The above equation is a linear combination of two estimates of the a priori SNR ξn: 1) the recursive part An−1²/<σn²> (which, via An−1, generally depends on the previous estimate ξn−1), and 2) the non-recursive part max(0, γn − 1), where γ − 1 = (|Y|²/σ²) − 1 = (|Y|² − σ²)/σ². Typically, the weighting parameter α is chosen in the interval 0.94 to 0.99, but it may obviously depend on the frame rate and possibly on other parameters. The noise estimate is assumed to be available, e.g., from a noise activity detector / noise tracker (e.g., a level estimator functioning in frequency subbands and using an estimate of the noise level when speech is not detected, see [2], EP2701145) or from spectral noise estimators (e.g., [3]). The speech amplitude estimate is obtained using a speech estimator, several of which are available. In general, the speech estimator can be represented by a corresponding gain function G:
An = G(ξn, γn)·|Yn|   (2)
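The decision-directed recursion of equations (1) and (2) can be sketched in Python as follows; for illustration, the one-dimensional Wiener gain is used in place of the two-dimensional STSA/LSA gains of [1], [4], and the variable names are hypothetical:

import numpy as np

def decision_directed(Y, noise_psd, alpha=0.98):
    """Decision-directed a priori SNR estimation, eq. (1), using a Wiener
    gain for the amplitude estimate A_n = G(xi_n) * |Y_n| (eq. (2)).
    Y:         complex subband coefficients of one channel over time frames
    noise_psd: noise PSD estimate per frame (same length as Y)"""
    xi = np.empty(len(Y))
    A_prev = 0.0                       # amplitude estimate of the previous frame
    for n in range(len(Y)):
        gamma = np.abs(Y[n]) ** 2 / noise_psd[n]          # inductive SNR
        xi[n] = alpha * (A_prev ** 2 / noise_psd[n]) \
                + (1.0 - alpha) * max(gamma - 1.0, 0.0)   # eq. (1)
        A_prev = (xi[n] / (1.0 + xi[n])) * np.abs(Y[n])   # eq. (2), Wiener gain
    return xi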
[0100]
The gain function can be chosen depending on the cost function (the objective to be minimized) and on statistical assumptions about the speech and noise processes. Well-known examples are the STSA gain function [1], LSA [4], MOSIE [5], and the Wiener and spectral subtraction gain functions [5], [7]. STSA (minimum mean-square error Short-Time Spectral Amplitude estimator), LSA and MOSIE depend on both the (estimated) a priori SNR ξn and the recursive SNR γn, whereas the Wiener and spectral subtraction gain functions are one-dimensional and depend only on ξn. As shown in [5], An can be estimated using equation (3), known as the MOSIE estimator; in that formula, Γ(·) is the gamma function, Φ(a,b;x) is the confluent hypergeometric function, and β and μ are parameters of the estimator. Combining (2) and (3), the corresponding gain function G(ξn, γn) is obtained.
[0101]
With β = 0.001 and μ = 1, the LSA estimator (see, e.g., [4]) can be closely approximated (see, e.g., [5]). Thus, the a priori SNR estimated by the decision-directed approach is a smoothed version of max(0, γn − 1), where the smoothing depends on the smoothing factor α and on the estimator selected for obtaining An.
[0102]
As mentioned above, α can depend on the frame rate. The decision-directed approach, as originally proposed in [1], was designed for frames displaced every 8 milliseconds (ms). In hearing aids, frames are usually updated at a much higher frame rate (e.g., every millisecond). The high oversampling factor of such a filter bank can speed up the system response (e.g., to better preserve speech onsets). The advantage of this faster reaction time cannot be sufficiently exploited by merely adjusting α to the higher frame rate. Instead, we propose a method that works well when exploiting high oversampling factors.
[0103]
The DD algorithm (1) can be reformulated as a recursive function. Inserting An−1 = G(ξn−1, γn−1)·|Yn−1| from (2), and assuming a slowly varying noise PSD so that An−1²/<σn²> ≈ G²(ξn−1, γn−1)·γn−1, one obtains
ξn = α·G²(ξn−1, γn−1)·γn−1 + (1 − α)·max(0, γn − 1)   (4)
[0104]
As a first simplification, consider a slightly modified algorithm called DD*. The recursion in DD* is changed to depend only on the current frame observation and the previous a priori estimate:
ξn = α·G²(ξn−1, γn)·γn + (1 − α)·max(0, γn − 1)   (5)
[0105]
The effect of this modification on the a priori estimates can be quantified by numerical simulation (see later); the effects turn out to be generally small but audible. In fact, using the current frame's inductive SNR in the gain function appears to be useful for SNR estimation at speech onsets.
[0106]
Next, we consider the maximum likelihood SNR estimator, which represents the SNR value with the highest likelihood. Here we make the standard assumptions that the noise and speech processes are uncorrelated Gaussian processes and that the spectral coefficients are independent over time and frequency [1]. The maximum likelihood SNR estimator ξn^ML is then given by
ξn^ML = max(0, γn − 1)   (6)
[0107]
Note that the maximum likelihood estimator is not an unbiased (central) estimator, since its mean differs from the true value. The corresponding central estimator is ξn = γn − 1, which can take negative values.
[0108]
FIG. 2 shows the average value [dB] of the maximum likelihood estimate ξn^ML = max(ξmin^ML, γn − 1) as a function of the true SNR [dB], illustrating the bias introduced into the maximum likelihood deductive SNR estimator by the one-sided rectification. The target signal is assumed to be Gaussian. For noise-only inputs, the average estimated SNR equals E[ξ^ML] = e⁻¹ ≈ −4.3 dB (assuming ξmin^ML = 0, see also [5]); see the bias in FIG. 2. One advantage of the DD approach is that it compensates for this bias.
[0109]
Input-Output Relationship In the following, we propose a functional approximation of the DD* algorithm in equation (5). For mathematical convenience, we assume in the following that the rectification can be ignored, i.e., that ξn^ML = max(0, γn − 1) simplifies to ξn^ML = γn − 1, so that γn = ξn^ML + 1; this simplifies the non-recursive part of (5) to ξn^ML. It can be shown that the influence of this assumption on the outcome is in practice small. Thus, ignoring the case γn < 1, the DD* algorithm of equation (5) can be written as a function of ξn^ML:
ξn = α·G²(ξn−1, ξn^ML + 1)·(ξn^ML + 1) + (1 − α)·ξn^ML   (7)
That is, a function Ψ expresses the relative change of the a priori estimate as a function of the ratio of the current ξn^ML to the previous a priori SNR estimate ξn−1:
ξn / ξn−1 = Ψ(ξn^ML / ξn−1; ξn−1)   (8)
[0110]
Expressing the SNR ratios on a logarithmic (dB) scale, the above relation represents the non-linear input-output relationship implemented by the DD* algorithm.
[0111]
FIG. 3 shows the input-output relationship (Δoutput = f(Δinput)) of the DD* algorithm, obtained by numerical evaluation of equation (7) for the STSA [1] gain function (for α = 0.98). Smoothing is effective at low a priori SNR estimates (e.g., the curve shown as −30 dB), since the change in output is small for moderate input changes. Furthermore, as can be seen from the zero crossings at non-zero abscissas, a bias is introduced: the average estimated a priori SNR is smaller than the average maximum likelihood SNR estimate. The term "bias" is often used for the difference between an expected value E(·) and the "true" reference value, but it is here used for the difference between E(ξn^ML) and E(ξn). FIG. 3 thus gives a graphical relationship from which the difference (or ratio) between the current a priori SNR estimate ξn and the previous a priori SNR estimate ξn−1 (the output) can be determined from knowledge of the difference (or ratio) between the current maximum likelihood estimate ξn^ML and the previous a priori SNR estimate ξn−1, together with the absolute value of the previous a priori SNR estimate ξn−1 (the input).
[0112]
FIG. 3 reveals two salient effects. First, at low a priori SNR values (e.g., the curve shown as ξn−1 = −30 dB), the output change is smaller than the input change, so that low-pass filtering / smoothing of the maximum likelihood SNR estimate ξn^ML is effectively implemented; at high a priori SNR values (ξn−1 = +30 dB), the DD* a priori SNR estimate ξn follows changes in ξn^ML almost directly, and the smoothing is very small. Second, the zero crossing of the curves representing low a priori SNR values is displaced towards positive dB values of up to about 10 dB. This means that, in the low SNR region, the a priori SNR estimate ξn settles at about 10 dB below the average value of ξ^ML.
[0114]
From a graph as shown in FIG. 3, the values of the smoothing parameter (λDD) and the bias parameter (ρ), described later, can be read for the curves relating to the a priori SNR values ξn−1 = −30 dB, ξn−1 = 0 dB and ξn−1 = +30 dB.
The bias parameter ρ is read from the zero crossing of the graph with the horizontal axis (cf. equation (9) below; ρ equals minus the zero-crossing abscissa). The smoothing parameter λDD is recognized as the slope of the graph of interest at the zero crossing. These values are extracted and stored, for example, in a table over the relevant values of the a priori SNR (see, e.g., the mapping unit MAP in FIG. 4).
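A minimal numerical sketch of this extraction follows; it evaluates the DD* input-output curve of equation (7) with a Wiener gain (FIG. 3 uses STSA, so the values differ) and reads off ρ and λ at the zero crossing; all names are hypothetical:

import numpy as np

def delta_out_db(delta_in_db, xi_prev_db, alpha=0.98):
    # DD* relative change of the a priori estimate in dB, cf. eqs. (7)/(8)
    xi_prev = 10.0 ** (xi_prev_db / 10.0)
    xi_ml = xi_prev * 10.0 ** (delta_in_db / 10.0)
    g = xi_prev / (1.0 + xi_prev)              # Wiener gain G(xi_{n-1})
    xi_new = alpha * g ** 2 * (xi_ml + 1.0) + (1.0 - alpha) * xi_ml
    return 10.0 * np.log10(xi_new / xi_prev)

xi_prev_db = -30.0                             # curve of interest (cf. FIG. 3)
d_in = np.linspace(-40.0, 40.0, 8001)
d_out = np.array([delta_out_db(d, xi_prev_db) for d in d_in])
i0 = int(np.argmin(np.abs(d_out)))             # index of the zero crossing
rho = -d_in[i0]                                # bias: minus the crossing abscissa
lam = np.gradient(d_out, d_in)[i0]             # smoothing: slope at the crossing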
[0115]
FIG. 4 is a diagram of an exemplary implementation of the proposed directed bias and smoothing algorithm (DBSA), implemented in the transform unit Po2Pr.
[0116]
Directed Bias and Smoothing Algorithm (DBSA) FIG. 4 is a diagram of the proposed directed bias and smoothing algorithm (DBSA, implemented by unit Po2Pr), aiming to provide a configurable alternative implementation of the DD approach that includes the following three main effects of DD:
1. SNR-dependent smoothing, allowing more smoothing in low SNR conditions and thereby reducing musical noise.
2. A negative bias relative to ξn^ML in low SNR conditions, reducing the audibility of musical noise during noise-only periods.
3. A recursive bias that allows fast switching of SNR conditions from low to high and from high to low.
[0117]
The DBSA algorithm operates on SNR estimates in the dB domain, thus introducing sn = 10·log10(ξn) as well as sn^ML = 10·log10(ξn^ML).
[0118]
The central part of the proposed algorithm embodiment is a first-order IIR low-pass filter with unit DC gain and an adaptive time constant. Two functions, λ(sn) and ρ(sn), control the amount of smoothing and the amount of SNR bias as recursive functions of the estimated SNR.
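A minimal sketch of such a smoother; in DBSA the input x would be sn^ML + ρ(sn−1) and the coefficient lam would be λ(sn−1), both read from the mapping unit described below:

def iir_smooth_step(s_prev, x, lam):
    """One step of a first-order IIR low-pass filter with unit DC gain:
    s_n = (1 - lam) * s_prev + lam * x, with 0 < lam <= 1.
    A small lam gives strong smoothing (a long time constant)."""
    return (1.0 - lam) * s_prev + lam * x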
[0119]
In the following, the control functions are derived to mimic the input-output relationship of the DD* system described above. Denoting the a priori SNR and the maximum likelihood SNR expressed in dB by sn and sn^ML, and ignoring the maximum operation for the moment (κ → ∞), the DBSA input-output relationship is defined by
sn = sn−1 + λ(sn−1)·(sn^ML + ρ(sn−1) − sn−1)   (9)
[0120]
Therefore, identifying DBSA with the DD* method corresponds to the following approximation:
λ(sn−1)·(sn^ML + ρ(sn−1) − sn−1) ≈ 10·log10 Ψ(ξn^ML / ξn−1; ξn−1)   (10)
[0121]
In order to completely specify the DBSA in (10), it is necessary to specify the bias function ρ(sn) and the smoothing function λ(sn). Since the purpose is to mimic the behaviour of the DD* method, one can, for example, measure the location of the zero crossing of the function 10·log10 Ψ (evaluated as a function of ξn^ML) and the slope at this location, and choose ρ(sn) and λ(sn) to have the corresponding values. The bias function ρ(sn−1) is therefore selected such that the zero crossing occurs at sn^ML − sn−1 = −ρ(sn−1).
[0122]
Similarly, the smoothing function λ(sn−1) is set equal to the slope of the curve in FIG. 3 (as a function of sn^ML) at the position of its 0 dB crossing (i.e., where sn^ML − sn−1 = −ρ(sn−1)).
[0123]
FIG. 4 shows an implementation of the directed bias and smoothing algorithm (DBSA) that replaces the DD approach.
The dashed box in the upper right portion of FIG. 4 represents a first-order IIR low-pass filter with unit DC gain and variable smoothing factor λ (λn−1 in FIG. 4). The mapping unit MAP, which provides the input to this part (the smoothing parameter λ and the bias parameter ρ), the combination unit "+" (providing the signal sn^ml − sn−1), and the first-order IIR low-pass filter together implement equation (10) above (see the indication "from equation (10)" in FIG. 4). The two mapping functions λ(s) and ρ(s) (see mapping unit MAP) control the amount of smoothing (λ) and bias (ρ), respectively, as recursive functions of the estimated a priori SNR (sn−1 (ζn−1) in FIG. 4). The left part of FIG. 4, which yields the maximum likelihood value ζn^ml of the a priori SNR of the nth time frame, implements equation (6) above (see the indication "from equation (6)" in FIG. 4). The maximum likelihood value ζn^ml of the a priori signal-to-noise ratio is transformed into the logarithmic domain by the "dB" unit. The mapping unit MAP may, for example, be implemented as a memory including a look-up table (or equivalent data material) containing the values of the smoothing parameter λ and the bias parameter ρ extracted from graphs of the a priori SNR as in FIG. 3 (see the indication "from FIG. 3" in FIG. 4), e.g., for a wider range and/or for more values, such as one curve per 5 dB or one curve per dB. An implementation of the (off-line) algorithm for calculating the associated smoothing parameter λ and bias parameter ρ stored in the memory of the mapping unit MAP is shown in FIG. 5. The embodiment of FIG. 4 further comprises a bypass branch, implemented by the unit BPS, for large values of the current maximum likelihood value sn^ml (ζn^ml) of the a priori SNR. The bypass unit BPS includes a combination unit "+" and a maximum operator unit "max". The combination unit "+" takes the parameter κ as input: the value of κ is subtracted from the current maximum likelihood value sn^ml, and the resulting value sn^ml − κ is supplied to the maximum value unit max along with the previous value sn−1 of the a priori SNR. Thereby, a relatively large current maximum likelihood value sn^ml (ζn^ml) of the a priori SNR (greater than sn−1 + κ) can immediately affect the input to the mapping unit MAP.
In one embodiment, the parameter κ is frequency dependent (i.e., different for different frequency channels k, for example).
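As an illustration, a single DBSA update in the dB domain might be sketched in Python as follows; the mapping functions and the value of κ are hypothetical placeholders for the tabulated values of the mapping unit MAP, not the patented parameterization:

import numpy as np

def dbsa_step(s_prev, s_ml, rho_fun, lam_fun, kappa=8.0):
    """One DBSA update in the dB domain (sketch of FIG. 4).
    s_prev: previous a priori SNR estimate s_{n-1} [dB]
    s_ml:   current maximum likelihood SNR s_n^ML [dB]
    kappa:  onset bypass parameter [dB] (assumed value)"""
    s_map = max(s_ml - kappa, s_prev)        # bypass branch BPS ("max" unit)
    rho = rho_fun(s_map)                     # bias [dB] from mapping unit MAP
    lam = lam_fun(s_map)                     # smoothing factor from MAP
    return s_prev + lam * (s_ml + rho - s_prev)   # eq. (9)

# Hypothetical mapping functions (in practice tabulated from FIG. 3):
rho_fun = lambda s: np.interp(s, [-30.0, 0.0, 30.0], [-10.0, -3.0, 0.0])
lam_fun = lambda s: np.interp(s, [-30.0, 0.0, 30.0], [0.05, 0.3, 0.9])

s = -20.0                                    # initial a priori SNR [dB]
for s_ml in [-15.0, -14.0, 10.0, 12.0]:      # ML SNR track [dB]
    s = dbsa_step(s, s_ml, rho_fun, lam_fun)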
[0124]
FIG. 5 shows how the bias parameter ρ and the smoothing parameter λ can be derived from the parameters of the decision-directed method; it shows an embodiment of an algorithm for generating the association data of the mapping unit MAP of FIG. 4. This algorithm determines the bias parameter ρ and the smoothing parameter λ from the current maximum likelihood value sn^ml (ζn^ml) of the a priori SNR and the previous a priori SNR value sn−1. As opposed to having a single mapping of ρ and λ, one can choose to have different sets of ρ and λ depending on whether the input is increasing or decreasing. This corresponds to having different attack and release values for ρ and λ; such sets of parameters can be derived from different values of α corresponding to different attack and release times and stored in the mapping unit MAP (see the sketch after this paragraph). To allow direct use of the values of the smoothing parameter λ stored in the mapping unit MAP, it is preferable to implement a compensation of the smoothing parameters to account for frame rates (or frame lengths) different from those used in the LSA method [4], as described below. This is further described below, for example in connection with the filter bank oversampling discussion (FIG. 8).
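A minimal sketch of the attack/release selection (which bias term is used inside the test is an assumption; cf. FIG. 12B below):

def select_parameter_set(s_prev, s_ml, atk, rel):
    """Choose the attack or release mapping set (rho_fun, lam_fun) depending
    on whether the (bias-corrected) SNR is increasing or decreasing."""
    rho_atk, lam_atk = atk
    rho_rel, lam_rel = rel
    if s_ml + rho_atk(s_prev) - s_prev > 0.0:   # SNR increasing -> attack set
        return rho_atk(s_prev), lam_atk(s_prev)
    return rho_rel(s_prev), lam_rel(s_prev)     # SNR decreasing -> release set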
[0125]
FIGS. 6A and 6B show the slope λ and the zero crossing ρ, respectively, of the function Ψ for the STSA gain function [1]; in both cases α = 0.98 is used. FIG. 7 shows a comparison of the response of the DBSA algorithm according to the present disclosure (marked x) with the response of the DD* algorithm (lines), using the adapted functions of FIGS. 6A and 6B; curves are shown for deductive SNR values from −30 dB to +30 dB in steps of 5 dB.
[0126]
FIG. 6 shows the results of this numerical evaluation, and FIG. 7 compares the input-output response of the DD* algorithm with that of the DBSA algorithm. As the simulations in a later section show, the difference is in most cases very small.
[0127]
Consider the case where the observed SNR is low (γn < 1, so that ξn^ML = 0). In DBSA, this case is captured by the minimum value ξmin^ML, which limits the impact of such observations. Referring again to equation (2): in general, the class of gain functions that can be represented as powers of the Wiener gain function, G = (ξ/(1+ξ))^μ, tends to zero for ξ → 0. This property makes the bias of the DD algorithm extremely negative, and it can be mimicked in DBSA using relatively small values of ξmin^ML.
[0128]
On the other hand, for the STSA, LSA and MOSIE gain functions, the gain remains non-zero in this limit, and gains greater than 0 dB can occur at low SNR. This effect can be addressed to some extent by a larger ξmin^ML. In practice, the difference between the DD* method and the DBSA can be made negligible.
[0129]
Numerical Problems In some cases (usually at low a priori SNR values), the function 10·log10 Ψ does not have a zero crossing. This reflects the limits of the range of a priori SNR values that the system can actually generate. One particular example occurs when the gain function is limited by some minimum gain value Gmin. Substituting this minimum gain into equation (5) yields a lower bound on ξn, and hence on Ψ.
[0130]
Thus, if ξn−1 is sufficiently small, the function Ψ becomes greater than 1, which also means that the function 10·log10 Ψ has no zero crossing. The numerical implementation needs to detect this situation and substitute suitable look-up table values for ρ(sn) and λ(sn). In practice, the exact values used are not critical, as these entries are most likely only sampled during convergence from the initial state.
[0131]
Maximum operator, etc. In FIG. 4, the maximum operator is located in the recursive loop and allows the maximum likelihood SNR estimate of the current frame to bypass the previous a priori value (via the parameter κ) in the calculation of the bias and smoothing parameters. The motivation for this element is to help detect SNR onsets and thereby reduce the risk of over-attenuating speech onsets. In equation (1) of the DD approach, the term (1−α) allows a rapid reduction of the negative bias when a large onset occurs in the current frame; the maximum operator imitates this behaviour and is controlled by the parameter κ. Thus, the coefficient κ can be used to bypass the smoothing. By increasing κ, speech onsets can be better preserved. On the other hand, as κ increases, the noise floor may increase. However, the increased noise floor only matters when a large amount of attenuation is applied. Thus, the value chosen for κ depends on the maximum amount of attenuation chosen.
[0132]
Instead of the maximum operator ("max" in FIGS. 4, 5 and 8), a more general selection scheme can also be used to identify (rapid) changes in SNR (e.g., onsets); see, e.g., the "Select" unit of the embodiments shown in FIGS. 12A, 12B and 12C. Such a general scheme may, for example, take into account sudden events (changes) in the acoustic environment (e.g., the sudden appearance or removal of a noise source (e.g., wind noise) or of other sound sources, such as a speech source, e.g., one's own voice; see, e.g., FIG. 13A), and/or take into account changes in the signal across a number of frequency bands around the frequency band considered (e.g., all frequency bands are evaluated and a logic criterion is applied to provide the resulting onset flag for the frequency band of interest; see, e.g., FIG. 13B).
[0133]
Filter Bank Oversampling The filter bank parameters have a large impact on the results of the DD
approach. Oversampling is a key parameter to consider as it directly affects the amount of
smoothing effect and bias introduced into the a priori SNR estimates.
[0134]
The literature does not adequately describe how to correct for filter bank oversampling in the DD
approach. In the first formulation [1], a 256 point FFT was used with a Hanning window, an
overlap of 192 samples corresponding to 4 × oversampling, and an 8 kHz sample rate. In
general, 2 × oversampling (50% frame overlap) is common, see [1] and references therein.
However, in hearing aids and other low delay applications, oversampling by a factor of 16 or
more is not unrealistic.
[0135]
All other conditions being equal, oversampling suppresses the recursive effects of the DD method and the DBSA method. In the limit of "infinite" oversampling, the recursive bias is replaced by an asymptotic bias function.
[0136]
One possible approach to oversampling compensation is to downsample the DD/DBSA estimate by a factor proportional to the oversampling, keeping the a priori estimate constant over multiple frames. A disadvantage of this approach is that the resulting gain jumps may degrade sound quality when used in combination with an oversampled filter bank: with oversampling, an equivalent synthesis filter may be insufficient to attenuate the convolutive noise introduced by the gain jumps.
[0137]
In the DBSA method, the combination of directed recursive smoothing and directed recursive bias controls the temporal behaviour (that is, the response of the SNR estimate in terms of smoothing and onsets). A theoretically accurate, though computationally demanding, method of handling the filter bank oversampling is to insert higher-order delay elements (circular buffers) in the recursive loop, as shown in FIG. 8.
[0138]
FIG. 8 shows a modification of the DBSA algorithm (shown in FIG. 4) to match the filter bank oversampling; the purpose of inserting an additional D-frame delay in the recursive loop is to imitate the dynamic behaviour of a system with less oversampling (see the sketch after this paragraph). Compared with the embodiments of the DBSA algorithm illustrated in FIGS. 4, 5 and 8, the embodiments shown in FIGS. 12A, 12B and 12C differ in that the maximum operator is replaced by a selection operator ("Select") that can be controlled by, e.g., an onset flag. As opposed to the maximum operator, which only affects the local frequency channel k, the onset flag can be qualified according to a predetermined or adaptive (e.g., logic) scheme (see, e.g., FIG. 13A) and can depend on multiple "control inputs", including other frequency channels (see, e.g., FIG. 13B). In one embodiment, the bias parameter κ is frequency dependent (i.e., different for different frequency channels k).
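Under the assumption that the delayed estimate is used both in the recursion and as input to the mapping (one possible reading of FIG. 8), the D-frame delay could be sketched as follows; the class name, κ value and initial state are hypothetical:

from collections import deque

class DelayedDBSA:
    """DBSA recursion with an additional D-frame delay in the loop
    (sketch of FIG. 8), mimicking a system with less oversampling."""
    def __init__(self, D, rho_fun, lam_fun, kappa=8.0, s_init=-20.0):
        self.buf = deque([s_init] * (D + 1), maxlen=D + 1)  # circular buffer
        self.rho_fun, self.lam_fun, self.kappa = rho_fun, lam_fun, kappa

    def step(self, s_ml):
        s_del = self.buf[0]                     # delayed a priori estimate s_{n-1-D}
        s_map = max(s_ml - self.kappa, s_del)   # onset bypass
        s_new = s_del + self.lam_fun(s_map) * (s_ml + self.rho_fun(s_map) - s_del)
        self.buf.append(s_new)                  # oldest entry drops out
        return s_new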
[0139]
FIG. 12A is a diagram of a first further exemplary implementation of the proposed directed bias and smoothing algorithm (DBSA, implemented, e.g., by the unit Po2Pr of FIGS. 1A, 1B and 9B). As opposed to the maximum operator, which only affects the local frequency channel k, the onset flag may also depend on other frequency channels (see, e.g., FIG. 13B). The advantage of the onset flag is that onset information detected in a small number of frequency channels with high SNR can be propagated to frequency channels with low SNR (assuming that an onset affects many frequency channels simultaneously). This allows the onset information to be applied more quickly in low-SNR frequency channels. In one embodiment, a broadband onset detector can be used as well (or as an input to the onset flag criterion of a given frequency channel k). Alternatively, it is, for example, an indication of an onset if the bias-corrected latest SNR value sn^ML − κ (its maximum likelihood ("deductive") estimate) is higher than the previous ("deductive") SNR value sn−1 in a plurality of the K frequency channels (for example, the channel k of interest and the adjacent channels on both sides, e.g., k−1, k+1; see FIG. 13B). The determination of the onset flag of a given frequency channel k may take into account frequency channels other than the immediately adjacent channels and/or other onset indicators. In one embodiment, the onset flag of a particular frequency channel k is determined depending on whether a local onset was detected in at least q channels, where q is a number between 1 and K.
[0140]
FIG. 12B is a diagram of a second further exemplary implementation of the proposed directed bias and smoothing algorithm (DBSA, implemented, e.g., by the unit Po2Pr of FIGS. 1A, 1B and 9B). In addition to their dependence on the SNR, λ and ρ can also depend on whether the SNR is increasing or decreasing. If the SNR increases, as indicated by sn^ML + ρn−1 − sn−1 > 0, one set of λ(s) and ρ(s), namely λatk(s) and ρatk(s), is selected; if the SNR decreases, as indicated by sn^ML + ρn−1 − sn−1 < 0, another set of λ(s) and ρ(s), namely λrel(s) and ρrel(s), is selected. Exemplary shapes of the parameters λ(s) and ρ(s) are shown in FIGS. 6A and 6B, respectively.
[0141]
Furthermore, in another preferred embodiment, the "Select" unit need not depend solely on the detected onset. The selection unit may also rely on detected own voice, wind noise, or any combination of the mentioned (or other) detectors (see, e.g., FIG. 13A).
[0142]
FIG. 12C is a diagram of a third further exemplary implementation of the proposed directed bias and smoothing algorithm (DBSA, implemented, e.g., by the unit Po2Pr of FIGS. 1A, 1B and 9B). In addition to their dependence on the SNR, λ and ρ can also depend on another indicator of whether the SNR is increasing or decreasing. If the SNR increases, as indicated by sn^ML − sn−1 > 0, one set of λ and ρ, namely λatk and ρatk, is selected; if the SNR decreases, as indicated by sn^ML − sn−1 < 0, another set of λ and ρ, namely λrel and ρrel, is selected.
[0143]
FIG. 13A shows a general example of providing an onset flag for use in the embodiments of the DBSA algorithm shown in FIGS. 12A, 12B and 12C. An audio processing device, such as a hearing aid, comprises ND detectors or indicators (IND1, ..., INDND) providing a plurality of indicators (signals IX1, ..., IXND) of the onset of a change of the acoustic scene around the audio processing device, such a change being capable of changing the SNR of the signal considered by the forward path of the audio processing device. Such indicators may, for example, include a general onset detector that detects abrupt changes in the time-varying input sound s(t) (see, e.g., FIG. 9A), e.g., in its modulation, a wind noise detector, audio detectors such as an own-voice detector, etc., and combinations thereof. The outputs (IX1, ..., IXND) from the indicators (IND1, ..., INDND) are supplied to a controller (CONTROL) implementing an algorithm that provides the resulting onset indicator (signal onset flag) for a given frequency channel k. A particular implementation (or partial implementation) of such a scheme is shown in FIG. 13B.
[0144]
FIG. 13B shows an exemplary embodiment of a controller (CONTROL) based on inputs from adjacent frequency bands, providing an onset flag that can be used in the embodiments of the DBSA algorithm shown in FIGS. 12A, 12B and 12C. The illustrated scheme provides input indicator signals (IXp, ..., IXq) comprising indicators that evaluate the change of the SNR over time, as indicated by whether sn^ML(k′) − κ > sn−1(k′) is satisfied, over a number of frequency bands k′ around the considered frequency band k (e.g., k′ = k−1, k and k+1, or only one of these, or "two out of three", etc.); alternatively, the expression can be evaluated for all frequency bands k = 1, ..., K (or, e.g., for a selected range where speech and/or noise is expected to occur), and a logic criterion is applied to provide the resulting onset flag for the frequency band. In one embodiment, only the bands directly adjacent to a given channel k are considered, i.e., three channels are included in providing the onset flag for each channel (see the sketch after this paragraph). In one embodiment, such a scheme is combined with the inputs from other detectors, as mentioned in connection with FIG. 13A. In one embodiment, the expression sn^ML(k′) − κ > sn−1(k′), or another similar expression, is evaluated for multiple frequency channels around the channel of interest, e.g., all channels, and a scheme providing the resulting onset flag is applied to the input indicators (IXp, ..., IXq). The bias constant κ can be constant over frequency, or can be different for each channel, or different for some channels.
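A minimal sketch of such a cross-band onset criterion (the values of κ and q are assumptions):

import numpy as np

def onset_flags(s_ml, s_prev, kappa=8.0, q=2):
    """Cross-band onset flag (sketch of FIG. 13B): channel k is flagged if a
    local onset, s_ml - kappa > s_prev, is detected in at least q of the
    channels {k-1, k, k+1}. s_ml, s_prev: per-channel SNR values [dB]."""
    local = (np.asarray(s_ml) - kappa) > np.asarray(s_prev)
    padded = np.pad(local, 1, constant_values=False)
    # count local onsets among each channel and its two neighbours
    counts = padded[:-2].astype(int) + padded[1:-1] + padded[2:]
    return counts >= q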
[0145]
Advantages of the Proposed Implementation The proposed implementation has the following advantages over the decision-directed approach:
• The smoothing parameters can be adjusted to take filter bank oversampling into account, which is important for implementation in low-delay applications such as hearing aids.
• The smoothing and bias do not depend on the selected gain function; instead, the parameterization of the two mapping functions directly controls the smoothing λ(s) and the bias ρ(s). This allows each mapping function to be tuned separately for the desired trade-off between noise reduction and sound quality. For example, by overemphasizing the bias, the target energy can be better maintained. The parameters can also be set to address a given range of SNRs of interest. Such parameter settings can be chosen differently for individual users, as some users benefit primarily from noise reduction (in terms of varying gain) in the low-SNR region and do not require noise reduction at high signal-to-noise ratios, whereas other users may need noise reduction in the high signal-to-noise ratio region and constant attenuation in the low signal-to-noise ratio region.
• As an extension of the proposed system, the smoothing and bias parameters can also depend on whether the input is increasing or decreasing; that is, different attack and release values of the two parameters can be used.
• Changing the decision-directed approach to rely only on the current frame observation and the previous a priori estimate appears to be advantageous for SNR estimation at speech onsets. Similarly, the maximum operator controlled by the parameter κ can be used to reduce the risk of over-attenuating speech onsets. The value chosen can depend on the chosen maximum attenuation.
• The case γn < 1 is addressed by clamping ξn^ML at the chosen minimum ξmin^ML before smoothing.
• The noise estimator may rely on single-channel input, multi-channel input, or both, and/or on binaural input (see, e.g., FIG. 10). The DBSA parameters can be adjusted differently depending on whether the noise estimator relies on a single-channel or a multi-channel input.
[0146]
FIG. 9A shows an embodiment of an audio processing device APD, such as a hearing aid, according to the present disclosure. A time-varying input sound s(t), assumed to contain a mix of the target signal component x(t) and the noise signal component v(t), is picked up, processed by the audio processing device, and provided to the user in processed form as an audible signal. The audio processing device of FIG. 9A, here a hearing aid, comprises a number of input units IUi, i = 1, ..., M, each of which provides an electrical input signal Si representing the sound s(t) in a time-frequency representation (k,m). In the embodiment of FIG. 9A, each input unit IUi contains an input transducer ITi which converts the input sound si from the environment (as received at the input unit IUi) into an electrical time-domain signal s'i, i = 1, ..., M. The input unit IUi further contains an analysis filter bank FBAi which converts the electrical time-domain signal s'i into a plurality of frequency subband signals (k = 1, ..., K). The hearing aid further includes a multi-input noise reduction system NRS that provides a noise-reduced signal YNR based on the M electrical input signals Si, i = 1, ..., M. The multi-input noise reduction system NRS includes a multi-input beamformer filtering unit BFU, a post-filter unit PSTF, and a control unit CONT. The multi-input beamformer filtering unit BFU (and the control unit CONT) receives the electrical input signals Si, i = 1, ..., M, and provides the signals Y and N. The control unit CONT includes a memory MEM storing the complex weights Wij. The complex weights Wij define possible fixed beamformers of the beamformer filtering unit BFU and are provided to the BFU via the signal Wij (see, for example, FIG. 9B). The control unit CONT further includes one or more voice activity detectors VAD for estimating whether a given input signal (e.g., a given time-frequency unit of the input signal) contains (or is dominated by) speech. The control signals V-N1 and V-N2 are supplied to the beamformer filtering unit BFU and the post-filtering unit PSTF, respectively. The control unit CONT receives the electrical input signals Si, i = 1, ..., M from the input units IUi and receives the signal Y from the beamformer filtering unit BFU.
The signal Y contains an estimate of the target signal component, and the signal N contains an estimate of the noise signal component. The (single-channel) post-filtering unit PSTF receives the (spatially filtered) target signal estimate Y and the (spatially filtered) noise signal estimate N, and provides the noise-reduced target signal estimate YNR (further) based on the knowledge of the noise extracted from the noise signal estimate N. The hearing aid further comprises a signal processing unit SPU which (further) processes the noise-reduced signal and provides a processed signal ES. The signal processing unit SPU can be configured to apply level- and frequency-dependent shaping of the noise-reduced signal YNR, for example to compensate for the hearing impairment of the user. The hearing aid further comprises a synthesis filter bank FBS which converts the processed frequency subband signal ES into a time-domain signal es, which is supplied to an output unit OT providing a stimulus es(t) perceivable by the user. In the embodiment of FIG. 9A, the output unit comprises a speaker providing the processed signal es as sound to the user. Here, the forward path from the input units to the output unit of the hearing aid is operated in the time-frequency domain (processed in the plurality of frequency subbands FBk, k = 1, ..., K). In another embodiment, the forward path from the input unit to the output unit of the hearing aid can be operated in the time domain. The hearing aid may further include a user interface and one or more detectors, allowing user inputs and detector inputs to be received by the noise reduction system NRS, e.g., by the beamformer filtering unit BFU. An adaptive functionality of the beamformer filtering unit BFU can thereby be provided.
[0147]
FIG. 9B is a block diagram of an embodiment of a noise reduction system NRS for use in an audio processing device according to the present disclosure, such as the hearing aid of FIG. 9A (for M = 2); it illustrates the exemplary embodiment of the noise reduction system of FIG. 9A in further detail. FIG. 9B shows an embodiment of an adaptive beamformer filtering unit (BFU) according to the present disclosure. The beamformer filtering unit includes first (unidirectional) and second (target-cancelling) beamformers (shown as fixed BFO and fixed BFC in FIG. 9B and symbolized by corresponding beam patterns). The first and second fixed beamformers provide beamformed signals O and C, respectively, as linear combinations of the first and second electrical input signals S1 and S2, and represent first and second beam patterns defined by respective sets of complex weighting coefficients (Wo1(k)*, Wo2(k)*) and (Wc1(k)*, Wc2(k)*) stored in the memory unit (MEM) (see memory unit MEM and signal Wij of the control unit CONT of FIG. 9A); * indicates the complex conjugate. The beamformer filtering unit (BFU) further includes an adaptive beamformer (adaptive BF, ADBF) which provides an adaptive coefficient βada(k) representing an adaptively determined beam pattern. By combining the fixed beamformers of the beamformer filtering unit BFU with the adaptive beamformer, the (adaptive) estimate of the resulting target signal Y is provided as Y = O − βada·C. The beamformer filtering unit (BFU) further comprises a voice activity detector VAD1 which indicates whether (or with what probability) the input signal (here O, or one of the signals Si) contains audio content (e.g., speech), and which provides a control signal V-N1 (for example based on the signal O or one of the input signals Si) that allows the noise estimate <σc²> of the adaptive beamformer (here based on the target-cancelled beamformer C) to be updated during time segments where speech is not indicated (or its probability is low).
[0148]
Thus, the (spatially filtered or beamformed) target signal estimate Y from the resulting beamformer filtering unit can be expressed as:
Y(k) = O(k) − βada(k)·C(k) = (Wo1*·S1 + Wo2*·S2) − βada(k)·(Wc1*·S1 + Wc2*·S2)
[0149]
However, it may be computationally more advantageous to calculate only the actual resultant
weights provided to each microphone signal, rather than calculating the different beamformers
used to obtain the resulting signal.
[0150]
The embodiment of the post-filtering unit of FIG. 9B receives the input signals Y (spatially filtered target signal estimate) and <σc²> (noise power spectrum estimate) and provides the output signal YBF (noise-reduced target signal estimate).
The post-filtering unit PSTF includes a noise estimation and correction unit N-COR which improves the noise power spectrum estimate <σc²> received from the beamformer filtering unit to provide an improved noise power spectrum estimate <σ²>. This improvement derives from the use of the voice activity detector VAD2 (see signal V-N2) to indicate time-frequency units of the spatially filtered target signal estimate Y that contain no speech. The post-filtering unit PSTF further includes an amplitude-squaring (|·|²) processing unit and a division (·/·) processing unit, providing the target signal power spectrum estimate |Y|² and the recursive signal-to-noise ratio γ = |Y|²/<σ²>, respectively. The post-filtering unit PSTF further comprises a transform unit Po2Pr, which implements the algorithm according to the present disclosure and transforms the recursive signal-to-noise ratio estimate γ into the a priori signal-to-noise ratio estimate ζ. The post-filtering unit PSTF further includes a transform unit SNR2G configured to transform the a priori signal-to-noise ratio estimate ζ into the corresponding gain GNR to be applied to the spatially filtered target signal estimate (here by the multiplication unit "X"), providing the resulting noise-reduced target signal estimate YBF. For simplicity, FIG. 9B does not show the frequency index k and the time index n. It is assumed, however, that the processed signals, e.g., |Yn|², <σn²>, γn, ζn, GNR,n, refer to the corresponding time frames.
[0151]
The multi-input noise reduction system includes a multi-input beamformer filtering unit BFU, and
the single channel post filtering unit PSTF can be implemented as described in [2], for example,
with the modifications proposed in this disclosure.
[0152]
In the embodiment of FIG. 9B, the noise power spectrum <σ²> is based on a two-microphone beamformer (the target-cancelling beamformer C), but it may alternatively be based on a single-channel noise estimate, e.g., using modulation analysis (e.g., a voice activity detector).
[0153]
FIG. 10 shows an input stage (e.g., of a hearing aid) comprising microphones M1 and M2 connected to respective analysis filter banks FBA1 and FBA2 (as described in connection with FIG. 1B) and providing the respective mixed electrical input frequency subband signals Y(n,k)1 and Y(n,k)2.
The electrical input signals Y(n,k)1 and Y(n,k)2 based on the first and second microphone signals are supplied to a multi-input (here two-input) recursive signal-to-noise calculation unit (APSNR-M) providing the multi-input recursive SNR γn,m (of the nth time frame), for example as described in connection with FIG. 1B.
One of the two electrical input signals Y(n,k)1 and Y(n,k)2, or a third, different electrical input signal (e.g., a beamformed signal, or a signal based on a third microphone, such as the microphone of the opposite hearing aid or a separate microphone) is supplied to a single-input recursive signal-to-noise calculation unit (APSNR-S) providing the single-input recursive SNR γn,s (of the nth time frame), for example as described in connection with FIG. 1A. The two recursive SNRs γn,m and γn,s are supplied to a mixing unit MIX, which generates the (resulting) recursive signal-to-noise ratio γn,res combined from these two recursive signal-to-noise ratios. In general, the combination of two independent inductive estimates yields a better estimate than each estimate alone. Since the multi-channel estimate γn,m is usually more reliable than the single-channel estimate γn,s, less smoothing is required for the multi-channel estimate than for the single-channel noise estimate. Thus, different parameters ρ (bias), λ (smoothing) and κ (onset bypass) (see FIGS. 3 and 4) are required for the multi-microphone recursive SNR estimate γn,m and the single-microphone recursive SNR estimate γn,s. The mixture of the two estimates providing the resulting inductive SNR estimate γn,res can, for example, be provided as a weighted sum of the two estimates γn,m and γn,s (see the sketch after this paragraph).
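A minimal sketch of such a weighted combination (the weight w being an assumed, tunable constant favouring the usually more reliable multi-channel estimate):

import numpy as np

def mix_inductive_snr(gamma_multi, gamma_single, w=0.7):
    """Weighted sum of the multi-microphone and single-microphone
    inductive SNR estimates (sketch of the MIX unit of FIG. 10)."""
    return w * np.asarray(gamma_multi) + (1.0 - w) * np.asarray(gamma_single)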
[0154]
In embodiments of a binaural hearing aid system, the inductive SNR, the a priori SNR, or noise estimates or gains from the contralateral hearing aid can be transmitted to the ipsilateral hearing aid and used there.
[0155]
The a priori estimate can thus depend on the inductive estimate, the a priori estimate, or the noise estimate (or gain estimate) from the contralateral hearing aid, in addition to the inductive estimate from the ipsilateral hearing aid.
Again, an improved a priori SNR estimate can be obtained by combining different independent
SNR estimates.
[0156]
FIG. 11 illustrates an embodiment of a hearing aid that includes a BTE portion located behind the
user's ear and an ITE portion located within the user's ear canal, according to the present
disclosure.
[0157]
In FIG. 11, an exemplary hearing aid (HD) is shown (for example illustrating the hearing aid HD of FIG. 9A), configured as a receiver-in-the-ear (RITE) type hearing aid comprising a BTE part (BTE) adapted to be located behind the pinna and an ITE part (ITE) comprising an output transducer (e.g., a speaker/receiver, SPK) adapted to be located in the user's ear canal.
The BTE part (BTE) and the ITE part (ITE) are connected (e.g., electrically connected) by a connection element (IC). In the hearing aid embodiment of FIG. 11, the BTE part includes two input transducers (here, microphones) (MBTE1, MBTE2), each providing an electrical input audio signal representing the input sound signal (SBTE) from the environment (in the scenario of FIG. 11, from the sound source S). The hearing aid of FIG. 11 further comprises two wireless receivers (WLR1, WLR2) providing respective directly received auxiliary audio and/or information signals. The hearing aid (HD) further includes a substrate (SUB) on which a number of electronic components (analog, digital, passive, etc.) are mounted, functionally divided according to the intended application, including a configurable signal processing unit (SPU), a beamformer filtering unit (BFU) and a memory unit (MEM), coupled to each other and to the input and output units via conductors Wx. The functional units (and other components) mentioned may be partitioned into circuits and components according to the application in question (for example, in view of size, power consumption, analog or digital processing, etc.), e.g., integrated in one or more integrated circuits, or implemented as a combination of one or more integrated circuits and one or more separate electronic components (e.g., inductors, capacitors, etc.). The configurable signal processing unit (SPU) provides an enhanced audio signal (see signal ES in FIG. 9A) intended for presentation to the user. In the hearing aid device embodiment of FIG. 11, the ITE part (ITE) includes an output unit in the form of a speaker (receiver) (SPK) which converts the electrical signal (es in FIG. 9A) into an acoustic signal (providing, or contributing to, the acoustic signal SED at the ear drum). In one embodiment, the ITE part further includes an input unit comprising an input transducer (e.g., a microphone) (MITE) providing an electrical input audio signal representing the input sound signal SITE from the environment at or in the ear canal. In another embodiment, the hearing aid may include only the BTE microphones (MBTE1, MBTE2). In yet another embodiment, the hearing aid may include an input unit (IT3) located elsewhere than at the ear canal, in combination with one or more input units located in the BTE part and/or the ITE part.
The ITE part further includes a guiding element, e.g., a dome (DO), for guiding and positioning the ITE part in the user's ear canal.
[0158]
The hearing aid (HD) illustrated in FIG. 11 is a portable device and further includes a battery (BAT) supplying power to the electronic components of the BTE part and the ITE part.
[0159]
The hearing aid (HD) comprises a directional microphone system (beamformer filtering unit (BFU)) adapted to enhance a target sound source among the many sound sources in the local environment of the user wearing the hearing aid device.
In one embodiment, the directional system is adapted to detect (e.g., adaptively detect) which direction a particular part of the microphone signal (e.g., a target part and/or a noise part) originates from, and/or to receive input on the user's current target direction from a user interface (e.g., a remote control device or a smartphone). The memory unit (MEM) contains predetermined (or adaptively determined) complex, frequency-dependent constants defining predetermined (or adaptively determined) "fixed" beam patterns and the beamformed signal Y (see, e.g., FIGS. 9A, 9B) according to the present disclosure.
[0160]
The hearing aid of FIG. 11 may constitute or form part of a hearing aid system and / or a
binaural hearing aid system according to the present disclosure.
[0161]
The hearing aid (HD) according to the present disclosure can include a user interface UI, e.g., implemented in an auxiliary device (AUX) as shown in FIG. 11, such as a remote control device implemented as an application ("app") in a smartphone or other portable (or stationary) electronic device.
In the embodiment of FIG. 11, the screen of the user interface (UI) shows a smoothing beamforming app. Parameters that govern or influence the current smoothing of the signal-to-noise ratio of the beamforming noise reduction system, here the parameters ρ (bias) and λ (smoothing) (see FIGS. 3 and 4), can be controlled via the "Directivity. Smoothing beamforming" app (screen titled "Configure smoothing parameters"). The bias parameter ρ can be set to a value between a minimum (e.g., 0) and a maximum (e.g., 10 dB) via a slider; the screen shows the current setting (here 5 dB) as the position of the slider on a bar spanning the range of configurable values. Similarly, the smoothing parameter λ can be set to a value between a minimum (e.g., 0) and a maximum (e.g., 1) via a slider; the screen shows the current setting (here 0.6) as the position of the slider on a bar spanning the range of configurable values. The arrows at the bottom of the screen allow switching to the previous and next screens of the app, and a tap on the circular dot between the two arrows calls up a menu that allows selection of features of other apps or of the device. The parameters ρ and λ for smoothing are not necessarily visible to the user; the set of ρ, λ can be derived from a third parameter (e.g., a "quiet-to-aggressive" noise reduction bar, or a setting via an environment detector).
[0162]
The auxiliary device and the hearing aid are adapted to allow communication of data representing the currently selected smoothing parameters to the hearing aid, e.g., via a wireless communication link (see dashed arrow WL2 in FIG. 11). The communication link WL2 is implemented by suitable antenna and transceiver circuitry in the hearing aid (HD) and the auxiliary device (indicated by the transceiver unit WLR2 in the hearing aid), and can be based on, e.g., wireless technology such as Bluetooth or Bluetooth Low Energy (or similar technology). The communication link may be configured to provide one-way communication (e.g., from the app to the hearing aid) or two-way communication (e.g., of audio and/or control or information signals).
[0163]
The structural features of the device described above in the detailed description and / or claims
can be combined with the method steps if appropriately replaced by the corresponding process.
[0164]
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well (i.e., to mean "at least one"), unless the context clearly indicates otherwise. It will further be understood that the terms "comprises", "comprising", "includes" and/or "including", when used herein, specify the presence of stated features, integers, steps, operations, elements and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or groups thereof. It will also be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may be present, unless expressly stated otherwise. Furthermore, "connected" or "coupled" as used herein may include wirelessly connected or coupled. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items. The steps of any disclosed method are not limited to the exact order stated herein, unless expressly stated otherwise.
[0165]
Reference throughout this specification to "one embodiment" or "an embodiment", or to a feature that "may be included", means that a particular feature, structure or characteristic described in connection with that embodiment is included in at least one embodiment of the present disclosure. Furthermore, the particular features, structures or characteristics may be combined as appropriate in one or more embodiments of the present disclosure. The previous description is provided to enable any person skilled in the art to practice the various aspects described herein. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects.
[0166]
The claims are not intended to be limited to the embodiments shown herein, but are to be accorded the full scope consistent with the language of the claims, wherein reference to an element in the singular is not intended to mean "one and only one" unless specifically so stated, but rather "one or more". Unless specifically stated otherwise, the term "some" refers to one or more.
[0167]
Accordingly, the scope of the present disclosure should be determined in view of the following
claims.
[0168]
y(t) mix
x(t) target sound
v(t) noise
M microphone
FBA analysis filter bank
Y(n,k) time-frequency domain signal
|·|² amplitude-squaring calculation unit
NT noise estimation unit
<σ²> noise power spectral density estimate
·/· combination (division) unit
SNR signal-to-noise ratio
γn recursive SNR
Po2Pr conversion unit
ζn a priori SNR
SNR2G gain conversion unit
GNR noise reduction gain
YNR(n,k) noise reduction signal