where M and N denote the number of rows in the cell matrix and the
row length of the cell matrix, respectively, and a, p the transition
probability of the state. Thus, the improved cell loss rate using the
proposed cell loss recovery method, denoted by P, is
P = Pi,,, - L l / M
Fig. 2 shows the improved cell loss rate with the proposed method
when the coding matrix (M = 30, N = 40) is employed.

position and amplitude of the preceding pitch pulse. The algorithm
used attempts to reduce the number of computations and to improve
the accuracy of pitch detection. The discrimination criterion is given by
$$r_a \cdot \mathrm{Amp}_f + r_p \cdot \mathrm{PP}_f > p \qquad (1)$$
where $r_a$ and $r_p$ are the similarity ratios of amplitude and position,
respectively, $\mathrm{Amp}_f$ is the pitch amplitude, $\mathrm{PP}_f$ is the pitch period,
and p is a threshold value. Then according to the acoustic phonetics and pitch information of Mandarin speech, the isolated Mandarin syllable is divided into three segments: consonant-segment,
vowel-segment, and residual-segment. In every segment, only one
representative frame was selected for speech recognition. The principle of selecting the representative frames is as follows:
(i) In a consonant-segment, based on the experimental observation, the first representative frame is decided by selecting M sample points of the whole consonant part before the first pitch peak.
(ii) In a vowel-segment, we select N representative pitch peaks as
the second frame in place of the whole section.
(iii) In a residual-segment, to maintain the residual part of speech
signal, the final pitch peak to the end of the speech is chosen to be
the third frame.
The feature vectors of the LPC cepstrum are then obtained from
the three frames.
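The three selection rules above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the choice of the M samples immediately preceding the first pitch peak, and the choice of the first N pitch peaks for the vowel frame are all assumptions, since the text does not say which samples or peaks are taken.

```python
from typing import List, Sequence, Tuple


def select_representative_frames(
    signal: Sequence[float],
    pitch_peaks: Sequence[int],
    m_samples: int,
    n_peaks: int,
) -> Tuple[List[float], List[float], List[float]]:
    """Return (consonant, vowel, residual) frames for one syllable.

    `pitch_peaks` holds the sample indices of detected pitch peaks in
    ascending order. Which M samples and which N peaks are used is an
    assumption of this sketch.
    """
    if not pitch_peaks:
        raise ValueError("at least one pitch peak is required")
    if n_peaks < 1:
        raise ValueError("n_peaks must be at least 1")

    first_peak = pitch_peaks[0]
    last_peak = pitch_peaks[-1]

    # (i) Consonant frame: assumed to be the M samples just before the first peak.
    start = max(0, first_peak - m_samples)
    consonant = list(signal[start:first_peak])

    # (ii) Vowel frame: assumed to span the first N pitch peaks (inclusive).
    used = pitch_peaks[:n_peaks]
    vowel = list(signal[used[0]:used[-1] + 1])

    # (iii) Residual frame: from the final pitch peak to the end of the syllable.
    residual = list(signal[last_peak:])
    return consonant, vowel, residual
```

Each returned frame would then be passed to an LPC cepstrum routine, which is outside the scope of this sketch.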
© IEE 1996
29 April 1996
Electronics Letters Online No: 19961029
Hyo Taek Lim (Department of Computer Science and Engineering,
Dongseo University, Pusan, 617-716, Korea)
DaeHun Nyang and JooSeok Song (Department of Computer Science,
Yonsei University, Seoul, 120-749, Korea)
Mandarin speech recognition using
d cepstral comparison in noisy
conditions
Shin-Lun Tung and Yau-Tarng Juang
Indexing terms: Speech recognition, Cepstral analysis
A new scheme is proposed that compensates for the effects of
noise in speech recognition systems. The new scheme was applied
to Mandarin speech recognition. Another scheme, based on
interpolation of the compensation vectors of several environments
for a particular environment that is not obtained during the
training phase, called interpolated SSDCN (ISSDCN), is also
presented. Experimental results show that the scheme performs
well under different SNR conditions.
Introduction: In recent years, speech recognition systems that are robust with respect to adverse environments have attracted an increasing amount of interest. A variety of approaches have been considered in the development of noisy-speech recognition systems, including techniques based on spectral subtraction [1], the use of a comb filter [2], a family of distortion measures [3], and the use of cepstral normalisation [4, 5]. Among these approaches, a series of normalisation algorithms have been developed that reduce the effects of environmental variations on recognition accuracy [4, 5]. The normalisation algorithms based on cepstral comparison assume that differences between the training and testing environments can be characterised by an additive correction to the cepstral vectors that represent the speech.
In this Letter, we apply the cepstral normalisation algorithm to Mandarin speech and call the new scheme segment-based SNR-dependent cepstral normalisation (SSDCN). However, in some conditions the testing environment does not closely resemble any single environment in the training set. In that case, interpolation of the compensation vectors of several environments may be more useful. On this basis, an interpolated SSDCN (ISSDCN) algorithm is also proposed in this Letter.
Pitch-based Mandarin speech recognition: The scheme focuses on the pitch-based segmental model for Mandarin speech recognition. The main process is to obtain the representative frames that are important and necessary for speech recognition. First, we designed a pulse-based pitch detector to extract the pitch period. The detector predicts the location of successive pitch pulses based on the
Segment-based SNR-dependent cepstral normalisation: In this Section, we describe the proposed algorithms, referred to as SNR-dependent normalisation procedures, which compensate for environmental variation based on the different SNR and segment compensation vectors.
SSDCN: A segment-based SNR-dependent cepstral normalisation algorithm applies an additive correction in the cepstral domain that depends on the instantaneous SNR of the segmental frame for pitch-based Mandarin speech recognition. When a Mandarin syllable from some unknown environment is input to the recognition system, the system first determines which of the testing environments in the training data is most similar to the current testing environment. The compensation vectors from the chosen testing environment are applied to normalise the utterance according to the expression
$$\hat{x}_{sf} = z_{sf} + r_{sf}[\mathrm{SNR}] \qquad (2)$$
where sf is the index of the segmental frame of the syllable, SNR is the signal-to-noise ratio of the environment, and $\hat{x}$, $z$ and $r$ are the compensated cepstral, original cepstral and compensation vectors, respectively. The compensation vectors in the SSDCN are described as follows:
$$r_{sf}[\mathrm{SNR}] = \frac{\sum_{i=1}^{SET} (x_{i,sf} - z_{i,sf})\,\delta(s_{i,sf} - \mathrm{SNR})}{\sum_{i=1}^{SET} \delta(s_{i,sf} - \mathrm{SNR})} \qquad (3)$$
where SET is the number of training sets, $s_{i,sf}$ is the SNR value for the segmental frame of the syllable, and $x_{i,sf}$, $z_{i,sf}$ are the cepstral vectors of training and testing syllables, respectively.
ISSDCN: In cases where the testing environment does not resemble the training environments used to develop the compensation vectors for SSDCN, interpolation of the compensation vectors of several environments can be more beneficial than using a single compensation vector. The interpolated compensation vectors are obtained by interpolating several of the closer compensation vectors:
$$\hat{r}[\mathrm{SNR}] = \sum_{n=1}^{E} w_n\, r[\mathrm{SNR}_n] \qquad (4)$$
where $w_n$ is the weighting factor for the nth environment, $\hat{r}[\mathrm{SNR}]$ is the estimated compensation vector, $r[\mathrm{SNR}_n]$ is the compensation vector for the nth environment, and E is the number of environments used in the interpolation. The weighting factors for each closer compensation environment are described as follows:
$$w_n = \frac{\mathrm{SNR}}{\mathrm{SNR}_n} \cdot \frac{\sum_{i=1,\, i \neq n}^{E} |\mathrm{SNR} - \mathrm{SNR}_i|}{(E-1) \sum_{i=1}^{E} |\mathrm{SNR} - \mathrm{SNR}_i|} \qquad (5)$$
In eqn. 5, the first term is the compensation ratio and the second term is the weighting ratio.
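The following sketch illustrates how the compensation and interpolation steps might be computed. It rests on assumptions, because the printed equations are partly illegible: eqn. 3 is read as the mean of the differences x - z over training frames whose SNR equals the requested value, and eqn. 5 is read as the compensation ratio SNR/SNR_n multiplied by the sum of the distances to the other environments, normalised by (E - 1) times the sum of all distances. All function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def compensation_vector(
    pairs: Sequence[Tuple[Vector, Vector, int]], snr: int
) -> Vector:
    """Assumed reading of eqn. 3: average of (x - z) over frames at `snr`.

    Each item of `pairs` is (x, z, frame_snr): training cepstrum, testing
    cepstrum and the (quantised) SNR of that segmental frame.
    """
    selected = [(x, z) for x, z, s in pairs if s == snr]
    if not selected:
        raise ValueError("no training frame at this SNR")
    dim = len(selected[0][0])
    return [
        sum(x[k] - z[k] for x, z in selected) / len(selected)
        for k in range(dim)
    ]


def normalise(z: Vector, r: Vector) -> Vector:
    """Eqn. 2: additive correction of a cepstral vector."""
    return [zk + rk for zk, rk in zip(z, r)]


def interpolation_weights(snr: float, env_snrs: Sequence[float]) -> List[float]:
    """Assumed reading of eqn. 5 for E = len(env_snrs) environments."""
    e = len(env_snrs)
    if e < 2:
        raise ValueError("at least two environments are required")
    if any(s == 0 for s in env_snrs):
        raise ValueError("environment SNR must be non-zero")
    dists = [abs(snr - s) for s in env_snrs]
    total = sum(dists)
    if total == 0:
        raise ValueError("test SNR coincides with every environment")
    # Compensation ratio times weighting ratio, per environment.
    return [
        (snr / s_n) * (total - d_n) / ((e - 1) * total)
        for s_n, d_n in zip(env_snrs, dists)
    ]


def interpolated_vector(snr: float, env_vectors: Dict[float, Vector]) -> Vector:
    """Weighted sum of the per-environment compensation vectors."""
    env_snrs = list(env_vectors)
    weights = interpolation_weights(snr, env_snrs)
    dim = len(env_vectors[env_snrs[0]])
    return [
        sum(w * env_vectors[s][k] for w, s in zip(weights, env_snrs))
        for k in range(dim)
    ]
```

Under this reading, a test SNR of 18dB interpolated between 20dB and 15dB environments gives weighting ratios of 0.6 and 0.4, so the nearer environment contributes more, and the compensation ratios 18/20 and 18/15 then scale each vector.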
ELECTRONICS LETTERS 15th August 1996 Vol. 32 No. 17
that obtained without using it. An interpolated implementation
(ISSDCN) of the algorithm is also described for application when
the acoustics of the testing environment are not in the training
sets. The experimental results show that the method is very effective when the compensation vector of the testing environment is
the combination of two neighbouring training environments. Furthermore, it is seen that if the training environments are chosen to
more closely resemble the test environment, then the recognition
results will be even better.
Acknowledgments: This work is supported by the National Science
Council of the Republic of China under the contract NSC85-2213-E008-028.
3 June 1996
© IEE 1996
Electronics Letters Online No: 19961076
Shin-Lun Tung and Yau-Tarng Juang (Department of Electrical
Engineering, National Central University, Chung Li, Taiwan 32054,
Republic of China)
References
1 BOLL, S.F.: ‘Suppression of acoustic noise in speech using spectral subtraction’, IEEE Trans., 1979, ASSP-27, (2), pp. 113-120
2 LIM, J.S., OPPENHEIM, A.V., and BRAIDA, L.D.: ‘Evaluation of an adaptive filtering method for enhancing speech degraded by white noise addition’, IEEE Trans., 1978, ASSP-26, pp. 354-358
3 MANSOUR, D., and JUANG, B.H.: ‘A family of distortion measures based upon projection operation for robust speech recognition’, IEEE Trans., 1989, ASSP-37, pp. 1659-1671
4 LIU, F.H., STERN, R.M., ACERO, A., and MORENO, P.J.: ‘Environment normalization for robust speech recognition using direct cepstral comparison’. IEEE Int. Conf. Acoustics, Speech, and Signal Processing, 1994, 2, pp. 61-64
5 LIU, F.H., STERN, R.M., ACERO, A., and MORENO, P.J.: ‘Signal processing for robust speech recognition’. Proc. ARPA Human Language Technology Workshop, March 1994
Maximum, Minimum, Average [%] (cell positions not recoverable): 92.9, 90.7, 37, 32, 34.1, 72.5, 7, 5, 6.8
Noisy conditions at 25, 20, 15 and 10dB SNR were tested.
Table 1 shows the recognition results of a pitch-based Mandarin speech model. We alternately chose each one of the 10 sets in the database as the claimed testing set and the other nine sets as training sets. It is obvious that the performance of the speech system degrades rapidly as noise increases. Table 2 summarises the experimental results obtained using the SSDCN to compensate for the noisy speech. By comparing the results of Tables 1 and 2, it is seen that the SSDCN scheme achieves much better results. Also, the use of the SSDCN increases the correction rate to >90% for low SNR. Table 3 summarises the similar results obtained using the ISSDCN when the test environment is excluded from the set of data used to develop the compensation vectors. From Table 3, it is seen that for noisy speech, with the same SNR 18dB (columns 2 and 4), the interpolated compensation vector using 20 and 15dB obtains a better correction rate than that using 20 and 10dB.
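The rotation over the 10 database sets described above can be sketched as below. The `evaluate` callback is hypothetical and stands in for training and testing the recogniser; reporting the maximum, minimum and average over the rotations is an assumption based on the row labels of the tables.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence


def rotate_test_sets(
    sets: Sequence[object],
    evaluate: Callable[[List[object], object], float],
) -> Dict[str, float]:
    """Use each set once as the testing set and the rest as training sets.

    `evaluate(training_sets, test_set)` is a placeholder that should return
    the correction rate in percent for one rotation.
    """
    rates = []
    for i, test_set in enumerate(sets):
        # Every set except the current one forms the training data.
        training = [s for j, s in enumerate(sets) if j != i]
        rates.append(evaluate(training, test_set))
    return {
        "maximum": max(rates),
        "minimum": min(rates),
        "average": mean(rates),
    }
```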
Table 2 Performance of three-segmental model based on pitch periods using SSDCN under different SNR conditions
Correction rate [%] for SNR conditions clean, 25dB, 20dB, 15dB and 10dB; rows Maximum, Minimum, Average (cell positions not recoverable): 94.1, 92, 93.9, 94.1, 96.0, 94.1
Table 3 Performance of three-segmental model based on pitch periods using ISSDCN under different SNR conditions
Correction rate [%] for test SNRs including 13dB and 18dB; rows Maximum, Minimum, Average (cell positions not recoverable): 98, 94, 96.4, 95, 94, 94.6, 96, 92, 94.3, 85, 88.0
Conclusions: We have proposed two cepstral normalisation methods and applied them to pitch-based Mandarin speech recognition in noisy conditions. We found that the use of the SSDCN increases the correction rate to >90% at low SNR compared to

MSE tracking performance of DS/SS code tracking scheme using an FIR adaptive filter
M.G. El-Tarhuni and A.U.H. Sheikh
Indexing terms: Spread spectrum communication, Adaptive filters
The authors investigate the mean-square error (MSE) tracking performance of a DS/SS code tracking system based on an adaptive filtering technique originally proposed in [1]. The ability of this scheme to track a fast linearly changing delay and a delay changing according to a random walk model is presented. It is shown that the proposed scheme outperforms the maximum likelihood estimator (MLE) by ~4dB. Also, the MSE results indicate that a filter with a small number of taps is recommended during tracking. This supports the conclusions made in [1], which were based on the mean hold-in time performance of the system.
Introduction: Code synchronisation is an essential requirement for
proper operation of direct sequence spread spectrum (DS/SS) systems. This task is usually carried out in two stages: acquisition
(coarse alignment) and tracking (fine alignment). Acquisition is
initially used to bring the delay offset between the incoming signal
and the locally generated code to within the pull-in range of the
tracking loop (usually one code symbol duration), and tracking is
then initiated to minimise the delay offset error and to compensate
for the changes that may be caused by channel variations, code
Doppler, and clock instabilities. Usually a delay-locked loop
(DLL) or a tau-dither loop (TDL) is used in code tracking [2].
Another code tracking technique, which uses an M-tap adaptive
filter, has been introduced in [1] and has, besides robustness to
small fluctuations in received SNR, the advantage of using the
same filter in code acquisition as well [3]. In this Letter, the MSE
in the time delay estimation error produced by the proposed track-