Downloaded by UNIVERSITY OF NEW SOUTH WALES (UNSW) on October 26, 2017 | http://arc.aiaa.org | DOI: 10.2514/5.9781600866852.0501.0506 | Book DOI: 10.2514/4.866852
Appendix D
Statistical Properties of Maximum Likelihood Estimates
I. Asymptotic Consistency
The maximum likelihood estimates $\hat{\Theta}_{ML}$ are asymptotically consistent, that is, $\hat{\Theta}_{ML}$ converges in probability to the true values $\Theta$. In the following, we investigate this property.
We know from the properties of the probability functions that
$$\int p(z|\Theta)\,dz = 1 \tag{D.1}$$
Partial differentiation of Eq. (D.1) with respect to $\Theta$, and interchanging the order of integration and differentiation assuming sufficient regularity conditions,$^{1}$ yields
$$\int \frac{\partial p(z|\Theta)}{\partial \Theta}\,dz = 0 \tag{D.2}$$
Equation (D.2) can be rewritten as
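$$\int \frac{1}{p(z|\Theta)}\,\frac{\partial p(z|\Theta)}{\partial \Theta}\,p(z|\Theta)\,dz = \int \frac{\partial \ln p(z|\Theta)}{\partial \Theta}\,p(z|\Theta)\,dz = 0 \tag{D.3}$$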
or equivalently,
$$E\left\{\frac{\partial \ln p(z|\Theta)}{\partial \Theta}\right\} = 0 \tag{D.4}$$
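Equation (D.4) can be checked numerically for a simple case. The sketch below is only an illustration under assumed conditions: a scalar Gaussian density with unknown mean and known standard deviation, with the values `theta_true`, `sigma`, and `n_samples` chosen arbitrarily. It averages the score over samples drawn from the density at the true parameter.

```python
import random


def score_gaussian_mean(z, theta, sigma):
    """Derivative of ln p(z|theta) w.r.t. theta for a Gaussian with mean theta."""
    return (z - theta) / sigma**2


random.seed(0)
theta_true, sigma = 1.5, 2.0  # assumed illustrative values, not from the text
n_samples = 100_000

# Draw z from p(z|theta_true) and average the score evaluated at theta_true.
mean_score = sum(
    score_gaussian_mean(random.gauss(theta_true, sigma), theta_true, sigma)
    for _ in range(n_samples)
) / n_samples

print(f"Monte Carlo estimate of E{{score}}: {mean_score:+.5f}")
```

The printed average should be close to zero, with a residual that shrinks roughly as $1/\sqrt{N}$ as the number of samples grows.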
Differentiating Eq. (D.3) and rewriting yields
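$$\int \left\{ \frac{\partial \ln p(z|\Theta)}{\partial \Theta} \left[\frac{\partial \ln p(z|\Theta)}{\partial \Theta}\right]^{T} + \frac{\partial^{2} \ln p(z|\Theta)}{\partial \Theta^{2}} \right\} p(z|\Theta)\,dz = 0 \tag{D.5}$$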
or
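$$J = E\left\{ \frac{\partial \ln p(z|\Theta)}{\partial \Theta} \left[\frac{\partial \ln p(z|\Theta)}{\partial \Theta}\right]^{T} \right\} = -E\left\{ \frac{\partial^{2} \ln p(z|\Theta)}{\partial \Theta^{2}} \right\} \tag{D.6}$$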
The Fisher information matrix $J$, defined in Eq. (D.6), is generally positive definite. However, if the observations are independent of $\Theta$, that is, $p(z|\Theta)$ is not a function of $\Theta$ as assumed, then $J$ in this case will reduce to zero. In practice, this implies that it would not be possible to estimate $\Theta$ from sample observations which do not contain information about $\Theta$.
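The two equivalent forms of the information matrix in Eq. (D.6) can be compared by simulation. The following is a minimal sketch under assumptions not taken from the text: a single scalar Gaussian observation with known mean `mu` and unknown variance `v`, for which the closed-form information is $1/(2v^2)$. It estimates both the expected squared score and the negative expected second derivative.

```python
import random

random.seed(1)
mu, v_true = 0.0, 4.0  # assumed: known mean, unknown variance v
n_samples = 200_000

sum_score_sq = 0.0
sum_second = 0.0
for _ in range(n_samples):
    e2 = (random.gauss(mu, v_true**0.5) - mu) ** 2
    # ln p(z|v) = -0.5*ln(2*pi*v) - (z - mu)^2 / (2*v)
    score = -0.5 / v_true + e2 / (2.0 * v_true**2)  # first derivative w.r.t. v
    second = 0.5 / v_true**2 - e2 / v_true**3       # second derivative w.r.t. v
    sum_score_sq += score * score
    sum_second += second

j_outer = sum_score_sq / n_samples     # estimate of E{score^2}
j_hessian = -sum_second / n_samples    # estimate of -E{second derivative}
j_exact = 1.0 / (2.0 * v_true**2)      # closed form for this particular model

print(f"E{{score^2}} ~ {j_outer:.5f}, -E{{d2 ln p}} ~ {j_hessian:.5f}, exact {j_exact:.5f}")
```

Both Monte Carlo estimates should agree with each other and with the closed-form value to within sampling error.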
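The Taylor series expansion of the term $[\partial \ln p(z|\Theta)/\partial \Theta]$ in the above equation about the true values $\Theta$, evaluated at $\hat{\Theta}_{ML}$, leads to
$$\frac{\partial \ln p(z|\hat{\Theta}_{ML})}{\partial \Theta} = \frac{\partial \ln p(z|\Theta)}{\partial \Theta} + \frac{\partial^{2} \ln p(z|\Theta^{*})}{\partial \Theta^{2}}\,(\hat{\Theta}_{ML} - \Theta) \tag{D.7}$$
where $\Theta^{*} = \lambda \Theta + (1 - \lambda)\hat{\Theta}_{ML}$; $0 \le \lambda \le 1$.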
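Since $\hat{\Theta}_{ML}$ is the solution of the likelihood equation, equating Eq. (D.7) to zero yields
$$\frac{\partial \ln p(z|\Theta)}{\partial \Theta} = -\frac{\partial^{2} \ln p(z|\Theta^{*})}{\partial \Theta^{2}}\,(\hat{\Theta}_{ML} - \Theta) \tag{D.8}$$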
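We recall our assumption that the measurements at different time points are statistically independent. This provides
$$\begin{aligned}
\frac{\partial \ln p(z|\Theta)}{\partial \Theta} &= \frac{\partial}{\partial \Theta}\left[\ln \prod_{k=1}^{N} p(z_k|\Theta)\right] \\
&= \frac{\partial}{\partial \Theta}\left[\sum_{k=1}^{N} \ln p(z_k|\Theta)\right] \\
&= \sum_{k=1}^{N} \frac{\partial}{\partial \Theta} \ln p(z_k|\Theta)
\end{aligned} \tag{D.9}$$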
Similarly,
$$\frac{\partial^{2} \ln p(z|\Theta)}{\partial \Theta^{2}} = \sum_{k=1}^{N} \frac{\partial^{2} \ln p(z_k|\Theta)}{\partial \Theta^{2}} \tag{D.10}$$
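The additivity in Eq. (D.9) is easy to verify numerically. The sketch below assumes independent scalar Gaussian measurements with unknown mean and known standard deviation; the measurement values and the step size `h` are hypothetical. It compares a central-difference derivative of the total log-likelihood with the sum of the per-sample scores.

```python
import math


def log_likelihood(zs, theta, sigma):
    """ln p(z|theta) for independent Gaussian samples, written as a sum over k."""
    return sum(
        -0.5 * math.log(2.0 * math.pi * sigma**2) - (z - theta) ** 2 / (2.0 * sigma**2)
        for z in zs
    )


def summed_scores(zs, theta, sigma):
    """Sum over k of d/d(theta) ln p(z_k|theta), the right-hand side of Eq. (D.9)."""
    return sum((z - theta) / sigma**2 for z in zs)


zs = [0.3, -1.2, 2.5, 0.9, 1.7]  # hypothetical measurements
theta, sigma, h = 0.5, 1.5, 1e-5

# Central difference of the joint log-likelihood with respect to theta.
numeric = (log_likelihood(zs, theta + h, sigma) - log_likelihood(zs, theta - h, sigma)) / (2.0 * h)
analytic = summed_scores(zs, theta, sigma)

print(f"numeric derivative {numeric:.8f}, summed scores {analytic:.8f}")
```

The two numbers should match to within the rounding error of the finite difference.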
Now, from Eqs. (D.9) and (D.10), the strong law of large numbers, which indicates that the sample average converges with probability one to the ensemble average, yields
$$\frac{1}{N}\sum_{k=1}^{N} \frac{\partial \ln p(z_k|\Theta)}{\partial \Theta} \to E\left\{\frac{\partial \ln p(z_k|\Theta)}{\partial \Theta}\right\} \tag{D.11}$$
and
$$\frac{1}{N}\sum_{k=1}^{N} \frac{\partial^{2} \ln p(z_k|\Theta)}{\partial \Theta^{2}} \to E\left\{\frac{\partial^{2} \ln p(z_k|\Theta)}{\partial \Theta^{2}}\right\} \tag{D.12}$$
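Further, it can be shown that the likelihood function is concave in $\Theta$.$^{2,3}$ Hence, it is appropriate to assume that the matrix $-E\{\partial^{2} \ln p(z_k|\Theta)/\partial \Theta^{2}\}$ is positive definite. From Eqs. (D.8), (D.11), and (D.12), and noting that the expectation in Eq. (D.11) is zero by Eq. (D.4), it directly follows that
$$\hat{\Theta}_{ML} \to \Theta \tag{D.13}$$
with probability of one. Thus, $\hat{\Theta}_{ML}$ is consistent.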
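Consistency can be illustrated with a small simulation. The sketch below is not drawn from the text: it assumes an exponential density $p(z|\theta) = \theta e^{-\theta z}$, whose maximum likelihood estimate of the rate is $N/\sum_k z_k$, and an arbitrary true value `theta_true`. It reports the estimation error for increasing sample sizes.

```python
import random


def ml_rate_estimate(samples):
    """ML estimate of the rate of an exponential density theta*exp(-theta*z)."""
    return len(samples) / sum(samples)


random.seed(2)
theta_true = 2.0  # assumed illustrative value
errors = {}
for n in (100, 10_000, 100_000):
    samples = [random.expovariate(theta_true) for _ in range(n)]
    errors[n] = abs(ml_rate_estimate(samples) - theta_true)
    print(f"N = {n:>7d}: |theta_hat - theta| = {errors[n]:.5f}")
```

The error for any single run is random, so it need not decrease at every step, but its typical size falls as $N$ grows.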
II. Asymptotic Normality
The asymptotic normality implies that for a large number of data points $N$, the maximum likelihood estimates $\hat{\Theta}_{ML}$ converge to a normal distribution, that is,
$$\sqrt{N}\,(\hat{\Theta}_{ML} - \Theta) \to r_1 \sim \mathcal{N}(0, \bar{J}^{-1}) \tag{D.14}$$
where $\bar{J}$ is the average Fisher information matrix per sample and $\Theta$ the true parameter values. In Eq. (D.14) we have used the standard notation used in statistics, which implies that the term on the left-hand side tends to random variables denoted by $r_1$, having an $\mathcal{N}(0, \bar{J}^{-1})$ distribution, that is, a normal (Gaussian) distribution with zero mean and variance $\bar{J}^{-1}$. In the following we investigate this property.
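The Taylor series expansion of $[\partial \ln p(z|\hat{\Theta}_{ML})/\partial \Theta]$ about the true parameter value $\Theta$ yields
$$\frac{\partial \ln p(z|\hat{\Theta}_{ML})}{\partial \Theta} = \frac{\partial \ln p(z|\Theta)}{\partial \Theta} + \frac{\partial^{2} \ln p(z|\Theta)}{\partial \Theta^{2}}\,(\hat{\Theta}_{ML} - \Theta) + \text{higher order terms} \tag{D.15}$$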
Using the property that $\hat{\Theta}_{ML}$ are asymptotically consistent, higher-order terms can be neglected. Since $\hat{\Theta}_{ML}$ satisfies the likelihood equation, equating Eq. (D.15) to zero gives
$$\frac{1}{\sqrt{N}}\,\frac{\partial \ln p(z|\Theta)}{\partial \Theta} = -\frac{1}{N}\,\frac{\partial^{2} \ln p(z|\Theta)}{\partial \Theta^{2}}\,\sqrt{N}\,(\hat{\Theta}_{ML} - \Theta) \tag{D.16}$$
Once again, using Eqs. (D.6) and (D.12), the strong law of large numbers leads to
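$$-\frac{1}{N}\,\frac{\partial^{2} \ln p(z|\Theta)}{\partial \Theta^{2}} \to -E\left\{\frac{\partial^{2} \ln p(z_k|\Theta)}{\partial \Theta^{2}}\right\} = \bar{J} \tag{D.17}$$
where $\bar{J} = J/N$ represents the average information matrix per sample, and $J$ the Fisher information matrix.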
The assumption that the samples $z_k$ are independent allows the left-hand side of Eq. (D.16) to be written as
$$\frac{1}{\sqrt{N}}\,\frac{\partial \ln p(z|\Theta)}{\partial \Theta} = \frac{1}{\sqrt{N}} \sum_{k=1}^{N} \frac{\partial \ln p(z_k|\Theta)}{\partial \Theta} \tag{D.18}$$
Now, it is already known from Eqs. (D.4) and (D.6) that
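$$E\left\{\frac{\partial \ln p(z_k|\Theta)}{\partial \Theta}\right\} = 0 \tag{D.19}$$
and
$$E\left\{ \frac{\partial \ln p(z_k|\Theta)}{\partial \Theta} \left[\frac{\partial \ln p(z_k|\Theta)}{\partial \Theta}\right]^{T} \right\} = \bar{J} \tag{D.20}$$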
Using Eqs. (D.19) and (D.20), and the central limit theorem, it follows from
Eq. (D.18) that
$$\frac{1}{\sqrt{N}}\,\frac{\partial \ln p(z|\Theta)}{\partial \Theta} \to r_2 \sim \mathcal{N}(0, \bar{J}) \tag{D.21}$$
Now, Eqs. (D.16) and (D.17) yield
$$\sqrt{N}\,\bar{J}\,(\hat{\Theta}_{ML} - \Theta) \to r_2 \sim \mathcal{N}(0, \bar{J}) \tag{D.22}$$
which can be equivalently expressed as
$$\sqrt{N}\,(\hat{\Theta}_{ML} - \Theta) \to r_1 \sim \mathcal{N}(0, \bar{J}^{-1}) \tag{D.23}$$
The property of asymptotic normality implies that the estimates $\hat{\Theta}$ obtained from different sets of data samples corresponding to different experiments are clustered around $\Theta$ with a normal distribution.
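The clustering described above can be seen in a repeated-experiment simulation. The sketch below rests on assumptions outside the text: an exponential density with rate `theta_true`, for which the average information per sample is $\bar{J} = 1/\theta^2$, and arbitrary choices of `n_per_experiment` and `n_experiments`. It collects the scaled errors $\sqrt{N}(\hat{\theta} - \theta)$ and compares their sample variance with $\bar{J}^{-1} = \theta^2$.

```python
import random
import statistics

random.seed(3)
theta_true = 2.0        # assumed illustrative value
n_per_experiment = 400  # N, samples per experiment
n_experiments = 2000    # number of independent experiments

scaled_errors = []
for _ in range(n_experiments):
    total = sum(random.expovariate(theta_true) for _ in range(n_per_experiment))
    theta_hat = n_per_experiment / total  # ML estimate of the rate
    scaled_errors.append(n_per_experiment**0.5 * (theta_hat - theta_true))

sample_mean = statistics.mean(scaled_errors)
sample_var = statistics.variance(scaled_errors)
j_bar_inv = theta_true**2  # inverse of the per-sample information 1/theta^2

# For a normal distribution about 68% of the mass lies within one standard deviation.
within_one_sigma = sum(abs(e) <= j_bar_inv**0.5 for e in scaled_errors) / n_experiments

print(f"mean {sample_mean:+.3f}, variance {sample_var:.3f} (J_bar^-1 = {j_bar_inv:.3f})")
print(f"fraction within one sigma: {within_one_sigma:.3f}")
```

For finite $N$ the estimate carries a small bias and the distribution is slightly skewed, so agreement with the normal limit is approximate and improves as `n_per_experiment` increases.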
III. Asymptotic Efficiency
The maximum likelihood estimates $\hat{\Theta}_{ML}$ are asymptotically efficient in the sense that they attain Cramér–Rao lower bounds. In the following we study the property of asymptotic efficiency.
To establish the asymptotic efficiency of $\hat{\Theta}_{ML}$, consider a general estimator $\hat{\Theta}(z)$, not necessarily the maximum likelihood. This gives
$$E\{\hat{\Theta}\} = \int \hat{\Theta}\,p(z|\Theta)\,dz = \Theta + b(\Theta) \tag{D.24}$$
where $\Theta$ represents the true values of the parameters and $b(\Theta)$ the bias that may result in the estimates.
Differentiation of Eq. (D.24) yields
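$$\int \hat{\Theta} \left[\frac{\partial p(z|\Theta)}{\partial \Theta}\right]^{T} dz = I + \frac{\partial b(\Theta)}{\partial \Theta} \tag{D.25}$$
which can be rewritten as
$$\int \hat{\Theta} \left[\frac{\partial \ln p(z|\Theta)}{\partial \Theta}\right]^{T} p(z|\Theta)\,dz = I + \frac{\partial b(\Theta)}{\partial \Theta} \tag{D.26}$$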
Once again starting from the basic relation,
$$\int p(z|\Theta)\,dz = 1 \tag{D.27}$$
differentiation of Eq. (D.27) yields
$$\int \frac{\partial p(z|\Theta)}{\partial \Theta}\,dz = 0 \tag{D.28}$$
which can be rewritten as
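$$\int \frac{1}{p(z|\Theta)}\,\frac{\partial p(z|\Theta)}{\partial \Theta}\,p(z|\Theta)\,dz = \int \frac{\partial \ln p(z|\Theta)}{\partial \Theta}\,p(z|\Theta)\,dz = 0 \tag{D.29}$$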
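Multiplying Eq. (D.29) with $\Theta$, the expected value of $\hat{\Theta}$, that is, $\Theta = E\{\hat{\Theta}\}$, and subtracting from Eq. (D.26) gives
$$\int (\hat{\Theta} - \Theta) \left[\frac{\partial \ln p(z|\Theta)}{\partial \Theta}\right]^{T} p(z|\Theta)\,dz = I + \frac{\partial b(\Theta)}{\partial \Theta} \tag{D.30}$$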
Using the Cauchy–Schwarz inequality for two square-integrable functions $f(x)$ and $g(x)$, namely
$$\left[\int f(x)\,g(x)\,dx\right]^{2} - \int f^{2}(x)\,dx \int g^{2}(x)\,dx \le 0 \tag{D.31}$$
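it follows from Eq. (D.30) that
$$\int (\hat{\Theta} - \Theta)(\hat{\Theta} - \Theta)^{T} p(z|\Theta)\,dz \left\{ \int \frac{\partial \ln p(z|\Theta)}{\partial \Theta} \left[\frac{\partial \ln p(z|\Theta)}{\partial \Theta}\right]^{T} p(z|\Theta)\,dz \right\} \ge \left[I + \frac{\partial b(\Theta)}{\partial \Theta}\right]^{2} \tag{D.32}$$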
However, the second term on the left-hand side of the inequality is defined as the
information matrix J.
Using Eqs. (D.6) and (D.24), Eq. (D.32) leads to
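$$\mathrm{cov}(\hat{\Theta}) = E\{(\hat{\Theta} - \Theta)(\hat{\Theta} - \Theta)^{T}\} \ge \left[I + \frac{\partial b(\Theta)}{\partial \Theta}\right]^{2} J^{-1} \tag{D.33}$$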
The expression in Eq. (D.33) is called the Cramér–Rao inequality.
Since the maximum likelihood estimates are asymptotically bias free, it
follows that
$$E\{\hat{\Theta}_{ML}\} = \Theta \quad \text{and} \quad b(\hat{\Theta}_{ML}) = 0 \tag{D.34}$$
Therefore, Eq. (D.33) yields
$$\mathrm{cov}(\hat{\Theta}_{ML}) \ge J^{-1} \tag{D.35}$$
It has already been shown in Secs. I and II that $\hat{\Theta}_{ML}$ are asymptotically consistent and normal. Combining these results with Eq. (D.35), it follows that $\hat{\Theta}_{ML}$ is asymptotically efficient in the sense of achieving the Cramér–Rao lower bound.
The property of asymptotic efficiency is of practical significance. It implies that the maximum likelihood estimator makes efficient use of the available data. The Cramér–Rao lower bound indicates the theoretical maximum achievable accuracy of the estimates.
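A simulation can make the bound concrete by comparing two unbiased estimators of the same parameter. The sketch below is an illustration under assumed conditions, not an example from the text: independent Gaussian samples with unknown mean and known `sigma`, so that $J = N/\sigma^2$ and the bound is $\sigma^2/N$. The sample mean is the maximum likelihood estimate here, and the sample median is used as a hypothetical alternative estimator.

```python
import random
import statistics

random.seed(4)
theta_true, sigma = 0.0, 1.0  # assumed illustrative values
n_per_experiment = 101        # N, samples per experiment
n_experiments = 2000          # number of independent experiments

ml_estimates, median_estimates = [], []
for _ in range(n_experiments):
    zs = [random.gauss(theta_true, sigma) for _ in range(n_per_experiment)]
    ml_estimates.append(sum(zs) / len(zs))           # ML estimate of a Gaussian mean
    median_estimates.append(statistics.median(zs))   # alternative unbiased estimator

cr_bound = sigma**2 / n_per_experiment  # J^{-1} with J = N / sigma^2
var_ml = statistics.variance(ml_estimates)
var_median = statistics.variance(median_estimates)

print(f"Cramer-Rao bound    : {cr_bound:.5f}")
print(f"variance of ML mean : {var_ml:.5f}")
print(f"variance of median  : {var_median:.5f}")
```

The variance of the maximum likelihood estimate should sit at the bound to within sampling error (a simulated value slightly below the bound reflects Monte Carlo noise, not a violation), while the median shows a clearly larger variance.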
References
$^{1}$Cramér, H., Mathematical Methods of Statistics, Princeton University Press, Princeton, NJ, 1946.
$^{2}$Kashyap, R. L., “Maximum Likelihood Identification of Stochastic Linear Systems,” IEEE Transactions on Automatic Control, Vol. AC-15, No. 1, 1970, pp. 25–34.
$^{3}$Wilks, S. S., Mathematical Statistics, John Wiley & Sons, New York, 1962.