July 7, 2017 8:11 Handbook of Medical Statistics 9.61in x 6.69in b2736-ch01
by 80.82.77.83 on 10/25/17. For personal use only.

CHAPTER 1

PROBABILITY AND PROBABILITY DISTRIBUTIONS

Jian Shi∗
1.1. The Axiomatic Definition of Probability1,2

There were various definitions and methods of calculating probability at the early stage of the development of probability theory, such as classical probability, geometric probability, frequency-based probability and so on. In 1933, Kolmogorov established the axiomatic system of probability theory based on measure theory, which laid the foundation of modern probability theory.

The axiomatic system of probability theory: Let Ω be the set of points ω, and F be a set of subsets A of Ω. F is called a σ-algebra of Ω if it satisfies the conditions:

(i) Ω ∈ F;
(ii) if A ∈ F, then its complement A^c ∈ F;
(iii) if A_n ∈ F for n = 1, 2, . . ., then ∪_{n=1}^∞ A_n ∈ F.

Let P(A) (A ∈ F) be a real-valued function on the σ-algebra F. Suppose P(·) satisfies:

(1) 0 ≤ P(A) ≤ 1 for every A ∈ F;
(2) P(Ω) = 1;
(3) P(∪_{n=1}^∞ A_n) = Σ_{n=1}^∞ P(A_n) for all A_n ∈ F, n = 1, 2, . . ., with A_i ∩ A_j = ∅ for i ≠ j, where ∅ is the empty set.

Then P is a probability measure on F, or probability in short. In addition, a set in F is called an event, and (Ω, F, P) is called a probability space.
Some basic properties of probability are as follows:
∗Corresponding author: jshi@iss.ac.cn
1. P(∅) = 0;
2. For events A and B, if B ⊂ A, then P(A − B) = P(A) − P(B) and P(A) ≥ P(B); in particular, P(A^c) = 1 − P(A);
3. For any events A_1, . . . , A_n, n ≥ 1, P(∪_{i=1}^n A_i) ≤ Σ_{i=1}^n P(A_i);
4. For any events A and B, P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
Suppose a variable X may take different values under different conditions due to accidental, uncontrolled factors of uncertainty and randomness, but the probability that the value of X falls in a given range is fixed. Then X is a random variable.
The random variable X is called a discrete random variable if it takes only a finite or countable number of values with fixed probabilities. Suppose X takes values x_1, x_2, . . . with probabilities p_i = P{X = x_i} for i = 1, 2, . . ., respectively. Then it holds that:

(1) p_i ≥ 0, i = 1, 2, . . .; and
(2) Σ_{i=1}^∞ p_i = 1.
The random variable X is called a continuous random variable if it can take any value in an interval and the probability of X falling into any sub-interval is fixed.

For a continuous random variable X, if there exists a non-negative integrable function f(x) such that

P{a ≤ X ≤ b} = ∫_a^b f(x) dx

holds for any −∞ < a < b < ∞, and

∫_{−∞}^∞ f(x) dx = 1,

then f(x) is called the density function of X.
For a random variable X, if F(x) = P{X ≤ x} for −∞ < x < ∞, then F(x) is called the distribution function of X. When X is a discrete random variable, its distribution function is F(x) = Σ_{i: x_i ≤ x} p_i; similarly, when X is a continuous random variable, its distribution function is F(x) = ∫_{−∞}^x f(t) dt.
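The discrete form of the distribution function can be illustrated with a short sketch; the support and probabilities below are hypothetical, chosen only so the sums are exact:

```python
from fractions import Fraction

def discrete_cdf(support, probs, x):
    """Distribution function F(x) = sum of p_i over support points x_i <= x."""
    return sum(p for xi, p in zip(support, probs) if xi <= x)

# A hypothetical discrete random variable taking values 1, 2, 3.
support = [1, 2, 3]
probs = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]

assert sum(probs) == 1                  # property (2): probabilities sum to 1
print(discrete_cdf(support, probs, 2))  # F(2) = 1/2 + 1/3 = 5/6
```

F jumps by p_i at each support point x_i and is constant in between, which is exactly what the sum over {i: x_i ≤ x} produces.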
1.2. Uniform Distribution2,3,4
If the random variable X takes values in the interval [a, b] and the probabilities of X taking each value in [a, b] are equal, then we say X follows the uniform distribution over [a, b] and denote it as X ~ U(a, b). In particular, when a = 0 and b = 1, we say X follows the standard uniform distribution U(0, 1). The uniform distribution is the simplest continuous distribution.

If X ~ U(a, b), then the density function of X is

f(x; a, b) = 1/(b − a) if a ≤ x ≤ b, and 0 otherwise,
and the distribution function of X is

F(x; a, b) = 0 for x < a; (x − a)/(b − a) for a ≤ x ≤ b; 1 for x > b.
A uniform distribution has the following properties:
1. If X ~ U(a, b), then the k-th moment of X is

E(X^k) = (b^{k+1} − a^{k+1}) / ((k + 1)(b − a)), k = 1, 2, . . .
2. If X ~ U(a, b), then the k-th central moment of X is

E((X − E(X))^k) = 0 when k is odd, and (b − a)^k / (2^k (k + 1)) when k is even.
3. If X ~ U(a, b), then the skewness of X is s = 0 and the kurtosis of X is κ = −6/5.
4. If X ~ U(a, b), then its moment-generating function and characteristic function are

M(t) = E(e^{tX}) = (e^{bt} − e^{at}) / ((b − a)t) and
φ(t) = E(e^{itX}) = (e^{ibt} − e^{iat}) / (i(b − a)t),

respectively.
5. If X_1 and X_2 are independent and identically distributed random variables, both with distribution U(−1/2, 1/2), then the density function of X = X_1 + X_2 is

f(x) = 1 + x for −1 ≤ x ≤ 0, and 1 − x for 0 ≤ x ≤ 1.

This is the so-called "triangular distribution".
6. If X_1, X_2, and X_3 are independent and identically distributed random variables with common distribution U(−1/2, 1/2), then the density function of X = X_1 + X_2 + X_3 is

f(x) = (1/2)(x + 3/2)^2 for −3/2 ≤ x ≤ −1/2;
f(x) = 3/4 − x^2 for −1/2 < x ≤ 1/2;
f(x) = (1/2)(x − 3/2)^2 for 1/2 < x ≤ 3/2;
f(x) = 0 otherwise.

The shape of the above density function resembles that of a normal density function, which we will discuss next.
7. If X ~ U(0, 1), then 1 − X ~ U(0, 1).
8. Assume that a distribution function F is strictly increasing and continuous, F^{−1} is the inverse function of F, and X ~ U(0, 1). Then the distribution function of the random variable Y = F^{−1}(X) is F.
In stochastic simulations, since it is easy to generate pseudo-random numbers from the standard uniform distribution (e.g. by the congruential method), pseudo-random numbers for many common distributions can be generated using property 8, especially when the inverse of the distribution function has an explicit form.
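As a minimal sketch of property 8, the triangular distribution of property 5 has an explicit inverse CDF, so uniform pseudo-random numbers can be pushed through it (the seed and sample size are illustrative assumptions):

```python
import math
import random

def inv_cdf_triangular(u):
    """Inverse of F for the triangular distribution on [-1, 1] (property 5):
    F(x) = (1+x)^2/2 for x <= 0 and F(x) = 1 - (1-x)^2/2 for x > 0."""
    if u <= 0.5:
        return math.sqrt(2.0 * u) - 1.0
    return 1.0 - math.sqrt(2.0 * (1.0 - u))

rng = random.Random(0)  # fixed seed so the sketch is reproducible
sample = [inv_cdf_triangular(rng.random()) for _ in range(100_000)]
print(sum(sample) / len(sample))  # sample mean; the triangular mean is 0
```

By property 8, `inv_cdf_triangular(U)` with U ~ U(0, 1) has exactly the triangular distribution, so the sample mean should be close to 0.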
1.3. Normal Distribution2,3,4
If the density function of the random variable X is

f(x; μ, σ) = (1/(√(2π)σ)) exp{−(x − μ)^2/(2σ^2)} = (1/σ)φ((x − μ)/σ),

where −∞ < x, μ < ∞ and σ > 0, then we say X follows the normal distribution and denote it as X ~ N(μ, σ^2). In particular, when μ = 0 and σ = 1, we say that X follows the standard normal distribution N(0, 1).
If X ~ N(μ, σ^2), then the distribution function of X is

F(x) = Φ((x − μ)/σ) = ∫_{−∞}^x (1/σ)φ((t − μ)/σ) dt.

If X follows the standard normal distribution N(0, 1), then the density and distribution functions of X are φ(x) and Φ(x), respectively.
The Normal distribution is the most common continuous distribution
and has the following properties:
1. If X ~ N(μ, σ^2), then Y = (X − μ)/σ ~ N(0, 1); and if X ~ N(0, 1), then Y = a + σX ~ N(a, σ^2). Hence, a general normal distribution can be converted to the standard normal distribution by a linear transformation.
2. If X ~ N(μ, σ^2), then the expectation of X is E(X) = μ and the variance of X is Var(X) = σ^2.
3. If X ~ N(μ, σ^2), then the k-th central moment of X is

E((X − μ)^k) = 0 when k is odd, and (k!/(2^{k/2}(k/2)!)) σ^k when k is even.
4. If X ~ N(μ, σ^2), then the moments of X are

E(X^{2k−1}) = Σ_{i=1}^{k} (2k − 1)! μ^{2i−1} σ^{2(k−i)} / ((2i − 1)!(k − i)! 2^{k−i})

and

E(X^{2k}) = Σ_{i=0}^{k} (2k)! μ^{2i} σ^{2(k−i)} / ((2i)!(k − i)! 2^{k−i})

for k = 1, 2, . . ..
5. If X ~ N(μ, σ^2), then the skewness and the kurtosis of X are both 0, i.e. s = κ = 0. This property can be used to check whether a distribution is normal.
6. If X ~ N(μ, σ^2), then the moment-generating function and the characteristic function of X are M(t) = exp{tμ + (1/2)t^2σ^2} and φ(t) = exp{itμ − (1/2)t^2σ^2}, respectively.
7. If X ~ N(μ, σ^2), then a + bX ~ N(a + bμ, b^2σ^2).
8. If X_i ~ N(μ_i, σ_i^2) for 1 ≤ i ≤ n, and X_1, X_2, . . . , X_n are mutually independent, then

Σ_{i=1}^n X_i ~ N(Σ_{i=1}^n μ_i, Σ_{i=1}^n σ_i^2).
9. If X_1, X_2, . . . , X_n is a random sample from the population N(μ, σ^2), then the sample mean X̄_n = (1/n) Σ_{i=1}^n X_i satisfies X̄_n ~ N(μ, σ^2/n).
The central limit theorem: Suppose that X_1, . . . , X_n are independent and identically distributed random variables, and that μ = E(X_1) and 0 < σ^2 = Var(X_1) < ∞. Then the distribution of T_n = √n (X̄_n − μ)/σ is asymptotically standard normal when n is large enough.
The central limit theorem reveals that limit distributions of statistics in
many cases are (asymptotically) normal. Therefore, the normal distribution
is the most widely used distribution in statistics.
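The statement of the central limit theorem can be exercised numerically: standardize means of uniform samples and compare the empirical distribution of T_n against the standard normal. A minimal sketch (the choice of U(0, 1), the seed, n = 50, and the replication count are illustrative assumptions):

```python
import math
import random
import statistics

def standardized_mean(rng, n):
    """T_n = sqrt(n) * (sample mean - mu) / sigma for X_i ~ U(0,1),
    where mu = 1/2 and sigma^2 = 1/12."""
    xs = [rng.random() for _ in range(n)]
    mu, sigma = 0.5, math.sqrt(1.0 / 12.0)
    return math.sqrt(n) * (statistics.fmean(xs) - mu) / sigma

rng = random.Random(1)
ts = [standardized_mean(rng, 50) for _ in range(20_000)]
# Empirical P(T_n <= 1) should be close to Phi(1), which is about 0.8413.
print(sum(t <= 1.0 for t in ts) / len(ts))
```

Even for n = 50, the empirical probability is already close to Φ(1), illustrating why normal approximations are so widely used.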
A normal random variable takes values on the whole real axis, i.e. from negative infinity to positive infinity. However, many variables in real problems take only positive values, for example, height, voltage and so on. In such cases, the logarithm of these variables can often be regarded as normally distributed.
Log-normal distribution: Suppose X > 0. If ln X ~ N(μ, σ^2), then we say X follows the log-normal distribution and denote it as X ~ LN(μ, σ^2).
1.4. Exponential Distribution2,3,4
If the density function of the random variable X is

f(x) = λe^{−λx} for x ≥ 0, and 0 for x < 0,

where λ > 0, then we say X follows the exponential distribution and denote it as X ~ E(λ). In particular, when λ = 1, we say X follows the standard exponential distribution E(1).
If X ~ E(λ), then its distribution function is

F(x; λ) = 1 − e^{−λx} for x ≥ 0, and 0 for x < 0.
The exponential distribution is an important distribution in reliability. The life of an electronic product generally follows an exponential distribution. When the life of a product follows the exponential distribution E(λ), λ is called the failure rate of the product.
Exponential distribution has the following properties:
1. If X ~ E(λ), then the k-th moment of X is E(X^k) = k! λ^{−k}, k = 1, 2, . . ..
2. If X ~ E(λ), then E(X) = λ^{−1} and Var(X) = λ^{−2}.
3. If X ~ E(λ), then its skewness is s = 2 and its kurtosis is κ = 6.
4. If X ~ E(λ), then the moment-generating function and the characteristic function of X are M(t) = λ/(λ − t) for t < λ and φ(t) = λ/(λ − it), respectively.
5. If X ~ E(1), then λ^{−1}X ~ E(λ) for λ > 0.
6. If X ~ E(λ), then for any x > 0 and y > 0, there holds

P{X > x + y | X > y} = P{X > x}.

This is the so-called "memoryless property" of the exponential distribution. If the life distribution of a product is exponential, then no matter how long it has been used, the remaining life of the product follows the same distribution as that of a new product, provided it has not failed at the present time.
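The memoryless property can be verified directly from the survival function P{X > x} = e^{−λx}, since the conditional probability factorizes exactly. A minimal sketch (λ, x, y are illustrative values):

```python
import math

def survival(x, lam):
    """P{X > x} = exp(-lam * x) for X ~ E(lam)."""
    return math.exp(-lam * x)

lam, x, y = 0.3, 2.0, 5.0
# P{X > x + y | X > y} = P{X > x + y} / P{X > y}
conditional = survival(x + y, lam) / survival(y, lam)
print(conditional, survival(x, lam))  # the two values agree
```

The agreement is exact, not approximate: e^{−λ(x+y)}/e^{−λy} = e^{−λx} for every x, y > 0.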
7. If X ~ E(λ), then for any a > 0, there hold E(X | X > a) = a + λ^{−1} and Var(X | X > a) = λ^{−2}.
8. If X and Y are independent and identically distributed as E(λ), then min(X, Y) is independent of X − Y, and

{X | X + Y = z} ~ U(0, z).
9. If X_1, X_2, . . . , X_n is a random sample from the population E(λ), let X_{(1,n)} ≤ X_{(2,n)} ≤ · · · ≤ X_{(n,n)} be the order statistics of X_1, X_2, . . . , X_n. Write Y_k = (n − k + 1)(X_{(k,n)} − X_{(k−1,n)}), 1 ≤ k ≤ n, where X_{(0,n)} = 0. Then Y_1, Y_2, . . . , Y_n are independent and identically distributed as E(λ).
10. If X_1, X_2, . . . , X_n is a random sample from the population E(λ), then Σ_{i=1}^n X_i ~ Γ(n, λ), where Γ(n, λ) is the Gamma distribution in Sec. 1.12.
11. If Y ~ U(0, 1), then X = −ln(Y) ~ E(1). Therefore, it is easy to generate exponentially distributed random numbers from uniform random numbers.
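Combining property 11 with property 5 gives exponential random numbers with any rate: if Y ~ U(0, 1) then −ln(Y)/λ ~ E(λ). A minimal sketch (λ, seed and sample size are illustrative assumptions):

```python
import math
import random
import statistics

lam = 2.0
rng = random.Random(42)  # fixed seed for reproducibility
# Property 11: -ln(Y) ~ E(1); property 5: dividing by lambda gives E(lambda).
xs = [-math.log(rng.random()) / lam for _ in range(100_000)]
# Sample mean and variance should be near 1/lam = 0.5 and 1/lam^2 = 0.25.
print(statistics.fmean(xs), statistics.pvariance(xs))
```

This is exactly the inverse-CDF method of Sec. 1.2, property 8, applied to F(x) = 1 − e^{−λx} (using Y in place of 1 − Y, which is also U(0, 1) by uniform property 7).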
1.5. Weibull Distribution2,3,4
If the density function of the random variable X is

f(x; β, η, γ) = (β/η)(x − γ)^{β−1} exp{−(x − γ)^β/η} for x ≥ γ, and 0 for x < γ,
then we say X follows the Weibull distribution and denote it as X ~ W(β, η, γ), where γ is the location parameter, β > 0 is the shape parameter, and η > 0 is the scale parameter. For simplicity, we denote W(β, η, 0) as W(β, η). In particular, when γ = 0 and β = 1, the Weibull distribution W(1, η) reduces to the exponential distribution E(1/η).
If X ~ W(β, η, γ), then its distribution function is

F(x; β, η, γ) = 1 − exp{−(x − γ)^β/η} for x ≥ γ, and 0 for x < γ.
The Weibull distribution is an important distribution in reliability theory. It is often used to describe the life distribution of a product, such as electronic products and wear products.
Weibull distribution has the following properties:
1. If X ~ E(1), then

Y = (Xη)^{1/β} + γ ~ W(β, η, γ).

Hence, the Weibull and exponential distributions can be converted to each other by transformation.
2. If X ~ W(β, η), then the k-th moment of X is

E(X^k) = Γ(1 + k/β) η^{k/β},

where Γ(·) is the Gamma function.
3. If X ~ W(β, η, γ), then

E(X) = Γ(1 + 1/β) η^{1/β} + γ,
Var(X) = [Γ(1 + 2/β) − Γ^2(1 + 1/β)] η^{2/β}.
4. Suppose X_1, X_2, . . . , X_n are mutually independent and identically distributed random variables with common distribution W(β, η, γ). Then

X_{(1,n)} = min(X_1, X_2, . . . , X_n) ~ W(β, η/n, γ);

conversely, if X_{(1,n)} ~ W(β, η/n, γ), then X_1 ~ W(β, η, γ).
1.5.1. The application of Weibull distribution in reliability

The shape parameter β usually describes the failure mechanism of a product. Weibull distributions with β < 1 are called "early failure" life distributions, those with β = 1 are called "occasional failure" life distributions, and those with β > 1 are called "wear-out (aging) failure" life distributions.
If X ~ W(β, η, γ), then its reliability function is

R(x) = 1 − F(x; β, η, γ) = exp{−(x − γ)^β/η} for x ≥ γ, and 1 for x < γ.
When the reliability R of a product is given, then

x_R = γ + η^{1/β}(−ln R)^{1/β}

is the Q-percentile life of the product. If R = 0.5, then x_{0.5} = γ + η^{1/β}(ln 2)^{1/β} is the median life; if R = e^{−1}, then x_{e^{−1}} = γ + η^{1/β} is the characteristic life; if R = exp{−Γ^β(1 + β^{−1})}, then x_R = E(X), the mean life.
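The percentile-life formula is a one-liner; the sketch below evaluates it for the median and characteristic lives (the parameter values β = 2, η = 4 are illustrative assumptions, not from the text):

```python
import math

def percentile_life(R, beta, eta, gamma=0.0):
    """x_R = gamma + eta^{1/beta} * (-ln R)^{1/beta}:
    the life that a fraction R of products is expected to reach."""
    return gamma + eta ** (1.0 / beta) * (-math.log(R)) ** (1.0 / beta)

beta, eta = 2.0, 4.0
print(percentile_life(0.5, beta, eta))           # median life = eta^{1/beta} (ln 2)^{1/beta}
print(percentile_life(math.exp(-1), beta, eta))  # characteristic life = eta^{1/beta} = 2.0
```

Setting R = exp{−Γ^β(1 + 1/β)} in the same function recovers the mean life Γ(1 + 1/β)η^{1/β} + γ, as stated above.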
The failure rate of the Weibull distribution W(β, η, γ) is

λ(x) = f(x; β, η, γ)/R(x) = (β/η)(x − γ)^{β−1} for x ≥ γ, and 0 for x < γ.
The mean failure rate is

λ̄(x) = (1/(x − γ)) ∫_γ^x λ(t) dt = (x − γ)^{β−1}/η for x ≥ γ, and 0 for x < γ.

In particular, the failure rate of the exponential distribution E(λ) = W(1, 1/λ) is the constant λ.
1.6. Binomial Distribution2,3,4
We say the random variable X follows the binomial distribution if it takes discrete values with

P{X = k} = C_n^k p^k (1 − p)^{n−k}, k = 0, 1, . . . , n,

where n is a positive integer, C_n^k is the combinatorial number, and 0 ≤ p ≤ 1. We denote it as X ~ B(n, p).
Consider n independent trials, each with two possible outcomes, "success" and "failure"; each trial results in exactly one of the two outcomes. The probability of success is p. Let X be the total number of successes in these n trials; then X ~ B(n, p). In particular, when n = 1, B(1, p) is called the Bernoulli distribution or two-point distribution. It is the simplest discrete distribution. The binomial distribution is a common discrete distribution.
If X ~ B(n, p), then its distribution function is

B(x; n, p) = Σ_{k=0}^{min([x], n)} C_n^k p^k q^{n−k} for x ≥ 0, and 0 for x < 0,

where [x] is the integer part of x and q = 1 − p.
Let B_x(a, b) = ∫_0^x t^{a−1}(1 − t)^{b−1} dt be the incomplete Beta function, where 0 < x < 1, a > 0, b > 0; then B(a, b) = B_1(a, b) is the Beta function. Let I_x(a, b) = B_x(a, b)/B(a, b) be the incomplete Beta function ratio. Then the binomial distribution function can be represented as

B(x; n, p) = 1 − I_p([x] + 1, n − [x]), 0 ≤ x ≤ n.
Binomial distribution has the following properties:
1. Let b(k; n, p) = C_n^k p^k q^{n−k} for 0 ≤ k ≤ n. If k ≤ [(n + 1)p], then b(k; n, p) ≥ b(k − 1; n, p); if k > [(n + 1)p], then b(k; n, p) < b(k − 1; n, p).
2. When p = 0.5, the binomial distribution B(n, 0.5) is symmetric; when p ≠ 0.5, the binomial distribution B(n, p) is asymmetric.
3. Suppose X_1, X_2, . . . , X_n are mutually independent and identically distributed Bernoulli random variables with parameter p. Then

Y = Σ_{i=1}^n X_i ~ B(n, p).
4. If X ~ B(n, p), then E(X) = np and Var(X) = npq.
5. If X ~ B(n, p), then the k-th moment of X is

E(X^k) = Σ_{i=1}^k S_2(k, i) P_n^i p^i,

where S_2(k, i) is the Stirling number of the second kind and P_n^i is the number of permutations.
6. If X ~ B(n, p), then its skewness is s = (1 − 2p)/(npq)^{1/2} and its kurtosis is κ = (1 − 6pq)/(npq).
7. If X ~ B(n, p), then the moment-generating function and the characteristic function of X are M(t) = (q + pe^t)^n and φ(t) = (q + pe^{it})^n, respectively.
8. When n and x are fixed, the binomial distribution function B(x; n, p) is a monotonically decreasing function of p (0 < p < 1).
9. If X_i ~ B(n_i, p) for 1 ≤ i ≤ k, and X_1, X_2, . . . , X_k are mutually independent, then X = Σ_{i=1}^k X_i ~ B(Σ_{i=1}^k n_i, p).
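Several of these properties can be checked exactly from the probability mass function itself; the sketch below verifies the mean np, the variance npq, and the mode location near [(n + 1)p] for illustrative parameters n = 10, p = 0.3:

```python
from math import comb

def binom_pmf(k, n, p):
    """b(k; n, p) = C(n, k) p^k (1 - p)^{n - k}."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.3
pmf = [binom_pmf(k, n, p) for k in range(n + 1)]
mean = sum(k * w for k, w in enumerate(pmf))
var = sum((k - mean) ** 2 * w for k, w in enumerate(pmf))
print(round(mean, 10), round(var, 10))  # np = 3.0 and npq = 2.1
```

By property 1 the pmf increases up to k = [(n + 1)p] = 3 and decreases afterwards, so the largest entry of `pmf` sits at index 3.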
1.7. Multinomial Distribution2,3,4
If an n(n ≥ 2)-dimensional random vector X = (X_1, . . . , X_n) satisfies the following conditions:

(1) X_i ≥ 0, 1 ≤ i ≤ n, and Σ_{i=1}^n X_i = N;
(2) for any non-negative integers m_1, m_2, . . . , m_n with Σ_{i=1}^n m_i = N,

P{X_1 = m_1, . . . , X_n = m_n} = (N!/(m_1! · · · m_n!)) Π_{i=1}^n p_i^{m_i},

where p_i ≥ 0, 1 ≤ i ≤ n, Σ_{i=1}^n p_i = 1, then we say X follows the multinomial distribution and denote it as X ~ PN(N; p_1, . . . , p_n).
In particular, when n = 2, the multinomial distribution degenerates to the binomial distribution.
Suppose a jar contains balls of n colors. Each time, a ball is drawn at random from the jar and then put back. The probability that a ball of the ith color is drawn is p_i, 1 ≤ i ≤ n, with Σ_{i=1}^n p_i = 1. Assume that balls are drawn and put back N times, and let X_i denote the number of draws of the ith color ball. Then the random vector X = (X_1, . . . , X_n) follows the multinomial distribution PN(N; p_1, . . . , p_n).
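The ball-and-jar description translates directly into a sampler: perform N independent color draws and count each color. A minimal sketch (the probabilities, N and the seed are illustrative assumptions):

```python
import random

def draw_multinomial(N, probs, rng):
    """Draw (X_1, ..., X_n) ~ PN(N; p_1, ..., p_n) by N independent
    draws with replacement, as in the jar description above."""
    counts = [0] * len(probs)
    for _ in range(N):
        u, acc = rng.random(), 0.0
        for i, p in enumerate(probs):
            acc += p
            if u < acc:
                counts[i] += 1
                break
        else:
            counts[-1] += 1  # guard against floating-point rounding of sum(probs)
    return counts

rng = random.Random(7)
x = draw_multinomial(100, [0.5, 0.3, 0.2], rng)
print(x, sum(x))  # the counts always sum to N = 100
```

Condition (1) of the definition, Σ X_i = N, holds by construction: every draw increments exactly one count.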
Multinomial distribution is a common multivariate discrete distribution.
Multinomial distribution has the following properties:
1. If (X_1, . . . , X_n) ~ PN(N; p_1, . . . , p_n), let X*_{i+1} = Σ_{j=i+1}^n X_j and p*_{i+1} = Σ_{j=i+1}^n p_j, 1 ≤ i < n. Then:

(i) (X_1, . . . , X_i, X*_{i+1}) ~ PN(N; p_1, . . . , p_i, p*_{i+1});
(ii) X_i ~ B(N, p_i), 1 ≤ i ≤ n.

More generally, let 0 = j_0 < j_1 < · · · < j_m = n, and X̃_k = Σ_{i=j_{k−1}+1}^{j_k} X_i, p̃_k = Σ_{i=j_{k−1}+1}^{j_k} p_i, 1 ≤ k ≤ m. Then (X̃_1, . . . , X̃_m) ~ PN(N; p̃_1, . . . , p̃_m).
2. If (X_1, . . . , X_n) ~ PN(N; p_1, . . . , p_n), then its moment-generating function and characteristic function are

M(t_1, . . . , t_n) = (Σ_{j=1}^n p_j e^{t_j})^N and φ(t_1, . . . , t_n) = (Σ_{j=1}^n p_j e^{it_j})^N,

respectively.
3. If (X_1, . . . , X_n) ~ PN(N; p_1, . . . , p_n), then for n > 1 and 1 ≤ k < n,

(X_1, . . . , X_k | X_{k+1} = m_{k+1}, . . . , X_n = m_n) ~ PN(N − M; p*_1, . . . , p*_k),

where M = Σ_{i=k+1}^n m_i, 0 < M < N, and p*_j = p_j / Σ_{i=1}^k p_i, 1 ≤ j ≤ k.
4. If X_i follows the Poisson distribution P(λ_i), 1 ≤ i ≤ n, and X_1, . . . , X_n are mutually independent, then for any given positive integer N, there holds

(X_1, . . . , X_n | Σ_{i=1}^n X_i = N) ~ PN(N; p_1, . . . , p_n),

where p_i = λ_i / Σ_{j=1}^n λ_j, 1 ≤ i ≤ n.
1.8. Poisson Distribution2,3,4
If the random variable X takes non-negative integer values with probabilities

P{X = k} = (λ^k/k!) e^{−λ}, λ > 0, k = 0, 1, . . . ,

then we say X follows the Poisson distribution and denote it as X ~ P(λ).
If X ~ P(λ), then its distribution function is

P{X ≤ x} = P(x; λ) = Σ_{k=0}^{[x]} p(k; λ),

where p(k; λ) = e^{−λ} λ^k/k!, k = 0, 1, . . ..
The Poisson distribution is an important distribution in queuing theory. For example, the number of ticket buyers arriving at a ticket window in a fixed interval of time approximately follows a Poisson distribution. The Poisson distribution has a wide range of applications in physics, finance, insurance and other fields.
Poisson distribution has the following properties:
1. If k < λ, then p(k; λ) > p(k − 1; λ); if k > λ, then p(k; λ) < p(k − 1; λ). If λ is not an integer, then p(k; λ) attains its maximum at k = [λ]; if λ is an integer, then p(k; λ) attains its maximum at k = λ and k = λ − 1.
2. When x is fixed, P(x; λ) is a non-increasing function of λ, that is,

P(x; λ_1) ≥ P(x; λ_2) if λ_1 < λ_2.

When λ and x change at the same time, then

P(x; λ) ≥ P(x − 1; λ − 1) if x ≤ λ − 1,
P(x; λ) ≤ P(x − 1; λ − 1) if x ≥ λ.
3. If X ~ P(λ), then the k-th moment of X is E(X^k) = Σ_{i=1}^k S_2(k, i) λ^i, where S_2(k, i) is the Stirling number of the second kind.
4. If X ~ P(λ), then E(X) = λ and Var(X) = λ. The expectation and variance being equal is an important feature of the Poisson distribution.
5. If X ~ P(λ), then its skewness is s = λ^{−1/2} and its kurtosis is κ = λ^{−1}.
6. If X ~ P(λ), then the moment-generating function and the characteristic function of X are M(t) = exp{λ(e^t − 1)} and φ(t) = exp{λ(e^{it} − 1)}, respectively.
7. If X_1, X_2, . . . , X_n are mutually independent and identically distributed, then X_1 ~ P(λ) is equivalent to Σ_{i=1}^n X_i ~ P(nλ).
8. If X_i ~ P(λ_i) for 1 ≤ i ≤ n, and X_1, X_2, . . . , X_n are mutually independent, then

Σ_{i=1}^n X_i ~ P(Σ_{i=1}^n λ_i).
9. If X_1 ~ P(λ_1) and X_2 ~ P(λ_2) are mutually independent, then the conditional distribution of X_1 given X_1 + X_2 is binomial, that is,

(X_1 | X_1 + X_2 = x) ~ B(x, p),

where p = λ_1/(λ_1 + λ_2).
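Property 4 (expectation equals variance) can be checked exactly from the probability mass function; the sketch below truncates the infinite sum at k = 60, where the omitted tail mass is negligible for the illustrative λ = 3.7:

```python
import math

def poisson_pmf(k, lam):
    """p(k; lam) = e^{-lam} lam^k / k!."""
    return math.exp(-lam) * lam**k / math.factorial(k)

lam = 3.7
ks = range(60)  # truncation point; the remaining tail mass is negligible here
mean = sum(k * poisson_pmf(k, lam) for k in ks)
var = sum((k - mean) ** 2 * poisson_pmf(k, lam) for k in ks)
print(round(mean, 8), round(var, 8))  # both equal lambda = 3.7
```

The same pmf also exhibits property 1: the probabilities increase while k < λ and decrease once k > λ, peaking at k = [λ] = 3.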
1.9. Negative Binomial Distribution2,3,4
For a positive integer m, if the random variable X takes non-negative integer values with probabilities

P{X = k} = C_{k+m−1}^k p^m q^k, k = 0, 1, . . . ,

where 0 < p < 1, q = 1 − p, then we say X follows the negative binomial distribution and denote it as X ~ NB(m, p).
If X ~ NB(m, p), then its distribution function is

NB(x; m, p) = Σ_{k=0}^{[x]} C_{k+m−1}^k p^m q^k for x ≥ 0, and 0 for x < 0.
The negative binomial distribution is also called the Pascal distribution. It is a direct generalization of the binomial distribution. Consider success-failure trials (Bernoulli trials) with success probability p. Let X be the total number of trials until the mth "success" occurs; then X − m follows the negative binomial distribution NB(m, p), that is, the total number of "failures" follows the negative binomial distribution NB(m, p).
Negative binomial distribution has the following properties:
1. Let nb(k; m, p) = C_{k+m−1}^k p^m q^k, where 0 < p < 1, k = 0, 1, . . .. Then

nb(k + 1; m, p) = ((m + k)q/(k + 1)) · nb(k; m, p).

Therefore, nb(k; m, p) increases monotonically in k for k < (m − 1)/p − m, and decreases monotonically in k for k > (m − 1)/p − m.
2. The binomial distribution B(m, p) and the negative binomial distribution NB(r, p) are related by

NB(x; r, p) = 1 − B(r − 1; r + [x], p).
3. NB(x; m, p) = I_p(m, [x] + 1), where I_p(·, ·) is the incomplete Beta function ratio.
4. If X ~ NB(m, p), then the k-th moment of X is

E(X^k) = Σ_{i=1}^k S_2(k, i) m^{[i]} (q/p)^i,

where m^{[i]} = m(m + 1) · · · (m + i − 1), 1 ≤ i ≤ k, and S_2(k, i) is the Stirling number of the second kind.
5. If X ~ NB(m, p), then E(X) = mq/p and Var(X) = mq/p^2.
6. If X ~ NB(m, p), then its skewness and kurtosis are s = (1 + q)/(mq)^{1/2} and κ = (6q + p^2)/(mq), respectively.
7. If X ~ NB(m, p), then the moment-generating function and the characteristic function of X are M(t) = p^m(1 − qe^t)^{−m} and φ(t) = p^m(1 − qe^{it})^{−m}, respectively.
8. If X_i ~ NB(m_i, p) for 1 ≤ i ≤ n, and X_1, X_2, . . . , X_n are mutually independent, then

Σ_{i=1}^n X_i ~ NB(Σ_{i=1}^n m_i, p).
9. If X ~ NB(m, p), then there exists a sequence of random variables X_1, . . . , X_m, independent and identically distributed as G(p), such that

X = X_1 + · · · + X_m − m,

where G(p) is the geometric distribution in Sec. 1.11.
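Property 9 suggests a simple simulation: add m geometric variables and subtract m, then compare the sample mean with mq/p from property 5. A minimal sketch (m, p, the seed and the replication count are illustrative assumptions):

```python
import random
import statistics

def geometric(p, rng):
    """Number of Bernoulli(p) trials until the first success (support 1, 2, ...)."""
    k = 1
    while rng.random() >= p:
        k += 1
    return k

m, p = 3, 0.4
rng = random.Random(5)
# Property 9: X = X_1 + ... + X_m - m ~ NB(m, p) when X_i ~ G(p) i.i.d.
xs = [sum(geometric(p, rng) for _ in range(m)) - m for _ in range(50_000)]
print(statistics.fmean(xs))  # E(X) = mq/p = 3 * 0.6 / 0.4 = 4.5
```

Each geometric variable counts trials up to and including a success, so subtracting m leaves exactly the number of failures before the mth success, matching the trial description above.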
1.10. Hypergeometric Distribution2,3,4
Let N, M, n be positive integers satisfying M ≤ N, n ≤ N. If the random variable X takes integer values from the interval [max(0, M + n − N), min(M, n)], and the probability for X = k is

P{X = k} = C_M^k C_{N−M}^{n−k} / C_N^n,

where max(0, M + n − N) ≤ k ≤ min(M, n), then we say X follows the hypergeometric distribution and denote it as X ~ H(M, N, n).
If X ~ H(M, N, n), then the distribution function of X is

H(x; n, N, M) = Σ_{k=K_1}^{min([x], K_2)} C_M^k C_{N−M}^{n−k} / C_N^n for x ≥ K_1, and 0 for x < K_1,

where K_1 = max(0, M + n − N), K_2 = min(M, n).
The hypergeometric distribution is often used in the sampling inspection of products and has an important position in the theory of sampling inspection.

Assume that there are N products, M of which are non-conforming. We randomly draw n products from the N products without replacement. Let X be the number of non-conforming products among these n products; then X follows the hypergeometric distribution H(M, N, n).
Some properties of hypergeometric distribution are as follows:
1. Denote h(k; n, N, M) = C_M^k C_{N−M}^{n−k} / C_N^n. Then

h(k; n, N, M) = h(k; M, N, n),
h(k; n, N, M) = h(N − n − M + k; N − n, N, N − M),

where K_1 ≤ k ≤ K_2.
2. The distribution function of the hypergeometric distribution has the following expressions:

H(x; n, N, M) = H(N − n − M + x; N − n, N, N − M)
= 1 − H(n − x − 1; n, N, N − M)
= 1 − H(M − x − 1; N − n, N, M)

and

1 − H(n − 1; x + n, N, N − M) = H(x; n + x, N, M),

where x ≥ K_1.
3. If X ~ H(M, N, n), then its expectation and variance are

E(X) = nM/N, Var(X) = nM(N − n)(N − M) / (N^2(N − 1)).
For integers n and k, denote

n^{(k)} = n(n − 1) · · · (n − k + 1) for k < n, and n! for k ≥ n.
4. If X ~ H(M, N, n), the k-th moment of X is

E(X^k) = Σ_{i=1}^k S_2(k, i) n^{(i)} M^{(i)} / N^{(i)}.
5. If X ~ H(M, N, n), the skewness of X is

s = (N − 2M)(N − 1)^{1/2}(N − 2n) / ((NM(N − M)(N − n))^{1/2}(N − 2)).
6. If X ~ H(M, N, n), the moment-generating function and the characteristic function of X are

M(t) = ((N − n)!(N − M)! / (N!(N − M − n)!)) F(−n, −M; N − M − n + 1; e^t)

and

φ(t) = ((N − n)!(N − M)! / (N!(N − M − n)!)) F(−n, −M; N − M − n + 1; e^{it}),

respectively, where F(a, b; c; x) is the hypergeometric function, defined by

F(a, b; c; x) = 1 + (ab/c)(x/1!) + (a(a + 1)b(b + 1)/(c(c + 1)))(x^2/2!) + · · ·

with c > 0.
A typical application of the hypergeometric distribution is estimating the number of fish in a lake. To estimate it, one catches M fish, tags them, and puts them back into the lake. After a period of time, one re-catches n (n > M) fish from the lake, among which there are s tagged fish. M and n are given in advance. Let X be the number of tagged fish among the n re-caught fish. If the total number of fish in the lake is assumed to be N, then X follows the hypergeometric distribution H(M, N, n). According to property 3 above, E(X) = nM/N, and E(X) can be estimated by the observed number of tagged fish among the re-caught ones, i.e., s ≈ E(X) = nM/N. Therefore, the estimated total number of fish in the lake is N̂ = nM/s.
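The capture-recapture estimate N̂ = nM/s can be tried in a simulation: tag M fish in a lake of known size, re-catch n without replacement, and invert the count. A minimal sketch (the lake size, M, n and the seed are illustrative assumptions):

```python
import random

def estimate_fish(N_true, M, n, rng):
    """Tag M fish, re-catch n without replacement, estimate N by nM/s."""
    lake = [1] * M + [0] * (N_true - M)  # 1 marks a tagged fish
    s = sum(rng.sample(lake, n))          # tagged fish among the n re-caught
    return n * M / s

rng = random.Random(3)
print(estimate_fish(N_true=10_000, M=500, n=400, rng=rng))  # near 10,000
```

`random.sample` draws without replacement, so `s` is exactly a hypergeometric H(M, N, n) count, and nM/s recovers the true N up to sampling noise.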
1.11. Geometric Distribution2,3,4
If the random variable X takes positive integer values with probabilities

P{X = k} = q^{k−1} p, k = 1, 2, . . . ,

where 0 < p ≤ 1, q = 1 − p, then we say X follows the geometric distribution and denote it as X ~ G(p). If X ~ G(p), then the distribution function of X is

G(x; p) = 1 − q^{[x]} for x ≥ 0, and 0 for x < 0.
The geometric distribution is so named because the sum of its probabilities forms a geometric series.
Consider a trial (Bernoulli trial) whose outcome can be classified as either a "success" or a "failure", and let p be the probability that the trial is a "success". Suppose the trials can be performed repeatedly and independently. Let X be the number of trials required until the first success occurs; then X follows the geometric distribution G(p).
Some properties of geometric distribution are as follows:
1. Denote g(k; p) = pq^{k−1}, k = 1, 2, . . ., 0 < p < 1. Then g(k; p) is a monotonically decreasing function of k, that is,

g(1; p) > g(2; p) > g(3; p) > · · ·.
2. If X ~ G(p), then the expectation and variance of X are E(X) = 1/p and Var(X) = q/p^2, respectively.
3. If X ~ G(p), then the k-th moment of X is

E(X^k) = Σ_{i=1}^k S_2(k, i) i! q^{i−1}/p^i,

where S_2(k, i) is the Stirling number of the second kind.
4. If X ~ G(p), the skewness of X is s = q^{1/2} + q^{−1/2}.
5. If X ~ G(p), the moment-generating function and the characteristic function of X are M(t) = pe^t(1 − qe^t)^{−1} and φ(t) = pe^{it}(1 − qe^{it})^{−1}, respectively.
6. If X ~ G(p), then

P{X > n + m | X > n} = P{X > m}

for any natural numbers n and m.
Property 6 is also known as the "memoryless property" of the geometric distribution. It indicates that, in a success-failure test, if n trials have produced no "success", the probability that a further m trials still produce no "success" has nothing to do with the outcome of the first n trials.

The "memoryless property" characterizes the geometric distribution: it can be proved that a discrete random variable taking natural-number values must follow a geometric distribution if it satisfies the "memoryless property".
7. If X ~ G(p), then E(X | X > n) = n + E(X).
8. Suppose X and Y are independent discrete random variables. Then min(X, Y) is independent of X − Y if and only if X and Y follow the same geometric distribution.
1.12. Gamma Distribution2,3,4
If the density function of the random variable X is

g(x; α, λ) = (λ^α x^{α−1}/Γ(α)) e^{−λx} for x ≥ 0, and 0 for x < 0,

where α > 0, λ > 0 and Γ(·) is the Gamma function, then we say X follows the Gamma distribution with shape parameter α and scale parameter λ, and denote it as X ~ Γ(α, λ).
If X ~ Γ(α, λ), then the distribution function of X is

Γ(x; α, λ) = ∫_0^x (λ^α t^{α−1}/Γ(α)) e^{−λt} dt for x ≥ 0, and 0 for x < 0.
The Gamma distribution is so named because the form of its density is similar to the Gamma function. It is commonly used in reliability theory to describe the life of a product.
When λ = 1, Γ(α, 1) is called the standard Gamma distribution, and its density function is

g(x; α, 1) = (x^{α−1}/Γ(α)) e^{−x} for x ≥ 0, and 0 for x < 0.
When α = 1, Γ(1, λ) is called the single-parameter Gamma distribution; it is also the exponential distribution E(λ), with density function

g(x; 1, λ) = λe^{−λx} for x ≥ 0, and 0 for x < 0.
More generally, a Gamma distribution with three parameters can be obtained by means of a translation transformation, and the corresponding density function is

g(x; α, λ, δ) = (λ^α (x − δ)^{α−1}/Γ(α)) e^{−λ(x−δ)} for x ≥ δ, and 0 for x < δ.
Some properties of the Gamma distribution are as follows:

1. If X ∼ Γ(α, λ), then λX ∼ Γ(α, 1). That is, a general Gamma distribution can be transformed into the standard Gamma distribution by a scale transformation.
2. For x ≥ 0, denote

    I_α(x) = (1/Γ(α)) ∫₀^x t^(α−1) e^(−t) dt,

the incomplete Gamma function ratio; then Γ(x; α, λ) = I_α(λx). In particular, Γ(x; 1, λ) = 1 − e^(−λx).
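Property 2 can be illustrated numerically. The sketch below (Python, standard library only; the midpoint-rule integration is a rough numerical device, not production code, and the parameter values are arbitrary) approximates I_α(x) and checks it against the exponential special case and against the relationship Γ(x; 1/2, 1) = 2Φ(√(2x)) − 1 = erf(√x).

```python
import math

def inc_gamma_ratio(alpha, x, steps=200_000):
    """I_alpha(x) = (1/Gamma(alpha)) * integral_0^x t^(alpha-1) e^(-t) dt,
    approximated by a simple midpoint rule."""
    h = x / steps
    s = sum(((i + 0.5) * h) ** (alpha - 1) * math.exp(-(i + 0.5) * h)
            for i in range(steps))
    return s * h / math.gamma(alpha)

lam, x = 2.0, 1.5
# Gamma(x; 1, lam) = I_1(lam * x) = 1 - exp(-lam * x)
assert abs(inc_gamma_ratio(1.0, lam * x) - (1 - math.exp(-lam * x))) < 1e-6
# Gamma(x; 1/2, 1) = I_{1/2}(x) = 2*Phi(sqrt(2x)) - 1 = erf(sqrt(x))
assert abs(inc_gamma_ratio(0.5, x) - math.erf(math.sqrt(x))) < 1e-2
print("checks passed")
```

The looser tolerance in the second assertion reflects the integrable singularity of t^(−1/2) at zero, which the crude midpoint rule handles only approximately.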
3. Several relationships between Gamma distributions are as follows:

(1) Γ(x; α, 1) − Γ(x; α + 1, 1) = g(x; α + 1, 1);
(2) Γ(x; 1/2, 1) = 2Φ(√(2x)) − 1, where Φ(x) is the standard normal distribution function.
4. If X ∼ Γ(α, λ), then the expectation of X is E(X) = α/λ and the variance of X is Var(X) = α/λ².
5. If X ∼ Γ(α, λ), then the k-th moment of X is E(X^k) = λ^(−k) Γ(k + α)/Γ(α).
6. If X ∼ Γ(α, λ), the skewness of X is s = 2α^(−1/2) and the kurtosis of X is κ = 6/α.
7. If X ∼ Γ(α, λ), the moment-generating function of X is M(t) = (λ/(λ − t))^α for t < λ, and the characteristic function of X is φ(t) = (λ/(λ − it))^α.
8. If X_i ∼ Γ(α_i, λ) for 1 ≤ i ≤ n, and X_1, X_2, . . . , X_n are mutually independent, then

    Σ_{i=1}^n X_i ∼ Γ(Σ_{i=1}^n α_i, λ).
9. If X ∼ Γ(α_1, 1), Y ∼ Γ(α_2, 1), and X is independent of Y, then X + Y is independent of X/Y. Conversely, if X and Y are mutually independent, non-negative and non-degenerate random variables, and moreover X + Y is independent of X/Y, then both X and Y follow standard Gamma distributions.
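The moment formulas in properties 4-6 are consistent with one another, which is easy to cross-check with math.gamma; the parameter values below are arbitrary.

```python
import math

def gamma_moment(alpha, lam, k):
    """E(X^k) for X ~ Gamma(alpha, lam): lam**(-k) * Gamma(k+alpha)/Gamma(alpha)."""
    return lam ** (-k) * math.gamma(k + alpha) / math.gamma(alpha)

alpha, lam = 3.5, 2.0  # arbitrary example parameters
m1 = gamma_moment(alpha, lam, 1)
m2 = gamma_moment(alpha, lam, 2)
m3 = gamma_moment(alpha, lam, 3)

assert abs(m1 - alpha / lam) < 1e-12                 # property 4: mean
var = m2 - m1 ** 2
assert abs(var - alpha / lam ** 2) < 1e-12           # property 4: variance
mu3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3                 # third central moment
assert abs(mu3 / var ** 1.5 - 2 / math.sqrt(alpha)) < 1e-9  # property 6: skewness
```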
1.13. Beta Distribution [2,3,4]

If the density function of the random variable X is

    f(x; a, b) = x^(a−1) (1 − x)^(b−1) / B(a, b)  for 0 ≤ x ≤ 1,   and   f(x; a, b) = 0  otherwise,

where a > 0, b > 0, and B(·, ·) is the Beta function, then we say X follows the Beta distribution with parameters a and b, and denote it as X ∼ BE(a, b).
If X ∼ BE(a, b), then the distribution function of X is

    BE(x; a, b) = 0  for x ≤ 0;   I_x(a, b)  for 0 < x ≤ 1;   1  for x > 1,

where I_x(a, b) is the incomplete Beta function ratio.
Similar to the Gamma distribution, the Beta distribution is named for the resemblance of its density function to the Beta function. In particular, when a = b = 1, BE(1, 1) is the standard uniform distribution U(0, 1).
Some properties of the Beta distribution are as follows:

1. If X ∼ BE(a, b), then 1 − X ∼ BE(b, a).
2. The density function of the Beta distribution has the following properties:

(1) when a < 1, b ≥ 1, the density function is monotonically decreasing;
(2) when a ≥ 1, b < 1, the density function is monotonically increasing;
(3) when a < 1, b < 1, the density function curve is U-shaped;
(4) when a > 1, b > 1, the density function curve has a single peak;
(5) when a = b, the density function curve is symmetric about x = 1/2.
3. If X ∼ BE(a, b), then the k-th moment of X is E(X^k) = B(a + k, b)/B(a, b).
4. If X ∼ BE(a, b), then the expectation and variance of X are E(X) = a/(a + b) and Var(X) = ab/((a + b + 1)(a + b)²), respectively.
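Properties 3 and 4 can be cross-checked together, since the mean and variance follow from the k-th moment formula. A short Python sketch (a and b arbitrary; the Beta function is computed from Gamma functions):

```python
import math

def beta_fn(a, b):
    """B(a, b) = Gamma(a) Gamma(b) / Gamma(a+b)."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)

def beta_moment(a, b, k):
    """E(X^k) for X ~ BE(a, b): B(a+k, b) / B(a, b) (property 3)."""
    return beta_fn(a + k, b) / beta_fn(a, b)

a, b = 2.0, 5.0
m1, m2 = beta_moment(a, b, 1), beta_moment(a, b, 2)
assert abs(m1 - a / (a + b)) < 1e-12                              # mean
assert abs(m2 - m1 ** 2 - a * b / ((a + b + 1) * (a + b) ** 2)) < 1e-12  # variance
```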
5. If X ∼ BE(a, b), the skewness of X is

    s = 2(b − a)(a + b + 1)^(1/2) / ((a + b + 2)(ab)^(1/2)),

and the kurtosis of X is

    κ = 6[(a − b)²(a + b + 1) − ab(a + b + 2)] / (ab(a + b + 2)(a + b + 3)).
6. If X ∼ BE(a, b), the moment-generating function and the characteristic function of X are

    M(t) = (Γ(a + b)/Γ(a)) Σ_{k=0}^∞ Γ(a + k) t^k / (Γ(a + b + k) Γ(k + 1)),
    φ(t) = (Γ(a + b)/Γ(a)) Σ_{k=0}^∞ Γ(a + k) (it)^k / (Γ(a + b + k) Γ(k + 1)),

respectively.
7. Suppose X_1, X_2, . . . , X_n are mutually independent, X_i ∼ BE(a_i, b_i) for 1 ≤ i ≤ n, and a_{i+1} = a_i + b_i for 1 ≤ i ≤ n − 1; then

    Π_{i=1}^n X_i ∼ BE(a_1, Σ_{i=1}^n b_i).
8. Suppose X_1, X_2, . . . , X_n are independent and identically distributed random variables with common distribution U(0, 1); then min(X_1, . . . , X_n) ∼ BE(1, n). Conversely, if X_1, X_2, . . . , X_n are independent and identically distributed random variables and min(X_1, . . . , X_n) ∼ U(0, 1), then X_1 ∼ BE(1, 1/n).
9. Suppose X_1, X_2, . . . , X_n are independent and identically distributed random variables with common distribution U(0, 1), and denote

    X_(1,n) ≤ X_(2,n) ≤ · · · ≤ X_(n,n)

as the corresponding order statistics. Then

    X_(k,n) ∼ BE(k, n − k + 1),  1 ≤ k ≤ n,
    X_(k,n) − X_(i,n) ∼ BE(k − i, n − k + i + 1),  1 ≤ i < k ≤ n.
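Property 9 can be illustrated by simulation: BE(k, n − k + 1) has mean k/(n + 1), so the empirical mean of the k-th order statistic of n uniforms should be close to it. A Python sketch (seed, sample size, and the choice n = 5, k = 2 are arbitrary):

```python
import random

random.seed(1)
n, k, trials = 5, 2, 100_000

# Empirical mean of the k-th order statistic of n iid U(0,1) draws.
acc = 0.0
for _ in range(trials):
    u = sorted(random.random() for _ in range(n))
    acc += u[k - 1]
emp_mean = acc / trials

# Property 9: X_(k,n) ~ BE(k, n-k+1), whose mean is k/(n+1).
assert abs(emp_mean - k / (n + 1)) < 0.01
print(round(emp_mean, 3))
```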
10. Suppose X_1, X_2, . . . , X_n are independent and identically distributed random variables with common distribution BE(a, 1). Let Y = min(X_1, . . . , X_n); then Y^a ∼ BE(1, n).
11. If X ∼ BE(a, b), where a and b are positive integers, then

    BE(x; a, b) = Σ_{i=a}^{a+b−1} C_{a+b−1}^i x^i (1 − x)^(a+b−1−i).
1.14. Chi-square Distribution [2,3,4]

If Y_1, Y_2, . . . , Y_n are mutually independent and identically distributed random variables with common distribution N(0, 1), then we say the random variable X = Σ_{i=1}^n Y_i² follows the Chi-square distribution (χ² distribution) with n degrees of freedom, and denote it as X ∼ χ²_n.
If X ∼ χ²_n, then the density function of X is

    f(x; n) = e^(−x/2) x^(n/2−1) / (2^(n/2) Γ(n/2))  for x > 0,   and   f(x; n) = 0  for x ≤ 0,

where Γ(n/2) is the Gamma function.
The Chi-square distribution is derived from the normal distribution and plays an important role in statistical inference for normal distributions. When the degree of freedom n is large, the Chi-square distribution χ²_n is approximately normal.
Some properties of the Chi-square distribution are as follows:

1. If X_1 ∼ χ²_n, X_2 ∼ χ²_m, and X_1 and X_2 are independent, then X_1 + X_2 ∼ χ²_{n+m}. This is the "additive property" of the Chi-square distribution.
2. Let f(x; n) be the density function of the Chi-square distribution χ²_n. Then f(x; n) is monotonically decreasing when n ≤ 2, and f(x; n) is a single-peak function with maximum point n − 2 when n ≥ 3.
3. If X ∼ χ²_n, then the k-th moment of X is

    E(X^k) = 2^k Γ(n/2 + k)/Γ(n/2) = 2^k Π_{i=0}^{k−1} (n/2 + i).
4. If X ∼ χ²_n, then

    E(X) = n,   Var(X) = 2n.
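Property 4 is easy to check by simulating χ²_n directly from its definition as a sum of squared standard normals (Python sketch; seed, n, and sample size are arbitrary):

```python
import random

random.seed(2)
n, trials = 4, 50_000

# chi^2_n sample: sum of n squared N(0,1) draws.
samples = [sum(random.gauss(0.0, 1.0) ** 2 for _ in range(n))
           for _ in range(trials)]

mean = sum(samples) / trials
var = sum((s - mean) ** 2 for s in samples) / (trials - 1)

# Property 4: E(X) = n, Var(X) = 2n.
assert abs(mean - n) < 0.1
assert abs(var - 2 * n) < 0.5
```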
5. If X ∼ χ²_n, then the skewness of X is s = 2√2 n^(−1/2), and the kurtosis of X is κ = 12/n.
6. If X ∼ χ²_n, the moment-generating function of X is M(t) = (1 − 2t)^(−n/2) for t < 1/2, and the characteristic function of X is φ(t) = (1 − 2it)^(−n/2).
7. Let K(x; n) be the distribution function of the Chi-square distribution χ²_n; then

(1) K(x; 2n) = 1 − 2 Σ_{i=1}^n f(x; 2i);
(2) K(x; 2n + 1) = 2Φ(√x) − 1 − 2 Σ_{i=1}^n f(x; 2i + 1);
(3) K(x; n) − K(x; n + 2) = (x/2)^(n/2) e^(−x/2) / Γ((n + 2)/2),

where Φ(x) is the standard normal distribution function.
8. If X ∼ χ²_m, Y ∼ χ²_n, and X and Y are independent, then X/(X + Y) ∼ BE(m/2, n/2), and X/(X + Y) is independent of X + Y.
Let X_1, X_2, . . . , X_n be a random sample from the normal population N(μ, σ²). Denote

    X̄ = (1/n) Σ_{i=1}^n X_i,   S² = Σ_{i=1}^n (X_i − X̄)²;

then S²/σ² ∼ χ²_{n−1} and is independent of X̄.
1.14.1. Non-central Chi-square distribution

Suppose the random variables Y_1, . . . , Y_n are mutually independent with Y_i ∼ N(μ_i, 1) for 1 ≤ i ≤ n. Then the distribution of the random variable X = Σ_{i=1}^n Y_i² is the non-central Chi-square distribution with degree of freedom n and non-centrality parameter δ = Σ_{i=1}^n μ_i², denoted as χ²_{n,δ}. In particular, χ²_{n,0} = χ²_n.
9. Suppose Y_1, . . . , Y_m are mutually independent, and Y_i ∼ χ²_{n_i, δ_i} for 1 ≤ i ≤ m; then Σ_{i=1}^m Y_i ∼ χ²_{n,δ}, where n = Σ_{i=1}^m n_i and δ = Σ_{i=1}^m δ_i.
10. If X ∼ χ²_{n,δ}, then E(X) = n + δ, Var(X) = 2(n + 2δ), the skewness of X is s = √8 (n + 3δ)/(n + 2δ)^(3/2), and the kurtosis of X is κ = 12(n + 4δ)/(n + 2δ)².
11. If X ∼ χ²_{n,δ}, then the moment-generating function and the characteristic function of X are M(t) = (1 − 2t)^(−n/2) exp{tδ/(1 − 2t)} and φ(t) = (1 − 2it)^(−n/2) exp{itδ/(1 − 2it)}, respectively.
1.15. t Distribution [2,3,4]

Assume X ∼ N(0, 1), Y ∼ χ²_n, and X is independent of Y. We say the random variable T = √n X/√Y follows the t distribution with n degrees of freedom, and denote it as T ∼ t_n.
If X ∼ t_n, then the density function of X is

    t(x; n) = Γ((n + 1)/2) / ((nπ)^(1/2) Γ(n/2)) · (1 + x²/n)^(−(n+1)/2),  for −∞ < x < ∞.
Define T(x; n) as the distribution function of the t distribution t_n; then

    T(x; n) = 1 − (1/2) I_{n/(n+x²)}(n/2, 1/2)  for x ≥ 0,   and   T(x; n) = (1/2) I_{n/(n+x²)}(n/2, 1/2)  for x < 0,

where I_{n/(n+x²)}(n/2, 1/2) is the incomplete Beta function ratio.
Similar to the Chi-square distribution, the t distribution can be derived from the normal and Chi-square distributions. It has a wide range of applications in statistical inference on normal distributions. When n is large, the t distribution t_n with n degrees of freedom can be approximated by the standard normal distribution.
The t distribution has the following properties:

1. The density function of the t distribution, t(x; n), is symmetric about x = 0 and reaches its maximum at x = 0.
2. lim_{n→∞} t(x; n) = (1/√(2π)) e^(−x²/2) = φ(x); that is, the limiting distribution of the t distribution is the standard normal distribution as the degree of freedom n goes to infinity.
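Property 2 can be observed numerically by evaluating the t density at a few points and comparing with the standard normal density (Python sketch; lgamma is used to avoid overflow for large n, and the evaluation points are arbitrary):

```python
import math

def t_density(x, n):
    """Density t(x; n) of the t distribution with n degrees of freedom."""
    log_c = (math.lgamma((n + 1) / 2) - math.lgamma(n / 2)
             - 0.5 * math.log(n * math.pi))
    return math.exp(log_c - (n + 1) / 2 * math.log(1 + x * x / n))

def phi(x):
    """Standard normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

for x in (0.0, 1.0, 2.5):
    # the approximation improves as n grows ...
    assert abs(t_density(x, 500) - phi(x)) < abs(t_density(x, 5) - phi(x))
    # ... and is already close for n = 500
    assert abs(t_density(x, 500) - phi(x)) < 1e-3
```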
3. Assume X ∼ t_n. If k < n, then E(X^k) exists; otherwise E(X^k) does not exist. The k-th moment of X is

    E(X^k) = 0,  if 0 < k < n and k is odd;
    E(X^k) = Γ((k + 1)/2) Γ((n − k)/2) n^(k/2) / (√π Γ(n/2)),  if 0 < k < n and k is even;
    E(X^k) does not exist,  if k ≥ n.
4. If X ∼ t_n, then E(X) = 0. When n ≥ 3, Var(X) = n/(n − 2).
5. If X ∼ t_n, then the skewness of X is 0. If n ≥ 5, the kurtosis of X is κ = 6/(n − 4).
6. Assume that X_1 and X_2 are independent and identically distributed random variables with common distribution χ²_n; then the random variable

    Y = (1/2) n^(1/2) (X_2 − X_1) / (X_1 X_2)^(1/2) ∼ t_n.
Suppose that X_1, X_2, . . . , X_n are random samples from the normal population N(μ, σ²), and define X̄ = (1/n) Σ_{i=1}^n X_i, S² = Σ_{i=1}^n (X_i − X̄)². Then

    T = √(n(n − 1)) (X̄ − μ)/S ∼ t_{n−1}.
1.15.1. Non-central t distribution

Suppose that X ∼ N(δ, 1), Y ∼ χ²_n, and X and Y are independent. Then T = √n X/√Y is a non-central t distributed random variable with n degrees of freedom and non-centrality parameter δ, denoted as T ∼ t_{n,δ}. In particular, t_{n,0} = t_n.
7. Let T(x; n, δ) be the distribution function of the non-central t distribution t_{n,δ}; then

    T(x; n, δ) = 1 − T(−x; n, −δ),
    T(0; n, δ) = Φ(−δ),
    T(1; 1, δ) = 1 − Φ²(δ/√2).

8. If X ∼ t_{n,δ}, then

    E(X) = δ √(n/2) Γ((n − 1)/2)/Γ(n/2)  for n > 1,

and

    Var(X) = n(1 + δ²)/(n − 2) − (E(X))²  for n > 2.
1.16. F Distribution [2,3,4]

Let X and Y be independent random variables such that X ∼ χ²_m and Y ∼ χ²_n. Define a new random variable F as F = (X/m)/(Y/n). Then the distribution of F is called the F distribution with degrees of freedom m and n, denoted as F ∼ F_{m,n}.
If X ∼ F_{m,n}, then the density function of X is

    f(x; m, n) = (m/n)^(m/2) x^(m/2−1) (1 + mx/n)^(−(m+n)/2) / B(m/2, n/2)  for x > 0,   and   f(x; m, n) = 0  for x ≤ 0.
Let F(x; m, n) be the distribution function of the F distribution F_{m,n}; then F(x; m, n) = I_a(m/2, n/2), where a = mx/(n + mx) and I_a(·, ·) is the incomplete Beta function ratio.
The F distribution is often used in hypothesis testing problems on two or more normal populations. It can also be used to approximate complicated distributions, and it plays an important role in statistical inference.

The F distribution has the following properties:
1. F distributions are generally skewed; the smaller n is, the more skewed the distribution.
2. When m = 1 or 2, f(x; m, n) decreases monotonically; when m > 2, f(x; m, n) is unimodal with mode n(m − 2)/((n + 2)m).
3. If X ∼ F_{m,n}, then Y = 1/X ∼ F_{n,m}.
4. If X ∼ t_n, then X² ∼ F_{1,n}.
5. If X ∼ F_{m,n}, then the k-th moment of X is

    E(X^k) = (n/m)^k Γ(m/2 + k) Γ(n/2 − k) / (Γ(m/2) Γ(n/2))  for 0 < k < n/2,

and E(X^k) = ∞ for k ≥ n/2.
6. Assume that X ∼ F_{m,n}. If n > 2, then E(X) = n/(n − 2); if n > 4, then

    Var(X) = 2n²(m + n − 2) / (m(n − 2)²(n − 4)).

7. Assume that X ∼ F_{m,n}. If n > 6, then the skewness of X is

    s = (2m + n − 2)(8(n − 4))^(1/2) / ((n − 6)(m(m + n − 2))^(1/2));

if n > 8, then the kurtosis of X is

    κ = 12((n − 2)²(n − 4) + m(m + n − 2)(5n − 22)) / (m(n − 6)(n − 8)(m + n − 2)).
8. When m is large enough and n > 4, the normal distribution function Φ(y) can be used to approximate the F distribution function F(x; m, n), where

    y = (x − n/(n − 2)) / ((n/(n − 2)) (2(n + m − 2)/(m(n − 4)))^(1/2)),

that is, F(x; m, n) ≈ Φ(y).
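The mean and variance of property 6 follow from the k-th moment formula of property 5, which gives a quick numerical cross-check (Python; m and n are arbitrary, and lgamma is used for numerical stability):

```python
import math

def f_moment(m, n, k):
    """E(X^k) for X ~ F(m, n), valid for k < n/2 (property 5)."""
    assert 2 * k < n
    return (n / m) ** k * math.exp(
        math.lgamma(m / 2 + k) + math.lgamma(n / 2 - k)
        - math.lgamma(m / 2) - math.lgamma(n / 2)
    )

m, n = 5, 10
# property 6: E(X) = n/(n-2) for n > 2
assert abs(f_moment(m, n, 1) - n / (n - 2)) < 1e-9
# property 6: Var(X) = 2n^2(m+n-2)/(m(n-2)^2(n-4)) for n > 4
var = f_moment(m, n, 2) - (n / (n - 2)) ** 2
assert abs(var - 2 * n**2 * (m + n - 2) / (m * (n - 2) ** 2 * (n - 4))) < 1e-9
```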
Suppose X ∼ F_{m,n}. Let Z_{m,n} = ln X. When both m and n are large enough, the distribution of Z_{m,n} can be approximated by the normal distribution N((1/2)(1/n − 1/m), (1/2)(1/m + 1/n)); that is, approximately,

    Z_{m,n} ∼ N((1/2)(1/n − 1/m), (1/2)(1/m + 1/n)).
Assume that X_1, . . . , X_m are random samples from the normal population N(μ_1, σ_1²) and Y_1, . . . , Y_n are random samples from the normal population N(μ_2, σ_2²). The testing problem of interest is whether σ_1 and σ_2 are equal.

Define σ̂_1² = (m − 1)^(−1) Σ_{i=1}^m (X_i − X̄)² and σ̂_2² = (n − 1)^(−1) Σ_{i=1}^n (Y_i − Ȳ)² as the estimators of σ_1² and σ_2², respectively. Then we have

    (m − 1) σ̂_1²/σ_1² ∼ χ²_{m−1},   (n − 1) σ̂_2²/σ_2² ∼ χ²_{n−1},

where σ̂_1² and σ̂_2² are independent. If σ_1² = σ_2², by the definition of the F distribution, the test statistic

    F = σ̂_1²/σ̂_2² = ((m − 1)^(−1) Σ_{i=1}^m (X_i − X̄)²) / ((n − 1)^(−1) Σ_{i=1}^n (Y_i − Ȳ)²) ∼ F_{m−1,n−1}.
1.16.1. Non-central F distribution

If X ∼ χ²_{m,δ}, Y ∼ χ²_n, and X and Y are independent, then F = (X/m)/(Y/n) follows a non-central F distribution with degrees of freedom m and n and non-centrality parameter δ, denoted as F ∼ F_{m,n,δ}. In particular, F_{m,n,0} = F_{m,n}.
10. If X ∼ t_{n,δ}, then X² ∼ F_{1,n,δ²}.
11. Assume that X ∼ F_{m,n,δ}. If n > 2, then E(X) = (m + δ)n/((n − 2)m); if n > 4, then

    Var(X) = 2(n/m)² [(m + δ)² + (m + 2δ)(n − 2)] / ((n − 2)²(n − 4)).
1.17. Multivariate Hypergeometric Distribution [2,3,4]

Suppose X = (X_1, . . . , X_n) is an n-dimensional random vector with n ≥ 2, which satisfies:

(1) 0 ≤ X_i ≤ N_i, 1 ≤ i ≤ n, Σ_{i=1}^n N_i = N;
(2) for positive integers m_1, . . . , m_n with Σ_{i=1}^n m_i = m, the probability of the event {X_1 = m_1, . . . , X_n = m_n} is

    P{X_1 = m_1, . . . , X_n = m_n} = Π_{i=1}^n C_{N_i}^{m_i} / C_N^m.

Then we say X follows the multivariate hypergeometric distribution, and denote it as X ∼ MH(N_1, . . . , N_n; m).
Suppose a jar contains balls of n different colors, and the number of balls of the i-th color is N_i, 1 ≤ i ≤ n. We draw m balls randomly from the jar without replacement, and denote by X_i the number of drawn balls of the i-th color, 1 ≤ i ≤ n. Then the random vector (X_1, . . . , X_n) follows the multivariate hypergeometric distribution MH(N_1, . . . , N_n; m).
Multivariate hypergeometric distribution has the following properties:

1. Suppose (X_1, . . . , X_n) ∼ MH(N_1, . . . , N_n; m). For 0 = j_0 < j_1 < · · · < j_s = n, let X_k* = Σ_{i=j_{k−1}+1}^{j_k} X_i and N_k* = Σ_{i=j_{k−1}+1}^{j_k} N_i for 1 ≤ k ≤ s; then (X_1*, . . . , X_s*) ∼ MH(N_1*, . . . , N_s*; m).

That is, merging components of a multivariate hypergeometric random vector yields a random vector that still follows a multivariate hypergeometric distribution.
2. Suppose (X_1, . . . , X_n) ∼ MH(N_1, . . . , N_n; m); then for any 1 ≤ k < n, we have

    P{X_1 = m_1, . . . , X_k = m_k} = C_{N_1}^{m_1} C_{N_2}^{m_2} · · · C_{N_k}^{m_k} C_{N_{k+1}*}^{m_{k+1}*} / C_N^m,

where N = Σ_{i=1}^n N_i, N_{k+1}* = Σ_{i=k+1}^n N_i, and m_{k+1}* = m − Σ_{i=1}^k m_i.

In particular, when k = 1 we have P{X_1 = m_1} = C_{N_1}^{m_1} C_{N−N_1}^{m−m_1} / C_N^m; that is, X_1 ∼ H(N_1, N, m).
The multivariate hypergeometric distribution is an extension of the hypergeometric distribution.
3. Suppose (X_1, . . . , X_n) ∼ MH(N_1, . . . , N_n; m), 0 < k < n; then

    P{X_1 = m_1, . . . , X_k = m_k | X_{k+1} = m_{k+1}, . . . , X_n = m_n} = C_{N_1}^{m_1} · · · C_{N_k}^{m_k} / C_{N*}^{m*},

where N* = Σ_{i=1}^k N_i and m* = m − Σ_{i=k+1}^n m_i. This indicates that, conditionally on X_{k+1} = m_{k+1}, . . . , X_n = m_n, the distribution of (X_1, . . . , X_k) is MH(N_1, . . . , N_k; m*).
4. Suppose X_i ∼ B(N_i, p) for 1 ≤ i ≤ n with 0 < p < 1, and X_1, . . . , X_n are mutually independent; then

    (X_1, . . . , X_n) | Σ_{i=1}^n X_i = m  ∼ MH(N_1, . . . , N_n; m).

This indicates that, when the sum of independent binomial random variables is given, the conditional joint distribution of these random variables is a multivariate hypergeometric distribution.
5. Suppose (X_1, . . . , X_n) ∼ MH(N_1, . . . , N_n; m). If N_i/N → p_i as N → ∞ for 1 ≤ i ≤ n, then the distribution of (X_1, . . . , X_n) converges to the multinomial distribution PN(m; p_1, . . . , p_n).
In order to control the number of cars, the government decides to implement a random license-plate lottery policy: each participant has the same probability of obtaining a new license plate, and 10 quotas are allotted each issue. Suppose 100 people participate in the lottery, among whom 10 are civil servants, 50 are individual households, 30 are workers of state-owned enterprises, and the remaining 10 are university professors. Denote by X_1, X_2, X_3, X_4 the numbers of lottery winners among civil servants, individual households, workers of state-owned enterprises and university professors, respectively. The random vector (X_1, X_2, X_3, X_4) then follows the multivariate hypergeometric distribution MH(10, 50, 30, 10; 10). Therefore, in the next issue, the probability of the outcome X_1 = 7, X_2 = 1, X_3 = 1, X_4 = 1 is

    P{X_1 = 7, X_2 = 1, X_3 = 1, X_4 = 1} = C_{10}^7 C_{50}^1 C_{30}^1 C_{10}^1 / C_{100}^{10}.
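This probability can be evaluated directly with Python's math.comb:

```python
from math import comb

# Winners drawn without replacement: (X1, X2, X3, X4) ~ MH(10, 50, 30, 10; 10).
num = comb(10, 7) * comb(50, 1) * comb(30, 1) * comb(10, 1)
den = comb(100, 10)
p = num / den
print(p)  # a very small probability, on the order of 1e-7
```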
1.18. Multivariate Negative Binomial Distribution [2,3,4]

Suppose X = (X_1, . . . , X_n) is a random vector of dimension n (n ≥ 2) which satisfies:

(1) X_i takes non-negative integer values, 1 ≤ i ≤ n;
(2) the probability of the event {X_1 = x_1, . . . , X_n = x_n} is

    P{X_1 = x_1, . . . , X_n = x_n} = ((x_1 + · · · + x_n + k − 1)! / (x_1! · · · x_n! (k − 1)!)) p_0^k p_1^{x_1} · · · p_n^{x_n},

where 0 < p_i < 1 for 0 ≤ i ≤ n, Σ_{i=0}^n p_i = 1, and k is a positive integer. Then we say X follows the multivariate negative binomial distribution, denoted as X ∼ MNB(k; p_1, . . . , p_n).

Suppose that some test has (n + 1) different possible outcomes, exactly one of which occurs in each trial, the i-th with probability p_i, 1 ≤ i ≤ n + 1. The sequence of trials continues until the (n + 1)-th outcome has occurred k times. At that moment, denote by X_i the total number of occurrences of the i-th outcome, 1 ≤ i ≤ n; then the random vector (X_1, . . . , X_n) follows the multivariate negative binomial distribution MNB(k; p_1, . . . , p_n).
Multivariate negative binomial distribution has the following properties:

1. Suppose (X_1, . . . , X_n) ∼ MNB(k; p_1, . . . , p_n). For 0 = j_0 < j_1 < · · · < j_s = n, let X_k* = Σ_{i=j_{k−1}+1}^{j_k} X_i and p_k* = Σ_{i=j_{k−1}+1}^{j_k} p_i for 1 ≤ k ≤ s; then (X_1*, . . . , X_s*) ∼ MNB(k; p_1*, . . . , p_s*).

That is, merging components of a multivariate negative binomial random vector yields a random vector that still follows a multivariate negative binomial distribution.
2. If (X_1, . . . , X_n) ∼ MNB(k; p_1, . . . , p_n), then the descending factorial moments satisfy

    E(X_1^{(r_1)} · · · X_n^{(r_n)}) = (k + Σ_{i=1}^n r_i − 1)(k + Σ_{i=1}^n r_i − 2) · · · k · Π_{i=1}^n (p_i/p_0)^{r_i},

where X^{(r)} = X(X − 1) · · · (X − r + 1) and p_0 = 1 − Σ_{i=1}^n p_i.
3. If (X_1, . . . , X_n) ∼ MNB(k; p_1, . . . , p_n) and 1 ≤ s < n, then (X_1, . . . , X_s) ∼ MNB(k; p_1*, . . . , p_s*), where p_i* = p_i/(p_0 + p_1 + · · · + p_s) for 1 ≤ i ≤ s and p_0 = 1 − Σ_{i=1}^n p_i.

In particular, when s = 1, X_1 ∼ NB(k, p_0/(p_0 + p_1)).
1.19. Multivariate Normal Distribution [5,2]

A random vector X = (X_1, . . . , X_p)′ follows the multivariate normal distribution, denoted as X ∼ N_p(μ, Σ), if it has the density function

    f(x) = (2π)^(−p/2) |Σ|^(−1/2) exp{−(1/2)(x − μ)′ Σ^(−1) (x − μ)},

where x = (x_1, . . . , x_p)′ ∈ R^p, μ ∈ R^p, Σ is a p × p positive definite matrix, |·| denotes the matrix determinant, and ′ denotes matrix transposition.
The multivariate normal distribution is the extension of the normal distribution. It is the foundation of multivariate statistical analysis and thus plays an important role in statistics.

Let X_1, . . . , X_p be independent and identically distributed standard normal random variables; then the random vector X = (X_1, . . . , X_p)′ follows the standard multivariate normal distribution, denoted as X ∼ N_p(0, I_p), where I_p is the identity matrix of order p.
Some properties of the multivariate normal distribution are as follows:

1. The necessary and sufficient condition for X = (X_1, . . . , X_p)′ to follow a multivariate normal distribution is that a′X follows a (univariate) normal distribution for every a = (a_1, . . . , a_p)′ ∈ R^p.
2. If X ∼ N_p(μ, Σ), then E(X) = μ and Cov(X) = Σ.
3. If X ∼ N_p(μ, Σ), its moment-generating function and characteristic function are M(t) = exp{μ′t + (1/2)t′Σt} and φ(t) = exp{iμ′t − (1/2)t′Σt} for t ∈ R^p, respectively.
4. Any marginal distribution of a multivariate normal distribution is still multivariate normal. Let X = (X_1, . . . , X_p)′ ∼ N_p(μ, Σ), where μ = (μ_1, . . . , μ_p)′ and Σ = (σ_ij)_{p×p}. For any 1 ≤ q < p, set X^(1) = (X_1, . . . , X_q)′, μ^(1) = (μ_1, . . . , μ_q)′, Σ_11 = (σ_ij)_{1≤i,j≤q}; then X^(1) ∼ N_q(μ^(1), Σ_11). In particular, X_i ∼ N(μ_i, σ_ii) for 1 ≤ i ≤ p.
5. If X ∼ N_p(μ, Σ), B is a q × p constant matrix, and a is a q × 1 constant vector, then

    a + BX ∼ N_q(a + Bμ, BΣB′),

which implies that a linear transformation of a multivariate normal random vector still follows a normal distribution.
6. If X_i ∼ N_p(μ_i, Σ_i) for 1 ≤ i ≤ n, and X_1, . . . , X_n are mutually independent, then Σ_{i=1}^n X_i ∼ N_p(Σ_{i=1}^n μ_i, Σ_{i=1}^n Σ_i).
7. If X ∼ N_p(μ, Σ), then (X − μ)′ Σ^(−1) (X − μ) ∼ χ²_p.
8. Let X = (X_1, . . . , X_p)′ ∼ N_p(μ, Σ), and partition X, μ and Σ as follows:

    X = (X^(1)′, X^(2)′)′,   μ = (μ^(1)′, μ^(2)′)′,   Σ = [Σ_11  Σ_12; Σ_21  Σ_22],

where X^(1) and μ^(1) are q × 1 vectors and Σ_11 is a q × q matrix, q < p. Then X^(1) and X^(2) are mutually independent if and only if Σ_12 = 0.
9. Let X = (X_1, . . . , X_p)′ ∼ N_p(μ, Σ), partitioned in the same manner as in property 8; then the conditional distribution of X^(1) given X^(2) is

    N_q(μ^(1) + Σ_12 Σ_22^(−1) (X^(2) − μ^(2)), Σ_11 − Σ_12 Σ_22^(−1) Σ_21).
10. Let X = (X_1, . . . , X_p)′ ∼ N_p(μ, Σ), partitioned in the same manner as in property 8. Then X^(1) and X^(2) − Σ_21 Σ_11^(−1) X^(1) are independent, with X^(1) ∼ N_q(μ^(1), Σ_11) and

    X^(2) − Σ_21 Σ_11^(−1) X^(1) ∼ N_{p−q}(μ^(2) − Σ_21 Σ_11^(−1) μ^(1), Σ_22 − Σ_21 Σ_11^(−1) Σ_12).

Similarly, X^(2) and X^(1) − Σ_12 Σ_22^(−1) X^(2) are independent, with X^(2) ∼ N_{p−q}(μ^(2), Σ_22) and

    X^(1) − Σ_12 Σ_22^(−1) X^(2) ∼ N_q(μ^(1) − Σ_12 Σ_22^(−1) μ^(2), Σ_11 − Σ_12 Σ_22^(−1) Σ_21).
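Property 5 underlies the standard way to sample from N_p(μ, Σ): write X = μ + LZ with LL′ = Σ (a Cholesky factor) and Z ∼ N_p(0, I_p). A bivariate Python sketch, where μ and Σ are arbitrary illustrative values:

```python
import math
import random

random.seed(3)
mu = (1.0, -2.0)
# Sigma = [[4, 2], [2, 3]]; its lower-triangular Cholesky factor L:
l11 = 2.0
l21 = 2.0 / l11
l22 = math.sqrt(3.0 - l21 ** 2)

xs, ys = [], []
for _ in range(100_000):
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    xs.append(mu[0] + l11 * z1)            # X = mu + L Z, first coordinate
    ys.append(mu[1] + l21 * z1 + l22 * z2)  # second coordinate

mx = sum(xs) / len(xs)
my = sum(ys) / len(ys)
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

# Empirical mean and covariance should match mu and Sigma_12 = 2.
assert abs(mx - mu[0]) < 0.05 and abs(my - mu[1]) < 0.05
assert abs(cov - 2.0) < 0.1
```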
1.20. Wishart Distribution [5,6]

Let X_1, . . . , X_n be independent and identically distributed p-dimensional random vectors with common distribution N_p(0, Σ), and let X = (X_1, . . . , X_n) be the corresponding p × n random matrix. Then we say the p-th order random matrix W = XX′ = Σ_{i=1}^n X_i X_i′ follows the p-th order (central) Wishart distribution with n degrees of freedom, and denote it as W ∼ W_p(n, Σ). Here the distribution of a random matrix means the distribution of the random vector generated by vectorizing the matrix.

In particular, if p = 1 (so that Σ = σ²), we have W = Σ_{i=1}^n X_i² ∼ σ²χ²_n, which shows that the Wishart distribution is an extension of the Chi-square distribution.
If W ∼ W_p(n, Σ) with Σ > 0 and n ≥ p, then the density function of W is

    f_p(W) = |W|^((n−p−1)/2) exp{−(1/2) tr(Σ^(−1) W)} / (2^(np/2) |Σ|^(n/2) π^(p(p−1)/4) Π_{i=1}^p Γ((n − i + 1)/2)),

where W > 0 and "tr" denotes the trace of a matrix.
The Wishart distribution is a useful distribution in multivariate statistical analysis and plays an important role in statistical inference for the multivariate normal distribution.

Some properties of the Wishart distribution are as follows:

1. If W ∼ W_p(n, Σ), then E(W) = nΣ.
2. If W ∼ W_p(n, Σ) and C is a k × p constant matrix, then CWC′ ∼ W_k(n, CΣC′).
3. If W ∼ W_p(n, Σ), its characteristic function is E(e^{i tr(TW)}) = |I_p − 2iΣT|^(−n/2), where T is a real symmetric matrix of order p.
4. If W_i ∼ W_p(n_i, Σ) for 1 ≤ i ≤ k, and W_1, . . . , W_k are mutually independent, then Σ_{i=1}^k W_i ∼ W_p(Σ_{i=1}^k n_i, Σ).
5. Let X_1, . . . , X_n be independent and identically distributed p-dimensional random vectors with common distribution N_p(0, Σ), Σ > 0, and X = (X_1, . . . , X_n).

(1) If A is an idempotent matrix of order n, then the quadratic form matrix Q = XAX′ ∼ W_p(m, Σ), where m = r(A) and r(·) denotes the rank of a matrix.
(2) Let Q = XAX′ and Q_1 = XBX′, where both A and B are idempotent matrices. If Q_2 = Q − Q_1 = X(A − B)X′ ≥ 0, then Q_2 ∼ W_p(m − k, Σ), where m = r(A), k = r(B). Moreover, Q_1 and Q_2 are independent.
6. If W ∼ W_p(n, Σ), Σ > 0, n ≥ p, partition W and Σ into q-th order and (p − q)-th order parts as follows:

    W = [W_11  W_12; W_21  W_22],   Σ = [Σ_11  Σ_12; Σ_21  Σ_22];

then

(1) W_11 ∼ W_q(n, Σ_11);
(2) W_22 − W_21 W_11^(−1) W_12 and (W_11, W_21) are independent;
(3) W_22 − W_21 W_11^(−1) W_12 ∼ W_{p−q}(n − q, Σ_{2|1}), where Σ_{2|1} = Σ_22 − Σ_21 Σ_11^(−1) Σ_12.
7. If W ∼ W_p(n, Σ), Σ > 0, n > p + 1, then E(W^(−1)) = Σ^(−1)/(n − p − 1).
8. If W ∼ W_p(n, Σ), Σ > 0, n ≥ p, then |W| = |Σ| Π_{i=1}^p ξ_i, where ξ_1, . . . , ξ_p are mutually independent and ξ_i ∼ χ²_{n−i+1} for 1 ≤ i ≤ p.
9. If W ∼ W_p(n, Σ), Σ > 0, n ≥ p, then for any p-dimensional non-zero vector a, we have

    (a′ Σ^(−1) a) / (a′ W^(−1) a) ∼ χ²_{n−p+1}.
1.20.1. Non-central Wishart distribution

Let X_1, . . . , X_n be independent and identically distributed p-dimensional random vectors with common distribution N_p(μ, Σ), and let X = (X_1, . . . , X_n) be the corresponding p × n random matrix. Then the random matrix W = XX′ follows the non-central Wishart distribution with n degrees of freedom. When μ = 0, the non-central Wishart distribution becomes the (central) Wishart distribution W_p(n, Σ).
1.21. Hotelling T² Distribution [5,6]

Suppose that X ∼ N_p(0, Σ), W ∼ W_p(n, Σ), and X and W are independent. Let T² = nX′W^(−1)X; then we say the random variable T² follows the (central) Hotelling T² distribution with n degrees of freedom, and denote it as T² ∼ T_p²(n).
If p = 1, the Hotelling T² distribution is the square of the univariate t distribution; thus the Hotelling T² distribution is an extension of the t distribution.

The density function of the Hotelling T² distribution is

    f(t) = Γ((n + 1)/2) / (Γ(p/2) Γ((n − p + 1)/2)) · (1/n) (t/n)^((p−2)/2) (1 + t/n)^(−(n+1)/2),  t > 0.
Some properties of the Hotelling T² distribution are as follows:

1. If X and W are independent, X ∼ N_p(0, Σ), W ∼ W_p(n, Σ), then

    X′W^(−1)X ∼ χ²_p / χ²_{n−p+1},

where the numerator and denominator are two independent Chi-square random variables.
2. If T² ∼ T_p²(n), then

    ((n − p + 1)/(np)) T² ∼ (χ²_p / p) / (χ²_{n−p+1} / (n − p + 1)) ∼ F_{p,n−p+1}.

Hence, the Hotelling T² distribution can be transformed into an F distribution.
1.21.1. Non-central T² distribution

Assume X and W are independent, X ∼ N_p(μ, Σ), W ∼ W_p(n, Σ); then the random variable T² = nX′W^(−1)X follows the non-central Hotelling T² distribution with n degrees of freedom. When μ = 0, the non-central Hotelling T² distribution becomes the central Hotelling T² distribution.
3. Suppose that X and W are independent, X ∼ N_p(μ, Σ), W ∼ W_p(n, Σ). Let T² = nX′W^(−1)X; then

    ((n − p + 1)/(np)) T² ∼ (χ²_{p,a} / p) / (χ²_{n−p+1} / (n − p + 1)) ∼ F_{p,n−p+1,a},

where a = μ′ Σ^(−1) μ.
The Hotelling T² distribution can be used for testing the mean of a multivariate normal distribution. Let X_1, . . . , X_n be random samples of the multivariate normal population N_p(μ, Σ), where Σ > 0 is unknown and n > p. We want to test the hypothesis

    H_0: μ = μ_0  vs  H_1: μ ≠ μ_0.

Let X̄_n = (1/n) Σ_{i=1}^n X_i be the sample mean and V_n = Σ_{i=1}^n (X_i − X̄_n)(X_i − X̄_n)′ the sample dispersion matrix. The likelihood ratio test statistic is T² = n(n − 1)(X̄_n − μ_0)′ V_n^(−1) (X̄_n − μ_0). Under the null hypothesis H_0, we have T² ∼ T_p²(n − 1). Moreover, from property 2, we have ((n − p)/((n − 1)p)) T² ∼ F_{p,n−p}. Hence, the p-value of this Hotelling T² test is p = P{F_{p,n−p} ≥ ((n − p)/((n − 1)p)) T²}.
1.22. Wilks Distribution [5,6]

Assume that W_1 and W_2 are independent, W_1 ∼ W_p(n, Σ), W_2 ∼ W_p(m, Σ), where Σ > 0, n ≥ p. Let

    Λ = |W_1| / |W_1 + W_2|;

then the random variable Λ follows the Wilks distribution with degrees of freedom n and m, denoted as Λ_{p,n,m}.
Some properties of the Wilks distribution are as follows:

1. Λ_{p,n,m} has the same distribution as B_1 B_2 · · · B_p, where B_i ∼ BE((n − i + 1)/2, m/2) for 1 ≤ i ≤ p, and B_1, . . . , B_p are mutually independent.
2. Λ_{p,n,m} has the same distribution as Λ_{m,n+m−p,p}.
3. Some relationships between the Wilks distribution and the F distribution are:

(1) (n/m)(1 − Λ_{1,n,m})/Λ_{1,n,m} ∼ F_{m,n};
(2) ((n + 1 − p)/p)(1 − Λ_{p,n,1})/Λ_{p,n,1} ∼ F_{p,n+1−p};
(3) ((n − 1)/m)(1 − √Λ_{2,n,m})/√Λ_{2,n,m} ∼ F_{2m,2(n−1)};
(4) ((n + 1 − p)/p)(1 − √Λ_{p,n,2})/√Λ_{p,n,2} ∼ F_{2p,2(n+1−p)}.
The Wilks distribution often arises in likelihood ratio tests comparing several multivariate normal means. Suppose we have k mutually independent populations X_j ∼ N_p(μ_j, Σ), 1 ≤ j ≤ k, where Σ > 0 is unknown. Let x_{j1}, . . . , x_{jn_j} be the random samples of population X_j, 1 ≤ j ≤ k. Set n = Σ_{j=1}^k n_j, and assume n ≥ p + k. We want to test the hypothesis

    H_0: μ_1 = · · · = μ_k  vs  H_1: μ_1, . . . , μ_k are not all identical.

Set

    x̄_j = n_j^(−1) Σ_{i=1}^{n_j} x_{ji},  1 ≤ j ≤ k,
    V_j = Σ_{i=1}^{n_j} (x_{ji} − x̄_j)(x_{ji} − x̄_j)′,  1 ≤ j ≤ k,
    x̄ = Σ_{j=1}^k n_j x̄_j / n,

and let SSB = Σ_{j=1}^k n_j (x̄_j − x̄)(x̄_j − x̄)′ be the between-group variation and SSW = Σ_{j=1}^k V_j the within-group variation. The likelihood ratio test statistic is

    Λ = |SSW| / |SSW + SSB|.

Under the null hypothesis H_0, we have Λ ∼ Λ_{p,n−k,k−1}. Following the relationships between the Wilks distribution and the F distribution, we have the following conclusions:
(1) If k = 2, let

    F = ((n − p − 1)/p) · (1 − Λ)/Λ ∼ F_{p,n−p−1};

then the p-value of the test is p = P{F_{p,n−p−1} ≥ F}.

(2) If p = 2, let

    F = ((n − k − 1)/(k − 1)) · (1 − √Λ)/√Λ ∼ F_{2(k−1),2(n−k−1)};

then the p-value of the test is p = P{F_{2(k−1),2(n−k−1)} ≥ F}.
(3) If k = 3, let

    F = ((n − p − 2)/p) · (1 − √Λ)/√Λ ∼ F_{2p,2(n−p−2)};

then the p-value of the test is p = P{F_{2p,2(n−p−2)} ≥ F}.
References
1. Chow, YS, Teicher, H. Probability Theory: Independence, Interchangeability,
Martingales. New York: Springer, 1988.
2. Fang, K, Xu, J. Statistical Distributions. Beijing: Science Press, 1987.
3. Krishnamoorthy, K. Handbook of Statistical Distributions with Applications. Boca
Raton: Chapman and Hall/CRC, 2006.
4. Patel, JK, Kapadia, CH, and Owen, DB. Handbook of Statistical Distributions. New
York: Marcel Dekker, 1976.
5. Anderson, TW. An Introduction to Multivariate Statistical Analysis. New York: Wiley,
2003.
6. Wang, J. Multivariate Statistical Analysis. Beijing: Science Press, 2008.
Dr. Jian Shi, graduated from Peking University, is Professor at the Academy of Mathematics and Systems Science in the Chinese Academy of Sciences. His research interests include statistical inference, biomedical statistics, industrial statistics and statistics in sports. He has held and participated in several projects of the National Natural Science Foundation of China as well as applied projects.
where $\mu \in \mathbb{R}$ and $\sigma > 0$, then we say $X$ follows the normal distribution and denote it as $X \sim N(\mu, \sigma^2)$. In particular, when $\mu = 0$ and $\sigma = 1$, we say that $X$ follows the standard normal distribution $N(0,1)$.

If $X \sim N(\mu, \sigma^2)$, then the distribution function of $X$ is
$F(x) = \Phi\left(\dfrac{x-\mu}{\sigma}\right) = \int_{-\infty}^{x}\dfrac{1}{\sigma}\,\varphi\left(\dfrac{t-\mu}{\sigma}\right)dt.$
If $X$ follows the standard normal distribution $N(0,1)$, then the density and distribution functions of $X$ are $\varphi(x)$ and $\Phi(x)$, respectively.
The normal distribution is the most common continuous distribution and has the following properties:

1. If $X \sim N(\mu,\sigma^2)$, then $Y = \dfrac{X-\mu}{\sigma} \sim N(0,1)$; and if $X \sim N(0,1)$, then $Y = a + \sigma X \sim N(a, \sigma^2)$. Hence, a general normal distribution can be converted to the standard normal distribution by a linear transformation.
2. If $X \sim N(\mu,\sigma^2)$, then the expectation of $X$ is $E(X) = \mu$ and the variance of $X$ is $\mathrm{Var}(X) = \sigma^2$.
3. If $X \sim N(\mu,\sigma^2)$, then the $k$-th central moment of $X$ is
$E((X-\mu)^k) = \begin{cases} 0, & k \text{ is odd},\\[2pt] \dfrac{k!}{2^{k/2}(k/2)!}\,\sigma^k, & k \text{ is even}. \end{cases}$
4. If $X \sim N(\mu,\sigma^2)$, then the moments of $X$ are
$E(X^{2k-1}) = \sum_{i=1}^{k}\dfrac{(2k-1)!\,\mu^{2i-1}\sigma^{2(k-i)}}{(2i-1)!\,(k-i)!\,2^{k-i}}$
and
$E(X^{2k}) = \sum_{i=0}^{k}\dfrac{(2k)!\,\mu^{2i}\sigma^{2(k-i)}}{(2i)!\,(k-i)!\,2^{k-i}}$
for $k = 1, 2, \ldots$.
5. If $X \sim N(\mu,\sigma^2)$, then the skewness and the kurtosis of $X$ are both 0, i.e. $s = \kappa = 0$. This property can be used to check whether a distribution is normal.
6. If $X \sim N(\mu,\sigma^2)$, then the moment-generating function and the characteristic function of $X$ are $M(t) = \exp\{t\mu + \frac{1}{2}t^2\sigma^2\}$ and $\phi(t) = \exp\{it\mu - \frac{1}{2}t^2\sigma^2\}$, respectively.
7. If $X \sim N(\mu,\sigma^2)$, then
$a + bX \sim N(a + b\mu,\ b^2\sigma^2).$
8. If $X_i \sim N(\mu_i, \sigma_i^2)$ for $1 \le i \le n$, and $X_1, X_2, \ldots, X_n$ are mutually independent, then
$\sum_{i=1}^{n} X_i \sim N\left(\sum_{i=1}^{n}\mu_i,\ \sum_{i=1}^{n}\sigma_i^2\right).$
9. If $X_1, X_2, \ldots, X_n$ represent a random sample of the population $N(\mu,\sigma^2)$, then the sample mean $\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i$ satisfies $\bar{X}_n \sim N(\mu, \frac{\sigma^2}{n})$.

The central limit theorem: Suppose that $X_1, \ldots, X_n$ are independent and identically distributed random variables, and that $\mu = E(X_1)$ and $0 < \sigma^2 = \mathrm{Var}(X_1) < \infty$; then the distribution of $T_n = \sqrt{n}(\bar{X}_n - \mu)/\sigma$ is asymptotically standard normal when $n$ is large enough.
The central limit theorem reveals that the limiting distributions of statistics are, in many cases, (asymptotically) normal. Therefore, the normal distribution is the most widely used distribution in statistics.
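A small simulation (my own sketch, using numpy) illustrates the central limit theorem: even for a strongly skewed population such as $E(1)$, the standardized sample means $T_n$ are close to $N(0,1)$ for moderate $n$.

```python
import numpy as np

rng = np.random.default_rng(1)
# Exponential(1) has skewness 2, far from normal, yet the standardized
# sample means T_n = sqrt(n)(Xbar_n - mu)/sigma are nearly N(0, 1).
n, reps = 200, 20000
x = rng.exponential(1.0, size=(reps, n))       # mu = 1, sigma = 1
t = np.sqrt(n) * (x.mean(axis=1) - 1.0) / 1.0
print(t.mean(), t.std())                       # approximately 0 and 1
```

Comparing the empirical fraction of $T_n \le 1$ with $\Phi(1) \approx 0.8413$ gives a similarly close match.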
The normal distribution takes values on the whole real axis, i.e. from negative infinity to positive infinity. However, many variables in real problems take only positive values, for example, height, voltage and so on. In these cases, the logarithm of such a variable can often be regarded as normally distributed.

Log-normal distribution: Suppose $X > 0$. If $\ln X \sim N(\mu, \sigma^2)$, then we say $X$ follows the log-normal distribution and denote it as $X \sim LN(\mu, \sigma^2)$.
1.4. Exponential Distribution^{2,3,4}

If the density function of the random variable $X$ is
$f(x) = \begin{cases} \lambda e^{-\lambda x}, & x \ge 0,\\ 0, & x < 0, \end{cases}$
where $\lambda > 0$, then we say $X$ follows the exponential distribution and denote it as $X \sim E(\lambda)$. In particular, when $\lambda = 1$, we say $X$ follows the standard exponential distribution $E(1)$.

If $X \sim E(\lambda)$, then its distribution function is
$F(x;\lambda) = \begin{cases} 1 - e^{-\lambda x}, & x \ge 0,\\ 0, & x < 0. \end{cases}$

The exponential distribution is an important distribution in reliability. The life of an electronic product generally follows an exponential distribution. When the life of a product follows the exponential distribution $E(\lambda)$, $\lambda$ is called the failure rate of the product.

The exponential distribution has the following properties:

1. If $X \sim E(\lambda)$, then the $k$-th moment of $X$ is $E(X^k) = k!\,\lambda^{-k}$, $k = 1, 2, \ldots$.
2. If $X \sim E(\lambda)$, then $E(X) = \lambda^{-1}$ and $\mathrm{Var}(X) = \lambda^{-2}$.
3. If $X \sim E(\lambda)$, then its skewness is $s = 2$ and its kurtosis is $\kappa = 6$.
4. If $X \sim E(\lambda)$, then the moment-generating function and the characteristic function of $X$ are $M(t) = \frac{\lambda}{\lambda - t}$ for $t < \lambda$ and $\phi(t) = \frac{\lambda}{\lambda - it}$, respectively.
5. If $X \sim E(1)$, then $\lambda^{-1}X \sim E(\lambda)$ for $\lambda > 0$.
6. If $X \sim E(\lambda)$, then for any $x > 0$ and $y > 0$, there holds
$P\{X > x + y \mid X > y\} = P\{X > x\}.$
This is the so-called "memoryless property" of the exponential distribution. If the life distribution of a product is exponential, then no matter how long the product has been used, its remaining life follows the same distribution as that of a new product, provided it has not failed at the present time.
7. If $X \sim E(\lambda)$, then for any $a > 0$, there hold $E(X \mid X > a) = a + \lambda^{-1}$ and $\mathrm{Var}(X \mid X > a) = \lambda^{-2}$.
8. If $X$ and $Y$ are independent and identically distributed as $E(\lambda)$, then $\min(X, Y)$ is independent of $X - Y$, and
$\{X \mid X + Y = z\} \sim U(0, z).$
9. If $X_1, X_2, \ldots, X_n$ are random samples of the population $E(\lambda)$, let $X_{(1,n)} \le X_{(2,n)} \le \cdots \le X_{(n,n)}$ be the order statistics of $X_1, X_2, \ldots, X_n$. Write $Y_k = (n-k+1)(X_{(k,n)} - X_{(k-1,n)})$, $1 \le k \le n$, where $X_{(0,n)} = 0$. Then $Y_1, Y_2, \ldots, Y_n$ are independent and identically distributed as $E(\lambda)$.
10. If $X_1, X_2, \ldots, X_n$ are random samples of the population $E(\lambda)$, then $\sum_{i=1}^n X_i \sim \Gamma(n, \lambda)$, where $\Gamma(n,\lambda)$ is the Gamma distribution in Sec. 1.12.
11. If $Y \sim U(0,1)$, then $X = -\ln(Y) \sim E(1)$. Therefore, it is easy to generate exponentially distributed random numbers from uniform random numbers.
1.5. Weibull Distribution^{2,3,4}

If the density function of the random variable $X$ is
$f(x;\beta,\eta,\delta) = \begin{cases} \dfrac{\beta}{\eta}(x-\delta)^{\beta-1}\exp\left\{-\dfrac{(x-\delta)^{\beta}}{\eta}\right\}, & x \ge \delta,\\ 0, & x < \delta, \end{cases}$
then we say $X$ follows the Weibull distribution and denote it as $X \sim W(\beta, \eta, \delta)$, where $\delta$ is the location parameter, $\beta > 0$ is the shape parameter, and $\eta > 0$ is the scale parameter. For simplicity, we denote $W(\beta, \eta, 0)$ as $W(\beta, \eta)$.

In particular, when $\delta = 0$ and $\beta = 1$, the Weibull distribution $W(1, \eta)$ reduces to the exponential distribution $E(1/\eta)$.

If $X \sim W(\beta,\eta,\delta)$, then its distribution function is
$F(x;\beta,\eta,\delta) = \begin{cases} 1 - \exp\left\{-\dfrac{(x-\delta)^{\beta}}{\eta}\right\}, & x \ge \delta,\\ 0, & x < \delta. \end{cases}$

The Weibull distribution is an important distribution in reliability theory. It is often used to describe the life distribution of a product, such as an electronic product or a wear product.

The Weibull distribution has the following properties:

1. If $X \sim E(1)$, then
$Y = (X\eta)^{1/\beta} + \delta \sim W(\beta, \eta, \delta).$
Hence, the Weibull distribution and the exponential distribution can be converted to each other by transformation.
2. If $X \sim W(\beta,\eta)$, then the $k$-th moment of $X$ is
$E(X^k) = \Gamma\left(1 + \dfrac{k}{\beta}\right)\eta^{k/\beta},$
where $\Gamma(\cdot)$ is the Gamma function.
3. If $X \sim W(\beta,\eta,\delta)$, then
$E(X) = \Gamma\left(1 + \dfrac{1}{\beta}\right)\eta^{1/\beta} + \delta,$
$\mathrm{Var}(X) = \left[\Gamma\left(1 + \dfrac{2}{\beta}\right) - \Gamma^2\left(1 + \dfrac{1}{\beta}\right)\right]\eta^{2/\beta}.$
4. Suppose $X_1, X_2, \ldots, X_n$ are mutually independent and identically distributed random variables with common distribution $W(\beta,\eta,\delta)$; then
$X_{1,n} = \min(X_1, X_2, \ldots, X_n) \sim W(\beta, \eta/n, \delta).$
Conversely, if $X_{1,n} \sim W(\beta, \eta/n, \delta)$, then $X_1 \sim W(\beta, \eta, \delta)$.
1.5.1. The application of the Weibull distribution in reliability

The shape parameter $\beta$ usually describes the failure mechanism of a product. Weibull distributions with $\beta < 1$ are called "early failure" life distributions, those with $\beta = 1$ are called "occasional failure" life distributions, and those with $\beta > 1$ are called "wear-out (aging) failure" life distributions.

If $X \sim W(\beta,\eta,\delta)$, then its reliability function is
$R(x) = 1 - F(x;\beta,\eta,\delta) = \begin{cases} \exp\left\{-\dfrac{(x-\delta)^{\beta}}{\eta}\right\}, & x \ge \delta,\\ 1, & x < \delta. \end{cases}$

When the reliability $R$ of a product is given, then
$x_R = \delta + \eta^{1/\beta}(-\ln R)^{1/\beta}$
is the Q-percentile life of the product.

If $R = 0.5$, then $x_{0.5} = \delta + \eta^{1/\beta}(\ln 2)^{1/\beta}$ is the median life; if $R = e^{-1}$, then $x_{e^{-1}} = \delta + \eta^{1/\beta}$ is the characteristic life; if $R = \exp\{-\Gamma^{\beta}(1 + \beta^{-1})\}$, then $x_R = E(X)$, the mean life.

The failure rate of the Weibull distribution $W(\beta,\eta,\delta)$ is
$\lambda(x) = \dfrac{f(x;\beta,\eta,\delta)}{R(x)} = \begin{cases} \dfrac{\beta}{\eta}(x-\delta)^{\beta-1}, & x \ge \delta,\\ 0, & x < \delta. \end{cases}$

The mean failure rate is
$\bar{\lambda}(x) = \dfrac{1}{x-\delta}\int_{\delta}^{x}\lambda(t)\,dt = \begin{cases} \dfrac{(x-\delta)^{\beta-1}}{\eta}, & x \ge \delta,\\ 0, & x < \delta. \end{cases}$

In particular, the failure rate of the exponential distribution $E(\lambda) = W(1, 1/\lambda)$ is the constant $\lambda$.
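The percentile-life formulas above are easy to evaluate directly. A minimal sketch (my own; the function name `weibull_quantile_life` is an assumption), under the chapter's parametrization $R(x) = \exp\{-(x-\delta)^{\beta}/\eta\}$:

```python
import math

def weibull_quantile_life(R, beta, eta, delta=0.0):
    """Q-percentile life x_R = delta + eta**(1/beta) * (-ln R)**(1/beta)."""
    return delta + eta**(1.0 / beta) * (-math.log(R))**(1.0 / beta)

beta, eta, delta = 2.0, 4.0, 0.0
median_life = weibull_quantile_life(0.5, beta, eta, delta)          # R = 0.5
char_life = weibull_quantile_life(math.exp(-1), beta, eta, delta)   # R = 1/e
mean_life = delta + eta**(1 / beta) * math.gamma(1 + 1 / beta)      # E(X)

# Failure rate lambda(x) = (beta/eta)(x - delta)**(beta - 1): increasing
# for beta > 1, i.e. a "wear-out" life distribution.
hazard = lambda x: beta / eta * (x - delta)**(beta - 1)
```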
1.6. Binomial Distribution^{2,3,4}

We say a random variable $X$ follows the binomial distribution if it takes discrete values with
$P\{X = k\} = C_n^k p^k (1-p)^{n-k}, \quad k = 0, 1, \ldots, n,$
where $n$ is a positive integer, $C_n^k$ is the binomial coefficient, and $0 \le p \le 1$. We denote it as $X \sim B(n, p)$.

Consider $n$ independent trials, each with exactly two possible outcomes, "success" and "failure", where the probability of success is $p$. Let $X$ be the total number of successes in these $n$ trials; then $X \sim B(n, p)$. In particular, if $n = 1$, $B(1, p)$ is called the Bernoulli distribution or two-point distribution. It is the simplest discrete distribution. The binomial distribution is a common discrete distribution.

If $X \sim B(n, p)$, then its distribution function is
$B(x; n, p) = \begin{cases} \sum_{k=0}^{\min([x],n)} C_n^k p^k q^{n-k}, & x \ge 0,\\ 0, & x < 0, \end{cases}$
where $[x]$ is the integer part of $x$ and $q = 1-p$.

Let $B_x(a,b) = \int_0^x t^{a-1}(1-t)^{b-1}\,dt$ be the incomplete Beta function, where $0 < x < 1$, $a > 0$, $b > 0$; then $B(a,b) = B_1(a,b)$ is the Beta function. Let $I_x(a,b) = B_x(a,b)/B(a,b)$ be the incomplete Beta function ratio. Then the binomial distribution function can be represented as follows:
$B(x; n, p) = 1 - I_p([x]+1,\ n-[x]), \quad 0 \le x \le n.$

The binomial distribution has the following properties:

1. Let $b(k;n,p) = C_n^k p^k q^{n-k}$ for $0 \le k \le n$. If $k \le [(n+1)p]$, then $b(k;n,p) \ge b(k-1;n,p)$; if $k > [(n+1)p]$, then $b(k;n,p) < b(k-1;n,p)$.
2. When $p = 0.5$, the binomial distribution $B(n, 0.5)$ is symmetric; when $p \ne 0.5$, the binomial distribution $B(n, p)$ is asymmetric.
3. Suppose $X_1, X_2, \ldots, X_n$ are mutually independent and identically distributed Bernoulli random variables with parameter $p$; then
$Y = \sum_{i=1}^{n} X_i \sim B(n, p).$
4. If $X \sim B(n, p)$, then
$E(X) = np, \quad \mathrm{Var}(X) = npq.$
5. If $X \sim B(n, p)$, then the $k$-th moment of $X$ is
$E(X^k) = \sum_{i=1}^{k} S_2(k,i)\, P_n^i\, p^i,$
where $S_2(k,i)$ is the Stirling number of the second kind and $P_n^i$ is the number of permutations.
6. If $X \sim B(n, p)$, then its skewness is $s = (1-2p)/(npq)^{1/2}$ and its kurtosis is $\kappa = (1-6pq)/(npq)$.
7. If $X \sim B(n, p)$, then the moment-generating function and the characteristic function of $X$ are $M(t) = (q + pe^t)^n$ and $\phi(t) = (q + pe^{it})^n$, respectively.
8. When $n$ and $x$ are fixed, the binomial distribution function $B(x; n, p)$ is a monotonically decreasing function of $p$ $(0 < p < 1)$.
9. If $X_i \sim B(n_i, p)$ for $1 \le i \le k$, and $X_1, X_2, \ldots, X_k$ are mutually independent, then $X = \sum_{i=1}^k X_i \sim B\left(\sum_{i=1}^k n_i,\ p\right)$.
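The incomplete-Beta representation of $B(x;n,p)$ can be checked numerically against the direct sum. A sketch (my own, using `scipy.special.betainc`, which computes the regularized $I_x(a,b)$); it is valid for $0 \le [x] < n$:

```python
from math import comb, floor
from scipy.special import betainc   # regularized incomplete beta I_x(a, b)

def binom_cdf_direct(x, n, p):
    """B(x; n, p) as a direct sum of the pmf."""
    if x < 0:
        return 0.0
    m = min(floor(x), n)
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(m + 1))

def binom_cdf_beta(x, n, p):
    """B(x; n, p) = 1 - I_p([x]+1, n-[x]), for 0 <= [x] < n."""
    m = floor(x)
    return 1.0 - betainc(m + 1, n - m, p)

n, p = 12, 0.3
vals = [(binom_cdf_direct(k, n, p), binom_cdf_beta(k, n, p))
        for k in range(n)]
```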
1.7. Multinomial Distribution^{2,3,4}

If an $n\,(n \ge 2)$-dimensional random vector $X = (X_1, \ldots, X_n)$ satisfies the following conditions:

(1) $X_i \ge 0$, $1 \le i \le n$, and $\sum_{i=1}^n X_i = N$;
(2) for any non-negative integers $m_1, m_2, \ldots, m_n$ with $\sum_{i=1}^n m_i = N$,
$P\{X_1 = m_1, \ldots, X_n = m_n\} = \dfrac{N!}{m_1!\cdots m_n!}\prod_{i=1}^{n} p_i^{m_i},$
where $p_i \ge 0$, $1 \le i \le n$, $\sum_{i=1}^n p_i = 1$,

then we say $X$ follows the multinomial distribution and denote it as $X \sim PN(N; p_1, \ldots, p_n)$.

In particular, when $n = 2$, the multinomial distribution degenerates to the binomial distribution.

Suppose a jar holds balls of $n$ colors. Each time, a ball is drawn at random from the jar and then put back. The probability of drawing a ball of the $i$th color is $p_i$, $1 \le i \le n$, $\sum_{i=1}^n p_i = 1$. Assume that balls are drawn with replacement $N$ times and $X_i$ denotes the number of draws of the $i$th color; then the random vector $X = (X_1, \ldots, X_n)$ follows the multinomial distribution $PN(N; p_1, \ldots, p_n)$.

The multinomial distribution is a common multivariate discrete distribution. It has the following properties:

1. If $(X_1, \ldots, X_n) \sim PN(N; p_1, \ldots, p_n)$, let $X^*_{i+1} = \sum_{j=i+1}^n X_j$ and $p^*_{i+1} = \sum_{j=i+1}^n p_j$, $1 \le i < n$; then
(i) $(X_1, \ldots, X_i, X^*_{i+1}) \sim PN(N; p_1, \ldots, p_i, p^*_{i+1})$,
(ii) $X_i \sim B(N, p_i)$, $1 \le i \le n$.
More generally, let $0 = j_0 < j_1 < \cdots < j_m = n$, and $\tilde{X}_k = \sum_{i=j_{k-1}+1}^{j_k} X_i$, $\tilde{p}_k = \sum_{i=j_{k-1}+1}^{j_k} p_i$, $1 \le k \le m$; then $(\tilde{X}_1, \ldots, \tilde{X}_m) \sim PN(N; \tilde{p}_1, \ldots, \tilde{p}_m)$.
2. If $(X_1, \ldots, X_n) \sim PN(N; p_1, \ldots, p_n)$, then its moment-generating function and characteristic function are
$M(t_1, \ldots, t_n) = \left(\sum_{j=1}^{n} p_j e^{t_j}\right)^N$ and $\phi(t_1, \ldots, t_n) = \left(\sum_{j=1}^{n} p_j e^{it_j}\right)^N,$
respectively.
3. If $(X_1, \ldots, X_n) \sim PN(N; p_1, \ldots, p_n)$, then for $n > 1$, $1 \le k < n$,
$(X_1, \ldots, X_k \mid X_{k+1} = m_{k+1}, \ldots, X_n = m_n) \sim PN(N - M; p^*_1, \ldots, p^*_k),$
where
$M = \sum_{i=k+1}^{n} m_i, \quad 0 < M < N, \quad p^*_j = \dfrac{p_j}{\sum_{i=1}^{k} p_i}, \quad 1 \le j \le k.$
4. If $X_i$ follows the Poisson distribution $P(\lambda_i)$, $1 \le i \le n$, and $X_1, \ldots, X_n$ are mutually independent, then for any given positive integer $N$, there holds
$\left(X_1, \ldots, X_n \,\Big|\, \sum_{i=1}^{n} X_i = N\right) \sim PN(N; p_1, \ldots, p_n),$
where $p_i = \lambda_i / \sum_{j=1}^{n} \lambda_j$, $1 \le i \le n$.
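Property 4 can be checked by simulation (my own sketch, using numpy): draw independent Poissons, keep only vectors whose total equals $N$, and compare the conditional means with the multinomial marginal means $N p_i$ from property 1(ii).

```python
import numpy as np

rng = np.random.default_rng(3)
lams = np.array([1.0, 2.0, 3.0])
N = 6
p = lams / lams.sum()   # p_i = lambda_i / sum_j lambda_j

# Independent Poisson draws; conditioning on the total being N should
# yield the multinomial PN(N; p_1, ..., p_n).
draws = rng.poisson(lams, size=(200_000, 3))
kept = draws[draws.sum(axis=1) == N]
cond_mean = kept.mean(axis=0)
multi_mean = N * p          # marginal means of PN(N; p), property 1(ii)
```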
1.8. Poisson Distribution^{2,3,4}

If the random variable $X$ takes non-negative integer values with probabilities
$P\{X = k\} = \dfrac{\lambda^k}{k!}e^{-\lambda}, \quad \lambda > 0, \quad k = 0, 1, \ldots,$
then we say $X$ follows the Poisson distribution and denote it as $X \sim P(\lambda)$.

If $X \sim P(\lambda)$, then its distribution function is
$P\{X \le x\} = P(x;\lambda) = \sum_{k=0}^{[x]} p(k;\lambda),$
where $p(k;\lambda) = e^{-\lambda}\lambda^k/k!$, $k = 0, 1, \ldots$.

The Poisson distribution is an important distribution in queuing theory. For example, the number of ticket buyers arriving at a ticket window in a fixed interval of time approximately follows a Poisson distribution. The Poisson distribution has a wide range of applications in physics, finance, insurance and other fields.

The Poisson distribution has the following properties:

1. If $k < \lambda$, then $p(k;\lambda) > p(k-1;\lambda)$; if $k > \lambda$, then $p(k;\lambda) < p(k-1;\lambda)$. If $\lambda$ is not an integer, then $p(k;\lambda)$ attains its maximum at $k = [\lambda]$; if $\lambda$ is an integer, then $p(k;\lambda)$ attains its maximum at both $k = \lambda$ and $k = \lambda - 1$.
2. When $x$ is fixed, $P(x;\lambda)$ is a non-increasing function of $\lambda$, that is,
$P(x;\lambda_1) \ge P(x;\lambda_2) \quad \text{if } \lambda_1 < \lambda_2.$
When $\lambda$ and $x$ change at the same time, then
$P(x;\lambda) \ge P(x-1;\lambda-1) \quad \text{if } x \le \lambda - 1,$
$P(x;\lambda) \le P(x-1;\lambda-1) \quad \text{if } x \ge \lambda.$
3. If $X \sim P(\lambda)$, then the $k$-th moment of $X$ is $E(X^k) = \sum_{i=1}^{k} S_2(k,i)\lambda^i$, where $S_2(k,i)$ is the Stirling number of the second kind.
4. If $X \sim P(\lambda)$, then $E(X) = \lambda$ and $\mathrm{Var}(X) = \lambda$. Equality of the expectation and the variance is an important feature of the Poisson distribution.
5. If $X \sim P(\lambda)$, then its skewness is $s = \lambda^{-1/2}$ and its kurtosis is $\kappa = \lambda^{-1}$.
6. If $X \sim P(\lambda)$, then the moment-generating function and the characteristic function of $X$ are $M(t) = \exp\{\lambda(e^t - 1)\}$ and $\phi(t) = \exp\{\lambda(e^{it} - 1)\}$, respectively.
7. If $X_1, X_2, \ldots, X_n$ are mutually independent and identically distributed, then $X_1 \sim P(\lambda)$ is equivalent to $\sum_{i=1}^n X_i \sim P(n\lambda)$.
8. If $X_i \sim P(\lambda_i)$ for $1 \le i \le n$, and $X_1, X_2, \ldots, X_n$ are mutually independent, then
$\sum_{i=1}^{n} X_i \sim P\left(\sum_{i=1}^{n}\lambda_i\right).$
9. If $X_1 \sim P(\lambda_1)$ and $X_2 \sim P(\lambda_2)$ are mutually independent, then the conditional distribution of $X_1$ given $X_1 + X_2$ is binomial, that is,
$(X_1 \mid X_1 + X_2 = x) \sim B(x, p),$
where $p = \lambda_1/(\lambda_1 + \lambda_2)$.
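Property 9 can be verified exactly from the Poisson pmf: $P\{X_1 = k \mid X_1 + X_2 = x\}$ computed from the joint probabilities should equal the $B(x, p)$ pmf term by term. A short sketch (my own):

```python
from math import comb, exp, factorial

lam1, lam2, x = 2.0, 3.0, 7
p = lam1 / (lam1 + lam2)

def pois_pmf(k, lam):
    return exp(-lam) * lam**k / factorial(k)

# Conditional probabilities from the joint Poisson law, using additivity
# (property 8): X1 + X2 ~ P(lam1 + lam2).
cond = [pois_pmf(k, lam1) * pois_pmf(x - k, lam2) / pois_pmf(x, lam1 + lam2)
        for k in range(x + 1)]
# Binomial pmf B(x, p) with p = lam1/(lam1 + lam2).
binom = [comb(x, k) * p**k * (1 - p)**(x - k) for k in range(x + 1)]
```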
1.9. Negative Binomial Distribution^{2,3,4}

For a positive integer $m$, if the random variable $X$ takes non-negative integer values with probabilities
$P\{X = k\} = C_{k+m-1}^{k}\, p^m q^k, \quad k = 0, 1, \ldots,$
where $0 < p < 1$, $q = 1 - p$, then we say $X$ follows the negative binomial distribution and denote it as $X \sim NB(m, p)$.

If $X \sim NB(m, p)$, then its distribution function is
$NB(x; m, p) = \begin{cases} \sum_{k=0}^{[x]} C_{k+m-1}^{k}\, p^m q^k, & x \ge 0,\\ 0, & x < 0. \end{cases}$

The negative binomial distribution is also called the Pascal distribution. It is a direct generalization of the binomial distribution.

Consider a success-failure (Bernoulli) trial with success probability $p$, repeated independently. Let $X$ be the total number of trials until the $m$-th "success" occurs; then $X - m$ follows the negative binomial distribution $NB(m, p)$, that is, the total number of "failures" follows $NB(m, p)$.

The negative binomial distribution has the following properties:

1. Let $nb(k; m, p) = C_{k+m-1}^{k}\, p^m q^k$, where $0 < p < 1$, $k = 0, 1, \ldots$; then
$nb(k+1; m, p) = \dfrac{(m+k)q}{k+1}\cdot nb(k; m, p).$
Therefore, $nb(k; m, p)$ increases monotonically in $k$ for $k < \frac{m-1}{p} - m$ and decreases monotonically for $k > \frac{m-1}{p} - m$.
2. The binomial distribution $B(m, p)$ and the negative binomial distribution $NB(r, p)$ have the following relationship:
$NB(x; r, p) = 1 - B(r-1;\ r+[x],\ p).$
3. $NB(x; m, p) = I_p(m, [x]+1)$, where $I_p(\cdot,\cdot)$ is the incomplete Beta function ratio.
4. If $X \sim NB(m, p)$, then the $k$-th moment of $X$ is
$E(X^k) = \sum_{i=1}^{k} S_2(k,i)\, m^{[i]}\,(q/p)^i,$
where $m^{[i]} = m(m+1)\cdots(m+i-1)$, $1 \le i \le k$, and $S_2(k,i)$ is the Stirling number of the second kind.
5. If $X \sim NB(m, p)$, then $E(X) = mq/p$ and $\mathrm{Var}(X) = mq/p^2$.
6. If $X \sim NB(m, p)$, then its skewness and kurtosis are $s = (1+q)/(mq)^{1/2}$ and $\kappa = (6q + p^2)/(mq)$, respectively.
7. If $X \sim NB(m, p)$, then the moment-generating function and the characteristic function of $X$ are $M(t) = p^m(1 - qe^t)^{-m}$ and $\phi(t) = p^m(1 - qe^{it})^{-m}$, respectively.
8. If $X_i \sim NB(m_i, p)$ for $1 \le i \le n$, and $X_1, X_2, \ldots, X_n$ are mutually independent, then
$\sum_{i=1}^{n} X_i \sim NB\left(\sum_{i=1}^{n} m_i,\ p\right).$
9. If $X \sim NB(m, p)$, then there exists a sequence of random variables $X_1, \ldots, X_m$, independent and identically distributed as $G(p)$, such that
$X \overset{d}{=} X_1 + \cdots + X_m - m,$
where $G(p)$ is the geometric distribution in Sec. 1.11.
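Property 9 gives a direct way to simulate $NB(m,p)$ from geometric draws. A sketch (my own, using numpy, whose `geometric` uses the same "trials to first success" convention as $G(p)$ here):

```python
import numpy as np

rng = np.random.default_rng(4)
m, p = 4, 0.35
reps = 100_000
# G(p) takes values 1, 2, ...; summing m of them and subtracting m gives
# the failure count X ~ NB(m, p) (property 9).
g = rng.geometric(p, size=(reps, m))
x = g.sum(axis=1) - m
print(x.mean(), x.var())   # approximately m*q/p and m*q/p**2 (property 5)
```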
1.10. Hypergeometric Distribution^{2,3,4}

Let $N, M, n$ be positive integers with $M \le N$ and $n \le N$. If the random variable $X$ takes integer values in the interval $[\max(0, M+n-N), \min(M,n)]$, and the probability of $X = k$ is
$P\{X = k\} = \dfrac{C_M^k\, C_{N-M}^{n-k}}{C_N^n},$
where $\max(0, M+n-N) \le k \le \min(M,n)$, then we say $X$ follows the hypergeometric distribution and denote it as $X \sim H(M, N, n)$.

If $X \sim H(M, N, n)$, then the distribution function of $X$ is
$H(x; n, N, M) = \begin{cases} \sum_{k=K_1}^{\min([x],K_2)} \dfrac{C_M^k\, C_{N-M}^{n-k}}{C_N^n}, & x \ge K_1,\\ 0, & x < K_1, \end{cases}$
where $K_1 = \max(0, M+n-N)$ and $K_2 = \min(M, n)$.

The hypergeometric distribution is often used in the sampling inspection of products and has an important position in the theory of sampling inspection.

Assume that there are $N$ products, of which $M$ are non-conforming. We randomly draw $n$ products from the $N$ products without replacement. Let $X$ be the number of non-conforming products among these $n$ products; then $X$ follows the hypergeometric distribution $H(M, N, n)$.

Some properties of the hypergeometric distribution are as follows:

1. Denote $h(k; n, N, M) = C_M^k C_{N-M}^{n-k}/C_N^n$; then
$h(k; n, N, M) = h(k; M, N, n),$
$h(k; n, N, M) = h(N-n-M+k;\ N-n,\ N,\ N-M),$
where $K_1 \le k \le K_2$.
2. The distribution function of the hypergeometric distribution has the following expressions:
$H(x; n, N, M) = H(N-n-M+x;\ N-n, N, N-M)$
$= 1 - H(n-x-1;\ n, N, N-M)$
$= 1 - H(M-x-1;\ N-n, N, M)$
and
$1 - H(n-1;\ x+n, N, N-M) = H(x;\ n+x, N, M),$
where $x \ge K_1$.
3. If $X \sim H(M, N, n)$, then its expectation and variance are
$E(X) = \dfrac{nM}{N}, \quad \mathrm{Var}(X) = \dfrac{nM(N-n)(N-M)}{N^2(N-1)}.$
For integers $n$ and $k$, denote
$n^{(k)} = \begin{cases} n(n-1)\cdots(n-k+1), & k < n,\\ n!, & k \ge n. \end{cases}$
4. If $X \sim H(M, N, n)$, then the $k$-th moment of $X$ is
$E(X^k) = \sum_{i=1}^{k} S_2(k,i)\,\dfrac{n^{(i)} M^{(i)}}{N^{(i)}}.$
5. If $X \sim H(M, N, n)$, then the skewness of $X$ is
$s = \dfrac{(N-2M)(N-1)^{1/2}(N-2n)}{(NM(N-M)(N-n))^{1/2}(N-2)}.$
6. If $X \sim H(M, N, n)$, then the moment-generating function and the characteristic function of $X$ are
$M(t) = \dfrac{(N-n)!\,(N-M)!}{N!\,(N-M-n)!}\, F(-n, -M;\ N-M-n+1;\ e^t)$
and
$\phi(t) = \dfrac{(N-n)!\,(N-M)!}{N!\,(N-M-n)!}\, F(-n, -M;\ N-M-n+1;\ e^{it}),$
respectively, where $F(a, b; c; x)$ is the hypergeometric function defined by
$F(a, b; c; x) = 1 + \dfrac{ab}{c}\dfrac{x}{1!} + \dfrac{a(a+1)b(b+1)}{c(c+1)}\dfrac{x^2}{2!} + \cdots$
with $c > 0$.

A typical application of the hypergeometric distribution is estimating the number of fish in a lake. One first catches $M$ fish, tags them, and puts them back into the lake. After a period of time, one re-catches $n$ $(n > M)$ fish from the lake, among which $s$ fish carry the tag. $M$ and $n$ are given in advance. Let $X$ be the number of tagged fish among the $n$ re-caught fish. If the total number of fish in the lake is $N$, then $X$ follows the hypergeometric distribution $H(M, N, n)$. By property 3 above, $E(X) = nM/N$, and $E(X)$ can be estimated by the observed number of tagged re-caught fish, i.e., $s \approx E(X) = nM/N$. Therefore, the estimated total number of fish in the lake is $\hat{N} = nM/s$.
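The capture-recapture estimate is a one-line computation. A minimal sketch (my own; the function name `estimate_fish` is an assumption):

```python
def estimate_fish(M, n, s):
    """Estimate N via N_hat = n*M/s, from the hypergeometric mean
    E(X) = n*M/N, where s tagged fish appear among the n re-caught."""
    if s == 0:
        raise ValueError("no tagged fish re-caught; estimate undefined")
    return n * M / s

# 100 tagged fish, 60 re-caught, 12 of them tagged -> N_hat = 500.
N_hat = estimate_fish(M=100, n=60, s=12)
```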
1.11. Geometric Distribution^{2,3,4}

If the random variable $X$ takes positive integer values with probabilities
$P\{X = k\} = q^{k-1}p, \quad k = 1, 2, \ldots,$
where $0 < p \le 1$, $q = 1 - p$, then we say $X$ follows the geometric distribution and denote it as $X \sim G(p)$. If $X \sim G(p)$, then the distribution function of $X$ is
$G(x; p) = \begin{cases} 1 - q^{[x]}, & x \ge 0,\\ 0, & x < 0. \end{cases}$

The geometric distribution is so named because the sum of its probabilities is a geometric series.

Consider a (Bernoulli) trial whose outcome can be classified as either a "success" or a "failure", where $p$ is the probability that the trial is a "success", and suppose the trials can be performed repeatedly and independently. Let $X$ be the number of trials required until the first success occurs; then $X$ follows the geometric distribution $G(p)$.

Some properties of the geometric distribution are as follows:

1. Denote $g(k; p) = pq^{k-1}$, $k = 1, 2, \ldots$, $0 < p < 1$; then $g(k; p)$ is a monotonically decreasing function of $k$, that is,
$g(1; p) > g(2; p) > g(3; p) > \cdots.$
2. If $X \sim G(p)$, then the expectation and variance of $X$ are $E(X) = 1/p$ and $\mathrm{Var}(X) = q/p^2$, respectively.
3. If $X \sim G(p)$, then the $k$-th moment of $X$ is
$E(X^k) = \sum_{i=1}^{k} S_2(k,i)\, i!\, q^{i-1}/p^i,$
where $S_2(k,i)$ is the Stirling number of the second kind.
4. If $X \sim G(p)$, then the skewness of $X$ is $s = q^{1/2} + q^{-1/2}$.
5. If $X \sim G(p)$, then the moment-generating function and the characteristic function of $X$ are $M(t) = pe^t(1-qe^t)^{-1}$ and $\phi(t) = pe^{it}(1-qe^{it})^{-1}$, respectively.
6. If $X \sim G(p)$, then
$P\{X > n+m \mid X > n\} = P\{X > m\}$
for any natural numbers $n$ and $m$.

Property 6 is also known as the "memoryless property" of the geometric distribution. It indicates that, in a success-failure test, when $n$ trials have been performed with no "success", the probability that a further $m$ trials still yield no "success" has nothing to do with the outcomes of the first $n$ trials. The memoryless property characterizes the geometric distribution: it can be proved that a discrete random variable taking natural-number values must follow a geometric distribution if it satisfies the memoryless property.

7. If $X \sim G(p)$, then
$E(X \mid X > n) = n + E(X).$
8. Suppose $X$ and $Y$ are independent discrete random variables; then $\min(X, Y)$ is independent of $X - Y$ if and only if both $X$ and $Y$ follow the same geometric distribution.
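The memoryless property follows directly from the tail formula $P\{X > k\} = q^k$, which a few lines of arithmetic confirm (my own sketch):

```python
p = 0.3
q = 1.0 - p

def tail(k):
    """P{X > k} = q**k for the geometric distribution G(p)."""
    return q**k

n, m = 5, 3
lhs = tail(n + m) / tail(n)   # P{X > n+m | X > n}
rhs = tail(m)                 # P{X > m} = q**m
```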
1.12. Gamma Distribution^{2,3,4}

If the density function of the random variable $X$ is
$g(x;\alpha,\lambda) = \begin{cases} \dfrac{\lambda^{\alpha} x^{\alpha-1} e^{-\lambda x}}{\Gamma(\alpha)}, & x \ge 0,\\ 0, & x < 0, \end{cases}$
where $\alpha > 0$, $\lambda > 0$, and $\Gamma(\cdot)$ is the Gamma function, then we say $X$ follows the Gamma distribution with shape parameter $\alpha$ and scale parameter $\lambda$, and denote it as $X \sim \Gamma(\alpha, \lambda)$.

If $X \sim \Gamma(\alpha,\lambda)$, then the distribution function of $X$ is
$\Gamma(x;\alpha,\lambda) = \begin{cases} \int_0^x \dfrac{\lambda^{\alpha} t^{\alpha-1} e^{-\lambda t}}{\Gamma(\alpha)}\,dt, & x \ge 0,\\ 0, & x < 0. \end{cases}$

The Gamma distribution is so named because the form of its density resembles the Gamma function. It is commonly used in reliability theory to describe the life of a product.

When $\lambda = 1$, $\Gamma(\alpha, 1)$ is called the standard Gamma distribution, with density function
$g(x;\alpha,1) = \begin{cases} \dfrac{x^{\alpha-1} e^{-x}}{\Gamma(\alpha)}, & x \ge 0,\\ 0, & x < 0. \end{cases}$

When $\alpha = 1$, $\Gamma(1, \lambda)$ is called the single-parameter Gamma distribution; it is also the exponential distribution $E(\lambda)$, with density function
$g(x;1,\lambda) = \begin{cases} \lambda e^{-\lambda x}, & x \ge 0,\\ 0, & x < 0. \end{cases}$

More generally, a Gamma distribution with three parameters can be obtained by a translation, with density function
$g(x;\alpha,\lambda,\delta) = \begin{cases} \dfrac{\lambda^{\alpha}(x-\delta)^{\alpha-1} e^{-\lambda(x-\delta)}}{\Gamma(\alpha)}, & x \ge \delta,\\ 0, & x < \delta. \end{cases}$

Some properties of the Gamma distribution are as follows:

1. If $X \sim \Gamma(\alpha,\lambda)$, then $\lambda X \sim \Gamma(\alpha, 1)$. That is, a general Gamma distribution can be transformed into the standard Gamma distribution by a scale transformation.
2. For $x \ge 0$, denote
$I_{\alpha}(x) = \dfrac{1}{\Gamma(\alpha)}\int_0^x t^{\alpha-1} e^{-t}\,dt$
to be the incomplete Gamma function; then $\Gamma(x;\alpha,\lambda) = I_{\alpha}(\lambda x)$. In particular, $\Gamma(x;1,\lambda) = 1 - e^{-\lambda x}$.
3. Several relationships between Gamma distributions are as follows:
(1) $\Gamma(x;\alpha,1) - \Gamma(x;\alpha+1,1) = g(x;\alpha+1,1)$;
(2) $\Gamma(x;\frac{1}{2},1) = 2\Phi(\sqrt{2x}) - 1$,
where $\Phi(x)$ is the standard normal distribution function.
4. If $X \sim \Gamma(\alpha,\lambda)$, then the expectation of $X$ is $E(X) = \alpha/\lambda$ and the variance of $X$ is $\mathrm{Var}(X) = \alpha/\lambda^2$.
5. If $X \sim \Gamma(\alpha,\lambda)$, then the $k$-th moment of $X$ is $E(X^k) = \lambda^{-k}\Gamma(k+\alpha)/\Gamma(\alpha)$.
6. If $X \sim \Gamma(\alpha,\lambda)$, then the skewness of $X$ is $s = 2\alpha^{-1/2}$ and the kurtosis of $X$ is $\kappa = 6/\alpha$.
7. If $X \sim \Gamma(\alpha,\lambda)$, then the moment-generating function of $X$ is $M(t) = \left(\dfrac{\lambda}{\lambda - t}\right)^{\alpha}$ for $t < \lambda$, and the characteristic function of $X$ is $\phi(t) = \left(\dfrac{\lambda}{\lambda - it}\right)^{\alpha}$.
8. If $X_i \sim \Gamma(\alpha_i, \lambda)$ for $1 \le i \le n$, and $X_1, X_2, \ldots, X_n$ are mutually independent, then
$\sum_{i=1}^{n} X_i \sim \Gamma\left(\sum_{i=1}^{n}\alpha_i,\ \lambda\right).$
9. If $X \sim \Gamma(\alpha_1, 1)$, $Y \sim \Gamma(\alpha_2, 1)$, and $X$ is independent of $Y$, then $X+Y$ is independent of $X/Y$. Conversely, if $X$ and $Y$ are mutually independent, non-negative and non-degenerate random variables, and moreover $X+Y$ is independent of $X/Y$, then both $X$ and $Y$ follow standard Gamma distributions.
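The identity 3(2) above, $\Gamma(x;\frac12,1) = 2\Phi(\sqrt{2x}) - 1$, is easy to check numerically with scipy (my own sketch; note scipy's `gamma` is parametrized by shape `a` and `scale` = $1/\lambda$):

```python
import numpy as np
from scipy.stats import gamma, norm

xs = np.linspace(0.01, 5.0, 50)
lhs = gamma.cdf(xs, a=0.5, scale=1.0)           # Gamma(x; 1/2, 1)
rhs = 2.0 * norm.cdf(np.sqrt(2.0 * xs)) - 1.0   # 2*Phi(sqrt(2x)) - 1
```

The identity reflects the fact that if $Z \sim N(0,1)$, then $Z^2/2 \sim \Gamma(\frac12, 1)$.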
1.13. Beta Distribution^{2,3,4}

If the density function of the random variable $X$ is
$f(x;a,b) = \begin{cases} \dfrac{x^{a-1}(1-x)^{b-1}}{B(a,b)}, & 0 \le x \le 1,\\ 0, & \text{otherwise}, \end{cases}$
where $a > 0$, $b > 0$, and $B(\cdot,\cdot)$ is the Beta function, then we say $X$ follows the Beta distribution with parameters $a$ and $b$, and denote it as $X \sim BE(a, b)$.

If $X \sim BE(a, b)$, then the distribution function of $X$ is
$BE(x;a,b) = \begin{cases} 1, & x > 1,\\ I_x(a,b), & 0 < x \le 1,\\ 0, & x \le 0, \end{cases}$
where $I_x(a,b)$ is the incomplete Beta function ratio.

Similar to the Gamma distribution, the Beta distribution is so named because the form of its density function resembles the Beta function.

In particular, when $a = b = 1$, $BE(1, 1)$ is the standard uniform distribution $U(0, 1)$.

Some properties of the Beta distribution are as follows:

1. If $X \sim BE(a, b)$, then $1 - X \sim BE(b, a)$.
2. The density function of the Beta distribution has the following properties:
(1) when $a < 1$, $b \ge 1$, the density function is monotonically decreasing;
(2) when $a \ge 1$, $b < 1$, the density function is monotonically increasing;
(3) when $a < 1$, $b < 1$, the density curve is U-shaped;
(4) when $a > 1$, $b > 1$, the density curve has a single peak;
(5) when $a = b$, the density curve is symmetric about $x = 1/2$.
3. If $X \sim BE(a, b)$, then the $k$-th moment of $X$ is $E(X^k) = \dfrac{B(a+k,b)}{B(a,b)}$.
4. If $X \sim BE(a, b)$, then the expectation and variance of $X$ are $E(X) = a/(a+b)$ and $\mathrm{Var}(X) = ab/((a+b+1)(a+b)^2)$, respectively.
5. If $X \sim BE(a, b)$, then the skewness of $X$ is
$s = \dfrac{2(b-a)(a+b+1)^{1/2}}{(a+b+2)(ab)^{1/2}}$
and the kurtosis of $X$ is
$\kappa = \dfrac{6[(a-b)^2(a+b+1) - ab(a+b+2)]}{ab(a+b+2)(a+b+3)}.$
6. If $X \sim BE(a, b)$, then the moment-generating function and the characteristic function of $X$ are
$M(t) = \dfrac{\Gamma(a+b)}{\Gamma(a)}\sum_{k=0}^{\infty}\dfrac{\Gamma(a+k)}{\Gamma(a+b+k)}\dfrac{t^k}{\Gamma(k+1)}$ and $\phi(t) = \dfrac{\Gamma(a+b)}{\Gamma(a)}\sum_{k=0}^{\infty}\dfrac{\Gamma(a+k)}{\Gamma(a+b+k)}\dfrac{(it)^k}{\Gamma(k+1)},$
respectively.
7. Suppose $X_1, X_2, \ldots, X_n$ are mutually independent, $X_i \sim BE(a_i, b_i)$, $1 \le i \le n$, and $a_{i+1} = a_i + b_i$, $1 \le i \le n-1$; then
$\prod_{i=1}^{n} X_i \sim BE\left(a_1,\ \sum_{i=1}^{n} b_i\right).$
8. Suppose $X_1, X_2, \ldots, X_n$ are independent and identically distributed random variables with common distribution $U(0, 1)$; then $\min(X_1, \ldots, X_n) \sim BE(1, n)$. Conversely, if $X_1, X_2, \ldots, X_n$ are independent and identically distributed random variables, and
$\min(X_1, \ldots, X_n) \sim U(0, 1),$
then $X_1 \sim BE(1, 1/n)$.
9. Suppose $X_1, X_2, \ldots, X_n$ are independent and identically distributed random variables with common distribution $U(0, 1)$, and denote by
$X_{(1,n)} \le X_{(2,n)} \le \cdots \le X_{(n,n)}$
the corresponding order statistics; then
$X_{(k,n)} \sim BE(k,\ n-k+1), \quad 1 \le k \le n,$
$X_{(k,n)} - X_{(i,n)} \sim BE(k-i,\ n-k+i+1), \quad 1 \le i < k \le n.$
10. Suppose $X_1, X_2, \ldots, X_n$ are independent and identically distributed random variables with common distribution $BE(a, 1)$. Let $Y = \min(X_1, \ldots, X_n)$; then $Y^a \sim BE(1, n)$.
11. If $X \sim BE(a, b)$, where $a$ and $b$ are positive integers, then
$BE(x;a,b) = \sum_{i=a}^{a+b-1} C_{a+b-1}^{i}\, x^i (1-x)^{a+b-1-i}.$
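Property 9 can be illustrated by simulation (my own sketch, using numpy and scipy): the $k$-th order statistic of $n$ uniforms should match $BE(k, n-k+1)$ in mean and distribution function.

```python
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(5)
n, k = 10, 3
u = np.sort(rng.uniform(size=(50_000, n)), axis=1)
x_k = u[:, k - 1]                    # k-th order statistic of each sample

emp_mean = x_k.mean()
th_mean = beta.mean(k, n - k + 1)    # = k/(n+1) for BE(k, n-k+1)

emp_cdf = (x_k <= 0.3).mean()
th_cdf = beta.cdf(0.3, k, n - k + 1)
```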
1.14. Chi-square Distribution2,3,4
If Y1, Y2, . . . , Yn are mutually independent and identically distributed random variables with common distribution N(0, 1), then we say the random variable X = Σ_{i=1}^n Yi^2 follows the Chi-square distribution (χ² distribution) with n degrees of freedom, and denote it as X ~ χ²_n.
If X ~ χ²_n, then the density function of X is
f(x; n) = e^{−x/2} x^{n/2−1} / (2^{n/2} Γ(n/2)), x > 0,
f(x; n) = 0, x ≤ 0,
where Γ(n/2) is the Gamma function.
The Chi-square distribution is derived from the normal distribution and plays an important role in statistical inference for normal distributions. When the degree of freedom n is large, the Chi-square distribution χ²_n is approximately a normal distribution.
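This normal approximation is easy to see numerically. A small sketch (parameter values are our own choice): since E(χ²_n) = n and Var(χ²_n) = 2n, compare the exact χ²_n CDF with the N(n, 2n) CDF for a moderately large n.

```python
# For large n, chi2_n is approximately N(n, 2n); compare the two CDFs
# at the mean and one standard deviation on either side.
import numpy as np
from scipy import stats

n = 200
xs = n + np.sqrt(2 * n) * np.array([-1.0, 0.0, 1.0])
exact = stats.chi2.cdf(xs, df=n)
approx = stats.norm.cdf(xs, loc=n, scale=np.sqrt(2 * n))
err = np.max(np.abs(exact - approx))
print(err)
```

The residual error shrinks like the skewness (8/n)^{1/2}, so the approximation improves slowly but steadily as n grows.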
Some properties of Chi-square distribution are as follows:
1. If X1 ~ χ²_n, X2 ~ χ²_m, and X1 and X2 are independent, then X1 + X2 ~ χ²_{n+m}. This is the "additive property" of the Chi-square distribution.
2. Let f(x; n) be the density function of the Chi-square distribution χ²_n. Then f(x; n) is monotonically decreasing when n ≤ 2, and f(x; n) is a single-peak function with maximum point x = n − 2 when n ≥ 3.
3. If X ~ χ²_n, then the k-th moment of X is
E(X^k) = 2^k Γ(n/2 + k)/Γ(n/2) = ∏_{i=0}^{k−1} (n + 2i).
4. If X ~ χ²_n, then E(X) = n and Var(X) = 2n.
5. If X ~ χ²_n, then the skewness of X is s = 2√2 n^{−1/2}, and the kurtosis of X is κ = 12/n.
6. If X ~ χ²_n, the moment-generating function of X is M(t) = (1 − 2t)^{−n/2} for t < 1/2, and the characteristic function of X is φ(t) = (1 − 2it)^{−n/2}.
7. Let K(x; n) be the distribution function of the Chi-square distribution χ²_n, then we have
(1) K(x; 2n) = 1 − 2 Σ_{i=1}^n f(x; 2i);
(2) K(x; 2n + 1) = 2Φ(√x) − 1 − 2 Σ_{i=1}^n f(x; 2i + 1);
(3) K(x; n) − K(x; n + 2) = (x/2)^{n/2} e^{−x/2} / Γ((n + 2)/2),
where Φ(x) is the standard normal distribution function.
8. If X ~ χ²_m, Y ~ χ²_n, and X and Y are independent, then X/(X + Y) ~ BE(m/2, n/2), and X/(X + Y) is independent of X + Y.
Let X1, X2, . . . , Xn be a random sample of the normal population N(μ, σ²). Denote
X̄ = (1/n) Σ_{i=1}^n Xi, S² = Σ_{i=1}^n (Xi − X̄)²,
then S²/σ² ~ χ²_{n−1} and is independent of X̄.
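Recurrence (1) of property 7 can be verified directly with SciPy's Chi-square CDF and density (the values of x and n below are arbitrary test inputs of ours):

```python
# Check K(x; 2n) = 1 - 2 * sum_{i=1}^{n} f(x; 2i) for the chi-square
# distribution, using scipy's cdf and pdf.
from scipy import stats

x, n = 7.5, 4
lhs = stats.chi2.cdf(x, df=2 * n)
rhs = 1 - 2 * sum(stats.chi2.pdf(x, df=2 * i) for i in range(1, n + 1))
print(lhs, rhs)
```

This identity is the familiar link between the χ²_{2n} CDF and Poisson tail probabilities, written in terms of the even-degree densities f(x; 2i).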
1.14.1. Non-central Chi-square distribution
Suppose random variables Y1, . . . , Yn are mutually independent, Yi ~ N(μi, 1), 1 ≤ i ≤ n. Then the distribution of the random variable X = Σ_{i=1}^n Yi² is the non-central Chi-square distribution with n degrees of freedom and non-centrality parameter λ = Σ_{i=1}^n μi², denoted as χ²_{n,λ}. Particularly, χ²_{n,0} = χ²_n.
9. Suppose Y1, . . . , Ym are mutually independent, and Yi ~ χ²_{ni,λi} for 1 ≤ i ≤ m. Then Σ_{i=1}^m Yi ~ χ²_{n,λ}, where n = Σ_{i=1}^m ni and λ = Σ_{i=1}^m λi.
10. If X ~ χ²_{n,λ}, then E(X) = n + λ, Var(X) = 2(n + 2λ), the skewness of X is s = √8 (n + 3λ)/(n + 2λ)^{3/2}, and the kurtosis of X is κ = 12(n + 4λ)/(n + 2λ)².
11. If X ~ χ²_{n,λ}, then the moment-generating function and the characteristic function of X are M(t) = (1 − 2t)^{−n/2} exp{λt/(1 − 2t)} and φ(t) = (1 − 2it)^{−n/2} exp{iλt/(1 − 2it)}, respectively.
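The mean and variance in property 10 match SciPy's non-central Chi-square implementation (the parameter values below are arbitrary test inputs of ours):

```python
# Check E(X) = n + lambda and Var(X) = 2(n + 2*lambda) for the
# non-central chi-square against scipy's ncx2 moments.
from scipy import stats

n, lam = 6, 2.5
mean, var = stats.ncx2.stats(df=n, nc=lam, moments="mv")
print(float(mean), n + lam)
print(float(var), 2 * (n + 2 * lam))
```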
1.15. t Distribution2,3,4
Assume X ~ N(0, 1), Y ~ χ²_n, and X is independent of Y. We say the random variable T = √n X/√Y follows the t distribution with n degrees of freedom and denote it as T ~ t_n.
If X ~ t_n, then the density function of X is
t(x; n) = Γ((n + 1)/2) / ((nπ)^{1/2} Γ(n/2)) · (1 + x²/n)^{−(n+1)/2}, for −∞ < x < ∞.
Define T(x; n) as the distribution function of the t distribution t_n, then
T(x; n) = 1 − (1/2) I_{n/(n+x²)}(n/2, 1/2), x ≥ 0,
T(x; n) = (1/2) I_{n/(n+x²)}(n/2, 1/2), x < 0,
where I_{n/(n+x²)}(n/2, 1/2) is the incomplete beta function ratio.
Similar to the Chi-square distribution, the t distribution can also be derived from the normal and Chi-square distributions. It has a wide range of applications in statistical inference on normal distributions. When n is large, the t distribution t_n with n degrees of freedom can be approximated by the standard normal distribution.
t distribution has the following properties:
1. The density function of t distribution, t(x; n), is symmetric about x = 0,
and reaches the maximum at x = 0.
2. lim_{n→∞} t(x; n) = (1/√(2π)) e^{−x²/2} = φ(x), i.e. the limiting distribution of the t distribution is the standard normal distribution as the degree of freedom n goes to infinity.
3. Assume X ~ t_n. If k < n, then E(X^k) exists; otherwise, E(X^k) does not exist. The k-th moment of X is
E(X^k) = 0, if 0 < k < n and k is odd;
E(X^k) = Γ((k + 1)/2) Γ((n − k)/2) n^{k/2} / (√π Γ(n/2)), if 0 < k < n and k is even;
E(X^k) does not exist, if k ≥ n and k is odd;
E(X^k) = ∞, if k ≥ n and k is even.
4. If X ~ t_n, then E(X) = 0. When n > 2, Var(X) = n/(n − 2).
5. If X ~ t_n, then the skewness of X is 0. If n ≥ 5, the kurtosis of X is κ = 6/(n − 4).
6. Assume that X1 and X2 are independent and identically distributed random variables with common distribution χ²_n, then the random variable
Y = n^{1/2}(X2 − X1) / (2(X1 X2)^{1/2}) ~ t_n.
Suppose that X1, X2, . . . , Xn are random samples of the normal population N(μ, σ²). Define X̄ = (1/n) Σ_{i=1}^n Xi and S² = Σ_{i=1}^n (Xi − X̄)², then
T = √(n(n − 1)) (X̄ − μ)/S ~ t_{n−1}.
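The studentized-mean result can be checked by simulation. A sketch (sample sizes and parameters are our own choices; note S² here is the uncorrected sum of squares used in this section, not the usual sample variance):

```python
# Simulate T = sqrt(n(n-1)) * (Xbar - mu) / S from normal samples and
# compare its empirical distribution with t_{n-1}.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
mu, sigma, n, reps = 1.0, 2.0, 8, 20000
samples = rng.normal(mu, sigma, size=(reps, n))
xbar = samples.mean(axis=1)
S = np.sqrt(((samples - xbar[:, None]) ** 2).sum(axis=1))
T = np.sqrt(n * (n - 1)) * (xbar - mu) / S
ks = stats.kstest(T, stats.t(df=n - 1).cdf)
print(ks.statistic)
```

A small Kolmogorov-Smirnov statistic indicates the simulated values are consistent with the t_{n−1} distribution.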
1.15.1. Non-central t distribution
Suppose that X ~ N(δ, 1), Y ~ χ²_n, and X and Y are independent. Then T = √n X/√Y is a non-central t distributed random variable with n degrees of freedom and non-centrality parameter δ, denoted as T ~ t_{n,δ}. Particularly, t_{n,0} = t_n.
7. Let T(x; n, δ) be the distribution function of the non-central t distribution t_{n,δ}, then we have
T(x; n, δ) = 1 − T(−x; n, −δ), T(0; n, δ) = Φ(−δ), T(1; 1, δ) = 1 − Φ²(δ/√2).
8. If X ~ t_{n,δ}, then E(X) = δ (n/2)^{1/2} Γ((n − 1)/2)/Γ(n/2) for n > 1, and Var(X) = n(1 + δ²)/(n − 2) − (E(X))² for n > 2.
1.16. F Distribution2,3,4
Let X and Y be independent random variables such that X ~ χ²_m, Y ~ χ²_n. Define a new random variable F as F = (X/m)/(Y/n). Then the distribution of F is called the F distribution with degrees of freedom m and n, denoted as F ~ F_{m,n}.
If X ~ F_{m,n}, then the density function of X is
f(x; m, n) = (m/n)^{m/2} x^{(m−2)/2} (1 + mx/n)^{−(m+n)/2} / B(m/2, n/2), x > 0,
f(x; m, n) = 0, x ≤ 0.
Let F(x; m, n) be the distribution function of the F distribution F_{m,n}, then F(x; m, n) = I_a(m/2, n/2), where a = mx/(n + mx) and I_a(·, ·) is the incomplete beta function ratio.
F distribution is often used in hypothesis testing problems on two or
more normal populations. It can also be used to approximate complicated
distributions. F distribution plays an important role in statistical inference.
F distribution has the following properties:
1. F distributions are generally skewed; the smaller n is, the more skewed the distribution.
2. When m = 1 or 2, f(x; m, n) decreases monotonically; when m > 2, f(x; m, n) is unimodal with mode n(m − 2)/((n + 2)m).
3. If X ~ F_{m,n}, then Y = 1/X ~ F_{n,m}.
4. If X ~ t_n, then X² ~ F_{1,n}.
5. If X ~ F_{m,n}, then the k-th moment of X is
E(X^k) = (n/m)^k Γ(m/2 + k) Γ(n/2 − k) / (Γ(m/2) Γ(n/2)), 0 < k < n/2,
E(X^k) = ∞, k ≥ n/2.
6. Assume that X ~ F_{m,n}. If n > 2, then E(X) = n/(n − 2); if n > 4, then Var(X) = 2n²(m + n − 2)/(m(n − 2)²(n − 4)).
7. Assume that X ~ F_{m,n}. If n > 6, then the skewness of X is
s = (2m + n − 2)(8(n − 4))^{1/2} / ((n − 6)(m(m + n − 2))^{1/2});
if n > 8, then the kurtosis of X is
κ = 12((n − 2)²(n − 4) + m(m + n − 2)(5n − 22)) / (m(n − 6)(n − 8)(m + n − 2)).
8. When m is large enough and n > 4, the normal distribution function Φ(y) can be used to approximate the F distribution function F(x; m, n), where
y = (x − n/(n − 2)) / ((n/(n − 2))(2(n + m − 2)/(m(n − 4)))^{1/2}),
that is, F(x; m, n) ≈ Φ(y).
Suppose X ~ F_{m,n}, and let Z_{m,n} = (1/2) ln X. When both m and n are large enough, the distribution of Z_{m,n} can be approximated by the normal distribution N((1/2)(1/n − 1/m), (1/2)(1/m + 1/n)), that is, approximately
Z_{m,n} ~ N((1/2)(1/n − 1/m), (1/2)(1/m + 1/n)).
Assume that X1, . . . , Xm are random samples of the normal population N(μ1, σ1²) and Y1, . . . , Yn are random samples of the normal population N(μ2, σ2²). The testing problem we are interested in is whether σ1 and σ2 are equal.
Define σ̂1² = (m − 1)^{−1} Σ_{i=1}^m (Xi − X̄)² and σ̂2² = (n − 1)^{−1} Σ_{i=1}^n (Yi − Ȳ)² as the estimators of σ1² and σ2², respectively. Then we have
(m − 1)σ̂1²/σ1² ~ χ²_{m−1}, (n − 1)σ̂2²/σ2² ~ χ²_{n−1},
where σ̂1² and σ̂2² are independent. If σ1² = σ2², by the definition of the F distribution, the test statistic
F = (σ̂1²/σ1²)/(σ̂2²/σ2²) = [(m − 1)^{−1} Σ_{i=1}^m (Xi − X̄)²] / [(n − 1)^{−1} Σ_{i=1}^n (Yi − Ȳ)²] ~ F_{m−1,n−1}.
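The variance-ratio test above is straightforward to carry out in code. A sketch (sample sizes, means and the common σ are our own choices; under equal variances the two-sided p-value should usually not be small):

```python
# Two-sample F test for equality of variances: under sigma1 = sigma2,
# F = sigma1hat^2 / sigma2hat^2 ~ F_{m-1, n-1}.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
m, n = 25, 30
x = rng.normal(0.0, 1.5, size=m)
y = rng.normal(3.0, 1.5, size=n)  # same sigma, different mean

f_stat = x.var(ddof=1) / y.var(ddof=1)   # ddof=1 gives the (m-1)^{-1} estimator
cdf = stats.f.cdf(f_stat, dfn=m - 1, dfd=n - 1)
p_two_sided = 2 * min(cdf, 1 - cdf)
print(f_stat, p_two_sided)
```

The test is two-sided because a variance ratio far below 1 is as much evidence against equality as one far above 1.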
1.16.1. Non-central F distribution
If X ~ χ²_{m,λ}, Y ~ χ²_n, and X and Y are independent, then F = (X/m)/(Y/n) follows a non-central F distribution with degrees of freedom m and n and non-centrality parameter λ. Denote it as F ~ F_{m,n,λ}. Particularly, F_{m,n,0} = F_{m,n}.
10. If X ~ t_{n,δ}, then X² ~ F_{1,n,δ²}.
11. Assume that X ~ F_{m,n,λ}. If n > 2, then E(X) = (m + λ)n/((n − 2)m); if n > 4, then
Var(X) = 2(n/m)²[(m + λ)² + (m + 2λ)(n − 2)]/((n − 2)²(n − 4)).
1.17. Multivariate Hypergeometric Distribution2,3,4
Suppose X = (X1, . . . , Xn) is an n-dimensional random vector with n ≥ 2, which satisfies:
(1) 0 ≤ Xi ≤ Ni, 1 ≤ i ≤ n, Σ_{i=1}^n Ni = N;
(2) for integers m1, . . . , mn with Σ_{i=1}^n mi = m, the probability of the event {X1 = m1, . . . , Xn = mn} is
P{X1 = m1, . . . , Xn = mn} = ∏_{i=1}^n C_{Ni}^{mi} / C_N^m.
Then we say X follows the multivariate hypergeometric distribution, and denote it as X ~ MH(N1, . . . , Nn; m).
Suppose a jar contains balls of n different colors, where the number of balls of the i-th color is Ni, 1 ≤ i ≤ n. We draw m balls randomly from the jar without replacement, and denote by Xi the number of drawn balls of the i-th color, 1 ≤ i ≤ n. Then the random vector (X1, . . . , Xn) follows the multivariate hypergeometric distribution MH(N1, . . . , Nn; m).
Multivariate hypergeometric distribution has the following properties:
1. Suppose (X1, . . . , Xn) ~ MH(N1, . . . , Nn; m). For 0 = j0 < j1 < · · · < js = n, let X*_k = Σ_{i=j_{k−1}+1}^{j_k} Xi and N*_k = Σ_{i=j_{k−1}+1}^{j_k} Ni, 1 ≤ k ≤ s. Then (X*_1, . . . , X*_s) ~ MH(N*_1, . . . , N*_s; m).
That is, combining the components of a multivariate hypergeometric random vector into groups yields a new random vector that still follows a multivariate hypergeometric distribution.
2. Suppose (X1, . . . , Xn) ~ MH(N1, . . . , Nn; m), then for any 1 ≤ k < n, we have
P{X1 = m1, . . . , Xk = mk} = C_{N1}^{m1} C_{N2}^{m2} · · · C_{Nk}^{mk} C_{N*_{k+1}}^{m*_{k+1}} / C_N^m,
where N = Σ_{i=1}^n Ni, N*_{k+1} = Σ_{i=k+1}^n Ni, and m*_{k+1} = m − Σ_{i=1}^k mi.
Especially, when k = 1, we have P{X1 = m1} = C_{N1}^{m1} C_{N−N1}^{m−m1} / C_N^m, that is, X1 ~ H(N1, N, m).
Multivariate hypergeometric distribution is the extension of hypergeometric distribution.
3. Suppose (X1, . . . , Xn) ~ MH(N1, . . . , Nn; m), 0 < k < n, then
P{X1 = m1, . . . , Xk = mk | Xk+1 = mk+1, . . . , Xn = mn} = C_{N1}^{m1} · · · C_{Nk}^{mk} / C_{N*}^{m*},
where N* = Σ_{i=1}^k Ni and m* = m − Σ_{i=k+1}^n mi. This indicates that, under the condition Xk+1 = mk+1, . . . , Xn = mn, the conditional distribution of (X1, . . . , Xk) is MH(N1, . . . , Nk; m*).
4. Suppose Xi ~ B(Ni, p), 1 ≤ i ≤ n, 0 < p < 1, and X1, . . . , Xn are mutually independent, then
(X1, . . . , Xn) | Σ_{i=1}^n Xi = m ~ MH(N1, . . . , Nn; m).
This indicates that, when the sum of independent binomial random variables is given, the conditional joint distribution of these random variables is a multivariate hypergeometric distribution.
5. Suppose (X1, . . . , Xn) ~ MH(N1, . . . , Nn; m). If Ni/N → pi as N → ∞ for 1 ≤ i ≤ n, then the distribution of (X1, . . . , Xn) converges to the multinomial distribution PN(m; p1, . . . , pn).
In order to control the number of cars, the government decides to implement a random license-plate lottery policy: each participant has the same probability of getting a new license plate, and 10 quotas are allotted each issue. Suppose 100 people participate in the license-plate lottery, among which 10 are civil servants, 50 are self-employed, 30 are workers of state-owned enterprises, and the remaining 10 are university professors. Denote by X1, X2, X3, X4 the numbers of people who get a license among civil servants, self-employed, workers of state-owned enterprises and university professors, respectively. Then the random vector (X1, X2, X3, X4) follows the multivariate hypergeometric distribution MH(10, 50, 30, 10; 10). Therefore, in the next issue, the probability of the outcome X1 = 7, X2 = 1, X3 = 1, X4 = 1 is
P{X1 = 7, X2 = 1, X3 = 1, X4 = 1} = C_{10}^7 C_{50}^1 C_{30}^1 C_{10}^1 / C_{100}^{10}.
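The lottery probability can be computed exactly from the multivariate hypergeometric formula:

```python
# Exact computation of P{X1=7, X2=1, X3=1, X4=1} for
# (X1, ..., X4) ~ MH(10, 50, 30, 10; 10).
from math import comb

Ns = [10, 50, 30, 10]  # group sizes N1, ..., N4
ms = [7, 1, 1, 1]      # numbers drawn from each group
N, m = sum(Ns), sum(ms)

num = 1
for Ni, mi in zip(Ns, ms):
    num *= comb(Ni, mi)  # product of C_{Ni}^{mi}
prob = num / comb(N, m)
print(prob)
```

The numerator is C(10,7)·50·30·10 = 1,800,000, so the outcome is very unlikely: the ten quotas would almost never go so lopsidedly to one small group.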
1.18. Multivariate Negative Binomial Distribution2,3,4
Suppose X = (X1, . . . , Xn) is a random vector of dimension n (n ≥ 2) which satisfies:
(1) Xi takes non-negative integer values, 1 ≤ i ≤ n;
(2) the probability of the event {X1 = x1, . . . , Xn = xn} is
P{X1 = x1, . . . , Xn = xn} = [(x1 + · · · + xn + k − 1)! / (x1! · · · xn!(k − 1)!)] p0^k p1^{x1} · · · pn^{xn},
where 0 < pi < 1, 0 ≤ i ≤ n, Σ_{i=0}^n pi = 1, and k is a positive integer. Then we say X follows the multivariate negative binomial distribution, denoted as X ~ MNB(k; p1, . . . , pn).
Suppose that some test has (n + 1) different results, exactly one of which occurs in every trial, the i-th with probability pi, 1 ≤ i ≤ n + 1. The sequence of trials continues until the (n + 1)-th result has occurred k times. Denote by Xi the total number of times the i-th result occurred, 1 ≤ i ≤ n. Then the random vector (X1, . . . , Xn) follows the multivariate negative binomial distribution MNB(k; p1, . . . , pn).
Multivariate negative binomial distribution has the following properties:
1. Suppose (X1, . . . , Xn) ~ MNB(k; p1, . . . , pn). For 0 = j0 < j1 < · · · < js = n, let X*_k = Σ_{i=j_{k−1}+1}^{j_k} Xi and p*_k = Σ_{i=j_{k−1}+1}^{j_k} pi, 1 ≤ k ≤ s. Then (X*_1, . . . , X*_s) ~ MNB(k; p*_1, . . . , p*_s).
That is, combining the components of a multivariate negative binomial random vector into groups yields a new random vector that still follows a multivariate negative binomial distribution.
2. If (X1, . . . , Xn) ~ MNB(k; p1, . . . , pn), then the mixed factorial moment is
E(X1^{(r1)} · · · Xn^{(rn)}) = [(k + Σ_{i=1}^n ri − 1)! / (k − 1)!] ∏_{i=1}^n (pi/p0)^{ri},
where p0 = 1 − Σ_{i=1}^n pi and Xi^{(ri)} = Xi(Xi − 1) · · · (Xi − ri + 1).
3. If (X1, . . . , Xn) ~ MNB(k; p1, . . . , pn), 1 ≤ s < n, then (X1, . . . , Xs) ~ MNB(k; p*_1, . . . , p*_s), where p*_i = pi/(p0 + p1 + · · · + ps), 1 ≤ i ≤ s, and p0 = 1 − Σ_{i=1}^n pi.
Especially, when s = 1, X1 ~ NB(k, p0/(p0 + p1)).
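The sequential-trial construction of the MNB distribution is easy to simulate. A sketch (probabilities and k are our own choices; the mean E(Xi) = k pi/p0 follows from property 2 with ri = 1):

```python
# Simulate MNB(k; p1, p2) by running trials until outcome 0 (probability
# p0) has occurred k times, counting occurrences of the other outcomes.
import numpy as np

rng = np.random.default_rng(2)
k, probs = 3, np.array([0.5, 0.2, 0.3])  # probs[0] is p0
reps = 5000
counts = np.zeros((reps, 2))
for r in range(reps):
    zeros = 0
    while zeros < k:
        j = rng.choice(3, p=probs)
        if j == 0:
            zeros += 1
        else:
            counts[r, j - 1] += 1
expected = k * probs[1:] / probs[0]  # E(Xi) = k * pi / p0
print(counts.mean(axis=0), expected)
```

The empirical means settle near (1.2, 1.8), matching k pi/p0 for the chosen probabilities.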
1.19. Multivariate Normal Distribution5,2
A random vector X = (X1, . . . , Xp)' follows the multivariate normal distribution, denoted as X ~ Np(μ, Σ), if it has the density function
f(x) = (2π)^{−p/2} |Σ|^{−1/2} exp{−(1/2)(x − μ)'Σ^{−1}(x − μ)},
where x = (x1, . . . , xp)' ∈ R^p, μ ∈ R^p, Σ is a p × p positive definite matrix, |·| denotes the matrix determinant, and ' denotes matrix transposition.
Multivariate normal distribution is the extension of normal distribution.
It is the foundation of multivariate statistical analysis and thus plays an
important role in statistics.
Let X1, . . . , Xp be independent and identically distributed standard normal random variables, then the random vector X = (X1, . . . , Xp)' follows the standard multivariate normal distribution, denoted as X ~ Np(0, Ip), where Ip is the identity matrix of order p.
Some properties of multivariate normal distribution are as follows:
1. The necessary and sufficient condition for X = (X1, . . . , Xp)' to follow a multivariate normal distribution is that a'X follows a (univariate) normal distribution for every a = (a1, . . . , ap)' ∈ R^p.
2. If X ~ Np(μ, Σ), then E(X) = μ and Cov(X) = Σ.
3. If X ~ Np(μ, Σ), its moment-generating function and characteristic function are M(t) = exp{μ't + (1/2)t'Σt} and φ(t) = exp{iμ't − (1/2)t'Σt} for t ∈ R^p, respectively.
4. Any marginal distribution of a multivariate normal distribution is still a (multivariate) normal distribution. Let X = (X1, . . . , Xp)' ~ Np(μ, Σ), where μ = (μ1, . . . , μp)' and Σ = (σij)_{p×p}. For any 1 ≤ q < p, set X^(1) = (X1, . . . , Xq)', μ^(1) = (μ1, . . . , μq)', Σ11 = (σij)_{1≤i,j≤q}. Then X^(1) ~ Nq(μ^(1), Σ11). Especially, Xi ~ N(μi, σii), 1 ≤ i ≤ p.
5. If X ~ Np(μ, Σ), B is a q × p constant matrix and a is a q × 1 constant vector, then
a + BX ~ Nq(a + Bμ, BΣB'),
which implies that a linear transformation of a multivariate normal random vector still follows a normal distribution.
6. If Xi ~ Np(μi, Σi), 1 ≤ i ≤ n, and X1, . . . , Xn are mutually independent, then Σ_{i=1}^n Xi ~ Np(Σ_{i=1}^n μi, Σ_{i=1}^n Σi).
7. If X ~ Np(μ, Σ), then (X − μ)'Σ^{−1}(X − μ) ~ χ²_p.
8. Let X = (X1, . . . , Xp)' ~ Np(μ, Σ), and partition X, μ and Σ as follows:
X = (X^(1)', X^(2)')', μ = (μ^(1)', μ^(2)')', Σ = [[Σ11, Σ12], [Σ21, Σ22]],
where X^(1) and μ^(1) are q × 1 vectors and Σ11 is a q × q matrix, q < p. Then X^(1) and X^(2) are mutually independent if and only if Σ12 = 0.
9. Let X = (X1, . . . , Xp)' ~ Np(μ, Σ), and partition X, μ and Σ in the same manner as in property 8. Then the conditional distribution of X^(1) given X^(2) = x^(2) is
Nq(μ^(1) + Σ12 Σ22^{−1}(x^(2) − μ^(2)), Σ11 − Σ12 Σ22^{−1} Σ21).
10. Let X = (X1, . . . , Xp)' ~ Np(μ, Σ), and partition X, μ and Σ in the same manner as in property 8. Then X^(1) and X^(2) − Σ21 Σ11^{−1} X^(1) are independent, X^(1) ~ Nq(μ^(1), Σ11), and
X^(2) − Σ21 Σ11^{−1} X^(1) ~ N_{p−q}(μ^(2) − Σ21 Σ11^{−1} μ^(1), Σ22 − Σ21 Σ11^{−1} Σ12).
Similarly, X^(2) and X^(1) − Σ12 Σ22^{−1} X^(2) are independent, X^(2) ~ N_{p−q}(μ^(2), Σ22), and
X^(1) − Σ12 Σ22^{−1} X^(2) ~ Nq(μ^(1) − Σ12 Σ22^{−1} μ^(2), Σ11 − Σ12 Σ22^{−1} Σ21).
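The regression decomposition in properties 9-10 can be checked by simulation. A sketch (the 3-dimensional μ and Σ below are our own test values): the residual X^(1) − Σ12 Σ22^{−1} X^(2) should be independent of, and hence uncorrelated with, X^(2).

```python
# Empirically check that X1 - Sigma12 @ inv(Sigma22) @ X2 is uncorrelated
# with X2, for X ~ N_3(mu, Sigma) partitioned with q = 1.
import numpy as np

rng = np.random.default_rng(3)
mu = np.array([1.0, -1.0, 0.5])
Sigma = np.array([[2.0, 0.8, 0.3],
                  [0.8, 1.5, 0.6],
                  [0.3, 0.6, 1.0]])
X = rng.multivariate_normal(mu, Sigma, size=200000)

x1, x2 = X[:, :1], X[:, 1:]              # q = 1, p - q = 2
S12 = Sigma[:1, 1:]
S22inv = np.linalg.inv(Sigma[1:, 1:])
resid = x1 - x2 @ (S12 @ S22inv).T       # X1 minus its regression on X2
cross_cov = np.cov(np.hstack([resid, x2]).T)[0, 1:]
print(cross_cov)
```

The empirical cross-covariances come out near zero, and for Gaussian vectors zero covariance implies full independence.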
1.20. Wishart Distribution5,6
Let X1, . . . , Xn be independent and identically distributed p-dimensional random vectors with common distribution Np(0, Σ), and let X = (X1, . . . , Xn) be the p × n random matrix with the Xi as its columns. Then we say the p-th order random matrix W = XX' = Σ_{i=1}^n Xi Xi' follows the p-th order (central) Wishart distribution with n degrees of freedom, and denote it as W ~ Wp(n, Σ). Here the distribution of a random matrix means the distribution of the random vector generated by matrix vectorization.
Particularly, if p = 1 and Σ = σ², we have W = Σ_{i=1}^n Xi² ~ σ²χ²_n, which shows that the Wishart distribution is an extension of the Chi-square distribution.
If W ~ Wp(n, Σ), Σ > 0 and n ≥ p, then the density function of W is
fp(W) = |W|^{(n−p−1)/2} exp{−(1/2)tr(Σ^{−1}W)} / (2^{np/2} |Σ|^{n/2} π^{p(p−1)/4} ∏_{i=1}^p Γ((n − i + 1)/2)),
where W > 0 and "tr" denotes the trace of a matrix.
The Wishart distribution is a useful distribution in multivariate statistical analysis and plays an important role in statistical inference for the multivariate normal distribution.
Some properties of Wishart distribution are as follows:
1. If W ~ Wp(n, Σ), then E(W) = nΣ.
2. If W ~ Wp(n, Σ) and C is a k × p constant matrix, then CWC' ~ Wk(n, CΣC').
3. If W ~ Wp(n, Σ), its characteristic function is E(e^{i tr(TW)}) = |Ip − 2iΣT|^{−n/2}, where T is a real symmetric matrix of order p.
4. If Wi ~ Wp(ni, Σ), 1 ≤ i ≤ k, and W1, . . . , Wk are mutually independent, then Σ_{i=1}^k Wi ~ Wp(Σ_{i=1}^k ni, Σ).
5. Let X1, . . . , Xn be independent and identically distributed p-dimensional random vectors with common distribution Np(0, Σ), Σ > 0, and X = (X1, . . . , Xn).
(1) If A is an idempotent matrix of order n, then the quadratic form matrix Q = XAX' ~ Wp(m, Σ), where m = r(A) and r(·) denotes the rank of a matrix.
(2) Let Q = XAX' and Q1 = XBX', where both A and B are idempotent matrices. If Q2 = Q − Q1 = X(A − B)X' ≥ 0, then Q2 ~ Wp(m − k, Σ), where m = r(A) and k = r(B). Moreover, Q1 and Q2 are independent.
6. If W ~ Wp(n, Σ), Σ > 0, n ≥ p, partition W and Σ into q-th order and (p − q)-th order parts as follows:
W = [[W11, W12], [W21, W22]], Σ = [[Σ11, Σ12], [Σ21, Σ22]],
then
(1) W11 ~ Wq(n, Σ11);
(2) W22 − W21 W11^{−1} W12 and (W11, W21) are independent;
(3) W22 − W21 W11^{−1} W12 ~ W_{p−q}(n − q, Σ_{2|1}), where Σ_{2|1} = Σ22 − Σ21 Σ11^{−1} Σ12.
7. If W ~ Wp(n, Σ), Σ > 0, n > p + 1, then E(W^{−1}) = Σ^{−1}/(n − p − 1).
8. If W ~ Wp(n, Σ), Σ > 0, n ≥ p, then |W| ~ |Σ| ∏_{i=1}^p ξi, where ξ1, . . . , ξp are mutually independent and ξi ~ χ²_{n−i+1}, 1 ≤ i ≤ p.
9. If W ~ Wp(n, Σ), Σ > 0, n ≥ p, then for any p-dimensional non-zero vector a, we have
(a'Σ^{−1}a)/(a'W^{−1}a) ~ χ²_{n−p+1}.
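Property 1, E(W) = nΣ, is easy to check by simulation from the defining construction W = XX' (dimension, degrees of freedom and Σ below are our own test values):

```python
# Simulate Wishart matrices W = X X' with columns X_i ~ N_p(0, Sigma)
# and compare the average of W with n * Sigma.
import numpy as np

rng = np.random.default_rng(4)
p, n, reps = 2, 5, 20000
Sigma = np.array([[1.0, 0.4],
                  [0.4, 2.0]])
L = np.linalg.cholesky(Sigma)      # Sigma = L L'

Z = rng.standard_normal((reps, p, n))
X = L @ Z                          # each column of X[r] is N_p(0, Sigma)
W = X @ X.transpose(0, 2, 1)       # reps draws from W_p(n, Sigma)
W_mean = W.mean(axis=0)
print(W_mean)
print(n * Sigma)
```

The Cholesky factor turns standard normal columns into Np(0, Σ) columns, so each X[r] X[r]' is one draw from Wp(n, Σ).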
1.20.1. Non-central Wishart distribution
```