close

Вход

Забыли?

вход по аккаунту

?

Закон Ципфа для случайных текстов с неравными вероятностями букв и пирамидa Паскаля.

код для вставкиСкачать
???????? ?????. ??????????
2012, ? 12, c. 30?33
http://www.ksu.ru/journals/izv_vuz/
e-mail: izvuz.matem@ksu.ru
?.?. ????A???, ?.?. ??????
?A??? ????A ??? ????A???? ???????
? ???A????? ????????????? ???? ? ???A???A ?A??A??
?????????. ?a???a????a???? ?????? ?????????? ???? ? ???a???????? ???a????? ????????????? ????. ???a?a??, ??? ??????????? p(r) ???? ?a??a r ????? ????????? ??????? a??????????. ? ??????? ?? ?a??? ?????????a???? ?a???? ????a?a ? ???????a???a ?a?? ????????
???a?a???????? ???????a????? ?????a??, ???????a ?a??? ???a? ??????a ??? ???a?a????
?????????? ?a???a.
???????? ?????: ?a??? ????a, ?????? ????????, ?????????? ??a???????, ????????? ?a????,
???a???a ?a??a??, ?????a???? ????????a?????????, ????????a????? ??a??????.
???: 519.213
????????, ??? ?a?????? ?a??? ??????a???? ? a????????? ?????? ????? ?the?, ?????? ?? ?a????? ??????a?????? ? ?of?, ?a??? ???a???, ?a???? ?????????? w ?????a ??????a???????
?? ?a?? r(w) ? ????? ? ?a??????? ??????. ?a????a ????a f (w) ???????????? ?a? ????????? ?????????? ???? w ? ?????? ? ????? ????? ?????a. ?a??? ????a (???????? ? ??????
???????? ???????? ???a) ???????a??, ??? ???????????? r(w) ?a f (w) ???? ???????? ????????a? ???????a, ?a??a? 0.1 ??? a?????????? ?????a. ? ?a??????? ????? ??a???a?? Google
Labs ???????? ?a???? ?? ???????a?a? ?a?????a?a??? 4% ???? ????a-???? ?????????a????
???? [1]. ????a??? ?a??? ?a????a?, ????a?, ?????????a? ?? ?????? ?????????? ?????????
?? ??????a????????a???? (??? ???????a ?? ?a???a????a?? ?????????? ???a????) ??a?????? r ? f ??a ?a?????? ?a??? ??????a???? ???? a?????????? ????a ? 2000 ????, ????? ???
lg f = ?1.05182 ? 1.00026 lg r. ?a??? ??????? ????a??? ??????????? ?a???? ???? ?????????
??????? ? ?a????? ????a ??????? ????a???? ?????? ?a??? ?a????.
??? ?????????? ?a???a ????a ???????a???? ?a??????? ?????a?????: ??a???? ????? ?a??? ?????????? a?????? ?a ??? ???? ?a???? ? [2], ?a? ?? ?a???a????a???? ????? ????? ?a???a
?? ????? ??????? ????????????? ??????, ???????????? ?.?. ?a??????. ? ???? ?????? ??
????????????? ?? ????? ??a?????????? ?????????? ?a???a ????a: ?????? ??????????, ?a???? ?a? ???a?????? ???? ? ???????????? p0 ?a???a???? ?a ??????, ???? ?a???????????
?a ???? ?? 26 a????????? ???? [3]?[5]. ?????? ?a???a???? ????????a????????? ???? ?????
????? ??????a??. ????????, ??? ????a ????a????? ????? ?????a???? ? ????? ? ??? ??
????????????, ?? ?a??? ???? ??????. ?????a??? ????? p(r) ??????????? ????????? ????a
?a??a r. ????? ??????, ??? ?c1 , c2 : c1 < ln p(r) ? ? ln r < c2 , ??? ? ????a? p0 = 1/27 ?????
? = ln 27/ ln 26. ? ???? ?????? ?? ???????????, ??? ???????? ?a????? ????????? ?a??????? ???? ? ?????? ??????a????, ??? ???????? ????a ??? ?????????????? ?a?a???? ?a?????,
????????? 25.11.2011
?????? ????????? ??? ?????????? ????????? ??????????? ????? ??????????????? ????????????, ????? ? 12-06-00404-a.
30
?A??? ????A ??? ????A???? ??????? ? ???A????? ????????????? ????
31
?a???????? C. ??????-?a?? [6], [7] (??????????? ??????????a???? ??????? ??a?????? ?????????? ??a?????? ???a?a???????? ?a???????????).
????????? ?????? ???????? ?a ????a? ???a???????????? ???? ???? ????a?? ???????????? ???a??? ? 12-???a?????? ?a???? ????a?a ? ???????a???a [8]. ?? ???a?a????????
???a??????a, a?a????????? (1) (??. ????), ????????a?? ?????????? ?a???????? ?????? ???
???????????? ???????. ? ?a???????? ????? ??a??? a????? ?????: ????? ??, ???????,
??????? ????? ???a?a???????? ?????????? ?a???a ??? ???a???? ???????????? ??? ???? ??????????? ???????????? ? ??????a?? ????????? ???? ????????? ?a???? ???????a?a. ? ?????? ?? ??????a??? ???????? ???a?a????????, ?????????? ?????? ????????: ? ??????? ?????????a??? ??????? ???a???? ?a??a??. ???????, ??? ? ????a? ?a???????? ?a?????????
???????????? ???? ????????? ?a??? ???????? ?? ?????a. ????????a??? ???????????? ?????????? ? ???a?a???????? ?a????? ??? ?a???????? ????? ?????? ? ??????? ????? ??????
??? ??????????????? ???a?a????? ????? ????????a ?a???a [9].
n
pi = 1 ? p0 ,
??????a. ????? ??????????? ????????? ???? ?a??? p1 , . . . , pn , n > 1
a ? ???? ?????? ??a??????
n
i=1
i=1
p?i
= 1. ????a
?c1 , c2 : c1 < ln p(r) ? ln r/? < c2 .
(1)
??????????????. ?????? ?a??????????? ?????? ?????, ?a?? ???????? ????a?? ?a???? ???????, ??? ???a????? ????a ????? ???????? ?a??? (?a??? ????????????? ?? ?????? ?a ???????????? ???????). ??? ????a, ? ????a?? ??????? k1 ???? 1-?? ???a,. . . , kn ? n-??, ?????
???? ? ?? ?? ??????????? Pr(k) = pk11 . . . pknn p0 , ?? ?a??? ???? ??????. ?????????? ?a???
n )!
. ?a??????????? ?a?a???? ??????????a????? ????????????? M (k1 , . . . , kn ) = (k1k+иии+k
1 !...kn !
?? ????????? ??????????? f ? (0, 1] ? ?????a??? ????? Q(f ) ?a?? ?????????? ????a w,
???????? ??????????? ?? ?????? f ? ??????????a???? ?? ??????a??a??? ????????????
?????? ???? ????. ?a?, ?a??????, Q(p0 ) = 1 (???? ???? ?????? ?????), Q(p p0 ) = 2, ???
p = max{p1 , . . . , pn } (????? ???????a?a???? ?????????????? ?a??????a), ? ?. ?. ????????,
Q ? ??????a??a??a? ???????-????????a?, ??????a??a? ???? ?a???a????? ??a????? ???????, ???????a??? ? ????????????? ??? f ? 0. ????? ????????, ??? ??? ??????? ?????
????????? ??????? ???????????, ?????? ?.
?a?? ????a w (?. ?. ?????????? ???? ? ?a?a?? ??????????? ?????a ?? w ????????????)
?a?a???? ?????? ???????????????? ????????????? M (k1 , . . . , kn ), ??? ????????a??? ??????? ?? ???? k1 , . . . , kn , ??? ??????? Pr(k) ? f .
????? Q(x)
= Q(p0 e?x ), x ? 0. ????????,
M (k1 , . . . , kn ),
(2)
Q(x)
=
k?0:L1 k1 +иии+Ln kn ?x
??? Li = ? ln pi , i = 1, . . . , n. ???????????? ??????? ???????????? ?????????????? ????????
ln Q(x)??x.
??? ????? ???????????????? ??? ????????? ???????? ???????????? ?????????
n
e?Li = 1 ? p0 . ?????? Li = ?Li ? ????? x = ?x.
???a???? ?a??a??. ?a?????, ??? ? ?a?
?????
n
i=1
e?Li
= 1, a ??????????? (2) ?????????a???? ? ????
i=1
Q(x)
=
k?0:L1 k1 +иии+Ln kn ?x
M (k1 , . . . , kn ).
32
?.?. ????A???, ?.?. ??????
????? ????????, ??? ???? ?? ???a????a ???? ????? ??????? x , ?? ????????? ???a??????a?
???????a. ?a??? ???a???, ????? ????a? ?????? ? ????a? ? = 1.
????, ? ?????????? ??? ??????????? ????????
n
e?Li = 1.
(3)
i=1
??????? Q(x)
?????????a ??? x ? 0. ?????? ???????????? ?? ????? ??? x < 0. A?a???????
????? ???????????? ???a???? ?a??a?? ?????? ??? ?????a??????? ??a?????? ki . ????a
????? ????? ? ???a???? ?a??a??, ?a ??????????? M (0, . . . , 0), ???? ????a ????? ???????
?????, ??????? ?a ??????? ??????? ???? ?? ????????. ??? ???????? ? ????????a??????
? Ln ) + Q0 (x), ??? Q0 ? ????????a ?????a??a
? L1 ) + и и и + Q(x
??a?????? Q(x)
= Q(x
(???????, ?a??a? ???? ??? ?????a??????? ??a?????? a???????a ? ??????? ??? ???????a???????).
??? x ? L(n) = max{L1 , . . . , Ln } ??????? ???????????? ???????????
n (x ? L1 ) + и и и + Q
n (x ? Ln ),
n (x) = Q
Q
(4)
n (x) = Q(x)
+ 1/(n ? 1). ? ????a?, ????a ??? ????? Li /Lj , i, j = 1, . . . , n, ?a????a????,
??? Q
????? ?????a???? ????????a????????? ? ??????a ???a???a???? ? ??????? ????????? ?????
?????? ??? ?a??? ????????a??????????. ? ????? ????a??? ?????? ????? ?????? ?????
?????????a?? ??????? ???a??????a.
?????a? ??????????? (3) ?a ex , ???????, ??? ???a?a?????a? ??????? ?????????????
n (x)e?x
n (x) ???????????? q(x) = Q
??a?????? (4). ? ???? ????????? ??????????a ??????? Q
??????? ??????????, a ??a???, ?????????? ?a??? ????????????? c1 ? c2 , ??? c1 < q(x) < c2
??? ???? 0 ? x ? L(n). ?a????? ? ???????????? ??????????? (4) ??a?a???? ? ??a??? ?a???
?a ???? ?????? ????? (??????), ?a?????? ??????a? ?????????? ???a??????a c1 < q(x) < c2
?? x ? L(n) + L(n),
??? L(n)
= min{L1 , . . . , Ln }. ???????? ?a??? ????????? ???????a???, ?a ???????? ????? ?a??? ???a??? ?????????? ???a??????a ??? ?????? ????? ??????
???????? x. ??????a????????a? ???a???????, ??????? ??????????? ?? ???a??????????
n (x) ? x, a ??a???, ? ?a?????? ln Q(x)
? x.
ln Q
?a??? ???a???, ???a?a? ????????? ??????? a?????????? ? ???????? ????? ??????? ???
????. ? ????a? ??????? Q(x) ???a?a???? ?????????? ?a???a ?a?a???? ???????? ? ??a??????
n
p?i = 1. ???? ?? ?a???a????a?? ??????????? p(r) ??????a?????? r-?? ????a ? ?a???????
i=1
??????, ?? ??????? ????????? ??????? a?????????? ? ???a?a????? 1/? (?. ?. ?????? ???????). ?a?????, ??? ???????a ?????????? ?a???a ?? ?a????? ??????a?????? ??a ????? (a ??
??a) ?a?????? ?????????? ???? ?a??????? ??????????? ??????, ?????????? ? ?a?? Google
Labs (??a?????????, ?????????, ????????, ???a?????? ? ?????????? ?a??a???? a??????????), ???a???a??, ??? ???a?a???? ??????? ??? ?a????? ????a ??a???? ??????? ?? ???????
? ?????? ? ??a?????, ???????????? ? ??????? ?????????? ???????.
A????? ??a???a??? ?.?. ????????a ?a ???????a???? ??????????.
??????????
[1] Michel J.B., Shen Y.K., Aiden A.P., Veres A., Gray M.K., Pickett J.P., Hoiberg D., Clancy D., Norvig P.,
Orwant J., Pinker S., Nowak M.A., Aiden E.L. Quantitative analysis of culture using millions of digitized
books, Science 331 (6014), 176?182 (2011),
http://www.librarian.net/wp-content/uploads/science-googlelabs.pdf.
?A??? ????A ??? ????A???? ??????? ? ???A????? ????????????? ????
33
[2] ?a???? ?.?., ?a????a ?.?. ? ?a???? ????a ? ?a?????? ?a???????????? ? ??????????? ? ?????????,
?a???. ?a????? 80 (5), 718?732 (2006).
[3] Mandelbrot B. An informational theory of the statistical structure of languages, in: Communication Theory,
Ed. by W. Jackson (Betterworth, 1953), pp. 486?502.
[4] Miller G.A. Some e?ects of intermittent silence, Amer. J. Psychology 70, 311?314 (1957).
[5] Li W. Random texts exhibit Zipf ?s-law-like word frequency distribution, IEEE Transactions on Information
Theory 38 1842?1845 (1992).
[6] ??????-?a?? ?.?. ? ?a??????????? ???? ???????? ????a ?? ?a????? ??????a??????, ?????. ?????a??
??????. 24 (4), 102-107 (1988).
[7] ??????-?a?? ?.?. ? ??????a?????? ???????? ???? ? ? ?????? ?a??????a???? ???a? ? ??????a????,
?a????-?????????a? ??????a???, ???. 2, ? 1, 28?31 (1987).
[8] Conrad B., Mitzenmacher M. Power laws for monkeys typing randomly: the case of unequal probabilities, IEEE
Transac. 50, 1403?1414 (2004), http://www.eecs.harvard.edu/?michaelm/postscripts/toit2004a.pdf.
[9] Bochkarev V.V., Lerner E.Yu. Zipf and non-Zipf laws for homogeneous Markov chain,
http://arxiv.org.abs/1207.1872
?.?. ????a???
a????????, ?a????a ?a?????????,
?a?a????? (???????????) ?????a????? ???????????,
??. ?????????a?, ?. 18, ?. ?a?a??, 420008, ??????,
e-mail: vbochkarev@mail.ru
?.?. ??????
??????, ?a????a ????????????? ???????????,
?a?a????? (???????????) ?????a????? ???????????,
??. ?????????a?, ?. 18, ?. ?a?a??, 420008, ??????,
e-mail: eduard.lerner@gmail.com
V.V. Bochkarev and E.Yu. Lerner
The Zipf law for random texts with unequal letter probabilities and the Pascal pyramid
Abstract. The model of word generation with independent unequal letter probabilities is analyzed
in the article. It is proved that the probability p(r) of words of rank r has the power asymptotic
behavior. Elementary methods not similar to Conrad and Mitzenmacher ones are used to represent
a short proof of the theorem. We derive also an explicit formula of power.
Keywords: Zipf law, monkey model, order statistics, power laws, Pascal pyramid, recursive sequences, functional equations.
V.V. Bochkarev
Assistant, Chair of Radiophysics,
Kazan (Volga Region) Federal University,
18 Kremlyovskaya str., Kazan, 420008 Russia,
e-mail: vbochkarev@mail.ru
E.Yu. Lerner
Associate Professor, Chair of Economic Cybernetics,
Kazan (Volga Region) Federal University,
18 Kremlyovskaya str., Kazan, 420008 Russia,
e-mail: eduard.lerner@gmail.com
Документ
Категория
Без категории
Просмотров
9
Размер файла
153 Кб
Теги
закон, случайных, неравными, буква, вероятностями, ципфа, текстом, пирамид, паскаль
1/--страниц
Пожаловаться на содержимое документа