Automatic Text Classification Based on the Category Name

(futujaos@gmail.com), (louk_nat@mail.ru)
1. [...]

[...] [Sebastiani, 2002]. [...] [..., 2008]. [...] [Bates, 1988]. [...] WordNet [Miller, 1998] [...] k [...]
2. [...]

[...] (k-nearest neighbor algorithm, kNN). [...] Reuters-10 (footnote 1). [...]

[...] [Gliozzo et al., 2005] [...] (Latent Semantic Indexing, LSI) [...] 8493 [...] (Gaussian Mixture algorithm). [...]

[...] [Rodriguez et al., 1997] [...] WordNet [...] (Widrow-Hoff), [...] (Rocchio) [...]

[...] [Glickman et al., 2006]. [...] RTE-1 (footnote 1) [...] WordNet [...]

[...] [Barak et al., 2009]. [...] LSI [...]

Footnote 1: [...] 10 [...] Reuters-21578 - earn, acquisitions, money, grain, crude, trade, interest, ship, wheat, corn.
3. [...]

[...] WordNet. [...] (bag of words). [...] WordNet, [...] WordNet (footnote 2) [...]
interest:
Noun:
- (62) S: (n) interest, involvement (a sense of concern with and curiosity about someone or something) "an interest in music"
- (32) S: (n) sake, interest (a reason for wanting something done) "for your sake"; "died for the sake of his country"; "in the interest of safety"; "in the common interest"
- (21) S: (n) interest, interestingness (the power of attracting or holding one's attention (because it is unusual or exciting etc.)) "they said nothing of great interest"; "primary colors can add interest to a room"
- ...
Verb:
- (5) S: (v) interest (excite the curiosity of; engage the interest of)
- (2) S: (v) concern, interest, occupy, worry (be on the mind of) "I worry about the second Germanic consonant shift"
- ...

Footnote 1: Recognising Textual Entailment Challenge, [...] http://pascallin.ecs.soton.ac.uk/Challenges/RTE/
Footnote 2: [...] WordNet [...]
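The numbers in parentheses in the WordNet entry for interest are sense frequencies, so one simple way to build a word list for a category is to take synonyms from the most frequent senses only. The sketch below illustrates that idea on the senses shown in the entry; the function name `expand_by_frequency` and the "top N senses" rule are assumptions made for illustration, not necessarily the selection rule used by the authors, and the senses elided with "..." in the entry are not included.

```python
# Senses of "interest" as shown in the WordNet entry:
# (frequency, part of speech, synonyms). Elided senses ("...") are omitted.
INTEREST_SENSES = [
    (62, "n", ["interest", "involvement"]),
    (32, "n", ["sake", "interest"]),
    (21, "n", ["interest", "interestingness"]),
    (5, "v", ["interest"]),
    (2, "v", ["concern", "interest", "occupy", "worry"]),
]


def expand_by_frequency(senses, top_senses):
    """Collect synonyms from the `top_senses` most frequent senses.

    Hypothetical helper: words are returned in first-seen order,
    without duplicates.
    """
    ranked = sorted(senses, key=lambda sense: sense[0], reverse=True)
    words = []
    for _freq, _pos, synonyms in ranked[:top_senses]:
        for word in synonyms:
            if word not in words:
                words.append(word)
    return words


if __name__ == "__main__":
    print(expand_by_frequency(INTEREST_SENSES, 2))
```

Raising `top_senses` widens the description at the cost of admitting rarer, possibly off-topic senses such as worry.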
[...] For example, for the word interest, the first listed sense, “a sense of concern with and curiosity about someone or something”, has a frequency of 62, the sense “a reason for wanting something done” has a frequency of 32, and so on.
[...]:
1. [...] k [...]
2. [...] k [...]
3. [...] k [...]
4. [...]
5. [...]
6. [...] k [...]

[...]
Table 1. [...]

Category   Description words
grain      grain, cereal, wheat, rice, malt, Indian rice, maltster, coarse-grained
interest   interest, involvement, sake, enthusiasm, concern, occupy, apt, vexation
trade      trade, craft, merchandise, selling, marketing, import, fair trade, plumber
[...] WordNet [...] grain - grainy, granular, granulate [...] money - moneyer, monetary [...] WordNet [...]
4. [...]

[...] WordNet, [...] [0, 1], [...] WordNet [...]
Table 2. [...] interest [...]

     interest  involvement  sake   enthusiasm  concern  occupy  apt    vexation
1    0.971     0.030        0.030  0.030       0.030    0.030   0.030  0.030
2    0.189     0.189        0.189  0.189       0.189    0.189   0.189  0
3    0.987     0.031        0.031  0.031       0.031    0.031   0.031  0
4    0.912     0.094        0      0.152       0.104    0.130   0      0
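To make the role of the weights in Table 2 concrete, the following sketch scores a tokenized document against a weighted category description using cosine similarity. The weights are rows 1 and 4 of Table 2; the choice of cosine similarity over raw term counts and the function name `cosine_score` are assumptions for illustration, since the scoring formula itself did not survive in the text.

```python
import math
from collections import Counter

WORDS = ["interest", "involvement", "sake", "enthusiasm",
         "concern", "occupy", "apt", "vexation"]

# Rows 1 and 4 of Table 2, keyed by description word.
ROW_1 = dict(zip(WORDS, [0.971, 0.030, 0.030, 0.030, 0.030, 0.030, 0.030, 0.030]))
ROW_4 = dict(zip(WORDS, [0.912, 0.094, 0.0, 0.152, 0.104, 0.130, 0.0, 0.0]))


def cosine_score(tokens, weights):
    """Cosine similarity between a bag-of-words document and a weighted
    category description (hypothetical scoring function)."""
    counts = Counter(tokens)
    dot = sum(counts[word] * weight for word, weight in weights.items())
    doc_norm = math.sqrt(sum(c * c for c in counts.values()))
    cat_norm = math.sqrt(sum(w * w for w in weights.values()))
    if doc_norm == 0 or cat_norm == 0:
        return 0.0
    return dot / (doc_norm * cat_norm)


if __name__ == "__main__":
    doc = "the bank raised its interest rate".split()
    print(cosine_score(doc, ROW_1), cosine_score(doc, ROW_4))
```

Under row 4 a document containing only sake scores zero, while under row 1 it still receives a small positive score, which shows how a weighting scheme can switch individual expansion words off.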
5. [...] kNN

[...] kNN. [...] k [...] {grain, trade} [...] (Table 3). [...] (Table 2), [...]
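The section names the kNN method and label sets such as {grain, trade}, but the description of the procedure did not survive. As a generic illustration only, the sketch below implements multi-label kNN over bag-of-words vectors. The cosine similarity, the similarity-weighted vote, the `threshold` parameter and the toy documents are all assumptions and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def cosine(a, b):
    """Cosine similarity between two term-count mappings."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_labels(doc_tokens, labeled_docs, k=5, threshold=0.5):
    """Return every label whose share of the similarity-weighted vote
    among the k nearest labeled documents reaches `threshold`."""
    query = Counter(doc_tokens)
    scored = [(cosine(query, Counter(tokens)), labels)
              for tokens, labels in labeled_docs]
    nearest = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
    votes = defaultdict(float)
    total = 0.0
    for similarity, labels in nearest:
        total += similarity
        for label in labels:
            votes[label] += similarity
    if total == 0:
        return set()
    return {label for label, vote in votes.items() if vote / total >= threshold}


# Toy labeled documents, invented for this example.
LABELED = [
    (["wheat", "grain", "harvest"], {"grain"}),
    (["grain", "export", "trade"], {"grain", "trade"}),
    (["trade", "tariff", "import"], {"trade"}),
    (["crude", "oil", "barrel"], {"crude"}),
    (["money", "bank", "currency"], {"money"}),
]

if __name__ == "__main__":
    print(knn_labels(["grain", "wheat", "export"], LABELED, k=2, threshold=0.4))
```

Because a label is kept whenever its vote share passes the threshold, a document can receive several categories at once, as in {grain, trade}.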
Table 3. [...] kNN [...] k=5 [...]

[...] {money, crude} [...] crude, [...] (money, crude, grain [...] trade [...]
grain
grain
Grain
trade
trade
grain, trade
grain, trade
trade
money
trade, money grain, trade
grain, trade
grain, trade
crude
money,
crude
money,
crude
crude
grain, trade
grain, trade
corn
money,
crude
money,
crude
-
[...] k, [...] kNN [...] k = 9. [...] 4573 [...] grain [...] 3, 5, 7, 9 [...] 10 [...] 66 [...]
6. [...]

[...] WordNet, [...] kNN. [...]: ma_p - macro-averaged precision, ma_r - macro-averaged recall, ma_f - macro-averaged f1-measure, mi_p, mi_r, mi_f [...] [..., 2004]. [...] SVM. [...] WordNet [...] kNN [...]
Table 4. [...]

[...]   ma_p    ma_r    ma_f    mi_p    mi_r    mi_f
[...]   0.6375  0.5983  0.6173  0.7017  0.6343  0.6663
[...]   0.6231  0.7235  0.6696  0.5953  0.7422  0.6607
[...]   0.7128  0.6684  0.6899  0.7540  0.7512  0.7526
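The six measures in Table 4 can be computed from per-category counts of true positives, false positives and false negatives. In the sketch below, ma_f is the harmonic mean of ma_p and ma_r rather than the mean of per-category f1 values; this choice is inferred from the fact that the ma_f values in Table 4 match the harmonic mean of the neighbouring ma_p and ma_r values to four decimal places, and it is not stated in the surviving text. The function names and the toy counts are illustrative assumptions.

```python
def f1(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) > 0 else 0.0


def ratio(num, den):
    """Division that returns 0.0 for an empty denominator."""
    return num / den if den else 0.0


def averaged_scores(counts):
    """counts maps category -> (tp, fp, fn).

    Macro scores average per-category precision and recall;
    micro scores pool the counts over all categories first.
    """
    per_p = [ratio(tp, tp + fp) for tp, fp, fn in counts.values()]
    per_r = [ratio(tp, tp + fn) for tp, fp, fn in counts.values()]
    ma_p = sum(per_p) / len(per_p)
    ma_r = sum(per_r) / len(per_r)
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    mi_p = ratio(tp, tp + fp)
    mi_r = ratio(tp, tp + fn)
    return {
        "ma_p": ma_p, "ma_r": ma_r, "ma_f": f1(ma_p, ma_r),
        "mi_p": mi_p, "mi_r": mi_r, "mi_f": f1(mi_p, mi_r),
    }


if __name__ == "__main__":
    # Toy counts, invented for this example.
    print(averaged_scores({"grain": (8, 2, 2), "trade": (1, 1, 3)}))
    # Consistency check against the first row of Table 4.
    print(round(f1(0.6375, 0.5983), 4), round(f1(0.7017, 0.6343), 4))
```

Micro-averaging is dominated by frequent categories, while macro-averaging weights every category equally, so the two families of scores can diverge noticeably on a skewed collection.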
[...] 10, [...] 10 [...] SVM (footnote 1) [...] libsvm (footnote 2) [...] ma_f = 0.4836 [...] mi_f = 0.7494 [...]

[...] [Barak et al. 2009] [...] 10. [...] f1: mi_f = 0.76 [...] SVM. [...] Reuters [...] (Table 4). [...] tf-idf [...] f1 [...] LSI [...] f1 [...] tf-idf, [...] LSI. [...] kNN [...]
7. [...]

[...] Reuters-10 [...] mi_f = 0.65 [...] mi_f = 0.79 [...] WordNet, [...] kNN.

Footnote 1: http://www.csie.ntu.edu.tw/~cjlin/libsvm/index.html
Footnote 2: http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/multilabel/

[...]
[Barak et al., 2009] Barak L., Dagan I., Shnarch E. University of Toronto, Bar-Ilan University: Text Categorization from Category Name via Lexical Reference // Proceedings of NAACL HLT 2009: Short Papers, pages 33-36, Boulder, Colorado, June 2009.
[Bates, 1988] Bates M. 1988. How to use controlled vocabularies more effectively in online searching // Online archive V.12, Issue 6, pp. 45-56.
[Gliozzo et al., 2005] Gliozzo A., Strapparava C., Dagan I. Istituto per la Ricerca Scientifica e Tecnologica, Bar Ilan University: Investigating Unsupervised Learning for Text Categorization Bootstrapping // Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing (HLT/EMNLP), pages 129-136, Vancouver, October 2005.
[Glickman et al., 2006] Glickman O., Shnarch E., Dagan I. Bar Ilan University: Lexical Reference: a Semantic Matching Subtask // Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing (EMNLP 2006), pages 172-179, Sydney, July 2006.
[Miller, 1998] Miller G. Nouns in WordNet // WordNet - An Electronic Lexical Database / Fellbaum, C. (ed). The MIT Press, 1998. P. 23-47.
[Rodriguez et al., 1997] Rodriguez M., Gomez-Hidalgo J., Diaz-Agudo B. Universidad Complutense de Madrid: Using WordNet to complement training information in text categorization // Recent Advances in Natural Language Processing, 1997.
[Sebastiani, 2002] Sebastiani F. Machine Learning in Automated Text Categorization // ACM Computing Surveys. 2002. V.34, N 1.
[..., 2004] [...] 2004. // [...], 2004.
[..., 2008] [...] 2008. [...] 150. [...] 4. C. 25-40.
[...]

[...] (ilia2010@yandex.ru)

1. [...]

[...] (Chetviorkin & Loukachevitch, 2012). [...] (Chetviorkin & Loukachevitch, 2012). [...] (Takamura et al., 2005). [...] (Weiss, 2001), [...]

2. [...]

[...] (Kanayama & Nasukawa, 2007; Lau et al., 2011; Qiu et al., 2011). [...] WordNet [...]