Index
a
Adobe PDF see get data, pdf
application program interfaces (API) 282–289
area under the curve (AUC) 194
AUC see area under the curve (AUC)
b
bag of words 20
bar plot see visualization, bar plot
box plot 263, 265
c
challenges see text mining, challenges
commonality cloud see visualization, commonality cloud
comparison cloud see visualization, comparison cloud
confusion matrix 195
corpus, volatile corpus 44
cosine distance 139, 177, 178
cross validation 192–193
csv file see get data, csv
d
data partition 185, 192–193
dendrogram see visualization, dendrogram
document term matrix (DTM), example 21
e
emoji 108–113
emoticons
  punctuation based 106, 107
  Unicode 105–107
Euclidean distance 69, 139, 145, 148, 170
g
get data
  csv 294
  doc or docx 295
  multiple files 296–298
  pdf 298–299
  txt 295
  xls or xlsx 295–296
i
inner join 122
k
k‐means clustering 130–138
k‐medoid clustering 144–145
l
lasso regression 189–191
latent Dirichlet allocation (LDA) 155–156
LDA see latent Dirichlet allocation (LDA)
Text Mining in Practice with R, First Edition. Ted Kwartler.
© 2017 John Wiley & Sons Ltd. Published 2017 by John Wiley & Sons Ltd.
m
machine learning 20
maximum entropy 241–242
mean squared error 228–230
MS Excel document see get data, xls or xlsx
MS Word document see get data, doc or docx
n
named entity recognition (NER) 238, 242, 246–247
o
OCR see optical character recognition (OCR)
open NLP project 237–238
optical character recognition (OCR) 299–302
p
Plutchik’s wheel of emotion 86–87
polarity 93–96
precision 218–219
preprocessing functions 37–43
principle of least effort 91–92
pyramid plot, polarized tag cloud see visualization, pyramid plot
r
recall 218–219
receiver operator characteristics (ROC) 194–195
ridge regression 189–191
s
sentiment analysis, definition 85
sentiment word cloud see visualization, sentiment word cloud
Silhouette plot 138
skip gram method 174–175
spell check 45–46
spherical k‐means 139–144
string distance 147–154
string manipulation
  extraction 32
  keyword scanning grep 33
  nchar 26
  paste 30
  split 31
  string counts 36
  substitutions 28
subjectivity lexicon 89–91
supervised learning
  document classification 181–183
  prediction 209–210
syntactic parsing 22
t
term document matrix (TDM), example 21, 22
term frequency (TF) 47, 53
term frequency inverse document frequency (TFIDF) 100–101
text mining
  challenges 6
  definition 1, 17
  workflow 9
TF see term frequency (TF)
TFIDF see term frequency inverse document frequency (TFIDF)
tidy data format 120–121
tidytext, sentiment 118, 125
tm definition see text mining, definition
topic modeling 154–169
txt file see get data, txt
u
unsupervised learning 129–130, 147, 154
v
visualization
  bar plot 55–57
  commonality cloud 75–78
  comparison cloud 75–79
  dendrogram 67–73
  pyramid plot 80–81
  sentiment word cloud 96, 102
  treemap 168–169
  word cloud 8, 73–83
  word network 59–67
Vox Populi, wisdom of crowds 4
w
web scraping 272–282
wisdom of crowds see Vox Populi, wisdom of crowds
word cloud see visualization, word cloud
word network see visualization, word network
word to vector 169–174
workflow see text mining, workflow
x
XML 290–292
z
Zipf’s law 91–93