21.33 Quantitative Analysis of Text
The qdap (Rinker 2023) package provides an extensive suite of functions to support the quantitative analysis of text.
To illustrate, we obtain simple summaries of a list of words, using
the terms from our Term Document Matrix tdm. We first extract the
shorter terms from our documents into one long word list: we convert
tdm into a matrix, extract the column names (the terms), and retain
those shorter than 20 characters.
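The extraction step just described can be sketched in base R as follows. This is a minimal illustration, not necessarily the code used to produce the output on this page: the small matrix m is a hypothetical stand-in for as.matrix(tdm), and it assumes, as the text states, that the terms are the column names.

```r
# Hypothetical stand-in for as.matrix(tdm): two documents in rows,
# terms as column names (an assumption that follows the text).
m <- matrix(
  c(1, 0, 2, 1,
    0, 3, 1, 0),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    c("doc1", "doc2"),
    c("act", "algorithm", "accuraci", "averyveryverylongconcatenatedterm")
  )
)

# Extract the terms and retain those shorter than 20 characters.
words <- colnames(m)
words <- words[nchar(words) < 20]

print(words)
```

The 20 character cut-off drops the long concatenated token, which is typical of the noise left behind when text is extracted from documents.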
We can then summarise the word list: the number of words, the first few words, and the distribution of word lengths. Notice, in particular, the use of qdap::dist_tab() (Rinker 2023) to generate frequencies and percentages.
## [1] 6456
##  [1] "abstract"  "academi"   "accur"     "accuraci"  "acnntex"   "acsi"
##  [7] "act"       "address"   "adjust"    "adopt"     "advanc"    "advantag"
## [13] "advers"    "affect"    "algorithm"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   3.000   5.000   6.000   6.644   8.000  19.000
##
##    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18
##  579  867 1044 1114  935  651  397  268  200  138   79   63   34   28   22   21
##   19
##   16
##    interval freq cum.freq percent cum.percent
## 1         3  579      579    8.97        8.97
## 2         4  867     1446   13.43       22.40
## 3         5 1044     2490   16.17       38.57
## 4         6 1114     3604   17.26       55.82
## 5         7  935     4539   14.48       70.31
## 6         8  651     5190   10.08       80.39
## 7         9  397     5587    6.15       86.54
## 8        10  268     5855    4.15       90.69
## 9        11  200     6055    3.10       93.79
## 10       12  138     6193    2.14       95.93
....
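To see how the columns of the frequency table relate to one another, the following base R sketch rebuilds them from the word-length counts printed above. It approximates what the table reports and is not qdap's own implementation; the names len, freq and tab are introduced here for illustration only.

```r
# Word lengths and their counts, copied from the table() output above.
len  <- 3:19
freq <- c(579, 867, 1044, 1114, 935, 651, 397, 268, 200,
          138, 79, 63, 34, 28, 22, 21, 16)

# Base R approximation of the reported columns (an illustrative
# reconstruction, not the internals of qdap::dist_tab()).
tab <- data.frame(
  interval    = len,
  freq        = freq,
  cum.freq    = cumsum(freq),
  percent     = round(100 * freq / sum(freq), 2),
  cum.percent = round(100 * cumsum(freq) / sum(freq), 2)
)

print(head(tab, 10))

# The counts also recover the total number of words and the mean length.
total_words <- sum(freq)
mean_length <- sum(len * freq) / total_words
```

Each percent is the count divided by the total number of words, and the cumulative percentage is computed from the cumulative count rather than by summing the rounded percentages.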
References
Rinker, Tyler W. 2023. Qdap: Bridging the Gap Between Qualitative Data and Quantitative Analysis. https://trinker.github.io/qdap/.
Copyright © 1995-2022 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0