21.35 Letter Frequency
Next we review the frequency of letters across all of the words in the discourse. Some data preparation transforms the vector of words into individual letters, from which we construct a frequency count that is then passed on to be plotted.
We again use a pipeline to string together the operations on the data. Starting from the vector of words stored in word, we split the words into characters using stringr::str_split() from stringr (Wickham 2023), removing the first string (an empty string) from each of the results (using base::sapply()). Reducing the result to a simple vector using base::unlist(), we then generate a data frame recording the letter frequencies using qdap::dist_tab() from qdap. We can then plot the letter proportions.
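The steps described above can be sketched roughly as follows. This is a minimal illustration, not the book's own pipeline. The sample vector `words` is hypothetical data, and base `table()` and `barplot()` are swapped in for qdap::dist_tab() and the plotting step so that only stringr is required. Empty strings are dropped by value rather than by position, an assumption made here so the sketch does not depend on whether the installed version of stringr returns a leading empty string.

```r
library(stringr)

# Hypothetical sample data standing in for the vector of words.
words <- c("the", "quick", "brown", "fox", "jumps",
           "over", "the", "lazy", "dog")

# Split each word into single characters, then flatten the
# resulting list into one character vector.
chars <- unlist(str_split(words, ""))

# Drop any empty strings by value (an assumption of this sketch),
# rather than removing the first element of each split.
chars <- chars[chars != ""]

# Count each letter. Base table() is used here in place of
# qdap::dist_tab() to keep the dependencies small.
counts <- table(toupper(chars))

# Build a data frame of letter counts and percentages.
letter_freq <- data.frame(
  letter  = names(counts),
  freq    = as.integer(counts),
  percent = 100 * as.integer(counts) / sum(counts),
  stringsAsFactors = FALSE
)

# Order from the most to the least frequent letter.
letter_freq <- letter_freq[order(-letter_freq$freq), ]

# Horizontal bar chart of the letter proportions, with the most
# frequent letter drawn at the top.
barplot(rev(letter_freq$percent),
        names.arg = rev(letter_freq$letter),
        horiz = TRUE, las = 1,
        xlab = "Proportion (%)")
```

The column names `letter`, `freq` and `percent` are chosen for this sketch only. The data frame returned by qdap::dist_tab() has its own column layout, so plotting code written against it would need to refer to those columns instead.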
Copyright © 1995-2022 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0