Next we review the frequency of letters across all of the words in the discourse. Some data preparation transforms the vector of words into a list of letters, from which we construct a frequency count that is then passed on to be plotted.
We again use a pipeline to string together the operations on the
data. Starting from the vector of words, we
split each word into characters using stringr::str_split() from
stringr (Wickham 2022b), removing the first string (an empty string) from each
of the results (using BiocGenerics::sapply()). Reducing the result into
a simple vector, using BiocGenerics::unlist(), we then generate a data
frame recording the letter frequencies, using qdap::dist_tab()
from qdap. We can then plot the letter proportions.
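
The steps described above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the text: base strsplit() and table() stand in for stringr::str_split() and qdap::dist_tab(), the words vector is hypothetical sample data, and the names chars, letters_vec and letter_freq are illustrative. The sketch filters out empty strings rather than dropping the first element by position, so it does not depend on whether the splitting function emits a leading empty string.

```r
# Hypothetical sample data standing in for the vector of words.
words <- c("data", "science", "text", "mining")

# Split each word into its individual characters (returns a list).
chars <- strsplit(words, "")

# Remove any empty strings from each element, then flatten the
# list into a simple character vector of letters.
letters_vec <- unlist(lapply(chars, function(x) x[x != ""]))

# Count the occurrences of each letter and record the counts and
# proportions in a data frame (a base R stand-in for dist_tab()).
counts <- table(letters_vec)
letter_freq <- data.frame(
  letter = names(counts),
  freq   = as.vector(counts),
  prop   = as.vector(counts) / sum(counts),
  stringsAsFactors = FALSE
)

# Plot the letter proportions when running interactively.
if (interactive()) {
  barplot(letter_freq$prop, names.arg = letter_freq$letter,
          xlab = "Letter", ylab = "Proportion")
}
```

Replacing the sample words vector with the real vector of words gives one row per distinct letter, with the prop column holding the proportions to plot.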
Copyright © 1995-2022 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0