20.35 Letter Frequency

Next we want to review the frequency of letters across all of the words in the discourse. Some data preparation transforms the vector of words into a vector of letters, from which we construct a frequency count that is then passed on to be plotted.

We again use a pipeline to string together the operations on the data. Starting from the vector of words stored in word, we split each word into characters using stringr::str_split() (Wickham 2019b), then remove the first string (an empty string) from each of the results using BiocGenerics::sapply(). We reduce the result to a simple vector using BiocGenerics::unlist() and then generate a data frame recording the letter frequencies using qdap::dist_tab() from the qdap package. We can then plot the letter proportions.
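
The steps described above can be sketched roughly as follows. This is a minimal illustration and not the book's own code: it substitutes base R functions (strsplit() and table()) for stringr::str_split() and qdap::dist_tab(), and the contents of word, along with the names letter, counts, and freq, are assumptions made for the example. Because strsplit() with an empty split pattern does not return a leading empty string, the step that drops the first element is not needed here.

```r
# Hypothetical sample data standing in for the vector of words.
word <- c("data", "science", "letter", "frequency")

# Split each word into single characters; the result is a list
# with one character vector per word.
letter_list <- strsplit(tolower(word), split = "")

# Reduce the list to one simple character vector of letters.
letter <- unlist(letter_list)

# Count each letter and record counts and percentages in a data frame
# (a base R stand-in for the table that qdap::dist_tab() would produce).
counts <- table(letter)
freq <- data.frame(letter  = names(counts),
                   freq    = as.vector(counts),
                   percent = round(100 * as.vector(counts) / sum(counts), 2),
                   stringsAsFactors = FALSE)

# Order by decreasing count, breaking ties alphabetically.
freq <- freq[order(-freq$freq, freq$letter), ]

# Plot the letter proportions as a simple bar chart.
barplot(freq$percent,
        names.arg = freq$letter,
        xlab = "Letter",
        ylab = "Percent of all letters")
```

With a real corpus, word would be replaced by the vector of words extracted earlier, and the bar chart could be swapped for any plotting approach that suits the rest of the analysis.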



Copyright © 1995-2021 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0.