21.35 Letter Frequency

Next we want to review the frequency of letters across all of the words in the discourse. Some data preparation transforms the vector of words into a single vector of letters. From this we construct a frequency count, which is then passed on to be plotted.

We again use a pipeline to string together the operations on the data. Starting from the vector of words stored in word, we split each word into its characters using stringr::str_split() (Wickham 2023). We then remove the first string (an empty string) from each of the results using BiocGenerics::sapply(). Next we reduce the result to a simple vector using BiocGenerics::unlist() and generate a data frame recording the letter frequencies using qdap::dist_tab() from . Finally, we plot the letter proportions.
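The following R is a minimal sketch of the steps just described, not the book's own code. It assumes a small hypothetical example vector word. It also substitutes base R's table() for qdap::dist_tab(), so that only stringr is required.

The stringr documentation states that an empty pattern "" is equivalent to boundary("character"). With current releases, str_split(x, "") therefore returns no leading empty string. For that reason the sketch omits the step that drops the first element of each result, because here that step would discard each word's first letter.

```r
# Minimal sketch of the letter-frequency pipeline.
# Assumptions: `word` holds hypothetical example words, and base table()
# stands in for qdap::dist_tab() to keep dependencies to stringr only.
library(stringr)

word <- c("the", "data", "science", "desktop", "survival", "guide")

letter_vec <- word |>
  str_split("") |>   # split each word into single characters (list result)
  unlist()           # flatten the list into one character vector

# Count each letter and sort from most to least frequent.
freq <- sort(table(letter_vec), decreasing = TRUE)

# Data frame of letters, their counts, and their proportions.
letter_freq <- data.frame(
  letter = names(freq),
  freq   = as.integer(freq),
  prop   = as.numeric(freq) / sum(freq)
)

print(letter_freq)
```

A data frame like letter_freq, with its prop column, could then be handed to ggplot2 (for example with geom_col()) to plot the letter proportions.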

References

Wickham, Hadley. 2023. stringr: Simple, Consistent Wrappers for Common String Operations. https://stringr.tidyverse.org.

Copyright © 1995-2022 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0