20.35 Letter Frequency

Next we review the frequency of letters across all of the words in the discourse. Some data preparation transforms the vector of words into a vector of individual letters, from which we construct a frequency count that is then passed on to be plotted.

We again use a pipeline to string together the operations on the data. Starting from the vector of words stored in word, we split the words into characters using stringr::str_split() (Wickham 2019b), then remove the first string (an empty string) from each of the results using base::sapply(). After reducing the result to a simple vector with base::unlist(), we generate a data frame recording the letter frequencies using qdap::dist_tab(). We can then plot the letter proportions.
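
The steps described above can be sketched roughly as follows. This is a minimal illustration and not necessarily the code used to produce the plot: it assumes R 4.1 or later for the native pipe, uses a small made-up word vector, and swaps in base::strsplit() for stringr::str_split() and base::table() for qdap::dist_tab() so that it runs without extra packages. Instead of dropping the first element of each split, it drops any empty strings, which gives the same result whether or not the split returns a leading empty string. The names letters_vec, counts, and letter_freq are illustrative only.

```r
# Hypothetical sample data standing in for the vector of words.
word <- c("data", "science", "desktop")

# Split each word into characters, drop empty strings, and flatten
# the resulting list into a single character vector of letters.
letters_vec <- word |>
  strsplit("") |>                         # stand-in for stringr::str_split()
  lapply(function(x) x[x != ""]) |>       # remove any empty strings
  unlist() |>
  toupper()

# Tabulate the letters (stand-in for qdap::dist_tab()) and record
# both the raw counts and the percentages in a data frame.
counts <- table(letters_vec)
letter_freq <- data.frame(
  letter  = names(counts),
  freq    = as.vector(counts),
  percent = 100 * as.vector(counts) / sum(counts),
  stringsAsFactors = FALSE
)

# Order from most to least frequent letter.
letter_freq <- letter_freq[order(-letter_freq$freq), ]

# Plot the letter proportions; only drawn in an interactive session.
if (interactive()) {
  barplot(letter_freq$percent,
          names.arg = letter_freq$letter,
          horiz = TRUE, las = 1,
          xlab = "Proportion (%)", ylab = "Letter")
}
```

Printing letter_freq shows one row per distinct letter, which is a convenient check before plotting.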

