21.34 Word Length Counts
A simple plot is an effective way to show the distribution of word lengths. Here we create a single-column data frame and pass it to ggplot2::ggplot() to generate a histogram with one bar per word length, adding a vertical line to mark the mean word length.
data.frame(nletters=nchar(words)) %>%
  ggplot(aes(x=nletters)) +
  geom_histogram(binwidth=1) +
  geom_vline(xintercept=mean(nchar(words)),
             colour="green", size=1, alpha=.5) +
  labs(x="Number of Letters", y="Number of Words")
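The numbers behind the plot can also be checked directly in base R. The following is a minimal sketch, assuming a small made-up words vector in place of the one built from the corpus; the names nletters, counts, and mean_len are illustrative only. It tabulates the word lengths, which are the counts the histogram bars display with binwidth=1, and computes the mean, which is where the vertical line is drawn.

```r
# Hypothetical sample vector; in the text, `words` comes from the corpus.
words <- c("data", "science", "with", "r", "is", "rewarding")

# Number of letters in each word.
nletters <- nchar(words)

# Frequency of each word length: the heights of the histogram bars.
counts <- table(nletters)
print(counts)

# Mean word length: the position of the vertical line.
mean_len <- mean(nletters)
print(mean_len)
```

For this sample the mean is 4.5 letters, and the only length that occurs more than once is 4.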
Copyright © 1995-2021 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0.