Data scientists write programs to ingest, fuse, clean, wrangle, visualise, analyse, and model data. Programming over data is a core task for the data scientist. We will primarily use R (R Core Team 2023), and in particular the tidyverse, as our programming language. We assume basic familiarity with R, as may be gained from the many resources available on the Internet, particularly from https://cran.r-project.org/manuals.html.
The development of the tidyverse has been instrumental in bringing R into the modern data science era and the resources provided by RStudio and the tidyverse community are extensive. In particular, as you develop your data analyses, be sure to have the RStudio cheatsheets for the tidyverse in front of you. You will find them invaluable. Visit https://rstudio.com/resources/cheatsheets/.
Programmers of data write sentences, or code, in a programming language. Code instructs a computer to perform specific tasks, and a collection of such sentences is what we might call a program. Through programming by example and learning by immersion, we will share programs that deliver insights and outcomes from our data.
R is a large and complex ecosystem for the practice of data science. There is much freely available information on the Internet from which we can continually learn and borrow useful code segments that illustrate almost any task we might think of. We introduce here the basics for getting started with R, libraries and packages which extend the language, and the concepts of functions, commands, and operators.
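As a first taste of these concepts, the following is a minimal sketch rather than a definitive recipe. It attaches a package, calls one of its functions, uses operators, and defines a function of our own. The `tools` package is used only because it ships with every R installation; a tidyverse package such as dplyr would be attached in the same way once installed. The file name `weather.csv` and the function name `celsius_to_fahrenheit` are hypothetical, chosen purely for illustration, and the native pipe `|>` assumes R 4.1 or later.

```r
# Attach a package so that its functions can be called by name.
# (Assumption: tools is used here only because it ships with R.)
library(tools)

# A function call: file_ext() is provided by the tools package.
# "weather.csv" is a hypothetical file name; the file need not exist.
ext <- file_ext("weather.csv")

# Operators are functions too: `+` can be called like any other function.
total <- `+`(1, 2)

# Define our own function (hypothetical name, for illustration only).
celsius_to_fahrenheit <- function(celsius) {
  celsius * 9 / 5 + 32
}

# The native pipe operator passes the left-hand value to the function
# on the right as its first argument (requires R 4.1 or later).
temps <- c(0, 25, 100) |> celsius_to_fahrenheit()

print(ext)
print(total)
print(temps)
```

Each line typed at the R prompt is a command. Here the commands combine functions such as `library()` and `file_ext()` with operators such as `<-`, `+`, and `|>`.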
Copyright © 1995-2022 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0