Data Science Desktop Survival Guide
by Graham Williams

Dataset Setup
20200320

Packages used in this chapter include dplyr, janitor, magrittr,
randomForest, and rattle. Packages are loaded into the currently
running R session from your local library directories on disk. Missing
packages can be installed using utils::install.packages() within R. On
Ubuntu, for example, many R packages can also be installed from the
command line using

wajig install r-cran-<pkgname>
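Before calling library(), it can help to check which of the chapter's packages are not yet installed. The following is a minimal sketch built around a hypothetical helper, missing_packages(). It relies on system.file() returning an empty string when a package cannot be found on the library paths, and it only offers to install in an interactive session.

```r
# Hypothetical helper: report which of the given packages are not
# installed in any of the local library directories (.libPaths()).
missing_packages <- function(pkgs) {
  # system.file() returns "" when a package cannot be found.
  installed <- vapply(pkgs,
                      function(p) nzchar(system.file(package = p)),
                      logical(1))
  pkgs[!installed]
}

# Packages used in this chapter.
pkgs <- c("dplyr", "janitor", "magrittr", "randomForest", "rattle")

# Install only what is missing, and only when running interactively.
todo <- missing_packages(pkgs)
if (length(todo) > 0 && interactive()) {
  utils::install.packages(todo)
}
```

Checking first avoids reinstalling packages that are already available locally.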
# Load required packages from local library into the R session.

library(dplyr)        # Wrangling: select() sample_frac().
library(janitor)      # Cleanup: clean_names().
library(magrittr)     # Data pipelines: %>% %<>% %T>% equals().
library(randomForest) # Model: randomForest() na.roughfix() for missing data.
library(rattle)       # normVarNames(). Datasets: weather, weatherAUS.
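When the list of packages grows, the library() calls can be driven from a character vector. Below is a minimal sketch using a hypothetical helper, load_packages(). Passing character.only = TRUE tells library() to treat p as a variable holding the package name rather than as the name itself. Base packages are used here so the sketch runs anywhere.

```r
# Hypothetical helper: attach each named package in turn, silencing
# any start-up messages the packages print.
load_packages <- function(pkgs) {
  for (p in pkgs) {
    suppressPackageStartupMessages(library(p, character.only = TRUE))
  }
  invisible(pkgs)
}

# In this chapter the vector would be
# c("dplyr", "janitor", "magrittr", "randomForest", "rattle").
load_packages(c("stats", "tools"))
```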
After loading the required packages into the R session we access the rattle::weatherAUS dataset and save it into the template dataset named ds, as per the template-based approach introduced in Williams (2017). The dataset is reasonably large (dim(ds) reports its number of rows, or observations, and columns, or variables) and is used extensively in this book to illustrate the capabilities of R for the Data Scientist.
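The size of a dataset is conventionally reported as observations (rows) by variables (columns). A minimal sketch of reporting this in base R follows. It assumes base R's built-in mtcars as a stand-in for weatherAUS so that it runs without rattle installed.

```r
# Stand-in dataset (an assumption): base R's mtcars rather than
# rattle::weatherAUS, so the sketch needs no extra packages.
ds <- mtcars

# Observations are rows and variables are columns.
n_obs  <- nrow(ds)
n_vars <- ncol(ds)

cat(sprintf("%d observations by %d variables\n", n_obs, n_vars))
# prints "32 observations by 11 variables"
```

With rattle attached and ds holding weatherAUS, the same calls report the size of the chapter's dataset; dim(ds) returns both numbers at once.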
# Initialise the dataset as per the template.

dsname <- "weatherAUS"
ds     <- get(dsname)

# Review the observations in a random order (sample_frac() defaults to size = 1).

ds %>% sample_frac()
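With its default size = 1, sample_frac() returns every row, only shuffled. A smaller fraction, such as sample_frac(0.01), draws a random subset instead. In dplyr 1.0.0 and later, sample_frac() is superseded by slice_sample(prop = ...). The sketch below illustrates the same idea in base R; it is not dplyr's implementation. The helper sample_rows() is hypothetical, and mtcars stands in for weatherAUS.

```r
# Hypothetical base R helper mirroring the idea of sample_frac():
# draw a random fraction of the rows without replacement.
sample_rows <- function(ds, size = 1) {
  n <- round(size * nrow(ds))
  ds[sample(nrow(ds), n), , drop = FALSE]
}

set.seed(42)                          # Make the random draw repeatable.
ds <- mtcars                          # Stand-in for weatherAUS.
shuffled <- sample_rows(ds)           # All 32 rows in a random order.
small    <- sample_rows(ds, size = 0.25)  # 8 randomly chosen rows.
```

Setting a seed before sampling makes the selected rows reproducible from one session to the next.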