Data Science Desktop Survival Guide
by Graham Williams
Wrangling Setup
20180908 Packages used in this chapter include dplyr, FSelector, ggplot2, glue, janitor, lobstr, lubridate, magrittr, purrr, randomForest, rattle, readr, stringi, stringr, and tidyr.
Packages are loaded into the currently running R session from the
local library directories on disk. Missing packages can be installed
from within R using utils::install.packages(). On Ubuntu, for
example, R packages can also be installed from the command line using
wajig install r-cran-<pkgname>.
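As a minimal sketch of the installation step described above, the following base R code checks which of the chapter's packages are missing from the local library and installs only those. The helper name missing_packages() is hypothetical and not part of any package; the install is guarded by interactive() as an assumption, so that a batch script never triggers a download unexpectedly.

```r
# Packages named in this chapter.
pkgs <- c("dplyr", "FSelector", "ggplot2", "glue", "janitor", "lobstr",
          "lubridate", "magrittr", "purrr", "randomForest", "rattle",
          "readr", "stringi", "stringr", "tidyr")

# Hypothetical helper: return the subset of pkgs that cannot be found
# in the local library directories.
missing_packages <- function(pkgs) {
  found <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!found]
}

to_install <- missing_packages(pkgs)

# Install only what is missing, and only in an interactive session.
if (interactive() && length(to_install) > 0) {
  utils::install.packages(to_install)
}
```

Note that requireNamespace() loads a package's namespace as a side effect of checking for it, which is harmless here because the packages are attached with library() immediately afterwards.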
# Load required packages from local library into the R session.

library(rattle)       # weather dataset.
library(readr)        # Efficient reading of CSV data.
library(dplyr)        # Wrangling: glimpse().
library(lobstr)       # Inspect R data structures.
library(tidyr)        # Prepare a tidy dataset, gather().
library(magrittr)     # Pipes %>% and %T>% and equals().
library(glue)         # Format strings.
library(janitor)      # Cleanup: clean_names().
library(lubridate)    # Dates and time.
library(FSelector)    # Feature selection, information.gain().
library(stringi)      # String concat operator %s+%.
library(stringr)      # String operations.
library(randomForest) # Impute missing values with na.roughfix().
library(ggplot2)      # Visualise data.
library(purrr)        # simplify(), set_names()
The rattle::weatherAUS dataset is loaded into the template variable ds, and further template variables are set up as introduced in Williams (2017). See Chapter 7 for details.
dsname <- "weatherAUS"
ds     <- get(dsname)
nobs   <- nrow(ds)

vnames <- names(ds)
ds    %<>% clean_names(numerals="right")
names(vnames) <- names(ds)

vars   <- names(ds)
target <- "rain_tomorrow"
vars   <- c(target, vars) %>% unique() %>% rev()
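To see what these template variables end up holding, the sketch below repeats the same steps in base R on a tiny made-up data frame. It is an illustration only: the four columns and their values are invented stand-ins for weatherAUS, and the cleaned names are written by hand rather than produced by janitor::clean_names().

```r
# A toy data frame standing in for weatherAUS (values are made up).
ds <- data.frame(MinTemp      = c(8.0, 14.0),
                 Rainfall     = c(0.0, 3.6),
                 RainTomorrow = c("No", "Yes"),
                 Humidity3pm  = c(30, 80))

nobs   <- nrow(ds)   # Number of observations.
vnames <- names(ds)  # Remember the original variable names.

# Assumption: hand-written stand-in for clean_names(numerals="right").
names(ds) <- c("min_temp", "rainfall", "rain_tomorrow", "humidity_3pm")

# Index the original names by the cleaned names.
names(vnames) <- names(ds)
vnames[["rain_tomorrow"]]  # "RainTomorrow"

# Put the target first, drop its duplicate, then reverse the vector
# so that the target ends up as the last variable.
vars   <- names(ds)
target <- "rain_tomorrow"
vars   <- rev(unique(c(target, vars)))
vars   # "humidity_3pm" "rainfall" "min_temp" "rain_tomorrow"
```

The named vector vnames gives a lookup from each cleaned name back to its original spelling, and the reordering leaves the target as the final element of vars with the remaining variables in reverse column order.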