Data Science Desktop Survival Guide
by Graham Williams
Visualisation Setup
20200608
Packages used in this chapter include GGally, RColorBrewer, colorRamps, dplyr, epitools, ggplot2, randomForest, scales, stringr, magrittr, and rattle.
Packages are loaded into the currently running R session from your
local library directories on disk. Missing packages can be installed
using utils::install.packages() within R. On Ubuntu, for
example, R packages can also be installed from the command line using
wajig install r-cran-<pkgname>.
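As a minimal sketch of the installation step described above, the following checks which of the chapter's packages are not yet available in the local library. The helper missing_packages() is a hypothetical name introduced here for illustration; it uses only base R's requireNamespace(). The install.packages() call is left commented out so that nothing is installed unintentionally.

```r
# Hypothetical helper (not from the original text): return the subset of
# pkgs that cannot be found in the local library directories.
missing_packages <- function(pkgs)
{
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly=TRUE)
  pkgs[!installed]
}

# The packages listed for this chapter.
pkgs <- c("GGally", "RColorBrewer", "colorRamps", "dplyr", "epitools",
          "ggplot2", "randomForest", "scales", "stringr", "magrittr",
          "rattle")

todo <- missing_packages(pkgs)
print(todo)

# Install only what is missing. Uncomment to perform the installation.
# if (length(todo) > 0) utils::install.packages(todo)
```

Note that requireNamespace() loads a package's namespace as a side effect but does not attach it, so library() is still needed afterwards.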
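# Load required packages from local library into the R session.

library(GGally)       # Pairs plots.
library(RColorBrewer) # Brew various colour ranges.
library(colorRamps)   # Generate colour ranges: blue2green2red().
library(dplyr)        # glimpse().
library(epitools)     # Colour selection: colors.plot().
library(ggplot2)      # Visualise data.
library(janitor)      # clean_names(). Assumed source, not in the list above.
library(magrittr)     # Pipelines: %>% and %<>%.
library(randomForest) # Impute missing values: na.roughfix().
library(rattle)       # The weatherAUS dataset.
library(scales)       # comma(), percent().
library(stringr)      # str_replace_all().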
The rattle::weatherAUS dataset is loaded into the template variable ds and further template variables are set up as introduced in Williams (2017). See Chapter 7 for details.
dsname <- "weatherAUS"
ds <- get(dsname)
nobs <- nrow(ds)
vnames <- names(ds)
ds %<>% clean_names(numerals="right")
names(vnames) <- names(ds)
vars <- names(ds)
target <- "rain_tomorrow"
vars <- c(target, vars) %>% unique() %>% rev()
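To see what the template variables above hold, the following sketch replays two of the steps on a handful of column names using base R only. The short vectors are a toy stand-in for the full weatherAUS column list (an assumption made to keep the example self-contained), and rev(unique(...)) is the unpiped equivalent of the pipeline above.

```r
# Toy stand-in: a few original weatherAUS names and their cleaned forms.
orig  <- c("Date", "Location", "MinTemp", "RainToday", "RISK_MM", "RainTomorrow")
clean <- c("date", "location", "min_temp", "rain_today", "risk_mm", "rain_tomorrow")

# vnames maps each cleaned name back to the original name.
vnames <- orig
names(vnames) <- clean
print(vnames["min_temp"])

# Put the target first, drop its duplicate, then reverse the order so
# that the target ends up as the last variable.
vars   <- clean
target <- "rain_tomorrow"
vars   <- rev(unique(c(target, vars)))
print(vars)
```

The effect of the final step is that the target variable is listed last, after all other variables in reverse column order.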
We also do a little more to set the data up for demonstrating various approaches to visualisation. As with the model template, a number of template variables are identified here. We also do a little data wrangling, removing all missing values by imputing them with randomForest::na.roughfix().
risk <- "risk_mm"
id <- c("date", "location")
ignore <- c(risk, id)
vars <- setdiff(vars, ignore)
inputs <- setdiff(vars, target)
ds[vars] %<>% na.roughfix()
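The following base R sketch illustrates the idea behind these steps on a toy data frame with made-up values. The function rough_fix_sketch() is a hypothetical stand-in written for this example, not the randomForest implementation: like na.roughfix() it fills numeric columns with the column median and factor columns with the most frequent level, but it resolves ties by taking the first level rather than choosing at random.

```r
# Toy data with missing values (hypothetical values for illustration).
ds <- data.frame(
  location      = factor(c("Sydney", "Sydney", NA, "Perth")),
  min_temp      = c(10, NA, 14, 20),
  risk_mm       = c(0, 1.2, NA, 0),
  rain_tomorrow = factor(c("No", "Yes", "No", NA))
)

vars   <- c("min_temp", "risk_mm", "location", "rain_tomorrow")
target <- "rain_tomorrow"
ignore <- c("risk_mm", "location")

vars   <- setdiff(vars, ignore)  # Drop the ignored variables.
inputs <- setdiff(vars, target)  # Inputs are everything but the target.

# Hypothetical sketch of a rough imputation: median for numeric
# columns, most frequent level for factors.
rough_fix_sketch <- function(df)
{
  df[] <- lapply(df, function(x)
  {
    if (is.numeric(x))
      x[is.na(x)] <- median(x, na.rm=TRUE)
    else if (is.factor(x))
      x[is.na(x)] <- names(which.max(table(x)))
    x
  })
  df
}

# Only the columns named in vars are imputed; ignored columns keep
# their missing values.
ds[vars] <- rough_fix_sketch(ds[vars])
print(ds)
```

Because the imputation is applied to ds[vars] only, the identifier and risk columns are left untouched, which mirrors the template code above.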