Data Science Desktop Survival Guide by Graham Williams

Algorithms Data and Variables

20210103

The rattle::weatherAUS dataset is loaded into the template variable ds, and further template variables are set up as introduced in Williams (2017). See Chapter 7 for details.

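    # Packages this code relies on.
    library(rattle)    # weatherAUS
    library(magrittr)  # %>% and %<>%
    library(janitor)   # clean_names()

    # Load the dataset and record its size and original variable names.
    dsname <- "weatherAUS"
    ds     <- get(dsname)
    nobs   <- nrow(ds)
    vnames <- names(ds)

    # Normalise the variable names, keeping a map from new names to old.
    ds    %<>% clean_names(numerals="right")
    names(vnames) <- names(ds)

    # List the variables with the target placed last.
    vars   <- names(ds)
    target <- "rain_tomorrow"
    vars   <- c(target, vars) %>% unique() %>% rev()

The rattle::weatherAUS dataset suggests the following template variables (Williams, 2017) for predictive modelling. See Chapter 7 for details.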
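    # Packages this code relies on, in addition to magrittr.
    library(dplyr)         # slice() and pull()
    library(stringi)       # %s+%
    library(randomForest)  # na.roughfix()

    # Identify the roles of the variables and drop those to be ignored.
    risk   <- "risk_mm"
    id     <- c("date", "location")
    ignore <- c(risk, id)
    vars   <- setdiff(vars, ignore)
    inputs <- setdiff(vars, target)
    form   <- formula(target %s+% " ~ .")

    # Impute missing values in the modelling variables.
    ds[vars] %<>% na.roughfix()

    # Partition the row indices into training, tuning, and testing sets.
    SPLIT <- c(0.70, 0.15, 0.15)

    nobs %>% sample(SPLIT[1]*nobs)                               -> tr
    nobs %>% seq_len() %>% setdiff(tr) %>% sample(SPLIT[2]*nobs) -> tu
    nobs %>% seq_len() %>% setdiff(tr) %>% setdiff(tu)           -> te

    # Record the actual target values for each partition.
    ds %>% slice(tr) %>% pull(target) -> actual_tr
    ds %>% slice(tu) %>% pull(target) -> actual_tu
    ds %>% slice(te) %>% pull(target) -> actual_te

    # Record the risk values for each partition.
    ds %>% slice(tr) %>% pull(risk) -> risk_tr
    ds %>% slice(tu) %>% pull(risk) -> risk_tu
    ds %>% slice(te) %>% pull(risk) -> risk_te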
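
The 70/15/15 partition above can be checked in isolation. The following is a minimal sketch in base R rather than the template itself. It assumes a hypothetical 1000-row stand-in data frame in place of weatherAUS, and it adds two things the template does not have: a fixed seed (set.seed) so the split is reproducible, and round() so the sample sizes are whole numbers. It then confirms that the three index sets are disjoint and together cover every row.

```r
# Hypothetical stand-in for the weather data: 1000 rows and one target column.
set.seed(42)  # assumption: a fixed seed, so the split can be reproduced
nobs <- 1000
ds   <- data.frame(x             = rnorm(nobs),
                   rain_tomorrow = sample(c("No", "Yes"), nobs, replace = TRUE))

SPLIT <- c(0.70, 0.15, 0.15)

# Same partitioning logic as the template, written with base R only.
# round() is an addition here to guard against fractional sample sizes.
tr <- sample(nobs, round(SPLIT[1] * nobs))                        # training rows
tu <- sample(setdiff(seq_len(nobs), tr), round(SPLIT[2] * nobs))  # tuning rows
te <- setdiff(setdiff(seq_len(nobs), tr), tu)                     # testing rows

# The three index sets should be disjoint and together cover every row.
stopifnot(length(intersect(tr, tu)) == 0,
          length(intersect(tr, te)) == 0,
          length(intersect(tu, te)) == 0,
          setequal(c(tr, tu, te), seq_len(nobs)))

# Number of observations in each partition.
sizes <- c(train = length(tr), tune = length(tu), test = length(te))
print(sizes)
```

Because te is defined as whatever remains after tr and tu are removed, any rounding in the first two sizes is absorbed by the testing set, so no observation is left unassigned.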