Data Science Desktop Survival Guide by Graham Williams

## ML Modelling Setup

20200607

For the rattle::weatherAUS dataset, the template variables (Williams, 2017) for predictive modelling are set up as follows. See Chapter 7 for details.

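    # Assumes ds, vars, target, and nobs are already defined by the template.

    # Variable roles: the risk variable and identifiers are not used as inputs.
    risk   <- "risk_mm"
    id     <- c("date", "location")
    ignore <- c(risk, id)
    vars   <- setdiff(vars, ignore)
    inputs <- setdiff(vars, target)

    # Model formula: the target against all remaining variables.
    form   <- formula(target %s+% " ~ .")

    # Impute missing values (median for numeric, mode for factors).
    ds[vars] %<>% na.roughfix()

    # Proportions for the training, tuning, and testing datasets.
    SPLIT <- c(0.70, 0.15, 0.15)

    # Row indices for each partition.
    nobs %>% sample(SPLIT[1]*nobs)                               -> tr
    nobs %>% seq_len() %>% setdiff(tr) %>% sample(SPLIT[2]*nobs) -> tu
    nobs %>% seq_len() %>% setdiff(tr) %>% setdiff(tu)           -> te

    # Actual target values for each partition.
    ds %>% slice(tr) %>% pull(target) -> actual_tr
    ds %>% slice(tu) %>% pull(target) -> actual_tu
    ds %>% slice(te) %>% pull(target) -> actual_te

    # Risk values for each partition.
    ds %>% slice(tr) %>% pull(risk) -> risk_tr
    ds %>% slice(tu) %>% pull(risk) -> risk_tu
    ds %>% slice(te) %>% pull(risk) -> risk_te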
```
## # A tibble: 2 x 2
##   rain_tomorrow  Count
## *
## 1 No            139670
## 2 Yes            37077
```
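The variable roles above can be tried out on a small example. The following is a minimal base R sketch, not the template itself: the toy data frame and its values are invented for illustration, and `as.formula(paste(...))` stands in for the `%s+%` string operator so that no extra packages are needed.

```r
# Toy stand-in for the weather data; the values are illustrative only.
ds <- data.frame(
  date          = as.Date("2020-01-01") + 0:5,
  location      = rep(c("Canberra", "Sydney"), 3),
  min_temp      = c(8.0, 14.1, 13.7, 13.3, 7.6, 6.2),
  risk_mm       = c(3.6, 3.6, 39.8, 2.8, 0.0, 0.2),
  rain_tomorrow = factor(c("Yes", "Yes", "Yes", "Yes", "No", "No"))
)

# Assumed starting point: all columns are candidate variables.
vars   <- names(ds)
target <- "rain_tomorrow"
risk   <- "risk_mm"
id     <- c("date", "location")

# Drop the risk variable and identifiers, then separate out the inputs.
ignore <- c(risk, id)
vars   <- setdiff(vars, ignore)
inputs <- setdiff(vars, target)

# Base R equivalent of building the formula from the target name.
form <- as.formula(paste(target, "~ ."))

# Class counts for the target.
counts <- table(ds[[target]])
```

Here `vars` retains only `min_temp` and `rain_tomorrow`, so a model fitted with `form` on `ds[vars]` would not see the identifiers or the risk variable.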


The 176,747 observations from the dataset have been randomly partitioned into a training dataset with 123,722 observations, a tuning dataset with 26,512 observations, and a testing dataset with 26,513 observations. The target variable (`rain_tomorrow`) has two classes: No (139,670) and Yes (37,077).
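The partition sizes quoted above follow directly from the 70/15/15 split. As a minimal base R sketch of that arithmetic (the seed is an arbitrary assumption, and `floor()` is written out here to make explicit that fractional sample sizes are truncated):

```r
set.seed(42)  # arbitrary seed, assumed here only for repeatability

nobs  <- 176747
SPLIT <- c(0.70, 0.15, 0.15)

# Training indices: 70% of all observations, rounded down.
tr <- sample(nobs, floor(SPLIT[1] * nobs))

# Tuning indices: 15% of all observations, drawn from those not in training.
tu <- sample(setdiff(seq_len(nobs), tr), floor(SPLIT[2] * nobs))

# Testing indices: whatever remains.
te <- setdiff(setdiff(seq_len(nobs), tr), tu)

sizes <- c(train = length(tr), tune = length(tu), test = length(te))
print(sizes)
```

Because the testing dataset takes the remainder rather than its own 15% sample, it absorbs the rounding and ends up one observation larger than the tuning dataset.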