## ML Modelling Setup

20200607

For the rattle::weatherAUS dataset we set up the following template variables (Williams, 2017) for predictive modelling. See Chapter 7 for details.

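```r
# ds, vars, target, and nobs are assumed to be defined beforehand.
library(magrittr)     # %>% and %<>%
library(dplyr)        # slice(), pull()
library(stringi)      # %s+%
library(randomForest) # na.roughfix()

# Variable roles.
risk   <- "risk_mm"
id     <- c("date", "location")
ignore <- c(risk, id)
vars   <- setdiff(vars, ignore)
inputs <- setdiff(vars, target)
form   <- formula(target %s+% " ~ .")

# Impute missing values in the retained variables.
ds[vars] %<>% na.roughfix()

# Randomly partition the row indices into training, tuning, and testing.
SPLIT <- c(0.70, 0.15, 0.15)

nobs %>% sample(SPLIT[1]*nobs)                               -> tr
nobs %>% seq_len() %>% setdiff(tr) %>% sample(SPLIT[2]*nobs) -> tu
nobs %>% seq_len() %>% setdiff(tr) %>% setdiff(tu)           -> te

# Actual target values for each partition.
ds %>% slice(tr) %>% pull(target) -> actual_tr
ds %>% slice(tu) %>% pull(target) -> actual_tu
ds %>% slice(te) %>% pull(target) -> actual_te

# Risk values for each partition.
ds %>% slice(tr) %>% pull(risk) -> risk_tr
ds %>% slice(tu) %>% pull(risk) -> risk_tu
ds %>% slice(te) %>% pull(risk) -> risk_te
```

```
## # A tibble: 2 x 2
##   rain_tomorrow  Count
## *
## 1 No            139670
## 2 Yes            37077
```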


The 176,747 observations from the dataset have been randomly partitioned into a training dataset with 123,722 observations, a tuning dataset with 26,512 observations, and a testing dataset with 26,513 observations. The target variable (`rain_tomorrow`) has two classes: No (139,670) and Yes (37,077).
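
The partition sizes quoted above can be checked with a minimal sketch in base R. It works on row indices only, so no dataset is required. The seed, the explicit `floor()` calls, and the names `split`, `sizes`, and `counts` are assumptions introduced for illustration and are not part of the original template.

```r
# Sketch only: reproduce the 70/15/15 partition sizes using row indices.
set.seed(42)                    # assumed seed, for reproducibility only

nobs  <- 176747                 # number of observations quoted in the text
split <- c(0.70, 0.15, 0.15)    # training, tuning, testing proportions

# Training: a random sample of 70% of the row indices (rounded down).
tr <- sample(nobs, floor(split[1] * nobs))

# Tuning: a random sample of 15% of nobs drawn from the remaining rows.
tu <- sample(setdiff(seq_len(nobs), tr), floor(split[2] * nobs))

# Testing: whatever is left after removing training and tuning rows.
te <- setdiff(setdiff(seq_len(nobs), tr), tu)

sizes <- c(train = length(tr), tune = length(tu), test = length(te))
print(sizes)

# The three partitions are disjoint and together cover every row.
stopifnot(length(intersect(tr, tu)) == 0,
          length(intersect(tr, te)) == 0,
          length(intersect(tu, te)) == 0,
          sum(sizes) == nobs)

# The class counts quoted in the text also add up to nobs.
counts <- c(No = 139670, Yes = 37077)
stopifnot(sum(counts) == nobs)
```

Because the training and tuning sizes are rounded down, the testing partition absorbs the remainder, which is why it holds 26,513 observations rather than 26,512.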