Data Science Desktop Survival Guide
by Graham Williams
Decision Trees Modelling Setup
20200815
The rattle::weatherAUS dataset suggests the following template variables (Williams, 2017) for predictive modelling. See Chapter 7 for details.
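# Assumes ds, vars, target and nobs are already defined by the earlier
# template steps (see Chapter 7), and that magrittr (%>%, %<>%), dplyr
# (slice, pull), stringi (%s+%) and randomForest (na.roughfix) are attached.

# Template variables: risk variable, identifiers, ignored variables,
# model inputs and the model formula.
risk   <- "risk_mm"
id     <- c("date", "location")
ignore <- c(risk, id)
vars   <- setdiff(vars, ignore)
inputs <- setdiff(vars, target)
form   <- formula(target %s+% " ~ .")

# Impute missing values (median for numeric columns, mode for factors).
ds[vars] %<>% na.roughfix()

# Partition row indices into training (tr), tuning (tu) and testing (te).
SPLIT <- c(0.70, 0.15, 0.15)

nobs %>% sample(SPLIT[1]*nobs) -> tr
nobs %>% seq_len() %>% setdiff(tr) %>% sample(SPLIT[2]*nobs) -> tu
nobs %>% seq_len() %>% setdiff(tr) %>% setdiff(tu) -> te

# Actual target values for each partition.
ds %>% slice(tr) %>% pull(target) -> actual_tr
ds %>% slice(tu) %>% pull(target) -> actual_tu
ds %>% slice(te) %>% pull(target) -> actual_te

# Risk values for each partition.
ds %>% slice(tr) %>% pull(risk) -> risk_tr
ds %>% slice(tu) %>% pull(risk) -> risk_tu
ds %>% slice(te) %>% pull(risk) -> risk_te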
## Error in eval(parse(text = text, keep.source = FALSE), envir): object '.' not found
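The 70/15/15 partitioning step in the template can be tried on its own, without the dataset or any packages. The following is a minimal base R sketch, not the template's own code: the helper name partition_indices and the fixed seed are assumptions introduced here for illustration, and the explicit floor() only spells out the truncation that sample() applies to a fractional size.

```r
# Minimal sketch of a 70/15/15 partition of row indices using base R only.
# partition_indices and seed = 42 are hypothetical, for illustration only.
partition_indices <- function(nobs, split = c(0.70, 0.15, 0.15), seed = 42) {
  # The proportions should sum to one (allowing for floating point error).
  stopifnot(abs(sum(split) - 1) < 1e-8)
  set.seed(seed)

  all_idx <- seq_len(nobs)

  # Training: a random sample of the first proportion of all rows.
  tr <- sample(all_idx, floor(split[1] * nobs))

  # Tuning: a random sample drawn from the rows not used for training.
  tu <- sample(setdiff(all_idx, tr), floor(split[2] * nobs))

  # Testing: whatever remains after training and tuning are removed.
  te <- setdiff(all_idx, c(tr, tu))

  list(tr = tr, tu = tu, te = te)
}

parts <- partition_indices(176747)
sizes <- lengths(parts)
print(sizes)
```

With 176,747 observations the three partitions contain 123,722, 26,512 and 26,513 row indices, which matches the counts reported for this dataset. The testing partition absorbs the rounding remainder because it is defined as the leftover rows instead of being sampled to a fixed size.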
The 176,747 observations from the dataset have been
randomly partitioned into a training dataset with
123,722 observations, a tuning dataset with
26,512 observations, and a testing dataset
with 26,513 observations. The target variable
(