Data Science Desktop Survival Guide
by Graham Williams
Algorithms Data and Variables
20210103
The rattle::weatherAUS dataset is loaded into the template variable ds, and further template variables are set up as introduced in Williams (2017). See Chapter 7 for details.
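library(rattle)    # weatherAUS
library(magrittr)  # %>% and %<>%
library(janitor)   # clean_names()

dsname <- "weatherAUS"
ds     <- get(dsname)
nobs   <- nrow(ds)

# Keep the original names, then normalise the column names.
vnames <- names(ds)
ds %<>% clean_names(numerals="right")
names(vnames) <- names(ds)

# List the variables with the target moved to the end.
vars   <- names(ds)
target <- "rain_tomorrow"
vars   <- c(target, vars) %>% unique() %>% rev()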
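The final step above reorders vars so that the target ends up last. A minimal sketch of that idiom in base R, using a short made-up vector of column names (an assumption for illustration, not the full weatherAUS variable list):

```r
# Hypothetical column names standing in for names(ds) after cleaning.
vars   <- c("date", "location", "min_temp", "rain_tomorrow", "risk_mm")
target <- "rain_tomorrow"

# Prepend the target, drop its later duplicate, then reverse the vector.
vars <- rev(unique(c(target, vars)))

print(vars)
```

Because unique() keeps the first occurrence, the prepended target survives and the reversal moves it to the end of the vector, with the remaining names in reverse order.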
The rattle::weatherAUS dataset suggests the following template variables (Williams, 2017) for predictive modelling. See Chapter 7 for details.
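library(magrittr)      # %>% and %<>%
library(stringi)       # %s+%
library(randomForest)  # na.roughfix()
library(dplyr)         # slice(), pull()

# Identify the roles of the variables.
risk   <- "risk_mm"
id     <- c("date", "location")
ignore <- c(risk, id)
vars   <- setdiff(vars, ignore)
inputs <- setdiff(vars, target)
form   <- formula(target %s+% " ~ .")

# Impute missing values in the modelling variables.
ds[vars] %<>% na.roughfix()

# Partition the row indices into training, tuning and testing sets.
SPLIT <- c(0.70, 0.15, 0.15)
nobs %>% sample(SPLIT[1]*nobs) -> tr
nobs %>% seq_len() %>% setdiff(tr) %>% sample(SPLIT[2]*nobs) -> tu
nobs %>% seq_len() %>% setdiff(tr) %>% setdiff(tu) -> te

# Record the target and risk values for each partition.
ds %>% slice(tr) %>% pull(target) -> actual_tr
ds %>% slice(tu) %>% pull(target) -> actual_tu
ds %>% slice(te) %>% pull(target) -> actual_te
ds %>% slice(tr) %>% pull(risk) -> risk_tr
ds %>% slice(tu) %>% pull(risk) -> risk_tu
ds %>% slice(te) %>% pull(risk) -> risk_te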
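The 70/15/15 partition above can be checked in isolation. The following is a minimal base R sketch, assuming a hypothetical nobs of 1000 and an arbitrary seed (neither comes from the weatherAUS data), that confirms the three index sets are disjoint and together cover every row:

```r
set.seed(42)    # arbitrary seed, only to make the sketch repeatable
nobs  <- 1000   # hypothetical number of observations
SPLIT <- c(0.70, 0.15, 0.15)

# Training indices: a random 70% of the rows.
tr <- sample(nobs, SPLIT[1] * nobs)

# Tuning indices: a random 15% drawn from the rows not used for training.
tu <- sample(setdiff(seq_len(nobs), tr), SPLIT[2] * nobs)

# Testing indices: whatever remains.
te <- setdiff(setdiff(seq_len(nobs), tr), tu)

# The partitions should not overlap and should cover all rows.
stopifnot(length(intersect(tr, tu)) == 0,
          length(intersect(tr, te)) == 0,
          length(intersect(tu, te)) == 0,
          setequal(c(tr, tu, te), seq_len(nobs)))
```

Taking the test set as the remainder, rather than sampling it with SPLIT[3], means any rounding in the first two sample sizes is absorbed by the last partition and no row is left unassigned.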