Data Science Desktop Survival Guide
by Graham Williams

Decision Trees Modelling Setup

20200815

The rattle::weatherAUS dataset suggests the following template variables (Williams, 2017) for predictive modelling. See Chapter 7 for details.

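# Packages providing the operators and functions used below.
library(rattle)        # weatherAUS, normVarNames()
library(magrittr)      # %>%, %<>%
library(dplyr)         # slice(), pull()
library(stringi)       # %s+%
library(randomForest)  # na.roughfix()

# Assumed setup from Chapter 7 (not shown on this page): the dataset with
# normalised variable names, the target, and the number of observations.
# Chapter 7 may apply further cleaning that is not repeated here.
ds     <- weatherAUS
names(ds) %<>% normVarNames()
vars   <- names(ds)
target <- "rain_tomorrow"
nobs   <- nrow(ds)

# Variables with a special role that are not used as model inputs.
risk   <- "risk_mm"
id     <- c("date", "location")
ignore <- c(risk, id)
vars   <- setdiff(vars, ignore)
inputs <- setdiff(vars, target)

# Model formula: the target against all remaining variables.
form   <- formula(target %s+% " ~ .")

# Impute missing values in the retained variables.
ds[vars] %<>% na.roughfix()

# Proportions for the training, tuning, and testing partitions.
SPLIT <- c(0.70, 0.15, 0.15)

# Row indices for the training (tr), tuning (tu), and testing (te) datasets.
nobs %>% sample(SPLIT[1]*nobs)                               -> tr
nobs %>% seq_len() %>% setdiff(tr) %>% sample(SPLIT[2]*nobs) -> tu
nobs %>% seq_len() %>% setdiff(tr) %>% setdiff(tu)           -> te

# Target values for each partition.
ds %>% slice(tr) %>% pull(target) -> actual_tr
ds %>% slice(tu) %>% pull(target) -> actual_tu
ds %>% slice(te) %>% pull(target) -> actual_te

# Risk values for each partition.
ds %>% slice(tr) %>% pull(risk) -> risk_tr
ds %>% slice(tu) %>% pull(risk) -> risk_tu
ds %>% slice(te) %>% pull(risk) -> risk_te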

## # A tibble: 2 x 2
##   rain_tomorrow  Count
## * <fct>          <int>
## 1 No            139670
## 2 Yes            37077
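
The code that produced this summary is not shown on the page. As a minimal sketch, a table of this shape could be obtained with dplyr along the following lines. The data frame `toy` is a hypothetical stand-in for the prepared `ds`, and the column name `Count` is chosen only to match the output above.

```r
library(dplyr)

# Hypothetical stand-in for the prepared dataset `ds`.
toy <- tibble(rain_tomorrow = factor(c("No", "No", "Yes", "No", "Yes")))

# Count the observations in each class of the target variable.
toy %>%
  group_by(rain_tomorrow) %>%
  summarise(Count = n()) ->
counts

print(counts)
```

Replacing `toy` with `ds` should give the class counts for the full dataset, assuming the target column is named `rain_tomorrow`.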

The 176,747 observations from the dataset have been randomly partitioned into a training dataset with 123,722 observations, a tuning dataset with 26,512 observations, and a testing dataset with 26,513 observations. The target variable (rain_tomorrow) has two classes: No (139,670) and Yes (37,077).
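
The partition sizes quoted above follow from the 70/15/15 split, because sample() truncates a fractional size. The base R sketch below reproduces the arithmetic and checks that the three index sets are disjoint and cover every row. It takes `nobs` from the text rather than from the dataset, and the seed is a hypothetical choice used only to make the sketch repeatable.

```r
# Number of observations and split proportions, as stated in the text.
nobs  <- 176747
SPLIT <- c(0.70, 0.15, 0.15)

# Expected sizes: fractional sizes are truncated, and the testing
# dataset receives whatever remains.
n_tr <- floor(SPLIT[1] * nobs)
n_tu <- floor(SPLIT[2] * nobs)
n_te <- nobs - n_tr - n_tu

# Hypothetical seed, used only so the sketch is repeatable.
set.seed(42)

# The same partitioning logic as the template, written without pipes.
tr <- sample(nobs, SPLIT[1] * nobs)
tu <- sample(setdiff(seq_len(nobs), tr), SPLIT[2] * nobs)
te <- setdiff(seq_len(nobs), c(tr, tu))

# Class counts quoted in the text and their shares of the dataset.
class_counts <- c(No = 139670, Yes = 37077)
class_pct    <- round(100 * class_counts / sum(class_counts), 1)

print(c(training = n_tr, tuning = n_tu, testing = n_te))
print(class_pct)
```

The testing dataset ends up one observation larger than the tuning dataset because it absorbs the remainder left by truncation.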


Copyright © 2000-2020 Togaware Pty Ltd. Creative Commons ShareAlike V4.