Data Science Desktop Survival Guide
by Graham Williams

ML Data and Variables

20200607

The rattle::weatherAUS dataset is loaded into the template variable ds, and further template variables are set up as introduced in Williams (2017). See Chapter 7 for details.

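library(rattle)       # weatherAUS dataset
library(magrittr)     # %>% and %<>% pipes
library(janitor)      # clean_names()
library(stringi)      # %s+% string concatenation
library(randomForest) # na.roughfix()
library(dplyr)        # slice() and pull()

dsname <- "weatherAUS"
ds     <- get(dsname)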

nobs   <- nrow(ds)

vnames <- names(ds)
ds    %<>% clean_names(numerals="right")
names(vnames) <- names(ds)

vars   <- names(ds)
target <- "rain_tomorrow"
vars   <- c(target, vars) %>% unique() %>% rev()
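
The effect of the last line above is easy to miss: prepending the target, removing duplicates, and reversing leaves the target as the final element of vars. A minimal base R sketch on a few toy column names (the short vector here is an assumption for illustration, not the full weatherAUS column list):

```r
# Toy column names standing in for names(ds) after clean_names().
vars   <- c("date", "location", "min_temp", "rain_tomorrow")
target <- "rain_tomorrow"

# Prepend the target, drop its later duplicate, then reverse the order.
# Base R equivalent of: c(target, vars) %>% unique() %>% rev()
vars <- rev(unique(c(target, vars)))

print(vars)
# [1] "min_temp"      "location"      "date"          "rain_tomorrow"
```

Note that the remaining variables also end up in reverse order, which is harmless for modelling but worth knowing when inspecting vars.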

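Continuing the template approach, the remaining variables record the risk variable, the identifiers, and the variables to ignore, then derive the model inputs and the formula. Missing values are then imputed and the observations are partitioned into training, tuning, and testing sets.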

risk   <- "risk_mm"
id     <- c("date", "location")
ignore <- c(risk, id)
vars   <- setdiff(vars, ignore)
inputs <- setdiff(vars, target)

form   <- formula(target %s+% " ~ .")
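
As a rough illustration of the steps above, the following base R sketch applies the same setdiff() logic to a handful of toy variable names and builds the formula with paste() instead of stringi's %s+% operator. The toy vector is an assumption for illustration only.

```r
# Toy variable list; the real vars comes from names(ds).
vars   <- c("min_temp", "location", "date", "risk_mm", "rain_tomorrow")
target <- "rain_tomorrow"
risk   <- "risk_mm"
id     <- c("date", "location")

# Drop the risk and identifier columns, then separate inputs from the target.
ignore <- c(risk, id)
vars   <- setdiff(vars, ignore)
inputs <- setdiff(vars, target)

# Base R equivalent of formula(target %s+% " ~ .").
form <- as.formula(paste(target, "~ ."))

print(vars)    # "min_temp" "rain_tomorrow"
print(inputs)  # "min_temp"
print(form)    # rain_tomorrow ~ .
```

The "." on the right-hand side means "all other columns", so the formula is only meaningful when the model is fitted on ds[vars] rather than on the full dataset.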

ds[vars] %<>% na.roughfix()

# Proportions for the training (tr), tuning (tu), and testing (te) partitions.
SPLIT <- c(0.70, 0.15, 0.15)

nobs %>% sample(SPLIT[1]*nobs)                               -> tr
nobs %>% seq_len() %>% setdiff(tr) %>% sample(SPLIT[2]*nobs) -> tu
nobs %>% seq_len() %>% setdiff(tr) %>% setdiff(tu)           -> te
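
The partitioning above is random, so results differ between runs unless a seed is set. The base R sketch below repeats the same three steps on a hypothetical nobs of 1000 with an assumed seed (neither value is from the original) and checks that the three index sets are disjoint and together cover every observation.

```r
set.seed(42)   # assumed seed for repeatability; not part of the original
nobs  <- 1000  # hypothetical number of observations
SPLIT <- c(0.70, 0.15, 0.15)

# Training indices: a random sample from 1..nobs.
tr <- sample(nobs, SPLIT[1] * nobs)
# Tuning indices: a random sample from what is left after training.
tu <- sample(setdiff(seq_len(nobs), tr), SPLIT[2] * nobs)
# Testing indices: everything not already used.
te <- setdiff(setdiff(seq_len(nobs), tr), tu)

# Sanity checks: no overlap, and every observation is assigned once.
stopifnot(length(intersect(tr, tu)) == 0,
          length(intersect(tr, te)) == 0,
          length(intersect(tu, te)) == 0,
          length(tr) + length(tu) + length(te) == nobs)

print(c(train = length(tr), tune = length(tu), test = length(te)))
```

Because the test set is defined as the remainder, it absorbs any rounding in the first two partitions, so the three sets always sum to nobs.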

ds %>% slice(tr) %>% pull(target) -> actual_tr
ds %>% slice(tu) %>% pull(target) -> actual_tu
ds %>% slice(te) %>% pull(target) -> actual_te

ds %>% slice(tr) %>% pull(risk) -> risk_tr
ds %>% slice(tu) %>% pull(risk) -> risk_tu
ds %>% slice(te) %>% pull(risk) -> risk_te


Copyright © 2000-2020 Togaware Pty Ltd. Creative Commons ShareAlike V4.