15.1 Machine Learning Setup
20200514 Packages used in this chapter include magrittr (Bache and Wickham 2022) and rattle (G. Williams 2022).
Packages are loaded into the currently running R session from your
local library directories on disk. Missing packages can be installed
using utils::install.packages() within R. On Ubuntu, for
example, R packages can also be installed using $ wajig install r-cran-<pkgname>.
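As a minimal sketch of the install step described above, the following checks which of the chapter's packages are missing from the local library before anything is loaded. The helper name missing_packages is hypothetical and not part of any package; the install call is left commented out so that running the sketch has no side effects.

```r
# Hypothetical helper: report which of the named packages are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Packages named in this chapter.
needed  <- c("magrittr", "rattle")
missing <- missing_packages(needed)

# Install only what is missing (uncomment to actually install).
# if (length(missing) > 0) utils::install.packages(missing)

print(missing)
```

An empty result means library() should succeed for every package listed in needed.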
# Load required packages from local library into the R session.
library(magrittr) # Data pipelines: %>% %<>% %T>% equals().
library(rattle) # Dataset: weather.
The rattle::weatherAUS dataset is loaded into the template
variable ds and further template variables are set up as
introduced by Graham J. Williams (2017). See
Chapter 8 for details.
dsname <- "weatherAUS"
ds <- get(dsname)
nobs <- nrow(ds)
vnames <- names(ds)
ds %<>% janitor::clean_names(numerals="right") # From janitor, which is not loaded above.
names(vnames) <- names(ds)
vars <- names(ds)
target <- "rain_tomorrow"
vars <- c(target, vars) %>% unique() %>% rev()
The variable form is used in this chapter as the formula
describing the model to be built.
## rain_tomorrow ~ .
## # A tibble: 208,495 × 24
## date location min_temp max_temp rainfall evaporation sunshine
## <date> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 2017-11-17 Woomera 10.6 27.3 0 NA NA
## 2 2010-05-09 Portland 13 18.4 1 0.4 5.5
## 3 2017-10-01 Williamtown 8.8 23.4 0 NA NA
## 4 2013-10-10 Canberra 6.6 28.5 0 NA NA
## 5 2012-06-16 Moree 8.7 22.1 0 3 8
## 6 2009-04-09 Woomera 12.4 26 0 8 11.2
## 7 2010-08-14 Williamtown 5.3 20.6 NA NA NA
## 8 2013-05-17 Uluru 5.1 23.1 0 NA NA
## 9 2014-06-18 Uluru 2 19.3 0 NA NA
## 10 2018-10-01 Hobart 5.7 19.2 0 4.8 7.9
## # ℹ 208,485 more rows
## # ℹ 17 more variables: wind_gust_dir <ord>, wind_gust_speed <dbl>,
## # wind_dir_9am <ord>, wind_dir_3pm <ord>, wind_speed_9am <dbl>,
## # wind_speed_3pm <dbl>, humidity_9am <int>, humidity_3pm <int>,
## # pressure_9am <dbl>, pressure_3pm <dbl>, cloud_9am <int>, cloud_3pm <int>,
## # temp_9am <dbl>, temp_3pm <dbl>, rain_today <fct>, risk_mm <dbl>,
## # rain_tomorrow <fct>
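The listing above does not show how form is constructed. The following is a minimal sketch of the whole template on a toy data frame, using base R only so that it runs without rattle or janitor. The toy data, the helper to_snake, and the use of as.formula() with paste() are assumptions made for illustration and may differ from the book's own code; nested calls stand in for the magrittr pipeline.

```r
# Toy stand-in for weatherAUS; the column values are made up.
toy <- data.frame(
  Date         = as.Date(c("2020-01-01", "2020-01-02", "2020-01-03")),
  MinTemp      = c(10.6, 13.0, 8.8),
  RainTomorrow = factor(c("No", "Yes", "No"))
)

dsname <- "toy"
ds     <- get(dsname)       # Look the dataset up by name.
nobs   <- nrow(ds)          # Number of observations.
vnames <- names(ds)         # Keep the original variable names.

# Hypothetical helper: CamelCase to snake_case using base R only.
to_snake <- function(x) tolower(gsub("([a-z0-9])([A-Z])", "\\1_\\2", x))

names(ds)     <- to_snake(names(ds))
names(vnames) <- names(ds)  # Map normalised names back to the originals.

target <- "rain_tomorrow"
vars   <- rev(unique(c(target, names(ds))))  # Target ends up last.

# Assumed construction of the model formula from the target name.
form <- as.formula(paste(target, "~ ."))
print(form)
```

Placing the target first, removing the duplicate, and then reversing leaves the target as the final element of vars, and the formula uses every other variable in the dataset as a predictor.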
References
Bache, Stefan Milton, and Hadley Wickham. 2022. Magrittr: A Forward-Pipe Operator for R. https://magrittr.tidyverse.org.
Williams, Graham. 2022. Rattle: Graphical User Interface for Data Science in R. https://rattle.togaware.com/.
Williams, Graham J. 2017. The Essentials of Data Science: Knowledge Discovery Using R. The R Series. CRC Press.
Copyright © 1995-2022 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0