Data Science Desktop Survival Guide
by Graham Williams
Decision Trees Setup
20180603
Packages used in this chapter include C50, RWeka, party, partykit, rpart, rpart.plot, and rattle, together with dplyr, janitor, and magrittr for preparing the data.
Packages are loaded into the currently running R session from your
local library directories on disk. Missing packages can be installed
using utils::install.packages() within R. On Ubuntu, for
example, R packages can be installed using
wajig install r-cran-<pkgname>.
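Before loading anything, it can help to check which packages are missing. The following is a minimal sketch that assumes only base R. It looks up each package with system.file(), which returns an empty string when a package is not installed in any library directory, and it calls utils::install.packages() only in an interactive session. The helper name missing_packages() is hypothetical and not part of any package.

```r
# Packages named in this chapter, plus those the setup code relies on.
pkgs <- c("C50", "RWeka", "party", "partykit", "rpart", "rpart.plot",
          "rattle", "dplyr", "janitor", "magrittr")

# Hypothetical helper: return the packages with no installation on disk.
# system.file(package = p) yields "" when p is not found in any library.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, function(p) nzchar(system.file(package = p)),
                      logical(1))
  pkgs[!installed]
}

absent <- missing_packages(pkgs)
if (length(absent) > 0) {
  message("Missing packages: ", paste(absent, collapse = ", "))
  # Only reach out to CRAN when someone is at the console.
  if (interactive()) utils::install.packages(absent)
}
```

Checking with system.file() rather than library() avoids loading every package just to learn whether it exists. On Ubuntu, the reported packages could instead be installed through wajig as described above.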
# Load required packages from local library into the R session.
library(C50)        # Original C5.0 implementation.
library(dplyr)      # Data wrangling verbs such as select().
library(janitor)    # Variable name cleanup with clean_names().
library(magrittr)   # Pipes %>% and %<>%.
library(RWeka)      # Weka decision tree J48.
library(party)      # Conditional inference trees ctree().
library(partykit)   # Convert rpart object to a party object with as.party().
library(rattle)     # GUI for building trees, fancy tree plot, weatherAUS dataset.
library(rpart)      # Popular decision tree algorithm.
library(rpart.plot) # Enhanced tree plots.
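dsname <- "weatherAUS"                 # Name of the dataset (from rattle).
ds     <- get(dsname)                  # Generic reference to the dataset.
nobs   <- nrow(ds)                     # Number of observations.
vnames <- names(ds)                    # Original variable names.
ds %<>% clean_names(numerals="right")  # Normalise names, e.g. MinTemp -> min_temp.
names(vnames) <- names(ds)             # Map each clean name to its original name.
vars   <- names(ds)                    # Variables available for modelling.
target <- "rain_tomorrow"              # Variable to be predicted.
vars   <- c(target, vars) %>% unique() %>% rev() # Target last, others reversed.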
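The bookkeeping in the last few lines can be checked on a toy example in base R. The variable names below are made up for illustration, the cleaned names are typed by hand rather than produced by clean_names(), and nested calls stand in for the magrittr pipe.

```r
# Hypothetical original names, and cleaned names typed by hand.
orig_names <- c("Date", "Location", "MinTemp", "RainTomorrow")
toy_clean  <- c("date", "location", "min_temp", "rain_tomorrow")

# As with vnames: values are the original names, names are the clean ones.
toy_vnames <- orig_names
names(toy_vnames) <- toy_clean
toy_vnames[["min_temp"]]   # looks up "MinTemp"

# The same steps as c(target, vars) %>% unique() %>% rev().
toy_target  <- "rain_tomorrow"
with_target <- c(toy_target, toy_clean)  # target prepended, so it appears twice
deduped     <- unique(with_target)       # unique() keeps the first copy only
toy_vars    <- rev(deduped)              # reversing moves the target to the end
print(toy_vars)                          # min_temp location date rain_tomorrow
```

The reversal also flips the order of the remaining variables, so date ends up next to last. If the original order matters, c(setdiff(vars, target), target) keeps it while still placing the target last.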
It is always useful to remind ourselves of the dataset with a random sample:
# Show 10 random rows: date, location, and 5 other randomly chosen variables.
ds %>% sample_n(10) %>% select(date, location, sample(3:length(vars), 5))
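The same kind of peek can be sketched in base R on a small made-up data frame standing in for ds. Columns 1 and 2 (the identifiers) are always kept, a few of the remaining columns are drawn with sample(), and a handful of rows are drawn the same way. The data values are invented for illustration, and set.seed() is added only so the draw can be repeated.

```r
set.seed(42)  # make the random draws repeatable

# Hypothetical stand-in for ds, with the identifier columns first.
toy <- data.frame(
  date     = as.Date("2018-06-01") + 0:9,
  location = rep(c("Canberra", "Sydney"), 5),
  min_temp = round(runif(10, 0, 15), 1),
  max_temp = round(runif(10, 15, 30), 1),
  rainfall = round(rexp(10), 1),
  humidity = sample(30:90, 10)
)

# Always keep date and location, then add 2 of the other columns at random.
cols <- c(1, 2, sample(3:ncol(toy), 2))
# Pick 4 of the 10 rows at random.
rows <- sample(nrow(toy), 4)

peek <- toy[rows, cols]
print(peek)
```

One caution applies to both versions: sample(3:n, k) draws from positions 3 to n only while n exceeds 3. If n were 3, then 3:n would be the single number 3, and sample(3, k) would draw from 1:3 instead.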