Data Science Desktop Survival Guide
by Graham Williams

Build a Decision Tree Model

As seen in Rattle's Log tab, the decision tree model is built using rpart(). Once the template variables (form, ds, tr, and vars) have been defined as in Section 18.2, we can use this template call to build the model:

library(rpart)
model <- rpart(formula=form, data=ds[tr, vars], model=TRUE)

This is essentially the same as the command used by Rattle except that some parameter settings are removed. These will be explored later.
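
For readers without the Section 18.2 setup to hand, the following is a minimal sketch of how the four template variables might be defined so that the template call runs. It is not the guide's own setup: it assumes the small kyphosis data frame that ships with rpart as a stand-in for the weather dataset, and the target name, the 70% training sample, and the seed are illustrative choices only.

```r
library(rpart)

# Assumption: kyphosis (shipped with rpart) stands in for the weather dataset.
ds     <- kyphosis
target <- "Kyphosis"                        # illustrative target variable
vars   <- names(ds)                         # use every column of the stand-in data
form   <- as.formula(paste(target, "~ ."))  # model the target on all other variables

# Illustrative 70% random training sample of row indices.
set.seed(42)
nobs <- nrow(ds)
tr   <- sample(nobs, floor(0.7 * nobs))

# The template call from the text, unchanged.
model <- rpart(formula=form, data=ds[tr, vars], model=TRUE)
class(model)
```

With these definitions in place the same template call can be reused for any dataset simply by changing ds, target, and vars.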

In the above call to rpart::rpart() we have named each of the arguments. If we look at the structure of rpart::rpart() we see that formula and data are its first two arguments, so the argument names formula= and data= are optional when the values are supplied in that order. The model= argument is not in the third position and so must remain named.

str(rpart)
## function (formula, data, weights, subset, na.action=na.rpart, method, 
##     model=FALSE, x=FALSE, y=TRUE, parms, control, cost, ...)

Whilst the argument names are optional, they can assist in reading the code, and so the use of argument names in function calls is encouraged.
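
As a quick check of the point about argument order, the sketch below fits the same tree twice, once with formula= and data= named and once with them supplied positionally. It again assumes the kyphosis data frame shipped with rpart as a stand-in for ds[tr, vars]; note that model= stays named in both calls because it is not the third argument.

```r
library(rpart)

# Assumption: kyphosis (shipped with rpart) stands in for ds[tr, vars].
named      <- rpart(formula=Kyphosis ~ ., data=kyphosis, model=TRUE)
positional <- rpart(Kyphosis ~ ., kyphosis, model=TRUE)

# Compare the fitted tree structure (the frame component) of the two models.
same_tree <- isTRUE(all.equal(named$frame, positional$frame))
same_tree
```

Only the recorded call differs between the two objects, which is one more reason the named form is easier to read back later.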

A textual presentation of the model is concise and informative, once we learn how to read it. Note that this tree differs from the one seen previously, since we are now using the full weather dataset, which is much larger and includes multiple years of daily observations from many different weather stations across Australia.

model
## n= 123722 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
##  1) root 123722 26101 No (0.7890351 0.2109649)  
##    2) humidity_3pm< 72.5 105547 14889 No (0.8589349 0.1410651) *
##    3) humidity_3pm>=72.5 18175  6963 Yes (0.3831087 0.6168913)  
##      6) humidity_3pm< 83.5 10161  4901 No (0.5176656 0.4823344)  
##       12) rainfall< 2.7 6886  2757 No (0.5996224 0.4003776)  
##         24) wind_gust_speed< 47 5118  1729 No (0.6621727 0.3378273) *
##         25) wind_gust_speed>=47 1768   740 Yes (0.4185520 0.5814480) *
##       13) rainfall>=2.7 3275  1131 Yes (0.3453435 0.6546565) *
##      7) humidity_3pm>=83.5 8014  1703 Yes (0.2125031 0.7874969) *

Refer to Section 18.8 for an explanation of the format of the textual presentation of the decision tree.
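
As a brief worked example of reading the numbers, the sketch below recomputes some of the printed probabilities from the n and loss columns. It assumes the default 0/1 loss, under which loss is the count of observations in a node that do not belong to the node's majority class (yval). All figures are copied from the output above; the variable names are illustrative.

```r
# Figures copied from the printed tree above (nodes 1 and 3).
nodes <- data.frame(
  node = c(1, 3),
  n    = c(123722, 18175),
  loss = c(26101, 6963)
)

# loss / n is the proportion of observations outside the majority class,
# which should match the smaller of the two yprob values for each node.
nodes$minority <- nodes$loss / nodes$n
nodes$majority <- 1 - nodes$minority
print(nodes, digits=7)

# Summing loss over the terminal nodes (2, 24, 25, 13, 7) and dividing by the
# root n gives the overall misclassification rate on the training data.
leaf_loss <- c(14889, 1729, 740, 1131, 1703)
train_err <- sum(leaf_loss) / 123722
train_err
```

The same arithmetic also confirms that the n values of the two children of any split add up to the n of their parent, for example 105547 + 18175 = 123722 at the root.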


Copyright © 2000-2020 Togaware Pty Ltd. Creative Commons ShareAlike V4.