14.6 Predict Class

20200607 R provides stats::predict() to obtain predictions from the model. We can obtain the predictions, for example, on the test dataset and thereby determine the apparent accuracy of the model.

For different types of models stats::predict() behaves in a similar way. There are, however, variations that we need to be aware of for each. For an rpart model, to predict the class (i.e., Yes or No), use type="class":

ds[te, vars] %>% predict(model, newdata=., type="class") -> predict_te

head(predict_te)
##  1  2  3  4  5  6 
## No No No No No No 
## Levels: No Yes
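To see what type="class" changes, here is a minimal sketch that fits a small tree to the kyphosis dataset that ships with rpart. That dataset and the names fit, cl and pr are assumptions for illustration only and are not the model or data used in this chapter. It contrasts the class labels with the class probabilities that predict() returns for a classification tree with type="prob":

```r
library(rpart)

# Illustrative model only: kyphosis is a small dataset bundled with rpart.
fit <- rpart(Kyphosis ~ Age + Number + Start, data=kyphosis, method="class")

# type="class" returns a factor with one predicted label per observation.
cl <- predict(fit, newdata=kyphosis, type="class")

# type="prob" returns a matrix with one column of probabilities per class.
pr <- predict(fit, newdata=kyphosis, type="prob")

head(cl)
head(pr)
```

The class prediction is the convenient form for computing accuracy, while the probabilities are useful when a decision threshold other than the default is of interest.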

We can then compare the predictions in predict_te to the actual class for these observations as recorded in the original test dataset. The actual classes have already been stored as the variable actual_te:

head(actual_te)
## [1] No No No No No No
## Levels: No Yes

We can observe from the above that the model correctly predicts all 6 of the first 6 observations from the test dataset, suggesting 100% accuracy, though six observations are far too few to judge the model. Over the full 34,031 observations in the test dataset, 28,242 are correctly predicted, an accuracy of 83%.
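The accuracy arithmetic can be sketched as follows. The vectors predicted and actual are hypothetical toy stand-ins for predict_te and actual_te, not values from the book's dataset. Only the final line uses the counts quoted in the text.

```r
# Hypothetical toy vectors standing in for predict_te and actual_te.
predicted <- factor(c("No", "No", "Yes", "No", "Yes", "No"), levels=c("No", "Yes"))
actual    <- factor(c("No", "Yes", "Yes", "No", "No", "No"), levels=c("No", "Yes"))

# Count the observations where the prediction matches the actual class.
correct <- sum(predicted == actual)

# Accuracy is the proportion of matches; mean() of the comparison is equivalent.
accuracy <- correct / length(actual)

# The counts quoted in the text: 28,242 correct out of 34,031 observations.
quoted <- round(100 * 28242 / 34031)
```

Here correct is 4 and accuracy is 4/6, while quoted evaluates to 83, matching the percentage reported for the test dataset.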

For different evaluations of the model we will collect the class predictions from the training and tuning datasets as well:

ds[tr, vars] %>% predict(model, newdata=., type="class") -> predict_tr
ds[tu, vars] %>% predict(model, newdata=., type="class") -> predict_tu

We can also calculate the accuracy for each of these datasets:

glue("{round(100*sum(predict_tr==actual_tr)/length(predict_tr))}%")
## 83%
glue("{round(100*sum(predict_tu==actual_tu)/length(predict_tu))}%")
## 83%
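Since the same calculation is repeated for each partition, it could be wrapped in a small function. The sketch below is one possible approach, not the book's code: accuracy_pct() is a hypothetical helper, and the parts list holds toy stand-ins for the prediction and actual vectors. It uses base paste0() rather than glue() so that it runs without additional packages.

```r
# Hypothetical helper: percentage of predictions matching the actual classes.
accuracy_pct <- function(predicted, actual) {
  stopifnot(length(predicted) == length(actual))
  # Compare as character so factors with differing level sets are handled.
  round(100 * mean(as.character(predicted) == as.character(actual)))
}

# Toy stand-ins for (predict_tr, actual_tr) and (predict_tu, actual_tu).
parts <- list(
  tr = list(predicted=c("No", "No", "Yes", "No"), actual=c("No", "No", "Yes", "Yes")),
  tu = list(predicted=c("Yes", "No"),             actual=c("Yes", "No"))
)

# Apply the helper to each partition and format the results as percentages.
acc <- sapply(parts, function(p) accuracy_pct(p$predicted, p$actual))
paste0(acc, "%")
```

With the real vectors the call would simply be accuracy_pct(predict_tr, actual_tr), and likewise for the tuning and test datasets.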


Copyright © 1995-2022 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0