14.11 ROC Chart

Another measure of the quality of a model is the receiver operating characteristic (ROC) curve, and in particular the area under the ROC curve (AUC). This area can be calculated using ROCR::prediction() and ROCR::performance() from ROCR (Sing et al. 2020). These functions work with the predicted probability of each class rather than with the predicted class itself.
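As a minimal sketch of how the two ROCR functions fit together, the block below uses the ROCR.simple example data bundled with ROCR (a list of predicted scores and 0/1 labels) in place of a fitted model, so that it runs on its own. It draws the ROC chart and then computes the area under it.

```r
# Requires the ROCR package: install.packages("ROCR")
library(ROCR)

# Example data shipped with ROCR: numeric predicted scores and the
# corresponding actual 0/1 class labels.
data(ROCR.simple)

# Pair the predicted probabilities with the actual labels.
pred <- prediction(ROCR.simple$predictions, ROCR.simple$labels)

# True positive rate against false positive rate gives the ROC curve.
roc <- performance(pred, measure = "tpr", x.measure = "fpr")
plot(roc, main = "ROC Curve")
abline(a = 0, b = 1, lty = 2)  # diagonal: a classifier no better than chance

# The "auc" measure stores a single value in the y.values slot.
auc <- performance(pred, measure = "auc")@y.values[[1]]
auc
```

The further the curve bows towards the top left corner, away from the dashed diagonal, the larger the area and the better the model separates the two classes.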

To calculate the area we first obtain the predicted probabilities from the model, predicting over the te dataset. Calling stats::predict() returns a matrix with one column for each possible class value, recording the probability of each class for each observation. The second column is the one of interest: the probability that it will rain tomorrow (rain_tomorrow == "yes"). These probabilities are passed to ROCR::prediction() to be compared with the actual target values. The result is then passed to ROCR::performance() with the "auc" measure, from which we obtain the y.values attribute with attr() and then use magrittr::extract2() to take its first element, the area under the curve.
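A minimal sketch of the pipeline just described follows. The model and the te dataset are not defined in this section, so the block builds hypothetical stand-ins: a synthetic dataset with a single humidity predictor and a rain_tomorrow target, split into tr and te, and a classification tree from rpart. Only the final pipeline mirrors the steps in the text; the data, the split, the choice of rpart and the label.ordering argument are assumptions made for illustration.

```r
# Requires ROCR, magrittr and rpart (rpart ships with standard R installs).
library(rpart)
library(ROCR)
library(magrittr)

set.seed(42)

# Hypothetical stand-in data: one numeric predictor and a binary target
# with levels "no" (negative class) and "yes" (positive class).
n  <- 400
ds <- data.frame(humidity = runif(n, min = 0, max = 100))
ds$rain_tomorrow <- factor(ifelse(ds$humidity + rnorm(n, sd = 20) > 60, "yes", "no"),
                           levels = c("no", "yes"))

# Hypothetical 70/30 split: tr for training, te for testing.
idx <- sample(n, round(0.7 * n))
tr  <- ds[idx, ]
te  <- ds[-idx, ]

# A classification tree stands in for the model built earlier.
model <- rpart(rain_tomorrow ~ humidity, data = tr, method = "class")

# One column per class; the second column holds P(rain_tomorrow == "yes").
prob <- predict(model, newdata = te, type = "prob")[, 2]

# Compare the probabilities with the actual targets and extract the AUC.
auc <-
  prob %>%
  ROCR::prediction(te$rain_tomorrow, label.ordering = c("no", "yes")) %>%
  ROCR::performance("auc") %>%
  attr("y.values") %>%
  magrittr::extract2(1)

auc
```

Supplying label.ordering states explicitly that "no" is the negative and "yes" the positive class, rather than relying on ROCR to infer the order from the labels.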

References

Sing, Tobias, Oliver Sander, Niko Beerenwinkel, and Thomas Lengauer. 2020. ROCR: Visualizing the Performance of Scoring Classifiers. http://ipa-tys.github.io/ROCR/.

Copyright © 1995-2022 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0