Data Science Desktop Survival Guide
by Graham Williams


Predict Class

20200607 R provides stats::predict() to obtain predictions from the model. We can obtain the predictions, for example, on the test dataset and thereby determine the apparent accuracy of the model.

stats::predict() behaves in a similar way across different types of models, though each type has variations we need to be aware of. For an rpart model, to predict the class (i.e., Yes or No) use type="class":

ds[te, vars] %>% predict(model, newdata=., type="class") -> predict_te
head(predict_te)

##  1  2  3  4  5  6 
## No No No No No No 
## Levels: No Yes
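As a self-contained illustration of how the type= argument changes what stats::predict() returns for an rpart model, the sketch below uses the kyphosis dataset that ships with the rpart package. It stands in for ds and model, which are not reproduced here. type="class" gives a factor of class labels, while type="prob" gives a matrix with one column of probabilities per class.

```r
library(rpart)

# kyphosis ships with rpart; Kyphosis is a factor with levels "absent", "present".
model <- rpart(Kyphosis ~ Age + Number + Start, data=kyphosis)

# type="class" returns a factor of predicted class labels.
predict_class <- predict(model, newdata=kyphosis, type="class")
head(predict_class)

# type="prob" returns a matrix with one column per class level,
# each row holding the class probabilities for one observation.
predict_prob <- predict(model, newdata=kyphosis, type="prob")
head(predict_prob)
```

The factor returned with type="class" can be compared directly against the actual target factor, which is what the accuracy calculations in this section rely on.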

We can then compare the predictions in predict_te to the actual classes of these observations, as recorded in the original dataset for the te rows. The actual classes have already been stored as the variable target_te:

## [1] No No No No No No
## Levels: No Yes

We can observe from the above that the model correctly predicts all 6 of the first 6 observations in the test dataset, a 100% accuracy on this small sample. Over the full 26,513 observations in the test dataset, 22,177 are correctly predicted, an accuracy of 84%.
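Accuracy here is simply the proportion of predicted classes that match the actual classes. A minimal sketch, assuming a hypothetical helper named accuracy() and two hypothetical toy factors standing in for predict_te and target_te, is:

```r
# Hypothetical helper: proportion of predictions equal to the actual class.
accuracy <- function(predicted, actual) {
  stopifnot(length(predicted) == length(actual))
  mean(predicted == actual)
}

# Hypothetical toy factors (not the test dataset), sharing the same levels.
lv        <- c("No", "Yes")
predicted <- factor(c("No", "No", "Yes", "No", "No", "No"),  levels=lv)
actual    <- factor(c("No", "No", "No",  "No", "No", "Yes"), levels=lv)

acc <- accuracy(predicted, actual)
acc  # 4 of the 6 toy predictions match

# The counts quoted above for the full test dataset, as a percentage.
round(100 * 22177 / 26513)  # 84
```

With the book's own objects the same helper would be called as accuracy(predict_te, target_te).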

For further evaluations of the model we also collect the class predictions on the training and tuning datasets:

ds[tr, vars] %>% predict(model, newdata=., type="class") -> predict_tr
ds[tu, vars] %>% predict(model, newdata=., type="class") -> predict_tu

We can also calculate the accuracy on each of these datasets, the training dataset followed by the tuning dataset:

## 83%

## 83%
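The per-dataset accuracies can be reported in the same whole-percentage style. The sketch below is self-contained and rests on several assumptions: the kyphosis data from rpart stands in for ds, the 70/15/15 split and the seed are hypothetical choices, and accuracy_pct() is a hypothetical helper rather than a function from the book or from rpart.

```r
library(rpart)

# Hypothetical 70/15/15 partition of row indices into training, tuning and test.
set.seed(42)
n   <- nrow(kyphosis)
idx <- sample(n)
tr  <- idx[1:round(0.70 * n)]
tu  <- idx[(round(0.70 * n) + 1):round(0.85 * n)]
te  <- idx[(round(0.85 * n) + 1):n]

# Fit the tree on the training rows only.
model <- rpart(Kyphosis ~ Age + Number + Start, data=kyphosis[tr, ])

# Hypothetical helper: class accuracy on the given rows as a whole percentage.
accuracy_pct <- function(model, data, rows) {
  predicted <- predict(model, newdata=data[rows, ], type="class")
  actual    <- data$Kyphosis[rows]
  sprintf("%.0f%%", 100 * mean(predicted == actual))
}

acc <- sapply(list(train=tr, tune=tu, test=te),
              function(rows) accuracy_pct(model, kyphosis, rows))
acc
```

Because the training rows were used to build the tree, the training accuracy is typically the most optimistic of the three, which is why the tuning and test figures are the more useful guides.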

Copyright © 2000-2020 Togaware Pty Ltd. Creative Commons ShareAlike V4.