Data Science Desktop Survival Guide
by Graham Williams
20200607

R provides stats::predict() to obtain predictions from a model. We can, for example, obtain predictions for the test dataset and thereby determine the apparent accuracy of the model.
stats::predict() behaves in a similar way across different types of models. There are, however, variations for each type of model that we need to be aware of. For an rpart model, to predict the class (i.e., Yes or No) use:
ds[te, vars] %>% predict(model, newdata=., type="class") -> predict_te
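The type= argument is where these variations show up for rpart. The sketch below compares type="class", which returns a factor of predicted classes, with type="prob", which returns a matrix holding one column of probabilities per class. It assumes the kyphosis dataset that ships with rpart as a stand-in for ds, and a random sample of 20 rows as a hypothetical test set.

```r
library(rpart)

# Assumption: kyphosis (shipped with rpart) stands in for ds; its target
# Kyphosis has the classes absent/present.
set.seed(42)
te    <- sample(nrow(kyphosis), 20)          # hypothetical test rows
model <- rpart(Kyphosis ~ Age + Number + Start, data=kyphosis[-te, ])

# type="class" returns a factor with the predicted class of each observation.
pred_class <- predict(model, newdata=kyphosis[te, ], type="class")

# type="prob" returns a matrix with one column of probabilities per class.
pred_prob <- predict(model, newdata=kyphosis[te, ], type="prob")

head(pred_class)
head(pred_prob)
```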
We can then compare these predictions with the actual classes of the same observations, as recorded in the original te dataset. The actual classes have already been stored as the variable target_te.
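One way to line the predictions up against the actual classes is a small data frame. This is a minimal sketch, assuming the kyphosis data shipped with rpart in place of ds and a random 20-row sample in place of te; predict_te and target_te mirror the names used in the text.

```r
library(rpart)

# Assumption: kyphosis replaces ds and a random sample plays the role of te.
set.seed(42)
te         <- sample(nrow(kyphosis), 20)
model      <- rpart(Kyphosis ~ Age + Number + Start, data=kyphosis[-te, ])
predict_te <- predict(model, newdata=kyphosis[te, ], type="class")
target_te  <- kyphosis$Kyphosis[te]

# Place predicted and actual classes side by side and flag the matches.
comparison         <- data.frame(predicted=predict_te, actual=target_te)
comparison$correct <- comparison$predicted == comparison$actual

# The first six test observations.
head(comparison)
```

Comparing the two factors with == works here because both share the same levels.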
We can observe from the above that the model correctly predicts all 6 of the first 6 observations from the test dataset, suggesting 100% accuracy on this small sample. Over the full 26,513 observations in the test dataset, 22,177 are correctly predicted, an accuracy of 84%.
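The accuracy figure is simply the proportion of correct predictions. The sketch below reproduces the arithmetic from the counts quoted above, then shows that, given the class vectors themselves, the same proportion is the mean of a logical vector. The four-element vectors are hypothetical, purely for illustration.

```r
# Counts reported above for the full test dataset.
n_test    <- 26513
n_correct <- 22177

# Accuracy is the proportion of observations whose class is correctly predicted.
accuracy <- n_correct / n_test
round(accuracy, 3)          # 0.836

# With the class vectors, TRUE counts as 1 and FALSE as 0, so the mean of the
# comparison is the proportion correct. Hypothetical vectors:
predicted <- c("Yes", "No", "No", "Yes")
actual    <- c("Yes", "No", "Yes", "Yes")
mean(predicted == actual)   # 0.75
```

In the chapter's own terms the same calculation would be mean(predict_te == target_te).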
For the different evaluations of the model to come, we will also collect the class predictions for the training and tuning datasets:
ds[tr, vars] %>% predict(model, newdata=., type="class") -> predict_tr
ds[tu, vars] %>% predict(model, newdata=., type="class") -> predict_tu
We can also calculate the accuracy for each of these datasets:
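A small helper makes it easy to apply the same calculation to every partition. In the chapter's own terms this would be mean(predict_tr == target_tr) and so on, assuming target_tr and target_tu have been stored in the same way as target_te. The sketch below is self-contained under stated assumptions: kyphosis (shipped with rpart) stands in for ds, it is split at random into hypothetical tr, tu and te row indices, and accuracy() is a hypothetical helper, not part of rpart.

```r
library(rpart)

# Assumption: kyphosis stands in for ds, randomly partitioned into
# training (tr), tuning (tu) and test (te) row indices.
set.seed(42)
n   <- nrow(kyphosis)
idx <- sample(n)
tr  <- idx[1:50]
tu  <- idx[51:65]
te  <- idx[66:n]

# Fit the decision tree on the training rows only.
model <- rpart(Kyphosis ~ Age + Number + Start, data=kyphosis[tr, ])

# Hypothetical helper: proportion of rows whose predicted class matches the
# actual class recorded in the data.
accuracy <- function(rows) {
  predicted <- predict(model, newdata=kyphosis[rows, ], type="class")
  mean(predicted == kyphosis$Kyphosis[rows])
}

# Accuracy (as a percentage) for each of the three datasets.
acc <- sapply(list(tr=tr, tu=tu, te=te), accuracy)
round(100 * acc)
```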