Data Science Desktop Survival Guide by Graham Williams

## Accuracy and Error Rate

From the two vectors predict_te and actual_te we can calculate the overall accuracy of the predictions over the test (te) dataset. This is simply the number of times the prediction agrees with the actual class, divided by the size of the test dataset (which is the same as the length of actual_te).

```r
acc.te <- sum(predict_te == actual_te, na.rm=TRUE)/length(actual_te)
round(100*acc.te, 2)
```
```
## 83.59
```

Here we can see that the model has an overall accuracy of 83.59%. That is a relatively high accuracy for a typical model build.
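To make the calculation concrete, here is a minimal self-contained sketch using small hypothetical vectors of class labels (toy data, not the model output above):

```r
# Hypothetical predicted and actual class labels for six observations.
predict_toy <- c("no", "no", "yes", "no", "yes", "yes")
actual_toy  <- c("no", "no", "yes", "yes", "yes", "no")

# Accuracy: the proportion of predictions agreeing with the actual class.
acc_toy <- sum(predict_toy == actual_toy, na.rm=TRUE)/length(actual_toy)
round(100*acc_toy, 2)  # 66.67
```

Four of the six predictions agree with the actual class, giving an accuracy of 4/6, or 66.67%.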

We can also calculate the overall error rate in a similar fashion. Some Data Scientists prefer to talk in terms of the error rate rather than the accuracy:

```r
err.te <- sum(predict_te != actual_te, na.rm=TRUE)/length(actual_te)
round(100*err.te, 2)
```
```
## 16.41

Thus our decision tree model has an overall error rate of 16.41%.
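Since every prediction is either correct or incorrect (assuming no missing predictions), the two measures are complementary: the accuracy and the error rate sum to one, so either can be derived from the other. A quick sanity check using the values above:

```r
# Accuracy and error rate are complementary proportions: with no
# missing predictions they sum to one.
acc.te <- 0.8359
err.te <- 0.1641
acc.te + err.te  # 1
```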

Notice that we have now twice converted a proportion (a number between 0 and 1) into a percentage (a number between 0 and 100) by multiplying the proportion by 100 and then rounding with base::round() to 2 decimal places. If we find percentages more readily interpretable than proportions we will want to do this regularly, making it a candidate for packaging up as a function. To do so we use base::function() and provide it with a single argument, the number we wish to convert to a percentage:

```r
per <- function(n) { p <- round(100*n, 2); return(p) }
```

We can now use this as a convenience:

```r
per(acc.te)
```
```
## 83.59
```

```r
per(err.te)
```
```
## 16.41
```
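Because round() and multiplication are both vectorised in R, per() also handles a vector of proportions in a single call, a convenience beyond the scalar usage shown above:

```r
per <- function(n) { p <- round(100*n, 2); return(p) }

# One call converts several proportions to percentages at once.
per(c(0.8359, 0.1641))  # 83.59 16.41
```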

To illustrate the typically more optimistic measure obtained when we apply a model to the training dataset we can repeat the above calculations:

```r
acc.tr <- sum(predict_tr == actual_tr, na.rm=TRUE)/length(actual_tr)
per(acc.tr)
```
```
## 83.5
```

```r
err.tr <- sum(predict_tr != actual_tr, na.rm=TRUE)/length(actual_tr)
per(err.tr)
```
```
## 16.5
```

The overall accuracy over the training dataset is 83.5% compared to the 83.59% accuracy calculated over the test dataset. For this small dataset the difference is negligible, and here the training accuracy is in fact marginally lower than the test accuracy, rather than the optimistically higher estimate we would generally expect when evaluating a model on the data it was trained on. Similarly the overall error rate is 16.5% on the training dataset compared to the test error rate of 16.41%.
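To see the train/test comparison end to end, here is a self-contained sketch fitting a decision tree with rpart on a random split of the built-in iris data. This is illustrative only: it is not the book's dataset, and the variable names (tr, te, acc) are our own:

```r
library(rpart)  # recommended package shipped with R

# Random 70/30 train/test split of iris.
set.seed(42)
idx <- sample(nrow(iris), round(0.7*nrow(iris)))
tr  <- iris[idx, ]
te  <- iris[-idx, ]

# Fit a classification tree predicting the species.
model <- rpart(Species ~ ., data=tr)

# Accuracy of the tree's class predictions on a given dataset.
acc <- function(m, d) mean(predict(m, d, type="class") == d$Species)

round(100*c(train=acc(model, tr), test=acc(model, te)), 2)
```

Running this and comparing the two accuracies illustrates the same pattern discussed above: the two estimates are usually close, with the training estimate generally (though not always) the more optimistic.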