Data Science Desktop Survival Guide
by Graham Williams
Interpret RPart Decision Tree
When a fitted classification tree is printed, rpart reports a textual version of the tree.
The legend, which begins with node), indicates that each node is identified by a number, followed by a split (which will usually be in the form of a test on the value of a variable), the number of observations at that node (the n), the number of observations that are incorrectly classified (the loss), the default classification for the node (the yval), and then the distribution of classes in that node (the yprob) across No and Yes. The next line indicates that a “*” denotes a terminal node of the tree (i.e., a leaf node, where the tree is not split any further).
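To make the legend concrete, the following Python sketch splits one printed node line into the fields node), split, n, loss, yval and (yprob), plus the trailing “*” for a leaf. It is not part of rpart: the regular expression, the NodeLine class and parse_node_line are hypothetical helpers, the sample lines are made up for illustration, and the pattern assumes class labels without spaces.

```python
import re
from dataclasses import dataclass

# One classification-tree line in the layout given by the legend:
#   node), split, n, loss, yval, (yprob) [*]
# The lazy split group, anchored by the n, loss, yval and "(" fields that
# follow it, keeps split text such as "x< 71.5" intact.
LINE_RE = re.compile(
    r"^\s*(?P<node>\d+)\)\s+"      # node number followed by ")"
    r"(?P<split>.+?)\s+"           # split test, or "root"
    r"(?P<n>\d+)\s+"               # observations reaching the node
    r"(?P<loss>\d+)\s+"            # observations misclassified
    r"(?P<yval>\S+)\s+"            # default class (assumed to contain no spaces)
    r"\((?P<yprob>[^)]*)\)"        # class probabilities inside parentheses
    r"(?P<leaf>\s*\*)?\s*$"        # optional "*" marking a terminal node
)


@dataclass
class NodeLine:
    node: int
    split: str
    n: int
    loss: int
    yval: str
    yprob: list[float]
    terminal: bool


def parse_node_line(line: str) -> NodeLine:
    """Parse one printed node line (hypothetical helper, not rpart code)."""
    m = LINE_RE.match(line)
    if m is None:
        raise ValueError(f"not a node line: {line!r}")
    return NodeLine(
        node=int(m["node"]),
        split=m["split"],
        n=int(m["n"]),
        loss=int(m["loss"]),
        yval=m["yval"],
        yprob=[float(p) for p in m["yprob"].split()],
        terminal=m["leaf"] is not None,
    )


# Usage with a made-up root line:
print(parse_node_line("1) root 100 20 No (0.80000000 0.20000000)"))
# → NodeLine(node=1, split='root', n=100, loss=20, yval='No', yprob=[0.8, 0.2], terminal=False)
```

Parsing text keeps the sketch short; within R itself, the fitted model's frame component already holds per-node values such as n, dev and yval, which is usually a more reliable source than the printed output.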
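The actual tree starts with the root node, labelled 1), which contains all of the training observations and has a default decision of No. Of these, 42 observations have Yes as the decision, so these are “lost” (misclassified) if we make the decision No for all observations; this count of 42 is the loss reported for the root node. The probabilities of No and Yes are reported in the yprob field, each being the proportion of the node's observations in that class.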
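The arithmetic behind a node's n, loss, yval and yprob can be sketched in Python. This assumes rpart's defaults for classification (class priors proportional to the observed counts and equal misclassification costs), under which the default class is simply the majority class. node_summary is a hypothetical helper, and the counts in the usage line are made up rather than taken from the tree above.

```python
def node_summary(counts: dict[str, int]) -> tuple[int, int, str, list[float]]:
    """Return (n, loss, yval, yprob) for the observations at one node.

    counts maps each class label to its number of observations at the node,
    in the order the classes should be reported (e.g. No, then Yes).
    Assumes default priors and equal misclassification costs.
    """
    n = sum(counts.values())
    # Default class (yval): the majority class. max() returns the first
    # class in dict order when counts tie (a tie-breaking assumption).
    yval = max(counts, key=counts.get)
    # Loss: observations misclassified if every observation is given yval.
    loss = n - counts[yval]
    # Class distribution (yprob): proportion of each class at the node.
    yprob = [c / n for c in counts.values()]
    return n, loss, yval, yprob


# Usage with made-up counts:
print(node_summary({"No": 80, "Yes": 20}))
# → (100, 20, 'No', [0.8, 0.2])
```

With non-default priors or a loss matrix, rpart's default class need not be the majority class, so this sketch only covers the default case.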
The root node is split into two branches, nodes numbered 2 and 3. For
node number 2, the split corresponds to those observations for which