Data Science Desktop Survival Guide
by Graham Williams




Risk Variable

20180723 With some knowledge of the data we observe that risk_mm captures the amount of rain recorded tomorrow. We refer to this as a risk variable, being a measure of the impact or risk of the target we are predicting (rain tomorrow). The risk is an output variable and should not be used as an input to the modelling: it is not an independent variable, because its value is only known once the outcome itself is known. In other circumstances it might actually be treated as the target variable.

# Note the risk variable - measures the severity of the outcome.

risk <- "risk_mm"
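As a minimal sketch of keeping the risk out of the inputs, the following removes both the target and the risk from the list of candidate input variables. The toy data frame, its values, and the names target and inputs are assumptions made for illustration and are not taken from the dataset used here.

```r
# Toy stand-in for the weather dataset; the values are invented.
ds <- data.frame(
  min_temp      = c(8.0, 14.1, 13.7),
  humidity_3pm  = c(29, 36, 69),
  risk_mm       = c(0.0, 3.6, 0.2),
  rain_tomorrow = c("No", "Yes", "No")
)

target <- "rain_tomorrow"  # Assumed name for the target variable.
risk   <- "risk_mm"

# Candidate inputs are all variables except the target and the risk.
inputs <- setdiff(names(ds), c(target, risk))
inputs
```

Using setdiff() on the variable names keeps the exclusion explicit, so the risk cannot slip into a model formula built from inputs.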

Note that we expect the risk variable to have a value of 0 for every observation where the target variable has the value No.

# Review the distribution of the risk variable for non-targets.

ds %>%
  filter(rain_tomorrow == "No") %>%
  select(risk_mm) %>%
  summary()
##     risk_mm      
##  Min.   :0.0000  
##  1st Qu.:0.0000  
##  Median :0.0000  
##  Mean   :0.0726  
##  3rd Qu.:0.0000  
##  Max.   :1.0000

Interestingly, even a little rain (defined as 1 mm or less) is regarded as no rain. That is useful to keep in mind and is a discovery from the data that we might not have expected. As data scientists we should be expecting to find the unexpected.
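A quick way to test the apparent 1 mm threshold is to cross-tabulate the target against an indicator for more than 1 mm of rain. This is only a sketch on invented toy data: the column over_1mm and the object check are hypothetical names, and the threshold is inferred from the summary above rather than from any documented definition.

```r
library(dplyr)

# Toy stand-in for the weather dataset; the values are invented.
ds <- data.frame(
  risk_mm       = c(0.0, 0.4, 1.0, 1.1, 5.0),
  rain_tomorrow = c("No", "No", "No", "Yes", "Yes")
)

# Count observations for each combination of target and threshold flag.
check <- ds %>%
  mutate(over_1mm = risk_mm > 1) %>%
  count(rain_tomorrow, over_1mm)

check
```

If the threshold holds, only two combinations appear: No with FALSE and Yes with TRUE. Any other row would point to observations worth investigating.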

A similar analysis for the target observations is more in line with expectations.

# Review the distribution of the risk variable for targets.

ds %>%
  filter(rain_tomorrow == "Yes") %>%
  select(risk_mm) %>%
  summary()
##     risk_mm      
##  Min.   :  1.10  
##  1st Qu.:  2.40  
##  Median :  5.00  
##  Mean   : 10.17  
##  3rd Qu.: 11.40  
##  Max.   :474.00
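The two reviews can also be combined into a single grouped summary so that the distributions sit side by side. The sketch below uses invented toy data, and the names by_target, obs, risk_min, risk_median and risk_max are chosen only for illustration.

```r
library(dplyr)

# Toy stand-in for the weather dataset; the values are invented.
ds <- data.frame(
  risk_mm       = c(0.0, 0.0, 1.0, 1.1, 5.0, 11.4),
  rain_tomorrow = c("No", "No", "No", "Yes", "Yes", "Yes")
)

# Summarise the risk variable separately for each value of the target.
by_target <- ds %>%
  group_by(rain_tomorrow) %>%
  summarise(
    obs         = n(),
    risk_min    = min(risk_mm),
    risk_median = median(risk_mm),
    risk_max    = max(risk_mm)
  )

by_target
```

Comparing risk_max for No with risk_min for Yes shows at a glance whether the two groups overlap.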


Copyright © 2000-2020 Togaware Pty Ltd. Creative Commons ShareAlike V4.