8.9 Formula to Describe the Goal

20200607 In the context of supporting analytic modelling tasks we identify the formula used to describe the model to be built. Typically we model the target variable on the input variables so that, given a new set of values for the input variables, the resulting model can predict the value of the target variable.

Using stats::formula() we can automatically construct the formula from the dataset itself, provided the first column of the dataset is the target variable and the remaining columns are the input variables. Our usual ordering of columns within a dataset places the target variable last rather than first. Selecting the columns named in vars in reverse order, using base::rev(), then leads to the right formula automatically.

form <- formula(ds[rev(vars)]) %T>% print()

## rain_tomorrow ~ min_temp + max_temp + rainfall + evaporation + 
##     sunshine + wind_gust_dir + wind_gust_speed + wind_dir_9am + 
##     wind_dir_3pm + wind_speed_9am + wind_speed_3pm + humidity_9am + 
##     humidity_3pm + pressure_9am + pressure_3pm + cloud_9am + 
##     cloud_3pm + temp_9am + temp_3pm + rain_today
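The same idea can be tried on a small scale. The following is a minimal sketch only, assuming a hypothetical three-column data frame (toy) and a hypothetical character vector (toy_vars) with the target listed last. Neither name comes from the dataset used in this chapter.

```r
# Hypothetical toy dataset; names and values are for illustration only.
toy <- data.frame(min_temp      = c(8.0, 14.0, 13.7),
                  max_temp      = c(24.3, 26.9, 23.4),
                  rain_tomorrow = c("yes", "no", "no"))

# Inputs first and the target last, as in our usual column ordering.
toy_vars <- c("min_temp", "max_temp", "rain_tomorrow")

# Reversing the selection places the target in the first column, which
# is the column formula() uses as the left-hand side.
toy_form <- formula(toy[rev(toy_vars)])

print(toy_form)
```

Here the resulting formula is rain_tomorrow ~ max_temp + min_temp. The order of the inputs on the right-hand side simply follows the order of the remaining columns.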

The notation used to express the formula begins with the name of the target (rain_tomorrow) followed by a tilde (~) followed by the variables that will be used to model the target, each separated by a plus (+). The formula indicates that we will fit a model to predict rain_tomorrow from the remaining input variables.
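To make the notation concrete, here is a small sketch of a formula written by hand and then taken apart. The two input variables are chosen only for illustration, and eg_form is a hypothetical name.

```r
# A hand-written formula: target, then a tilde, then inputs joined by plus.
eg_form <- rain_tomorrow ~ min_temp + max_temp

# A two-sided formula has three parts: the tilde, the left-hand side
# (the target) and the right-hand side (the inputs).
eg_target <- as.character(eg_form[[2]])

# all.vars() lists every variable named in the formula, target first.
eg_inputs <- setdiff(all.vars(eg_form), eg_target)

print(eg_target)
print(eg_inputs)
```

No data is needed to create a formula object. The variable names are only looked up when the formula is later passed to a modelling function together with a dataset.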

A shorthand for this same formulation is:

rain_tomorrow ~ .
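The dot stands for every column of the supplied data other than the target, so its meaning depends on the dataset handed to the modelling function. As a minimal sketch, assuming a hypothetical toy data frame, stats::terms() can be used to show how the dot expands.

```r
# Hypothetical toy dataset; names and values are for illustration only.
toy <- data.frame(min_temp      = c(8.0, 14.0, 13.7),
                  max_temp      = c(24.3, 26.9, 23.4),
                  rain_tomorrow = c("yes", "no", "no"))

short_form <- rain_tomorrow ~ .

# terms() resolves the dot against the columns of the supplied data.
toy_terms <- terms(short_form, data = toy)

# The expanded inputs are every column other than the target.
toy_inputs <- attr(toy_terms, "term.labels")

print(toy_inputs)
```

Because the dot picks up every remaining column, a dataset that also carries identifiers or other ignored variables would typically be subset to the chosen variables first (for example, ds[vars]) before it is used with this shorthand.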

Copyright © 1995-2022 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0