Data Science Desktop Survival Guide
by Graham Williams
Formula to Describe the Goal
20200607 To support analytic modelling tasks we identify the formula used to describe the model to be built. Typically we model the target variable on the input variables, so that given a new set of values for the input variables the resulting model can predict the value of the target variable.
Using stats::formula() we can automatically construct the formula from the dataset itself, provided the first column of the dataset is the target variable and the remaining columns are the input variables. Our usual ordering of columns within a dataset places the target variable as the last variable rather than the first. Selecting the columns named in vars in reverse order, using base::rev(), therefore yields the right formula automatically.
library(magrittr)  # Provides the tee pipe %T>% used below.
form <- formula(ds[rev(vars)]) %T>% print()
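To make the column reversal concrete, here is a minimal sketch on a small made-up data frame. The column names min_temp and max_temp and all of the values are assumptions for illustration only; just rain_tomorrow appears in the text. The sketch calls print() directly rather than using the tee pipe, so it needs no packages beyond base R.

```r
# Hypothetical toy dataset: the target (rain_tomorrow) is the last column,
# following the column ordering described above. Values are made up.
ds <- data.frame(
  min_temp      = c(8.0, 14.0, 13.7, 13.3),
  max_temp      = c(24.3, 26.9, 23.4, 15.5),
  rain_tomorrow = factor(c("yes", "yes", "yes", "no"))
)

# All variable names, in dataset order.
vars <- names(ds)

# Reversing the columns puts the target first, which is what
# stats::formula() expects when given a data frame.
form <- formula(ds[rev(vars)])
print(form)
```

Because the columns are reversed, the input variables appear on the right-hand side in reverse dataset order. The order of the terms does not change which variables are used to model the target.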
The notation used to express the formula begins with the name of the target (rain_tomorrow) followed by a tilde (~) and then the input variables, separated by plus signs (+). A shorthand for this same formulation is:
rain_tomorrow ~ .
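One way to see what the dot stands for is to expand the shorthand against a dataset with stats::terms(). The sketch below again uses a made-up data frame; the input column names and values are assumptions for illustration, not taken from the text.

```r
# Hypothetical toy dataset with the target as the last column.
ds <- data.frame(
  min_temp      = c(8.0, 14.0, 13.7, 13.3),
  max_temp      = c(24.3, 26.9, 23.4, 15.5),
  rain_tomorrow = factor(c("yes", "yes", "yes", "no"))
)

# The dot on the right-hand side means "every column in the data
# other than those on the left-hand side".
short <- rain_tomorrow ~ .

# terms() resolves the dot against the supplied data frame.
expanded <- terms(short, data = ds)
inputs <- attr(expanded, "term.labels")
print(inputs)
```

Since the dot is only resolved when a data frame is supplied, the same shorthand formula can be reused unchanged if input columns are later added to or dropped from the dataset.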