Data Science Desktop Survival Guide
by Graham Williams
Normalizing Variable Names
20180721 A convention that I personally prefer is to map all variable names to lowercase. R is case sensitive, so the lowercase names are different variable names as far as R is concerned. Such (so-called) normalisation is useful when upper and lower case conventions are mixed inconsistently in names like Incm_tax_PyBl. Remembering how each name is capitalized when interactively exploring data with thousands of such variables can be quite a cognitive load. Yet we often see such variable names in practice, especially when we import data from databases, which are often case insensitive.
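As a minimal sketch of the idea using only base R, the following lowercases the names of a small data frame. The data frame ds and its second column DOB are made up for illustration; only the name Incm_tax_PyBl comes from the text above.

```r
# Hypothetical data frame with inconsistently capitalised variable names.
ds <- data.frame(Incm_tax_PyBl = c(100, 200),
                 DOB           = c("1990-01-01", "1985-06-30"))

# R is case sensitive: only the exact spelling matches.
"Incm_tax_PyBl" %in% names(ds)   # TRUE
"incm_tax_pybl" %in% names(ds)   # FALSE

# Map all variable names to lowercase.
names(ds) <- tolower(names(ds))

names(ds)                        # "incm_tax_pybl" "dob"
```

Lowercasing alone removes the need to remember capitalisation, but it does not separate the words that were run together, as in pybl.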
We can use rattle::normVarNames() to make a reasonable attempt at converting the variable names of a dataset into a preferred standard form. The form follows a style presented in Appendix 21. The example below shows the transformation into a normalised form. We make extensive use of the function base::names() to work with the variable names.
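A minimal sketch of such a call on a made-up data frame might look like the following. It assumes the rattle package is installed. The data frame ds and the name TotalIncome are hypothetical and are not taken from a dataset used in this book.

```r
# Assumes the rattle package is installed.
library(rattle)

# Hypothetical data frame; the variable names are for illustration only.
ds <- data.frame(Incm_tax_PyBl = c(100, 200),
                 TotalIncome   = c(300, 400))

names(ds)                              # The original variable names.

# Replace the names with their normalised forms.
names(ds) <- normVarNames(names(ds))

names(ds)                              # Inspect the normalised names.
```

Comparing the two calls to names() before and after the assignment is a quick way to check that the attempted normalisation is acceptable for a given dataset.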