Data Science Desktop Survival Guide
by Graham Williams
Normalise Variables
20200912 To rename variables in a dataset we can use dplyr::rename_with() which can apply a function, like rattle::normVarNames(), to the variable names and replace those names with the result from the function. A tidy alternative is to use janitor::clean_names() with the option numerals="right" to replicate rattle::normVarNames().
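As a minimal sketch of the dplyr::rename_with() approach described above, the following applies rattle::normVarNames() to every variable name of a small data frame. The data frame and its column names are hypothetical, made up purely for illustration, and the sketch assumes the dplyr and rattle packages are installed.

```r
library(dplyr) # Wrangling: rename_with().

# Hypothetical toy dataset with mixed-case variable names.
ds <- data.frame(MinTemp   = c(8.0, 14.0),
                 RainToday = c("No", "Yes"))

names(ds) # Names before normalisation.

# Apply normVarNames() to each variable name and replace the
# names with the result returned by the function.
ds <- rename_with(ds, rattle::normVarNames)

names(ds) # Names after normalisation.
```

By default rename_with() applies the function to all columns; its .cols= argument can restrict the renaming to a selection of variables.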
The preferred variable naming style is suggested in Chapter 23: all variable names are lowercase, with words separated by an underscore. This normalisation is useful when different upper/lower case conventions are intermixed inconsistently in names like Incm_tax_PyBl. Remembering how to capitalise each name when interactively exploring data with thousands of such variables can be quite a cognitive load. Yet we often see such variable names arising in practice, especially when we import data from databases, which are often case insensitive.
The example below shows the transformation into the preferred normalised form.
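# Normalise variable names.
library(magrittr) # Pipelines: %<>%.
library(janitor)  # Cleanup: clean_names().
names(ds)         # Variable names before normalisation.
ds %<>% clean_names(numerals="right")
names(ds)         # Variable names after normalisation.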
Notice the use of the assignment pipe here as introduced in Chapter 3. Recall that the magrittr::%<>% operator pipes the dataset on the left-hand side to the function on the right-hand side and then returns the result to the left-hand side, overwriting the original contents of the memory referred to on the left-hand side.
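To make the behaviour of the assignment pipe concrete, the sketch below compares it with an explicit assignment. The toy data frame is hypothetical: the name Incm_tax_PyBl is borrowed from the text above and TotalIncome is made up for illustration. It assumes the magrittr and janitor packages are installed.

```r
library(magrittr) # Pipelines: %<>%.
library(janitor)  # Cleanup: clean_names().

# Hypothetical toy dataset with inconsistently capitalised names.
ds1 <- data.frame(Incm_tax_PyBl = c(100, 250),
                  TotalIncome   = c(1000, 2000))
ds2 <- ds1

# Assignment pipe: pipe ds1 into clean_names() and overwrite ds1.
ds1 %<>% clean_names(numerals="right")

# The equivalent explicit assignment.
ds2 <- clean_names(ds2, numerals="right")

# Both approaches should yield the same normalised names.
identical(names(ds1), names(ds2))
names(ds1)
```

The explicit form is arguably easier to read for newcomers, while the assignment pipe avoids repeating the dataset name when a longer pipeline overwrites its input.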