Data Science Desktop Survival Guide
by Graham Williams

Normalizing Variable Names

20180721 A convention that I personally prefer is to map all variable names to lowercase. R is case sensitive, so as far as R is concerned the lowercase names are different variable names from the originals. Such (so-called) normalisation is useful when different upper/lower case conventions are intermixed inconsistently in names like Incm_tax_PyBl. Remembering how to capitalize each name when interactively exploring data with thousands of such variables can be quite a cognitive load. Yet we often see such variable names arising in practice, especially when we import data from databases, which are often case insensitive.
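As a minimal sketch of the lowercase convention alone, the following uses only base R. The data frame df and its column names are hypothetical and chosen purely for illustration. It shows that a lowercase spelling does not match a mixed-case name until the names are mapped with base::tolower().

```r
# A small hypothetical data frame with inconsistently capitalised names.
df <- data.frame(Incm_tax_PyBl = c(100, 250), MinTemp = c(8.0, 14.2))

# R is case sensitive: the lowercase spelling is not an existing column name.
"incm_tax_pybl" %in% names(df)

# Map every variable name to lowercase.
names(df) <- tolower(names(df))

# The lowercase spelling now matches.
"incm_tax_pybl" %in% names(df)
names(df)
```

Lowercasing on its own does not insert separators between words (MinTemp simply becomes mintemp), which is one reason a fuller normalisation can be preferable.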

We can use rattle::normVarNames() to make a reasonable attempt at converting the variable names of a dataset into a preferred standard form. The form follows the style presented in Appendix 21. The example below shows the transformation into a normalised form. We make extensive use of the function base::names() to work with the variable names.

# Normalise the variable names.

ds %<>% rename_all(normVarNames)
names(ds)
##  [1] "date"            "location"        "min_temp"        "max_temp"   ...
##  [5] "rainfall"        "evaporation"     "sunshine"        "wind_gust_di...
##  [9] "wind_gust_speed" "wind_dir_9am"    "wind_dir_3pm"    "wind_speed_9...
## [13] "wind_speed_3pm"  "humidity_9am"    "humidity_3pm"    "pressure_9am...
## [17] "pressure_3pm"    "cloud_9am"       "cloud_3pm"       "temp_9am"   ...
## [21] "temp_3pm"        "rain_today"      "risk_mm"         "rain_tomorrow"
....
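The same normalisation can also be sketched with base::names() directly, without the pipe. This assumes the rattle package is installed. The data frame df and its column names are hypothetical stand-ins for a real dataset.

```r
library(rattle)

# A small hypothetical data frame; the column names are illustrative only.
df <- data.frame(MinTemp = c(8.0, 14.2), RISK_MM = c(3.6, 0.0))

# normVarNames() takes a character vector of names and returns the
# normalised names, which we assign back through names().
names(df) <- normVarNames(names(df))
names(df)
```

Assigning through names() changes only the column names and leaves the data itself untouched.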

Notice the use of the assignment pipe in the rename_all() call above, as introduced in Chapter [*]. Recall that the magrittr::%<>% operator pipes the left-hand data to the function on the right-hand side and then assigns the result back to the left-hand side, overwriting its original contents.
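To make the behaviour of the assignment pipe concrete, here is a small sketch comparing it with an ordinary pipe followed by an explicit assignment. It assumes the magrittr package is installed. The data frames ds1 and ds2 are hypothetical, and base::tolower() stands in for a fuller normalisation function.

```r
library(magrittr)

# Two identical hypothetical data frames.
ds1 <- data.frame(MinTemp = c(8.0, 14.2), RISK_MM = c(3.6, 0.0))
ds2 <- ds1

# Assignment pipe: pipe ds1 into the function and overwrite ds1 with the result.
ds1 %<>% setNames(tolower(names(.)))

# Equivalent form: ordinary pipe with an explicit assignment.
ds2 <- ds2 %>% setNames(tolower(names(.)))

# Both approaches give the same data frame.
identical(ds1, ds2)
names(ds1)
```

The assignment pipe saves repeating the name of the dataset, at the cost of making the overwrite a little less visible when reading the code.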


Copyright © 2000-2020 Togaware Pty Ltd. . Creative Commons ShareAlike V4.