8.3 Normalise Variable Names

Normalising variable names gives us certainty about how to refer to each variable when interacting with the data. The convenience function janitor::clean_names() does this for us.
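As a minimal sketch of the call itself, the following applies clean_names() to a small toy data frame. The data frame toy and the variable explicit are hypothetical and introduced only for illustration, and the sketch assumes the janitor and magrittr packages are installed. It also shows that the %<>% assignment pipe is shorthand for passing the data frame in and assigning the result back.

```r
library(janitor)   # provides clean_names()
library(magrittr)  # provides the %<>% assignment pipe

# Hypothetical toy data frame reusing three of the dataset's column names.
toy <- data.frame(MinTemp = 12.3, WindDir9am = "N", RISK_MM = 0.2)

# Explicit form: pass the data frame in and keep the result.
explicit <- clean_names(toy, numerals = "right")

# Piped form: %<>% pipes toy into clean_names() and overwrites toy.
toy %<>% clean_names(numerals = "right")

# Both forms should yield the same normalised names.
identical(names(explicit), names(toy))
names(toy)
```

The numerals="right" argument keeps digits attached to the letters that follow them, which is why WindDir9am becomes wind_dir_9am rather than splitting 9 from am.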

# Review the variables before normalising their names.

names(ds)
##  [1] "Date"          "Location"      "MinTemp"       "MaxTemp"      
##  [5] "Rainfall"      "Evaporation"   "Sunshine"      "WindGustDir"  
##  [9] "WindGustSpeed" "WindDir9am"    "WindDir3pm"    "WindSpeed9am" 
## [13] "WindSpeed3pm"  "Humidity9am"   "Humidity3pm"   "Pressure9am"  
## [17] "Pressure3pm"   "Cloud9am"      "Cloud3pm"      "Temp9am"      
## [21] "Temp3pm"       "RainToday"     "RISK_MM"       "RainTomorrow"

# Capture the original variable names for use in plots.

vnames <- names(ds)

# Normalise the variable names. The %<>% assignment pipe comes from
# magrittr and clean_names() from janitor.

ds %<>% clean_names(numerals="right")

# Confirm the results are as expected.

names(ds)
##  [1] "date"            "location"        "min_temp"        "max_temp"       
##  [5] "rainfall"        "evaporation"     "sunshine"        "wind_gust_dir"  
##  [9] "wind_gust_speed" "wind_dir_9am"    "wind_dir_3pm"    "wind_speed_9am" 
## [13] "wind_speed_3pm"  "humidity_9am"    "humidity_3pm"    "pressure_9am"   
## [17] "pressure_3pm"    "cloud_9am"       "cloud_3pm"       "temp_9am"       
## [21] "temp_3pm"        "rain_today"      "risk_mm"         "rain_tomorrow"

# Index the original variable names by the new names.

names(vnames) <- names(ds)

vnames
##            date        location        min_temp        max_temp        rainfall 
##          "Date"      "Location"       "MinTemp"       "MaxTemp"      "Rainfall" 
##     evaporation        sunshine   wind_gust_dir wind_gust_speed    wind_dir_9am 
##   "Evaporation"      "Sunshine"   "WindGustDir" "WindGustSpeed"    "WindDir9am" 
##    wind_dir_3pm  wind_speed_9am  wind_speed_3pm    humidity_9am    humidity_3pm 
##    "WindDir3pm"  "WindSpeed9am"  "WindSpeed3pm"   "Humidity9am"   "Humidity3pm" 
##    pressure_9am    pressure_3pm       cloud_9am       cloud_3pm        temp_9am 
##   "Pressure9am"   "Pressure3pm"      "Cloud9am"      "Cloud3pm"       "Temp9am" 
##        temp_3pm      rain_today         risk_mm   rain_tomorrow 
##       "Temp3pm"     "RainToday"       "RISK_MM"  "RainTomorrow"

Notice that we capture the original variable names in the variable vnames for reference. This is particularly useful when generating plots that should be labelled with the original names.
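A minimal sketch of how such a lookup might be used to label a plot follows. It rebuilds a shortened copy of vnames with three entries so that it runs on its own, and the small data frame standing in for ds holds made-up values purely for illustration.

```r
# A shortened copy of the lookup built in this section (three entries only).
vnames <- c(min_temp      = "MinTemp",
            max_temp      = "MaxTemp",
            rain_tomorrow = "RainTomorrow")

# Hypothetical values standing in for the real dataset.
ds <- data.frame(min_temp = c(8.0, 14.0, 13.7),
                 max_temp = c(24.3, 26.9, 23.4))

# Single brackets keep the element name; double brackets return the bare string.
vnames["min_temp"]
vnames[["min_temp"]]

# Label the axes with the original names while coding with the normalised ones.
plot(ds$min_temp, ds$max_temp,
     xlab = vnames[["min_temp"]],
     ylab = vnames[["max_temp"]])
```

Using double brackets (or unname()) avoids carrying the normalised name along as an attribute of the label.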

The variable names now conform to our expectations and to our chosen style, as documented in Chapter 23.
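Conformance can also be checked programmatically rather than by eye. The sketch below assumes the style amounts to lowercase words and digits separated by single underscores; that pattern is an assumption made for illustration, not a restatement of Chapter 23. It uses a handful of the normalised names reported above so that it runs on its own.

```r
# A sample of the normalised names reported in this section.
normalised <- c("date", "min_temp", "wind_gust_dir", "wind_dir_9am",
                "humidity_3pm", "risk_mm", "rain_tomorrow")

# Assumed style: starts with a lowercase letter, then lowercase letters or
# digits, with single underscores separating the parts.
pattern <- "^[a-z][a-z0-9]*(_[a-z0-9]+)*$"

# TRUE for each name that matches the assumed style.
conforms <- grepl(pattern, normalised)

all(conforms)            # do all names conform?
normalised[!conforms]    # list any names that do not

# An original name such as RISK_MM fails the same check.
grepl(pattern, "RISK_MM")
```

With a real dataset, names(ds) would replace the normalised vector.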



Copyright © 1995-2021 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0.