Data Science Desktop Survival Guide
by Graham Williams

Reviewing Variable Names

The variable names in a dataset as supplied to us may not follow any consistent form and may mix different conventions. For example, they may mix upper and lower case letters (TempToday9AM), be very long (Temperature_Recorded_Today_9am), use sequential numbers to identify each variable (V004 or V004_rainToday), use codes (XVn34_rain), or follow any number of other conventions. We often prefer to simplify the variable names, both to ease our processing and thinking and to enforce a standard, consistent naming convention for ourselves.

We use base::names() to list the names of the variables within a dataset.

# Review the variables to consider normalising their names.

names(ds)
##  [1] "date"            "location"        "min_temp"        "max_temp"   ...
##  [5] "rainfall"        "evaporation"     "sunshine"        "wind_gust_di...
##  [9] "wind_gust_speed" "wind_dir_9am"    "wind_dir_3pm"    "wind_speed_9...
## [13] "wind_speed_3pm"  "humidity_9am"    "humidity_3pm"    "pressure_9am...
## [17] "pressure_3pm"    "cloud_9am"       "cloud_3pm"       "temp_9am"   ...
## [21] "temp_3pm"        "rain_today"      "risk_mm"         "rain_tomorrow"
....

Notice that the names listed here are all in lower case, with an underscore separating the words within each variable name (for example, min_temp and wind_gust_speed). That's a reasonable naming scheme and is preferred by some.
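
Rather than reviewing names by eye, we can also test them against a convention. The following is a minimal sketch in base R, not a definitive check. It assumes a small hypothetical vector, vnames, built from the example names mentioned earlier rather than from ds, and an illustrative regular expression for lower case words separated by underscores.

```r
# Hypothetical variable names, taken from the examples in the text
# rather than from the actual dataset.
vnames <- c("TempToday9AM", "Temperature_Recorded_Today_9am",
            "V004_rainToday", "XVn34_rain", "min_temp", "wind_dir_9am")

# Treat a name as conforming if it starts with a lower case letter and
# contains only lower case letters and digits, in words separated by
# single underscores. This pattern is an assumption for illustration.
is_snake <- grepl("^[a-z][a-z0-9]*(_[a-z0-9]+)*$", vnames)

# Names that do not follow the convention and may warrant normalising.
vnames[!is_snake]
```

For a real dataset, the same test could be applied to names(ds) in place of vnames. Only min_temp and wind_dir_9am pass the check in this example, so the other four names are returned.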


Copyright © 2000-2020 Togaware Pty Ltd. Creative Commons ShareAlike V4.