Data Science Desktop Survival Guide
by Graham Williams

Introducing Template Variables

20180721 A reference to the original dataset can be created using a template (or generic) variable. The new variable will be called ds (short for dataset).

# Take a copy of the dataset into a generic variable.

ds <- weatherAUS

Both ds and weatherAUS now reference the same dataset within the computer's memory. R avoids making copies of datasets unnecessarily, so a simple assignment does not create a new copy. As we modify ds, those modifications affect only the data referenced by ds and leave weatherAUS unchanged. Effectively, an extra copy of the dataset starts to grow in memory as we change the data from its original form: extra memory is used to store the columns that differ between the two.
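
The behaviour described above can be illustrated with a minimal sketch. It uses a small made-up data frame called orig (an assumption for illustration, standing in for weatherAUS) so that it runs without any extra packages.

```r
# A small stand-in for weatherAUS (hypothetical values, for illustration only).
orig <- data.frame(min_temp = c(13.4, 7.4, 12.9),
                   max_temp = c(22.9, 25.1, 25.7))

# Simple assignment: ds and orig refer to the same data, no copy is made yet.
ds <- orig

# Modify one column through ds. Only ds sees the change.
ds$max_temp <- ds$max_temp + 1

# The original is untouched, and the unmodified column is still the same.
print(orig$max_temp)
print(identical(orig$min_temp, ds$min_temp))
```

The sketch only shows the visible effect (orig is unchanged). How memory is shared behind the scenes is handled by R itself.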

From here on we no longer refer to the dataset as weatherAUS but as ds. This allows the following analyses and processing to be rather generic, turning the R code into a template that requires only minor modification when used with a different dataset assigned into ds.

Often we can simply load a different dataset into memory, store it as ds, and find that the remaining steps of our analyses and processing work essentially unchanged.
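
As a rough sketch of this idea, the steps below refer only to ds and are rerun on two datasets that ship with R (iris and mtcars, used here as stand-ins rather than the guide's own data). The helper template_steps() is hypothetical and exists only so the same lines can be applied twice.

```r
# Hypothetical helper wrapping a few template steps that refer only to ds.
template_steps <- function(ds) {
  list(nobs    = nrow(ds),                           # Number of observations.
       vars    = names(ds),                          # Variable names.
       numeric = names(ds)[sapply(ds, is.numeric)])  # Numeric variables.
}

# First dataset assigned into the generic variable.
ds <- iris
res_iris <- template_steps(ds)

# A different dataset: the steps themselves are unchanged.
ds <- mtcars
res_mtcars <- template_steps(ds)

print(res_iris$nobs)
print(res_mtcars$nobs)
```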

The first few steps of our template then create the reference to the dataset and present our initial view of it.

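# Prepare for a templated analysis and processing.
# Assumes weatherAUS is already loaded, as in the earlier steps.

library(magrittr) # Provides the %<>% assignment pipe.
library(janitor)  # Provides clean_names().
library(dplyr)    # Provides glimpse().

dsname <- "weatherAUS"
ds     <- get(dsname)
ds %<>% clean_names(numerals="right")
glimpse(ds)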
## Rows: 176,747
## Columns: 24
## $ date            <date> 2008-12-01, 2008-12-02, 2008-12-03, 2008-12-04,...
## $ location        <chr> "Albury", "Albury", "Albury", "Albury", "Albury"...
## $ min_temp        <dbl> 13.4, 7.4, 12.9, 9.2, 17.5, 14.6, 14.3, 7.7, 9.7...
## $ max_temp        <dbl> 22.9, 25.1, 25.7, 28.0, 32.3, 29.7, 25.0, 26.7, ...
....

We are a little tricky here: we record the dataset name in the variable dsname and then use the function base::get() to look up the dataset of that name and link it to the generic variable ds. We could simply assign the data to ds directly, as we saw above. Either way the generic variable ds refers to the same dataset. Using base::get() makes the template a little more generic, because only the string stored in dsname needs to change for a new dataset.
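
A minimal sketch of the lookup by name, using the built-in iris dataset as a stand-in (an assumption, chosen so that no extra packages are needed):

```r
# Record the dataset name as a string, then look the object up by that name.
dsname <- "iris"
ds     <- get(dsname)

# ds now refers to the same data as the object called iris.
print(identical(ds, iris))

# The recorded name remains available, for example to label output.
cat("Dataset:", dsname, "with", nrow(ds), "observations\n")
```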

Using generic variables within a template for the tasks we perform on each new dataset has obvious advantages, but we need to be careful. A disadvantage is that, when working with several datasets, we may accidentally overwrite a previously processed dataset referenced by the same generic variable (ds). Processing a dataset can take some time, so accidentally losing the result is not an attractive proposition. Care needs to be taken to avoid this.
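
One possible safeguard, offered as a sketch rather than as the approach the guide prescribes, is to store each processed dataset under its own name before ds is reused. The list processed and the column sepal_ratio are hypothetical names introduced for illustration, and iris and mtcars again stand in for real datasets.

```r
# Hypothetical store for processed datasets, keyed by dataset name.
processed <- list()

dsname <- "iris"
ds     <- get(dsname)

# Stand-in for a slow processing step.
ds$sepal_ratio <- ds$Sepal.Length / ds$Sepal.Width

# Keep the result under the dataset's own name before moving on.
processed[[dsname]] <- ds

# Reusing ds for another dataset no longer loses the earlier work.
dsname <- "mtcars"
ds     <- get(dsname)

print(names(processed))
print("sepal_ratio" %in% names(processed[["iris"]]))
```

For long-running processing, saving the result to a file under a dataset-specific name would serve the same purpose across sessions.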


Copyright © 2000-2020 Togaware Pty Ltd. Creative Commons ShareAlike V4.