Data Science Desktop Survival Guide
by Graham Williams


A Template for Data Preparation

Throughout this chapter we have built a template for data preparation. An actual knitr template based on this chapter is available as http://HandsOnDataScience.com/scripts/data.Rnw, and an automatically derived version containing just the R code is available as http://HandsOnDataScience.com/scripts/data.R. We would not necessarily perform every step in the template, such as normalising the variable names, imputing missing values, or omitting observations with missing values. Instead we pick and choose the steps appropriate to our situation and to the specific dataset. Also, the template does not include data-specific transformations, and there may be other transforms we need that have not been covered here. As we discover new tools to support the data scientist we can add them to our own templates.

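library(rattle)    # normVarNames() and the weatherAUS dataset
library(magrittr)  # the %<>% assignment pipe

dsname        <- "weatherAUS"     # name of the dataset to prepare
ds            <- get(dsname)      # generic copy so later code refers only to ds
vnames        <- names(ds)        # record the original variable names
names(ds)    %<>% normVarNames()  # normalise the variable names
names(vnames) <- names(ds)        # map each normalised name to its original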
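The final two lines of the template keep a lookup from each normalised name back to the original one, which is handy when reporting results using the dataset's own variable names. The following self-contained sketch shows the idea in base R on a tiny made-up data frame. The function norm_names() is hypothetical and only approximates what rattle's normVarNames() does (splitting CamelCase into lower-case snake_case); it is not the rattle implementation.

```r
# Hypothetical stand-in for normVarNames(): CamelCase -> snake_case.
norm_names <- function(x) {
  x <- gsub("([a-z0-9])([A-Z])", "\\1_\\2", x)  # insert _ at lower/Upper boundaries
  x <- gsub("[^A-Za-z0-9]+", "_", x)            # other separators become _
  tolower(x)
}

# A tiny illustrative dataset (made-up values).
ds <- data.frame(MinTemp = c(8, 14), RainToday = c("No", "Yes"))

vnames        <- names(ds)              # original names
names(ds)     <- norm_names(names(ds))  # normalised names
names(vnames) <- names(ds)              # lookup: normalised -> original

print(names(ds))
print(unname(vnames["min_temp"]))
```

Indexing vnames by a normalised name recovers the original, so vnames["min_temp"] gives "MinTemp".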
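Since steps such as imputing missing values or omitting observations with missing values are optional, one way to pick and choose is to guard each step with a flag set for the dataset at hand. This is a minimal sketch on a made-up data frame. The flags impute_missing and omit_missing are hypothetical, and median imputation of numeric variables is just one simple choice assumed here, not necessarily the method used in the template.

```r
# Made-up data with missing values in both variables.
ds <- data.frame(
  min_temp   = c(8.0, NA, 14.2, 11.5),
  rain_today = c("No", "Yes", NA, "No"),
  stringsAsFactors = FALSE
)

# Hypothetical switches: choose the steps that suit this dataset.
impute_missing <- TRUE
omit_missing   <- TRUE

if (impute_missing) {
  # Replace missing numeric values with the variable's median.
  num <- vapply(ds, is.numeric, logical(1))
  ds[num] <- lapply(ds[num], function(x) {
    x[is.na(x)] <- median(x, na.rm = TRUE)
    x
  })
}

if (omit_missing) {
  # Drop any observations that still have missing values.
  ds <- na.omit(ds)
}

print(ds)
```

With both flags on, the missing min_temp is imputed as 11.5 and the observation with a missing rain_today is then dropped, leaving three complete observations.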

Copyright © 2000-2020 Togaware Pty Ltd. Creative Commons ShareAlike V4.