A Template for Data Preparation

Throughout this chapter we have built a template for data preparation. A knitr template based on the chapter is available from http://HandsOnDataScience.com/scripts/data.Rnw, and an automatically derived version containing just the R code is available from http://HandsOnDataScience.com/scripts/data.R. Notice that we would not necessarily perform every step, such as normalising the variable names, imputing missing values, or omitting observations with missing values. Instead we pick and choose as appropriate to our situation and specific datasets. Some data-specific transformations are not included in the template, and there may be other transforms we need to perform that we have not covered here. As we discover new tools to support the data scientist we can add them to our own templates.
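One way to make the picking and choosing explicit is to guard each optional step with a switch. The following is only a minimal sketch in base R and is not taken from the template itself: the toy data frame, the switch names (normalise.names, impute.missing, omit.missing), and the regular expression used in place of a proper name normaliser are all assumptions made for illustration.

```r
# Hypothetical toy data standing in for a real dataset.
ds <- data.frame(MinTemp      = c(8.0, NA, 13.7, 9.2),
                 Rainfall     = c(0.0, 3.6, NA, 1.2),
                 RainTomorrow = c("No", "Yes", "No", NA))

# Hypothetical switches: turn on only the steps that suit this dataset.
normalise.names <- TRUE
impute.missing  <- FALSE
omit.missing    <- TRUE

if (normalise.names) {
  # Simplified stand-in for a name normaliser: MinTemp becomes min_temp.
  names(ds) <- tolower(gsub("([a-z])([A-Z])", "\\1_\\2", names(ds)))
}

if (impute.missing) {
  # Replace missing numeric values with the median of the variable.
  for (v in names(ds)[sapply(ds, is.numeric)]) {
    ds[[v]][is.na(ds[[v]])] <- median(ds[[v]], na.rm=TRUE)
  }
}

if (omit.missing) {
  # Drop every observation that still has a missing value.
  ds <- na.omit(ds)
}

dim(ds)
```

Keeping the switches together near the top of a script makes it easy to see at a glance which steps were applied to a particular dataset.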

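library(rattle)    # Provides weatherAUS and normVarNames().
library(magrittr)  # Provides the %<>% assignment pipe.

dsname        <- "weatherAUS"
ds            <- get(dsname)
vnames        <- names(ds)        # Record the original variable names.
names(ds)    %<>% normVarNames()  # Normalise the names in place.
names(vnames) <- names(ds)        # Index original names by normalised name.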
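The vnames vector built above maps each normalised name back to its original, which can be handy when labelling plots or reports. As a small illustration, the sketch below repeats the same pattern in base R on a hypothetical two-column data frame, with a simple regular expression assumed as a stand-in for normVarNames() so that it runs without any packages.

```r
# Hypothetical toy data; the template itself uses weatherAUS.
ds <- data.frame(MinTemp = c(8.0, 13.7), RainTomorrow = c("No", "Yes"))

# Record the original names before changing them.
vnames <- names(ds)

# Simplified stand-in for normVarNames(): MinTemp becomes min_temp.
names(ds) <- tolower(gsub("([a-z])([A-Z])", "\\1_\\2", names(ds)))

# Index the original names by their normalised counterparts.
names(vnames) <- names(ds)

# Look up the original name of a single variable.
vnames[["min_temp"]]

# Recover all of the original names in column order.
unname(vnames[names(ds)])
```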

