Data Science Desktop Survival Guide by Graham Williams

## Factors

20180908 For datasets that we load into R we will not always have examples of all possible levels of a factor. Consequently it is not always possible to list all of the levels automatically. By default the tidyverse ingests these variables as character so that we can take specific action to convert them to factor as required.
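To illustrate why the observed data may not reveal every level, here is a minimal sketch using base R only. The vector `wind` and its set of four compass directions are hypothetical and are not taken from the dataset used in this chapter. Passing `levels=` to base::factor() records levels that never appear in the data.

```r
# Hypothetical sample: only three of the four directions appear in the data.
wind <- c("N", "S", "N", "E")

# Without explicit levels, only the observed values become levels.
observed <- factor(wind)
levels(observed)

# With explicit levels, the unobserved "W" is also recorded as a level.
complete <- factor(wind, levels = c("N", "E", "S", "W"))
levels(complete)

# The unobserved level is reported with a count of zero.
table(complete)
```

Supplying the levels explicitly also fixes their order, which otherwise defaults to the sorted order of the observed values.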

We first review the unique values of each of the character variables, which would become the levels of the corresponding factors.

```r
# Observe the unique levels.

ds[charc] %>% sapply(unique)
```

```
##       location
##  [1,] "Albury"
##  [2,] "BadgerysCreek"
##  [3,] "Cobar"
##  [4,] "CoffsHarbour"
##  [5,] "Moree"
##  [6,] "Newcastle"
##  [7,] "NorahHead"
##  [8,] "NorfolkIsland"
##  [9,] "Penrith"
## [10,] "Richmond"
## [11,] "Sydney"
## [12,] "SydneyAirport"
## [13,] "WaggaWagga"
## [14,] "Williamtown"
## [15,] "Wollongong"
## [16,] "Canberra"
## [17,] "Tuggeranong"
## [18,] "MountGinini"
## [19,] "Ballarat"
## [20,] "Bendigo"
## [21,] "Sale"
## [22,] "MelbourneAirport"
## [23,] "Melbourne"
## [24,] "Mildura"
....
```
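When there are many character variables it can be easier to start from a count of unique values for each one rather than the full listing. The following is a sketch in base R under stated assumptions: the data frame `toy` and the vector `toy_charc` are hypothetical stand-ins for `ds` and `charc`, which are defined elsewhere in the book.

```r
# Hypothetical toy data standing in for the dataset used in the text.
toy <- data.frame(
  location   = c("Albury", "Cobar", "Albury", "Sydney"),
  rain_today = c("No", "Yes", "No", "No"),
  min_temp   = c(13.4, 7.4, 12.9, 9.2),
  stringsAsFactors = FALSE
)

# Names of the character variables.
toy_charc <- names(toy)[sapply(toy, is.character)]

# Number of unique values for each character variable.
n_levels <- sapply(toy[toy_charc], function(x) length(unique(x)))
n_levels
```

A variable with very many unique values is often an identifier or free text, and may be a poor candidate for conversion to a factor.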

If we decide to convert all of these variables from character into factor, then we can do so using base::factor().

```r
# Convert all character variables to be factors.

ds[charc] %<>% map(factor)
```

We don't actually do so here, instead considering each character variable in turn to decide how to handle it, especially since we might observe that evaporation and sunshine appear to be numeric. A one-liner to do the conversion:

```r
ds %<>% mutate_if(is.character, as.factor)
```
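Considering each character variable in turn can be partly automated. The sketch below, in base R, checks whether a character variable parses cleanly as numeric before deciding how to convert it. The data frame `toy` and the helper `looks_numeric()` are hypothetical, introduced only for illustration, and this is one possible approach rather than the one the book goes on to use.

```r
# Hypothetical toy data: evaporation was read in as character.
toy <- data.frame(
  location    = c("Albury", "Cobar", "Sydney"),
  evaporation = c("4.2", NA, "6.8"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE if every non-missing value parses as a number.
looks_numeric <- function(x) {
  parsed <- suppressWarnings(as.numeric(x))
  all(!is.na(parsed[!is.na(x)]))
}

# Report which variables look numeric.
sapply(toy, looks_numeric)

# Convert only those character variables that look numeric.
for (v in names(toy)) {
  if (is.character(toy[[v]]) && looks_numeric(toy[[v]])) {
    toy[[v]] <- as.numeric(toy[[v]])
  }
}

# The remaining character variables can then become factors.
is_chr <- sapply(toy, is.character)
toy[is_chr] <- lapply(toy[is_chr], factor)
str(toy)
```

Note that a variable consisting entirely of missing values would also pass this check, so the result is best treated as a prompt for inspection rather than a final decision.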