10.32 Factors

20180908 For datasets that we load into R we will not always have examples of all possible levels of a factor. Consequently it is not always possible to list all of the levels automatically. By default the tidyverse ingests such variables as character, so that we can take specific action to convert them to factor as required.
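As a minimal sketch of why the levels may need to be stated explicitly, the following base R snippet uses a hypothetical vector of wind directions in which two of the four possible values happen not to occur. Letting factor() infer the levels keeps only those observed, whereas passing the full set through its levels= argument retains the missing ones.

```r
# Hypothetical sample of wind directions: "E" and "W" do not occur.
observed <- c("N", "S", "N", "S")

# Levels inferred from the data contain only the observed values.
inferred <- factor(observed)
levels(inferred)

# Supplying the full set of possible values keeps the unobserved levels.
all_dirs <- c("N", "E", "S", "W")
explicit <- factor(observed, levels = all_dirs)
levels(explicit)
table(explicit)
```

Here table(explicit) reports zero counts for E and W, which would otherwise be silently absent from the factor.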

We first review the unique values (the candidate levels) of each of the character variables.

# Observe the unique levels.

ds[charc] %>% sapply(unique)
##       location          
##  [1,] "Albury"          
##  [2,] "BadgerysCreek"   
##  [3,] "Cobar"           
##  [4,] "CoffsHarbour"    
##  [5,] "Moree"           
##  [6,] "Newcastle"       
##  [7,] "NorahHead"       
##  [8,] "NorfolkIsland"   
##  [9,] "Penrith"         
## [10,] "Richmond"        
## [11,] "Sydney"          
## [12,] "SydneyAirport"   
## [13,] "WaggaWagga"      
## [14,] "Williamtown"     
## [15,] "Wollongong"      
## [16,] "Canberra"        
## [17,] "Tuggeranong"     
## [18,] "MountGinini"     
## [19,] "Ballarat"        
## [20,] "Bendigo"         
## [21,] "Sale"            
## [22,] "MelbourneAirport"
## [23,] "Melbourne"       
## [24,] "Mildura"         
## [25,] "Nhil"            
## [26,] "Portland"        
## [27,] "Watsonia"        
## [28,] "Dartmoor"        
## [29,] "Brisbane"        
## [30,] "Cairns"          
## [31,] "GoldCoast"       
## [32,] "Townsville"      
## [33,] "Adelaide"        
## [34,] "MountGambier"    
## [35,] "Nuriootpa"       
## [36,] "Woomera"         
## [37,] "Albany"          
## [38,] "Witchcliffe"     
## [39,] "PearceRAAF"      
## [40,] "PerthAirport"    
## [41,] "Perth"           
## [42,] "SalmonGums"      
## [43,] "Walpole"         
## [44,] "Hobart"          
## [45,] "Launceston"      
## [46,] "AliceSprings"    
## [47,] "Darwin"          
## [48,] "Katherine"       
## [49,] "Uluru"
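Where the number of distinct values per variable is of more interest than the values themselves, a count can be computed directly. The sketch below uses base R on a small made-up data frame standing in for ds; the column names, the values, and chr_cols (playing the role of charc) are all hypothetical. Note also that sapply() simplifies to a matrix, as in the output above, only when every column has the same number of unique values; otherwise it returns a list.

```r
# A small made-up dataset standing in for ds; columns are hypothetical.
toy <- data.frame(
  location   = c("Albury", "Cobar", "Albury", "Moree"),
  rain_today = c("No", "Yes", "No", NA),
  temp       = c(12.1, 18.4, 13.0, 20.2),
  stringsAsFactors = FALSE
)

# Identify the character columns, analogous to charc.
chr_cols <- names(toy)[sapply(toy, is.character)]

# Count the distinct values in each character column (NA counts as one).
n_levels <- sapply(toy[chr_cols], function(x) length(unique(x)))
n_levels
```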

If we decide to convert all of these variables from character into factor, then we can do so using base::factor().

# Convert all character variables to be factors.
# %<>% is from magrittr and map() is from purrr.

ds[charc] %<>% map(factor)
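Without magrittr and purrr, a base R equivalent applies factor() to each selected column with lapply() and assigns the resulting list back in place. This is a sketch on a hypothetical data frame, with chr_cols standing in for charc.

```r
# Hypothetical data frame with two character columns and one numeric column.
toy <- data.frame(
  location   = c("Albury", "Cobar", "Albury", "Moree"),
  rain_today = c("No", "Yes", "No", NA),
  temp       = c(12.1, 18.4, 13.0, 20.2),
  stringsAsFactors = FALSE
)
chr_cols <- c("location", "rain_today")

# lapply() returns a list of factors; assigning it to toy[chr_cols]
# replaces just those columns and leaves temp untouched.
toy[chr_cols] <- lapply(toy[chr_cols], factor)
str(toy)
```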

We don’t actually do so here. Instead we consider each character variable in turn to decide how to handle it, particularly since we might observe that evaporation and sunshine appear to be numeric.
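One way to spot such variables is to check whether every non-missing value parses as a number. The helper looks_numeric() below is hypothetical (not part of any package), and the data frame merely imitates how evaporation and sunshine might arrive as character. Numeric-looking columns are converted with as.numeric() and the remainder with factor().

```r
# Hypothetical helper: TRUE when every non-missing value parses as a number.
looks_numeric <- function(x) {
  present <- x[!is.na(x)]
  parsed  <- suppressWarnings(as.numeric(present))
  length(present) > 0 && !anyNA(parsed)
}

# Made-up columns as they might be ingested, all of type character.
toy <- data.frame(
  evaporation = c("4.2", NA, "6.8"),
  sunshine    = c("11.1", "9.0", NA),
  wind_dir    = c("NW", "ENE", NA),
  stringsAsFactors = FALSE
)

# Decide per column, then convert numeric-looking columns to numbers
# and the remaining character columns to factors.
num_like <- sapply(toy, looks_numeric)
toy[num_like]  <- lapply(toy[num_like], as.numeric)
toy[!num_like] <- lapply(toy[!num_like], factor)
str(toy)
```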

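A one-liner using dplyr's mutate_if() converts every character variable in ds to a factor (with dplyr 1.0.0 or later, mutate(across(where(is.character), as.factor)) is the preferred form):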

ds %<>% mutate_if(is.character, as.factor)

Copyright © 1995-2021 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0.