Data Science Desktop Survival Guide by Graham Williams

## Evaporation and Sunshine

20180723 The next two character variables are `evaporation` and `sunshine`. It seems odd that these are character, since we would expect both to be numeric. Looking at the top of the dataset, we see that their first values are all missing.

```r
# Note the remaining character variables to be dealt with.

head(ds$evaporation)
## [1] NA NA NA NA NA NA

head(ds$sunshine)
## [1] NA NA NA NA NA NA
```

```r
# Review other random values.

sample(ds$evaporation, 8)
## [1] 9.2 5.6 3.4 6.4 4.2 9.8 NA 0.6

sample(ds$sunshine, 8)
## [1] NA NA NA 11.6 NA NA 6.8 4.5
```

The heuristic that determines each variable's data type when ingesting the data looks at only a subset of the observations before deciding (readr's `read_csv()`, for example, samples the first 1,000 rows by default, controlled by its `guess_max` argument). In this case the early observations are all missing, so the type defaults to character, which is general enough to hold any value. We therefore need to convert these variables to numeric.
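To make the guessing behaviour concrete, here is a toy sketch in base R. The helper `guess_type()` and the vector `raw` are hypothetical, and the logic is a deliberate simplification rather than readr's actual algorithm: it inspects only the first `n` values and, when all of them are missing, falls back to the most general type, character.

```r
# Hypothetical helper: guess a column type from only its first n values.
# A simplified illustration, not readr's real type-guessing algorithm.
guess_type <- function(x, n = 5) {
  head_vals <- head(x, n)
  # Drop genuine NAs and literal "NA" strings; they carry no type information.
  head_vals <- head_vals[!is.na(head_vals) & head_vals != "NA"]
  # Nothing left to inspect: fall back to the most general type.
  if (length(head_vals) == 0) return("character")
  # Numeric only if every remaining value parses as a number.
  if (!anyNA(suppressWarnings(as.numeric(head_vals)))) "numeric" else "character"
}

# Hypothetical raw column whose first six entries are missing.
raw <- c("NA", "NA", "NA", "NA", "NA", "NA", "9.2", "5.6", "3.4")

guess_type(raw, n = 5)            # sees only missing values: "character"
guess_type(raw, n = length(raw))  # sees the real values: "numeric"
```

Widening the sample is the analogue of raising `guess_max` in `readr::read_csv()`, at the cost of scanning more of the file before the data is read.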

```r
library(magrittr)  # Provides the %>% and %<>% pipes.

# Identify the variables to process.

cvars <- c("evaporation", "sunshine")

# Check the current class of the variables.

ds[cvars] %>% sapply(class)
## evaporation    sunshine
##   "numeric"   "numeric"
```
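Rather than listing the variables by hand, one could also search for character columns whose non-missing values all parse as numbers. The sketch below assumes missing values were already read in as `NA` rather than the literal string `"NA"`; the helper `looks_numeric()` and the toy data frame are hypothetical.

```r
# Hypothetical helper: TRUE when x is character and every non-missing
# value can be parsed as a number.
looks_numeric <- function(x) {
  if (!is.character(x)) return(FALSE)
  vals <- x[!is.na(x)]
  length(vals) > 0 && !anyNA(suppressWarnings(as.numeric(vals)))
}

# Toy data frame with hypothetical values, standing in for ds.
toy <- data.frame(
  location    = c("A", "A", "B", "B"),
  evaporation = c(NA, NA, "9.2", "5.6"),
  sunshine    = c(NA, "11.6", NA, "6.8"),
  stringsAsFactors = FALSE
)

# Keep the names of the columns that look numeric.
cvars <- names(toy)[vapply(toy, looks_numeric, logical(1))]
cvars
```

Here `location` is excluded because its values do not parse as numbers, leaving `evaporation` and `sunshine`.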

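```r
# Convert to numeric. lapply() returns a list of columns, which assigns
# cleanly back into the data frame, whereas sapply() first simplifies to a matrix.

ds[cvars] %<>% lapply(as.numeric)
```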
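Because `as.numeric()` turns any string it cannot parse into `NA`, issuing only a warning, it is worth checking that the conversion did not introduce new missing values. A minimal sketch on a hypothetical toy data frame standing in for `ds`:

```r
# Toy data frame with hypothetical values, standing in for ds.
toy <- data.frame(
  evaporation = c(NA, NA, "9.2", "5.6"),
  sunshine    = c(NA, "11.6", NA, "6.8"),
  stringsAsFactors = FALSE
)
cvars <- c("evaporation", "sunshine")

# Count missing values per variable before conversion.
na_before <- sapply(toy[cvars], function(x) sum(is.na(x)))

# Convert each column to numeric.
toy[cvars] <- lapply(toy[cvars], as.numeric)

# Count again; any increase means some strings failed to parse.
na_after <- sapply(toy[cvars], function(x) sum(is.na(x)))
stopifnot(identical(na_before, na_after))

sapply(toy[cvars], class)
```

If the check fails, comparing `is.na()` before and after on the offending column pinpoints the values that could not be converted.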

```r
# Review some random values.

sample(ds$evaporation, 10)
## [1] NA NA 0.8 1.8 NA 4.2 NA NA 16.0 8.6

sample(ds$sunshine, 10)
## [1] NA 3.0 8.0 8.9 6.0 NA NA NA 7.9 11.1
```