10.49 Evaporation and Sunshine

20180723 The next two character variables are evaporation and sunshine. It seems odd that these are character, since we would expect both to be numeric. Looking at the top of the dataset, we see that both begin with missing values.

# Note the remaining character variables to be dealt with.

head(ds$evaporation)

## [1] NA NA NA NA NA NA

head(ds$sunshine)

## [1] NA NA NA NA NA NA

# Review other random values.

sample(ds$evaporation, 8)

## [1] 8.0 2.2 7.6  NA  NA 4.4  NA  NA

sample(ds$sunshine, 8)

## [1]   NA   NA   NA   NA 10.4   NA   NA   NA

The heuristic used to determine the data type when ingesting data looks at only a subset of the observations before it decides on a type. In this case the early observations are all missing, so the type defaults to character, which is general enough to capture all potential values. We need to convert the variables to numeric.
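An alternative to converting after the fact is to declare the column types when the data is read, so that nothing is left to the guessing heuristic. The following is a minimal sketch in base R, assuming a small inline CSV (the `csv` string and its values are hypothetical, made up for illustration) rather than the actual data source used for ds. Readers such as readr's read_csv() offer a similar facility, and also a guess_max argument to widen the subset inspected.

```r
# Hypothetical inline CSV standing in for the real file; values are made up.
csv <- "date,evaporation,sunshine
2018-07-01,NA,NA
2018-07-02,NA,NA
2018-07-03,8.0,10.4
2018-07-04,2.2,NA"

# Declare the two columns as numeric up front so no guessing is needed.
# Columns not named in colClasses are still type-guessed by read.csv().
toy <- read.csv(
  text = csv,
  colClasses = c(evaporation = "numeric", sunshine = "numeric")
)

# Confirm the declared classes were applied.
sapply(toy[c("evaporation", "sunshine")], class)
```

Declaring the types is most useful when the leading rows of a column are known to be sparse, as they are here.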

# Identify the variables to process.

cvars <- c("evaporation", "sunshine")

# Check the current class of the variables.

ds[cvars] %>% sapply(class)

## evaporation    sunshine 
##   "numeric"   "numeric"

# Convert to numeric.

ds[cvars] %<>% lapply(as.numeric)

# Review some random values.

sample(ds$evaporation, 10)

##  [1]  NA 1.0  NA  NA  NA  NA 3.4  NA 8.6 6.8

sample(ds$sunshine, 10)

##  [1]  9.9 11.0  0.1  7.3   NA   NA   NA  3.8   NA  0.5
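Because as.numeric() turns any string it cannot parse into NA, it can be worth checking that the conversion did not introduce new missing values. The following is a minimal sketch of such a check, assuming a hypothetical toy data frame (`toy`, with made-up values, including a deliberately malformed "n/a" entry) in place of the real ds.

```r
# Hypothetical toy data standing in for ds; the values are made up.
toy <- data.frame(
  evaporation = c(NA, NA, "8.0", "2.2", NA),
  sunshine    = c(NA, NA, NA, "10.4", "n/a"),
  stringsAsFactors = FALSE
)

cvars <- c("evaporation", "sunshine")

# Count missing values before conversion.
na_before <- sapply(toy[cvars], function(x) sum(is.na(x)))

# Convert column by column; lapply() keeps the data frame structure.
# suppressWarnings() hides the "NAs introduced by coercion" warning,
# which the count comparison below reports explicitly instead.
toy[cvars] <- suppressWarnings(lapply(toy[cvars], as.numeric))

# Count missing values after conversion.
na_after <- sapply(toy[cvars], function(x) sum(is.na(x)))

# A positive difference flags strings that were not valid numbers.
na_after - na_before
```

Here the difference is zero for evaporation and one for sunshine, pointing at the malformed "n/a" entry. A difference of zero for every variable suggests the character values were all either missing or valid numbers.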

Copyright © 1995-2022 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0