Data Science Desktop Survival Guide
by Graham Williams

Evaporation and Sunshine

20180723 The next two character variables are evaporation and sunshine. It seems odd that these are character, since we would expect both to hold numeric values. Looking at the top of the dataset, we see that the first values of both variables are missing.

# Note the remaining character variables to be dealt with.

head(ds$evaporation)
## [1] NA NA NA NA NA NA

head(ds$sunshine)
## [1] NA NA NA NA NA NA

# Review other random values.

sample(ds$evaporation, 8)
## [1]  NA  NA 5.4  NA 2.8  NA  NA 3.4

sample(ds$sunshine, 8)
## [1]  2.1 11.7   NA  0.1 10.4   NA  3.6  0.5

The heuristic used to determine the data type when ingesting data looks only at a subset of the observations. In this case the early observations are all missing, so the type defaults to character, which is general enough to capture all potential values. We need to convert the variables to numeric.
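As an aside, when re-reading the data is an option, the guessing step can be bypassed by declaring the column types at ingestion. The following is a minimal sketch only: it assumes the data is read with readr's read_csv() (the text does not name the reader), and the file contents and the name toy are made up for illustration.

```r
library(readr)

# Write a tiny illustrative CSV whose first rows are all missing.
tmp <- tempfile(fileext = ".csv")
writeLines(c("evaporation,sunshine",
             "NA,NA",
             "NA,NA",
             "5.4,2.1",
             "2.8,11.7"),
           tmp)

# Declare the column types up front so that no guessing is needed.
toy <- read_csv(tmp,
                col_types = cols(evaporation = col_double(),
                                 sunshine    = col_double()))

# Check the resulting class of each variable.
sapply(toy, class)
```

Declaring types this way is only practical when the expected type of each column is known in advance. Otherwise, converting after the fact remains the simpler route.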

# Identify the variables to process.

cvars <- c("evaporation", "sunshine")

# Check the current class of the variables.

ds[cvars] %>% sapply(class)
## evaporation    sunshine 
## "character" "character"

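# Convert to numeric. Using lapply() returns a list of columns,
# whereas sapply() would simplify the result to a matrix.

ds[cvars] %<>% lapply(as.numeric)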

# Review some random values.

sample(ds$evaporation, 10)
##  [1] 2.6  NA 2.4  NA  NA 0.4  NA 1.0  NA 8.2

sample(ds$sunshine, 10)
##  [1] 9.7 0.6 7.9  NA  NA 7.6 9.4  NA 5.2 4.2
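Random samples give only a partial view, so it can be worth checking that the conversion did not silently turn malformed strings into missing values, since as.numeric() returns NA (with a warning) for anything it cannot parse. A minimal sketch of such a check follows, using a made-up data frame named toy as a stand-in for ds; the values are illustrative only.

```r
# Toy stand-in for the ds data frame (values are illustrative only).
toy <- data.frame(evaporation = c(NA, "5.4", "2.8", NA),
                  sunshine    = c("2.1", NA, "0.1", "10.4"),
                  stringsAsFactors = FALSE)

cvars <- c("evaporation", "sunshine")

# Count the missing values before converting.
na_before <- sapply(toy[cvars], function(x) sum(is.na(x)))

# Convert each column, keeping the data frame structure.
toy[cvars] <- lapply(toy[cvars], as.numeric)

# Count the missing values after converting.
na_after <- sapply(toy[cvars], function(x) sum(is.na(x)))

# The counts match if every non-missing string was a valid number.
identical(na_before, na_after)
```

If the counts differ, some non-missing strings were not valid numbers and deserve a closer look before the conversion is accepted.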


Copyright © 2000-2020 Togaware Pty Ltd. Creative Commons ShareAlike V4.