Data Science Desktop Survival Guide
by Graham Williams

Missing Value Imputation

20201026 See Section 9.15 to replace missing values with specific values.

Missing value imputation is useful but must be done with care, since it can be akin to inventing new data. We may be tempted to impute as a quick fix to avoid the warnings that ggplot2, for example, would otherwise issue about missing data. The function randomForest::na.roughfix() performs a simple imputation, replacing missing numeric values with the column median and missing factor values with the most frequent level. It operates only on numeric and factor columns, so we exclude the first two columns of the dataset (date and location) from the imputation.

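# Load the packages that provide the pipes and na.roughfix().

library(magrittr)
library(randomForest)

# Count the number of missing values.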

ds %>% is.na() %>% sum()
## [1] 464300

# No missing values in the first two columns (date and location)

ds[1:2] %>% is.na() %>% sum()
## [1] 0

# Impute missing values.

ds[3:24] %<>% na.roughfix()

# Confirm that no missing values remain.

ds %>% is.na() %>% sum()
## [1] 0
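
To see what na.roughfix() does to individual values, a minimal sketch on a made-up data frame may help. The data frame toy and its columns temp and wind are hypothetical and are not part of the dataset used above. Recording where the values were missing before imputing is one way of keeping track of which values were invented.

```r
library(randomForest)

# Hypothetical toy data with one missing numeric and one missing factor value.
toy <- data.frame(
  temp = c(10, NA, 30, 50),
  wind = factor(c("N", "S", NA, "N"))
)

# Record which cells are missing before imputing.
was_missing <- is.na(toy)

# Impute: median for numeric columns, most frequent level for factors.
fixed <- na.roughfix(toy)

# The missing temp is replaced by the median of the observed values.
median(toy$temp, na.rm = TRUE)
fixed$temp[was_missing[, "temp"]]

# The missing wind is replaced by the most frequent observed level.
fixed$wind[was_missing[, "wind"]]

# No missing values remain.
sum(is.na(fixed))
```

Because every missing value in a column receives the same replacement, this kind of imputation can distort a column's distribution when many values are missing, which is one reason to apply it with care.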


Copyright © 2000-2020 Togaware Pty Ltd. Creative Commons ShareAlike V4.