Data Science Desktop Survival Guide
by Graham Williams
Missing Value Imputation
20201026 See Section 9.15 to replace missing values with specific values.
Missing value imputation is useful but must be done with care, since it can be akin to inventing new data. We may be tempted to impute as a quick fix to avoid the warnings that ggplot2, for example, issues when it encounters missing data. The function randomForest::na.roughfix() performs a simple imputation, replacing missing values in numeric columns with the column median and missing values in factor columns with the most frequent level. It operates only on numeric and factor columns, so we exclude the first two columns of the dataset (date and location) from the imputation.
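# Load the packages providing the pipe operators and na.roughfix().
library(magrittr)
library(randomForest)
# Count the number of missing values.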
ds %>% is.na() %>% sum()
# No missing values in the first two columns (date and location)
ds[1:2] %>% is.na() %>% sum()
# Impute missing values.
ds[3:24] %<>% na.roughfix()
# Confirm that no missing values remain.
ds %>% is.na() %>% sum()
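To see what the imputation actually does to individual values, it can help to run it on a tiny data frame first. The sketch below is only illustrative: the data frame toy and its columns temp and wind are hypothetical and are not part of the dataset used above.

```r
library(randomForest)

# Hypothetical toy data, invented for illustration only.
toy <- data.frame(
  temp = c(10, 12, NA, 20, 14),
  wind = factor(c("N", "S", "S", NA, "E"))
)

# Count the missing values before imputation.
sum(is.na(toy))

# Impute: numeric NAs take the column median,
# factor NAs take the most frequent level.
fixed <- na.roughfix(toy)

# Compare the imputed entries with the summaries they come from.
median(toy$temp, na.rm = TRUE)
fixed$temp[3]
table(toy$wind)
fixed$wind[4]

# Confirm that no missing values remain.
sum(is.na(fixed))
```

Every missing entry in a column receives the same value, which is why this kind of imputation is best treated as a rough first pass rather than as a model of the missing data.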