Data Science Desktop Survival Guide
by Graham Williams
Missing Targets
20180726 Sometimes there are further operations to perform on the dataset prior to modelling. A common task is to deal with missing values. Here we remove observations with a missing target. As with any missing data, we should also analyse whether there is any pattern to the missing targets. A pattern may indicate a systemic data issue rather than values that are simply missing at random.
# Check the dimensions to start with.
dim(ds)
# Identify observations with a missing target.
ds %>% pull(target) %>% is.na() -> missing.target
# Check how many are found.
sum(missing.target)
# Remove observations with a missing target.
ds %<>% filter(!missing.target)
# Confirm the filter delivered the expected dataset.
dim(ds)
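One way to look for a pattern in the missing targets, before removing them, is to compare the rate of missingness across the levels of another variable. The following is a minimal sketch only. The toy data frame and the column names location and rain_tomorrow are hypothetical stand-ins, not taken from the dataset used above.

```r
library(dplyr)

# Hypothetical toy dataset: rain_tomorrow plays the role of the target
# and location is a variable we suspect may relate to the missingness.
toy <- data.frame(
  location      = c("A", "A", "A", "B", "B", "B"),
  rain_tomorrow = c("yes", "no", "no", NA, NA, "yes"),
  stringsAsFactors = FALSE
)

# Count and proportion of missing targets within each location.
toy %>%
  group_by(location) %>%
  summarise(n            = n(),
            n_missing    = sum(is.na(rain_tomorrow)),
            prop_missing = mean(is.na(rain_tomorrow))) ->
  missing.by.location

print(missing.by.location)
```

If the missing targets are concentrated in one group, as they are for location B in this toy data, that suggests the values may not be missing at random and the cause is worth investigating before the observations are dropped.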