10.18 Replace Missing Values

20201026 See Section 10.13 to replace missing values with imputed (or guessed at) values, and Section 10.9 to drop rows in a dataset that contain missing values.

To replace missing values (NA) in a dataset with a specific default value, like 0 for numeric data, we can use tidyr::replace_na() within a pipeline. In the following example dplyr::across() applies the replacement only to the numeric columns of the dataset, which are selected using tidyselect::where() together with base::is.numeric(). Non-numeric columns, such as the character and factor columns, are left unchanged and retain any missing values.

ds %>%
  mutate(across(where(is.numeric), ~replace_na(.x, 0)))
## # A tibble: 217,049 × 24
##    date       location min_temp max_temp rainf…¹ evapo…² sunsh…³ wind_…⁴ wind_…⁵
##    <date>     <chr>       <dbl>    <dbl>   <dbl>   <dbl>   <dbl> <ord>     <dbl>
##  1 2008-12-01 Albury       13.4     22.9     0.6     4.8     8.5 W            44
##  2 2008-12-02 Albury        7.4     25.1     0       4.8     8.5 WNW          44
##  3 2008-12-03 Albury       12.9     25.7     0       4.8     8.5 WSW          46
##  4 2008-12-04 Albury        9.2     28       0       4.8     8.5 NE           24
##  5 2008-12-05 Albury       17.5     32.3     1       4.8     8.5 W            41
##  6 2008-12-06 Albury       14.6     29.7     0.2     4.8     8.5 WNW          56
##  7 2008-12-07 Albury       14.3     25       0       4.8     8.5 W            50
##  8 2008-12-08 Albury        7.7     26.7     0       4.8     8.5 W            35
##  9 2008-12-09 Albury        9.7     31.9     0       4.8     8.5 NNW          80
## 10 2008-12-10 Albury       13.1     30.1     1.4     4.8     8.5 W            28
## # … with 217,039 more rows, 15 more variables: wind_dir_9am <ord>,
## #   wind_dir_3pm <ord>, wind_speed_9am <dbl>, wind_speed_3pm <dbl>,
## #   humidity_9am <dbl>, humidity_3pm <dbl>, pressure_9am <dbl>,
## #   pressure_3pm <dbl>, cloud_9am <dbl>, cloud_3pm <dbl>, temp_9am <dbl>,
## #   temp_3pm <dbl>, rain_today <fct>, risk_mm <dbl>, rain_tomorrow <fct>, and
## #   abbreviated variable names ¹​rainfall, ²​evaporation, ³​sunshine,
## #   ⁴​wind_gust_dir, ⁵​wind_gust_speed
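The effect is easier to see on a small dataset where the missing values are visible. The following is a minimal sketch using a made-up tibble named toy (the name, columns, and values are assumptions for illustration and are not part of the weather dataset above). It also illustrates the data frame form of tidyr::replace_na(), which takes a named list so that each column can be given its own default.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data for illustration only.
toy <- tibble(
  location = c("Albury", NA, "Sydney"),
  rainfall = c(0.6, NA, 1.4),
  sunshine = c(NA, 8.5, NA)
)

# Replace NA with 0 in the numeric columns only. The character
# column location is not selected and so keeps its NA.
zeroed <- toy %>%
  mutate(across(where(is.numeric), ~replace_na(.x, 0)))

# Alternatively, supply a per-column default through a named list.
# Columns not named in the list (here sunshine) are left untouched.
filled <- toy %>%
  replace_na(list(location = "Unknown", rainfall = 0))
```

Replacing missing values with 0 changes summary statistics such as the mean, so a default like this is best suited to variables where a missing value plausibly means none was recorded.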


Copyright © 1995-2022 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0