10.14 Filter Rows Having Missing Values

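20201202 To select the rows of a dataset that have a missing value in any column, we use dplyr::across() with tidyselect::everything() to apply base::is.na() to every column, and then combine the resulting logical columns with purrr::reduce() using the or operator (|). The combined logical vector is passed to dplyr::filter(), which keeps a row when at least one of its values is missing. In the example we then shuffle the rows and randomly choose four columns, alongside date and location, to show the result.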

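# Keep the rows that have a missing value in at least one column.
ds %>%
  filter(across(everything(), is.na) %>% reduce(`|`)) %>%
  # Shuffle the rows (sample_frac() defaults to sampling all of them).
  sample_frac() %>%
  # Show date, location, and four other randomly chosen columns.
  select(date, location, sample(3:length(vars), 4))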
## # A tibble: 141,496 × 6
##    date       location      cloud_9am cloud_3pm rain_tomorrow wind_gust_speed
##    <date>     <chr>             <int>     <int> <fct>                   <dbl>
##  1 2020-01-13 Nhil                 NA        NA No                         44
##  2 2012-06-26 Bendigo               1         2 No                         33
##  3 2009-11-27 PearceRAAF            4         4 No                         39
##  4 2019-09-13 Tuggeranong          NA        NA No                         39
##  5 2013-01-29 Witchcliffe          NA        NA No                         35
##  6 2014-10-19 Richmond             NA        NA No                         30
##  7 2019-09-22 Brisbane              3         1 No                         28
##  8 2020-10-29 NorfolkIsland        NA        NA Yes                        43
##  9 2021-09-30 BadgerysCreek        NA        NA No                         20
## 10 2021-01-20 Nhil                 NA        NA No                         39
## # ℹ 141,486 more rows
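
The same selection can also be sketched with dplyr::if_any(), which recent dplyr releases (1.0.4 and later) recommend in place of across() inside filter(). The example below is a minimal illustration rather than the approach used above: the tibble toy and its values are hypothetical stand-ins for ds, and the base R line using complete.cases() is included only as a cross-check.

```r
library(dplyr)

# Hypothetical toy data standing in for the weather dataset ds.
toy <- tibble(
  location  = c("Nhil", "Bendigo", "Richmond", "Brisbane"),
  cloud_9am = c(NA, 1L, NA, 3L),
  cloud_3pm = c(NA, 2L, 4L, 1L)
)

# Keep the rows where at least one column is missing.
missing_rows <- toy %>%
  filter(if_any(everything(), is.na))

# Base R cross-check: complete.cases() is TRUE for rows without any NA.
missing_base <- toy[!complete.cases(toy), ]

print(missing_rows)
```

Here if_any() performs the row-wise "or" itself, so no separate reduce() step is needed. Using if_all() instead would keep only the rows where every column is missing.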


Copyright © 1995-2022 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0