10.14 Filter Rows Having Missing Values
20201202 To select the rows of a dataset that have a missing value in any column we use dplyr::filter() together with dplyr::across() and tidyselect::everything() to apply base::is.na() to every column, and then combine the resulting logical columns row-wise with purrr::reduce() and the or operator. In the example we shuffle the matching rows with dplyr::sample_frac() and select the date, the location, and four randomly chosen columns to show the result.
ds %>%
  filter(across(everything(), is.na) %>% reduce(`|`)) %>%
  sample_frac() %>%
  select(date, location, sample(3:length(vars), 4))
## # A tibble: 156,731 × 6
## date location min_temp humidity_9am pressure_3pm wind_dir_9am
## <date> <chr> <dbl> <int> <dbl> <ord>
## 1 2013-03-12 Hobart 17.1 68 1010. NNW
## 2 2023-02-16 Witchcliffe 12.5 67 1013. NNW
## 3 2020-03-06 MountGambier 13.2 87 1019. SE
## 4 2019-10-25 PearceRAAF 6.5 49 1019 SSE
## 5 2018-10-20 Albury 14 69 1011. SE
## 6 2020-05-13 Albury 4.8 92 1023. <NA>
## 7 2017-12-13 MountGambier 13.4 26 1002. N
## 8 2014-07-13 Penrith 5.8 46 NA SW
## 9 2015-06-07 Albury 2.3 100 1026. ESE
## 10 2013-05-01 Nuriootpa 5.6 100 1024. <NA>
## # ℹ 156,721 more rows
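The same selection can be tried on a small dataset where the answer is easy to check by eye. The sketch below is illustrative only: the tibble toy and its columns are hypothetical stand-ins for ds, and it uses dplyr::if_any(), which recent versions of dplyr suggest in place of across() inside filter(). A base R equivalent using stats::complete.cases() is included for comparison.

```r
library(dplyr)

# Hypothetical toy data standing in for ds: rows 2 and 3 each have one NA.
toy <- tibble(
  id       = 1:4,
  min_temp = c(17.1, NA, 13.2, 6.5),
  wind_dir = c("NNW", "SE", NA, "SSE")
)

# Keep the rows that have a missing value in at least one column.
toy_na <- toy %>%
  filter(if_any(everything(), is.na))

# Base R alternative: complete.cases() is TRUE for rows without any NA,
# so negating it selects the rows that have at least one NA.
toy_na_base <- toy[!complete.cases(toy), ]

toy_na$id
```

Both approaches should return the rows with id 2 and 3. Swapping if_any() for if_all() would instead keep only the rows where every column is missing.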
Copyright © 1995-2022 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0