10.14 Filter Rows Having Missing Values
20201202 To select the rows of a dataset that have a missing value in any column, we use dplyr::across() with tidyselect::everything() to apply base::is.na() to every column, and then reduce the resulting logical columns to a single logical vector within dplyr::filter() using the or operator (`|`) with purrr::reduce(). In the example we shuffle the rows with dplyr::sample_frac() and randomly choose a few columns to show the result.
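ds %>%
  # Flag the missing values in every column, then OR the flags across columns.
  filter(across(everything(), is.na) %>% reduce(`|`)) %>%
  # With its default size of 1, sample_frac() shuffles all of the rows.
  sample_frac() %>%
  # Keep date, location, and four other columns chosen at random by position.
  select(date, location, sample(3:length(vars), 4))
## # A tibble: 141,496 × 6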
##    date       location      cloud_9am cloud_3pm rain_tomorrow wind_gust_speed
##    <date>     <chr>             <int>     <int> <fct>                   <dbl>
##  1 2020-01-13 Nhil                 NA        NA No                         44
##  2 2012-06-26 Bendigo               1         2 No                         33
##  3 2009-11-27 PearceRAAF            4         4 No                         39
##  4 2019-09-13 Tuggeranong          NA        NA No                         39
##  5 2013-01-29 Witchcliffe          NA        NA No                         35
##  6 2014-10-19 Richmond             NA        NA No                         30
##  7 2019-09-22 Brisbane              3         1 No                         28
##  8 2020-10-29 NorfolkIsland        NA        NA Yes                        43
##  9 2021-09-30 BadgerysCreek        NA        NA No                         20
## 10 2021-01-20 Nhil                 NA        NA No                         39
## # ℹ 141,486 more rows
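To see what the reduction is doing, the following minimal sketch applies the same steps to a small made-up tibble. The name toy and its contents are hypothetical and are not part of the dataset used above. The sketch also shows dplyr::if_any(), available from dplyr 1.0.4, which expresses the same row test directly; it is offered as an alternative, not as the approach used in the example above.

```r
library(dplyr)
library(purrr)

# Hypothetical toy data standing in for ds: rows 2 and 3 each have one NA.
toy <- tibble(
  id   = 1:4,
  temp = c(20.1, NA, 18.3, 22.0),
  rain = c("No", "Yes", NA, "No")
)

# Step 1: replace every column with a logical flag marking its missing values.
na_flags <- toy %>% mutate(across(everything(), is.na))

# Step 2: OR the flag columns together, one column at a time.
any_na <- reduce(na_flags, `|`)
any_na
## [1] FALSE  TRUE  TRUE FALSE

# The same two steps written inline within filter(), as in the text.
rows_reduce <- toy %>%
  filter(across(everything(), is.na) %>% reduce(`|`))

# Alternative: if_any() keeps a row when the test is TRUE for any column.
rows_if_any <- toy %>%
  filter(if_any(everything(), is.na))
```

Both rows_reduce and rows_if_any should contain the rows with id 2 and 3. The if_any() form avoids the explicit reduce() step and so does not need purrr.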
Copyright © 1995-2022 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0