Check the percentage of missing values
The function flags surveys whose percentage of missing values differs markedly from that of the other surveys. The missing values column can be created with add_percentage_missing, and the values are flagged with check_outliers.
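As a rough illustration of what the percentage_missing column holds, the sketch below computes the share of NA cells per row in base R. It is an assumption about the calculation, not the implementation of add_percentage_missing, although it reproduces the values shown in the first example under Examples.

```r
# Sketch only: share of NA cells per row, computed with base R.
# Assumption: every column, including uuid, counts toward the denominator.
survey <- data.frame(
  uuid = letters[1:3],
  col_1 = 1:3,
  col_2 = c(NA, NA, "expenditures"),
  col_3 = c("with need", NA, "with need"),
  col_4 = c("food health school", NA, "food"),
  col_4.food = c(1, NA, 1),
  col_4.health = c(1, NA, 0),
  col_4.school = c(1, NA, 0)
)

# is.na() returns a logical matrix; rowMeans() averages it row by row.
survey$percentage_missing <- as.numeric(rowMeans(is.na(survey)))

print(survey[, c("uuid", "percentage_missing")])
```

Row b has six NA cells out of eight columns, which gives 0.75.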
Usage
check_percentage_missing(
  dataset,
  uuid_column = "uuid",
  column_to_check = "percentage_missing",
  strongness_factor = 2,
  log_name = "percentage_missing_log"
)
Arguments
- dataset
a dataset to be checked, either as a dataframe or as a list with the dataframe stored as "checked_dataset".
- uuid_column
uuid column in the dataset. Default is "uuid".
- column_to_check
character string with the name of the column to check. Default is "percentage_missing".
- strongness_factor
Factor that defines how extreme a value must be to be flagged as an outlier. The default is 2.
- log_name
name of the log of flagged values. Default is "percentage_missing_log".
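To give a feel for what strongness_factor controls, here is a minimal sketch that assumes outliers are values lying more than strongness_factor standard deviations from the mean. That rule is an assumption made for illustration; the actual logic lives in check_outliers and may differ (for example, by also checking a log-transformed distribution). flag_by_sd() is a hypothetical helper, not part of the package.

```r
# Sketch only: flag values lying more than k standard deviations from the mean.
# flag_by_sd() is a hypothetical helper, not a function from the package.
flag_by_sd <- function(x, k = 2) {
  centre <- mean(x, na.rm = TRUE)
  spread <- stats::sd(x, na.rm = TRUE)
  abs(x - centre) > k * spread
}

# Same values as the percentage_missing column in the second example.
percentage_missing <- c(rep(0.05, 25), 0.99)

# With k = 2 only the last survey (99% missing) is flagged.
flagged_k2 <- which(flag_by_sd(percentage_missing, k = 2))
print(flagged_k2)

# With a much larger k = 5 nothing is flagged for this data.
flagged_k5 <- which(flag_by_sd(percentage_missing, k = 5))
print(flagged_k5)
```

Under this assumed rule, a larger factor widens the accepted band around the mean, so fewer surveys are flagged.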
Value
Returns a list with the checked dataset stored as checked_dataset and a dataframe containing the log of flagged values.
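The sketch below shows one way the returned list might be used. The list is built by hand as a stand-in, assuming the log is stored under the name given by log_name; it is not produced by check_percentage_missing().

```r
# Stand-in for the documented return value, built by hand for illustration.
# Assumption: the log element is named after log_name ("percentage_missing_log").
result <- list(
  checked_dataset = data.frame(
    uuid = c("y", "z"),
    percentage_missing = c(0.05, 0.99)
  ),
  percentage_missing_log = data.frame(
    uuid = "z",
    issue = "Percentages of missing values from this survey is different from others",
    question = "percentage_missing",
    old_value = "0.99"
  )
)

# The checked data and the log are separate list elements.
print(names(result))

# Pull the flagged uuids from the log and look up those rows in the dataset.
flagged_uuids <- result$percentage_missing_log$uuid
flagged_rows <- result$checked_dataset[result$checked_dataset$uuid %in% flagged_uuids, ]
print(flagged_rows)
```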
Examples
# Adding the percentage missing first
data_example <- data.frame(
  uuid = letters[1:3],
  col_1 = c(1:3),
  col_2 = c(NA, NA, "expenditures"),
  col_3 = c("with need", NA, "with need"),
  col_4 = c("food health school", NA, "food"),
  col_4.food = c(1, NA, 1),
  col_4.health = c(1, NA, 0),
  col_4.school = c(1, NA, 0)
)
data_example <- data_example %>%
  add_percentage_missing()
data_example %>%
  check_percentage_missing() |>
  knitr::kable()
#> [1] "checking_percentage_missing"
#>
#>
#> |uuid | col_1|col_2 |col_3 |col_4 | col_4.food| col_4.health| col_4.school| percentage_missing|
#> |:----|-----:|:------------|:---------|:------------------|----------:|------------:|------------:|------------------:|
#> |a | 1|NA |with need |food health school | 1| 1| 1| 0.125|
#> |b | 2|NA |NA |NA | NA| NA| NA| 0.750|
#> |c | 3|expenditures |with need |food | 1| 0| 0| 0.000|
#>
#> |uuid |issue |question |old_value |
#> |:----|:-----|:--------|:---------|
# With a dataset that already has a percentage_missing column
data_example2 <- data.frame(
  uuid = letters,
  any_cols = LETTERS,
  any_number = 1:26,
  percentage_missing = c(rep(.05, 25), .99)
)
data_example2 %>%
  check_percentage_missing() |>
  knitr::kable()
#> [1] "checking_percentage_missing"
#>
#>
#> |uuid |any_cols | any_number| percentage_missing|
#> |:----|:--------|----------:|------------------:|
#> |a |A | 1| 0.05|
#> |b |B | 2| 0.05|
#> |c |C | 3| 0.05|
#> |d |D | 4| 0.05|
#> |e |E | 5| 0.05|
#> |f |F | 6| 0.05|
#> |g |G | 7| 0.05|
#> |h |H | 8| 0.05|
#> |i |I | 9| 0.05|
#> |j |J | 10| 0.05|
#> |k |K | 11| 0.05|
#> |l |L | 12| 0.05|
#> |m |M | 13| 0.05|
#> |n |N | 14| 0.05|
#> |o |O | 15| 0.05|
#> |p |P | 16| 0.05|
#> |q |Q | 17| 0.05|
#> |r |R | 18| 0.05|
#> |s |S | 19| 0.05|
#> |t |T | 20| 0.05|
#> |u |U | 21| 0.05|
#> |v |V | 22| 0.05|
#> |w |W | 23| 0.05|
#> |x |X | 24| 0.05|
#> |y |Y | 25| 0.05|
#> |z |Z | 26| 0.99|
#>
#> |uuid |issue |question |old_value |
#> |:----|:-----------------------------------------------------------------------|:------------------|:---------|
#> |z |Percentages of missing values from this survey is different from others |percentage_missing |0.99 |