
This function flags surveys whose percentage of missing values differs from that of the other surveys. The missing-values column can be created with add_percentage_missing, and the values are flagged with check_outliers.
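This page does not describe the exact rules of add_percentage_missing (for example, which columns it counts). As an illustration only, the percentages in the first example below can be reproduced by taking the share of NA cells per row across all eight columns. The helper percentage_missing_sketch is hypothetical and not part of the package:

```r
# Hypothetical stand-in for add_percentage_missing(), for illustration only.
# ASSUMPTION: the percentage is the share of NA cells per row, counted
# across every column of the dataset (uuid included).
percentage_missing_sketch <- function(df) {
  # is.na() on a data frame returns a logical matrix; rowMeans() turns it
  # into the proportion of missing cells in each row
  df$percentage_missing <- rowMeans(is.na(df))
  df
}

data_example <- data.frame(
  uuid = letters[1:3],
  col_1 = c(1:3),
  col_2 = c(NA, NA, "expenditures"),
  col_3 = c("with need", NA, "with need"),
  col_4 = c("food health school", NA, "food"),
  col_4.food = c(1, NA, 1),
  col_4.health = c(1, NA, 0),
  col_4.school = c(1, NA, 0)
)

percentage_missing_sketch(data_example)$percentage_missing
#> [1] 0.125 0.750 0.000
```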

Usage

check_percentage_missing(
  dataset,
  uuid_column = "uuid",
  column_to_check = "percentage_missing",
  strongness_factor = 2,
  log_name = "percentage_missing_log"
)

Arguments

dataset

a dataset to be checked, as a dataframe or as a list with the dataframe stored as "checked_dataset".

uuid_column

uuid column in the dataset. Default is "uuid".

column_to_check

a character string with the name of the column to check. Default is "percentage_missing".

strongness_factor

Strongness factor that defines how strong a value must be as an outlier to be flagged. The default is 2.

log_name

name of the log of flagged values. Default is "percentage_missing_log".

Value

A list with the checked dataset stored as checked_dataset and a dataframe with the log of flagged values, stored under the name given by log_name.
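To make the role of strongness_factor and the shape of the returned list concrete, here is a minimal sketch under an assumed rule: a survey is flagged when its value lies more than strongness_factor sample standard deviations from the column mean. This rule is an assumption, not necessarily what check_outliers applies, and flag_percentage_missing_sketch is a hypothetical helper, not part of the package:

```r
# Hypothetical sketch of the flagging step, for illustration only.
# ASSUMPTION: a survey is flagged when its value lies more than
# strongness_factor sample standard deviations from the column mean.
# The rule actually applied by check_outliers() may differ.
flag_percentage_missing_sketch <- function(dataset,
                                           uuid_column = "uuid",
                                           column_to_check = "percentage_missing",
                                           strongness_factor = 2,
                                           log_name = "percentage_missing_log") {
  values <- dataset[[column_to_check]]
  # z-score of each survey's percentage of missing values
  z <- (values - mean(values, na.rm = TRUE)) / sd(values, na.rm = TRUE)
  # !is.na() also skips NaN, which arises when all values are equal (sd = 0)
  flagged <- !is.na(z) & abs(z) > strongness_factor
  n_flagged <- sum(flagged)

  # One log row per flagged survey, mirroring the log columns in the examples
  log <- data.frame(
    uuid = dataset[[uuid_column]][flagged],
    issue = rep("Percentages of missing values from this survey is different from others", n_flagged),
    question = rep(column_to_check, n_flagged),
    old_value = as.character(values[flagged])
  )

  result <- list(checked_dataset = dataset)
  result[[log_name]] <- log
  result
}

data_example2 <- data.frame(
  uuid = letters,
  any_cols = LETTERS,
  any_number = 1:26,
  percentage_missing = c(rep(.05, 25), .99)
)

res <- flag_percentage_missing_sketch(data_example2)
res$percentage_missing_log$uuid
#> [1] "z"
```

Under this assumed rule, the largest possible |z| with n surveys is (n - 1) / sqrt(n): about 1.15 for three surveys and about 4.90 for 26. A three-row dataset could therefore never be flagged at strongness_factor = 2, and strongness_factor = 5 would leave data_example2 unflagged.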

Examples

 
# Adding the percentage missing first
data_example <- data.frame(
  uuid = letters[1:3],
  col_1 = c(1:3),
  col_2 = c(NA, NA, "expenditures"),
  col_3 = c("with need", NA, "with need"),
  col_4 = c("food health school", NA, "food"),
  col_4.food = c(1, NA, 1),
  col_4.health = c(1, NA, 0),
  col_4.school = c(1, NA, 0)
)

data_example <- data_example |>
  add_percentage_missing()

data_example |>
  check_percentage_missing() |>
  knitr::kable()
#> [1] "checking_percentage_missing"
#> 
#> 
#> |uuid | col_1|col_2        |col_3     |col_4              | col_4.food| col_4.health| col_4.school| percentage_missing|
#> |:----|-----:|:------------|:---------|:------------------|----------:|------------:|------------:|------------------:|
#> |a    |     1|NA           |with need |food health school |          1|            1|            1|              0.125|
#> |b    |     2|NA           |NA        |NA                 |         NA|           NA|           NA|              0.750|
#> |c    |     3|expenditures |with need |food               |          1|            0|            0|              0.000|
#> 
#> |uuid |issue |question |old_value |
#> |:----|:-----|:--------|:---------|


# With a dataset that already has a percentage missing
data_example2 <- data.frame(
  uuid = letters,
  any_cols = LETTERS,
  any_number = 1:26,
  percentage_missing = c(rep(.05, 25), .99)
)

data_example2 |>
  check_percentage_missing() |>
  knitr::kable()
#> [1] "checking_percentage_missing"
#> 
#> 
#> |uuid |any_cols | any_number| percentage_missing|
#> |:----|:--------|----------:|------------------:|
#> |a    |A        |          1|               0.05|
#> |b    |B        |          2|               0.05|
#> |c    |C        |          3|               0.05|
#> |d    |D        |          4|               0.05|
#> |e    |E        |          5|               0.05|
#> |f    |F        |          6|               0.05|
#> |g    |G        |          7|               0.05|
#> |h    |H        |          8|               0.05|
#> |i    |I        |          9|               0.05|
#> |j    |J        |         10|               0.05|
#> |k    |K        |         11|               0.05|
#> |l    |L        |         12|               0.05|
#> |m    |M        |         13|               0.05|
#> |n    |N        |         14|               0.05|
#> |o    |O        |         15|               0.05|
#> |p    |P        |         16|               0.05|
#> |q    |Q        |         17|               0.05|
#> |r    |R        |         18|               0.05|
#> |s    |S        |         19|               0.05|
#> |t    |T        |         20|               0.05|
#> |u    |U        |         21|               0.05|
#> |v    |V        |         22|               0.05|
#> |w    |W        |         23|               0.05|
#> |x    |X        |         24|               0.05|
#> |y    |Y        |         25|               0.05|
#> |z    |Z        |         26|               0.99|
#> 
#> |uuid |issue                                                                   |question           |old_value |
#> |:----|:-----------------------------------------------------------------------|:------------------|:---------|
#> |z    |Percentages of missing values from this survey is different from others |percentage_missing |0.99      |