clean_na
This function handles missing data in a dataset by excluding variables and
filtering observations according to the provided parameters. Variables can be
excluded based on patterns in their names, and observations are dropped when
their percentage of missing data is above a specified threshold. It can also
perform an MCAR (Missing Completely At Random) test if requested.
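The observation filter described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the helper name drop_sparse_rows is hypothetical, and the sketch assumes that an observation sitting exactly at the threshold is kept, since only rows above the threshold are described as removed.

```r
# Hypothetical helper (not part of the package): keep the rows whose
# percentage of missing values does not exceed `threshold`.
drop_sparse_rows <- function(data, threshold = 60) {
  # is.na() on a data frame returns a logical matrix, so rowMeans()
  # gives the share of missing cells per observation.
  pct_missing <- rowMeans(is.na(data)) * 100
  data[pct_missing <= threshold, , drop = FALSE]
}

df <- data.frame(
  a = c(1, NA, NA, 4),
  b = c("x", NA, "z", "w"),
  c = c(NA, NA, 3, 4)
)

# Row 2 is entirely missing (100%), so it is the only row dropped at 50%.
kept <- drop_sparse_rows(df, threshold = 50)
print(kept)
```

Computing the percentage per row, rather than per column, matches the description of missing_threshold as a per-observation limit.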
Usage
clean_na(
  data,
  scenario_based_vars = NULL,
  missing_threshold = 60,
  full_names = FALSE,
  MCAR = FALSE,
  main_vars = NULL,
  answers_only = FALSE
)
Arguments
- data
The dataset to be cleaned.
- scenario_based_vars
Character vector with base names or full names of variables to exclude from the dataset. Can also be a numeric vector with column indices to be removed.
- missing_threshold
Percentage threshold for missing data per observation (defaults to 60 percent).
- full_names
Logical value indicating whether scenario_based_vars contains full variable names (defaults to FALSE).
- MCAR
Logical value indicating whether to perform an MCAR test (defaults to FALSE).
- main_vars
Character vector with base names of variables to include in the MCAR test. If NULL, all variables are included (default is NULL).
- answers_only
Logical value indicating whether to exclude non-answer variables, that is, variables whose names contain neither a number nor 'demo' or 'Demo' (defaults to FALSE). If TRUE, these variables are treated the same way as scenario-based variables.
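The name-based rules behind scenario_based_vars, full_names, and answers_only can be illustrated with a short base R sketch. Both helpers below (is_answer_var and matches_base_name) are hypothetical and are not the package's code. In particular, the sketch assumes that a base name matches any variable whose name contains it as a substring; the actual matching rule may differ.

```r
# Hypothetical helper: an "answer" variable has a digit in its name,
# or contains 'demo' or 'Demo'.
is_answer_var <- function(var_names) {
  grepl("[0-9]", var_names) | grepl("demo|Demo", var_names)
}

# Hypothetical helper: flag variables selected by `base_names`.
matches_base_name <- function(var_names, base_names, full_names = FALSE) {
  if (full_names) {
    # Full names must match exactly.
    return(var_names %in% base_names)
  }
  # Assumption: a base name matches when it occurs anywhere in the name.
  hits <- lapply(base_names, function(b) grepl(b, var_names, fixed = TRUE))
  Reduce(`|`, hits, rep(FALSE, length(var_names)))
}

vars <- c("Q1_scenario", "Q2", "demo_age", "timestamp", "scenario_intro")

answer_flags <- is_answer_var(vars)
scenario_flags <- matches_base_name(vars, "scenario")
exact_flags <- matches_base_name(vars, "Q2", full_names = TRUE)

# Variables that would be excluded with answers_only = TRUE in this sketch.
print(vars[!answer_flags])
```

Under these assumptions, "timestamp" and "scenario_intro" count as non-answer variables, while the base name "scenario" selects both "Q1_scenario" and "scenario_intro".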
Examples
if (FALSE) {
  # Build a small data frame with missing values
  df <- data.frame(a = c(1, 2, NA, 4, 5),
                   b = c("one", "two", "three", NA, "five"),
                   c = c(NA, NA, 3, 4, 5))
  # Drop observations with more than 50% missing data and run the MCAR test
  clean_na(df, missing_threshold = 50, MCAR = TRUE, answers_only = FALSE)
}