---
title: "autoEDA"
author: "Prof. Eric A. Suess"
format: html
---

## Introduction

Let's try out a number of ways to begin looking at a dataframe to determine what is in it.

We need to know the answers to the following questions:

* How many rows and columns are there?
* How many numeric and categorical variables are there in the dataset?
* What percentage of the data is missing? What percentage of each column is missing?
* After selecting the variables we are interested in working with, how much data is left after dropping the NAs?
* Are there ways to impute data for the variables that have enough observations to produce imputed values?

Can we use visualization to help explore the answers to these questions?

## Some Examples

```{r}
library(pacman)
# p_load() installs any missing packages and then attaches them.
# Amelia and naniar are called with :: below, but they still need to be installed.
p_load(palmerpenguins, NHANES, nycflights13, skimr, tidyverse, DataExplorer, dataReporter,
       Amelia, naniar)
```

## Let's try *skim()*

```{r}
data(mtcars)
head(mtcars)
skim(mtcars)
```

```{r}
data(penguins)
head(penguins)
skim(penguins)
```

```{r}
help(NHANES)
data("NHANES")
head(NHANES)
skim(NHANES)
```

```{r}
data(diamonds)
head(diamonds)
skim(diamonds)
```

```{r}
data(flights)
head(flights)
skim(flights)
```

## Visualize the missing values

The `Amelia` and `naniar` packages each provide a plot showing where the missing values fall in a dataframe.

```{r}
Amelia::missmap(mtcars)
naniar::vis_miss(mtcars)
```

```{r}
Amelia::missmap(penguins)
naniar::vis_miss(penguins)
```

```{r}
Amelia::missmap(NHANES)
naniar::vis_miss(NHANES)
```

```{r}
naniar::gg_miss_var(NHANES)
```

```{r}
Amelia::missmap(diamonds)
naniar::gg_miss_var(diamonds)
```

Because `flights` has more than 300,000 rows, we plot a random sample of 10,000 rows. Drawing the sample once means both plots show the same rows.

```{r}
set.seed(123)  # make the sample reproducible
flights_sample <- slice_sample(flights, n = 10000)
Amelia::missmap(flights_sample)
naniar::vis_miss(flights_sample)
```

## Try DataExplorer

Note that we can specify the name of the output file with the `output_file` argument. The optional `y` argument names a response variable, which DataExplorer passes to its plotting functions.

```{r}
create_report(mtcars, output_file = "report_mtcars_01.html")
create_report(mtcars, y = "mpg", output_file = "report_mtcars_02.html")
```

```{r}
# Give each NHANES report its own file so the second does not overwrite the first.
create_report(NHANES, y = "SleepHrsNight", output_file = "report_NHANES_SleepHrsNight.html")
```

```{r}
create_report(NHANES, y = "SleepTrouble", output_file = "report_NHANES_SleepTrouble.html")
```
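## Answering the questions directly in base R

The questions listed in the introduction can also be answered with a few lines of base R, without any report-generating package. The chunk below is a minimal sketch, not the author's method. It assumes R's built-in `airquality` data, which is not one of the datasets used above but ships with R and contains missing values. The selected variables and the mean imputation are illustrative assumptions only.

```{r}
# Minimal base-R sketch answering the introduction's questions.
# Assumption: uses the built-in `airquality` data (not one of the datasets above)
# because it ships with R and contains missing values.
aq <- airquality
aq$Month <- factor(aq$Month)  # treat Month as categorical for illustration

# How many rows and columns?
n_rows <- nrow(aq)
n_cols <- ncol(aq)

# How many numeric and categorical variables?
is_num <- vapply(aq, is.numeric, logical(1))
n_numeric <- sum(is_num)
n_categorical <- sum(!is_num)

# What percentage of the data is missing, overall and per column?
pct_missing_total  <- 100 * mean(is.na(aq))
pct_missing_by_col <- 100 * colMeans(is.na(aq))

# After selecting variables of interest (a hypothetical choice), how many
# rows remain once rows with an NA in any of those variables are dropped?
vars <- c("Ozone", "Solar.R", "Temp")
n_complete <- sum(complete.cases(aq[, vars]))

# Naive mean imputation for one column: a simple baseline only, since it
# shrinks the column's variance; model-based imputation is usually preferable.
aq_imputed <- aq
aq_imputed$Ozone[is.na(aq_imputed$Ozone)] <- mean(aq$Ozone, na.rm = TRUE)

cat("Rows:", n_rows, " Columns:", n_cols, "\n")
cat("Numeric:", n_numeric, " Categorical:", n_categorical, "\n")
cat("Overall % missing:", round(pct_missing_total, 2), "\n")
print(round(pct_missing_by_col, 2))
cat("Complete rows for", paste(vars, collapse = ", "), ":", n_complete, "\n")
```

With the tidyverse loaded as above, the row count after dropping NAs can be computed in the same spirit, for example `nrow(drop_na(select(penguins, bill_length_mm, sex)))`.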