Exploratory Data Analysis (EDA)
Exploratory data analysis (EDA) is an approach to analyzing datasets and summarizing their main characteristics, often with visual methods, in order to draw useful conclusions.
The purpose of EDA is to:
- Discover patterns within a dataset
- Spot anomalies
- Form hypotheses regarding the behavior of data
- Validate assumptions
Everything from basic summary statistics to complex visualizations helps build an intuitive understanding of the data, which is essential for forming new hypotheses and for uncovering which features affect the target variable. Seeing how the target variable varies across a single feature often indicates how important that feature might be, while its variation across a combination of several features can suggest new informative features to engineer.
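As a minimal sketch of these ideas, the snippet below computes basic summary statistics for one numeric feature, then measures how a target varies across a single feature and across a combination of two features. The dataset, the field names (`plan`, `region`, `monthly_spend`, `churned`), and the helper `target_rate_by` are all hypothetical and exist only for illustration. It uses only the Python standard library to stay self-contained; a dataframe library would typically be used for this in practice.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical toy records; field names and values are made up for illustration.
rows = [
    {"plan": "basic",   "region": "north", "monthly_spend": 20.0, "churned": 1},
    {"plan": "basic",   "region": "south", "monthly_spend": 25.0, "churned": 1},
    {"plan": "premium", "region": "north", "monthly_spend": 80.0, "churned": 0},
    {"plan": "premium", "region": "south", "monthly_spend": 75.0, "churned": 0},
    {"plan": "basic",   "region": "north", "monthly_spend": 22.0, "churned": 1},
    {"plan": "premium", "region": "south", "monthly_spend": 90.0, "churned": 0},
    {"plan": "basic",   "region": "south", "monthly_spend": 24.0, "churned": 0},
    {"plan": "premium", "region": "north", "monthly_spend": 85.0, "churned": 0},
]

# Basic summary statistics for one numeric feature.
spend = [r["monthly_spend"] for r in rows]
spend_summary = {
    "min": min(spend),
    "median": median(spend),
    "mean": mean(spend),
    "max": max(spend),
}


def target_rate_by(records, keys, target="churned"):
    """Return the mean of the target within each group defined by `keys`."""
    groups = defaultdict(list)
    for r in records:
        groups[tuple(r[k] for k in keys)].append(r[target])
    return {group: mean(values) for group, values in groups.items()}


# Variation of the target across a single feature.
churn_by_plan = target_rate_by(rows, ["plan"])

# Variation of the target across a combination of two features.
churn_by_plan_region = target_rate_by(rows, ["plan", "region"])

print(spend_summary)
print(churn_by_plan)
print(churn_by_plan_region)
```

In this made-up data, the churn rate differs sharply between the two plans, which would flag `plan` as a feature worth a closer look. The two-feature breakdown shows that the difference within the basic plan also depends on region, the kind of observation that might motivate a combined feature.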
Most explorations and visualizations are intended to understand the relationship between...