Exploratory Data Analysis (EDA) is an approach to analyzing datasets that summarizes their main characteristics, often using visual methods. EDA helps users to:
- Understand the Data’s Structure: Gain insights into the organization and format of the data.
- Detect Patterns: Identify trends and patterns within the data.
- Decide on Statistical Techniques: Choose appropriate statistical methods by examining distributions and correlations.
- Select Variables: Determine which variables to include in further analysis.
- Handle Data Quality: Find and address data quality and integrity issues, such as missing values, duplicate records, and inconsistent types.
- Spot Anomalies and Outliers: Identify unusual data points that may affect analysis.
- Generate and Test Hypotheses: Formulate hypotheses and validate them using statistical methods.
- Check Assumptions: Verify assumptions through statistical summaries and graphical representations.
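
Two of the goals above, handling data quality and spotting outliers, can be sketched with the Python standard library alone. This is a minimal illustration, not a prescribed workflow: the `readings` list is made-up sample data, and the 1.5 × IQR cutoff (Tukey's rule) is one common convention among several.

```python
import statistics

# Hypothetical sample: readings with one missing value and one extreme value.
readings = [10.0, 12.0, 11.0, None, 13.0, 12.0, 95.0, 11.0]

# Data quality: count missing entries, then drop them before summarizing.
missing_count = sum(1 for r in readings if r is None)
clean = [r for r in readings if r is not None]

# Outliers: flag points more than 1.5 * IQR outside the quartiles.
# statistics.quantiles(n=4) returns the three quartile cut points.
q1, _, q3 = statistics.quantiles(clean, n=4)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [r for r in clean if r < lower or r > upper]

print(f"missing: {missing_count}, outliers: {outliers}")
```

Whether a flagged point is an error or a genuine extreme is a judgment call that depends on the dataset; the rule only surfaces candidates for inspection.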
Common Techniques Used in EDA
- Descriptive Statistics: Calculating measures such as mean, median, mode, standard deviation, and percentiles to summarize data.
- Data Visualization: Using plots and charts like histograms, box plots, scatter plots, and bar charts to visually explore data.
- Correlation Analysis: Assessing relationships between variables using correlation coefficients and scatter plots.
- Data Transformation: Applying transformations to data, such as normalization or log transformation, to better understand its characteristics.
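
As a rough sketch of three of these techniques (descriptive statistics, correlation analysis, and a log transformation), the example below uses only the standard library. The data values and the `pearson` helper are hypothetical and written for illustration; in practice a library such as pandas would typically handle these steps.

```python
import math
import statistics

# Hypothetical paired observations, chosen so that y is exactly 2 * x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0]

# Descriptive statistics: central tendency and spread of x.
summary = {
    "mean": statistics.mean(x),
    "median": statistics.median(x),
    "stdev": statistics.stdev(x),  # sample standard deviation
}

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)

# Correlation analysis: a perfectly linear relationship gives r close to 1.
r = pearson(x, y)

# Data transformation: log10 compresses values spanning orders of magnitude.
skewed = [1.0, 10.0, 100.0, 1000.0]
logged = [math.log10(v) for v in skewed]

print(summary, round(r, 3), logged)
```

The log transform requires strictly positive values; data containing zeros or negatives would need a different transformation or an offset.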
Implementation
In ML_Tools see: