Exploratory Data Analysis (EDA) is an approach to analyzing datasets that summarizes their main characteristics, often with visual methods. EDA helps analysts to:

  • Understand the Data’s Structure: Gain insights into the organization and format of the data.
  • Detect Patterns: Identify trends and patterns within the data.
  • Decide on Statistical Techniques: Choose appropriate statistical methods by examining distributions and correlations.
  • Select Variables: Determine which variables to include in further analysis.
  • Handle Data Quality: Find and address problems such as missing values, duplicate records, and inconsistent entries.
  • Spot Anomalies and Outliers: Identify unusual data points that may affect analysis.
  • Generate and Test Hypotheses: Formulate hypotheses from what the data shows and test them with statistical methods.
  • Check Assumptions: Verify the assumptions behind planned methods (for example, normality) using statistical summaries and graphical representations.

Common Techniques Used in EDA

  • Descriptive Statistics: Calculating measures such as mean, median, mode, standard deviation, and percentiles to summarize data.
  • Data Visualization: Using plots and charts like histograms, box plots, scatter plots, and bar charts to visually explore data.
  • Correlation Analysis: Assessing relationships between variables using correlation coefficients and scatter plots.
  • Data Transformation: Applying transformations such as normalization or a log transformation to make the data's structure easier to see (for example, a log transform compresses a long right tail).
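The descriptive statistics listed above can be sketched with Python's standard statistics module. The describe helper and the sample scores are hypothetical. Percentiles here come from statistics.quantiles with its default "exclusive" method; other tools may use a different interpolation rule and return slightly different values.

```python
import statistics

def describe(values):
    """Summarize a numeric sample with common descriptive statistics."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # 25th and 75th percentiles
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "stdev": statistics.stdev(values),  # sample standard deviation (n - 1)
        "p25": q1,
        "p75": q3,
    }

# Hypothetical sample
scores = [2, 4, 4, 4, 5, 5, 7, 9]
for name, value in describe(scores).items():
    print(f"{name}: {value:.2f}")
```

Comparing the mean with the median is a quick skew check: here the mean (5) sits above the median (4.5), hinting at a longer right tail.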
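Correlation analysis and data transformation interact, as the following sketch shows. It uses Pearson's r, one common correlation coefficient that measures linear association (Spearman and Kendall are rank-based alternatives). The data is hypothetical: y doubles with each step of x, so the raw relationship is curved, a log transform makes it exactly linear, and a min-max normalization, being a linear rescaling, leaves r unchanged. The pearson and min_max helpers are written out here for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def min_max(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical data: y doubles with each step of x (multiplicative growth)
x = [1, 2, 3, 4, 5, 6]
y = [2 ** v for v in x]
log_y = [math.log(v) for v in y]

print(f"r(x, y)        = {pearson(x, y):.3f}")
print(f"r(x, log y)    = {pearson(x, log_y):.3f}")       # log makes the trend linear
print(f"r(x, minmax y) = {pearson(x, min_max(y)):.3f}")  # linear rescaling leaves r unchanged
```

A scatter plot of x against y would show the same curvature; the coefficient alone can hide it, which is why correlation coefficients are usually read alongside plots.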

Implementation

In ML_Tools see: