A boxplot, also known as a whisker plot, is a standardized way of displaying the distribution of data based on a five-number summary: minimum, first quartile (Q1), median, third quartile (Q3), and maximum. It can also highlight outliers in the dataset.

Key Components

Uses:

  • Identifying Outliers.
  • Understanding the spread and skewness of the data Distributions.
  • Comparing distributions across different categories.
  • Need to remove then in order to do Data Cleansing.

Components:

  • Minimum: The smallest data point excluding outliers.
  • First Quartile (Q1): The median of the lower half of the dataset.
  • Median (Q2): The middle value of the dataset.
  • Third Quartile (Q3): The median of the upper half of the dataset.
  • Maximum: The largest data point excluding outliers.
  • Outliers: Data points that fall outside 1.5 times the interquartile range (IQR) above Q3 or below Q1.

Implementing Boxplot in Python

You can create a boxplot in Python using libraries like Matplotlib and Seaborn. Here’s how you can do it:

Implementation

import matplotlib.pyplot as plt
 
# Sample data
data = [20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40]
 
# Create a boxplot
plt.boxplot(data)
 
# Add title and labels
plt.title('Boxplot Example')
plt.ylabel('Values')
 
# Show plot
plt.show()
import seaborn as sns
import matplotlib.pyplot as plt
 
# Sample data
data = [20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40]
 
# Create a boxplot
sns.boxplot(data=data)
 
# Add title and labels
plt.title('Boxplot Example')
plt.ylabel('Values')
 
# Show plot
plt.show()