In pandas, the `melt` function is used to ==transform a DataFrame from a wide format to a long format==. This is especially useful for data analysis and visualization tasks where long-format data is preferred or required. The wide format typically has one column per variable, whereas the long format has a single column for variable names and a single column for values.
Key Reasons to Use `melt`:
- Wide to Long Transformation: `melt` converts data with many columns (wide format) into a more normalized form with fewer columns (long format). Many statistical and visualization libraries prefer long-format data.
- Easier Analysis and Data Visualisation:
  - Compatibility with Plotting Libraries: Plotting libraries such as `seaborn` and `ggplot` expect long-format data for certain types of plots, such as grouped plots.
- Simplifying Complex Data Structures:
  - Handling Multi-index Columns: If a DataFrame has multiple levels of columns, `melt` can help flatten this structure, making it easier to work with.
- Preparation for Aggregation:
  - Facilitating Groupby Operations: Long-format data is often better suited to `groupby` operations, because a single variable column can serve as the grouping key.
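As a rough illustration of the multi-index point above, the sketch below melts a small DataFrame whose columns have two levels. The data, the level names (`subject`, `term`) and the index name `student` are made up for this example, and `ignore_index=False` assumes pandas 1.1 or newer.

```python
import pandas as pd

# Hypothetical wide table with two column levels: (subject, term)
columns = pd.MultiIndex.from_product(
    [["math", "science"], ["term1", "term2"]],
    names=["subject", "term"],
)
df_multi = pd.DataFrame([[88, 90, 85, 87], [92, 94, 90, 91]], columns=columns)
df_multi.index.name = "student"

# Each column level becomes its own variable column, named after the level.
# ignore_index=False keeps the original row index so it can be restored
# as a regular column afterwards.
long_multi = df_multi.melt(ignore_index=False, value_name="score").reset_index()

print(long_multi)
```

The result has flat columns (`student`, `subject`, `term`, `score`), one row per student, subject and term combination.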
Parameters of `melt`:
- `id_vars`: Columns to use as identifier variables. These columns are kept as-is in the output.
- `value_vars`: Columns to unpivot. These columns are transformed into a single column.
- `var_name`: Name to use for the `variable` column in the output.
- `value_name`: Name to use for the `value` column in the output.
Example Usage:
Consider a DataFrame in wide format:
```python
import pandas as pd

# Sample wide format data
data = {
    'id': [1, 2, 3],
    'math_score': [88, 92, 95],
    'science_score': [85, 90, 89],
    'english_score': [78, 85, 88]
}
df_wide = pd.DataFrame(data)
print(df_wide)
```
Output:
```
   id  math_score  science_score  english_score
0   1          88             85             78
1   2          92             90             85
2   3          95             89             88
```
To convert this wide-format DataFrame into a long-format DataFrame using `melt`:
```python
# Melt the DataFrame: keep 'id' as the identifier and unpivot the score columns
df_long = pd.melt(df_wide, id_vars=['id'],
                  value_vars=['math_score', 'science_score', 'english_score'],
                  var_name='subject', value_name='score')
print(df_long)
```
Output:
```
   id        subject  score
0   1     math_score     88
1   2     math_score     92
2   3     math_score     95
3   1  science_score     85
4   2  science_score     90
5   3  science_score     89
6   1  english_score     78
7   2  english_score     85
8   3  english_score     88
```
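One way to see why the long format helps with aggregation is to group the melted data by subject. The sketch below rebuilds the same sample data so it runs on its own. The variable name `subject_means` is just an illustrative choice.

```python
import pandas as pd

# Same sample data as in the example above
df_wide = pd.DataFrame({
    'id': [1, 2, 3],
    'math_score': [88, 92, 95],
    'science_score': [85, 90, 89],
    'english_score': [78, 85, 88],
})
df_long = df_wide.melt(id_vars='id', var_name='subject', value_name='score')

# With one 'subject' column, a single groupby covers every subject,
# however many score columns the wide table had.
subject_means = df_long.groupby('subject')['score'].mean()
print(subject_means)
```

In the wide format the same calculation would need each score column to be named explicitly, or a column-wise `mean()` that has to skip the `id` column.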