Data transformation is the process of converting data from one format to another.
Data transformation may involve:
- Data Cleansing: Removing inconsistencies and errors.
- Structuring: Organizing raw data.
- Aggregation: Summarizing data for analysis (e.g., with a pandas pivot table).
- Selection: Choosing the subset of rows and columns relevant to the analysis.
- Joining: Merging datasets for completeness (also see SQL Joins).
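The steps above can be sketched in plain Python, assuming records arrive as a list of dictionaries. Everything here is hypothetical: the sample data, the field names, and the helpers `cleanse`, `select`, `aggregate_sum` and `inner_join`. Treating the two identical South rows as duplicates is also an assumption, since in real data they might be two legitimate sales. In pandas the same steps would usually be written with `drop_duplicates`, boolean indexing, `pivot_table` and `merge`.

```python
from collections import defaultdict

# Hypothetical raw records: inconsistent casing and whitespace,
# one row with a missing amount, and one exact duplicate.
raw_sales = [
    {"region": " North ", "product": "widget", "amount": "120"},
    {"region": "north", "product": "Widget", "amount": "80"},
    {"region": "South", "product": "gadget", "amount": ""},
    {"region": "South", "product": "gadget", "amount": "50"},
    {"region": "South", "product": "gadget", "amount": "50"},
]
# Hypothetical second dataset to join against.
region_managers = [
    {"region": "north", "manager": "Avery"},
    {"region": "south", "manager": "Blake"},
]


def cleanse(rows):
    """Trim and lower-case text, drop rows without an amount, drop exact duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        if not row["amount"].strip():
            continue  # incomplete record
        key = (row["region"].strip().lower(),
               row["product"].strip().lower(),
               float(row["amount"]))
        if key in seen:
            continue  # exact duplicate after normalization
        seen.add(key)
        cleaned.append({"region": key[0], "product": key[1], "amount": key[2]})
    return cleaned


def select(rows, columns, predicate):
    """Keep rows that satisfy `predicate`, projected onto `columns`."""
    return [{c: row[c] for c in columns} for row in rows if predicate(row)]


def aggregate_sum(rows, key, value):
    """Sum `value` for each distinct `key` (a one-dimensional pivot)."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row[value]
    return [{key: k, "total": v} for k, v in totals.items()]


def inner_join(left, right, key):
    """Merge each left row with every right row sharing the same `key` value."""
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    return [{**l_row, **r_row} for l_row in left for r_row in index.get(l_row[key], [])]


cleaned = cleanse(raw_sales)
selected = select(cleaned, ["region", "amount"], lambda r: r["amount"] >= 50)
totals = aggregate_sum(selected, "region", "amount")
report = inner_join(totals, region_managers, "region")
print(report)
# [{'region': 'north', 'total': 200.0, 'manager': 'Avery'},
#  {'region': 'south', 'total': 50.0, 'manager': 'Blake'}]
```

The join is an inner join, so a region with no matching manager would be dropped; a left join would keep it with a missing manager instead.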
Other transformations:
- Normalizing: Standardizing data distributions for consistency. See Normalisation.
- Sorting: Arranging data in a logical order.
- Enriching: Adding external or missing information.
- Validating: Ensuring data integrity and accuracy.
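A minimal sketch of this second group of transformations, again in plain Python with hypothetical data and helper names. Normalizing is shown here as z-score standardization (subtract the mean, divide by the population standard deviation), which is one common choice among several. The `sensor_locations` dictionary is an assumption standing in for whatever external source would supply enrichment data.

```python
import math
import statistics

# Hypothetical sensor readings, deliberately out of order.
readings = [
    {"sensor": "b", "value": 30.0},
    {"sensor": "a", "value": 10.0},
    {"sensor": "c", "value": 20.0},
]
# Hypothetical lookup standing in for an external enrichment source;
# sensor "c" is deliberately missing so validation has something to catch.
sensor_locations = {"a": "roof", "b": "basement"}


def standardize(rows, field):
    """Z-score: (x - mean) / population standard deviation; 0.0 if all values are equal."""
    values = [row[field] for row in rows]
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return [{**row, field: (row[field] - mean) / std if std else 0.0} for row in rows]


def enrich(rows, lookup):
    """Add a "location" field from `lookup`, or None when the sensor is unknown."""
    return [{**row, "location": lookup.get(row["sensor"])} for row in rows]


def validate(rows):
    """Return a list of problem descriptions; an empty list means every row passed."""
    problems = []
    for i, row in enumerate(rows):
        if row["location"] is None:
            problems.append(f"row {i}: missing location")
        if not math.isfinite(row["value"]):
            problems.append(f"row {i}: non-finite value")
    return problems


normalized = standardize(readings, "value")
ordered = sorted(normalized, key=lambda row: row["sensor"])  # logical order: by sensor id
enriched = enrich(ordered, sensor_locations)
problems = validate(enriched)
print(problems)  # ['row 2: missing location']
```

Validation here reports problems rather than raising, so the caller can decide whether to drop, repair, or reject the offending rows.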
Key Aspects:
- Normalization & Scaling: Adjusting numerical values to a consistent range. See Normalisation of data.
- Data Type Conversion: Changing data types (e.g., converting strings to integers).
- Schema Normalization: Ensuring a consistent data structure for efficiency.
- File Format Conversion: Transforming data between formats (e.g., from .xls to .csv).
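The sketch below illustrates three of these aspects with the standard library: converting string fields to typed values, min-max scaling a numeric column to the range [0, 1] via (x - min) / (max - min), and converting CSV text to JSON. Reading .xls files requires a third-party library (for example pandas' `read_excel`), so the example starts from CSV instead. The sample data, the `SCHEMA` mapping and the helper functions are hypothetical.

```python
import csv
import io
import json

# Hypothetical CSV export in which every field arrives as a string.
CSV_TEXT = """id,name,price
1,Widget,19.99
2,Gadget,5.00
3,Doohickey,12.50
"""

# Hypothetical target schema: column name -> conversion function.
SCHEMA = {"id": int, "name": str, "price": float}


def convert_types(rows, schema):
    """Apply each column's converter, e.g. "19.99" -> 19.99 (raises ValueError on bad input)."""
    return [{col: convert(row[col]) for col, convert in schema.items()} for row in rows]


def min_max_scale(rows, field, new_field):
    """Add `new_field` = (x - min) / (max - min), so values fall in [0, 1]."""
    values = [row[field] for row in rows]
    lo, hi = min(values), max(values)
    span = hi - lo
    return [{**row, new_field: (row[field] - lo) / span if span else 0.0} for row in rows]


rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))  # parse CSV text into dicts of strings
typed = convert_types(rows, SCHEMA)
scaled = min_max_scale(typed, "price", "price_scaled")
json_text = json.dumps(scaled, indent=2)  # file format conversion: CSV -> JSON
print(json_text)
```

Keeping the conversions in a single schema mapping is one way to make type conversion and schema normalization explicit: every expected column, and its type, is declared in one place.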
Related: