Explorations\Preprocess\One_hot_encoding\One_hot_encoding.py
This script demonstrates how to preprocess categorical variables and apply linear regression for house price prediction. Key steps include:
- Data Loading: It loads a dataset of house prices.
- Dummy Variables: It creates dummy variables for the ‘town’ column using pd.get_dummies() and merges them with the original dataframe.
- Dummy Variable Trap: It drops one dummy variable to avoid multicollinearity (the dummy variable trap).
- Feature and Target Split: It separates the dataset into features (X) and the target variable (price).
- Model Training: A Linear Regression model is trained on the data.
- Predictions: It predicts house prices based on various features and evaluates the model’s accuracy (for linear regression this is typically the R² score).
- Label Encoding and One-Hot Encoding: It applies LabelEncoder to convert ‘town’ names into numbers and uses OneHotEncoder to create dummy variables for categorical columns.
- Final Predictions: It predicts prices using the transformed features and evaluates the model’s performance.
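
The steps above can be sketched roughly as follows. This is a minimal illustration, not the script itself: the town names, areas, and prices are made-up stand-in data, the column names `area` and `price` are assumptions, and the `ColumnTransformer` wrapper is one possible way to apply OneHotEncoder to a single column in current scikit-learn versions.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Hypothetical stand-in data; the script loads its own house-price dataset.
df = pd.DataFrame({
    "town": ["Ashford"] * 3 + ["Brookfield"] * 3 + ["Cedarville"] * 3,
    "area": [2600, 3000, 3200, 2600, 2900, 3100, 2600, 2900, 3300],
    "price": [550000, 565000, 610000, 585000, 615000, 650000,
              575000, 600000, 635000],
})

# Approach 1: pd.get_dummies, merged with the original dataframe.
# drop_first=True removes one dummy column to avoid the dummy variable trap.
dummies = pd.get_dummies(df["town"], drop_first=True, dtype=int)
merged = pd.concat([df, dummies], axis=1)

# Feature and target split.
X1 = merged.drop(columns=["town", "price"])
y = merged["price"]

model1 = LinearRegression().fit(X1, y)
r2_1 = model1.score(X1, y)  # R^2 on the training data

# Predict the price of a 2800 sq ft house in Cedarville.
new_house1 = pd.DataFrame([{"area": 2800, "Brookfield": 0, "Cedarville": 1}])
pred1 = model1.predict(new_house1[X1.columns])[0]

# Approach 2: LabelEncoder turns town names into integer codes,
# then OneHotEncoder expands the codes into dummy columns.
le = LabelEncoder()
encoded = df[["area"]].copy()
encoded["town_code"] = le.fit_transform(df["town"])

ct = ColumnTransformer(
    [("town", OneHotEncoder(drop="first"), ["town_code"])],
    remainder="passthrough",  # keep 'area' unchanged
    sparse_threshold=0,       # always return a dense array
)
X2 = ct.fit_transform(encoded)

model2 = LinearRegression().fit(X2, y)
r2_2 = model2.score(X2, y)

new_house2 = pd.DataFrame({
    "area": [2800],
    "town_code": le.transform(["Cedarville"]),
})
pred2 = model2.predict(ct.transform(new_house2))[0]

print("get_dummies:   R^2 =", round(r2_1, 4), "prediction =", round(pred1, 2))
print("OneHotEncoder: R^2 =", round(r2_2, 4), "prediction =", round(pred2, 2))
```

Both approaches encode the same information, so the two models should produce matching predictions. Note that the score here is computed on the training data; a held-out test split would give a more honest estimate of performance.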