Using SQLite to Process and Split Combined Data from Excel
Summary:
Read and Clean the Data: Load the data from the Excel sheet and clean it.
Split the Data: Separate the data into two DataFrames, one for customers and one for orders.
Create Tables: Create the SQLite tables with appropriate foreign key relationships.
Insert Data: Insert the cleaned and separated data into the respective tables.
Verify Foreign Key: Ensure that the foreign key relationships are valid.
Example Data Structure
Combined Excel Data (Sheet1):
order_id
order_date
customer_id
customer_name
contact_name
country
amount
1
2024-01-15
101
John Doe
Jane Doe
USA
100.50
2
2024-02-20
102
Alice Smith
Bob Smith
Canada
200.00
3
2024-03-10
101
John Doe
Jane Doe
USA
150.75
4
2024-04-05
103
Michael Brown
Sarah Brown
UK
250.00
Steps to Process and Split Data
Step 1: Import Libraries and Read Data
import pandas as pdimport sqlite3# Example data for demonstration purposesdata = { 'order_id': [1, 2, 3, 4], 'order_date': ['2024-01-15', '2024-02-20', '2024-03-10', '2024-04-05'], 'customer_id': [101, 102, 101, 103], 'customer_name': ['John Doe', 'Alice Smith', 'John Doe', 'Michael Brown'], 'contact_name': ['Jane Doe', 'Bob Smith', 'Jane Doe', 'Sarah Brown'], 'country': ['USA', 'Canada', 'USA', 'UK'], 'amount': [100.50, 200.00, 150.75, 250.00]}# Create a DataFrame from the example datadf = pd.DataFrame(data)
Step 2: Clean Data
# Example cleaning functiondef clean_data(df): df.dropna(inplace=True) df.columns = [col.strip().replace(" ", "_").lower() for col in df.columns] return df# Clean the datadf = clean_data(df)
# Connect to SQLite database (or create it)conn = sqlite3.connect('data.db')cursor = conn.cursor()# Enable foreign key supportcursor.execute("PRAGMA foreign_keys = ON")# Create 'customers' tablecursor.execute('''CREATE TABLE IF NOT EXISTS customers ( customer_id INTEGER PRIMARY KEY, customer_name TEXT NOT NULL, contact_name TEXT, country TEXT)''')# Create 'orders' table with a foreign key referencing 'customers'cursor.execute('''CREATE TABLE IF NOT EXISTS orders ( order_id INTEGER PRIMARY KEY, order_date TEXT, customer_id INTEGER, amount REAL, FOREIGN KEY (customer_id) REFERENCES customers (customer_id))''')# Commit changesconn.commit()
Step 5: Insert Data into Tables
# Insert data into 'customers' tablecustomers_df.to_sql('customers', conn, if_exists='append', index=False)# Insert data into 'orders' tableorders_df.to_sql('orders', conn, if_exists='append', index=False)# Commit changes and close the connectionconn.commit()conn.close()
Verification: Ensure Foreign Key Relationships
conn = sqlite3.connect('data.db')cursor = conn.cursor()# Query to check if all customer_id in orders table exist in customers tablecursor.execute('''SELECT order_idFROM ordersWHERE customer_id NOT IN (SELECT customer_id FROM customers)''')invalid_orders = cursor.fetchall()conn.close()if invalid_orders: print("Invalid foreign key references found:", invalid_orders)else: print("All foreign key references are valid.")