Purpose: To import and persist data from a Google Sheets into a Databricks managed Delta Table for querying and integration with Spark-based workflows.
Process Overview:
-
Export the Google Sheet as CSV via its public link:
sheet_url = "https://docs.google.com/spreadsheets/d/.../export?format=csv&gid=0"
-
Read the CSV into a pandas DataFrame:
pdf = pd.read_csv(sheet_url)
-
Convert to a Spark DataFrame:
df = spark.createDataFrame(pdf)
-
Save the data as a managed table in Databricks:
df.write.mode("overwrite").saveAsTable("example.databricks.table_name")
Applications:
- Automating ingestion of lightweight reference data.
- Collaborative data entry pipelines.
- Quick ETL prototypes with spreadsheet-based inputs.