Leader have aspiration however due to bad quality data not feasible.
Prevention: Caught at the source Remediation: Observability, alerting, Complex ETL, erosion of trust Failure: failure on opportunity cost, unable to meet business goals
the sooner we handle it the cheaper and easier it is
Example: Software engineer: unit tests, wan to prevent bugs in production, observaility tools do help
Upstream breaking schema changes
Root cause analysis: Documentation
- 5 Y’s
- Fishbone diagram: start at issue at head
- People and ownership: Who is entering the data: the source data
Why should product people (people producing the data) take on responsibility of the data quality? Do they understand how it is being used and how it generates value.
Data driven value - need to motivate it.
Change management
- Communication, knowledge
Change management - git repo
Build the interface from the data contract.
Questions:
Q: If the data producers are not data teams, but business users. For example dumping data in google sheet or excel. With issues in naming conventions. Any suggestions for these type of data producers? A: How to apply same patterns. Agreement on structure. Alerting and automating the change management.