Data Lineage
Tracking the origin, movement, and transformation of data throughout its lifecycle in an organisation.
In-Depth Explanation
Data lineage tracks where data comes from, how it moves through systems, and what transformations are applied to it along the way. When captured end to end, it provides an audit trail of data from source to consumption.
What lineage tracks:
- Origin: Where data was created or sourced
- Movement: How data flows between systems
- Transformation: Changes applied to data
- Consumption: Where and how data is used
- Timing: When changes occurred
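As a rough illustration of the five items above, a single lineage record could be modelled as a small data structure. This is a minimal sketch, not a standard schema: the class name `LineageEvent`, its field names, and the example dataset names are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Tuple


@dataclass(frozen=True)
class LineageEvent:
    """One hop in a dataset's journey (all names here are illustrative)."""

    source: str                      # origin: where the data came from
    target: str                      # movement: where the data was written
    transformation: str              # transformation: what was done to the data
    consumers: Tuple[str, ...] = ()  # consumption: what reads the target
    occurred_at: datetime = datetime(1970, 1, 1, tzinfo=timezone.utc)  # timing


# Hypothetical example: a nightly load from a CRM into a warehouse table.
event = LineageEvent(
    source="crm.customers",
    target="warehouse.dim_customer",
    transformation="deduplicate on email, normalise phone numbers",
    consumers=("dashboard.customer_overview",),
    occurred_at=datetime(2024, 1, 15, 2, 0, tzinfo=timezone.utc),
)

print(f"{event.source} -> {event.target} at {event.occurred_at.isoformat()}")
```

A chain of such records, one per hop, is enough to reconstruct the path a dataset took from source to consumption.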
Lineage benefits:
- Impact analysis (what breaks if X changes?)
- Root cause analysis (where did bad data come from?)
- Compliance and audit (prove data handling)
- Documentation (understand data flows)
- Trust (confidence in data sources)
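Impact analysis and root cause analysis are both graph traversals over lineage: the first walks downstream from a changed asset, the second walks upstream from a bad one. The sketch below assumes lineage is available as a list of (upstream, downstream) pairs. The dataset names and the helper functions `impact_of` and `root_causes_of` are hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical table-level lineage edges: (upstream, downstream).
EDGES = [
    ("crm.customers", "staging.customers"),
    ("shop.orders", "staging.orders"),
    ("staging.customers", "marts.customer_orders"),
    ("staging.orders", "marts.customer_orders"),
    ("marts.customer_orders", "ml.churn_features"),
    ("marts.customer_orders", "dashboard.revenue"),
]


def build_graph(edges, reverse=False):
    """Build an adjacency map; reverse=True points edges upstream."""
    graph = defaultdict(set)
    for upstream, downstream in edges:
        if reverse:
            graph[downstream].add(upstream)
        else:
            graph[upstream].add(downstream)
    return graph


def reachable(graph, start):
    """Breadth-first search returning every node reachable from start."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


def impact_of(node):
    """Impact analysis: everything downstream that may break if node changes."""
    return sorted(reachable(build_graph(EDGES), node))


def root_causes_of(node):
    """Root cause analysis: every upstream asset that feeds node."""
    return sorted(reachable(build_graph(EDGES, reverse=True), node))


print(impact_of("staging.orders"))
print(root_causes_of("ml.churn_features"))
```

Here a change to `staging.orders` affects the mart, the feature table, and the dashboard, while a problem in `ml.churn_features` could originate in any of the five upstream assets.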
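Lineage implementation (levels of granularity):
- Column-level lineage (most detailed: which source columns feed each output column)
- Table-level lineage (relationships between tables or datasets)
- System-level lineage (data flows between applications and platforms)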
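The three levels are related: coarser lineage can be derived from finer lineage by discarding detail, but not the other way round. A minimal sketch, assuming column names follow a hypothetical `system.table.column` convention and that `roll_up` is an illustrative helper rather than part of any tool:

```python
# Hypothetical column-level edges, each name written as "system.table.column".
COLUMN_EDGES = [
    ("crm.customers.email", "warehouse.dim_customer.email"),
    ("crm.customers.first_name", "warehouse.dim_customer.full_name"),
    ("crm.customers.last_name", "warehouse.dim_customer.full_name"),
    ("shop.orders.total", "warehouse.fct_orders.amount"),
]


def roll_up(edges, parts):
    """Collapse edges by keeping only the first `parts` segments of each name."""

    def truncate(name):
        return ".".join(name.split(".")[:parts])

    rolled = {(truncate(src), truncate(dst)) for src, dst in edges}
    # Drop self-loops created when both ends collapse to the same name.
    return sorted(edge for edge in rolled if edge[0] != edge[1])


table_edges = roll_up(COLUMN_EDGES, parts=2)   # table-level lineage
system_edges = roll_up(COLUMN_EDGES, parts=1)  # system-level lineage

print(table_edges)
print(system_edges)
```

Four column-level edges collapse to two table-level edges and two system-level edges, which is why column-level lineage is the most expensive to capture and the most useful for precise impact analysis.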
Tools:
- dbt (lineage graph generated from model dependencies)
- Apache Atlas
- Data catalog platforms (Alation, Collibra)
- Cloud-native services (AWS Glue, Google Cloud Dataplex)
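As one concrete example from the tools above, dbt writes a `manifest.json` artifact into the project's `target/` directory, and that file can be read to list everything upstream of a model. The sketch below assumes the manifest contains a `parent_map` key that maps each node's unique ID to its direct parents; check the manifest schema for the dbt version in use. The project name `shop`, the node IDs, and the helper names are hypothetical, and the small dictionary stands in for a real manifest so the example runs without a dbt project.

```python
import json
from pathlib import Path


def upstream_nodes(manifest, unique_id):
    """Return every ancestor of a node by walking the manifest's parent_map."""
    parent_map = manifest.get("parent_map", {})
    seen = set()
    stack = [unique_id]
    while stack:
        current = stack.pop()
        for parent in parent_map.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return sorted(seen)


def load_manifest(path="target/manifest.json"):
    """Load a manifest from disk (path assumes dbt's default target directory)."""
    return json.loads(Path(path).read_text())


# Hand-written stand-in for a real manifest; only parent_map is included.
example_manifest = {
    "parent_map": {
        "model.shop.customer_orders": [
            "model.shop.stg_customers",
            "model.shop.stg_orders",
        ],
        "model.shop.stg_customers": ["source.shop.crm.customers"],
        "model.shop.stg_orders": ["source.shop.erp.orders"],
        "source.shop.crm.customers": [],
        "source.shop.erp.orders": [],
    }
}

print(upstream_nodes(example_manifest, "model.shop.customer_orders"))
```

With a real project, `upstream_nodes(load_manifest(), "model.<project>.<model_name>")` would return the same kind of list after a dbt command that writes the manifest has been run.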
Business Context
Lineage answers "can I trust this data?" by showing the path the data took from source to consumption. It is essential for compliance, debugging, and data governance.
How Clever Ops Uses This
We implement data lineage for Australian businesses needing audit trails, compliance documentation, and confidence in their AI training data.
Example Use Case
"Tracing a suspicious ML model prediction back through the data pipeline to identify a data quality issue in a source system."
