Data Lineage

Tracking the origin, movement, and transformation of data throughout its lifecycle in an organisation.

In-Depth Explanation

Data lineage tracks where data comes from, how it moves through systems, and what transformations are applied along the way. When captured end to end, it provides an audit trail of data from source to consumption.

What lineage tracks:

  • Origin: Where data was created/sourced
  • Movement: How data flows between systems
  • Transformation: Changes applied to data
  • Consumption: Where and how data is used
  • Timing: When changes occurred
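
As a rough illustration of the five facets above, a single hop in a dataset's journey could be recorded as one small structure. This is a minimal sketch under stated assumptions: the `LineageEvent` class, its field names, and the example values are hypothetical and do not come from any particular lineage tool.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class LineageEvent:
    """One hop in a dataset's journey (hypothetical schema, for illustration)."""
    origin: str            # origin: where the data was created or sourced
    source_system: str     # movement: system the data left
    target_system: str     # movement: system the data arrived in
    transformation: str    # transformation: change applied during this hop
    consumers: tuple[str, ...] = ()  # consumption: where the output is used
    # timing: when the change occurred (defaults to "now" in UTC)
    occurred_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


# Example values are made up for the sketch.
event = LineageEvent(
    origin="crm.contacts",
    source_system="crm",
    target_system="warehouse",
    transformation="deduplicate on email",
    consumers=("churn_dashboard",),
)
print(event.source_system, "->", event.target_system)
```

Keeping the record immutable (`frozen=True`) reflects the audit-trail role: events are appended, not edited after the fact.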

Lineage benefits:

  • Impact analysis (what breaks if X changes?)
  • Root cause analysis (where did bad data come from?)
  • Compliance and audit (prove data handling)
  • Documentation (understand data flows)
  • Trust (confidence in data sources)
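
Impact analysis and root cause analysis can both be viewed as walks over the same lineage graph, in opposite directions. The sketch below assumes lineage is already available as a simple mapping from each dataset to the datasets it feeds; the dataset names and the `reachable` and `invert` helpers are hypothetical.

```python
from collections import deque

# Hypothetical lineage graph: each key feeds the datasets listed in its value.
DOWNSTREAM = {
    "crm.contacts": ["staging.contacts"],
    "billing.invoices": ["staging.invoices"],
    "staging.contacts": ["marts.customers"],
    "staging.invoices": ["marts.customers", "marts.revenue"],
    "marts.customers": ["churn_model"],
    "marts.revenue": [],
    "churn_model": [],
}


def invert(graph):
    """Flip edge direction so each node maps to the nodes that feed it."""
    upstream = {node: [] for node in graph}
    for parent, children in graph.items():
        for child in children:
            upstream.setdefault(child, []).append(parent)
    return upstream


def reachable(graph, start):
    """Breadth-first walk returning every node reachable from start."""
    seen, queue = set(), deque([start])
    while queue:
        for nxt in graph.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Impact analysis: what could break if billing.invoices changes?
impacted = reachable(DOWNSTREAM, "billing.invoices")
# Root cause analysis: which datasets could have fed bad data into churn_model?
suspects = reachable(invert(DOWNSTREAM), "churn_model")
print(sorted(impacted))
print(sorted(suspects))
```

The same traversal serves both questions; only the edge direction changes.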

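Lineage granularity (finest to coarsest):

  • Column-level lineage (most detailed: which source columns feed each target column)
  • Table-level lineage (relationships between tables or datasets)
  • System-level lineage (data flows between systems)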
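
The levels relate to one another: finer-grained lineage can be collapsed into coarser lineage, but not the reverse. A minimal sketch, assuming column names follow a hypothetical `schema.table.column` pattern and that column-level edges are already known:

```python
# Hypothetical column-level edges: (source column, target column).
COLUMN_EDGES = [
    ("staging.contacts.email", "marts.customers.email"),
    ("staging.contacts.id", "marts.customers.customer_id"),
    ("staging.invoices.amount", "marts.customers.lifetime_value"),
    ("staging.invoices.amount", "marts.revenue.total"),
]


def table_of(column):
    """Drop the final dotted segment: 'schema.table.col' -> 'schema.table'."""
    return column.rsplit(".", 1)[0]


def roll_up(column_edges):
    """Collapse column-level edges into distinct, sorted table-level edges."""
    return sorted({(table_of(src), table_of(dst)) for src, dst in column_edges})


TABLE_EDGES = roll_up(COLUMN_EDGES)
for src, dst in TABLE_EDGES:
    print(src, "->", dst)
```

Four column-level edges reduce to three table-level edges here, which is why column-level capture is the most detailed and also the most expensive to maintain.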

Tools:

  • dbt (lineage graph built from model dependencies)
  • Apache Atlas
  • Data catalog platforms (Alation, Collibra)
  • Cloud-native (AWS Glue, GCP Dataplex)
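
As one example of reading lineage out of a tool, dbt writes a `manifest.json` artifact that includes `parent_map` and `child_map` entries keyed by node ID. The sketch below is an illustration only: the `direct_lineage` and `load_manifest` helpers, the `target/manifest.json` default path, and the `model.shop.*` node IDs are assumptions, and the sample dictionary is a hand-written stand-in rather than real dbt output.

```python
import json
from pathlib import Path


def direct_lineage(manifest, node_id):
    """Return (parents, children) of one node from a dbt manifest dict."""
    parents = manifest.get("parent_map", {}).get(node_id, [])
    children = manifest.get("child_map", {}).get(node_id, [])
    return parents, children


def load_manifest(path="target/manifest.json"):
    """Read a manifest file from disk (path is an assumed default)."""
    return json.loads(Path(path).read_text())


# A hand-written stand-in for a real manifest, so the sketch runs anywhere.
sample = {
    "parent_map": {"model.shop.customers": ["model.shop.stg_contacts"]},
    "child_map": {"model.shop.customers": ["model.shop.churn_features"]},
}
parents, children = direct_lineage(sample, "model.shop.customers")
print(parents, children)
```

In a real project, `load_manifest()` would replace the `sample` dictionary once dbt has produced its artifacts.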

Business Context

Lineage helps answer "can I trust this data?" by showing the journey the data took from source to consumption. It is essential for compliance, debugging, and data governance.

How Clever Ops Uses This

We implement data lineage for Australian businesses needing audit trails, compliance documentation, and confidence in their AI training data.

Example Use Case

"Tracing a suspicious ML model prediction back through the data pipeline to identify a data quality issue in a source system."

Category

data analytics

Need Expert Help?

Understanding is the first step. Let our experts help you implement AI solutions for your business.
