Data Lineage - Trace how an answer was built

Data Lineage explains how any Knowi asset is built and which other assets depend on it. Open it from a widget, dataset, query, or dashboard to see the data flow from source to visualization, the transformations applied along the way, and what would break if that asset changed.

It is most useful when you are about to edit something and want to know what else depends on it, or when you receive an answer from Knowi's AI and want to understand exactly how that number was produced.

Where to find it

Lineage appears in two places:

  • Data Diagram drawer - on the widget, dataset, query, or dashboard page, open the Data Diagram action. The drawer now includes a Lineage tab in addition to the existing data diagram.
  • Agentic BI chat - when an AI answer includes a chart or table, the response includes a "How this answer was built" link that opens the lineage drawer pre-scoped to that answer's dataset or widget.

You can also call lineage programmatically through the explain_lineage MCP tool.

The four tabs

The lineage drawer has four tabs:

  • Data Diagram - the existing relationship diagram showing how a query, dataset, and visualizations connect. Unchanged from before; this is the default tab.

  • Lineage - a step-by-step recipe showing how the asset is built: upstream datasources, query strategy (native or Cloud9QL), transformations, joins, the output dataset, and the rendered widget. Each step is annotated with a confidence level (exact, parsed, or inferred) so you know whether that step came from structured metadata or from parsing the query.

  • Trace Field Lineage - field-level provenance. For each output column, it traces which input columns contributed to it and what operation produced it (pass-through, aggregation, expression, window function, etc.). Use the field selector at the top to focus on one column. Field tracing is deterministic: it is precise for Cloud9QL transforms and best-effort for native SQL.

  • Impact - the downstream assets that depend on this one: other widgets that use the same dataset, dashboards that contain those widgets, alerts and reports driven by them, and derived queries (Elastic Store, Cloud9QL chains). Use this tab before deleting or renaming a dataset, query, or widget so you can see what else would be affected.
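
Conceptually, the Impact tab answers "what is reachable downstream of this asset?". The sketch below illustrates that idea as a breadth-first walk over a dependency map. The `DEPENDENTS` mapping and the asset names are hypothetical, and nothing here describes how Knowi actually stores or computes lineage.

```python
from collections import deque

# Hypothetical dependency edges: asset -> assets that directly depend on it.
DEPENDENTS = {
    "dataset:sales": ["widget:revenue", "widget:orders", "query:sales_rollup"],
    "widget:revenue": ["dashboard:exec", "alert:revenue_drop"],
    "widget:orders": ["dashboard:exec"],
    "query:sales_rollup": ["dataset:sales_monthly"],
}


def downstream(asset: str, dependents: dict[str, list[str]]) -> list[str]:
    """Return every asset reachable downstream of `asset`, breadth-first."""
    seen = {asset}
    order = []
    queue = deque([asset])
    while queue:
        current = queue.popleft()
        for child in dependents.get(current, []):
            if child not in seen:  # a dashboard shared by two widgets is listed once
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order
```

In this toy graph, `downstream("dataset:sales", DEPENDENTS)` lists the two widgets and the derived query first, then the dashboard, alert, and derived dataset that sit behind them.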

Supported assets

Lineage works for these asset types:

  • Widget - traces the full path: datasource → query → dataset → widget. Pass dashboardId to also reflect dashboard-level filters.
  • Dataset - traces upstream: datasources → query/transforms → dataset. Impact lists widgets and downstream datasets.
  • Query - traces upstream datasources and downstream datasets produced by the query. Useful for Cloud9QL chains and joins.
  • Dashboard - aggregates lineage across all widgets on the dashboard and shows the combined upstream sources plus the dashboard's own reports, alerts, and drilldowns.

AI summary

Lineage produces a deterministic narrative by default - the same lineage facts every time, with no model in the loop. When you turn on "Include AI summary", Knowi rewrites that deterministic narrative into a plain-English explanation using your configured Data Lineage AI provider (User Settings → AI Settings → Data Lineage).

The underlying facts and graph never depend on AI. The toggle only changes how the narrative reads.

If the AI provider is unavailable or AI is disabled for your account, deterministic lineage continues to work - you simply do not see the rewritten narrative.
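
The fallback behaviour described above can be pictured as follows. This is a minimal sketch, not Knowi's implementation: `render_narrative` and the `rewrite` callable are hypothetical names standing in for the deterministic narrative and the configured AI provider.

```python
from typing import Callable, Optional


def render_narrative(
    deterministic: str,
    include_ai: bool,
    rewrite: Optional[Callable[[str], str]] = None,
) -> str:
    """Return the AI rewrite when requested and available, else the deterministic text."""
    if not include_ai or rewrite is None:
        # Toggle off, or no provider configured: deterministic narrative only.
        return deterministic
    try:
        return rewrite(deterministic)
    except Exception:
        # Provider unavailable: the deterministic narrative still works.
        return deterministic
```

The input to the rewrite step is always the deterministic narrative, which mirrors the point that the underlying facts never depend on AI.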

Permissions and redaction

Lineage never shows you data you would not otherwise be allowed to see. When an upstream dependency exists but you do not have access to it, the drawer:

  • shows the dependency as a protected node in the graph (asset type only, no name or ID)
  • replaces section details with a protected marker rather than the actual values
  • adds a redaction notice explaining that something was hidden and why
  • excludes the protected asset from the visible counts in the Impact tab

Specifically, lineage never exposes:

  • Raw source query text in the field trace
  • Raw filter values from any upstream filter set
  • Metadata for protected datasets in the AI summary payload
  • Counts that would let you infer how many inaccessible assets reference an upstream asset

The Impact counts you see reflect only the assets you yourself can access, so two different users looking at the lineage of the same dataset may see different downstream counts.
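
To make the redaction rules concrete, here is a small sketch of how a per-user view could be derived. The `Asset` class, the `lineage_view` function, and the output field names are all hypothetical; they only illustrate the behaviour described in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    asset_type: str
    asset_id: int
    name: str


def lineage_view(assets: list[Asset], accessible_ids: set[int]) -> dict:
    """Build the nodes one user may see; protected assets keep only their type."""
    nodes = []
    hidden = False
    for asset in assets:
        if asset.asset_id in accessible_ids:
            nodes.append({"type": asset.asset_type, "id": asset.asset_id, "name": asset.name})
        else:
            # Protected node: asset type only, no name or ID.
            nodes.append({"type": asset.asset_type, "protected": True})
            hidden = True
    # The count covers accessible assets only; no count of hidden assets is returned.
    visible_count = sum(1 for node in nodes if not node.get("protected"))
    return {"nodes": nodes, "visible_count": visible_count, "redaction_notice": hidden}
```

Calling `lineage_view` for the same assets with two different `accessible_ids` sets yields different `visible_count` values, which is why two users can see different downstream counts for the same dataset.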

MCP tool

The same lineage data is available through the explain_lineage MCP tool, so you can ask Claude, Cursor, ChatGPT, Codex, or VS Code to explain how a Knowi answer was built.

Accepted arguments:

Parameter | Type | Required | Description
assetType | string | No* | Type of asset: widget, dataset, query, or dashboard
assetId | integer | No* | ID of the asset named by assetType
widgetId | integer | No* | Shortcut for assetType=widget
datasetId | integer | No* | Shortcut for assetType=dataset
queryId | integer | No* | Shortcut for assetType=query
dashboardId | integer | No* | Shortcut for assetType=dashboard, or widget context when explaining a widget
mode | string | No | build, impact, or both. Defaults to both.
includeFields | boolean | No | Include field-level lineage. Defaults to false.
includeAi | boolean | No | Generate the AI narrative. Defaults to false to avoid extra model cost.
fieldName | string | No | Focus the field trace on one column.
question | string | No | The original business question that produced the answer being explained.

*Either assetType + assetId, or one of the shortcut ID params (widgetId/datasetId/queryId/dashboardId), is required.
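
If you build the arguments in code, a client-side check of the rule above can catch mistakes before the call is sent. The helper below is a hypothetical sketch based only on the table; the server's own validation may be stricter or differ in detail.

```python
from typing import Any

# Shortcut ID parameters and the asset type each one implies (from the table above).
SHORTCUTS = {
    "widgetId": "widget",
    "datasetId": "dataset",
    "queryId": "query",
    "dashboardId": "dashboard",
}
ASSET_TYPES = set(SHORTCUTS.values())
MODES = {"build", "impact", "both"}


def check_arguments(args: dict[str, Any]) -> None:
    """Raise ValueError if `args` does not identify an asset as the table describes."""
    has_pair = "assetType" in args and "assetId" in args
    has_shortcut = any(key in args for key in SHORTCUTS)
    if not (has_pair or has_shortcut):
        raise ValueError("Provide assetType + assetId, or one shortcut ID parameter")
    if has_pair and args["assetType"] not in ASSET_TYPES:
        raise ValueError(f"Unknown assetType: {args['assetType']!r}")
    if args.get("mode", "both") not in MODES:
        raise ValueError(f"Unknown mode: {args['mode']!r}")
```

Passing both `widgetId` and `dashboardId` is accepted here, since the table describes `dashboardId` as widget context in that case.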

Example:

{
  "name": "explain_lineage",
  "arguments": {
    "widgetId": 12345,
    "dashboardId": 678,
    "mode": "build",
    "includeFields": true
  }
}
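
MCP clients normally send tool calls as JSON-RPC 2.0 `tools/call` requests, and the example above corresponds to the `params` object of such a request. The sketch below wraps it in that envelope. The `tools_call_request` helper is a hypothetical name, and the transport and endpoint are not covered here because they depend on how your MCP client connects to Knowi.

```python
import json


def tools_call_request(request_id: int, name: str, arguments: dict) -> str:
    """Serialize a JSON-RPC 2.0 `tools/call` request for an MCP tool."""
    envelope = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(envelope)


# Same arguments as the example above.
payload = tools_call_request(
    1,
    "explain_lineage",
    {"widgetId": 12345, "dashboardId": 678, "mode": "build", "includeFields": True},
)
```

Most MCP clients build this envelope for you, so in practice you usually supply only the tool name and arguments.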

Permissions

To use lineage, your role needs read access to the asset you are tracing. No additional permission is required beyond what you would need to view the asset itself. Lineage is enabled for all users who can open the Data Diagram drawer.

Known limits

  • Field-level lineage is precise for Cloud9QL transforms and best-effort for native SQL. Very complex CASE/window expressions or source-native SQL with vendor-specific syntax may return inferred-confidence traces rather than exact ones. The deterministic narrative will flag these with a warning.
  • Lineage graphs are capped (currently 250 nodes / 500 edges); larger graphs are truncated with a notice in the Lineage tab. When a graph is truncated, visible upstream assets and direct downstream dependents are kept first.
  • The Impact tab scans reports and alerts available to you. For very large tenants, this can take several seconds on first open; subsequent opens within the same session are cached.
  • Lineage is computed on demand from current asset metadata. Once a dataset is deleted, it no longer appears in lineage; Knowi does not retain historical lineage snapshots.
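
One way to picture the graph cap is a priority-based cut. This is a sketch under stated assumptions rather than Knowi's actual algorithm: the `truncate` function and the `preferred` set (standing in for the upstream and direct downstream nodes that are kept first) are hypothetical, and only the 250/500 limits come from the text above.

```python
MAX_NODES = 250  # current node cap from the limits above
MAX_EDGES = 500  # current edge cap from the limits above


def truncate(nodes, edges, preferred, max_nodes=MAX_NODES, max_edges=MAX_EDGES):
    """Keep preferred nodes first, then drop edges that lost an endpoint.

    nodes: list of node ids; edges: list of (source, target) pairs;
    preferred: set of node ids that should survive truncation first.
    Returns (kept_nodes, kept_edges, truncated_flag).
    """
    # sorted() is stable, so original order is preserved within each group.
    ranked = sorted(nodes, key=lambda node: node not in preferred)
    kept = ranked[:max_nodes]
    kept_set = set(kept)
    candidate_edges = [(s, t) for s, t in edges if s in kept_set and t in kept_set]
    kept_edges = candidate_edges[:max_edges]
    truncated = len(nodes) > max_nodes or len(candidate_edges) > max_edges
    return kept, kept_edges, truncated
```

The returned flag corresponds to the truncation notice: it is set whenever either cap removed something from the graph.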