Agentic BI

Knowi's Agentic BI turns natural language into action. Instead of manually building queries, selecting chart types, and configuring dashboards, you describe what you want and the AI handles it - creating dashboards, generating queries, transforming visualizations, exporting data, and managing reports through conversation.

Agentic BI is available in two ways:

  • In-Product Agents - AI assistants built into Knowi's dashboard and widget interfaces. See AI Tools for in-product documentation.

  • MCP Server - Connect external AI tools like Claude, GPT, or Copilot to Knowi via the Model Context Protocol. Your AI assistant operates Knowi programmatically.

What You Can Do

Ask Questions About Your Data

Ask natural language questions and get answers with visualizations. The AI finds the right dataset, writes the query, and returns results.

"What were our top 10 customers by revenue last quarter?"
"Show me the trend in support tickets over the past 6 months"
"Which product categories have declining sales?"

Create Dashboards

Describe the dashboard you want. The AI selects relevant datasets, generates queries, picks appropriate chart types based on the data, and builds a complete dashboard.

"Create a sales performance dashboard with revenue trends, top products, and regional breakdown"
"Build an executive dashboard with KPIs for this quarter"

Create and Transform Widgets

Create new visualizations or modify existing ones through conversation.

"Make a pie chart of revenue by region"
"Change this to a stacked bar chart"
"Add a date filter to this widget"

Get AI Recommendations

Ask for insights and the AI analyzes your data to surface trends, anomalies, and actionable recommendations.

"What trends do you see in this data?"
"What's driving the revenue drop this month?"
"What should I focus on to improve conversions?"

Search Across Assets

Find dashboards, widgets, datasets, reports, and documents using keyword or semantic search.

"Find dashboards related to customer retention"
"Which datasets contain revenue data?"

Manage Reports and Alerts

Create, schedule, pause, and manage reports and alerts through conversation.

"Email this dashboard to the sales team every Monday at 9 AM"
"Create an alert when revenue drops below $10,000"
"Pause all weekly reports"

Export and Share

Export data and generate shareable URLs without navigating menus.

"Export this widget to Excel"
"Create a shareable URL for this dashboard"
"Export the dashboard to PDF"

How It Works

The Orchestrator

When you submit a request, the AI orchestrator analyzes your intent and routes it to the appropriate agent. For complex requests, multiple agents are chained together automatically.

For example, "Create a sales dashboard" triggers:

  1. Search - finds relevant datasets
  2. Create Dataset - generates queries if needed
  3. Create Dashboard - builds the dashboard with optimal chart types and layout
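The chaining above can be sketched as a simple intent-to-agent router. This is an illustrative sketch only - the routing logic is invented, though the agent names come from the Available Agents table below:

```python
# Illustrative sketch of orchestrator routing. The keyword matching here
# is hypothetical, not Knowi's actual implementation; the agent names
# are real agents from the documentation.

def route(request: str) -> list[str]:
    """Map a natural-language request to an ordered chain of agents."""
    text = request.lower()
    if "dashboard" in text and ("create" in text or "build" in text):
        # Complex request: chain search -> dataset -> dashboard agents.
        return ["Search Assets", "Create Dataset", "Create Dashboard"]
    if "alert" in text:
        return ["Alert Management"]
    if "export" in text:
        return ["Data Export"]
    # Fall back to a natural-language query against a dataset.
    return ["NLP"]

print(route("Create a sales dashboard"))
# → ['Search Assets', 'Create Dataset', 'Create Dashboard']
```

Real routing is done by the AI analyzing intent, not keyword matching - the sketch only shows the shape of the output: an ordered agent chain.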

Clarification Sessions

When a request is ambiguous, the AI asks clarifying questions rather than guessing. The conversation maintains context across multiple turns.

You: "Show me the sales data"
AI: "I found 3 sales datasets: Product Sales, Regional Sales, and Online Sales. Which one would you like to explore?"
You: "Product Sales"
AI: [generates visualization from Product Sales dataset]

Available Agents

The system includes specialized agents for different tasks:

| Agent | What It Does |
|-------|--------------|
| NLP | Converts natural language to SQL queries and executes them |
| Data Analyst | Long-form Q&A on datasets and documents |
| Widget Data Analyst | Analyzes and transforms widget data |
| Create Dashboard | Generates complete dashboards from natural language |
| Create Widget | Creates individual widgets with optimal chart types |
| Update Widget Settings | Modifies chart types, colors, labels, and formatting |
| Create Dataset | Builds datasets by searching datasources and generating queries |
| Search Assets | Searches across dashboards, widgets, datasets, reports, and documents |
| Dashboard Filter | Creates and configures dashboard-level filters |
| Dashboard Settings | Modifies dashboard properties (name, theme, layout) |
| Dashboard Summary | Generates narrative summaries across all widgets |
| Add Widget to Dashboard | Adds an existing widget to a dashboard |
| Widget Layout | Repositions and resizes widgets on the dashboard grid |
| Report Delivery | Delivers dashboards via Email or Webhooks |
| Report Management | Creates, lists, edits, pauses, and deletes scheduled reports |
| Alert Management | Creates, lists, edits, pauses, and deletes data alerts |
| Share Dashboard | Generates shareable URLs for dashboards |
| Data Export | Exports widget or dashboard data to CSV or Excel |
| Recommendation | Generates AI-powered insights and recommendations |

Accessing Agentic BI

In-Product: Dashboard Agent

Click the AI Assistant icon in the top-right corner of any dashboard. This opens the Dashboard Agent panel where you can interact with the entire dashboard using natural language.

For detailed instructions, see Using the Dashboard Agent.

In-Product: Widget Agent

Click the AI Assistant icon on any widget. This opens the Widget Agent panel where you can ask questions, transform data, change visualizations, and export results for that specific widget.

For detailed instructions, see Using the Widget Agent.

Via MCP Server

Connect external AI tools (Claude Code, Claude Desktop, or any MCP-compatible client) to Knowi's MCP server. This gives your AI assistant access to 31 Knowi tools.

For setup instructions, see MCP Server and Claude Code Setup.

Enabling Agentic BI

Account Activation

Contact your Knowi account manager or email support@knowi.com to enable Agentic AI for your account.

AI Settings

Once enabled, navigate to Settings > User Settings > AI Settings and toggle on AI Agents Access.

Note: This controls AI Agent functionality for your entire account. Disabling it turns off agentic features for all users.

Role Permissions

Control which users can access Agentic BI through role permissions. Navigate to Settings > User Settings > Roles tab. Under the AI category, enable:

  • Use AI Agent Assistant in dashboards - access to the Dashboard Agent
  • Use AI Agent Assistant in widgets - access to the Widget Agent

For more on roles, see User Roles.

Setting Up Datasets for Agentic BI

The accuracy of Agentic BI depends almost entirely on how your datasets are configured. The agent does not magically understand your business - it reads fields, samples values, and follows the conventions you give it. This section covers everything an admin should review before exposing a dataset to AI agents.

Setup Checklist

For each dataset you want to expose to Agentic BI:

  1. Index it under Settings > User Settings > AI Settings > Dataset Indexing.
  2. Name fields clearly - Revenue USD beats rev, Customer State beats cs.
  3. Set a dataset description in Dataset Management so the AI knows what the dataset represents.
  4. Add field synonyms if users use multiple terms for the same field.
  5. Configure value aliases for dimensions where users say one thing but the data stores another (e.g., California → CA).
  6. Decide on Direct vs Non-Direct execution (see below) - this controls how the AI builds queries.
  7. For direct queries with tokens, design tokens with descriptive labels so the binder can infer values from natural language.
  8. Add Global AI Instructions for account-wide rules (fiscal year, currency, business definitions).
  9. Mark frequently-used datasets as Agentic BI Defaults.

Indexing

Indexing is the gate. An unindexed dataset is invisible to Agentic BI - it will not appear in dataset selection, will not be searched, and will not be queried. Toggle indexing under Settings > User Settings > AI Settings > Dataset Indexing.

  • Indexed - the dataset is searchable by name, schema, and sampled values, and the AI agents can target it.
  • Index by default (account-wide) - automatically indexes new datasets as they're created. Turn off if you want admins to opt datasets in explicitly.

Only index datasets that are appropriate for AI/NLP use, and disable the rest. Personally identifiable data, raw event streams, and datasets with thousands of low-quality fields will degrade matching and should be modeled into clean, named datasets first.

Field Names, Types, and Dataset Description

The AI sees field names and data types for every field, plus the dataset's overall description if one is set. Treat these as documentation for an analyst who has never seen your data.

  • Names: prefer human-readable names. Avoid abbreviations the AI can't expand (cstmr_st, tx_amt_2). If your source schema is cryptic, alias columns to readable names in the query.
  • Types: ensure numerics are typed as numbers, dates as dates. A revenue field stored as a string will not aggregate correctly.
  • Dataset Description: open the dataset in Dataset Management and set a description that explains what the dataset represents and how it should be used. One or two sentences. Examples:
    • "Daily aggregated revenue by region and product category. All amounts in USD, excludes tax and refunds."
    • "Customer lifecycle events from the CRM - one row per status change. Use for funnel and churn analysis."

The dataset description is sent to the AI as part of the schema and helps the orchestrator pick the right dataset for a question. Knowi does not currently support per-field descriptions - to clarify ambiguous fields, rename the field, add a field synonym, or document the convention in Global AI Instructions.

Field Synonyms

Synonyms map alternate names for a field to its canonical name. Configure under Settings > User Settings > AI Settings > Field Synonyms. Use these when:

  • Users use jargon that doesn't appear in the schema (bookings → gross_revenue).
  • A field has a domain-specific meaning the AI won't infer (MRR → monthly_recurring_revenue).
  • You've migrated a column name and users still ask using the old one.

Synonyms are field-level and account-wide. They differ from value aliases (which map data values, not field names) - see Value Aliasing below.

Default Datasets

Mark datasets as Agentic BI Default to pre-select them whenever a user opens the chat. Defaults apply customer-wide, but the per-user authorization model still applies - users only see defaults they've been shared on. Disabling indexing automatically clears the default flag.

Use defaults sparingly. The chat experience is best when the default set is small (1-5 datasets) and represents the assets users query most often.

Global AI Instructions

Account-wide instructions injected into every Agentic BI request. Configure under Settings > User Settings > AI Settings > Global AI Instructions.

Use this for invariants - rules that should always be applied:

  • "Fiscal year starts April 1. When users say 'this year' or 'YTD', interpret as fiscal year unless they explicitly say calendar year."
  • "All revenue is reported in USD. Convert other currencies using the fx_rate_usd field."
  • "Never aggregate net sales across regions - always group by region."
  • "Customer tier is defined as: Enterprise (>$100k ARR), Mid (>$25k), SMB (otherwise)."

Keep instructions short and rule-based. Don't paste documentation, examples, or marketing copy - every token is sent on every request and the LLM's attention is finite. If an instruction grows past a few sentences, it probably belongs in the dataset description or field synonyms instead.

Direct Query vs Non-Direct (Cached) Datasets

Knowi datasets run in one of two execution modes. The mode you choose changes how Agentic BI generates and runs queries against the dataset, and it changes what setup work is needed for accurate matching.

| Mode | Where the data lives | What the AI generates | Best for |
|------|----------------------|-----------------------|----------|
| Non-Direct (Cached) | Knowi ElasticStore - query results are stored and refreshed on a schedule. | Cloud9QL run against the cached results. | Slow source queries, frequently-asked questions, datasets users explore interactively, large historical aggregates. |
| Direct Query | Original datasource - every query hits the live database/API at runtime. | Native query (SQL/Mongo/etc.) using the dataset's underlying query template, with runtime tokens filled in. | Real-time data, datasets too large to cache, regulated data that must not leave the source, parameter-driven endpoints. |

For the difference at the query-creation layer, see Defining Data Execution Strategy.

How Agentic BI Handles Non-Direct Datasets

For non-direct (cached) datasets, the AI works against the materialized fields in the ElasticStore. Setup is straightforward:

  • Field names, types, the dataset description, synonyms, and value aliases all apply normally.
  • Refresh cadence is whatever you configured at query authoring (run once, scheduled intervals).
  • Sample values are pulled from the cached dataset, so freshness of indexing matches freshness of the data.

How Agentic BI Handles Direct-Query Datasets

For direct-query datasets, the AI does not generate the SQL from scratch. Instead, the underlying query template authored at dataset creation is preserved, and the AI fills in runtime parameter tokens based on the user's natural-language question. This protects the original query semantics (joins, filters, governance rules) while still letting users ask in plain English.

This means setup for direct datasets requires careful token design in the original query template.

Direct Query Tokens and Runtime Parameters

Direct queries use Knowi's runtime parameter token syntax:

```
$c9_<name>$(<default>)$[<label>]$<formatting>
```

  • <name> - the macro name. Used in matching when no label is provided.
  • <default> - value used if nothing is bound at runtime.
  • <label> - display label shown in dashboard filter UI. Critical for AI matching - the binder uses the label first when inferring values from a question.
  • <formatting> - optional list/date/quotation directives.

For full token syntax, see Runtime Query Parameters.
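For illustration, a direct-query SQL template using tokens might look like the following. The table and column names are hypothetical, and how substituted values are quoted depends on the formatting directives - treat this as a sketch of the shape, not copy-paste syntax:

```sql
-- Hypothetical direct-query template (table and column names invented).
-- Each token maps to one column, carries a descriptive label for AI
-- matching, and has a low-risk default.
SELECT state, processor, SUM(amount_usd) AS revenue
FROM transactions
WHERE state = $c9_state$(CA)$[State]$
  AND processor = $c9_processor$(stripe)$[Payment Processor]$
GROUP BY state, processor
```

The joins, filters, and governance rules you author here are preserved at runtime; the AI only fills in the token values.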

How the AI Binds Tokens

When a user asks Agentic BI a question against a direct-query dataset, Knowi attempts to infer values for unfilled tokens from the question text. Resolution order, per token:

  1. Skip if explicitly bound - runtime filters, dashboard filters, and user content filters always win and cannot be overridden.
  2. Match candidate phrases against the token's label ($c9_x$[payment processor]$ matches "stripe" in "show me transactions for payment processor stripe").
  3. Match candidate phrases against the humanized macro name ($c9_processor$ → "processor").
  4. Normalize the matched phrase to a canonical value via configured value aliases (e.g., "California" → "CA").
  5. Sanitize the value - values must match the whitelist ^[A-Za-z0-9 ._-]+$ and be 100 characters or fewer.
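The resolution order can be sketched roughly as follows. This is a simplified illustration - the function name and matching logic are invented, and the real binder's phrase matching is more sophisticated; only the resolution order, whitelist pattern, and length cap come from the documentation:

```python
import re

# Simplified sketch of direct-query token binding (illustrative only).
SAFE_VALUE = re.compile(r"^[A-Za-z0-9 ._-]+$")  # whitelist from the docs

def bind_token(question, label, macro_name, aliases, bound=None):
    """Infer a value for one token from the question text."""
    if bound is not None:
        return bound  # 1. explicit bindings always win
    hint = None
    # 2-3. try the label first, then the humanized macro name
    for h in (label, macro_name.replace("_", " ")):
        if h and h.lower() in question.lower():
            # naive: take the word right after the matched phrase
            tail = question.lower().split(h.lower(), 1)[1].split()
            if tail:
                hint = tail[0]
                break
    if hint is None:
        return None
    value = aliases.get(hint, hint)  # 4. normalize via value aliases
    # 5. sanitize: whitelist pattern, max 100 characters
    if len(value) <= 100 and SAFE_VALUE.match(value):
        return value
    return None

q = "show me transactions for payment processor stripe"
print(bind_token(q, "payment processor", "processor", {}))  # → stripe
```

With an alias map like `{"california": "CA"}`, the same function binds "customers in state california" to `CA` - step 4 is where value aliasing (covered below) plugs in.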

Authoring Tokens for Best AI Matching

Treat tokens like field descriptions - they're documentation the AI reads.

  • Always set a label. $c9_processor$[Payment Processor]$ matches user phrasing far better than the bare $c9_processor$. Labels are also what users see in filter UIs, so they do double duty.
  • Use specific, descriptive labels. [Region Code] beats [Region] if the field stores codes; [Customer Tier] beats [Tier] when there are multiple kinds of tier in the schema.
  • One token, one concept. Don't reuse $c9_value$ across three unrelated tokens - every token should map to one column or filter.
  • Pair tokens with value aliases. If your token filters state = $c9_state$[State]$ and your data stores CA but users say California, configure a value alias dataset (see Value Aliasing) so the binder normalizes the inferred value before substitution.
  • Provide a sensible default. If the AI can't infer a value, the default fires. Choose a default that produces a useful, low-risk result (e.g., last 30 days, all regions).

What's Not Yet Supported

Current binder limitations to be aware of:

  • Date-range tokens ($c9_start$, $c9_end$) are supported via natural-language phrases like "last 30 days," "from Jan 1 to Mar 1," "since last quarter." Other free-form date phrases may not bind - check logs (DirectQueryMacroBinder log prefix) if a question fails to fill date filters.
  • Predicate-clause tokens ($c9_filter$, $c9_where$) are reserved and never AI-bound. If you've authored a query that depends on a clause-shaped token, the user must either fill it via runtime filter UI or you must redesign the template to use value tokens.
  • Multi-word values arrive only via alias normalization - direct phrase capture is single-word. If your data has multi-word canonical values (e.g., New York City), you must configure them as aliases.
  • Ambiguity is silent - if multiple labels in a question could match the same token, the binder picks one and logs the choice. There's no in-chat clarification yet for direct-query token ambiguity.

Quick Diagnostic

If a direct-query dataset is producing empty or wrong results from Agentic BI:

  1. Verify token labels are set and descriptive.
  2. Confirm value aliases are configured for any dimensions where user phrasing differs from stored values.
  3. Confirm the user has access to any reference datasets used for aliasing.

Value Aliasing

Users rarely phrase queries using the exact values stored in your data. They ask "how many customers in California?" when the state field actually stores CA, or "orders in NYC" when the city column holds New York City. Value aliasing bridges that gap so Agentic BI resolves the user's term to the canonical value at query time - the generated query runs as state = 'CA' and returns correct results.

Aliases are sourced from reference datasets in your account rather than inline config. A reference dataset is just a regular Knowi dataset. Two formats are supported:

Two-column form (canonical + aliases). One column holds canonical values (what's actually in the data), the other holds alias values (what users might type). For example, a US States dataset:

| state | alias |
|-------|-------|
| CA | California |
| CA | Cali |
| NY | New York |
| NY | New York State |
| TX | Texas |

The column whose name matches a field in the target dataset is automatically treated as the canonical column. Any other column supplies aliases. Multiple rows per canonical produce multiple aliases.

Single-column vocabulary form. A one-column dataset whose column name matches a target field. Each row's value is registered as a known canonical value for that field. Use this when you just want to teach the system what the valid values for a field are (e.g., a list of airlines), without authoring alternate spellings:

| airline |
|---------|
| American |
| Delta |
| JetBlue |
| Southwest |
| United |

Now a question like "average delay for Southwest flights" binds Southwest to the airline field even if the target dataset is direct-query (so distinct values aren't pre-sampled).
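Conceptually, both reference-dataset formats boil down to building a per-field lookup. The sketch below is illustrative only - the function name is invented and Knowi's implementation differs - but it shows how a matching column name selects the canonical column and how extra columns supply aliases:

```python
# Illustrative sketch: turning the two reference-dataset formats above
# into an alias lookup. Not Knowi's implementation - just the idea.

def build_alias_map(columns, rows, target_fields):
    """columns: reference-dataset column names; rows: row tuples.
    The column whose name matches a target field (case-insensitive)
    is canonical; any other column supplies aliases."""
    targets = {f.lower() for f in target_fields}
    canon_idx = next(i for i, c in enumerate(columns) if c.lower() in targets)
    field = columns[canon_idx]
    alias_map = {}      # alias (lowercased) -> canonical value
    vocabulary = set()  # known canonical values for the field
    for row in rows:
        vocabulary.add(row[canon_idx])
        for i, cell in enumerate(row):
            if i != canon_idx:
                alias_map[cell.lower()] = row[canon_idx]
    return field, alias_map, vocabulary

# Two-column form: state (canonical) + alias
field, aliases, vocab = build_alias_map(
    ["state", "alias"],
    [("CA", "California"), ("CA", "Cali"), ("NY", "New York")],
    target_fields=["state", "city"],
)
print(field, aliases["california"])  # → state CA

# Single-column vocabulary form: just registers known canonical values
_, _, airlines = build_alias_map(
    ["airline"], [("Southwest",), ("Delta",)], target_fields=["airline"]
)
print(sorted(airlines))  # → ['Delta', 'Southwest']
```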

Configuration

Aliases are configured account-wide on the Settings > User Settings > AI Settings page, under Global Alias Datasets. Admin-only. Pick one or more reference datasets; they apply across every dataset in your account whose fields match a column name in the reference dataset. A dataset with no matching field is simply skipped.

Limits and Access Control

  • Up to 10 reference datasets in the global alias list
  • Each reference dataset must have 50,000 rows or fewer
  • Per-user access is enforced at query time. If a user has not been shared on a reference dataset, aliases from that dataset are silently skipped for that user and no alias data is leaked. Make sure any reference dataset you configure is shared with the users who should benefit from it.

Tips

  • Keep reference datasets small and focused. One dataset per vocabulary domain - US States, Country Codes, Product Aliases - is cleaner than a giant all-in-one table.
  • Column names in the reference dataset must match a field name in the target dataset (case-insensitive) for aliases to apply. If your customer-facing dataset has a field called state_code, the reference dataset needs a column called state_code, not state.
  • Aliases augment the data - they never overwrite it. If a canonical value in your data is itself the same as one of your aliases, the real data value always wins.

Security and Data Privacy

What Data Does the AI See?

When an agent executes, it sends data to the configured AI model to generate a response. The amount of data depends on the operation:

  • Dashboard and widget creation: The AI receives dataset schema (field names, types) and a small sample of rows to select appropriate chart types and generate queries.
  • Recommendations and insights: The AI receives actual widget data rows, truncated to fit the model's context window. Data is iteratively reduced (removing rows) until it fits.
  • NLP queries: The AI receives dataset schema and field metadata to generate queries. Query results are returned directly from the database - not routed through the AI.
  • Deterministic tools (list, get data, export, delete): These do not involve the AI model at all. They execute directly against Knowi's services.

Data Residency and AI Providers

There are two separate data flows to understand:

1. Knowi's internal AI processing - When an agent needs AI reasoning (e.g., choosing chart types, generating SQL, producing recommendations), Knowi sends data to its configured AI model. This is controlled by your AI provider setting:

| Provider | Where Knowi Sends Data for AI Processing | Configuration |
|----------|------------------------------------------|---------------|
| Knowi (Internal) | Knowi's own infrastructure. Data stays within Knowi. | Default - no configuration needed |
| OpenAI | OpenAI's servers via API. | Settings > AI Settings > AI Model Providers |
| Anthropic (Claude) | Anthropic's servers via API. | Settings > AI Settings > AI Model Providers |
| Google (Gemini) | Google's servers via API. | Settings > AI Settings > AI Model Providers |

AI provider settings are configurable per feature - you can use the internal model for recommendations (which include data rows) and an external model for dashboard creation (which only includes schema).

2. MCP client data flow - When you access Knowi through an external AI tool (Claude Code, Claude Desktop, GPT, Copilot, etc.), tool results - including query results, dashboard metadata, and data rows - are returned to that client. The client's LLM then sees this data. This is true regardless of which AI provider Knowi uses internally. Even if Knowi is configured to use its internal AI model, the MCP tool responses (which may contain actual data) are sent back to the calling client's LLM.

In practice, when using Claude Code with Knowi's MCP server, data flows through both Knowi and Anthropic's servers - Knowi processes the request, and the tool response (containing your data) is returned to Claude, which runs on Anthropic's infrastructure.

For maximum data isolation, use Knowi's in-product agents (Dashboard Agent, Widget Agent) with the internal AI provider. Data stays entirely within Knowi's infrastructure.

For on-premises deployments, the AI model runs within your infrastructure. MCP access from external clients would still send tool responses to the client - evaluate whether this is acceptable for your data classification requirements.

Authorization

All agents enforce Knowi's existing authorization model:

  • Resource-level access: Before operating on any dashboard, widget, or dataset, the system verifies the user has permission using user.isAllowed(resourceId, assetType, write). A user can only interact with resources they are explicitly authorized to access.
  • Customer isolation: Datasource access is verified by matching the datasource's customer ID against the user's customer ID. A user from Organization A cannot access Organization B's datasources, datasets, or dashboards through an agent.
  • Role-based permissions: Agentic AI access is controlled through role permissions. Admins can enable or disable dashboard-level and widget-level agent access per role.

Row-Level Security

AI agents respect user content filters. If a user has row-level security applied (e.g., they can only see data for their region), the agent only sees and operates on the filtered data. This applies to all agent operations including recommendations, data analysis, and NLP queries.

Audit Trail

All agent executions are logged with:

  • The user's input prompt
  • Success or failure status
  • Response message
  • Execution time
  • Associated resource IDs (dashboard, widget, dataset)

Logs are accessible to account administrators.

MCP-Specific Security

When agents are accessed via the MCP Server, additional protections apply:

  • Prompt injection detection: Instructions are scanned for injection patterns and blocked if detected.
  • Destructive operation safeguards: Delete/remove/drop operations are blocked in natural language tools and must use the explicit knowi_delete tool with confirmation.
  • Input sanitization: Control characters are stripped from all inputs. Instruction length is capped at 2,000 characters.
  • Tool whitelisting: Only the 31 registered tools can be called. Arbitrary tool names are rejected.
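The sanitization and whitelisting rules might look roughly like this. The 2,000-character cap and the knowi_delete tool name come from this documentation; the function names and the other tool names in the set are hypothetical:

```python
# Hypothetical sketch of MCP input safeguards based on the rules above.
# Only the 2,000-char cap and knowi_delete are documented; the function
# names and other tool names here are illustrative.

MAX_INSTRUCTION_LEN = 2000
REGISTERED_TOOLS = {"knowi_delete", "knowi_search", "knowi_export"}  # illustrative subset

def sanitize_instruction(text: str) -> str:
    """Strip control characters and cap the instruction length."""
    cleaned = "".join(ch for ch in text if ord(ch) >= 32 or ch in "\n\t")
    return cleaned[:MAX_INSTRUCTION_LEN]

def call_tool(name: str, instruction: str) -> str:
    """Reject any tool name outside the registered whitelist."""
    if name not in REGISTERED_TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return sanitize_instruction(instruction)

print(call_tool("knowi_delete", "remove dashboard 42\x00"))
# → remove dashboard 42
```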

See MCP Server Security for full details.