AI Knowledge Studio

AI Utility, Analytics

Project Summary

A natural-language analytics layer that lets teams ask questions over massive metrics data and get accurate, explainable answers in seconds.

The system queries both real-time event streams and relational metadata, generates safe queries with guardrails, and returns results with full context and summaries.

All inference runs on self-hosted fine-tuned large language models with tracing and evaluation built in. Data and models stay inside private infrastructure with no external data egress.

Case Study

Overview

Built a chat-first analytics assistant that answers natural-language questions over real-time metrics while keeping data and inference inside private infrastructure.

Problem

Teams were exporting telemetry and KPI data into ad-hoc spreadsheets. Answers were slow, inconsistent, and security requirements ruled out sending data to external LLM APIs.

Goals

  • Answer common analytics questions in under 10 seconds end to end.
  • Support averages, rankings, and grouped comparisons across billions of events.
  • Keep data fresh with sub-2 minute lag from ingestion to query.
  • Provide explainable results with the exact query used.
  • Keep data and inference inside private infrastructure (no external data egress).
  • Capture traces and feedback to improve accuracy over time.

Approach

  1. Used a natural-language query engine to generate SQL over both real-time event streams and relational metadata without duplicating data.
  2. Separated high-cardinality real-time metrics from reference data and access control into dedicated storage layers.
  3. Deployed all data stores inside a private cloud VPC with strict network boundaries.
  4. Self-hosted inference with a fine-tuned 70B-parameter model on dedicated GPU servers to control cost and latency.
  5. Built an observability pipeline to log prompts, generated queries, and outcomes for evaluation and continuous tuning.
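Steps 1–2 above describe routing natural-language questions across two storage layers without duplicating data. A minimal sketch of how such routing might work is below; the vocabulary sets, source names, and keyword heuristic are all illustrative assumptions, not the project's actual code:

```python
# Hypothetical dual-source router: picks which storage layer a
# natural-language question should be answered from (illustrative only).

REALTIME_METRICS = {"events", "latency", "throughput"}   # high-cardinality stream data
REFERENCE_TABLES = {"teams", "regions", "access_rules"}  # relational metadata

def route_question(question: str) -> str:
    """Return the data source a question should run against."""
    words = set(question.lower().split())
    if words & REALTIME_METRICS and words & REFERENCE_TABLES:
        return "federated"    # needs a join across both layers
    if words & REALTIME_METRICS:
        return "stream_store"
    if words & REFERENCE_TABLES:
        return "metadata_db"
    return "metadata_db"      # safe default for ambiguous questions

print(route_question("average latency by regions last week"))  # federated
print(route_question("p95 throughput yesterday"))              # stream_store
```

A real system would use the model (or an embedding classifier) rather than keyword overlap, but the routing decision itself looks the same: one source, the other, or a federated query over both.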

Solution

A lightweight chat UI backed by a query service that routes questions to the appropriate data source, executes safe queries, and returns summaries with computed results, all within private infrastructure.
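The observability pipeline (step 5 of the approach) logs each prompt, the generated query, and the outcome for later evaluation. A rough sketch of what one trace record might look like follows; every field name here is a hypothetical, not the project's actual schema:

```python
# Illustrative trace record for a prompt/query/outcome log (assumed schema).
import json
import uuid
from dataclasses import dataclass, asdict

@dataclass
class QueryTrace:
    """One entry in a hypothetical NLQ observability log."""
    trace_id: str
    question: str       # the user's natural-language question
    generated_sql: str  # the exact query shown back for explainability
    source: str         # which storage layer served the answer
    latency_ms: float   # end-to-end response time
    success: bool       # did the query execute on the first pass?

def log_trace(question: str, sql: str, source: str,
              latency_ms: float, success: bool) -> str:
    """Serialize a trace as one JSON line, as a trace store might ingest it."""
    trace = QueryTrace(str(uuid.uuid4()), question, sql,
                       source, latency_ms, success)
    return json.dumps(asdict(trace))

line = log_trace("avg latency by region", "SELECT ...", "federated", 812.5, True)
print(line)
```

Aggregating the `success` and `latency_ms` fields over such records is what makes metrics like the 90–95% first-pass query success rate measurable.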

Outcomes

  • Delivered near real-time analytics without a separate warehouse or external LLM APIs.
  • Reduced analysis time from hours of exports to minutes of Q&A.
  • Improved trust by exposing the exact query and result context.

Key Metrics

Median response time
6-10 sec
End-to-end question to answer.
Query success rate
90-95%
Valid queries on first pass.
Data freshness
<2 min
Ingestion to availability.
Data scale
5-8B events
Rolling 12-month window.
Model size
70B params
Fine-tuned for analytics NLQ.

Timeline

Schema mapping
Jan 2026
Metrics, dimensions, and examples.
Dual-source NLQ
Jan 2026
Multi-source query routing.
Private inference
Jan 2026
Self-hosted model deployment on local GPUs.
Evaluation loop
Jan 2026–Present
Observability traces + tuning.

Challenges

  • Preventing unsafe or expensive queries from untrusted prompts.
  • Handling ambiguous questions across two data sources.
  • Balancing explainability with concise answers in a chat UI.
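The first challenge, blocking unsafe or expensive queries generated from untrusted prompts, is typically handled by validating SQL before execution. A minimal sketch under assumed rules (read-only statements, a hard row cap) is below; the patterns and limit are illustrative, not the project's actual guardrails:

```python
# Hypothetical SQL guardrail: reject mutating statements and cap result size.
import re

ALLOWED_START = re.compile(r"^\s*SELECT\b", re.IGNORECASE)
FORBIDDEN = re.compile(r"\b(DROP|DELETE|UPDATE|INSERT|ALTER|GRANT)\b", re.IGNORECASE)
MAX_ROWS = 10_000  # assumed cap on rows a generated query may return

def validate_query(sql: str) -> str:
    """Return a safe version of a model-generated query, or raise."""
    if not ALLOWED_START.match(sql) or FORBIDDEN.search(sql):
        raise ValueError("only read-only SELECT queries are allowed")
    # Append a LIMIT if the model did not supply one.
    if not re.search(r"\bLIMIT\s+\d+\s*;?\s*$", sql, re.IGNORECASE):
        sql = sql.rstrip().rstrip(";") + f" LIMIT {MAX_ROWS}"
    return sql

print(validate_query("SELECT region, AVG(latency) FROM events GROUP BY region"))
```

Production guardrails usually go further, parsing the SQL into an AST and estimating query cost against table statistics, but the shape is the same: validate and constrain the generated query before it ever touches the data.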

Project Information

Start: December 2025
End: January 2026
Duration: one month
Technologies: 15 (private)
Images: 1 available

Technologies Used

Private stack – contact for info