सूक्ष्म वाचक

The Subtle Commentator

AI that watches cricket and speaks like a legend.

Seven persona voices cloned from 10 seconds of audio. Hindi and English commentary from live TV — all running locally on a single GPU.

What if your TV could commentate in any voice?

Cricket commentary is an art form. Richie Benaud said “Gone.” and the world understood. Sushil Doshi turned a six into poetry. Tony Greig made every boundary sound like a revolution.

We built a system that watches a live cricket broadcast — the actual TV signal — and generates commentary in real time. Not subtitles. Not summaries. Full persona-driven narration with the cadence, vocabulary, and emotional instinct of the greats.

See

Video Swin V3 classifies 32-frame bursts — six, wicket, dot ball, or silence.

Understand

Qwen3 14B writes persona commentary with match context — overs, score, pressure, momentum.

Speak

XTTS-v2 (English) + IndicF5 (Hindi) — zero-shot voice cloning from 10-second reference clips.

The Voices

Seven Voices. One AI.

Three legends, one authority, and three wild cards. Each persona has a distinct vocabulary, emotional range, and minimalism score that controls how much — or how little — it says.

English
0.95

Richie Benaud

Minimalist

Gone.

The gold standard. Benaud never wasted a word. When a wicket falls, a single, perfectly timed observation — then silence. Let the game breathe.

English
0.20

Tony Greig

Dramatic

That's gone into the stands!

Big, bold, theatrical. Greig turns every boundary into an event and every wicket into a story. Longer commentary, more animated, exclamation marks it actually earns.

हिंदी
0.30

Sushil Doshi

Dramatic

आउट! और गया!

The voice of a billion cricket fans. Hindi commentary that blends poetic flair with heartland passion. The AI doesn't translate English — it thinks in Hindi.

English
0.60

Michael Holding

Authoritative

That is a serious delivery.

The quiet confidence of someone who bowled 160 km/h. Appreciates fast bowling above all else. Smooth Jamaican cadence, unhurried, fair but firm.

Wild Cards
हिंदीwild
0.05

Basanti

Unstoppable

जब तक ये बल्ला चलेगा, मैं नहीं रुकूँगी!

Non-stop, breathless, delightfully chaotic. Basanti doesn't pause — she narrates the ball, the crowd, the weather, and her tanga in one breath. Commentary that never stops for air.

हिंदीwild
0.50

Jaspal Bhatti

Satirical

Flop! Yeh toh flop show hai!

Deadpan genius. Bhatti turns a dropped catch into a commentary on Indian bureaucracy and a wide ball into social satire. The funniest commentator who never existed — until now.

हिंदीwild
0.10

Sunil Grover

Chaotic

Patient critical hai!

Dr. Mashoor Gulati commentates cricket as a medical emergency. Every six is a heart attack case, every wicket is surgery required. He flirts with the cricket itself.

How It Works

From TV Signal to Voice

Five stages, each building on the last. The entire pipeline runs on a Z390 server with an RTX 5060 Ti — no cloud GPUs, no subscriptions.

01

Capture

Roku HDMI signal splits — one to TV, one to Elgato Cam Link 4K on Z390. Ticker diff detects scorecard changes. On trigger: 10-second burst at 30fps (300 frames). Rate-limited to one burst per delivery.

Cam Link 4K + V4L2 + burst controller
02

Classify

Video Swin V3 samples 32 frames from the burst and classifies the event: six, wicket, dot ball, boundary four, or non_action. Confidence filter (0.60) silences uncertain predictions. The AI knows when to stay quiet.

Video Swin V3 (27.9M params, BF16)
03

Understand

Qwen3 14B runs locally via Ollama. It receives the event classification, match context (overs, batter, bowler, score), and a persona system prompt. Streams commentary text in real time.

Qwen3 14B via Ollama (local)
04

Contextualize

The context engine tracks run rate, required rate, wickets in hand, partnerships, and milestones. A wicket at 350-3 gets measured commentary. A wicket in a tight chase gets urgency.

Context engine + Cricsheet data
05

Speak

Dual-engine TTS — language router sends English to XTTS-v2 and Hindi to IndicF5. Both clone the persona's voice from a 10-second reference clip. Zero-shot — no fine-tuning needed.

XTTS-v2 (English) + IndicF5 (Hindi)

The Numbers

Built on Real Data

1,018
Bursts captured
4 T20 WC matches, 30fps × 10s each
302
Dot balls
Most common — the model's anchor class
132
Sixes
0.86 confidence — the showstopper
56
Boundaries
0% accuracy — the Green on Green problem
54
Wickets
Merged bowled + caught → 13% accuracy
474
Non-action
0.97 confidence — zero hallucination on dead air
61.8%
V3 accuracy
Video Swin V3 validation
7
Personas
3 legends + 1 authority + 3 wild cards
16 GB
VRAM
RTX 5060 Ti — entire pipeline on one GPU
2
Languages
English (XTTS-v2) + Hindi (IndicF5)
10s
Voice cloning
Reference clip per persona — zero-shot
$0
Cloud cost
Captured, labeled, trained, deployed — all local

Architecture

Under the Hood

CaptureOpenCV + ffmpeg
CaptureCam Link 4K + burst controller
VisionVideo Swin V3
LLMQwen3 14B via Ollama
TTS (English)XTTS-v2
TTS (Hindi)IndicF5 (AI4Bharat)
FrontendNext.js 16 + Tailwind
InfraZ390 + RTX 5060 Ti + Pi #1
DataHuggingFace + Cricsheet
Monitoringntfy + Grafana + n8n

Demo

Same Moment. Three Voices.

One ball. Three legendary styles. See how personality shapes the narrative.

The Moment

Virat Kohli pulls a short ball from Shaheen Shah Afridi over midwicket for six. India 156-3, needing 28 from 18 balls.

Richie Benaud

English

Magnificent. Six.

Measured. One word carries everything.~2 seconds

Tony Greig

English

That's been absolutely launched into the stands! Kohli at his imperious best! What a player!

Every boundary is an event.~5 seconds

Sushil Doshi

हिंदी

छक्का! विराट कोहली ने कमाल कर दिया! क्या मारा है! बॉल स्टैंड में गई!

Heartland passion. The AI thinks in Hindi.~4 seconds

Basanti

हिंदी

अरे ये देखो कोहली ने मारा छक्का, वो भी शाहीन को, वो भी मिडविकेट पर, और स्टैंड में बैठी आंटी का समोसा गिर गया, अरे बाप रे बाप, 28 रन चाहिए अभी भी, पर कोहली है ना, कोहली है तो मुमकिन है, मैं तो कहती हूँ—

Does not stop. Will not stop. Cannot stop.~8 seconds

Jaspal Bhatti

हिंदी

छक्का मारा। बहुत अच्छे। अब बॉल ढूंढने कौन जाएगा? Stadium committee ने बॉल recovery के लिए tender निकाला है — last date next Friday।

Deadpan satire wrapped in cricket.~6 seconds

Audio playback coming soon — TTS pipeline in active development

About

The Project

सूक्ष्म वाचक means “The Subtle Narrator” in Sanskrit. It started as a question: could an AI watch cricket the way a human commentator does — not just describing what happened, but understanding why it mattered?

Built by Aman Misra with two AI partners — not as tools, but as collaborators. It runs entirely on self-hosted infrastructure: a Z390 server with an RTX 5060 Ti for capture and compute, a Pi 5 for DNS and storage, and a Mac for development. No GPU rental. No SaaS subscriptions. Every architectural decision is documented in 33 ADRs.

The Team

One Human. Two AIs.

Not a human using AI tools. Three collaborators in a studio apartment, each with a distinct role, crossing every hurdle together — from HDMI cable debugging at midnight to choosing the right shade of saffron for Doshi's card.

Human

Aman Misra

Architect, cricket domain expert, infrastructure engineer. Designed the pipeline, built the homelab (Z390 + RTX 5060 Ti + Pi), wrote the ADRs, and made every call on what ships and what doesn't. The one who knows why a wicket in the 49th over sounds different from one in the 5th.

  • System architecture & 64 ADRs
  • 1K+ bursts captured and labeled
  • Self-hosted infra on Raspberry Pi
  • Persona design & cricket domain
AI Partner

Claude Code

Anthropic

The coding partner. 20,000+ lines of Python — the vision pipeline, commentary engine, TTS chain, FastAPI backend, and this very site. Wrote the scripts, debugged the edge cases, designed the data models, and pair-programmed every architectural decision from ADR-001 to ADR-033.

  • 20K+ lines of Python & TypeScript
  • Pipeline architecture & debugging
  • This showcase site (Next.js)
  • Blog posts & documentation
AI Partner

Gemini Pro

Google

The vision engine and creative counsel. Watches cricket frames and generates Hindi and English commentary with persona-specific cadence. Also consulted on site design decisions, UX flows, and the visual direction that shaped what you're looking at right now.

  • Multimodal cricket commentary
  • Hindi persona narration
  • Design & UX consultation
  • Creative direction & ideation

Every line of code, every design decision, every “what if we tried...” moment — arrived at together. The future of building isn't human or AI. It's human with AI.