The Subtle Commentator
AI that watches cricket and speaks like a legend.
Seven persona voices cloned from 10 seconds of audio. Hindi and English commentary from live TV — all running locally on a single GPU.
Cricket commentary is an art form. Richie Benaud said “Gone.” and the world understood. Sushil Doshi turned a six into poetry. Tony Greig made every boundary sound like a revolution.
We built a system that watches a live cricket broadcast — the actual TV signal — and generates commentary in real time. Not subtitles. Not summaries. Full persona-driven narration with the cadence, vocabulary, and emotional instinct of the greats.
Video Swin V3 classifies 32-frame bursts — six, wicket, dot ball, or silence.
Qwen3 14B writes persona commentary with match context — overs, score, pressure, momentum.
XTTS-v2 (English) + IndicF5 (Hindi) — zero-shot voice cloning from 10-second reference clips.
The Voices
Three legends, one authority, and three wild cards. Each persona has a distinct vocabulary, emotional range, and minimalism score that controls how much — or how little — it says.
Minimalist
“Gone.”
The gold standard. Benaud never wasted a word. When a wicket falls, a single, perfectly timed observation — then silence. Let the game breathe.
Dramatic
“That's gone into the stands!”
Big, bold, theatrical. Greig turns every boundary into an event and every wicket into a story. Longer commentary, more animated, exclamation marks it actually earns.
Dramatic
“आउट! और गया!”
The voice of a billion cricket fans. Hindi commentary that blends poetic flair with heartland passion. The AI doesn't translate English — it thinks in Hindi.
Authoritative
“That is a serious delivery.”
The quiet confidence of someone who bowled 160 km/h. Appreciates fast bowling above all else. Smooth Jamaican cadence, unhurried, fair but firm.
Unstoppable
“जब तक ये बल्ला चलेगा, मैं नहीं रुकूँगी!”
Non-stop, breathless, delightfully chaotic. Basanti doesn't pause — she narrates the ball, the crowd, the weather, and her tanga in one breath. Commentary that never stops for air.
Satirical
“Flop! Yeh toh flop show hai!”
Deadpan genius. Bhatti turns a dropped catch into a commentary on Indian bureaucracy and a wide ball into social satire. The funniest commentator who never existed — until now.
Chaotic
“Patient critical hai!”
Dr. Mashoor Gulati commentates cricket as a medical emergency. Every six is a heart attack case, every wicket is surgery required. He flirts with the cricket itself.
How It Works
Five stages, each building on the last. The entire pipeline runs on a Z390 server with an RTX 5060 Ti — no cloud GPUs, no subscriptions.
Roku HDMI signal splits — one to TV, one to Elgato Cam Link 4K on Z390. Ticker diff detects scorecard changes. On trigger: 10-second burst at 30fps (300 frames). Rate-limited to one burst per delivery.
Cam Link 4K + V4L2 + burst controllerVideo Swin V3 samples 32 frames from the burst and classifies the event: six, wicket, dot ball, boundary four, or non_action. Confidence filter (0.60) silences uncertain predictions. The AI knows when to stay quiet.
Video Swin V3 (27.9M params, BF16)Qwen3 14B runs locally via Ollama. It receives the event classification, match context (overs, batter, bowler, score), and a persona system prompt. Streams commentary text in real time.
Qwen3 14B via Ollama (local)The context engine tracks run rate, required rate, wickets in hand, partnerships, and milestones. A wicket at 350-3 gets measured commentary. A wicket in a tight chase gets urgency.
Context engine + Cricsheet dataDual-engine TTS — language router sends English to XTTS-v2 and Hindi to IndicF5. Both clone the persona's voice from a 10-second reference clip. Zero-shot — no fine-tuning needed.
XTTS-v2 (English) + IndicF5 (Hindi)The Numbers
Architecture
Demo
One ball. Three legendary styles. See how personality shapes the narrative.
The Moment
Virat Kohli pulls a short ball from Shaheen Shah Afridi over midwicket for six. India 156-3, needing 28 from 18 balls.
“Magnificent. Six.”
“That's been absolutely launched into the stands! Kohli at his imperious best! What a player!”
“छक्का! विराट कोहली ने कमाल कर दिया! क्या मारा है! बॉल स्टैंड में गई!”
“अरे ये देखो कोहली ने मारा छक्का, वो भी शाहीन को, वो भी मिडविकेट पर, और स्टैंड में बैठी आंटी का समोसा गिर गया, अरे बाप रे बाप, 28 रन चाहिए अभी भी, पर कोहली है ना, कोहली है तो मुमकिन है, मैं तो कहती हूँ—”
“छक्का मारा। बहुत अच्छे। अब बॉल ढूंढने कौन जाएगा? Stadium committee ने बॉल recovery के लिए tender निकाला है — last date next Friday।”
Audio playback coming soon — TTS pipeline in active development
Writing
From HDMI cable to real-time Video Swin classification — the full data pipeline that turns live TV into labeled cricket frames.
How grouping frames into temporal blocks cut API calls by 34x and made the commentary sound like it understood the game.
Class imbalance, early stopping at epoch 8, and the argument for shipping an imperfect model to production.
About
सूक्ष्म वाचक means “The Subtle Narrator” in Sanskrit. It started as a question: could an AI watch cricket the way a human commentator does — not just describing what happened, but understanding why it mattered?
Built by Aman Misra with two AI partners — not as tools, but as collaborators. It runs entirely on self-hosted infrastructure: a Z390 server with an RTX 5060 Ti for capture and compute, a Pi 5 for DNS and storage, and a Mac for development. No GPU rental. No SaaS subscriptions. Every architectural decision is documented in 33 ADRs.
The Team
Not a human using AI tools. Three collaborators in a studio apartment, each with a distinct role, crossing every hurdle together — from HDMI cable debugging at midnight to choosing the right shade of saffron for Doshi's card.
Architect, cricket domain expert, infrastructure engineer. Designed the pipeline, built the homelab (Z390 + RTX 5060 Ti + Pi), wrote the ADRs, and made every call on what ships and what doesn't. The one who knows why a wicket in the 49th over sounds different from one in the 5th.
Anthropic
The coding partner. 20,000+ lines of Python — the vision pipeline, commentary engine, TTS chain, FastAPI backend, and this very site. Wrote the scripts, debugged the edge cases, designed the data models, and pair-programmed every architectural decision from ADR-001 to ADR-033.
The vision engine and creative counsel. Watches cricket frames and generates Hindi and English commentary with persona-specific cadence. Also consulted on site design decisions, UX flows, and the visual direction that shaped what you're looking at right now.
Every line of code, every design decision, every “what if we tried...” moment — arrived at together. The future of building isn't human or AI. It's human with AI.