60dB delivers the lowest Word Error Rate on Hindi
Across 9,997 Hindi clips and 122,747 reference words spanning read, synthetic, conversational and noisy speech, 60dB achieves the lowest overall WER. It also ranks #1 on real-world conversational Hindi and places first on 5 of the 6 datasets (4 of 5 when FLEURS is excluded).
Generated 2026-05-31 · Lower WER is better
Overall ranking
Overall: primary ranking
Five datasets (FLEURS excluded for fairness)
| # | Provider | WER | Accuracy |
|---|---|---|---|
| 🥇 | 60dB (HTTP / batch) | 12.95% | 87.05% |
| 🥈 | Ringg AI | 14.92% | 85.08% |
| 🥉 | 60dB (WebSocket / streaming) | 15.65% | 84.35% |
| 4 | ElevenLabs | 15.74% | 84.26% |
| 5 | Deepgram | 20.69% | 79.31% |
| 6 | Sarvam AI | 22.16% | 77.84% |
Overall: all six datasets
Including FLEURS
| # | Provider | WER | Accuracy |
|---|---|---|---|
| 🥇 | 60dB (HTTP / batch) | 12.96% | 87.04% |
| 🥈 | 60dB (WebSocket / streaming) | 15.49% | 84.51% |
| 🥉 | Ringg AI | 15.82% | 84.18% |
| 4 | ElevenLabs | 16.66% | 83.34% |
| 5 | Deepgram | 21.46% | 78.54% |
| 6 | Sarvam AI | 23.16% | 76.84% |
Why two tables? The FLEURS subset's pre-computed vendor columns contain data-quality artifacts (invalid-word placeholders) that inflate competitor error rates, so our primary ranking excludes it for fairness. 60dB leads both ways.
Results by dataset
Six public Hindi datasets covering read speech, synthetic audio, and conversational speech with and without noise.
Common Voice
Read speech
| # | Provider | WER |
|---|---|---|
| 🥇 | ElevenLabs | 15.23% |
| 🥈 | Ringg AI | 16.01% |
| 🥉 | 60dB (HTTP / batch) | 17.72% |
| 4 | 60dB (WebSocket / streaming) | 20.21% |
| 5 | Deepgram | 21.56% |
| 6 | Sarvam AI | 23.21% |
FLEURS
Read speech
| # | Provider | WER |
|---|---|---|
| 🥇 | 60dB (HTTP / batch) | 13.09% |
| 🥈 | 60dB (WebSocket / streaming) | 13.72% |
| 🥉 | Ringg AI | 25.62% |
| 4 | ElevenLabs | 26.79% |
| 5 | Deepgram | 29.91% |
| 6 | Sarvam AI | 34.19% |
IndicTTS
Synthetic
| # | Provider | WER |
|---|---|---|
| 🥇 | 60dB (HTTP / batch) | 11.51% |
| 🥈 | Ringg AI | 11.83% |
| 🥉 | 60dB (WebSocket / streaming) | 11.87% |
| 4 | ElevenLabs | 13.87% |
| 5 | Deepgram | 15.16% |
| 6 | Sarvam AI | 23.92% |
Kathbath
Conversational
| # | Provider | WER |
|---|---|---|
| 🥇 | 60dB (HTTP / batch) | 12.83% |
| 🥈 | Ringg AI | 13.08% |
| 🥉 | 60dB (WebSocket / streaming) | 15.20% |
| 4 | ElevenLabs | 15.56% |
| 5 | Deepgram | 17.80% |
| 6 | Sarvam AI | 23.01% |
Kathbath-noisy
Conversational + noise
| # | Provider | WER |
|---|---|---|
| 🥇 | 60dB (HTTP / batch) | 14.14% |
| 🥈 | Ringg AI | 14.39% |
| 🥉 | ElevenLabs | 15.38% |
| 4 | 60dB (WebSocket / streaming) | 16.43% |
| 5 | Deepgram | 19.04% |
| 6 | Sarvam AI | 23.74% |
MUCS
Conversational
| # | Provider | WER |
|---|---|---|
| 🥇 | 60dB (HTTP / batch) | 10.90% |
| 🥈 | 60dB (WebSocket / streaming) | 14.09% |
| 🥉 | Ringg AI | 15.78% |
| 4 | ElevenLabs | 16.22% |
| 5 | Sarvam AI | 20.60% |
| 6 | Deepgram | 22.71% |
The datasets
| Dataset | Type | Clips | Source |
|---|---|---|---|
| Common Voice | Read speech | 1,727 | SkunkWorkLabs/hindi-asr-benchmark |
| FLEURS | Read speech | 417 | SkunkWorkLabs/hindi-asr-benchmark |
| IndicTTS | Synthetic | 98 | SkunkWorkLabs/hindi-asr-benchmark |
| Kathbath | Conversational | 1,929 | RinggAI/ASR-Benchmarking-Dataset |
| Kathbath-noisy | Conversational + noise | 1,929 | RinggAI/ASR-Benchmarking-Dataset |
| MUCS | Conversational | 3,897 | RinggAI/ASR-Benchmarking-Dataset |
| Total | | 9,997 | |
Methodology
- Datasets: Hindi eval splits of SkunkWorkLabs/hindi-asr-benchmark and RinggAI/ASR-Benchmarking-Dataset, six subsets in total (Common Voice, FLEURS, IndicTTS, Kathbath, Kathbath-noisy, MUCS).
- Sample: every record in each dataset (9,997 clips total, no sampling).
- Metric: Word Error Rate (WER), lower is better, aggregated by total reference words (micro-average) so longer clips contribute proportionally.
- 60dB: transcribed live through our production APIs (both the HTTP/batch endpoint and the WebSocket/streaming endpoint) using the language hint hi,en.
- Other providers: word error rates are taken from the datasets' own pre-computed, normalized vendor columns (we did not re-run those services); they reflect each vendor's result at the time the dataset authors ran them.
- Coverage: 60dB (HTTP) 9,997/9,997; vendor columns 9,914 to 9,996 of 9,997 (a handful of blank cells per provider).
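To make the micro-average concrete, here is a minimal sketch of the metric as described above: word-level edit distance (substitutions, deletions, insertions) summed over all clips and divided by the total number of reference words. It is an illustration, not the scoring script used for this report; the function names and the toy clips are assumptions, and text normalization is left out entirely. A per-clip mean (macro-average) is included only for contrast.

```python
from typing import List, Tuple


def word_edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Levenshtein distance over words: substitutions, deletions, insertions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of an extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1]


def micro_wer(pairs: List[Tuple[str, str]]) -> float:
    """Total word errors divided by total reference words across all clips."""
    errors = 0
    ref_words = 0
    for reference, hypothesis in pairs:
        ref = reference.split()
        errors += word_edit_distance(ref, hypothesis.split())
        ref_words += len(ref)
    return errors / ref_words  # assumes at least one reference word


def macro_wer(pairs: List[Tuple[str, str]]) -> float:
    """Unweighted mean of per-clip WERs (not the aggregation used here)."""
    per_clip = [
        word_edit_distance(r.split(), h.split()) / len(r.split())
        for r, h in pairs
    ]
    return sum(per_clip) / len(per_clip)


# Toy (reference, hypothesis) pairs: one long clip and one single-word clip.
pairs = [
    ("a b c d e f g h", "a b c d e f g x"),  # 1 error in 8 words
    ("yes", "no"),                           # 1 error in 1 word
]
print(round(micro_wer(pairs), 4))  # 0.2222 (2 errors / 9 reference words)
print(round(macro_wer(pairs), 4))  # 0.5625 (mean of 0.125 and 1.0)
```

In the toy data the one-word clip dominates the macro-average, while the micro-average weights each clip by its length, which is the behaviour the Metric bullet describes.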
Notes & disclosures
1. Vendor numbers are dataset-provided, not produced by us, and may reflect older API versions of those services.
2. The RinggAI dataset is published by Ringg AI (a competitor's own benchmark), which can favour their reference conventions. 60dB still leads it overall.
3. FLEURS vendor columns are partly corrupted; we report it transparently but exclude it from the headline ranking.
4. Common Voice (clean read-speech) is our weakest subset. Raw WER there also penalises rendering convention: 60dB writes common English loanwords in Latin script (e.g. branch manager, image) where the reference uses Devanagari, so much of that gap is stylistic, not accuracy.
5. Streaming (WebSocket) vs batch (HTTP): the real-time streaming path scores about 2.5 to 2.7 points higher overall WER than batch (expected, since it commits words incrementally under latency constraints) yet still places second or third in both overall rankings.
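The rendering-convention effect described in the Common Voice note can be illustrated with a small sketch. It scores one made-up sentence twice: once as-is, and once after mapping Latin-script loanwords to Devanagari. The sentence, the `LOANWORDS` table and the `to_devanagari` helper are all hypothetical; neither the datasets' normalization nor any 60dB post-processing is specified in this report, so this shows the mechanism only.

```python
from typing import Dict, List


def wer(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            # deletion, insertion, then match (cost 0) or substitution (cost 1)
            curr[j] = min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (r != h))
        prev = curr
    return prev[-1] / len(ref)  # assumes a non-empty reference


# Hypothetical lookup from Latin-script loanwords to Devanagari spellings.
LOANWORDS: Dict[str, str] = {"branch": "ब्रांच", "manager": "मैनेजर"}


def to_devanagari(text: str) -> str:
    """Replace known Latin-script loanwords; leave every other word untouched."""
    return " ".join(LOANWORDS.get(word.lower(), word) for word in text.split())


reference = "मैं ब्रांच मैनेजर हूँ"     # made-up reference, loanwords in Devanagari
hypothesis = "मैं branch manager हूँ"  # same words, loanwords in Latin script

raw = wer(reference, hypothesis)
normalized = wer(reference, to_devanagari(hypothesis))
print(raw)         # 0.5: both loanwords count as substitutions
print(normalized)  # 0.0: identical once the script convention is aligned
```

Every word in the hypothesis is the right word, yet raw WER reports half the sentence as wrong, which is why a script mismatch can inflate WER without reflecting recognition errors.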
