The 10M-minute audit: what speech analytics platforms actually pay for STT
If your platform processes 10 million minutes a month, your STT line item ranges from $4,200 to $240,000 depending on the vendor. The full per-minute audit with sourced prices, and the line items inside the line item.
Speech analytics platforms tend to talk about volume in seat counts and revenue in ARR. The number that decides the gross margin lives in neither place. It's the per-minute STT cost at the platform's actual monthly throughput, and at production scale it spans almost two orders of magnitude (roughly 57×) across the vendors most teams evaluate.
The volume number nobody publishes
Public benchmarks rarely include 10M+ min/mo workloads because the vendors who can serve them sell on enterprise contracts and prefer not to anchor pricing in the open. The volume number itself isn't exotic: a single CI platform with 1,000 moderately-active seats hits it, and a contact-center QA tool running on a 5,000-agent client doubles past it. It's the threshold at which a 10× price spread stops being a footnote and starts being a board-level conversation.
At 10 million minutes per month, here is what the same workload — transcripts plus speaker diarization, delivered through the vendor's standard production endpoint — actually costs:
| Provider | Batch ($/min) | Real-time ($/min) | Notes |
|---|---|---|---|
| Orchard (production volume) | $0.00042 | $0.00042 | Bundle: STT + speaker diarization on the same per-minute rate. No per-feature add-ons. |
| OpenAI Whisper API | $0.0060 | — | Batch-only. Diarization is not a first-party feature. Real-time requires a separate vendor. |
| Deepgram Nova-3 | $0.0043 | $0.0077 | STT only. Diarization, sentiment, redaction, summarization each priced separately. |
| AssemblyAI Universal | $0.0117 | $0.0150 | Bundles speaker labels at the listed rate. Sentiment, entity detection, content moderation are paid add-ons. |
| Azure AI Speech | $0.0167 | $0.0167 | Standard tier. Conversation transcription with speaker separation listed as the same rate. |
| AWS Transcribe | $0.0240 | $0.0240 | Standard tier. Call Analytics (sentiment + categories) prices at $0.0365/min for the first 250K min. |
| Google Cloud STT | $0.0240 | $0.0240 | Standard model. Speaker diarization is bundled at the rate above; Conversation Insights is a separate API. |
Multiplied across 10M minutes, the monthly bills land at:
- Orchard — $4,200 / month
- Deepgram Nova-3 — $43,000 / month (before diarization add-on)
- OpenAI Whisper API — $60,000 / month (batch only)
- AssemblyAI Universal — $117,000 / month
- Azure AI Speech — $167,000 / month
- AWS Transcribe — $240,000 / month (standard; Call Analytics is materially higher)
- Google Cloud STT — $240,000 / month
At 10M minutes a month, picking the wrong STT vendor is a $2.8M/year mistake.
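The bills above are plain multiplication, and a short script makes them easy to re-run at your own volume. The sketch below assumes the batch rates from the table and a flat 10M minutes with no volume tiers or add-ons; the names `BATCH_RATES` and `monthly_bill` are illustrative, not part of any vendor SDK.

```python
from decimal import Decimal

MINUTES_PER_MONTH = 10_000_000

# Batch $/min, copied from the table above. Decimal avoids float rounding
# noise on prices this small.
BATCH_RATES = {
    "Orchard": Decimal("0.00042"),
    "Deepgram Nova-3": Decimal("0.0043"),
    "OpenAI Whisper API": Decimal("0.0060"),
    "AssemblyAI Universal": Decimal("0.0117"),
    "Azure AI Speech": Decimal("0.0167"),
    "AWS Transcribe": Decimal("0.0240"),
    "Google Cloud STT": Decimal("0.0240"),
}


def monthly_bill(rate_per_min, minutes=MINUTES_PER_MONTH):
    """List-price monthly bill: rate times minutes, nothing else."""
    return rate_per_min * minutes


bills = {vendor: monthly_bill(rate) for vendor, rate in BATCH_RATES.items()}

# Print cheapest first, matching the order of the list above.
for vendor, bill in sorted(bills.items(), key=lambda kv: kv[1]):
    print(f"{vendor:<22} ${bill:>10,.0f} / month")

# Annualized gap between the most and least expensive rows.
annual_gap = (max(bills.values()) - min(bills.values())) * 12
print(f"Annual gap, highest vs lowest: ${annual_gap:,.0f}")
```

Changing `MINUTES_PER_MONTH` to your own throughput is the only edit needed to reproduce the comparison at a different scale.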
The line items inside the line item
The per-minute number is rarely the per-minute number. Three line items move it materially in either direction depending on the vendor, and most internal audits miss at least one:
- Diarization billed separately. Deepgram, Rev.ai, Gladia, and Google CCAI all charge for speaker labels as an add-on. At call-center volume that's typically a 20-50% uplift on the published STT rate. Orchard bundles diarization at the same per-minute price; the rate you read is the rate you pay.
- Intelligence stacking. Sentiment, topic detection, entity extraction, summarization, redaction — each is a separate per-minute charge on AssemblyAI and AWS Transcribe Call Analytics. A "$0.0117/min" listed price ships at 3-4× that on the actual invoice once a real QA pipeline is wired up.
- Real-time premium. AssemblyAI, Rev.ai, and Deepgram all price real-time roughly 30-200% above batch. If your QA pipeline mixes both (live agent assist + recorded batch review), the blended rate doesn't match either column on the pricing page.
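To see how these three items compound, here is a minimal sketch of an effective-rate calculation. The batch and real-time rates are the Deepgram Nova-3 figures from the table; the 30% real-time share and the 35% diarization uplift are hypothetical placeholders (the uplift sits inside the 20-50% range quoted above), and applying the uplift to the blended rate is a simplifying assumption. Substitute the numbers from your own invoice.

```python
from decimal import Decimal


def blended_rate(batch_rate, realtime_rate, realtime_share):
    """Volume-weighted $/min for a pipeline that mixes batch and real-time."""
    return batch_rate * (1 - realtime_share) + realtime_rate * realtime_share


def with_uplift(rate, uplift):
    """Apply a fractional add-on uplift (0.35 means +35%) to a $/min rate."""
    return rate * (1 + uplift)


# Rates from the table above (Deepgram Nova-3).
batch = Decimal("0.0043")
realtime = Decimal("0.0077")

# Hypothetical inputs: replace with your own traffic mix and add-on pricing.
realtime_share = Decimal("0.30")      # assumed: 30% of minutes are live
diarization_uplift = Decimal("0.35")  # assumed: within the 20-50% range

blended = blended_rate(batch, realtime, realtime_share)
effective = with_uplift(blended, diarization_uplift)

minutes = 10_000_000
print(f"List batch rate:  ${batch}/min -> ${batch * minutes:,.0f} / month")
print(f"Blended rate:     ${blended.normalize()}/min")
print(f"Effective rate:   ${effective.normalize()}/min -> ${effective * minutes:,.0f} / month")
```

Intelligence add-ons can be layered on with further `with_uplift` calls once you know each feature's per-minute charge.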
What changes at $4,200/mo
A 10M-minute platform spending $240K/month on AWS is spending $2.88M/year on a function that costs about $50K/year ($4,200 × 12) at the lowest rate in the table. The delta, $2.83M, is not a savings line. It's a re-deployable budget the size of a small engineering team's annual cost. The math doesn't ask whether the migration is worth it. It asks how fast you can ship the swap.
Running the audit yourself
In our experience, the cleanest version of this audit is four rows on a single spreadsheet:
- Last 6 months of actual minute volume (invoice or usage console).
- Last 6 months of actual STT spend across every line item: base STT, diarization, intelligence, real-time premium.
- Blended price per minute — divide spend by volume; this is the only number that matters for the comparison.
- Re-price that same volume at $0.00042/min and read the delta off the bottom-right cell.
That spreadsheet is the entire business case. The hard part of the migration is the spreadsheet, not the code; the code is one line.
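For teams that would rather script the four rows than build them by hand, the audit can be sketched as below. The six months of minutes and spend are made-up placeholder figures, not data from any real platform, and `audit` is an illustrative helper name; only the $0.00042/min target rate comes from the article.

```python
from decimal import Decimal

TARGET_RATE = Decimal("0.00042")  # $/min, the re-pricing rate used above


def audit(monthly_minutes, monthly_spend, target_rate=TARGET_RATE):
    """Return the four audit rows for matching lists of minutes and spend."""
    total_minutes = sum(monthly_minutes)
    total_spend = sum(monthly_spend, Decimal("0"))
    blended = total_spend / total_minutes   # row 3: spend divided by volume
    repriced = target_rate * total_minutes  # row 4: same volume, new rate
    return {
        "total_minutes": total_minutes,
        "total_spend": total_spend,
        "blended_rate": blended,
        "repriced_spend": repriced,
        "delta": total_spend - repriced,
    }


# Placeholder data: replace with your last six months of invoices.
minutes = [9_600_000, 9_800_000, 10_100_000, 10_000_000, 10_300_000, 10_200_000]
# Spend must include every line item: base STT, diarization, intelligence,
# and any real-time premium.
spend = [Decimal(x) for x in
         ("231000", "236000", "243000", "240000", "247000", "243000")]

result = audit(minutes, spend)
print(f"Minutes (6 mo):   {result['total_minutes']:,}")
print(f"Spend (6 mo):     ${result['total_spend']:,.0f}")
print(f"Blended $/min:    ${result['blended_rate']}")
print(f"Re-priced spend:  ${result['repriced_spend']:,.0f}")
print(f"Delta:            ${result['delta']:,.0f}")
```

The blended rate is computed from totals rather than averaged month by month, so high-volume months carry their proper weight.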