The hidden STT tax breaking conversation intelligence margins
Conversation intelligence looks like SaaS on the pitch deck and bleeds like usage-priced infra on the P&L. The math behind the floor most CI buyers never benchmark — and the API layer that lifts it.
Conversation intelligence (CI) is one of the most attractive software categories of the decade: sticky, expensive, and sold to revenue leaders who feel the pain every quarter. It also has a margin floor almost nobody benchmarks, hidden inside a line item the buyer never sees on the invoice: speech-to-text (STT).
A platform charging $1,500 per seat per year on a 1,000-seat contract is booking $1.5M ARR. That number looks SaaS-shaped. The COGS underneath it is not.
The margin math the pitch deck skips
Each seat processes, on average, eight to twelve hours of recorded conversation per month. At ten hours per seat per month across 1,000 seats, that is 10,000 hours a month, or 120,000 hours of audio per year (about 7.2 million minutes), just to produce the transcripts, speaker labels, and downstream intelligence the seat license is sold on.
Run that volume through any of the major STT providers at their listed prices and you can read the floor straight off the calculator:
- AWS Transcribe: $0.024 / min × 7.2M min = $172,800 / year
- AssemblyAI Universal + intelligence: ~$0.0162 / min × 7.2M min = $116,640 / year
- Deepgram Nova-3 + diarization: ~$0.0089 / min × 7.2M min = $64,080 / year
- OpenAI Whisper API: $0.006 / min × 7.2M min = $43,200 / year
Even at Deepgram's price, generally the cheapest of the intelligence-bundled names (the Whisper API line is lower, but as listed it carries no diarization), you've just handed back $64K of gross margin per 1,000 seats per year. On a $1.5M ARR contract that's a structural drag of about 4.3% of revenue that the finance team doesn't get to negotiate down. It's per-minute, contractual, and indexed to platform stickiness: the more your customers use you, the worse it gets.
Conversation intelligence is a SaaS layer sitting on a usage-priced floor that compounds against you.
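The arithmetic above is easy to re-run against your own seat count and usage. Here is a minimal Python sketch, assuming the figures used in this article (10 hours per seat per month, $1,500 per seat per year, and the list rates quoted above). The function and variable names are illustrative only, not any vendor's API.

```python
# Inputs taken from the article; swap in your own contract numbers.
HOURS_PER_SEAT_PER_MONTH = 10
SEATS = 1_000
PRICE_PER_SEAT_PER_YEAR = 1_500  # USD

# List prices in USD per minute, as quoted in the article.
LIST_RATES_PER_MIN = {
    "AWS Transcribe": 0.024,
    "AssemblyAI Universal + intelligence": 0.0162,
    "Deepgram Nova-3 + diarization": 0.0089,
    "OpenAI Whisper API": 0.006,
}


def annual_minutes(seats, hours_per_seat_per_month):
    """Minutes of audio processed per year across all seats."""
    return seats * hours_per_seat_per_month * 12 * 60


def stt_drag(seats, rate_per_min,
             hours_per_seat_per_month=HOURS_PER_SEAT_PER_MONTH,
             price_per_seat_per_year=PRICE_PER_SEAT_PER_YEAR):
    """Return (annual STT cost in USD, that cost as a fraction of ARR)."""
    cost = annual_minutes(seats, hours_per_seat_per_month) * rate_per_min
    arr = seats * price_per_seat_per_year
    return cost, cost / arr


if __name__ == "__main__":
    for name, rate in LIST_RATES_PER_MIN.items():
        cost, share = stt_drag(SEATS, rate)
        print(f"{name}: ${cost:,.0f} / year ({share:.1%} of ARR)")
```

Because both cost and ARR scale with seat count, the share of ARR does not change as you add seats. It moves only when hours per seat, price per seat, or the per-minute rate moves.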
Why this is structural, not negotiable
The temptation in any CI war room is to push back on the STT bill: "we'll negotiate a volume tier." That works for a while. The three largest STT vendors all offer enterprise commits and will discount 20-40% off list at meaningful volume. The math improves; the shape doesn't:
- The discount is locked to commit. If usage dips below the commit — a churned customer, a quiet quarter — you pay for capacity you didn't burn.
- The discount is off list, and list prices drift up as the vendor stacks new feature add-ons (sentiment, custom vocabulary, redaction). The discount percentage is capped; the list price is not.
- The discount is not portable. The day you want to renegotiate, your migration cost is your customers' downtime risk — and the vendor knows it.
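To put numbers on the first bullet, here is a rough sketch of a take-or-pay commit. The contract shape is an assumption: you pay for the larger of committed or used minutes, all at the discounted rate. The 30% discount is simply the midpoint of the 20-40% range mentioned above, and real enterprise terms will differ.

```python
def committed_bill(used_minutes, committed_minutes, list_rate, discount):
    """Annual bill under a hypothetical take-or-pay commit.

    Assumption: unused committed minutes are still paid for, and any
    overage is billed at the same discounted rate.
    """
    discounted_rate = list_rate * (1 - discount)
    billable_minutes = max(used_minutes, committed_minutes)
    return billable_minutes * discounted_rate


def effective_rate(used_minutes, committed_minutes, list_rate, discount):
    """What each minute actually used ends up costing, in USD."""
    bill = committed_bill(used_minutes, committed_minutes, list_rate, discount)
    return bill / used_minutes


LIST_RATE = 0.0089   # USD per minute, the Deepgram figure from the article
COMMIT = 7_200_000   # minutes per year, the 1,000-seat workload
DISCOUNT = 0.30      # hypothetical negotiated discount

if __name__ == "__main__":
    for used in (7_200_000, 6_000_000, 5_000_000):
        bill = committed_bill(used, COMMIT, LIST_RATE, DISCOUNT)
        rate = effective_rate(used, COMMIT, LIST_RATE, DISCOUNT)
        print(f"{used:,} min used: ${bill:,.0f} bill, ${rate:.5f} per used minute")
```

Under these assumptions the bill is flat once usage falls below the commit, so the effective rate climbs as usage drops. It crosses back above list price when usage falls below (1 - discount) of the commit, which here is about 5.04M minutes.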
None of this is vendor villainy. It's the natural shape of usage-priced infrastructure sitting underneath a flat-rate SaaS contract. The only way out is to change the unit economics of the input itself.
Lifting the floor
Orchard's per-minute rate for the same workload — transcription plus speaker diarization, the bundle CI platforms actually consume — is $0.00042 per minute at production volume. Run the same 7.2M minutes through it and the line item drops from $64K to $3,024 per year.
That is roughly a 95% reduction, which is less a saving than a category change. A 4.3% margin drag becomes a rounding error: about 0.2% of the same $1.5M ARR. On the next 10× of growth (10,000 seats, 72M minutes) Deepgram's bill scales linearly to about $640K per year while Orchard's scales to about $30K. The delta of roughly $610K funds two and a half engineers, or buys back the unit-economics narrative your board diligence keeps poking at.
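The scaling comparison can be checked the same way. This is a short sketch using only the two per-minute rates quoted in this article and the 10-hours-per-seat-per-month assumption. It models flat per-minute pricing with no volume tiers or commits, which is a simplification.

```python
DEEPGRAM_RATE = 0.0089   # USD per minute, list price quoted in the article
ORCHARD_RATE = 0.00042   # USD per minute, rate quoted in the article

# 10 hours per seat per month, 12 months, 60 minutes per hour.
MINUTES_PER_SEAT_PER_YEAR = 10 * 12 * 60


def annual_bill(seats, rate_per_min):
    """Annual STT bill in USD at a flat per-minute rate."""
    return seats * MINUTES_PER_SEAT_PER_YEAR * rate_per_min


if __name__ == "__main__":
    for seats in (1_000, 10_000):
        deepgram = annual_bill(seats, DEEPGRAM_RATE)
        orchard = annual_bill(seats, ORCHARD_RATE)
        print(f"{seats:,} seats: Deepgram ${deepgram:,.0f}, "
              f"Orchard ${orchard:,.0f}, delta ${deepgram - orchard:,.0f}")
```

Both bills grow linearly with seats, so the ratio between them stays fixed while the dollar gap widens tenfold.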
Who this matters for
If you're a CI platform between $1M and $50M ARR, the STT line item is somewhere between 3% and 8% of revenue depending on how many intelligence features you've bolted on. Rip out that drag and you've added a year of runway to the next fundraise without renegotiating headcount, churn or contract length.
If you're under $1M ARR, the dollar amount is smaller but the shape is the same, and it is easier to fix early than late. As sticky customers drive usage up, the bill grows with them, and switching vendors later will carry much more migration risk than it does today.
Either way: the floor isn't immovable. It's a vendor choice.