Pricing · June 28, 2026

The diarization tax: the silent line item killing call center analytics margins

Speaker diarization is the feature that turns transcription into analytics. Most STT vendors price it as an optional add-on, at uplifts that run from roughly 43% to 183% on base in the price lists below. Why it should be bundled, and what the math looks like when it is.

Mateo Bustamante · 5 min read

Call center analytics without speaker labels isn't analytics. It's a wall of text with no record of who said which sentence, and who said what is exactly the part of the call your QA team scores. So every analytics pipeline runs diarization. And nearly every major STT vendor prices it as if it were a luxury.

What it costs at the names you've heard of

Diarization add-on pricing across the public pages, as of June 2026:

Deepgram Nova-3 — base STT $0.0043/min; diarization adds $0.0040/min on top. That's a 93% uplift on the line item that powers your category.
Google Cloud STT — diarization is bundled at the base rate ($0.024/min), but Conversation Insights — the speaker-aware analytics layer most CCaaS teams actually buy — sits behind a separate API at a markup above that.
Rev.ai Reverb — base $0.0030/min; speakers $0.0055/min. That's a 183% uplift.
Gladia — Solo plan + diarization add-on lands at ~$0.0136/min, against ~$0.0095/min for base. Roughly a 43% uplift.
AssemblyAI Universal — speaker labels are included in the listed rate, but every adjacent intelligence feature (sentiment, entities, topics) prices separately, which is functionally the same problem moved one cell over.
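The uplift percentages above are the add-on price divided by the base price. A minimal sketch of that arithmetic, using only the per-minute figures quoted in this post (the function name `uplift_pct` is illustrative, and the Gladia add-on is assumed to be the difference between the two quoted rates):

```python
def uplift_pct(base_per_min: float, addon_per_min: float) -> float:
    """Add-on price expressed as a percentage of the base price."""
    return addon_per_min / base_per_min * 100


# Per-minute list prices (USD) as quoted in the list above.
deepgram = uplift_pct(0.0043, 0.0040)
rev_ai = uplift_pct(0.0030, 0.0055)
# Gladia is quoted as base vs. combined, so the add-on is taken as the difference.
gladia = uplift_pct(0.0095, 0.0136 - 0.0095)

print(f"Deepgram {deepgram:.0f}%, Rev.ai {rev_ai:.0f}%, Gladia {gladia:.0f}%")
```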

Charging extra for speaker diarization inside a call analytics pipeline is like charging extra for headers inside an email client.

Why it's priced this way

Diarization is its own model. It runs after — or in parallel with — the acoustic STT model, performs speaker embedding, clusters, and aligns the cluster labels back to the transcript. It has its own compute footprint and its own accuracy frontier. Vendors price it separately because they can charge for it separately, and because separating it makes the headline STT rate look more competitive than it actually is.
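To make the last step concrete, here is a rough sketch of aligning speaker labels back to a transcript. It assumes the embedding and clustering stages have already produced timed speaker turns, and that the STT model returns word-level timestamps. The `Word` and `Turn` types and the `label_words` helper are hypothetical names for illustration, not any vendor's API.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds from the start of the call
    end: float


@dataclass
class Turn:
    speaker: str  # cluster label from the (assumed) upstream diarizer
    start: float
    end: float


def overlap_seconds(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length of the time interval shared by two spans, or 0.0 if they are disjoint."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def label_words(words: list[Word], turns: list[Turn]) -> list[tuple[str, str]]:
    """Assign each word the speaker whose turn overlaps it the most."""
    labeled = []
    for word in words:
        best_speaker, best_overlap = "unknown", 0.0
        for turn in turns:
            shared = overlap_seconds(word.start, word.end, turn.start, turn.end)
            if shared > best_overlap:
                best_speaker, best_overlap = turn.speaker, shared
        labeled.append((best_speaker, word.text))
    return labeled


# Made-up example data: two speaker turns and six timed words.
turns = [Turn("agent", 0.0, 1.6), Turn("customer", 1.6, 3.0)]
words = [
    Word("thanks", 0.1, 0.5), Word("for", 0.5, 0.7), Word("calling", 0.7, 1.3),
    Word("hi", 1.7, 1.9), Word("my", 1.9, 2.1), Word("order", 2.1, 2.6),
]
for speaker, text in label_words(words, turns):
    print(f"{speaker}: {text}")
```

Maximum-overlap assignment is only one simple way to do the alignment. Production systems typically also handle overlapping speech and words that straddle a turn boundary.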

That logic survives in a generic developer market where some customers want STT only. It breaks immediately in a category like call center analytics, where every minute the platform ingests needs speaker labels. The "optional" add-on is, in practice, attached to 100% of customer volume.

The math at a real call center scale

Take a mid-sized call center analytics platform processing 5M minutes a month:

Deepgram Nova-3 + diarization at $0.0083/min combined = $41,500 / month (base alone would be $21,500; the tax adds $20,000 / month).
Rev.ai Reverb + speakers at $0.0085/min combined = $42,500 / month (base alone $15,000; tax adds $27,500 / month).
Orchard, bundled at $0.00042/min = $2,100 / month. There is no add-on column.

Annualized, the tax line alone — the gap between base STT and base + diarization — runs $240K to $330K per year at 5M-minute scale on the most common providers. That's pure margin recovery available to any team that consolidates on a vendor that doesn't price the feature separately.
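The figures above can be reproduced with a few lines of arithmetic. This is a sketch using only the rates quoted in this post. The 5M-minute volume is the example scale used above, the helper name `monthly_costs` is illustrative, and Orchard is modeled with a zero add-on because its rate is bundled.

```python
MINUTES_PER_MONTH = 5_000_000

# (base $/min, diarization add-on $/min) as quoted in this post.
vendors = {
    "Deepgram Nova-3": (0.0043, 0.0040),
    "Rev.ai Reverb": (0.0030, 0.0055),
    "Orchard": (0.00042, 0.0),  # bundled rate, so no add-on column
}


def monthly_costs(base: float, addon: float, minutes: int = MINUTES_PER_MONTH) -> tuple[float, float]:
    """Return (total monthly bill, monthly diarization tax) in USD."""
    return (base + addon) * minutes, addon * minutes


for name, (base, addon) in vendors.items():
    total, tax = monthly_costs(base, addon)
    print(f"{name}: ${total:,.0f}/month total, ${tax:,.0f}/month tax, ${tax * 12:,.0f}/year tax")
```

Changing `MINUTES_PER_MONTH` shows how the tax scales linearly with volume, which is why the add-on grows in step with customer usage.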

Who this tax hurts the most

Two categories of platform pay the highest effective tax:

Call center analytics and QA: diarization is on 100% of minutes; the add-on is a permanent uplift indexed to customer success (more usage = more tax).
Meeting / podcast intelligence: multi-speaker by definition. Tools selling Gong-style insights or Otter-style transcripts can't skip the labels.

If diarization is optional in your product, this post is a footnote. If it's the feature that defines the product, the diarization tax isn't a vendor decision — it's a margin decision, and the only way to fix it is to consolidate on a vendor that doesn't charge for it.

The cheapest minute on the market. 500 minutes free at signup, no card.