Engineering · June 28, 2026

How to cut conversation QA costs 90% by changing one line of code

If your QA platform runs on the OpenAI Whisper API or any of its drop-in clones, the cheapest STT minute on the market is a one-line migration away. The math, the swap, and what stays the same.

Mateo Bustamante · 5 min read

Conversation QA platforms run on speech-to-text the way warehouses run on electricity: invisible until the bill shows up, and impossible to negotiate once the building's already wired. The good news is that the wiring industry standardized years ago. The OpenAI Whisper API is the de facto STT contract; every modern provider speaks it. Which means you can change who bills you without changing how your code works.

The math at conversation-QA volumes

A mid-sized QA platform processes between 1 million and 20 million minutes of audio per month. At those volumes, here's what the same workload costs across the most common STT endpoints today:

5M minutes / month on OpenAI Whisper API at $0.006/min = $30,000 / month
Same 5M minutes on Deepgram Nova-3 at ~$0.0043/min = $21,500 / month
Same 5M minutes on Orchard at $0.00042/min = $2,100 / month

Switching from the OpenAI rate to Orchard cuts the bill by $27,900 a month. That's $334,800 of annualized OPEX you stop burning, with zero impact on the audio in or the JSON out. The bill that shrinks is a real cash bill: not deferred revenue, not amortized infrastructure spend, not a Q4 IT initiative. It's last quarter's burn, refunded forward.
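The arithmetic behind those figures can be checked with a short script. This is a sketch that assumes flat per-minute pricing at the rates quoted above, with no tiers, discounts, or minimums; the function names are illustrative.

```python
from decimal import Decimal

# Per-minute rates (USD) as quoted in the comparison above.
RATES = {
    "OpenAI Whisper API": Decimal("0.006"),
    "Deepgram Nova-3": Decimal("0.0043"),  # approximate, per the figure above
    "Orchard": Decimal("0.00042"),
}


def monthly_cost(minutes: int, rate: Decimal) -> Decimal:
    """Flat per-minute pricing: minutes times rate."""
    return minutes * rate


def annual_savings(minutes: int, old_rate: Decimal, new_rate: Decimal) -> Decimal:
    """Difference in monthly cost between two rates, multiplied by 12."""
    return (monthly_cost(minutes, old_rate) - monthly_cost(minutes, new_rate)) * 12


if __name__ == "__main__":
    minutes = 5_000_000
    for name, rate in RATES.items():
        print(f"{name}: ${monthly_cost(minutes, rate):,.2f} / month")
    saved = annual_savings(minutes, RATES["OpenAI Whisper API"], RATES["Orchard"])
    print(f"Annualized savings vs. OpenAI: ${saved:,.2f}")
```

Decimal is used instead of float so the results match the rounded dollar figures exactly.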

The one line that changes

If your codebase imports the official openai Python or Node SDK, here's the entire migration:
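A minimal sketch of that change for the Python SDK, under stated assumptions: the base URL below is a placeholder rather than a documented Orchard endpoint, the environment variable names `STT_API_KEY` and `STT_BASE_URL` are hypothetical, and the model name your provider expects may differ from `whisper-1`. The `base_url` argument itself is a standard option of the official client.

```python
import os

# Hypothetical placeholder: substitute the base URL your provider documents.
PLACEHOLDER_BASE_URL = "https://stt.example.invalid/v1"


def client_kwargs(env=None) -> dict:
    """Build constructor arguments for the official OpenAI client.

    With STT_BASE_URL unset, the SDK keeps talking to OpenAI as before.
    """
    env = os.environ if env is None else env
    kwargs = {"api_key": env["STT_API_KEY"]}
    base_url = env.get("STT_BASE_URL")
    if base_url:
        kwargs["base_url"] = base_url  # the one line that changes
    return kwargs


def transcribe(path: str, model: str = "whisper-1"):
    """Transcribe one file; the call signature is the same for either vendor."""
    from openai import OpenAI  # official SDK, imported lazily

    client = OpenAI(**client_kwargs())
    with open(path, "rb") as audio:
        return client.audio.transcriptions.create(
            model=model,  # assumption: check which model name your provider accepts
            file=audio,
            response_format="verbose_json",
        )
```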

That's it. client.audio.transcriptions.create(...) keeps working. response_format="verbose_json" keeps returning the same shape with the same fields. Segment timestamps, language detection, prompt biasing — all on the same call signature. Your downstream parsers don't care that the JSON came from a different vendor.
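To illustrate why downstream parsers are unaffected, here is a sketch of a parser that depends only on fields in the Whisper `verbose_json` shape: `language`, plus `segments` entries with `start`, `end`, and `text`. The sample payload is hand-written for illustration and is not real output from any vendor; the `Segment` class and `parse_verbose_json` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds from the start of the audio
    end: float
    text: str


def parse_verbose_json(payload: dict) -> tuple[str, list[Segment]]:
    """Extract the detected language and timestamped segments."""
    segments = [
        Segment(float(s["start"]), float(s["end"]), s["text"].strip())
        for s in payload.get("segments", [])
    ]
    return payload["language"], segments


# Illustrative payload only, shaped like a verbose_json response.
SAMPLE = {
    "task": "transcribe",
    "language": "spanish",
    "duration": 4.2,
    "text": "Gracias por llamar. ¿En qué puedo ayudarle?",
    "segments": [
        {"id": 0, "start": 0.0, "end": 1.8, "text": " Gracias por llamar."},
        {"id": 1, "start": 1.8, "end": 4.2, "text": " ¿En qué puedo ayudarle?"},
    ],
}

if __name__ == "__main__":
    language, segments = parse_verbose_json(SAMPLE)
    for seg in segments:
        print(f"[{seg.start:.1f}s to {seg.end:.1f}s] ({language}) {seg.text}")
```

Because the parser reads only schema fields, it should not need to know which vendor produced the payload, provided that vendor returns the same shape.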

What stays the same

SDK calls: the official OpenAI client, in both Node and Python. No vendor SDK to install.
Response schema: Whisper-compatible JSON. Verbose mode returns segments, timestamps, and language, so your parsers need no rewrites.
Accuracy band: WER competitive with the top quartile of the market segment, across 60+ languages and the regional accents most generic models stumble on.
Diarization: bundled, not stacked as a separate API call you pay for again.
Latency: in the same band as the incumbent for batch workloads, at 60-130× faster than real time. A one-hour recording transcribes in roughly 28 to 60 seconds.

What changes

The bill: smaller by an order of magnitude. The published per-minute rate is the rate at production volume; there is no metered surge tier that turns a busy month into a financial event.
The billing model: a monthly subscription aligned to the volume tier you actually use, not a consumption meter that punishes growth.
The vendor relationship: a roadmap that treats Spanish-language call-center audio as a tier-1 use case, not a long-tail edge case.

When to not bother

If your platform is under ~50K minutes a month, the dollar amount of the swap probably isn't worth the engineering attention this quarter. Stay on whichever endpoint already works, and bookmark this post for the day the volume number moves a digit.

If you're above that, the math has already paid for the migration before the second invoice arrives.
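To put a number on that threshold, the sketch below computes the monthly difference between the OpenAI and Orchard rates quoted earlier at a few volumes. It assumes flat per-minute pricing, which only approximates a subscription-tier model, and the function name is illustrative.

```python
from decimal import Decimal

OPENAI_RATE = Decimal("0.006")     # USD per minute, as quoted earlier
ORCHARD_RATE = Decimal("0.00042")  # USD per minute, as quoted earlier


def monthly_savings(minutes: int) -> Decimal:
    """Monthly cost difference between the two per-minute rates."""
    return minutes * (OPENAI_RATE - ORCHARD_RATE)


if __name__ == "__main__":
    for minutes in (50_000, 1_000_000, 20_000_000):
        print(f"{minutes:>10,} min/month -> ${monthly_savings(minutes):,.2f} saved")
```

At 50K minutes the difference is $279 a month. At the 1 million to 20 million minutes of a mid-sized platform, it runs from $5,580 to $111,600 a month.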

The cheapest minute on the market. 500 minutes free at signup, no card.