Comparison · STT API

Orchard vs Deepgram.

Deepgram is the developer-first STT incumbent: Nova-3 is solid, their docs are good, and the streaming WebSocket has been the benchmark for low-latency voice agents. Orchard's pay-as-you-go rate is roughly 12% of Deepgram's Nova-3 batch price per minute, it works with the stock OpenAI SDK, and it bundles TTS + voice cloning on the same balance. Below: the deltas in plain numbers and an honest "when Deepgram still wins".

Last updated 2026-06-22 · Deepgram prices from deepgram.com/pricing

Field | Orchard | Deepgram
Price per minute (PAYG, batch) | $0.00042 | $0.0036 (Nova-3 batch)
Price per minute (streaming) | async batch only today | $0.0058 (Nova-3 streaming)
Plan with 30,000 min / mo | $25 (Pro, all products) | $108 metered (batch only)
Free credit / tier | 500 min / mo | $200 one-time credit
Real-time WebSocket streaming | ✗ (roadmap) | ✓
Async + webhooks | ✓ | ✓
Speaker diarization | ✓ (pyannote 3.1) | ✓ (Nova-3 diar)
Spanish (LATAM) tier-1 quality | ✓ (rioplatense fine-tune) | generic multilingual
TTS on same balance | ✓ (17 languages) | Aura (separate product, $0.015/1k chars)
Voice cloning on same balance | ✓ (F5 + XTTS) | ✗
OpenAI Whisper API compatibility | ✓ (drop-in) | proprietary
On-prem deployment | ✗ (managed cloud) | ✓ (Enterprise tier)
Data used for training | never | opt-out

~8.5× cheaper than Nova-3 batch · ~14× cheaper than Nova-3 streaming · TTS + clone bundled

Where the 10× delta lives

Pricing

Deepgram's headline rate is $0.0036 per minute on Nova-3 batch (the cheapest tier), climbing to $0.0058 per minute for streaming. Orchard sits at $0.00042 per minute flat — 8.5× cheaper than Deepgram batch, 14× cheaper than streaming. The compound effect at sustained volume is significant:

Deepgram Nova-3 (batch): 30,000 min × $0.0036 = $108 / month
Orchard PAYG: 30,000 min × $0.00042 = $12.60 / month
Orchard Pro plan: flat $25 / month (30k min + diar + TTS + clone)

The Deepgram Growth tier ($4K prepaid for 25% off) brings the effective rate down to ~$0.0027/min. Orchard PAYG still beats that by ~6.5×, and the Pro plan ($25/mo with 30k included minutes) brings the effective rate to ~$0.00083/min — still 3× under Deepgram's deepest committed discount.
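To check the arithmetic above against your own volume, here is a minimal sketch. The rates are the ones quoted in this comparison, hard-coded for illustration; the function and constant names are ours and not part of either vendor's SDK.

```javascript
// Per-minute rates (USD) as quoted in this comparison. Hard-coded for
// illustration only; check both pricing pages before relying on them.
const RATES = {
  deepgramBatch: 0.0036,
  deepgramStreaming: 0.0058,
  deepgramGrowth: 0.0036 * 0.75, // 25% prepaid discount on the batch rate
  orchardPayg: 0.00042,
};

// Metered monthly cost, rounded to cents.
function monthlyCost(minutes, ratePerMinute) {
  return Math.round(minutes * ratePerMinute * 100) / 100;
}

// Effective per-minute rate of a flat plan with included minutes.
function effectiveRate(planPrice, includedMinutes) {
  return planPrice / includedMinutes;
}

// How many times cheaper `ours` is than `theirs`.
function priceRatio(theirs, ours) {
  return theirs / ours;
}

const minutes = 30000;
const orchardPro = effectiveRate(25, minutes); // about $0.00083 per minute

console.log(monthlyCost(minutes, RATES.deepgramBatch)); // 108
console.log(monthlyCost(minutes, RATES.orchardPayg)); // 12.6
console.log(priceRatio(RATES.deepgramBatch, RATES.orchardPayg).toFixed(2)); // "8.57"
console.log(priceRatio(RATES.deepgramStreaming, RATES.orchardPayg).toFixed(2)); // "13.81"
console.log(priceRatio(RATES.deepgramGrowth, RATES.orchardPayg).toFixed(2)); // "6.43"
console.log(priceRatio(RATES.deepgramGrowth, orchardPro).toFixed(2)); // "3.24"
```

Swap `minutes` for your own monthly volume to see where each option lands.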

Nova-3 vs Whisper-class

Accuracy

Deepgram Nova-3 is a proprietary model trained from scratch on a mix of public + licensed audio; it consistently ranks near the top on English WER benchmarks. Orchard runs a tuned Whisper-large derivative (whisper.cpp + Core ML) — same model family OpenAI used for the original Whisper API, with our optimizations for throughput and Spanish.

  • English (clean speech): Nova-3 typically wins by 1-2% absolute WER on US-accent corpora. Whisper-class is competitive.
  • English (noisy / accented): Roughly parity. Whisper's larger pre-training corpus helps with accents Nova-3 saw less of.
  • Spanish (neutral LATAM): Parity within ±1.5% WER. Both handle it well.
  • Spanish (rioplatense): Orchard wins by ~4% absolute — we ship a fine-tune for porteño speech that nobody else has.
  • Multispeaker diar: Both ship diarization. Our pipeline (pyannote 3.1 GPU) is included on every plan; Deepgram bills it as an add-on.
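The WER gaps above are absolute differences, so the only number that matters is the one you measure on your own audio. A minimal sketch of word error rate (word-level edit distance divided by reference length) follows. The normalization here (lowercasing, stripping punctuation) is our assumption, not a standard either vendor publishes, and accents are kept as-is, so "cómo" and "como" count as different words.

```javascript
// Lowercase, replace punctuation with spaces, split on whitespace.
// Letters (including accented ones), digits and apostrophes are kept.
function tokenize(text) {
  return text
    .toLowerCase()
    .replace(/[^\p{L}\p{N}\s']/gu, " ")
    .split(/\s+/)
    .filter(Boolean);
}

// Word-level Levenshtein distance between two token arrays
// (substitutions, insertions and deletions all cost 1).
function editDistance(ref, hyp) {
  let prev = Array.from({ length: hyp.length + 1 }, (_, j) => j);
  for (let i = 1; i <= ref.length; i++) {
    const curr = [i];
    for (let j = 1; j <= hyp.length; j++) {
      const cost = ref[i - 1] === hyp[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[hyp.length];
}

// WER = edits needed to turn the hypothesis into the reference,
// divided by the number of reference words.
function wordErrorRate(reference, hypothesis) {
  const ref = tokenize(reference);
  if (ref.length === 0) throw new Error("reference transcript is empty");
  return editDistance(ref, tokenize(hypothesis)) / ref.length;
}

// One substitution (sat -> sit) plus one deletion (the) over 6 words.
console.log(wordErrorRate("The cat sat on the mat.", "the cat sit on mat"));
```

Run it over the same human-checked reference transcripts for both vendors, then subtract: a "1-2% absolute" gap means the two WER values differ by 0.01 to 0.02.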

Batch vs streaming

Latency & throughput

This is the dimension where Deepgram still leads. Their streaming WebSocket lands transcripts in <300 ms — best-in-class for real-time voice agents that need partial transcripts as the user speaks. Orchard's sync endpoint targets the same use case but via short HTTP requests (~150 ms p50 on a 5 s clip), not a persistent WebSocket; full streaming is on the roadmap.

  • Real-time partial transcripts: Deepgram wins today via WebSocket.
  • Sync STT (utterance complete): Roughly parity (Orchard ~150 ms vs Deepgram ~200 ms p50 on a 5 s clip).
  • Batch (long-form): Orchard's cluster shards a 60-min audio across nodes — landing in ~90 s wall-time, ~40× real-time end-to-end. Deepgram batch is single-stream; the comparable 60-min job is ~3-4 min wall.
  • Webhooks for async: Both support callback URLs on job completion.
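The two figures used above, p50 latency and real-time factor, are easy to reproduce on your own runs. A small sketch, with helper names of our own choosing and made-up latency samples:

```javascript
// Real-time factor as used above: audio duration divided by wall-clock time.
function realTimeFactor(audioSeconds, wallSeconds) {
  return audioSeconds / wallSeconds;
}

// p50 (median) of a list of latency samples in milliseconds.
function p50(samplesMs) {
  if (samplesMs.length === 0) throw new Error("no samples");
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

console.log(realTimeFactor(60 * 60, 90)); // 40 (60-min audio in ~90 s)
console.log(realTimeFactor(60 * 60, 3 * 60)); // 20 (same audio in 3 min)
console.log(realTimeFactor(60 * 60, 4 * 60)); // 15 (same audio in 4 min)
console.log(p50([140, 155, 150, 900, 148])); // 150 (made-up samples)
```

The median is the right summary here because a single slow request (the 900 ms sample) does not move it.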

One balance, three products

What's included beyond STT

Deepgram is primarily an STT vendor that added Aura (TTS) as a separate product with separate metering. Orchard ships three products on the same balance:

Text-to-Speech

12 voices across 17 languages on the Piper engine. Sub-2 s synth latency on CPU. Same per-minute rate as STT, no separate billing.

Voice cloning

F5-AR for Spanish (rioplatense fine-tune), XTTS for the other 16 languages. 6-60 s reference, unlimited synth thereafter.

Speaker diarization

pyannote.audio 3.1 on GPU. 30-min audio diarized in 4 s. Billed at 1.5× the per-minute rate, with included quota on every plan.

OpenAI SDK swap

Migration

Orchard mirrors the OpenAI Whisper request/response shape. If you already call the OpenAI SDK, switching is a single base URL change. If you wrap Deepgram in your own service layer, replace the Deepgram client call inside that layer with an OpenAI SDK call pointed at Orchard:

import fs from "node:fs";
import { createClient } from "@deepgram/sdk";
import OpenAI from "openai";

// Deepgram (Node SDK): upload the file bytes, get the result back
const dg = createClient(process.env.DEEPGRAM_API_KEY);
const { result } = await dg.listen.prerecorded.transcribeFile(
  fs.readFileSync("podcast.mp3"),
  { model: "nova-3", smart_format: true, diarize: true }
);

// Orchard: the stock OpenAI SDK pointed at Orchard's base URL
const client = new OpenAI({
  baseURL: "https://api.orchardrun.com/v1",
  apiKey:  process.env.ORCHARD_API_KEY,
});
const transcription = await client.audio.transcriptions.create({
  file:  fs.createReadStream("podcast.mp3"),
  model: "whisper-1",
});
console.log(transcription.text); // plain transcript text

For diarized + async jobs, see /docs#async. We accept diarize=true and webhook_url=... as form params on the upload endpoint.
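As a rough sketch of what a diarized async upload could look like with plain `fetch` (Node 18+): the endpoint path below is an assumption (the OpenAI-style transcription path under the base URL shown above), as are the JSON response, the helper names, and the example webhook URL. Only `diarize` and `webhook_url` come from this page; check /docs#async for the real contract.

```javascript
// ASSUMPTION: the upload endpoint is the OpenAI-style path under the base URL
// used in the migration snippet. Confirm the real path in /docs#async.
const UPLOAD_URL = "https://api.orchardrun.com/v1/audio/transcriptions";

// Build the multipart body. `diarize` and `webhook_url` are the form params
// named above; `file` and `model` mirror the OpenAI Whisper request shape.
function buildAsyncForm(audioBytes, filename, webhookUrl) {
  const form = new FormData();
  form.append("file", new Blob([audioBytes]), filename);
  form.append("model", "whisper-1");
  form.append("diarize", "true");
  form.append("webhook_url", webhookUrl);
  return form;
}

// Submit the job. ASSUMPTION: Bearer auth (what the OpenAI SDK sends) and a
// JSON response; the response fields are not specified on this page, so the
// parsed body is returned without interpretation.
async function submitAsyncJob(apiKey, form) {
  const res = await fetch(UPLOAD_URL, {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}` },
    body: form,
  });
  if (!res.ok) throw new Error(`upload failed: HTTP ${res.status}`);
  return res.json();
}
```

Typical use would be to read the file with `fs.readFileSync("podcast.mp3")`, pass the bytes to `buildAsyncForm` along with your own callback URL (for example a hypothetical `https://example.com/hooks/orchard`), and hand the form to `submitAsyncJob` with `process.env.ORCHARD_API_KEY`.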

Honest tradeoff

When Deepgram is the right call

Three scenarios where Deepgram is the better pick today:

  • Real-time streaming voice agents. If you need partial transcripts as the user speaks (sub-300 ms WebSocket), Deepgram's streaming endpoint is best-in-class today. Our streaming is on the roadmap; until it ships, the sync HTTP endpoint is the closest we have.
  • On-prem / VPC deployment. Deepgram offers a self-hosted variant on the Enterprise tier for customers with strict data-residency or air-gap requirements. Orchard is managed-only — your audio leaves your environment, hits our cluster, and the result comes back. Audio is dropped from RAM on response, never trained on, but it does transit.
  • English-only US-accent corpora. Nova-3 wins by 1-2% absolute WER on clean US-accent English. If that 1-2% matters more than the 10× price delta (regulated industries, medical transcription with strict accuracy SLAs), stay on Nova-3.

For everything else — high-volume batch, multilingual workloads, anything Spanish-heavy, anything that needs TTS or voice cloning on the same bill — Orchard is the economically obvious choice.

Common pre-migration questions

FAQ

Why is Orchard so much cheaper than Deepgram?
We run our own cluster of Apple Silicon nodes optimized for whisper.cpp batch throughput, not rented GPU time. The infrastructure cost per minute is a fraction of what hyperscale GPU clouds charge, and we pass that through cleanly. Margins stay healthy because of bundling (one customer using STT + TTS + clone is three product line items on one bill).

Will the accuracy be noticeably worse than Nova-3?
On clean US-accent English, Nova-3 wins by 1-2% absolute WER. On accented English, Spanish, Portuguese and other languages, Whisper-class typically matches or beats Nova-3 because of the broader pre-training corpus. Run both against your real audio using the free tier (500 min/mo) before committing.

Do you support real-time streaming today?
Not yet. The sync endpoint handles short utterances (under 60 s) with ~150 ms p50 latency, good enough for many voice-agent use cases. True WebSocket streaming with partial transcripts is on the roadmap. If streaming is your hard requirement today, Deepgram is the right pick.

What about on-prem deployment?
Orchard is managed-only. If you need a self-hosted variant for data residency or air-gap requirements, Deepgram has an Enterprise option for that. For most managed-cloud workloads our security posture (audio dropped from RAM on response, no training on customer data, 48 h transcript TTL) clears the bar.

Is there a free tier I can test against?
Yes: 500 min / month, no credit card, includes all three products (STT + TTS + Clone). Run your own benchmark vs Nova-3 on the same audio before committing.

Same minute. $0.0036. Now $0.00042.

OpenAI-compatible SDK, no fork. Diar, TTS and voice cloning on the same balance. Free 500 min a month to benchmark against Nova-3 on your own audio.