Orchard vs Deepgram.
Deepgram is the developer-first STT incumbent: Nova-3 is solid, their docs are good, and the streaming WebSocket has been the benchmark for low-latency voice agents. Orchard costs roughly 12% of Deepgram's Nova-3 batch per-minute price, exposes an OpenAI-compatible API, and bundles TTS + voice cloning on the same balance. Below are the deltas in plain numbers and an honest "when Deepgram still wins".
Last updated 2026-06-22 · Deepgram prices from deepgram.com/pricing
| Field | Orchard | Deepgram |
|---|---|---|
| Price per minute (PAYG, batch) | $0.00042 | $0.0036 (Nova-3 batch) |
| Price per minute (streaming) | no streaming today (sync HTTP + async batch) | $0.0058 (Nova-3 streaming) |
| Plan with 30,000 min / mo | $25 (Pro, all products) | $108 metered (batch only) |
| Free credit / tier | 500 min / mo | $200 one-time credit |
| Real-time WebSocket streaming | ✗ (roadmap) | ✓ |
| Async + webhooks | ✓ | ✓ |
| Speaker diarization | ✓ (pyannote 3.1) | ✓ (Nova-3 diar) |
| Spanish (LATAM) tier-1 quality | ✓ (rioplatense fine-tune) | generic multilingual |
| TTS on same balance | ✓ (17 languages) | Aura (separate product, $0.015/1k chars) |
| Voice cloning on same balance | ✓ (F5 + XTTS) | ✗ |
| OpenAI Whisper API compatibility | ✓ (drop-in) | proprietary |
| On-prem deployment | ✗ (managed cloud) | ✓ (Enterprise tier) |
| Data used for training | never | opt-out |
~8.5× cheaper than Nova-3 batch · ~14× cheaper than Nova-3 streaming · TTS + clone bundled
Pricing
Deepgram's headline rate is $0.0036 per minute on Nova-3 batch (the cheapest tier), climbing to $0.0058 per minute for streaming. Orchard sits at $0.00042 per minute flat: ~8.5× cheaper than Deepgram batch and ~14× cheaper than streaming. The gap compounds at sustained volume: 1,000,000 minutes costs $3,600 on Nova-3 batch versus $420 on Orchard PAYG.
The Deepgram Growth tier ($4K prepaid for 25% off) brings the effective rate down to ~$0.0027/min. Orchard PAYG still beats that by ~6.5×, and the Pro plan ($25/mo with 30k included minutes) brings the effective rate to ~$0.00083/min — still 3× under Deepgram's deepest committed discount.
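The ratios in this section can be reproduced from the quoted rates. The snippet below is a minimal sketch for checking the arithmetic against your own volume; the constant and function names are illustrative, not part of any SDK, and the rates are the ones stated on this page.

```javascript
// Per-minute rates quoted in this comparison (USD).
const RATES = {
  orchardPayg: 0.00042,
  deepgramBatch: 0.0036,
  deepgramStreaming: 0.0058,
};

// Deepgram Growth: 25% off the Nova-3 batch rate.
const deepgramGrowth = RATES.deepgramBatch * (1 - 0.25);

// Orchard Pro: $25 per month with 30,000 included minutes.
const orchardProEffective = 25 / 30000;

// How many times cheaper `ours` is than `theirs`.
function timesCheaper(theirs, ours) {
  return theirs / ours;
}

// Metered cost of `minutes` at a flat per-minute rate.
function meteredCost(minutes, ratePerMinute) {
  return minutes * ratePerMinute;
}

console.log(timesCheaper(RATES.deepgramBatch, RATES.orchardPayg).toFixed(1));     // 8.6
console.log(timesCheaper(RATES.deepgramStreaming, RATES.orchardPayg).toFixed(1)); // 13.8
console.log(timesCheaper(deepgramGrowth, RATES.orchardPayg).toFixed(1));          // 6.4
console.log(timesCheaper(deepgramGrowth, orchardProEffective).toFixed(1));        // 3.2
console.log(meteredCost(30000, RATES.deepgramBatch).toFixed(2));                  // 108.00
```

Swap in your own monthly minutes in `meteredCost` to see where a plan or a committed discount starts to matter.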
Accuracy
Deepgram Nova-3 is a proprietary model trained from scratch on a mix of public + licensed audio; it consistently ranks near the top on English WER benchmarks. Orchard runs a tuned Whisper-large derivative (whisper.cpp + Core ML) — same model family OpenAI used for the original Whisper API, with our optimizations for throughput and Spanish.
- English (clean speech): Nova-3 typically wins by 1-2% absolute WER on US-accent corpora. Whisper-class is competitive.
- English (noisy / accented): Roughly parity. Whisper's larger pre-training corpus helps with accents Nova-3 saw less of.
- Spanish (neutral LATAM): Parity within ±1.5% WER. Both handle it well.
- Spanish (rioplatense): Orchard wins by ~4% absolute — we ship a fine-tune for porteño speech that nobody else has.
- Multispeaker diarization: Both ship diarization. Our pipeline (pyannote 3.1 on GPU) is included on every plan; Deepgram bills it as an add-on.
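The WER figures above are absolute differences in word error rate. To check them on your own audio, a minimal sketch of the standard metric (word-level edit distance divided by reference length) could look like the following. The tokenization rule here is an assumption; published benchmarks usually apply heavier text normalization, so expect small differences.

```javascript
// Lowercase, replace punctuation with spaces, split on whitespace.
// Assumption: accents are kept, so "sabés" and "sabes" count as different words.
function tokenize(text) {
  return text
    .toLowerCase()
    .replace(/[^\p{L}\p{N}\s']/gu, " ")
    .split(/\s+/)
    .filter(Boolean);
}

// WER = (substitutions + deletions + insertions) / reference word count,
// computed as word-level Levenshtein distance with a rolling row.
function wordErrorRate(reference, hypothesis) {
  const ref = tokenize(reference);
  const hyp = tokenize(hypothesis);
  if (ref.length === 0) throw new Error("reference transcript is empty");

  // prev[j] = edit distance between the ref words seen so far and the first j hyp words.
  let prev = Array.from({ length: hyp.length + 1 }, (_, j) => j);
  for (let i = 1; i <= ref.length; i++) {
    const curr = [i];
    for (let j = 1; j <= hyp.length; j++) {
      const cost = ref[i - 1] === hyp[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // match or substitution
      );
    }
    prev = curr;
  }
  return prev[hyp.length] / ref.length;
}
```

Run it once per vendor against the same human-verified reference transcript; the absolute delta is then `wordErrorRate(ref, vendorA) - wordErrorRate(ref, vendorB)`.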
Latency & throughput
This is the dimension where Deepgram still leads. Their streaming WebSocket lands transcripts in <300 ms — best-in-class for real-time voice agents that need partial transcripts as the user speaks. Orchard's sync endpoint targets the same use case but via short HTTP requests (~150 ms p50 on a 5 s clip), not a persistent WebSocket; full streaming is on the roadmap.
- Real-time partial transcripts: Deepgram wins today via WebSocket.
- Sync STT (utterance complete): Roughly parity (Orchard ~150 ms vs Deepgram ~200 ms p50 on a 5 s clip).
- Batch (long-form): Orchard's cluster shards a 60-min audio file across nodes, landing in ~90 s wall time (~40× real time end-to-end). Deepgram batch is single-stream; the comparable 60-min job takes ~3-4 min wall time.
- Webhooks for async: Both support callback URLs on job completion.
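The p50 and real-time figures above are simple to measure on your own clips. The sketch below assumes you supply a `transcribe(clip)` function wrapping whichever vendor call you are timing; that function and the helper names are hypothetical, and the timings include your own network round trip.

```javascript
// Median (p50) of a list of numbers.
function p50(samples) {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Calls `transcribe(clip)` once per clip, sequentially, and returns
// the wall-clock latency of each call in milliseconds.
async function measureLatencies(transcribe, clips) {
  const latencies = [];
  for (const clip of clips) {
    const start = performance.now();
    await transcribe(clip);
    latencies.push(performance.now() - start);
  }
  return latencies;
}

// Real-time factor for batch jobs: audio duration divided by wall time.
function realTimeFactor(audioSeconds, wallSeconds) {
  return audioSeconds / wallSeconds;
}

console.log(realTimeFactor(60 * 60, 90)); // 40
```

Calls run one at a time on purpose: firing them concurrently would measure throughput under load rather than per-request latency.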
What's included beyond STT
Deepgram is primarily an STT vendor that added Aura (TTS) as a separate product with separate metering. Orchard ships three products on the same balance:
Text-to-Speech
12 voices across 17 languages on the Piper engine. Sub-2 s synth latency on CPU. Same per-minute rate as STT, no separate billing.
Voice cloning
F5-AR for Spanish (rioplatense fine-tune), XTTS for the other 16 languages. 6-60 s reference, unlimited synth thereafter.
Speaker diarization
pyannote.audio 3.1 on GPU. 30-min audio diarized in 4 s. Billed at 1.5× the per-minute rate, with included quota on every plan.
Migration
Orchard mirrors the OpenAI Whisper request/response shape. If you already wrap Deepgram in your own service layer, the swap stays inside that layer: replace the Deepgram client call with the OpenAI SDK pointed at Orchard's base URL:
```js
import fs from "node:fs";
import { createClient } from "@deepgram/sdk";
import OpenAI from "openai";

// Deepgram: prerecorded transcription with diarization
const dg = createClient(process.env.DEEPGRAM_API_KEY);
const { result } = await dg.listen.prerecorded.transcribeFile(
  fs.readFileSync("podcast.mp3"),
  { model: "nova-3", smart_format: true, diarize: true }
);

// Orchard (via OpenAI SDK, drop-in compatible)
const client = new OpenAI({
  baseURL: "https://api.orchardrun.com/v1",
  apiKey: process.env.ORCHARD_API_KEY,
});
const transcription = await client.audio.transcriptions.create({
  file: fs.createReadStream("podcast.mp3"),
  model: "whisper-1",
});
```

For diarized + async jobs, see /docs#async. We accept diarize=true and webhook_url=... as form params on the upload endpoint.
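As a rough illustration of the form params mentioned above, a diarized async upload could be sketched with Node's built-in `fetch` and `FormData` (Node 18+). Several details here are assumptions rather than documented behaviour: the endpoint path (taken to be the OpenAI-style `/audio/transcriptions` under the base URL), Bearer authentication, the `file` and `model` field names, and the shape of the JSON response. Check /docs#async before relying on any of them.

```javascript
// Assumption: the upload endpoint follows the OpenAI path under Orchard's base URL.
const ORCHARD_TRANSCRIPTIONS_URL =
  "https://api.orchardrun.com/v1/audio/transcriptions";

// Builds the multipart form for a diarized async job.
// `audio` is a Blob; `webhookUrl` is the callback URL you control.
function buildAsyncForm(audio, filename, webhookUrl) {
  const form = new FormData();
  form.append("file", audio, filename);
  form.append("model", "whisper-1");
  form.append("diarize", "true");          // form param named in the text above
  form.append("webhook_url", webhookUrl);  // form param named in the text above
  return form;
}

// Submits the job and returns the parsed JSON body (shape not specified here).
async function submitAsyncJob(audio, filename, webhookUrl, apiKey) {
  const response = await fetch(ORCHARD_TRANSCRIPTIONS_URL, {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}` }, // assumed auth scheme
    body: buildAsyncForm(audio, filename, webhookUrl),
  });
  if (!response.ok) {
    throw new Error(`Orchard upload failed with HTTP ${response.status}`);
  }
  return response.json();
}
```

A call might look like `submitAsyncJob(new Blob([await fs.promises.readFile("podcast.mp3")]), "podcast.mp3", "https://example.com/hooks/orchard", process.env.ORCHARD_API_KEY)`, where the webhook URL is a placeholder for your own endpoint.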
When Deepgram is the right call
Three scenarios where Deepgram is the better pick today:
- Real-time streaming voice agents. If you need partial transcripts as the user speaks (sub-300 ms WebSocket), Deepgram's streaming endpoint is the best-in-class today. Our streaming is on the roadmap; until it ships, the sync HTTP endpoint is the closest we have.
- On-prem / VPC deployment. Deepgram offers a self-hosted variant on the Enterprise tier for customers with strict data-residency or air-gap requirements. Orchard is managed-only — your audio leaves your environment, hits our cluster, and the result comes back. Audio is dropped from RAM on response, never trained on, but it does transit.
- English-only US-accent corpora. Nova-3 wins by 1-2% absolute WER on clean US-accent English. If that 1-2% matters more than the ~8.5× batch price delta (regulated industries, medical transcription with strict accuracy SLAs), stay on Nova-3.
For everything else — high-volume batch, multilingual workloads, anything Spanish-heavy, anything that needs TTS or voice cloning on the same bill — Orchard is the economically obvious choice.
FAQ
Why is Orchard so much cheaper than Deepgram?
Will the accuracy be noticeably worse than Nova-3?
Do you support real-time streaming today?
What about on-prem deployment?
Is there a free tier I can test against?
Same minute. $0.0036.
Now $0.00042.
OpenAI-compatible SDK, no fork. Diar, TTS and voice cloning on the same balance. Free 500 min a month to benchmark against Nova-3 on your own audio.