Orchard vs OpenAI Whisper API.
Same model family under the hood, very different surface. Orchard is API-compatible with the OpenAI Whisper endpoint — swap the base URL and key, the SDK keeps working. Below, the deltas in plain numbers: pricing, latency, what's included, where Whisper still wins, and a 30-second migration snippet.
Last updated 2026-06-22 · Whisper prices from openai.com/api/pricing
| Field | Orchard | OpenAI Whisper API |
|---|---|---|
| Price per minute (pay-as-you-go) | $0.00042 | $0.006 |
| Plan with 30,000 min / mo | $25 (Pro, includes diar + TTS + clone) | $180 metered |
| Real-time factor (cluster, batch) | 50-80× RT | ~25× RT (single-stream) |
| Max upload size | 500 MB | 25 MB |
| Async + webhooks | ✓ | ✗ |
| Speaker diarization | ✓ (pyannote 3.1, all plans) | ✗ |
| Text-to-Speech on same balance | ✓ (17 languages) | ✗ (separate product) |
| Voice cloning on same balance | ✓ (F5 ES + XTTS multilingual) | ✗ |
| Word-level timestamps | ✓ | ✓ |
| OpenAI Whisper API compatibility | ✓ (drop-in) | native |
| Spanish (LATAM) tier-1 quality | ✓ (rioplatense fine-tune) | generic |
| Free tier | 500 min / mo | none |
| Data used for training | never | not by default since March 2023 (opt-in only) |
14× cheaper per minute · same OpenAI SDK calls · diar + TTS + clone bundled
Pricing
OpenAI charges a flat $0.006 per minute on the Whisper API, with no tier discount and no included quota. Orchard's pay-as-you-go rate is $0.00042 per minute: same minute, same Whisper-class accuracy, roughly 14× less. For a single team transcribing 30,000 minutes a month (a typical podcast post-production or call-analytics workload), that is $180 metered on OpenAI versus $25 on the Orchard Pro plan, which also includes diarization, TTS, and voice cloning.
Where Orchard is meaningfully cheaper than Whisper-direct: anything past a few hundred minutes a month. The crossover point where Orchard beats OpenAI, even on the $1/yr Hobby plan, is roughly 17 minutes a month. Past that, a subscription saves you money versus Whisper pay-as-you-go.
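The arithmetic behind these figures can be reproduced with a short sketch. The rates and the $25 Pro price come from the comparison table; the helper names are illustrative and not part of any SDK, and the break-even helper only compares a flat monthly fee against OpenAI's metered rate (it ignores included quotas and overage rules, which this page does not spell out).

```typescript
// Per-minute list prices taken from the comparison table (USD).
const OPENAI_WHISPER_PER_MIN = 0.006;
const ORCHARD_PAYG_PER_MIN = 0.00042;

// Metered cost for a month of usage at a flat per-minute rate.
function meteredCost(minutes: number, ratePerMinute: number): number {
  return minutes * ratePerMinute;
}

// How many times cheaper one per-minute rate is than another.
function priceRatio(expensiveRate: number, cheapRate: number): number {
  return expensiveRate / cheapRate;
}

// Monthly volume at which a flat fee equals metered spend at the given rate.
function breakEvenMinutes(monthlyFee: number, ratePerMinute: number): number {
  return monthlyFee / ratePerMinute;
}

const monthlyMinutes = 30_000;
const openaiCost = meteredCost(monthlyMinutes, OPENAI_WHISPER_PER_MIN); // about $180
const ratio = priceRatio(OPENAI_WHISPER_PER_MIN, ORCHARD_PAYG_PER_MIN); // about 14.3
const proBreakEven = breakEvenMinutes(25, OPENAI_WHISPER_PER_MIN); // about 4,167 minutes
```

Read this way, the "14×" headline is the ratio of the two pay-as-you-go rates, while the $25 versus $180 comparison is a plan price against metered spend at the same volume.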
Accuracy
Orchard runs on whisper.cpp with Core ML acceleration, served from our own cluster. The default model is orchard-stt-v1 — a tuned Whisper-large derivative. On the same English/Spanish test sets we use internally, WER lands within a percentage point of the OpenAI hosted endpoint, with a tilt in our favour on rioplatense Spanish where we ship a fine-tuned variant.
- English (LibriSpeech clean): parity ±1% WER.
- Spanish (CommonVoice ES): parity ±1.5% WER.
- Spanish (rioplatense, internal): 4 percentage points (absolute) better than hosted Whisper, because we fine-tuned on porteño speech.
- Multispeaker podcasts: with diarization on, Orchard returns speaker labels Whisper doesn't have at all.
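To run your own comparison on the figures above, word error rate can be computed as the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The following is a minimal sketch of that standard definition; the normalization (lowercasing and whitespace splitting) is an assumption and is not necessarily what Orchard's internal benchmark uses.

```typescript
// Word error rate: word-level edit distance divided by reference length.
function wordErrorRate(reference: string, hypothesis: string): number {
  const ref = reference.trim().toLowerCase().split(/\s+/).filter(Boolean);
  const hyp = hypothesis.trim().toLowerCase().split(/\s+/).filter(Boolean);
  if (ref.length === 0) {
    throw new Error("reference transcript must contain at least one word");
  }
  // prev[j] holds the edit distance between the reference words processed
  // so far and the first j hypothesis words.
  let prev: number[] = Array.from({ length: hyp.length + 1 }, (_, j) => j);
  for (let i = 1; i <= ref.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= hyp.length; j++) {
      const substitution = prev[j - 1] + (ref[i - 1] === hyp[j - 1] ? 0 : 1);
      const deletion = prev[j] + 1;
      const insertion = curr[j - 1] + 1;
      curr.push(Math.min(substitution, deletion, insertion));
    }
    prev = curr;
  }
  return prev[hyp.length] / ref.length;
}
```

Punctuation handling and number normalization can shift WER by more than a point, so apply the same normalization to both providers' output before comparing.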
Latency & throughput
OpenAI's Whisper API is single-stream per request. Orchard shards long audio across the cluster, so a 60-minute podcast lands in under 90 seconds wall time — that's ~40× real-time end-to-end, including upload and post-processing. The sync endpoint targets short utterances (under 60 s, voice agent territory), and the async endpoint with webhooks handles everything else.
- Sync POST /v1/audio/transcriptions: ~150 ms p50 on a 5 s clip.
- Async POST /v1/transcriptions/upload: 60 min audio → ~90 s wall.
- Webhooks: we POST the result to your URL when ready (Whisper has no async).
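A client can route between the two endpoints by clip length. The sketch below uses the endpoint paths and the "under 60 s" guidance from this section; the function names and the hard cutoff are illustrative assumptions, not documented API behavior. It also shows how the real-time factor quoted above is derived.

```typescript
// Endpoint paths as listed in this section.
const SYNC_PATH = "/v1/audio/transcriptions";
const ASYNC_PATH = "/v1/transcriptions/upload";
// Assumed cutoff, following the "under 60 s" guidance for the sync endpoint.
const SYNC_MAX_SECONDS = 60;

// Short utterances go to the sync endpoint, everything else to async.
function pickEndpoint(audioSeconds: number): string {
  return audioSeconds < SYNC_MAX_SECONDS ? SYNC_PATH : ASYNC_PATH;
}

// Real-time factor: seconds of audio processed per second of wall time.
function realTimeFactor(audioSeconds: number, wallSeconds: number): number {
  return audioSeconds / wallSeconds;
}

// A 60-minute podcast finished in 90 seconds of wall time.
const podcastRtf = realTimeFactor(60 * 60, 90);
```

With these numbers `podcastRtf` is 3600 / 90 = 40, matching the "~40× real-time" end-to-end figure.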
What's included beyond STT
Whisper API is a single endpoint: audio in, text out. Anything else — diarization, synthesis, voice cloning — is a separate provider, separate billing, separate code path. Orchard ships three products on the same balance:
Speaker diarization
pyannote.audio 3.1 on GPU. 4-second turnaround on 30 minutes of audio. Billed at 1.5× the per-minute cost, with included quota on every plan.
Text-to-Speech
12 voices, 17 languages, Piper engine. Sub-2 s synth latency. Same per-minute price as STT, drawn from the same balance.
Voice cloning
F5 for Spanish (rioplatense fine-tune), XTTS for 16 other languages. 6-60 s reference, unlimited synth thereafter.
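Since all three products draw from one balance, a rough monthly estimate can be sketched from the figures above. This assumes a diarized minute is billed at 1.5× the STT rate in total (rather than as a surcharge on top) and that TTS minutes are billed at the STT rate; voice cloning is left out because no price is stated here, and included plan quotas are ignored. The names are illustrative.

```typescript
const STT_PER_MIN = 0.00042; // pay-as-you-go rate from the comparison table (USD)
const DIARIZATION_MULTIPLIER = 1.5; // "1.5× the per-minute cost"

interface MonthlyUsage {
  sttMinutes: number; // plain transcription
  diarizedMinutes: number; // transcription with speaker labels
  ttsMinutes: number; // synthesized audio
}

// Estimated spend against the shared balance, ignoring included quotas.
function estimateSpend(usage: MonthlyUsage): number {
  const stt = usage.sttMinutes * STT_PER_MIN;
  const diarized = usage.diarizedMinutes * STT_PER_MIN * DIARIZATION_MULTIPLIER;
  const tts = usage.ttsMinutes * STT_PER_MIN;
  return stt + diarized + tts;
}

const exampleSpend = estimateSpend({
  sttMinutes: 10_000,
  diarizedMinutes: 2_000,
  ttsMinutes: 1_000,
}); // about $5.88
```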
Migration
Because we mirror the OpenAI Whisper request/response shape, the official SDK works against Orchard with a base-URL swap. No fork, no shim, no waiting for an SDK update:
```diff
 import fs from "fs";
 import OpenAI from "openai";

 const client = new OpenAI({
-  baseURL: "https://api.openai.com/v1",
-  apiKey: process.env.OPENAI_API_KEY,
+  baseURL: "https://api.orchardrun.com/v1",
+  apiKey: process.env.ORCHARD_API_KEY,
 });

 const transcription = await client.audio.transcriptions.create({
   file: fs.createReadStream("podcast.mp3"),
   model: "whisper-1", // alias accepted, routed to orchard-stt-v1
 });
```

Same call, same response shape ({ text, language, duration, segments[] }), same SDK ergonomics. If you were already using OpenAI for STT, the migration comes down to changing the base URL and API key in your environment configuration. Median migration time across customers who've done it: under an hour.
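Downstream code that consumes the response should not need to change either. As a hedged illustration, the sketch below turns the `{ text, language, duration, segments[] }` shape into timestamped lines. The segment fields (`start`, `end`, `text`) are modeled on OpenAI's `verbose_json` output and are an assumption for Orchard; `toClock` and `formatSegments` are hypothetical helpers, not SDK functions.

```typescript
// Assumed segment shape, modeled on OpenAI's verbose_json output.
interface Segment {
  start: number; // seconds from the beginning of the audio
  end: number; // seconds from the beginning of the audio
  text: string;
}

interface Transcription {
  text: string;
  language: string;
  duration: number;
  segments: Segment[];
}

// Format a second offset as mm:ss, dropping fractional seconds.
function toClock(seconds: number): string {
  const total = Math.floor(seconds);
  const mm = String(Math.floor(total / 60)).padStart(2, "0");
  const ss = String(total % 60).padStart(2, "0");
  return `${mm}:${ss}`;
}

// One display line per segment: "[start-end] text".
function formatSegments(t: Transcription): string[] {
  return t.segments.map(
    (s) => `[${toClock(s.start)}-${toClock(s.end)}] ${s.text.trim()}`
  );
}
```

On the OpenAI endpoint, `segments` is only returned when the request sets `response_format: "verbose_json"`; whether Orchard requires the same flag is not stated here, so check the response before relying on the field.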
When OpenAI Whisper is the right call
If you're already deep in the OpenAI ecosystem (Assistants API, GPT-4 vision, the rest of the stack billed on one key) and your STT volume is genuinely tiny — under ~50 minutes a month — staying on Whisper saves you the cognitive load of a second vendor. The $0.006/min Whisper rate works out to under a dollar a month at that volume, and the lack of diarization or TTS may not matter to your use case.
Past that point — high-volume workloads, anything multilingual or Spanish-heavy, anything that needs diarization or TTS bundled, or anything that benefits from async + webhooks — Orchard is the economically obvious choice.
FAQ
Is Orchard literally running Whisper under the hood?
What about whisper-large-v3?
Can I send a 2-hour audio in one request?
What happens to my data?
Is there a free tier I can test against?
Stop paying $0.006/min.
Start at $0.00042.
Same SDK calls. Same Whisper-class accuracy. Diar, TTS and voice cloning on the same balance. Free 500 min a month to run your own benchmark before paying.