Comparison · TTS + Voice cloning

Orchard vs ElevenLabs.

ElevenLabs is the studio-grade option for premium voice cloning; Orchard is the production-grade option for teams who synthesize at volume. Different unit economics (per minute vs per character), different bundling (STT + diarization included vs sold separately), different best-fit. Below, the deltas in plain numbers and an honest "when each one wins".

Last updated 2026-06-22 · ElevenLabs prices from elevenlabs.io/pricing

Field | Orchard | ElevenLabs
Billing unit | Per minute of audio | Per character of input text
TTS price (approx, sustained use) | $0.00042 / min | $0.10-0.40 / min
Entry plan | Free 500 min / mo | Free 10K chars / mo (~10 min)
Mid plan (full feature) | $10 / mo (Starter, 10k min) | $22 / mo (Creator, 100k chars ≈ 100 min)
Voice cloning available | $0 (all plans, 1-50 voices) | $22+ Instant · $99+ Professional
Spanish (rioplatense) quality | Tier-1 (F5-AR fine-tune) | Generic Spanish only
Languages | 17 | 32
STT (transcription) | ✓ (same balance) | Scribe (separate billing)
Speaker diarization | ✓ (pyannote 3.1, included) | Scribe add-on
Async + webhooks | ✓ | (via Studio)
Voice library (pre-built) | 12 voices | 1,000+ in marketplace
Studio dubbing | ✗ | ✓
Data used for training | never | never (paid plans)

Per minute, not per character · STT + diar bundled · 10-100× the synth volume for the same monthly bill

The character trap

Pricing

ElevenLabs meters in characters of input text, which seems fine until you notice how quickly a podcast script piles up characters. A 5-minute podcast is roughly 5,000 characters; at ElevenLabs Creator ($22 / 100K chars), that's 20 such podcasts a month before you start paying overage at $0.30 per 1K chars (i.e. ~$0.30 per podcast minute). Orchard meters in minutes of output audio, which is what you actually shipped to the user:

ElevenLabs Creator
100K chars (≈100 min) + overage at $0.30/1K chars
$22 / month + overage
ElevenLabs Pro
500K chars (≈500 min) for $99
$99 / month flat
Orchard Starter
10,000 min for $10 (≈167 hours)
$10 / month (incl. STT + diar + clone)

On low volume (under ~10 min/mo of synth) ElevenLabs Free is fine. For anything past that, especially anything that needs voice cloning, Orchard is cheaper on the same workload: about 2× at 100 min/mo ($22 vs $10) and roughly 10× at 500 min/mo ($99 vs $10).
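The plan arithmetic above can be sketched as a small calculator. It uses only the figures quoted in this section and the rule of thumb that 1,000 characters is about one minute of audio (a 5-minute podcast ≈ 5,000 characters); real scripts will vary. The function names are illustrative, and because Orchard overage pricing is not stated here, the sketch returns null past the Starter quota instead of guessing.

```javascript
// Rule of thumb from this page, not a measured value.
const CHARS_PER_MINUTE = 1000;

// ElevenLabs Creator: $22/month includes 100K chars; overage is $0.30 per 1K chars.
function elevenLabsCreatorCost(minutes) {
  const chars = minutes * CHARS_PER_MINUTE;
  const overageChars = Math.max(0, chars - 100000);
  return 22 + (overageChars / 1000) * 0.30;
}

// Orchard Starter: $10/month includes 10,000 minutes.
// Overage pricing is not given on this page, so anything past the quota returns null.
function orchardStarterCost(minutes) {
  return minutes <= 10000 ? 10 : null;
}

// Print both bills for a few monthly volumes (in minutes of audio).
for (const minutes of [50, 100, 500, 2000]) {
  console.log(
    `${minutes} min/mo`,
    `ElevenLabs Creator: $${elevenLabsCreatorCost(minutes).toFixed(2)}`,
    `Orchard Starter: $${orchardStarterCost(minutes)}`
  );
}
```

At volumes above 500 min/mo a fairer ElevenLabs comparison is probably the Pro plan ($99 for 500K chars) rather than Creator plus overage, so treat the Creator column as an upper bound.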

Honest tradeoff

Voice cloning

ElevenLabs Professional Voice Cloning is the industry leader for single-voice fidelity: a 30-minute reference yields a voice with ~0.92 speaker similarity on most benchmarks. The catch: it is locked to the $99/mo Pro tier and capped at 10 voices.

Orchard ships two cloning engines depending on the language: F5 for Spanish (with a rioplatense fine-tune we trained on porteño speech — nobody else has this) and XTTS for the other 16 languages. Reference is 6-60 s, and once cloned, every subsequent synth in that voice is on the per-minute plan rate. Voice limits scale by plan: 1 voice on Free, 3 on Hobby, 10 on Starter, 50 on Pro.

  • Spanish (rioplatense): Orchard wins on accent fidelity — ElevenLabs uses generic Spanish.
  • Spanish (neutral): Roughly comparable quality, Orchard ~5× cheaper per synth.
  • English: ElevenLabs Professional is sharper on long-form narration if you can spend $99+ and have the 30 min reference.
  • Other 16 languages (XTTS): Orchard covers them with a single API; ElevenLabs has more voices but the per-synth cost makes batch generation expensive.

Coverage vs depth

Languages

ElevenLabs covers 32 languages with their Multilingual v2 model. Orchard ships 17 — the gap is real if you're targeting Bulgarian, Vietnamese, or Tagalog. Where Orchard pulls ahead is per-language depth: we ship a fine-tuned rioplatense Spanish (F5-AR) that captures the porteño accent specifically, and our Spanish synthesis quality across all LATAM dialects is meaningfully better than generic-Spanish offerings.

  • If you ship in EN / ES / PT / FR / DE / IT / HI → Orchard wins on price by roughly 10×.
  • If you need Bulgarian, Tagalog, Korean or other less common languages → ElevenLabs has broader coverage today.
  • If you need a voice that nails Argentine Spanish specifically → only Orchard ships a fine-tune for it.

One bill, three products

What's included beyond TTS

ElevenLabs is a TTS-first company that recently added Scribe (STT) and Studio (dubbing) as separate products with separate metering. Orchard bundles the following with TTS on the same balance:

Speech-to-Text

OpenAI Whisper-compatible API. 50-80× real-time on the cluster. $0.00042/min, same balance as TTS. Drop-in for OpenAI SDK.

Speaker diarization

pyannote.audio 3.1 on GPU. 30-min audio diarized in 4 s. Billed at 1.5× the per-minute rate, drawn from the included quota on every plan from Free up.

Webhooks for async jobs

POST audio, set webhook_url, get the transcript or synth posted to your endpoint when ready. No polling needed.
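
A receiving endpoint for those callbacks could look like the sketch below. It assumes only that the callback arrives as an HTTP POST with a JSON body; the payload fields are not specified on this page, so the handler passes the parsed object through untouched. `parseWebhookBody` and `createWebhookServer` are illustrative names, not part of any Orchard SDK.

```javascript
const http = require("http");

// Parses a raw request body as JSON; returns null when parsing fails.
function parseWebhookBody(raw) {
  try {
    return JSON.parse(raw);
  } catch {
    return null;
  }
}

// Builds an HTTP server that hands each JSON callback to onResult.
// Assumption: callbacks are POST requests carrying a JSON body.
function createWebhookServer(onResult) {
  return http.createServer((req, res) => {
    if (req.method !== "POST") {
      res.writeHead(405).end();
      return;
    }
    const chunks = [];
    req.on("data", (chunk) => chunks.push(chunk));
    req.on("end", () => {
      const payload = parseWebhookBody(Buffer.concat(chunks).toString("utf8"));
      if (payload === null) {
        res.writeHead(400).end();
        return;
      }
      res.writeHead(204).end(); // acknowledge first, then do the slow work
      onResult(payload);
    });
  });
}
```

To try it locally, something like `createWebhookServer(console.log).listen(8080)` starts the receiver; the publicly reachable URL of that server is what you would pass as `webhook_url`.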

Honest about the lift

Migration

Unlike STT (where we're API-compatible with the OpenAI SDK for Whisper), TTS has no shared request shape: ElevenLabs uses its own. Migration is a code change, not an env var swap. The basic synthesis call maps cleanly:

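// ElevenLabs (SDK client; voice_id is your existing ElevenLabs voice)
await elevenlabs.textToSpeech.convert(voice_id, {
  text: "Hola, qué tal",
  model_id: "eleven_multilingual_v2",
});

// Orchard (plain HTTPS; ORCHARD_KEY is your API key)
const res = await fetch("https://api.orchardrun.com/v1/tts/generate", {
  method: "POST",
  headers: { Authorization: `Bearer ${ORCHARD_KEY}`, "Content-Type": "application/json" },
  body: JSON.stringify({ text: "Hola, qué tal", voice_id: "es_MX-claude" }),
});
// Fail loudly on a non-2xx status instead of silently continuing.
if (!res.ok) throw new Error(`Orchard TTS request failed: ${res.status}`);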

For voice cloning, our API takes a 6-60 s reference WAV plus a short text prompt and returns the voice_id you can then synth against. The cloning step is a one-time call per voice; subsequent synths are billed on the normal per-minute rate. We document the full request/response shapes in /docs#clone-voice.
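
Before sending a reference clip, it can help to check locally that it falls inside the 6-60 s window. The sketch below reads the duration from a WAV header in Node.js. It assumes an uncompressed PCM file whose header sizes are accurate, and the helper names are illustrative rather than anything from the Orchard API.

```javascript
// Returns the duration in seconds of a PCM WAV held in a Node.js Buffer.
function wavDurationSeconds(buf) {
  if (buf.toString("ascii", 0, 4) !== "RIFF" || buf.toString("ascii", 8, 12) !== "WAVE") {
    throw new Error("not a RIFF/WAVE file");
  }
  let byteRate = null;
  let offset = 12; // first chunk starts right after the 12-byte RIFF header
  while (offset + 8 <= buf.length) {
    const id = buf.toString("ascii", offset, offset + 4);
    const size = buf.readUInt32LE(offset + 4);
    if (id === "fmt ") {
      // Byte rate sits 8 bytes into the fmt payload (after format, channels, sample rate).
      byteRate = buf.readUInt32LE(offset + 8 + 8);
    } else if (id === "data") {
      if (!byteRate) throw new Error("fmt chunk missing or invalid before data chunk");
      return size / byteRate;
    }
    offset += 8 + size + (size % 2); // chunks are padded to even length
  }
  throw new Error("no data chunk found");
}

// True when the clip length is inside the 6-60 s reference window.
function isValidCloneReference(buf) {
  const seconds = wavDurationSeconds(buf);
  return seconds >= 6 && seconds <= 60;
}
```

A typical call would be `isValidCloneReference(fs.readFileSync("reference.wav"))`, run before the one-time cloning request.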

Honest tradeoff

When ElevenLabs is the right call

If your product is built around a single, premium-quality cloned voice — audiobook narration, premium agent characters, branded podcast hosts — and the $99/mo Pro tier doesn't move the needle on your budget, ElevenLabs Professional Voice Cloning is hard to beat on raw fidelity. Same for Studio Dubbing, which we don't have at all.

You should also stay on ElevenLabs if you specifically need their 1,000+ pre-made voice marketplace, their Voice Design (text-to-voice from a description), or coverage of one of the 15 languages we don't ship yet (Korean, Vietnamese, Bulgarian, etc.).

For everything else — Spanish-heavy workflows, high-volume generic-voice synthesis, products that also need STT or diarization, teams who prefer per-minute pricing to character metering — Orchard is the economically obvious choice.

Common pre-migration questions

FAQ

Why per minute instead of per character?
Characters are an input metric that maps poorly to audio length (long sentences, punctuation, languages with different char-to-second ratios), so the bill is hard to predict from a script. Minutes of output audio are what you actually shipped to the user. It's also how the underlying inference cost behaves, so we pass it through cleanly.
Is voice cloning quality on par with ElevenLabs Professional?
For Spanish (especially rioplatense), our F5-AR fine-tune wins on accent fidelity — ElevenLabs uses a generic Spanish model. For English long-form narration with a 30-min reference, ElevenLabs Professional is sharper. For everything in between (short to medium reference, multiple languages, batch synthesis), quality is comparable at a fraction of the cost.
Do you support the same languages?
We ship 17 languages today; ElevenLabs covers 32. The 17 cover the major markets (EN, ES, PT, FR, DE, IT, HI, RU, PL, NL, TR, ZH, AR plus dialects). Bulgarian, Tagalog, Korean and other less common languages are not yet supported.
Can I migrate my existing cloned voices?
Not directly — voice models aren't portable between providers. You'd need to re-clone using your original reference WAV through our API. The cloning call itself takes seconds; the upside is that each cloned voice on Orchard then generates synth at the normal per-minute plan rate rather than per character.
Is there a free tier I can test against?
Yes: 500 minutes / month, no credit card, includes all three products (STT + TTS + Clone). Enough to clone a voice, generate a few thousand seconds of synth, and run our STT on your own audio to compare against your existing transcripts before committing.

Per minute, not per character.

Voice cloning on every plan. STT and diarization on the same balance. 17 languages, with rioplatense Spanish that nobody else ships. Free 500 min a month to clone a voice and benchmark against your current bill.