Question 1

Which Text to Speech model should I use?

Accepted Answer

- Flash v2.5 - Ultra-low latency (~75ms) for real-time applications like voice agents
- Turbo v2.5 - Balanced quality and speed (~250-300ms) for interactive use cases
- Multilingual v2 - Consistent quality for long-form content up to 10,000 characters
- Eleven v3 - Maximum expressiveness and emotional range for creative applications
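
The list above can be encoded as a small lookup so that application code picks a model by use case rather than by hard-coded string. This is a minimal sketch: the use-case keys are hypothetical names chosen for illustration, and the `model_id` strings are assumptions that should be checked against the API reference before use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str      # display name as used in this FAQ
    model_id: str  # assumed API identifier; verify against the API reference


# Use-case keys are hypothetical labels, not part of any official API.
MODELS = {
    "realtime": Model("Flash v2.5", "eleven_flash_v2_5"),
    "interactive": Model("Turbo v2.5", "eleven_turbo_v2_5"),
    "long_form": Model("Multilingual v2", "eleven_multilingual_v2"),
    "expressive": Model("Eleven v3", "eleven_v3"),
}


def pick_model(use_case: str) -> Model:
    """Return the model this FAQ recommends for the given use case."""
    try:
        return MODELS[use_case]
    except KeyError:
        raise ValueError(
            f"unknown use case {use_case!r}; expected one of {sorted(MODELS)}"
        ) from None
```

For example, `pick_model("realtime")` returns the Flash v2.5 entry, and an unrecognized key raises `ValueError` instead of silently falling back to a default.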

Question 2

What latency can I expect?

Accepted Answer

Flash v2.5 delivers ~75ms latency.
Turbo v2.5 typically responds in 250-300ms.
Both support streaming output, allowing playback to begin before generation completes.

Question 3

How many languages are supported?

Accepted Answer

Eleven v3 supports 70+ languages.
Flash v2.5 and Turbo v2.5 support 32 languages.
Multilingual v2 supports 29 languages.

Question 4

What are the character limits per request?

Accepted Answer

Flash v2.5 and Turbo v2.5: 40,000 characters
Multilingual v2: 10,000 characters
Eleven v3: 3,000 characters
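
Text longer than a model's limit has to be split across several requests. The sketch below uses the limits quoted above and the standard library's `textwrap.wrap`, which breaks at whitespace; note that it also collapses newlines into spaces, which may or may not suit your content. The function name is hypothetical.

```python
import textwrap

# Per-request character limits as quoted in this FAQ.
CHAR_LIMITS = {
    "Flash v2.5": 40_000,
    "Turbo v2.5": 40_000,
    "Multilingual v2": 10_000,
    "Eleven v3": 3_000,
}


def split_for_model(text: str, model: str) -> list[str]:
    """Split text at word boundaries into pieces that fit the model's limit."""
    limit = CHAR_LIMITS[model]
    # wrap() never returns a piece longer than `width`; over-long words are broken.
    return textwrap.wrap(text, width=limit)
```

Each returned piece can then be sent as its own request. Splitting at sentence boundaries instead would likely give more natural prosody at the joins.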

Question 5

Can I control emotion and delivery?

Accepted Answer

Use audio tags ([laughs], [whispers], [sighs], [door slam]) to control delivery, emotion, emphasis, pauses, and sound effects. Eleven v3 provides the most expressive control.
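
Because audio tags are plain bracketed text inside the script, they can be inspected or removed with ordinary string handling. The following sketch lists the tags in a script and strips them, which may be useful when the same script is sent to a model with less expressive control. How other models treat tags is not stated here, so stripping is an assumption about what you might want, and both function names are hypothetical.

```python
import re

# Matches one bracketed tag such as [laughs] or [door slam].
TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def find_audio_tags(script: str) -> list[str]:
    """Return the tag names in the order they appear in the script."""
    return TAG_PATTERN.findall(script)


def strip_audio_tags(script: str) -> str:
    """Remove all bracketed tags and normalize the remaining whitespace."""
    return " ".join(TAG_PATTERN.sub("", script).split())
```

Note that this treats any bracketed text as a tag, so scripts containing literal square brackets would need a stricter pattern.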

Question 6

How many voices are available?

Accepted Answer

The voice library includes 10,000+ voices. You can also clone voices or design custom voices using text prompts.

Question 7

Does the API support streaming?

Accepted Answer

Yes. Streaming allows you to start playback before the full audio is generated, reducing perceived latency in real-time applications.
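
The benefit of streaming comes from handling each audio chunk as soon as it arrives instead of waiting for the whole file. The sketch below shows that consumption pattern only. `fake_stream` is a hypothetical stand-in for a real streaming response, and `play` is whatever callback feeds your audio output; neither is part of the API.

```python
from typing import Callable, Iterable, Iterator


def fake_stream(parts: list[bytes], log: list[str]) -> Iterator[bytes]:
    """Stand-in for a streaming response: yields audio chunks one at a time."""
    for i, part in enumerate(parts):
        log.append(f"generated {i}")
        yield part


def play_while_streaming(
    chunks: Iterable[bytes], play: Callable[[bytes], None]
) -> int:
    """Pass each chunk to `play` as it arrives; return total bytes handled."""
    total = 0
    for chunk in chunks:
        if chunk:  # skip empty keep-alive chunks
            play(chunk)
            total += len(chunk)
    return total
```

Running this with a shared log shows generation and playback interleaving: the first chunk is played before the second one has been generated.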

Question 8

Can I use custom voices?

Accepted Answer

Yes. Reference any voice in your library by voice ID, including professional voice clones, instant voice clones, and voices you've designed.
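
As a rough illustration of referencing a voice by ID over the HTTP API, the sketch below builds (but does not send) a request with the standard library. The base URL is a placeholder, and the path `/v1/text-to-speech/{voice_id}`, the `xi-api-key` header, and the `text` and `model_id` body fields are assumptions to confirm against the API reference.

```python
import json
import urllib.parse
import urllib.request

# Placeholder host; substitute the base URL given in the API reference.
BASE_URL = "https://api.example.com"


def build_tts_request(
    voice_id: str, text: str, model_id: str, api_key: str
) -> urllib.request.Request:
    """Build a POST request that selects a voice by its ID (not sent here)."""
    # Assumed endpoint path; the voice ID is URL-escaped before interpolation.
    url = f"{BASE_URL}/v1/text-to-speech/{urllib.parse.quote(voice_id, safe='')}"
    body = json.dumps({"text": text, "model_id": model_id}).encode("utf-8")
    headers = {
        "xi-api-key": api_key,  # assumed authentication header name
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")
```

Passing the result to `urllib.request.urlopen` would send it. The same builder works for cloned, designed, or library voices, since only the ID changes.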

Question 9

What audio formats are supported?

Accepted Answer

The API outputs MP3 by default. Additional formats include PCM and μ-law.
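
Unlike MP3, raw PCM has no header, so most players need it wrapped in a container first. The sketch below wraps PCM bytes in a WAV file using the standard library. It assumes 16-bit signed little-endian mono samples at a known sample rate; check the API reference for the actual sample format and rate of the PCM output you request.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, sample_rate: int = 16_000) -> bytes:
    """Wrap raw PCM samples in a WAV container and return the file bytes."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)   # assumption: mono
        wav.setsampwidth(2)   # assumption: 16-bit samples (2 bytes each)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buffer.getvalue()
```

The returned bytes can be written to a `.wav` file or handed to any library that reads WAV data.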

Question 10

How do I optimize for latency?

Accepted Answer

Use Flash v2.5 with streaming enabled, and keep each request under 1,000 characters. For persistent real-time applications, use a WebSocket connection.
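
One way to keep requests short is to split the text at sentence boundaries before sending it. This is a minimal sketch with a hypothetical function name: it uses a simple punctuation-based sentence split, which will misfire on abbreviations such as "e.g.", and hard-splits any single sentence that exceeds the limit.

```python
import re


def chunk_sentences(text: str, max_chars: int = 999) -> list[str]:
    """Group whole sentences into chunks of at most max_chars characters."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
            continue
        if current:
            chunks.append(current)
        # Last resort: a single over-long sentence is cut at the limit.
        while len(sentence) > max_chars:
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        current = sentence
    if current:
        chunks.append(current)
    return chunks
```

The default of 999 keeps every chunk strictly under 1,000 characters. Smaller values reduce time to first audio further at the cost of more requests.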

Question 11

Is pronunciation customizable?

Accepted Answer

Yes. Use phonetic spelling or pronunciation dictionaries to control how specific words are spoken.
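
Phonetic spelling can be applied on the client side by replacing hard-to-pronounce words before the text is sent. The sketch below is local preprocessing only and is not the pronunciation dictionary feature itself; the function name and the example respellings are hypothetical.

```python
import re


def apply_phonetic_spellings(text: str, spellings: dict[str, str]) -> str:
    """Replace whole words with their phonetic respelling, ignoring case."""
    if not spellings:
        return text
    # Longest keys first so that longer entries win over their prefixes.
    words = sorted(spellings, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(w) for w in words) + r")\b",
        re.IGNORECASE,
    )
    lookup = {word.lower(): spoken for word, spoken in spellings.items()}
    return pattern.sub(lambda m: lookup[m.group(0).lower()], text)
```

For instance, mapping `"SQL"` to `"sequel"` rewrites every standalone occurrence of the word while leaving longer words that merely contain it unchanged.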

Question 12

What SDKs are available?

Accepted Answer

Official SDKs are available for Python and JavaScript/TypeScript. You can also call the HTTP API directly.

Question 13

Where can I find code examples?

Accepted Answer

The complete API reference, code examples, and integration guides are available at www.11labs.ru/docs/api-reference

Question 14

Do you offer enterprise support?

Accepted Answer

Yes. Enterprise plans include SOC 2 compliance, HIPAA support, GDPR compliance, EU data residency, zero retention mode, dedicated support, and custom SLAs.

Text to Speech API

Ultra-realistic and low latency speech generation

Built on the most powerful Voice AI models

Flash v2.5

Turbo v2.5

Multilingual v2

Eleven v3

Everything you need to build production-ready speech

Control emotion and delivery

Access 10,000+ voices

Voice design & cloning

Multi-speaker dialogue

Audio events and direction

Pronunciation dictionaries

Powering the world’s leading companies and brands

APIs built for production

Enterprise-level data protection

Python and TypeScript SDKs

Elevated support and custom deployments

Frequently asked questions