Question 1

How does this text-to-speech work?

Accepted Answer

It uses the SpeechSynthesis interface of the Web Speech API, which is built into all major modern browsers. Your text is handed to the speech engine your operating system already ships (the built-in text-to-speech engines of macOS, Windows, Android, and so on), so there is no API key, no rate limit, and no per-character charge.
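
As a rough illustration of the call path described above, here is a minimal TypeScript sketch. The `speak` helper, the `SpeakOptions` shape, and the `env` parameter are hypothetical names introduced for this example and are not taken from this page's code. Only `speechSynthesis`, `SpeechSynthesisUtterance`, and the `lang`, `rate`, and `pitch` properties belong to the Web Speech API.

```typescript
// Hypothetical options bag for this sketch.
interface SpeakOptions {
  lang?: string;  // BCP 47 tag such as "en-US"; omitted means browser default
  rate?: number;  // 1 is normal speed
  pitch?: number; // 1 is normal pitch
}

// Queue `text` for speech. Returns false when the Web Speech API is missing.
// `env` defaults to the global object; it is a parameter only so the sketch
// can be exercised outside a browser with a stand-in object.
function speak(text: string, options: SpeakOptions = {}, env: any = globalThis): boolean {
  const synth = env.speechSynthesis;
  const Utterance = env.SpeechSynthesisUtterance;
  if (!synth || !Utterance) return false;

  const utterance = new Utterance(text);
  if (options.lang) utterance.lang = options.lang;
  utterance.rate = options.rate ?? 1;
  utterance.pitch = options.pitch ?? 1;

  synth.speak(utterance); // hands the utterance to the browser's speech queue
  return true;
}
```

In a page, `speak("Hello from the browser", { rate: 1.1 })` would be enough to hear the default voice, assuming the browser allows speech at that moment (some require a prior user gesture).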

Question 2

Why are some voices high-quality and others robotic?

Accepted Answer

You hear the voices your operating system has installed. macOS and iOS offer very natural-sounding "Premium" voices (Samantha, Daniel, Karen). Windows ships with Microsoft's neural voices. Linux usually defaults to eSpeak, which sounds robotic. Install additional system voices to get more variety; the picker updates automatically.

Question 3

Why does the voice list look different on Chrome vs Safari?

Accepted Answer

Each browser exposes the voices its host OS provides, and some add cloud-backed voices of their own. Chrome adds "Google" voices that route through Google servers when online (we do not pass any data ourselves; Chrome handles this). Safari uses only macOS/iOS voices. Use the language filter to find the best voice for your text.
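
A language filter of the kind mentioned above could be sketched as follows. The `VoiceInfo` type and the `filterVoices` function are hypothetical and are not this page's implementation. The `name`, `lang`, and `localService` fields mirror real properties of `SpeechSynthesisVoice`. Listing on-device voices first is a choice made for this example.

```typescript
// Structural subset of SpeechSynthesisVoice, so the sketch also type-checks
// outside a browser.
interface VoiceInfo {
  name: string;
  lang: string;          // BCP 47 tag such as "en-US"
  localService: boolean; // false when the voice relies on a remote service
}

// Keep voices whose language tag matches `langPrefix` ("en" matches "en-US"
// and "en-GB"), listing on-device voices before remote ones, then by name.
function filterVoices(voices: VoiceInfo[], langPrefix: string): VoiceInfo[] {
  const prefix = langPrefix.toLowerCase();
  return voices
    .filter((v) => {
      const lang = v.lang.toLowerCase();
      return lang === prefix || lang.startsWith(prefix + "-");
    })
    .sort(
      (a, b) =>
        Number(b.localService) - Number(a.localService) ||
        a.name.localeCompare(b.name),
    );
}
```

In a browser the input would come from `speechSynthesis.getVoices()`. That list can be empty until the `voiceschanged` event has fired, so a real picker would likely refresh on that event.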

Question 4

Can I download the speech as an MP3 or WAV?

Accepted Answer

You can download it, but as a WebM (or MP4) audio file rather than MP3 or WAV. Use the "Record & download" button on this page (desktop Chrome, Edge, and Opera only). It uses the browser's tab-audio capture API: when prompted, choose this tab in the picker and tick the "Share tab audio" / "Share system audio" checkbox before clicking Share. The synthesised speech is then captured into the file you download. Safari, Firefox, and mobile browsers don't expose system audio yet; on those, use the Audio Recorder tool with your microphone instead.
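
The choice between WebM and MP4 can be sketched as a small feature probe. The candidate list and the `pickRecordingType` name are assumptions made for this example, not this page's actual configuration. The support check is passed in as a function so the sketch does not depend on a browser.

```typescript
// Candidate container/codec strings in order of preference.
// Assumption for this sketch: WebM with Opus first, MP4 as the fallback.
const CANDIDATE_TYPES = ["audio/webm;codecs=opus", "audio/webm", "audio/mp4"];

// Return the first candidate the recorder supports, or undefined if none is.
function pickRecordingType(
  isSupported: (type: string) => boolean,
): string | undefined {
  return CANDIDATE_TYPES.find((type) => isSupported(type));
}
```

In a browser this would typically be called as `pickRecordingType((t) => MediaRecorder.isTypeSupported(t))`. The result would then be passed as the `mimeType` option of a `MediaRecorder` built on the stream returned by `navigator.mediaDevices.getDisplayMedia({ video: true, audio: true })`.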

Question 5

Why does the voice cut off after a long paragraph?

Accepted Answer

Some browsers (notably Chrome) have an undocumented limit of roughly 15 seconds on a single speech utterance. To work around this, long text is automatically split into shorter chunks that are played one after another. If you hear gaps, break the text into shorter paragraphs with clear punctuation.
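
The splitting step could look roughly like the sketch below. It is an illustration under stated assumptions, not the page's actual algorithm. The `splitIntoChunks` name and the 200-character default are made up for the example, and a character count is only a rough proxy for speaking time.

```typescript
// Split text into chunks of at most `maxChars` characters, preferring to break
// after sentence punctuation, then at a space, and only then mid-word.
function splitIntoChunks(text: string, maxChars = 200): string[] {
  if (maxChars < 1) throw new RangeError("maxChars must be at least 1");

  const chunks: string[] = [];
  let rest = text.trim();

  while (rest.length > maxChars) {
    const head = rest.slice(0, maxChars);
    // Position just after the last sentence-ending mark in `head`, or 0.
    let cut =
      Math.max(
        head.lastIndexOf(". "),
        head.lastIndexOf("! "),
        head.lastIndexOf("? "),
      ) + 1;
    if (cut <= 0) cut = head.lastIndexOf(" "); // fall back to a word boundary
    if (cut <= 0) cut = maxChars;              // no boundary at all: hard cut
    chunks.push(rest.slice(0, cut).trim());
    rest = rest.slice(cut).trim();
  }

  if (rest.length > 0) chunks.push(rest);
  return chunks;
}
```

Each chunk would then be passed to `speechSynthesis.speak(new SpeechSynthesisUtterance(chunk))`. The browser queues utterances and plays them in order, which is where the short gaps between chunks can come from.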

Question 6

Is the text I type private?

Accepted Answer

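Yes. We never receive your text, so we cannot log, analyze, or store it. With a voice installed on your device, synthesis happens in your operating system's local text-to-speech engine and the text never leaves your browser. The one exception is outside our control: if you pick one of Chrome's "Google" voices while online, Chrome itself may send the text to Google's servers to synthesize it. Even autocomplete suggestions are not collected.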

Question 7

Why is in-browser audio processing slower than online tools?

Accepted Answer

Server-side tools use multi-threaded native FFmpeg running on dedicated CPUs with fast disks and parallel pipelines. Our engine is FFmpeg compiled to WebAssembly, which runs single-threaded inside your browser tab and has no access to native hardware acceleration. That makes browser-based jobs typically 3–8× slower than a server. The trade-off is total privacy: your audio file is never uploaded, never logged, and never stored — closing the tab erases everything from memory immediately. For most clips up to a few minutes the wait is small, and for sensitive recordings (voice memos, drafts, confidential meetings) the privacy gain is well worth it.

Question 8

Is my audio uploaded?

Accepted Answer

No. Everything runs entirely inside your browser tab using FFmpeg compiled to WebAssembly. The file is read into local memory only, processed in the same tab, and the result is offered as a direct download. Nothing is transmitted to any server, no account is required, no analytics are tied to your file, and closing the tab discards every byte from memory.

Question 9

How big a file can I process?

Accepted Answer

The file picker accepts audio inputs up to about 1 GB, which is well above what mainstream "free tier" online converters allow. The real ceiling is your device — everything runs inside your browser tab, which shares memory with the rest of the page. Most podcasts, songs, and voice memos sit comfortably under that limit even on a phone. If a very large lossless WAV or FLAC ever fails, trim it first or transcode to MP3 / Opus to bring the size down before re-running the tool.

Question 10

Which audio formats are supported?

Accepted Answer

MP3, WAV, OGG (Vorbis and Opus), FLAC, M4A (AAC), AAC, Opus, AIFF, and WMA all decode reliably via FFmpeg WASM. Output formats depend on the specific tool — most editing tools default to MP3 (universal) or WAV (lossless) but expose a format picker so you can pick the one that fits your downstream player or DAW.

Question 11

Which browsers are supported?

Accepted Answer

Recent Chrome, Edge, Firefox, Safari, and other Chromium-based browsers all work. The tool relies on WebAssembly and SharedArrayBuffer, which require the page to be served over HTTPS with the right cross-origin headers — this site is configured correctly by default. On phones the same code runs but is slower than on a desktop because mobile CPUs are weaker.
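
For readers hosting something similar, the "right cross-origin headers" usually means the pair that makes a page cross-origin isolated. The sketch below lists those two headers and checks the result at runtime. The `ISOLATION_HEADERS` constant and the `canUseSharedMemory` helper are hypothetical names for this example. The header names and values and the `crossOriginIsolated` global are standard, but this site's exact server configuration is not shown here and may differ.

```typescript
// Response headers that make a page cross-origin isolated, which browsers
// require before exposing SharedArrayBuffer to it.
const ISOLATION_HEADERS: Record<string, string> = {
  "Cross-Origin-Opener-Policy": "same-origin",
  "Cross-Origin-Embedder-Policy": "require-corp",
};

// Report whether the environment can use SharedArrayBuffer.
// `env` defaults to the global object and is a parameter only so the check
// can be tried outside a browser with a stand-in object.
function canUseSharedMemory(env: any = globalThis): boolean {
  return (
    env.crossOriginIsolated === true &&
    typeof env.SharedArrayBuffer === "function"
  );
}
```

A page could call `canUseSharedMemory()` at startup and show a clear message when it returns false, rather than failing later when the engine loads.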

Question 12

Is there a watermark, sign-up wall, or usage cap?

Accepted Answer

No. The tool is completely free, requires no account, attaches no watermark, applies no usage caps, and shows no popup ads on your output. Because the work happens on your own device, there is no per-user quota for us to enforce — your hardware and browser memory are the only limits. The download is the file you would get from running FFmpeg locally, nothing more, nothing less.

Question 13

Are there any usage limits on Text to Speech?

Accepted Answer

Text to Speech takes typed or pasted text rather than a file, so a per-file size cap does not apply to it. You can run Text to Speech as often as you need; every run produces a full-quality result.

Question 14

Does Text to Speech support batch processing?

Accepted Answer

Text to Speech processes one input at a time by design, which keeps memory usage predictable on lower-end devices and makes results easier to verify. To handle several texts, run the tool once per text; the page stays loaded between runs and remembers your last-used settings, so the second run is essentially instant.

Question 15

Does Text to Speech have an API?

Accepted Answer

Text to Speech is a browser-only tool by design and does not expose a hosted API. The reason is the same as the privacy story: there is no Favtoo backend doing the work, so there is no service to call. If you need to script the same transformation, the underlying engine (the standard Web Speech API) is available to any web page and can be called directly from your own code.

Question 16

Are jobs run with Text to Speech stored anywhere?

Accepted Answer

Favtoo keeps no copy of your file because Favtoo never receives your file. Text to Speech runs entirely in your browser, the input is held only in your tab's memory, and closing the tab discards it. There is no opt-in cloud history, no "recent jobs" panel synced to an account, and no server-side retention to configure — the architecture simply has nowhere for your file to be stored.

Question 17

How accurate is Text to Speech?

Accepted Answer

Text to Speech is built on standard browser APIs and performs no speech processing of its own, so pronunciation and naturalness depend on the voice you select. The same text can sound different from one voice, browser, or operating system to the next. If you have a specific reference output you need to match, run a small test first to confirm the chosen voice and settings produce what you expect.

Question 18

Does Text to Speech work in Safari, Firefox, Chrome and Edge?

Accepted Answer

Text to Speech works in any modern browser released in the last few years: Chrome, Edge, Firefox, Safari, Brave, Arc and the major Chromium derivatives are all supported. The underlying engine relies on widely supported web APIs, so there is nothing exotic to install. If you are on a very old browser version and the tool fails to load, updating to the latest release of your preferred browser is the only fix needed.

Question 19

Is it safe to use Text to Speech on confidential files?

Accepted Answer

Your file is processed inside your browser by standard browser APIs. The engine reads the file's bytes from your tab's memory, computes the result, and writes the result back into the tab. You can confirm what the page does by opening developer tools and watching the Network tab during a run — the requests you see are for the tool's static assets only.

Question 20

Can I use Text to Speech offline?

Accepted Answer

Once the page is loaded, Text to Speech can complete jobs without an active internet connection, provided you use a voice installed on your device; Chrome's online "Google" voices may be unavailable offline. The page itself makes no per-job network call. The initial page load does require a connection (to fetch the static assets), but after that you can disconnect entirely and the tool will still work. This is a side effect of the local-first architecture, not a deliberate "offline mode" feature.

