Text to Speech is a audio tool that runs in your browser. Type or paste text, pick a system voice, and listen instantly. Adjust speaking rate (0.5×–2×), pitch, and volume in real time. Uses your browser's built-in Web Speech API — no cloud TTS, no API keys, no costs. The page you are reading is the same workspace you will use to do the work: pick a file or paste your input, choose the options that matter to you, and the tool produces the result on your device.
Text to Speech fits naturally into the workflow of podcasters preparing episodes and teachers recording lessons, both of whom typically need a fast result inside the browser. There is no learning curve to budget for: anyone who has used a typical web upload form can complete a run on the first try.
Text to Speech is shaped for the gap between "I'll do it by hand" and "I'll script it." When the job is small enough that automating it would take longer than doing it, but annoying enough to want a focused tool — that is the situation this page is built for.
Under the hood, Text to Speech uses standard browser APIs to do the actual work. Input runs through the same engine, with a per-file ceiling of 0 MB so memory usage stays predictable on lower-end laptops and tablets. The engine ships as part of the page bundle, so once the page is loaded the tool keeps working even if your network connection drops.
The execution model is straightforward: your file is bytes in the tab's memory, the engine reads those bytes, computes the result, and hands the result back to the browser. The transformation happens locally, which is why the tool keeps working when your network connection drops mid-job and why it produces the same result every run for the same input.
Text to Speech fits naturally next to several adjacent tools. Common companions include Audio Recorder, Tone Generator, White Noise Generator, and Add Audio to Video — combine them when the job needs more than one transformation. After running Text to Speech, many users move on to Audio Recorder and Add Audio to Video. Each tool is a separate page so you can compose the exact pipeline you need.
The hard constraints are easy to remember. Maximum input: 0 MB. Multiple files per run: no — one input at a time, by design, to keep results predictable. The same controls apply on every run.
The transformation in Text to Speech is deterministic — the same input plus the same options produces the same result every run. That predictability matters when the result has to match an upstream specification or be reproducible later.
When the job finishes, Text to Speech hands you the result as a sensibly named file. Filenames are derived from your input where possible, so a quick batch of jobs leaves you with a tidy folder rather than a pile of generic "output (3)" files. Nothing is auto-saved on Favtoo's side because nothing was ever sent there.
Text to Speech is structured around the idea that a useful tool should be its own page. Open the page, do the work, close the tab — the page is the entire product. There is no onboarding flow because there is nothing to onboard into.
Text to Speech fits the gap where opening a desktop app feels heavy and writing a script feels overkill. The page handles the common audio editing and conversion task with sensible defaults so a single visit usually completes the job; for highly specialised work, a dedicated desktop application can offer more knobs to turn.
Pro tip: Text to Speech works just as well in a private/incognito window as in a normal one, which is occasionally useful when you want zero browser-history footprint of the job. Another tip: if the tool ever feels slow, it is almost always because the browser tab is competing for CPU with another tab — pausing or closing the heavy ones gives the engine room to work.
When something goes wrong, the cause is usually one of three things: a malformed input, a browser that is out of memory, or a corporate proxy that is interfering with the page's static assets. The first two are easy to diagnose; the third typically requires asking your IT team to allow standard browser APIs to load.
That is essentially everything Text to Speech does and how it does it. Open the tool above, drop in your input, and the work happens in the page. If you find yourself reaching for it often, bookmark the page — it loads quickly on subsequent visits, and your most-recent settings are remembered for the rest of the session.
It uses the Web Speech API's SpeechSynthesis interface, which is built into every modern browser. Your text is handed to the synthesizer your operating system already ships (macOS Speech, Windows Narrator, Android TTS, etc.), so there is no API key, no rate limit, and no per-character charge.
You hear the voices your operating system has installed. macOS and iOS ship with very natural-sounding "Premium" voices (Samantha, Daniel, Karen). Windows ships with Microsoft's neural voices. Linux defaults to espeak which sounds robotic. Install additional system voices to get more variety — the picker updates automatically.
Each browser exposes the voices its host OS provides, plus its own bundled cloud-style voices. Chrome adds "Google" voices that route through Google servers when online (we do not pass any data ourselves — Chrome handles this). Safari only uses macOS/iOS voices. Use the language filter to find the best voice for your text.
Yes — use the "Record & download" button on this page (desktop Chrome, Edge, and Opera only). It uses the browser's tab-audio capture API: when prompted, choose this tab in the picker and tick the "Share tab audio" / "Share system audio" checkbox before clicking Share. The synthesised speech is captured into a WebM (or MP4) audio file you can download. Safari, Firefox, and mobile browsers don't expose system audio yet — on those, use the Audio Recorder tool with your microphone instead.
Some browsers (notably Chrome) have a hidden 15-second limit on speech utterances. Long text is automatically split into shorter chunks and played sequentially to work around this. If you hear gaps, try shorter paragraphs separated by punctuation.
Completely. The text never leaves your browser. We never log it, never analyze it, and never store it. The synthesis happens via your local OS's text-to-speech engine. Even autocomplete suggestions are not collected.
Server-side tools use multi-threaded native FFmpeg running on dedicated CPUs with fast disks and parallel pipelines. Our engine is FFmpeg compiled to WebAssembly, which runs single-threaded inside your browser tab and has no access to native hardware acceleration. That makes browser-based jobs typically 3–8× slower than a server. The trade-off is total privacy: your audio file is never uploaded, never logged, and never stored — closing the tab erases everything from memory immediately. For most clips up to a few minutes the wait is small, and for sensitive recordings (voice memos, drafts, confidential meetings) the privacy gain is well worth it.
No. Everything runs entirely inside your browser tab using FFmpeg compiled to WebAssembly. The file is read into local memory only, processed in the same tab, and the result is offered as a direct download. Nothing is transmitted to any server, no account is required, no analytics are tied to your file, and closing the tab discards every byte from memory.
The file picker accepts audio inputs up to about 1 GB, which is well above what mainstream "free tier" online converters allow. The real ceiling is your device — everything runs inside your browser tab, which shares memory with the rest of the page. Most podcasts, songs, and voice memos sit comfortably under that limit even on a phone. If a very large lossless WAV or FLAC ever fails, trim it first or transcode to MP3 / Opus to bring the size down before re-running the tool.
MP3, WAV, OGG (Vorbis and Opus), FLAC, M4A (AAC), AAC, Opus, AIFF, and WMA all decode reliably via FFmpeg WASM. Output formats depend on the specific tool — most editing tools default to MP3 (universal) or WAV (lossless) but expose a format picker so you can pick the one that fits your downstream player or DAW.
Recent Chrome, Edge, Firefox, Safari, and other Chromium-based browsers all work. The tool relies on WebAssembly and SharedArrayBuffer, which require the page to be served over HTTPS with the right cross-origin headers — this site is configured correctly by default. On phones the same code runs but is slower than on a desktop because mobile CPUs are weaker.
No. The tool is completely free, requires no account, attaches no watermark, applies no usage caps, and shows no popup ads on your output. Because the work happens on your own device, there is no per-user quota for us to enforce — your hardware and browser memory are the only limits. The download is the file you would get from running FFmpeg locally, nothing more, nothing less.
Inputs are capped at 0 MB per file, which keeps memory usage stable across phones, tablets and older laptops. You can run Text to Speech as often as you need; every run produces a full-quality result.
Text to Speech processes one input at a time by design — it keeps memory usage predictable on lower-end devices and makes results easier to verify. To handle a folder, run the tool once per file; the page stays loaded between runs and remembers your last-used settings, so the second run is essentially instant.
Text to Speech is a browser-only tool by design and does not expose a hosted API. The reason is the same as the privacy story: there is no Favtoo backend doing the work, so there is no service to call. If you need to script the same transformation, the underlying engine (standard browser APIs) is open-source and can be used directly from your own code.
Favtoo keeps no copy of your file because Favtoo never receives your file. Text to Speech runs entirely in your browser, the input is held only in your tab's memory, and closing the tab discards it. There is no opt-in cloud history, no "recent jobs" panel synced to an account, and no server-side retention to configure — the architecture simply has nowhere for your file to be stored.
Text to Speech is built on standard browser APIs, which is the same class of engine used by professional audio editing and conversion pipelines. For deterministic operations, the output is byte-identical to what an equivalent CLI run would produce; for operations involving a codec or a model, the result is well within the range of what comparable tools generate. If you have a specific reference output you need to match, run a small test job first to confirm the configuration produces what you expect.
Text to Speech works in any modern browser released in the last few years — Chrome, Edge, Firefox, Safari, Brave, Arc and the major Chromium derivatives are all supported. The underlying engine relies on widely-supported web APIs, so there is nothing exotic to install. If you are on a very old browser version and the tool fails to load, updating to the latest release of your preferred browser is the only fix needed.
Your file is processed inside your browser by standard browser APIs. The engine reads the file's bytes from your tab's memory, computes the result, and writes the result back into the tab. You can confirm what the page does by opening developer tools and watching the Network tab during a run — the requests you see are for the tool's static assets only.
Once the page is loaded, Text to Speech can complete jobs without an active internet connection — the engine is bundled with the page, so there is no per-job network call. The initial page load does require a connection (to fetch the static assets), but after that you can disconnect entirely and the tool will still work. This is a side-effect of the local-first architecture, not a deliberate "offline mode" feature.
Audio Recorder
Record from your microphone directly in the browser. Pick quality (high, medium, low), toggle echo cancellation, noise suppression and auto-gain, then save to WebM/Opus or M4A/AAC. Audio is captured locally — nothing is uploaded.
Tone Generator
Generate a pure tone at any frequency from 20 Hz to 20 kHz. Pick a sine, square, triangle, or sawtooth waveform, choose duration, amplitude, and mono/stereo. Exports a 16-bit PCM WAV file at 44.1 kHz with built-in click-preventing fades.
Silence Generator
Generate a perfectly silent WAV file of any length from 1 second up to 1 hour. Pick mono or stereo, get a 16-bit PCM WAV at 44.1 kHz. Useful as padding between clips, intro silence, leader audio for video timing, or test material.
White Noise Generator
Generate white, pink, or brown noise as a 16-bit PCM WAV file. Pick noise type, duration up to 1 hour, amplitude, and mono/stereo. Useful for sleep, focus, masking distractions, audio testing, and as a backing layer for ambient music.
Metronome
A precise browser-based metronome powered by the Web Audio API. Set BPM from 30 to 300, choose a time signature, accent the first beat, and use tap-tempo to sync. Click timing is sample-accurate using lookahead scheduling — much steadier than typical JavaScript setInterval beats.
Audio Trimmer
Trim any audio file to a precise start and end time. Outputs a lossless stream-copy by default (no quality loss, very fast) or re-encodes to MP3, WAV, OGG, or M4A. Files are processed entirely in your browser with FFmpeg WebAssembly.
Audio Splitter
Split a long audio file into N equal-length parts and download them as a ZIP. Each part is named sequentially. Great for chapterizing audiobooks, podcasts, or long DJ mixes. Runs entirely in your browser with FFmpeg WebAssembly.
Audio Merger
Merge up to 12 audio files into one continuous track. Supports MP3, WAV, OGG, M4A, AAC, FLAC, Opus, AIFF and more. Optional loudness normalization to even out clip levels. Files are processed entirely in your browser with FFmpeg WebAssembly.