A powerful, lightweight web application for text-to-speech (TTS) and speech recognition functionality. No web development knowledge required - simply download the project files and open lingo.html in your Chrome browser to get started! Lingo provides an intuitive interface for reading text aloud and converting speech to text, all running directly in your browser with no dependencies or build process required.
Rant to Developers: This project was very intentionally designed to be free of any specific frameworks (Vue, React, etc.) so it's easily understandable to all web developers and no porting from one framework to another is required. I also wanted this app to be runnable by non-developers simply by opening an HTML page locally. It seems that nearly ALL AI projects struggle endlessly with TTS/STT, never getting it right. Gemini Voice sucks, OpenAI Voice sucks, GitHub Copilot Voice absolutely sucks, etc. Yes, voice can be a bit tricky to get right, but trust me, this JS has it perfected. You guys no longer have any excuses! This code shows how easy it truly is.
- Smart Reading: Read selected text, from cursor position, or the entire document
- Cursor Position Reading: Place your cursor anywhere in the text to start reading from that point
- Pause & Resume: Pause speech at any time and resume where you left off
- Voice Selection: Choose from all available system voices with language indicators
- Speed Control: Adjustable speaking rates from slow (0.85x) to ludicrous (1.35x)
- Persistent Settings: Voice and speed preferences automatically saved
- Cross-Browser Compatible: Works in Chrome, Firefox, Safari, and other modern browsers
- Continuous Dictation: Real-time speech-to-text conversion
- Auto-Restart: Seamlessly continues listening after pauses
- Smart Insertion: Text appears at cursor position, preserving existing content
- Visual Feedback: Textarea highlights when actively listening
- Chrome Optimized: Full functionality in Chrome/Chromium browsers
- Ctrl/Cmd + Enter: Start/stop text reading
- Ctrl/Cmd + M: Toggle microphone dictation
- Escape: Stop all active operations (TTS or speech recognition)
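The shortcuts above can be sketched as a small key-to-action mapping. This is a minimal illustration, not the code in lingo.js: the function name `shortcutAction` and the action names are hypothetical, and only the key combinations come from the list above.

```javascript
// Hypothetical helper: maps a keyboard event to one of the documented actions.
// The action names ("toggle-read", "toggle-mic", "stop-all") are illustrative only.
function shortcutAction(event) {
  const mod = event.ctrlKey || event.metaKey; // Ctrl on Windows/Linux, Cmd on macOS
  if (event.key === "Escape") return "stop-all";
  if (mod && event.key === "Enter") return "toggle-read";
  if (mod && event.key.toLowerCase() === "m") return "toggle-mic";
  return null; // not a shortcut; leave the event alone
}

// Browser-only wiring; skipped when there is no document (for example under Node).
if (typeof document !== "undefined") {
  document.addEventListener("keydown", (event) => {
    const action = shortcutAction(event);
    if (action) {
      event.preventDefault(); // keep the browser from handling the combination itself
      console.log("shortcut:", action);
    }
  });
}
```

Keeping the mapping in a pure function makes it easy to check without a browser; the listener only decides what to do with the result.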
?mic=on: Automatically start mic dictation when the page loads
- Dark Theme: Easy-on-the-eyes default dark interface
- Responsive Design: Optimized for both desktop and mobile devices
- Real-time Status: Live feedback on current operations
- Accessible: Full keyboard navigation and screen reader support
- Simple Architecture: Clean separation of HTML, CSS, and JavaScript
The easiest way to use Lingo is to simply open the HTML file directly:
- Download lingo.html, lingo.css, and lingo.js to the same folder on your computer
- Double-click lingo.html or right-click → Open with → Chrome (or any browser that supports Web Speech APIs)
- Start using - no server setup required!
Note: Chrome/Chromium browsers provide the best experience with full TTS and speech recognition support.
You only need to use the web server if you want to avoid repeated microphone permission prompts. Browsers require microphone permission each time when running from file:// URLs, but remember your choice when running from http://localhost.
For this setup:
- Clone or download the project files
- Run the startup script:

```bash
./run.sh
```

- Your browser will automatically open to http://localhost:8009/lingo.html
If you prefer to run it manually:
```bash
# Start a local HTTP server
python3 -m http.server 8009

# Open your browser to:
# http://localhost:8009/lingo.html
```

- Type or paste text into the main textarea
- Drag and drop text from other applications directly into the textarea (especially handy on Linux - simply select text from any app and drag it in)
- Position your cursor where you want reading to begin, or select specific text to read only that portion
- Click "Read" or press Ctrl/Cmd + Enter
- Choose your preferred voice and speaking speed from the dropdowns
- Click "⏸️ Pause" to pause reading, then "▶️ Resume" to continue from where you left off
- Click "⏹️ Stop" or press Escape to stop reading completely
Tip: If your cursor is at the very end of the text, clicking Read will start from the beginning.
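The reading rules described here (selection first, otherwise from the cursor, and from the top when the cursor sits at the end) can be expressed as one small function. This is a sketch under stated assumptions: `textToRead` is a hypothetical name, and the arguments mirror the standard `value`, `selectionStart`, and `selectionEnd` properties of a textarea.

```javascript
// Hypothetical helper: decides which slice of the textarea should be spoken.
function textToRead(value, selectionStart, selectionEnd) {
  if (selectionEnd > selectionStart) {
    return value.slice(selectionStart, selectionEnd); // an explicit selection wins
  }
  if (selectionStart >= value.length) {
    return value; // cursor at the very end: read from the beginning
  }
  return value.slice(selectionStart); // otherwise read from the cursor onward
}
```

In a browser, the result could be passed to `speechSynthesis.speak(new SpeechSynthesisUtterance(text))`; how lingo.js actually chunks or queues the text is not shown here.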
- Click "🎤 Mic" or press Ctrl/Cmd + M
- Speak clearly - your words will appear in the textarea
- The app continues listening until you stop it manually
- Click "⏹️ Stop" or press Escape to stop dictation
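Inserting dictated words at the cursor while preserving the surrounding text comes down to splicing a string. The sketch below is illustrative only: `insertAtCursor` is a hypothetical name, and replacing an active selection with the transcript is an assumption about the desired behavior, not something confirmed by lingo.js.

```javascript
// Hypothetical helper: splices a transcript into existing text at the cursor.
// Assumption: if a range is selected, the transcript replaces it.
function insertAtCursor(value, selectionStart, selectionEnd, transcript) {
  const before = value.slice(0, selectionStart);
  const after = value.slice(selectionEnd);
  return {
    value: before + transcript + after,        // new textarea content
    cursor: before.length + transcript.length, // where the caret should land
  };
}
```

After assigning the new value to the textarea, calling `setSelectionRange(cursor, cursor)` would keep the caret right behind the inserted words so the next phrase continues from there.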
Tip: Add ?mic=on to the URL to automatically start mic dictation when the page loads (e.g., http://localhost:8009/lingo.html?mic=on).
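Checking for the ?mic=on parameter needs only the standard `URL` and `URLSearchParams` APIs. The function name `shouldAutoStartMic` is hypothetical, and this sketch only covers the detection step, not how lingo.js starts dictation.

```javascript
// Hypothetical helper: true when the page URL carries ?mic=on.
function shouldAutoStartMic(href) {
  return new URL(href).searchParams.get("mic") === "on";
}
```

In the page itself this would be called with `window.location.href`. It works the same way for file:// URLs as for http://localhost.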
- Three-File Structure: Clean separation of HTML, CSS, and JavaScript
- No Build Process: Runs directly in browser without compilation
- Zero Dependencies: Uses only native Web APIs
- Portable: Copy lingo.html, lingo.css, and lingo.js to any folder and it works
| Browser | TTS Support | Speech Recognition |
|---|---|---|
| Chrome/Chromium | ✅ Full | ✅ Full |
| Firefox | ✅ Full | ❌ Not supported |
| Safari | ✅ Full | ❌ Not supported |
| Edge | ✅ Full | ✅ Full |
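Rather than sniffing the browser name, support like that in the table can be detected at runtime by probing for the relevant globals. A minimal sketch, assuming a hypothetical helper named `detectSpeechSupport` that receives the global object (`window` in a browser):

```javascript
// Hypothetical helper: reports which speech features the given global scope exposes.
function detectSpeechSupport(scope) {
  // Chromium exposes recognition under the webkit prefix.
  const Recognition = scope.SpeechRecognition || scope.webkitSpeechRecognition || null;
  return {
    tts: "speechSynthesis" in scope && typeof scope.SpeechSynthesisUtterance === "function",
    recognition: Recognition !== null,
    Recognition, // constructor to use, or null when unsupported
  };
}
```

A page could call `detectSpeechSupport(window)` once at startup and disable the Mic button when `recognition` is false, which is one way to get the graceful degradation this README describes.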
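- Speech Synthesis API: For text-to-speech functionality (the SpeechSynthesis half of the Web Speech API)
- Web Speech API: For speech recognition (SpeechRecognition, exposed webkit-prefixed as webkitSpeechRecognition)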
- localStorage: For persistent settings
- File API: For handling text input/output
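Persisting the voice and speed preferences with localStorage might look like the sketch below. Everything named here is an assumption for illustration: the storage key `lingo.settings`, the function names, and the default rate of 1 are hypothetical; only the 0.85x to 1.35x range comes from this README.

```javascript
const SETTINGS_KEY = "lingo.settings"; // hypothetical key, not taken from lingo.js
const MIN_RATE = 0.85; // "slow" per the feature list
const MAX_RATE = 1.35; // "ludicrous" per the feature list

// Keeps the rate inside the documented range; falls back to 1 for non-numbers.
function clampRate(rate) {
  const n = Number(rate);
  if (!Number.isFinite(n)) return 1;
  return Math.min(MAX_RATE, Math.max(MIN_RATE, n));
}

// `storage` is anything with getItem/setItem, e.g. window.localStorage.
function saveSettings(storage, settings) {
  const payload = { voice: settings.voice, rate: clampRate(settings.rate) };
  storage.setItem(SETTINGS_KEY, JSON.stringify(payload));
}

function loadSettings(storage) {
  const defaults = { voice: null, rate: 1 };
  try {
    const raw = storage.getItem(SETTINGS_KEY);
    if (!raw) return defaults;
    const parsed = JSON.parse(raw);
    return {
      voice: typeof parsed.voice === "string" ? parsed.voice : null,
      rate: clampRate(parsed.rate),
    };
  } catch {
    return defaults; // corrupted or unreadable storage: start fresh
  }
}
```

Passing the storage object in, rather than touching `localStorage` directly, keeps the functions usable in tests and in browsers where storage access throws.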
```
lingo/
├── lingo.html    # Main HTML structure
├── lingo.css     # Styles and theming
├── lingo.js      # Application logic
├── run.sh        # Startup script for local development
├── kill.sh       # Stop the local server
└── README.md     # This documentation
```
The run.sh script provides a convenient way to start the application:
- Port Management: Checks for existing servers on port 8009 and terminates them
- Server Startup: Launches Python's built-in HTTP server
- Browser Launch: Automatically opens your default browser to the app
- Process Management: Keeps the server running and handles graceful shutdown
- Security: Modern browsers require HTTPS or localhost for Web APIs
- Cross-Origin: Direct file access (file://) blocks certain features
- Port 8009: Chosen to avoid conflicts with common development ports
- Content Review: Listen to written content for proofreading
- Accessibility: Assist users with reading difficulties
- Multitasking: Consume text content while doing other activities
- Note Taking: Quickly dictate thoughts and ideas
- Language Learning: Hear proper pronunciation of text
- Voice Memos: Convert speech to text for documentation
For testing and debugging, Lingo exposes utility functions:
```javascript
// Speak any text directly
window.__tts.speakNow("Hello world");

// Stop current speech
window.__tts.cancel();
```

- Graceful Degradation: Features disable cleanly when unsupported
- Auto-Recovery: Speech recognition restarts automatically after interruptions
- User Feedback: Clear status messages for all operations
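The auto-recovery behavior can be sketched as a thin wrapper that restarts recognition whenever it ends while the user still wants dictation. This is an illustration, not the logic in lingo.js: `keepListening` is a hypothetical name, and `recognizer` is assumed to be a `SpeechRecognition` (or `webkitSpeechRecognition`) instance.

```javascript
// Hypothetical wrapper: keeps a recognizer running until stop() is called.
function keepListening(recognizer, onText) {
  let wanted = false; // true while the user wants dictation active

  recognizer.continuous = true;      // keep listening across pauses where supported
  recognizer.interimResults = false; // only deliver finalized phrases

  recognizer.onresult = (event) => {
    for (let i = event.resultIndex; i < event.results.length; i++) {
      const result = event.results[i];
      if (result.isFinal) onText(result[0].transcript); // best alternative
    }
  };

  // The browser may end a session on its own (silence, network hiccup).
  recognizer.onend = () => {
    if (wanted) recognizer.start();
  };

  return {
    start() { wanted = true; recognizer.start(); },
    stop() { wanted = false; recognizer.stop(); },
  };
}
```

The `wanted` flag is what separates a manual stop from an interruption: after `stop()`, the `end` event no longer triggers a restart.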
Lingo follows a simple three-file architecture for easy maintenance. When contributing:
- Keep files organized - HTML in lingo.html, styles in lingo.css, logic in lingo.js
- Test across browsers - especially Chrome vs Firefox
- Maintain responsive design - mobile and desktop compatibility
- Preserve accessibility - keyboard navigation and screen readers
This project is open source. Feel free to use, modify, and distribute as needed.
- Language detection and automatic voice matching
- Export functionality for dictated text
- Custom voice pitch and volume controls
- Batch processing for multiple text files
- Integration with cloud speech services for enhanced recognition
Lingo - Bringing voice to your text and text to your voice! 🎙️✨
