A Python-based audio transcription tool that leverages AssemblyAI's powerful speech-to-text API to convert audio files into accurate, timestamped transcripts.
- Multi-format Support: Handles MP3, WAV, M4A, and other common audio formats
- High Accuracy: Powered by AssemblyAI's state-of-the-art transcription engine
- Timestamp Support: Provides word-level timestamps for precise audio navigation
- Batch Processing: Process multiple audio files efficiently
- Progress Tracking: Real-time progress indicators for long-running operations
- Export Options: Save transcripts in TXT, SRT, or JSON formats
- Error Handling: Robust error handling with detailed logging
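To illustrate how the word-level timestamps and the SRT export option fit together, here is a minimal sketch that groups timed words into subtitle cues. The `Word` class, its millisecond fields, and the fixed `max_words` grouping are assumptions made for this example; the project's actual export code may work differently.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    """Hypothetical word record; offsets are assumed to be in milliseconds."""
    text: str
    start_ms: int
    end_ms: int


def format_srt_time(ms: int) -> str:
    """Render milliseconds as an SRT timestamp (HH:MM:SS,mmm)."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def words_to_srt(words: List[Word], max_words: int = 8) -> str:
    """Group words into cues of at most max_words and render them as SRT text."""
    cues = []
    for index, offset in enumerate(range(0, len(words), max_words), start=1):
        chunk = words[offset:offset + max_words]
        start = format_srt_time(chunk[0].start_ms)
        end = format_srt_time(chunk[-1].end_ms)
        text = " ".join(word.text for word in chunk)
        cues.append(f"{index}\n{start} --> {end}\n{text}\n")
    # Cues are separated by a blank line, as the SRT format expects.
    return "\n".join(cues)
```

Grouping by a fixed word count keeps the sketch short; a real exporter would more likely split cues on pauses or punctuation.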
- Python 3.8 or higher
- AssemblyAI API key (get one at assemblyai.com)
- Clone the repository:
  git clone https://github.com/yourusername/audio-transcriber-mvp.git
  cd audio-transcriber-mvp
- Install dependencies:
  pip install -r requirements.txt
- Set up your API key:
  # Create a .env file
  echo "ASSEMBLYAI_API_KEY=your_api_key_here" > .env
- Run the transcriber:
  python download_and_transcribe.py
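For readers wondering how the `.env` file from the setup steps might be picked up at runtime, the following is a rough sketch using only the standard library. The function names, the precedence of real environment variables over `.env` values, and the fallback defaults are assumptions for illustration; the project's own `config/settings.py` may behave differently (for example, by relying on a dotenv package).

```python
import os
from pathlib import Path
from typing import Dict, Mapping, Optional


def parse_env_file(path: Path) -> Dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    if not path.exists():
        return values
    for raw_line in path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_settings(
    env_path: Path = Path(".env"),
    environ: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Merge .env values with the environment (environment wins, by assumption)."""
    environ = os.environ if environ is None else environ
    merged = {**parse_env_file(env_path), **environ}
    api_key = merged.get("ASSEMBLYAI_API_KEY")
    if not api_key:
        raise RuntimeError("ASSEMBLYAI_API_KEY is not set")
    # The fallback values below are assumed defaults, not confirmed by the project.
    return {
        "api_key": api_key,
        "base_url": merged.get("ASSEMBLYAI_BASE_URL", "https://api.assemblyai.com/v2"),
        "language_code": merged.get("DEFAULT_LANGUAGE_CODE", "en"),
        "output_format": merged.get("DEFAULT_OUTPUT_FORMAT", "txt"),
        "log_level": merged.get("LOG_LEVEL", "INFO"),
    }
```

Failing fast when the key is missing tends to give a clearer error than letting the first API request fail.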
from audio_transcriber import AudioTranscriber
# Initialize the transcriber
transcriber = AudioTranscriber()
# Transcribe a single file
transcript = transcriber.transcribe_file("path/to/audio.mp3")
print(transcript.text)

# Transcribe with custom settings
transcript = transcriber.transcribe_file(
"audio.mp3",
language_code="en",
punctuate=True,
format_text=True,
word_boost=["technical", "terms"],
boost_param="high"
)
# Save transcript to file
transcriber.save_transcript(transcript, "output.txt", format="txt")

# Transcribe a single file
python download_and_transcribe.py --file audio.mp3
# Transcribe multiple files
python download_and_transcribe.py --directory ./audio_files
# Specify output format
python download_and_transcribe.py --file audio.mp3 --format srt

audio_Transcriber/
├── README.md                    # Project documentation
├── requirements.txt             # Python dependencies
├── download_and_transcribe.py   # Main execution script
├── audio_transcriber.py         # Core transcription module
├── utils/
│   ├── file_utils.py            # File handling utilities
│   └── audio_utils.py           # Audio processing utilities
└── config/
    └── settings.py              # Configuration management
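As a rough idea of how the main script could wire up the `--file`, `--directory`, and `--format` flags from the command-line examples, here is a small `argparse` sketch. The helper names, the extension set, and the choice to make `--file` and `--directory` mutually exclusive are assumptions for illustration, not a description of what `download_and_transcribe.py` actually does.

```python
import argparse
from pathlib import Path
from typing import List

# Assumed subset of supported extensions, used only to filter directory listings.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a"}


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the flags shown in the command-line examples."""
    parser = argparse.ArgumentParser(description="Transcribe audio files.")
    source = parser.add_mutually_exclusive_group()
    source.add_argument("--file", type=Path, help="single audio file")
    source.add_argument("--directory", type=Path, help="directory of audio files")
    parser.add_argument("--format", choices=["txt", "srt", "json"], default="txt")
    return parser


def collect_inputs(args: argparse.Namespace) -> List[Path]:
    """Return the audio files selected by the parsed arguments."""
    if args.file is not None:
        return [args.file]
    if args.directory is not None:
        return sorted(
            path
            for path in args.directory.iterdir()
            if path.is_file() and path.suffix.lower() in AUDIO_EXTENSIONS
        )
    return []
```

Sorting the directory listing makes batch runs deterministic, which helps when comparing logs between runs.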
The application can be configured through environment variables or a .env file:
# Required
ASSEMBLYAI_API_KEY=your_api_key_here
# Optional
ASSEMBLYAI_BASE_URL=https://api.assemblyai.com/v2
DEFAULT_LANGUAGE_CODE=en
DEFAULT_OUTPUT_FORMAT=txt
LOG_LEVEL=INFO

Run the test suite:
# Run all tests
python -m pytest tests/
# Run with coverage
python -m pytest tests/ --cov=audio_transcriber --cov-report=html

- Processing Speed: ~1 minute of audio per minute of processing time
- Accuracy: 95%+ accuracy for clear speech in supported languages
- File Size Limits: Up to 1GB per file
- Supported Languages: 99+ languages including English, Spanish, French, German, etc.
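The limits above can be checked before uploading anything. The sketch below validates a file against the 1GB limit and estimates processing time from the roughly one-to-one speed figure. The function names are hypothetical, and reading "1GB" as 1024³ bytes is an assumption.

```python
from pathlib import Path
from typing import List

MAX_FILE_BYTES = 1024 ** 3        # "Up to 1GB per file"; binary gigabyte assumed
SECONDS_PER_AUDIO_SECOND = 1.0    # "~1 minute of audio per minute of processing time"


def preflight(path: Path, limit_bytes: int = MAX_FILE_BYTES) -> List[str]:
    """Return a list of problems that would prevent transcription (empty if none)."""
    if not path.is_file():
        return [f"{path} does not exist or is not a file"]
    problems = []
    size = path.stat().st_size
    if size == 0:
        problems.append(f"{path} is empty")
    if size > limit_bytes:
        problems.append(f"{path} is {size} bytes, over the {limit_bytes}-byte limit")
    return problems


def estimate_processing_seconds(audio_seconds: float) -> float:
    """Rough estimate based on the approximate real-time figure quoted above."""
    return audio_seconds * SECONDS_PER_AUDIO_SECOND
```

Returning a list of problems rather than raising on the first one lets a batch run report every rejected file in a single pass.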
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- AssemblyAI for providing the transcription API
- The open-source community for various utility libraries
If you encounter any issues or have questions:
- Check the troubleshooting guide
- Search existing GitHub issues
- Create a new issue with detailed information
Built with ❤️ for developers who need reliable audio transcription