Production Python pipeline for speech-to-text with timestamp extraction. Uses the AssemblyAI API for transcription and demonstrates AI service integration and data processing workflows.

Tylarcam/audio_Transcriber

Audio Transcriber MVP

A Python-based audio transcription tool that uses AssemblyAI's speech-to-text API to convert audio files into timestamped transcripts.

🎯 Features

  • Multi-format Support: Handles MP3, WAV, M4A, and other common audio formats
  • High Accuracy: Powered by AssemblyAI's state-of-the-art transcription engine
  • Timestamp Support: Provides word-level timestamps for precise audio navigation
  • Batch Processing: Process multiple audio files efficiently
  • Progress Tracking: Real-time progress indicators for long-running operations
  • Export Options: Save transcripts in TXT, SRT, or JSON formats
  • Error Handling: Robust error handling with detailed logging
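To make the timestamp and SRT export features concrete, here is a minimal sketch of how word-level timestamps could be grouped into SRT cues. It assumes each word arrives as a `(text, start_ms, end_ms)` tuple with times in milliseconds; the names `format_srt_time` and `words_to_srt` and the eight-words-per-cue grouping are illustrative choices, not taken from this project's code.

```python
from typing import Iterable, List, Tuple

# Assumed shape of one word: (text, start in ms, end in ms).
Word = Tuple[str, int, int]


def format_srt_time(ms: int) -> str:
    """Convert milliseconds to the SRT timestamp form HH:MM:SS,mmm."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def words_to_srt(words: Iterable[Word], max_words: int = 8) -> str:
    """Group words into numbered cues of at most max_words words each."""
    word_list = list(words)
    cues: List[str] = []
    for index, start in enumerate(range(0, len(word_list), max_words), start=1):
        chunk = word_list[start:start + max_words]
        text = " ".join(word[0] for word in chunk)
        begin = format_srt_time(chunk[0][1])   # start of the first word
        end = format_srt_time(chunk[-1][2])    # end of the last word
        cues.append(f"{index}\n{begin} --> {end}\n{text}\n")
    # A blank line separates consecutive cues.
    return "\n".join(cues)
```

Grouping by a fixed word count keeps the sketch short; a real exporter might instead break cues on pauses or punctuation.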

🚀 Quick Start

Prerequisites

Installation

  1. Clone the repository:

    git clone https://github.com/Tylarcam/audio_Transcriber.git
    cd audio_Transcriber

  2. Install dependencies:

    pip install -r requirements.txt

  3. Set up your API key:

    # Create a .env file
    echo "ASSEMBLYAI_API_KEY=your_api_key_here" > .env

  4. Run the transcriber:

    python download_and_transcribe.py

📖 Usage

Basic Usage

from audio_transcriber import AudioTranscriber

# Initialize the transcriber
transcriber = AudioTranscriber()

# Transcribe a single file
transcript = transcriber.transcribe_file("path/to/audio.mp3")
print(transcript.text)

Advanced Usage

# Transcribe with custom settings
transcript = transcriber.transcribe_file(
    "audio.mp3",
    language_code="en",
    punctuate=True,
    format_text=True,
    word_boost=["technical", "terms"],
    boost_param="high"
)

# Save transcript to file
transcriber.save_transcript(transcript, "output.txt", format="txt")

Command Line Interface

# Transcribe a single file
python download_and_transcribe.py --file audio.mp3

# Transcribe multiple files
python download_and_transcribe.py --directory ./audio_files

# Specify output format
python download_and_transcribe.py --file audio.mp3 --format srt
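The flags above could be parsed along the following lines. This is a sketch under assumptions, not the script's actual implementation: the helper names `build_parser` and `collect_inputs`, the choice to make `--file` and `--directory` mutually exclusive, and the extension list used for directory scans are all hypothetical.

```python
import argparse
from pathlib import Path
from typing import List

# Assumed set of extensions to pick up when scanning a directory.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a"}


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the --file, --directory and --format flags."""
    parser = argparse.ArgumentParser(description="Transcribe audio files.")
    source = parser.add_mutually_exclusive_group()
    source.add_argument("--file", type=Path, help="single audio file")
    source.add_argument("--directory", type=Path, help="folder of audio files")
    parser.add_argument("--format", choices=["txt", "srt", "json"],
                        default="txt", help="transcript output format")
    return parser


def collect_inputs(args: argparse.Namespace) -> List[Path]:
    """Return the audio files selected by the parsed arguments."""
    if args.file is not None:
        return [args.file]
    if args.directory is not None:
        return sorted(
            path for path in args.directory.iterdir()
            if path.suffix.lower() in AUDIO_EXTENSIONS
        )
    return []  # no source given; the caller decides what to do


if __name__ == "__main__":
    parsed = build_parser().parse_args()
    for audio_path in collect_inputs(parsed):
        print(f"would transcribe {audio_path} as {parsed.format}")
```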

📁 Project Structure

audio_Transcriber/
├── README.md                 # Project documentation
├── requirements.txt          # Python dependencies
├── download_and_transcribe.py  # Main execution script
├── audio_transcriber.py      # Core transcription module
├── utils/
│   ├── file_utils.py         # File handling utilities
│   └── audio_utils.py        # Audio processing utilities
└── config/
    └── settings.py           # Configuration management

Configuration

The application can be configured through environment variables or a .env file:

# Required
ASSEMBLYAI_API_KEY=your_api_key_here

# Optional
ASSEMBLYAI_BASE_URL=https://api.assemblyai.com/v2
DEFAULT_LANGUAGE_CODE=en
DEFAULT_OUTPUT_FORMAT=txt
LOG_LEVEL=INFO
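As an illustration of how these variables might be read, the sketch below loads them from a mapping (by default `os.environ`) and falls back to the values listed above. The `Settings` class and `load_settings` function are hypothetical names, not the contents of `config/settings.py`, and the sketch assumes that any `.env` file has already been loaded into the environment by some other means.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Fallbacks mirror the optional values shown in the listing above.
DEFAULT_BASE_URL = "https://api.assemblyai.com/v2"
DEFAULT_LANGUAGE = "en"
DEFAULT_FORMAT = "txt"
DEFAULT_LOG_LEVEL = "INFO"


@dataclass(frozen=True)
class Settings:
    api_key: str
    base_url: str
    language_code: str
    output_format: str
    log_level: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from env, failing early if the API key is missing."""
    source = os.environ if env is None else env
    api_key = source.get("ASSEMBLYAI_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("ASSEMBLYAI_API_KEY is not set")
    return Settings(
        api_key=api_key,
        base_url=source.get("ASSEMBLYAI_BASE_URL", DEFAULT_BASE_URL),
        language_code=source.get("DEFAULT_LANGUAGE_CODE", DEFAULT_LANGUAGE),
        output_format=source.get("DEFAULT_OUTPUT_FORMAT", DEFAULT_FORMAT),
        log_level=source.get("LOG_LEVEL", DEFAULT_LOG_LEVEL),
    )
```

Passing the mapping as a parameter keeps the function easy to test without touching the real environment.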

🧪 Testing

Run the test suite:

# Run all tests
python -m pytest tests/

# Run with coverage
python -m pytest tests/ --cov=audio_transcriber --cov-report=html

📊 Performance

  • Processing Speed: ~1 minute of audio per minute of processing time
  • Accuracy: 95%+ accuracy for clear speech in supported languages
  • File Size Limits: Up to 1GB per file
  • Supported Languages: 99+ languages, including English, Spanish, French, and German
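Given the file size limit above, a pre-flight check before upload can avoid wasted requests. The sketch below is one possible shape for such a check; it assumes "1GB" means 1024³ bytes, treats MP3, WAV, and M4A as the known extensions, and uses the hypothetical names `find_problems` and `check_file`.

```python
from pathlib import Path
from typing import List

# Assumption: the 1GB limit is interpreted as 1024**3 bytes.
MAX_FILE_BYTES = 1024 ** 3
# Assumption: only the formats named explicitly in the feature list.
KNOWN_EXTENSIONS = {".mp3", ".wav", ".m4a"}


def find_problems(filename: str, size_bytes: int) -> List[str]:
    """Return human-readable problems; an empty list means no issue found."""
    problems: List[str] = []
    suffix = Path(filename).suffix.lower()
    if suffix not in KNOWN_EXTENSIONS:
        problems.append(f"unrecognised extension: {suffix or '(none)'}")
    if size_bytes == 0:
        problems.append("file is empty")
    if size_bytes > MAX_FILE_BYTES:
        problems.append(f"file is {size_bytes} bytes, limit is {MAX_FILE_BYTES}")
    return problems


def check_file(path: Path) -> List[str]:
    """Run the checks against a file on disk."""
    return find_problems(path.name, path.stat().st_size)
```

Because other audio formats may also be accepted, an unrecognised extension is probably better treated as a warning than as a hard failure.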

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • AssemblyAI for providing the transcription API
  • The open-source community for various utility libraries

📞 Support

If you encounter any issues or have questions:

  1. Check the troubleshooting guide
  2. Search existing GitHub issues
  3. Create a new issue with detailed information

Built with ❤️ for developers who need reliable audio transcription
