Feature Proposal: Speech Transcription CLI Extension (cycodsp) #72

@robch

Overview

This issue proposes adding a new speech transcription and audio processing extension to the Cycod CLI toolset, tentatively named cycodsp (cycod speech).

Motivation

Inspired by the ai speech transcribe command recently added to the Azure AI CLI (built on the Fast Transcription API), this extension would enable workflows such as:

  • Download audio from YouTube videos or podcasts
  • Transcribe audio using Azure Speech services
  • Analyze transcriptions for insights using AI

Example use case: process "You Are Not So Smart" podcast episodes to extract insights about cognitive biases and intellectual humility.

Existing Foundation

We already have relevant code in personal repos:

  • robch/ytd: YouTube downloader + transcriber using Azure Speech SDK
  • robch/searchy: Web search and content extraction tool

Proposed Features

Core Commands

# Basic transcription
cycodsp transcribe --file audio.wav

# YouTube workflow
cycodsp youtube --url "https://youtube.com/watch?v=VIDEO_ID" --transcribe

# Podcast processing
cycodsp podcast --url "podcast-episode.mp3" --transcribe

# AI analysis
cycodsp analyze --transcript "episode.txt" --prompt "Extract key insights"
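
To make the proposed command surface concrete, here is a minimal sketch of how the four subcommands and their flags could be parsed. It is written in Python with argparse purely for illustration; the real tool would presumably follow the existing Cycod CLI conventions. The subcommand and flag names come from the examples above, while `build_parser`, the help strings, and the choice to make each flag required are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the proposed cycodsp commands."""
    parser = argparse.ArgumentParser(prog="cycodsp")
    sub = parser.add_subparsers(dest="command", required=True)

    # cycodsp transcribe --file audio.wav
    transcribe = sub.add_parser("transcribe", help="Transcribe a local audio file")
    transcribe.add_argument("--file", required=True)

    # cycodsp youtube|podcast --url ... [--transcribe]
    for name in ("youtube", "podcast"):
        source = sub.add_parser(name, help=f"Download audio from a {name} URL")
        source.add_argument("--url", required=True)
        source.add_argument("--transcribe", action="store_true")

    # cycodsp analyze --transcript episode.txt --prompt "..."
    analyze = sub.add_parser("analyze", help="Run an AI prompt over a transcript")
    analyze.add_argument("--transcript", required=True)
    analyze.add_argument("--prompt", required=True)

    return parser
```

Calling `build_parser().parse_args(["transcribe", "--file", "audio.wav"])` yields a namespace with `command="transcribe"` and `file="audio.wav"`, which a dispatcher could then route to the matching handler.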

Key Capabilities

  • Integration with Azure Speech Fast Transcription API
  • Multiple output formats (text, SRT, VTT, JSON)
  • Speaker diarization support
  • AI-powered content analysis
  • Integration with existing Cycod chat features
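
As a rough illustration of the multiple-output-format and speaker-diarization items, the sketch below renders timed segments as SRT or WebVTT. The `Segment` type and its fields are hypothetical and not taken from any Azure SDK; it assumes the transcription step can supply start and end offsets in milliseconds plus an optional speaker label.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Segment:
    """Hypothetical transcript segment; field names are assumptions."""
    start_ms: int
    end_ms: int
    text: str
    speaker: Optional[str] = None


def _timestamp(ms: int, sep: str) -> str:
    # SRT separates milliseconds with "," while WebVTT uses ".".
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}{sep}{millis:03d}"


def _line(seg: Segment) -> str:
    # Prefix the speaker label when diarization provided one.
    return f"{seg.speaker}: {seg.text}" if seg.speaker else seg.text


def to_srt(segments: Iterable[Segment]) -> str:
    blocks = [
        f"{i}\n{_timestamp(s.start_ms, ',')} --> {_timestamp(s.end_ms, ',')}\n{_line(s)}\n"
        for i, s in enumerate(segments, start=1)
    ]
    return "\n".join(blocks)


def to_vtt(segments: Iterable[Segment]) -> str:
    cues = [
        f"{_timestamp(s.start_ms, '.')} --> {_timestamp(s.end_ms, '.')}\n{_line(s)}\n"
        for s in segments
    ]
    return "WEBVTT\n\n" + "\n".join(cues)
```

Keeping one internal segment representation and deriving each output format from it would let text, SRT, VTT, and JSON writers share the same transcription result. WebVTT also has a dedicated voice tag for speakers, which a fuller implementation might prefer over the plain prefix used here.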

Implementation Approach

  1. Phase 1: Basic transcription using Azure Speech Fast Transcription API
  2. Phase 2: Integrate YouTube download from existing ytd repo
  3. Phase 3: Add AI analysis and Cycod chat integration
  4. Phase 4: Advanced features (batch processing, RSS monitoring)
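
For the RSS monitoring item in Phase 4, one possible starting point is to read a podcast feed, collect the audio enclosure URLs, and skip episodes already processed. The sketch below uses only the Python standard library and assumes a standard RSS 2.0 feed; the function names and the idea of tracking seen URLs in a set are illustrative assumptions, not part of the proposal.

```python
import xml.etree.ElementTree as ET
from typing import List, Set, Tuple


def audio_enclosures(feed_xml: str) -> List[Tuple[str, str]]:
    """Return (title, url) pairs for feed items that carry an audio enclosure."""
    root = ET.fromstring(feed_xml)
    episodes = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        if enclosure is None:
            continue  # item has no attached media
        url = enclosure.get("url")
        if not url or not enclosure.get("type", "").startswith("audio/"):
            continue  # skip non-audio or malformed enclosures
        title = item.findtext("title", default="").strip()
        episodes.append((title, url))
    return episodes


def new_episodes(feed_xml: str, seen_urls: Set[str]) -> List[Tuple[str, str]]:
    """Filter out episodes whose enclosure URL was already processed."""
    return [(title, url) for title, url in audio_enclosures(feed_xml) if url not in seen_urls]
```

A batch run could then pass each new URL to the same download and transcription path used for a single podcast episode and persist the URL set between runs.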

Documentation

Detailed proposal and technical design: todo/speech-transcription-ideas.md in branch robch/2512-dec05-speech-transcription-ideas

Branch: https://github.com/robch/cycod/tree/robch/2512-dec05-speech-transcription-ideas

Questions for Discussion

  • Should this be a separate CLI tool or integrated into main cycod?
  • What is the preferred approach to licensing compliance for YouTube and podcast content?
  • How should this integrate with the existing Cycod infrastructure?
  • How should the phases be prioritized?
