Neural Speech Project

Overview

This project focuses on decoding electrocorticography (ECoG) and surface electromyography (sEMG) signals for speech synthesis and analysis. It implements neural network models that translate brain and muscle activity into speech representations.

Project Structure

neural_speech_project/          ← Git repo root
│
├── data/                       ← *never* committed; link or mount here
│   ├── ecog_fm/                ← Functional-mapping ECoG (raw or minimally cleaned)
│   ├── ecog_daily/             ← Daily-conversation ECoG
│   └── semg/                   ← sEMG recordings
│
├── datasets/                   ← Python loaders & preprocessing
├── models/                     ← Neural network models
├── embedders/                  ← Code to talk to *frozen* embedding models
├── synthesizers/               ← *frozen* vocoders / neural TTS wrappers
├── configs/                    ← Hydra / OmegaConf YAMLs
├── scripts/                    ← Lightweight CLI entry points
├── utils/                      ← Generic helpers
├── tests/                      ← Unit- & smoke-tests
├── docs/                       ← Documentation
└── notebooks/                  ← Exploratory analyses

Implemented Features

  • Datasets
    • Functional Mapping (FM) ECoG dataset loading
    • Daily Conversation ECoG dataset loading
    • sEMG dataset loading with H5 format preprocessing
  • Models
    • ConvRNN decoder
    • Classification and CTC heads for unit prediction (a sketch of the ConvRNN + CTC pairing follows this list)
    • Regression head for direct feature prediction
  • Embedders & Synthesizers
    • HuBERT embedder and synthesizer
    • Sylber synthesizer with regression support
  • Training Infrastructure
    • Training script with loss tracking and checkpointing
    • Wandb integration for experiment tracking
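
A minimal sketch of how the decoder and CTC head might fit together, assuming a PyTorch backend. The class name ConvRNNDecoder, its layer sizes, and the channel counts below are illustrative stand-ins, not the repository's actual API:

import torch
import torch.nn as nn

class ConvRNNDecoder(nn.Module):
    """Conv front-end over time followed by a bidirectional GRU (illustrative)."""
    def __init__(self, n_channels, hidden=256, n_units=100):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_channels, hidden, kernel_size=5, padding=2),
            nn.ReLU(),
        )
        self.rnn = nn.GRU(hidden, hidden, batch_first=True, bidirectional=True)
        # CTC head: per-frame logits over speech units plus one blank symbol
        self.ctc_head = nn.Linear(2 * hidden, n_units + 1)

    def forward(self, x):  # x: (batch, time, channels)
        h = self.conv(x.transpose(1, 2)).transpose(1, 2)
        h, _ = self.rnn(h)
        return self.ctc_head(h)  # (batch, time, n_units + 1)

# Toy forward/backward pass on random stand-in data
model = ConvRNNDecoder(n_channels=64)
signals = torch.randn(8, 500, 64)  # 8 trials, 500 frames, 64 electrodes
log_probs = model(signals).log_softmax(-1).transpose(0, 1)  # (time, batch, classes)
targets = torch.randint(1, 101, (8, 40))  # target unit sequences; 0 is the CTC blank
loss = nn.CTCLoss(blank=0)(
    log_probs, targets,
    input_lengths=torch.full((8,), 500),
    target_lengths=torch.full((8,), 40),
)
loss.backward()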

Roadmap

  • Test the training script on the daily-conversation dataset
  • Additional decoder architectures:
    • Conformer
    • Spatial attention module
    • Q-former module
  • Expanded head implementations:
    • Enhanced CTC, regression, and classification heads
  • Phoneme CTC LM synthesizer integration

Getting Started

Prerequisites

  • Python 3.8+
  • CUDA-compatible GPU (for training)

Installation

  1. Clone this repository
  2. Copy .env.example to .env and configure environment variables
  3. Install dependencies: pip install -r requirements.txt (the full command sequence is shown below)
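
Put together, the steps look like this (assuming the repository name Neural-Speech-Decoding-M; adjust the URL to your fork):

git clone https://github.com/Ritchie-R-H/Neural-Speech-Decoding-M.git
cd Neural-Speech-Decoding-M
cp .env.example .env   # then edit .env to set your environment variables
pip install -r requirements.txt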

Usage

For training:

python scripts/train.py experiment=your_experiment_config
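
Because the configs are Hydra YAMLs, individual values can also be overridden on the command line with Hydra's key=value syntax. The keys in this example (model, trainer.max_epochs) are illustrative and may not match the actual config schema:

python scripts/train.py experiment=your_experiment_config model=conv_rnn trainer.max_epochs=50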

For inference:

python scripts/infer.py --checkpoint /path/to/checkpoint --input /path/to/input
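
For programmatic inference outside the CLI, a PyTorch checkpoint can typically be restored along these lines. The "model_state" key and the ConvRNNDecoder class (from the sketch above) are assumptions, not the script's documented format:

import torch

ckpt = torch.load("/path/to/checkpoint", map_location="cpu")  # hypothetical checkpoint layout
model = ConvRNNDecoder(n_channels=64)  # illustrative class from the earlier sketch
model.load_state_dict(ckpt["model_state"])
model.eval()
with torch.no_grad():
    logits = model(torch.randn(1, 500, 64))  # replace with a real preprocessed recording
units = logits.argmax(-1)  # greedy per-frame unit prediction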

Documentation

Refer to the docs/ directory for detailed documentation.

Citation

If you use this code in your research, please cite our work.
