Paper: https://arxiv.org/abs/2602.15675
NileTTS is a large-scale Egyptian Arabic Text-to-Speech dataset and fine-tuned XTTS model. This repository contains the code for data generation, model training, and evaluation as described in our paper.
| Resource | Link |
|---|---|
| Model Weights | KickItLikeShika/NileTTS-XTTS |
| Dataset | KickItLikeShika/NileTTS-dataset |
NileTTS addresses the lack of high-quality TTS resources for Egyptian Arabic by providing:
- 38 hours of transcribed Egyptian Arabic speech across medical, sales, and general conversation domains
- A fine-tuned XTTS v2 model optimized for Egyptian Arabic synthesis
- A reproducible synthetic data generation pipeline
| Model | WER | CER | Speaker Similarity |
|---|---|---|---|
| XTTS v2 (Baseline) | 26.8% | 8.1% | 0.713 |
| NileTTS (Ours) | 18.8% | 4.1% | 0.755 |
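For reference, WER and CER are both Levenshtein edit distances normalized by reference length. This minimal pure-Python sketch (not the paper's evaluation script, which may use a library such as `jiwer`) shows the calculation:

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    m, n = len(ref), len(hyp)
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            cur[j] = min(prev[j] + 1,          # deletion
                         cur[j - 1] + 1,       # insertion
                         prev[j - 1] + cost)   # substitution
        prev = cur
    return prev[n]

def wer(ref, hyp):
    """Word error rate: word-level edits / reference word count."""
    r, h = ref.split(), hyp.split()
    return edit_distance(r, h) / len(r)

def cer(ref, hyp):
    """Character error rate: char-level edits / reference char count."""
    return edit_distance(list(ref), list(hyp)) / len(ref)
```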
```bash
pip install -r requirements.txt
```

See `playground.ipynb` for a complete example of loading and using the model and the dataset.
```
NileTTS/
├── generate-data.py
├── evaluate.py
├── playground.ipynb
├── requirements.txt
└── README.md
```
The `generate-data.py` script processes audio files generated by NotebookLM into training-ready chunks with transcriptions and speaker labels.
Before running the script, you need:
- Audio file: An `.m4a` or `.wav` file containing Egyptian Arabic speech (e.g., from NotebookLM)
- Speaker centroids: A `speaker_centroids.pkl` file containing pre-computed speaker embeddings for diarization
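The exact layout of `speaker_centroids.pkl` is not specified here; one plausible format (an assumption, not confirmed by the script) is a dict mapping speaker labels to mean embedding vectors, which could be built like this:

```python
import pickle

def build_centroids(embeddings_by_speaker):
    """Average per-speaker embeddings into one centroid vector each.

    embeddings_by_speaker: dict mapping speaker label -> list of
    equal-length embedding vectors (e.g. ECAPA-TDNN outputs).
    """
    centroids = {}
    for speaker, vectors in embeddings_by_speaker.items():
        dim = len(vectors[0])
        centroids[speaker] = [
            sum(v[d] for v in vectors) / len(vectors) for d in range(dim)
        ]
    return centroids

# Toy 3-dimensional embeddings purely for illustration
centroids = build_centroids({
    "speaker_a": [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]],
    "speaker_b": [[0.0, 1.0, 0.0]],
})
with open("speaker_centroids.pkl", "wb") as f:
    pickle.dump(centroids, f)
```

In practice the embeddings would come from the same ECAPA-TDNN model used at diarization time, so that cosine similarity against these centroids is meaningful.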
- Transcription: Uses Whisper Large to transcribe the audio with Arabic language setting
- Chunking: Groups transcription segments into chunks of max 15 seconds
- Speaker Diarization: Identifies speaker for each chunk using ECAPA-TDNN embeddings and cosine similarity to pre-computed centroids
- Export: Saves audio chunks as WAV files with corresponding transcriptions and metadata CSV
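The chunking and diarization steps above can be sketched as follows. This is a simplified reimplementation, not the actual script; the segment format mirrors Whisper's `{"start", "end", "text"}` output, and the embeddings stand in for ECAPA-TDNN vectors:

```python
import math

MAX_CHUNK_SECONDS = 15.0

def group_segments(segments, max_len=MAX_CHUNK_SECONDS):
    """Greedily merge consecutive transcription segments into
    chunks no longer than max_len seconds."""
    chunks, current = [], []
    for seg in segments:
        if current and seg["end"] - current[0]["start"] > max_len:
            chunks.append(current)
            current = []
        current.append(seg)
    if current:
        chunks.append(current)
    return [{
        "start": c[0]["start"],
        "end": c[-1]["end"],
        "text": " ".join(s["text"].strip() for s in c),
    } for c in chunks]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def assign_speaker(embedding, centroids):
    """Label a chunk with the centroid of highest cosine similarity."""
    return max(centroids, key=lambda s: cosine_similarity(embedding, centroids[s]))
```

Each resulting chunk would then be cut from the source audio, embedded, labeled via `assign_speaker`, and written out alongside the metadata CSV.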
The `playground.ipynb` notebook demonstrates:
- Loading the NileTTS model from HuggingFace
- Downloading dataset samples on-demand
- Generating speech from text
- Playing and saving generated audio
If you use NileTTS in your research, please cite: [TO BE ADDED]
This project is licensed under the Apache 2.0 License - see the LICENSE file for details.
- XTTSv2-Finetuning-for-New-Languages by @anhnh2002 for the training code. We adapted their fine-tuning pipeline and added evaluation metrics (WER, CER, speaker similarity) and Weights & Biases integration.
- Coqui TTS for the XTTS v2 architecture
- OpenAI Whisper for transcription
- SpeechBrain for speaker embeddings
- Google NotebookLM for audio synthesis