a virtual talkbox using computer vision to track your mouth movements and shape sound with formant filters. basically: you play notes (midi or qwerty keyboard) and your mouth controls how they sound!
with a real talkbox, you play a synth through a tube into your mouth and shape the sound with vowel movements. this does the same thing with a webcam instead of a tube: it uses mediapipe to track your mouth, estimates approximate formant frequencies (F1, F2, F3) from your mouth shape, and filters a sawtooth wave in real time. the result sounds like you're "singing" (kinda) the notes you play!
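under the hood, the mouth tracking boils down to a couple of distances between facemesh landmarks. here's a rough sketch of that step — the landmark indices (13/14 for the inner lips, 61/291 for the mouth corners) and the normalization are assumptions for illustration, not necessarily what main.py does:

```python
import math

def mouth_features(landmarks):
    """landmarks: dict of facemesh index -> (x, y) in normalized image coords."""
    upper = landmarks[13]   # inner upper lip (assumed facemesh index)
    lower = landmarks[14]   # inner lower lip
    left = landmarks[61]    # left mouth corner
    right = landmarks[291]  # right mouth corner

    width = math.dist(left, right)
    opening = math.dist(upper, lower)

    # normalize jaw opening by mouth width so distance from the camera matters less
    jaw_open = min(opening / width, 1.0) if width else 0.0
    return jaw_open, width
```

normalizing by mouth width is one simple way to make the features roughly scale-invariant; the real code may normalize differently (e.g. by face height).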
macos:

```
# install system dependencies (required for pyo audio library)
brew install portaudio portmidi liblo libsndfile

# create and activate virtual environment
python3 -m venv venv
source venv/bin/activate

# install python dependencies
pip install -r requirements.txt

# install pyo from github (the PyPI version has build issues)
C_INCLUDE_PATH="/opt/homebrew/include" LIBRARY_PATH="/opt/homebrew/lib" pip install git+https://github.com/belangeo/pyo.git

# run it
python main.py
```

linux:

```
# install system dependencies (debian/ubuntu)
sudo apt-get install portaudio19-dev libportmidi-dev liblo-dev libsndfile1-dev

# create and activate virtual environment
python3 -m venv venv
source venv/bin/activate

# install python dependencies
pip install -r requirements.txt

# install pyo
pip install pyo

# run it
python main.py
```

windows:

```
# create and activate virtual environment
python -m venv venv
venv\Scripts\activate

# install python dependencies
pip install -r requirements.txt

# install pyo (pre-built wheels available on Windows)
pip install pyo

# run it
python main.py
```

when you start it up, you'll get a menu to choose your input:
- midi keyboard - if you have one plugged in
- computer keyboard - if you don't
the app will remember your choice for next time.
if you're using qwerty keyboard mode, it's set up like a piano:
black keys: w e t y u o p
white keys: a s d f g h j k l ; '
extra controls:
- z/x - change octave
- c - toggle vibrato
- arrow up/down - pitch bend
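the piano layout above can be sketched as a lookup table from key to semitone offset. this is a guess at the usual ableton-style mapping implied by the rows listed; the app's actual table may differ slightly:

```python
# semitone offsets from C, following the white/black key rows described above
# (assumed mapping for illustration)
KEY_TO_SEMITONE = {
    'a': 0, 'w': 1, 's': 2, 'e': 3, 'd': 4, 'f': 5, 't': 6,
    'g': 7, 'y': 8, 'h': 9, 'u': 10, 'j': 11, 'k': 12, 'o': 13,
    'l': 14, 'p': 15, ';': 16, "'": 17,
}

def key_to_midi(key, octave=4):
    """Convert a qwerty key to a midi note number; midi 60 = middle C = octave 4."""
    semitone = KEY_TO_SEMITONE.get(key)
    if semitone is None:
        return None  # not a note key (e.g. z/x/c are control keys)
    return 12 * (octave + 1) + semitone
```

the z/x octave controls would then just increment or decrement the `octave` argument.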
- press a key to play a note
- move your mouth while the note is playing
- experiment with different vowel shapes:
  - "ah" = open mouth
  - "ee" = wide smile
  - "oo" = rounded lips
sound only plays when a note is pressed AND your face is detected
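that gating rule is simple enough to write down. a minimal sketch — the function name and velocity handling are made up for illustration:

```python
def output_gain(note_on: bool, face_detected: bool, velocity: float = 1.0) -> float:
    # sound only when a note is held AND the webcam currently sees a face
    return velocity if (note_on and face_detected) else 0.0
```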
- face tracking: mediapipe facemesh for mouth landmark detection
- formant mapping:
  - F1 (270–730 Hz): controlled by jaw opening
  - F2 (870–2290 Hz): controlled by lip width
  - F3 (1650–3000 Hz): combination of both
- audio: pyo synthesizer with supersaw oscillator + formant bandpass filters
- threading: separate threads for video, audio, and input
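at its simplest, the formant mapping above is interpolation from normalized mouth features into the listed frequency ranges. a sketch assuming linear curves and an even F1/F2 blend for F3 — the real code may shape these differently:

```python
def lerp(lo: float, hi: float, t: float) -> float:
    """Linear interpolation with t clamped to [0, 1]."""
    return lo + (hi - lo) * max(0.0, min(1.0, t))

def formants(jaw_open: float, lip_width: float):
    """Map normalized mouth features (0..1) onto the formant ranges listed above."""
    f1 = lerp(270.0, 730.0, jaw_open)     # jaw opening drives F1
    f2 = lerp(870.0, 2290.0, lip_width)   # lip width drives F2
    f3 = lerp(1650.0, 3000.0, 0.5 * (jaw_open + lip_width))  # blend for F3
    return f1, f2, f3
```

these three frequencies would then set the center frequencies of the bandpass filters applied to the sawtooth.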
creates a config.json file where you can tweak:
- audio buffer size (lower = less latency, higher = more stability)
- camera device id
- formant frequency ranges
- debug display settings
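for reference, a sketch of how creating and loading that config might look. the key names here are illustrative guesses; check the config.json the app actually writes:

```python
import json
from pathlib import Path

# assumed default values and key names, for illustration only
DEFAULT_CONFIG = {
    "buffer_size": 256,   # lower = less latency, higher = more stability
    "camera_id": 0,
    "formant_ranges": {
        "f1": [270, 730],
        "f2": [870, 2290],
        "f3": [1650, 3000],
    },
    "show_debug": True,
}

def load_config(path="config.json"):
    """Read config.json, writing the defaults first if it doesn't exist yet."""
    p = Path(path)
    if not p.exists():
        p.write_text(json.dumps(DEFAULT_CONFIG, indent=2))
    return json.loads(p.read_text())
```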
uses mediapipe, pyo, opencv, mido, and pynput