This is the source code for my master's degree at École de Technologie Supérieure, in partnership with Desjardins.
The goal was to create a proof of concept to determine whether it is possible to anonymize French audio recordings.
The code is organized as follows:
- fa: Code used to evaluate two Forced Alignment (FA) algorithms
- ner: Code used to train and evaluate three Named Entity Recognition (NER) models
- pipeline: Code used to create the Docker image that anonymizes audio recordings
- annotations: Gold annotations for the manually annotated speech corpora. See the Datasets section for more details.
The pipeline is used through a Docker image. Please refer to the official Docker documentation for details on how to install Docker.
First, build the Docker image.
cd pipeline
docker build --tag pipeline .
Second, download the trained NER models from Zenodo.
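For example, assuming the models are published as a single archive on the Zenodo record (the file URL and archive name below are placeholders, not the actual values), you could download and extract them into the directory that will later be mounted as /ner_models:

wget -O ner_models.zip [ZENODO_RECORD_FILE_URL]
unzip ner_models.zip -d [PATH_TO_NER_MODELS_DIR]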
Third, allow the Docker container to use your GPU by installing the NVIDIA Container Runtime.
# Add the NVIDIA container runtime repository and its signing key
curl -s -L https://nvidia.github.io/nvidia-container-runtime/gpgkey | sudo apt-key add -
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-container-runtime/$distribution/nvidia-container-runtime.list |\
  sudo tee /etc/apt/sources.list.d/nvidia-container-runtime.list
# Install the runtime and restart the Docker daemon
sudo apt-get update
sudo apt-get install nvidia-container-runtime
sudo systemctl stop docker
sudo systemctl start docker
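To check that containers can access the GPU, you can optionally run nvidia-smi inside a CUDA base image (the image tag below is only an example; any CUDA base image available on Docker Hub will do):

docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi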
Now, you can use the pipeline.
docker run -it -v [PATH_TO_DATA_TO_ANONYMIZE]:/input \
-v [PATH_TO_TMP_FA_ALGO_OUTPUT]:/align \
-v [PATH_TO_NER_MODELS_DIR]:/ner_models \
-v [PATH_TO_PIPELINE_OUTPUT]:/redact \
--gpus device=0 pipeline
The input directory contains the audio files with their corresponding transcriptions.
Audio files must be in WAV format and transcriptions in TextGrid format. Both files must have the same base name. For example, if the audio file is named example.wav, the transcription must be named example.TextGrid.
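For instance, the directory mounted as /input could look like this:

[PATH_TO_DATA_TO_ANONYMIZE]/
├── example.wav
└── example.TextGrid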
Here is an example of a TextGrid file:
File type = "ooTextFile"
Object class = "TextGrid"
xmin = 0.000000
xmax = 4.803000
tiers? <exists>
size = 1
item []:
item [1]:
class = "IntervalTier"
name = "spkr_1_1-trans"
xmin = 0.000000
xmax = 4.803000
intervals: size = 3
intervals [1]:
xmin = 0.000000
xmax = 0.500000
text = ""
intervals [2]:
xmin = 0.500000
xmax = 4.303000
text = "This is an example of someone talking for approximately four seconds"
intervals [3]:
xmin = 4.303000
xmax = 4.803000
text = ""
Note that, to work directly with our pipeline, the tier containing the transcription must be named spkr_1_1-trans.
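If your TextGrid files use a different tier name, one simple way to rename the tier is a text substitution on the name line (the original tier name below is a hypothetical placeholder to replace with your own):

sed -i 's/name = "[ORIGINAL_TIER_NAME]"/name = "spkr_1_1-trans"/' example.TextGrid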
I used two datasets for this project.
The first one is FrenNER, used to train the NER models.
The second one is based on NCCFR and is used to evaluate the FA algorithms and the pipeline. For more details on how to generate the dataset, check the annotations folder.
Part of the matching between the predictions and the gold-standard annotations was done by hand to ensure a reliable evaluation of the pipeline.