GitHub - bingo-todd/GCC-PHAT_DNN_Loc: DNN based binaural sound localization model, using GCC-PHAT as features

GCC-PHAT based DNN localization method

Baseline system in "End-to-End Binaural Sound Localisation from the Raw Waveform"¹

Framework

Dataset

Binaural signals are synthesized by convolving the source signals with BRIRs.
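The synthesis step can be sketched as follows. This is a minimal illustration, not the repository's code: the function name `synthesize_binaural` and the array layout (one BRIR column per ear) are assumptions made for the example.

```python
import numpy as np

def synthesize_binaural(mono, brir):
    """Convolve a mono source with a two-channel BRIR (hypothetical helper).

    mono: 1-D array of shape (n_samples,)
    brir: 2-D array of shape (ir_len, 2), columns = left / right ear
    returns: array of shape (n_samples + ir_len - 1, 2)
    """
    mono = np.asarray(mono, dtype=float)
    brir = np.asarray(brir, dtype=float)
    left = np.convolve(mono, brir[:, 0])
    right = np.convolve(mono, brir[:, 1])
    return np.stack([left, right], axis=1)

# Toy BRIR: a pure delay per ear, the right ear later and attenuated.
rng = np.random.default_rng(0)
mono = rng.standard_normal(1000)
brir = np.zeros((8, 2))
brir[2, 0] = 1.0   # left ear: 2-sample delay
brir[5, 1] = 0.5   # right ear: 5-sample delay, half amplitude
binaural = synthesize_binaural(mono, brir)
```

With a real BRIR the impulse response is much longer, so an FFT-based convolution would usually be preferred over `np.convolve`.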

BRIRs

Surrey binaural room impulse response (BRIR) database, which includes an anechoic room and 4 reverberant rooms.

Room	A	B	C	D
RT_60 (s)	0.32	0.47	0.68	0.89
DRR (dB)	6.09	5.31	8.82	6.12

Sound source

TIMIT sentences

Sentences per azimuth

Train	Validate	Evaluate
24	6	15

Cue extractor

Normally, features are normalized before being fed into the network. If each feature dimension is an independent variable, normalization is applied to each dimension separately. For GCC-PHAT, however, what matters is the peak position, that is, the relative values across dimensions, so the same normalization coefficient should be used for all dimensions.

Two types of normalization are tested here:

separate_norm: each dimension is normalized separately
overall_norm: all dimensions are normalized with the same factor

E.g. (figures: GCC-PHAT features after separate_norm and after overall_norm)
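The difference between the two schemes can be sketched with NumPy. This assumes zero-mean, unit-variance normalization; the README does not state which statistics the repository actually uses, and the function bodies below are illustrative only.

```python
import numpy as np

def separate_norm(feats, eps=1e-8):
    """Normalize each feature dimension (column) with its own mean and std."""
    mean = feats.mean(axis=0)
    std = feats.std(axis=0)
    return (feats - mean) / (std + eps)

def overall_norm(feats, eps=1e-8):
    """Normalize all dimensions with one shared mean and std."""
    return (feats - feats.mean()) / (feats.std() + eps)

# Toy feature matrix: 100 frames x 37 lags (sizes chosen for illustration).
rng = np.random.default_rng(0)
feats = rng.standard_normal((100, 37)) * np.linspace(0.5, 2.0, 37)

sep = separate_norm(feats)
ovr = overall_norm(feats)

# A shared scale and offset cannot move the per-frame peak position,
# whereas per-dimension scaling can.
peaks_kept_overall = np.array_equal(ovr.argmax(axis=1), feats.argmax(axis=1))
peaks_kept_separate = np.array_equal(sep.argmax(axis=1), feats.argmax(axis=1))
```

Here `peaks_kept_overall` is always true because the transform is the same increasing affine map for every dimension; `peaks_kept_separate` depends on the data.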

Model training

Multi-conditional training (MCT)

In each run, 1 reverberant room is held out for evaluation, and the other 3 reverberant rooms and the anechoic room are used for model training.
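The resulting leave-one-room-out splits can be enumerated as below. The room labels and the function name `mct_splits` are placeholders for the example, not identifiers from the repository.

```python
ANECHOIC = "Anechoic"
REVERB_ROOMS = ["A", "B", "C", "D"]

def mct_splits(reverb_rooms=REVERB_ROOMS, anechoic=ANECHOIC):
    """Return one train/test split per held-out reverberant room."""
    splits = []
    for test_room in reverb_rooms:
        # Train on the anechoic room plus every other reverberant room.
        train_rooms = [anechoic] + [r for r in reverb_rooms if r != test_room]
        splits.append({"train": train_rooms, "test": test_room})
    return splits

splits = mct_splits()
for split in splits:
    print(split)
```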

Evaluation

Considering the existence of silent frames, a localization result is reported every 25 frames. The RMSE of the estimated source azimuth is used as the performance metric. For more stable results, evaluation is run on 4 different test sets and the RMSEs are averaged (this is not done in the reference paper).

Method	A	B	C	D
Paper	2.7	3.3	3.1	5.2
separate_norm	0.5	1.6	1.1	3.3
overall_norm	0.6	1.7	1.1	3.3

Stability of model training

For room D, the model was trained 3 times. Even though similar losses were achieved, the test results vary.

mean: 3.39 std: 0.07

Main Dependencies

python 3
tensorflow-1.14
pysofa https://github.com/bingo-todd/pySOFA
BasicTools https://github.com/bingo-todd/BasicTools

Generate dataset

Align BRIRs (optional): align the BRIRs of the reverberant rooms to the BRIRs of the anechoic room.
Synthesize spatial recordings
Calculate GCC-PHAT features
Calculate normalization coefficients of GCC-PHAT features
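For readers unfamiliar with the feature, the GCC-PHAT step for a single frame can be sketched as follows. This is a generic textbook formulation, not the code in gen_dataset; the function name, the zero-padding, and the `max_lag` value are assumptions for the example.

```python
import numpy as np

def gcc_phat(left, right, max_lag, eps=1e-12):
    """GCC-PHAT of one frame, for lags -max_lag .. +max_lag (in samples).

    With this sign convention, a peak at a negative lag means the right
    channel is delayed relative to the left channel.
    """
    n_fft = 2 * max(len(left), len(right))  # zero-pad against circular wrap
    spec_l = np.fft.rfft(left, n_fft)
    spec_r = np.fft.rfft(right, n_fft)
    cross = spec_l * np.conj(spec_r)
    cross /= np.abs(cross) + eps            # PHAT weighting: keep phase only
    cc = np.fft.irfft(cross, n_fft)
    # Reorder so the output runs from lag -max_lag to lag +max_lag.
    return np.concatenate([cc[-max_lag:], cc[: max_lag + 1]])

# Toy frame: the right channel is the left channel delayed by 3 samples.
rng = np.random.default_rng(0)
left = rng.standard_normal(512)
right = np.concatenate([np.zeros(3), left[:-3]])

max_lag = 16
feat = gcc_phat(left, right, max_lag)
peak_lag = int(feat.argmax()) - max_lag
```

The normalization coefficients in the last step would then be statistics (for example mean and standard deviation) computed over such feature vectors across the training set.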

Reference

Vecchiotti, Paolo, Ning Ma, Stefano Squartini, and Guy J. Brown. "End-to-End Binaural Sound Localisation from the Raw Waveform." In 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 451–55. IEEE, 2019.
