Baseline system in “End-to-end binaural sound localisation from the raw waveform” [1]
Binaural signals are synthesized by convolving the sound source signals with BRIRs.
- BRIRs

  Surrey binaural room impulse response (BRIR) database, including an anechoic room and 4 reverberant rooms.

  | Room | A | B | C | D |
  |---|---|---|---|---|
  | RT_60 (s) | 0.32 | 0.47 | 0.68 | 0.89 |
  | DRR (dB) | 6.09 | 5.31 | 8.82 | 6.12 |

- Sound source

  TIMIT sentences

- Sentences per azimuth

  | Train | Validate | Evaluate |
  |---|---|---|
  | 24 | 6 | 15 |
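A minimal sketch of the synthesis step, assuming a mono source signal and a two-channel BRIR (one impulse response per ear, shape `(n_taps, 2)`) at the same sampling rate. The function name `synthesize_binaural` is hypothetical; the repository's own scripts (which load BRIRs through pysofa) may differ in details such as level scaling or trimming.

```python
import numpy as np
from scipy.signal import fftconvolve


def synthesize_binaural(source, brir):
    """Convolve a mono source with a (n_taps, 2) BRIR.

    Returns an array of shape (len(source) + n_taps - 1, 2)
    holding the left and right ear signals.
    """
    source = np.asarray(source, dtype=np.float64)
    brir = np.asarray(brir, dtype=np.float64)
    if brir.ndim != 2 or brir.shape[1] != 2:
        raise ValueError("brir must have shape (n_taps, 2)")
    # The same source is convolved with the left and right impulse responses
    left = fftconvolve(source, brir[:, 0])
    right = fftconvolve(source, brir[:, 1])
    return np.stack([left, right], axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    src = rng.standard_normal(16000)   # stand-in for a TIMIT sentence (1 s at 16 kHz)
    brir = np.zeros((512, 2))
    brir[0, 0] = 1.0                   # left ear: direct path, no delay
    brir[5, 1] = 0.8                   # right ear: 5-sample delay, attenuated
    out = synthesize_binaural(src, brir)
    print(out.shape)                   # (16511, 2)
```

FFT-based convolution is used because BRIRs of reverberant rooms are long, which makes direct time-domain convolution slow.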
Features are normally normalized before being fed into the network. If each feature dimension is an independent variable, normalization is applied to each dimension separately. For GCC-PHAT, however, what matters is the peak position, in other words the relative values across dimensions (lags), so the same normalization coefficients should be used for all dimensions.
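To make the point about peak position concrete, here is a minimal single-frame GCC-PHAT sketch using NumPy. The function name and the `max_lag` value are illustrative assumptions, not the repository's settings. The lag at the argmax of the returned vector is the estimated interaural delay, which is why only the relative values across lags carry the localization cue.

```python
import numpy as np


def gcc_phat(left, right, max_lag=16, eps=1e-12):
    """GCC-PHAT between two frames for lags -max_lag..max_lag.

    A positive peak lag means the right channel lags the left one.
    """
    n = len(left) + len(right)          # zero-pad to avoid circular wrap-around
    L = np.fft.rfft(left, n=n)
    R = np.fft.rfft(right, n=n)
    cross = np.conj(L) * R
    cross /= np.abs(cross) + eps        # PHAT weighting: keep phase only
    cc = np.fft.irfft(cross, n=n)
    # Reorder so that index 0 corresponds to lag -max_lag
    return np.concatenate([cc[-max_lag:], cc[:max_lag + 1]])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal(1024)
    y = np.concatenate([np.zeros(3), x[:-3]])  # right channel = left delayed by 3 samples
    g = gcc_phat(x, y, max_lag=16)
    print(np.argmax(g) - 16)                   # 3
```

Because of the PHAT weighting, scaling either channel leaves the vector unchanged, so the information sits entirely in its shape.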
Two types of normalization are tested here:
- separate_norm: each dimension is normalized separately
- overall_norm: all dimensions are normalized with the same factor
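The two options can be sketched as follows, assuming z-score (mean/standard-deviation) normalization fitted on a training feature matrix of shape `(n_frames, n_lags)`. The statistics actually used in the repository may differ, and the function names are hypothetical. The toy example shows that per-dimension statistics can move the peak of a GCC-PHAT vector, while a single shared mean and scale (a monotonic transform) cannot.

```python
import numpy as np


def fit_separate_norm(features):
    """separate_norm: one mean/std per feature dimension (per lag)."""
    return features.mean(axis=0), features.std(axis=0) + 1e-8


def fit_overall_norm(features):
    """overall_norm: a single mean/std shared by all dimensions."""
    return features.mean(), features.std() + 1e-8


def apply_norm(features, mean, std):
    return (features - mean) / std


if __name__ == "__main__":
    # Toy training set: 4 frames x 3 lags; lag 2 has much larger values than lags 0 and 1
    train = np.array([[0.0, 1.0, 0.0],
                      [0.0, 0.0, 4.0],
                      [1.0, 0.0, 0.0],
                      [0.0, 0.0, 4.0]])
    x = np.array([0.1, 0.9, 1.0])                                # raw peak at lag index 2
    print(np.argmax(x))                                          # 2
    print(np.argmax(apply_norm(x, *fit_separate_norm(train))))   # 1: peak moved
    print(np.argmax(apply_norm(x, *fit_overall_norm(train))))    # 2: peak kept
```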
E.g. example GCC-PHAT features after separate_norm and after overall_norm (figure).
In each run, one reverberant room was held out for evaluation, and the other 3 reverberant rooms plus the anechoic room were used for training.
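This protocol is a leave-one-room-out loop. A plain-Python sketch, where the room labels (including `"anechoic"`) and the function name are placeholders rather than the file names used by the Surrey database or this repository:

```python
REVERB_ROOMS = ["A", "B", "C", "D"]
ANECHOIC = "anechoic"   # hypothetical label for the anechoic room


def leave_one_room_out(reverb_rooms=REVERB_ROOMS, anechoic=ANECHOIC):
    """Yield (train_rooms, test_room) pairs, one per reverberant room."""
    for test_room in reverb_rooms:
        train_rooms = [anechoic] + [r for r in reverb_rooms if r != test_room]
        yield train_rooms, test_room


if __name__ == "__main__":
    for train_rooms, test_room in leave_one_room_out():
        print(f"test on {test_room}, train on {train_rooms}")
```

The anechoic room is always in the training set and never evaluated, so each reverberant room is tested on a model that has never seen its BRIRs.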
A localization result is reported every 25 frames, because individual frames may be silent. The RMSE of the estimated sound azimuth is used as the performance metric. For more stable results, evaluation is run on 4 different test sets and the RMSEs are averaged (this averaging is not part of the reference paper).
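A minimal sketch of this evaluation, assuming the network outputs a per-frame posterior over a fixed azimuth grid and that the 25 frames of a block are combined by summing their posteriors before taking the argmax. The pooling rule, the grid and the function names are assumptions for illustration; the text above only states that one result is reported every 25 frames.

```python
import numpy as np


def block_azimuths(posteriors, azimuths, block_len=25):
    """Collapse frame posteriors (n_frames, n_azi) into one azimuth estimate per block."""
    n_blocks = posteriors.shape[0] // block_len          # drop an incomplete final block
    blocks = posteriors[: n_blocks * block_len].reshape(n_blocks, block_len, -1)
    return azimuths[blocks.sum(axis=1).argmax(axis=1)]   # assumed pooling: summed posteriors


def averaged_rmse(test_sets, azimuths, block_len=25):
    """RMSE per test set, averaged over sets.

    Each test set is a list of (posteriors, true_azimuth) pairs.
    """
    per_set = []
    for test_set in test_sets:
        errors = [block_azimuths(p, azimuths, block_len) - true_azi
                  for p, true_azi in test_set]
        per_set.append(float(np.sqrt(np.mean(np.concatenate(errors) ** 2))))
    return float(np.mean(per_set))


if __name__ == "__main__":
    azi = np.arange(-90, 91, 5)                  # hypothetical 5-degree grid
    post = np.zeros((50, len(azi)))
    post[:, np.where(azi == 10)[0][0]] = 1.0     # every frame votes for 10 degrees
    print(block_azimuths(post, azi))             # [10 10]
    print(averaged_rmse([[(post, 10)], [(post, 0)]], azi))  # 5.0
```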
| | A | B | C | D |
|---|---|---|---|---|
| Paper | 2.7 | 3.3 | 3.1 | 5.2 |
| separate_norm | 0.5 | 1.6 | 1.1 | 3.3 |
| overall_norm | 0.6 | 1.7 | 1.1 | 3.3 |
For room D, the model was trained 3 times. Although the runs reached similar losses, the test results vary (RMSE mean: 3.39, std: 0.07).
- python 3
- tensorflow-1.14
- pysofa https://github.com/bingo-todd/pySOFA
- BasicTools https://github.com/bingo-todd/BasicTools
- Align BRIRs (optional): align the BRIRs of the reverberant rooms to the BRIRs of the anechoic room.
- Synthesize spatial recordings
- Calculate GCC-PHAT features
- Calculate normalization coefficients of GCC-PHAT features
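The alignment method is not spelled out above. One plausible sketch, assuming each BRIR is a NumPy array of shape `(n_taps, 2)` for the same azimuth and sampling rate, estimates a single delay from the cross-correlation between the reverberant and anechoic BRIRs (summed over both ears) and shifts the reverberant BRIR by it. The method and the name `align_brir` are assumptions, not the repository's implementation.

```python
import numpy as np


def align_brir(reverb_brir, anechoic_brir):
    """Shift a (n_taps, 2) reverberant BRIR so it lines up with the anechoic BRIR.

    One shift is shared by both ears so the interaural time difference is preserved.
    Returns (aligned_brir, delay_in_samples).
    """
    m = anechoic_brir.shape[0]
    # Cross-correlation summed over ears; index i corresponds to lag i - (m - 1)
    xcorr = sum(np.correlate(reverb_brir[:, ch], anechoic_brir[:, ch], mode="full")
                for ch in range(2))
    delay = int(np.argmax(xcorr)) - (m - 1)
    n = reverb_brir.shape[0]
    aligned = np.zeros_like(reverb_brir)
    if delay >= 0:
        aligned[: n - delay] = reverb_brir[delay:]        # move earlier
    else:
        aligned[-delay:] = reverb_brir[: n + delay]       # move later
    return aligned, delay


if __name__ == "__main__":
    anechoic = np.zeros((64, 2))
    anechoic[2, 0], anechoic[4, 1] = 1.0, 0.7
    reverb = np.zeros((64, 2))
    reverb[7, 0], reverb[9, 1], reverb[20, 0] = 0.9, 0.6, 0.2  # direct path 5 samples late, plus a reflection
    aligned, delay = align_brir(reverb, anechoic)
    print(delay)   # 5
```

Applying the same shift to both channels is the key design choice: aligning each ear independently would destroy the interaural delay that GCC-PHAT is meant to capture.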
Footnotes
[1] Vecchiotti, Paolo, Ning Ma, Stefano Squartini, and Guy J. Brown. “End-to-end binaural sound localisation from the raw waveform.” In 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 451–455. IEEE, 2019.


