This repository implements speech emotion recognition (SER) from acoustic features. The model combines a convolutional neural network (CNN), a bidirectional long short-term memory (BLSTM) network, and self-attention.
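To make the self-attention stage concrete, here is a minimal, dependency-free sketch of scaled dot-product self-attention applied to a sequence of frame-level vectors, followed by mean pooling into a single utterance-level vector. It is an illustration only, not the code in this repository: it has no learned projections, the names `self_attention`, `mean_pool`, and `blstm_outputs` are hypothetical, and the use of mean pooling after attention is an assumption.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(frames):
    """Scaled dot-product self-attention without learned projections.

    frames: list of T frame vectors, each a list of D floats.
    Returns (outputs, weights): T attended vectors and the T x T weight matrix.
    """
    dim = len(frames[0])
    scale = math.sqrt(dim)
    outputs, weights = [], []
    for query in frames:
        # Similarity of this frame to every frame in the sequence.
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in frames]
        row = softmax(scores)
        weights.append(row)
        # Each output is a weighted sum of all frames.
        outputs.append(
            [sum(w * value[d] for w, value in zip(row, frames)) for d in range(dim)]
        )
    return outputs, weights


def mean_pool(vectors):
    """Average T vectors into one utterance-level vector (assumed pooling)."""
    count = len(vectors)
    return [sum(v[d] for v in vectors) / count for d in range(len(vectors[0]))]


# Hypothetical stand-in for BLSTM outputs: 3 frames, 2 dimensions each.
blstm_outputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended, attention_weights = self_attention(blstm_outputs)
utterance_vector = mean_pool(attended)
```

Each row of `attention_weights` sums to 1, so every attended frame is a convex combination of the input frames. In a trained model the queries, keys, and values would normally come from learned linear projections of the BLSTM outputs, and the pooled vector would feed an emotion classifier.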
- Edit preprocessing.py and preprocess your files
python3 preprocessing.py
- Edit hyper_param.yaml
- Run main.py
python3 main.py
Ryotaro Nagase, Takahiro Fukumori, and Yoichi Yamashita: "Speech Emotion Recognition with Fusion of Acoustic- and Linguistic-Feature-Based Decisions," Proc. Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), pp. 725-730, 2021.