Aphasia Classification Based on Patient Speech

Repository Navigation

  • checkpoints — directory for locally stored model weights. Each model has its own subdirectory inside
  • data — directory for datasets
  • image — images displayed in the README.md
  • models — Python files containing model classes
  • notebooks — Jupyter notebooks with experiments
    • base_analysis — exploratory data analysis (EDA)
    • 2d_mfcc — training a CNN on MFCC features
    • 2d_spectrogramm — training a CNN on spectrograms
    • catboost — CatBoost training
    • data_splitting — data splitting
    • ml_test_all_features — experiments with classical ML
    • swishnet — training SwishNet on audio chunks
    • wav2vec_train — training Wav2Vec
    • wav2vec_test — testing Wav2Vec
  • python scrips — code for running on a remote cluster
  • src — helper functions/classes and Streamlit web app

Problem Statement

Assistive systems are among the most in-demand areas of machine learning.
Even today, some doctors use artificial intelligence in their daily practice: it helps simplify diagnosis
and enables personalized treatment for each patient.



Our work focuses on building a model that predicts the presence of aphasia in a patient. Aphasia is a language disorder
that impairs the production and comprehension of speech. It often occurs after a stroke, traumatic brain injury, or diseases
of the central nervous system. The condition can severely impact a person’s ability to communicate,
especially in elderly individuals. However, if therapy starts early enough, recovery is possible.
A tool that can detect the first signs of aphasia is therefore crucial.

Dataset

The dataset was provided by the Laboratory of Social Cognitive Informatics. It includes 353 participants with aphasia
and 101 without, with roughly two audio recordings per participant. The participants span different age groups:
the average age of the aphasic participants is 58, with a distribution close to normal, while the non-aphasic group’s ages
are distributed more uniformly, covering both young and elderly subjects.


Below is the distribution of aphasia severity:

Distribution of aphasia severity

We tested several approaches, described below.

Classical ML

As a baseline, we chose classical machine learning, since in some cases it is sufficient.
FLAML was used because it automatically selects models and tunes their hyperparameters.
Feature sets included MFCC+ZCR, Prosody Features+ZCR, and a combination of several feature types
(MFCC, Chromagram, Spectral Features, Prosody Features, ZCR, Timestamps). Additionally, we used Optuna to tune CatBoost.
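The project itself used FLAML (and Optuna for CatBoost); as a self-contained illustration of the feature-combination baseline, here is a minimal sketch with scikit-learn on synthetic features. The feature shapes, the synthetic data, and the choice of classifier are assumptions for illustration, not the repository's exact setup.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 120
mfcc_feats = rng.normal(size=(n, 13))   # placeholder: per-recording averaged MFCCs
zcr_feats = rng.uniform(size=(n, 1))    # placeholder: zero-crossing rate
X = np.hstack([mfcc_feats, zcr_feats])  # feature-set combination (MFCC + ZCR)
y = rng.integers(0, 2, size=n)          # 1 = aphasia, 0 = control

# Cross-validated F1 as a quick baseline sanity check
clf = GradientBoostingClassifier(random_state=0)
scores = cross_val_score(clf, X, y, cv=5, scoring="f1")
print(scores.mean())
```

In the actual pipeline, FLAML would replace the fixed classifier and search over models and hyperparameters under a time budget.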

MFCC

MFCC represents audio as coefficients computed over short time segments, obtained by
passing the power spectrum through a mel-scaled filter bank and applying a discrete cosine transform. Physically,
it approximates how human hearing processes speech (similar to Mel-spectrograms).
Since our data consists of speech recordings, this representation captures relevant
speech-related features.


In the literature, both classical ML and 1D CNNs are commonly used with MFCCs.
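For reference, here is a minimal NumPy/SciPy sketch of the MFCC pipeline (frame, window, power spectrum, mel filter bank, log, DCT). All parameter values are illustrative defaults, not the repository's settings; in practice a library such as librosa is typically used.

```python
import numpy as np
from scipy.fftpack import dct

def hz_to_mel(f):
    return 2595 * np.log10(1 + f / 700)

def mel_to_hz(m):
    return 700 * (10 ** (m / 2595) - 1)

def mfcc(signal, sr=16000, n_fft=512, hop=256, n_mels=26, n_coef=13):
    # Frame the signal and apply a Hann window
    frames = np.lib.stride_tricks.sliding_window_view(signal, n_fft)[::hop]
    frames = frames * np.hanning(n_fft)
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2
    # Triangular mel-scaled filter bank
    mels = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fbank[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    # Log-compress, then DCT decorrelates the filter-bank energies
    logmel = np.log(power @ fbank.T + 1e-10)
    return dct(logmel, type=2, axis=1, norm="ortho")[:, :n_coef]

# Toy input: one second of a 440 Hz tone at 16 kHz
sr = 16000
tone = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
feats = mfcc(tone, sr=sr)
print(feats.shape)
```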

Waveform

One straightforward idea is to feed raw audio directly into a transformer. For this we used Wav2Vec:

Wav2Vec architecture
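Models that consume raw audio (such as SwishNet and Wav2Vec here) are typically fed fixed-length waveform chunks. A minimal sketch of that chunking step follows; the chunk length and zero-padding policy are assumptions, not the repository's exact values.

```python
import numpy as np

def chunk_waveform(wave, sr=16000, chunk_sec=5.0):
    """Split a 1-D waveform into fixed-length chunks, zero-padding the tail."""
    size = int(sr * chunk_sec)
    pad = (-len(wave)) % size          # right-pad so the length divides evenly
    wave = np.pad(wave, (0, pad))
    return wave.reshape(-1, size)

# 12 s of audio at 16 kHz -> three 5 s chunks (last one zero-padded)
chunks = chunk_waveform(np.zeros(16000 * 12))
print(chunks.shape)
```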

Spectrograms

Spectrograms remain one of the most widely used audio representations, so it was reasonable to test them as well.
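As a sketch of the input used in the 2D-CNN experiments, a log-scaled spectrogram can be computed with SciPy; the window size and overlap below are illustrative assumptions.

```python
import numpy as np
from scipy.signal import spectrogram

sr = 16000
t = np.arange(sr) / sr
wave = np.sin(2 * np.pi * 440 * t)   # 1 s test tone at 440 Hz

# Short-time power spectrogram: 512-sample windows with 50% overlap
freqs, times, sxx = spectrogram(wave, fs=sr, nperseg=512, noverlap=256)
log_sxx = np.log(sxx + 1e-10)        # log scale, the usual input for a CNN
print(log_sxx.shape)                 # (frequency bins, time frames)
```

The resulting 2D array is what a CNN such as the one in the 2d_spectrogramm notebook would consume as an image-like input.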

Conclusion and Future Work

Various methods were tested. For the final Streamlit application,
Wav2Vec was chosen for its accuracy, along with MobileNet on MFCC for its speed and good performance.


Although the classifier itself is complete, there is still room for exploration. For example,
severity prediction remains an open goal. If more datasets in other languages were publicly available,
we could train on them and test on Russian speech to see whether the model focuses more on what is said or how it is said.


About

This is the repository for the paper "..."
