This project implements a multi-modal deep learning classification model that analyzes vibration (VIB) and acoustic emission (AE) signals represented as time-frequency scalogram images. The goal is to combine both modalities through feature fusion to improve classification performance.
- Inputs: Scalogram images of VIB and AE signals.
- Model: Two parallel ResNet50-based CNN feature extractors (pretrained and frozen), one per modality.
- Fusion: Multiple fusion strategies (sum, multiplication, concatenation, interleaving, and attention); see the fusion sketch after this list.
- Classifier: A custom fully connected classifier trained on fused features.
- Loss: Cross-entropy loss with softmax activation for multi-class classification; the model and training sketch below puts the pieces together.
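
The snippet below is a minimal sketch of the fusion strategies, assuming both backbones emit feature vectors of the same dimension; the function names and the simple gated attention design are illustrative, not the project's exact implementation.

```python
import torch
import torch.nn as nn


def fuse_sum(f_vib: torch.Tensor, f_ae: torch.Tensor) -> torch.Tensor:
    # Element-wise sum: output stays (batch, D).
    return f_vib + f_ae


def fuse_mul(f_vib: torch.Tensor, f_ae: torch.Tensor) -> torch.Tensor:
    # Element-wise (Hadamard) product: output stays (batch, D).
    return f_vib * f_ae


def fuse_concat(f_vib: torch.Tensor, f_ae: torch.Tensor) -> torch.Tensor:
    # Channel concatenation: output is (batch, 2 * D).
    return torch.cat([f_vib, f_ae], dim=1)


def fuse_interleave(f_vib: torch.Tensor, f_ae: torch.Tensor) -> torch.Tensor:
    # Alternate features from each modality: [v0, a0, v1, a1, ...] -> (batch, 2 * D).
    stacked = torch.stack([f_vib, f_ae], dim=2)   # (batch, D, 2)
    return stacked.reshape(f_vib.size(0), -1)     # (batch, 2 * D)


class AttentionFusion(nn.Module):
    """Weight each modality with learned, softmax-normalized gates (hypothetical design)."""

    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Linear(2 * dim, 2)

    def forward(self, f_vib: torch.Tensor, f_ae: torch.Tensor) -> torch.Tensor:
        weights = torch.softmax(self.gate(torch.cat([f_vib, f_ae], dim=1)), dim=1)
        # Weighted sum of the two modality features: output is (batch, D).
        return weights[:, 0:1] * f_vib + weights[:, 1:2] * f_ae
```

Note that sum, multiplication, and attention keep the feature dimension at D, while concatenation and interleaving double it, so the classifier's input size depends on the chosen strategy.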
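The following model and training sketch shows how the pieces fit together, assuming torchvision's pretrained ResNet50 backbones; the concatenation fusion, classifier layer sizes, class count, and hyperparameters are illustrative assumptions rather than the project's exact configuration.

```python
import torch
import torch.nn as nn
from torchvision import models


class DualBranchClassifier(nn.Module):
    def __init__(self, num_classes: int):
        super().__init__()
        # One pretrained backbone per modality, with the final FC layer removed
        # so each branch outputs a 2048-dim feature vector.
        self.vib_backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
        self.ae_backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
        self.vib_backbone.fc = nn.Identity()
        self.ae_backbone.fc = nn.Identity()

        # Freeze both feature extractors; only the classifier head is trained.
        for p in list(self.vib_backbone.parameters()) + list(self.ae_backbone.parameters()):
            p.requires_grad = False

        # Custom fully connected classifier on the fused (here: concatenated) features.
        self.classifier = nn.Sequential(
            nn.Linear(2048 * 2, 512),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(512, num_classes),
        )

    def forward(self, vib_img: torch.Tensor, ae_img: torch.Tensor) -> torch.Tensor:
        f_vib = self.vib_backbone(vib_img)       # (batch, 2048)
        f_ae = self.ae_backbone(ae_img)          # (batch, 2048)
        fused = torch.cat([f_vib, f_ae], dim=1)  # concatenation fusion
        return self.classifier(fused)            # raw logits


# Illustrative training step: nn.CrossEntropyLoss applies softmax internally,
# so the model returns raw logits. num_classes and batch shapes are placeholders.
model = DualBranchClassifier(num_classes=4)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.classifier.parameters(), lr=1e-3)

vib_batch = torch.randn(8, 3, 224, 224)  # dummy VIB scalogram batch
ae_batch = torch.randn(8, 3, 224, 224)   # dummy AE scalogram batch
labels = torch.randint(0, 4, (8,))

logits = model(vib_batch, ae_batch)
loss = criterion(logits, labels)
loss.backward()
optimizer.step()
```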