Cross-modal Spatial Alignment and Fusion Network for RGB-T crowd counting
Crowd counting is critical for public safety and urban management in smart cities, yet it remains challenging in complex scenarios. RGB-Thermal (RGB-T) fusion helps compensate for information loss in low-light conditions, but current methods still suffer from two key limitations. (a) Existing RGB-T crowd counting methods do not address the spatial misalignment between RGB and thermal features caused by the different capturing devices, which degrades fusion quality and limits counting accuracy. (b) Current methods do not adequately distinguish between modality-specific and common features of the RGB and thermal modalities, leading to redundant feature fusion that weakens the feature representation and results in suboptimal counting performance. To address these challenges, the Cross-modal Spatial Alignment and Fusion Network (CSAFNet) is proposed. CSAFNet integrates three novel modules: the Cross-modal Feature Space Alignment (CFSA) module, the Multiscale Spatial Displacement Compensation (MSDC) module, and the Cross-modal Feature Decoupling Fusion (CFDF) module. The CFSA module performs precise spatial alignment via feature windows, and the MSDC module extends this to wide-range spatial consistency. The CFDF module employs Kullback-Leibler divergence and Jensen-Shannon divergence to perform decoupled fusion of cross-modal features, preserving modality-specific details, enhancing cross-modal commonalities, reducing redundant features, and strengthening discriminative feature representation. Extensive experiments demonstrate that the proposed CSAFNet achieves competitive performance on the RGBT-CC dataset, reducing GAME(0) to 10.75 and RMSE to 17.91. These results validate the effectiveness and promising potential of CSAFNet for cross-modal crowd counting tasks.
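The results above are reported as GAME(0) and RMSE. As a rough illustration, the sketch below shows how these metrics are commonly computed in crowd counting: GAME(L) splits each density map into 4^L non-overlapping regions and sums the absolute count error per region (so GAME(0) equals the mean absolute error), while RMSE is the root mean squared error of the total counts. The function names `game` and `evaluate`, and the use of plain nested lists for density maps, are assumptions made for this example and are not taken from the repository's code.

```python
import math

def game(pred, gt, level=0):
    """Grid error for ONE image: sum of absolute count errors over
    a (2**level x 2**level) grid of non-overlapping regions.
    pred and gt are 2D density maps (lists of rows) of equal shape."""
    h, w = len(gt), len(gt[0])
    cells = 2 ** level  # cells per side, so 4**level regions in total
    err = 0.0
    for i in range(cells):
        r0, r1 = i * h // cells, (i + 1) * h // cells
        for j in range(cells):
            c0, c1 = j * w // cells, (j + 1) * w // cells
            p = sum(sum(row[c0:c1]) for row in pred[r0:r1])
            g = sum(sum(row[c0:c1]) for row in gt[r0:r1])
            err += abs(p - g)
    return err

def evaluate(preds, gts, level=0):
    """Return (mean GAME(level), RMSE of total counts) over a dataset."""
    n = len(gts)
    mean_game = sum(game(p, g, level) for p, g in zip(preds, gts)) / n
    sq_errors = [
        (sum(map(sum, p)) - sum(map(sum, g))) ** 2
        for p, g in zip(preds, gts)
    ]
    rmse = math.sqrt(sum(sq_errors) / n)
    return mean_game, rmse

# Hypothetical toy data: two 2x2 density maps and their ground truth.
preds = [[[1, 0], [0, 1]], [[2, 0], [0, 0]]]
gts = [[[0, 1], [1, 0]], [[0, 0], [0, 0]]]
game0, rmse = evaluate(preds, gts, level=0)
```

In the first toy image the total count is correct but every quadrant is wrong, so its GAME(0) is zero while its GAME(1) is not. This is why higher GAME levels are used to penalize localization errors that a plain count error hides.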

Zyjer888/CSAFNet