Cross-modal Spatial Alignment and Fusion Network for RGB-T crowd counting

Zyjer888/CSAFNet

CSAFNet

Crowd counting is critical for public safety and urban management in smart cities, yet it remains challenging in complex scenarios. RGB-Thermal (RGB-T) fusion helps compensate for the information lost in low-light conditions, but current methods still suffer from two key limitations:

(a) Existing RGB-T crowd counting methods do not address the spatial misalignment between RGB and thermal features caused by the different capturing devices. This misalignment degrades fusion and limits gains in counting accuracy.

(b) Current methods do not adequately distinguish between the modality-specific and the common features of the RGB and thermal modalities. The resulting redundant fusion weakens the feature representation and leads to suboptimal counting performance.

To address these challenges, the Cross-modal Spatial Alignment and Fusion Network (CSAFNet) is proposed. CSAFNet integrates three novel modules: Cross-modal Feature Space Alignment (CFSA), Multiscale Spatial Displacement Compensation (MSDC), and Cross-modal Feature Decoupling Fusion (CFDF). The CFSA module performs precise spatial alignment via feature windows, and the MSDC module extends this to wide spatial consistency. The CFDF module employs Kullback-Leibler divergence and Jensen-Shannon divergence to perform decoupled fusion of cross-modal features, preserving modality-specific details, enhancing cross-modal commonalities, reducing redundant features, and strengthening discriminative feature representation.

Extensive experiments demonstrate that CSAFNet achieves competitive performance on the RGBT-CC dataset, reducing GAME(0) to 10.75 and RMSE to 17.91. These results validate the effectiveness and potential of CSAFNet for cross-modal crowd counting tasks.
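For readers unfamiliar with the reported metrics, the following is a minimal sketch of how GAME(L) and RMSE are commonly computed for crowd counting. It is not taken from this repository's code. It assumes the usual definition of the Grid Average Mean absolute Error, in which each density map is split into 4^L non-overlapping regions and the absolute count errors of the regions are summed, so that GAME(0) reduces to the mean absolute error of the total count. The function names `region_sum`, `game`, and `evaluate` are hypothetical, and density maps are represented as plain nested lists to keep the example dependency-free.

```python
import math


def region_sum(density, y0, y1, x0, x1):
    """Sum of a density map (list of rows) over rows y0..y1 and columns x0..x1."""
    return sum(sum(row[x0:x1]) for row in density[y0:y1])


def game(pred, gt, level):
    """GAME(level) for one predicted / ground-truth density map pair.

    Assumes both maps have the same shape. The map is split into a
    2**level by 2**level grid, i.e. 4**level regions.
    """
    cells = 2 ** level
    height, width = len(gt), len(gt[0])
    # Integer region boundaries along each axis.
    ys = [i * height // cells for i in range(cells + 1)]
    xs = [j * width // cells for j in range(cells + 1)]
    error = 0.0
    for i in range(cells):
        for j in range(cells):
            p = region_sum(pred, ys[i], ys[i + 1], xs[j], xs[j + 1])
            g = region_sum(gt, ys[i], ys[i + 1], xs[j], xs[j + 1])
            error += abs(p - g)
    return error


def evaluate(preds, gts, level=0):
    """Return (mean GAME(level), RMSE of total counts) over a set of images."""
    n = len(preds)
    mean_game = sum(game(p, g, level) for p, g in zip(preds, gts)) / n
    squared_errors = [
        (region_sum(p, 0, len(p), 0, len(p[0]))
         - region_sum(g, 0, len(g), 0, len(g[0]))) ** 2
        for p, g in zip(preds, gts)
    ]
    rmse = math.sqrt(sum(squared_errors) / n)
    return mean_game, rmse


if __name__ == "__main__":
    # Toy 2x2 maps: the total counts match, but the people are in different cells.
    pred = [[2, 0], [0, 0]]
    gt = [[0, 0], [0, 2]]
    print(evaluate([pred], [gt], level=0))
    print(evaluate([pred], [gt], level=1))
```

In the toy example the total counts agree, so GAME(0) and RMSE are both zero, while GAME(1) is nonzero because the predicted density sits in the wrong quadrant. This illustrates why higher GAME levels penalize localization errors that a count-only metric misses.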
