Hello, thank you very much for your work.
I have several questions regarding the discrepancies between the content of the paper and the code:
First, in the modal fusion step, the paper combines the 2D features and the 2D3D features through addition, but I could not find this step in the code. Is this operation needed?

Second, after obtaining the 2D Learner, the paper mentions adding the 2D Learner output to the 3D features through a skip connection to obtain the Enhanced 3D Features. However, I didn't see this step in the code. Did I overlook it, or is it unnecessary to compute the Enhanced 3D Features?
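
To make the two questions concrete, here is a minimal sketch of how I read the paper's description. The function names, the tensor shapes, and the use of NumPy are my own assumptions for illustration and are not taken from the repository:

```python
import numpy as np


def fuse_by_addition(feat_2d, feat_2d3d):
    # Question 1: element-wise sum of the 2D features and the 2D3D features.
    # Assumes both tensors already share the same shape.
    assert feat_2d.shape == feat_2d3d.shape
    return feat_2d + feat_2d3d


def enhance_3d(feat_3d, learner_2d):
    # Question 2: skip connection that adds the 2D Learner output
    # back onto the 3D features to form the "Enhanced 3D Features".
    assert feat_3d.shape == learner_2d.shape
    return feat_3d + learner_2d


# Hypothetical shapes: 4 tokens with 8 channels each.
rng = np.random.default_rng(0)
feat_2d = rng.standard_normal((4, 8))
feat_2d3d = rng.standard_normal((4, 8))
feat_3d = rng.standard_normal((4, 8))
learner_2d = rng.standard_normal((4, 8))

fused = fuse_by_addition(feat_2d, feat_2d3d)
enhanced_3d = enhance_3d(feat_3d, learner_2d)
```

If the code achieves the same effect in a different way (for example inside another module), a pointer to the relevant lines would be very helpful.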

Thank you very much! Best of luck with your research!