Hi, thank you for sharing such an impressive and well-structured project. :D
However, we noticed that your code includes future tracking targets for the reference motion and also supports adding RL rewards when training the student policy with DAgger. Could you please share why these components were not used in the final implementation? It seems that including future trajectory points for tracking, and RL rewards for student supervision, might improve performance.
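For context, here is a minimal sketch of what we had in mind: combining the DAgger behavior-cloning loss with a policy-gradient term driven by environment rewards. All names and the weighting scheme are our own assumptions for illustration, not taken from your code.

```python
import numpy as np

def combined_student_loss(student_actions, teacher_actions,
                          log_probs, advantages, rl_weight=0.1):
    """Hypothetical combined objective for the student policy.

    bc_loss: DAgger term regressing student actions onto teacher actions.
    rl_loss: simple policy-gradient surrogate on environment rewards.
    rl_weight: illustrative trade-off coefficient (our assumption).
    """
    bc_loss = np.mean((student_actions - teacher_actions) ** 2)
    rl_loss = -np.mean(log_probs * advantages)
    return bc_loss + rl_weight * rl_loss
```

This is only a sketch of the idea we are asking about, not a claim about how your pipeline is structured.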
Thanks again for your great work!