Thanks to the authors for their contribution. I am having difficulty reproducing the reported performance of the distillation experiments based on the Transformer architecture. Could you provide more training details for reference? It would be even better if the .sh files were provided.