The paper mentions that fine-tuning the evaluator should be relatively memory-efficient compared to LLaMA 2 inference, which consumes more than 40GB of GPU memory.
However, when I run the script on an A100 with 40GB of GPU memory in Colab, the full 40GB is already used up at a batch size of 6.
I wonder what configuration you used to reduce GPU memory usage during fine-tuning, and what your memory usage looked like (batch size vs. GPU memory)?
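
In case it helps pin down the difference, this is a minimal sketch of the memory-saving settings I was expecting to rely on, assuming a Hugging Face `Trainer`-based setup (your actual script and argument names may differ, and `output_dir` here is just a placeholder):

```python
from transformers import TrainingArguments

# Hypothetical configuration for fitting fine-tuning into 40GB:
# shrink the micro-batch, recover the effective batch size via
# gradient accumulation, and trade compute for activation memory.
args = TrainingArguments(
    output_dir="evaluator-ft",       # placeholder output path
    per_device_train_batch_size=2,   # smaller micro-batch per GPU
    gradient_accumulation_steps=3,   # effective batch size of 6
    gradient_checkpointing=True,     # recompute activations to save memory
    bf16=True,                       # mixed precision on A100
)
```

Is this roughly the kind of configuration you used, or did you reduce memory some other way (e.g. a smaller base model or parameter-efficient fine-tuning)?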
Thank you for your help :)