
[MiniLLM] Strange Rouge-L result on Dolly dataset #329

@Stephen-K1

Description


Thanks for your great work.

I followed the instructions in the README and ran "bash scripts/gpt2/minillm/train_base_xl.sh". The student model is init-gpt2-120M, which is provided in the README. The Rouge-L I got at the beginning of training was already higher than what is presented in the paper (24.6), which is strange. You can see some of the Rouge-L results below:

eval | rougeL: 25.100 | exact_match: 3.200 | rev_kl: 1.894 | lens: 69.771
train | data_epochs 0/10 | inner iter: 3/ 8 | ppo epoch: 0/ 4 | global iter: 100/ 5000| tot_loss: 3.7369 | rl_loss: 3.7369 | pt_loss: 0.0000 | pg_loss: 1.3049 | reg_loss: 2.4320 | reward: -1.7812 | rev_kl: 2.5176 | stu_lens: 67.8750 | mixed_lens: 55.2188 | lr: 5.0000e-06 | scale: 2048.00 | time: 1.298 | step time: 0.000
...

eval | rougeL: 25.650 | exact_match: 2.900 | rev_kl: 1.713 | lens: 70.656
train | data_epochs 0/10 | inner iter: 7/ 8 | ppo epoch: 0/ 4 | global iter: 200/ 5000| tot_loss: 2.8755 | rl_loss: 2.8755 | pt_loss: 0.0000 | pg_loss: 0.9560 | reg_loss: 1.9194 | reward: -1.4756 | rev_kl: 1.9214 | stu_lens: 92.5000 | mixed_lens: 70.6250 | lr: 5.0000e-06 | scale: 2048.00 | time: 1.283 | step time: 0.000
...

eval | rougeL: 26.270 | exact_match: 3.600 | rev_kl: 1.603 | lens: 65.790
train | data_epochs 0/10 | inner iter: 3/ 8 | ppo epoch: 1/ 4 | global iter: 300/ 5000| tot_loss: 2.2480 | rl_loss: 2.2480 | pt_loss: 0.0000 | pg_loss: 0.6949 | reg_loss: 1.5532 | reward: -1.4756 | rev_kl: 2.4121 | stu_lens: 42.1250 | mixed_lens: 61.0625 | lr: 5.0000e-06 | scale: 2048.00 | time: 1.250 | step time: 0.000
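For reference, my assumption (I have not traced the repo's metric code) is that the rougeL value above is the Rouge-L F-measure averaged over the dev prompts and scaled by 100, so 25.100 would mean an average F1 of about 0.251. A minimal sketch of that computation with the rouge_score package, using made-up pairs:

from rouge_score import rouge_scorer

# Hypothetical reference/prediction pairs; in the real eval these would come
# from the Dolly dev split (--dev-num 1000 in the script below).
references = ["Paris is the capital of France.", "The answer is 42."]
predictions = ["The capital of France is Paris.", "42 is the answer."]

scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
f1s = [scorer.score(ref, pred)["rougeL"].fmeasure
       for ref, pred in zip(references, predictions)]
print(f"rougeL: {100 * sum(f1s) / len(f1s):.3f}")  # same scale as the logs above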

I train it on two NVIDIA 3090s, and the contents of "scripts/gpt2/minillm/train_base_xl.sh" are as follows:

#! /bin/bash

MASTER_ADDR=localhost
MASTER_PORT=${2-2012}
NNODES=1
NODE_RANK=0
GPUS_PER_NODE=${3-2}

DISTRIBUTED_ARGS="--nproc_per_node $GPUS_PER_NODE
--nnodes $NNODES
--node_rank $NODE_RANK
--master_addr $MASTER_ADDR
--master_port $MASTER_PORT"

# model

BASE_PATH=${1-"/workspace/codes/minillm"}

CKPT_NAME="gpt2-base"

CKPT="/workspace/codes/minillm/checkpoints/gpt2-base/"

CKPT_NAME="init-gpt2-120M"
CKPT="/workspace/codes/minillm/checkpoints/init-gpt2-120M"
TEACHER_CKPT_NAME="teacher-gpt2-1.5B"
TEACHER_CKPT="/workspace/codes/minillm/checkpoints/teacher-gpt2-1.5B"

# data

PROMPT_DATA_DIR="/workspace/codes/minillm/processed_data/dolly/prompt/gpt2/"

# runtime

SAVE_PATH="/workspace/codes/minillm/results/gpt2/train/minillm/"

# hp

GRAD_ACC=1
BATCH_SIZE=16
CHUNK_SIZE=16

OPTS=""

# model

OPTS+=" --base-path ${BASE_PATH}"
OPTS+=" --model-path ${CKPT}"
OPTS+=" --teacher-model-path ${TEACHER_CKPT}"
OPTS+=" --ckpt-name ${CKPT_NAME}"
OPTS+=" --teacher-ckpt-name ${TEACHER_CKPT_NAME}"
OPTS+=" --n-gpu ${GPUS_PER_NODE}"
OPTS+=" --n-nodes ${NNODES}"
OPTS+=" --teacher-model-fp16"

OPTS+=" --gradient-checkpointing"

# data

OPTS+=" --prompt-data-dir ${PROMPT_DATA_DIR}"

OPTS+=" --lm-data-dir ${LM_DATA_DIR}"

OPTS+=" --dev-num 1000"
OPTS+=" --num-workers 16"

# hp

OPTS+=" --epochs 10"
OPTS+=" --total-iters 5000"
OPTS+=" --kd-ratio 0.5"
OPTS+=" --batch-size ${BATCH_SIZE}"
OPTS+=" --lr 5e-6"
OPTS+=" --lr-min 5e-6"
OPTS+=" --gradient-accumulation-steps ${GRAD_ACC}"
OPTS+=" --max-length 512"
OPTS+=" --max-prompt-length 256"
OPTS+=" --warmup-iters 100"

# runtime

OPTS+=" --save ${SAVE_PATH}"
OPTS+=" --seed 10"
OPTS+=" --seed-ppo 42"
OPTS+=" --seed-lm 7"
OPTS+=" --save-interval 500"
OPTS+=" --eval-interval 100"
OPTS+=" --log-interval 16"
OPTS+=" --mid-log-num 1"

# ppo

OPTS+=" --type minillm"
OPTS+=" --ppo-epochs 4"
OPTS+=" --num-rollouts 256"
OPTS+=" --chunk-size ${CHUNK_SIZE}"

# minillm

OPTS+=" --length-norm"
OPTS+=" --single-step-reg"
OPTS+=" --teacher-mixed-alpha 0.2"

# reward

OPTS+=" --reward-scaling 0.5"
OPTS+=" --cliprange-reward 100"

# gen

OPTS+=" --do-sample"
OPTS+=" --top-k 0"
OPTS+=" --top-p 1.0"
OPTS+=" --temperature 1.0"

# deepspeed

OPTS+=" --deepspeed"
OPTS+=" --deepspeed_config ${BASE_PATH}/configs/deepspeed/ds_config_zero1_fp16.json"

export NCCL_DEBUG=""
export WANDB_DISABLED=True
export TF_CPP_MIN_LOG_LEVEL=3
export PYTHONPATH=${BASE_PATH}
CMD="torchrun ${DISTRIBUTED_ARGS} ${BASE_PATH}/train_minillm.py ${OPTS} $@"

echo ${CMD}
echo "PYTHONPATH=${PYTHONPATH}"
mkdir -p ${SAVE_PATH}
${CMD}
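One more note on the logs, in case it is useful: my reading of rev_kl (again an assumption on my part, since I have not checked the metric code) is the reverse KL between the student and teacher next-token distributions, averaged over the generated tokens, which is the quantity MiniLLM minimizes. A toy sketch:

import torch
import torch.nn.functional as F

# Toy logits; in training these would come from the student and teacher over
# the sampled tokens. Shapes are hypothetical: (num_tokens, vocab_size).
student_logits = torch.randn(8, 50257)
teacher_logits = torch.randn(8, 50257)

log_q = F.log_softmax(student_logits, dim=-1)  # student log-probs
log_p = F.log_softmax(teacher_logits, dim=-1)  # teacher log-probs

# Reverse KL: KL(q || p) = sum_x q(x) * (log q(x) - log p(x)), then average.
rev_kl = (log_q.exp() * (log_q - log_p)).sum(dim=-1).mean()
print(rev_kl.item())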

Do you have any clue as to why I am getting these strange Rouge-L results? Thanks a lot for your help.
