Open
Description
Hi there, I noticed that there are APIs to load NLU, DST, policy, and NLG data in the unified data format. I also found the training and evaluation guide for NLU/DST/NLG with unified data in $model/README.md or NLU/DST/NLG/evaluate_unified_datasets.py. However, I could not find a guide on how to train and evaluate policy models with the unified data format. Specifically, I have the following questions:
- Training: I did not find support for training with the unified data format in $policy_model/train.py, such as ppo/train.py and mle/train.py. They seem to use MultiWozEvaluator by default.
- Evaluation: I did not find support for evaluation with the unified data format in policy/evaluate.py. It also seems to use MultiWozEvaluator by default.
- My Training Experiment: I tried to train a PPO policy with the config file base_pipeline_rule_user.json (initialized with MLE policy weights trained with the default config) and got this result: Best Complete Rate: 0.95, Best Success Rate: 0.5, Best Average Return: 4.5. It is a good start for me, but still worse than the BERTNLU | RuleDST | PPOPolicy | TemplateNLG evaluation in the ConvLab2 README (75.5 completion rate and 71.7 success rate). Where does this gap come from?
- My Evaluation Experiment: I evaluated my previously trained PPO model with policy/evaluate.py, but got a much worse result: "Complete 500 0.372 Success 500 0.228 Success strict 500 0.174". During the evaluation, there were two warnings: "Value not found in standard value set: [dontcare] (slot: name domain: restaurant)" and "Value [none] invalid! (Lexicalisation Error) (slot: name domain: hotel)". These seem to point to a dataset format mismatch between the training and evaluation processes; I am not sure whether I used the original MultiWOZ format or the unified data format to train and evaluate my policy model.
- For the user simulator: I found that tus, emoUS and genTUS can be trained and evaluated with the unified data format. However, I did not find unified data format support in the rule-based user simulator. Does that mean that if I train my models (NLU/NLG or policy) with the unified data format, I cannot evaluate them with the rule-based user simulator?
Looking forward to your reply,
James Cao