How to train and evaluate policy models with unified dataset format? #191

Hi there, I noticed that there are [APIs](https://github.com/ConvLab/ConvLab-3/blob/master/convlab/util/unified_datasets_util.py) to load NLU, DST, Policy and NLG data in the unified data format. I also found training and evaluation guides for NLU/DST/NLG with unified data in `$model/README.md` and `NLU/DST/NLG/evaluate_unified_datasets.py`. However, I could not find a guide on how to train and evaluate policy models with the unified data format. Specifically, I have the following questions:

1. Training: I could not find support for the unified data format in the `$policy_model/train.py` scripts, such as `ppo/train.py` and `mle/train.py`. They seem to use `MultiWozEvaluator` by default.
2. Evaluation: I could not find support for the unified data format in `policy/evaluate.py` either. It also seems to use `MultiWozEvaluator` by default.
3. My Training Experiment: I trained a PPO policy with the config file [base_pipeline_rule_user.json](https://github.com/ConvLab/ConvLab-3/files/14504542/base_pipeline_rule_user.json) (initialized with the weights of an MLE policy trained with the default config) and got: Best Complete Rate: 0.95, Best Success Rate: 0.5, Best Average Return: 4.5. This is a good start, but the success rate is still well below that of the BERTNLU | RuleDST | PPOPolicy | TemplateNLG pipeline reported in the [ConvLab-2 README](https://github.com/thu-coai/ConvLab-2) (75.5% completion rate and 71.7% success rate). Where does this gap come from?
4. My Evaluation Experiment: I evaluated my previously trained PPO model with `policy/evaluate.py`, but got a much worse result: "Complete 500 0.372 Success 500 0.228 Success strict 500 0.174". During evaluation, two warnings appeared: "Value not found in standard value set: [dontcare] (slot: name domain: restaurant)" and "Value [none] invalid! (Lexicalisation Error) (slot: name domain: hotel)". These suggest a dataset format mismatch between training and evaluation, since I am not sure whether I used the original MultiWOZ format or the unified data format to train and evaluate my policy model.
5. For user simulator: I found that TUS, EmoUS and GenTUS can be trained and evaluated with the unified data format. However, I did not find unified data format support in the rule-based user simulator. Does that mean that if I train my models (NLU/NLG or policy) with the unified data format, I cannot evaluate them with the rule-based user simulator?


Looking forward to your reply,
James Cao

