Hi!
Thank you for the awesome work!
I'm trying to use your benchmark and found that, for many examples, the number of annotated sections does not match the number of sections. Sometimes there are more annotations than sections, and sometimes fewer.
Do you know the correct way to use this data when there is a mismatch?
Also, for sections where reasoning_correctness = '', does that mean the section carries a propagated error from previous steps, or that it is correct?
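For reference, here is a minimal sketch of how the two cases above could be counted. It assumes each example is a dict with a `sections` list and an `annotations` list of dicts. Both key names are hypothetical placeholders and may not match the benchmark's actual schema. Only `reasoning_correctness` comes from the data itself.

```python
def find_mismatches(examples):
    """Return (index, n_sections, n_annotations) for examples whose counts differ.

    The keys "sections" and "annotations" are hypothetical placeholders.
    """
    mismatches = []
    for i, ex in enumerate(examples):
        n_sections = len(ex["sections"])
        n_annotations = len(ex["annotations"])
        if n_sections != n_annotations:
            mismatches.append((i, n_sections, n_annotations))
    return mismatches


def count_empty_correctness(examples):
    """Count annotations whose reasoning_correctness is missing or the empty string."""
    return sum(
        1
        for ex in examples
        for ann in ex["annotations"]
        if ann.get("reasoning_correctness", "") == ""
    )
```

Running `find_mismatches` over the loaded examples lists which ones have more or fewer annotations than sections, which may help narrow down where the mismatch comes from.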
Thank you again for the great work!