Special token formatting #3

The config file dictates which special tokens each vocabulary uses. This is needed because the parser has to know which token in the training file(s) is the root. In SD and UD this is `root`, but in CTB and some CoNLL 2009 treebanks it's `ROOT`, so we can't just hardcode the label string that indicates the root relation. In [a previous implementation of the parser](https://github.com/tdozat/Parser) the root string can be specified, but in this one you specify the format of all special tokens, for consistency. However, this opens up the possibility of leaving out special tokens that the code assumes are there, or including ones that the code never uses.

A better approach is to hardcode which special tokens each vocabulary has and let the configuration file specify only their format, allowing for the following possibilities:

1. Upper (e.g. `ROOT`)
2. Proper (e.g. `Root`)
3. Lower (e.g. `root`)
4. Upper HTML (e.g. `<ROOT>`)
5. Proper HTML (e.g. `<Root>`)
6. Lower HTML (e.g. `<root>`)
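The six formats above boil down to two independent choices: a casing and whether to wrap the token in angle brackets. A minimal sketch of that mapping, assuming a hypothetical helper `format_special_token` with hypothetical parameters `case` and `html` (none of these names come from the parser's code):

```python
def format_special_token(name, case="upper", html=False):
    """Render a hardcoded special token name in the configured format.

    `name` is the canonical token name (e.g. "root"). The accepted values
    for `case` ("upper", "proper", "lower") are assumptions made for this
    sketch, not options the parser is known to support.
    """
    if case == "upper":
        token = name.upper()
    elif case == "proper":
        token = name.capitalize()
    elif case == "lower":
        token = name.lower()
    else:
        raise ValueError("unknown case: %r" % (case,))
    # The "HTML" variants wrap the cased token in angle brackets.
    return "<" + token + ">" if html else token


if __name__ == "__main__":
    # Enumerate all six formats for the root token.
    for html in (False, True):
        for case in ("upper", "proper", "lower"):
            print(case, html, format_special_token("root", case, html))
```

With this split, the code owns the list of token names and the config can only change how they are spelled, so a config can no longer omit a token or add an unused one.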

Replacing the `special_tokens` option with two options, `special_token_case` and `special_token_html`, should fix this, but it'll break older models.
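One possible way to soften the break for older models is to infer the two new settings from a token string stored in an old config. The sketch below is only an illustration: the function name `infer_special_token_format` is hypothetical, and it assumes an old config contains at least one special token written in one of the six formats listed above.

```python
def infer_special_token_format(token):
    """Guess (case, html) from a legacy special token such as "<ROOT>".

    Hypothetical helper: returns a pair that could populate
    `special_token_case` and `special_token_html`. The case labels
    ("upper", "proper", "lower") are assumptions made for this sketch.
    """
    # An "HTML" token is wrapped in angle brackets around a non-empty core.
    html = len(token) > 2 and token.startswith("<") and token.endswith(">")
    core = token[1:-1] if html else token
    if not core:
        raise ValueError("empty special token")
    if core.isupper():
        case = "upper"
    elif core.islower():
        case = "lower"
    elif core == core.capitalize():
        case = "proper"
    else:
        raise ValueError("unrecognized special token format: %r" % (token,))
    return case, html


if __name__ == "__main__":
    for legacy in ("ROOT", "Root", "root", "<ROOT>", "<Root>", "<root>"):
        print(legacy, infer_special_token_format(legacy))
```

A single-letter token such as `R` is ambiguous between upper and proper case; this sketch treats it as upper.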
