DutchParliamentCorefResolution

This repository contains the codebase for the paper 'Neural Coreference Resolution for Dutch Parliamentary Documents', presented at the 31st CLIN conference on Natural Language Processing for the Dutch Language. It is a fork of the [Dutch e2e implementation] of the e2e model by Kenton Lee. It also contains the dataset that was annotated during this project, as well as the parse trees used for the rule-based baseline model based on Alpino.

Installation

Below are the installation instructions for this repository. They are largely based on the instructions of the original repository.

Requirements:

Python 3.6 or 3.7
pip
tensorflow v2.0.0 or higher

In this repository, run:

pip install -r requirements.txt
pip install .

Alternatively, you can install directly from PyPI:

pip install tensorflow
pip install e2e-Dutch

Changes

Although the version of the e2e model used in this research is mostly identical to the main repository, some changes have been made. The largest changes are listed below.

- In line with the original e2e implementation, the option to include speaker metadata has been added to this version of the e2e model, in the 'coref_model.py' file.
- For the experiments concerning the genders of the actors mentioned in the texts, the ability to add gender information has been added, also in the 'coref_model.py' file.
- Some minor changes were made in the 'train.py' file, including the ability to specify the number of training epochs through the '--epochs' command line argument.
- Because the method used for converting the files to jsonlines is custom (it has to carry the speaker information), a new train preparation script, 'my_train.sh', has been added. The 'download.py' script was also slightly altered to remove downloads that are not necessary for this project.
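
As an illustration of the '--epochs' change listed above, the following is a minimal sketch of how such a flag could be wired into a training loop with `argparse`. It is not the code from 'train.py': only the flag name comes from this README, while `build_parser`, `train`, `run_epoch`, and the default value are hypothetical names and choices made for this example.

```python
import argparse


def build_parser():
    # Hypothetical parser: only the '--epochs' flag name is taken from the
    # README; the default value and help text are assumptions.
    parser = argparse.ArgumentParser(description="Training sketch")
    parser.add_argument(
        "--epochs",
        type=int,
        default=1,
        help="number of epochs to train the model for",
    )
    return parser


def train(num_epochs, run_epoch):
    """Call run_epoch(epoch_index) once per epoch; return the epochs completed."""
    if num_epochs < 1:
        raise ValueError("num_epochs must be at least 1")
    for epoch in range(num_epochs):
        run_epoch(epoch)
    return num_epochs


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args(["--epochs", "3"])
completed = []
train(args.epochs, completed.append)  # stand-in for one real training epoch
```

Here `completed.append` stands in for whatever a real epoch would do; in an actual training script it would be replaced by the model's own training step.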

Adding the usage of speaker metadata to the e2e model

- TODO

Additional notes

jsonlines input format

When using the jsonlines input format for the Dutch e2e model, it is important that the 'doc_key' parameter is specified correctly, to avoid errors later. Document IDs should have the form 'doc_type/doc_id', where 'doc_type' is one of the predefined types, discussed in this issue, and 'doc_id' is the name of the document. The name itself is not important, but it has to be unique.
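
The naming rule above can be checked before the jsonlines files are written. The sketch below builds a 'doc_type/doc_id' key and rejects unknown types and duplicate keys. The helper `make_doc_key` and the type name `example_type` are hypothetical: the real list of accepted document types is defined by the e2e-Dutch project and is not reproduced here, so it is passed in as a parameter.

```python
def make_doc_key(doc_type, doc_id, allowed_types, seen_keys):
    """Return 'doc_type/doc_id', enforcing a known type and a unique key.

    allowed_types: set of accepted document types (supplied by the caller;
                   the real values are not listed in this README).
    seen_keys:     set of keys created so far, updated in place.
    """
    if doc_type not in allowed_types:
        raise ValueError("unknown doc_type: {!r}".format(doc_type))
    if not doc_id:
        raise ValueError("doc_id must not be empty")
    doc_key = "{}/{}".format(doc_type, doc_id)
    if doc_key in seen_keys:
        raise ValueError("duplicate doc_key: {!r}".format(doc_key))
    seen_keys.add(doc_key)
    return doc_key


# 'example_type' is a placeholder, not one of the project's real types.
allowed = {"example_type"}
seen = set()
key = make_doc_key("example_type", "debate_001", allowed, seen)
```

Keeping the set of seen keys outside the function makes it possible to check uniqueness across all documents of a dataset in one pass.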

RubenvanHeusden/DutchParliamentCorefResolution
