Use Trafilatura for boilerplate removal #24

@DavidNemeskey

JusText seems to remove too much text, and its accuracy does not seem to be very high. We need a better tool for boilerplate removal.

Options:

The first two options (and probably the last one as well) use deep learning (DL) and need training data. For now, let's experiment with Trafilatura.
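A rough sketch of how such an experiment could start: run both tools on the same page and compare how much text each one keeps. The helper names (`trafilatura_extract`, `justext_extract`, `compare_extractors`) and the `"English"` stoplist are assumptions made for illustration, not part of this project, and character counts are only a crude proxy for the "removes too much" complaint, not an accuracy measure.

```python
from typing import Callable, Dict, Optional

# An extractor takes raw HTML and returns the main text (or None if it found nothing).
Extractor = Callable[[str], Optional[str]]


def trafilatura_extract(html: str) -> Optional[str]:
    """Main-text extraction with Trafilatura (third-party: pip install trafilatura)."""
    import trafilatura  # imported lazily so the comparison helper works without it

    return trafilatura.extract(html, include_comments=False)


def justext_extract(html: str, language: str = "English") -> Optional[str]:
    """Main-text extraction with jusText (third-party: pip install justext).

    The stoplist language is an assumption; pick the one matching the corpus.
    """
    import justext

    paragraphs = justext.justext(html, justext.get_stoplist(language))
    kept = [p.text for p in paragraphs if not p.is_boilerplate]
    return "\n".join(kept) or None


def compare_extractors(html: str, extractors: Dict[str, Extractor]) -> Dict[str, int]:
    """Return the number of characters each extractor keeps for one page."""
    kept_chars = {}
    for name, extract in extractors.items():
        text = extract(html) or ""  # treat "nothing extracted" as empty text
        kept_chars[name] = len(text)
    return kept_chars
```

With both libraries installed, something like `compare_extractors(html, {"justext": justext_extract, "trafilatura": trafilatura_extract})` over a sample of pages would show where the two tools diverge most. Those pages are then good candidates for manual inspection.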
