Inefficient Process for Adding New Entities in ReFinED #28

@Shoumik-Gandre

Description

Adding even a dozen new entities by running preprocess_all.py requires downloading over 100 GB of data, which is highly inefficient for such a small addition.

The model cannot be considered to have practical zero-shot capability until there is a streamlined, lightweight script for adding new entities to the system.

Steps to Reproduce:

  1. Clone the repository and set up the environment as per the documentation.
  2. Attempt to add a dozen new entities by running preprocess_all.py (a rough invocation is sketched after this list).
  3. Observe the data download requirements and inefficiency.
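For concreteness, the commands below are roughly what steps 1 and 2 look like. The repository is amazon-science/ReFinED; the path to preprocess_all.py is a best guess and may differ between versions, so adjust it to wherever the script lives in your checkout.

```bash
# Step 1: clone the repository and set up the environment per its documentation.
git clone https://github.com/amazon-science/ReFinED.git
cd ReFinED
# (install dependencies as described in the repository's README)

# Step 2: the route this issue follows for adding entities is re-running the
# full offline preprocessing, which downloads the complete Wikidata/Wikipedia
# dumps (over 100 GB) even when only a dozen new entities are needed.
# The path below is a best guess and may differ between versions.
python src/refined/offline_data_generation/preprocess_all.py
```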

Expected Behavior:

There should be a lightweight and efficient process for adding new entities without requiring extensive data downloads.
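To make this concrete, here is a minimal sketch of the kind of entry point the request implies. It is purely hypothetical: NewEntity and add_entities are invented for illustration and are not part of ReFinED; only Refined.from_pretrained and process_text are existing calls from the public inference API.

```python
# Hypothetical sketch only: NewEntity and add_entities do NOT exist in ReFinED
# today. This illustrates the kind of lightweight entry point the issue asks
# for: registering a handful of new entities against an already-built entity
# set without re-running the full offline preprocessing pipeline.
from dataclasses import dataclass

from refined.inference.processor import Refined  # existing ReFinED import


@dataclass
class NewEntity:  # hypothetical helper type
    qcode: str             # Wikidata-style identifier
    label: str             # canonical name
    aliases: list[str]     # surface forms for candidate generation
    description: str = ""  # short description for the description encoder


refined = Refined.from_pretrained(
    model_name="wikipedia_model_with_numbers",
    entity_set="wikipedia",
)

# Hypothetical call: append new entities to the existing lookup tables
# (alias-to-candidate map, descriptions, classes) in place, instead of
# rebuilding them from over 100 GB of raw dumps.
refined.add_entities([
    NewEntity(
        qcode="Q999999999",
        label="Example Corp",
        aliases=["Example Corp", "ExampleCorp"],
        description="A fictional company used only to illustrate the request.",
    ),
])

spans = refined.process_text("ExampleCorp announced a new product today.")
print(spans)
```

An incremental interface along these lines would only need to update the candidate-generation tables and entity descriptions for the handful of new entities, rather than regenerating everything from the full dumps.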

Actual Behavior:

Adding new entities requires downloading over 100 GB of data, making the process highly inefficient and cumbersome.

Environment:

Platform: Google Colab
Operating System: Linux
Python Version: 3.10

Severity:

High - This issue severely impacts the usability and efficiency of adding new entities to the system and needs immediate attention.
