Inefficient Process for Adding New Entities in ReFinED #28

@Shoumik-Gandre

Description

Adding even a dozen new entities by running preprocess_all.py requires downloading over 100 GB of data, which is highly inefficient for such a small addition.

The model cannot be considered to have practical zero-shot capability until there is a streamlined, lightweight script for adding new entities to the system.

Steps to Reproduce:

  1. Clone the repository and set up the environment as per the documentation.
  2. Attempt to add a dozen new entities by running preprocess_all.py (a rough invocation is sketched after this list).
  3. Observe the data download requirements and inefficiency.
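For concreteness, the commands below are roughly what steps 1 and 2 look like. The repository is amazon-science/ReFinED; the path to preprocess_all.py is a best guess and may differ between versions, so adjust it to wherever the script lives in your checkout.

```bash
# Step 1: clone the repository and set up the environment per its documentation.
git clone https://github.com/amazon-science/ReFinED.git
cd ReFinED
# (install dependencies as described in the repository's README)

# Step 2: the route this issue follows for adding entities is re-running the
# full offline preprocessing, which downloads the complete Wikidata/Wikipedia
# dumps (over 100 GB) even when only a dozen new entities are needed.
# The path below is a best guess and may differ between versions.
python src/refined/offline_data_generation/preprocess_all.py
```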

Expected Behavior:

There should be a lightweight and efficient process for adding new entities without requiring extensive data downloads.
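To make this concrete, here is a minimal sketch of the kind of entry point the request implies. It is purely hypothetical: NewEntity and add_entities are invented for illustration and are not part of ReFinED; only Refined.from_pretrained and process_text are existing calls from the public inference API.

```python
# Hypothetical sketch only: NewEntity and add_entities do NOT exist in ReFinED
# today. This illustrates the kind of lightweight entry point the issue asks
# for: registering a handful of new entities against an already-built entity
# set without re-running the full offline preprocessing pipeline.
from dataclasses import dataclass

from refined.inference.processor import Refined  # existing ReFinED import


@dataclass
class NewEntity:  # hypothetical helper type
    qcode: str             # Wikidata-style identifier
    label: str             # canonical name
    aliases: list[str]     # surface forms for candidate generation
    description: str = ""  # short description for the description encoder


refined = Refined.from_pretrained(
    model_name="wikipedia_model_with_numbers",
    entity_set="wikipedia",
)

# Hypothetical call: append new entities to the existing lookup tables
# (alias-to-candidate map, descriptions, classes) in place, instead of
# rebuilding them from over 100 GB of raw dumps.
refined.add_entities([
    NewEntity(
        qcode="Q999999999",
        label="Example Corp",
        aliases=["Example Corp", "ExampleCorp"],
        description="A fictional company used only to illustrate the request.",
    ),
])

spans = refined.process_text("ExampleCorp announced a new product today.")
print(spans)
```

An incremental interface along these lines would only need to update the candidate-generation tables and entity descriptions for the handful of new entities, rather than regenerating everything from the full dumps.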

Actual Behavior:

Adding new entities requires downloading over 100 GB of data, making the process highly inefficient and cumbersome.

Environment:

Platform: Google Colab
Operating System: Linux
Python Version: 3.10

Severity:

High - This issue severely impacts the usability and efficiency of adding new entities to the system and needs immediate attention.
