-
Notifications
You must be signed in to change notification settings - Fork 13
Remove torch text #69
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR removes the dependency on the torchtext library by transplanting its vocabulary functionality into the local codebase under lighthouse/common/vocab. The change eliminates an external dependency while maintaining the same vocabulary functionality needed for text processing.
- Replaces
torchtext.vocabimports with locallighthouse.common.vocabmodule - Creates a complete local implementation of vocabulary classes (
Vocab,GloVe,FastText, etc.) - Updates installation instructions and CI workflows to remove torchtext dependency
Reviewed Changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 22 comments.
Show a summary per file
| File | Description |
|---|---|
| training/dataset.py | Updates import to use local vocab module instead of torchtext |
| training/cg_detr_dataset.py | Updates import to use local vocab module instead of torchtext |
| lighthouse/feature_extractor/text_encoders/glove.py | Updates import to use local vocab module instead of torchtext |
| lighthouse/common/vocab/vocab.py | Implements Vocab class with torch.jit compatibility |
| lighthouse/common/vocab/vectors.py | Implements vector classes (GloVe, FastText, CharNGram) for embeddings |
| lighthouse/common/vocab/init.py | Package initialization exposing vocab classes |
| lighthouse/common/vocab.py | Complete vocabulary implementation with all classes |
| mypy.ini | Removes torchtext from mypy ignore list |
| README.md | Removes torchtext from installation instructions |
| .github/workflows/pytest.yml | Removes torchtext from CI dependencies |
| .github/workflows/mypy_ruff.yml | Removes torchtext from CI dependencies |
lighthouse/common/vocab.py
Outdated
| from .utils import reporthook | ||
|
|
Copilot
AI
Aug 11, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Import error: The module '.utils' is being imported but there's no evidence of a utils.py file in the lighthouse/common directory. The reporthook function is defined within the same file at line 263, making this import unnecessary and likely to cause a runtime error.
| from .utils import reporthook |
| "glove.42B.300d": partial(GloVe, name="42B", dim="300"), | ||
| "glove.840B.300d": partial(GloVe, name="840B", dim="300"), |
Copilot
AI
Aug 11, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Type mismatch: The dim parameter is passed as a string "300" but the GloVe constructor expects an integer. This should be dim=300 without quotes.
| "glove.42B.300d": partial(GloVe, name="42B", dim="300"), | |
| "glove.840B.300d": partial(GloVe, name="840B", dim="300"), | |
| "glove.42B.300d": partial(GloVe, name="42B", dim=300), | |
| "glove.840B.300d": partial(GloVe, name="840B", dim=300), |
| "glove.42B.300d": partial(GloVe, name="42B", dim="300"), | ||
| "glove.840B.300d": partial(GloVe, name="840B", dim="300"), |
Copilot
AI
Aug 11, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Type mismatch: The dim parameter is passed as a string "300" but the GloVe constructor expects an integer. This should be dim=300 without quotes.
| "glove.42B.300d": partial(GloVe, name="42B", dim="300"), | |
| "glove.840B.300d": partial(GloVe, name="840B", dim="300"), | |
| "glove.42B.300d": partial(GloVe, name="42B", dim=300), | |
| "glove.840B.300d": partial(GloVe, name="840B", dim=300), |
| "fasttext.simple.300d": partial(FastText, language="simple"), | ||
| "glove.42B.300d": partial(GloVe, name="42B", dim="300"), | ||
| "glove.840B.300d": partial(GloVe, name="840B", dim="300"), | ||
| "glove.twitter.27B.25d": partial(GloVe, name="twitter.27B", dim="25"), |
Copilot
AI
Aug 11, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Type mismatch: The dim parameter is passed as a string "25" but the GloVe constructor expects an integer. This should be dim=25 without quotes.
| "glove.twitter.27B.25d": partial(GloVe, name="twitter.27B", dim="25"), | |
| "glove.twitter.27B.25d": partial(GloVe, name="twitter.27B", dim=25), |
| "glove.42B.300d": partial(GloVe, name="42B", dim="300"), | ||
| "glove.840B.300d": partial(GloVe, name="840B", dim="300"), | ||
| "glove.twitter.27B.25d": partial(GloVe, name="twitter.27B", dim="25"), | ||
| "glove.twitter.27B.50d": partial(GloVe, name="twitter.27B", dim="50"), | ||
| "glove.twitter.27B.100d": partial(GloVe, name="twitter.27B", dim="100"), | ||
| "glove.twitter.27B.200d": partial(GloVe, name="twitter.27B", dim="200"), | ||
| "glove.6B.50d": partial(GloVe, name="6B", dim="50"), | ||
| "glove.6B.100d": partial(GloVe, name="6B", dim="100"), | ||
| "glove.6B.200d": partial(GloVe, name="6B", dim="200"), | ||
| "glove.6B.300d": partial(GloVe, name="6B", dim="300"), |
Copilot
AI
Aug 11, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Type mismatch: The dim parameter is passed as a string "50" but the GloVe constructor expects an integer. This should be dim=50 without quotes.
| "glove.42B.300d": partial(GloVe, name="42B", dim="300"), | |
| "glove.840B.300d": partial(GloVe, name="840B", dim="300"), | |
| "glove.twitter.27B.25d": partial(GloVe, name="twitter.27B", dim="25"), | |
| "glove.twitter.27B.50d": partial(GloVe, name="twitter.27B", dim="50"), | |
| "glove.twitter.27B.100d": partial(GloVe, name="twitter.27B", dim="100"), | |
| "glove.twitter.27B.200d": partial(GloVe, name="twitter.27B", dim="200"), | |
| "glove.6B.50d": partial(GloVe, name="6B", dim="50"), | |
| "glove.6B.100d": partial(GloVe, name="6B", dim="100"), | |
| "glove.6B.200d": partial(GloVe, name="6B", dim="200"), | |
| "glove.6B.300d": partial(GloVe, name="6B", dim="300"), | |
| "glove.42B.300d": partial(GloVe, name="42B", dim=300), | |
| "glove.840B.300d": partial(GloVe, name="840B", dim=300), | |
| "glove.twitter.27B.25d": partial(GloVe, name="twitter.27B", dim=25), | |
| "glove.twitter.27B.50d": partial(GloVe, name="twitter.27B", dim=50), | |
| "glove.twitter.27B.100d": partial(GloVe, name="twitter.27B", dim=100), | |
| "glove.twitter.27B.200d": partial(GloVe, name="twitter.27B", dim=200), | |
| "glove.6B.50d": partial(GloVe, name="6B", dim=50), | |
| "glove.6B.100d": partial(GloVe, name="6B", dim=100), | |
| "glove.6B.200d": partial(GloVe, name="6B", dim=200), | |
| "glove.6B.300d": partial(GloVe, name="6B", dim=300), |
lighthouse/common/vocab.py
Outdated
| "glove.42B.300d": partial(GloVe, name="42B", dim="300"), | ||
| "glove.840B.300d": partial(GloVe, name="840B", dim="300"), | ||
| "glove.twitter.27B.25d": partial(GloVe, name="twitter.27B", dim="25"), | ||
| "glove.twitter.27B.50d": partial(GloVe, name="twitter.27B", dim="50"), | ||
| "glove.twitter.27B.100d": partial(GloVe, name="twitter.27B", dim="100"), | ||
| "glove.twitter.27B.200d": partial(GloVe, name="twitter.27B", dim="200"), | ||
| "glove.6B.50d": partial(GloVe, name="6B", dim="50"), | ||
| "glove.6B.100d": partial(GloVe, name="6B", dim="100"), | ||
| "glove.6B.200d": partial(GloVe, name="6B", dim="200"), | ||
| "glove.6B.300d": partial(GloVe, name="6B", dim="300"), |
Copilot
AI
Aug 11, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Type mismatch: The dim parameter is passed as a string "50" but the GloVe constructor expects an integer. This should be dim=50 without quotes.
| "glove.42B.300d": partial(GloVe, name="42B", dim="300"), | |
| "glove.840B.300d": partial(GloVe, name="840B", dim="300"), | |
| "glove.twitter.27B.25d": partial(GloVe, name="twitter.27B", dim="25"), | |
| "glove.twitter.27B.50d": partial(GloVe, name="twitter.27B", dim="50"), | |
| "glove.twitter.27B.100d": partial(GloVe, name="twitter.27B", dim="100"), | |
| "glove.twitter.27B.200d": partial(GloVe, name="twitter.27B", dim="200"), | |
| "glove.6B.50d": partial(GloVe, name="6B", dim="50"), | |
| "glove.6B.100d": partial(GloVe, name="6B", dim="100"), | |
| "glove.6B.200d": partial(GloVe, name="6B", dim="200"), | |
| "glove.6B.300d": partial(GloVe, name="6B", dim="300"), | |
| "glove.42B.300d": partial(GloVe, name="42B", dim=300), | |
| "glove.840B.300d": partial(GloVe, name="840B", dim=300), | |
| "glove.twitter.27B.25d": partial(GloVe, name="twitter.27B", dim=25), | |
| "glove.twitter.27B.50d": partial(GloVe, name="twitter.27B", dim=50), | |
| "glove.twitter.27B.100d": partial(GloVe, name="twitter.27B", dim=100), | |
| "glove.twitter.27B.200d": partial(GloVe, name="twitter.27B", dim=200), | |
| "glove.6B.50d": partial(GloVe, name="6B", dim=50), | |
| "glove.6B.100d": partial(GloVe, name="6B", dim=100), | |
| "glove.6B.200d": partial(GloVe, name="6B", dim=200), | |
| "glove.6B.300d": partial(GloVe, name="6B", dim=300), |
lighthouse/common/vocab.py
Outdated
| "glove.42B.300d": partial(GloVe, name="42B", dim="300"), | ||
| "glove.840B.300d": partial(GloVe, name="840B", dim="300"), | ||
| "glove.twitter.27B.25d": partial(GloVe, name="twitter.27B", dim="25"), | ||
| "glove.twitter.27B.50d": partial(GloVe, name="twitter.27B", dim="50"), | ||
| "glove.twitter.27B.100d": partial(GloVe, name="twitter.27B", dim="100"), | ||
| "glove.twitter.27B.200d": partial(GloVe, name="twitter.27B", dim="200"), | ||
| "glove.6B.50d": partial(GloVe, name="6B", dim="50"), | ||
| "glove.6B.100d": partial(GloVe, name="6B", dim="100"), | ||
| "glove.6B.200d": partial(GloVe, name="6B", dim="200"), | ||
| "glove.6B.300d": partial(GloVe, name="6B", dim="300"), |
Copilot
AI
Aug 11, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Type mismatch: The dim parameter is passed as a string "100" but the GloVe constructor expects an integer. This should be dim=100 without quotes.
| "glove.42B.300d": partial(GloVe, name="42B", dim="300"), | |
| "glove.840B.300d": partial(GloVe, name="840B", dim="300"), | |
| "glove.twitter.27B.25d": partial(GloVe, name="twitter.27B", dim="25"), | |
| "glove.twitter.27B.50d": partial(GloVe, name="twitter.27B", dim="50"), | |
| "glove.twitter.27B.100d": partial(GloVe, name="twitter.27B", dim="100"), | |
| "glove.twitter.27B.200d": partial(GloVe, name="twitter.27B", dim="200"), | |
| "glove.6B.50d": partial(GloVe, name="6B", dim="50"), | |
| "glove.6B.100d": partial(GloVe, name="6B", dim="100"), | |
| "glove.6B.200d": partial(GloVe, name="6B", dim="200"), | |
| "glove.6B.300d": partial(GloVe, name="6B", dim="300"), | |
| "glove.42B.300d": partial(GloVe, name="42B", dim=300), | |
| "glove.840B.300d": partial(GloVe, name="840B", dim=300), | |
| "glove.twitter.27B.25d": partial(GloVe, name="twitter.27B", dim=25), | |
| "glove.twitter.27B.50d": partial(GloVe, name="twitter.27B", dim=50), | |
| "glove.twitter.27B.100d": partial(GloVe, name="twitter.27B", dim=100), | |
| "glove.twitter.27B.200d": partial(GloVe, name="twitter.27B", dim=200), | |
| "glove.6B.50d": partial(GloVe, name="6B", dim=50), | |
| "glove.6B.100d": partial(GloVe, name="6B", dim=100), | |
| "glove.6B.200d": partial(GloVe, name="6B", dim=200), | |
| "glove.6B.300d": partial(GloVe, name="6B", dim=300), |
lighthouse/common/vocab.py
Outdated
| "glove.42B.300d": partial(GloVe, name="42B", dim="300"), | ||
| "glove.840B.300d": partial(GloVe, name="840B", dim="300"), | ||
| "glove.twitter.27B.25d": partial(GloVe, name="twitter.27B", dim="25"), | ||
| "glove.twitter.27B.50d": partial(GloVe, name="twitter.27B", dim="50"), | ||
| "glove.twitter.27B.100d": partial(GloVe, name="twitter.27B", dim="100"), | ||
| "glove.twitter.27B.200d": partial(GloVe, name="twitter.27B", dim="200"), | ||
| "glove.6B.50d": partial(GloVe, name="6B", dim="50"), | ||
| "glove.6B.100d": partial(GloVe, name="6B", dim="100"), | ||
| "glove.6B.200d": partial(GloVe, name="6B", dim="200"), | ||
| "glove.6B.300d": partial(GloVe, name="6B", dim="300"), |
Copilot
AI
Aug 11, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Type mismatch: The dim parameter is passed as a string "200" but the GloVe constructor expects an integer. This should be dim=200 without quotes.
| "glove.42B.300d": partial(GloVe, name="42B", dim="300"), | |
| "glove.840B.300d": partial(GloVe, name="840B", dim="300"), | |
| "glove.twitter.27B.25d": partial(GloVe, name="twitter.27B", dim="25"), | |
| "glove.twitter.27B.50d": partial(GloVe, name="twitter.27B", dim="50"), | |
| "glove.twitter.27B.100d": partial(GloVe, name="twitter.27B", dim="100"), | |
| "glove.twitter.27B.200d": partial(GloVe, name="twitter.27B", dim="200"), | |
| "glove.6B.50d": partial(GloVe, name="6B", dim="50"), | |
| "glove.6B.100d": partial(GloVe, name="6B", dim="100"), | |
| "glove.6B.200d": partial(GloVe, name="6B", dim="200"), | |
| "glove.6B.300d": partial(GloVe, name="6B", dim="300"), | |
| "glove.42B.300d": partial(GloVe, name="42B", dim=300), | |
| "glove.840B.300d": partial(GloVe, name="840B", dim=300), | |
| "glove.twitter.27B.25d": partial(GloVe, name="twitter.27B", dim=25), | |
| "glove.twitter.27B.50d": partial(GloVe, name="twitter.27B", dim=50), | |
| "glove.twitter.27B.100d": partial(GloVe, name="twitter.27B", dim=100), | |
| "glove.twitter.27B.200d": partial(GloVe, name="twitter.27B", dim=200), | |
| "glove.6B.50d": partial(GloVe, name="6B", dim=50), | |
| "glove.6B.100d": partial(GloVe, name="6B", dim=100), | |
| "glove.6B.200d": partial(GloVe, name="6B", dim=200), | |
| "glove.6B.300d": partial(GloVe, name="6B", dim=300), |
lighthouse/common/vocab.py
Outdated
| "glove.42B.300d": partial(GloVe, name="42B", dim="300"), | ||
| "glove.840B.300d": partial(GloVe, name="840B", dim="300"), | ||
| "glove.twitter.27B.25d": partial(GloVe, name="twitter.27B", dim="25"), | ||
| "glove.twitter.27B.50d": partial(GloVe, name="twitter.27B", dim="50"), | ||
| "glove.twitter.27B.100d": partial(GloVe, name="twitter.27B", dim="100"), | ||
| "glove.twitter.27B.200d": partial(GloVe, name="twitter.27B", dim="200"), | ||
| "glove.6B.50d": partial(GloVe, name="6B", dim="50"), | ||
| "glove.6B.100d": partial(GloVe, name="6B", dim="100"), | ||
| "glove.6B.200d": partial(GloVe, name="6B", dim="200"), | ||
| "glove.6B.300d": partial(GloVe, name="6B", dim="300"), |
Copilot
AI
Aug 11, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Type mismatch: The dim parameter is passed as a string "300" but the GloVe constructor expects an integer. This should be dim=300 without quotes.
| "glove.42B.300d": partial(GloVe, name="42B", dim="300"), | |
| "glove.840B.300d": partial(GloVe, name="840B", dim="300"), | |
| "glove.twitter.27B.25d": partial(GloVe, name="twitter.27B", dim="25"), | |
| "glove.twitter.27B.50d": partial(GloVe, name="twitter.27B", dim="50"), | |
| "glove.twitter.27B.100d": partial(GloVe, name="twitter.27B", dim="100"), | |
| "glove.twitter.27B.200d": partial(GloVe, name="twitter.27B", dim="200"), | |
| "glove.6B.50d": partial(GloVe, name="6B", dim="50"), | |
| "glove.6B.100d": partial(GloVe, name="6B", dim="100"), | |
| "glove.6B.200d": partial(GloVe, name="6B", dim="200"), | |
| "glove.6B.300d": partial(GloVe, name="6B", dim="300"), | |
| "glove.42B.300d": partial(GloVe, name="42B", dim=300), | |
| "glove.840B.300d": partial(GloVe, name="840B", dim=300), | |
| "glove.twitter.27B.25d": partial(GloVe, name="twitter.27B", dim=25), | |
| "glove.twitter.27B.50d": partial(GloVe, name="twitter.27B", dim=50), | |
| "glove.twitter.27B.100d": partial(GloVe, name="twitter.27B", dim=100), | |
| "glove.twitter.27B.200d": partial(GloVe, name="twitter.27B", dim=200), | |
| "glove.6B.50d": partial(GloVe, name="6B", dim=50), | |
| "glove.6B.100d": partial(GloVe, name="6B", dim=100), | |
| "glove.6B.200d": partial(GloVe, name="6B", dim=200), | |
| "glove.6B.300d": partial(GloVe, name="6B", dim=300), |
lighthouse/common/vocab.py
Outdated
| Examples: | ||
| >>> examples = ['chip', 'baby', 'Beautiful'] | ||
| >>> vec = text.vocab.GloVe(name='6B', dim=50) | ||
| >>> ret = vec.get_vecs_by_tokens(tokens, lower_case_backup=True) |
Copilot
AI
Aug 11, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Variable name inconsistency in docstring: The parameter name is 'examples' in the previous line but 'tokens' is used here. Should be 'ret = vec.get_vecs_by_tokens(examples, lower_case_backup=True)' for consistency.
| >>> ret = vec.get_vecs_by_tokens(tokens, lower_case_backup=True) | |
| >>> ret = vec.get_vecs_by_tokens(examples, lower_case_backup=True) |
|
Suggestions by copilot are trivial and I ignore them. |
|
@h-munakata LGTM. |
This PR is related to #68 .
I transplanted torchtext's vocab into
lighthouse/common/vocab.This change effects to the following scripts:
lighthouse/feature_extractor/text_encoders/glove.pytraining/dataset.pytraining/cg_detr_dataset.pyI confirmed that training runs correctly.