Skip to content

Conversation

@h-munakata
Copy link
Contributor

This PR is related to #68 .
I transplanted torchtext's vocab into lighthouse/common/vocab.

This change effects to the following scripts:

  • lighthouse/feature_extractor/text_encoders/glove.py
  • training/dataset.py
  • training/cg_detr_dataset.py

I confirmed that training runs correctly.

@h-munakata h-munakata requested a review from Copilot August 11, 2025 03:57
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR removes the dependency on the torchtext library by transplanting its vocabulary functionality into the local codebase under lighthouse/common/vocab. The change eliminates an external dependency while maintaining the same vocabulary functionality needed for text processing.

  • Replaces torchtext.vocab imports with local lighthouse.common.vocab module
  • Creates a complete local implementation of vocabulary classes (Vocab, GloVe, FastText, etc.)
  • Updates installation instructions and CI workflows to remove torchtext dependency

Reviewed Changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 22 comments.

Show a summary per file
File Description
training/dataset.py Updates import to use local vocab module instead of torchtext
training/cg_detr_dataset.py Updates import to use local vocab module instead of torchtext
lighthouse/feature_extractor/text_encoders/glove.py Updates import to use local vocab module instead of torchtext
lighthouse/common/vocab/vocab.py Implements Vocab class with torch.jit compatibility
lighthouse/common/vocab/vectors.py Implements vector classes (GloVe, FastText, CharNGram) for embeddings
lighthouse/common/vocab/init.py Package initialization exposing vocab classes
lighthouse/common/vocab.py Complete vocabulary implementation with all classes
mypy.ini Removes torchtext from mypy ignore list
README.md Removes torchtext from installation instructions
.github/workflows/pytest.yml Removes torchtext from CI dependencies
.github/workflows/mypy_ruff.yml Removes torchtext from CI dependencies

Comment on lines 13 to 14
from .utils import reporthook

Copy link

Copilot AI Aug 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import error: The module '.utils' is being imported but there's no evidence of a utils.py file in the lighthouse/common directory. The reporthook function is defined within the same file at line 263, making this import unnecessary and likely to cause a runtime error.

Suggested change
from .utils import reporthook

Copilot uses AI. Check for mistakes.
Comment on lines +290 to +291
"glove.42B.300d": partial(GloVe, name="42B", dim="300"),
"glove.840B.300d": partial(GloVe, name="840B", dim="300"),
Copy link

Copilot AI Aug 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Type mismatch: The dim parameter is passed as a string "300" but the GloVe constructor expects an integer. This should be dim=300 without quotes.

Suggested change
"glove.42B.300d": partial(GloVe, name="42B", dim="300"),
"glove.840B.300d": partial(GloVe, name="840B", dim="300"),
"glove.42B.300d": partial(GloVe, name="42B", dim=300),
"glove.840B.300d": partial(GloVe, name="840B", dim=300),

Copilot uses AI. Check for mistakes.
Comment on lines +290 to +291
"glove.42B.300d": partial(GloVe, name="42B", dim="300"),
"glove.840B.300d": partial(GloVe, name="840B", dim="300"),
Copy link

Copilot AI Aug 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Type mismatch: The dim parameter is passed as a string "300" but the GloVe constructor expects an integer. This should be dim=300 without quotes.

Suggested change
"glove.42B.300d": partial(GloVe, name="42B", dim="300"),
"glove.840B.300d": partial(GloVe, name="840B", dim="300"),
"glove.42B.300d": partial(GloVe, name="42B", dim=300),
"glove.840B.300d": partial(GloVe, name="840B", dim=300),

Copilot uses AI. Check for mistakes.
"fasttext.simple.300d": partial(FastText, language="simple"),
"glove.42B.300d": partial(GloVe, name="42B", dim="300"),
"glove.840B.300d": partial(GloVe, name="840B", dim="300"),
"glove.twitter.27B.25d": partial(GloVe, name="twitter.27B", dim="25"),
Copy link

Copilot AI Aug 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Type mismatch: The dim parameter is passed as a string "25" but the GloVe constructor expects an integer. This should be dim=25 without quotes.

Suggested change
"glove.twitter.27B.25d": partial(GloVe, name="twitter.27B", dim="25"),
"glove.twitter.27B.25d": partial(GloVe, name="twitter.27B", dim=25),

Copilot uses AI. Check for mistakes.
Comment on lines +290 to +299
"glove.42B.300d": partial(GloVe, name="42B", dim="300"),
"glove.840B.300d": partial(GloVe, name="840B", dim="300"),
"glove.twitter.27B.25d": partial(GloVe, name="twitter.27B", dim="25"),
"glove.twitter.27B.50d": partial(GloVe, name="twitter.27B", dim="50"),
"glove.twitter.27B.100d": partial(GloVe, name="twitter.27B", dim="100"),
"glove.twitter.27B.200d": partial(GloVe, name="twitter.27B", dim="200"),
"glove.6B.50d": partial(GloVe, name="6B", dim="50"),
"glove.6B.100d": partial(GloVe, name="6B", dim="100"),
"glove.6B.200d": partial(GloVe, name="6B", dim="200"),
"glove.6B.300d": partial(GloVe, name="6B", dim="300"),
Copy link

Copilot AI Aug 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Type mismatch: The dim parameter is passed as a string "50" but the GloVe constructor expects an integer. This should be dim=50 without quotes.

Suggested change
"glove.42B.300d": partial(GloVe, name="42B", dim="300"),
"glove.840B.300d": partial(GloVe, name="840B", dim="300"),
"glove.twitter.27B.25d": partial(GloVe, name="twitter.27B", dim="25"),
"glove.twitter.27B.50d": partial(GloVe, name="twitter.27B", dim="50"),
"glove.twitter.27B.100d": partial(GloVe, name="twitter.27B", dim="100"),
"glove.twitter.27B.200d": partial(GloVe, name="twitter.27B", dim="200"),
"glove.6B.50d": partial(GloVe, name="6B", dim="50"),
"glove.6B.100d": partial(GloVe, name="6B", dim="100"),
"glove.6B.200d": partial(GloVe, name="6B", dim="200"),
"glove.6B.300d": partial(GloVe, name="6B", dim="300"),
"glove.42B.300d": partial(GloVe, name="42B", dim=300),
"glove.840B.300d": partial(GloVe, name="840B", dim=300),
"glove.twitter.27B.25d": partial(GloVe, name="twitter.27B", dim=25),
"glove.twitter.27B.50d": partial(GloVe, name="twitter.27B", dim=50),
"glove.twitter.27B.100d": partial(GloVe, name="twitter.27B", dim=100),
"glove.twitter.27B.200d": partial(GloVe, name="twitter.27B", dim=200),
"glove.6B.50d": partial(GloVe, name="6B", dim=50),
"glove.6B.100d": partial(GloVe, name="6B", dim=100),
"glove.6B.200d": partial(GloVe, name="6B", dim=200),
"glove.6B.300d": partial(GloVe, name="6B", dim=300),

Copilot uses AI. Check for mistakes.
Comment on lines 554 to 563
"glove.42B.300d": partial(GloVe, name="42B", dim="300"),
"glove.840B.300d": partial(GloVe, name="840B", dim="300"),
"glove.twitter.27B.25d": partial(GloVe, name="twitter.27B", dim="25"),
"glove.twitter.27B.50d": partial(GloVe, name="twitter.27B", dim="50"),
"glove.twitter.27B.100d": partial(GloVe, name="twitter.27B", dim="100"),
"glove.twitter.27B.200d": partial(GloVe, name="twitter.27B", dim="200"),
"glove.6B.50d": partial(GloVe, name="6B", dim="50"),
"glove.6B.100d": partial(GloVe, name="6B", dim="100"),
"glove.6B.200d": partial(GloVe, name="6B", dim="200"),
"glove.6B.300d": partial(GloVe, name="6B", dim="300"),
Copy link

Copilot AI Aug 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Type mismatch: The dim parameter is passed as a string "50" but the GloVe constructor expects an integer. This should be dim=50 without quotes.

Suggested change
"glove.42B.300d": partial(GloVe, name="42B", dim="300"),
"glove.840B.300d": partial(GloVe, name="840B", dim="300"),
"glove.twitter.27B.25d": partial(GloVe, name="twitter.27B", dim="25"),
"glove.twitter.27B.50d": partial(GloVe, name="twitter.27B", dim="50"),
"glove.twitter.27B.100d": partial(GloVe, name="twitter.27B", dim="100"),
"glove.twitter.27B.200d": partial(GloVe, name="twitter.27B", dim="200"),
"glove.6B.50d": partial(GloVe, name="6B", dim="50"),
"glove.6B.100d": partial(GloVe, name="6B", dim="100"),
"glove.6B.200d": partial(GloVe, name="6B", dim="200"),
"glove.6B.300d": partial(GloVe, name="6B", dim="300"),
"glove.42B.300d": partial(GloVe, name="42B", dim=300),
"glove.840B.300d": partial(GloVe, name="840B", dim=300),
"glove.twitter.27B.25d": partial(GloVe, name="twitter.27B", dim=25),
"glove.twitter.27B.50d": partial(GloVe, name="twitter.27B", dim=50),
"glove.twitter.27B.100d": partial(GloVe, name="twitter.27B", dim=100),
"glove.twitter.27B.200d": partial(GloVe, name="twitter.27B", dim=200),
"glove.6B.50d": partial(GloVe, name="6B", dim=50),
"glove.6B.100d": partial(GloVe, name="6B", dim=100),
"glove.6B.200d": partial(GloVe, name="6B", dim=200),
"glove.6B.300d": partial(GloVe, name="6B", dim=300),

Copilot uses AI. Check for mistakes.
Comment on lines 554 to 563
"glove.42B.300d": partial(GloVe, name="42B", dim="300"),
"glove.840B.300d": partial(GloVe, name="840B", dim="300"),
"glove.twitter.27B.25d": partial(GloVe, name="twitter.27B", dim="25"),
"glove.twitter.27B.50d": partial(GloVe, name="twitter.27B", dim="50"),
"glove.twitter.27B.100d": partial(GloVe, name="twitter.27B", dim="100"),
"glove.twitter.27B.200d": partial(GloVe, name="twitter.27B", dim="200"),
"glove.6B.50d": partial(GloVe, name="6B", dim="50"),
"glove.6B.100d": partial(GloVe, name="6B", dim="100"),
"glove.6B.200d": partial(GloVe, name="6B", dim="200"),
"glove.6B.300d": partial(GloVe, name="6B", dim="300"),
Copy link

Copilot AI Aug 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Type mismatch: The dim parameter is passed as a string "100" but the GloVe constructor expects an integer. This should be dim=100 without quotes.

Suggested change
"glove.42B.300d": partial(GloVe, name="42B", dim="300"),
"glove.840B.300d": partial(GloVe, name="840B", dim="300"),
"glove.twitter.27B.25d": partial(GloVe, name="twitter.27B", dim="25"),
"glove.twitter.27B.50d": partial(GloVe, name="twitter.27B", dim="50"),
"glove.twitter.27B.100d": partial(GloVe, name="twitter.27B", dim="100"),
"glove.twitter.27B.200d": partial(GloVe, name="twitter.27B", dim="200"),
"glove.6B.50d": partial(GloVe, name="6B", dim="50"),
"glove.6B.100d": partial(GloVe, name="6B", dim="100"),
"glove.6B.200d": partial(GloVe, name="6B", dim="200"),
"glove.6B.300d": partial(GloVe, name="6B", dim="300"),
"glove.42B.300d": partial(GloVe, name="42B", dim=300),
"glove.840B.300d": partial(GloVe, name="840B", dim=300),
"glove.twitter.27B.25d": partial(GloVe, name="twitter.27B", dim=25),
"glove.twitter.27B.50d": partial(GloVe, name="twitter.27B", dim=50),
"glove.twitter.27B.100d": partial(GloVe, name="twitter.27B", dim=100),
"glove.twitter.27B.200d": partial(GloVe, name="twitter.27B", dim=200),
"glove.6B.50d": partial(GloVe, name="6B", dim=50),
"glove.6B.100d": partial(GloVe, name="6B", dim=100),
"glove.6B.200d": partial(GloVe, name="6B", dim=200),
"glove.6B.300d": partial(GloVe, name="6B", dim=300),

Copilot uses AI. Check for mistakes.
Comment on lines 554 to 563
"glove.42B.300d": partial(GloVe, name="42B", dim="300"),
"glove.840B.300d": partial(GloVe, name="840B", dim="300"),
"glove.twitter.27B.25d": partial(GloVe, name="twitter.27B", dim="25"),
"glove.twitter.27B.50d": partial(GloVe, name="twitter.27B", dim="50"),
"glove.twitter.27B.100d": partial(GloVe, name="twitter.27B", dim="100"),
"glove.twitter.27B.200d": partial(GloVe, name="twitter.27B", dim="200"),
"glove.6B.50d": partial(GloVe, name="6B", dim="50"),
"glove.6B.100d": partial(GloVe, name="6B", dim="100"),
"glove.6B.200d": partial(GloVe, name="6B", dim="200"),
"glove.6B.300d": partial(GloVe, name="6B", dim="300"),
Copy link

Copilot AI Aug 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Type mismatch: The dim parameter is passed as a string "200" but the GloVe constructor expects an integer. This should be dim=200 without quotes.

Suggested change
"glove.42B.300d": partial(GloVe, name="42B", dim="300"),
"glove.840B.300d": partial(GloVe, name="840B", dim="300"),
"glove.twitter.27B.25d": partial(GloVe, name="twitter.27B", dim="25"),
"glove.twitter.27B.50d": partial(GloVe, name="twitter.27B", dim="50"),
"glove.twitter.27B.100d": partial(GloVe, name="twitter.27B", dim="100"),
"glove.twitter.27B.200d": partial(GloVe, name="twitter.27B", dim="200"),
"glove.6B.50d": partial(GloVe, name="6B", dim="50"),
"glove.6B.100d": partial(GloVe, name="6B", dim="100"),
"glove.6B.200d": partial(GloVe, name="6B", dim="200"),
"glove.6B.300d": partial(GloVe, name="6B", dim="300"),
"glove.42B.300d": partial(GloVe, name="42B", dim=300),
"glove.840B.300d": partial(GloVe, name="840B", dim=300),
"glove.twitter.27B.25d": partial(GloVe, name="twitter.27B", dim=25),
"glove.twitter.27B.50d": partial(GloVe, name="twitter.27B", dim=50),
"glove.twitter.27B.100d": partial(GloVe, name="twitter.27B", dim=100),
"glove.twitter.27B.200d": partial(GloVe, name="twitter.27B", dim=200),
"glove.6B.50d": partial(GloVe, name="6B", dim=50),
"glove.6B.100d": partial(GloVe, name="6B", dim=100),
"glove.6B.200d": partial(GloVe, name="6B", dim=200),
"glove.6B.300d": partial(GloVe, name="6B", dim=300),

Copilot uses AI. Check for mistakes.
Comment on lines 554 to 563
"glove.42B.300d": partial(GloVe, name="42B", dim="300"),
"glove.840B.300d": partial(GloVe, name="840B", dim="300"),
"glove.twitter.27B.25d": partial(GloVe, name="twitter.27B", dim="25"),
"glove.twitter.27B.50d": partial(GloVe, name="twitter.27B", dim="50"),
"glove.twitter.27B.100d": partial(GloVe, name="twitter.27B", dim="100"),
"glove.twitter.27B.200d": partial(GloVe, name="twitter.27B", dim="200"),
"glove.6B.50d": partial(GloVe, name="6B", dim="50"),
"glove.6B.100d": partial(GloVe, name="6B", dim="100"),
"glove.6B.200d": partial(GloVe, name="6B", dim="200"),
"glove.6B.300d": partial(GloVe, name="6B", dim="300"),
Copy link

Copilot AI Aug 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Type mismatch: The dim parameter is passed as a string "300" but the GloVe constructor expects an integer. This should be dim=300 without quotes.

Suggested change
"glove.42B.300d": partial(GloVe, name="42B", dim="300"),
"glove.840B.300d": partial(GloVe, name="840B", dim="300"),
"glove.twitter.27B.25d": partial(GloVe, name="twitter.27B", dim="25"),
"glove.twitter.27B.50d": partial(GloVe, name="twitter.27B", dim="50"),
"glove.twitter.27B.100d": partial(GloVe, name="twitter.27B", dim="100"),
"glove.twitter.27B.200d": partial(GloVe, name="twitter.27B", dim="200"),
"glove.6B.50d": partial(GloVe, name="6B", dim="50"),
"glove.6B.100d": partial(GloVe, name="6B", dim="100"),
"glove.6B.200d": partial(GloVe, name="6B", dim="200"),
"glove.6B.300d": partial(GloVe, name="6B", dim="300"),
"glove.42B.300d": partial(GloVe, name="42B", dim=300),
"glove.840B.300d": partial(GloVe, name="840B", dim=300),
"glove.twitter.27B.25d": partial(GloVe, name="twitter.27B", dim=25),
"glove.twitter.27B.50d": partial(GloVe, name="twitter.27B", dim=50),
"glove.twitter.27B.100d": partial(GloVe, name="twitter.27B", dim=100),
"glove.twitter.27B.200d": partial(GloVe, name="twitter.27B", dim=200),
"glove.6B.50d": partial(GloVe, name="6B", dim=50),
"glove.6B.100d": partial(GloVe, name="6B", dim=100),
"glove.6B.200d": partial(GloVe, name="6B", dim=200),
"glove.6B.300d": partial(GloVe, name="6B", dim=300),

Copilot uses AI. Check for mistakes.
Examples:
>>> examples = ['chip', 'baby', 'Beautiful']
>>> vec = text.vocab.GloVe(name='6B', dim=50)
>>> ret = vec.get_vecs_by_tokens(tokens, lower_case_backup=True)
Copy link

Copilot AI Aug 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Variable name inconsistency in docstring: The parameter name is 'examples' in the previous line but 'tokens' is used here. Should be 'ret = vec.get_vecs_by_tokens(examples, lower_case_backup=True)' for consistency.

Suggested change
>>> ret = vec.get_vecs_by_tokens(tokens, lower_case_backup=True)
>>> ret = vec.get_vecs_by_tokens(examples, lower_case_backup=True)

Copilot uses AI. Check for mistakes.
@h-munakata
Copy link
Contributor Author

Suggestions by copilot are trivial and I ignore them.
@awkrail
After you check this PR, I'll merge it.

@awkrail
Copy link
Contributor

awkrail commented Aug 12, 2025

@h-munakata LGTM.

@h-munakata h-munakata merged commit a6bd6b4 into line:main Aug 12, 2025
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants