@srowen
Contributor

srowen commented Apr 26, 2025

Running the code as-is today, I hit a few small errors, even against the given example doc:

wget -O downloaded/1601.00978.tar.gz https://arxiv.org/e-print/1601.00978 --user-agent "Name <email>"
python main.py
  • Some data/packages files don't have an includes key. The code needs to account for this.
  • parse_snippet operates on envs entries; it seems to assume they're strings but they're dicts. I think it intends to operate on the name of each entry?

Making the changes below resulted in an apparently successful run, anyway.
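
The diff itself is not reproduced in this conversation. As a rough illustration only, the two points above could be handled along these lines. The helper names `get_includes` and `env_name` are hypothetical, and the assumption that each `envs` entry is a dict with a `name` field comes from the guess in the comment above, not from the project's code:

```python
def get_includes(package_def):
    """Return the value under 'includes', or an empty list when the key is absent."""
    # Hypothetical helper: the real parser may read this key differently.
    return package_def.get("includes", [])


def env_name(entry):
    """Return the environment name from an envs entry.

    Assumption: an entry is either a bare string or a dict with a 'name' field.
    """
    if isinstance(entry, str):
        return entry
    if isinstance(entry, dict) and "name" in entry:
        return entry["name"]
    raise TypeError(f"unsupported envs entry: {entry!r}")


# Made-up sample data, not taken from an actual data/packages file.
sample = {"envs": [{"name": "theorem"}, "proof"]}
print(get_includes(sample))
print([env_name(e) for e in sample["envs"]])
```

Raising on an unrecognized entry, rather than skipping it, keeps any further format change visible instead of silently dropping environments.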

@Fireblossom
Collaborator

Hi @srowen,

Thanks so much for taking a closer look at the code and fixing it!

You're right that the code was failing on some package files.
After reviewing your changes and digging a bit deeper, I found that the structure of the definition files was changed in a recent upstream commit.

Specifically:

  1. The includes key, which my original code was looking for, has been renamed to deps.
  2. The structure of macros and envs also appears to have changed in their definitions, which may affect parsing.

So, while your proposed change to make the includes key optional would prevent the immediate crash, it would unfortunately cause the parser to miss the dependency information and potentially some macro/environment specs from files using this newer format.

I think the ideal solution would be to:

  1. Update the parser to look for the deps key instead of the includes key.
  2. Review and adjust the parsing for the macros and envs structure.
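
A minimal sketch of the first step, assuming each definition file is loaded into a dict. The names `read_dependencies` and `summarize_keys` are hypothetical, and the list-of-strings values are placeholders, since the actual shape of `deps` is not described in this thread. Falling back to `includes` is an optional extra so that older files still parse:

```python
from collections import Counter


def read_dependencies(package_def):
    """Return (key_used, value), preferring 'deps' and falling back to 'includes'."""
    for key in ("deps", "includes"):
        if key in package_def:
            return key, package_def[key]
    # Neither key present: report that no dependency information was found.
    return None, []


def summarize_keys(package_defs):
    """Count how many definitions use each key, to see which format is in play."""
    return Counter(read_dependencies(d)[0] for d in package_defs)


# Placeholder data for illustration only.
defs = [{"deps": ["amsmath"]}, {"includes": ["graphicx"]}, {}]
for d in defs:
    print(read_dependencies(d))
print(summarize_keys(defs))
```

Counting which key each file uses gives a quick check on whether any files in the old format remain after updating the definitions.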

I will fix it asap.

Best,
Fireblossom

@srowen
Contributor Author

srowen commented Jun 18, 2025 via email