

@beckobert (Contributor) commented Dec 20, 2024

There seems to be a problem when using preprocessed datasets in combination with multihead models.
When a structure is read from an `HDF5Dataset`, it is first loaded into a `Configuration`. The `Configuration` is initialized without specifying a head, so its head falls back to the default value `"Default"`. The correct head stored in the `HDF5Dataset` is then only assigned if `configuration.head is None`, which, because of that default, is never the case. As a result, every structure loaded from a preprocessed file ends up on the `"Default"` head.
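The failure mode can be reproduced with a minimal, self-contained sketch. The `Configuration` dataclass and the `load_config_buggy` helper below are hypothetical stand-ins, not the project's actual classes, and a plain dict (with an assumed `"head"` key) plays the role of one structure stored in the HDF5 file. The point is only that a `None` guard can never fire once the field already has a non-`None` default.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Configuration:
    # Hypothetical stand-in: as described above, `head` falls back to
    # "Default" whenever it is not passed explicitly.
    atomic_numbers: list
    head: Optional[str] = "Default"


def load_config_buggy(group: dict) -> Configuration:
    # `group` is a plain dict standing in for one structure in the HDF5 file
    # (the "head" key name is an assumption for illustration).
    config = Configuration(atomic_numbers=group["atomic_numbers"])
    # Buggy guard: head already equals "Default", so this branch never runs
    # and the head stored in the file is silently ignored.
    if config.head is None:
        config.head = group.get("head", "Default")
    return config


stored = {"atomic_numbers": [1, 1, 8], "head": "pbe"}
print(load_config_buggy(stored).head)  # prints "Default", not "pbe"
```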

This pull request fixes that by always setting the head to the value stored in the `HDF5Dataset`, and to `"Default"` if no head is stored (consistent with how heads are assigned when a `Configuration` is converted into `AtomicData`).
In principle, this assignment could also be moved into the initialization of the `Configuration`.
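A sketch of the fixed behaviour, again using hypothetical stand-ins rather than the project's real classes: the stored head is read unconditionally and passed straight into the constructor (which corresponds to moving the assignment into the initialization), with `"Default"` used only when nothing was stored. The bytes check is a defensive assumption, since h5py may return stored strings as `bytes`.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Configuration:
    # Hypothetical stand-in; `head` defaults to "Default" as described above.
    atomic_numbers: list
    head: Optional[str] = "Default"


def load_config_fixed(group: dict) -> Configuration:
    # `group` is a plain dict standing in for one stored structure.
    head: Union[str, bytes, None] = group.get("head")
    if isinstance(head, bytes):
        # Strings read back from HDF5 may arrive as bytes; decode them first.
        head = head.decode("utf-8")
    # Always use the stored head; fall back to "Default" only if none was saved.
    return Configuration(
        atomic_numbers=group["atomic_numbers"],
        head=head or "Default",
    )


print(load_config_fixed({"atomic_numbers": [1, 1, 8], "head": b"pbe"}).head)  # prints "pbe"
print(load_config_fixed({"atomic_numbers": [1, 1, 8]}).head)  # prints "Default"
```

Passing the head into the constructor avoids relying on a sentinel value altogether, so the default can no longer mask the stored value.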

There is also a second problem, independent of multihead training, with test sets that are preprocessed using multiple processes. Contrary to what the documentation states and what run_train.py expects, the per-process files were not saved in their own directory, but in the same directory under different file names.
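The expected layout can be illustrated with a small sketch. The helper names `shard_paths` and `collect_test_shards`, the `test` subdirectory, and the `<name>_<i>.h5` file pattern are all hypothetical choices for illustration, not the actual names used by the preprocessing script or run_train.py. The idea is that each worker writes its shard into a dedicated directory, so a consumer can pick up the whole test set by listing that directory.

```python
import tempfile
from pathlib import Path


def shard_paths(out_dir: str, test_name: str, num_process: int) -> list:
    # Hypothetical helper: one output file per worker, all placed inside a
    # dedicated test directory instead of next to the training files.
    test_dir = Path(out_dir) / "test"
    test_dir.mkdir(parents=True, exist_ok=True)
    return [test_dir / f"{test_name}_{i}.h5" for i in range(num_process)]


def collect_test_shards(out_dir: str) -> list:
    # A consumer only needs the directory: every shard is found by globbing
    # instead of guessing the individual file names.
    return sorted((Path(out_dir) / "test").glob("*.h5"))


with tempfile.TemporaryDirectory() as tmp:
    shards = shard_paths(tmp, "test", num_process=3)
    for shard in shards:
        shard.touch()  # stand-in for writing one worker's HDF5 data
    found = collect_test_shards(tmp)
    print([p.name for p in found])  # ['test_0.h5', 'test_1.h5', 'test_2.h5']
```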

@beckobert changed the title from "Correct assignment of head" to "Fix some problems with preprocessed datasets" on Dec 27, 2024.
