@aditya0by0 (Member) commented Jan 9, 2026

Fixes the following error, in which the weighted BCE loss cannot locate the data.pt file:

  File "/home/staff/a/akhedekar/python-chebai/chebai/models/base.py", line 217, in validation_step
    return self._execute(
  File "/home/staff/a/akhedekar/python-chebai/chebai/models/base.py", line 301, in _execute
    loss = self.criterion(loss_data, loss_labels, **loss_kwargs)
  File "/home/staff/a/akhedekar/miniconda3/envs/gnn/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/staff/a/akhedekar/miniconda3/envs/gnn/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/staff/a/akhedekar/python-chebai/chebai/loss/bce_weighted.py", line 101, in forward
    self.set_pos_weight(input)
  File "/home/staff/a/akhedekar/python-chebai/chebai/loss/bce_weighted.py", line 67, in set_pos_weight
    [
  File "/home/staff/a/akhedekar/python-chebai/chebai/loss/bce_weighted.py", line 71, in <listcomp>
    for row in self.data_extractor.load_processed_data(
  File "/home/staff/a/akhedekar/python-chebai/chebai/preprocessing/datasets/base.py", line 1325, in load_processed_data
    return self.load_processed_data_from_file(filename)
  File "/home/staff/a/akhedekar/python-chebai/chebai/preprocessing/datasets/base.py", line 1328, in load_processed_data_from_file
    return torch.load(os.path.join(filename), weights_only=False)
  File "/home/staff/a/akhedekar/miniconda3/envs/gnn/lib/python3.10/site-packages/torch/serialization.py", line 1425, in load
    with _open_file_like(f, "rb") as opened_file:
  File "/home/staff/a/akhedekar/miniconda3/envs/gnn/lib/python3.10/site-packages/torch/serialization.py", line 751, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "/home/staff/a/akhedekar/miniconda3/envs/gnn/lib/python3.10/site-packages/torch/serialization.py", line 732, in __init__
    super().__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: 'data.pt'
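Here is a small standard-library illustration of why the lookup fails. It is a sketch, and the temporary directory is a hypothetical stand-in for the dataset's processed directory. The last chebai frame calls torch.load(os.path.join(filename), ...). When os.path.join gets a single argument, it returns that argument unchanged, so the relative name is resolved against the current working directory:

```python
import os
import tempfile

# Hypothetical stand-in for the dataset's processed directory.
processed_dir = tempfile.mkdtemp()
with open(os.path.join(processed_dir, "data.pt"), "wb") as f:
    f.write(b"placeholder")

filename = "data.pt"

# With one argument, os.path.join is a no-op: the path stays relative,
# so open()/torch.load() would look for it in the current working directory.
buggy_path = os.path.join(filename)
print(buggy_path)  # data.pt

# Joining with the processed directory yields a path that actually exists.
fixed_path = os.path.join(processed_dir, filename)
print(os.path.isfile(fixed_path))  # True
```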

Context

In #89 (comment), it was agreed to introduce a new method load_processed_data_from_file that accepts only a filename as input and loads the corresponding file from the processed directory.

The intended call flow is:

  1. load_processed_data(kind="test") calls _retrieve_splits_from_csv.
  2. _retrieve_splits_from_csv calls load_processed_data_from_file(filename="data.pt"), a separate function with its own, narrower responsibility.

However, this change was missed in #92. As a result, the current implementation of load_processed_data_from_file requires the full file path to be passed in order to load the file. Most of the codebase was updated to call this method with the full path. The loss logic in set_pos_weight (chebai/loss/bce_weighted.py) was not updated, though. It still passes only the file name (via load_processed_data), so torch.load looks for data.pt in the current working directory and fails:

        if self.pos_weight is None:
            print(
                f"Computing loss-weights based on v{self.data_extractor.chebi_version} dataset (beta={self.beta})"
            )
            complete_labels = torch.concat(
                [
                    torch.stack(
                        [
                            torch.Tensor(row["labels"])
                            for row in self.data_extractor.load_processed_data(
                                filename=file_name
                            )
                        ]
                    )
                    for file_name in self.data_extractor.processed_file_names
                ]
            )

As the method name suggests, callers should only need to provide the file name. The method itself should construct the full path by resolving the name relative to the processed directory.
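Below is a minimal sketch of that behaviour. It assumes the dataset object exposes its processed directory as an attribute, named processed_dir here. The class name ProcessedDataSketch is hypothetical. pickle stands in for torch.load(..., weights_only=False) only so the example runs without PyTorch. The docstring shows one possible wording stating that only the file name is needed:

```python
import os
import pickle
import tempfile
from typing import Any


class ProcessedDataSketch:
    """Hypothetical stand-in for a dataset class; not chebai's actual API."""

    def __init__(self, processed_dir: str) -> None:
        # Assumed attribute holding the processed directory.
        self.processed_dir = processed_dir

    def load_processed_data_from_file(self, filename: str) -> Any:
        """Load a file from the processed directory.

        Only the file name is required, not the full path. For example,
        load_processed_data_from_file("data.pt") loads
        <processed_dir>/data.pt.
        """
        # Resolve the name relative to the processed directory instead of
        # relying on the current working directory.
        path = os.path.join(self.processed_dir, filename)
        # pickle replaces torch.load(path, weights_only=False) so the sketch
        # runs without PyTorch.
        with open(path, "rb") as f:
            return pickle.load(f)


# Usage: write a tiny processed file, then load it by name only.
tmp_dir = tempfile.mkdtemp()
with open(os.path.join(tmp_dir, "data.pt"), "wb") as f:
    pickle.dump([{"labels": [True, False]}], f)

dataset = ProcessedDataSketch(tmp_dir)
rows = dataset.load_processed_data_from_file("data.pt")
print(rows[0]["labels"])  # [True, False]
```

One design consequence: os.path.join discards earlier components when a later one is absolute. Call sites that currently pass an absolute path would therefore keep working. A relative path that already includes the processed directory, however, would be joined twice and should be replaced by the bare file name.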

aditya0by0 requested a review from sfluegel05 January 9, 2026 15:23
aditya0by0 marked this pull request as draft January 9, 2026 15:28
aditya0by0 marked this pull request as ready for review January 9, 2026 19:32
@sfluegel05 (Collaborator)

Thanks for fixing this. I came to the same conclusion: the handling here is inconsistent.
@aditya0by0 Could you also state in the docstring of load_processed_data_from_file that only the file name is required, not the full path (maybe with an example)?
@tim Could you try the fix from this branch and check whether it solves your problem?

aditya0by0 force-pushed the fix/file_not_found_for_loss branch from bdb7be7 to 89cb005 January 15, 2026 14:53
Development

Successfully merging this pull request may close these issues.

load_processed_data logic in GraphPropertiesMixIn executed twice due to recursive call pattern

3 participants