@LucaMantani LucaMantani commented Oct 9, 2025

This PR improves batching, addressing issue #404.

The idea is to precompute the inverse covariance matrices when the batch partition is fixed. It also adds an option to reshuffle the batches at each epoch.

Data batches can be controlled from the runcard with the following options:

batch_size: 128
batch_seed: 3
shuffle_each_epoch: False
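For intuition, batch generation driven by these options might look like the sketch below. The helper name `make_batches` and its signature are hypothetical, not the actual implementation; only the seeded shuffle and the fixed partition are taken from the description above.

```python
import numpy as np

def make_batches(n_data, batch_size, seed):
    """Partition dataset indices into batches after one seeded shuffle.

    Hypothetical helper: name and signature are illustrative only.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_data)
    return [idx[i:i + batch_size] for i in range(0, n_data, batch_size)]

# With shuffle_each_epoch: False, this partition is computed once and
# reused every epoch, so per-batch quantities (like the inverse of the
# covariance block) can be precomputed instead of rebuilt every step.
batches = make_batches(n_data=300, batch_size=128, seed=3)
```

With 300 data points and batch_size 128 this gives three batches of sizes 128, 128 and 44, identical across epochs.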

Main implementations:

  • The stream of DataBatches now yields BatchSpec objects containing both the batch indices and, when available, the precomputed inv_cov.
  • The data_batches node now lets the user decide whether the batches should be reshuffled at each epoch. If they are not, and a fit_covariance matrix is provided, the inv_cov corresponding to each batch is precomputed.
  • The likelihood class call method can now receive a BatchSpec object, performing the slicing and using the precomputed inv_cov when available. If no batch is provided, it behaves as before. Note: currently only the MC method uses batching, but with this modification any method could use it; e.g. the Hessian method might benefit in principle, since it also uses gradient_descent.
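The bullets above can be sketched as follows. The class name BatchSpec and the idea of carrying a precomputed inv_cov come from the description; the exact fields, helper names, and chi-square form here are assumptions for illustration, not the PR's actual API.

```python
from dataclasses import dataclass
from typing import Optional

import numpy as np

@dataclass
class BatchSpec:
    """Indices of one batch plus, optionally, its precomputed inverse covariance."""
    indices: np.ndarray
    inv_cov: Optional[np.ndarray] = None

def precompute_specs(cov, batches):
    """Invert each batch's covariance block once, up front.

    Only valid when the partition is fixed (shuffle_each_epoch: False).
    """
    return [
        BatchSpec(idx, np.linalg.inv(cov[np.ix_(idx, idx)]))
        for idx in batches
    ]

def chi2(residuals, cov, spec):
    """Chi-square on one batch, reusing the precomputed inverse if present."""
    r = residuals[spec.indices]
    inv = spec.inv_cov
    if inv is None:
        # Shuffled batches: invert the sliced covariance block on the fly.
        inv = np.linalg.inv(cov[np.ix_(spec.indices, spec.indices)])
    return r @ inv @ r
```

Precomputing the inversions once per fit, instead of once per gradient step, is the plausible source of the large speedup reported below.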

@LucaMantani LucaMantani marked this pull request as draft October 9, 2025 14:21
@LucaMantani LucaMantani marked this pull request as ready for review November 10, 2025 11:31

codecov bot commented Nov 10, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 95.57%. Comparing base (cf194c6) to head (5debb01).
⚠️ Report is 5 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #409      +/-   ##
==========================================
+ Coverage   95.54%   95.57%   +0.02%     
==========================================
  Files          29       29              
  Lines        1438     1468      +30     
==========================================
+ Hits         1374     1403      +29     
- Misses         64       65       +1     


@LucaMantani LucaMantani linked an issue Nov 10, 2025 that may be closed by this pull request
@LucaMantani
Member Author

Running the attached card (full DIS) with

les_houches_exe lh_batching.yaml -rep 1

in main takes:

[INFO]: MONTE CARLO RUNNING TIME: 389.940048 s

while in this PR it takes

[INFO]: MONTE CARLO RUNNING TIME: 6.005040 s

The results are identical.

lh_batching.yaml

@LucaMantani LucaMantani requested a review from comane November 10, 2025 12:19
Collaborator

@vschutze-alt vschutze-alt left a comment


I've made some small changes to the documentation. Other than that, it looks good to me.


Development

Successfully merging this pull request may close these issues.

Optimise batching

3 participants