@MarcosNicolau MarcosNicolau commented Dec 17, 2025

Description

When aggregating proofs, we mark all the proofs to be aggregated as processing so that no other proof aggregator picks them up. The problem is that if the proof aggregator fails at any step, the tasks' status stays in processing and no other aggregator will ever pick them up.

The proposed solution is:

  1. After an error, the proof aggregator tries to set the tasks' status back to pending, so they become available to other aggregators.
  2. If step 1 fails, or the machine was shut down in the middle of the process, we check the new status_updated_at field on the task. If 12 hours have passed since the status was updated to processing, the task is also eligible for an aggregator to take.
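
The eligibility rule described in step 2 can be sketched as a small predicate. This is a minimal illustration, not the code in this PR: the `Task` struct, the `is_claimable` function, and the `PROCESSING_TIMEOUT` constant are hypothetical names, and `std::time::SystemTime` stands in for the `DateTime<Utc>` field so the example needs no external crates.

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical mirror of the task status; only the two states
/// named in the description are modelled.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TaskStatus {
    Pending,
    Processing,
}

/// Assumed constant name; the 12-hour value comes from the description.
const PROCESSING_TIMEOUT: Duration = Duration::from_secs(12 * 60 * 60);

/// Hypothetical in-memory task; the real one is a database row.
struct Task {
    status: TaskStatus,
    status_updated_at: SystemTime,
}

/// A task can be taken if it is pending, or if it has been stuck in
/// processing for at least the timeout.
fn is_claimable(task: &Task, now: SystemTime) -> bool {
    match task.status {
        TaskStatus::Pending => true,
        TaskStatus::Processing => now
            .duration_since(task.status_updated_at)
            .map(|elapsed| elapsed >= PROCESSING_TIMEOUT)
            // A timestamp in the future yields Err; treat it as not stale.
            .unwrap_or(false),
    }
}
```

In a real deployment this check would more likely live in the query that selects tasks, so that selecting and claiming happen atomically; the predicate above only illustrates the rule itself.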

How to test

  1. Start the ethereum package:
make ethereum_package_start
  2. Start the gateway:
make agg_mode_gateway_start_ethereum_package
make agg_mode_payments_poller_start_ethereum_package
  3. Send proofs:
make agg_mode_gateway_send_payment
make agg_mode_gateway_send_sp1_proof
  4. Update the code to force an error in the proof aggregator, for example in aggregation_mode/proof_aggregator/src/backend/merkle_tree.rs:
pub fn compute_proofs_merkle_root(
    proofs: &[AlignedProof],
) -> Option<(MerkleTree<AlignedProof>, Vec<[u8; 32]>)> {
    // Forced failure, for testing only
    None
}
  5. Start the proof aggregator:
make proof_aggregator_start_ethereum_package
  6. Check that the proof aggregator fails and sets the proofs back to pending status.
  7. Another test you can do:
    • Go to Adminer at http://localhost:8090
    • Select the tasks, update one to processing status, and set the timestamp to at least 12 hours in the past (you can do that here).
    • Run the proof aggregator; it should fetch the task.
    • Repeat with a timestamp that is less than 12 hours in the past; in this case the proof aggregator should not select the task.
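
The first scenario above (a forced aggregation failure followed by a reset to pending) can be modelled in memory as follows. This is a hedged sketch under stated assumptions: `Task`, `TaskStatus`, and `run_aggregation_round` are hypothetical names, the aggregation step is passed in as a closure so a failure can be simulated by returning `None`, and the real aggregator works against database rows rather than a slice.

```rust
/// Hypothetical mirror of the task status used in this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TaskStatus {
    Pending,
    Processing,
}

/// Hypothetical in-memory task.
#[derive(Debug)]
struct Task {
    status: TaskStatus,
}

/// Claims every pending task, runs the aggregation step, and releases
/// the claimed tasks back to pending if that step fails.
fn run_aggregation_round(
    tasks: &mut [Task],
    aggregate: impl Fn(&[Task]) -> Option<[u8; 32]>,
) -> Option<[u8; 32]> {
    // Remember which tasks this round claimed, so that tasks held by
    // another aggregator are left untouched on failure.
    let claimed: Vec<usize> = tasks
        .iter()
        .enumerate()
        .filter(|(_, t)| t.status == TaskStatus::Pending)
        .map(|(i, _)| i)
        .collect();
    for &i in &claimed {
        tasks[i].status = TaskStatus::Processing;
    }

    let result = aggregate(&*tasks);

    if result.is_none() {
        // Failure path: make the claimed tasks available again.
        for &i in &claimed {
            tasks[i].status = TaskStatus::Pending;
        }
    }
    result
}
```

Calling `run_aggregation_round(&mut tasks, |_| None)` mimics the patched `compute_proofs_merkle_root`: the claimed tasks end up pending again, while a task that was already in processing keeps its status. What happens to the status after a successful round is outside this sketch.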

Type of change

Please delete options that are not relevant.

  • New feature

Checklist

  • “Hotfix” to testnet, everything else to staging
  • Linked to Github Issue
  • This change depends on code or research by an external entity
    • Acknowledgements were updated to give credit
  • Unit tests added
  • This change requires new documentation.
    • Documentation has been added/updated.
  • This change is an Optimization
    • Benchmarks added/run
  • Has a known issue
  • If your PR changes the Operator compatibility (Ex: Upgrade prover versions)
    • This PR adds operator compatibility for both versions and does not change crates/docs/examples
    • This PR updates the batcher and docs/examples to the newer version. This requires that operators are already updated to be compatible

@MarcosNicolau MarcosNicolau changed the base branch from testnet to staging December 17, 2025 19:39
@MarcosNicolau MarcosNicolau self-assigned this Dec 17, 2025
@MarcosNicolau MarcosNicolau requested a review from JuArce December 18, 2025 17:32
@MarcosNicolau MarcosNicolau requested a review from JuArce December 22, 2025 18:09
pub program_commitment: Vec<u8>,
pub merkle_path: Option<Vec<u8>>,
pub status: TaskStatus,
pub status_updated_at: Option<DateTime<Utc>>,
Suggested change
- pub status_updated_at: Option<DateTime<Utc>>,
+ pub status_updated_at: DateTime<Utc>,
