Skip to content

Conversation

@MasterPtato
Copy link
Contributor

No description provided.

@vercel
Copy link

vercel bot commented Jan 6, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

4 Skipped Deployments
Project Deployment Review Updated (UTC)
rivet-cloud Ignored Ignored Jan 6, 2026 3:16am
rivet-inspector Ignored Ignored Preview Jan 6, 2026 3:16am
rivet-site Ignored Ignored Preview Jan 6, 2026 3:16am
rivetkit-serverless Skipped Skipped Jan 6, 2026 3:16am

@vercel vercel bot temporarily deployed to Preview – rivetkit-serverless January 6, 2026 03:15 Inactive
Copy link
Contributor Author

MasterPtato commented Jan 6, 2026

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
Learn more


How to use the Graphite Merge Queue

Add the label merge-queue to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@NathanFlurry NathanFlurry changed the base branch from main to graphite-base/3755 January 6, 2026 03:16
@NathanFlurry NathanFlurry force-pushed the 01-05-fix_fix_deadlock_in_ups branch from 5adffbf to a4a86f0 Compare January 6, 2026 03:16
@vercel vercel bot temporarily deployed to Preview – rivetkit-serverless January 6, 2026 03:16 Inactive
@NathanFlurry NathanFlurry changed the base branch from graphite-base/3755 to 01-05-fix_engine_update_metrics_to_prometheus January 6, 2026 03:16
@graphite-app
Copy link
Contributor

graphite-app bot commented Jan 6, 2026

Merge activity

  • Jan 6, 3:16 AM UTC: NathanFlurry added this pull request to the Graphite merge queue.
  • Jan 6, 3:17 AM UTC: CI is running for this pull request on a draft pull request (#3758) due to your merge queue CI optimization settings.
  • Jan 6, 3:18 AM UTC: Merged by the Graphite merge queue via draft PR: #3758.

@claude
Copy link

claude bot commented Jan 6, 2026

Code Review - PR #3755: Fix deadlock in UPS

Summary

This PR fixes a critical deadlock issue in the UniversalPubSub system and adds tracing instrumentation. The changes are focused and well-targeted.


Strengths

  1. Critical Deadlock Fix (line 384)

    • The change from remove_async(&subject).await to sender.remove() is correct
    • The deadlock occurred because remove_async() requires holding an async lock on the HashMap while in a Drop implementation, which runs in a spawned task context
    • Using sender.remove() (called on the OccupiedEntry) is the right approach as it removes the entry without requiring an additional async lock
    • This aligns with the scc HashMap API design for concurrent operations
  2. Improved Observability

    • Adding #[tracing::instrument] to key methods enables better debugging
    • Structured logging with field extraction (e.g., fields(%subject)) follows best practices
    • The skip_all parameter prevents logging large payloads while still tracking operation flow
  3. Span Propagation Fix (line 459)

    • Adding .in_current_span() to self.driver.next() ensures the tracing context propagates correctly through the tokio::select! macro
    • Without this, the driver's operations would lose their parent span context
  4. Import Hygiene (line 5)

    • Changing from use anyhow::*; to use anyhow::Result; follows the project's CLAUDE.md guidelines
    • ✅ Explicitly states: "Do not glob import (::*) from anyhow. Instead, import individual types and traits"

🔍 Observations & Considerations

  1. Error Handling in Drop (line 384)

    • The let _ = sender.remove(); discards any potential error
    • This is acceptable in a Drop context where error handling is limited, but worth noting
    • Consider if there are any error cases from remove() that should be logged (though likely not critical)
  2. Race Condition in Drop (lines 382-388)

    • There's a TOCTOU (time-of-check-time-of-use) issue: checking receiver_count() == 0 and then calling sender.remove() are separate operations
    • Between the check and remove, another subscriber could theoretically be created
    • However, this appears to be acceptable given:
      • The GC task also cleans up zero-receiver entries (lines 69-74)
      • The worst case is a minor memory leak until the next GC cycle
      • The fast-path optimization is maintained
  3. Instrumentation Completeness

    • Most public and important private methods now have instrumentation
    • Consider if Subscriber::new() (line 336) needs instrumentation for lifecycle tracking
    • The constructor is simple, so likely fine as-is
  4. Testing

    • No test changes included - this is fine for a deadlock fix
    • The deadlock would only manifest under specific concurrent conditions
    • Consider manual testing under load to verify the fix

🎯 Recommendations

  1. Optional: Add trace logging in Drop (line 384)

    • Would help diagnose any unexpected removal failures
    • Though errors here should be rare
  2. Consider documentation

    • Add a brief comment explaining why sender.remove() is used instead of remove_async() to prevent future refactoring mistakes

🔒 Security, Performance & Test Coverage

  • Security: No security concerns. The fix reduces the attack surface by eliminating a deadlock vector.
  • Performance: Positive impact - removes blocking async operations in Drop, reducing contention.
  • Test Coverage: No tests changed. For a deadlock fix, this is expected as deadlocks are difficult to test reliably.

Verdict

LGTM - This is a well-executed fix that addresses the core deadlock issue while improving observability. The changes follow the project conventions and best practices outlined in CLAUDE.md.

The deadlock fix is the critical change, and it's implemented correctly. The tracing additions are a nice improvement for debugging production issues.

graphite-app bot pushed a commit that referenced this pull request Jan 6, 2026
@graphite-app graphite-app bot closed this Jan 6, 2026
@graphite-app graphite-app bot deleted the 01-05-fix_fix_deadlock_in_ups branch January 6, 2026 03:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants