feat(engine): optionally automatically drain old runners on new runner version connected #3675
PR Review: Auto-drain old runners on version upgrade
Summary
This PR adds an optional
Code Quality & Best Practices
✅ Strengths
4f348fe to
d9e7133
Compare
d9e7133 to
f160185
Compare
Pull Request Review: feat(engine): optionally automatically drain old runners on new runner version connected
Overview
This PR implements automatic draining of older runner versions when a new version connects. The feature is controlled by a new
✅ Strengths
🔍 Issues Found
1. Duplicate drain logic in workflow (runner2.rs:66-81)
Severity: Medium
The drain operation is executed in both the workflow activity AND within the operation itself:
// In workflow (runner2.rs:66-81)
let drain_result = ctx
    .activity(DrainOlderVersionsInput { ... })
    .await?;
for workflow_id in drain_result.older_runner_workflow_ids {
    ctx.signal(Stop { ... })
        .to_workflow_id(workflow_id)
        .send()
        .await?;
}
// In operation (drain.rs:100-108)
if input.send_runner_stop_signals {
    for workflow_id in &older_runners {
        ctx.signal(crate::workflows::runner2::Stop { ... })
            .to_workflow_id(*workflow_id)
            .send()
            .await?;
    }
}
This creates redundant signal sending. The workflow receives the list of workflow IDs and sends Stop signals, but the operation also sends signals if
Recommendation: Either:
The current code at drain.rs:100 sends signals when
2. Missing workflow input parameter (runner2.rs:67-72)
Severity: Low
The
ctx.activity(DrainOlderVersionsInput {
    namespace_id: input.namespace_id,
    name: input.name.clone(),
    version: input.version,
})
But
Recommendation: Add explicit
3. Hard-coded test values (api_runner_configs_*.rs)
Severity: Low
All test files now have
// api_runner_configs_list.rs:44, 99, 152, 158, etc.
drain_on_version_upgrade: true,
These tests are about listing/upserting configs, not about drain behavior.
Recommendation: Use
4. Inconsistent protocol version handling (test_runner/protocol.rs:6-18)
Severity: Low
The test runner protocol helper now uses MK2, but the comment and function naming could be clearer:
pub const PROTOCOL_VERSION: u16 = rp::PROTOCOL_MK2_VERSION;
/// Helper to decode messages from server (MK2)
pub fn decode_to_client(buf: &[u8], protocol_version: u16) -> Result<rp2::ToClient> {
Good that it's marked as MK2, but the protocol_version parameter is accepted without being used in validation.
Recommendation: Consider adding a protocol version check or document why it's safe to accept any version.
5. Runner wait time increase (test_runner/runner.rs:263-269)
Severity: Info
Added a 2-second sleep after runner becomes ready:
// In MK2, we need to wait for the workflow to process the Init signal
// and mark the runner as eligible for actor allocation.
// This can take some time due to workflow processing:
// 1. Workflow receives Init signal
// 2. Workflow executes MarkEligible activity
// 3. Database is updated with runner allocation index
tokio::time::sleep(Duration::from_millis(2000)).await;
This is well-commented but seems like a test-only workaround. In production, is there a better way to wait for eligibility status rather than a fixed sleep?
Recommendation: Consider if this reveals a gap in the API (e.g., missing ready/eligible status that could be polled).
🎯 Performance Considerations
🔒 Security Considerations
✅ No security issues identified. The feature:
📚 Documentation & Code Quality
🧪 Test Coverage
Excellent coverage with 4 comprehensive test scenarios:
Missing edge cases:
🐛 TypeScript Changes
The TypeScript changes look minimal and correct:
No issues found in TS code.
📋 Recommendations Summary
Must Fix:
Should Fix:
Nice to Have:
✅ Verdict
This is a well-implemented feature with strong test coverage. The main concern is the potential duplicate signal sending between the workflow and operation. Once that's clarified/fixed, this should be good to merge.
Estimated risk: Low (mostly opt-in feature with good defaults)
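On issue 5 (the fixed 2-second sleep), one possible alternative is to poll a readiness condition with a deadline instead of sleeping for a fixed duration. The following is a minimal sketch only: `wait_until` is a hypothetical helper, not part of the PR or of rivetkit, and it uses blocking `std` calls for self-containment, whereas the test runner itself is async (tokio). Whether an eligibility status exists that could be polled is exactly the open question the review raises.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical helper (not from the PR): polls `is_ready` until it returns
/// true or `timeout` elapses. Returns true if the condition was observed
/// before the deadline, false otherwise.
fn wait_until<F>(mut is_ready: F, timeout: Duration, interval: Duration) -> bool
where
    F: FnMut() -> bool,
{
    let deadline = Instant::now() + timeout;
    loop {
        // Check first so an already-ready condition returns immediately.
        if is_ready() {
            return true;
        }
        let now = Instant::now();
        if now >= deadline {
            return false;
        }
        // Sleep for one interval, but never past the deadline.
        thread::sleep(interval.min(deadline - now));
    }
}
```

In a test, the closure would query whatever signal indicates that the runner has been marked eligible for actor allocation; the call returns as soon as that is observed, so the test waits only as long as needed and fails explicitly on timeout rather than racing a fixed delay.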
f160185 to
5d41ee6
Compare
5d41ee6 to
e4476fd
Compare
…r version connected
e4476fd to
69ed9ac
Compare
99737e0 to
47a8179
Compare
…r version connected (#3675)
Pull Request Review: Automatic Runner Draining on Version Upgrade
Overview
This PR adds an optional
✅ Strengths
🔍 Code Quality Issues
1. Missing Import Statements (Critical)
In
2. Inconsistent Error Handling
In
3. Potential Race Condition
In
4. Test Reliability Concern
In
🔒 Security
No critical security issues identified.
⚡ Performance
📊 Test Coverage
Excellent coverage with 6 scenarios. Missing edge cases: simultaneous connections, active drain conflicts, config updates during lifecycle.
🐛 Potential Bugs
🎯 Recommendations
High Priority:
Medium Priority:
Low Priority:
Summary
Well-implemented feature with good test coverage. Main issues are code style violations and test reliability concerns. Functionally sound but would benefit from improvements before merging.
Review generated by Claude Code

No description provided.