Summary
During fault-injection testing for the Cosmos DB Rust SDK, gaps and inconsistencies were identified in the current retry logic compared to other SDKs.
Current Behavior
- No retries are performed for HTTP 408 (Request Timeout).
- Retries for 500, 503, and 410 (substatus 1022) are limited:
  - Only a single retry is attempted.
  - Retries are not executed across all eligible regions.
- 410-1022 does not occur in gateway mode and is handled explicitly.
- Retry behavior differs from other SDKs (e.g., Python), which retry a broader set of transient errors across regions.
Proposed Changes
1. Retry by Category Instead of Narrow Allowlist
- Retry based on error classes, not specific hand-picked status codes.
- Treat the following as retriable by default:
  - All 5xx errors
  - 408
  - 410 (substatus 1022)
- Maintain a blocklist of non-retriable errors instead.
- This aligns with:
  - Python SDK behavior
  - Envoy-style retry semantics
- Improves resilience to future infrastructure or status-code changes.
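To make the category-plus-blocklist idea concrete, here is a minimal sketch. `RetryDecision`, `classify`, and `NON_RETRIABLE` are hypothetical names, not existing SDK types, and the single blocklist entry is a placeholder assumption rather than a vetted list.

```rust
/// Hypothetical outcome of classifying a failed response (not an SDK type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RetryDecision {
    Retry,
    DoNotRetry,
}

/// Illustrative blocklist of (status, optional sub-status) pairs that are
/// never retried. A `None` sub-status matches any sub-status.
/// ASSUMPTION: the entry below is a placeholder, not a reviewed list.
const NON_RETRIABLE: &[(u16, Option<u32>)] = &[
    (501, None), // Not Implemented: a retry cannot succeed
];

/// Classify by error category first, then let the blocklist override.
pub fn classify(status: u16, sub_status: Option<u32>) -> RetryDecision {
    let blocked = NON_RETRIABLE
        .iter()
        .any(|&(s, sub)| s == status && (sub.is_none() || sub == sub_status));
    if blocked {
        return RetryDecision::DoNotRetry;
    }

    let retriable = match (status, sub_status) {
        (500..=599, _) => true,    // all 5xx errors
        (408, _) => true,          // request timeout
        (410, Some(1022)) => true, // 410 with substatus 1022
        _ => false,
    };

    if retriable {
        RetryDecision::Retry
    } else {
        RetryDecision::DoNotRetry
    }
}
```

With this shape, a new 5xx code is retried without a code change, and opting a code out only requires a blocklist entry.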
2. Retry Across All Eligible Regions
- Retry across all applicable regions, excluding explicitly excluded ones.
- Respect preferred region order.
- Matches retry semantics in other Cosmos DB SDKs.
- Improves availability under regional or transient faults.
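A rough sketch of the region walk described above, assuming regions are plain name strings. `retry_region_order` and `try_regions` are hypothetical helpers, not SDK APIs, and the sketch is synchronous for brevity; the real client would be async and would also apply backoff.

```rust
/// Build the retry order: preferred regions in their given order,
/// skipping excluded regions and duplicates.
pub fn retry_region_order<'a>(preferred: &[&'a str], excluded: &[&str]) -> Vec<&'a str> {
    let mut order: Vec<&'a str> = Vec::new();
    for &region in preferred {
        let is_excluded = excluded.iter().any(|e| *e == region);
        if !is_excluded && !order.contains(&region) {
            order.push(region);
        }
    }
    order
}

/// Try each region in order until one succeeds or a non-retriable error
/// occurs. Returns `None` only when `regions` is empty; otherwise returns
/// the first success, the first non-retriable error, or the last
/// retriable error once every region has been tried.
pub fn try_regions<T, E>(
    regions: &[&str],
    mut attempt: impl FnMut(&str) -> Result<T, E>,
    is_retriable: impl Fn(&E) -> bool,
) -> Option<Result<T, E>> {
    let mut last = None;
    for &region in regions {
        match attempt(region) {
            Ok(value) => return Some(Ok(value)),
            Err(e) if !is_retriable(&e) => return Some(Err(e)),
            Err(e) => last = Some(Err(e)),
        }
    }
    last
}
```

For example, with preferred regions `["West US", "East US", "North Europe"]` and `"East US"` excluded, a 503 from West US moves the request on to North Europe instead of failing after a single retry.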
3. Different Defaults for Reads vs Writes
- Reads:
  - Retry by default.
  - Minimal blocklist.
- Writes:
  - More conservative.
  - Curated set of retriable status codes to avoid side effects.
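One possible shape for the split defaults is sketched below. `OperationKind`, `should_retry`, and both lists are hypothetical. In particular, the write allowlist contents are placeholder assumptions, since the curated set is still an open question in this issue.

```rust
/// Hypothetical operation kind; not an SDK type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OperationKind {
    Read,
    Write,
}

/// (status, optional sub-status); a `None` sub-status matches any sub-status.
type StatusPattern = (u16, Option<u32>);

/// Reads: retry by category, minus a minimal blocklist (empty in this sketch).
const READ_NON_RETRIABLE: &[StatusPattern] = &[];

/// Writes: retry only a curated allowlist.
/// ASSUMPTION: these entries are placeholders, not a reviewed set.
const WRITE_RETRIABLE: &[StatusPattern] = &[(503, None), (410, Some(1022))];

fn matches_any(list: &[StatusPattern], status: u16, sub_status: Option<u32>) -> bool {
    list.iter()
        .any(|&(s, sub)| s == status && (sub.is_none() || sub == sub_status))
}

/// Transient by category: all 5xx, 408, and 410 with substatus 1022.
fn is_transient(status: u16, sub_status: Option<u32>) -> bool {
    matches!(
        (status, sub_status),
        (500..=599, _) | (408, _) | (410, Some(1022))
    )
}

pub fn should_retry(kind: OperationKind, status: u16, sub_status: Option<u32>) -> bool {
    match kind {
        OperationKind::Read => {
            is_transient(status, sub_status)
                && !matches_any(READ_NON_RETRIABLE, status, sub_status)
        }
        OperationKind::Write => matches_any(WRITE_RETRIABLE, status, sub_status),
    }
}
```

In this sketch a 408 on a write is not retried, because the request may already have been applied; whether that is the right default is exactly the kind of decision the curated set needs to settle.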
Motivation
- Improves consistency across Cosmos DB SDKs.
- Increases resilience to transient and regional failures.
- Simplifies retry logic by focusing on exceptions rather than enumerating every transient case.
- Reduces the risk of missing new transient status codes.
Open Questions
- Reads vs. writes:
  - Should reads and writes use different retry strategies by default?
  - What should the write retry blocklist include?
- Are there any Rust SDK-specific execution model concerns?