Conversation

@synhershko
@synhershko synhershko commented Dec 21, 2025

This commit introduces a new replica_only node role that provides read scalability without impacting the write path or triggering cluster rebalancing, enabling cost-effective horizontal scaling for read-heavy workloads.

Motivation

Organizations often need to scale read capacity independently from write capacity. Traditional approaches of adding more data nodes cause:

  • Unwanted shard rebalancing across the cluster
  • Increased write coordination overhead
  • Higher infrastructure costs for full-featured data nodes

The replica-only node type addresses these challenges by:

  • Providing read-only shard hosting without participating in primary shard allocation or rebalancing
  • Enabling cheap, ephemeral nodes that can be added/removed without cluster disruption
  • Supporting integration with object stores (S3, etc.) for pulling index data on-demand

Why Auto-Expand Replicas 0-all Only

The replica-only node type exclusively supports indices with index.auto_expand_replicas: 0-all for several critical reasons:

  1. Dynamic Replica Management: Auto-expand automatically adjusts replica counts when replica-only nodes join/leave, eliminating manual intervention and preventing under-replication

  2. No Manual Rebalancing: Without auto-expand, adding replica-only nodes would require manual replica count adjustments and could trigger rebalancing on data nodes

  3. Predictable Behavior: The 0-all setting guarantees one copy per eligible node, making replica distribution deterministic and transparent

  4. Operational Safety: Prevents accidental allocation of critical production indices to nodes that may be ephemeral or have different SLAs

High-Level Design

Core Components

  1. New Node Role (DiscoveryNodeRole.REPLICA_ONLY_ROLE)

    • Role name: replica_only
    • Dedicated role that cannot coexist with any other role

  2. Allocation Decider (ReplicaOnlyAllocationDecider)

    • Blocks ALL primary shard allocation to replica-only nodes
    • Blocks replica allocation unless the index has auto_expand_replicas: 0-all
    • Prevents force allocation of primaries (safety guarantee)

  3. Rebalancing Prevention (LocalShardsBalancer)

    • Excludes replica-only nodes from the rebalancing model entirely
    • Adding/removing replica-only nodes causes zero data node rebalancing
    • Maintains cluster balance stability

  4. Replica Promotion Prevention (RoutingNodes)

    • Blocks promotion of replicas to primaries on replica-only nodes
    • Cluster enters YELLOW/RED state when a primary fails and only replica-only nodes have copies
    • Ensures data integrity by requiring a regular data node for primaries
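The decision rules above reduce to a small truth table. The following is a simplified, standalone sketch of that logic only; the class, method names, and signatures here are illustrative, not the actual OpenSearch decider, which operates on RoutingAllocation, ShardRouting, and IndexMetadata:

```java
// Simplified, standalone sketch of the replica-only allocation rules.
// All names here are hypothetical; the real logic lives in
// ReplicaOnlyAllocationDecider and uses OpenSearch's allocation types.
public class ReplicaOnlyDecisionSketch {

    enum Decision { YES, NO }

    /**
     * Decide whether a shard copy may be placed on a node.
     *
     * @param isPrimary         true if the shard is a primary
     * @param nodeIsReplicaOnly true if the target node has the replica_only role
     * @param autoExpandSetting the index's auto_expand_replicas value, e.g. "0-all", or null
     */
    static Decision canAllocate(boolean isPrimary, boolean nodeIsReplicaOnly, String autoExpandSetting) {
        if (!nodeIsReplicaOnly) {
            return Decision.YES;                 // regular data nodes are unaffected
        }
        if (isPrimary) {
            return Decision.NO;                  // primaries never land on replica-only nodes
        }
        return "0-all".equals(autoExpandSetting)
            ? Decision.YES                       // replicas of 0-all indices are allowed
            : Decision.NO;                       // all other replicas are rejected
    }

    public static void main(String[] args) {
        System.out.println(canAllocate(true,  true,  "0-all")); // NO  (primary, replica-only node)
        System.out.println(canAllocate(false, true,  "0-all")); // YES (0-all replica)
        System.out.println(canAllocate(false, true,  "0-5"));   // NO  (non-0-all replica)
        System.out.println(canAllocate(true,  false, null));    // YES (regular data node)
    }
}
```

Per the design above, canRemain follows the same table, which is why a data node switching to replica_only sheds primaries and non-0-all replicas but keeps 0-all replicas.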

Design Concerns Addressed

1. Data Integrity and Cluster Health

What happens if primary fails and only replica-only nodes have copies?

Replicas on replica-only nodes NEVER promote to primaries. The cluster enters YELLOW/RED state and waits for a regular data node. This prevents data loss scenarios where an ephemeral node becomes the source of truth.

2. Rebalancing Isolation

Will replica-only nodes trigger rebalancing on production data nodes?

Replica-only nodes are completely excluded from the BalancedShardsAllocator model. They are invisible to the balancer, ensuring zero rebalancing impact when nodes join/leave.

3. Role Transitions

What happens if a data node transitions to replica-only role?

  • Primary shards are relocated to other data nodes
  • Replicas from non-auto-expand indices are relocated out
  • Replicas from auto-expand 0-all indices remain
  • All transitions are safe with no data loss (canRemain() enforcement)

4. Recovery Code Paths

Could recovery logic accidentally create primaries on replica-only nodes?

No, via multiple layers of protection:

  • AllocationDecider blocks at allocation time
  • canForceAllocatePrimary() blocks forced allocation
  • promoteReplicaToPrimary() has explicit replica-only check

5. Auto-Expand Node Counting

How do replica-only nodes integrate with auto-expand replica counting?

shouldAutoExpandToNode() in ReplicaOnlyAllocationDecider is called automatically by AutoExpandReplicas.getDesiredNumberOfReplicas(). Replica-only nodes are counted as eligible only for 0-all indices, ensuring correct replica counts.
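The counting rule can be illustrated with a back-of-the-envelope sketch. Assuming, as a simplification of what getDesiredNumberOfReplicas computes per node, that a 0-all index wants one copy on every eligible node, the arithmetic works out as follows (the method name and parameters here are hypothetical):

```java
// Hypothetical sketch of 0-all replica counting with replica-only nodes present.
// The real computation consults shouldAutoExpandToNode for each node; this only
// models the resulting arithmetic for the 0-all case.
public class AutoExpandCountSketch {

    /** Desired replica count: one copy per eligible node, minus the node holding the primary. */
    static int desiredReplicas(int dataNodes, int replicaOnlyNodes, boolean isZeroAllIndex) {
        // replica-only nodes count as eligible only for 0-all indices
        int eligibleNodes = isZeroAllIndex ? dataNodes + replicaOnlyNodes : dataNodes;
        return Math.max(0, eligibleNodes - 1);
    }

    public static void main(String[] args) {
        // 3 data nodes + 2 replica-only nodes, 0-all index: every node gets a copy
        System.out.println(desiredReplicas(3, 2, true));  // 4
        // same cluster, index without 0-all: replica-only nodes are not counted
        System.out.println(desiredReplicas(3, 2, false)); // 2
    }
}
```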

Configuration Example

Node configuration:

node.roles: [replica_only]

Index configuration:

PUT /my-index
{
  "settings": {
    "index.auto_expand_replicas": "0-all"
  }
}

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced a new replica-only node role enabling nodes dedicated to storing replica shards only.
  • Bug Fixes

    • Enhanced shard allocation logic to prevent primary shards from being assigned to replica-only nodes.
    • Added safeguards to prevent replica promotion on replica-only nodes.
    • Excluded replica-only nodes from rebalancing calculations.
  • Tests

    • Added comprehensive test coverage for replica-only node functionality and allocation rules.


@synhershko synhershko requested a review from a team as a code owner December 21, 2025 06:05

coderabbitai bot commented Dec 21, 2025

Walkthrough

This PR introduces a new replica-only node role to OpenSearch. It adds REPLICA_ONLY_ROLE with validation enforcing it as an exclusive role, a new ReplicaOnlyAllocationDecider that prevents primary allocation to replica-only nodes while allowing replica allocation under specific auto-expand settings, and extends the routing layer to detect and skip replica-only nodes during promotion and rebalancing.

Changes

Cohort / File(s): Change Summary

  • Node Role Definition (server/src/main/java/org/opensearch/cluster/node/DiscoveryNodeRole.java)
    Added the public REPLICA_ONLY_ROLE with "replica_only" name and "ro" abbreviation; includes validation to enforce exclusive role presence; updated BUILT_IN_ROLES to include the new role.
  • Node Role Detection (server/src/main/java/org/opensearch/cluster/node/DiscoveryNode.java)
    Added a static helper isReplicaOnlyNode(Settings) and an instance method isReplicaOnlyNode() to query replica-only role presence.
  • Allocation Decider (server/src/main/java/org/opensearch/cluster/routing/allocation/decider/ReplicaOnlyAllocationDecider.java)
    New allocation decider class preventing primary allocation to replica-only nodes; allows replicas only for indices with 0-all auto_expand_replicas; implements the canAllocate, canRemain, canForceAllocatePrimary, and shouldAutoExpandToNode decision methods.
  • Decider Registration (server/src/main/java/org/opensearch/cluster/ClusterModule.java)
    Added import and registration of ReplicaOnlyAllocationDecider in the allocation decider pipeline.
  • Routing Adjustments (server/src/main/java/org/opensearch/cluster/routing/RoutingNodes.java)
    Added a logger and a guard in promoteReplicaToPrimary() to prevent promoting replicas residing on replica-only nodes.
  • Rebalancing Adjustments (server/src/main/java/org/opensearch/cluster/routing/allocation/allocator/LocalShardsBalancer.java)
    Excludes replica-only nodes from rebalancing model construction in buildModelFromAssigned().
  • Role Tests (server/src/test/java/org/opensearch/cluster/node/DiscoveryNodeRoleTests.java)
    Added three unit tests: testReplicaOnlyRoleIsDedicated, testReplicaOnlyRoleProperties, and testReplicaOnlyRoleInBuiltInRoles.
  • Allocation Decider Tests (server/src/test/java/org/opensearch/cluster/routing/allocation/decider/ReplicaOnlyAllocationDeciderTests.java)
    New comprehensive test class covering allocation decisions, auto-expand scenarios, force-allocation, remain behavior, and eligibility checks across replica-only and data nodes.

Sequence Diagram

sequenceDiagram
    actor User
    participant Allocator as Shard Allocator
    participant Decider as ReplicaOnlyAllocationDecider
    participant NodeRoles as Node Roles<br/>(DiscoveryNode)
    participant IndexMeta as Index Metadata<br/>(auto_expand_replicas)

    User->>Allocator: Request shard allocation decision
    Allocator->>Decider: canAllocate(ShardRouting, Node)
    
    alt Allocating Primary
        Decider->>NodeRoles: Check if target node is replica-only
        alt Is Replica-Only Node
            NodeRoles-->>Decider: true
            Decider-->>Allocator: DENY (primaries not allowed)
        else Is Data Node
            NodeRoles-->>Decider: false
            Decider-->>Allocator: YES (allow primary)
        end
    else Allocating Replica
        Decider->>NodeRoles: Check if target node is replica-only
        alt Is Replica-Only Node
            NodeRoles-->>Decider: true
            Decider->>IndexMeta: Check auto_expand_replicas setting
            alt Is 0-all (auto-expand)
                IndexMeta-->>Decider: 0-all detected
                Decider-->>Allocator: YES (allow replica)
            else Other Settings
                IndexMeta-->>Decider: Not 0-all
                Decider-->>Allocator: DENY (replica only for 0-all)
            end
        else Is Data Node
            NodeRoles-->>Decider: false
            Decider-->>Allocator: YES (allow replica)
        end
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Areas requiring additional attention:

  • ReplicaOnlyAllocationDecider logic: Complex conditional branching for auto_expand_replicas eligibility and primary vs. replica handling; review the canAllocate and shouldAutoExpandToNode decision paths carefully.
  • DiscoveryNodeRole.validateRole enforcement: Verify that the exclusive role constraint is properly enforced and doesn't conflict with existing role logic or node initialization.
  • RoutingNodes.promoteReplicaToPrimary guard: Ensure the guard correctly identifies replica-only nodes and that early return doesn't break other promotion logic.
  • LocalShardsBalancer exclusion: Confirm that skipping replica-only nodes in model construction doesn't inadvertently exclude valid replica placements or cause rebalancing issues.
  • Test coverage comprehensiveness: Review ReplicaOnlyAllocationDeciderTests for edge cases around mixed auto_expand settings and state transitions.

Suggested labels

enhancement, Other

Suggested reviewers

  • andrross
  • reta
  • dbwiddis
  • cwperks
  • kotwanikunal
  • anasalkouz

Poem

🐰 A new role hops into the cluster today,
Replica-only nodes lead shards the right way!
No primaries here, just backups so keen,
The finest rebalance you ever have seen.
Watch allocations dance with fresh delight,
Auto-expand rules keep everything right! ✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: Docstring coverage is 14.29%, below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve docstring coverage.

✅ Passed checks (2 passed)

  • Title check ✅ Passed: The title 'Add replica-only node type for read scalability' accurately summarizes the main change: introducing a new node role for read-focused scaling without write path impact.
  • Description check ✅ Passed: The description comprehensively covers motivation, design rationale, core components, design concerns, and configuration examples. It aligns with the template's Description section and includes Apache 2.0 license confirmation.

Signed-off-by: Itamar Syn-Hershko <itamar@bigdataboutique.com>
@synhershko synhershko force-pushed the feature/replica-only-node-type branch from df9b566 to 9d31c2c Compare December 21, 2025 06:06
@synhershko
Author

Pinging @msfroh @andrross @prudhvigodithi for a review :)

We haven't tested this extensively just yet but believe it's a strong addition to OpenSearch and would love peer review and assistance with testing this thoroughly.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (4)
server/src/main/java/org/opensearch/cluster/ClusterModule.java (1)

80-81: Minor: Import ordering.

The import for ReplicaOnlyAllocationDecider is placed between SameShardAllocationDecider and SearchReplicaAllocationDecider. While not strictly required to be alphabetical, consider placing it after RemoteStoreMigrationAllocationDecider to maintain alphabetical ordering with other decider imports.

server/src/main/java/org/opensearch/cluster/routing/allocation/decider/ReplicaOnlyAllocationDecider.java (1)

66-69: Extract duplicated isAutoExpandAll logic into a private helper method.

The same logic for determining if an index has auto_expand_replicas: 0-all is duplicated in both shouldAutoExpandToNode (lines 66-69) and canAllocate (lines 111-113). This violates DRY and risks inconsistency if one location is updated but not the other.

🔎 Proposed refactor to extract helper method
+    private static boolean isAutoExpandAll(IndexMetadata indexMetadata) {
+        AutoExpandReplicas autoExpandReplicas = AutoExpandReplicas.SETTING.get(indexMetadata.getSettings());
+        return autoExpandReplicas.isEnabled()
+            && autoExpandReplicas.getMaxReplicas() == Integer.MAX_VALUE
+            && autoExpandReplicas.getMinReplicas() == 0;
+    }
+
     @Override
     public Decision shouldAutoExpandToNode(IndexMetadata indexMetadata, DiscoveryNode node, RoutingAllocation allocation) {
         if (!node.isReplicaOnlyNode()) {
             // Regular data nodes participate in auto-expand for all indices
             return allocation.decision(Decision.YES, NAME, "node [%s] is a data node, eligible for auto-expand", node.getId());
         }

         // Replica-only nodes only participate in 0-all auto-expand
-        AutoExpandReplicas autoExpandReplicas = AutoExpandReplicas.SETTING.get(indexMetadata.getSettings());
-        boolean isAutoExpandAll = autoExpandReplicas.isEnabled()
-            && autoExpandReplicas.getMaxReplicas() == Integer.MAX_VALUE
-            && autoExpandReplicas.toString().startsWith("0-");
-
-        if (isAutoExpandAll) {
+        if (isAutoExpandAll(indexMetadata)) {

Apply similar change in canAllocate method.

Also applies to: 111-113

server/src/test/java/org/opensearch/cluster/routing/allocation/decider/ReplicaOnlyAllocationDeciderTests.java (2)

56-62: Remove unused AllocationService instance.

The AllocationService is instantiated but never used in any test method. This is dead code that adds unnecessary complexity to the test setup.

🔎 Proposed fix
         allocationDeciders = new AllocationDeciders(
             Arrays.asList(decider, new SameShardAllocationDecider(Settings.EMPTY, clusterSettings), new ReplicaAfterPrimaryActiveAllocationDecider())
         );

-        AllocationService service = new AllocationService(
-            allocationDeciders,
-            new TestGatewayAllocator(),
-            new BalancedShardsAllocator(Settings.EMPTY),
-            EmptyClusterInfoService.INSTANCE,
-            EmptySnapshotsInfoService.INSTANCE
-        );
-
         // Create a cluster with 2 nodes: one regular data node, one replica-only node

293-342: Consider adding shouldAutoExpandToNode assertions for non-0-all auto-expand settings.

The test correctly verifies that canAllocate rejects replicas with 0-5 and 1-all settings on replica-only nodes. For completeness, consider also testing shouldAutoExpandToNode with these settings to ensure the auto-expand eligibility check is consistent.

🔎 Proposed addition
         decision = decider.canAllocate(replica1all, replicaOnlyNode, allocation);
         assertEquals(Decision.Type.NO, decision.type());
+
+        // Verify shouldAutoExpandToNode also rejects non-0-all settings
+        IndexMetadata index05 = testState.metadata().index("auto-expand-0-5");
+        IndexMetadata index1all = testState.metadata().index("auto-expand-1-all");
+        DiscoveryNode replicaOnlyDiscoveryNode = testState.nodes().get("replica-only-node");
+
+        decision = decider.shouldAutoExpandToNode(index05, replicaOnlyDiscoveryNode, allocation);
+        assertEquals(Decision.Type.NO, decision.type());
+
+        decision = decider.shouldAutoExpandToNode(index1all, replicaOnlyDiscoveryNode, allocation);
+        assertEquals(Decision.Type.NO, decision.type());
     }
 }
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 84bc68e and 9d31c2c.

📒 Files selected for processing (8)
  • server/src/main/java/org/opensearch/cluster/ClusterModule.java (2 hunks)
  • server/src/main/java/org/opensearch/cluster/node/DiscoveryNode.java (2 hunks)
  • server/src/main/java/org/opensearch/cluster/node/DiscoveryNodeRole.java (1 hunks)
  • server/src/main/java/org/opensearch/cluster/routing/RoutingNodes.java (3 hunks)
  • server/src/main/java/org/opensearch/cluster/routing/allocation/allocator/LocalShardsBalancer.java (1 hunks)
  • server/src/main/java/org/opensearch/cluster/routing/allocation/decider/ReplicaOnlyAllocationDecider.java (1 hunks)
  • server/src/test/java/org/opensearch/cluster/node/DiscoveryNodeRoleTests.java (1 hunks)
  • server/src/test/java/org/opensearch/cluster/routing/allocation/decider/ReplicaOnlyAllocationDeciderTests.java (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
server/src/test/java/org/opensearch/cluster/routing/allocation/decider/ReplicaOnlyAllocationDeciderTests.java (2)
test/framework/src/main/java/org/opensearch/cluster/OpenSearchAllocationTestCase.java (1)
  • OpenSearchAllocationTestCase (76-543)
server/src/main/java/org/opensearch/cluster/routing/allocation/allocator/BalancedShardsAllocator.java (1)
  • BalancedShardsAllocator (93-876)
server/src/main/java/org/opensearch/cluster/ClusterModule.java (1)
server/src/main/java/org/opensearch/cluster/routing/allocation/decider/ReplicaOnlyAllocationDecider.java (1)
  • ReplicaOnlyAllocationDecider (29-140)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: gradle-check
  • GitHub Check: Mend Security Check
🔇 Additional comments (12)
server/src/main/java/org/opensearch/cluster/node/DiscoveryNode.java (2)

133-135: LGTM!

The static helper follows the existing pattern used for other roles like isWarmNode and isIngestNode.


549-556: LGTM!

The instance method follows the established pattern for role checks (e.g., isSearchNode, isWarmNode) with consistent Javadoc formatting.

server/src/main/java/org/opensearch/cluster/node/DiscoveryNodeRole.java (2)

340-373: LGTM!

The REPLICA_ONLY_ROLE definition is well-documented and follows the established pattern for exclusive roles (similar to SEARCH_ROLE). The validation logic correctly enforces that replica_only must be the only role on a node.


378-382: LGTM!

The role is correctly added to BUILT_IN_ROLES. The sorted set will maintain proper ordering by role name.

server/src/main/java/org/opensearch/cluster/routing/allocation/allocator/LocalShardsBalancer.java (1)

768-775: LGTM!

Correctly excludes replica-only nodes from the rebalancing model, which ensures that adding/removing these nodes doesn't trigger rebalancing of shards on regular data nodes. The trace log is helpful for debugging allocation decisions.

server/src/main/java/org/opensearch/cluster/routing/RoutingNodes.java (1)

89-90: LGTM!

Logger addition follows the standard pattern used in other classes.

server/src/main/java/org/opensearch/cluster/ClusterModule.java (1)

400-400: LGTM!

The ReplicaOnlyAllocationDecider is correctly registered in the allocation deciders pipeline.

server/src/main/java/org/opensearch/cluster/routing/allocation/decider/ReplicaOnlyAllocationDecider.java (2)

29-56: LGTM on the core structure and force allocation blocking.

The decider correctly:

  • Delegates canAllocate and canRemain to a shared helper for consistency
  • Blocks primary shard allocation to replica-only nodes
  • Blocks force allocation of primaries to replica-only nodes (critical safety measure)
  • Provides informative decision messages for debugging

67-69: The suggested fix cannot be implemented: getMinReplicas() is a private method.

The fragility concern about using autoExpandReplicas.toString().startsWith("0-") is valid. However, the proposed fix to use autoExpandReplicas.getMinReplicas() == 0 is not viable—the getMinReplicas() method is private and cannot be called from ReplicaOnlyAllocationDecider.

To address the fragility concern, either:

  1. Make getMinReplicas() public in AutoExpandReplicas, then use the suggested check, or
  2. Add a public helper method to AutoExpandReplicas (e.g., isMinReplicasZero()) that encapsulates this logic.

Likely an incorrect or invalid review comment.

server/src/test/java/org/opensearch/cluster/routing/allocation/decider/ReplicaOnlyAllocationDeciderTests.java (1)

90-156: Good test structure with thorough assertions.

The tests properly verify both the decision type and the decision message content. Enabling debugDecision(true) ensures meaningful assertion failure messages. The coverage of primary/replica allocation scenarios for both node types is comprehensive.

server/src/test/java/org/opensearch/cluster/node/DiscoveryNodeRoleTests.java (2)

163-187: Thorough validation of the dedicated role constraint.

The test correctly verifies that replica_only cannot coexist with DATA_ROLE, CLUSTER_MANAGER_ROLE, or INGEST_ROLE, and that it passes validation when specified alone. The assertion messages are clear and will help diagnose failures.


189-198: LGTM on role property and built-in roles tests.

These tests provide essential coverage for:

  • Role name (replica_only) and abbreviation (ro)
  • Data-containing capability (canContainData() returns true)
  • No legacy setting (expected for new roles)
  • Presence in BUILT_IN_ROLES set

Comment on lines +818 to +831

// CRITICAL: Never promote replicas on replica-only nodes
RoutingNode routingNode = node(activeReplica.currentNodeId());
if (routingNode != null && routingNode.node().isReplicaOnlyNode()) {
    logger.warn(
        "Cannot promote replica shard [{}] to primary on replica-only node [{}]. "
            + "Shard will remain as replica. Primary must be allocated to a regular data node.",
        activeReplica.shardId(),
        routingNode.nodeId()
    );
    // Do NOT call promoteActiveReplicaShardToPrimary - just return
    // The primary will remain unassigned, triggering allocation to a data node
    return;
}
⚠️ Potential issue | 🔴 Critical

Critical: Early return leaves cluster without a primary shard.

The guard returns early without promoting the replica, but at this point movePrimaryToUnassignedAndDemoteToReplica() has already been called in unassignPrimaryAndPromoteActiveReplicaIfExists(). This leaves the shard group with no primary:

  1. Original primary → demoted to unassigned replica
  2. Active replica on replica-only node → remains a replica (not promoted)
  3. Result: No primary exists for this shard

The replica-only check should occur before the primary is demoted, ideally in unassignPrimaryAndPromoteActiveReplicaIfExists() when selecting the active replica. The replica selection logic should skip replicas on replica-only nodes.

🔎 Suggested approach

Modify the replica selection in unassignPrimaryAndPromoteActiveReplicaIfExists() to exclude replicas on replica-only nodes:

 private void unassignPrimaryAndPromoteActiveReplicaIfExists(
     ShardRouting failedShard,
     UnassignedInfo unassignedInfo,
     RoutingChangesObserver routingChangesObserver
 ) {
     assert failedShard.primary();
     ShardRouting activeReplica = null;
     if (isMigratingToRemoteStore(metadata)) {
         activeReplica = activeReplicaOnRemoteNode(failedShard.shardId());
+        // Skip if on replica-only node
+        if (activeReplica != null && isReplicaOnReplicaOnlyNode(activeReplica)) {
+            activeReplica = null;
+        }
     }
     if (activeReplica == null) {
         if (metadata.isSegmentReplicationEnabled(failedShard.getIndexName())) {
             activeReplica = activeReplicaWithOldestVersion(failedShard.shardId());
         } else {
             activeReplica = activeReplicaWithHighestVersion(failedShard.shardId());
         }
+        // Skip if on replica-only node
+        if (activeReplica != null && isReplicaOnReplicaOnlyNode(activeReplica)) {
+            activeReplica = null;
+        }
     }
     // ... rest of method
 }
+
+private boolean isReplicaOnReplicaOnlyNode(ShardRouting replica) {
+    RoutingNode rn = node(replica.currentNodeId());
+    return rn != null && rn.node().isReplicaOnlyNode();
+}

Alternatively, filter replica-only nodes in activeReplicaWithHighestVersion() and activeReplicaWithOldestVersion().

Committable suggestion skipped: line range outside the PR's diff.

@github-actions

❌ Gradle check result for 9d31c2c: null

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@prudhvigodithi
Member

prudhvigodithi commented Dec 30, 2025

Thanks for this PR @synhershko

The use case of scaling read capacity independently is definitely valuable. I wanted to point out that OpenSearch recently introduced Search Replicas as part of the reader/writer separation feature, which addresses a similar problem. This feature uses:

  • A dedicated search node role.
  • New shard types (search replicas vs write replicas).
  • Segment replication with remote store backend.
  • A _scale API for search-only mode.

Related Docs link:

Have you tried the search replicas feature? I'd like to understand how this replica_only node type differs from or complements the existing search replicas feature.

Thank you
Adding @mch2 @vinaykpud
