[BUG] Shard allocator leaves unassigned shards #5908

Closed
jainankitk opened this issue Jan 17, 2023 · 3 comments
Labels
bug (Something isn't working), distributed framework

Comments

@jainankitk
Collaborator

Describe the bug
Shard allocation does not consider all possible assignments, especially with zone-aware allocation, and can leave shards unassigned that could otherwise be assigned. This is observed more commonly on clusters with the total_shards_per_node setting enabled.

To Reproduce
Steps to reproduce the behavior:
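Create a 3-node cluster with each node in a different AZ. Then create an index with 3 shards and 1 replica (the explain output under Screenshots shows index.routing.allocation.total_shards_per_node=2). One of the shards might not get assigned: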

0p and 1r were assigned to Node1
1p and 0r were assigned to Node2

So only Node3 was left for 2p and 2r. Hence, 2r cannot be assigned, because both copies of a shard cannot be in the same zone. The other nodes are not eligible either, as placing 2r on them would violate the total shards per node constraint.
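The stuck state described above can be checked with a small model. This is only an illustrative sketch, not the OpenSearch allocator: the class and method names are hypothetical, each node is assumed to be its own zone (so "same zone" reduces to "same node"), and the per-node limit of 2 is taken from the explain output in this issue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of the reported state; not OpenSearch code.
final class StuckAllocationSketch {
    // Assumed from the explain output: index.routing.allocation.total_shards_per_node=2
    static final int TOTAL_SHARDS_PER_NODE = 2;

    /** Returns the nodes that could still accept another copy of the given shard id. */
    static List<String> eligibleNodes(Map<String, List<String>> assignment, int shardId) {
        List<String> eligible = new ArrayList<>();
        String id = String.valueOf(shardId);
        for (Map.Entry<String, List<String>> entry : assignment.entrySet()) {
            List<String> copies = entry.getValue();
            // Node already holds the maximum number of shards for this index.
            boolean full = copies.size() >= TOTAL_SHARDS_PER_NODE;
            // Copies are named "<shardId>p" or "<shardId>r"; strip the trailing role letter.
            boolean holdsSameShard = copies.stream()
                .anyMatch(c -> c.substring(0, c.length() - 1).equals(id));
            if (!full && !holdsSameShard) {
                eligible.add(entry.getKey());
            }
        }
        return eligible;
    }

    public static void main(String[] args) {
        Map<String, List<String>> state = new TreeMap<>();
        state.put("Node1", List.of("0p", "1r"));
        state.put("Node2", List.of("1p", "0r"));
        state.put("Node3", List.of("2p"));
        // No node can take 2r: Node1 and Node2 are full, Node3 already holds 2p.
        System.out.println(eligibleNodes(state, 2)); // prints []
    }
}
```

In this model the replica of shard 2 has no eligible node, which matches the three explanations in the explain output.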

Expected behavior
The shard allocator should be able to pick an alternative assignment, such as:

Node1 - 0p, 2r
Node2 - 0r, 1p
Node3 - 1r, 2p
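To illustrate that a valid assignment exists under both constraints, here is a minimal backtracking sketch. It is not how the OpenSearch balancer works and does not propose a fix; the names are hypothetical, and it assumes one node per zone and a per-node limit of 2 as in the explain output. Placing copies in order without backtracking reaches the same dead end described in this issue; the search recovers by revisiting an earlier placement.

```java
// Hypothetical exhaustive search; not OpenSearch code.
final class ExhaustiveAllocationSketch {

    /** True if copy `next` may go on `node` given the placements of copies 0..next-1. */
    static boolean canPlace(int[] shardOfCopy, int[] nodeOfCopy, int next, int node, int maxPerNode) {
        int load = 0;
        for (int i = 0; i < next; i++) {
            if (nodeOfCopy[i] != node) {
                continue;
            }
            // Two copies of the same shard may not share a node (one node per zone assumed).
            if (shardOfCopy[i] == shardOfCopy[next]) {
                return false;
            }
            load++;
        }
        return load < maxPerNode;
    }

    /** Tries to place copies next..end; fills nodeOfCopy and returns true on success. */
    static boolean assign(int[] shardOfCopy, int next, int[] nodeOfCopy, int nodeCount, int maxPerNode) {
        if (next == shardOfCopy.length) {
            return true;
        }
        for (int node = 0; node < nodeCount; node++) {
            if (canPlace(shardOfCopy, nodeOfCopy, next, node, maxPerNode)) {
                nodeOfCopy[next] = node;
                if (assign(shardOfCopy, next + 1, nodeOfCopy, nodeCount, maxPerNode)) {
                    return true;
                }
            }
        }
        return false; // dead end: the caller tries its next candidate node
    }

    public static void main(String[] args) {
        // Three shards, each with a primary and one replica.
        int[] shardOfCopy = {0, 0, 1, 1, 2, 2};
        int[] nodeOfCopy = new int[shardOfCopy.length];
        boolean found = assign(shardOfCopy, 0, nodeOfCopy, 3, 2);
        System.out.println(found); // prints true
    }
}
```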

Plugins
NA

Screenshots

% cat /tmp/explain | grep explanation | sort | uniq -c
      1   "allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes",
      1           "explanation" : "a copy of this shard is already allocated to this node [[index][2], node[xxxxx], [P], s[STARTED], a[id=xxxxx]]"
      1           "explanation" : "there are too many copies of the shard allocated to nodes with attribute [zone], there are [2] total configured shard copies for this shard id and [3] total attribute values, expected the allocated shard count per attribute [2] to be less than or equal to the upper bound of the required number of shards per attribute [1]"
      4           "explanation" : "too many shards [2] allocated to this node for index [index], index setting [index.routing.allocation.total_shards_per_node=2]"
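The awareness message above reports an upper bound of [1] for [2] configured copies across [3] zone values. Those numbers are consistent with a ceiling division of copies by attribute values. The sketch below assumes that formula for illustration; the issue itself does not state how the decider derives the bound, and the class and method names are hypothetical.

```java
// Hypothetical helper; the ceiling-division formula is an assumption that matches the message above.
final class AwarenessBoundSketch {

    /** Upper bound on copies of one shard per attribute value: ceil(shardCopies / attributeValues). */
    static int maxCopiesPerAttributeValue(int shardCopies, int attributeValues) {
        return (shardCopies + attributeValues - 1) / attributeValues;
    }

    public static void main(String[] args) {
        // 1 primary + 1 replica = 2 copies, spread over 3 zones.
        System.out.println(maxCopiesPerAttributeValue(2, 3)); // prints 1
    }
}
```

Under this assumption, putting 2r on Node3 would place 2 copies in one zone against a bound of 1, which is what the message reports.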

Host/Environment (please complete the following information):

  • Observed in OpenSearch 1.0, but likely present in other releases as well


@vinaykpud
Contributor

vinaykpud commented Mar 5, 2025

I am currently working on #16987. While working on its integration tests, I found this issue referenced in an annotation:

@AwaitsFix(bugUrl = "https://github.com/opensearch-project/OpenSearch/issues/5908")

I tried to reproduce this by following the steps to reproduce:

  1. Created cluster with 3 nodes, each node in different zone.
  2. Created index with 3P and 1R

All shards are assigned as expected.

@mch2 @jainankitk
Can we close this issue and remove the annotation here:

@AwaitsFix(bugUrl = "https://github.com/opensearch-project/OpenSearch/issues/5908")

@mch2 closed this as not planned (won't fix, can't repro, duplicate, stale) on Mar 5, 2025
@jainankitk
Collaborator Author

@vinaykpud - Did you try running the test until failure? A few runs might succeed, but there are cases where it gets stuck.

@vinaykpud
Contributor

@jainankitk Yes, I ran this test until failure and have never seen it fail:

public void testThreeZoneOneReplicaWithForceZoneValueAndLoadAwareness() throws Exception {


5 participants