Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

text_similarity_reranker fails with NPE when zero results are returned from a shard #119617

Closed
kderusso opened this issue Jan 6, 2025 · 10 comments · Fixed by #120062
Closed

text_similarity_reranker fails with NPE when zero results are returned from a shard #119617

kderusso opened this issue Jan 6, 2025 · 10 comments · Fixed by #120062
Assignees
Labels
>bug :ml Machine learning :SearchOrg/Relevance Label for the Search (solution/org) Relevance team

Comments

@kderusso
Copy link
Member

kderusso commented Jan 6, 2025

Elasticsearch Version

8.16.1 to current

Installed Plugins

No response

Java Version

bundled

OS Version

Reproduceable in cloud

Problem Description

When performing a text_similarity_reranker search on an multiple indices, if one or more shards returns an empty result (no documents to rerank), the request fails with the following null pointer exception:

{
  "error": {
    "root_cause": [
      {
        "type": "status_exception",
        "reason": "[text_similarity_reranker] search failed - retrievers '[standard]' returned errors. All failures are attached as suppressed exceptions.",
        "suppressed": [
          {
            "type": "search_phase_execution_exception",
            "reason": "",
            "phase": "fetch",
            "grouped": true,
            "failed_shards": []
          }
        ]
      }
    ],
    "type": "status_exception",
    "reason": "[text_similarity_reranker] search failed - retrievers '[standard]' returned errors. All failures are attached as suppressed exceptions.",
    "suppressed": [
      {
        "type": "search_phase_execution_exception",
        "reason": "",
        "phase": "fetch",
        "grouped": true,
        "failed_shards": [],
        "caused_by": {
          "type": "null_pointer_exception",
          "reason": """Cannot invoke "org.elasticsearch.search.SearchPhaseResult.getSearchShardTarget()" because "shardPhaseResult" is null"""
        }
      }
    ]
  },
  "status": 500
}

Potentially this is happening in the RankFeaturePhase on null or empty entry sets.

Proposed fix from @Mikep86, but should probably refactored to happen on the coordinator to be more performant :

// we send out a request to each shard in order to fetch the needed feature info
            for (int i = 0; i < docIdsToLoad.length; i++) {
                List<Integer> entry = docIdsToLoad[i];
                SearchPhaseResult queryResult = queryPhaseResults.getAtomicArray().get(i);
                executeRankFeatureShardPhase(queryResult, rankRequestCounter, entry);
            }

Steps to Reproduce

This can be reproduced by running MSMarco minilm on an alias pointing to empty indices:

PUT _inference/rerank/ms-marco-minilm-l-6-v2
{
  "service": "elasticsearch",
  "task_type": "rerank",
  "service_settings": {
    "num_allocations": 1,
    "num_threads": 1,
    "model_id": "cross-encoder__ms-marco-minilm-l-6-v2"
  },
  "task_settings": {
    "return_documents": true
  }
}

PUT /my-index
{
  "mappings": {
    "properties": {
      "text": {
        "type": "text"
      }
    }
  }
}

PUT /my-index2
{
  "mappings": {
    "properties": {
      "text": {
        "type": "text"
      }
    }
  }
}

POST _aliases
{
  "actions": [
    {
      "add": {
        "indices": [ "my-index", "my-index2" ],
        "alias": "my-alias"
      }
    }
  ]
}

POST my-alias/_search
{
  "retriever": {
    "text_similarity_reranker": {
      "retriever": {
        "standard": {
          "query": {
            "match": {
              "text": "moon"
            }
          }
        }
      },
      "field": "text",
      "inference_id": "ms-marco-minilm-l-6-v2",
      "inference_text": "How often does the moon hide the sun?",
      "rank_window_size": 100
    }
  }
}

Logs (if relevant)

No response

@kderusso kderusso added :ml Machine learning :SearchOrg/Relevance Label for the Search (solution/org) Relevance team >bug labels Jan 6, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/ml-core (Team:ML)

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/search-eng (Team:SearchOrg)

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/search-relevance (Team:Search - Relevance)

@benwtrent
Copy link
Member

@kderusso I tried replicating in a yaml test and I could not. I wonder whats different?

"text similarity ranking with empty indices":
  - do:
      indices.create:
        index: test-index-empty-1
        body:
          mappings:
            properties:
              text:
                type: text

  - do:
      indices.create:
        index: test-index-empty-2
        body:
          mappings:
            properties:
              text:
                type: text

  - do:
      indices.put_alias:
        index: test-index-empty-1
        name: empty-alias

  - do:
      indices.put_alias:
        index: test-index-empty-2
        name: empty-alias

  - do:
      search:
        index: empty-alias
        body:
          track_total_hits: true
          retriever: {
            text_similarity_reranker: {
              retriever:
                {
                  standard: {
                    query: {
                      term: {
                        text: "science"
                      }
                    }
                  }
                },
              rank_window_size: 10,
              inference_id: my-rerank-model,
              inference_text: "How often does the moon hide the sun?",
              field: text
            }
          }
          size: 10

@Mikep86 Mikep86 self-assigned this Jan 8, 2025
@Mikep86
Copy link
Contributor

Mikep86 commented Jan 9, 2025

@benwtrent Your test doesn't replicate the issue because the fetch search phase, where the NPE is thrown, is skipped when the index is empty (or when there are otherwise no documents to fetch).

The issue with our replication attempts to date is that we rerank all the docs fetched by the standard retriever. Thus, if the reranker's doc set is empty, so is the standard retriever's, and the fetch search phase is skipped.

The bug occurs when the reranker's doc set is empty, but the upstream retriever's doc set is not. This creates a scenario where the fetch search phase is executed on (what should be) an empty result set.

I was able to reproduce the bug with this scenario:

PUT _inference/rerank/ms-marco-minilm-l-6-v2
{
  "service": "elasticsearch",
  "task_type": "rerank",
  "service_settings": {
    "num_allocations": 1,
    "num_threads": 1,
    "model_id": "cross-encoder__ms-marco-minilm-l-6-v2"
  },
  "task_settings": {
    "return_documents": true
  }
}

PUT /my-index
{
  "mappings": {
    "properties": {
      "text": {
        "type": "text"
      },
      "rerank_text": {
        "type": "text"
      }
    }
  }
}

PUT /my-index2
{
  "mappings": {
    "properties": {
      "text": {
        "type": "text"
      },
      "rerank_text": {
        "type": "text"
      }
    }
  }
}

PUT /my-index3
{
  "mappings": {
    "properties": {
      "text": {
        "type": "text"
      },
      "rerank_text": {
        "type": "text"
      }
    }
  }
}

POST _aliases
{
  "actions": [
    {
      "add": {
        "indices": [ "my-index", "my-index2", "my-index3" ],
        "alias": "my-alias"
      }
    }
  ]
}


POST /my-index2/_doc/
{
  "text": "The moon occasionally covers the sun during an eclipse.",
  "rerank_text": "The moon occasionally covers the sun during an eclipse."
}

POST /my-index2/_doc/
{
  "text": "Solar eclipses are a type of eclipse that occurs when the Moon passes between the Earth and the Sun.",
  "rerank_text": "Solar eclipses are a type of eclipse that occurs when the Moon passes between the Earth and the Sun."
}

POST /my-index/_doc/
{
  "text": "How often does the moon hide the sun? This happens during a solar eclipse."
}

POST /my-index3/_doc/
{
  "text": "The sun is larger than the moon but appears the same size in the sky because it is much further away.",
  "rerank_text": "The sun is larger than the moon but appears the same size in the sky because it is much further away."
}

POST my-alias/_search
{
  "retriever": {
    "text_similarity_reranker": {
      "retriever": {
        "standard": {
          "query": {
            "match": {
              "text": "moon"
            }
          }
        }
      },
      "field": "rerank_text",
      "inference_id": "ms-marco-minilm-l-6-v2",
      "inference_text": "How often does the moon hide the sun?",
      "rank_window_size": 100
    }
  }
}

@benwtrent
Copy link
Member

@Mikep86

I tried add the following to 70_text_similarity_rank_retriever.yml and it doesn't fail.

Does it need to be more than one node?

I am trying to see about a test that is repeatable in our CI for whomever needs to fix this.

---
"text similarity ranking with empty indices":
  - do:
      indices.create:
        index: test-index-empty-1
        body:
          mappings:
            properties:
              text:
                type: text
              rerank_text:
                type: text

  - do:
      indices.create:
        index: test-index-empty-2
        body:
          mappings:
            properties:
              text:
                type: text
              rerank_text:
                type: text

  - do:
      indices.create:
        index: test-index-empty-3
        body:
          mappings:
            properties:
              text:
                type: text
              rerank_text:
                type: text

  - do:
      indices.put_alias:
        index: test-index-empty-1
        name: empty-alias

  - do:
      indices.put_alias:
        index: test-index-empty-2
        name: empty-alias

  - do:
      indices.put_alias:
        index: test-index-empty-3
        name: empty-alias

  - do:
      index:
        index: test-index-empty-2
        body:
          text: "The moon occasionally covers the sun during an eclipse."
          rerank_text: "The moon occasionally covers the sun during an eclipse."
        refresh: true

  - do:
      index:
        index: test-index-empty-2
        refresh: true
        body:
          text: "Solar eclipses are a type of eclipse that occurs when the Moon passes between the Earth and the Sun."
          rerank_text: "Solar eclipses are a type of eclipse that occurs when the Moon passes between the Earth and the Sun."

  - do:
      index:
        index: test-index-empty-1
        refresh: true
        body:
          text: "How often does the moon hide the sun? This happens during a solar eclipse."

  - do:
     index:
       index: test-index-empty-3
       refresh: true
       body:
        text: "The sun is larger than the moon but appears the same size in the sky because it is much further away."
        rerank_text: "The sun is larger than the moon but appears the same size in the sky because it is much further away."

  - do:
      search:
        index: empty-alias
        body:
          track_total_hits: true
          retriever: {
            text_similarity_reranker: {
              retriever:
                {
                  standard: {
                    query: {
                      match: {
                        text: "moon"
                      }
                    }
                  }
                },
              rank_window_size: 10,
              inference_id: my-rerank-model,
              inference_text: "How often does the moon hide the sun?",
              field: rerank_text
            }
          }
          size: 10

@Mikep86
Copy link
Contributor

Mikep86 commented Jan 9, 2025

@benwtrent I'm having trouble getting this to reliably fail as well. I'm confident the issue is triggered when there are no docs to rerank, but the fetch search phase is run. The complication is that there's logic in the fetch search phase to short-circuit when there are no docs to fetch (here and here). However, the existence of the bug tells us that this short-circuit logic isn't always enough. I will continue investigating to try to figure out the combination of factors that reproduces the issue.

@Mikep86
Copy link
Contributor

Mikep86 commented Jan 10, 2025

@benwtrent

Ok, I think I figured it out for real this time. And if so, the bug is worse than we originally thought. The NPE is a symptom of a deeper issue.

What I think is happening is that we are using inconsistent shard indices when querying via alias. In the first part of the rank feature phase, we consider all shards queried as a single group, regardless of the actual backing index. So for example, if we were querying two indices, my-index-1 (with one shard) and my-index-2 (with two shards) via an alias, the shard indices would be mapped like so:

0: my-index-1 shard 0
1: my-index-2 shard 0
2: my-index-2 shard 1

However, in the second part of the rank feature phase, we break down shards by their backing indices. We still use results from the first part though, resulting in potential shard index mis-matches. Continuing the example above, if the doc we actually want to rerank is in my-index-2, shard 1, the relevant shard index w.r.t to the first part of the phase is 2 and 1 w.r.t to the second part of the phase. This mis-match is what causes the fetch search phase short-circuit logic to fail to catch that there are potentially no results for the shard, leading to the NPE.

But the problem goes deeper. Due to the shard index mis-match, if there are results for the shard we are mistakenly addressing, we are reranking the wrong document. This is bad.

I've been able to recreate the issue with this setup. Let me know if it doesn't work for you. It depends on first-index having a lower shard index than second-index, which I'm not sure is always the case.

PUT _inference/rerank/ms-marco-minilm-l-6-v2
{
  "service": "elasticsearch",
  "task_type": "rerank",
  "service_settings": {
    "num_allocations": 1,
    "num_threads": 1,
    "model_id": "cross-encoder__ms-marco-minilm-l-6-v2"
  },
  "task_settings": {
    "return_documents": true
  }
}

PUT /first-index
{
  "mappings": {
    "properties": {
      "text": {
        "type": "text"
      }
    }
  }
}

PUT /second-index
{
  "settings": {
    "number_of_shards": 2,
    "number_of_replicas": 0
  },
  "mappings": {
    "properties": {
      "text": {
        "type": "text"
      }
    }
  }
}

POST /second-index/_doc/
{
  "text": "How often does the moon hide the sun? This happens during a solar eclipse."
}

POST _aliases
{
  "actions": [
    {
      "add": {
        "indices": [ "first-index", "second-index" ],
        "alias": "my-alias"
      }
    }
  ]
}

// Triggers bug
POST my-alias/_search
{
  "retriever": {
    "text_similarity_reranker": {
      "retriever": {
        "standard": {
          "query": {
            "match": {
              "text": "moon"
            }
          }
        }
      },
      "field": "text",
      "inference_id": "ms-marco-minilm-l-6-v2",
      "inference_text": "How often does the moon hide the sun?",
      "rank_window_size": 100
    }
  }
}

// Does not trigger bug
POST second-index/_search
{
  "retriever": {
    "text_similarity_reranker": {
      "retriever": {
        "standard": {
          "query": {
            "match": {
              "text": "moon"
            }
          }
        }
      },
      "field": "text",
      "inference_id": "ms-marco-minilm-l-6-v2",
      "inference_text": "How often does the moon hide the sun?",
      "rank_window_size": 100
    }
  }
}

@benwtrent
Copy link
Member

@Mikep86 let's sync with @pmpailis to dig in and see.

@pmpailis
Copy link
Contributor

pmpailis commented Jan 13, 2025

Thanks @Mikep86 for the thorough investigation; must have been quite a pain to track down and reproduce :/

I was able to reproduce it as well; I think that the main issue is on the RankFeatureShardPhase when constructing the RankDocs to return. More specifically, we assign the shard index based on the shardTarget instead of relying on the request 😞
I think the following should address the issue and correctly assigned shardIndex for all returned docs, based on the initial request

+++ b/server/src/main/java/org/elasticsearch/search/rank/feature/RankFeatureShardPhase.java
             // FetchSearchResult#shardResult()
             SearchHits hits = fetchSearchResult.hits();
             RankFeatureShardResult featureRankShardResult = (RankFeatureShardResult) rankFeaturePhaseRankShardContext
-                .buildRankFeatureShardResult(hits, searchContext.shardTarget().getShardId().id());
+                .buildRankFeatureShardResult(hits, searchContext.request().shardRequestIndex());
             // save the result in the search context
             // need to add profiling info as well available from fetch
             if (featureRankShardResult != null) {

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :ml Machine learning :SearchOrg/Relevance Label for the Search (solution/org) Relevance team
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants