forked from elasticsearch-cn/elasticsearch-definitive-guide
-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
First round of phase 2 changes to sync up with version 2.x.
- Loading branch information
Showing
16 changed files
with
148 additions
and
179 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file was deleted.
Oops, something went wrong.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,74 @@ | ||
[[scroll]] | ||
=== Scroll | ||
|
||
A `scroll` query ((("scroll API))) is used to retrieve | ||
large numbers of documents from Elasticsearch efficiently, without paying the | ||
penalty of deep pagination. | ||
|
||
Scrolling allows us to((("scrolled search"))) do an initial search and to keep pulling | ||
batches of results from Elasticsearch until there are no more results left. | ||
It's a bit like a _cursor_ in ((("cursors")))a traditional database. | ||
|
||
A scrolled search takes a snapshot in time. It doesn't see any changes that | ||
are made to the index after the initial search request has been made. It does | ||
this by keeping the old data files around, so that it can preserve its ``view'' | ||
on what the index looked like at the time it started. | ||
|
||
The costly part of deep pagination is the global sorting of results, but if we | ||
disable sorting, then we can return all documents quite cheaply. To do this, we | ||
sort by `_doc`. This instructs Elasticsearch just return the next batch of | ||
results from every shard that still has results to return. | ||
|
||
To scroll through results, we execute a search request and set the `scroll` value to | ||
the length of time we want to keep the scroll window open. The scroll expiry | ||
time is refreshed every time we run a scroll request, so it only needs to be long enough | ||
to process the current batch of results, not all of the documents that match | ||
the query. The timeout is important because keeping the scroll window open | ||
consumes resources and we want to free them as soon as they are no longer needed. | ||
Setting the timeout enables Elasticsearch to automatically free the resources | ||
after a small period of inactivity. | ||
|
||
[source,js] | ||
-------------------------------------------------- | ||
GET /old_index/_search?scroll=1m <1> | ||
{ | ||
"query": { "match_all": {}}, | ||
"sort" : ["_doc"], <2> | ||
"size": 1000 | ||
} | ||
-------------------------------------------------- | ||
<1> Keep the scroll window open for 1 minute. | ||
<2> `_doc` is the most efficient sort order. | ||
|
||
The response to this request includes a | ||
`_scroll_id`, which is a long Base-64 encoded((("scroll_id"))) string. Now we can pass the | ||
`_scroll_id` to the `_search/scroll` endpoint to retrieve the next batch of | ||
results: | ||
|
||
[source,js] | ||
-------------------------------------------------- | ||
GET /_search/scroll | ||
{ | ||
"scroll": "1m", <1> | ||
"scroll_id" : "cXVlcnlUaGVuRmV0Y2g7NTsxMDk5NDpkUmpiR2FjOFNhNnlCM1ZDMWpWYnRROzEwOTk1OmRSamJHYWM4U2E2eUIzVkMxalZidFE7MTA5OTM6ZFJqYkdhYzhTYTZ5QjNWQzFqVmJ0UTsxMTE5MDpBVUtwN2lxc1FLZV8yRGVjWlI2QUVBOzEwOTk2OmRSamJHYWM4U2E2eUIzVkMxalZidFE7MDs=" | ||
} | ||
-------------------------------------------------- | ||
<1> Note that we again set the scroll expiration to 1m. | ||
|
||
The response to this scroll request includes the next batch of results. | ||
Although we specified a `size` of 1,000, we get back many more | ||
documents.((("size parameter", "in scanning"))) When scanning, the `size` is applied to each shard, so you will | ||
get back a maximum of `size * number_of_primary_shards` documents in each | ||
batch. | ||
|
||
NOTE: The scroll request also returns a _new_ `_scroll_id`. Every time | ||
we make the next scroll request, we must pass the `_scroll_id` returned by the | ||
_previous_ scroll request. | ||
|
||
When no more hits are returned, we have processed all matching documents. | ||
|
||
TIP: Some of the official Elasticsearch clients such as | ||
http://elasticsearch-py.readthedocs.org/en/master/helpers.html#scan[Python client] and | ||
https://metacpan.org/pod/Search::Elasticsearch::Scroll[Perl client] provide scroll helpers that | ||
provide easy-to-use wrappers around this funtionality. | ||
|
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.