Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

encapsulate {httr} stuff in helper functions, test against latest Elasticsearch #245

Open
wants to merge 29 commits into
base: main
Choose a base branch
from

Conversation

jameslamb
Copy link
Collaborator

Contributes to #229

Notes for Reviewers

Proposing adding the latest Elasticsearch version to be sure we're covering that before I switch out the do-HTTP-stuff package, so we know that if we see testing failures it's a result of that refactoring.

How I tested this

I ran the instructions in CONTRIBUTING.md, on my M2 macbook (arm64).

```shell
# Start up Elasticsearch on localhost:9200 and seed it with data
./setup_local.sh 7.12.1
# Run tests
make test
# Get test coverage and generate coverage report
make coverage
# Tear down the container and remove testing files
./cleanup_local.sh
```

@jameslamb jameslamb added the maintenance miscellaneous maintenance label Feb 14, 2025
@jameslamb
Copy link
Collaborator Author

had a problem deploying to github-pages 2 hours ago — with GitHub Actions Failure

This can safely be ignored. Fixed with #246

.ci/test.sh Outdated
-i \
-X GET \
-H 'Accept: application/json' \
'http://127.0.0.1:9200/.geoip_databases,empty_index,shakespeare/_mapping'
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Putting this somewhere on the diff to thread it. Reposting what I'd written 2 days ago:

hmmmmm, the integration tests are failing only for Elasticsearch 7.17.22... not any of the versions before or after.

  ── 3. Error ('test-integration.R:338:9'): .get_aliases and get_fields work as ex
  <http_400/http_error/error/condition>
  Error in `.stop_for_status(result)`: Bad Request (HTTP 400).
  Backtrace:

(build link)

Haven't been able to reproduce that locally.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finally found it!

{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"Indices [.geoip_databases] use and access is reserved for system operations"}],"type":"illegal_argument_exception","reason":"Indices [.geoip_databases] use and access is reserved for system operations"},"status":400}-

(build link)

It looks like for a call like get_fields(..., es_indices = "_all"), this .geoip_databases index is showing up in the list of indices, right here:

# The use of "_all" to indicate "all indices" was removed in Elasticsearch 7.
if (as.integer(major_version) > 6 && indices == "_all") {
log_warn(sprintf(
paste0(
"You are running Elasticsearch version '%s.x'. _all is not supported in this version."
, " Pulling all indices with 'POST /_cat/indices' for you." # nolint[non_portable_path]
)
, major_version
))
res <- .request(
verb = "GET"
, url = sprintf("%s/_cat/indices?format=json", es_url)
, config = list()
, body = NULL
)

And then it's failing right here, with the error seen above

log_info(paste("Getting indexed fields for indices:", indices))
result <- .request(
verb = "GET"
, url = es_url
, config = httr::add_headers(c("Content-Type" = "application/json")) # nolint[non_portable_path]
, body = NULL
)

Copy link
Collaborator Author

@jameslamb jameslamb Feb 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Someone reported this to elasticsearch, and the outcome of it was ".geoip_databases was supposed to be in GET /_cat/indices all long": elastic/elasticsearch#74687

And yet, it can't be queries with /_search and other APIS... so for example, Kibana started skipping it: elastic/kibana#104155

The full list (being updated as Elasticsearch changes) is here: elastic/elasticsearch#50251

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ran tests on a couple versions (https://github.com/uptake/uptasticsearch/actions/runs/13405276725/job/37443973224?pr=245) ... it seems to me like 0 "system indices" are returned by GET /_cat/indices in v7.01, v8.01, or v8.17.2.

But who knows, maybe that is an artifact of the default settings in the different container images, not actual changes in Elasticsearch.

And maybe we're only seeing .geoip_databases and not other system indices because in the 7.17.22 images, the ES functionality that uses whatever that is is enabled.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more: https://discuss.elastic.co/t/why-cat-indices-shows-system-indices/337837

The index setting index.hidden will determine if the index is displayed or hidden by default. Default is false.

@@ -53,14 +83,14 @@ get_fields <- function(es_host
log_warn(sprintf(
paste0(
"You are running Elasticsearch version '%s.x'. _all is not supported in this version."
, " Pulling all indices with 'POST /_cat/indices' for you." # nolint[non_portable_path]
, " Pulling all indices with 'GET /_cat/indices' for you." # nolint[non_portable_path]
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another long-lived bug! This has been a GET request since the moment it was introduced back in 2019 😬

ref: #161

@jameslamb jameslamb marked this pull request as ready for review February 21, 2025 02:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
maintenance miscellaneous maintenance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant