Always retry full-request failures #523

Open · wants to merge 1 commit into base: main from fix-retry-behavior
Conversation

@andrewvc (Contributor) commented Dec 6, 2016

We now always retry full-request failures. It makes no sense to stop retrying a failed bulk request, because the bulk API should always return 200; in any other situation we may lose data. A full request can, however, legitimately fail when a too-large event produces a too-large request body. To handle this we now check the request size before sending, and have added a new max_request_size option to configure the limit.

Fixes #522
Fixes #321
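(A minimal sketch of the pre-flight check described above; `serialize_for_bulk`, `max_request_size`, and the logger calls are illustrative stand-ins, not the plugin's actual internals.)

```ruby
# Sketch only: reject a single event whose serialized bulk payload exceeds
# the configured limit, instead of sending a request guaranteed to fail.
def event_sendable?(event)
  payload = serialize_for_bulk(event) # hypothetical helper: action line + source
  if payload.bytesize > max_request_size
    logger.warn("Event too large to send, dropping",
                :size_bytes => payload.bytesize, :limit => max_request_size)
    return false
  end
  true
end
```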

@andrewvc (Contributor, Author) commented Dec 6, 2016

Alternate approach: maybe we don't retry 400s, but instead detect too-large requests ahead of time (AOT).

@jordansissel (Contributor) commented:

If we assume 400 could mean too-large, we could just retry with a smaller request on 400s?

(Detecting too-large requires we know the maximum size of a request body in Elasticsearch before we send the request, which may not be practical to obtain? I'm open to whatever)

@andrewvc (Contributor, Author) commented Dec 6, 2016

@jordansissel this only happens if there is a single event larger than the limit. We already attempt to split up requests in this case, but we obviously cannot subdivide a single event.
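(For illustration, the splitting behavior mentioned above might look roughly like this sketch; the names are hypothetical. The base case shows why a single oversized event cannot be helped by splitting.)

```ruby
# Sketch only: recursively bisect a batch until each request body fits.
# This works for many small events, but cannot help a single huge event.
def submit(events)
  body = events.map { |e| serialize_for_bulk(e) }.join # hypothetical serializer
  return bulk_send(body) if body.bytesize <= max_request_size

  if events.size == 1
    # A lone event over the limit cannot be subdivided further.
    logger.warn("Single event exceeds max_request_size", :size_bytes => body.bytesize)
    return
  end

  mid = events.size / 2
  submit(events[0...mid])
  submit(events[mid..-1])
end
```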

@andrewvc (Contributor, Author) commented Dec 6, 2016

The more I think about it the more I think we need a new option max_request_size. We just don't send any request larger than that, and that way we don't need a 'special' 400 case.

In the absence of the too-large-request scenario we should retry 400s indefinitely because that would indicate either:

  1. A bug in the output code
  2. A bug in the user's infrastructure (a bad proxy, say)

In either case the user wouldn't want to start dropping events; a stall would be appropriate.
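(Roughly, the retry policy being argued for; the error class and backoff numbers here are assumptions for illustration, not the plugin's actual code.)

```ruby
# Sketch only: retry every full-request failure indefinitely with capped
# exponential backoff, stalling the pipeline rather than dropping events.
def with_indefinite_retry
  attempts = 0
  begin
    yield
  rescue BulkRequestFailed => e # hypothetical error for any non-200 bulk response
    attempts += 1
    delay = [2 ** attempts, 64].min # capped exponential backoff, in seconds
    logger.error("Bulk request failed, retrying",
                 :message => e.message, :attempts => attempts)
    sleep(delay)
    retry
  end
end
```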

@jordansissel (Contributor) commented:

> In the absence of the too-large-request scenario we should retry 400s indefinitely

++

@andrewvc changed the title from "Always retry full-request failures except 400." to "Always retry full-request failures" on Dec 6, 2016
@andrewvc (Contributor, Author) commented Dec 6, 2016

@jordansissel OK, so I've changed the logic and updated the description per our discussion. Can I get a review from you?

@andrewvc (Contributor, Author) commented Dec 6, 2016

@jordansissel is there a plugin-api change I need to make now that we're using the bytes validation?
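(If the `:bytes` validator from the newer plugin API is what's being referred to, the declaration would look something like this; the default shown is illustrative.)

```ruby
# Hypothetical declaration: :validate => :bytes accepts strings such as "10mb"
# and coerces them to an integer byte count.
config :max_request_size, :validate => :bytes, :default => "20mb"
```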

@andrewvc force-pushed the fix-retry-behavior branch 2 times, most recently from 82d814c to b4341de on December 9, 2016 19:41
@@ -88,7 +93,7 @@ class LogStash::Outputs::ElasticSearch < LogStash::Outputs::Base
 # - A sprintf style string to change the action based on the content of the event. The value `%{[foo]}`
 # would use the foo field for the action
 #
-# For more details on actions, check out the http://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html[Elasticsearch bulk API documentation]
+# For more details on actions, check out the http://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html[Elasticsearch bthey are:ulk API documentation]
Review comment (Contributor):

typo inserting 'they are:' in the middle of the word 'bulk' ?

Reply (Contributor, Author):

Ah, clearly a complete ViM failure :)

#
# This plugin will try to send appropriately sized requests, using multiple
# requests per batch if the events are very large, but if a single event
# is larger than this value sending even that single request will break.
Review comment (Contributor):

"will break" --- it will break what?

# or other things that could account for a slight difference in the HTTTP request, as our size
# check only happens on the request body.
#
# This is specified as a number now, but we should move it to the new 'bytes' type
Review comment (Contributor):

"This is specified ..." will show up in the docs. I recommend not putting this comment here.

Reply (Contributor, Author):

I'll put in a note saying that we will change it in the future.

@andrewvc (Contributor, Author) commented:

@jordansissel can we move this forward to LGTM?

# or other things that could account for a slight difference in the HTTTP request, as our size
# check only happens on the request body.
#
# This is specified as a number now, but we will be moved to the new 'bytes' type
Review comment (Contributor):

I think this is a TODO item and we don't want it showing up in the docs (which it would, in its current location in a comment above this).

@jordansissel (Contributor) commented:

Once the comment-that-looks-like-a-todo-item thing is removed, LGTM :)

@andrewvc dismissed jordansissel's stale review December 15, 2016 18:50

Addressed your final point Jordan! Thanks for the LGTM!

@andrewvc force-pushed the fix-retry-behavior branch 2 times, most recently from fdf3cd7 to 95330c1 on December 16, 2016 16:58
We now always retry full-request failures. It makes no sense to stop
retrying a failed bulk request, because the bulk API should always
return 200; in any other situation we may lose data. A full request
can, however, legitimately fail when a too-large event produces a
too-large request body. To handle this we now check the request size
before sending, and have added a new max_request_size option to
configure the limit.

Fixes logstash-plugins#522
@andrewvc (Contributor, Author) commented:

For the life of me I can't repro these Travis failures. They ONLY happen on 5.1.1, and they definitely don't happen locally for me :(. The lack of SSH access into Travis makes this even more perplexing.

I tried some debugging last week, and it appeared that ES wasn't accepting TCP connections anymore on travis!

Bad travis, bad!

@jsvd @jordansissel any ideas here?

Successfully merging this pull request may close these issues.

- Non-retryable response codes from ES will be retried indefinitely
- Retry logic does the wrong thing