Update Forward Protocol Specification for zstd compression support #4758
I propose changes to the following two sections.
### CompressedPackedForward Mode
-It carries a series of events as a msgpack binary, compressed by gzip, on a single request. The supported compression algorithm is only gzip.
+It carries a series of events as a msgpack binary, compressed by gzip or zstd, on a single request.
-- `entries` is a gzipped binary chunk of `MessagePackEventStream`, which MAY be a concatenated binary of multiple gzip binary strings.
-- Client MUST send an option with `compressed` key with the value `gzip`.
-- Client MUST send a gzipped chunk as msgpack `bin` format.
+- `entries` is a gzip/zstd binary chunk of `MessagePackEventStream`, which MAY be a concatenated binary of multiple gzip/zstd binary strings.
+- Client MUST send an option with `compressed` key with the value `gzip` or `zstd`.
+- Client MUST send a gzip/zstd chunk as msgpack `bin` format.
- Server MUST accept `bin` format.
- Server MAY decompress and decode individual events on demand but MAY NOT do right after request arrival. It means it MAY costs less, compared to `Forward` mode, when decoding is not needed by any plugins.
https://github.com/fluent/fluentd/wiki/Forward-Protocol-Specification-v1.5#option
- Server MAY just ignore any options given.
- `size`: Clients MAY send the `size` option to show the number of event records in an entries by an integer as a value. Server can know the number of events without unpacking entries (especially for PackedForward and CompressedPackedForward mode).
- `chunk`: Clients MAY send the `chunk` option to confirm the server receives event records. The value is a string of Base64 representation of 128 bits `unique_id` which is an ID of a set of events.
-- `compressed`: Clients MUST send the `compressed` option with value `gzip` to tell servers that entries is `CompressedPackedForward`. Other values will be ignored.
+- `compressed`: Clients MUST send the `compressed` option with value `gzip` or `zstd` to tell servers that entries is `CompressedPackedForward`. Other values will be ignored.
```json
{"chunk": "p8n9gmxTQVC8/nh2wlKKeQ==", "size": 4097}
```
Does this revision require a voting process?

The changes suggested here look good to me.

If there are no objections, I would like to set it to v1.6 with the agreement of both Fluentd and Fluent Bit maintainers. @cosmo0920

LGTM

This would be good:
However, I have a question about these sentences. Currently, we are able to decompress compressed chunks one by one. So, on paper, we could also alternate between decompressing gzip-compressed chunks and zstd-compressed chunks.

@cosmo0920 Are you pointing to the case where different sources send differently compressed chunks? In the forward protocol, the message itself carries the compression type, so decompression follows the metadata of each compressed chunk.

Yes, I mean the case where different sources use the same in_forward. This will surely occur, because Fluentd can be used as an aggregator that collects from different Fluentd instances pointing at the same forward endpoint. So, we probably need to clarify this in the specification even if the implementation already handles it.

@cosmo0920

That doesn't answer my question. My question is: should we write down the current decompression behavior explicitly in the specification document of the forward protocol? The specification does not cover what the implementations actually do. Because the Forward Protocol is shared between Fluentd and Fluent Bit, we need to describe the behavior in the specification document rather than leave it to the implementations.

Got it, @cosmo0920. Previously, multiple sources could already send either gzip-compressed chunks or uncompressed ones. If we regard uncompressed data as one more compression type, the previous specification was already good enough to express that two types were supported. Now we are just adding one more type, so I am not sure we really need to be explicit here.

@cosmo0920 @Athishpranav2003
I see! I didn't know about this.
Fluentd only decompresses the whole of CompressedMessagePackEventStream.
So, it would be better to improve the text to avoid confusion. @cosmo0920

Ah, Fluentd also handles CompressedMessagePackEventStream one-by-one. That indicates:
This could happen if multiple senders send payloads to one aggregator. I'm not sure whether you misunderstood what I meant, but am I right here? I mean, the above situation could happen, and so could this sequence of mixed compression types. Note that I didn't mean that this mixing would happen within a single CompressedMessagePackEventStream.

This could work because Fluent Bit also decompresses payloads that are compressed with certain compression methods.

Oh! Sorry, I misunderstood! I see!
Thanks!

As @cosmo0920 says, it is possible for multiple senders to send in different compression types. As @Athishpranav2003 says, there is no problem because what was originally two types will only become three types. It would be better to clarify the description to avoid misunderstandings.

How about this?
- Server MUST accept `bin` format.
+- Server MUST decompress `entries` in the format according to the value of `compressed` key of the option for each msgpack binary.
- Server MAY decompress and decode individual events on demand but MAY NOT do right after request arrival. It means it MAY costs less, compared to `Forward` mode, when decoding is not needed by any plugins.
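
The rule proposed above, picking the decompressor for each request from its `compressed` option, can be sketched roughly as follows. This is a minimal illustration in Python, not the Fluentd or Fluent Bit implementation: `DECOMPRESSORS` and `decode_entries` are hypothetical names, `entries` is treated as opaque bytes (no msgpack decoding), and zstd is registered only when the `compression.zstd` module of the Python 3.14+ standard library is available.

```python
import gzip

# Map each value of the `compressed` option to a decompression function.
DECOMPRESSORS = {"gzip": gzip.decompress}

try:
    # Assumption: Python 3.14+ standard library; missing on older versions.
    from compression import zstd
    DECOMPRESSORS["zstd"] = zstd.decompress
except ImportError:
    pass


def decode_entries(entries: bytes, option: dict) -> bytes:
    """Return the raw event stream bytes for one request (hypothetical helper)."""
    decompress = DECOMPRESSORS.get(option.get("compressed"))
    if decompress is None:
        # The spec text says other values are ignored; this sketch follows
        # that by passing `entries` through unchanged.
        return entries
    return decompress(entries)


# Two senders hitting the same endpoint with different options.
stream = b"example event stream"
assert decode_entries(gzip.compress(stream), {"compressed": "gzip"}) == stream
assert decode_entries(stream, {}) == stream
```

Because the choice is made per request, an aggregator that receives gzip, zstd, and uncompressed payloads from different senders needs no shared state between them.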
I noticed the descriptions in these tables should also be changed.
https://github.com/fluent/fluentd/wiki/Forward-Protocol-Specification-v1.5#logs-type-3
#### Logs Type
name | Ruby type | msgpack format | content
--- | --- | --- | ---
tag | String | str | tag name
- entries | CompressedMessagePackEventStream | bin | gzipped msgpack stream of Entry
+ entries | CompressedMessagePackEventStream | bin | compressed msgpack stream of Entry
option | Hash | map | option including key "compressed" (required)
```json
[
"tag.name",
"<<CompressedMessagePackEventStream>>",
{"compressed": "gzip"}
]
```
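
As a rough illustration of the Logs Type layout above, the sketch below builds the three elements (`tag`, `entries`, `option`) with a gzip-compressed `entries` payload. It assumes the chunks are already msgpack-encoded and leaves out the final msgpack serialization of the array; `build_compressed_message` is a hypothetical helper, not part of Fluentd.

```python
import gzip


def build_compressed_message(tag: str, packed_chunks: list[bytes]) -> tuple:
    """Build (tag, entries, option) for CompressedPackedForward with gzip.

    Each chunk is compressed separately and the results are concatenated,
    which the spec permits for `entries`.
    """
    entries = b"".join(gzip.compress(chunk) for chunk in packed_chunks)
    option = {"compressed": "gzip"}
    return tag, entries, option


tag, entries, option = build_compressed_message("tag.name", [b"first", b"second"])
# gzip.decompress reads every concatenated member, so the receiver
# recovers the chunks joined in their original order.
assert gzip.decompress(entries) == b"firstsecond"
```

The same shape would apply to zstd with `{"compressed": "zstd"}`, provided the receiver supports it.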
#### Metrics or Traces Type
name | Ruby type | msgpack format | content
--- | --- | --- | ---
tag | String | str | tag name
- entries | Msgpack stream for observabilities | bin | gzipped msgpack stream of Entry
+ entries | Msgpack stream for observabilities | bin | compressed msgpack stream of Entry
option | Hash | map | option including key "compressed" and "fluent\_signal" (required)
```json
[
"tag.name",
"<<Compressed payloads of observabilities>>",
{"compressed": "gzip", "fluent_signal": 1|2} # 1 for metrics and 2 for traces.
]
```
Based on the above, I re-propose the following change as v1.6.
### CompressedPackedForward Mode
-It carries a series of events as a msgpack binary, compressed by gzip, on a single request. The supported compression algorithm is only gzip.
+It carries a series of events as a msgpack binary, compressed by gzip or zstd, on a single request.
-- `entries` is a gzipped binary chunk of `MessagePackEventStream`, which MAY be a concatenated binary of multiple gzip binary strings.
-- Client MUST send an option with `compressed` key with the value `gzip`.
-- Client MUST send a gzipped chunk as msgpack `bin` format.
+- `entries` is a gzip/zstd binary chunk of `MessagePackEventStream`, which MAY be a concatenated binary of multiple gzip/zstd binary strings.
+- Client MUST send an option with `compressed` key with the value `gzip` or `zstd`.
+- Client MUST send a gzip/zstd chunk as msgpack `bin` format.
- Server MUST accept `bin` format.
+- Server MUST decompress `entries` in the format according to the value of `compressed` key of the option for each msgpack binary.
- Server MAY decompress and decode individual events on demand but MAY NOT do right after request arrival. It means it MAY costs less, compared to `Forward` mode, when decoding is not needed by any plugins.
https://github.com/fluent/fluentd/wiki/Forward-Protocol-Specification-v1.5#logs-type-3
#### Logs Type
name | Ruby type | msgpack format | content
--- | --- | --- | ---
tag | String | str | tag name
- entries | CompressedMessagePackEventStream | bin | gzipped msgpack stream of Entry
+ entries | CompressedMessagePackEventStream | bin | compressed msgpack stream of Entry
option | Hash | map | option including key "compressed" (required)
https://github.com/fluent/fluentd/wiki/Forward-Protocol-Specification-v1.5#metrics-or-traces-type-3
#### Metrics or Traces Type
name | Ruby type | msgpack format | content
--- | --- | --- | ---
tag | String | str | tag name
- entries | Msgpack stream for observabilities | bin | gzipped msgpack stream of Entry
+ entries | Msgpack stream for observabilities | bin | compressed msgpack stream of Entry
option | Hash | map | option including key "compressed" and "fluent\_signal" (required)
https://github.com/fluent/fluentd/wiki/Forward-Protocol-Specification-v1.5#option
- Server MAY just ignore any options given.
- `size`: Clients MAY send the `size` option to show the number of event records in an entries by an integer as a value. Server can know the number of events without unpacking entries (especially for PackedForward and CompressedPackedForward mode).
- `chunk`: Clients MAY send the `chunk` option to confirm the server receives event records. The value is a string of Base64 representation of 128 bits `unique_id` which is an ID of a set of events.
-- `compressed`: Clients MUST send the `compressed` option with value `gzip` to tell servers that entries is `CompressedPackedForward`. Other values will be ignored.
+- `compressed`: Clients MUST send the `compressed` option with value `gzip` or `zstd` to tell servers that entries is `CompressedPackedForward`. Other values will be ignored.
```json
{"chunk": "p8n9gmxTQVC8/nh2wlKKeQ==", "size": 4097}
```
I propose the following:
-- Server MUST decompress `entries` in the format according to the value of `compressed` key of the option for each msgpack binary.
+- Server MUST decompress `entries` in the format according to the value of `compressed` key of the option which contains `gzip` value for each gzip compressed msgpack binary.
+- Server MAY decompress `entries` in the format according to the value of `compressed` key of the option which contains `zstd` value for each zstd compressed msgpack binary.
+- Server MUST decompress `entries` that are in gzip or zstd compressed formats if the server supports both decompression formats.
This is because zstd support in Fluent Bit is still at the PoC stage. So, we need to leave zstd compression and decompression support optional in the forward protocol.

Hmm, but wouldn't that be an incomplete protocol?
Isn't it simply a matter of not being able to use zstd compression until both the server and the client support the v1.6 protocol? If we find any problems in supporting this protocol in the future, we can revise it again at that time.

How about extending HELO (and PING/PONG) in forward protocol v1.6?

I mean, zstd support may be delayed in Fluent Bit. So, we need a way to inform clients that want to send their payloads over the forward protocol. Plus, in some users' deployments, the clients may be rolled out first and the aggregator later.

I see!

@daipom so essentially do we do a handshake to ensure client supports

Yup. HELO can contain options. Currently, it contains nonce, auth, and keepalive. So, we need to add a compression option there, like:
ref: https://github.com/fluent/fluentd/wiki/Forward-Protocol-Specification-v1.5#helo

@Athishpranav2003 The protocol would be as @cosmo0920 says. We need to implement it with Let's do that with a different PR than #4657.

fluentd/lib/fluent/plugin/out_forward.rb Lines 680 to 683 in 30c3ce0
The options of HELO would be added into

@daipom So essentially, we assume that the client to be forwarded to is only known at runtime, so we perform this check when establishing the connection, and if the check fails we decompress the data and send it instead of raising an error, right? (I presume we emit at least a warning log.)

@Athishpranav2003
It would be preferable to fall back to plain format rather than to treat it as an error.

For this, can I simply add the compression option in the We can either make it start from the server side or have the client side simply add it if it supports the option

Sorry... I missed that HELO is not mandatory. The Handshake process runs only when the server requires authentication. I wonder why https://github.com/fluent/fluentd/wiki/Forward-Protocol-Specification-v1.5#helo When the server doesn't require authentication, this information cannot be used. Sorry I didn't notice it first, but I wonder if it might not be desirable to add the info of supported compression format to the options of the Handshake for authentication. The fallback feature we are considering now would be provided as a limited mechanism that would be enabled only in cases where the server requires authentication. I'd like to hear your views on this point. @cosmo0920 @Athishpranav2003

We refer to keepalive in the options: https://github.com/fluent/fluent-bit/blob/master/plugins/out_forward/forward.c#L357
Yes. It should be enough when authentication is enabled in the forward protocol. HELO is not mandatory so that a fluent-logger can fulfill the forward protocol with a minimal mechanism. The minimal forward protocol behaves like a plain TCP connection. My thought for Forward protocol v1.6 is: enable zstd compression only when the HELO option tells the client that the server is compatible with zstd compression. This is because zstd-compressed or gzip-compressed chunks are simply opaque binaries when the target Fluent server doesn't support those compression methods. So, I took a pessimistic approach to ensure that chunks compressed with zstd or gzip can be decompressed.

Thanks! It appears that this value is not being used. My concern is the validity of including information that has nothing to do with authentication.
The client cannot get the HELO option in cases where the server does not require authentication.

I mean, we don't use zstd when no HELO is received, just like the upgrade from HTTP/1.1 to HTTP/2.
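
The fallback described here, using zstd only when a HELO tells the client that the server supports it, might look like the sketch below. The `compression` key inside the HELO options is purely hypothetical (the thread does not settle on a name or shape for it), and `choose_compression` is an illustrative helper rather than anything in Fluentd or Fluent Bit.

```python
from typing import Optional


def choose_compression(preferred: str, helo_options: Optional[dict]) -> str:
    """Pick the compression a client uses for CompressedPackedForward.

    helo_options is None when no HELO was received (the server does not
    require authentication), so zstd support cannot be known.
    """
    if preferred != "zstd":
        return preferred
    if helo_options is None:
        return "gzip"  # no handshake: stay on the format v1.5 servers know
    # Hypothetical option: a list of formats the server can decompress.
    supported = helo_options.get("compression", [])
    return "zstd" if "zstd" in supported else "gzip"


assert choose_compression("zstd", None) == "gzip"
assert choose_compression("zstd", {"keepalive": True, "compression": ["gzip", "zstd"]}) == "zstd"
```

As noted in the thread, this only helps when the server requires authentication, since otherwise no HELO is sent at all.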

Or, could adding a protocol upgrade message from the client work for our case?

In this case, the first compressed chunk will be compressed with gzip and an upgrade option will be attached, like:
UPDATE: Ugh, this doesn't work because no explicit failure response is defined....

I see..., thanks for considering!
Hmm, it seems a bit incongruous to me that I am considering that we should not change the forward protocol at this time. First, we make Fluentd handle zstd internally and revert Fluentd's out_forward zstd feature to avoid violating the protocol.

Or, add #4758 (comment) something like the following:
Or, for now, we will make zstd available as an experimental feature on the Fluentd side, and note in the documentation that it is an experimental feature that cannot be used unless the server supports it.

I also strongly feel that it is more suitable to put this in as an important note and leave it to the responsibility of the users.

Reverting zstd compression support does not work for us. Just displaying a warning log such as "This is an experimental feature and not standardized yet" should be enough. I mean, adding zstd compression with an upgrade mechanism could allow a seamless upgrade to zstd compression in the forward protocol. However, we didn't find a suitable way to prevent unsupported-format errors on "fluent servers".

@Athishpranav2003 @cosmo0920

Is your feature request related to a problem? Please describe.
The following PR supports zstd compression.
So, we need to update Forward Protocol Specification - CompressedPackedForward Mode.
Describe the solution you'd like
Update the following description and add the `zstd` value to the `compressed` option.
Would that be Forward Protocol Specification v1.6?
Describe alternatives you've considered
Having no idea.
Additional context
No response