Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FIx Request Handling on Requests with Trailing Headers (Mainly in ChunkedContentDecoder) #8753

Merged
merged 1 commit into from
Feb 6, 2025

Conversation

shirady
Copy link
Contributor

@shirady shirady commented Feb 3, 2025

Explain the changes

Background
The ChunkedContentDecoder pipes only the actual content from the body (no size, headers, CR NL, trailers, etc.). Although we added the handling for the extension and trailers in the class - we don't use them.

  1. In http_utils.js accept more types of content sha256 headers (STREAMING-UNSIGNED-PAYLOAD-TRAILER, STREAMING-AWS4-HMAC-SHA256-PAYLOAD-TRAILER), as without those headers on clients that add the checksum headers with trailing we would fail.
  2. Change the state of the machine and add more states to support the trailing headers: STATE_READ_TRAILER (like we have STATE_READ_CHUNK_HEADER), STATE_WAIT_NL_TRAILER (like we have STATE_WAIT_NL_DATA) and STATE_WAIT_NL_END.
  3. Set the following constants to limit the request (avoid the client from abuse):
  • MAX_CHUNK_SIZE - we want to have a lower number than Number.MAX_SAFE_INTEGER (reference), we expect lower number.
  • MAX_CHUNK_HEADER_SIZE - we don't expect that long (as it is saved in memory during the parsing).
  • MAX_TRAILER_SIZE - same, in the example we saw it was about ~30 (x-amz-checksum-crc32:uOMGCw==).
  • MAX_TRAILERS - currently we saw the trailer of checksum (x-amz-checksum-crc32:uOMGCw==), we expect to have a few trailers in a request.
  1. Refactor and organize - add comments with explanations about the state machine, add helper functions, add separation between the parts, rename chunk_header_str to chunk_header, add members related to trailers, add the member this.stream_pos which we use for validation.
  2. Improve building the string (either this.chunk_header, and this.trailer) so we won't build it byte by byte, but only after we find the CR ('\r`).
  3. Replace buffer slice function with subarray as the function slice was deprecated (see reference).

Issues:

  1. There was an update with some of the AWS clients (for example):

List of GAPs:

  1. We have an implementation GAP about the checksum calculation as was reported in #S3 checksum support for validating object integrity #8487.
  2. As mentioned in Add AWS SDK Go V2 Packages and a Test Script Using It  noobaa-operator#1521 we would like to have more types of clients that would run in the CI, AWS SDK Go V2 is planned.
  3. Add a diagram for the current state machine (see comment).

Testing Instructions:

Unit Tests (automatic)

This tests the behavior of the ChunkedContentDecoder to various of inputs and the checks are cut differently.
Please notice that it runs in a loop so the random cut of the chunks will be well-covered.
Please run: npx jest test_chunked_content_decoder.test.js.

End-to-end Test (manual)

Use a client whose version was updated and adds the checksum trailers.
From what I experienced with this test the bucks were the same on every run, but it is not something that is guaranteed (therefore, the unit tests are covering this).

As mentioned in PR noobaa/noobaa-operator#1521, for example: go run script/client_script.go -bucket fourteen.bucket -key test-key -mpu test-mpu-key -endpoint "https://localhost:12443" -disable-deletion and expect to see all tests passing:

Running a couple of tests using AWS SDK Go V2...
--------------------------------------------------
Running on configured endpoint https://localhost:12443

creating bucket fourteen.bucket
succeeded in create bucket fourteen.bucket.

put object test-key in bucket fourteen.bucket
succeeded in put object test-key bucket fourteen.bucket.

create multi part upload test-mpu-key in bucket fourteen.bucket
succeeded in create multipart upload test-mpu-key bucket fourteen.bucket.

upload part test-mpu-key in bucket fourteen.bucket
succeeded in upload part 1 test-mpu-key in bucket fourteen.bucket.

complete multi part test-mpu-key in bucket fourteen.bucket
succeeded in upload part test-mpu-key in bucket fourteen.bucket.
--------------------------------------------------
disable-deletion flag was set, will not operate delete commands
--------------------------------------------------

Total Tests: 5
Passing Tests:  5 [createBucket putObject createMultipartUpload uploadPart completeMultiPartUpload]
Failing Tests:  0 []
Skipped Tests:  0 []
  • Doc added/updated
  • Tests added

@shirady
Copy link
Contributor Author

shirady commented Feb 3, 2025

Attached printing from the logs as an example:

In the headers of the request we can see 'x-amz-trailer': 'x-amz-checksum-crc32'.

Feb-3 14:35:44.552 [Endpoint/596]   [LOG] core.endpoint.s3.s3_rest:: SDSD print headers (req.headers) { host: 'localhost:12443', 'user-agent': 'aws-sdk-go-v2/1.34.0 ua/2.1 os/macos lang/go#1.23.2 md/GOOS#darwin md/GOARCH#arm64 api/s3#1.74.0 m/E,Z', 'content-length': '58', 'accept-encoding': 'identity', 'amz-sdk-invocation-id': 'ba0504c2-9712-4960-920a-7d45d93b1139', 'amz-sdk-request': 'attempt=1; max=3', authorization: 'AWS4-HMAC-SHA256 Credential=75GCqPFLPiMNYLkp54WY/20250203/us-east-1/s3/aws4_request, SignedHeaders=accept-encoding;amz-sdk-invocation-id;amz-sdk-request;content-encoding;content-length;content-type;host;x-amz-content-sha256;x-amz-date;x-amz-decoded-content-length;x-amz-trailer, Signature=c94d61a052a669f7d84c5af8f3d2d8fdbbd32a5927720a04f9dbb10a683bdc73', 'content-encoding': 'aws-chunked', 'content-type': 'application/octet-stream', 'x-amz-content-sha256': 'STREAMING-UNSIGNED-PAYLOAD-TRAILER', 'x-amz-date': '20250203T143545Z', 'x-amz-decoded-content-length': '16', 'x-amz-trailer': 'x-amz-checksum-crc32' }

We added some printings in the function parse(buf) to see how we parse the Chunked HTTP request and how it moves to different state machines.

Feb-3 14:35:44.553 [Endpoint/596]   [LOG] CONSOLE:: SDSD JSON.stringify(buf.toString()) "10\r\n"
Feb-3 14:35:44.553 [Endpoint/596]   [LOG] CONSOLE:: SDSD header_items [ '10' ]
Feb-3 14:35:44.553 [Endpoint/596]   [LOG] CONSOLE:: SDSD this.state STATE_SEND_DATA
Feb-3 14:35:44.554 [Endpoint/596]   [LOG] CONSOLE:: SDSD JSON.stringify(buf.toString()) "body for example"
Feb-3 14:35:44.554 [Endpoint/596]   [LOG] CONSOLE:: SDSD this.state STATE_WAIT_CR_DATA
Feb-3 14:35:44.554 [Endpoint/596]   [LOG] CONSOLE:: SDSD JSON.stringify(buf.toString()) "\r\n0\r\n"
Feb-3 14:35:44.554 [Endpoint/596]   [LOG] CONSOLE:: SDSD header_items [ '0' ]
Feb-3 14:35:44.554 [Endpoint/596]   [LOG] CONSOLE:: SDSD this.state STATE_SEND_DATA
Feb-3 14:35:44.554 [Endpoint/596]   [LOG] CONSOLE:: SDSD JSON.stringify(buf.toString()) "x-amz-checksum-crc32:uOMGCw==\r\n"
Feb-3 14:35:44.554 [Endpoint/596]   [LOG] CONSOLE:: SDSD in STATE_CONTENT_END
Feb-3 14:35:44.554 [Endpoint/596]   [LOG] CONSOLE:: SDSD this.state STATE_CONTENT_END
Feb-3 14:35:44.554 [Endpoint/596]   [LOG] CONSOLE:: SDSD JSON.stringify(buf.toString()) "\r\n"
Feb-3 14:35:44.554 [Endpoint/596]   [LOG] CONSOLE:: SDSD in STATE_CONTENT_END
Feb-3 14:35:44.554 [Endpoint/596]   [LOG] CONSOLE:: SDSD this.state STATE_CONTENT_END

The client is using PutObject with body "body for example", it is 16 characters (decimal) in hexadecimal base it is 10.
Analyzing the HTTP Chunked Request.

  • 10\r\n (chunk header -> first chunk is 16 bytes)
  • body for example (content -> the chunk itself)
  • \r\n (end of chunk)
  • 0\r\n (end of request body)
  • x-amz-checksum-crc32:uOMGCw==\r\n (trailing header) -> we don't want to fail the request on this chunk (currently we don't validate it).

@shirady shirady requested review from guymguym and jackyalbo February 3, 2025 15:26
@shirady shirady self-assigned this Feb 3, 2025
@shirady shirady force-pushed the checksum-issue-checks branch from 057ec16 to d7251c3 Compare February 4, 2025 11:41
@pull-request-size pull-request-size bot added size/M and removed size/S labels Feb 4, 2025
@shirady shirady force-pushed the checksum-issue-checks branch 2 times, most recently from a12235c to 385f83c Compare February 4, 2025 19:51
@shirady shirady requested a review from guymguym February 4, 2025 19:55
@pull-request-size pull-request-size bot added size/L and removed size/M labels Feb 5, 2025
@shirady shirady changed the title Improve Request Handling on Requests with Trailing Headers FIx Request Handling on Requests with Trailing Headers (Mainly in ChunkedContentDecoder) Feb 5, 2025
src/util/chunked_content_decoder.js Outdated Show resolved Hide resolved
src/util/chunked_content_decoder.js Outdated Show resolved Hide resolved
src/util/chunked_content_decoder.js Show resolved Hide resolved
@shirady shirady force-pushed the checksum-issue-checks branch from 141aa75 to 0cae790 Compare February 5, 2025 14:19
@pull-request-size pull-request-size bot added size/XL and removed size/L labels Feb 5, 2025
@guymguym guymguym marked this pull request as ready for review February 5, 2025 15:05
@guymguym
Copy link
Member

guymguym commented Feb 5, 2025

squash?

…nkedContentDecoder)

1. In http_utils.js accept more types of content sha256 headers (STREAMING-UNSIGNED-PAYLOAD-TRAILER, STREAMING-AWS4-HMAC-SHA256-PAYLOAD-TRAILER), as without those headers on clients that add the checksum headers with trailing we would fail.
2. Change the state of the machine and add more states to support the trailing headers: STATE_READ_TRAILER (like we have STATE_READ_CHUNK_HEADER), STATE_WAIT_NL_TRAILER (like we have STATE_WAIT_NL_DATA) and STATE_WAIT_NL_END.
3. Set the following constants to limit the request (avoid the client from abuse):
   - MAX_CHUNK_SIZE - we want to have a lower number than Number.MAX_SAFE_INTEGER (reference), we expect lower number.
   - MAX_CHUNK_HEADER_SIZE - we don't expect that long (as it is saved in memory during the parsing).
   - MAX_TRAILER_SIZE - same, in the example we saw it was about ~30 (x-amz-checksum-crc32:uOMGCw==).
   - MAX_TRAILERS - currently we saw the trailer of checksum (x-amz-checksum-crc32:uOMGCw==), we expect to have a few trailers in a request.
4. Refactor and organize - add comments with explanations about the state machine, add helper functions, add separation between the parts, rename chunk_header_str to chunk_header, add members related to trailers, add the member this.stream_pos which we use for validation.
5. Improve building the string (either this.chunk_header, and this.trailer) so we won't build it byte by byte, but only after we find the CR ('\r`).
6. Replace buffer slice function with subarray as the function slice was deprecated (see reference).

Co-authored-by: Guy Margalit <[email protected]>
Signed-off-by: shirady <[email protected]>
@shirady shirady force-pushed the checksum-issue-checks branch from 3e21106 to 9b5cc89 Compare February 5, 2025 16:33
@guymguym
Copy link
Member

guymguym commented Feb 5, 2025

@jackyalbo -

---
config:
  theme: dark
  look: classic
  layout: elk
---
flowchart TD
    A -- not CR, append string --> A["STATE_READ_CHUNK_HEADER <br> read the chunk header until CR and parse it"]
    A -. CR, header parse problem .-> J["STATE_ERROR <br> an error occurred"]
    A -- "CR, chunk_size!=0" --> B["STATE_WAIT_NL_HEADER <br> wait for NL after the chunk header"]
    A -- "CR, chunk_size==0" --> F["STATE_READ_TRAILER <br> read optional trailer until CR and save it"]
    C -- data --> C["STATE_SEND_DATA <br> send chunk data to the stream until chunk size bytes sent"]
    C -- done size bytes --> D["STATE_WAIT_CR_DATA <br> wait for CR after the chunk data"]
    F -- not CR, append string --> F
    F -- CR, keep trailer --> G["STATE_WAIT_NL_TRAILER <br> wait for NL after non empty trailer"]
    F -- CR, empty trailer --> H["STATE_WAIT_NL_END <br> wait for NL after the last empty trailer"]
    D -- CR --> E["STATE_WAIT_NL_DATA <br> wait for NL after the chunk data"]
    H -- NL --> I["STATE_CONTENT_END <br> the stream is done"]
    B -- NL --> C
    E -- NL --> A
    G -- NL --> F
    B -. not NL .-> J
    E -. not NL .-> J
    G -. not NL .-> J
    H -. not NL .-> J
    D -. not CR .-> J
    I -. any .-> J
Loading

@guymguym
Copy link
Member

guymguym commented Feb 5, 2025

But the rendered diagram on github looks worse than on the mermaid website -

Untitled diagram-2025-02-05-231416

@shirady shirady merged commit debd202 into noobaa:master Feb 6, 2025
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants