Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Observed a significant performance degradation in multipart uploads after upgrading from Helidon MP 1.4.14 to Helidon MP 4.1.1 #9678

Open
DeepakR-Oracle opened this issue Jan 23, 2025 · 14 comments
Assignees
Labels
4.x Version 4.x jax-rs JAX-RS and Jersey related issues performance
Milestone

Comments

@DeepakR-Oracle
Copy link

Issue Description:


We are experiencing a significant performance degradation in multipart file uploads after upgrading our application. Our application has an API that allows users to upload files using multipart (we use the jersey-media-multipart dependency). Previously, with Helidon MP 1.4.14 and JDK 8, it took approximately 4 seconds to ingest a 4.2MB file. After upgrading to Helidon MP 4.1.1 and JDK 21, we are observing a noticeable delay when uploading the same file ( in minutes ) .

Environment Details:


Kubernetes : v1.28.11
Kubernetes CNI : Flannel v0.25.4
Helidon MP : v4.1.1
JDK : 21

Information


Here's how our API looks currently.

@Consumes({MediaType.MULTIPART_FORM_DATA})
@Produces({MediaType.APPLICATION_JSON})
public Response uploadFile(
        @FormDataParam("file") InputStream fileOne,
        @FormDataParam("file") FormDataContentDisposition fileOneDisposition
) {
    if(Objects.nonNull(fileOne)) {
        byte[] buffer = new byte[2048];
        int readBytes;
        try (ByteArrayOutputStream bos = new ByteArrayOutputStream()) {
            while ((readBytes = fileOne.read(buffer, 0, buffer.length)) != -1) {
                bos.write(buffer, 0, readBytes);
            }
            bos.flush();
            logger.info("Finished reading all the data!");
        } catch (IOException e) {
            throw new ImportUploadException("Error occurred while reading file!", e);
        }
    }
}

Async profiler data :

async-profiler-data.zip

What we were able to observe internally is that, in Helidon 4.x, reading data from the socket seems to be handled by the LazyInputStream class, whereas this behavior is not observed in Helidon 1.x.

We are wondering if lazy reading might contribute to the slowness observed in multipart processing. Could the LazyInputStream implementation or its interaction with jersey-media-multipart be causing delays in reading and processing multipart data?

Additional Information


We have a GA next month, any timely help would be greatly appreciated.

@github-project-automation github-project-automation bot moved this to Triage in Backlog Jan 23, 2025
@spericas spericas self-assigned this Jan 23, 2025
@spericas spericas added 4.x Version 4.x jax-rs JAX-RS and Jersey related issues performance labels Jan 23, 2025
@spericas spericas moved this from Triage to In Progress in Backlog Jan 23, 2025
@spericas spericas added this to the 4.2.0 milestone Jan 23, 2025
@spericas
Copy link
Member

I'm not able to reproduce the problem so far. I've tried uploading a 4M multipart on Linux and MacOS and the upload times seem reasonable. Please see attached zip file.

>> mvn clean package
>> java -jar target/quickstart-mp.jar &
>> time curl -F [email protected] http://localhost:8080/greet/multi

quickstart-mp.zip

@spericas
Copy link
Member

spericas commented Jan 23, 2025

Looks like the problem is related to running the server on Linux and using a non-loopback interface. Here is a quick test using Mac and Linux:

  1. Linux client to mac server: 2.3 s
  2. Mac client to linux server: 12.3 s

We need to look a bit closer as to how Jersey is reading the data from Helidon and why it seems to performed badly on Linux.

@barchetta
Copy link
Member

For reference (a previous perf issue on Linux): #7983

@spericas
Copy link
Member

spericas commented Jan 27, 2025

Looks like this is relared to the low-level TCP receive buffer configuration. Apparently, this affects Linux in particular. Increasing that buffer size solves the problem as follows (e.g. setting it to 32K):

server.connection-options.socket-receive-buffer-size=32768

So it seems that the auto-tuning we enable in Linux does not work well in this case.

@DeepakR-Oracle
Copy link
Author

Hi !

We tested the suggested configuration, but it did not result in any change in the observed latency. Additionally, we reviewed the performance tuning documentation and experimented with other configurations, but no changes were observed.

server:
    receive-buffer-size: 32768
    write-buffer-size: 32768
    connection-options:
        socket-receive-buffer-size: 32768
        socket-send-buffer-size: 32768

@spericas
Copy link
Member

@DeepakR-Oracle That link may need to be updated, the only option you need is the one I listed above. I can clearly see the improvement if the receive buffer is increased, but let me double check one more time using version 4.1.1. If this works for me, I will provide step-by-step instructions on how I tested it for you to try out.

@spericas
Copy link
Member

spericas commented Jan 28, 2025

@DeepakR-Oracle The setting for the receive buffer size works for me. The scenario under test is described below.

Prerequisites

  • Helidon 4.1.1
  • Two machines, one for client one for server (issue not reproducible using localhost)
  • JDK 21
  • Test below uses two Ubuntu machines on a wi-fi network

Testing

  1. Unpack zip file attached here on server machine
  2. Build and run as follows
>> mvn clean install && java -jar target/quickstart-mp.jar
  1. On client machine create file for uploading and use curl as shown
>> dd if=/dev/zero of=zeros.bin bs=1k count=4000
4000+0 records in
4000+0 records out

>> time curl -v -F [email protected] http://<server-machine>:8080/greet/multi
* processing: http://ubuntu:8080/greet/multi
*   Trying 192.168.6.250:8080...
* Connected to ubuntu (192.168.6.250) port 8080
> POST /greet/multi HTTP/1.1
> Host: ubuntu:8080
> User-Agent: curl/8.2.1
> Accept: */*
> Content-Length: 4096201
> Content-Type: multipart/form-data; boundary=------------------------48f7a9742ba6e477
> Expect: 100-continue
> 
< HTTP/1.1 100 Continue
* We are completely uploaded and fine
< HTTP/1.1 200 OK
< Date: Tue, 28 Jan 2025 09:55:52 -0500
< Connection: keep-alive
< Content-Length: 0
< 
* Connection #0 to host ubuntu left intact

real	0m9.578s
user    0m0.009s
sys	0m0.014s
  1. Now edit microprofile-config.properties file in project and uncomment the line below
#server.connection-options.socket-receive-buffer-size=32768
  1. Rebuild and re-run Helidon app and try curl one more time from client machine
>> time curl -v -F [email protected] http://ubuntu:8080/greet/multi
* processing: http://ubuntu:8080/greet/multi
*   Trying 192.168.6.250:8080...
* Connected to ubuntu (192.168.6.250) port 8080
> POST /greet/multi HTTP/1.1
> Host: ubuntu:8080
> User-Agent: curl/8.2.1
> Accept: */*
> Content-Length: 4096201
> Content-Type: multipart/form-data; boundary=------------------------3cb03e79de62a746
> Expect: 100-continue
> 
< HTTP/1.1 100 Continue
* We are completely uploaded and fine
< HTTP/1.1 200 OK
< Date: Tue, 28 Jan 2025 09:58:11 -0500
< Connection: keep-alive
< Content-Length: 0
< 
* Connection #0 to host ubuntu left intact

real	0m1.048s
user	0m0.007s
sys	0m0.006s
  1. Test runs about 10X faster with the socket setting

quickstart-mp.zip

@DeepakR-Oracle
Copy link
Author

DeepakR-Oracle commented Jan 29, 2025

Hi,

We tested the provided files, but the issue still persists. :/

Environment :

Server :

>>> cat /etc/os-release 
NAME="Oracle Linux Server"
VERSION="8.10"
ID="ol"
ID_LIKE="fedora"
VARIANT="Server"
VARIANT_ID="server"
VERSION_ID="8.10"
PLATFORM_ID="platform:el8"
PRETTY_NAME="Oracle Linux Server 8.10"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:oracle:linux:8:10:server"
HOME_URL="https://linux.oracle.com/"
BUG_REPORT_URL="https://github.com/oracle/oracle-linux"

ORACLE_BUGZILLA_PRODUCT="Oracle Linux 8"
ORACLE_BUGZILLA_PRODUCT_VERSION=8.10
ORACLE_SUPPORT_PRODUCT="Oracle Linux"
ORACLE_SUPPORT_PRODUCT_VERSION=8.10

>>> java -version
java version "21.0.4" 2024-07-16 LTS
Java(TM) SE Runtime Environment (build 21.0.4+8-LTS-274)
Java HotSpot(TM) 64-Bit Server VM (build 21.0.4+8-LTS-274, mixed mode, sharing)

Client :

>>> system_profiler SPSoftwareDataType                                                                                                                                                                                                                                                                                                                                                 
Software:

    System Software Overview:

      System Version: macOS 14.7 (23H124)
      Kernel Version: Darwin 23.6.0
      Boot Volume: Macintosh HD
      Boot Mode: Normal
      Computer Name: rdeepak-mac
      User Name: R Deepak (rdeepak)
      Secure Virtual Memory: Enabled
      System Integrity Protection: Enabled
      Time since boot: 4 days, 11 hours, 33 minutes

>>> curl --version          
curl 8.7.1 (x86_64-apple-darwin23.0) libcurl/8.7.1 (SecureTransport) LibreSSL/3.3.6 zlib/1.2.12 nghttp2/1.61.0
Release-Date: 2024-03-27
Protocols: dict file ftp ftps gopher gophers http https imap imaps ipfs ipns ldap ldaps mqtt pop3 pop3s rtsp smb smbs smtp smtps telnet tftp
Features: alt-svc AsynchDNS GSS-API HSTS HTTP2 HTTPS-proxy IPv6 Kerberos Largefile libz MultiSSL NTLM SPNEGO SSL threadsafe UnixSockets

Saw the following warning log in the server at some point :

2025.01.29 15:33:22 WARNING io.helidon.webserver.http.ErrorHandlers VirtualThread[#47,[0x5b6fdc12 0x36d38e31] WebServer socket]/runnable@ForkJoinPool-1-worker-4: [0x5b6fdc12 0x36d38e31] Request failed: HttpPrologue[protocol=HTTP, protocolVersion=1.1, method=POST, uriPath=/greet/multi, query=, fragment=], cannot send error response, as response already sent
io.helidon.http.RequestException: Internal exception in JAX-RS processing
        at io.helidon.http.RequestException$Builder.build(RequestException.java:139)
        at io.helidon.webserver.http.ErrorHandlers.unhandledError(ErrorHandlers.java:197)
        at io.helidon.webserver.http.ErrorHandlers.runWithErrorHandling(ErrorHandlers.java:114)
        at io.helidon.webserver.http.Filters$FilterChainImpl.proceed(Filters.java:121)
        at io.helidon.webserver.observe.metrics.MetricsFeature.lambda$configureVendorMetrics$2(MetricsFeature.java:90)
        at io.helidon.webserver.http.Filters$FilterChainImpl.proceed(Filters.java:119)
        at io.helidon.common.context.Contexts.runInContext(Contexts.java:117)
        at io.helidon.webserver.context.ContextRoutingFeature.filter(ContextRoutingFeature.java:50)
        at io.helidon.webserver.http.Filters$FilterChainImpl.proceed(Filters.java:119)
        at io.helidon.webserver.http.Filters.executeFilters(Filters.java:87)
        at io.helidon.webserver.http.Filters.lambda$filter$0(Filters.java:83)
        at io.helidon.webserver.http.ErrorHandlers.runWithErrorHandling(ErrorHandlers.java:76)
        at io.helidon.webserver.http.Filters.filter(Filters.java:83)
        at io.helidon.webserver.http.HttpRoutingImpl.route(HttpRoutingImpl.java:73)
        at io.helidon.webserver.http1.Http1Connection.route(Http1Connection.java:425)
        at io.helidon.webserver.http1.Http1Connection.handle(Http1Connection.java:195)
        at io.helidon.webserver.ConnectionHandler.run(ConnectionHandler.java:166)
        at io.helidon.common.task.InterruptableTask.call(InterruptableTask.java:47)
        at io.helidon.webserver.ThreadPerTaskExecutor$ThreadBoundFuture.run(ThreadPerTaskExecutor.java:239)
        at java.base/java.lang.VirtualThread.run(VirtualThread.java:329)
Caused by: io.helidon.http.InternalServerException: Internal exception in JAX-RS processing
        at io.helidon.microprofile.server.JaxRsService.doHandle(JaxRsService.java:248)
        at io.helidon.microprofile.server.JaxRsService.lambda$handle$2(JaxRsService.java:185)
        at io.helidon.common.context.Contexts.runInContext(Contexts.java:117)
        at io.helidon.microprofile.server.JaxRsService.handle(JaxRsService.java:185)
        at io.helidon.webserver.http.HttpRoutingImpl$RoutingExecutor.doRoute(HttpRoutingImpl.java:165)
        at io.helidon.webserver.http.HttpRoutingImpl$RoutingExecutor.call(HttpRoutingImpl.java:124)
        at io.helidon.webserver.http.HttpRoutingImpl$RoutingExecutor.call(HttpRoutingImpl.java:102)
        at io.helidon.webserver.http.ErrorHandlers.runWithErrorHandling(ErrorHandlers.java:76)
        ... 17 more
Caused by: io.helidon.common.buffers.DataReader$InsufficientDataAvailableException
        at io.helidon.common.buffers.DataReader.pullData(DataReader.java:83)
        at io.helidon.common.buffers.DataReader.ensureAvailable(DataReader.java:111)
        at io.helidon.webserver.http1.Http1Connection.readLengthEntity(Http1Connection.java:307)
        at io.helidon.webserver.http1.Http1Connection.readEntityFromPipeline(Http1Connection.java:267)
        at io.helidon.webserver.http1.Http1Connection.lambda$route$2(Http1Connection.java:416)
        at io.helidon.webserver.http1.Http1ServerRequestWithEntity.lambda$new$0(Http1ServerRequestWithEntity.java:71)
        at io.helidon.http.media.ReadableEntityBase$RequestingInputStream.ensureBuffer(ReadableEntityBase.java:254)
        at io.helidon.http.media.ReadableEntityBase$RequestingInputStream.close(ReadableEntityBase.java:233)
        at io.helidon.microprofile.server.LazyInputStream.close(LazyInputStream.java:85)
        at org.glassfish.jersey.message.internal.EntityInputStream.close(EntityInputStream.java:135)
        at org.glassfish.jersey.message.internal.InboundMessageContext$EntityContent.close(InboundMessageContext.java:132)
        at org.glassfish.jersey.message.internal.InboundMessageContext$EntityContent.close(InboundMessageContext.java:124)
        at org.glassfish.jersey.message.internal.ReaderWriter.safelyClose(ReaderWriter.java:284)
        at org.glassfish.jersey.message.internal.InboundMessageContext.readEntity(InboundMessageContext.java:679)
        at org.glassfish.jersey.message.internal.InboundMessageContext.readEntity(InboundMessageContext.java:591)
        at org.glassfish.jersey.server.ContainerRequest.readEntity(ContainerRequest.java:253)
        at org.glassfish.jersey.media.multipart.internal.FormDataParamValueParamProvider$ValueProvider.getEntity(FormDataParamValueParamProvider.java:88)
        at org.glassfish.jersey.media.multipart.internal.FormDataParamValueParamProvider$FormDataParamValueProvider.apply(FormDataParamValueParamProvider.java:245)
        at org.glassfish.jersey.media.multipart.internal.FormDataParamValueParamProvider$FormDataParamValueProvider.apply(FormDataParamValueParamProvider.java:226)
        at org.glassfish.jersey.server.spi.internal.ParamValueFactoryWithSource.apply(ParamValueFactoryWithSource.java:50)
        at org.glassfish.jersey.server.spi.internal.ParameterValueHelper.getParameterValues(ParameterValueHelper.java:64)
        at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$AbstractMethodParamInvoker.getParamValues(JavaResourceMethodDispatcherProvider.java:109)
        at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$ResponseOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:176)
        at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:93)
        at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:478)
        at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:400)
        at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:81)
        at org.glassfish.jersey.server.ServerRuntime$1.run(ServerRuntime.java:274)
        at org.glassfish.jersey.internal.Errors$1.call(Errors.java:248)
        at org.glassfish.jersey.internal.Errors$1.call(Errors.java:244)
        at org.glassfish.jersey.internal.Errors.process(Errors.java:292)
        at org.glassfish.jersey.internal.Errors.process(Errors.java:274)
        at org.glassfish.jersey.internal.Errors.process(Errors.java:244)
        at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:266)
        at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:253)
        at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:696)
        at io.helidon.microprofile.server.JaxRsService.doHandle(JaxRsService.java:228)

This is what we are seeing even after un-commenting the receiver buffer configuration ( Note, the server was killed after 2 minutes of waiting, hence the Broken pipe )

time curl -v -F [email protected] http://cagbu-phx-979.snphxprshared1.gbucdsint02phx.oraclevcn.com:8080/greet/multi                                                                                                                                                                                                                                                          
* Host cagbu-phx-979.snphxprshared1.gbucdsint02phx.oraclevcn.com:8080 was resolved.
* IPv6: (none)
* IPv4: 100.77.43.138
*   Trying 100.77.43.138:8080...
* Connected to cagbu-phx-979.snphxprshared1.gbucdsint02phx.oraclevcn.com (100.77.43.138) port 8080
> POST /greet/multi HTTP/1.1
> Host: cagbu-phx-979.snphxprshared1.gbucdsint02phx.oraclevcn.com:8080
> User-Agent: curl/8.7.1
> Accept: */*
> Content-Length: 4096213
> Content-Type: multipart/form-data; boundary=------------------------qID3LEO7MJvnxGvgLiyxdh
> Expect: 100-continue
> 
< HTTP/1.1 100 Continue
< 
* Send failure: Broken pipe
* Closing connection
curl: (55) Send failure: Broken pipe
curl -v -F [email protected]   0.02s user 0.04s system 0% cpu 2:10.04 total

What's something interesting is, we tried enabling SSH port-forwarding and the upload is relatively very quick. Here is how the curl looks with port forwarding enabled :

 time curl -v -F [email protected] http://localhost:8080/greet/multi                                                                                                                                                                                                                                                                                                       
* Host localhost:8080 was resolved.
* IPv6: ::1
* IPv4: 127.0.0.1
*   Trying [::1]:8080...
* connect to ::1 port 8080 from ::1 port 59058 failed: Connection refused
*   Trying 127.0.0.1:8080...
* Connected to localhost (127.0.0.1) port 8080
> POST /greet/multi HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/8.7.1
> Accept: */*
> Content-Length: 4096213
> Content-Type: multipart/form-data; boundary=------------------------RSxiGiGzaaVZNzUsACWrUr
> Expect: 100-continue
> 
* Empty reply from server
* Closing connection
curl: (52) Empty reply from server
curl -v -F [email protected] http://localhost:8080/greet/multi  0.01s user 0.02s system 5% cpu 0.423 total

Thanks !

@spericas
Copy link
Member

spericas commented Jan 29, 2025

@DeepakR-Oracle That's interesting. Honestly, this looks more like a network issue now. Is HTTP traffic routed differently? Are there any load balancers along the way? I just don't know how to connect your findings with anything we do in Helidon. I'll ask PSR for some help evaluating this.

@DeepakR-Oracle
Copy link
Author

DeepakR-Oracle commented Jan 30, 2025

@spericas There are no load balancers or intermediate layers, and we don’t see this issue with any other HTTP requests—only those involving multipart data.

We initially suspected a network-related too, but this bug was raised because the same environment works fine with Helidon 1.x and JDK 8. The latency issue appears only with Helidon 4.x and JDK 21.

Thanks!

@spericas
Copy link
Member

@DeepakR-Oracle We are going to attempt to do some testing of this on OCI to see if we can reproduce the problem. We obviously need to reproduce this before we can try to fix it.

@spericas
Copy link
Member

spericas commented Jan 30, 2025

I managed to do some testing using two OCI machines on the same subnet. I'm not able to reproduce the same upload problem described above using the 2 Ubuntu systems on a wi-fi network. In particular, the setting

server.connection-options.socket-receive-buffer-size=32768

does not help nor hurt the upload performance, which is about 1.5 secs for a 4 MB file.

(helidon-client:~) >> time curl -v -F [email protected] http://<oci-server>:8080/greet/multi
* About to connect() to <oci-server> port 8080 (#0)
*   Trying XX.XX.XX.XX..
* Connected to <oci-server> (XX.XX.XX.XX) port 8080 (#0)
> POST /greet/multi HTTP/1.1
> User-Agent: curl/7.29.0
> Host: <oci-server>:8080
> Accept: */*
> Content-Length: 4096201
> Expect: 100-continue
> Content-Type: multipart/form-data; boundary=----------------------------845fde1fc189
> 
< HTTP/1.1 100 Continue
< HTTP/1.1 200 OK
< Date: Thu, 30 Jan 2025 15:56:08 GMT
< Connection: keep-alive
< Content-Length: 0
< 
* Connection #0 to host <oci-server> left intact

real	0m1.542s
user	0m0.003s
sys	0m0.013s

Server version is:

NAME="Oracle Linux Server"
VERSION="7.8"

Unfortunately, we still cannot reproduce the problem reported by @DeepakR-Oracle at this time.

@spericas
Copy link
Member

@DeepakR-Oracle I somehow missed that your client in the test above is Mac. I'm assuming that means you're going over VPN then? If so, could you reproduce the problem using two OCI machines?

@spericas spericas moved this from In Progress to Sprint Scope in Backlog Jan 30, 2025
@spericas
Copy link
Member

I've done one more test with a Linux server version 8.1 (thanks @barchetta!). If the client runs on another OCI machine (same client as in the test above), I see no problems at all with the upload. Naturally, if the client is a Mac connecting over VPN, then it is quite slow to upload --but that's the network not Helidon AFICT.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
4.x Version 4.x jax-rs JAX-RS and Jersey related issues performance
Projects
Status: Sprint Scope
Development

No branches or pull requests

3 participants