"java.io.IOException: Error writing request body to server" error during create ephemeral certificate #528
Comments
Hi @vbonne, It would be helpful if you can provide us with as small of an example as possible to reproduce the error. Make sure to include a dockerfile, pom/gradle, and sample code so we can try to reproduce on our end as well. If you haven't already, make sure you are on the latest version of the library and that you don't have any dependency conflicts. Looking around online, there are a few SO posts with this error. It looks like they are caused by the SSL handshake failing for some reason, but to find the exact reason you may need to run with |
Thanks @kurtisvg. OK, I restarted with -Djavax.net.debug=all. On the next error I will update this ticket with a detailed log, and also post the Dockerfile, pom, and datasource bean. In the meantime I will try to resolve the remaining dependency conflicts and create a small project to reproduce this error. |
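For cases where the JVM command line is hard to change (for example inside a container image), a minimal sketch of setting the same `javax.net.debug` property from code is shown here. The class and method names are hypothetical, and it assumes the call runs before the JVM attempts its first TLS connection; setting the property later may have no effect.

```java
final class TlsDebugSketch {
    /** Turns on JSSE debug output; call before the first TLS connection is attempted. */
    static void enableTlsDebug() {
        // Same system property that -Djavax.net.debug=all sets on the command line.
        System.setProperty("javax.net.debug", "all");
    }

    public static void main(String[] args) {
        enableTlsDebug();
        System.out.println("javax.net.debug=" + System.getProperty("javax.net.debug"));
    }
}
```

Passing the flag on the command line, as done above in the thread, remains the simpler option when it is available.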
Ok all dependency conflicts resolved. Just updated Google cloud run project with latest image. Waiting for the error to happen... Dockerfile :
Pom file with dependency conflicts resolved
Datasource bean. Sorry that this bean is so large, but the project runs on different infrastructures that need various configuration options. In the Cloud Run case, the environment variable GOOGLE_SQL_PRIVATE_ACCESS is set to true, and DIRECT_JDBC_URL can be set to true or false to try the Google SQL connection either with a manually created URL or by letting the socket factory generate it. In both cases the same error happens after a few hours.
List of Cloud Run environment variables:
Simple project public gitlab repo here |
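The reporter's bean is not reproduced in this thread. As a rough illustration of the "manually created URL" option only, here is a sketch that assembles a JDBC URL for the PostgreSQL socket factory. The class name, database name, and instance connection name are hypothetical; the `cloudSqlInstance`, `socketFactory`, and `ipTypes` parameters are the ones I understand the socket factory to read, so check them against the README of the version in use.

```java
final class CloudSqlJdbcUrlSketch {
    /**
     * Builds a JDBC URL that delegates connection setup to the Cloud SQL
     * PostgreSQL socket factory. When privateIp is true, the connector is
     * asked to use only the instance's private address.
     */
    static String buildUrl(String database, String instanceConnectionName, boolean privateIp) {
        StringBuilder url = new StringBuilder("jdbc:postgresql:///")
                .append(database)
                .append("?cloudSqlInstance=").append(instanceConnectionName)
                .append("&socketFactory=com.google.cloud.sql.postgres.SocketFactory");
        if (privateIp) {
            url.append("&ipTypes=PRIVATE");
        }
        return url.toString();
    }

    public static void main(String[] args) {
        // Hypothetical values, for illustration only.
        System.out.println(buildUrl("appdb", "my-project:europe-west1:my-instance", true));
    }
}
```

The resulting string would be handed to the connection pool as its JDBC URL, with user and password supplied separately.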
From the logs of the simple app created to reproduce the error, I can see this error:
Even if it does not seem to generate any fatal error at this point, could it be that I need to add the Tomcat libtcnative-1 package to the Docker OS image for negotiating the SSL renewal of the Google SQL instance? I leave this post for history; I edited the Dockerfile with this package added. New Dockerfile:
|
Here is a sample of the failing connection error log with debug enabled on the simple project.
The libtcnative-1 error from the previous post was corrected by creating a symlink that tells Spring Boot where libtcnative-1.so is; the Dockerfile in the previous post has been updated with the correction. |
@vbonne Is this most recent error what you see on startup? Also, I assume you've configured a serverless VPC access connector? |
@enocom At startup there is no error. The service runs fine until the ephemeral SSL error occurs, which can be anywhere from a few hours to a few days after startup.
No, I did not configure any VPC access connector. I just have a Cloud Run instance communicating with a Postgres SQL instance by its instance name... |
Steps to reproduce : test login page manually https://.a.run.app/login you should get a login form |
I updated the error log post with a sample of the simple project log at the exact moment the connection is lost. The first lines show the Hikari pool connection is OK; then, 5 seconds later, the connection is lost for good until the service restarts... |
Hi everyone, we are probably seeing the same issue in our Spring Boot project:
This leads to a restart of the service after 20 unsuccessful attempts. The error then disappears. These are the dependencies we use:
The database instance uses a public IP. The DataSource is created by Spring Boot. |
Hi to all, After my springboot app is deployed in GCP it works fine.
We need this solution so that we can use a VPC for intranet company services. Waiting for news from GCP about this fix. Kind regards |
Thanks @vincenzo-mazzotta @Maassmensch83 and @vbonne. I'll reproduce the error and report back. |
Hi @vincenzo-mazzotta @Maassmensch83 and @vbonne, have these errors only started occurring when using the latest version, or do previous socket factory versions also cause the error to appear? |
I can't say anything about when this error first appeared, as my project has just been ported to GCP, so I started with the latest version. The same error happened only 1h after startup:
but in this case those errors were "normal", as the attempts were correctly retried and none of the scheduled tasks actually failed, because the connections were re-established fast enough. |
Thanks @vbonne. I'm running a similar experiment and will report if the cert refresh works. |
I updated my previous post: with version 1.2.0 the error occurs, but it is recovered fast enough not to cause an actual failure of any REST request or scheduled task. With version 1.3.0, when the error occurs it is not recovered. I will therefore proceed with my platform tests on version 1.2.0, but it would be great if this error did not occur at all ;-) One other question: in my dependency conflict check, I found that the postgres-socket-factory package v1.2.0 itself contains inconsistencies that I had to correct at my project level. google-auth-library-oauth2 and google-http-client have version conflicts within the postgres-socket-factory package. Is that normal? Should I force the latest version of those two packages at my project level, or leave this Google package with its own dependency inconsistencies? |
I guess this will not add much information, because most of it is already in the previous post, but by comparing my two projects, one running 1.2.0 and the other 1.3.0, I could see that the periods when the error occurs are identical within the day. Once the errors stop, they stop on both projects, and the 1.3.0 project then also recovers normal operations. In other words, contrary to what I stated earlier, v1.3.0 does not stay locked out of the SQL connection forever. |
I can't answer this question because we are currently moving our application from on-prem to GCP and haven't done any dependency updates yet. We import the |
@Maassmensch83 The errors are perhaps alarming, but as long as your app has no problem connecting to the database, they can be ignored. See #310 for context. Thanks for the additional report @shanawaspm. For now, the recommendation is to pin to v1.2.3 while we determine the issue in v1.3.0. |
thanks, I have now downgraded both |
Update: I have opened a draft PR with a possible solution that adds an IOExceptionHandler to the API calls, allowing the SQLAdmin Client to retry the requests that fail with IOExceptions. I've deployed two versions of the tiny app with and without the change, which have been running for ~9 hours. Still waiting to see whether the change reduces or eliminates the errors. Although it looks like there are multiple types of errors that we're dealing with. I'm not sure if the root cause for the |
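The draft PR itself is not shown in this thread. As a generic illustration of the idea of retrying API calls that fail with an IOException, here is a plain-Java sketch with exponential backoff. It is a stand-in, not the code from the PR and not the HTTP client's own handler API; `IoRetrySketch`, `IoCall`, and `callWithRetry` are hypothetical names.

```java
import java.io.IOException;

final class IoRetrySketch {
    /** A call that may fail with an IOException, e.g. an HTTP request. */
    interface IoCall<T> {
        T call() throws IOException;
    }

    /**
     * Runs the call, retrying on IOException up to maxAttempts times in total.
     * The wait between attempts starts at initialBackoffMillis and doubles each time.
     */
    static <T> T callWithRetry(IoCall<T> call, int maxAttempts, long initialBackoffMillis)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long backoff = initialBackoffMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                return call.call();
            } catch (IOException e) {
                if (attempt >= maxAttempts) {
                    throw e; // out of attempts: surface the last failure
                }
                Thread.sleep(backoff);
                backoff *= 2; // exponential backoff between attempts
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated request that fails twice before succeeding.
        String result = callWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new IOException("Error writing request body to server");
            }
            return "certificate";
        }, 5, 10L);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Retrying only helps when the failure is transient; it would not by itself address a request that can never complete.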
Adjusting this to P2 since we have a workaround for the short term. |
Hi to all, |
Pardon, I have only now seen the advice to downgrade the mysql-socket-factory library to version 1.2.3. I will wait more than 1 hour to see whether it continues to work fine. |
The bug isn't solved by using mysql-socket-factory version 1.2.3; please see the attached error log. On JVM 8, the app can't acquire connections to the SQL server through the SQL auth proxy when it needs to refresh connections. I attach the log messages around the errors, from 2021-07-12 13:19:48.956 CEST to 2021-07-12 13:22:50.046 CEST, to show the problem: |
@vincenzo-mazzotta This issue is about the error " |
@vincenzo-mazzotta If you're seeing this error repeatedly (for example across a few hours) and it affects connectivity, would you mind opening a new issue? Otherwise, you might be seeing a transient error. |
I opened a case with GCP support because this error is present. As GCP support explained, ephemeral certificates are security certificates that are valid for around 1h, after which they need to be refreshed. |
Thank you @vincenzo-mazzotta. Meanwhile, we're working on getting to the root cause of this issue. There's a chance your issue is related. |
We've just cut a new release. I've been running it for almost 24 hours now and don't see any of these errors. Would you like to see if the new version fixes this issue? cc @vbonne and @vincenzo-mazzotta |
After running the new release, I'm still seeing the error. Presently, the workaround suggested in #528 (comment) (setting the max lifetime to zero) is probably the way to go until we find and fix this bug. |
After some digging and a number of conversations on our team, I've managed to reproduce this issue in a much smaller scale and have found the root of the problem. In short, this is not a bug in the socket factory. Instead, it's a result of running the socket factory on Cloud Run. From the Cloud Run docs in the section Avoiding background activities:
On a first request, the socket factory will initiate a connection to the Cloud SQL instance. It will also schedule a refresh of the associated credentials for that connection ~55 minutes in the future. What we're seeing here is that refresh operation failing on account of being throttled by Cloud Run. The best solution in my opinion is to run the socket factory using a Unix socket. Note: in this approach you will need to create a Cloud SQL Connector. |
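To make the mechanism described above concrete, here is a small sketch of a background refresh loop: fetch a credential, then schedule the next fetch on a background thread, retrying sooner if the fetch fails. This is not the socket factory's implementation; `CertRefreshSketch`, the `fetchCertificate` callback, and the delay parameters are hypothetical. The relevant point is that the next refresh only happens if the background thread is allowed to run and reach the network when its delay expires, which is the part a throttled container may not get.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

final class CertRefreshSketch {
    // Single daemon thread that runs every refresh in the background.
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "cert-refresh");
                t.setDaemon(true);
                return t;
            });
    private final Runnable fetchCertificate; // hypothetical stand-in for the API call
    private final long refreshDelayMillis;   // wait after a successful fetch
    private final long retryDelayMillis;     // shorter wait after a failed fetch

    CertRefreshSketch(Runnable fetchCertificate, long refreshDelayMillis, long retryDelayMillis) {
        this.fetchCertificate = fetchCertificate;
        this.refreshDelayMillis = refreshDelayMillis;
        this.retryDelayMillis = retryDelayMillis;
    }

    /** Performs the first fetch immediately on the background thread. */
    void start() {
        scheduler.execute(this::refresh);
    }

    private void refresh() {
        long nextDelay = refreshDelayMillis;
        try {
            fetchCertificate.run();
        } catch (RuntimeException e) {
            nextDelay = retryDelayMillis; // fetch failed: try again sooner
        }
        try {
            scheduler.schedule(this::refresh, nextDelay, TimeUnit.MILLISECONDS);
        } catch (RejectedExecutionException e) {
            // stop() was called; no further refreshes are scheduled.
        }
    }

    void stop() {
        scheduler.shutdownNow();
    }

    public static void main(String[] args) throws InterruptedException {
        CertRefreshSketch sketch = new CertRefreshSketch(
                () -> System.out.println("refreshing certificate"), 200L, 50L);
        sketch.start();
        Thread.sleep(700L);
        sketch.stop();
    }
}
```

With a delay of roughly 55 minutes instead of the 200 ms used in `main`, the scheduled task would typically fire while no request is being served.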
Let's leave this open until we update our README to reflect this |
Sounds good. I'll make that change. |
Hi @enocom, I confirm: not a single error in 2 days of running with a Unix socket and version 1.3.1. Thanks all for your investigation and solution. |
I'm going to close this issue for now. We just merged #561 which greatly improves the application's ability to recover from errors when using TCP sockets, and also recommend using Unix sockets on Cloud Run and Functions if you can. |
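For anyone trying the Unix socket recommendation, here is a hedged sketch of the datasource properties involved. The property keys `socketFactory`, `cloudSqlInstance`, and `unixSocketPath` reflect my understanding of the connector's README and should be verified against the version in use. The class name, the instance connection name, and the socket path are hypothetical; on Cloud Run the sockets are expected under `/cloudsql/`, and for PostgreSQL the README should be checked for whether a socket file suffix must be included in the path.

```java
import java.util.Properties;

final class UnixSocketPropsSketch {
    /**
     * Collects the properties that ask the PostgreSQL socket factory to connect
     * through a local Unix domain socket instead of opening a TCP connection.
     */
    static Properties unixSocketProperties(String instanceConnectionName, String unixSocketPath) {
        Properties props = new Properties();
        props.setProperty("socketFactory", "com.google.cloud.sql.postgres.SocketFactory");
        props.setProperty("cloudSqlInstance", instanceConnectionName);
        props.setProperty("unixSocketPath", unixSocketPath);
        return props;
    }

    public static void main(String[] args) {
        // Hypothetical instance name and path, for illustration only.
        Properties props = unixSocketProperties(
                "my-project:europe-west1:my-instance",
                "/cloudsql/my-project:europe-west1:my-instance");
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

Each entry would be passed to the connection pool as a datasource property, alongside the database name, user, and password.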
Dear team,
I have been struggling for a month with the following issue.
I have a Java 11 Spring Boot project running from a Docker image based on the OpenJDK 11 slim image (I already tried various other base images and always got the same error) in a Cloud Run environment, connecting to a Google Postgres SQL instance.
The connection is made with the Google socket factory, and the datasource is generated using a Hikari config with the SQL instance name (private access, no public IP).
The project runs fine for a few hours, sometimes almost a few days, and then it stops being able to connect to the SQL instance. The first 2 errors are the following:
Error 1:
java.lang.RuntimeException: [<Name-Of-Google-SQL-Instance>] Failed to create ephemeral certificate for the Cloud SQL instance.
    at com.google.cloud.sql.core.CloudSqlInstance.addExceptionContext (CloudSqlInstance.java:574)
    at com.google.cloud.sql.core.CloudSqlInstance.fetchEphemeralCertificate (CloudSqlInstance.java:515)
    at com.google.cloud.sql.core.CloudSqlInstance.lambda$performRefresh$0 (CloudSqlInstance.java:330)
    at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly (TrustedListenableFutureTask.java:125)
    at com.google.common.util.concurrent.InterruptibleTask.run (InterruptibleTask.java:69)
    at com.google.common.util.concurrent.TrustedListenableFutureTask.run (TrustedListenableFutureTask.java:78)
    at java.util.concurrent.Executors$RunnableAdapter.call (Unknown Source)
    at java.util.concurrent.FutureTask.run (Unknown Source)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run (Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker (Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run (Unknown Source)
    at java.lang.Thread.run (Unknown Source)
Caused by: javax.net.ssl.SSLHandshakeException: Remote host terminated the handshake
    at sun.security.ssl.SSLSocketImpl.handleEOF (Unknown Source)
    at sun.security.ssl.SSLSocketImpl.decode (Unknown Source)
    at sun.security.ssl.SSLSocketImpl.readHandshakeRecord (Unknown Source)
    at sun.security.ssl.SSLSocketImpl.startHandshake (Unknown Source)
    at sun.security.ssl.SSLSocketImpl.startHandshake (Unknown Source)
    at sun.net.www.protocol.https.HttpsClient.afterConnect (Unknown Source)
    at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect (Unknown Source)
    at sun.net.www.protocol.http.HttpURLConnection.getOutputStream0 (Unknown Source)
    at sun.net.www.protocol.http.HttpURLConnection.getOutputStream (Unknown Source)
    at sun.net.www.protocol.https.HttpsURLConnectionImpl.getOutputStream (Unknown Source)
    at com.google.api.client.http.javanet.NetHttpRequest.execute (NetHttpRequest.java:113)
    at com.google.api.client.http.javanet.NetHttpRequest.execute (NetHttpRequest.java:84)
    at com.google.api.client.http.HttpRequest.execute (HttpRequest.java:1012)
    at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed (AbstractGoogleClientRequest.java:514)
    at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed (AbstractGoogleClientRequest.java:455)
    at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute (AbstractGoogleClientRequest.java:565)
    at com.google.cloud.sql.core.CloudSqlInstance.fetchEphemeralCertificate (CloudSqlInstance.java:513)
Error 2:
java.io.EOFException: SSL peer shut down incorrectly
    at sun.security.ssl.SSLSocketInputRecord.read (Unknown Source)
    at sun.security.ssl.SSLSocketInputRecord.readHeader (Unknown Source)
    at sun.security.ssl.SSLSocketInputRecord.decode (Unknown Source)
    at sun.security.ssl.SSLTransport.decode (Unknown Source)
    at sun.security.ssl.SSLSocketImpl.decode (Unknown Source)
    at sun.security.ssl.SSLSocketImpl.readHandshakeRecord (Unknown Source)
    at sun.security.ssl.SSLSocketImpl.startHandshake (Unknown Source)
    at sun.security.ssl.SSLSocketImpl.startHandshake (Unknown Source)
    at sun.net.www.protocol.https.HttpsClient.afterConnect (Unknown Source)
    at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect (Unknown Source)
    at sun.net.www.protocol.https.HttpsURLConnectionImpl.connect (Unknown Source)
    at com.google.api.client.http.javanet.NetHttpRequest.execute (NetHttpRequest.java:148)
    at com.google.api.client.http.javanet.NetHttpRequest.execute (NetHttpRequest.java:84)
    at com.google.api.client.http.HttpRequest.execute (HttpRequest.java:1012)
    at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed (AbstractGoogleClientRequest.java:514)
    at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed (AbstractGoogleClientRequest.java:455)
    at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute (AbstractGoogleClientRequest.java:565)
    at com.google.cloud.sql.core.CloudSqlInstance.fetchMetadata (CloudSqlInstance.java:438)
    at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly (TrustedListenableFutureTask.java:125)
    at com.google.common.util.concurrent.InterruptibleTask.run (InterruptibleTask.java:69)
    at com.google.common.util.concurrent.TrustedListenableFutureTask.run (TrustedListenableFutureTask.java:78)
    at java.util.concurrent.Executors$RunnableAdapter.call (Unknown Source)
    at java.util.concurrent.FutureTask.run (Unknown Source)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run (Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker (Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run (Unknown Source)
    at java.lang.Thread.run (Unknown Source)
Then, at every attempt to connect to the project or whenever scheduled tasks run, the following error occurs:
java.io.EOFException: SSL peer shut down incorrectly
    at sun.security.ssl.SSLSocketInputRecord.read (Unknown Source)
    at sun.security.ssl.SSLSocketInputRecord.readHeader (Unknown Source)
    at sun.security.ssl.SSLSocketInputRecord.decode (Unknown Source)
    at sun.security.ssl.SSLTransport.decode (Unknown Source)
    at sun.security.ssl.SSLSocketImpl.decode (Unknown Source)
    at sun.security.ssl.SSLSocketImpl.readHandshakeRecord (Unknown Source)
    at sun.security.ssl.SSLSocketImpl.startHandshake (Unknown Source)
    at sun.security.ssl.SSLSocketImpl.startHandshake (Unknown Source)
    at sun.net.www.protocol.https.HttpsClient.afterConnect (Unknown Source)
    at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect (Unknown Source)
    at sun.net.www.protocol.https.HttpsURLConnectionImpl.connect (Unknown Source)
    at com.google.api.client.http.javanet.NetHttpRequest.execute (NetHttpRequest.java:148)
    at com.google.api.client.http.javanet.NetHttpRequest.execute (NetHttpRequest.java:84)
    at com.google.api.client.http.HttpRequest.execute (HttpRequest.java:1012)
    at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed (AbstractGoogleClientRequest.java:514)
    at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed (AbstractGoogleClientRequest.java:455)
    at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute (AbstractGoogleClientRequest.java:565)
    at com.google.cloud.sql.core.CloudSqlInstance.fetchMetadata (CloudSqlInstance.java:438)
    at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly (TrustedListenableFutureTask.java:125)
    at com.google.common.util.concurrent.InterruptibleTask.run (InterruptibleTask.java:69)
    at com.google.common.util.concurrent.TrustedListenableFutureTask.run (TrustedListenableFutureTask.java:78)
    at java.util.concurrent.Executors$RunnableAdapter.call (Unknown Source)
    at java.util.concurrent.FutureTask.run (Unknown Source)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run (Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker (Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run (Unknown Source)
    at java.lang.Thread.run (Unknown Source)
From what I understood, this error appears when the SSL certificate is renewed on the SQL instance, and this is normal as long as the connection is then retried and ends successfully (issue 310). In my case the connection is never re-established and the whole project stops working.
For the moment, as I am mostly doing trials to see how we could migrate our project to the Google Cloud Platform, my current solution is to restart the Google Cloud Run instance when this error occurs, but this is not viable for production...
Any idea where the error could be?
Thanks for your help putting me on the right track!