DefaultHttpClientCache gradually allocates memory resulting in OutOfMemory Exception #645

Open
5 of 11 tasks
swilen-iwanow opened this issue Nov 20, 2024 · 4 comments

Comments

@swilen-iwanow

Issue Description

Hi, colleagues

Recently, we upgraded our service from Cloud SDK 4 to the latest 5.14.0. Right after that, we started experiencing out-of-memory exceptions on our production system. Our service runs on CF and had 1.5 GB of RAM dedicated to it prior to the library bump.
We have now increased it to 3 GB to be able to create heap dumps, and what we observe is slow but gradual memory allocation spread over a long time period (24h+). Our service is multi-tenant and we use Destinations only on the provider account. However, we run in a DwC environment with at least 1 request per minute for a tenant (this is important as we also have the DwCTenantFacade in the application context). Our investigation led us to the assumption that there is a memory leak in the connection pool cache. If you need more details, feel free to contact me directly. Thanks!

image

Important information:

  • Your code
  • Expected outcome
  • Actual outcome
  • Steps attempted to resolve the issue
  • In case of issues with any of our VDMs:
    • What happens when a request is performed directly via an HTTP client tool such as Postman?
    • In case that succeeds please provide details on the request such as URL, query parameters, header parameters, prerequisite requests (e.g. CSRF token), etc.
  • Potentially missing information (open questions you might have)

Impact / Priority

Affected development phase: Production

Impact: Impaired

Error Message

Exit code 137

Project Details

I prefer not to share publicly. Contact me for the required details.

  • SDK Version:
  • Link to GitHub repo:
  • Project type, for example:
    • CAP Project
    • SDK Maven Archetype
    • None of the above:
  • Platform:
    • Cloud Foundry
    • Deploy with Confidence (Cloud Foundry)
    • None of the above:

Checklist

  • Checked out the documentation and Stack Overflow
  • Description provided with all relevant information
  • Exception and stack trace provided
  • Attached debug logs
  • Attached dependency tree
  • Provided Cloud SDK version & link to relevant source code
@MatKuhr
Member

MatKuhr commented Nov 20, 2024

Hi Svilen, thanks for reaching out! Typically, such behavior occurs if there are connections open that are not properly closed. This can happen if HTTP requests are performed without fully consuming the response data.

Could you please share what kinds of requests you are using the Cloud SDK for?
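
To illustrate what "fully consuming the response data" means, here is a minimal sketch using only the JDK (Java 11+). The class and method names are hypothetical, and the fake body stands in for a real HTTP response stream. With Apache HttpClient, which is mentioned here only as a comparable case, the equivalent step would be calling EntityUtils.consume on the response entity and closing the response.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of the Cloud SDK.
class ResponseDrainSketch {

    /**
     * Reads the body to the end and closes it, so that a pooled connection
     * behind the stream could be released. Returns the number of bytes discarded.
     */
    static long drainAndClose(InputStream body) {
        // try-with-resources closes the stream even if reading fails
        try (InputStream in = body) {
            return in.transferTo(OutputStream.nullOutputStream());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for a response body that the caller is not interested in
        InputStream fakeBody = new ByteArrayInputStream(
                "{\"status\":\"ok\"}".getBytes(StandardCharsets.UTF_8));
        System.out.println("discarded bytes: " + drainAndClose(fakeBody));
    }
}
```

The point of the sketch is that the caller reads to the end and closes the stream even when the payload is irrelevant to it.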

@swilen-iwanow
Author

In the service itself, we use the connectivity library to fetch a destination from the provider account. However, this destination only provides properties and we are not using it to create requests.

@MatKuhr
Member

MatKuhr commented Nov 21, 2024

Okay, I think that for reading destination properties from the provider account, only a single HTTP client should be used. So I'm not sure where so many HTTP client instances are coming from. However, the CAP framework and potentially other libraries might use the Cloud SDK HTTP client under the hood.

Could you try to debug the problem by accessing HttpClientAccessor -> httpClientCache -> cache -> cache in a debugger and look at the entries? They represent the cache keys associated with the clients which should tell us more where the clients are coming from. In particular, I'd expect the cache key components to contain destinations with URLs. Maybe doing this locally is already sufficient to observe the behavior.


In addition, you could configure the HTTP client cache to remove entries faster, e.g. HttpClientAccessor.setHttpClientCache(new DefaultHttpClientCache(5, TimeUnit.MINUTES)); (default is 1 hr).
This is more of a band-aid fix and will only work if the underlying HTTP connections are properly closed.
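
To make the effect of a shorter expiry concrete, here is a toy model in plain Java. It is a simplified sketch, not the SDK's DefaultHttpClientCache: the class name, the expire-after-last-access policy, and the scenario of one new cache key per minute are all assumptions chosen for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Toy expire-after-access cache; a simplified model, NOT the Cloud SDK implementation.
class ExpiringCacheSketch<K, V> {

    private static final class Entry<V> {
        final V value;
        long lastAccessNanos;

        Entry(V value, long now) {
            this.value = value;
            this.lastAccessNanos = now;
        }
    }

    private final Map<K, Entry<V>> entries = new HashMap<>();
    private final long ttlNanos;
    private final LongSupplier clock; // injectable, so the simulation needs no real waiting

    ExpiringCacheSketch(long duration, TimeUnit unit, LongSupplier clock) {
        this.ttlNanos = unit.toNanos(duration);
        this.clock = clock;
    }

    /** Returns the cached value for the key, creating it on a miss. */
    V get(K key, Supplier<V> factory) {
        long now = clock.getAsLong();
        evictExpired(now);
        Entry<V> entry = entries.get(key);
        if (entry == null) {
            entry = new Entry<>(factory.get(), now);
            entries.put(key, entry);
        }
        entry.lastAccessNanos = now;
        return entry.value;
    }

    int size() {
        evictExpired(clock.getAsLong());
        return entries.size();
    }

    private void evictExpired(long now) {
        entries.values().removeIf(e -> now - e.lastAccessNanos >= ttlNanos);
    }

    /** Assumed scenario: one previously unseen cache key per minute; returns the peak entry count. */
    static int peakEntries(long ttlMinutes, int simulatedMinutes) {
        long[] now = {0L};
        ExpiringCacheSketch<String, Object> cache =
                new ExpiringCacheSketch<>(ttlMinutes, TimeUnit.MINUTES, () -> now[0]);
        int peak = 0;
        for (int minute = 0; minute < simulatedMinutes; minute++) {
            now[0] = TimeUnit.MINUTES.toNanos(minute);
            cache.get("key-" + minute, Object::new);
            peak = Math.max(peak, cache.size());
        }
        return peak;
    }

    public static void main(String[] args) {
        System.out.println("peak entries with 60 min expiry: " + peakEntries(60, 180));
        System.out.println("peak entries with 5 min expiry:  " + peakEntries(5, 180));
    }
}
```

Under these assumptions, the number of entries held at once grows with the expiry duration. That is why a shorter expiry can bound memory without explaining where the extra cache keys come from.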


Finally, could you please confirm you are using one of the following:

// assumes `service` is a DestinationService instance, e.g. new DestinationService()
var allDest = service.getAllDestinationProperties();
var dest = service.getDestinationProperties("my-destination");

This should be done instead of getDestination("my-destination") for reading properties only, see the docs.

Destinations are cached by default, so retrieving from the provider account should be fast and only produce an HTTP request every 5 min or fewer.

@swilen-iwanow
Author

I've applied the suggested change, HttpClientAccessor.setHttpClientCache(new DefaultHttpClientCache(5, TimeUnit.MINUTES));, and it seems the memory has stabilised. I also created an incident for the CAP colleagues to investigate.
