Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Orderbook with heaptrack profiler #2763

Closed
wants to merge 9 commits into from
Closed

Conversation

squadgazzz
Copy link
Contributor

@squadgazzz squadgazzz commented May 29, 2024

Background

Heaptrack provides a killer feature: attaching to an existing process to collect heap memory report, but after a couple of attempts on production, CPU usage is increased drastically, which adds delays in the orderbook API and causes container restarts making it impossible to run longer than a couple of minutes which is not enough to collect meaningful report.

Changes

This PR configures Dockerfile in a way to run heaptrack together with the orderbook process from the beginning. That, in theory, should reduce CPU consumption. During local runs, heaptrack doesn't cause CPU overhead.

For this specific setup, to generate a report, the orderbook process must be gracefully killed first. To make this happen, another infra PR is required to configure /tmp/heaptrack dir pointing to a separate PVC.

How to test

Tested locally + requires testing on staging.

Important

This PR should not be merged and will be tested via temporal tags.

@squadgazzz squadgazzz force-pushed the orderbook-with-profiler branch 3 times, most recently from dadfecd to 7b5babf Compare May 31, 2024 11:47
@squadgazzz squadgazzz marked this pull request as ready for review May 31, 2024 13:51
@squadgazzz squadgazzz requested a review from a team as a code owner May 31, 2024 13:51
@MartinquaXD
Copy link
Contributor

This PR configures Dockerfile in a way to run heaptrack together with the orderbook process from the beginning. That, in theory, should reduce CPU consumption.

Why should that reduce CPU consumption? Is the logic to track allocations dramatically different when started together?

@squadgazzz
Copy link
Contributor Author

Why should that reduce CPU consumption? Is the logic to track allocations dramatically different when started together?

I couldn't test it locally because, on my system, the CPU is not affected by the profiler. But in order to attach to a process, additional libraries are required(GDB, etc), this requires more operations and should consume more resources(more info).

Copy link
Contributor

@MartinquaXD MartinquaXD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's really valuable to have heaptrack as a debugging tool at our disposal constantly but to make it useful long term we need to reduce the impact it has on the deploy step a lot and make it configurable at runtime whether we want to instrument from the start.

Dockerfile Outdated
COPY --from=cargo-build /orderbook /usr/local/bin/orderbook
ENTRYPOINT [ "orderbook" ]
ENTRYPOINT ["/bin/sh", "-c", "exec heaptrack -o /tmp/heaptrack/heaptrack.orderbook.$(shuf -i 1-99999 -n 1).gz orderbook"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it necessary that the /tmp path gets defined here and in the shell script below?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, because this is where the PVC is defined.

Dockerfile Outdated
@@ -39,8 +39,15 @@ COPY --from=cargo-build /driver /usr/local/bin/driver
ENTRYPOINT [ "driver" ]

FROM intermediate as orderbook
RUN apt-get update && \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Building heaptrack is useless since we rebuild it in the final container as well, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is required for local tests when I build the orderbook image separately.

@@ -52,12 +59,21 @@ ENTRYPOINT [ "solvers" ]

# Extract Binary
FROM intermediate
RUN apt-get update && \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have multiple concerns here:

  1. when we don't need heaptrack we rebuild it over and over again in the deploy action for no use. To get around this it could be made opt-in with a flag. Alternatively we could cache building heaptrack somehow. That way we only pay with increased image size which would probably be relatively small.
  2. heaptracking should be available to all binaries. Sure currently we only have an issue in the orderbook pod but we could very well have an issue in the autopilot pod at some other time
  3. the infra should be able to decide whether to run the instrumented version or not at runtime
  4. building heaptrack right in the final image means we have a bunch of dependencies we only need for building but not for running heaptrack. It should be built in a separate container and we should only copy over the binaries and libs we actually need.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR will not be merged. It can be tackled once we decide to deal with the heaptrack on a long-term basis.

@squadgazzz squadgazzz marked this pull request as draft June 10, 2024 09:36
Copy link

This pull request has been marked as stale because it has been inactive a while. Please update this pull request or it will be automatically closed.

Copy link

This pull request has been marked as stale because it has been inactive a while. Please update this pull request or it will be automatically closed.

@github-actions github-actions bot added the stale label Jun 26, 2024
@squadgazzz squadgazzz added blocked This issue is blocked by some other work and removed stale labels Jun 26, 2024
@squadgazzz squadgazzz closed this Jul 8, 2024
@github-actions github-actions bot locked and limited conversation to collaborators Jul 8, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
blocked This issue is blocked by some other work
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants