OptimizationFailureException for DiskCapacityGoal wrong computation of disk usage #2166

Closed
IgorBerman opened this issue Jun 19, 2024 · 6 comments


@IgorBerman

I have a question about how to investigate why an OptimizationFailureException happens. I believe something is wrong with the computed current disk utilization (maybe due to a missing setting or config).
We are getting the following error while trying to get proposals:

errorMessage: "Error processing GET request '/proposals' due to: 'com.linkedin.kafka.cruisecontrol.exception.OptimizationFailureException: [DiskCapacityGoal] Violated capacity limit of 17579716.800000 via broker utilization of 22606978.000000 with broker 201 for resource disk. '.",
stackTrace: "java.util.concurrent.ExecutionException: com.linkedin.kafka.cruisecontrol.exception.OptimizationFailureException: [DiskCapacityGoal] Violated capacity limit of 17579716.800000 via broker utilization of 22606978.000000 with broker 201 for resource disk

The capacity limit is computed correctly as 0.8 * the capacity configured in capacity.json; however, the current usage appears to be computed incorrectly.
When looking at the load endpoint for this broker, we get
13709573.000bytes (62.39%) and not 22606978.000000. I can't figure out where this 22606978.000000 comes from.
[Screenshot 2024-06-19 at 14 53 11]

We are using a rather old version of Cruise Control, '2.0.100'.

Any ideas or suggestions will be highly appreciated.
Thanks in advance
Igor

@IgorBerman IgorBerman changed the title OptimizationFailureException computation of disk usage OptimizationFailureException for DiskCapacityGoal wrong computation of disk usage Jun 19, 2024
@mhratson
Contributor

Hard to tell with so few details.

Can you grep for "PARTITION_SIZE does not exist" in the logs?

@cpaika

cpaika commented Aug 27, 2024

Can you also share your cruise-control configuration?

@IgorBerman
Author

Hi @mhratson & @cpaika,
thanks for the response.

Unfortunately,
kubectl logs kafka-cruise-control-fe-local-5bcc78bc45-l7k2k | grep "PARTITION_SIZE does not exist"
does not print anything. Is this message logged periodically or only on startup? (I can restart the Cruise Control pod if needed.)

Regarding the configuration, the files are attached:
cruisecontrol.txt
capacityJBOD.json

@mhratson
Contributor

  1. 17579716.800000 is 80% of the configured disk capacity of 21974646, so that's correct.
  2. The proposal reports the projected (post-rebalance) utilization, 22606978.000000, which violates the 80% threshold.
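
The arithmetic behind the two points above can be checked with a short sketch. It uses only the numbers quoted in this thread; the 0.8 factor is assumed to come from the `disk.capacity.threshold` setting, and the function names are illustrative, not part of Cruise Control.

```python
# Numbers quoted in this thread for broker 201.
CONFIGURED_DISK_CAPACITY = 21974646.0  # configured in the capacity JSON file
CAPACITY_THRESHOLD = 0.8               # assumed value of disk.capacity.threshold
CURRENT_USAGE = 13709573.0             # reported by the load endpoint
PROPOSED_USAGE = 22606978.0            # reported in the exception message


def capacity_limit(capacity: float, threshold: float) -> float:
    """Highest utilization the capacity goal tolerates on one broker."""
    return capacity * threshold


def utilization_pct(usage: float, capacity: float) -> float:
    """Utilization as a percentage of the configured (not thresholded) capacity."""
    return usage / capacity * 100.0


def exceeds_limit(usage: float, capacity: float, threshold: float) -> bool:
    """True when the usage is above the thresholded capacity limit."""
    return usage > capacity_limit(capacity, threshold)


if __name__ == "__main__":
    limit = capacity_limit(CONFIGURED_DISK_CAPACITY, CAPACITY_THRESHOLD)
    print(f"capacity limit: {limit:.6f}")
    for label, usage in (("current", CURRENT_USAGE), ("proposed", PROPOSED_USAGE)):
        pct = utilization_pct(usage, CONFIGURED_DISK_CAPACITY)
        over = exceeds_limit(usage, CONFIGURED_DISK_CAPACITY, CAPACITY_THRESHOLD)
        print(f"{label}: {usage:.0f} ({pct:.2f}%), exceeds limit: {over}")
```

The current usage sits below the limit, while the projected usage is above even the raw configured capacity, which is consistent with the goal being unsatisfiable without more disk.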

I wonder if there was something else after the error, like a recommendation to add more brokers?
Another thing to try: run a rebalance with --dry-run and see what kind of proposal you get, along with any errors and/or recommendations.
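
For anyone calling the REST API directly rather than a command-line client, a dry run can be requested through the `rebalance` endpoint with `dryrun=true`. The sketch below only builds the request URL; the host, port, and `/kafkacruisecontrol` prefix are assumptions about a default deployment and may differ in yours. Note that `rebalance` expects a POST request.

```python
from urllib.parse import urlencode


def rebalance_dry_run_url(base_url: str, goals=None) -> str:
    """Build the URL for a dry-run rebalance request (to be sent as a POST).

    base_url is a placeholder such as "http://localhost:9090"; adjust it,
    and the path prefix, to match your deployment.
    """
    params = {"dryrun": "true", "json": "true", "verbose": "true"}
    if goals:
        # Optionally restrict the run to specific goals, e.g. ["DiskCapacityGoal"].
        params["goals"] = ",".join(goals)
    return f"{base_url.rstrip('/')}/kafkacruisecontrol/rebalance?{urlencode(params)}"


if __name__ == "__main__":
    print(rebalance_dry_run_url("http://localhost:9090"))
```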

I'm going to close the issue, but feel free to continue the conversation.

@mhratson
Contributor

Is #2155 related?

@IgorBerman
Author

IgorBerman commented Aug 28, 2024

OK, now I understand this number: it is the usage after the proposed rebalance. Now it makes sense.
No, I can't find any recommendations in the log or in the JSON response of a dry-run cluster rebalance (it seems we just need more disk capacity).
I'll take a look at the linked ticket.
Thanks for the help!
