feat: thread autoscaling #1266

Open
wants to merge 196 commits into base: main
Conversation

Alliballibaba2
Collaborator

I originally just wanted to create a PR that allows adding threads via the admin API, but after letting threads scale automatically, that PR didn't really make sense on its own anymore.

So here is what this PR does:

It adds 4 Caddy admin endpoints:

POST     /frankenphp/workers/restart   # restarts workers (this can also be put into a smaller PR if necessary)
GET      /frankenphp/threads           # prints the current state of all threads (for debugging/caddytests)
PUT      /frankenphp/threads           # adds a thread at runtime; accepts 'worker' and 'count' query parameters
DELETE   /frankenphp/threads           # removes a thread at runtime; accepts 'worker' and 'count' query parameters
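
For context, these endpoints live on Caddy's admin API (http://localhost:2019 by default). A minimal Go sketch of how a client might call two of them, assuming the default admin address and no custom auth (the paths come from the list above; everything else is illustrative):

```go
package main

import (
	"fmt"
	"io"
	"net/http"
)

func main() {
	admin := "http://localhost:2019" // Caddy's default admin address

	// Inspect the current state of all threads (GET /frankenphp/threads).
	resp, err := http.Get(admin + "/frankenphp/threads")
	if err != nil {
		panic(err)
	}
	body, _ := io.ReadAll(resp.Body)
	resp.Body.Close()
	fmt.Println(string(body))

	// Restart the workers (POST /frankenphp/workers/restart).
	resp, err = http.Post(admin+"/frankenphp/workers/restart", "application/json", nil)
	if err != nil {
		panic(err)
	}
	resp.Body.Close()
	fmt.Println("workers restarted:", resp.Status)
}
```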

Additionally, the PR introduces a new directive in the config: max_threads.

frankenphp {
    max_threads 200
    num_threads 40
}

If max_threads is bigger than num_threads, worker and regular threads will attempt to autoscale after a request when a few conditions are met (a rough sketch follows the list):

  • no thread was available to immediately handle the request
  • the request was stalled for more than a few milliseconds (currently 15 ms)
  • no other scaling is happening at that time
  • a CPU probe (50 ms) determines that PHP threads are consuming less than a predefined amount of CPU (currently 80%)
  • max_threads has not been reached yet
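
Roughly, the decision could look like the following Go sketch. All names and thresholds are illustrative placeholders mirroring the conditions above, not the PR's actual code:

```go
package main

import (
	"sync/atomic"
	"time"
)

// Illustrative thresholds taken from the PR description.
const (
	stallThreshold = 15 * time.Millisecond // request waited longer than this
	probeDuration  = 50 * time.Millisecond // CPU probe window
	cpuLimit       = 0.8                   // don't scale above 80% CPU
)

type autoscaler struct {
	scaling    atomic.Bool // true while a scaling operation is in flight
	numThreads atomic.Int64
	maxThreads int64
}

// maybeScale is called after a request finished; it spawns one extra thread
// only when all of the listed conditions hold.
func (a *autoscaler) maybeScale(handledImmediately bool, stalledFor time.Duration) {
	if handledImmediately {
		return // a thread was free right away, no pressure
	}
	if stalledFor < stallThreshold {
		return // the wait was too short to justify a new thread
	}
	if !a.scaling.CompareAndSwap(false, true) {
		return // another scaling operation is already running
	}
	defer a.scaling.Store(false)

	if a.numThreads.Load() >= a.maxThreads {
		return // hard ceiling from the max_threads directive
	}
	if probeCPU(probeDuration) >= cpuLimit {
		return // PHP threads are already busy enough
	}
	a.numThreads.Add(1)
	spawnThread() // hand off to the real thread-spawning code
}

// probeCPU and spawnThread stand in for the real implementations.
func probeCPU(d time.Duration) float64 { time.Sleep(d); return 0.5 }
func spawnThread()                     {}

func main() {
	a := &autoscaler{maxThreads: 200}
	a.numThreads.Store(40)
	a.maybeScale(false, 20*time.Millisecond)
}
```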

This is all still a WIP. I'm not yet sure if max_threads is the best way to configure autoscaling or if it's even necessary to have the PUT/DELETE endpoints. Maybe it would also make sense to determine max_threads based on available memory.
I'll conduct some benchmarks showing that this approach performs better than default settings in a lot of different scenarios (and makes people worry less about thread configuration).

Regarding recent issues, spawning and destroying threads would also make the server more stable when we're experiencing timeouts (I'm not yet sure how to safely destroy running threads).

@AlliBalliBaba
Collaborator

I removed the POST and DELETE admin endpoints for now; we can add them back in if needed.

@dunglas
Owner

dunglas commented Jan 27, 2025

Is this one ready for prime time?

@AlliBalliBaba
Collaborator

Yes, I think this one should be good to go for an initial implementation via the max_threads configuration.
I set the channel used for scaling to nil when scaling is inactive, so there shouldn't be any difference in behavior if max_threads is not set.
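
For reference, this relies on the Go property that a nil channel is never ready in a select, so the scaling branch is simply skipped when autoscaling is disabled. A minimal illustration with made-up names (not the PR's actual code):

```go
package main

import "fmt"

func main() {
	var scaleRequests chan struct{} // nil: autoscaling disabled (max_threads not set)
	if maxThreadsConfigured := false; maxThreadsConfigured {
		scaleRequests = make(chan struct{}, 1)
	}

	done := make(chan struct{}, 1)
	done <- struct{}{}

	// A nil channel blocks forever, so its case can never be selected;
	// the select behaves exactly as if the scaling branch did not exist.
	select {
	case <-scaleRequests:
		fmt.Println("scale up")
	case <-done:
		fmt.Println("request handled without autoscaling")
	}
}
```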

I might create a separate project with integration tests and some extensions/framework at some point so there's more confidence before releases.

@withinboredom
Collaborator

[giphy GIF reaction]

@AlliBalliBaba
Collaborator

I created a small testing repo with Laravel, since it's the PHP Framework I'm most familiar with. Right now it just tests a few of the most popular extensions in zts-bookworm, but I'll probably add more if there are any problematic ones.

The only issue I found so far is that calling opcache_reset can lead to a crash under high load (also on main). I'm not sure if this only affects ZTS, but similar issues exist for FPM.

On worker restarts, all threads are stopped first, which prevents the crash. It's only problematic if opcache_reset is called directly, so documenting this might be enough for now.

@AlliBalliBaba
Collaborator

Alright, I think this is ready to merge. I'm not sure why the watcher installation is currently failing in the asan and msan builds.
