Commit

Merge branch 'kyle/CER-3498-scaling-endpoints' of github.com:CerebriumAI/documentation into kyle/CER-3498-scaling-endpoints
Kyle Gani authored and Kyle Gani committed Dec 12, 2024
2 parents def77e4 + 83cb5a4 commit 5dfaff5
Showing 1 changed file with 2 additions and 1 deletion.
3 changes: 2 additions & 1 deletion cerebrium/scaling/scaling-apps.mdx
@@ -12,7 +12,8 @@ The scaling system monitors two key metrics to make scaling decisions:
The **number of requests** currently waiting for processing in the queue indicates immediate demand. Additionally, the system tracks **how long each request has waited in the queue**. When either of these metrics exceeds their thresholds, new instances start within 3 seconds to handle the increased load.

<Info>
- Scaling is also configurable based on the expected traffic of an application. See below for more information.
+ Scaling is also configurable based on the expected traffic of an application.
+ See below for more information.
</Info>

As traffic decreases, instances enter a cooldown period after processing their last request. When no new requests arrive during cooldown, instances terminate to optimize resource usage. This automatic cycle ensures apps remain responsive while managing costs effectively.
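
The scale-up and cooldown behaviour described in this hunk can be sketched roughly as follows. This is only an illustration of the decision logic, not Cerebrium's implementation: the `ScalingThresholds` class, its field names, its default values, and both helper functions are hypothetical and do not come from the documentation being changed.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ScalingThresholds:
    # All three values are illustrative assumptions, not documented defaults.
    max_queued_requests: int = 10   # queue length that triggers scale-up
    max_wait_seconds: float = 5.0   # per-request wait that triggers scale-up
    cooldown_seconds: float = 30.0  # idle time before an instance terminates


def should_scale_up(queue_wait_times: Sequence[float], t: ScalingThresholds) -> bool:
    """Return True when either queue metric exceeds its threshold.

    queue_wait_times holds, for each queued request, how long it has waited.
    """
    too_many_queued = len(queue_wait_times) > t.max_queued_requests
    waited_too_long = any(wait > t.max_wait_seconds for wait in queue_wait_times)
    return too_many_queued or waited_too_long


def should_terminate(
    seconds_since_last_request: float,
    queue_wait_times: Sequence[float],
    t: ScalingThresholds,
) -> bool:
    """Return True when the cooldown has elapsed and no requests are waiting."""
    queue_is_empty = len(queue_wait_times) == 0
    return queue_is_empty and seconds_since_last_request >= t.cooldown_seconds
```

Either metric alone is enough to trigger a scale-up, which matches the "when either of these metrics exceeds" wording; termination requires both an empty queue and an elapsed cooldown.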