Merge branch 'kyle/CER-3498-scaling-endpoints' of github.com:CerebriumAI/documentation into kyle/CER-3498-scaling-endpoints
Kyle Gani authored and committed Dec 12, 2024
2 parents 60685e0 + 161fc4d commit eed264f
Showing 6 changed files with 25 additions and 13 deletions.
8 changes: 6 additions & 2 deletions cerebrium/endpoints/custom-web-servers.mdx
@@ -39,12 +39,16 @@ fastapi = "latest"
```

The configuration requires three key parameters:

- `entrypoint`: The command that starts your server
- `port`: The port your server listens on
- `healthcheck_endpoint`: The endpoint that confirms server health
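
Taken together, a minimal `cerebrium.toml` sketch might look like the following (the section name and all values are illustrative assumptions, not a definitive template):

```toml
# Hypothetical sketch; adapt the entrypoint and port to your app
[cerebrium.runtime.custom]
entrypoint = ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "5000"]
port = 5000
healthcheck_endpoint = "/health"
```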

<Info>
For ASGI applications like FastAPI, include the appropriate server package (like `uvicorn`) in your dependencies. After deployment, your endpoints become available at `https://api.cortex.cerebrium.ai/v4/{project-id}/{app-name}/your/endpoint`.
</Info>
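
As a usage sketch, a deployed endpoint can then be called over HTTPS. The project ID, app name, route, and bearer-token authentication below are placeholder assumptions:

```bash
# Placeholder project ID, app name, and route; substitute your own
curl -X POST "https://api.cortex.cerebrium.ai/v4/p-xxxxxxx/my-app/predict" \
  -H "Authorization: Bearer $CEREBRIUM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "hello"}'
```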

Our [FastAPI Server Example](https://github.com/CerebriumAI/examples) provides a complete implementation.
2 changes: 1 addition & 1 deletion cerebrium/getting-started/collaborating.mdx
@@ -33,4 +33,4 @@ The Users table displays member details, including names, email addresses, roles
4. Adjust roles as team needs change
5. Resend invitations when needed

Once members accept their invitations, they gain immediate access to their authorised project(s) from the dashboard, based on their assigned roles.
9 changes: 6 additions & 3 deletions cerebrium/scaling/batching-concurrency.mdx
@@ -45,7 +45,9 @@ xformers = "latest"
When multiple requests arrive, vLLM automatically combines them into optimal batch sizes and processes them together, maximizing GPU utilization through its internal batching functionality.
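
As a minimal sketch of what this looks like in code (the model name and sampling settings are illustrative, and this is not the linked example's exact implementation):

```python
from vllm import LLM, SamplingParams

# vLLM batches a list of prompts internally for a single model instance
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # illustrative model
params = SamplingParams(max_tokens=128, temperature=0.7)

prompts = [
    "Summarize the benefits of batching.",
    "Explain GPU utilization in one sentence.",
]
outputs = llm.generate(prompts, params)  # requests are processed as one batch
for output in outputs:
    print(output.outputs[0].text)
```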

<Tip>
Check out the complete [vLLM batching example](https://github.com/CerebriumAI/examples/tree/master/10-batching/3-vllm-batching-gpu) for more information.
</Tip>

### Custom Batching
@@ -66,10 +68,11 @@ fastapi = "latest"
```

<Tip>
Check out the complete [Litserve example](https://github.com/CerebriumAI/examples/tree/master/10-batching/2-litserve-batching-gpu) for more information.
</Tip>

Custom batching provides complete control over request grouping and processing, particularly valuable for frameworks without native batching support or applications with specific processing requirements. The [Container Images Guide](/cerebrium/container-images/defining-container-images#custom-runtimes) provides detailed implementation instructions.
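
A sketch of this pattern with LitServe follows. The stand-in model and batch settings are illustrative assumptions, not the linked example's exact code:

```python
import litserve as ls

class BatchedAPI(ls.LitAPI):
    def setup(self, device):
        # Stand-in for loading a real model onto the device
        self.model = lambda batch: [x * 2 for x in batch]

    def decode_request(self, request):
        return request["input"]

    def predict(self, inputs):
        # With batching enabled, `inputs` arrives as a list of decoded requests
        return self.model(inputs)

    def encode_response(self, output):
        return {"output": output}

if __name__ == "__main__":
    # Group up to 8 concurrent requests; wait at most 50 ms to fill a batch
    server = ls.LitServer(BatchedAPI(), max_batch_size=8, batch_timeout=0.05)
    server.run(port=8000)
```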

Together, batching and concurrency create an efficient request processing system. Concurrency enables parallel request handling, while batching optimizes how these concurrent requests are processed, leading to better resource utilization and application performance.

6 changes: 5 additions & 1 deletion cerebrium/scaling/scaling-apps.mdx
@@ -25,12 +25,15 @@ cooldown = 60 # Cooldown period in seconds
```

### Minimum Instances

The `min_replicas` parameter defines how many instances remain active at all times. Setting this to 1 or higher maintains warm instances ready for immediate response, eliminating cold starts but increasing costs. This configuration suits apps that require consistent response times or need to meet specific SLA requirements.

### Maximum Instances

The `max_replicas` parameter sets an upper limit on concurrent instances, controlling costs and protecting backend systems. When traffic increases, new instances start automatically up to this configured maximum.

### Cooldown Period

After processing a request, instances remain available for the duration specified by `cooldown`. Each new request resets this timer. A longer cooldown period helps handle bursty traffic patterns but increases instance running time and cost.
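
Putting the three parameters together, the scaling block sketched in the snippet above might read as follows (the values are illustrative, and the `[cerebrium.scaling]` section name is an assumption):

```toml
[cerebrium.scaling]
min_replicas = 1    # keep one warm instance: no cold starts, higher baseline cost
max_replicas = 10   # cap concurrent instances to bound cost and backend load
cooldown = 60       # seconds an idle instance waits before shutting down
```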

## Processing Multiple Requests
@@ -54,9 +57,10 @@ response_grace_period = 1200 # Clean shutdown time
The `response_grace_period` parameter provides time for instances to complete active requests during shutdown. The system first sends a SIGTERM signal, waits for the specified grace period, then issues a SIGKILL command if the instance hasn't stopped.
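
To make use of the grace period, an app can trap SIGTERM and drain in-flight work before exiting. A minimal sketch (the shutdown flag and loop are illustrative, not a Cerebrium API):

```python
import signal
import time

shutting_down = False

def handle_sigterm(signum, frame):
    # Mark shutdown as requested; stop accepting new work
    global shutting_down
    shutting_down = True

signal.signal(signal.SIGTERM, handle_sigterm)

while not shutting_down:
    time.sleep(1)  # stand-in for the serving loop

# Drain and clean up here; this must finish within response_grace_period,
# after which the platform issues SIGKILL.
```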

Performance metrics available through the dashboard help monitor scaling behavior:

- Request processing times
- Active instance count
- Cold start frequency
- Resource usage patterns

The system status and platform-wide metrics remain accessible through our [status page](https://status.cerebrium.ai), where Cerebrium maintains 99.9% uptime.
9 changes: 6 additions & 3 deletions cerebrium/storage/managing-files.mdx
@@ -2,7 +2,6 @@
title: "Managing Files"
---

Cerebrium offers file management through a 50GB persistent volume that's available to all applications in a project. This storage mounts at `/persistent-storage` and helps store model weights and files efficiently across deployments.
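
A common pattern, sketched below, is to cache model weights on the volume so later deployments skip the download. The paths, file name, and `download_weights` helper are hypothetical:

```python
import os
import torch

CACHE_DIR = "/persistent-storage/models"          # mounted persistent volume
MODEL_PATH = os.path.join(CACHE_DIR, "model.pt")  # illustrative file name

os.makedirs(CACHE_DIR, exist_ok=True)
if not os.path.exists(MODEL_PATH):
    download_weights(MODEL_PATH)  # hypothetical helper; replace with your own
model = torch.jit.load(MODEL_PATH)
```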

## Including Files in Deployments
@@ -30,6 +29,7 @@ Files included in deployments must be under 2GB each, with deployments working b
The CLI provides three commands for working with persistent storage:

1. Upload files with `cerebrium cp`:

```bash
# Upload to root directory
cerebrium cp src_file_name.txt
@@ -42,6 +42,7 @@ cerebrium cp dir_name sub_folder/
```

2. List files with `cerebrium ls`:

```bash
# View root contents
cerebrium ls
@@ -51,6 +52,7 @@ cerebrium ls sub_folder/
```

3. Remove files with `cerebrium rm`:

```bash
# Remove a file
cerebrium rm file_name.txt
@@ -73,5 +75,6 @@ model = torch.jit.load(file_path)
```

<Warning>
Should you require additional storage capacity, please reach out to us through [support](mailto:[email protected]).
</Warning>
4 changes: 1 addition & 3 deletions mint.json
@@ -116,9 +116,7 @@
},
{
"group": "Storage",
"pages": [
"cerebrium/storage/managing-files"
]
"pages": ["cerebrium/storage/managing-files"]
},
{
"group": "Integrations",
