Commit
bug fix for kubescaler and setup for flux-operator-hpa-ca (#11)
* bug fix for kubescaler and setup for flux-operator-hpa-ca
* added a switch to choose between eks nodegroup and cloudformation
* added watch events for listing nodes
* implemented eksctl cluster for lammps with ca
* added metrics for scalability experiments and flux/lammps experiments
* experimental setup complete for lammps full and semi auto
* linting and version bump
1 parent c1008e6, commit 9ea5e9c
Showing 42 changed files with 7,748 additions and 35 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,5 +1,5 @@ | ||
pre-commit | ||
-black | ||
+black==23.3.0 | ||
isort | ||
flake8 | ||
pytest |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -15,3 +15,5 @@ __pycache__ | |
*auth-config.yaml | ||
*kubeconfig.yaml | ||
*kubeconfig-*.yaml | ||
**/.DS_Store | ||
.vscode |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,89 @@ | ||
# Setup Kubernetes Cluster with Cluster Autoscaling | ||
|
||
## Deploy the cluster | ||
The `k8s_cluster_operations.py` script creates, deletes, or scales an EKS cluster. The nodes are managed either by an EKS nodegroup or by a CloudFormation stack, depending on the `--eks-nodegroup` option. | ||
|
||
``` | ||
python3 k8s_cluster_operations.py -h | ||
positional arguments: | ||
cluster_name Cluster name suffix | ||
optional arguments: | ||
-h, --help show this help message and exit | ||
--experiment EXPERIMENT | ||
Experiment name (defaults to script name) | ||
--node-count NODE_COUNT | ||
starting node count of the cluster | ||
--max-node-count MAX_NODE_COUNT | ||
maximum node count | ||
--min-node-count MIN_NODE_COUNT | ||
minimum node count | ||
--machine-type MACHINE_TYPE | ||
AWS EC2 Instance types | ||
--operation [{create,delete,scale}] | ||
Define which operation you want to perform, If you want to scale, be sure to increase the NODE_COUNT. The cluster size will increase depending on the current instance size. if NODE_COUNT is less than the current, the cluster nodes will be scaled down. | ||
--eks-nodegroup | ||
Include this option to use eks nodegroup for instances, otherwise, it'll use cloudformation stack. EKS Nodegroup will automatically set tags in the aws autoscaling group so that cluster autoscaler can discover them. | ||
--enable-cluster-autoscaler | ||
Include this to enable cluster autoscaling. This will also create an OIDC provider for the cloud. be sure to take a note of the RoleARN that this script will print. | ||
``` | ||
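
The options listed above can be modeled with `argparse`. The following is only a minimal sketch of a parser that accepts the same flags, not the actual contents of `k8s_cluster_operations.py`. The default values (node counts of 1 and 5 are inferred from the example run; the `cluster` suffix is hypothetical), the optional positional, and the `validate` helper are all assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that accepts the flags shown in the help text."""
    parser = argparse.ArgumentParser(
        description="Create, delete, or scale an EKS cluster (illustrative sketch)"
    )
    # Assumption: the positional is optional, because the example run omits it.
    # The default suffix "cluster" is hypothetical.
    parser.add_argument("cluster_name", nargs="?", default="cluster",
                        help="Cluster name suffix")
    parser.add_argument("--experiment",
                        help="Experiment name (defaults to script name)")
    # Assumption: defaults of 1 and 5 are inferred from "sized 1 to 5".
    parser.add_argument("--node-count", type=int, default=1,
                        help="starting node count of the cluster")
    parser.add_argument("--max-node-count", type=int, default=5,
                        help="maximum node count")
    parser.add_argument("--min-node-count", type=int, default=1,
                        help="minimum node count")
    parser.add_argument("--machine-type", help="AWS EC2 instance type")
    parser.add_argument("--operation", nargs="?", default="create",
                        choices=["create", "delete", "scale"],
                        help="operation to perform")
    parser.add_argument("--eks-nodegroup", action="store_true",
                        help="use an EKS nodegroup instead of a CloudFormation stack")
    parser.add_argument("--enable-cluster-autoscaler", action="store_true",
                        help="enable cluster autoscaling")
    return parser


def validate(args: argparse.Namespace) -> None:
    """Hypothetical sanity check: the node count must lie within the bounds."""
    if not args.min_node_count <= args.node_count <= args.max_node_count:
        raise ValueError(
            f"node count {args.node_count} is outside "
            f"[{args.min_node_count}, {args.max_node_count}]"
        )


if __name__ == "__main__":
    parsed = build_parser().parse_args()
    validate(parsed)
    print(parsed)
```

With a parser like this, a scale operation would simply pass a different `--node-count`; whether the cluster grows or shrinks depends on how that value compares to the current size, as the help text describes.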
|
||
Example usage | ||
|
||
```console | ||
basicinsect:flux_operator_ca_hpa hossen1$ python3 k8s_cluster_operations.py --operation "create" --enable-cluster-autoscaler --eks-nodegroup | ||
📛️ Cluster name is kubernetes-flux-operator-hpa-ca-cluster | ||
⭐️ Creating the cluster sized 1 to 5... | ||
🥞️ Creating VPC stack and subnets... | ||
🥣️ Creating cluster... | ||
The status of nodegroup CREATING | ||
Waiting for kubernetes-flux-operator-hpa-ca-cluster-worker-group nodegroup... | ||
Setting Up the cluster OIDC Provider | ||
The cluster autoscaler Role ARN - arn:aws:iam::<account-id>:role/AmazonEKSClusterAutoscalerRole | ||
|
||
⏱️ Waiting for 1 nodes to be Ready... | ||
Time for kubernetes to get nodes - 5.082208871841431 | ||
🦊️ Writing config file to kubeconfig-aws.yaml | ||
Usage: kubectl --kubeconfig=kubeconfig-aws.yaml get nodes | ||
``` | ||
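
The example output reports how long it took for the requested nodes to become Ready. A timing loop of that kind could look like the sketch below. It is an illustration under assumptions, not the script's actual code: `wait_for_ready_nodes` and the injected `count_ready` callable are hypothetical names, and in practice the callable would query the Kubernetes API for Ready nodes.

```python
import time
from typing import Callable


def wait_for_ready_nodes(count_ready: Callable[[], int], expected: int,
                         timeout: float = 600.0, interval: float = 5.0) -> float:
    """Poll until at least `expected` nodes are Ready.

    Returns the elapsed time in seconds, or raises TimeoutError if the
    nodes are not Ready within `timeout` seconds.
    """
    start = time.monotonic()
    while True:
        if count_ready() >= expected:
            return time.monotonic() - start
        if time.monotonic() - start >= timeout:
            raise TimeoutError(f"fewer than {expected} nodes Ready after {timeout}s")
        time.sleep(interval)


if __name__ == "__main__":
    # Stand-in for a real cluster query: the node becomes Ready on the third poll.
    readings = iter([0, 0, 1])
    elapsed = wait_for_ready_nodes(lambda: next(readings), expected=1, interval=0.1)
    print(f"Time for kubernetes to get nodes - {elapsed}")
```

Using `time.monotonic()` rather than wall-clock time keeps the measurement unaffected by system clock adjustments during long waits.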
|
||
## Set Up Cluster Autoscaler | ||
|
||
Be sure to change two things in [cluster-autoscaler-autodiscover.yaml](cluster-autoscaler/cluster-autoscaler-autodiscover.yaml): | ||
|
||
1. RoleARN `arn:aws:iam::<account-id>:role/AmazonEKSClusterAutoscalerRole` in the service account portion | ||
2. Cluster Name - `kubernetes-flux-operator-hpa-ca-cluster` in the commands of the cluster autoscaler. | ||
|
||
Then apply the changes: | ||
```console | ||
kubectl --kubeconfig=kubeconfig-aws.yaml apply -f cluster-autoscaler/cluster-autoscaler-autodiscover.yaml | ||
``` | ||
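
The two manual edits described above (role ARN and cluster name) could also be scripted before running `kubectl apply`. The sketch below assumes the manifest follows the layout of the upstream cluster-autoscaler auto-discovery example: an `eks.amazonaws.com/role-arn` annotation on the service account and a `--node-group-auto-discovery` argument ending in the cluster name. The file in this repository may differ, and `patch_manifest` is a hypothetical helper.

```python
import re

# Assumption: the service account carries the standard IRSA annotation.
ROLE_ARN_RE = re.compile(r"(eks\.amazonaws\.com/role-arn:[ \t]*).*")
# Assumption: the autoscaler command uses tag-based ASG auto-discovery and the
# cluster name (or a "<...>" placeholder) is the last path segment of the tag.
DISCOVERY_RE = re.compile(
    r"(k8s\.io/cluster-autoscaler/enabled,k8s\.io/cluster-autoscaler/)"
    r"(?:<[^>]*>|[^\s\"']+)"
)


def patch_manifest(text: str, role_arn: str, cluster_name: str) -> str:
    """Return the manifest text with the role ARN and cluster name filled in."""
    text, arn_hits = ROLE_ARN_RE.subn(lambda m: m.group(1) + role_arn, text)
    text, name_hits = DISCOVERY_RE.subn(lambda m: m.group(1) + cluster_name, text)
    if arn_hits == 0 or name_hits == 0:
        # Fail loudly rather than apply a manifest that was not updated.
        raise ValueError("role ARN annotation or auto-discovery tag not found")
    return text


if __name__ == "__main__":
    path = "cluster-autoscaler/cluster-autoscaler-autodiscover.yaml"
    with open(path, encoding="utf-8") as handle:
        original = handle.read()
    patched = patch_manifest(
        original,
        role_arn="arn:aws:iam::<account-id>:role/AmazonEKSClusterAutoscalerRole",
        cluster_name="kubernetes-flux-operator-hpa-ca-cluster",
    )
    print(patched)
```

The sketch prints the patched text instead of overwriting the file, so the result can be reviewed before it is applied.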
|
||
Verify that the cluster autoscaler is up: | ||
```console | ||
$ kubectl --kubeconfig=kubeconfig-aws.yaml get pods -n kube-system | ||
NAME READY STATUS RESTARTS AGE | ||
aws-node-2dz6x 1/1 Running 0 9h | ||
aws-node-pzwl9 1/1 Running 0 9h | ||
cluster-autoscaler-747689d74b-6lkfk 1/1 Running 0 8h | ||
coredns-79df7fff65-q984f 1/1 Running 0 9h | ||
coredns-79df7fff65-tlkwc 1/1 Running 0 9h | ||
kube-proxy-8ch5x 1/1 Running 0 9h | ||
kube-proxy-kq9ch 1/1 Running 0 9h | ||
metrics-server-7db4fb59f9-qdp2c 1/1 Running 0 7h5m | ||
``` | ||
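
The check above can be automated by parsing the table that `kubectl get pods` prints. The following sketch assumes the default column layout shown in the listing (`NAME`, `READY`, `STATUS`, ...) and that the autoscaler pod name starts with `cluster-autoscaler-`; both function names are hypothetical.

```python
from typing import Dict


def pod_statuses(kubectl_output: str) -> Dict[str, str]:
    """Map each pod name to its STATUS column from `kubectl get pods` output."""
    lines = [line for line in kubectl_output.splitlines() if line.strip()]
    if not lines:
        return {}
    header = lines[0].split()
    name_idx, status_idx = header.index("NAME"), header.index("STATUS")
    statuses = {}
    for row in lines[1:]:
        cells = row.split()
        statuses[cells[name_idx]] = cells[status_idx]
    return statuses


def autoscaler_running(kubectl_output: str,
                       prefix: str = "cluster-autoscaler-") -> bool:
    """True if at least one autoscaler pod exists and all of them are Running."""
    matches = [status for name, status in pod_statuses(kubectl_output).items()
               if name.startswith(prefix)]
    return bool(matches) and all(status == "Running" for status in matches)


if __name__ == "__main__":
    sample = """NAME READY STATUS RESTARTS AGE
aws-node-2dz6x 1/1 Running 0 9h
cluster-autoscaler-747689d74b-6lkfk 1/1 Running 0 8h
"""
    print(autoscaler_running(sample))
```

For anything beyond a quick check, requesting structured output (for example `kubectl get pods -o json`) is usually more robust than splitting on whitespace.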
|
||
The following command prints the logs. Make sure that the cluster autoscaler has discovered the autoscaling group and is working properly. | ||
```console | ||
kubectl --kubeconfig=kubeconfig-aws.yaml -n kube-system logs deploy/cluster-autoscaler | ||
``` | ||
|
||
## Run application to collect metrics | ||
Follow [ca_hpa_readme.md](README_CA_HPA.md) to see how to run a program that collects metrics for horizontal pod autoscaling and cluster autoscaling. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,38 @@ | ||
# Metrics Collection | ||
The purpose of this [file](application_ca_hpa_metrics.py) is to collect application and system metrics. The assumption is that we have a Kubernetes cluster with cluster autoscaling and horizontal pod autoscaling (HPA) enabled. The metrics and logs this file captures are: | ||
|
||
1. How long a pod stays in the Pending state due to resource unavailability | ||
2. How long it takes to run the container once the pod is scheduled | ||
3. When HPA takes action based on the CPU utilization | ||
4. When there are pending pods, how long it takes for the cluster autoscaler to take action | ||
5. When the cluster autoscaler adds new nodes | ||
6. When the cluster autoscaler requests new nodes, how long it takes to get the nodes | ||
7. When the load decreases, how long it takes for HPA to scale down pods | ||
8. When there is no load, how long it takes for the cluster autoscaler (CA) to remove nodes | ||
9. When the nodes are actually removed | ||
|
||
We can answer the above questions and many more by collecting the metrics. This file will save the results in the data directory. | ||
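
As an illustration of the first metric in the list above, the time a pod spends pending can be estimated from two timestamps in the Kubernetes pod object: `metadata.creationTimestamp` and the `lastTransitionTime` of the `PodScheduled` condition. The sketch below works on a plain dictionary shaped like the JSON from `kubectl get pod -o json`. It is not taken from `application_ca_hpa_metrics.py`, which may compute this differently, and the helper names are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_ts(value: str) -> datetime:
    """Parse a Kubernetes RFC 3339 timestamp such as 2023-08-01T12:00:00Z."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def pending_seconds(pod: dict) -> Optional[float]:
    """Seconds between pod creation and scheduling, or None if not yet scheduled."""
    created = parse_ts(pod["metadata"]["creationTimestamp"])
    for condition in pod.get("status", {}).get("conditions", []):
        if condition.get("type") == "PodScheduled" and condition.get("status") == "True":
            scheduled = parse_ts(condition["lastTransitionTime"])
            return (scheduled - created).total_seconds()
    return None


if __name__ == "__main__":
    # Hand-written example data, not output from a real cluster.
    example_pod = {
        "metadata": {"name": "example-pod", "creationTimestamp": "2023-08-01T12:00:00Z"},
        "status": {"conditions": [
            {"type": "PodScheduled", "status": "True",
             "lastTransitionTime": "2023-08-01T12:00:42Z"},
        ]},
    }
    print(pending_seconds(example_pod))
```

Returning `None` for unscheduled pods keeps still-pending pods distinguishable from pods that were scheduled immediately.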
|
||
Run the file as follows: | ||
```console | ||
python3 application_ca_hpa_metrics.py -h | ||
usage: application_ca_hpa_metrics.py [-h] [--flux-namespace FLUX_NAMESPACE] [--autoscaler-namespace AUTOSCALER_NAMESPACE] [--hpa-namespace HPA_NAMESPACE] [--kubeconfig KUBECONFIG] [--outdir OUTDIR] | ||
|
||
Program to collect various metrics from kubernetes | ||
|
||
optional arguments: | ||
-h, --help | ||
show this help message and exit | ||
|
||
--flux-namespace FLUX_NAMESPACE | ||
Namespace of the flux operator | ||
|
||
--autoscaler-namespace AUTOSCALER_NAMESPACE | ||
Namespace of the cluster autoscaler | ||
|
||
--hpa-namespace HPA_NAMESPACE | ||
Namespace of the horizontal pod autoscaler | ||
|
||
--kubeconfig KUBECONFIG | ||
config file name, full path if the file is not in the current directory | ||
``` |
49 changes: 49 additions & 0 deletions
examples/flux_operator_ca_hpa/basic-minicluster-setup/README.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,49 @@ | ||
# Flux Operator Mini Cluster Setup | ||
|
||
## Basic Minicluster setup | ||
This setup assumes you have already created a Kubernetes cluster with at least one or two nodes by following the directions [here](../README.md). | ||
|
||
Create the flux-operator namespace and install the operator: | ||
|
||
```bash | ||
$ kubectl create namespace flux-operator | ||
$ kubectl apply -f operator-minicluster/basic-configs/flux-operator.yaml | ||
``` | ||
|
||
Next, create the minicluster: | ||
|
||
```bash | ||
$ kubectl apply -f operator-minicluster/basic-configs/minicluster.yaml | ||
``` | ||
|
||
You'll need to wait for the containers to pull, that is, until the pod status goes from `ContainerCreating` to `Running`. | ||
|
||
```bash | ||
$ kubectl get -n flux-operator pods | ||
NAME READY STATUS RESTARTS AGE | ||
flux-sample-0-4wmmp 1/1 Running 0 6m50s | ||
flux-sample-1-mjj7b 1/1 Running 0 6m50s | ||
``` | ||
|
||
## Flux Cluster with LAMMPS Application | ||
|
||
For this setup, we cannot use the Python API because LAMMPS currently needs a placement group, and the boto3 API lacks support for providing the `placement group` option. So we will use `eksctl`. If you don't have `eksctl`, please install it first. | ||
|
||
```console | ||
eksctl create cluster -f operator-minicluster/hpc7g-configs/eks-efa-cluster-config-hpc7g.yaml | ||
``` | ||
|
||
This will create a cluster with a managed nodegroup, an OIDC provider, and a service account for the cluster autoscaler. | ||
|
||
Now deploy an ARM version of the Flux Operator: | ||
```console | ||
kubectl apply -f operator-minicluster/hpc7g-configs/flux-operator-arm.yaml | ||
``` | ||
|
||
This will create our size 1 cluster that we will be running LAMMPS on many times: | ||
``` | ||
kubectl create namespace flux-operator | ||
kubectl apply -f operator-minicluster/hpc7g-configs/minicluster-libfabric-new.yaml # 18.1.1 | ||
``` | ||
|
||
More to follow... |