feat: add endpoints webhook for node autonomy #2211

Merged: 4 commits into openyurtio:master on Dec 2, 2024

Conversation

@tnsimon (Contributor) commented Dec 1, 2024

What type of PR is this?

/kind feature

What this PR does / why we need it:

Adds an endpoints webhook for the node autonomy feature. When the node lifecycle controller marks a pod not ready, its endpoints are removed; this webhook ensures those endpoints remain in the ready state while the node autonomy feature is in use.
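
A quick way to sanity-check that the new webhook is registered after deployment (illustrative only; the exact mutating webhook configuration name created by yurt-manager is not quoted in this PR):

$ kubectl get mutatingwebhookconfigurations -o yaml | grep -i -B3 -A3 endpoints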

Which issue(s) this PR fixes:

Fixes #2197

Special notes for your reviewer:

Does this PR introduce a user-facing change?

No

Other notes:

Test Results

Yurt-manager: the nodelifecycle and platformadmin controllers are disabled (see the --controllers argument below).
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
    meta.helm.sh/release-name: yurt-manager
    meta.helm.sh/release-namespace: kube-system
  creationTimestamp: "2024-12-01T22:26:12Z"
  generation: 1
  labels:
    app.kubernetes.io/instance: yurt-manager
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: yurt-manager
    app.kubernetes.io/version: v1.5.0
    helm.sh/chart: yurt-manager-1.5.0
  name: yurt-manager
  namespace: kube-system
  resourceVersion: "23331"
  uid: 792f663e-7c8c-48b7-9a46-9f5bd7fd96b1
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/instance: yurt-manager
      app.kubernetes.io/name: yurt-manager
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app.kubernetes.io/instance: yurt-manager
        app.kubernetes.io/name: yurt-manager
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: yurt-control-plane
                operator: Exists
      containers:
      - args:
        - --metrics-addr=:10271
        - --health-probe-addr=:10272
        - --webhook-port=10273
        - --v=4
        - --working-namespace=kube-system
        - --leader-elect-resource-name=cloud-yurt-manager
        - --controllers=-platformadmin,-nodelifecycle,*
        command:
        - /usr/local/bin/yurt-manager
        image: tiensimon/yurt-manager:latest
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /healthz
            port: 10272
            scheme: HTTP
          initialDelaySeconds: 60
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 2
        name: yurt-manager
        ports:
        - containerPort: 10273
          name: webhook-server
          protocol: TCP
        - containerPort: 10271
          name: metrics
          protocol: TCP
        - containerPort: 10272
          name: health
          protocol: TCP
        readinessProbe:
          failureThreshold: 2
          httpGet:
            path: /readyz
            port: 10272
            scheme: HTTP
          initialDelaySeconds: 60
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 2
        resources:
          limits:
            cpu: "2"
            memory: 1Gi
          requests:
            cpu: 100m
            memory: 256Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: yurt-manager
      serviceAccountName: yurt-manager
      terminationGracePeriodSeconds: 30
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/master
      - effect: NoSchedule
        key: node-role.kubernetes.io/control-plane
status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: "2024-12-01T22:27:22Z"
    lastUpdateTime: "2024-12-01T22:27:22Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  - lastTransitionTime: "2024-12-01T22:26:12Z"
    lastUpdateTime: "2024-12-01T22:27:22Z"
    message: ReplicaSet "yurt-manager-587b4b8bd4" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  observedGeneration: 1
  readyReplicas: 1
  replicas: 1
  updatedReplicas: 1
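
The --controllers argument above follows the usual controller-manager convention: a leading '-' disables the named controller and '*' enables everything else, so '-platformadmin,-nodelifecycle,*' runs all controllers except platformadmin and nodelifecycle. An illustrative way to confirm the arguments the deployment was started with (names taken from the YAML above):

$ kubectl -n kube-system get deploy yurt-manager \
    -o jsonpath='{.spec.template.spec.containers[0].args}'
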
Nodes - the node is Ready and annotated with autonomy-duration
✗ k get nodes -o wide
NAME                                STATUS   ROLES    AGE     VERSION   INTERNAL-IP    EXTERNAL-IP      OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
vagrant                             Ready    <none>   9m19s   v1.29.9   172.16.201.2   <none>           Ubuntu 24.04 LTS     6.8.0-38-generic    containerd://1.7.12

✗ k get nodes vagrant -o yaml 
apiVersion: v1
kind: Node
metadata:
  annotations:
    node.openyurt.io/autonomy-duration: 3600s
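
The PR does not show how the annotation was applied; one straightforward way, using the key and value from the excerpt above, would be:

$ kubectl annotate node vagrant node.openyurt.io/autonomy-duration=3600s
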
Pods - pods are running
✗ k get pods -o wide
NAME                                READY   STATUS    RESTARTS            AGE     IP             NODE                                NOMINATED NODE   READINESS GATES
nginx-deployment-79b9cd8cc9-xcmnl   1/1     Running   0                   3m30s   10.244.1.195   vagrant                             <none>           <none>
nginx-deployment-79b9cd8cc9-z89hz   1/1     Running   0                   3m30s   10.244.1.182   vagrant                             <none>           <none>

Endpoints - endpoints are up and ready

✗ k get endpoints
NAME            ENDPOINTS                           AGE
nginx-service   10.244.1.182:80,10.244.1.195:80     7m45s
Node goes NotReady
$ k get nodes
NAME                                STATUS     ROLES    AGE    VERSION
vagrant                             NotReady   <none>   18m    v1.29.9
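
The PR does not say how the outage was induced; in a single-node Vagrant setup like this it is commonly simulated by stopping the kubelet on the node, after which the node is marked NotReady once the node-monitor grace period expires:

$ sudo systemctl stop kubelet
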
Pods - still Running, but the Ready condition becomes False
✗ k get pods -o wide
NAME                                READY   STATUS    RESTARTS            AGE     IP             NODE                                NOMINATED NODE   READINESS GATES
nginx-deployment-79b9cd8cc9-xcmnl   1/1     Running   0                   6m57s   10.244.1.195   vagrant                             <none>           <none>
nginx-deployment-79b9cd8cc9-z89hz   1/1     Running   0                   6m57s   10.244.1.182   vagrant                             <none>           <none>

apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: "2024-12-01T22:39:16Z"
  generateName: nginx-deployment-79b9cd8cc9-
  labels:
    app: nginx
    pod-template-hash: 79b9cd8cc9
  name: nginx-deployment-79b9cd8cc9-xcmnl
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2024-12-01T22:42:28Z"
    status: "True"
    type: PodReadyToStartContainers
  - lastProbeTime: null
    lastTransitionTime: "2024-12-01T22:42:21Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2024-12-01T22:44:51Z"
    status: "False" 
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2024-12-01T22:42:28Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2024-12-01T22:42:21Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: containerd://571f958748de3f9db6281524f5f904857b761884efb594c5b479a165b1dcc171
    image: docker.io/library/nginx:latest
    imageID: docker.io/library/nginx@sha256:0c86dddac19f2ce4fd716ac58c0fd87bf69bfd4edabfd6971fb885bafd12a00b
    lastState: {}
    name: nginx
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2024-12-01T22:42:27Z"
  hostIP: 172.16.201.2
  hostIPs:
  - ip: 172.16.201.2
  phase: Running
  podIP: 10.244.1.195
  podIPs:
  - ip: 10.244.1.195
  qosClass: BestEffort
  startTime: "2024-12-01T22:42:21Z"
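
An illustrative way to read just the Ready condition shown above (pod name taken from the earlier output):

$ kubectl get pod nginx-deployment-79b9cd8cc9-xcmnl \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'
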
Endpoints - remain up, unchanged and ready.
✗ k get endpoints
NAME            ENDPOINTS                           AGE
nginx-service   10.244.1.182:80,10.244.1.195:80     10m
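
For completeness, the same result can be checked field by field: with the webhook in place, the pod IPs should stay under subsets[].addresses and notReadyAddresses should remain empty even while the node is NotReady. An illustrative check:

$ kubectl get endpoints nginx-service \
    -o jsonpath='{range .subsets[*]}ready: {.addresses[*].ip}{"\n"}notReady: {.notReadyAddresses[*].ip}{"\n"}{end}'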


codecov bot commented Dec 1, 2024

Codecov Report

Attention: Patch coverage is 73.68421% with 15 lines in your changes missing coverage. Please review.

Project coverage is 45.19%. Comparing base (7763e7c) to head (0f38474).
Report is 44 commits behind head on master.

Files with missing lines                                Patch %   Lines
...tmanager/webhook/endpoints/v1/endpoints_default.go   77.08%    7 Missing and 4 partials ⚠️
...tmanager/webhook/endpoints/v1/endpoints_handler.go    0.00%    3 Missing ⚠️
pkg/yurtmanager/webhook/server.go                         0.00%    1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           master    #2211       +/-   ##
===========================================
- Coverage   58.93%   45.19%   -13.75%     
===========================================
  Files         210      404      +194     
  Lines       18968    27811     +8843     
===========================================
+ Hits        11179    12569     +1390     
- Misses       6707    14009     +7302     
- Partials     1082     1233      +151     
Flag        Coverage Δ
unittests   45.19% <73.68%> (-13.75%) ⬇️

sonarqubecloud bot commented Dec 2, 2024

@rambohe-ch added the approved and lgtm labels on Dec 2, 2024
@rambohe-ch merged commit 57164bb into openyurtio:master on Dec 2, 2024
13 of 14 checks passed
@tnsimon deleted the addendpointwebhook branch on December 3, 2024