generated from onedr0p/cluster-template
feat(helm): update gpu-operator ( v24.6.2 → v24.9.1 ) #544
Open: renovate wants to merge 1 commit into main from renovate/gpu-operator-24.x
--- HelmRelease: gpu-operator/gpu-operator ClusterRole: gpu-operator/gpu-operator
+++ HelmRelease: gpu-operator/gpu-operator ClusterRole: gpu-operator/gpu-operator
@@ -52,27 +52,12 @@
- update
- patch
- delete
- apiGroups:
- ''
resources:
- - events
- - pods
- - pods/eviction
- - services
- verbs:
- - create
- - get
- - list
- - watch
- - update
- - patch
- - delete
-- apiGroups:
- - ''
- resources:
- nodes
verbs:
- get
- list
- watch
- update
@@ -86,39 +71,33 @@
- list
- create
- watch
- update
- patch
- apiGroups:
+ - ''
+ resources:
+ - events
+ - pods
+ - pods/eviction
+ verbs:
+ - create
+ - get
+ - list
+ - watch
+ - update
+ - patch
+ - delete
+- apiGroups:
- apps
resources:
- daemonsets
verbs:
- get
- list
- watch
-- apiGroups:
- - apps
- resources:
- - controllerrevisions
- verbs:
- - get
- - list
- - watch
-- apiGroups:
- - monitoring.coreos.com
- resources:
- - servicemonitors
- - prometheusrules
- verbs:
- - get
- - list
- - create
- - watch
- - update
- - delete
- apiGroups:
- nvidia.com
resources:
- clusterpolicies
- clusterpolicies/finalizers
- clusterpolicies/status
@@ -141,24 +120,12 @@
verbs:
- get
- list
- watch
- create
- apiGroups:
- - coordination.k8s.io
- resources:
- - leases
- verbs:
- - get
- - list
- - watch
- - create
- - update
- - patch
- - delete
-- apiGroups:
- node.k8s.io
resources:
- runtimeclasses
verbs:
- get
- list
--- HelmRelease: gpu-operator/gpu-operator Role: gpu-operator/gpu-operator
+++ HelmRelease: gpu-operator/gpu-operator Role: gpu-operator/gpu-operator
@@ -22,12 +22,20 @@
- update
- patch
- delete
- apiGroups:
- apps
resources:
+ - controllerrevisions
+ verbs:
+ - get
+ - list
+ - watch
+- apiGroups:
+ - apps
+ resources:
- daemonsets
verbs:
- create
- get
- list
- watch
@@ -35,17 +43,46 @@
- patch
- delete
- apiGroups:
- ''
resources:
- configmaps
+ - endpoints
+ - pods
+ - pods/eviction
- secrets
+ - services
+ - services/finalizers
- serviceaccounts
verbs:
- create
- get
- list
- watch
- update
- patch
- delete
+- apiGroups:
+ - coordination.k8s.io
+ resources:
+ - leases
+ verbs:
+ - get
+ - list
+ - watch
+ - create
+ - update
+ - patch
+ - delete
+- apiGroups:
+ - monitoring.coreos.com
+ resources:
+ - servicemonitors
+ - prometheusrules
+ verbs:
+ - get
+ - list
+ - create
+ - watch
+ - update
+ - delete
--- HelmRelease: gpu-operator/gpu-operator Deployment: gpu-operator/gpu-operator
+++ HelmRelease: gpu-operator/gpu-operator Deployment: gpu-operator/gpu-operator
@@ -44,13 +44,13 @@
value: ''
- name: OPERATOR_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
- name: DRIVER_MANAGER_IMAGE
- value: nvcr.io/nvidia/cloud-native/k8s-driver-manager:v0.6.10
+ value: nvcr.io/nvidia/cloud-native/k8s-driver-manager:v0.7.0
volumeMounts:
- name: host-os-release
mountPath: /host-etc/os-release
readOnly: true
livenessProbe:
httpGet:
--- HelmRelease: gpu-operator/gpu-operator ClusterPolicy: gpu-operator/cluster-policy
+++ HelmRelease: gpu-operator/gpu-operator ClusterPolicy: gpu-operator/cluster-policy
@@ -15,30 +15,30 @@
operator:
defaultRuntime: docker
runtimeClass: nvidia
initContainer:
repository: nvcr.io/nvidia
image: cuda
- version: 12.6.1-base-ubi8
+ version: 12.6.3-base-ubi9
imagePullPolicy: IfNotPresent
daemonsets:
labels:
- helm.sh/chart: gpu-operator-v24.6.2
+ helm.sh/chart: gpu-operator-v24.9.1
app.kubernetes.io/managed-by: gpu-operator
tolerations:
- effect: NoSchedule
key: nvidia.com/gpu
operator: Exists
priorityClassName: system-node-critical
updateStrategy: RollingUpdate
rollingUpdate:
maxUnavailable: '1'
validator:
repository: nvcr.io/nvidia/cloud-native
image: gpu-operator-validator
- version: v24.6.2
+ version: v24.9.1
imagePullPolicy: IfNotPresent
plugin:
env:
- name: WITH_WORKLOAD
value: 'false'
mig:
@@ -52,26 +52,26 @@
enabled: false
useNvidiaDriverCRD: false
useOpenKernelModules: false
usePrecompiled: false
repository: nvcr.io/nvidia
image: driver
- version: 550.90.07
+ version: 550.127.08
imagePullPolicy: IfNotPresent
startupProbe:
failureThreshold: 120
initialDelaySeconds: 60
periodSeconds: 10
timeoutSeconds: 60
rdma:
enabled: false
useHostMofed: false
manager:
repository: nvcr.io/nvidia/cloud-native
image: k8s-driver-manager
- version: v0.6.10
+ version: v0.7.0
imagePullPolicy: IfNotPresent
env:
- name: ENABLE_GPU_POD_EVICTION
value: 'true'
- name: ENABLE_AUTO_DRAIN
value: 'false'
@@ -113,13 +113,13 @@
enabled: false
image: vgpu-manager
imagePullPolicy: IfNotPresent
driverManager:
repository: nvcr.io/nvidia/cloud-native
image: k8s-driver-manager
- version: v0.6.10
+ version: v0.7.0
imagePullPolicy: IfNotPresent
env:
- name: ENABLE_GPU_POD_EVICTION
value: 'false'
- name: ENABLE_AUTO_DRAIN
value: 'false'
@@ -138,35 +138,35 @@
url: nvcr.io/nvidia/cloud-native/kata-gpu-artifacts:ubuntu22.04-535.86.10-snp
name: kata-nvidia-gpu-snp
nodeSelector:
nvidia.com/cc.capable: 'true'
repository: nvcr.io/nvidia/cloud-native
image: k8s-kata-manager
- version: v0.2.1
+ version: v0.2.2
imagePullPolicy: IfNotPresent
vfioManager:
enabled: true
repository: nvcr.io/nvidia
image: cuda
- version: 12.6.1-base-ubi8
+ version: 12.6.3-base-ubi9
imagePullPolicy: IfNotPresent
driverManager:
repository: nvcr.io/nvidia/cloud-native
image: k8s-driver-manager
- version: v0.6.10
+ version: v0.7.0
imagePullPolicy: IfNotPresent
env:
- name: ENABLE_GPU_POD_EVICTION
value: 'false'
- name: ENABLE_AUTO_DRAIN
value: 'false'
vgpuDeviceManager:
enabled: true
repository: nvcr.io/nvidia/cloud-native
image: vgpu-device-manager
- version: v0.2.7
+ version: v0.2.8
imagePullPolicy: IfNotPresent
config:
default: default
name: ''
ccManager:
enabled: false
@@ -189,13 +189,13 @@
value: none
installDir: /var/nvidia
devicePlugin:
enabled: true
repository: nvcr.io/nvidia
image: k8s-device-plugin
- version: v0.16.2-ubi8
+ version: v0.17.0
imagePullPolicy: IfNotPresent
env:
- name: PASS_DEVICE_SPECS
value: 'true'
- name: FAIL_ON_INIT_ERROR
value: 'true'
@@ -211,19 +211,19 @@
name: time-slicing-config-all
default: any
dcgm:
enabled: false
repository: nvcr.io/nvidia/cloud-native
image: dcgm
- version: 3.3.7-1-ubuntu22.04
+ version: 3.3.9-1-ubuntu22.04
imagePullPolicy: IfNotPresent
dcgmExporter:
enabled: true
repository: nvcr.io/nvidia/k8s
image: dcgm-exporter
- version: 3.3.7-3.5.0-ubuntu22.04
+ version: 3.3.9-3.6.1-ubuntu22.04
imagePullPolicy: IfNotPresent
env:
- name: DCGM_EXPORTER_LISTEN
value: :9400
- name: DCGM_EXPORTER_KUBERNETES
value: 'true'
@@ -236,24 +236,24 @@
interval: 15s
relabelings: []
gfd:
enabled: true
repository: nvcr.io/nvidia
image: k8s-device-plugin
- version: v0.16.2-ubi8
+ version: v0.17.0
imagePullPolicy: IfNotPresent
env:
- name: GFD_SLEEP_INTERVAL
value: 60s
- name: GFD_FAIL_ON_INIT_ERROR
value: 'true'
migManager:
enabled: true
repository: nvcr.io/nvidia/cloud-native
image: k8s-mig-manager
- version: v0.8.0-ubuntu20.04
+ version: v0.10.0-ubuntu20.04
imagePullPolicy: IfNotPresent
env:
- name: WITH_REBOOT
value: 'false'
config:
name: null
@@ -261,24 +261,24 @@
gpuClientsConfig:
name: ''
nodeStatusExporter:
enabled: false
repository: nvcr.io/nvidia/cloud-native
image: gpu-operator-validator
- version: v24.6.2
+ version: v24.9.1
imagePullPolicy: IfNotPresent
gdrcopy:
enabled: false
repository: nvcr.io/nvidia/cloud-native
image: gdrdrv
- version: v2.4.1-1
+ version: v2.4.1-2
imagePullPolicy: IfNotPresent
sandboxWorkloads:
enabled: false
defaultWorkload: container
sandboxDevicePlugin:
enabled: true
repository: nvcr.io/nvidia
image: kubevirt-gpu-device-plugin
- version: v1.2.9
+ version: v1.2.10
imagePullPolicy: IfNotPresent
--- HelmRelease: gpu-operator/gpu-operator ServiceAccount: gpu-operator/gpu-operator-upgrade-crd-hook-sa
+++ HelmRelease: gpu-operator/gpu-operator ServiceAccount: gpu-operator/gpu-operator-upgrade-crd-hook-sa
@@ -0,0 +1,10 @@
+---
+apiVersion: v1
+kind: ServiceAccount
+metadata:
+ name: gpu-operator-upgrade-crd-hook-sa
+ annotations:
+ helm.sh/hook: pre-upgrade
+ helm.sh/hook-delete-policy: hook-succeeded,before-hook-creation
+ helm.sh/hook-weight: '0'
+
--- HelmRelease: gpu-operator/gpu-operator ClusterRole: gpu-operator/gpu-operator-upgrade-crd-hook-role
+++ HelmRelease: gpu-operator/gpu-operator ClusterRole: gpu-operator/gpu-operator-upgrade-crd-hook-role
@@ -0,0 +1,22 @@
+---
+apiVersion: rbac.authorization.k8s.io/v1
+kind: ClusterRole
+metadata:
+ name: gpu-operator-upgrade-crd-hook-role
+ annotations:
+ helm.sh/hook: pre-upgrade
+ helm.sh/hook-delete-policy: hook-succeeded,before-hook-creation
+ helm.sh/hook-weight: '0'
+rules:
+- apiGroups:
+ - apiextensions.k8s.io
+ resources:
+ - customresourcedefinitions
+ verbs:
+ - create
+ - get
+ - list
+ - watch
+ - patch
+ - update
+
--- HelmRelease: gpu-operator/gpu-operator ClusterRoleBinding: gpu-operator/gpu-operator-upgrade-crd-hook-binding
+++ HelmRelease: gpu-operator/gpu-operator ClusterRoleBinding: gpu-operator/gpu-operator-upgrade-crd-hook-binding
@@ -0,0 +1,18 @@
+---
+kind: ClusterRoleBinding
+apiVersion: rbac.authorization.k8s.io/v1
+metadata:
+ name: gpu-operator-upgrade-crd-hook-binding
+ annotations:
+ helm.sh/hook: pre-upgrade
+ helm.sh/hook-delete-policy: hook-succeeded,before-hook-creation
+ helm.sh/hook-weight: '0'
+subjects:
+- kind: ServiceAccount
+ name: gpu-operator-upgrade-crd-hook-sa
+ namespace: gpu-operator
+roleRef:
+ kind: ClusterRole
+ name: gpu-operator-upgrade-crd-hook-role
+ apiGroup: rbac.authorization.k8s.io
+
--- HelmRelease: gpu-operator/gpu-operator Job: gpu-operator/gpu-operator-upgrade-crd
+++ HelmRelease: gpu-operator/gpu-operator Job: gpu-operator/gpu-operator-upgrade-crd
@@ -0,0 +1,46 @@
+---
+apiVersion: batch/v1
+kind: Job
+metadata:
+ name: gpu-operator-upgrade-crd
+ namespace: gpu-operator
+ annotations:
+ helm.sh/hook: pre-upgrade
+ helm.sh/hook-weight: '1'
+ helm.sh/hook-delete-policy: hook-succeeded,before-hook-creation
+ labels:
+ app.kubernetes.io/name: gpu-operator
+ app.kubernetes.io/instance: gpu-operator
+ app.kubernetes.io/managed-by: Helm
+ app.kubernetes.io/component: gpu-operator
+spec:
+ template:
+ metadata:
+ name: gpu-operator-upgrade-crd
+ labels:
+ app.kubernetes.io/name: gpu-operator
+ app.kubernetes.io/instance: gpu-operator
+ app.kubernetes.io/managed-by: Helm
+ app.kubernetes.io/component: gpu-operator
+ spec:
+ serviceAccountName: gpu-operator-upgrade-crd-hook-sa
+ tolerations:
+ - effect: NoSchedule
+ key: node-role.kubernetes.io/master
+ operator: Equal
+ value: ''
+ - effect: NoSchedule
+ key: node-role.kubernetes.io/control-plane
+ operator: Equal
+ value: ''
+ containers:
+ - name: upgrade-crd
+ image: ghcr.io/jfroy/gpu-operator:v24.6.2-ubi8
+ imagePullPolicy: IfNotPresent
+ command:
+ - /bin/sh
+ - -c
+ - |
+ kubectl apply -f /opt/gpu-operator/nvidia.com_clusterpolicies.yaml; kubectl apply -f /opt/gpu-operator/nvidia.com_nvidiadrivers.yaml;
+ restartPolicy: OnFailure
+
--- kubernetes/apps/gpu-operator/gpu-operator/app Kustomization: flux-system/gpu-operator HelmRelease: gpu-operator/gpu-operator
+++ kubernetes/apps/gpu-operator/gpu-operator/app Kustomization: flux-system/gpu-operator HelmRelease: gpu-operator/gpu-operator
@@ -13,13 +13,13 @@
spec:
chart: gpu-operator
sourceRef:
kind: HelmRepository
name: nvidia
namespace: flux-system
- version: v24.6.2
+ version: v24.9.1
driftDetection:
mode: enabled
install:
crds: CreateReplace
disableOpenAPIValidation: true
remediation:
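Most of the ClusterPolicy changes in the diff above are paired `- version:` / `+ version:` lines that follow an `image:` context line. The following is a minimal Python sketch, assuming the diff is available as a string in the flattened form shown here, that collects those pairs into a short summary. The names `version_bumps` and `sample` are hypothetical and exist only for this illustration; they are not part of Renovate or of any tool used in this PR.

```python
import re

IMAGE_RE = re.compile(r"^image:\s*(\S+)$")
VERSION_RE = re.compile(r"^([-+])\s*version:\s*(\S+)$")


def version_bumps(diff_text):
    """Return (image, old, new) tuples for adjacent -/+ `version:` lines.

    `image` is the most recent `image:` context line in the same hunk,
    or None when the hunk has none (for example the chart version itself).
    """
    bumps = []
    image = None  # last `image:` context line seen in the current hunk
    old = None    # removed version waiting for its `+` partner
    for raw in diff_text.splitlines():
        line = raw.strip()
        if line.startswith("@@"):
            # A new hunk starts: forget context carried over from the last one.
            image, old = None, None
            continue
        m = IMAGE_RE.match(line)
        if m:
            image, old = m.group(1), None
            continue
        m = VERSION_RE.match(line)
        if not m:
            old = None
            continue
        sign, value = m.groups()
        if sign == "-":
            old = value
        elif old is not None:
            bumps.append((image, old, value))
            old = None
    return bumps


# Excerpt copied from the diff above (device plugin hunk and chart version hunk).
sample = """\
@@ -189,13 +189,13 @@
image: k8s-device-plugin
- version: v0.16.2-ubi8
+ version: v0.17.0
imagePullPolicy: IfNotPresent
@@ -13,13 +13,13 @@
namespace: flux-system
- version: v24.6.2
+ version: v24.9.1
"""

for image, old, new in version_bumps(sample):
    print(f"{image or '(chart)'}: {old} -> {new}")
```

The sketch only pairs a removed line with the added line directly after it, so a hunk that reorders lines would need a real diff parser instead.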
jfroy force-pushed the main branch 10 times, most recently from 44a8b71 to e2e1ece on November 7, 2024 18:10
renovate bot force-pushed the renovate/gpu-operator-24.x branch from fee98bb to 177fe6f on November 9, 2024 10:15
renovate bot force-pushed the renovate/gpu-operator-24.x branch from 177fe6f to 57e2949 on November 10, 2024 03:52
jfroy force-pushed the main branch 2 times, most recently from 05848cb to 4f6fd94 on November 10, 2024 04:00
renovate bot force-pushed the renovate/gpu-operator-24.x branch from 57e2949 to 5af188a on November 10, 2024 04:00
renovate bot force-pushed the renovate/gpu-operator-24.x branch from 5af188a to c982c9b on November 10, 2024 04:02
renovate bot force-pushed the renovate/gpu-operator-24.x branch 2 times, most recently from 326afa0 to 65b089a on November 10, 2024 04:04
renovate bot force-pushed the renovate/gpu-operator-24.x branch from 65b089a to d0604dc on November 10, 2024 04:16
jfroy force-pushed the main branch 2 times, most recently from 8522a8e to 2c1a094 on November 13, 2024 18:46
jfroy force-pushed the main branch 9 times, most recently from fc85124 to eb2fbea on November 26, 2024 18:22
renovate bot force-pushed the renovate/gpu-operator-24.x branch from d0604dc to 17e05bc on December 5, 2024 05:03
renovate bot changed the title from "feat(helm): update gpu-operator ( v24.6.2 → v24.9.0 )" to "feat(helm): update gpu-operator ( v24.6.2 → v24.9.1 )" on Dec 5, 2024
renovate bot force-pushed the renovate/gpu-operator-24.x branch from 17e05bc to 50d064e on December 5, 2024 19:45
renovate bot force-pushed the renovate/gpu-operator-24.x branch from 50d064e to 53d16eb on December 13, 2024 20:08
jfroy force-pushed the main branch 2 times, most recently from 6eb2c51 to 8af725e on December 16, 2024 09:56
jfroy force-pushed the main branch 12 times, most recently from 16e7cd4 to d095fee on January 10, 2025 23:50
This PR contains the following updates:
v24.6.2 -> v24.9.1
Release Notes
NVIDIA/gpu-operator (gpu-operator)
v24.9.1: GPU Operator 24.9.1 Release
https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/24.9.1/release-notes.html
v24.9.0: GPU Operator 24.9.0 Release
https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/24.9.0/release-notes.html
Configuration
📅 Schedule: Branch creation - "* 0-4,22-23 * * 1-5,* * * * 0,6" in timezone America/Los_Angeles, Automerge - At any time (no schedule defined).
🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.
♻ Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.
🔕 Ignore: Close this PR and you won't be reminded about this update again.
This PR was generated by Mend Renovate. View the repository job log.
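The branch-creation schedule in the Configuration section is easier to read once it is split into cron fields. The sketch below is a rough Python illustration, not Renovate's own scheduler. It assumes the schedule string is two five-field cron expressions, `* 0-4,22-23 * * 1-5` and `* * * * 0,6`, that the input time is already local to America/Los_Angeles, and that only `*`, single numbers, and `a-b` ranges appear. The helper names `field_matches` and `in_window` are hypothetical.

```python
from datetime import datetime

# Assumed split of the schedule string shown in the Configuration section.
SCHEDULES = ("* 0-4,22-23 * * 1-5", "* * * * 0,6")


def field_matches(field, value):
    """Match one cron field made of `*`, single numbers, and `a-b` ranges."""
    for part in field.split(","):
        if part == "*":
            return True
        low, _, high = part.partition("-")
        if int(low) <= value <= int(high or low):
            return True
    return False


def in_window(when, schedules=SCHEDULES):
    """Return True if `when` (naive local time) matches any cron expression."""
    # Cron day-of-week uses 0 = Sunday; isoweekday() gives Monday=1..Sunday=7.
    day_of_week = when.isoweekday() % 7
    for expr in schedules:
        minute, hour, day, month, weekday = expr.split()
        if (
            field_matches(minute, when.minute)
            and field_matches(hour, when.hour)
            and field_matches(day, when.day)
            and field_matches(month, when.month)
            and field_matches(weekday, day_of_week)
        ):
            return True
    return False


print(in_window(datetime(2024, 12, 5, 23, 0)))  # Thursday, late evening
print(in_window(datetime(2024, 12, 5, 12, 0)))  # Thursday, midday
print(in_window(datetime(2024, 12, 7, 12, 0)))  # Saturday, midday
```

Under that reading, branches are created on weekdays only between 22:00 and 04:59, and at any time on Saturday and Sunday. The schedule covers branch creation only; the description states that automerge has no schedule defined.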