
neonvm: prevent migrating VMs to nodes with non-matching architecture #1123

Open · wants to merge 3 commits into main from misha/architecture-fence-for-livemigration
Conversation

@mikhail-sakhnov (Contributor) commented Oct 24, 2024

Add node affinity to the VM spec during the live migration pending state, if it doesn't already have one.

#1083
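
For context, here is a minimal sketch of the affinity term this change pins onto the VM, built from the corev1 types. The helper name archAffinityFor is hypothetical; the PR's actual logic lives in addArchitectureAffinity.

import (
	corev1 "k8s.io/api/core/v1"
)

// archAffinityFor builds a required node affinity that restricts scheduling
// to nodes of the given architecture via the well-known kubernetes.io/arch label.
func archAffinityFor(arch string) *corev1.Affinity {
	return &corev1.Affinity{
		NodeAffinity: &corev1.NodeAffinity{
			RequiredDuringSchedulingIgnoredDuringExecution: &corev1.NodeSelector{
				NodeSelectorTerms: []corev1.NodeSelectorTerm{{
					MatchExpressions: []corev1.NodeSelectorRequirement{{
						Key:      "kubernetes.io/arch",
						Operator: corev1.NodeSelectorOpIn,
						// e.g. sourceNode.Status.NodeInfo.Architecture
						Values: []string{arch},
					}},
				}},
			},
		},
	}
}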


github-actions bot commented Oct 24, 2024

No changes to the coverage.


@mikhail-sakhnov force-pushed the misha/architecture-fence-for-livemigration branch 2 times, most recently from 5446383 to a036d85 on October 24, 2024 09:46
@mikhail-sakhnov marked this pull request as ready for review on October 24, 2024 09:46
@mikhail-sakhnov force-pushed the misha/architecture-fence-for-livemigration branch 3 times, most recently from 893c635 to 8e751ea on October 31, 2024 00:58
@sharnoff self-assigned this Oct 31, 2024
@sharnoff (Member) left a comment:

Some comments

pkg/neonvm/controllers/vmmigration_controller.go (outdated)
if vm.Spec.Affinity != nil && vm.Spec.Affinity.NodeAffinity != nil && vm.Spec.Affinity.NodeAffinity.RequiredDuringSchedulingIgnoredDuringExecution != nil {
	for _, term := range vm.Spec.Affinity.NodeAffinity.RequiredDuringSchedulingIgnoredDuringExecution.NodeSelectorTerms {
		for _, expr := range term.MatchExpressions {
			if expr.Key == "kubernetes.io/arch" && expr.Operator == corev1.NodeSelectorOpIn && len(expr.Values) > 0 {
@sharnoff (Member):
question: does len(expr.Values) > 0 mean that this function will return true for VMs that have an architecture affinity for e.g. "either amd64 or arm64"? IIUC, would that mean that we could accidentally migrate between x86 and ARM?

@mikhail-sakhnov (Contributor, Author):
Values are checked in the loop body, if I understand your question correctly.

Contributor:
I think what Em means is that for the following input:

{
	Key:      "kubernetes.io/arch",
	Operator: corev1.NodeSelectorOpIn,
	Values:   []string{"x86", "arm64"},
}

The function as written will return true, even though this affinity doesn't prevent us from migrating between x86 and arm.
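
To make that concrete, here is a hedged test sketch of the loophole. The test itself is hypothetical and assumes vm.Spec.Affinity mirrors corev1.Affinity (as the snippet above suggests); the vmv1 import path is assumed.

import (
	"testing"

	corev1 "k8s.io/api/core/v1"
	vmv1 "github.com/neondatabase/autoscaling/neonvm/apis/neonvm/v1" // path assumed
)

func TestHasArchitectureAffinity_MultiValue(t *testing.T) {
	vm := &vmv1.VirtualMachine{}
	vm.Spec.Affinity = &corev1.Affinity{
		NodeAffinity: &corev1.NodeAffinity{
			RequiredDuringSchedulingIgnoredDuringExecution: &corev1.NodeSelector{
				NodeSelectorTerms: []corev1.NodeSelectorTerm{{
					MatchExpressions: []corev1.NodeSelectorRequirement{{
						Key:      "kubernetes.io/arch",
						Operator: corev1.NodeSelectorOpIn,
						Values:   []string{"amd64", "arm64"}, // allows both architectures
					}},
				}},
			},
		},
	}
	node := &corev1.Node{}
	node.Status.NodeInfo.Architecture = "amd64"

	// hasArchitectureAffinity returns true here, so no single-arch affinity
	// gets added, yet the scheduler could still place the target pod on arm64.
	if !hasArchitectureAffinity(vm, node) {
		t.Fatal("multi-value arch expression was treated as a fence")
	}
}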

tests/e2e/vm-migration/01-assert.yaml
Comment on lines +187 to +195
if err := addArchitectureAffinity(vm, sourceNode); err != nil {
	log.Error(err, "Failed to add architecture affinity to VM", "sourceNodeStatus", sourceNode.Status)
}
if err = r.Update(ctx, vm); err != nil {
	log.Error(err, "Failed to update node affinity of the source vm")
	return ctrl.Result{RequeueAfter: time.Second}, err
}
@sharnoff (Member):
question: IIUC, does this mean that the VM will permanently get affinity for the node it starts with?

Is there a reason we can't just do it on the target pod for the migration? (I guess, potential race conditions?)

@mikhail-sakhnov (Contributor, Author):
I don't think there is a particular reason; I just feel it is a bit safer to put the guard on the very first step of the migration.

Plus, if we're going to have multi-arch clusters, it's a bit unsafe to even have a VM without architecture affinity; maybe later it should even be added during VM creation, in the vmcontroller?

Contributor:
> it is a bit unsafe to even have VM without architecture affinity

Do we know of any particular reason it might be unsafe? The same VM can create its pod on x86 the first time and, if that pod is deleted, have it recreated on an arm node. We could potentially utilize this to auto-balance between architectures.

Although a regular restart of endpoints means recreating the VM object anyway (right?), so this might not matter much.

@mikhail-sakhnov (Contributor, Author):
> We can potentially utilize this to auto-balance between architectures.

If we can do that, why even prevent migration between architectures? I was under the impression that postgres can't safely migrate from one arch to another because of the data files; I need to double-check that.

Contributor:
> We can potentially utilize this to auto-balance between architectures.

> if we can do it, why even prevent migration between architectures?

Having VM objects without affinity doesn't mean migrating between architectures. Pods would still have a defined architecture; the architecture would be undefined only when there is no pod. This means that if a pod is deleted for whatever reason, we can recreate it with a different architecture. But I am not sure this is a practical scenario.

> postgres can't safely migrate from one arch to another

It's not even postgres, it's the whole VM: all memory, including executable code, registers, etc.

@sharnoff assigned mikhail-sakhnov and unassigned sharnoff Nov 18, 2024
@mikhail-sakhnov force-pushed the misha/architecture-fence-for-livemigration branch from a0219ef to c5ba550 on November 26, 2024 12:27
@mikhail-sakhnov force-pushed the misha/architecture-fence-for-livemigration branch from c5ba550 to f5246f7 on November 26, 2024 12:43
Comment on lines +777 to +792
func hasArchitectureAffinity(vm *vmv1.VirtualMachine, sourceNode *corev1.Node) bool {
	if vm.Spec.Affinity != nil && vm.Spec.Affinity.NodeAffinity != nil && vm.Spec.Affinity.NodeAffinity.RequiredDuringSchedulingIgnoredDuringExecution != nil {
		for _, term := range vm.Spec.Affinity.NodeAffinity.RequiredDuringSchedulingIgnoredDuringExecution.NodeSelectorTerms {
			for _, expr := range term.MatchExpressions {
				if expr.Key == "kubernetes.io/arch" && expr.Operator == corev1.NodeSelectorOpIn && len(expr.Values) > 0 {
					for _, value := range expr.Values {
						if value == sourceNode.Status.NodeInfo.Architecture {
							return true
						}
					}
				}
			}
		}
	}
	return false
}
Contributor:
This function can be rewritten to reduce the maximum nesting level from 7 to 4, something like below.

This implementation also doesn't use sourceNode.

Suggested change
func hasArchitectureAffinity(vm *vmv1.VirtualMachine, sourceNode *corev1.Node) bool {
	if vm.Spec.Affinity != nil && vm.Spec.Affinity.NodeAffinity != nil && vm.Spec.Affinity.NodeAffinity.RequiredDuringSchedulingIgnoredDuringExecution != nil {
		for _, term := range vm.Spec.Affinity.NodeAffinity.RequiredDuringSchedulingIgnoredDuringExecution.NodeSelectorTerms {
			for _, expr := range term.MatchExpressions {
				if expr.Key == "kubernetes.io/arch" && expr.Operator == corev1.NodeSelectorOpIn && len(expr.Values) > 0 {
					for _, value := range expr.Values {
						if value == sourceNode.Status.NodeInfo.Architecture {
							return true
						}
					}
				}
			}
		}
	}
	return false
}

func hasArchitectureAffinity(vm *vmv1.VirtualMachine, sourceNode *corev1.Node) bool {
	if vm.Spec.Affinity == nil || vm.Spec.Affinity.NodeAffinity == nil || vm.Spec.Affinity.NodeAffinity.RequiredDuringSchedulingIgnoredDuringExecution == nil {
		return false
	}
	terms := vm.Spec.Affinity.NodeAffinity.RequiredDuringSchedulingIgnoredDuringExecution.NodeSelectorTerms
	for _, term := range terms {
		for _, expr := range term.MatchExpressions {
			if expr.Key == "kubernetes.io/arch" &&
				expr.Operator == corev1.NodeSelectorOpIn &&
				len(expr.Values) == 1 {
				return true
			}
		}
	}
	return false
}


