(docs) VM.migrate.md: Rephrase and simplify, improve readability
Signed-off-by: Bernhard Kaindl <[email protected]>
bernhardkaindl committed Feb 15, 2025
1 parent ec3b62e commit b4a19e9
Showing 3 changed files with 110 additions and 96 deletions.
178 changes: 96 additions & 82 deletions doc/content/xenopsd/walkthroughs/VM.migrate.md
@@ -3,38 +3,42 @@ title: 'Walkthrough: Migrating a VM'
linktitle: 'Migrating a VM'
description: Walkthrough of migrating a VM from one host to another.
weight: 50
mermaid:
force: true
---
At the end of this walkthrough, a sequence diagram of the overall process is included.

A XenAPI client wishes to migrate a VM from one host to another within
the same pool.
## Invocation

The client will issue a command to migrate the VM and it will be dispatched
The command to migrate the VM is dispatched
by the autogenerated `dispatch_call` function from **xapi/server.ml**. For
more information about the generated functions you can have a look to
[XAPI IDL model](https://github.com/xapi-project/xen-api/tree/master/ocaml/idl/ocaml_backend).

The command will trigger the operation
The command triggers the operation
[VM_migrate](https://github.com/xapi-project/xen-api/blob/7ac88b90e762065c5ebb94a8ea61c61bdbf62c5c/ocaml/xenopsd/lib/xenops_server.ml#L2572)
that has low level operations performed by the backend. These atomics operations
that we will describe in the documentation are:

- VM.restore
- VM.rename
- VBD.set_active
- VBD.plug
- VIF.set_active
- VGPU.set_active
- VM.create_device_model
- PCI.plug
- VM.set_domain_action_request

The command has several parameters such as: Should it be started asynchronously,
should it be forwarded to another host, how arguments should be marshalled and
so on. A new thread is created by [xapi/server_helpers.ml](https://github.com/xapi-project/xen-api/blob/7ac88b90e762065c5ebb94a8ea61c61bdbf62c5c/ocaml/xapi/server_helpers.ml#L55)
to handle the command asynchronously. At this point the helper also check if
that uses many low-level atomic operations. These are:

- [VM.restore](#VM-restore)
- [VM.rename](#VM-rename)
- [VBD.set_active](#restoring-devices)
- [VBD.plug](#restoring-devices)
- [VIF.set_active](#restoring-devices)
- [VGPU.set_active](#restoring-devices)
- [VM.create_device_model](#creating-the-device-model)
- [PCI.plug](#pci-plug)

The migrate command has several parameters such as:

- Should it be started asynchronously,
- Should it be forwarded to another host,
- How arguments should be marshalled, and so on.

A new thread is created by [xapi/server_helpers.ml](https://github.com/xapi-project/xen-api/blob/7ac88b90e762065c5ebb94a8ea61c61bdbf62c5c/ocaml/xapi/server_helpers.ml#L55)
to handle the command asynchronously. The helper thread checks if
the command should be passed to the [message forwarding](https://github.com/xapi-project/xen-api/blob/master/ocaml/xapi/message_forwarding.ml)
layer in order to be executed on another host (the destination) or locally if
we are already at the right place.
layer in order to be executed on another host (the destination) or locally (if
it is already at the destination host).

It will finally reach [xapi/api_server.ml](https://github.com/xapi-project/xen-api/blob/7ac88b90e762065c5ebb94a8ea61c61bdbf62c5c/ocaml/xapi/api_server.ml#L242), which
posts a command to the message broker [message switch](https://github.com/xapi-project/xen-api/tree/master/ocaml/message-switch).
@@ -43,34 +47,38 @@ XAPI daemons. In the case of the migration this message sends by **XAPI** will b
consumed by the [xenopsd](https://github.com/xapi-project/xen-api/tree/master/ocaml/xenopsd)
daemon that will do the job of migrating the VM.

# The migration of the VM
## Overview

The migration is an asynchronous task and a thread is created to handle this task.
The tasks's reference is returned to the client, which can then check
The task reference is returned to the client, which can then check
its status until completion.
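
The polling described here can be sketched as follows. This is a minimal illustration, not the XenAPI client code: `get_status` is a hypothetical callable standing in for whatever call the client uses to query the task, and the status strings `"pending"` and `"success"` are assumptions.

```python
import time
from typing import Callable


def wait_for_task(
    get_status: Callable[[], str],
    poll_interval: float = 0.5,
    timeout: float = 60.0,
) -> str:
    """Poll a task until it leaves the (assumed) "pending" state.

    get_status is a hypothetical callable that returns the current
    status of the task as a string.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status != "pending":
            # Any non-pending status (success, failure, ...) ends the wait.
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("task did not complete in time")
        time.sleep(poll_interval)


# Usage with a fake task that completes on the third poll.
_statuses = iter(["pending", "pending", "success"])
print(wait_for_task(lambda: next(_statuses), poll_interval=0.0))
```

A deadline based on `time.monotonic()` keeps the timeout unaffected by wall-clock adjustments.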

As we see in the introduction the [xenopsd](https://github.com/xapi-project/xen-api/tree/master/ocaml/xenopsd)
daemon will pop the operation
As shown in the introduction, [xenopsd](https://github.com/xapi-project/xen-api/tree/master/ocaml/xenopsd)
fetches the
[VM_migrate](https://github.com/xapi-project/xen-api/blob/7ac88b90e762065c5ebb94a8ea61c61bdbf62c5c/ocaml/xenopsd/lib/xenops_server.ml#L2572)
from the message broker.
operation from the message broker.

Only one backend is know available that interacts with libxc, libxenguest
and xenstore. It is the [xc backend](https://github.com/xapi-project/xen-api/tree/master/ocaml/xenopsd/xc).
All tasks specific to [libxenctrl](../../lib/xenctrl),
[xenguest](VM.build/xenguest) and [Xenstore](https://wiki.xenproject.org/wiki/XenStore)
are handled by the xenopsd
[xc backend](https://github.com/xapi-project/xen-api/tree/master/ocaml/xenopsd/xc).

The entities that need to be migrated are: *VDI*, *VIF*, *VGPU* and *PCI* components.

During the migration process the destination domain will be built with the same
uuid than the original VM but the last part of the UUID will be
During the migration process, the destination domain will be built with the same
UUID as the original VM, except that the last part of the UUID will be
`XXXXXXXX-XXXX-XXXX-XXXX-000000000001`. The original domain will be removed using
`XXXXXXXX-XXXX-XXXX-XXXX-000000000000`.
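
A minimal sketch of this naming scheme, assuming the temporary UUIDs are formed by replacing only the last group of the VM's UUID. The helper names (`with_suffix`, `temporary_uuid`, `original_uuid`) are hypothetical and do not come from `xenopsd`:

```python
import uuid

# Suffixes taken from the walkthrough text above.
TEMP_SUFFIX = "000000000001"      # destination domain while migrating
ORIGINAL_SUFFIX = "000000000000"  # used when removing the original domain


def with_suffix(vm_uuid: str, suffix: str) -> str:
    """Return vm_uuid with its last 12-digit group replaced by suffix."""
    canonical = str(uuid.UUID(vm_uuid))  # validates and normalises the input
    head, _last_group = canonical.rsplit("-", 1)
    return f"{head}-{suffix}"


def temporary_uuid(vm_uuid: str) -> str:
    """UUID under which the destination domain is built (assumed scheme)."""
    return with_suffix(vm_uuid, TEMP_SUFFIX)


def original_uuid(vm_uuid: str) -> str:
    """UUID under which the original domain is removed (assumed scheme)."""
    return with_suffix(vm_uuid, ORIGINAL_SUFFIX)


vm = "c2a1d5f0-6b7e-4c3a-9f1d-0123456789ab"
print(temporary_uuid(vm))
print(original_uuid(vm))
```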

There are some points called *hooks* at which `xenopsd` can execute some script.
Before starting a migration a command is send to the original domain to execute
a pre migrate script if it exists.
## Preparing VM migration

Before starting the migration a command is sent to Qemu using the Qemu Machine Protocol (QMP)
At specific places, `xenopsd` can execute *hooks* to run scripts.
In case a pre-migrate script is in place, a command to run this script
is sent to the original domain.

Likewise, a command is sent to Qemu using the Qemu Machine Protocol (QMP)
to check that the domain can be suspended (see [xenopsd/xc/device_common.ml](https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/device_common.ml)).
After checking with Qemu that the VM is suspendable we can start the migration.
After checking with Qemu that the VM can be suspended, the migration can begin.

## Importing metadata

@@ -82,38 +90,34 @@ Once imported, it will give us a reference id and will allow building the new do
on the destination using the temporary VM uuid `XXXXXXXX-XXXX-XXXX-XXXX-000000000001`
where `XXX...` is the reference id of the original VM.

## Setting memory
## Memory setup

One of the first thing to do is to set up the memory. The backend will check that there
is no ballooning operation in progress. At this point the migration can fail if a
ballooning operation is in progress and takes too much time.
One of the first steps is the setup of the VM's memory: the backend checks that there
is no ballooning operation in progress. If one is in progress, the migration could fail.

Once memory has been checked, the daemon will get the state of the VM (running, halted, ...) and
information about the VM is retrieved by the backend like the maximum memory the domain
can consume but also information about quotas for example.
The backend retrieves this information from the Xenstore.
The backend retrieves the domain's platform data (memory, vCPUs, etc.) from the Xenstore.

Once this is complete, we can restore VIF and create the domain.

The synchronisation of the memory is the first point of synchronisation and everything
is ready for VM migration.

## VM Migration
## Destination VM setup

After receiving memory we can set up the destination domain. If we have a vGPU we need to kick
off its migration process. We will need to wait the acknowledge that indicates that the entry
for the GPU has been well initialized. before starting the main VM migration.
off its migration process. We will need to wait for the acknowledgement that the
GPU entry has been successfully initialized before starting the main VM migration.

Their is a mechanism of handshake for synchronizing between the source and the
destination. Using the handshake protocol the receiver inform the sender of the
request that everything has been setup and ready to save/restore.
Using a handshake protocol, the receiver informs the sender that everything
has been set up and is ready for save/restore.

### VM restore
## Destination VM restore

VM restore is a low level atomic operation [VM.restore](https://github.com/xapi-project/xen-api/blob/7ac88b90e762065c5ebb94a8ea61c61bdbf62c5c/ocaml/xenopsd/xc/xenops_server_xen.ml#L2684).
This operation is represented by a function call to [backend](https://github.com/xapi-project/xen-api/blob/7ac88b90e762065c5ebb94a8ea61c61bdbf62c5c/ocaml/xenopsd/xc/domain.ml#L1540).
It uses **Xenguest**, a low-level utility from XAPI toolstack, to interact with the Xen hypervisor
and libxc for sending a request of migration to the **emu-manager**.
and `libxc` for sending a migration request to the **emu-manager**.

After sending the request results coming from **emu-manager** are collected
by the main thread. It blocks until results are received.
@@ -123,16 +127,14 @@ transitions for the devices and handling the message passing for the VM as
it's moved between hosts. This includes making sure that the state of the
VM's virtual devices, like disks or network interfaces, is correctly moved over.

### VM renaming
## Destination VM rename

Once all operations are done we can rename the VM on the target from its temporary
name to its real UUID. This operation is another low level atomic one
Once all operations are done, `xenopsd` renames the target VM from its temporary
name to its real UUID. This operation is a low-level atomic
[VM.rename](https://github.com/xapi-project/xen-api/blob/7ac88b90e762065c5ebb94a8ea61c61bdbf62c5c/ocaml/xenopsd/xc/xenops_server_xen.ml#L1667)
that will take care of updating the xenstore on the destination.

The next step is the restauration of devices and unpause the domain.
which takes care of updating the Xenstore on the destination host.

### Restoring remaining devices
## Restoring devices

Restoring devices starts by activating VBD using the low level atomic operation
[VBD.set_active](https://github.com/xapi-project/xen-api/blob/7ac88b90e762065c5ebb94a8ea61c61bdbf62c5c/ocaml/xenopsd/xc/xenops_server_xen.ml#L3674). It is an update of Xenstore. VBDs that are read-write must
@@ -143,39 +145,51 @@ is called. VDI are attached and activate.
Next devices are VIFs that are set as active [VIF.set_active](https://github.com/xapi-project/xen-api/blob/7ac88b90e762065c5ebb94a8ea61c61bdbf62c5c/ocaml/xenopsd/xc/xenops_server_xen.ml#L4296) and plug [VIF.plug](https://github.com/xapi-project/xen-api/blob/7ac88b90e762065c5ebb94a8ea61c61bdbf62c5c/ocaml/xenopsd/xc/xenops_server_xen.ml#L4394).
If there are VGPUs we will set them as active now using the atomic [VGPU.set_active](https://github.com/xapi-project/xen-api/blob/7ac88b90e762065c5ebb94a8ea61c61bdbf62c5c/ocaml/xenopsd/xc/xenops_server_xen.ml#L3490).

We are almost done. The next step is to create the device model

#### create device model
### Creating the device model

Create device model is done by using the atomic operation [VM.create_device_model](https://github.com/xapi-project/xen-api/blob/7ac88b90e762065c5ebb94a8ea61c61bdbf62c5c/ocaml/xenopsd/xc/xenops_server_xen.ml#L2375). This
will configure **qemu-dm** and started. This allows to manage PCI devices.
[create_device_model](https://github.com/xapi-project/xen-api/blob/ec3b62ee/ocaml/xenopsd/xc/xenops_server_xen.ml#L2293-L2349)
configures **qemu-dm** and starts it. This allows managing PCI devices.

#### PCI plug
### PCI plug

[PCI.plug](https://github.com/xapi-project/xen-api/blob/7ac88b90e762065c5ebb94a8ea61c61bdbf62c5c/ocaml/xenopsd/xc/xenops_server_xen.ml#L3399)
is executed by the backend. It plugs a PCI device and advertises it to QEMU if this option is set. It is
the case for NVIDIA SR-IOV vGPUS.
the case for NVIDIA SR-IOV vGPUs.

At this point devices have been restored. The new domain is considered survivable. We can
unpause the domain and performs last actions
## Unpause

### Unpause and done
The libxenctrl call
[xc_domain_unpause()](https://github.com/xen-project/xen/blob/414dde3/tools/libs/ctrl/xc_domain.c#L76)
unpauses the domain, and it starts running.

Unpause is done by managing the state of the domain using bindings to [xenctrl](https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=tools/libs/ctrl/xc_domain.c;h=f2d9d14b4d9f24553fa766c5dcb289f88d684bb0;hb=HEAD#l76).
Once hypervisor has unpaused the domain some actions can be requested using [VM.set_domain_action_request](https://github.com/xapi-project/xen-api/blob/7ac88b90e762065c5ebb94a8ea61c61bdbf62c5c/ocaml/xenopsd/xc/xenops_server_xen.ml#L3172).
It is a path in xenstore. By default no action is done but a reboot can be for example
initiated.
## Cleanup

Previously we spoke about some points called *hooks* at which `xenopsd` can execute some script. There
is also a hook to run a post migrate script. After the execution of the script if there is one
the migration is almost done. The last step is a handshake to seal the success of the migration
1. [VM_set_domain_action_request](https://github.com/xapi-project/xen-api/blob/ec3b62ee/ocaml/xenopsd/lib/xenops_server.ml#L3004)
marks the domain as alive: In case `xenopsd` restarts, it no longer reboots the VM.
See the chapter on [marking domains as alive](VM.start#11-mark-the-domain-as-alive)
for more information.

2. If a post-migrate script is in place, it is executed by the
[Xenops_hooks.VM_post_migrate](https://github.com/xapi-project/xen-api/blob/ec3b62ee/ocaml/xenopsd/lib/xenops_server.ml#L3005-L3009)
hook.

3. The final step is a handshake to seal the success of the migration
and the old VM can now be cleaned up.

# Links
[Synchronisation point 4](https://github.com/xapi-project/xen-api/blob/ec3b62ee/ocaml/xenopsd/lib/xenops_server.ml#L3014)
has been reached, the migration is complete.

## Live migration flowchart

This flowchart gives a visual representation of the VM migration workflow:

{{% include live-migration %}}

## References

Some links are old but even if many changes occurred, they are relevant for a global understanding
of the XAPI toolstack.
These pages might help for a better understanding of the XAPI toolstack:

- [XAPI architecture](https://xapi-project.github.io/xapi/architecture.html)
- [XAPI dispatcher](https://wiki.xenproject.org/wiki/XAPI_Dispatch)
- [Xenopsd architecture](https://xapi-project.github.io/xenopsd/architecture.html)
- See the [XAPI architecture](../../xapi/_index) for the overall architecture of Xapi
- See the [XAPI dispatcher](https://wiki.xenproject.org/wiki/XAPI_Dispatch) for service dispatch and message forwarding
- See the [Xenopsd architecture](../architecture/_index) for the overall architecture of Xenopsd
- See the [How Xen suspend and resume works](https://mirage.io/docs/xen-suspend) for very similar operations in more detail.
20 changes: 9 additions & 11 deletions doc/content/xenopsd/walkthroughs/VM.start.md
@@ -135,17 +135,15 @@ When the Task has completed successfully, then calls to *.stat will show:
- a valid start time
- valid "targets" for memory and vCPU

Note: before a Task completes, calls to *.stat will show partial updates e.g.
the power state may be Paused but none of the disks may have become plugged.
Note: before a Task completes, calls to *.stat will show partial updates. E.g.
the power state may be paused, but no disk may have been plugged.
UI clients must choose whether they are happy displaying this in-between state
or whether they wish to hide it and pretend the whole operation has happened
transactionally. If a particular client wishes to perform side-effects in
response to Xenopsd state changes -- for example to clean up an external resource
when a VIF becomes unplugged -- then it must be very careful to avoid responding
to these in-between states. Generally it is safest to passively report these
values without driving things directly from them. Think of them as status lights
on the front panel of a PC: fine to look at but it's not a good idea to wire
them up to actuators which actually do things.
transactionally. In particular, when a client wishes to perform side-effects in
response to `xenopsd` state changes (for example, to clean up an external resource
when a VIF becomes unplugged), it must be very careful to avoid responding
to these in-between states. Generally, it is safest to passively report these
values without driving things directly from them.

Note: the Xenopsd implementation guarantees that, if it is restarted at any point
during the start operation, on restart the VM state shall be "fixed" by either
@@ -304,7 +302,7 @@ calls bracket plug/unplug. If the "active" flag was set before the unplug
attempt then as soon as the frontend/backend connection is removed clients
would see the VBD as completely dissociated from the VM -- this would be misleading
because Xenopsd will not have had time to use the storage API to release locks
on the disks. By doing all the cleanup before setting "active" to false, clients
on the disks. By cleaning up before setting "active" to false, clients
can be assured that the disks are now free to be reassigned.

## 5. handle non-persistent disks
@@ -370,7 +368,7 @@ to be the order the nodes were created so this means that (i) xenstored must
continue to store directories as ordered lists rather than maps (which would
be more efficient); and (ii) Xenopsd must make sure to plug the vifs in
the same order. Note that relying on ethX device numbering has always been a
bad idea but is still common. I bet if you change this lots of tests will
bad idea but is still common. I bet if you change this, many tests will
suddenly start to fail!

The function
8 changes: 5 additions & 3 deletions doc/content/xenopsd/walkthroughs/live-migration.md
@@ -2,9 +2,12 @@
title = "Live Migration Sequence Diagram"
linkTitle = "Live Migration"
description = "Sequence diagram of the process of Live Migration."
# Note: This page is included by VM.migrate.md to provide a complete overview
# of the most important parts of live migration. Do not add text as that would
# break the mermaid diagram inclusion.
+++

{{<mermaid align="left">}}
```mermaid
sequenceDiagram
autonumber
participant tx as sender
@@ -44,5 +47,4 @@ deactivate rx1
tx->>tx: VM_shutdown<br/>VM_remove
deactivate tx

{{< /mermaid >}}
```
