VM Import Controller uses too much memory during QCOW2 conversion phase #51

Merged 1 commit on Oct 9, 2024

@votdev (Member) commented Sep 30, 2024

Problem:
After the image has been created successfully in OpenStack, VM Import Controller starts downloading it to Harvester.

This appears to happen in two steps:

  • Download the original QCOW2 image from OpenStack into memory ([]byte).
  • Run the command qemu-img convert to convert it to RAW.

Seems to be happening here.

This results in memory usage proportional to the size of the QCOW2 image. More precisely, the qemu-img convert command also appears to use memory, bringing the total to roughly 2x the image size. For a 500 GiB image, VM Import Controller would need about 1 TiB of RAM.

The first step can definitely be avoided.

Solution:

  • Create the volume image using RAW disk format instead of QCOW2, so no conversion is required after downloading. This will reduce memory consumption.
  • Download and write the image file in chunks (32KiB by default), so the whole file doesn't need to be downloaded completely and stored in memory before it is written to disk.
  • Fix a variable name shadowing issue.
  • Improve logging.

Related Issue:
harvester/harvester#6674

Test plan:

  • Check on which node the harvester-vm-import-controller-xxxx pod is running.
  • On this node run ps aux | grep vm-import-controller to get the PID of this process.
  • On this node run top -p <PID_OF_VM_IMPORT_CONTROLLER> to monitor the memory consumption.
  • Create the import manifest openstack_vmi_cirros-tiny.yaml:
apiVersion: migration.harvesterhci.io/v1beta1
kind: VirtualMachineImport
metadata:
  name: cirros-tiny
  namespace: default
spec:
  virtualMachineName: "cirros-tiny"
  networkMapping:
  - sourceNetwork: "shared"
    destinationNetwork: "default/vlan1"
  sourceCluster:
    name: devstack
    namespace: default
    kind: OpenstackSource
    apiVersion: migration.harvesterhci.io/v1beta1
  • Trigger the import via kubectl apply -f openstack_vmi_cirros-tiny.yaml.

During the download of the image, the top output should not show high resident memory (RES) like this:

top - 12:02:24 up  3:14,  1 user,  load average: 0.75, 0.77, 0.67
Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
%Cpu(s): 21.8 us,  3.4 sy,  0.0 ni, 73.9 id,  0.0 wa,  0.0 hi,  0.8 si,  0.0 st
MiB Mem : 15980.77+total, 2315.172 free, 6884.547 used, 7148.234 buff/cache
MiB Swap:    0.000 total,    0.000 free,    0.000 used. 9096.223 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                                 
23007 root      20   0 4949428 1.969g  30516 S 0.000 12.62   0:38.83 vm-import-contr

Because the image is downloaded and written in chunks, the memory consumption should stay really low, like:

top - 12:43:06 up  3:54,  1 user,  load average: 0.61, 1.42, 1.25
Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
%Cpu(s):  2.8 us,  1.6 sy,  0.0 ni, 94.2 id,  0.2 wa,  0.0 hi,  1.0 si,  0.3 st
MiB Mem : 15980.77+total, 4927.469 free, 4753.789 used, 6660.758 buff/cache
MiB Swap:    0.000 total,    0.000 free,    0.000 used. 11226.98+avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                                 
 4534 root      20   0 1341272  55908  30584 S 4.983 0.342   0:10.73 vm-import-contr  
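
As a rough alternative to watching top by hand, the resident memory could also be sampled from /proc on the node. The sketch below is an assumption-laden illustration and not part of this PR: `parseVmRSS` is a hypothetical helper, it relies on the Linux `VmRSS` field of /proc/<pid>/status (which should closely track the RES column shown above), and it only works on Linux.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseVmRSS (hypothetical helper) extracts the VmRSS value in KiB from the
// text of a Linux /proc/<pid>/status file.
func parseVmRSS(status string) (int64, error) {
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		line := sc.Text()
		if !strings.HasPrefix(line, "VmRSS:") {
			continue
		}
		// The line has the form "VmRSS:   55908 kB".
		fields := strings.Fields(line)
		if len(fields) < 2 {
			return 0, fmt.Errorf("malformed VmRSS line: %q", line)
		}
		return strconv.ParseInt(fields[1], 10, 64)
	}
	if err := sc.Err(); err != nil {
		return 0, err
	}
	return 0, fmt.Errorf("VmRSS not found")
}

func main() {
	// Pass the PID of vm-import-controller as the first argument;
	// "self" inspects this process instead.
	pid := "self"
	if len(os.Args) > 1 {
		pid = os.Args[1]
	}
	data, err := os.ReadFile("/proc/" + pid + "/status")
	if err != nil {
		fmt.Println("could not read status file (not on Linux?):", err)
		return
	}
	rss, err := parseVmRSS(string(data))
	if err != nil {
		fmt.Println("could not parse VmRSS:", err)
		return
	}
	fmt.Printf("resident memory: %d KiB\n", rss)
}
```

Running this repeatedly during the "Downloading raw image" phase should show a value that stays roughly flat after the change, rather than growing with the image size.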

The final log output of the import should look like:

time="2024-09-30T12:34:25Z" level=info msg="Applying CRD vmwaresources.migration.harvesterhci.io"
time="2024-09-30T12:34:25Z" level=info msg="Applying CRD openstacksources.migration.harvesterhci.io"
time="2024-09-30T12:34:25Z" level=info msg="Applying CRD virtualmachineimports.migration.harvesterhci.io"
time="2024-09-30T12:34:26Z" level=info msg="Starting migration.harvesterhci.io/v1beta1, Kind=VmwareSource controller"
time="2024-09-30T12:34:26Z" level=info msg="reoncilling vmware migration default/vcsim"
time="2024-09-30T12:34:26Z" level=info msg="Starting migration.harvesterhci.io/v1beta1, Kind=VirtualMachineImport controller"
time="2024-09-30T12:34:26Z" level=info msg="The VM was imported successfully" name=cirros-tiny namespace=default spec.virtualMachineName=cirros-tiny
time="2024-09-30T12:34:26Z" level=info msg="Starting migration.harvesterhci.io/v1beta1, Kind=OpenstackSource controller"
time="2024-09-30T12:34:26Z" level=info msg="reconcilling openstack soure :default/devstack"
time="2024-09-30T12:34:26Z" level=info msg="Starting storage.k8s.io/v1, Kind=StorageClass controller"
time="2024-09-30T12:34:26Z" level=info msg="Starting harvesterhci.io/v1beta1, Kind=VirtualMachineImage controller"
time="2024-09-30T12:41:05Z" level=info msg="Running preflight checks ..." name=cirros-tiny namespace=default spec.virtualMachineName=cirros-tiny
time="2024-09-30T12:41:05Z" level=info msg="Importing client disk images ..." name=cirros-tiny namespace=default spec.virtualMachineName=cirros-tiny
time="2024-09-30T12:41:06Z" level=info msg="Powering off client VM ..." name=cirros-tiny namespace=default spec.virtualMachineName=cirros-tiny
time="2024-09-30T12:41:08Z" level=info msg="Importing client disk images ..." name=cirros-tiny namespace=default spec.virtualMachineName=cirros-tiny
time="2024-09-30T12:41:09Z" level=info msg="Importing client disk images ..." name=cirros-tiny namespace=default spec.virtualMachineName=cirros-tiny
time="2024-09-30T12:41:12Z" level=info msg="Waiting for snapshot to be available" name=cirros-tiny namespace=default snapshot.id=e6e42a30-c44d-46ba-8592-7ffdb8d7c50b snapshot.name=import-controller-cirros-tiny-0 snapshot.size=1 spec.virtualMachineName=cirros-tiny
time="2024-09-30T12:41:15Z" level=info msg="Waiting for volume to be available" name=cirros-tiny namespace=default spec.virtualMachineName=cirros-tiny volume.createdat="2024-09-30 12:41:15.213862 +0000 UTC" volume.id=1ce7fa24-90f7-414a-9fa0-36e9ea191676 volume.snapshotid=e6e42a30-c44d-46ba-8592-7ffdb8d7c50b volume.status=creating
time="2024-09-30T12:41:20Z" level=info msg="Waiting for raw image to be available" image.id=bdae7dc6-d80b-41e5-91fc-80c22e01e5fd image.status=queued name=cirros-tiny namespace=default spec.virtualMachineName=cirros-tiny
time="2024-09-30T12:41:31Z" level=info msg="Downloading raw image" image.id=bdae7dc6-d80b-41e5-91fc-80c22e01e5fd name=cirros-tiny namespace=default spec.virtualMachineName=cirros-tiny
time="2024-09-30T12:47:12Z" level=info msg="Creating VM images ..." name=cirros-tiny namespace=default spec.virtualMachineName=cirros-tiny
time="2024-09-30T12:47:12Z" level=info msg="Evaluating VM images ..." name=cirros-tiny namespace=default spec.virtualMachineName=cirros-tiny
time="2024-09-30T12:47:12Z" level=info msg="Creating VM instances ..." name=cirros-tiny namespace=default spec.virtualMachineName=cirros-tiny
time="2024-09-30T12:47:15Z" level=info msg="Checking VM instances ..." name=cirros-tiny namespace=default spec.virtualMachineName=cirros-tiny
time="2024-09-30T12:52:15Z" level=info msg="Checking VM instances ..." name=cirros-tiny namespace=default spec.virtualMachineName=cirros-tiny
time="2024-09-30T12:52:16Z" level=info msg="The VM was imported successfully" name=cirros-tiny namespace=default spec.virtualMachineName=cirros-tiny

@votdev votdev self-assigned this Sep 30, 2024
@votdev votdev force-pushed the issue_6674_upload_raw_img branch 2 times, most recently from 8130b5a to 808a50b Compare September 30, 2024 13:00
@votdev votdev marked this pull request as ready for review September 30, 2024 13:01
@votdev votdev force-pushed the issue_6674_upload_raw_img branch from 808a50b to d917f12 Compare September 30, 2024 17:57
- Create the volume image using RAW disk format instead of QCOW2, so no conversion is required after downloading. This will reduce memory consumption.
- Download and write the image file in chunks (32KiB by default), so the whole file doesn't need to be downloaded completely and stored in memory before it is written to disk.
- Fix a variable name shadowing issue.
- Improve logging.

Related to: harvester/harvester#6674

Signed-off-by: Volker Theile <[email protected]>
@votdev votdev force-pushed the issue_6674_upload_raw_img branch from d917f12 to 3ee763d Compare October 7, 2024 06:06
@ibrokethecloud (Collaborator) left a comment

image

lgtm thanks. There is a marked drop in memory utilisation for the vm-import-controller deployment pre and post the change when importing the same 10G vm.

@starbops (Member) left a comment

LGTM, thank you!

@starbops starbops merged commit ebf60d5 into harvester:main Oct 9, 2024
4 checks passed
@votdev votdev deleted the issue_6674_upload_raw_img branch October 9, 2024 09:49