Can I configure Hardware Virtualization without stopping the VMs?

devawpz

Member
Sep 21, 2020
Hello, I've enabled Hardware Virtualization on a single node by following the instructions here:

https://pve.proxmox.com/wiki/Nested_Virtualization

At one point, the instructions say to unload and then reload the module:

Code:
# modprobe -r kvm_intel
# modprobe kvm_intel
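
(For context, the wiki also has you enable the option persistently and verify it afterwards, roughly like this on an Intel host; the file name under /etc/modprobe.d/ is the one suggested there:)

Code:
# echo "options kvm-intel nested=Y" > /etc/modprobe.d/kvm-intel.conf
# cat /sys/module/kvm_intel/parameters/nested   # after reloading the module, this should report Y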

I did this, but I can't get the module to unload (so that it can be loaded again). I get this error:

Code:
# modprobe -r kvm_intel
FATAL: Module kvm_intel is in use.

This is a real bummer. I googled the error a lot and used lsof and lsmod, and what I can see is that kvm-related processes are using this module, namely the ones running the virtual machines.
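
(For anyone searching later, this is roughly what I used to see what was holding the module; these are plain Linux tools, nothing Proxmox-specific:)

Code:
# lsmod | grep kvm        # the "Used by" count stays non-zero while guests are running
# lsof /dev/kvm           # lists the kvm/QEMU processes that have the device open
# fuser -v /dev/kvm       # same information via fuser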

I've read somewhere that the VMs need to be shut down for these commands to run without error. Is this true? Wouldn't pausing them allow me to run the module commands? How can I avoid the downtime, or at the very least avoid rebooting the VMs? I would like this to be as low-impact as possible in terms of time and, preferably, to keep my VMs' uptime intact.

Is this possible?
 
Hi,

Unfortunately, it seems the only way to reset the module is to shut down any running VMs on the node, as when they are in any kind of active state (even paused), they will be using this module.
 
  • Like
Reactions: devawpz
Hi,

Unfortunately, it seems the only way to reset the module is to shut down any running VMs on the node, as when they are in any kind of active state (even paused), they will be using this module.

Yes, unfortunately. I looked at what happens with suspended VMs, and the KVM process keeps running.

I managed to find a workaround: migrating the VM to another node, which avoids the downtime.
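
(For the record, the migration itself can be started from the GUI or with qm migrate on the command line; the VM ID and node name below are placeholders:)

Code:
# qm migrate <vmid> <targetnode> --online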

I had to do a thing or two to get the VM migrated, but it went through in the end. However, when I tried to migrate it back to the original node, I got this error:


Code:
2020-10-06 19:32:44 ERROR: online migrate failure - aborting
2020-10-06 19:32:44 aborting phase 2 - cleanup resources
2020-10-06 19:32:44 migrate_cancel
drive-scsi0: Cancelling block job
channel 4: open failed: connect failed: open failed
channel 3: open failed: connect failed: open failed
channel 3: open failed: connect failed: open failed
channel 3: open failed: connect failed: open failed
drive-scsi0: Done.
2020-10-06 19:32:49 ERROR: migration finished with problems (duration 00:05:06)
TASK ERROR: migration problems


So I managed to get hardware virtualization applied, no problem there; that part is essentially solved. But now I'm in a state where I can't migrate the VM back to its original node.

I've tried setting AllowTcpForwarding yes to allow local and remote port forwarding, but with no success.
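
(Concretely, I mean this directive in /etc/ssh/sshd_config on the target node, followed by reloading the SSH service; mentioning it in case I applied it in the wrong place:)

Code:
# grep AllowTcpForwarding /etc/ssh/sshd_config
AllowTcpForwarding yes
# systemctl reload ssh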

Is my cluster screwed up? Can I get the VM back onto the original node? I'm a bit scared right now.
 
Do you think it could be related?
Aside from the "channel X: open failed..." messages, the error you're receiving doesn't seem to match this, so I don't think it's related.


I had to do a thing or two to get the VM migrated, but it went through in the end.
What exactly did you have to do to get the VM to migrate? There could be a clue to the problem there.

Could you post the syslog from around the time of the migration? (found in /var/log/syslog)

Is the VM responsive?
 
  • Like
Reactions: devawpz
What exactly did you have to do to get the VM to migrate? There could be a clue to the problem there.
Not a lot, just removing a cloudinit drive:

Code:
qm set <vmid> --ide2 none,media=cdrom
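
(If I need the drive back later, re-adding it should just be something along these lines, where "local-lvm" stands for whatever the cloud-init drive's storage is actually called:)

Code:
qm set <vmid> --ide2 local-lvm:cloudinit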

Could you post the syslog from around the time of the migration? (found in /var/log/syslog)
Wouldn't feel too comfortable putting the raw logs out there. Think I can PM them to you?
Is the VM responsive?
Completely. I can do everything I need; the migration is the only thing that fails.
 
Dylanw, I'm trying to PM you, but the system thinks I'm spamming. They aren't even big messages; I'm just having trouble PMing you.

Is it because I'm new on the forum?
 
The logs don't give a lot of information about what is going wrong.
Could you post the output of pveversion -v for the two nodes? Perhaps there are differences in package versions that are causing problems.
Also, are there any non-default options that you're passing to the migrate command? Either through the migration parameter of /etc/pve/datacenter.cfg, or when entering the command itself?
 
  • Like
Reactions: devawpz
The logs don't give a lot of information about what is going wrong.
Could you post the output of pveversion -v for the two nodes? Perhaps there are differences in package versions that are causing problems.

pveversion -v on the source node:

Code:
root@source:~# pveversion -v
proxmox-ve: 6.2-1 (running kernel: 5.4.44-1-pve)
pve-manager: 6.2-6 (running version: 6.2-6/ee1d7754)
pve-kernel-5.4: 6.2-3
pve-kernel-helper: 6.2-3
pve-kernel-5.4.44-1-pve: 5.4.44-1
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.4-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: residual config
ifupdown2: 3.0.0-1+pve2
libjs-extjs: 6.0.1-10
libknet1: 1.16-pve1
libproxmox-acme-perl: 1.0.4
libpve-access-control: 6.1-1
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.1-3
libpve-guest-common-perl: 3.0-10
libpve-http-server-perl: 3.0-5
libpve-storage-perl: 6.1-8
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.2-1
lxcfs: 4.0.3-pve3
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.2-8
pve-cluster: 6.1-8
pve-container: 3.1-8
pve-docs: 6.2-4
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-2
pve-firmware: 3.1-1
pve-ha-manager: 3.0-9
pve-i18n: 2.1-3
pve-qemu-kvm: 5.0.0-4
pve-xtermjs: 4.3.0-1
qemu-server: 6.2-3
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.4-pve1



pveversion -v on the target node:

Code:
root@target:~# pveversion -v
proxmox-ve: 6.2-1 (running kernel: 5.4.44-2-pve)
pve-manager: 6.2-6 (running version: 6.2-6/ee1d7754)
pve-kernel-5.4: 6.2-4
pve-kernel-helper: 6.2-4
pve-kernel-5.4.44-2-pve: 5.4.44-2
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.4-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: residual config
ifupdown2: 3.0.0-1+pve2
libjs-extjs: 6.0.1-10
libknet1: 1.16-pve1
libproxmox-acme-perl: 1.0.4
libpve-access-control: 6.1-1
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.1-3
libpve-guest-common-perl: 3.0-10
libpve-http-server-perl: 3.0-5
libpve-storage-perl: 6.1-8
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.2-1
lxcfs: 4.0.3-pve3
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.2-8
pve-cluster: 6.1-8
pve-container: 3.1-8
pve-docs: 6.2-4
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-2
pve-firmware: 3.1-1
pve-ha-manager: 3.0-9
pve-i18n: 2.1-3
pve-qemu-kvm: 5.0.0-4
pve-xtermjs: 4.3.0-1
qemu-server: 6.2-3
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.4-pve1

Not the same, but very similar, I guess. Any thoughts?


Also, are there any non-default options that you're passing to the migrate command? Either through the migration parameter of /etc/pve/datacenter.cfg, or when entering the command itself?

No, no non-default options. I'm doing it through the GUI, where I only select the Migrate command and then the node and storage options in the dialog.

The file /etc/pve/datacenter.cfg has only this:

Code:
migration: type=secure

It only happens with this one VM; the others I can migrate back and forth.
 
Yeah, I don't think the version difference here is significant...

Are the VMs which migrate successfully on the same storage as the failing VM?
Could you post the output of qm conf VMID?
 
No, I couldn't do it in the end.

I had to take the downtime; migration wasn't possible the first time around, and I had a few problems getting the VM back up and running.

Even offline migration had problems.

I'm a bit wary now of setting up HA on this cluster, as I can't afford to have production VMs go offline if something goes wrong.
 
