Proxmox 7.2-11: live migration between two nodes suspends the VM, but only when migrating from one type of hardware

Szymons

Hello,

I have a weird situation when live migrating a VM between two nodes.
Both nodes have AMD CPUs, but of different types.
First node type:

Code:
CPU(s) 64 x AMD EPYC 7513 32-Core Processor (2 Sockets)
Kernel Version Linux 5.15.60-2-pve #1 SMP PVE 5.15.60-2 (Tue, 04 Oct 2022 16:52:28 +0200)
PVE Manager Version pve-manager/7.2-11/b76d3178

Second node type:
Code:
CPU(s) 128 x AMD EPYC 7662 64-Core Processor (2 Sockets)
Kernel Version Linux 5.15.60-2-pve #1 SMP PVE 5.15.60-2 (Tue, 04 Oct 2022 16:52:28 +0200)
PVE Manager Version pve-manager/7.2-11/b76d3178
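
Since the two nodes use different CPU models, one quick check is to compare the CPU flags each host exposes. The following is only a minimal sketch: it assumes you have saved the output of /proc/cpuinfo from each node to a text file, and the function names and file names are hypothetical, not part of any Proxmox tool.

```python
def parse_flags(cpuinfo_text):
    """Return the set of CPU flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def flag_diff(first_node_text, second_node_text):
    """Return (flags only on the first node, flags only on the second node), sorted."""
    first = parse_flags(first_node_text)
    second = parse_flags(second_node_text)
    return sorted(first - second), sorted(second - first)


# Example usage (hypothetical file names, one per node):
#   with open("cpuinfo-node1.txt") as a, open("cpuinfo-node2.txt") as b:
#       only_first, only_second = flag_diff(a.read(), b.read())
#   print("only on first node: ", only_first)
#   print("only on second node:", only_second)
```

Flags that exist only on the migration source are the ones worth looking at first, because a guest that was started with them cannot get them on the target.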

Now the strange part:
When I migrate from the second node type to the first, everything works.
When I migrate from the first node type to the second, the VM gets suspended.


I can reproduce the issue.
Here is the GUI task log:
Code:
2022-10-19 07:02:14 starting migration of VM 99998 to node 'HV-COSSACK-C-01' (10.42.254.135)
2022-10-19 07:02:14 starting VM 99998 on remote node 'HV-COSSACK-C-01'
2022-10-19 07:02:17 start remote tunnel
2022-10-19 07:02:18 ssh tunnel ver 1
2022-10-19 07:02:18 starting online/live migration on unix:/run/qemu-server/99998.migrate
2022-10-19 07:02:18 set migration capabilities
2022-10-19 07:02:18 migration downtime limit: 100 ms
2022-10-19 07:02:18 migration cachesize: 512.0 MiB
2022-10-19 07:02:18 set migration parameters
2022-10-19 07:02:18 start migrate command to unix:/run/qemu-server/99998.migrate
2022-10-19 07:02:19 migration active, transferred 250.5 MiB of 4.0 GiB VM-state, 1000.1 MiB/s
2022-10-19 07:02:20 average migration speed: 2.0 GiB/s - downtime 93 ms
2022-10-19 07:02:20 migration status: completed
2022-10-19 07:02:23 migration finished successfully (duration 00:00:09)
TASK OK
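
Note that the task log itself reports success, so the hang is not visible from the migration side. As a rough sketch (the function name and dictionary keys are my own, and the sample lines are copied from the log above), the key numbers can be pulled out like this:

```python
import re

# Lines copied from the task log above.
TASK_LOG = """\
2022-10-19 07:02:18 migration downtime limit: 100 ms
2022-10-19 07:02:20 average migration speed: 2.0 GiB/s - downtime 93 ms
2022-10-19 07:02:20 migration status: completed
2022-10-19 07:02:23 migration finished successfully (duration 00:00:09)
"""


def summarize_migration(log_text):
    """Extract downtime limit, measured downtime, status and duration from a task log."""
    summary = {}
    match = re.search(r"migration downtime limit: (\d+) ms", log_text)
    if match:
        summary["downtime_limit_ms"] = int(match.group(1))
    match = re.search(r" - downtime (\d+) ms", log_text)
    if match:
        summary["downtime_ms"] = int(match.group(1))
    match = re.search(r"migration status: (\w+)", log_text)
    if match:
        summary["status"] = match.group(1)
    match = re.search(r"\(duration (\d+):(\d+):(\d+)\)", log_text)
    if match:
        hours, minutes, seconds = (int(part) for part in match.groups())
        summary["duration_s"] = hours * 3600 + minutes * 60 + seconds
    return summary


summary = summarize_migration(TASK_LOG)
print(summary)
```

For this log the measured downtime (93 ms) stays below the configured limit (100 ms), so nothing in the log points at the transfer itself.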

journalctl does not show any errors:
Code:
paź 19 06:47:15 hostname-1 sshd[1410497]: Accepted publickey for root from >here is IPv4< port 47388 ssh2: RSA SHA256:68BpbZuE8rw0MCKtQhL4HDNXIgjJOtiI+QSbfvrJugw
paź 19 06:47:15 hostname-1 sshd[1410497]: pam_unix(sshd:session): session opened for user root(uid=0) by (uid=0)
paź 19 06:47:15 hostname-1 systemd-logind[7765]: New session 1094 of user root.
paź 19 06:47:15 hostname-1 systemd[1]: Started Session 1094 of user root.
paź 19 06:47:16 hostname-1 sshd[1410497]: Received disconnect from >here is IPv4< port 47388:11: disconnected by user
paź 19 06:47:16 hostname-1 sshd[1410497]: Disconnected from user root >here is IPv4< port 47388
paź 19 06:47:16 hostname-1 sshd[1410497]: pam_unix(sshd:session): session closed for user root
paź 19 06:47:16 hostname-1 systemd[1]: session-1094.scope: Succeeded.
paź 19 06:47:16 hostname-1 systemd-logind[7765]: Session 1094 logged out. Waiting for processes to exit.
paź 19 06:47:16 hostname-1 systemd-logind[7765]: Removed session 1094.
paź 19 06:47:16 hostname-1 pmxcfs[2562]: [status] notice: received log
paź 19 06:47:16 hostname-1 pmxcfs[2562]: [status] notice: received log
paź 19 06:47:16 hostname-1 sshd[1410532]: Accepted publickey for root from >here is IPv4< port 37682 ssh2: RSA SHA256:gQoCOH/xCQvl/W3jDlzhTmVEdMnPQb9a4rIeEBRpHNo
paź 19 06:47:16 hostname-1 sshd[1410532]: pam_unix(sshd:session): session opened for user root(uid=0) by (uid=0)
paź 19 06:47:16 hostname-1 systemd-logind[7765]: New session 1095 of user root.
paź 19 06:47:16 hostname-1 systemd[1]: Started Session 1095 of user root.
paź 19 06:47:17 hostname-1 sshd[1410541]: Accepted publickey for root from >here is IPv4< port 47394 ssh2: RSA SHA256:68BpbZuE8rw0MCKtQhL4HDNXIgjJOtiI+QSbfvrJugw
paź 19 06:47:17 hostname-1 sshd[1410541]: pam_unix(sshd:session): session opened for user root(uid=0) by (uid=0)
paź 19 06:47:17 hostname-1 systemd-logind[7765]: New session 1096 of user root.
paź 19 06:47:17 hostname-1 systemd[1]: Started Session 1096 of user root.
paź 19 06:47:18 hostname-1 sshd[1410541]: Received disconnect from >here is IPv4< port 47394:11: disconnected by user
paź 19 06:47:18 hostname-1 sshd[1410541]: Disconnected from user root >here is IPv4< port 47394
paź 19 06:47:18 hostname-1 sshd[1410541]: pam_unix(sshd:session): session closed for user root
paź 19 06:47:18 hostname-1 systemd[1]: session-1096.scope: Succeeded.
paź 19 06:47:18 hostname-1 systemd[1]: session-1096.scope: Consumed 1.015s CPU time.
paź 19 06:47:18 hostname-1 systemd-logind[7765]: Session 1096 logged out. Waiting for processes to exit.
paź 19 06:47:18 hostname-1 systemd-logind[7765]: Removed session 1096.
paź 19 06:47:18 hostname-1 pmxcfs[2562]: [status] notice: received log
paź 19 06:47:35 hostname-1 pmxcfs[2562]: [status] notice: received log
paź 19 06:47:35 hostname-1 sshd[1410742]: Accepted publickey for root from >here is IPv4< port 50260 ssh2: RSA SHA256:gQoCOH/xCQvl/W3jDlzhTmVEdMnPQb9a4rIeEBRpHNo
paź 19 06:47:35 hostname-1 sshd[1410742]: pam_unix(sshd:session): session opened for user root(uid=0) by (uid=0)
paź 19 06:47:35 hostname-1 systemd-logind[7765]: New session 1097 of user root.
paź 19 06:47:35 hostname-1 systemd[1]: Started Session 1097 of user root.
paź 19 06:47:42 hostname-1 sshd[1410742]: Received disconnect from >here is IPv4< port 50260:11: disconnected by user
paź 19 06:47:42 hostname-1 sshd[1410742]: Disconnected from user root >here is IPv4< port 50260
paź 19 06:47:42 hostname-1 sshd[1410742]: pam_unix(sshd:session): session closed for user root
paź 19 06:47:42 hostname-1 systemd[1]: session-1097.scope: Succeeded.
paź 19 06:47:42 hostname-1 systemd-logind[7765]: Session 1097 logged out. Waiting for processes to exit.
paź 19 06:47:42 hostname-1 systemd-logind[7765]: Removed session 1097.
paź 19 06:47:42 hostname-1 pmxcfs[2562]: [status] notice: received log

The VM console is frozen: the " - " cursor is not blinking.

 
When I explicitly select the CPU type EPYC-MILAN, the VM fails to start on the second type of node with these errors:

Code:
kvm: warning: host doesn't support requested feature: CPUID.01H:ECX.pcid [bit 17]
kvm: warning: host doesn't support requested feature: CPUID.07H:EBX.erms [bit 9]
kvm: warning: host doesn't support requested feature: CPUID.07H:EBX.invpcid [bit 10]
kvm: warning: host doesn't support requested feature: CPUID.07H:ECX.pku [bit 3]
kvm: warning: host doesn't support requested feature: CPUID.07H:EDX.fsrm [bit 4]
kvm: warning: host doesn't support requested feature: CPUID.0DH:EAX [bit 9]
kvm: Host doesn't support requested features
TASK ERROR: start failed: QEMU exited with code 1
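
To see at a glance which features the second node type is missing for the EPYC-MILAN model, the warnings can be parsed into a list. This is only a sketch: the regular expression is written against the exact message format shown above, and the function name is made up for illustration.

```python
import re

# Matches lines such as:
#   kvm: warning: host doesn't support requested feature: CPUID.01H:ECX.pcid [bit 17]
# The feature name is optional, because one line above has only a register and a bit.
FEATURE_RE = re.compile(
    r"requested feature: CPUID\.([0-9A-F]+H):(E[A-D]X)(?:\.([\w-]+))? \[bit (\d+)\]"
)

# Warning lines copied from the failed start above.
KVM_LOG = """\
kvm: warning: host doesn't support requested feature: CPUID.01H:ECX.pcid [bit 17]
kvm: warning: host doesn't support requested feature: CPUID.07H:EBX.erms [bit 9]
kvm: warning: host doesn't support requested feature: CPUID.07H:EBX.invpcid [bit 10]
kvm: warning: host doesn't support requested feature: CPUID.07H:ECX.pku [bit 3]
kvm: warning: host doesn't support requested feature: CPUID.07H:EDX.fsrm [bit 4]
kvm: warning: host doesn't support requested feature: CPUID.0DH:EAX [bit 9]
"""


def missing_features(log_text):
    """Return a list of (leaf, register, name_or_None, bit) for each warning."""
    found = []
    for match in FEATURE_RE.finditer(log_text):
        leaf, register, name, bit = match.groups()
        found.append((leaf, register, name, int(bit)))
    return found


features = missing_features(KVM_LOG)
named = [name for _, _, name, _ in features if name]
print(named)
```

The named features (pcid, erms, invpcid, pku, fsrm) are the ones the EPYC-MILAN model requests but this host cannot provide, which is why the VM refuses to start there.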
 
Hi,
what CPU type did you have before selecting EPYC-MILAN? Can you try upgrading to kernel 5.19 and see if the issue persists?

Please post the output of pveversion -v from both nodes and the VM configuration qm config 99998.
 
The CPU type was the default (kvm64).


First node type:
Code:
proxmox-ve: 7.2-1 (running kernel: 5.15.60-2-pve)
pve-manager: 7.2-11 (running version: 7.2-11/b76d3178)
pve-kernel-helper: 7.2-13
pve-kernel-5.15: 7.2-12
pve-kernel-5.15.60-2-pve: 5.15.60-2
ceph-fuse: 14.2.21-1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve1
libproxmox-acme-perl: 1.4.2
libproxmox-backup-qemu0: 1.3.1-1
libpve-access-control: 7.2-4
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.2-3
libpve-guest-common-perl: 4.1-3
libpve-http-server-perl: 4.1-4
libpve-storage-perl: 7.2-10
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.0-3
lxcfs: 4.0.12-pve1
novnc-pve: 1.3.0-3
proxmox-backup-client: 2.2.7-1
proxmox-backup-file-restore: 2.2.7-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.5.1
pve-cluster: 7.2-2
pve-container: 4.2-2
pve-docs: 7.2-2
pve-edk2-firmware: 3.20220526-1
pve-firewall: 4.2-6
pve-firmware: 3.5-4
pve-ha-manager: 3.4.0
pve-i18n: 2.7-2
pve-qemu-kvm: 7.0.0-3
pve-xtermjs: 4.16.0-1
qemu-server: 7.2-4
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.7.1~bpo11+1
vncterm: 1.7-1
zfsutils-linux: 2.1.6-pve1

Second node type:
Code:
proxmox-ve: 7.2-1 (running kernel: 5.15.60-2-pve)
pve-manager: 7.2-11 (running version: 7.2-11/b76d3178)
pve-kernel-helper: 7.2-13
pve-kernel-5.15: 7.2-12
pve-kernel-5.15.60-2-pve: 5.15.60-2
ceph-fuse: 14.2.21-1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve1
libproxmox-acme-perl: 1.4.2
libproxmox-backup-qemu0: 1.3.1-1
libpve-access-control: 7.2-4
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.2-3
libpve-guest-common-perl: 4.1-3
libpve-http-server-perl: 4.1-4
libpve-storage-perl: 7.2-10
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.0-3
lxcfs: 4.0.12-pve1
novnc-pve: 1.3.0-3
proxmox-backup-client: 2.2.7-1
proxmox-backup-file-restore: 2.2.7-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.5.1
pve-cluster: 7.2-2
pve-container: 4.2-2
pve-docs: 7.2-2
pve-edk2-firmware: 3.20220526-1
pve-firewall: 4.2-6
pve-firmware: 3.5-4
pve-ha-manager: 3.4.0
pve-i18n: 2.7-2
pve-qemu-kvm: 7.0.0-3
pve-xtermjs: 4.16.0-1
qemu-server: 7.2-4
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.7.1~bpo11+1
vncterm: 1.7-1
zfsutils-linux: 2.1.6-pve1
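
Comparing two long pveversion -v listings by eye is error-prone, so here is a small sketch that reports only the packages whose versions differ. It assumes the "package: version" layout shown above; the function names are hypothetical, and the sample is just the first lines of the output above.

```python
def parse_pveversion(text):
    """Map package name to version string from 'pveversion -v' style output."""
    versions = {}
    for line in text.splitlines():
        name, sep, version = line.partition(": ")
        if sep:
            versions[name.strip()] = version.strip()
    return versions


def diff_versions(first_node_text, second_node_text):
    """Return {package: (version_on_first, version_on_second)} for every mismatch."""
    first = parse_pveversion(first_node_text)
    second = parse_pveversion(second_node_text)
    return {
        package: (first.get(package), second.get(package))
        for package in sorted(set(first) | set(second))
        if first.get(package) != second.get(package)
    }


# First lines of the output posted above (identical on both nodes).
SAMPLE = """\
proxmox-ve: 7.2-1 (running kernel: 5.15.60-2-pve)
pve-manager: 7.2-11 (running version: 7.2-11/b76d3178)
pve-qemu-kvm: 7.0.0-3
qemu-server: 7.2-4
"""

print(diff_versions(SAMPLE, SAMPLE))
```

An empty result means both nodes run the same package versions, as is the case for the two listings above, so a software mismatch between the nodes is unlikely to be the cause.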


VM config:
Code:
agent: 1,fstrim_cloned_disks=1
balloon: 0
boot: order=scsi0
cores: 4
description:
memory: 4096
name: 99998-deb2
net0: virtio=02:EA:1A:97:76:E5,bridge=vmbr000004098
numa: 0
ostype: l26
scsi0: mfsstorage:99998/vm-99998-disk-0.qcow2,discard=on,size=10G
scsi1: mfsstorage:99998/vm-99998-disk-1.qcow2,discard=on,size=4G,ssd=1
scsihw: virtio-scsi-pci
smbios1: uuid=644e98d8-6dba-46b2-a094-0718516e7df0
sockets: 1
vmgenid: f2eeef7c-8f43-4853-a476-c5ab48916cfd
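
The config above has no cpu: line, which matches the "default (kvm64)" answer. As a minimal sketch, assuming the "key: value" layout of qm config output and the cpu option syntax with an optional cputype= prefix, the effective CPU type could be read like this (function names are made up for illustration):

```python
def parse_qm_config(text):
    """Map option name to raw value from 'qm config' style output."""
    config = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            config[key.strip()] = value.strip()
    return config


def cpu_type(config, default="kvm64"):
    """Return the configured CPU type, or the default when no 'cpu' option is set.

    The default of kvm64 is taken from the answer in this thread.
    """
    raw = config.get("cpu")
    if not raw:
        return default
    first = raw.split(",")[0]
    if first.startswith("cputype="):
        return first.split("=", 1)[1]
    return first


# A few lines copied from the VM config above (no 'cpu' option present).
VM_CONFIG = """\
cores: 4
memory: 4096
net0: virtio=02:EA:1A:97:76:E5,bridge=vmbr000004098
sockets: 1
"""

print(cpu_type(parse_qm_config(VM_CONFIG)))
```

A config that contained a line such as "cpu: EPYC-Milan" would return that value instead of the default.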

I will try upgrading the kernel and report back.
 
