Proxmox 7.2-11: live migration between two machines suspends the VM, but only from one type of hardware

Szymons

Hello,

I have a strange situation when live migrating a VM between two nodes.
Both have AMD CPUs, but of different types.
First node type:

Code:
CPU(s) 64 x AMD EPYC 7513 32-Core Processor (2 Sockets)
Kernel Version Linux 5.15.60-2-pve #1 SMP PVE 5.15.60-2 (Tue, 04 Oct 2022 16:52:28 +0200)
PVE Manager Version pve-manager/7.2-11/b76d3178

Second node type:
Code:
CPU(s) 128 x AMD EPYC 7662 64-Core Processor (2 Sockets)
Kernel Version Linux 5.15.60-2-pve #1 SMP PVE 5.15.60-2 (Tue, 04 Oct 2022 16:52:28 +0200)
PVE Manager Version pve-manager/7.2-11/b76d3178

The result depends on the direction:
When I migrate from the second node type to the first, everything works.
When I migrate from the first node type to the second, the VM is suspended.


I can reproduce the issue.
Here is the GUI log:
Code:
2022-10-19 07:02:14 starting migration of VM 99998 to node 'HV-COSSACK-C-01' (10.42.254.135)
2022-10-19 07:02:14 starting VM 99998 on remote node 'HV-COSSACK-C-01'
2022-10-19 07:02:17 start remote tunnel
2022-10-19 07:02:18 ssh tunnel ver 1
2022-10-19 07:02:18 starting online/live migration on unix:/run/qemu-server/99998.migrate
2022-10-19 07:02:18 set migration capabilities
2022-10-19 07:02:18 migration downtime limit: 100 ms
2022-10-19 07:02:18 migration cachesize: 512.0 MiB
2022-10-19 07:02:18 set migration parameters
2022-10-19 07:02:18 start migrate command to unix:/run/qemu-server/99998.migrate
2022-10-19 07:02:19 migration active, transferred 250.5 MiB of 4.0 GiB VM-state, 1000.1 MiB/s
2022-10-19 07:02:20 average migration speed: 2.0 GiB/s - downtime 93 ms
2022-10-19 07:02:20 migration status: completed
2022-10-19 07:02:23 migration finished successfully (duration 00:00:09)
TASK OK
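
A rough sketch of how the task log above could be summarized, in case someone wants to compare several migration runs. It only restates what the log already reports (status, downtime, elapsed time) and says nothing about why the guest hangs afterwards. The names `summarize_migration`, `LINE_RE` and `TASK_LOG` are made up for this example; the input is an excerpt of the log lines quoted above.

```python
import re
from datetime import datetime

# Excerpt of the task log quoted above (lines copied verbatim).
TASK_LOG = """\
2022-10-19 07:02:14 starting migration of VM 99998 to node 'HV-COSSACK-C-01' (10.42.254.135)
2022-10-19 07:02:18 migration downtime limit: 100 ms
2022-10-19 07:02:20 average migration speed: 2.0 GiB/s - downtime 93 ms
2022-10-19 07:02:20 migration status: completed
2022-10-19 07:02:23 migration finished successfully (duration 00:00:09)
TASK OK
"""

# "<date> <time> <message>" as printed in the task log.
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (.*)$")


def summarize_migration(log):
    """Collect status, downtime, downtime limit and elapsed seconds from a task log."""
    stamps = []
    summary = {
        "status": None,
        "downtime_ms": None,
        "downtime_limit_ms": None,
        "elapsed_s": None,
    }
    for line in log.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # e.g. "TASK OK" carries no timestamp
        stamps.append(datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S"))
        message = m.group(2)

        limit = re.search(r"downtime limit: (\d+) ms", message)
        if limit:
            summary["downtime_limit_ms"] = int(limit.group(1))

        downtime = re.search(r"downtime (\d+) ms", message)
        if downtime:
            summary["downtime_ms"] = int(downtime.group(1))

        status = re.match(r"migration status: (\w+)", message)
        if status:
            summary["status"] = status.group(1)

    if stamps:
        summary["elapsed_s"] = (stamps[-1] - stamps[0]).total_seconds()
    return summary


summary = summarize_migration(TASK_LOG)
print(summary)
```

For this run the reported downtime stays under the configured limit and the status is "completed", so from the task's point of view the migration itself succeeded.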

journalctl does not show any problems:
Code:
paź 19 06:47:15 hostname-1 sshd[1410497]: Accepted publickey for root from >here is IPv4< port 47388 ssh2: RSA SHA256:68BpbZuE8rw0MCKtQhL4HDNXIgjJOtiI+QSbfvrJugw
paź 19 06:47:15 hostname-1 sshd[1410497]: pam_unix(sshd:session): session opened for user root(uid=0) by (uid=0)
paź 19 06:47:15 hostname-1 systemd-logind[7765]: New session 1094 of user root.
paź 19 06:47:15 hostname-1 systemd[1]: Started Session 1094 of user root.
paź 19 06:47:16 hostname-1 sshd[1410497]: Received disconnect from >here is IPv4< port 47388:11: disconnected by user
paź 19 06:47:16 hostname-1 sshd[1410497]: Disconnected from user root >here is IPv4< port 47388
paź 19 06:47:16 hostname-1 sshd[1410497]: pam_unix(sshd:session): session closed for user root
paź 19 06:47:16 hostname-1 systemd[1]: session-1094.scope: Succeeded.
paź 19 06:47:16 hostname-1 systemd-logind[7765]: Session 1094 logged out. Waiting for processes to exit.
paź 19 06:47:16 hostname-1 systemd-logind[7765]: Removed session 1094.
paź 19 06:47:16 hostname-1 pmxcfs[2562]: [status] notice: received log
paź 19 06:47:16 hostname-1 pmxcfs[2562]: [status] notice: received log
paź 19 06:47:16 hostname-1 sshd[1410532]: Accepted publickey for root from >here is IPv4< port 37682 ssh2: RSA SHA256:gQoCOH/xCQvl/W3jDlzhTmVEdMnPQb9a4rIeEBRpHNo
paź 19 06:47:16 hostname-1 sshd[1410532]: pam_unix(sshd:session): session opened for user root(uid=0) by (uid=0)
paź 19 06:47:16 hostname-1 systemd-logind[7765]: New session 1095 of user root.
paź 19 06:47:16 hostname-1 systemd[1]: Started Session 1095 of user root.
paź 19 06:47:17 hostname-1 sshd[1410541]: Accepted publickey for root from >here is IPv4< port 47394 ssh2: RSA SHA256:68BpbZuE8rw0MCKtQhL4HDNXIgjJOtiI+QSbfvrJugw
paź 19 06:47:17 hostname-1 sshd[1410541]: pam_unix(sshd:session): session opened for user root(uid=0) by (uid=0)
paź 19 06:47:17 hostname-1 systemd-logind[7765]: New session 1096 of user root.
paź 19 06:47:17 hostname-1 systemd[1]: Started Session 1096 of user root.
paź 19 06:47:18 hostname-1 sshd[1410541]: Received disconnect from >here is IPv4< port 47394:11: disconnected by user
paź 19 06:47:18 hostname-1 sshd[1410541]: Disconnected from user root >here is IPv4< port 47394
paź 19 06:47:18 hostname-1 sshd[1410541]: pam_unix(sshd:session): session closed for user root
paź 19 06:47:18 hostname-1 systemd[1]: session-1096.scope: Succeeded.
paź 19 06:47:18 hostname-1 systemd[1]: session-1096.scope: Consumed 1.015s CPU time.
paź 19 06:47:18 hostname-1 systemd-logind[7765]: Session 1096 logged out. Waiting for processes to exit.
paź 19 06:47:18 hostname-1 systemd-logind[7765]: Removed session 1096.
paź 19 06:47:18 hostname-1 pmxcfs[2562]: [status] notice: received log
paź 19 06:47:35 hostname-1 pmxcfs[2562]: [status] notice: received log
paź 19 06:47:35 hostname-1 sshd[1410742]: Accepted publickey for root from >here is IPv4< port 50260 ssh2: RSA SHA256:gQoCOH/xCQvl/W3jDlzhTmVEdMnPQb9a4rIeEBRpHNo
paź 19 06:47:35 hostname-1 sshd[1410742]: pam_unix(sshd:session): session opened for user root(uid=0) by (uid=0)
paź 19 06:47:35 hostname-1 systemd-logind[7765]: New session 1097 of user root.
paź 19 06:47:35 hostname-1 systemd[1]: Started Session 1097 of user root.
paź 19 06:47:42 hostname-1 sshd[1410742]: Received disconnect from >here is IPv4< port 50260:11: disconnected by user
paź 19 06:47:42 hostname-1 sshd[1410742]: Disconnected from user root >here is IPv4< port 50260
paź 19 06:47:42 hostname-1 sshd[1410742]: pam_unix(sshd:session): session closed for user root
paź 19 06:47:42 hostname-1 systemd[1]: session-1097.scope: Succeeded.
paź 19 06:47:42 hostname-1 systemd-logind[7765]: Session 1097 logged out. Waiting for processes to exit.
paź 19 06:47:42 hostname-1 systemd-logind[7765]: Removed session 1097.
paź 19 06:47:42 hostname-1 pmxcfs[2562]: [status] notice: received log

In the VM the " - " cursor is not blinking:

1666162988569.png
 
When I select a specific CPU type, EPYC-MILAN, the VM fails to start on the second type of CPU with a few errors:

Code:
kvm: warning: host doesn't support requested feature: CPUID.01H:ECX.pcid [bit 17]
kvm: warning: host doesn't support requested feature: CPUID.07H:EBX.erms [bit 9]
kvm: warning: host doesn't support requested feature: CPUID.07H:EBX.invpcid [bit 10]
kvm: warning: host doesn't support requested feature: CPUID.07H:ECX.pku [bit 3]
kvm: warning: host doesn't support requested feature: CPUID.07H:EDX.fsrm [bit 4]
kvm: warning: host doesn't support requested feature: CPUID.0DH:EAX [bit 9]
kvm: Host doesn't support requested features
TASK ERROR: start failed: QEMU exited with code 1
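
A minimal sketch for pulling the feature names out of warnings like the ones above, so they can be compared between hosts. The function name `missing_features` and the regular expression are my own for this example, not part of any Proxmox or QEMU tool; the input is the warning text copied from this post.

```python
import re

# The warning lines quoted above, copied verbatim.
KVM_OUTPUT = """\
kvm: warning: host doesn't support requested feature: CPUID.01H:ECX.pcid [bit 17]
kvm: warning: host doesn't support requested feature: CPUID.07H:EBX.erms [bit 9]
kvm: warning: host doesn't support requested feature: CPUID.07H:EBX.invpcid [bit 10]
kvm: warning: host doesn't support requested feature: CPUID.07H:ECX.pku [bit 3]
kvm: warning: host doesn't support requested feature: CPUID.07H:EDX.fsrm [bit 4]
kvm: warning: host doesn't support requested feature: CPUID.0DH:EAX [bit 9]
"""

# CPUID.<leaf>H:<register>[.<name>] [bit <n>]
# The flag name is optional, as the last warning line shows.
WARNING_RE = re.compile(
    r"requested feature: CPUID\.(?P<leaf>[0-9A-Fa-f]+)H:(?P<reg>E[A-D]X)"
    r"(?:\.(?P<name>[\w.-]+))? \[bit (?P<bit>\d+)\]"
)


def missing_features(text):
    """Return one dict per 'host doesn't support requested feature' warning."""
    found = []
    for line in text.splitlines():
        m = WARNING_RE.search(line)
        if m:
            found.append({
                "leaf": m.group("leaf"),
                "register": m.group("reg"),
                "name": m.group("name"),  # None when no flag name is printed
                "bit": int(m.group("bit")),
            })
    return found


features = missing_features(KVM_OUTPUT)
named = [f["name"] for f in features if f["name"]]
print(named)
```

The named flags could then be checked against the `flags` line in /proc/cpuinfo on each node, assuming the kernel reports them under the same names.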
 
Hi,
what CPU type did you have before selecting EPYC-MILAN? Can you try upgrading to kernel 5.19 and see if the issue persists?

Please post the output of pveversion -v from both nodes and the VM configuration qm config 99998.
 
The CPU type before was the default (kvm64).


First node type:
Code:
proxmox-ve: 7.2-1 (running kernel: 5.15.60-2-pve)
pve-manager: 7.2-11 (running version: 7.2-11/b76d3178)
pve-kernel-helper: 7.2-13
pve-kernel-5.15: 7.2-12
pve-kernel-5.15.60-2-pve: 5.15.60-2
ceph-fuse: 14.2.21-1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve1
libproxmox-acme-perl: 1.4.2
libproxmox-backup-qemu0: 1.3.1-1
libpve-access-control: 7.2-4
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.2-3
libpve-guest-common-perl: 4.1-3
libpve-http-server-perl: 4.1-4
libpve-storage-perl: 7.2-10
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.0-3
lxcfs: 4.0.12-pve1
novnc-pve: 1.3.0-3
proxmox-backup-client: 2.2.7-1
proxmox-backup-file-restore: 2.2.7-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.5.1
pve-cluster: 7.2-2
pve-container: 4.2-2
pve-docs: 7.2-2
pve-edk2-firmware: 3.20220526-1
pve-firewall: 4.2-6
pve-firmware: 3.5-4
pve-ha-manager: 3.4.0
pve-i18n: 2.7-2
pve-qemu-kvm: 7.0.0-3
pve-xtermjs: 4.16.0-1
qemu-server: 7.2-4
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.7.1~bpo11+1
vncterm: 1.7-1
zfsutils-linux: 2.1.6-pve1

Second node type:
Code:
proxmox-ve: 7.2-1 (running kernel: 5.15.60-2-pve)
pve-manager: 7.2-11 (running version: 7.2-11/b76d3178)
pve-kernel-helper: 7.2-13
pve-kernel-5.15: 7.2-12
pve-kernel-5.15.60-2-pve: 5.15.60-2
ceph-fuse: 14.2.21-1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve1
libproxmox-acme-perl: 1.4.2
libproxmox-backup-qemu0: 1.3.1-1
libpve-access-control: 7.2-4
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.2-3
libpve-guest-common-perl: 4.1-3
libpve-http-server-perl: 4.1-4
libpve-storage-perl: 7.2-10
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.0-3
lxcfs: 4.0.12-pve1
novnc-pve: 1.3.0-3
proxmox-backup-client: 2.2.7-1
proxmox-backup-file-restore: 2.2.7-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.5.1
pve-cluster: 7.2-2
pve-container: 4.2-2
pve-docs: 7.2-2
pve-edk2-firmware: 3.20220526-1
pve-firewall: 4.2-6
pve-firmware: 3.5-4
pve-ha-manager: 3.4.0
pve-i18n: 2.7-2
pve-qemu-kvm: 7.0.0-3
pve-xtermjs: 4.16.0-1
qemu-server: 7.2-4
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.7.1~bpo11+1
vncterm: 1.7-1
zfsutils-linux: 2.1.6-pve1
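
The two listings above appear to be identical. A small sketch for checking that mechanically, assuming the `pveversion -v` output of each node has been saved as text; `parse_pveversion` and `diff_versions` are hypothetical helper names, and `NODE_C` is an invented variation that only shows what a mismatch would look like.

```python
def parse_pveversion(text):
    """Map package name -> version string from 'name: version' lines."""
    versions = {}
    for line in text.splitlines():
        name, sep, version = line.partition(": ")
        if sep:
            versions[name.strip()] = version.strip()
    return versions


def diff_versions(a, b):
    """Return {package: (version_a, version_b)} for every package that differs."""
    return {
        pkg: (a.get(pkg), b.get(pkg))
        for pkg in sorted(set(a) | set(b))
        if a.get(pkg) != b.get(pkg)
    }


# Excerpt of the listing above; both nodes report the same values here.
NODE_A = """\
proxmox-ve: 7.2-1 (running kernel: 5.15.60-2-pve)
pve-qemu-kvm: 7.0.0-3
qemu-server: 7.2-4
"""
NODE_B = NODE_A

# Invented for illustration only: a node with a different qemu-server version.
NODE_C = NODE_A.replace("qemu-server: 7.2-4", "qemu-server: 7.2-3")

same = diff_versions(parse_pveversion(NODE_A), parse_pveversion(NODE_B))
changed = diff_versions(parse_pveversion(NODE_A), parse_pveversion(NODE_C))
print(same)
print(changed)
```

An empty result means the package versions match, which would point away from a software version mismatch between the nodes.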


VM config:
Code:
agent: 1,fstrim_cloned_disks=1
balloon: 0
boot: order=scsi0
cores: 4
description:
memory: 4096
name: 99998-deb2
net0: virtio=02:EA:1A:97:76:E5,bridge=vmbr000004098
numa: 0
ostype: l26
scsi0: mfsstorage:99998/vm-99998-disk-0.qcow2,discard=on,size=10G
scsi1: mfsstorage:99998/vm-99998-disk-1.qcow2,discard=on,size=4G,ssd=1
scsihw: virtio-scsi-pci
smbios1: uuid=644e98d8-6dba-46b2-a094-0718516e7df0
sockets: 1
vmgenid: f2eeef7c-8f43-4853-a476-c5ab48916cfd
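
Note that the config above has no cpu line, which matches the default (kvm64) CPU type mentioned earlier. A sketch of reading the CPU type out of `qm config` style text, assuming the default is kvm64 as stated in this thread; `parse_qm_config` and `cpu_type` are names made up for this example.

```python
def parse_qm_config(text):
    """Map option name -> raw value from 'key: value' lines."""
    config = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            config[key.strip()] = value.strip()
    return config


def cpu_type(config, default="kvm64"):
    """Return the configured CPU model, or the default when no cpu line exists."""
    cpu = config.get("cpu")
    if not cpu:
        return default
    # The model is the first comma-separated field, optionally as 'cputype=<model>'.
    model = cpu.split(",")[0]
    return model.split("=", 1)[1] if model.startswith("cputype=") else model


# Excerpt of the VM config above (no cpu line present).
VM_CONFIG = """\
cores: 4
memory: 4096
net0: virtio=02:EA:1A:97:76:E5,bridge=vmbr000004098
sockets: 1
"""

current = cpu_type(parse_qm_config(VM_CONFIG))
# Hypothetical config with an explicit model, for comparison.
explicit = cpu_type(parse_qm_config(VM_CONFIG + "cpu: EPYC-Milan\n"))
print(current, explicit)
```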

I will try upgrading the kernel and report back with the result.