Proxmox cluster migration error - cloud-init

Hello Everyone!

I have a problem with some nodes in my cluster and I don't understand the issue.
Sometimes, and only for some VMs (NOT ALL), we can't migrate to certain nodes. Some target nodes work and others don't.
We get this error:

2023-10-18 14:33:16 ERROR: migration aborted (duration 00:00:00): target node is too old (manager <= 7.2-13) and doesn't support new cloudinit section


We can migrate some VMs from PVE26 to PVE20, but others fail!

Can you help me please?

But all my nodes are running the same version of Proxmox (the latest).
Example for PVE26:
root@pve26:~# pveversion -v
proxmox-ve: 8.0.2 (running kernel: 6.2.16-15-pve)
pve-manager: 8.0.4 (running version: 8.0.4/d258a813cfa6b390)
pve-kernel-6.2: 8.0.5
proxmox-kernel-helper: 8.0.3
proxmox-kernel-6.2.16-15-pve: 6.2.16-15
proxmox-kernel-6.2: 6.2.16-15
proxmox-kernel-6.2.16-14-pve: 6.2.16-14
proxmox-kernel-6.2.16-12-pve: 6.2.16-12
pve-kernel-6.2.16-3-pve: 6.2.16-3
ceph-fuse: 17.2.6-pve1+3
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx5
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-4
libknet1: 1.26-pve1
libproxmox-acme-perl: 1.4.6
libproxmox-backup-qemu0: 1.4.0
libproxmox-rs-perl: 0.3.1
libpve-access-control: 8.0.5
libpve-apiclient-perl: 3.3.1
libpve-common-perl: 8.0.9
libpve-guest-common-perl: 5.0.5
libpve-http-server-perl: 5.0.4
libpve-network-perl: 0.8.1
libpve-rs-perl: 0.8.5
libpve-storage-perl: 8.0.2
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 5.0.2-4
lxcfs: 5.0.3-pve3
novnc-pve: 1.4.0-2
openvswitch-switch: 3.1.0-2
proxmox-backup-client: 3.0.3-1
proxmox-backup-file-restore: 3.0.3-1
proxmox-kernel-helper: 8.0.3
proxmox-mail-forward: 0.2.0
proxmox-mini-journalreader: 1.4.0
proxmox-widget-toolkit: 4.0.9
pve-cluster: 8.0.4
pve-container: 5.0.4
pve-docs: 8.0.5
pve-edk2-firmware: 3.20230228-4
pve-firewall: 5.0.3
pve-firmware: 3.8-2
pve-ha-manager: 4.0.2
pve-i18n: 3.0.7
pve-qemu-kvm: 8.0.2-6
pve-xtermjs: 4.16.0-3
qemu-server: 8.0.7
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.1.13-pve1





Example for PVE20:
root@pve20:~# pveversion -v
proxmox-ve: 8.0.2 (running kernel: 6.2.16-15-pve)
pve-manager: 8.0.4 (running version: 8.0.4/d258a813cfa6b390)
pve-kernel-6.2: 8.0.5
proxmox-kernel-helper: 8.0.3
proxmox-kernel-6.2.16-15-pve: 6.2.16-15
proxmox-kernel-6.2: 6.2.16-15
pve-kernel-6.2.16-3-pve: 6.2.16-3
ceph-fuse: 17.2.6-pve1+3
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx5
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-4
libknet1: 1.26-pve1
libproxmox-acme-perl: 1.4.6
libproxmox-backup-qemu0: 1.4.0
libproxmox-rs-perl: 0.3.1
libpve-access-control: 8.0.5
libpve-apiclient-perl: 3.3.1
libpve-common-perl: 8.0.9
libpve-guest-common-perl: 5.0.5
libpve-http-server-perl: 5.0.4
libpve-network-perl: 0.8.1
libpve-rs-perl: 0.8.5
libpve-storage-perl: 8.0.2
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 5.0.2-4
lxcfs: 5.0.3-pve3
novnc-pve: 1.4.0-2
openvswitch-switch: 3.1.0-2
proxmox-backup-client: 3.0.3-1
proxmox-backup-file-restore: 3.0.3-1
proxmox-kernel-helper: 8.0.3
proxmox-mail-forward: 0.2.0
proxmox-mini-journalreader: 1.4.0
proxmox-widget-toolkit: 4.0.9
pve-cluster: 8.0.4
pve-container: 5.0.4
pve-docs: 8.0.5
pve-edk2-firmware: 3.20230228-4
pve-firewall: 5.0.3
pve-firmware: 3.8-2
pve-ha-manager: 4.0.2
pve-i18n: 3.0.7
pve-qemu-kvm: 8.0.2-6
pve-xtermjs: 4.16.0-3
qemu-server: 8.0.7
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.1.13-pve1
 
Hi,
please check journalctl --reverse -u pvestatd.service on both the migration source node and the target node for any errors. Or check the full journal/system logs around the time the issue happens.
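For example, to scope the journal to the window of the failed migration (the timestamps below are just an illustration taken from the error message above; adjust them to your own time window):
Code:
# pvestatd log, newest entries first, on both source and target node
journalctl --reverse -u pvestatd.service

# full system journal around the time the migration was aborted
journalctl --since "2023-10-18 14:25" --until "2023-10-18 14:40"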

It would also be interesting to see what the broadcast version is. To do so, you can create a script:
Code:
cat node-version.pm
Code:
#!/usr/bin/perl
# Print the 'version-info' entry that the given node broadcast via the
# cluster key-value store (pmxcfs), as seen from the node running this script.

use strict;
use warnings;

use JSON;

use PVE::Cluster;

my $node = shift or die "usage: $0 <node name>\n";

# refresh the locally cached cluster state before querying it
PVE::Cluster::cfs_update(1);
my $nodes_version_info = PVE::Cluster::get_node_kv('version-info', $node);

print to_json($nodes_version_info, { pretty => 1, canonical => 1 });
and run it on the migration source node with perl node-version.pm <node name of migration target>.
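When the broadcast works, the script should print something like the following for the target node (a sketch; the actual version and repoid will of course match whatever that node is running). An empty {} means this node never received a 'version-info' entry for the target:
Code:
{
   "pve20" : "{\"version\":\"8.0.4\",\"release\":\"8.0\",\"repoid\":\"d258a813cfa6b390\"}"
}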
 
Hello!
Nothing related to the migration request shows up on either node with this command:
journalctl --reverse -u pvestatd.service

About your script, I get this result when executing it from pve26 with this command:
perl node-version.pm pve20

root@pve26:~# perl node-version.pm pve20
{}
root@pve26:~#
 
And the general log:

Oct 19 10:56:50 pve26 sshd[1098343]: Failed password for root from 139.155.140.29 port 41510 ssh2
Oct 19 10:56:51 pve26 sshd[1098343]: Received disconnect from 139.155.140.29 port 41510:11: Bye Bye [preauth]
Oct 19 10:56:51 pve26 sshd[1098343]: Disconnected from authenticating user root 139.155.140.29 port 41510 [preauth]
Oct 19 10:57:18 pve26 sshd[1098828]: Invalid user tv from 123.58.216.78 port 59096
Oct 19 10:57:18 pve26 sshd[1098828]: pam_unix(sshd:auth): check pass; user unknown
Oct 19 10:57:18 pve26 sshd[1098828]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=123.58.216.78
Oct 19 10:57:20 pve26 sshd[1098828]: Failed password for invalid user tv from 123.58.216.78 port 59096 ssh2
Oct 19 10:57:21 pve26 sshd[1098828]: Received disconnect from 123.58.216.78 port 59096:11: Bye Bye [preauth]
Oct 19 10:57:21 pve26 sshd[1098828]: Disconnected from invalid user tv 123.58.216.78 port 59096 [preauth]
Oct 19 10:57:30 pve26 sshd[1099112]: Invalid user test from 128.199.30.145 port 35824
Oct 19 10:57:30 pve26 sshd[1099112]: pam_unix(sshd:auth): check pass; user unknown
Oct 19 10:57:30 pve26 sshd[1099112]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=128.199.30.145
Oct 19 10:57:31 pve26 sshd[1097232]: fatal: Timeout before authentication for 180.101.88.223 port 61605
Oct 19 10:57:32 pve26 sshd[1099112]: Failed password for invalid user test from 128.199.30.145 port 35824 ssh2
Oct 19 10:57:34 pve26 sshd[1099112]: Received disconnect from 128.199.30.145 port 35824:11: Bye Bye [preauth]
Oct 19 10:57:34 pve26 sshd[1099112]: Disconnected from invalid user test 128.199.30.145 port 35824 [preauth]
Oct 19 10:57:34 pve26 pvedaemon[919629]: <root@pam> end task UPID:pve26:0010BB8B:00B53172:6530EEDA:vncshell::root@pam: OK
Oct 19 10:57:34 pve26 pvedaemon[919629]: <root@pam> starting task UPID:pve26:0010C580:00B57167:6530EF7E:vncshell::root@pam:
Oct 19 10:57:34 pve26 pvedaemon[1099136]: starting termproxy UPID:pve26:0010C580:00B57167:6530EF7E:vncshell::root@pam:
Oct 19 10:57:34 pve26 pvedaemon[960990]: <root@pam> successful auth for user 'root@pam'
Oct 19 10:57:34 pve26 login[1099139]: pam_unix(login:session): session opened for user root(uid=0) by root(uid=0)
Oct 19 10:57:34 pve26 systemd-logind[1264]: New session 69 of user root.
Oct 19 10:57:34 pve26 systemd[1]: Started session-69.scope - Session 69 of User root.
Oct 19 10:57:34 pve26 login[1099144]: ROOT LOGIN on '/dev/pts/0'
Oct 19 10:58:10 pve26 pvedaemon[919629]: <root@pam> starting task UPID:pve26:0010C80D:00B57F4F:6530EFA2:qmigrate:1034:root@pam:
Oct 19 10:58:10 pve26 pvedaemon[1099789]: migration aborted
Oct 19 10:58:10 pve26 pvedaemon[919629]: <root@pam> end task UPID:pve26:0010C80D:00B57F4F:6530EFA2:qmigrate:1034:root@pam: migration aborted
Oct 19 10:58:17 pve26 sshd[1099802]: Invalid user admin from 141.98.11.11 port 27728
Oct 19 10:58:17 pve26 sshd[1099802]: pam_unix(sshd:auth): check pass; user unknown
Oct 19 10:58:17 pve26 sshd[1099802]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=141.98.11.11
Oct 19 10:58:19 pve26 sshd[1099804]: Unable to negotiate with 210.207.75.75 port 61872: no matching host key type found. Their offer: ssh-rsa,ssh-dss [preauth]
Oct 19 10:58:20 pve26 sshd[1099802]: Failed password for invalid user admin from 141.98.11.11 port 27728 ssh2
Oct 19 10:58:21 pve26 sshd[1099802]: Received disconnect from 141.98.11.11 port 27728:11: Bye Bye [preauth]
Oct 19 10:58:21 pve26 sshd[1099802]: Disconnected from invalid user admin 141.98.11.11 port 27728 [preauth]
Oct 19 10:58:34 pve26 sshd[1100113]: Invalid user tv from 106.12.139.246 port 48144
Oct 19 10:58:34 pve26 sshd[1100113]: pam_unix(sshd:auth): check pass; user unknown
Oct 19 10:58:34 pve26 sshd[1100113]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=106.12.139.246
Oct 19 10:58:36 pve26 sshd[1100113]: Failed password for invalid user tv from 106.12.139.246 port 48144 ssh2
Oct 19 10:58:37 pve26 sshd[1100113]: Received disconnect from 106.12.139.246 port 48144:11: Bye Bye [preauth]
Oct 19 10:58:37 pve26 sshd[1100113]: Disconnected from invalid user tv 106.12.139.246 port 48144 [preauth]
Oct 19 10:58:39 pve26 sshd[1100116]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=123.58.216.78 user=root
Oct 19 10:58:41 pve26 sshd[1100116]: Failed password for root from 123.58.216.78 port 58240 ssh2
Oct 19 10:58:41 pve26 sshd[1100116]: Received disconnect from 123.58.216.78 port 58240:11: Bye Bye [preauth]
Oct 19 10:58:41 pve26 sshd[1100116]: Disconnected from authenticating user root 123.58.216.78 port 58240 [preauth]
Oct 19 10:58:43 pve26 sshd[1098338]: fatal: Timeout before authentication for 180.101.88.223 port 35945
Oct 19 10:58:50 pve26 sshd[1100274]: Invalid user deploy from 139.155.140.29 port 59070
Oct 19 10:58:50 pve26 sshd[1100274]: pam_unix(sshd:auth): check pass; user unknown
Oct 19 10:58:50 pve26 sshd[1100274]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=139.155.140.29
Oct 19 10:58:51 pve26 sshd[1100274]: Failed password for invalid user deploy from 139.155.140.29 port 59070 ssh2
Oct 19 10:58:52 pve26 sshd[1100274]: Received disconnect from 139.155.140.29 port 59070:11: Bye Bye [preauth]
Oct 19 10:58:52 pve26 sshd[1100274]: Disconnected from invalid user deploy 139.155.140.29 port 59070 [preauth]
Oct 19 10:58:56 pve26 sshd[1100429]: Invalid user teamspeak from 128.199.30.145 port 53584
Oct 19 10:58:56 pve26 sshd[1100429]: pam_unix(sshd:auth): check pass; user unknown
Oct 19 10:58:56 pve26 sshd[1100429]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=128.199.30.145
Oct 19 10:58:58 pve26 sshd[1100429]: Failed password for invalid user teamspeak from 128.199.30.145 port 53584 ssh2
Oct 19 10:58:58 pve26 sshd[1100429]: Received disconnect from 128.199.30.145 port 53584:11: Bye Bye [preauth]
Oct 19 10:58:58 pve26 sshd[1100429]: Disconnected from invalid user teamspeak 128.199.30.145 port 53584 [preauth]
Oct 19 10:59:10 pve26 sshd[1100596]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=218.92.0.27 user=root
Oct 19 10:59:12 pve26 sshd[1100596]: Failed password for root from 218.92.0.27 port 24342 ssh2
Oct 19 10:59:14 pve26 sshd[1100596]: Failed password for root from 218.92.0.27 port 24342 ssh2
Oct 19 10:59:18 pve26 sshd[1100596]: Failed password for root from 218.92.0.27 port 24342 ssh2
Oct 19 10:59:18 pve26 sshd[1100596]: Received disconnect from 218.92.0.27 port 24342:11: [preauth]
Oct 19 10:59:18 pve26 sshd[1100596]: Disconnected from authenticating user root 218.92.0.27 port 24342 [preauth]
Oct 19 10:59:18 pve26 sshd[1100596]: PAM 2 more authentication failures; logname= uid=0 euid=0 tty=ssh ruser= rhost=218.92.0.27 user=root
Oct 19 10:59:31 pve26 sshd[1100911]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=218.92.0.27 user=root
Oct 19 10:59:33 pve26 sshd[1100911]: Failed password for root from 218.92.0.27 port 47420 ssh2
Oct 19 10:59:36 pve26 sshd[1100911]: Failed password for root from 218.92.0.27 port 47420 ssh2
Oct 19 10:59:41 pve26 sshd[1100911]: Failed password for root from 218.92.0.27 port 47420 ssh2
Oct 19 10:59:43 pve26 sshd[1100911]: Received disconnect from 218.92.0.27 port 47420:11: [preauth]
Oct 19 10:59:43 pve26 sshd[1100911]: Disconnected from authenticating user root 218.92.0.27 port 47420 [preauth]
Oct 19 10:59:43 pve26 sshd[1100911]: PAM 2 more authentication failures; logname= uid=0 euid=0 tty=ssh ruser= rhost=218.92.0.27 user=root
Oct 19 10:59:56 pve26 systemd[1]: session-69.scope: Deactivated successfully.
Oct 19 10:59:56 pve26 systemd-logind[1264]: Session 69 logged out. Waiting for processes to exit.
Oct 19 10:59:56 pve26 systemd-logind[1264]: Removed session 69.
Oct 19 10:59:56 pve26 pvedaemon[919629]: <root@pam> end task UPID:pve26:0010C580:00B57167:6530EF7E:vncshell::root@pam: OK
Oct 19 10:59:56 pve26 sshd[1101374]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=218.92.0.27 user=root
Oct 19 10:59:59 pve26 sshd[1101374]: Failed password for root from 218.92.0.27 port 31345 ssh2
Oct 19 11:00:00 pve26 sshd[1101415]: Invalid user dodsserver from 123.58.216.78 port 57378
Oct 19 11:00:00 pve26 sshd[1101415]: pam_unix(sshd:auth): check pass; user unknown
Oct 19 11:00:00 pve26 sshd[1101415]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=123.58.216.78
Oct 19 11:00:00 pve26 sshd[1099630]: fatal: Timeout before authentication for 180.101.88.223 port 60823
Oct 19 11:00:02 pve26 sshd[1101415]: Failed password for invalid user dodsserver from 123.58.216.78 port 57378 ssh2
Oct 19 11:00:03 pve26 sshd[1101374]: Failed password for root from 218.92.0.27 port 31345 ssh2
Oct 19 11:00:04 pve26 sshd[1101415]: Received disconnect from 123.58.216.78 port 57378:11: Bye Bye [preauth]
Oct 19 11:00:04 pve26 sshd[1101415]: Disconnected from invalid user dodsserver 123.58.216.78 port 57378 [preauth]
Oct 19 11:00:05 pve26 pvedaemon[1101572]: starting vnc proxy UPID:pve26:0010CF04:00B5AC86:6530F015:vncproxy:1034:root@pam:
Oct 19 11:00:05 pve26 pvedaemon[913402]: <root@pam> starting task UPID:pve26:0010CF04:00B5AC86:6530F015:vncproxy:1034:root@pam:
Oct 19 11:00:06 pve26 sshd[1101374]: Failed password for root from 218.92.0.27 port 31345 ssh2
Oct 19 11:00:06 pve26 sshd[1101374]: Received disconnect from 218.92.0.27 port 31345:11: [preauth]
Oct 19 11:00:06 pve26 sshd[1101374]: Disconnected from authenticating user root 218.92.0.27 port 31345 [preauth]
Oct 19 11:00:06 pve26 sshd[1101374]: PAM 2 more authentication failures; logname= uid=0 euid=0 tty=ssh ruser= rhost=218.92.0.27 user=root
Oct 19 11:00:13 pve26 pvedaemon[913402]: <root@pam> starting task UPID:pve26:0010CFA0:00B5AF8F:6530F01D:qmigrate:1034:root@pam:
Oct 19 11:00:13 pve26 pvedaemon[1101728]: migration aborted
Oct 19 11:00:13 pve26 pvedaemon[913402]: <root@pam> end task UPID:pve26:0010CFA0:00B5AF8F:6530F01D:qmigrate:1034:root@pam: migration aborted
Oct 19 11:00:16 pve26 pvedaemon[913402]: <root@pam> end task UPID:pve26:0010CF04:00B5AC86:6530F015:vncproxy:1034:root@pam: OK
Oct 19 11:00:27 pve26 sshd[1101916]: Invalid user GarrysMod from 128.199.30.145 port 45772
Oct 19 11:00:27 pve26 sshd[1101916]: pam_unix(sshd:auth): check pass; user unknown
Oct 19 11:00:27 pve26 sshd[1101916]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=128.199.30.145
Oct 19 11:00:29 pve26 sshd[1101916]: Failed password for invalid user GarrysMod from 128.199.30.145 port 45772 ssh2
Oct 19 11:00:30 pve26 sshd[1101916]: Received disconnect from 128.199.30.145 port 45772:11: Bye Bye [preauth]
Oct 19 11:00:30 pve26 sshd[1101916]: Disconnected from invalid user GarrysMod 128.199.30.145 port 45772 [preauth]
 
I have run the perl command again from PVE26, checking all the nodes from pve20 to pve25:

root@pve26:~# perl node-version.pm pve20
{}
root@pve26:~# perl node-version.pm pve21
{
"pve21" : "{\"version\":\"8.0.4\",\"release\":\"8.0\",\"repoid\":\"d258a813cfa6b390\"}"
}
root@pve26:~# perl node-version.pm pve22
{}
root@pve26:~# perl node-version.pm pve23
{}
root@pve26:~# perl node-version.pm pve24
{}
root@pve26:~# perl node-version.pm pve25
{}
root@pve26:~#
 
Hello!
Nothing related to the migration request shows up on either node with this command:
journalctl --reverse -u pvestatd.service
But are there any errors/warnings? It's not responsible for the migration request itself, but it is responsible for broadcasting the node versions...

About your script, I get this result when executing it from pve26 with this command:
perl node-version.pm pve20

root@pve26:~# perl node-version.pm pve20
{}
root@pve26:~#
I have run the perl command again from PVE26, checking all the nodes from pve20 to pve25:

root@pve26:~# perl node-version.pm pve20
{}
root@pve26:~# perl node-version.pm pve21
{
"pve21" : "{\"version\":\"8.0.4\",\"release\":\"8.0\",\"repoid\":\"d258a813cfa6b390\"}"
}
root@pve26:~# perl node-version.pm pve22
{}
root@pve26:~# perl node-version.pm pve23
{}
root@pve26:~# perl node-version.pm pve24
{}
root@pve26:~# perl node-version.pm pve25
{}
root@pve26:~#
...and that's what doesn't seem to work properly for you.

Other services you might want to check are pve-cluster.service and corosync.service. I'd check on both pve20 and pve26.
 
root@pve20:~# systemctl status pve-cluster.service
● pve-cluster.service - The Proxmox VE cluster filesystem
Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled; preset: enabled)
Active: active (running) since Wed 2023-10-18 01:29:00 CEST; 1 day 13h ago
Main PID: 2062 (pmxcfs)
Tasks: 8 (limit: 57378)
Memory: 57.2M
CPU: 3min 54.009s
CGroup: /system.slice/pve-cluster.service
└─2062 /usr/bin/pmxcfs

Oct 19 14:58:26 pve20 pmxcfs[2062]: [status] notice: received log
Oct 19 14:58:39 pve20 pmxcfs[2062]: [status] notice: received log
Oct 19 14:58:39 pve20 pmxcfs[2062]: [status] notice: received log
Oct 19 14:58:39 pve20 pmxcfs[2062]: [status] notice: received log
Oct 19 14:59:16 pve20 pmxcfs[2062]: [status] notice: received log
Oct 19 14:59:16 pve20 pmxcfs[2062]: [status] notice: received log
Oct 19 14:59:35 pve20 pmxcfs[2062]: [status] notice: received log
Oct 19 14:59:59 pve20 pmxcfs[2062]: [status] notice: received log
Oct 19 14:59:59 pve20 pmxcfs[2062]: [status] notice: received log
Oct 19 15:00:01 pve20 pmxcfs[2062]: [status] notice: received log
root@pve20:~# systemctl status corosync.service
● corosync.service - Corosync Cluster Engine
Loaded: loaded (/lib/systemd/system/corosync.service; enabled; preset: enabled)
Active: active (running) since Wed 2023-10-18 01:29:00 CEST; 1 day 13h ago
Docs: man:corosync
man:corosync.conf
man:corosync_overview
Main PID: 2078 (corosync)
Tasks: 9 (limit: 57378)
Memory: 149.0M
CPU: 41min 22.726s
CGroup: /system.slice/corosync.service
└─2078 /usr/sbin/corosync -f

Oct 19 07:32:49 pve20 corosync[2078]: [TOTEM ] Retransmit List: 104751
Oct 19 07:36:04 pve20 corosync[2078]: [TOTEM ] Retransmit List: 105337
Oct 19 07:40:44 pve20 corosync[2078]: [TOTEM ] Retransmit List: 10640b
Oct 19 07:41:00 pve20 corosync[2078]: [TOTEM ] Retransmit List: 1064fd
Oct 19 07:46:21 pve20 corosync[2078]: [TOTEM ] Retransmit List: 10787f
Oct 19 07:54:04 pve20 corosync[2078]: [TOTEM ] Retransmit List: 10949d
Oct 19 07:59:28 pve20 corosync[2078]: [TOTEM ] Retransmit List: 10a862
Oct 19 08:00:13 pve20 corosync[2078]: [TOTEM ] Retransmit List: 10ab1d
Oct 19 08:17:58 pve20 corosync[2078]: [TOTEM ] Retransmit List: 10eb7f
Oct 19 12:12:04 pve20 corosync[2078]: [TOTEM ] Retransmit List: 144428
root@pve20:~#
 
root@pve26:~# systemctl status pve-cluster.service
● pve-cluster.service - The Proxmox VE cluster filesystem
Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled; preset: enabled)
Active: active (running) since Wed 2023-10-18 12:02:38 CEST; 1 day 2h ago
Process: 170971 ExecStart=/usr/bin/pmxcfs (code=exited, status=0/SUCCESS)
Main PID: 170972 (pmxcfs)
Tasks: 8 (limit: 348007)
Memory: 58.6M
CPU: 4min 32.822s
CGroup: /system.slice/pve-cluster.service
└─170972 /usr/bin/pmxcfs

Oct 19 14:58:35 pve26 pmxcfs[170972]: [status] notice: received log
Oct 19 14:58:35 pve26 pmxcfs[170972]: [status] notice: received log
Oct 19 14:58:36 pve26 pmxcfs[170972]: [status] notice: received log
Oct 19 14:58:36 pve26 pmxcfs[170972]: [status] notice: received log
Oct 19 14:58:57 pve26 pmxcfs[170972]: [status] notice: received log
Oct 19 14:58:57 pve26 pmxcfs[170972]: [status] notice: received log
Oct 19 14:59:02 pve26 pmxcfs[170972]: [status] notice: received log
Oct 19 14:59:02 pve26 pmxcfs[170972]: [status] notice: received log
Oct 19 14:59:07 pve26 pmxcfs[170972]: [status] notice: received log
Oct 19 14:59:07 pve26 pmxcfs[170972]: [status] notice: received log
root@pve26:~# systemctl status corosync.service
● corosync.service - Corosync Cluster Engine
Loaded: loaded (/lib/systemd/system/corosync.service; enabled; preset: enabled)
Active: active (running) since Wed 2023-10-18 12:02:37 CEST; 1 day 2h ago
Docs: man:corosync
man:corosync.conf
man:corosync_overview
Main PID: 170970 (corosync)
Tasks: 9 (limit: 348007)
Memory: 146.8M
CPU: 35min 22.153s
CGroup: /system.slice/corosync.service
└─170970 /usr/sbin/corosync -f

Oct 19 06:17:23 pve26 corosync[170970]: [KNET ] link: Resetting MTU for link 0 because host 2 joined
Oct 19 06:17:23 pve26 corosync[170970]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)
Oct 19 06:17:23 pve26 corosync[170970]: [KNET ] pmtud: Global data MTU changed to: 1397
Oct 19 07:26:56 pve26 corosync[170970]: [KNET ] link: host: 4 link: 0 is down
Oct 19 07:26:56 pve26 corosync[170970]: [KNET ] host: host: 4 (passive) best link: 0 (pri: 1)
Oct 19 07:26:56 pve26 corosync[170970]: [KNET ] host: host: 4 has no active links
Oct 19 07:27:00 pve26 corosync[170970]: [KNET ] rx: host: 4 link: 0 is up
Oct 19 07:27:00 pve26 corosync[170970]: [KNET ] link: Resetting MTU for link 0 because host 4 joined
Oct 19 07:27:00 pve26 corosync[170970]: [KNET ] host: host: 4 (passive) best link: 0 (pri: 1)
Oct 19 07:27:00 pve26 corosync[170970]: [KNET ] pmtud: Global data MTU changed to: 1397
root@pve26:~#
 
root@pve20:~# journalctl --reverse -u pvestatd.service
Oct 19 14:51:57 pve20 pvestatd[1110]: auth key new enough, skipping rotation
Oct 18 16:24:24 pve20 pvestatd[1110]: PBSBackup: Cannot find datastore 'PBSBackup', check permissions and existence!
Oct 18 16:24:14 pve20 pvestatd[1110]: PBSBackup: Cannot find datastore 'PBSBackup', check permissions and existence!
Oct 18 16:24:05 pve20 pvestatd[1110]: PBSBackup: Cannot find datastore 'PBSBackup', check permissions and existence!
Oct 18 16:23:54 pve20 pvestatd[1110]: PBSBackup: Cannot find datastore 'PBSBackup', check permissions and existence!

root@pve26:~# journalctl --reverse -u pvestatd.service
Oct 18 22:53:24 pve26 pvestatd[1740]: status update time (5.042 seconds)
Oct 18 22:53:05 pve26 pvestatd[1740]: status update time (5.190 seconds)
Oct 18 20:09:15 pve26 pvestatd[1740]: status update time (5.698 seconds)
Oct 18 20:09:08 pve26 pvestatd[1740]: status update time (8.832 seconds)
Oct 18 20:09:07 pve26 pvestatd[1740]: VM 1034 qmp command failed - VM 1034 qmp command 'query-proxmox-support' failed - unable to connect to VM 1034 qmp socket - timeout after 51 retries
Oct 18 20:08:58 pve26 pvestatd[1740]: status update time (8.863 seconds)
Oct 18 20:08:57 pve26 pvestatd[1740]: VM 1034 qmp command failed - VM 1034 qmp command 'query-proxmox-support' failed - unable to connect to VM 1034 qmp socket - timeout after 51 retries
Oct 18 16:24:19 pve26 pvestatd[1740]: PBSBackup: Cannot find datastore 'PBSBackup', check permissions and existence!
Oct 18 16:24:09 pve26 pvestatd[1740]: PBSBackup: Cannot find datastore 'PBSBackup', check permissions and existence!
Oct 18 16:23:59 pve26 pvestatd[1740]: PBSBackup: Cannot find datastore 'PBSBackup', check permissions and existence!
Oct 18 16:23:49 pve26 pvestatd[1740]: PBSBackup: Cannot find datastore 'PBSBackup', check permissions and existence!
Oct 18 16:23:40 pve26 pvestatd[1740]: PBSBackup: Cannot find datastore 'PBSBackup', check permissions and existence!
Oct 18 16:23:30 pve26 pvestatd[1740]: PBSBackup: Cannot find datastore 'PBSBackup', check permissions and existence!
 
@fiona
Very strange thing! I have read this bug report thread:
https://bugzilla.proxmox.com/show_bug.cgi?id=4784
I clicked the "Regenerate Image" button in the cloud-init section of the VM... and the migration worked after that!

What is strange is that the VM was only created yesterday...
The presence of pending cloudinit changes triggers the check. That's what the older version would get confused about and why regenerating works. I guess we could drop the check now in Proxmox VE 8. But the real question is why your nodes don't have the version-info from each other.
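If you prefer the CLI over the GUI button, qemu-server also has cloudinit subcommands for this (a sketch; please verify the exact names with qm help on your version, and replace 1034 with your VMID):
Code:
# show pending (not yet applied) cloud-init changes for VM 1034
qm cloudinit pending 1034

# regenerate the cloud-init drive, applying the pending changes
qm cloudinit update 1034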

Do you get any result when using journalctl -b | grep "version info" (on one of the nodes whose version pve26 can't see)? Can you try systemctl reload-or-restart pvestatd.service on one of those nodes, wait a few seconds, and see if the node-version.pm script is then able to get the version on pve26?
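Putting those steps together, roughly (run the first two commands on a node whose version is missing, e.g. pve20, then check again from pve26):
Code:
# on pve20: look for the version broadcast in the journal and nudge pvestatd
journalctl -b | grep "version info"
systemctl reload-or-restart pvestatd.service

# a few seconds later, on pve26:
perl node-version.pm pve20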
 
