Hi,
We have a cluster of 6 Proxmox nodes running various versions of Proxmox 5.2 and 5.3.
After upgrading one of the nodes to 5.4, we have multiple problems across the entire cluster:
- VMs don't start correctly (more on the stale scope unit below the list):
Code:
May 10 15:50:01 rwb070 pvedaemon[3101]: <root@pam> starting task UPID:rwb070:00002BD2:00076D70:5CD58189:qmstart:10030:root@pam:
May 10 15:50:01 rwb070 pvedaemon[11218]: start VM 10030: UPID:rwb070:00002BD2:00076D70:5CD58189:qmstart:10030:root@pam:
May 10 15:50:01 rwb070 pvedaemon[11218]: start failed: org.freedesktop.systemd1.UnitExists: Unit 10030.scope already exists.
May 10 15:50:01 rwb070 pvedaemon[3101]: <root@pam> end task UPID:rwb070:00002BD2:00076D70:5CD58189:qmstart:10030:root@pam: start failed: org.freedesktop.systemd1.UnitExists: Unit 10030.scope already exists.
- other commands fail as well, e.g. qm list hangs
- the PVE cluster fails:
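Regarding the first error: we assume the 10030.scope unit is a transient scope left over from an earlier (failed or killed) start of that VM, and that it could be inspected and cleared on the upgraded node with something like the following (not taken from our actual session, and we are not sure it is safe while the cluster is in this state):
Code:
# check whether the stale transient scope is still around (VMID 10030 from the log above)
systemctl status 10030.scope
# if it is only a leftover, stopping it should let the VM start again
systemctl stop 10030.scope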
PVE versions:
Upgraded Proxmox host:
Code:
[16:25:43][root@rwb069(4)]:~
(0)#: pveversion -v
proxmox-ve: 5.4-1 (running kernel: 4.15.18-13-pve)
pve-manager: not correctly installed (running version: 5.4-5/c6fdb264)
pve-kernel-4.15: 5.4-1
pve-kernel-4.13: 5.2-2
pve-kernel-4.15.18-13-pve: 4.15.18-37
pve-kernel-4.15.18-4-pve: 4.15.18-23
pve-kernel-4.13.16-4-pve: 4.13.16-51
pve-kernel-4.13.16-1-pve: 4.13.16-46
pve-kernel-4.10.17-3-pve: 4.10.17-23
pve-kernel-4.10.17-2-pve: 4.10.17-20
corosync: 2.4.4-pve1
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.1-8
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-51
libpve-guest-common-perl: 2.0-20
libpve-http-server-perl: 2.0-13
libpve-storage-perl: 5.0-41
libqb0: 1.0.3-1~bpo9
lvm2: 2.02.168-pve6
lxc-pve: 3.1.0-3
lxcfs: 3.0.3-pve1
novnc-pve: 1.0.0-3
openvswitch-switch: 2.7.0-3
proxmox-widget-toolkit: 1.0-26
pve-cluster: 5.0-36
pve-container: 2.0-37
pve-docs: 5.4-2
pve-edk2-firmware: 1.20190312-1
pve-firewall: 3.0-20
pve-firmware: 2.0-6
pve-ha-manager: not correctly installed
pve-i18n: 1.1-4
pve-libspice-server1: 0.14.1-2
pve-qemu-kvm: 2.12.1-3
pve-xtermjs: 3.12.0-1
qemu-server: not correctly installed
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.13-pve1~bpo2
Not yet upgraded Proxmox host:
Code:
[14:42:49][root@rwb071(4)]:/var/log/lxc
(1)#: pveversion -v
proxmox-ve: 5.2-2 (running kernel: 4.15.18-4-pve)
pve-manager: 5.2-8 (running version: 5.2-8/fdf39912)
pve-kernel-4.15: 5.2-7
pve-kernel-4.15.18-4-pve: 4.15.18-23
pve-kernel-4.13.13-2-pve: 4.13.13-33
corosync: 2.4.2-pve5
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.0-8
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-38
libpve-guest-common-perl: 2.0-17
libpve-http-server-perl: 2.0-10
libpve-storage-perl: 5.0-27
libqb0: 1.0.1-1
lvm2: 2.02.168-pve6
lxc-pve: 3.0.2+pve1-2
lxcfs: 3.0.0-1
novnc-pve: 1.0.0-2
openvswitch-switch: 2.7.0-3
proxmox-widget-toolkit: 1.0-19
pve-cluster: 5.0-30
pve-container: 2.0-26
pve-docs: 5.2-8
pve-firewall: 3.0-14
pve-firmware: 2.0-5
pve-ha-manager: 2.0-5
pve-i18n: 1.0-6
pve-libspice-server1: 0.12.8-3
pve-qemu-kvm: 2.11.2-1
pve-xtermjs: 1.0-5
qemu-server: 5.0-33
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.9-pve1~bpo9
Note that on the upgraded Proxmox host we have this:
Code:
pve-manager: not correctly installed (running version: 5.4-5/c6fdb264)
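Presumably the packages stuck in this "not correctly installed" state (pve-manager, pve-ha-manager, qemu-server) could be listed with something like this (a hypothetical check, not from our session):
Code:
# list packages left half-installed / half-configured
dpkg --audit
# or check the three affected packages directly
dpkg -l pve-manager pve-ha-manager qemu-server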
So we run:
Code:
[16:25:48][root@rwb069(4)]:~
(0)#: apt install pve-manager
E: dpkg was interrupted, you must manually run 'dpkg --configure -a' to correct the problem.
[16:28:08][root@rwb069(4)]:~
(100)#: dpkg --configure -a
Setting up pve-ha-manager (2.0-9) ...
Which hangs at:
Code:
root 10762 0.0 0.0 95220 6840 ? Ss 16:25 0:00 \_ sshd: root@pts/4
root 10788 0.0 0.0 21576 5504 pts/4 Ss 16:25 0:00 | \_ -bash
root 11002 0.0 0.0 19744 5332 pts/4 S+ 16:28 0:00 | \_ dpkg --configure -a
root 11003 0.0 0.0 4280 1268 pts/4 S+ 16:28 0:00 | \_ /bin/sh /var/lib/dpkg/info/pve-ha-manager.postinst configur
root 11029 0.0 0.0 39596 4824 pts/4 S+ 16:28 0:00 | \_ /bin/systemctl try-restart pve-ha-lrm.service
root 11030 0.0 0.0 37808 2148 pts/4 S+ 16:28 0:00 | \_ /bin/systemd-tty-ask-password-agent --watch
Code:
[16:33:49][root@rwb069(5)]:~
(0)#: systemctl list-jobs
JOB UNIT TYPE STATE
337 pvesr.service start running
550 pve-ha-lrm.service restart running
2 jobs listed.
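If it comes to that, we assume the two stuck jobs could be cancelled by ID so the hanging try-restart (and with it dpkg) can finish, roughly like this (job IDs taken from the listing above; we are not sure this is safe):
Code:
# cancel the queued/running systemd jobs that block the pve-ha-manager postinst
systemctl cancel 337 550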
Code:
[16:37:22][root@rwb069(5)]:~
(0)#: journalctl -u pve-ha-lrm.service
-- Logs begin at Fri 2019-05-10 14:54:11 CEST, end at Fri 2019-05-10 16:37:41 CEST. --
May 10 14:54:23 rwb069 systemd[1]: Starting PVE Local HA Ressource Manager Daemon...
May 10 14:54:24 rwb069 pve-ha-lrm[3305]: starting server
May 10 14:54:24 rwb069 pve-ha-lrm[3305]: status change startup => wait_for_agent_lock
May 10 14:54:24 rwb069 systemd[1]: Started PVE Local HA Ressource Manager Daemon.
May 10 16:28:14 rwb069 systemd[1]: Stopping PVE Local HA Ressource Manager Daemon...
[16:37:45][root@rwb069(5)]:~
(0)#: journalctl -u pvesr.service
-- Logs begin at Fri 2019-05-10 14:54:11 CEST, end at Fri 2019-05-10 16:37:57 CEST. --
May 10 14:55:00 rwb069 systemd[1]: Starting Proxmox VE replication runner...
May 10 14:55:00 rwb069 pvesr[3398]: trying to acquire cfs lock 'file-replication_cfg' ...
May 10 14:55:01 rwb069 pvesr[3398]: trying to acquire cfs lock 'file-replication_cfg' ...
May 10 14:55:02 rwb069 pvesr[3398]: trying to acquire cfs lock 'file-replication_cfg' ...
May 10 14:55:03 rwb069 pvesr[3398]: trying to acquire cfs lock 'file-replication_cfg' ...
May 10 14:55:04 rwb069 pvesr[3398]: trying to acquire cfs lock 'file-replication_cfg' ...
May 10 14:55:05 rwb069 pvesr[3398]: trying to acquire cfs lock 'file-replication_cfg' ...
May 10 14:55:06 rwb069 pvesr[3398]: trying to acquire cfs lock 'file-replication_cfg' ...
May 10 14:55:07 rwb069 pvesr[3398]: trying to acquire cfs lock 'file-replication_cfg' ...
May 10 14:55:08 rwb069 pvesr[3398]: trying to acquire cfs lock 'file-replication_cfg' ...
May 10 14:55:09 rwb069 pvesr[3398]: error with cfs lock 'file-replication_cfg': no quorum!
May 10 14:55:09 rwb069 systemd[1]: pvesr.service: Main process exited, code=exited, status=13/n/a
May 10 14:55:09 rwb069 systemd[1]: Failed to start Proxmox VE replication runner.
May 10 14:55:09 rwb069 systemd[1]: pvesr.service: Unit entered failed state.
May 10 14:55:09 rwb069 systemd[1]: pvesr.service: Failed with result 'exit-code'.
May 10 14:56:00 rwb069 systemd[1]: Starting Proxmox VE replication runner...
May 10 14:56:00 rwb069 pvesr[3516]: trying to acquire cfs lock 'file-replication_cfg' ...
May 10 14:56:01 rwb069 pvesr[3516]: trying to acquire cfs lock 'file-replication_cfg' ...
May 10 14:56:02 rwb069 pvesr[3516]: trying to acquire cfs lock 'file-replication_cfg' ...
May 10 14:56:03 rwb069 pvesr[3516]: trying to acquire cfs lock 'file-replication_cfg' ...
May 10 14:56:04 rwb069 pvesr[3516]: trying to acquire cfs lock 'file-replication_cfg' ...
May 10 14:56:05 rwb069 pvesr[3516]: trying to acquire cfs lock 'file-replication_cfg' ...
May 10 14:56:06 rwb069 pvesr[3516]: trying to acquire cfs lock 'file-replication_cfg' ...
May 10 14:56:07 rwb069 pvesr[3516]: trying to acquire cfs lock 'file-replication_cfg' ...
May 10 14:56:08 rwb069 pvesr[3516]: trying to acquire cfs lock 'file-replication_cfg' ...
May 10 14:56:09 rwb069 pvesr[3516]: error with cfs lock 'file-replication_cfg': no quorum!
May 10 14:56:09 rwb069 systemd[1]: pvesr.service: Main process exited, code=exited, status=13/n/a
May 10 14:56:09 rwb069 systemd[1]: Failed to start Proxmox VE replication runner.
May 10 14:56:09 rwb069 systemd[1]: pvesr.service: Unit entered failed state.
May 10 14:56:09 rwb069 systemd[1]: pvesr.service: Failed with result 'exit-code'.
May 10 14:57:00 rwb069 systemd[1]: Starting Proxmox VE replication runner...
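The "no quorum!" errors suggest the upgraded node has dropped out of the cluster; we assume the membership/quorum state could be checked with the standard tools, something like this (not from our actual session):
Code:
# check cluster membership and quorum on the upgraded node
pvecm status
corosync-quorumtool -s
# and the cluster filesystem / corosync services themselves
systemctl status pve-cluster corosync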
Please advise.