Hi all,
I have not yet updated my Proxmox clusters to 5.3 because I was too busy, but I am now in the process of doing so, and as usual I test the procedure on a test cluster first. It is a three-node cluster built with nested virtualization (three virtual nodes on a single physical server), and it uses Ceph with FileStore at this stage.
As it is not a production cluster, it is not supported and uses the no-subscription repository. It was on version 5.2.
So, for the upgrade, I went as usual with apt update, then apt dist-upgrade. The first node went well: I rebooted and everything was fine (PVE cluster and Ceph OK). Then I moved to the second node, where the dist-upgrade terminated with an error on postfix:
Code:
# apt dist-upgrade
...
Errors were encountered while processing:
/tmp/apt-dpkg-install-rAWCBC/071-postfix_3.1.9-0+deb9u2_amd64.deb
E: Sub-process /usr/bin/dpkg returned an error code (1)
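In hindsight, to see what the postfix maintainer script actually failed on, something like this should help (a minimal sketch; I did not capture that output at the time):
Code:
# show the end of the apt/dpkg log, which contains the postinst error message
tail -n 50 /var/log/apt/term.log
# re-run the maintainer scripts of half-configured packages
dpkg --configure -a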
Part of the packages were installed, so I retried dist-upgrade and got these errors:
Code:
Reading changelogs... Done
Preconfiguring packages ...
dpkg: file-rc: dependency problems, but removing anyway as you requested:
initscripts depends on sysv-rc | file-rc | openrc; however:
Package sysv-rc is not installed.
Package file-rc is to be removed.
Package openrc is not installed.
sysvinit-core depends on sysv-rc | file-rc | openrc; however:
Package sysv-rc is not installed.
Package file-rc is to be removed.
Package openrc is not installed.
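To see which init-related packages were actually present at that point, a quick dpkg query like this is useful (a minimal sketch):
Code:
# ii = installed, rc = removed but config remains; missing packages go to stderr
dpkg -l systemd-sysv sysvinit-core sysv-rc file-rc openrc 2>/dev/null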
At this point, PVE and Ceph were completely broken and /etc/pve was empty. For example, trying to retrieve the cluster status gives this:
Code:
root@prox-nest2:~# pvecm status
ipcc_send_rec[1] failed: Connection refused
ipcc_send_rec[2] failed: Connection refused
ipcc_send_rec[3] failed: Connection refused
Unable to load access control list: Connection refused
And systemd is also broken:
Code:
root@prox-nest2:~# systemctl
Failed to list units: No such method 'ListUnitsFiltered'
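For anyone hitting the same thing, this is a quick way to confirm what is actually running as PID 1 (a sketch, not the exact commands I ran at the time):
Code:
# if this prints 'init' instead of 'systemd', the init system has been switched
ps -p 1 -o comm=
# check which binary and package currently provide /sbin/init
readlink -f /sbin/init
dpkg -S /sbin/init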
After much research, I found that systemd-sysv had been removed by the installation of some packages (presumably the ZFS ones), and I succeeded in reinstalling it:
Code:
root@prox-nest2:~# apt install systemd-sysv
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages will be REMOVED:
sysvinit-core
The following NEW packages will be installed:
systemd-sysv
0 upgraded, 1 newly installed, 1 to remove and 3 not upgraded.
Need to get 82.0 kB of archives.
After this operation, 121 kB disk space will be freed.
Do you want to continue? [Y/n]
So sysvinit-core had been installed and systemd-sysv removed. After I reinstalled systemd-sysv, almost everything was recovered, but the ZFS packages were not upgraded, and if I try, I get this:
Code:
root@prox-nest2:~# apt dist-upgrade
Reading package lists... Done
Building dependency tree
Reading state information... Done
Calculating upgrade... Done
The following package was automatically installed and is no longer required:
startpar
Use 'apt autoremove' to remove it.
The following packages will be REMOVED:
systemd-sysv sysv-rc
The following NEW packages will be installed:
file-rc sysvinit-core
The following packages have been kept back:
zfs-initramfs zfs-zed zfsutils-linux
0 upgraded, 2 newly installed, 2 to remove and 3 not upgraded.
Need to get 0 B/173 kB of archives.
After this operation, 100 kB of additional disk space will be used.
Do you want to continue? [Y/n]
And I say 'no', because I don't want systemd-sysv to be removed and break everything again.
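To see exactly why apt wants to make that swap, without applying anything, a simulated install of the kept-back packages can be used (a minimal sketch):
Code:
# -s / --simulate only prints the planned actions, nothing is installed or removed
apt-get install -s zfsutils-linux zfs-initramfs zfs-zed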
After this, I went to the third node, and things went even worse. The upgrade also failed on postfix. I tried to fix things with 'dpkg --configure -a', and after the packages were configured, seeing that the new kernel was installed, I rebooted. After the reboot I had no more network (the vmbr0 interface was gone) and the root filesystem was read-only. I had to manually set an IP on eth0 and remount root read-write in order to reinstall systemd-sysv, as before. After that and another reboot, I recovered both PVE and Ceph...
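For reference, the manual recovery was roughly along these lines (the IP addresses are placeholders for my setup):
Code:
# bring up a temporary address on the physical interface, since vmbr0 was gone
ip addr add 192.168.1.23/24 dev eth0
ip link set eth0 up
ip route add default via 192.168.1.1
# remount the root filesystem read-write
mount -o remount,rw /
# put systemd back in charge of init, as on the second node
apt install systemd-sysv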
At this point, I have these package versions:
Code:
root@prox-nest2:~# pveversion -v
proxmox-ve: 5.3-1 (running kernel: 4.15.18-12-pve)
pve-manager: 5.3-11 (running version: 5.3-11/d4907f84)
pve-kernel-4.15: 5.3-3
pve-kernel-4.15.18-12-pve: 4.15.18-35
pve-kernel-4.15.17-3-pve: 4.15.17-14
pve-kernel-4.13.8-2-pve: 4.13.8-28
pve-kernel-4.13.4-1-pve: 4.13.4-26
pve-kernel-4.10.17-3-pve: 4.10.17-23
ceph: 12.2.11-pve1
corosync: 2.4.4-pve1
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.1-3
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-47
libpve-guest-common-perl: 2.0-20
libpve-http-server-perl: 2.0-12
libpve-storage-perl: 5.0-39
libqb0: 1.0.3-1~bpo9
lvm2: 2.02.168-pve6
lxc-pve: 3.1.0-3
lxcfs: 3.0.3-pve1
novnc-pve: 1.0.0-3
proxmox-widget-toolkit: 1.0-23
pve-cluster: 5.0-33
pve-container: 2.0-35
pve-docs: 5.3-3
pve-edk2-firmware: 1.20181023-1
pve-firewall: 3.0-18
pve-firmware: 2.0-6
pve-ha-manager: 2.0-8
pve-i18n: 1.0-9
pve-libspice-server1: 0.14.1-2
pve-qemu-kvm: 2.12.1-2
pve-xtermjs: 3.10.1-2
qemu-server: 5.0-47
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.9-pve1~bpo9
So at this point I am not very confident about going ahead with the upgrade of the production clusters. The problem seems related to the recent posts about the upgrade to kernel 4.15.18-12:
https://forum.proxmox.com/threads/upgrade-to-kernel-4-15-18-12-pve-goes-boom.52564/
I don't know whether I will encounter the same problems with the enterprise repository. So, is there a dependency problem with the latest Proxmox updates?
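In the meantime, a simulated upgrade at least shows the planned removals before committing to anything (a sketch using apt's simulation mode):
Code:
# nothing is changed; the 'REMOVED' section will show if systemd-sysv is about to go
apt-get dist-upgrade -s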