Hi All,
EDIT: Here's the fix. It's clean for most setups; things like Home Assistant OS may or may not work, I'm still finding out. Ultimately, you'll need to create a file in /etc/docker/ called daemon.json with the following contents:
Code:
{
  "storage-driver": "vfs"
}
This will restore your old configuration, and you should be good to go. This only applies if you didn't previously explicitly set the storage driver for Docker. Previously, Docker detected VFS and used it without issue. If the driver isn't explicitly set in daemon.json, Docker now incorrectly picks overlayfs, which is not a functional configuration for nested Docker inside an LXC on Proxmox.
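For anyone else hitting this, here's a minimal sketch of applying the fix. It writes to a temp directory so it can be tried as a dry run; on a real system, write straight to /etc/docker/daemon.json inside the affected LXC and restart Docker:

```shell
# Dry-run sketch: writes the config to a temp dir so it can be tested
# without touching a live system; the real path is /etc/docker/daemon.json.
DOCKER_ETC="${DOCKER_ETC:-/tmp/docker-etc-demo}"
mkdir -p "$DOCKER_ETC"
cat > "$DOCKER_ETC/daemon.json" <<'EOF'
{
  "storage-driver": "vfs"
}
EOF
# Sanity-check the JSON before pointing Docker at it:
python3 -m json.tool "$DOCKER_ETC/daemon.json"
# On the real system, follow up with:
#   systemctl restart docker
#   docker info --format '{{.Driver}}'   # should report: vfs
```

Note that VFS makes full copies of each image layer instead of sharing them, so it uses noticeably more disk space than overlay2; that's the trade-off for it working inside an unprivileged LXC here.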
I updated my 2 hosts (not clustered) and my backup box earlier today with the packages from the free release repo. I've had nothing but trouble since.
1. I run Docker inside LXC containers. All the Docker containers disappeared after the upgrade. I was able to reload a few since I have the compose files, yet many just won't work anymore. It looks like some sort of permissions error, and it happens on both physical hosts. Here's an example; this error occurs on MOST Docker containers (though not all) when I try to use compose to bring them back:
"ApplyLayer exit status 1 stdout: stderr: unlinkat /tmp/patch/etc/s6/init: invalid argument"
On the guest LXC, running docker-compose, this happens before it dies; no deployment occurs. It looks to me like something is up with AppArmor/permissions, and there are several threads that point to this, yet no actual resolutions.
I also applied the same patches on my PBS guest, after which nothing could talk to the host anymore; everything timed out. The box would also no longer reboot, and I had to hard power it down. Luckily it was virtual, so I was able to restore it, and it now works again. It was showing hung process errors; unfortunately, I don't have those errors available since I restored.
Errors on the physical host, pretty generic for backups:
"pve pvestatd[9652]: backup_server: error fetching datastores - 500 Can't connect to 172.16.25.99:8007"
A few other errors of note in the syslog:
"Dec 23 01:41:13 pve kernel: [ 5220.738979] overlayfs: upper fs does not support RENAME_WHITEOUT.
Dec 23 01:41:13 pve kernel: [ 5220.755935] overlayfs: fs on '/var/lib/docker/overlay2/l/4DSKVQHYOJVZ4CNXH6MTZFDSRP' does not support file handles, falling back to xino=off."
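Those overlayfs messages fit the ZFS angle: overlayfs requires RENAME_WHITEOUT support from the upper filesystem, which ZFS (backing these unprivileged LXCs) doesn't provide. A quick sketch to check what filesystem backs Docker's data root inside the container (assumes GNU coreutils df; /var/lib/docker is Docker's default data root):

```shell
# Show the filesystem type backing Docker's data root (default path).
# If this reports "zfs", overlayfs can't work there, which matches the
# RENAME_WHITEOUT errors and explains why forcing the vfs driver helps.
df --output=fstype /var/lib/docker 2>/dev/null || df --output=fstype /
```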
"Dec 23 01:32:37 pve systemd[2453747]: /usr/lib/environment.d/99-environment.conf:3: invalid variable name "export PBS_REPOSITORY", ignoring."
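That environment.d warning looks unrelated to the Docker issue: systemd's environment.d files take plain KEY=value assignments, not shell syntax, so the `export` keyword makes the line invalid. A sketch of the corrected line (the repository value here is a hypothetical placeholder; keep whatever value the file already has):

```
# /usr/lib/environment.d/99-environment.conf
# environment.d syntax is KEY=value, with no "export" keyword
PBS_REPOSITORY=user@pam@backup-host:datastore
```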
So far I have tried pinning the previous kernel, with no difference. I also tried the v6 kernel just for kicks; also no difference, and since it's advised not to use it at the moment, I'm not going to pursue that as a solution. I've also run update-grub and update-initramfs just to be sure everything was kosher; both ran with no issues, yet didn't fix anything.
My boxes are all 100% ZFS, and there was a ZFS update included, hence the mention. I haven't done a ton of troubleshooting on Proxmox/Linux (still learning), so please let me know if I can provide anything else. I'm hoping for a quick fix, as I'm guessing I'm not alone with this latest update. Any help would be appreciated, even if it's rolling back. I back up my physical hosts, yet I'm a little concerned about restoring them when some things are working properly (VMs and containers without nested Docker), and I don't want to mess things up further. So if there are particular things I can restore, that'd be an option. I also looked into rolling back the updates, yet couldn't find a great way to do that. Thanks all!
I've also been spammed with this error every minute in syslog for quite some time. It was there before the patching, yet I'm not sure why:
"Dec 23 01:19:53 pve lxcfs[8066]: utils.c: 324: read_file_fuse: Write to cache was truncated"
Example of errors inside one of the LXCs running Docker that has the issue; note they are all unprivileged. Again, this points to permissions issues.
Dec 22 13:50:23 notify rsyslogd[109]: imklog: cannot open kernel log (/proc/kmsg): Permission denied.
Dec 22 13:50:23 notify rsyslogd[109]: activation of module imklog failed [v8.2112.0 try https://www.rsyslog.com/e/2145 ]
Dec 22 13:50:25 notify ebpf.plugin[342]: PROCFILE: Cannot open file '/proc/442/status'
Dec 22 13:50:25 notify ebpf.plugin[342]: Cannot open /proc/442/status
Installed host software packages, the same versions on both malfunctioning boxes:
Code:
proxmox-ve: 7.3-1 (running kernel: 5.15.83-1-pve)
pve-manager: 7.3-4 (running version: 7.3-4/d69b70d4)
pve-kernel-5.15: 7.3-1
pve-kernel-helper: 7.3-1
pve-kernel-5.15.83-1-pve: 5.15.83-1
pve-kernel-5.15.74-1-pve: 5.15.74-1
ceph-fuse: 14.2.21-1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.3
libproxmox-backup-qemu0: 1.3.1-1
libpve-access-control: 7.3-1
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.3-1
libpve-guest-common-perl: 4.2-3
libpve-http-server-perl: 4.1-5
libpve-storage-perl: 7.3-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.0-3
lxcfs: 4.0.12-pve1
novnc-pve: 1.3.0-3
openvswitch-switch: 2.15.0+ds1-2+deb11u1
proxmox-backup-client: 2.3.1-1
proxmox-backup-file-restore: 2.3.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-offline-mirror-helper: 0.5.0-1
proxmox-widget-toolkit: 3.5.3
pve-cluster: 7.3-1
pve-container: 4.4-2
pve-docs: 7.3-1
pve-edk2-firmware: 3.20220526-1
pve-firewall: 4.2-7
pve-firmware: 3.6-2
pve-ha-manager: 3.5.1
pve-i18n: 2.8-1
pve-qemu-kvm: 7.1.0-4
pve-xtermjs: 4.16.0-1
qemu-server: 7.3-2
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+2
vncterm: 1.7-1
zfsutils-linux: 2.1.7-pve1