linux 6.8.8-2-pve
pve-manager 8.2.4
7-node cluster
I'm having nodes go grey-question-mark in the UI; no problems with corosync or the underlying storage. Restarting pvestatd brings the UI back, except for CPU/load/memory stats. And it tends to keep hanging unless I reboot, after which it will...
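For reference, this is all I'm doing to bring it back (standard systemd commands, nothing Proxmox-specific beyond the unit name):
# restart the stats daemon and confirm it came back
systemctl restart pvestatd
systemctl status pvestatd
# look for hangs/timeouts in its recent log output
journalctl -u pvestatd --since "1 hour ago"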
100% correct. I found the unimplemented feature request here:
https://bugzilla.proxmox.com/show_bug.cgi?id=1007
and this discussion:
https://forum.proxmox.com/threads/why-do-bind-mounts-prevent-snapshots.85495/
Annoyingly, the mount points are cephfs, which supports snapshots. I'm not sure...
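For context, the mount points in question are plain host-path bind mounts along these lines (the paths here are placeholders, but the mp syntax is the standard pct one):
# hypothetical bind mount of a cephfs directory from the host into the container
mp0: /mnt/pve/cephfs/shared,mp=/shared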
I'm running the latest PVE with containers backed by Ceph. At some point in the past I stopped being able to snapshot containers. Backup jobs *can and do* make snapshots, but external tools are failing; for example:
INFO: filesystem type on dumpdir is 'ceph' -using /var/tmp/vzdumptmp547208_105 for...
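To be concrete, the most basic reproduction I can think of is a plain CLI snapshot (CTID taken from the log line above; snapshot name arbitrary):
# try a manual snapshot of CT 105
pct snapshot 105 testsnap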
I have this problem as well (both dashboard modules). Will this actually be ported to ceph 17.x? The ceph mailing list says "latest versions" — currently, 17.2.6 is the latest 17.x release.
Locked up again, on a different host (no clear pattern of LXC/host combination yet).
Here's the backup job:
https://pastebin.com/xxvUiKWS
and /var/log/messages:
https://pastebin.com/NSRnjxS7
It certainly appears as if something about the snapshot process is not playing nicely...
@dietmar There's a mix; currently they are on separate VLANs, and on some hosts they share physical ports (I'm using openvswitch). No other cluster operations are failing. The problem appears to be intermittent, but possibly narrowed down to only one or two LXCs.
Interestingly, there also appear to be rbd errors at the time of the failure:
Jul 30 00:29:04 quarb kernel: [144026.433830] rbd: rbd2: write at objno 1056 2686976~40960 result -108
Jul 30 00:29:04 quarb kernel: [144026.433841] rbd: rbd2: write result -108
Jul 30 00:29:04 quarb...
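-108 should be ESHUTDOWN, which as far as I can tell usually means the kernel RBD client lost its session to the cluster (e.g. the client got blocklisted). Things I intend to check the next time it happens (pool/image name below is a placeholder):
# overall cluster state and any blocklisted clients
ceph health detail
ceph osd blocklist ls        # "ceph osd blacklist ls" on older Ceph releases
# watchers on the affected image (rbd2 maps to some pool/image; adjust accordingly)
rbd status rbd/vm-105-disk-0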
6-node cluster, all nodes running the latest PVE, fully updated. Underlying VM/LXC storage is Ceph. Backups -> cephfs.
In the backup job, syncfs fails, and then the following things happen:
• The node and container icons in the GUI have a grey question mark, but no functions of the UI itself appear to fail...
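When it gets into that state I want to see whether pvestatd or the backup worker is stuck in uninterruptible sleep on the cephfs mount; a quick way to list D-state processes and what they are blocked on:
# list processes in uninterruptible sleep (D state) and their wait channel
ps -eo pid,stat,wchan:32,comm | awk '$2 ~ /D/'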
Thanks @fabian. I had read the release notes about cgroup2 but for some reason I thought the directives were forward compatible. <smh> All working now.
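For posterity, the fix should amount to the cgroup2 spelling of the same device-allow line (the mount entry is unchanged); roughly:
# cgroup v2 key used by PVE 7's pure cgroupv2 layout
lxc.cgroup2.devices.allow: c 10:200 rwm
lxc.mount.entry: /dev/net/tun dev/net/tun none bind,create=file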
Even more fascinating… attempting to convert a privileged container (mknod=1 still set) to unprivileged fails and destroys the LXC:
recovering backed-up configuration from 'cephfs:backup/vzdump-lxc-105-2021_07_08-20_19_10.tar.zst'
/dev/rbd0
Creating filesystem with 4194304 4k blocks and 1048576...
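By "convert" I mean the usual restore-with-the-unprivileged-flag route, roughly this (target storage and other options omitted):
# restore the privileged CT's backup as an unprivileged container (this is the step that blows up)
pct restore 105 cephfs:backup/vzdump-lxc-105-2021_07_08-20_19_10.tar.zst --unprivileged 1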
With PVE 6.4, I had functional tun/tap (think ZeroTier) inside a privileged LXC with the following config:
lxc.cgroup.devices.allow: c 10:200 rwm
lxc.mount.entry: /dev/net/tun dev/net/tun none bind,create=file
In PVE 7, with or without features: mknod=1, ZeroTier now fails:
zerotier-one[171]...
Thanks for the reply, @shantanu. I may have misworded my question. I was wondering if you were using nomad to directly create containers (LXC) on the cluster, but it sounds like you are using nomad inside prebuilt VMs.
I'm trying to avoid the overhead of having large, not-very-portable VMs; I'd...
shantanu, are you using nomad "natively" with consul, i.e., managing containers? I'd like to get into this (already using consul) but not sure if someone's created a solution already.
What is an ACL? I'm attempting to follow the directions in the wiki for using specific UID mappings in unprivileged containers, and I'm getting the same error. How do I add a UID to a mount? Thanks.
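For reference, I'm guessing the step in question wants something along these lines, with the host directory opened up to the container's mapped UID (the UID and path below are made up):
# grant the mapped UID (host-side UID = 100000 + in-container UID with the default mapping) access to the bind-mounted dir
setfacl -m u:101000:rwx /mnt/bindmounts/shared
# verify the resulting ACL
getfacl /mnt/bindmounts/shared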
FWIW I am also getting occasional openvswitch crashes of a similar nature since the upgrade to 6. Apparently there's no watchdog reboot enabled; the system just locks up after dumping kernel logs.
I had to revert to a Linux bridge, and also removed LACP. Interestingly, OVS+LACP works just fine with different NICs. I had three nearly identical servers that experienced the problem, but intermittently: average working uptime per machine was about 36 hours, which meant that on any given day...
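The replacement config is nothing fancy; roughly this in /etc/network/interfaces (ifupdown2-style syntax, addresses and NIC name are placeholders):
# plain Linux bridge, no OVS, no bond/LACP
auto vmbr0
iface vmbr0 inet static
        address 192.0.2.10/24
        gateway 192.0.2.1
        bridge-ports eno1
        bridge-stp off
        bridge-fd 0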