Upgrade breaks pvestatd against CTs

grin

Active Member
Dec 8, 2008
133
6
38
Hungary
grin.hu
Mar 19 01:10:30 freddy pvestatd[2324627]: lxc status update error: can't open '/sys/fs/cgroup/blkio/lxc/117/ns/blkio.throttle.io_service_bytes' - No such file or directory
Mar 19 01:10:34 freddy pvestatd[2324627]: lxc console cleanup error: can't open '/sys/fs/cgroup/blkio/lxc/124/ns/blkio.throttle.io_service_bytes' - No such file or directory
Mar 19 01:10:40 freddy pvestatd[2324627]: lxc status update error: can't open '/sys/fs/cgroup/blkio/lxc/110/ns/blkio.throttle.io_service_bytes' - No such file or directory

After upgrading from 5.2 to 5.3, lots of container-related things break until reboot. The GUI loses all containers, and I guess starting them may not work either.

The main reason seems to be that the cgroup directory structure changed (...blkio/lxc/xxx/ns/ seems to be missing and its content is now one level up, under ../), and that pvestatd chokes on it completely.
On one machine I was able to get it running again by stopping all containers with lxc-stop; when the last one stopped, pvestatd happily finished its aborted loop and everything sorted itself out.
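
To check which of the two layouts a given container currently has, something like the following can be used. This is only a minimal Python sketch of the lookup described above, not pvestatd's actual code: the function name `resolve_cgroup_file` and its parameters are hypothetical, and the only paths assumed are the ones visible in the log lines.

```python
import os


def resolve_cgroup_file(root, group, vmid, name):
    """Return the first readable candidate for a per-container cgroup file.

    Hypothetical helper: tries the 'ns/' layout that pvestatd asks for in the
    log above, then the layout where the file sits directly under lxc/<vmid>.
    Returns None if neither file is readable.
    """
    base = os.path.join(root, group, "lxc", str(vmid))
    candidates = [
        os.path.join(base, "ns", name),  # path from the error messages
        os.path.join(base, name),        # same file, one level up
    ]
    for path in candidates:
        if os.access(path, os.R_OK):
            return path
    return None


if __name__ == "__main__":
    # Container IDs taken from the log lines; adjust to your own CTs.
    for vmid in (110, 117, 124):
        found = resolve_cgroup_file(
            "/sys/fs/cgroup", "blkio", vmid, "blkio.throttle.io_service_bytes"
        )
        print(vmid, found or "not found in either layout")
```

Trying the expected path first keeps the behaviour unchanged on hosts where the `ns/` directory exists; the second candidate only matters on hosts that are in the in-between state.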

It is obviously a very suboptimal solution, considering that you have been working on online migration for the last 4 years (to phrase it euphemistically) with little success, so stopping all containers is not possible without serious downtime.
 

grin

Actually the (somewhat verbose) "fix":

Code:
--- LXC.pm.orig 2019-03-19 01:29:09.176628691 +0100
+++ LXC.pm      2019-03-19 01:39:05.055046924 +0100
@@ -334,6 +334,12 @@
     my $nsdir = $unprivileged ? '' : 'ns/';
     my $path = "/sys/fs/cgroup/$group/lxc/$vmid/${nsdir}$name";

+    if (! -r $path && -r "/sys/fs/cgroup/$group/lxc/$vmid/$name") {
+        # nsdir is wrong: the file sits directly under lxc/$vmid --grin
+        # syslog('info', "nsdir '${nsdir}' is wrong for id=$vmid name=$name unp=$unprivileged; fixing up!");
+        $path = "/sys/fs/cgroup/$group/lxc/$vmid/$name";
+    }
+
     return PVE::Tools::file_get_contents($path) if $full;

     return PVE::Tools::file_read_firstline($path);
 

dcsapak

Proxmox Staff Member
Staff member
Feb 1, 2016
5,638
581
133
32
Vienna
"After upgrading from 5.2 to 5.3 lots of container-related things break until reboot"
this contains at least one kernel upgrade where reboots are expected and necessary anyway
the only thing 'breaking' is the collection of stats
 

grin

"this contains at least one kernel upgrade where reboots are expected and necessary"
This was a rather impolite response.

First, one doesn't restart a system unless it is absolutely necessary, and as far as I am aware there were no critical or incompatible kernel issues that would have made a reboot compulsory and immediate. Reboots often do not happen at the same time as upgrades, since an upgrade doesn't cause a service outage while a reboot does (since, as I've mentioned, online migration isn't ready just yet; not to mention misbehaving fencing).

"anyway the only thing 'breaking' is the collection of stats"
Nope. Stats do not get collected, all right, but that would be a minor annoyance (though still a bug).

However, in that state no data about any CT is collected, including running status, HA status, or anything at all; the GUI shows a gray question mark as the icon and all the CT's data is blank.
HA seems to work, at least with manual intervention, but the UI is definitely useless in that state. You can't even tell the names of the CTs, so if you decide to HA migrate (which, as a reminder, requires a restart) you can't say which is which, unless you either have an exceptionally good memory or start grepping the actual directories.
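
For anyone stuck in that state, the grepping can be scripted. The following is a rough Python sketch, under the assumption that the container configs live in /etc/pve/lxc/<vmid>.conf and carry a `hostname:` line; the function name `container_names` is made up for illustration and is not part of any Proxmox tool.

```python
import glob
import os


def container_names(conf_dir="/etc/pve/lxc"):
    """Map container ID to hostname by reading <vmid>.conf files.

    Assumes one 'key: value' pair per line. Reading stops at the first
    '[section]' header so that only the current config is considered.
    Containers without a hostname line are left out of the result.
    """
    names = {}
    for conf in sorted(glob.glob(os.path.join(conf_dir, "*.conf"))):
        vmid = os.path.splitext(os.path.basename(conf))[0]
        with open(conf) as fh:
            for line in fh:
                if line.startswith("["):
                    break
                key, sep, value = line.partition(":")
                if sep and key.strip() == "hostname":
                    names[vmid] = value.strip()
                    break
    return names


if __name__ == "__main__":
    for vmid, hostname in container_names().items():
        print(vmid, hostname)
```

This reads only the config files, so it does not depend on pvestatd or on the cgroup layout at all.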

I have also provided the fix, which, if you look at it, actually checks whether the fix is needed and applies it accordingly (since this bug will bite everyone upgrading, as it has already done several times). It would be neat to hear why it is not appropriate (from an engineering standpoint).

I would appreciate it if you approached this from the solution side, not with the "users are stupid, do what I say the way I say it" approach. :-(
 
