[SOLVED] restarting lxcfs kicks running containers in the balls

grin

How to restart lxcfs, or, rather, how to resurrect /proc on running containers?

If lxcfs gets restarted for one reason or another, all the CTs choke:

Code:
# ps awuxf
Error: /proc must be mounted
  To mount /proc at boot you need an /etc/fstab line like:
      proc   /proc   proc    defaults
  In the meantime, run "mount proc /proc -t proc"

Is there any way
  1. to keep them from losing /proc when lxcfs restarts, and
  2. to give them /proc back when they lose it anyway?
 
That's why it gets reloaded on upgrades rather than restarted ;) I think you'll need to restart those containers, and avoid restarting lxcfs in the future (it has a built-in live-reloading mechanism for that reason).
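
A minimal sketch of both pieces of advice, under some assumptions: the host runs systemd and the lxcfs unit defines a reload action, `pct list` prints the VMID in the first column and the status in the second, and an unreadable /proc/uptime inside a container is a fair sign that its lxcfs mounts are gone. The function names `reload_lxcfs` and `list_broken_cts` are made up for illustration.

```shell
#!/bin/sh
# Sketch only: assumes a Proxmox VE host with systemd, lxcfs and pct.

# Ask lxcfs to live-reload instead of restarting it, so the mounts that
# running containers rely on for their /proc files stay connected.
# Assumption: the lxcfs unit defines a reload action.
reload_lxcfs() {
    systemctl reload lxcfs
}

# Print the IDs of running containers in which /proc/uptime cannot be
# read, i.e. the ones that most likely need a restart.
# Assumption: /proc/uptime is one of the files lxcfs provides in the CT.
list_broken_cts() {
    pct list | awk 'NR > 1 && $2 == "running" { print $1 }' |
    while read -r vmid; do
        # </dev/null keeps pct exec from swallowing the list of IDs.
        if ! pct exec "$vmid" -- cat /proc/uptime </dev/null >/dev/null 2>&1; then
            echo "$vmid"
        fi
    done
}
```

Running `list_broken_cts` after an accidental lxcfs restart would give the list of containers to restart; `reload_lxcfs` is what to use in place of a restart from then on.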
 
What exactly should we mention? "Don't restart random services, things might break"? We already ensure things get restarted or reloaded on upgrades, depending on which is the appropriate solution. There is no need for manual action unless instructed to do so (e.g., after manually applying some test/debug patch, or a guest restart when needed to enable new features).
 
There are numerous times when PVE chokes: the UI turns gray, all tasks get stuck, the system goes into iowait, becomes unresponsive, catches fire and explodes.

Most of the time it requires some services to be restarted, usually after the culprit is found and eradicated, be that some STOP tasks that never finish but put the CT in iowait, rbd that was screwed up by outside factors (in the last month usually pbs), or various network-related problems which should not have caused trouble but did anyway.

Most often we have to restart pvedaemon and pveproxy repeatedly, but sometimes that isn't enough and corosync needs to be restarted too, along with various other daemons (which we have become familiar with in due course). Since not all of them are documented in detail, we sometimes have to make [more or less] educated guesses about which one may cause (or resolve) the problem. If you call this "random", then that is what is sometimes needed to get stuff fixed. Most of the time there isn't enough data to report the issue, which is why you don't see me complaining monthly. Nevertheless, it's not really fair to say the services were "randomly" restarted: lxcfs was restarted when the "usual" restarts did not solve the "stuck CT" problem, where a CT can't be stopped by any means (not even kill -9 or the lxc tools directly), and it is one of the "last resort" acts. (Or it was, since, as it turns out, it would never help.)

If the documentation had mentioned "never restart this daemon, since it will never fix any problem but all CTs will lose /proc until they are restarted", then obviously I would not have tried to fix the CT problem by restarting the (pretty closely related) lxcfs daemon. I believe this was a reasonable action considering the problem and the available information.

The main cause of these problems is that CT online migration does not work; if it did, I would just migrate the CTs away and restart the node. But since that's not possible, it's not easy to actually shut everything down, and some of these don't start that fast either. I have not seen any progress on that in recent years: neither real online migration, nor something mixed with suspend or hibernate modes. Shutdown is usually not preferred.

That is the longer explanation.
 
