Backups often leave my PVE in ? state

timdonovan

pve-manager/8.0.3/bbf3993334bfa916

I run a large snapshot backup to a PBS once a week. One of the guests in that job is an LXC that has an NFS mount passed through to it. Quite often the backup seems to hang (no error), which leaves one of my LXCs locked and the entire node with ?'s everywhere. I can still access the node. I've tried the various commands to restart all the PVE services, and 50% of the time it works 100% of the time.

I'm not sure where I should be looking for errors or the source of these hangups. /var/log/syslog doesn't contain anything useful at the time of the backup.
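One place that may be worth checking is the kernel log, since NFS timeouts and blocked tasks are reported by the kernel and may not stand out in /var/log/syslog. The following is only a sketch: the function name filter_hang_hints is made up for illustration, and the patterns assume the usual kernel wording for an unresponsive NFS server and for hung tasks.

```shell
#!/bin/sh
# Hypothetical helper: keep only log lines that hint at an NFS stall
# or at a task stuck in uninterruptible sleep. Reads log text on stdin.
filter_hang_hints() {
    grep -E -i 'nfs: server .* not responding|blocked for more than [0-9]+ seconds'
}

# Possible usage (kernel messages from the current boot):
#   journalctl -k -b | filter_hang_hints
```

If this prints nothing around the time of the backup, the hang is probably not being reported as an NFS timeout, which would itself be useful to know.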

My /etc/fstab is:

192.168.1.206:/Media /mnt/zee/Media nfs auto,rw,noatime,nolock,bg,soft,nfsvers=4,intr,tcp,timeo=50,retrans=5,actimeo=10,retry=5 0 0
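That option string is hard to review on one line. As a small aid, the sketch below splits the options of every NFS entry onto separate lines. The function name list_nfs_options is hypothetical, and it assumes standard whitespace-separated fstab fields (device, mount point, type, options).

```shell
#!/bin/sh
# Hypothetical helper: read fstab-formatted lines on stdin and, for each
# NFS entry, print the mount point followed by one option per line.
list_nfs_options() {
    awk '$1 !~ /^#/ && $3 ~ /^nfs/ {
        print "mount: " $2
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++) print "  " opts[i]
    }'
}

# Possible usage:
#   list_nfs_options < /etc/fstab
#   list_nfs_options < /proc/mounts   # same field layout, options actually in effect
```

Comparing the /etc/fstab output with the /proc/mounts output should show whether options such as soft and timeo=50 were really applied. Note that timeo is given in tenths of a second, so timeo=50 means 5 seconds per attempt.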

I do the usual, kill the vzdump process, then:

service pve-cluster stop
service corosync stop
service pvestatd stop
service pveproxy stop
service pvedaemon stop
service pve-cluster start
service corosync start
service pvestatd start
service pveproxy start
service pvedaemon start
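The same sequence can be wrapped in a small script so it is applied in the same order every time. This is a sketch only: the names run, restart_pve_services and the DRY_RUN variable are made up here, and the script defaults to printing the commands instead of executing them.

```shell
#!/bin/sh
# Services in the order used above.
PVE_SERVICES="pve-cluster corosync pvestatd pveproxy pvedaemon"

# Hypothetical wrapper: print the command when DRY_RUN is 1 (the default),
# execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Stop every service first, then start them again in the same order.
restart_pve_services() {
    for svc in $PVE_SERVICES; do
        run service "$svc" stop
    done
    for svc in $PVE_SERVICES; do
        run service "$svc" start
    done
}

# Possible usage:
#   restart_pve_services             # dry run, prints the ten commands
#   DRY_RUN=0 restart_pve_services   # actually restarts the services
```

The dry-run default is deliberate, so that sourcing or testing the script on a healthy node does not restart anything by accident.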

The node then shows as online, and my VMs show as online.

All my storage and LXCs still show ?, even though pvesm status shows all storage as online.
 
Actually this time it's totally broken my PVE node. I even hard powered it off/on.

Only a few LXCs start. The rest all error with:

Oct 02 11:05:34 proxmox-1 cgroup-network[8727]: Cannot open pid_from_cgroup() file '/sys/fs/cgroup/lxc/305/tasks'.
Oct 02 11:05:34 proxmox-1 cgroup-network[8727]: Cannot open pid_from_cgroup() file '/sys/fs/cgroup/lxc/305/ns/tasks'.
Oct 02 11:05:34 proxmox-1 cgroup-network[8727]: Cannot open pid_from_cgroup() file '/sys/fs/cgroup/lxc/305/ns/dev-mqueue.mount/tasks'.
Oct 02 11:05:34 proxmox-1 cgroup-network[8727]: Cannot open pid_from_cgroup() file '/sys/fs/cgroup/lxc/305/ns/user.slice/tasks'.

All storage (even local) shows ?.

Issuing "pvesm status" hangs, and causes the node and all LXCs to go to ? again.
 
Basically it seems like an NFS mount on the host is causing the PVE node all sorts of issues. I can't even manually mount the NFS share anymore. At a guess, one of the LXCs stuck in ? state is locking the path or something. The Proxmox storage manager is really confused and hangs.
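To test whether a given path is responsive without tying up the shell, a probe with a time limit may help. This is a sketch under assumptions: probe_mount is a made-up name, the 5 second default is arbitrary, and it relies on timeout and stat from coreutils. A process stuck in uninterruptible sleep on a dead NFS server may still not exit promptly, even when sent SIGKILL.

```shell
#!/bin/sh
# Hypothetical helper: stat a path with a time limit and report whether
# it answered. Arguments: path, optional limit in seconds (default 5).
probe_mount() {
    path="$1"
    limit="${2:-5}"
    if timeout -s KILL "$limit" stat -t "$path" >/dev/null 2>&1; then
        echo "ok $path"
    else
        # Either the path does not exist or stat did not return in time.
        echo "unresponsive-or-missing $path"
    fi
}

# Possible usage:
#   probe_mount /mnt/zee/Media
#   probe_mount /mnt/zee/Media 10
```

Running this against each NFS mount point before calling pvesm status might show which mount is the one that blocks.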

I don't think a simple NFS mount should cause PVE this much trouble, should it? :/ Is there a safer way to mount NFS, one that doesn't kill the entire node whenever it tries to do anything with the mount, such as booting, starting an LXC or running a backup?

Edit: last post, but it seems this node has lost the ability to make any NFS connections at all, i.e. it's not limited to a single NFS server. It's still odd to me that this would cause the entire PVE storage service to fall over and report ? even for local / ZFS storage.
 
Been hacking at this all day. Basically ended up removing all references to NFS mounts from my PVE storage and from fstab. The node then finally came back. As soon as I add NFS shares back (to any location), the node goes down again, pvesm status hangs forever, and so on. I really don't know what is going on.

The NFS mounts work fine on my other nodes.
 
