I noticed that two of the nodes in our cluster are greyed out now, after updating and restarting the servers.
I see this in logs:
Aug 4 18:10:57 pve-2 systemd[1]: pvestatd.service: Found left-over process 21897 (vgs) in control group while starting unit. Ignoring.
Aug 4 18:10:57 pve-2 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Aug 4 18:10:57 pve-2 systemd[1]: pvestatd.service: Found left-over process 3643 (vgs) in control group while starting unit. Ignoring.
Aug 4 18:10:57 pve-2 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Aug 4 18:10:57 pve-2 systemd[1]: pvestatd.service: Found left-over process 26649 (vgs) in control group while starting unit. Ignoring.
Aug 4 18:10:57 pve-2 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Aug 4 18:10:57 pve-2 systemd[1]: pvestatd.service: Found left-over process 29105 (vgs) in control group while starting unit. Ignoring.
Aug 4 18:10:57 pve-2 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Aug 4 18:10:57 pve-2 systemd[1]: pvestatd.service: Found left-over process 4118 (vgs) in control group while starting unit. Ignoring.
Aug 4 18:10:57 pve-2 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
and
Aug 4 18:08:00 pve-2 systemd[1]: Starting Proxmox VE replication runner...
Aug 4 18:08:01 pve-2 systemd[1]: pvesr.service: Succeeded.
Aug 4 18:08:01 pve-2 systemd[1]: Started Proxmox VE replication runner.
Aug 4 18:09:00 pve-2 systemd[1]: Starting Proxmox VE replication runner...
Aug 4 18:09:01 pve-2 systemd[1]: pvesr.service: Succeeded.
Aug 4 18:09:01 pve-2 systemd[1]: Started Proxmox VE replication runner.
Aug 4 18:09:27 pve-2 systemd[1]: pvestatd.service: State 'stop-final-sigterm' timed out. Killing.
Aug 4 18:09:27 pve-2 systemd[1]: pvestatd.service: Killing process 21897 (vgs) with signal SIGKILL.
Aug 4 18:09:27 pve-2 systemd[1]: pvestatd.service: Killing process 3643 (vgs) with signal SIGKILL.
Aug 4 18:09:27 pve-2 systemd[1]: pvestatd.service: Killing process 26649 (vgs) with signal SIGKILL.
Aug 4 18:09:27 pve-2 systemd[1]: pvestatd.service: Killing process 29105 (vgs) with signal SIGKILL.
Aug 4 18:09:27 pve-2 systemd[1]: pvestatd.service: Killing process 4118 (vgs) with signal SIGKILL.
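For what it's worth, those left-over vgs processes look genuinely stuck. This is just a rough way I've been checking them (standard ps output, nothing Proxmox-specific); a "D" state means uninterruptible sleep, which usually points at hung I/O such as a dead NFS mount:

# list any vgs processes with their state and kernel wait channel
ps -eo pid,stat,wchan:30,etime,cmd | grep '[v]gs'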
It seems that after I restart pvestatd it works fine for a while, but the NFS servers are still not accessible for backups.
When I reboot the node, however, everything works for a while until the vgs check runs; then the command freezes and never completes.
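For context, this is roughly what I run when I say I "restart pvestatd"; treat it as a sketch of my environment rather than a fix:

# restart the stats daemon and confirm it comes back up
systemctl restart pvestatd
systemctl status pvestatd --no-pager

# check whether the storages (including the NFS ones) show as active again
pvesm status

# run the same vgs check pvestatd does, with a timeout so the shell doesn't hang too
timeout 15 vgs
echo $?   # 124 here means vgs itself timed out, i.e. it is hanging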
Is anyone else experiencing this on the new Proxmox kernels?
I even tried downgrading to the following kernel:
Kernel Version: Linux 5.4.41-1-pve #1 SMP PVE 5.4.41-1 (Fri, 15 May 2020 15:06:08 +0200)
PVE Manager Version: pve-manager/6.2-10/a20769ed
But it still happens.
Could it be that these two nodes are running pve-manager 6.2-10 while the others in the cluster are still on 6.2-4?
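In case the version spread matters, this is how I'm comparing the nodes (run on each one; the grep pattern is just the packages I happen to care about):

# compare manager, kernel and LVM package versions between nodes
pveversion -v | grep -E 'pve-manager|pve-kernel|lvm2'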
Thoughts?