PVE crashing with worker # started message

ShayGus

Aug 15, 2022
My setup is crashing several times a day, with the following in the system logs:

Aug 15 14:13:28 PVE pvedaemon[65058]: <root@pam> successful auth for user 'root@pam'
Aug 15 14:17:01 PVE CRON[73712]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Aug 15 14:17:01 PVE CRON[73713]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Aug 15 14:17:01 PVE CRON[73712]: pam_unix(cron:session): session closed for user root
Aug 15 14:24:28 PVE pvedaemon[63495]: <root@pam> successful auth for user 'root@pam'
Aug 15 14:27:12 PVE pveproxy[71942]: worker exit
Aug 15 14:27:12 PVE pveproxy[1114]: worker 71942 finished
Aug 15 14:27:12 PVE pveproxy[1114]: starting 1 worker(s)
Aug 15 14:27:12 PVE pveproxy[1114]: worker 75283 started
Aug 15 14:27:24 PVE pveproxy[69715]: worker exit
Aug 15 14:27:24 PVE pveproxy[1114]: worker 69715 finished
Aug 15 14:27:24 PVE pveproxy[1114]: starting 1 worker(s)
Aug 15 14:27:24 PVE pveproxy[1114]: worker 75309 started
Aug 15 14:28:28 PVE pvedaemon[68623]: <root@pam> successful auth for user 'root@pam'
Aug 15 15:02:32 PVE pvedaemon[63495]: <root@pam> successful auth for user 'root@pam'
Aug 15 15:02:32 PVE pvedaemon[68623]: <root@pam> successful auth for user 'root@pam'
Aug 15 15:13:01 PVE pveproxy[72938]: worker exit
Aug 15 15:13:01 PVE pveproxy[1114]: worker 72938 finished
Aug 15 15:13:01 PVE pveproxy[1114]: starting 1 worker(s)
Aug 15 15:13:01 PVE pveproxy[1114]: worker 82164 started


pveversion --verbose
proxmox-ve: 7.2-1 (running kernel: 5.15.39-3-pve)
pve-manager: 7.2-7 (running version: 7.2-7/d0dd0e85)
pve-kernel-5.15: 7.2-8
pve-kernel-helper: 7.2-8
pve-kernel-5.15.39-3-pve: 5.15.39-3
pve-kernel-5.15.39-1-pve: 5.15.39-1
pve-kernel-5.15.35-1-pve: 5.15.35-3
pve-kernel-5.15.30-2-pve: 5.15.30-3
ceph-fuse: 15.2.16-pve1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve1
libproxmox-acme-perl: 1.4.2
libproxmox-backup-qemu0: 1.3.1-1
libpve-access-control: 7.2-4
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.2-2
libpve-guest-common-perl: 4.1-2
libpve-http-server-perl: 4.1-3
libpve-storage-perl: 7.2-7
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.0-3
lxcfs: 4.0.12-pve1
novnc-pve: 1.3.0-3
proxmox-backup-client: 2.2.5-1
proxmox-backup-file-restore: 2.2.5-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.5.1
pve-cluster: 7.2-2
pve-container: 4.2-2
pve-docs: 7.2-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.5-1
pve-ha-manager: 3.4.0
pve-i18n: 2.7-2
pve-qemu-kvm: 6.2.0-11
pve-xtermjs: 4.16.0-1
qemu-server: 7.2-3
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.7.1~bpo11+1
vncterm: 1.7-1
zfsutils-linux: 2.1.5-pve1
 
This is happening to me too. We have 7 very different servers, though all with Intel processors, and they have all been crashing intermittently since earlier today. Sometimes a machine stays up for a few minutes, at other times for more than half an hour.
 
The logs do not indicate any kind of problem, so maybe the relevant entries don't get written to disk when the system "crashes".
* If you have access, do you see messages on the console when the issue occurs?

Just to be on the same page: what does "crash" mean here? Does the PVE host become unresponsive so that you need to reset it, does it reset itself, does the issue go away after a while, and do the guests continue to run?
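
If the host does come back up, one way to see what was logged right before the freeze is to print the last few lines ahead of each boot marker in the syslog file. This is only a minimal sketch: it assumes the log lives at /var/log/syslog and that every boot starts with a kernel "Linux version" entry, and the function name lines_before_boot is made up for illustration.

```shell
# Print the N lines logged before each kernel boot banner in a syslog file.
# Assumption: every boot begins with a "kernel: ... Linux version" entry.
lines_before_boot() {
    local n="$1"      # how many lines of context to show before each boot
    local file="$2"   # path to a syslog-format file
    grep -B "$n" -E 'kernel: .*Linux version' "$file"
}

# Example (run on the PVE host):
#   lines_before_boot 20 /var/log/syslog
```

If those lines look as uneventful as the ones posted above, the crash itself most likely never reached the disk.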
 
The host freezes, becomes completely unresponsive, and needs a hard reset.
I don't have access to the console.
 
Same here, no log entries...
The host resets and boots up again.
I can't say for certain, but from Journalheute.txt it seems you have HA active, so my first guess is that your corosync network was not stable: the node lost quorum and fenced itself.

To get a better idea, check the other nodes' logs. They should record that the node dropped out (look for messages from corosync, pmxcfs and pve-ha-crm).
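
To pull just those messages out of a node's log, something along these lines may help. It is only a sketch: it assumes syslog-style lines where the program name is followed by a PID in brackets, as in the excerpt at the top of this thread, and cluster_msgs is a hypothetical helper name.

```shell
# Keep only syslog lines emitted by corosync, pmxcfs or pve-ha-crm.
# Assumption: lines look like "<date> <host> <program>[<pid>]: <message>".
cluster_msgs() {
    # With no file arguments, grep reads from standard input.
    grep -E ' (corosync|pmxcfs|pve-ha-crm)\[[0-9]+\]: ' "$@"
}

# Example (run on each of the other nodes):
#   cluster_msgs /var/log/syslog
#   journalctl --since "2022-08-15 14:00" | cluster_msgs
```

Compare the timestamps of any link-down or quorum messages with the time the affected node went away.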
 
For the host that freezes and needs a hard reset, with no console access: this could be a hardware issue.

One thing that could help narrow down where the problem is: configure remote syslog via UDP, so that the last messages before a freeze reach another machine.
There are many tutorials on how to set this up, e.g. the Debian wiki:
https://wiki.debian.org/Rsyslog
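
For reference, the sending side usually comes down to a single forwarding rule. This is a rough sketch, not a tested setup: the collector address 192.0.2.10, the file name 90-remote.conf and the helper forward_rule are all placeholders. In rsyslog's selector syntax a single @ forwards over UDP (@@ would be TCP).

```shell
# Print an rsyslog rule that forwards all messages to a remote host via UDP.
forward_rule() {
    local host="$1"
    local port="${2:-514}"   # 514 is the conventional syslog port
    printf '*.* @%s:%s\n' "$host" "$port"
}

# Example (as root on the PVE host; 192.0.2.10 is a placeholder address):
#   forward_rule 192.0.2.10 > /etc/rsyslog.d/90-remote.conf
#   systemctl restart rsyslog
#   logger "remote syslog test"   # should show up on the collector
```

The receiving machine also needs its UDP input enabled before anything arrives.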

I hope this helps!
 
