Hi, my Proxmox host hard-crashes: there is no recovery and it has to be powered off. It was OK for the first month (I put it in in late March; it was a new hardware build), but towards the end of April I started getting intermittent crashes. This morning I found it had crashed overnight. I restarted it and it crashed again within a few minutes. The time after that, it lasted 30 minutes with no VMs running.
8 x Intel(R) Core(TM) i7-9700K CPU @ 3.60GHz (1 Socket), 32GB RAM, 2x 500GB NVMe drives. Kernel: Linux 5.4.73-1-pve #1 SMP PVE 5.4.73-1 (Mon, 16 Nov 2020 10:52:16 +0100)
From the crash output, I suspect the two 500GB NVMe drives. One of them is in a PCI slot via a converter, and both drives are mentioned in the latest messages on the screen. I've read some things about ZFS, but unfortunately, whilst I'm not an idiot, I have very little idea of how Linux disks work, and I don't remember exactly how I set these up. I do remember having issues with the disks and booting, and I had to endlessly switch BIOS modes during the install before Proxmox would install OK. I've run close to maximum resources for short periods without problems, yet the crashes mostly happen overnight, when the server is using barely a quarter of its resources. It's a gaming server with lots of IO, plus SQL and a web server.
I'd like some advice here, please. I'm genuinely surprised: I expect a host not to crash completely, stop responding, and bring down all of its clients. I don't fully understand the issue or what action to take.