To keep a long story as short as I can: last week I started seeing increasing CPU IOWAIT load on my Proxmox core server (very old hardware), which seemed odd. A couple of times I had to use the OOBM to reboot the headless server, or press the power button on the front, because the Proxmox web GUI had become completely unresponsive and nothing changed after hours or even a day of waiting.
Sunday evening, I slowly started each VM and CT after letting a ZFS scrub complete, which found no issues. Everything seemed to be going great, then Monday morning it was locked up again. In a hurry on the way out the door, I physically power cycled the server, assuming it would come back online as it had the week before. It didn't. Since then, the web GUI doesn't load and I can't SSH in, but once boot gets to a certain point, the server does respond to ping.
I do have Proxmox backups on a USB HDD. I was running Proxmox Backup as one of the CTs on the core server, more as a proof of concept, and I never did set up a separate host to replace it. I have the USB drive disconnected from the server for now. I have 2 SATA SSDs, both 256GB: one for Proxmox, the other for ISO storage and VM migration from my old Hyper-V homelab. A 6x3TB ZFS RAID 10-style pool hosts my VMs and CTs. My hardware is old and on a budget, but everything seemed to be running very smoothly from April until last week. I was starting to look into logs where I had time, and I made the horrendously ignorant and stupid error of rebooting before verifying SSH access or trying a little harder to capture logs. Ugh. Shame on me.
I did run some stress tests and everything passes right now: Memtest, CPU Prime, HDD SMART, etc.
I do use GPU passthrough: the Intel iGPU to a Windows 10 VM, and the NVIDIA 1660 GPU to a Plex CT for transcoding.
I would prefer not to start over with the Proxmox install, and would rather repair what I have, but I will admit I am newer to Proxmox and ignorant to varying degrees. I migrated from Hyper-V for a number of reasons: learning this environment, learning GPU passthrough for VMs and Plex transcoding, better usage of resources, etc.
Right now the server is still headless; I'm using MeshCommander + Intel vPro to get visual access.
Here is the screen my server gets to if left to boot on its own. It stays here now because the VMs don't load. Normally, once the iGPU gets allocated to the Windows 10 VM, I lose signal as expected, and usually 1-3 minutes after reaching this point, the server's static IP starts responding to ping. Now it gets this far and the IP starts responding, but there is no web GUI, no SSH, no VMs load, nada.
So I created a new bootable Proxmox USB flash drive for UEFI boot and enabled the advanced menu, from which I've tried Rescue Boot, Test memory, and Install Proxmox VE (Terminal UI, debug mode). In all honesty, I don't get very far with any of them.
When I choose Rescue Boot, I generally end up here at which point all progress halts:
If I try the Terminal UI install for command prompt, I end up here:
But nothing is showing in my etc directory, so I can't run fsck or anything similar. Maybe I'm doing that wrong, and I intend to grab a bootable image of another Linux distro. Frankly, I was hoping to be able to do a repair install of Proxmox (that's the ignorant Windows user in me coming out lol; in-place upgrades have been a big help in rare cases).
I know I haven't provided everything useful, but hopefully this is a start.
System Specs are as follows:
CPU - i7 4770 (4C, 8T, non-k) + Thermalright aftermarket cooler (keeps it below 65C).
MB - Supermicro X10SLQ
RAM - Mushkin 32GB DDR3 (4x8GB DDR3 1600 @ 1600 CL11)
GPU - Asus Phoenix GTX 1660 Super OC 6GB in PCIe 4X Slot (NVENC Encoding for Plex)
Storage:
- SSD 1 - SATA, 256GB for Proxmox OS
- SSD 2 - SATA, 256GB for Hyper-V Migration and Conversion
RAID: Inspur LSI 9300-8i HBA in IT Mode
- 3TB HGST x 6 in ZFS RAID10
I keep all of this in a modified Lenovo TS430 case with 2x4 3.5" backplanes. This is my ongoing homelab on a budget (parent budget).
Things I plan to try:
- Boot into a different live environment and see if I can verify or repair any potential data issues.
- Avoid a clean reinstall of Proxmox unless there is no other option.
- If I do have to clean install Proxmox, sort out how to restore the backed-up VMs from the USB HDD, and also run Proxmox Backup on a mini PC I just picked up (it was going to be my pfSense box upgrade from my older box, but Proxmox Backup is the priority atm).
Please help guide me. Thanks in advance for any help I receive.
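Since the first symptom was rising IOWAIT and I never captured any numbers, here is a minimal sketch of how I could log it once I have a shell on the box again. It assumes the standard Linux /proc/stat layout (user, nice, system, idle, iowait, irq, softirq, steal on the aggregate "cpu" line); the function names are just ones I made up for illustration, not anything Proxmox ships.

```python
import time


def parse_cpu_line(stat_text):
    """Return the integer counters from the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        # Match only the aggregate line, not per-core lines such as 'cpu0'.
        if fields and fields[0] == "cpu":
            return [int(value) for value in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two /proc/stat snapshots."""
    prev = parse_cpu_line(before)
    curr = parse_cpu_line(after)
    # Use the first 8 counters only; guest time is already counted in user time.
    deltas = [c - p for p, c in zip(prev[:8], curr[:8])]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total  # index 4 is the iowait counter


def read_proc_stat(path="/proc/stat"):
    """Read the current contents of /proc/stat (Linux only)."""
    with open(path) as handle:
        return handle.read()


def log_iowait(samples=5, interval=2.0):
    """Print a timestamped iowait percentage for each sampling interval."""
    previous = read_proc_stat()
    for _ in range(samples):
        time.sleep(interval)
        current = read_proc_stat()
        stamp = time.strftime("%H:%M:%S")
        print(f"{stamp} iowait {iowait_percent(previous, current):.1f}%")
        previous = current
```

Calling log_iowait(samples=30, interval=10) over SSH and redirecting the output to a file would at least give me something to post the next time the load climbs, instead of rebooting blind.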