Looking for some help diagnosing my problem and maybe even finding a solution.
I have a Supermicro X9DRi-LN4F+ (dual CPU, 32GB) that was running PVE flawlessly until I took it offline to add a GPU for a Windows VM. The PVE install is running off a USB stick rather than an SSD.
There's also an LSI MegaRAID 9240-4i, but it's only used for storage, and all the drives are passed through as single-drive RAID 0s into a ZFS pool. Last but not least, there's a Supermicro NVMe-to-PCIe card with a single NVMe drive that I use for VMs.
I made a bunch of bone-headed mistakes along the way and now can't seem to unbork it or even determine what specifically is borked.
1. I moved the two existing cards around to improve airflow off the GPU fans and RAID heatsink and didn't note where the cards were originally.
2. I added the GPU without adding the necessary blacklist options to GRUB.
3. When it didn't boot, I pulled the GPU card and tried to reset the cards to their original slots (but I'm not 100% sure I've got it right).
4. Even after pulling the GPU, the server still won't come back up.
5. Using the Rescue option from the 6.2.1 installer CD hangs.
6. Pressing E at the boot menu and changing the kernel options from quiet to debug sheds a little light, but not much.
The last entries from the console during boot are:
Starting Load/save random seed
Started Load/save random seed
*Looooong pause*
"sent watchdog=1 notification"
So I'm not sure what the problem is, but I think the random seed isn't the cause of the hang, because the "Started" entry should mean it completed its startup.
It's probable that either moving the cards around or adding the GPU screwed something up.
I don't think it's the USB boot drive, because the rescue CD is failing too; but it's possible the USB drive was originally 5.2, was upgraded over time to 6.1 or 6.2 using the CLI package manager, and is now borked like the CD (i.e., the 6.2.1 CD rescue option has the same bug with UEFI installs as the updated USB system).
Any help would be mightily appreciated.