Proxmox not booting anymore after motherboard swap

Sir-robin10

Member
Hi,

I've got a (small... well, actually a really big) issue...

I have had Proxmox running just fine for a couple of months, but now I decided to change the motherboard. Everything works fine and pops up as it should in the BIOS... HOWEVER, when I try to boot into Proxmox itself, I get some sort of weird I/O error that I don't understand at all... When I google the error, I find suggestions that the drive is full, which isn't possible, or that the drive is broken, but it would be weird for the drive to be broken if it can still be seen just fine, no?

Now, the error I get is the following:

[Attached screenshot of the error: 20200811_185016.jpg]



I have 12 HDDs connected (via 2 PCIe devices), 1 NVMe drive, 1 SSD (boot drive), and 1 internal HDD as a backup drive.

Motherboard: Aorus Pro X570
CPU: AMD 3700X
GPU: Quadro P2000


Anyone who can help me out resolving this issue? The PVE instance was 'just' migrated (hardware-wise) from motherboard A (B450 Pro Max) to motherboard B (Aorus Pro X570).


I'd love to have the PVE instance back up and running today... so I'll keep trying to troubleshoot the whole time...


Side note: all of my VMs are hosted on the NVMe drive, so they 'shouldn't' be lost... I also don't want to lose the ZFS pool that holds over 50TB of data...

Thanks in advance!

kindest regards,

Robin
 
Do you have any VMs with PCIe passthrough? Are they set to autostart on boot? IOMMU groupings and PCIe addresses would have changed when you changed the motherboard. I had a similar issue after a BIOS update, where a VM would try to reset the wrong PCIe device, making the system hang on boot.
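
If you can get to a root shell, even briefly, a quick way to see which VMs autostart and which ones reference a PCIe device might be something like this (standard PVE paths assumed):

Code:
# list every VM config that autostarts or passes through a PCIe device
grep -H 'onboot\|hostpci' /etc/pve/qemu-server/*.conf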
 

I think I had one VM with passthrough; however, when I try to move the PCI device (the GPU), the result stays the same... Is there anything else I can try?
 

The issue is likely not the GPU itself. The address that the GPU formerly used might now be used by something else (the SATA controller, USB controllers, audio device), and that other device might crash your system when a reset is attempted for passthrough.

Just to confirm: is this VM set to autostart on boot?
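
For reference, the new board's IOMMU layout can be compared against the address in the VM's hostpci entry with something along these lines, run from any Linux shell on the box (e.g. a live stick):

Code:
# list each PCIe device together with its IOMMU group
find /sys/kernel/iommu_groups/ -type l
# match the addresses to actual devices
lspci -nn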
 

If I remember correctly, yes. I don't really want to pull everything out again and move it all back to the old board, etc... I DO hope there is some sort of "easy" solution to this...
 
Back when I had this issue, there was a 60-second window between node startup (when it became reachable over the network) and the VM autostart that would make the node go unresponsive. I don't remember if that was because I had specified an autostart delay. I wrote a shell script to SSH in and disable autostart as soon as the node was reachable. An easier solution might be to boot into a recovery shell, mount pmxcfs, and modify the conf (assuming you're not running a cluster).


Of course, there are probably easier ways of doing this. If anyone else has a better way, I'd like to know.
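
Roughly, the recovery-shell route could look like this on a standalone node (the VMID 100 below is just a placeholder):

Code:
# start the cluster filesystem in local mode so /etc/pve becomes visible
pmxcfs -l
# turn off autostart for the suspect VM
sed -i 's/^onboot: 1/onboot: 0/' /etc/pve/qemu-server/100.conf
# stop pmxcfs again before rebooting
killall pmxcfs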
 
I was thinking of booting Ubuntu off a USB stick and then modifying the files; since the VM crashes almost instantly, the node never becomes reachable on the network.
 
Maybe try booting from a live CD and see if there is any hardware problem.

It sounds like a hard drive failure or a cable problem.
 

Well, I can still access everything in the system, so a drive failure isn't it (I booted Ubuntu from a USB stick and I can navigate through the boot drive).

I'm now trying to find the config files of the VMs, but I can't seem to find them this way...
 
Alright, I figured out that the issue is a faulty drive (not the boot drive, though).

Now it's stuck (again) on "reading all physical volumes"...

The last line is something like "/dev/mapper/pve-root: clean, 162076/3653632 files, 5844909/14614528 blocks"... and it's just been stuck there for several hours now...
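
(Side note for anyone hitting the same hang: it may be possible to get a root shell before the stall by pressing 'e' at the GRUB menu and appending the following to the line that starts with 'linux':

Code:
systemd.unit=emergency.target

That should drop to a minimal shell with only the root filesystem mounted, assuming the root filesystem itself is healthy.)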
 
So the faulty drive is a drive that I used to store backups on (I had a backup job like that configured in Proxmox) and it was listed as a storage. I suspect that Proxmox can't handle the "gone" drive and hangs. Any idea how I could solve this issue?
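
If that drive was mounted through /etc/fstab (a guess at how the backup storage was set up), marking the entry as nofail from a rescue shell should let the boot continue without it. The UUID and mountpoint here are placeholders:

Code:
# /etc/fstab - don't block the boot if the backup drive is missing
UUID=xxxx-xxxx  /mnt/backup  ext4  defaults,nofail,x-systemd.device-timeout=5s  0  2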
 
The Proxmox host is just dead by now... I fear I've lost EVERYTHING by now. I'm 99.9% sure that the issue is that an assigned drive (which I used for nightly backups that were then synced to cloud storage) died. That leaves me no options whatsoever to boot into Proxmox and remove the dead drive, since Proxmox either waits for the drive to magically appear or, when I plug the drive in, crashes because it errors out completely.

At this point I don't see any solution and I'm starting to get REALLY frustrated. This drive was never assigned to or used by any VM, yet it is dead and it makes Proxmox not boot anymore. If there is ANYONE who can help me get Proxmox to boot without losing ANY data on the ZFS pool or any other drive, please help me out... I'm hopeless at this moment. I haven't been able to WORK while the server is down, so I really need it working again by Monday...
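
If the dead drive was instead defined as a storage in Proxmox, another option might be to mark it disabled in /etc/pve/storage.cfg (reachable via pmxcfs -l as described earlier); the storage ID and path below are guesses:

Code:
dir: backup-hdd
	path /mnt/backup
	content backup
	disable 1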
 
Update:

I tried to boot into recovery mode using a flash drive and the Proxmox installer; now I get a frozen screen with some vfio-pci message saying the VGA decoders have been changed...
 
I wonder if it would be possible to reinstall PVE on the old SSD and whether I'd still have my VMs etc., since they are stored on an NVMe drive and the ZFS pool is also on separate drives... Can anyone answer me?
 
Can you share the exact vfio-pci error message?

As for reinstalling Proxmox: unless you took backups of the VMs, simply preserving the VM storage won't be enough on its own. You would still have lost the configs for the VMs, so Proxmox won't recognize them.
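
That said, if you do end up reinstalling, it may still be possible to rebuild a VM around its surviving disk image. A rough sketch, where the VMID, storage ID, and volume name are all placeholders you'd replace with your own:

Code:
# create an empty VM shell, then attach the existing disk from the NVMe storage
qm create 100 --name recovered-vm --memory 4096 --net0 virtio,bridge=vmbr0
qm set 100 --scsi0 nvme-storage:vm-100-disk-0
qm set 100 --boot c --bootdisk scsi0
# 'qm rescan' will also pick up orphaned disk images as unused volumes
qm rescan --vmid 100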
 
[Attached screenshot of the vfio-pci error: 20200815-190843.jpg]



I do have access to the boot drive via the CLI, using a flashed USB stick, but I can't find any VM/CT configs on there... otherwise I would copy them over to a cloud server of mine, but no VMs are to be found in there :(
 
Anyone have any idea what could be going wrong? I hope there is an option to skip the auto boot/scan of VMs and/or PCIe devices so that I can just recover everything... It's been down for over two weeks now; I don't know what to do :'(
 

Please note that /etc/pve is not a real filesystem and will only be visible if the right parts of Proxmox VE are running (I don't know exactly which).
This is explained in another post here: /etc/pve/* not really on ZFS filesystem? Maybe this information can help you recover the VM configurations?
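
For what it's worth, the data behind /etc/pve lives in an SQLite database on the root filesystem, so one possible route from the Ubuntu stick is something like this (LV names are the PVE defaults, and sqlite3 may need to be installed first):

Code:
# activate LVM and mount the Proxmox root filesystem
vgchange -ay pve
mount /dev/pve/root /mnt
# option 1: read the VM configs straight out of the pmxcfs database
sqlite3 /mnt/var/lib/pve-cluster/config.db \
    "SELECT name, data FROM tree WHERE name LIKE '%.conf';"
# option 2: chroot in and start pmxcfs in local mode so /etc/pve appears
for d in dev proc sys; do mount --bind /$d /mnt/$d; done
chroot /mnt pmxcfs -l
ls /mnt/etc/pve/qemu-server/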
 


I don't seem to be able to recover the data this way... any other ideas I might try? I'm getting really hopeless by now... That being said, I even think the ZFS pool may already be corrupted due to the different OS boots... My hope of getting the old system working again is really low by now...
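
On the ZFS worry: simply booting another OS shouldn't corrupt a pool it never imported, and you can check the pool without writing to it from a live environment that has the ZFS tools installed ('tank' below is a placeholder pool name):

Code:
# list importable pools and their reported health
zpool import
# import read-only so nothing on the pool can be modified
zpool import -o readonly=on tank
zpool status tank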
 
