Hello everyone,
I hope this is an easy fix. I just recently upgraded from Proxmox VE 1.9 to 2.2, and I've also switched from a single box to a cluster configuration. When I say "upgraded", I mean "back up all VMs, install fresh, and restore the VMs".
The cluster shares storage via LVM on top of DRBD replication in a dual-primary configuration.
Here is how it looks:
Machine "vm":
Phenom 2 x4 965
64GB ssd = boot
raid 1 of 2 * 500G sata = lun1 (mdadm) "vm_primary" (drbd resource)
raid 1 of 2 * 500G sata = lun2 (mdadm) "vm2_primary" (drbd resource)
Machine "vm2":
Core 2 Duo E8400
raid 10 of 4 * 73G 10k raptor = lun1 (mdam) "vm_primary" (drbd resource)
raid 10 of 8 * 73G 10k raptor = lun2 (sata 300-8xlp) "vm2_primary" (drbd resource)
The 500 GB RAID 1 LUNs were actually partitioned to match the exact size of the LUNs composed of the 73 GB disks (the 500 GB drives have a lot of unpartitioned space).
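For reference, each DRBD resource is defined roughly like this. I'm reconstructing this from memory, so the device paths, IPs, and port below are placeholders rather than my exact values:
===
# one of the two resources, as included from /etc/drbd.conf (sketch from memory)
resource vm2_primary {
    protocol C;
    startup {
        become-primary-on both;      # dual-primary: both nodes can be Primary
    }
    net {
        allow-two-primaries;         # required for the dual-primary setup
        after-sb-0pri discard-zero-changes;
        after-sb-1pri discard-secondary;
    }
    on vm {
        device    /dev/drbd1;
        disk      /dev/md1;          # mdadm RAID 1 partition sized to match vm2's LUN
        address   10.0.0.1:7789;     # placeholder IP/port
        meta-disk internal;
    }
    on vm2 {
        device    /dev/drbd1;
        disk      /dev/sdb1;         # hardware RAID 10 LUN on the SATA 300-8XLP
        address   10.0.0.2:7789;     # placeholder IP/port
        meta-disk internal;
    }
}
===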
Everything was working fantastically until I went to upgrade the CPU in vm2.
All VMs that normally run on vm are on "vm_primary". All VMs that normally run on vm2 are on "vm2_primary".
Everything was working great -- migration, replication, etc.
I then decided to upgrade the CPU in vm2 to a Xeon X3330 (the board uses a server chipset).
First, I hot-migrated all the VMs from vm2 (on vm2_primary) to vm with no issues. Then, in the GUI, I selected vm2 and did a "shutdown".
After the machine shut down, I opened it up and swapped the CPU. I let the machine sit in the BIOS for about 15 minutes, and the warmest temperature I saw was 36C.
When I booted it back up, the machine started to boot, got to the point of creating the bridge interfaces, and then spontaneously rebooted (or so it seemed).
I decided to run a memory test on the box and went back to my desktop to do some Googling -- when I found this in my e-mail:
===
Due to an emergency condition, DRBD is about to issue a reboot
of node vm2. If this is unintended, please check
your DRBD configuration file (/etc/drbd.conf).
===
and also 2 of these:
===
DRBD has detected that the resource vm2_primary
on vm2 has lost access to its backing device,
and has also lost connection to its peer, vm.
This resource now no longer has access to valid data.
===
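For what it's worth, those messages look like the ones sent by DRBD's notify scripts, so I assume the reboot itself is triggered by the handlers section in /etc/drbd.conf. Mine should be close to the stock example, something like:
===
handlers {
    # these fire when a Primary goes degraded/inconsistent or hits a local I/O
    # error, and they force a reboot/halt of the node via sysrq
    pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
    pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
    local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
}
===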
Does anyone have any idea 1) how to get vm2 started without it auto-rebooting, so I can see what is going on, and 2) why it's auto-rebooting?
For 1), my ideas are: a) try to boot without the NICs connected, or b) boot into single-user mode and edit things manually there (but what to edit? Disable DRBD at startup with update-rc.d?).
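For b), something along these lines is what I had in mind -- commands from memory, so treat it as a rough sketch:
===
# from a single-user / recovery shell on vm2 (PVE 2.x is sysvinit-based)
update-rc.d drbd disable      # keep the drbd init script from running at boot
# or, more heavy-handed:
# insserv -r drbd
reboot
# once the node is up and stable, poke at the resource by hand:
drbdadm up vm2_primary
cat /proc/drbd                # check connection and disk states
dmesg | grep -i drbd          # look for the backing-device error
===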
vm2_primary is the hardware RAID LUN, and it appears fine during POST when the RAID controller goes through its BIOS... I don't know why the system wouldn't be able to see it after a CPU upgrade.
Any suggestions you guys have are appreciated. I'm fairly Linux-proficient and have no qualms about using the CLI extensively. If I can actually get vm2 started, I'll just discard all of its data and re-sync, since I know the data on vm (both LUNs) is good, as that is where all the VMs are currently running.
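If I do get that far, my plan for throwing away vm2's copy and re-syncing from vm is roughly (again, from memory):
===
# on vm2, with the resource up but disconnected/outdated:
drbdadm secondary vm2_primary                      # make sure vm2 is not Primary
drbdadm -- --discard-my-data connect vm2_primary   # discard vm2's data, resync from vm
# if needed on vm:  drbdadm connect vm2_primary
# or just force a full resync from the peer:
# drbdadm invalidate vm2_primary
watch cat /proc/drbd                               # monitor the resync
===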