How I migrated Proxmox from H700 to H200 IR mode Dell R710, adding ZFS (procedure log)

rosmaniac

Hope this helps someone.

Seeing so much information and misinformation out there on the H200i in IR mode in an R710, I just wanted to post my own configuration and experiences with this particular hardware combination. This is my own experience: your mileage may vary, and I always reserve the right to be wrong. This is also a bit lengthy. Note that I have over 30 years of experience administering Unix and Unix-like systems such as Linux; my daily-driver laptop runs Debian Bullseye, the same base Proxmox uses, so I knew exactly what I was doing in the procedure below, and I had tested backups of everything. Because I'm old-school there's a really good chance this could be done more easily, but this is how I did it.

At one site I have a three-node Proxmox cluster set up, using two Dell R710s and a Supermicro X9QR7-TF. Not exactly new hardware, but all of it works really well for my needs at that site. The network backbone is 10Gb/s, and all three nodes are connected to the site's 4Gb/s Fibre Channel SAN, with two EMC CLARiiON arrays, one a CX3-80 and the other a CX4-480: a few hundred terabytes of old but reliable storage. The X9QR7-TF has multiple IT-mode LSI HBAs on the motherboard, and the chassis has dual 24-slot drive enclosures with two SAS expanders; I had previously migrated 8 of the drives (all Samsung PM863 960GB SSDs) from being part of a hardware RAID setup (LSI MegaRAID SAS 9361-4i) on the bottom expander to individual disks on the upper expander, and had set up ZFS on them. I wanted to take advantage of replication on a few largish VMs to reduce migration time while keeping the performance advantage of the SSDs over the 4Gb/s shared LVM storage on the SAN, which is significantly slower, even though migration on shared storage is a breeze.

As part of this project, I recently migrated one of the two R710s from its factory PERC H700 controller to an H200i. This is a journal of what I did to accomplish this migration.

First, the H700 had two RAID volumes (VDs in H700-speak): the first VD a RAID1 of drives 0 and 1 (Samsung 960GB PM863 SSDs), and the second a RAID6 of drives 2 through 7, also Samsung PM863s. The H200 came from the factory with the IR-mode (NOT IT-mode) firmware. I started by deleting the second VD in the H700 setup utility (the typical LSI Control-R setup). But, since I wanted the node to stay in the cluster, I left the first VD alone, assuming (incorrectly!!) that the H200 IR-mode firmware would pick it up, since the H200 can do a RAID 1 VD while still passing the rest of the drives through just like the IT-mode firmware. (I especially wanted to keep the backups assigned to this node and stored on non-shared FC LUNs attached to it, along with a couple of largish backup guests, also on non-shared FC LUNs, that would have taken several days in total to migrate to shared storage.) I knew I could always swap the H700 back in if I couldn't get it migrated, so I proceeded.

The controller swap went fine, but on first boot, checking in the LSI setup, I could see that the RAID 1 VD did NOT make it through; there is no function in the H200's setup to import a foreign configuration like there is on the H700, and creating a new RAID 1 does not give the option to keep the data, either.... Hmm, time to do some 'Linux-y' magic....

After powering off the node, I pulled drives 0 and 1, replaced them with drives 6 and 7, and put drive 0 in slot 6 (keeping drive 1 out as a backup; I took it to a SATA dock and cloned it to an image file in case I needed to back out of the migration; Clonezilla is a great tool for this kind of thing).
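For anyone who wants to do that backup-image step by hand rather than through Clonezilla's menus, the same kind of fallback image can be made with ddrescue from any live environment. This is only a sketch: the device name and image paths below are placeholders, so confirm the docked drive with lsblk first.

Code:
    lsblk -o NAME,MODEL,SIZE,SERIAL                  # identify the docked H700 member drive first
    ddrescue /dev/sdX /mnt/backup/h700-member-drive1.img /mnt/backup/h700-member-drive1.map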

I set up a RAID 1 VD in the H200's Control-C setup and let it initialize. Then, booting from a Clonezilla Ubuntu USB drive, I attempted to let Clonezilla do a full-disk clone from the H700 member drive to the RAID 1 VD; this failed, however, due to the difference in volume size between the RAID 1 VD on the H200 and the H700 member drive, which has an area of metadata at the end of the disk. After a very careful check of the sizes of the various partitions on the H700 member disk in slot 6 and the H200's VD in slots 0 and 1, I saw all the partition sizes were identical (the size difference between the drives consisted entirely of 'free space' on the H700 member drive), so I used ddrescue (present on Clonezilla) to do a hard clone of the member drive back to the H200 VD and ignored the error at the end telling me the destination disk was out of space. That was a bit nerve-wracking, to say the least, but double- and triple-checking sizes verified that it would be OK. And even if it wasn't OK, the OS logical volumes are at the start of the PV in the third partition, so it's not too hard to blow away the LV for the thin pool, resize the PV, and then recreate the thin pool, but so far it looks like that won't be necessary.
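For anyone following along, the clone step boils down to something like the following from the Clonezilla command line. The device names are assumptions here: /dev/sdX stands for the old H700 member drive (now in slot 6) and /dev/sdY for the new H200 RAID 1 VD, so verify both with lsblk before copying anything.

Code:
    lsblk -o NAME,MODEL,SIZE             # identify source (old H700 member) and destination (H200 VD)
    sfdisk -l /dev/sdX                   # compare the partition table and sizes on the source...
    sfdisk -l /dev/sdY                   # ...against the destination before committing
    ddrescue -f /dev/sdX /dev/sdY /root/clone.map   # -f is needed when the destination is a block device
    # ddrescue will complain about running out of space on the slightly smaller VD at the very end;
    # in this case that was expected, because the trailing space on the source drive was unpartitioned.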

Once ddrescue was finished, I shut the R710 down, pulled the H700 member drive from slot 6, and booted. The node booted up all the way with no errors and rejoined the cluster. I then shut the node down and returned the two H700 member disks to slots 6 and 7. I rebooted into Clonezilla, dropped to a command line, and used wipefs to clear out the H700 metadata and the LVM signatures on the two H700 member disks, plus dd to zero out the first few MB of the disks.
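The cleanup on each returned member disk was roughly the following; the device name is again a placeholder, so triple-check it with lsblk before wiping anything.

Code:
    wipefs --all /dev/sdX                         # drop the LVM/filesystem/RAID signatures wipefs knows about
    dd if=/dev/zero of=/dev/sdX bs=1M count=16    # and zero the first few MB for good measure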

Rebooting into Proxmox, I went to the disks menu under that node and created a ZFS RAIDZ2 volume using all six of the disks in slots 2 through 7.
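The GUI does all the work there, but for reference the rough shell equivalent is below. The pool name and the by-id paths are made up for illustration, and ashift=12 is just the usual choice for these SSDs, not something the GUI necessarily picked.

Code:
    ls -l /dev/disk/by-id/ | grep -v part         # find stable names for the six passthrough disks
    zpool create -o ashift=12 tank raidz2 \
        /dev/disk/by-id/ata-SAMSUNG_MZ7LM960_SERIAL2 \
        /dev/disk/by-id/ata-SAMSUNG_MZ7LM960_SERIAL3 \
        /dev/disk/by-id/ata-SAMSUNG_MZ7LM960_SERIAL4 \
        /dev/disk/by-id/ata-SAMSUNG_MZ7LM960_SERIAL5 \
        /dev/disk/by-id/ata-SAMSUNG_MZ7LM960_SERIAL6 \
        /dev/disk/by-id/ata-SAMSUNG_MZ7LM960_SERIAL7
    pvesm add zfspool tank --pool tank --content images,rootdir   # register it as a Proxmox storage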

Here is a screenshot of the relevant section of the Proxmox web GUI after the ZFS creation:
[attachment: Screenshot from 2023-06-05 11-10-56.png]

Note carefully how the RAID1 in slots 0 and 1 shows as a Virtual_Disk, but the drives in slots 2 through 7 show up as individual disks with full SMART data; ZFS is happy with them. (All of the VRAID mpath_members are RAID6 LUNs on the FC SAN.)
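A quick way to double-check this from the node's shell (device names below are placeholders) is to compare SMART output for a passthrough disk against the VD:

Code:
    lsblk -o NAME,MODEL,SIZE,TYPE
    smartctl -a /dev/sdX     # a passthrough disk in slots 2-7: full per-drive SMART attributes
    smartctl -a /dev/sdY     # the RAID 1 VD: shows only a generic virtual-disk device, no drive SMART data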

Note also that this is IR-mode firmware, NOT IT-mode firmware. I had previously tried an IT-mode H200 in a development R515 (also to run Proxmox as a single node), but after the installation of Proxmox finished I could not get the boot drive recognized in UEFI as a boot device; the drive showed up just fine when booted from a Debian Live USB, and the boot didn't halt, meaning the flash of the IT-mode firmware was OK even for the integrated RAID slot, but UEFI simply did not see the disk. In the same R515 an IR-mode H200 worked perfectly, with slots 12 and 13 used as a RAID1 and slots 0 through 11 (it's a 12-bay R515) showing as raw disks with full SMART access, so that's what I decided to do with the R710, and it works fine. The H200i has no cache and no battery, so ZFS is satisfied. The IT-mode H200i works perfectly in an R715 I have, booting fine, so go figure.

But since this isn't the 'conventional wisdom' that says you must absolutely use IT mode firmware for ZFS, I will be keeping a close eye on it.

One note about replication: the ZFS pool (and the Proxmox storage defined on it) participating in the replication job has to have the same name on both nodes. The full replication has finished, and the every-fifteen-minutes replication job is running like clockwork.
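Concretely, each node carries a pool of the same name, the cluster-wide storage definition points at that pool, and the job itself can be created or checked from the CLI with pvesr as well as from the GUI. The pool name, VMID, node name, and schedule below are placeholders:

Code:
    # /etc/pve/storage.cfg is cluster-wide, so the same storage ID has to resolve
    # to a pool of the same name on every node taking part in replication
    zpool list                                                  # run on each node: pool names must match
    pvesr create-local-job 104-0 othernode --schedule '*/15'    # replicate VM 104 every 15 minutes
    pvesr status                                                # confirm the jobs are running on schedule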
 