Please, no comments about ZFS; ZFS is not an option for my environment.
[BACKGROUND INFORMATION]
1. I configured a server with a single large hardware RAID volume (Dell PERC H700 RAID controller).
2. Fresh install of Proxmox VE 6.2 onto that single large volume, using the installer defaults for the LVM-thin configuration.
3. After installation, I configured a RAID-controller-attached SSD as an LVM cache using the following commands:
[CODE]
# add the SSD as a PV and extend the pve volume group onto it
pvcreate /dev/sdb
vgextend pve /dev/sdb

# create the cache data and metadata LVs on the SSD
lvcreate -L 850G -n CacheDataLV pve /dev/sdb
lvcreate -L 50G -n CacheMetaLV pve /dev/sdb

# combine them into a cache pool and attach it to pve/data in writeback mode
lvconvert --type cache-pool --poolmetadata pve/CacheMetaLV pve/CacheDataLV
lvconvert --type cache --cachepool pve/CacheDataLV --cachemode writeback pve/data

# make sure the dm-cache modules are included in the initramfs
echo "dm_cache" >> /etc/initramfs-tools/modules
echo "dm_cache_mq" >> /etc/initramfs-tools/modules
echo "dm_persistent_data" >> /etc/initramfs-tools/modules
echo "dm_bufio" >> /etc/initramfs-tools/modules
update-initramfs -u
[/CODE]
4. Everything ran fine for several months, and performance made it seem as though the cache was functioning as expected.
5. Total failure of the SSD (the RAID controller no longer had access to the disk).
6. Running "vgchange --test -a y /dev/pve" returns "Refusing activation of partial LV /pve/data...pve/vm-###-disk-0" ... etc. etc. for all LVs
[QUESTIONS]
I would have expected the system to continue functioning at reduced speed; however, all but two of my containers/VMs stopped working. The two containers that were still running could not be restarted after I shut them down, and no other containers or VMs would start either.
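In case it matters for question (b): my understanding (please correct me if this is wrong) is that writeback keeps dirty blocks only on the SSD until they are flushed, whereas writethrough always writes to the origin as well, so losing the SSD should not leave the origin incomplete. If I rebuild this, I believe the mode could be switched with something like:
[CODE]
# switch the cached LV from writeback to writethrough so the SSD is never
# the only copy of written data (at the cost of slower writes)
lvchange --cachemode writethrough pve/data
[/CODE]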
I will update as I take steps to correct the problem, but my questions are:
a) Did I configure the cache correctly?
b) Is the failure of the LVM, as a result of the SSD failure, expected behavior?
c) If not, what could have caused the LVM failure?
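The recovery path I am currently considering looks roughly like the sketch below. It is untested on this setup and assumes that any dirty blocks which only existed on the failed SSD are simply lost, so please point out anything that would make matters worse:
[CODE]
# confirm which PV is missing from the volume group
pvs

# detach the (now missing) cache pool from the origin LV, accepting the loss
# of any dirty blocks that only lived on the failed SSD
lvconvert --uncache --force pve/data

# drop the missing PV from the volume group
vgreduce --removemissing pve

# try activating the LVs again
vgchange -a y pve
[/CODE]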