Hi, I have a question and am curious if anyone has insight or comments.
- Doing a P2V migration of a physical host to Proxmox, using a temporary staging host to facilitate the transfer.
- Both the staging host and the clobbered old host were installed from the stock Proxmox 7.latest ISO.
- The old host is Supermicro hardware with an LSI hardware RAID controller.
- On that box, the appliance-style ISO install carved the main disk up into (mostly) an LVM-thin pool for VM storage, per the defaults.
-- The staging server was a small NUC with a ZFS-mirror Proxmox install (stock Proxmox ISO installer, no hardware RAID).
- Did a P2V migration of the Windows server off the physical host by generating a server backup, converting it to VHDX, and importing that into QEMU/Proxmox on the staging server (rough command sketch after this list).
- The P2V step went smoothly and the VM booted and ran fine.
- Detail to note: the source physical host had a ~2.6 TB C: drive for Windows, with only about 90 GB of actual disk-used footprint inside Windows.
- After the VM image transfer, the VM disk had a footprint of around 130 GB.
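For reference, the import on the staging server was roughly like this (sketch; the VM ID, path and storage name here are illustrative, not the exact ones used):
Code:
# convert/import the server-backup VHDX as a Proxmox disk
# (qm importdisk runs qemu-img underneath and accepts VHDX directly)
qm importdisk 102 /mnt/transfer/windows-c.vhdx local-zfs
# then attach the imported disk to the guest
qm set 102 --scsi0 local-zfs:vm-102-disk-0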
After a few days of patient waiting, the old original Windows host was clobbered and reinstalled as Proxmox, and then the VM was migrated off the staging server to the re-christened Supermicro:
-- Set up temporary local USB storage on the staging server as a backup storage target for a vanilla Proxmox backup dump.
-- Powered down the Windows VM and ran a normal backup.
-- This spits out a VM backup file with a footprint of approximately 130 GB.
-- Copied this over to the Supermicro, then did a standard CLI restore of the VM onto this host, designating the local LVM-thin as the storage to use (see the command sketch after this list).
-- It moved along fine and did the bulk of the work in ~45 minutes.
-- The weird part: at the start it warned that LVM-thin had no protection against the pool filling up, and at the end, once it had 'finished' the restore, it got stuck on "LVM Rescan".
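For reference, the restore itself was just the standard CLI tool, roughly (sketch; the archive name and storage label here are illustrative):
Code:
# restore the vzdump archive onto the local LVM-thin storage
qmrestore /mnt/usb-backup/vzdump-qemu-102.vma.zst 102 --storage local-lvm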
Warning message text:
Code:
WARNING: You have not turned on protection against thin pools running out of space.
WARNING: Set activation/thin_pool_autoextend_threshold below 100 to trigger automatic extension of thin pools before they get full.
-- I was under the impression that LVM-thin would tolerate a VM disk that exceeds the physical storage available (which is the case here: we have about 2 TB of space in the LVM-thin pool, and the VM in question has a nominal disk size of ~2.6 TB but only ~100 GB of actual footprint).
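As far as I understand it, that warning is only about the autoextend knobs in /etc/lvm/lvm.conf, i.e. whether LVM will grow the thin pool into free VG space before it fills up; something like this (values are examples only, not what I have configured):
Code:
# /etc/lvm/lvm.conf
activation {
    # start autoextending the thin pool once it reaches 80% full...
    thin_pool_autoextend_threshold = 80
    # ...growing it by 20% each time (needs unallocated space in the VG)
    thin_pool_autoextend_percent = 20
}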
What appears to have happened:
-- LVM was pissed off, and letting it sit overnight did not help.
-- In dmesg we had lots of entries logged like this:
Code:
[14585.042747] systemd-journald[653]: Failed to open system journal: No space left on device
[95748.704550] perf: interrupt took too long (2506 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
[96598.611255] device-mapper: thin: Data device (dm-3) discard unsupported: Disabling discard passdown.
In this state, if I tried to run 'lvs' or 'vgs' the command would just hang / time out.
Hitting Ctrl-C would then tell me it "cannot get lock".
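If this happens again, a few read-only things I would try before rebooting, to see what is actually wedged (sketch; nothing here changes state):
Code:
# device-mapper view of the thin pool and volumes
dmsetup info -c
dmsetup ls --tree
# LVM's file locks live here; a stale lock can make lvs/vgs block
ls -la /run/lock/lvm/
# check for lvm commands stuck in uninterruptible sleep (D state)
ps axo pid,stat,wchan:30,cmd | grep '[l]v'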
Anyhow, I rebooted the box this morning... and after that I was able to get output from vgs and lvs:
Code:
root@proxmox:/etc/pve/qemu-server# vgs
  VG  #PV #LV #SN Attr   VSize  VFree
  pve   1   5   0 wz--n- <1.31t <16.00g
root@proxmox:/etc/pve/qemu-server# lvs
  LV            VG  Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  data          pve twi-aotz--  <1.17t             21.96  1.25
  root          pve -wi-ao----  96.00g
  swap          pve -wi-ao----   8.00g
  vm-100-disk-0 pve Vwi-a-tz--   8.00g data        0.00
  vm-102-disk-0 pve Vwi-a-tz--   2.18t data        11.75
The other added fun: there was also a tiny VM (VMID 100, about 2 GB) that I tried to restore in parallel. It could not make any progress while LVM was 'hung' on the large VM (VMID 102) restore, and after the reboot VM 100 is in a very strange state.
- Cannot delete it; it shows in the GUI as 'locked'.
- Cannot unlock it from the CLI; it tells me the conf file does not exist.
- Cannot create/touch the conf file; it tells me the file exists and cannot be created.
- Cannot see or remove the conf file that supposedly exists.
- I was able to delete the LV for vm-100, but that did not free up this mystery mess either.
Some relevant command output:
Code:
root@proxmox:~# qm unlock 100
Configuration file 'nodes/proxmox/qemu-server/100.conf' does not exist
root@proxmox:~# cd /etc/pve/qemu-server
root@proxmox:/etc/pve/qemu-server# touch 100.conf
touch: cannot touch '100.conf': File exists
root@proxmox:/etc/pve/qemu-server# ls -la
total 1
drwxr-xr-x 2 root www-data 0 Nov 22 13:34 .
drwxr-xr-x 2 root www-data 0 Nov 22 13:34 ..
-rw-r----- 1 root www-data 429 Nov 24 07:16 102.conf
root@proxmox:/etc/pve/qemu-server# rm 100.conf
rm: cannot remove '100.conf': No such file or directory
root@proxmox:/etc/pve/qemu-server# qm destroy 100
Configuration file 'nodes/proxmox/qemu-server/100.conf' does not exist
root@proxmox:/etc/pve/qemu-server#
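For context, /etc/pve is a FUSE mount provided by pmxcfs (the pve-cluster service), with the real data kept in /var/lib/pve-cluster/config.db, so my next idea, untested and noted here only as a sketch for a single-node box, is roughly:
Code:
# keep a copy of the pmxcfs database before touching anything
cp /var/lib/pve-cluster/config.db /root/config.db.bak
# restart the service that provides the /etc/pve mount
systemctl restart pve-cluster
# then see whether the ghost 100.conf has cleared
ls -la /etc/pve/nodes/proxmox/qemu-server/
# if the conf reappears instead, unlock and remove the leftover VM
qm unlock 100 && qm destroy 100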
So, I am not sure if anyone has seen anything like this. I did reboot the box again, and that did not 'release' anything for this locked-but-absent-yet-somehow-still-present VM 100. Worst case I just blow away the whole box and do a clean install, but that seems like a sad solution.
Regarding the LVM:
- I am curious whether LVM-thin is 'meant' to behave this way and I just don't understand how LVM-thin works (i.e. it cannot tolerate a VM whose virtual disk size exceeds the pool's capacity?), or whether there is some other weird behaviour here that is a real, unexpected problem.
- I am guessing the 'cleanest' solution is something like:
(1) shrink the disk inside the Windows VM to a smaller, saner size, maybe 500 GB;
(2) do some work to shrink the Proxmox disk allocation for this guest. I haven't done this before on a ZFS-backed VM, so there will be a bit of learning here (rough sketch of what I think is involved after this list);
(3) once the VM has a smaller, saner disk allocation, do a backup dump; then the transfer and restore onto the other host should be fine, with no issues around LVM-thin or space overcommit.
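For step (2), my rough understanding, untested and risky unless Windows has already shrunk its partitions well below the new size, is that on the ZFS side it's something like this (the dataset name is the usual default and may differ):
Code:
# shrink the zvol that backs the guest disk on the staging (ZFS) host
zfs set volsize=500G rpool/data/vm-102-disk-0
# have Proxmox pick up the new size in the VM config
qm rescan --vmid 102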
Big picture, maybe I am a stick in the mud, but I kind of liked the good old days when /var/lib/vz Proxmox storage was a local filesystem (even if sitting on top of LVM) rather than an actual LVM-thin pool, which you can't see inside via the filesystem and manipulate the qcow2/raw files directly. Possibly I will just redo my install, allocate only a tiny bit of space to LVM-thin, and carve off a vanilla local directory to use as VM storage on top of a filesystem, since I know I can do this sort of thin/overcommitted disk with a raw or qcow2 file with zero drama. This LVM layer just seems to be giving me multiple headaches and no big benefits.
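If I do go that route, my understanding is it's just a matter of registering a plain directory as storage and letting qcow2 handle the thin provisioning, roughly (the path and storage ID here are made up for illustration):
Code:
# create a directory on an existing filesystem and register it as VM storage
mkdir -p /srv/vmstore
pvesm add dir vmstore --path /srv/vmstore --content images,backup
# qcow2 images on directory storage are sparse, so a 2.6T virtual disk
# only consumes what the guest has actually written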
yay.
Anyhow, if anyone has any feedback or comments, it would be appreciated.
Tim