> Thanks, I mostly wanted to know if zvol_use_blk_mq was not set (it's not), but I may find something more later on. I have just experienced more weird behaviour with ZVOLs over time. Please let us know later in case the "fix" was not real, but for now I will assume that thick provisioning the volume did it for you.

I need to fail it more times to know for sure. I only did it once, but it worked immediately, which it hasn't done before.
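For anyone else following this: on OpenZFS releases that support blk-mq, the setting can be read back from the module parameters, assuming the zfs module is loaded (0 means it is off):

    cat /sys/module/zfs/parameters/zvol_use_blk_mq
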
> @ballybob FWIW, I personally avoid ZVOLs for VMs. Rather than anecdotal evidence, if I pull up quick search results this is still at the top:
> https://jrs-s.net/2018/03/13/zvol-vs-qcow2-with-kvm/

Interesting. I like QCOW2 on $anything because of its tree-like snapshot structure, yet the argument I always get when I say that I use QCOW2 for some machines on ZFS (datasets) is that you end up with COW on COW, and with snapshots the performance gets unpredictable and slow. I have to concur with that: all VMs feel much more sluggish and slower, and their backups take much more time.
If I clone the VM to a zvol, it is noticeably faster, including the backup.
I did not run artificial benchmarks; I ran real-world tasks like creating Oracle databases per script (so that you can "see" the runtime) and installing OSes. The article does not mention this at all, just non-snapshot states, and says nothing about backup speed, which would be - as much as I love fio - a perfectly comparable real-world test.
QCOW2 is also not as easy to trim as a ZVOL, so you will waste much more space on your machines. You need to compact the file offline in order to get the space back (see the example below).
I run a server with 40+ machines on six enterprise SSDs in a RAID10 on ZVOLs, and it worked great for all machines that only need linear snapshots; for the tree-like ones, I used a dataset.
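For reference, one way to reclaim that space from a QCOW2 image is to free space inside the guest and then rewrite the image with qemu-img while the VM is shut down; the file names below are only placeholders:

    # inside the guest: free the space first (fstrim only helps if discard
    # is enabled on the virtual disk; otherwise zero out the free space)
    fstrim -av
    # on the host, with the VM shut down: the rewritten copy skips
    # unallocated/zeroed clusters, so it comes out compacted
    qemu-img convert -O qcow2 vm-disk.qcow2 vm-disk-compact.qcow2
    mv vm-disk-compact.qcow2 vm-disk.qcow2
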
> Yet if ZVOL fails and is literally not recommended to be run thin...

You mentioned that already, yet I don't know why. Where do you get this? I've never stumbled upon it. I have been running everything in ZFS thin for almost a decade.

> Yes, but another way of looking at it is that ZFS (or any COW) is simply unsuitable for it, not the other way around.

Which type of storage is on a par with this feature-wise? I don't know of any.

> That's anecdotal evidence, but yes, it's possible. My concern is the quality of the ZVOL implementation.

I don't have any other information, so of course it is anecdotal. It would be great if one could reproduce the problem.

> The question here really is ... why thick provisioning that ZVOL with the efidisk on a mostly empty dataset solved anything at all.

That I don't know.
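For anyone who wants to check how a given ZVOL is provisioned: a thin (sparse) ZVOL reports refreservation=none, while a thick one reserves roughly its full volsize. The dataset name below is just a typical Proxmox-style placeholder:

    # show the volume size and how much space is actually reserved
    zfs get volsize,refreservation,usedbyrefreservation rpool/data/vm-100-disk-1
    # make a thin ZVOL thick by reserving the space it may need
    zfs set refreservation=auto rpool/data/vm-100-disk-1
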
Adding to it - I was able to narrow down the problem. The reason the other VM was working and the HAOS one wasn't is that HAOS was using the "host" CPU type while the new one used the default "x86-64-v2-AES". The former always failed, the latter always worked. I changed HAOS to the default CPU type and it just worked.
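In case it helps anyone following along, the CPU type can also be changed from the CLI instead of the GUI; the VMID below is only an example:

    qm set 105 --cpu x86-64-v2-AES
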
Hi,
FYI, you can only use the "host" CPU type when you have the exact same CPU model on source and target, see: https://pve.proxmox.com/pve-docs/chapter-qm.html#_cpu_type
Sounds a bit like the target QEMU process might not have been running anymore, leading to the I/O error. You can check the system logs/journal on the target system.

Thanks for the reply. I wonder how that would be related to an "I/O" error on the UEFI disk...
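On where to look: the target node's journal around the time of the failed migration usually shows why the QEMU process went away. Something along these lines (the VMID is again just an example) is a reasonable starting point:

    # on the target node, shortly after the failed migration
    journalctl --since "30 minutes ago" | grep -iE 'qemu|vm 105|migrat'
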
I've played a bit with the options and it was also related to the CPU. When I changed to "x86-64-v3" it worked just fine. The CPU on the main machine is a TRP 7995WX, while the target is an Intel i9 13900K which I use as a temporary host when the TRP is under maintenance. "v4" works on the TRP but not on the i9, so that did the trick.

Please test it rigorously. I also had such a problem years ago: some VMs did work, others failed a couple of hours later, and some failed directly.
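For context on the v3/v4 difference, as far as I understand the x86-64 feature levels: v3 roughly corresponds to AVX2-class CPUs, while v4 additionally requires AVX-512, which the i9 13900K does not expose. A quick way to compare two hosts before migrating is to check the flags on each:

    # run on both source and target; the highest level supported by both is the safe choice
    grep -o -E 'avx2|avx512f' /proc/cpuinfo | sort -u
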
Yes, I agree. This is what I do in production environments; I am just doing this as an experiment at home.
Best to stick to the recommendation to NOT mix CPUs.