iothread seems to be the culprit.
For context, I launched a PBS backup of 42 VMs yesterday evening. The backup is still in progress, with a PG rebalance running on Ceph at the same time.
Nothing to report :) Everything is going well with the iothreads unchecked/off.
Dear forum,
After upgrading Ceph from Pacific to the latest Pacific (16.2.11) and then to Reef, rbd-mirror always crashes and never creates/continues images.
log :
-86> 2024-01-19T00:20:44.333+0100 7f9a759546c0 4 set_mon_vals no callback set
-85> 2024-01-19T00:20:44.333+0100 7f9a791b2a80 5 monclient...
This is a new record! ;)
Nice to see the improvement, and that it's working without iothreads. It seems something is wrong with that option.
What version of QEMU is installed?
Got a similar problem between 2 clusters in one-way replication.
This used to work with Ceph Pacific before 16.2.11, then it started crashing...
And now with Ceph Reef, it only crashes.
Please see the attached file for details.
@Whatever: I will look at the thread.
Also, the changelog for the latest Proxmox QEMU update says:
pve-qemu-kvm (8.1.2-6) bookworm; urgency=medium
* revert attempted fix to avoid rare issue with stuck guest IO when using
iothread, because it caused a much more common issue with iothreads...
For Windows, yes, I have a lot of these vioscsi errors: "Reset to device, \Device\RaidPort0, was issued." (message originally in French)
On Linux, I have messages like "INFO: task XXX blocked for more than 120 seconds".
Did you try to uncheck the IO Thread option on the disk? Or switch to "VirtIO SCSI" and not...
Could you post your VM config, so we can compare?
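For reference, iothread is a per-disk flag in the VM config, so it can also be toggled from the CLI instead of the GUI. A minimal sketch, assuming VMID 203 with a disk `scsi0` on a storage named `ceph-rbd` (names are placeholders; reuse your own disk line from `qm config`):

```shell
# Show the current scsi0 line (storage, volume, options) for VM 203
qm config 203 | grep scsi0

# Re-set the same disk with iothread disabled; keep your existing
# storage/volume string and only change iothread=1 to iothread=0
qm set 203 --scsi0 ceph-rbd:vm-203-disk-0,iothread=0

# The change only takes effect after a full stop/start, not a guest reboot
qm stop 203 && qm start 203
```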
Here are 2 VM configs, one Linux and one Windows, which were defective and are now working
(before, with iothread=1, the VMs would freeze):
VM Linux :
root@pve-xxx-xxx-xxx-xxx:~# qm config 203
agent: 1
balloon: 2048
bootdisk: scsi0
cores: 4
cpu: host...
It's working today with the guest agent off. I will see tonight if it keeps working for one VM.
I won't test more than that for now, because VMs have been getting locked up far too often over the last 20 days...
Ceph is running the Reef release from Proxmox (ceph --version):
ceph version 18.2.0 (d724bab467c1c4e2a89e8070f01037ae589a37ca) reef (stable)
This sounds interesting. If you can run some tests, at least with one VM: uncheck iothread, then power off the VM and boot it again for the change to take effect.
Well yes, I think so too: we have a problem between QEMU <-> Ceph.
Comment #9 from drjaymz@ got me on the right track.
Did some tests: start the VM, wait for boot-up, make a snapshot, then try to run top or any other program, then reboot/power off from the guest:
VirtIO SCSI Single + IOThread + AIO...
We have the same problem while taking snapshots. VMs freeze and can't thaw anymore.
VM disks are on Ceph RBD, and we tried:
with qemu-guest-agent on & off
guest agent freeze/thaw option on & off
QEMU async native/threads/io_uring
None of them work. PVE has been updated to 8.1; before, we were...
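For anyone wanting to reproduce these combinations: the async IO backend is also a per-disk option. A sketch assuming VMID 100 with a disk `scsi0` on a storage named `ceph-rbd` (placeholder names; keep your own volume string):

```shell
# Try each async IO backend in turn on the same disk; a full
# stop/start of the VM is needed between runs for the option to apply
qm set 100 --scsi0 ceph-rbd:vm-100-disk-0,aio=native
qm set 100 --scsi0 ceph-rbd:vm-100-disk-0,aio=threads
qm set 100 --scsi0 ceph-rbd:vm-100-disk-0,aio=io_uring
```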
We had the same problem.
We had to recreate all OSDs, node by node, to get acceptable write performance: only 50-60 MB/s for a 3-node cluster with a gigabit network and HDDs only.
We also changed the snap trimming variables to get reasonable read/recovery performance while the OSDs were being rebuilt.
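The post doesn't name the exact variables; as a hedged example, these are the standard Ceph OSD options that throttle snapshot trimming (the values here are illustrative, not necessarily what the poster used):

```shell
# Sleep (seconds) between snap trim operations on HDD-backed OSDs;
# higher values slow trimming down but reduce impact on client IO
ceph config set osd osd_snap_trim_sleep_hdd 5

# Limit how many PGs a single OSD trims concurrently (default 2)
ceph config set osd osd_pg_max_concurrent_snap_trims 1

# Lower the priority of snap trim work relative to client IO
ceph config set osd osd_snap_trim_priority 1
```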