I am facing a VM I/O freeze triggered by specific operations on Proxmox with Ceph RBD. It has been occurring for more than a year and severely affects system reliability.
1. High I/O wait: iostat shows, for example,
avg-cpu: %iowait 43.71%, %idle 56.22%
sdX: %util 100.00%
but all I/O metrics are 0: r/s=0, w/s=0, rMB/s=0, wMB/s=0, aqu-sz=0
2. Operations:
echo "abc" > new_file.txt (works)
echo "abc" >> existing_file.txt (works)
vi any_file.txt (VM freezes indefinitely)
cp old_file.txt new_file.txt (VM freezes indefinitely)
3. Workaround:
Live-migrating the VM to another Proxmox node temporarily resolves the freeze,
as does rebooting the VM.
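For anyone trying to confirm the same symptoms, the checks above could be scripted roughly as follows. This is only a sketch under assumptions: the helper names `stalled_devices` and `probe_op` are made up for illustration, the first helper compares two saved copies of `/proc/diskstats` (name in column 3, completed reads in 4, completed writes in 8, milliseconds spent doing I/O in 13), and the second runs a command in the background and polls it instead of waiting on it, because a process stuck in uninterruptible I/O may not react to signals.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; names and thresholds are illustrative only.

# stalled_devices BEFORE AFTER
# Print devices whose completed read/write counters did not change between
# two /proc/diskstats snapshots while their "time doing I/O" counter grew
# (the "busy but nothing completes" pattern).
stalled_devices() {
    awk 'NR == FNR { r[$3] = $4; w[$3] = $8; t[$3] = $13; next }
         ($3 in r) && $4 == r[$3] && $8 == w[$3] && ($13 + 0) > (t[$3] + 0) { print $3 }' "$1" "$2"
}

# probe_op SECONDS COMMAND [ARGS...]
# Run COMMAND in the background and poll once per second. Prints OK if it
# finished within SECONDS, otherwise prints HUNG and returns 1. The command
# is never waited on or killed, so a hung probe stays in the background.
probe_op() {
    local limit=$1
    shift
    "$@" >/dev/null 2>&1 &
    local pid=$!
    local waited=0
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$limit" ]; then
            echo "HUNG"
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    echo "OK"
}
```

Possible usage: `cat /proc/diskstats > before; sleep 10; cat /proc/diskstats > after; stalled_devices before after`, followed by `probe_op 10 cp old_file.txt new_file.txt`. Note that a probe reported as HUNG may leave a stuck process behind until the VM is migrated or rebooted.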
Other settings I tested, none of which helped:
1. Disabling KSM on the Proxmox node
2. Async IO = native / io_uring / threads
3. SCSI controller: VirtIO SCSI single / VirtIO SCSI
4. Disabling fs-freeze
5. Disabling the QEMU guest agent
Environment:
- Proxmox (8.1.3) with Ceph (17.2.7)
- VM: RHEL 8
- Storage: Ceph RBD (block device), used inside the VM as XFS and ext4 disks
- Mount: via fstab with the defaults option
- VM Config: VirtIO-SCSI single, discard=on, SSD emulation
Trigger:
- The I/O freeze occurs randomly, about once every 1 to 6 months, under a normal VM usage pattern
- Under fio stress, hourly backups, and a memory stress test that allocates and releases blocks, the VM hangs within 1 to 7 days
Question:
Does anyone have an idea how to investigate and fix this issue? Is it a known deadlock with the current settings?
Thanks for the help.