Problem with PVE 8.1 and OCFS2 shared storage with io_uring

AZ129

New Member
Feb 21, 2023
2
1
3
Hey everyone,
I have a problem with OCFS2 shared storage after upgrading one of the cluster nodes from 8.0 to 8.1.
The shared volume is mounted correctly on the upgraded node. VMs can be started, but they don't work correctly because the file system inside the VM goes read-only.
These error messages appear in the host's syslog:
Code:
kernel: (kvm,85106,7):ocfs2_dio_end_io:2423 ERROR: Direct IO failed, bytes = -5
kernel: (kvm,85106,7):ocfs2_dio_end_io:2423 ERROR: Direct IO failed, bytes = -5
kernel: (kvm,85106,7):ocfs2_dio_end_io:2423 ERROR: Direct IO failed, bytes = -5

The VM uses a SCSI controller in VirtIO SCSI single mode, with iothread=1 and aio=io_uring.

I've found a workaround: switching to aio=threads. The VMs work fine now, but I'd be grateful for advice, because as far as I know that is not the best setting for performance.
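For anyone applying the same workaround, a minimal sketch of changing the drive option from the CLI (VMID 100 and the volume name are placeholders for your own setup; keep your other drive options and only change aio, and note that the new setting takes effect after the VM has been fully stopped and started again):

Code:
# placeholder VMID and volume; adjust to your own drive line
qm set 100 --scsi0 my-ocfs2-dir:100/vm-100-disk-0.qcow2,aio=threads,iothread=1
# verify the resulting drive line
qm config 100 | grep ^scsi0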
 
Unfortunately the setup you have is not officially supported, and hence not tested by anyone regularly.

I would check the following things:
Is the "broken" behavior only on 8.1 node? It sounds like you still have 8.0 node in the cluster? If you do, does the VM work when migrated there?
Depending on the outcome of the test above, compare the package versions between the two. Are there any major differences in system, OCFS, kernel versions?
I assume your VM images are in QCOW format? You can experiment with mounting them directly. Another option is to boot VM with aio=threads and add a new disk with aio=io_uring, to see if there is a difference in behavior.
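A minimal sketch of adding such a throwaway test disk from the CLI; the VMID 100, the storage name "shared-ocfs2", and the 4 GiB size are placeholders:

Code:
# allocates a new 4 GiB qcow2 volume on the (assumed) directory storage and attaches it with io_uring
qm set 100 --scsi1 shared-ocfs2:4,format=qcow2,aio=io_uring,iothread=1
# then write to the new disk inside the guest and watch the host syslog for ocfs2_dio_end_io errors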

In the end, you may need to collect detailed data from the working and the non-working node and submit the findings at bugzilla.proxmox.com, and perhaps also to the OCFS2 maintainers; that may mean the Debian or Ubuntu kernel mailing lists. The most helpful information would be a step-by-step reproduction of the issue. For this you may want to install PVE in a VM and record all the changes that lead to the broken installation.

Good luck


 
Is the "broken" behavior only on 8.1 node?

Yes, all VMs with aio=io_uring on PVE 8.0.4 work with virtual drives on the shared storage without problems, but not on 8.1.4.

Another option is to boot VM with aio=threads and add a new disk with aio=io_uring, to see if there is a difference in behavior

I added a new virtual disk with aio=io_uring on the node with PVE 8.0.4 and it works fine. Then I migrated the running VM to the host with PVE 8.1.4; writing to the disk with io_uring failed with the same errors in the host log.

Are there any major differences in system, OCFS, kernel versions?

I suspect the culprit is the new kernel, because with the previous version there are no problems with OCFS2.
I'll try to update the remaining nodes to 8.1.4 and the new kernel to check whether it helps.
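To compare the two nodes, something along these lines should show the relevant version differences (pveversion, uname and modinfo are standard tools; the grep patterns are just a suggestion):

Code:
pveversion -v | grep -E 'proxmox-ve|kernel|qemu'
uname -r
modinfo ocfs2 | grep -E '^(filename|vermagic)'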
 
If I knew that the older version works but the newer one doesn't, I wouldn't rush to update everything. But perhaps you have your reasons.
If I were you, I'd open a bug at https://bugzilla.proxmox.com/ with a detailed description: the versions from the working and non-working nodes, configuration details, steps to reproduce the problem, logs showing a good and a bad boot, etc.
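A rough sketch of how that data could be collected (VMID 100 is a placeholder; run on both the working and the non-working node and attach the files to the report):

Code:
pveversion -v > pveversion-$(hostname).txt
qm config 100 > vm-100-$(hostname).conf
journalctl -b -k | grep -i ocfs2 > ocfs2-kernel-$(hostname).log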



 
I have a similar problem. I use OCFS2 for the connection to two SAN systems via FC. Currently, only one node is connected to the system. It's a fresh install of 8.1.3. I tried the older kernel 6.2.16-19-pve instead of 6.5.11-4-pve, but it didn't change anything.

Code:
Feb 02 09:09:01 ve05 kernel: (kworker/3:2,835,3):ocfs2_dio_end_io:2423 ERROR: Direct IO failed, bytes = -11
Feb 02 09:10:10 ve05 kernel: (kworker/3:2,835,3):ocfs2_dio_end_io:2423 ERROR: Direct IO failed, bytes = -11

As with AZ129, setting the VMs to aio=threads helps.
 
Same here with kernels 6.8 and 6.5, but it works with kernel 6.2.16-20-pve (new installation of Proxmox 8.2.2).
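If the older kernel is the only thing that works for now, it can be kept as the default across reboots. A sketch, assuming the system boots via proxmox-boot-tool (the version string is the one mentioned above):

Code:
proxmox-boot-tool kernel list
proxmox-boot-tool kernel pin 6.2.16-20-pve
# revert later with: proxmox-boot-tool kernel unpin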
 
Anything new here? As far as I can tell as a "beginner", this problem also affects ISO CD-ROM files and cloud-init drives, neither of which can be set to aio=threads, AFAIK.
 
I tested with kernel 6.8.12-3 and it only partially fixes the issue:
  • Booting from an ISO file on the OCFS2 partition works
  • Booting from a QCOW2 image on OCFS2 does not work
Thanks for pointing this out in the referenced thread! As you mentioned, it seems the problem hasn't been fully fixed by the io_uring patch. As for the mentioned thread, it would help to have a more exact reproducer (hardware setup, OCFS2 mount options, VM config) to get a clear picture of when the OCFS2 I/O error is triggered:

Code:
ocfs2_dio_end_io:2421 ERROR: Direct IO failed, bytes = -5

I couldn't reproduce the error on a local ocfs2 setup (see [0] for the setup steps) with a VM having a SCSI-attached qcow2 drive image.

[0] http://gurubert.de/ocfs2_io_uring.html
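For reference, a quick single-node test volume can be thrown together roughly like this (a sketch only, assuming ocfs2-tools is installed; a volume formatted for local mounting bypasses the o2cb cluster stack, so it may not exercise exactly the same code paths as the shared FC setup described below):

Code:
truncate -s 10G /var/tmp/ocfs2-test.img
LOOPDEV=$(losetup -f --show /var/tmp/ocfs2-test.img)
mkfs.ocfs2 -L ocfs2test -M local $LOOPDEV
mkdir -p /mnt/ocfs2test
mount -t ocfs2 $LOOPDEV /mnt/ocfs2test
pvesm add dir ocfs2test --path /mnt/ocfs2test --content images,iso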
 
This is my hardware setup:
- Hitachi G600 SAN storage
- 3 Quanta servers, each connected to the storage with redundant Emulex FC adapters

And my software setup:
- Multipath is in use on each server
- OCFS2 on an FC LUN mounted on each server, with these mount options (an fstab sketch follows after this list):
rw,relatime,_netdev,heartbeat=global,nointr,data=ordered,errors=remount-ro,atime_quantum=60,cluster_stack=o2cb,coherency=full,user_xattr,acl,_netdev
- Mount-path of OCFS2 added to Proxmox as directory storage (shared)
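For completeness, wiring such a volume into PVE typically looks roughly like this (a sketch with placeholder device and path names; the option list is taken from the mount output above, minus entries like rw, relatime and cluster_stack that the kernel reports on its own rather than being passed explicitly):

Code:
# /etc/fstab
/dev/mapper/g600_data1  /mnt/ocfs2/g600_data1  ocfs2  _netdev,nointr,heartbeat=global,data=ordered,errors=remount-ro,atime_quantum=60,coherency=full,user_xattr,acl  0 0

# register the mount point as shared directory storage
pvesm add dir g600_data1 --path /mnt/ocfs2/g600_data1 --content images,iso --shared 1 --is_mountpoint yes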

And the virtual disk config of the VM:
virtio0: g600_data1:200/vm-200-disk-0.qcow2,discard=on,iothread=1,size=20G

If I start the VM, it seems to boot at first (services get started), but then the described errors appear in the kernel log and the VM hangs. It does not matter which OS (Linux, Windows, ...) is used within the VM.

If I change Async IO to native, everything works as expected:
virtio0: g600_data1:200/vm-200-disk-0.qcow2,aio=native,discard=on,iothread=1,size=20G

Unfortunately, there are no other messages in the kernel log related to this issue.
 
I switched to 'scsi', but that does not change anything. The VM hangs during boot and the kernel log shows exactly the same errors as before. This also happens if SSD emulation is disabled.

scsi0: g600_data1:200/vm-200-disk-0.qcow2,discard=on,iothread=1,size=20G,ssd=1
scsihw: virtio-scsi-single



If I switch to aio=native, the VM boots without any issues.
It doesn't seem to matter whether 'scsi' or 'virtio' is in use.
 
Thanks for your quick answer and support, @dakralex. I've had this issue for about a year and I'm very grateful that you're taking a look at it.
Let me know if I can help in any way.
 
