Problem with PVE 8.1 and OCFS2 shared storage with io_uring

AZ129

New Member
Feb 21, 2023
2
1
3
Hey everyone,
I have a problem with OCFS2 shared storage after upgrading one of the cluster nodes from 8.0 to 8.1.
The shared volume is mounted correctly on the upgraded node. VMs can be started, but they don't work correctly because the file system inside the VM goes read-only.
These error messages appear in the host's syslog:
Code:
kernel: (kvm,85106,7):ocfs2_dio_end_io:2423 ERROR: Direct IO failed, bytes = -5
kernel: (kvm,85106,7):ocfs2_dio_end_io:2423 ERROR: Direct IO failed, bytes = -5
kernel: (kvm,85106,7):ocfs2_dio_end_io:2423 ERROR: Direct IO failed, bytes = -5

The VM uses a SCSI controller in VirtIO SCSI single mode, with iothread=1 and aio=io_uring.

I've found a workaround: switching to aio=threads. The VMs work fine now, but I'd be grateful for advice, because as far as I know that is not the best setting for performance.
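For anyone applying the same workaround, a minimal sketch of changing the drive option from the CLI (VMID 100 and the volume name are placeholders for your own setup; keep your other drive options and only change aio, and note that the new setting takes effect after the VM has been fully stopped and started again):

Code:
# placeholder VMID and volume; adjust to your own drive line
qm set 100 --scsi0 my-ocfs2-dir:100/vm-100-disk-0.qcow2,aio=threads,iothread=1
# verify the resulting drive line
qm config 100 | grep ^scsi0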
 
Unfortunately the setup you have is not officially supported, and hence not tested by anyone regularly.

I would check the following things:
Is the "broken" behavior only on 8.1 node? It sounds like you still have 8.0 node in the cluster? If you do, does the VM work when migrated there?
Depending on the outcome of the test above, compare the package versions between the two. Are there any major differences in system, OCFS, kernel versions?
I assume your VM images are in QCOW format? You can experiment with mounting them directly. Another option is to boot VM with aio=threads and add a new disk with aio=io_uring, to see if there is a difference in behavior.
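A minimal sketch of adding such a throwaway test disk from the CLI; the VMID 100, the storage name "shared-ocfs2", and the 4 GiB size are placeholders:

Code:
# allocates a new 4 GiB qcow2 volume on the (assumed) directory storage and attaches it with io_uring
qm set 100 --scsi1 shared-ocfs2:4,format=qcow2,aio=io_uring,iothread=1
# then write to the new disk inside the guest and watch the host syslog for ocfs2_dio_end_io errors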

In the end, you may need to collect detailed data from the working and the non-working node and submit the findings at bugzilla.proxmox.com, and perhaps also to the OCFS2 maintainers; that may mean the Debian or Ubuntu kernel mailing lists. The most helpful information would be a step-by-step reproduction of the issue. For this you may want to install PVE in a VM and record all the changes that lead to the broken installation.

Good luck


 
Is the "broken" behavior only on 8.1 node?

Yes, all VMs with aio=io_uring on PVE 8.0.4 work with virtual drives on the shared storage without problems, but not on 8.1.4.

Another option is to boot VM with aio=threads and add a new disk with aio=io_uring, to see if there is a difference in behavior

I added a new virtual disk with aio=io_uring on the node with PVE 8.0.4 and it works fine. Then I migrated the running VM to the host with PVE 8.1.4; writing to the disk with io_uring failed with the same errors in the host log.

Are there any major differences in system, OCFS, kernel versions?

I suspect the culprit is the new kernel, because with the previous version there are no problems with OCFS2.
I'll try to update the remaining nodes to 8.1.4 and the new kernel to check whether it helps.
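To compare the two nodes, something along these lines should show the relevant version differences (pveversion, uname and modinfo are standard tools; the grep patterns are just a suggestion):

Code:
pveversion -v | grep -E 'proxmox-ve|kernel|qemu'
uname -r
modinfo ocfs2 | grep -E '^(filename|vermagic)'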
 
If I knew that the older version works but the newer one doesn't, I wouldn't rush to update everything. But perhaps you have your reasons.
If I were you, I'd open a bug at https://bugzilla.proxmox.com/ with a detailed description: the versions from the working and non-working nodes, configuration details, steps to reproduce the problem, logs showing a good and a bad boot, etc.
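A rough sketch of how that data could be collected (VMID 100 is a placeholder; run on both the working and the non-working node and attach the files to the report):

Code:
pveversion -v > pveversion-$(hostname).txt
qm config 100 > vm-100-$(hostname).conf
journalctl -b -k | grep -i ocfs2 > ocfs2-kernel-$(hostname).log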



 
I have a similar problem. I use OCFS2 for the connection to two SAN systems via FC. Currently, only one node is connected to the system. It's a fresh install of 8.1.3. I tried the older kernel 6.2.16-19-pve instead of 6.5.11-4-pve, but it didn't change anything.

Code:
Feb 02 09:09:01 ve05 kernel: (kworker/3:2,835,3):ocfs2_dio_end_io:2423 ERROR: Direct IO failed, bytes = -11
Feb 02 09:10:10 ve05 kernel: (kworker/3:2,835,3):ocfs2_dio_end_io:2423 ERROR: Direct IO failed, bytes = -11

As with AZ129, setting the VMs to aio=threads helps.
 
Same here with kernels 6.8 and 6.5, but it works with kernel 6.2.16-20-pve (new installation of Proxmox 8.2.2).
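If the older kernel is the only thing that works for now, it can be kept as the default across reboots. A sketch, assuming the system boots via proxmox-boot-tool (the version string is the one mentioned above):

Code:
proxmox-boot-tool kernel list
proxmox-boot-tool kernel pin 6.2.16-20-pve
# revert later with: proxmox-boot-tool kernel unpin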
 
Anything new here? As far as I can tell as a "beginner", this problem also affects ISO CD-ROM files and cloud-init drives, neither of which can be set to aio=threads, AFAIK.
 
I tested with kernel 6.8.12-3 and it only partially fixes the issue:
  • Booting from an ISO file on the OCFS2 partition works
  • Booting from a QCOW2 image on OCFS2 does not work
Thanks for pointing this out in the referenced thread! As you mentioned, it seems the problem hasn't been fully fixed by the io_uring patch. As for the mentioned thread, it would help to have a more exact reproducer (hardware setup, OCFS2 mount options, VM config) to get a clear picture of when the OCFS2 I/O error is triggered:

Code:
ocfs2_dio_end_io:2421 ERROR: Direct IO failed, bytes = -5

I couldn't reproduce the error on a local ocfs2 setup (see [0] for the setup steps) with a VM having a SCSI-attached qcow2 drive image.

[0] http://gurubert.de/ocfs2_io_uring.html
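For reference, a quick single-node test volume can be thrown together roughly like this (a sketch only, assuming ocfs2-tools is installed; a volume formatted for local mounting bypasses the o2cb cluster stack, so it may not exercise exactly the same code paths as the shared FC setup described below):

Code:
truncate -s 10G /var/tmp/ocfs2-test.img
LOOPDEV=$(losetup -f --show /var/tmp/ocfs2-test.img)
mkfs.ocfs2 -L ocfs2test -M local $LOOPDEV
mkdir -p /mnt/ocfs2test
mount -t ocfs2 $LOOPDEV /mnt/ocfs2test
pvesm add dir ocfs2test --path /mnt/ocfs2test --content images,iso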
 
This is my hardware setup:
- Hitachi G600 SAN storage
- 3 Quanta servers, each connected to the storage with redundant Emulex FC adapters

And my software setup:
- Multipath is in use on each server
- OCFS2 on an FC LUN mounted on each server, with these mount options (an fstab sketch follows after this list):
rw,relatime,_netdev,heartbeat=global,nointr,data=ordered,errors=remount-ro,atime_quantum=60,cluster_stack=o2cb,coherency=full,user_xattr,acl,_netdev
- Mount-path of OCFS2 added to Proxmox as directory storage (shared)
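For completeness, wiring such a volume into PVE typically looks roughly like this (a sketch with placeholder device and path names; the option list is taken from the mount output above, minus entries like rw, relatime and cluster_stack that the kernel reports on its own rather than being passed explicitly):

Code:
# /etc/fstab
/dev/mapper/g600_data1  /mnt/ocfs2/g600_data1  ocfs2  _netdev,nointr,heartbeat=global,data=ordered,errors=remount-ro,atime_quantum=60,coherency=full,user_xattr,acl  0 0

# register the mount point as shared directory storage
pvesm add dir g600_data1 --path /mnt/ocfs2/g600_data1 --content images,iso --shared 1 --is_mountpoint yes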

And the virtual disk config of the VM:
virtio0: g600_data1:200/vm-200-disk-0.qcow2,discard=on,iothread=1,size=20G

If I start the VM, it seems to boot at first (services get started), but then the described errors appear in the kernel log and the VM hangs. It does not matter which OS (Linux, Windows, ...) is used within the VM.

If I change Async IO to native, everything works as expected:
virtio0: g600_data1:200/vm-200-disk-0.qcow2,aio=native,discard=on,iothread=1,size=20G

Unfortunately, there are no other messages in the kernel log related to this issue.
 
I switched to 'scsi', but that does not change anything. The VM hangs during boot and the kernel log shows exactly the same errors as before. This also happens if SSD emulation is disabled.

scsi0: g600_data1:200/vm-200-disk-0.qcow2,discard=on,iothread=1,size=20G,ssd=1
scsihw: virtio-scsi-single



If I switch to aio=native, the VM boots without any issues.
It doesn't seem to matter whether 'scsi' or 'virtio' is in use.
 
Thanks for your quick answer and support, @dakralex. I've had this issue for about a year and I'm very grateful that you're taking a look at it.
Let me know if I can help in any way.
 
