Thanks @RoCE-geek for your extensive testing.
This might appear to be true, but I don't think that is actually the case. I do acknowledge though, that you are trying to establish an effective workaround in order to confidently use the product in production.
I should mention aio=native did not work for me. I had to use aio=threads, per https://bugzilla.kernel.org/show_bug.cgi?id=199727. In combination with VirtIO SCSI Single, this resulted in better performance for my workloads. YMMV I guess.
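For anyone wanting to try the same combination, these settings can be applied per VM on Proxmox. A minimal sketch only, assuming a VM with ID 100 and a disk volume named `local-lvm:vm-100-disk-0` (both are placeholders for illustration; adjust to your environment):

```shell
# Assumed VM ID (100) and storage volume name; adjust to your setup.
# Select the VirtIO SCSI single controller (one controller per disk):
qm set 100 --scsihw virtio-scsi-single

# Reattach the disk with aio=threads and a dedicated iothread,
# instead of the default aio backend:
qm set 100 --scsi0 local-lvm:vm-100-disk-0,aio=threads,iothread=1
```

The `iothread=1` option pairs naturally with VirtIO SCSI Single, since each disk then gets its own controller and I/O thread.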
Bold claim imho. It's worth mentioning the comments in GitHub issue #756 noting that the issue is not seen in RH and Nutanix environments; however, Jon Kohler's comment mentions Nutanix's use of custom "host data path plumbing", which I think is somewhat telling.
Comment #14 in the kernel bug report mentions: "aio=threads avoids softlockups because the preadv(2)/pwritev(2)/fdatasync(2) syscalls run in worker threads that don't take the QEMU global mutex. Therefore vcpu threads can execute even when I/O is stuck in the kernel due to a lock."
To me, the root cause probably lies in the Debian (and maybe Proxmox) implementation of virtio in relation to the QEMU global mutex. The driver issue might be coincident with a change not implemented in Debian (+Proxmox?) but implemented in RHEL, i.e. the driver might depend on a capability not present in Debian (+Proxmox?). As far as I'm aware, the driver is platform agnostic and probably does not interrogate for hypervisor capabilities.
Alternatively, there may have been a change in the Debian implementation coincident with the Bullseye release (as I mentioned above). IIRC, the Bullseye release falls between the 0.1.204 and 0.1.208 driver releases. Such a change may not be present in the RHEL implementation and thus not considered in the driver implementation. Similarly, the issue appears between RHEL releases 8.4 and 8.5.
Hi @benyamin, I'm very happy for your input, so let me add some comments.
Yes, you're absolutely right, and I'm usually skeptical of such statements too.
But let's do some short recap:
- This problem is not new. It has been around in the PVE community for more than two years, although in fragmented / isolated reports.
- I've found approx. 10 threads regarding "Desc next is 3", "virtio: zero sized buffers are not allowed" or "Reset to device".
- Affected people are sad, often even desperate. Although there are others willing to help, I've seen many more-or-less strange pieces of advice on how to solve it, or at least mitigate it. That was a first warning sign for me, as there is a mix of unrelated tips, often confusing, often illogical. But at least everyone is trying to help, and that's very valuable. Still, it's hard for any newcomer to this problem - it's a mess.
- I've checked all the relevant threads known to me, including the github ones and even those with more general issues.
- And to be honest, many posts seemed to me like a discussion of the members of the gentlemen's "Old England Club".
It doesn't make sense to me that too many smart people are discussing too long about this problem without any serious resolution.
It's full of assumptions, speculations, impressions and suspicions, but there are very few (more or less hard) facts. That was another serious warning for me.
- After more than 20 years in enterprise IT, I've learned that endless theorizing, albeit in the best spirit and with sincere motivation, has never delivered a solution. I simply got the feeling that nobody wants to get their hands dirty.
So I decided to "solve it" my way, as usual. And I have one secret weapon: in the "bug hunting" world, I expect nothing, but I'm ready for anything. In other words, when I'm deep in a topic, I don't care about the opinions of others. I'm immune to (even my own) feelings, impressions, imaginations, etc.
I just need some measure, some numbers to compare, and then I do the hard work. So only "incidence analysis" is what counts.
When I realized that I'd found a "1/0" switch (in my environment), I had no emotions, but I had many doubts about whether it was enough. So I tried all the harder to shoot this conclusion down, but no way. It's fully reproducible on my side and it's causal. The driver's version rules.
If you found that 0.1.208 is (super) stable, and the closest higher version (0.1.215) is defective, what would you do?
I've tried some other versions < 0.1.208 and others > 0.1.215, but the conclusion was the same.
The "sweet spot" between 0.1.208 and 0.1.215 is simply there and I can't ignore this.
And it was the same in 2nd round, 3rd round, etc.
I don't presume to speculate, just to summarize the "facts" (although I can still have some doubts).
And the only "facts" I have are those I've described. I don't know if the solution is in the diff between 0.1.208 and 0.1.215, but I'm quite sure that there is a key there - a key to some definitive solution (or at least an explanation).
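For anyone who wants to look for that key themselves, the vioscsi sources live in the upstream kvm-guest-drivers-windows repository, so the changes between the two builds can be listed directly. This is only a sketch: the exact tags or commits corresponding to the 0.1.208 and 0.1.215 ISO builds are placeholders here and would need to be looked up in the repository.

```shell
# Clone the upstream Windows virtio driver sources:
git clone https://github.com/virtio-win/kvm-guest-drivers-windows.git
cd kvm-guest-drivers-windows

# List commits touching the vioscsi driver between the two builds.
# <tag-0.1.208> and <tag-0.1.215> are placeholders for the real
# tags/commits matching those virtio-win ISO releases:
git log --oneline <tag-0.1.208>..<tag-0.1.215> -- vioscsi/
```

Bisecting that range (`git bisect` over the vioscsi commits) would be the natural next step once a reliable reproducer exists.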
> I should mention aio=native did not work for me.

It's not in contradiction with my findings. I've stated that this is just a mitigation.
The only stable solution (for me) is the vioscsi downgrade to 0.1.208, or switching to VirtIO Block.
Last but not least: I'm open to any meaningful solution, or any incidence analysis from others.
If the tip with vioscsi 0.1.208 doesn't work for others, that's absolutely OK, but I'm not aware of more such "dirty hands".
But still, I can't ignore my findings, although I know very well that demonstrations like "1/0 switch" are usually rare and suspicious.
I have deep respect for anyone who is working hard on this problem, but as for me, it's really time for a solution, not for more speculation.
I apologize if I've oversimplified some things in this post; the goal was not to disparage anyone (much less you), but to explain my way of thinking and problem solving.
So let's move ahead toward a more promising future.