I want to like VirtIOFS, but...

I am not informed enough to recommend this, and I wouldn't use it on data I'm not ready to lose. I am still using cache=auto, which I set up half a year ago with hook scripts. Compared to always, auto loses about 21% performance using OP's fio command.

If NFS has no issues for you, then I don't see any reason to migrate, as you won't get better performance with virtiofs.
Thanks for this. I was just about to post something similar.

I've never messed with the default (no cache) for my VirtIO SCSI disks, so I'm not really clear on what switching the cache to always for VirtIO FS actually means. What is that actually doing at the filesystem level? Does it actually change whether ZFS is working in async/sync mode?
 
Does it actually change whether ZFS is working in async/sync mode?
I did a bit more research; it seems the cache policy mode is about metadata and paths... not data? As such I assume it just affects reads of metadata and paths... not sure how that would affect sync/async (which I thought was about writes... but I am still new to ZFS on my TrueNAS server and just have Ceph RBD and LVM on my Proxmox).

  • cache=always: Metadata, data, and pathname lookup are cached in the guest and never expire.
  • cache=auto: Metadata and pathname lookup cache expires after a configured amount of time (default is 1 second).
  • cache=none: Forbids the FUSE client from caching to achieve the best coherency at the cost of performance.
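
For reference, here is roughly where that cache policy lives, assuming the Rust virtiofsd with placeholder paths and tag names (Proxmox generates its own virtiofsd command line, so this is only a sketch; newer builds spell the no-cache value never instead of none):

Code:
# host side: virtiofsd exports a directory with a given cache policy
/usr/libexec/virtiofsd --socket-path=/run/virtiofsd-vm100.sock \
    --shared-dir /srv/share --cache auto   # or: always / never (older builds: none)

# guest side: the share is mounted by its tag, independent of the cache policy
mount -t virtiofs myshare /mnt/myshare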

Looking at https://github.com/kata-containers/runtime/issues/2748 it seems it depends on what is writing to the file system (host vs guest) - it looks like if the host changes the metadata or a path then virtioFS gets 'funky' - so it's only safe to use always where it is guaranteed that only the guest will write to the virtioFS share, and that is the only way the paths and metadata on the storage backing the virtioFS can change.

So for my CephFS-backed virtioFS it would seem 'always' could be an issue in case another node changes the metadata or paths in the CephFS, but the only scenario where I could see this being an issue is as follows; this is how I interpret it based on the GitHub link above:

  1. Swarm container FOO is running on docker01 on pve1 - everything works fine with always.
  2. Swarm container FOO moves from vm-docker01 on pve1 to vm-docker02 on pve2 - things will be fine if virtioFS there has never cached the metadata or paths; FOO then makes changes to these.
  3. Swarm container FOO moves from vm-docker02 on pve2 back to vm-docker01 on pve1 - now, because virtioFS on pve1 hasn't seen the VM write the changes to metadata/paths, it will provide the wrong (stale) metadata to the VM.
I don't know how the processes in the container would respond at that point... I won't bother testing; I will just leave it at auto based on this!
 
I think one major benefit of virtiofs over NFS is that you don't have to worry about doing your writes sync (and using a SLOG for security)?
 
Having the same performance issues.

Mount on the host gets 36k read IOPS / 19.6k write IOPS with OP's command.
In the guest, mounted with virtiofs, I only get 3576 read IOPS / 1928 write IOPS.
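
For context, a representative 4k random read/write fio job for this kind of host-vs-guest comparison (parameters are illustrative, not necessarily OP's exact command; run it once against the host mount and once against the virtiofs mount in the guest):

Code:
fio --name=randrw-4k --filename=/mnt/share/fio.test --size=1G \
    --rw=randrw --bs=4k --iodepth=32 --direct=1 --ioengine=libaio \
    --runtime=30 --time_based --group_reporting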

Big oof if you ask me.
 
VirtioFS is intended as a shared user-space file system between multiple containers. I believe it is still mostly single threaded, and thus using it in VMs has significant overhead from context switching and synchronizing file system semantics. There are improvements being built, such as shared memory pages between host and guest and multi-queue support in Linux, but it will take a lot of work for those to percolate down.

Whether it is “better” than NFS or CephFS or a block device depends on your needs.
 
Hi,
a user mentions that setting a custom option --thread-pool-size improved the situation for them: https://bugzilla.proxmox.com/show_bug.cgi?id=6370

If you want to test whether that helps with your performance issues too, and are not afraid to get your hands dirty, you'll need to set it manually in the Perl code for now (in /usr/share/perl5/PVE/QemuServer/Virtiofs.pm, then run systemctl reload-or-restart pvedaemon.service pveproxy.service pvescheduler.service), or replace the binary with a wrapper. Note that messing up the code will lead to errors; you can reinstall with apt install --reinstall qemu-server virtiofsd. Both kinds of change will get lost after updates, so this is just for testing. Check with ps aux whether the virtiofsd process is then actually started with the additional parameter.
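
A minimal sketch of the wrapper variant, assuming the binary lives at /usr/libexec/virtiofsd (the path and the pool size of 16 are placeholders; a reinstall or update of virtiofsd will put the original binary back):

Code:
mv /usr/libexec/virtiofsd /usr/libexec/virtiofsd.real
cat > /usr/libexec/virtiofsd <<'EOF'
#!/bin/sh
# forward whatever arguments Proxmox generates, plus the custom thread pool size
exec /usr/libexec/virtiofsd.real "$@" --thread-pool-size 16
EOF
chmod +x /usr/libexec/virtiofsd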
 
It was actually me who filed that bug :).
It only reduced (note reduced, as it still craps the bed a bit too often :/) the number of hangs in the VMs.
It didn't improve performance, sadly.
I did try different thread pool sizes (1/4 of total cores, 1/2, 1/1, ...) but there wasn't much difference.
 
My experience is that virtiofs bridging of a CephFS mount is significantly faster than mounting a CephFS volume in the VM (I haven't tried RBD yet).

Now, it could be that I have messed up using the CephFS kernel driver to mount the volume across the loopback network, not sure. I am in the middle of a post to ask for help on that, but here is a preview; this is the same volume mounted two different ways.

virtioFS passing cephFS from the host mount into a guest mount
Code:
| Test           | Read BW    | Write BW   | Read IOPS | Write IOPS |
|----------------|------------|------------|-----------|------------|
| seqwrite-1M    | 0          | 2145MiB/s  | -         | 2145       |
| seqread-1M     | 3488MiB/s  | 0          | 3488      | -          |
| randrw-4k      | 118MiB/s   | 50.6MiB/s  | 30.2k     | 13.0k      |

libceph in the VM accessing the same CephFS volume across the loopback network (in theory the same way the host is accessing the CephFS)
Code:
| Test           | Read BW    | Write BW   | Read IOPS | Write IOPS |
|----------------|------------|------------|-----------|------------|
| seqwrite-1M    | 0          | 984MiB/s   | -         | 983        |
| seqread-1M     | 1601MiB/s  | 0          | 1601      | -          |
| randrw-4k      | 5899KiB/s  | 2518KiB/s  | 1474      | 629        |

(sudo mount -t ceph :/ /mnt/docker-libceph -o name=docker-cephfs,secretfile=/etc/ceph/docker-cephFS.secret,conf=/etc/ceph/ceph.conf,fs=docker - not sure if there are params I should set to make it faster?)
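
For what it's worth, mount.ceph does document options that influence I/O and readahead sizes (rsize, wsize, rasize); whether they help here is untested, and the value below is purely illustrative:

Code:
# illustrative only: raise the readahead window on the kernel CephFS mount
sudo mount -t ceph :/ /mnt/docker-libceph \
    -o name=docker-cephfs,secretfile=/etc/ceph/docker-cephFS.secret,conf=/etc/ceph/ceph.conf,fs=docker,rasize=16777216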

I suspect in the first test I am seeing some aspects of metadata and other caching provided by virtioFS and possibly QEMU. Output is from a script 'I' wrote to wrap fio and make it easy for me to run repeatable tests and summarize output, so it's possible I also messed that up and the tests are bad.
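
For reference, a cut-down sketch of that kind of fio wrapper (not the exact script that produced the tables above; sizes, runtimes and queue depths are illustrative):

Code:
#!/bin/bash
# run three fio jobs against a target directory and print their IOPS/bandwidth summary lines
TARGET=${1:-/mnt/test}/fio.bin
run() {   # usage: run <job-name> <extra fio args...>
    local name=$1; shift
    fio --name="$name" --filename="$TARGET" --size=4G --runtime=30 --time_based \
        --direct=1 --ioengine=libaio --group_reporting "$@" \
        | grep IOPS | sed "s/^/$name: /"
}
run seqwrite-1M --rw=write  --bs=1M --iodepth=8
run seqread-1M  --rw=read   --bs=1M --iodepth=8
run randrw-4k   --rw=randrw --bs=4k --iodepth=32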

I continue to investigate the loopback network I have for the VM to see if I made some dumb mistake there...

Haven't seen any issues with hangs yet.

--edit--
No mystery after all: the first result is definitely due to virtiofs metadata caching and whatever QEMU does. This is the same test run on the Proxmox host (why I didn't think to do that before... sigh): the host connecting to CephFS gets similar results to the VM.


Code:
| Test           | Read BW    | Write BW   | Read IOPS | Write IOPS |
|----------------|------------|------------|-----------|------------|
| seqwrite-1M    | 0          | 853MiB/s   | -         | 852        |
| seqread-1M     | 1755MiB/s  | 0          | 1755      | -          |
| randrw-4k      | 6401KiB/s  | 2739KiB/s  | 1600      | 684        |
 
A filesystem always does write buffering before flushing, plus read-ahead, which fetches the next file blocks before the application asks for them.
That isn't the case with block storage, as you cannot foresee which blocks will be needed next.
 
ElectronicsWizardry did some storage testing that really helped me get a handle on what to expect out of VirtIOFS right now.
See: https://www.youtube.com/watch?v=d_zlMxkattE

I actually see this as very useful for getting small files from the host to a container without needing to set up NFS or SMB. That's a bit overkill just to copy in an 8KB configuration file or even a whole home directory's worth of config dotfiles. And moving a ton of small files around doesn't need the best sustained throughput.
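
Once the share is attached to the VM, picking it up in the guest is a one-liner; for example, assuming the share was given the tag hostconfig on the Proxmox side (tag and mount point are placeholders):

Code:
# one-off mount inside the guest
mount -t virtiofs hostconfig /mnt/hostconfig

# or persistently via /etc/fstab
hostconfig  /mnt/hostconfig  virtiofs  defaults,nofail  0  0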
 