Passing trim from virtual machine to host block device



I think everything runs smoothly, but I would like confirmation that it is really OK. The setup has flash devices (NVMe drives from Samsung and Kioxia) attached directly over PCIe, i.e. without a RAID controller or similar in between (although, as I understand it, a RAID controller plus NVMe setup is rare in general).

root@pve:~# lsscsi -s
[16:0:0:0]   disk    Samsung  Flash Drive      1100  /dev/sda   64.1GB
[N:0:6:1]    disk    SAMSUNG MZQL2960HCJR-00A07__1              /dev/nvme0n1   960GB
[N:1:6:1]    disk    SAMSUNG MZQL2960HCJR-00A07__1              /dev/nvme1n1   960GB
[N:2:1:1]    disk    KCD6XVUL3T20__1                            /dev/nvme2n1  3.20TB
[N:3:1:1]    disk    KCD6XVUL3T20__1                            /dev/nvme3n1  3.20TB
[N:4:1:1]    disk    KCD6XVUL3T20__1                            /dev/nvme4n1  3.20TB
[N:5:1:1]    disk    KCD6XVUL3T20__1                            /dev/nvme5n1  3.20TB

root@pve:~# lspci | grep NVM
81:00.0 Non-Volatile memory controller: KIOXIA Corporation NVMe SSD Controller Cx6 (rev 01)
82:00.0 Non-Volatile memory controller: KIOXIA Corporation NVMe SSD Controller Cx6 (rev 01)
c1:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller PM9A1/PM9A3/980PRO
c2:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller PM9A1/PM9A3/980PRO
c3:00.0 Non-Volatile memory controller: KIOXIA Corporation NVMe SSD Controller Cx6 (rev 01)
c4:00.0 Non-Volatile memory controller: KIOXIA Corporation NVMe SSD Controller Cx6 (rev 01)

PVE storage resources are created in two ways:

1. LVM-based storage (nvme4n1)
2. zpool-based storage (nvme2n1 and nvme3n1)

The virtual machine is created in the ordinary way, except that Discard is checked on the virtual block devices:

scsihw: virtio-scsi-single
virtio0: vmpool1:vm-107-disk-2,discard=on,iothread=1,size=32G
virtio1: si_nvme4n1:vm-107-disk-0,discard=on,size=32G

My concern is whether trim gets through from the virtual machine down to the physical devices. For the LVM-based storage I check it like this:

1. In the virtual machine I create an ext4 filesystem and mount it under /mnt/vdb.
2. At the host's root prompt I start two commands (as a double check; I believe one would be sufficient):

root@pve:~# btrace -a discard /dev/nvme4n1
root@pve:~# sar -d --dev=nvme4n1 1

3. In the virtual machine I copy some data into /mnt/vdb/usr-rm-1 and delete it:

virt# cp -a /usr /mnt/vdb/usr-rm-1
virt# rm -rf /mnt/vdb/usr-rm-1

4. When I run 'fstrim -v /mnt/vdb' in the virtual machine, I see activity on the PVE host in both the btrace and the sar output (as I understand it, the sar 'dkB/s' column counts discards).
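
As a cross-check on the btrace/sar observation in the steps above, the kernel's own discard counters can be compared before and after the fstrim. This is only a minimal sketch, assuming a kernel that exposes discard fields in /sys/block/<dev>/stat (Linux 4.18 or later). The helper names discard_sectors and sectors_to_kib are made up for illustration, and nvme4n1 is the disk behind the LVM-based storage in this post.

```shell
#!/bin/sh
# Print the "sectors discarded" counter of a block device stat file.
# In /sys/block/<dev>/stat this is field 14 (discard fields exist since Linux 4.18).
discard_sectors() {
    awk '{ print $14 }' "$1"
}

# Convert a sector count from the stat file to KiB; the stat file always
# counts in 512-byte sectors, regardless of the device's real sector size.
sectors_to_kib() {
    echo $(( ${1:-0} * 512 / 1024 ))
}

# nvme4n1 is the disk behind the LVM storage in this thread; override DEV as needed.
DEV=${DEV:-nvme4n1}
STAT=/sys/block/$DEV/stat

# Only touch the real device if it exists on this machine.
if [ -r "$STAT" ]; then
    echo "$DEV discarded so far: $(sectors_to_kib "$(discard_sectors "$STAT")") KiB"
fi
```

Running it once before and once after the fstrim in the guest and comparing the two numbers should show whether discards reached the physical device. Inside the guest, 'lsblk --discard' reporting non-zero DISC-GRAN and DISC-MAX values for the virtual disk is a quick indication that the disk advertises discard at all.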

I tried to check the zpool-based storage in the same way (there the virtual machine's block device is a ZFS zvol on the host), but I do not see any activity in the btrace or sar output. I only see activity when I explicitly run this at the host's root prompt:

root@pve:~# zpool trim vmpool1

and after this I see:

root@pve:~# zpool status vmpool1 -t
  pool: vmpool1
 state: ONLINE

        NAME                                  STATE     READ WRITE CKSUM
        vmpool1                               ONLINE       0     0     0
          mirror-0                            ONLINE       0     0     0
            nvme-KCD6XVUL3T20_Y2B0A00VT4T8    ONLINE       0     0     0  (94% trimmed, started at Tue 05 Sep 2023 01:59:39 AM EEST)
            nvme-KCD6XVUL3T20_Y2B0A00XT4T8_1  ONLINE       0     0     0  (94% trimmed, started at Tue 05 Sep 2023 01:59:39 AM EEST)

errors: No known data errors


root@pve:~# sar -d --dev=nvme2n1,nvme3n1 1

01:59:47 AM       DEV       tps     rkB/s     wkB/s     dkB/s   areq-sz    aqu-sz     await     %util
01:59:48 AM   nvme3n1    609.00      0.00   1672.00 42459136.00  69722.18      0.51      0.83     99.60
01:59:48 AM   nvme2n1    611.00      0.00   1672.00 42459136.00  69493.96      0.51      0.83    100.40

01:59:48 AM       DEV       tps     rkB/s     wkB/s     dkB/s   areq-sz    aqu-sz     await     %util
01:59:49 AM   nvme3n1    617.00      0.00   1732.00 24334576.00  39442.96      0.50      0.81     99.20
01:59:49 AM   nvme2n1    621.00      0.00   1732.00 24203504.00  38977.84      0.51      0.81     99.60

I would like to ask whether this is the expected trimming behavior. (I believe the zpool trimming normally happens thanks to a cron job that runs once per day or so, '/etc/cron.d/zfsutils-linux -> /usr/lib/zfs-linux/trim'.) I would also very much welcome suggestions for more appropriate ways to handle trimming on PVE.
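
To see at a glance whether and when each pool device was last trimmed, the 'zpool status -t' output can be condensed to one line per device. This is a rough sketch only: trim_summary is a made-up helper name, and the pattern assumes the annotations look like the ones shown above ("94% trimmed, started at ...") or like "untrimmed" / "trim unsupported".

```shell
#!/bin/sh
# Condense `zpool status -t` output (read from stdin) to "device: trim state".
trim_summary() {
    # Keep only vdev lines that carry a trim annotation in parentheses.
    awk '/\((untrimmed|trim unsupported|[0-9]+% trimmed)/ {
        state = $0
        sub(/^[^(]*\(/, "", state)   # drop everything up to the opening parenthesis
        sub(/\).*$/, "", state)      # drop the closing parenthesis and what follows
        print $1 ": " state
    }'
}

# Pool name from this thread; override POOL as needed.
POOL=${POOL:-vmpool1}

# Only query the pool if ZFS and the pool exist on this machine.
if command -v zpool >/dev/null 2>&1 && zpool list "$POOL" >/dev/null 2>&1; then
    zpool status -t "$POOL" | trim_summary
    # Shows whether the pool trims freed space continuously on its own;
    # "zpool set autotrim=on <pool>" would be the alternative to periodic trims.
    zpool get autotrim "$POOL"
fi
```

The actual schedule of the periodic trim can be read directly from /etc/cron.d/zfsutils-linux on the host.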

Best regards,

I would also very much welcome suggestions for more appropriate ways to handle trimming on PVE.
First: AFAIK, trimming is no longer necessary if you have proper enterprise hardware. I have not trimmed my enterprise SSDs in a decade and they still work as expected. Trim support was only added to ZFS on Linux fairly recently (version 0.8, released in 2019).

Trimming with ZFS works somewhat differently, because there are two levels. Trimming a zvol (for a filesystem dataset it makes no sense) means giving space back to the zpool, and the deleted data really is zeroed. This is independent of the underlying storage backend and always works (also on hard disks). It is crucial for getting the most out of thin provisioning. Then there is zpool trimming, which is what you expect: it trims the underlying storage backend.
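
The first level (space going back to the pool) can be checked without btrace or sar by comparing the zvol's 'referenced' property before and after running fstrim in the guest. The following is a hedged sketch: the dataset path vmpool1/vm-107-disk-2 is an assumption derived from the VM config in the first post, and zvol_referenced and freed_bytes are illustrative helper names.

```shell
#!/bin/sh
# Assumed dataset path (storage "vmpool1", volume "vm-107-disk-2"); adjust as needed.
ZVOL=${ZVOL:-vmpool1/vm-107-disk-2}

# Print the bytes currently referenced by a dataset (exact number, no header).
zvol_referenced() {
    zfs get -Hp -o value referenced "$1"
}

# Print how many bytes were released between two readings (0 if usage grew).
freed_bytes() {
    before=$1
    after=$2
    if [ "$before" -gt "$after" ]; then
        echo $(( before - after ))
    else
        echo 0
    fi
}

# Only query ZFS if the tools and the dataset exist on this machine.
if command -v zfs >/dev/null 2>&1 && zfs list "$ZVOL" >/dev/null 2>&1; then
    echo "$ZVOL referenced: $(zvol_referenced "$ZVOL") bytes"
fi
```

If the value drops after the guest's fstrim, the discard reached the zvol even though nothing shows up on the NVMe devices until the pool itself is trimmed.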

Another point is virtio. I was under the impression that, at least at some point in the past, discard was not possible on virtio and you had to use SCSI with virtio-scsi in order to trim properly. I am not sure whether this is still the case.
I was under the impression that, at least at some point in the past, discard was not possible on virtio and you had to use SCSI with virtio-scsi in order to trim properly. I am not sure whether this is still the case.
It got added with QEMU 4.0, four years ago.