NAS iSCSI LUN with LVM-shared on top - bad IO while a VM delete is ongoing

jsk73

Mar 23, 2026
Hello everyone,

My homelab Proxmox cluster is a 2-node setup:

- I have a TrueNAS box and exported an HDD and an SSD iSCSI LUN to both Proxmox nodes.
- iSCSI is configured with multipath over 3 paths (separate physical NICs, 1Gbps each):

Code:
root@pve1:~# multipath -ll
mpathb (36589cfc0000000daa067606013d13fe0) dm-6 TrueNAS,iSCSI Disk
size=730G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
  |- 6:0:0:1 sdb 8:16 active ready running
  |- 7:0:0:1 sdd 8:48 active ready running
  `- 8:0:0:1 sdc 8:32 active ready running
mpathc (36589cfc000000b1c40188c025655bf67) dm-7 TrueNAS,iSCSI Disk
size=2.8T features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
  |- 9:0:0:0  sde 8:64 active ready running
  |- 10:0:0:0 sdf 8:80 active ready running
  `- 11:0:0:0 sdg 8:96 active ready running

- Then I created 2 shared LVM storages on top of those 2 LUNs, with snapshot chains allowed, "Shared" enabled, "Wipe Removed Volumes" enabled, and saferemove_throughput set to 1Gbps.
- VM disks: OS on the SSD LUN (SATA SSDs), data on the HDD LUN, all in qcow2 format.
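For reference, my storage definitions look roughly like this (storage IDs and VG names are my own; the `snapshot-as-volume-chain` option name is what I have in my config, double-check it against your PVE version):

```
# /etc/pve/storage.cfg (excerpt, illustrative)
lvm: ssd-lvm
        vgname vg-ssd
        content images
        shared 1
        saferemove 1
        saferemove_throughput 125000000
        snapshot-as-volume-chain 1

lvm: hdd-lvm
        vgname vg-hdd
        content images
        shared 1
        saferemove 1
        saferemove_throughput 125000000
        snapshot-as-volume-chain 1
```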

Everything works fine, until I noticed a problem: when I delete a VM, because "Wipe Removed Volumes" is enabled, Proxmox zeroes out the deleted LV. But even with the throughput limit set to 1Gbps, it took about 5 minutes to wipe a 20G disk on the SSD LUN.
And during this time (while Proxmox zeroes out the deleted disk), all other VMs, on the same or the other host, get very bad IO lag; one VM even remounted its filesystem read-only.
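A quick back-of-the-envelope calculation (my own numbers, assuming the wipe is limited by a single 1Gbps path):

```python
# Rough wipe-time estimate at a given link speed.
# Assumption (mine, not from Proxmox docs): the wipe traffic
# saturates one 1Gbps path rather than spreading across all three.

def wipe_time_seconds(disk_gib: float, link_gbps: float) -> float:
    disk_bytes = disk_gib * 1024**3
    link_bytes_per_s = link_gbps * 1e9 / 8  # Gbps -> bytes/s
    return disk_bytes / link_bytes_per_s

# 20 GiB over 1 Gbps should take ~172 s (~2.9 min),
# so an observed ~5 min means an effective rate of only ~68 MiB/s.
print(round(wipe_time_seconds(20, 1)))  # ~172 seconds
print(round(20 * 1024 / 300))           # effective MiB/s for a 5-min wipe, ~68
```

So the observed 5 minutes is noticeably slower than even a single saturated 1Gbps link would explain.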

I know that multipath over 1Gbps links is not great, but should deleting a 20G disk really take that long? It even affects the performance of the other VMs.

Is anyone with this kind of storage setup seeing the same issue? I don't have the equipment to test on a higher-bandwidth network yet, but I wonder: would the issue be less noisy with 10Gbps or 25Gbps NICs? Should I lower the throughput limit to about 50% of the NIC bandwidth?

I'm evaluating a solution for my team to migrate from VMware to Proxmox, and we have to keep using our current SAN storage. But I've been stuck on this problem in my homelab for weeks now without a solution. Could someone test or help with this, please?

Thanks and Regards.
The current code is:

Code:
my $secure_delete_cmd = sub {
        my ($lvmpath) = @_;

        my $stepsize = $scfg->{'saferemove-stepsize'} // 32;
        $stepsize = $stepsize * 1024 * 1024;

        my $bdev = abs_path($lvmpath);

        my $sysdir = undef;
        if ($bdev && $bdev =~ m|^/dev/(dm-\d+)|) {
            $sysdir = "/sys/block/$1";
        } else {
            warn "skip zero-out for volume '$lvmpath' - no device mapper link\n";
            return;
        }

        my $write_zeroes_max_bytes =
            file_read_firstline("$sysdir/queue/write_zeroes_max_bytes") // 0;
        ($write_zeroes_max_bytes) = $write_zeroes_max_bytes =~ m/^(\d+)$/; #untaint

        if ($write_zeroes_max_bytes == 0) {
            # If the storage does not support 'write zeroes', we fallback to cstream.
            # wipe throughput up to 10MB/s by default; may be overwritten with saferemove_throughput
            my $throughput = '-10485760';
            if ($scfg->{saferemove_throughput}) {
                $throughput = $scfg->{saferemove_throughput};
            }

            my $cmd = [
                '/usr/bin/cstream',
                '-i',
                '/dev/zero',
                '-o',
                $lvmpath,
                '-T',
                '10',
                '-v',
                '1',
                '-b',
                '1048576',
                '-t',
                "$throughput",
            ];
            eval {
                run_command(
                    $cmd,
                    errmsg => "zero out finished (note: 'No space left on device' is ok here)",
                );
            };

So, by default, if your storage supports the "write zeroes" feature, Proxmox tells the storage to zero the volume range by range, in chunks of saferemove-stepsize (32MB by default).

saferemove_throughput is only used in the fallback, when the storage does not support write zeroes (in that case Proxmox really does send zeros over the network, which is where the throughput limit applies).

Maybe try reducing the saferemove-stepsize value if you are in the first case.