SAS disk array and discard option

Dd9

New Member
Apr 20, 2021
Hello,

I have set up a fresh install of an HA cluster of 3 identical servers with Proxmox.
Code:
Kernel Version

Linux 5.4.106-1-pve #1 SMP PVE 5.4.106-1 (Fri, 19 Mar 2021 11:08:47 +0100)
PVE Manager Version

pve-manager/6.3-6/2184247e
We have a Dell ME4024 disk array, which I configured with the Dell-recommended RAID settings (ADAPT RAID). The array is attached to each server with 2 SAS cables. I provisioned two virtual volumes from the array and set up /etc/multipath.conf according to the Dell documentation:
Code:
defaults {
        find_multipaths yes
        user_friendly_names yes
}

blacklist {
}

devices {
        device {
                vendor "DellEMC"
                product "ME4"
                path_grouping_policy "group_by_prio"
                path_checker "tur"
                hardware_handler "1 alua"
                prio "alua"
                failback immediate
                rr_weight "uniform"
                path_selector "service-time 0"
        }
}

multipaths {
        multipath {
                wwid "3600c0ff00052ff4d542b686001000000"
                alias vol-data-01
        }
        multipath {
                wwid "3600c0ff00052ff4d2fae650001000000"
                alias vol-backup-01
        }

}

With multipath -ll, I can see the attached disks on each server with the redundant paths.
I created LVM volumes and created some VMs for testing on the shared storage.
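
Since the LVM storage is shared, each node has to map the same WWID to the same alias. As a rough sketch (not part of the original setup), the following Python helper extracts the WWID/alias pairs from the text of a multipath.conf so the result can be compared between nodes. The function name `multipath_aliases` is made up for illustration, and the parser is deliberately naive: it ignores comments and assumes one `wwid` and one `alias` per `multipath` block.

```python
import re


def multipath_aliases(conf_text):
    """Return {wwid: alias} for every 'multipath { ... }' entry in the text."""
    aliases = {}
    # Innermost 'multipath { ... }' blocks contain no nested braces.
    # 'multipaths {' is not matched because 's' is neither whitespace nor '{'.
    for block in re.findall(r"\bmultipath\s*\{([^{}]*)\}", conf_text):
        wwid = re.search(r'^\s*wwid\s+"?([^"\s]+)"?', block, re.MULTILINE)
        alias = re.search(r'^\s*alias\s+"?([^"\s]+)"?', block, re.MULTILINE)
        if wwid and alias:
            aliases[wwid.group(1)] = alias.group(1)
    return aliases


# The multipaths section from the configuration above.
SAMPLE = """
multipaths {
        multipath {
                wwid "3600c0ff00052ff4d542b686001000000"
                alias vol-data-01
        }
        multipath {
                wwid "3600c0ff00052ff4d2fae650001000000"
                alias vol-backup-01
        }
}
"""

if __name__ == "__main__":
    for wwid, alias in multipath_aliases(SAMPLE).items():
        print(alias, wwid)
```

Running this against the file from each of the three nodes and comparing the dictionaries would show whether one node resolves an alias to a different volume.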

The test VMs have at least 2 SCSI virtual hard disks, one for the OS and an empty drive.

I have the following issue, on both Windows Server 2019 and Ubuntu:
After creating and deleting a few dummy files on the empty drive (with fsutil on Windows, dd or fallocate on Linux), the disk usage reported inside the VM becomes wrong. Even after deleting all the files, the drive shows, for example, 30/100 GB used, and copying more than 70 GB to it fails. Sometimes even the OS disk shows wrong space usage, which can leave the VM unable to boot.
It does not seem to matter whether the SCSI disks are configured with or without writeback cache and/or discard.
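
For anyone who wants to reproduce this inside a guest, here is a minimal Python sketch of the write-and-delete test described above. It is an approximation of what I did by hand with dd and fallocate, not the exact procedure: the function name `fill_and_delete`, the file name, and the default sizes are arbitrary choices. It records the used space reported by the filesystem after each step, so a value that does not return to the starting level after a delete stands out.

```python
import os
import shutil
import tempfile


def fill_and_delete(directory, size_mib=64, rounds=3):
    """Write and delete a dummy file several times.

    Returns a list of (label, used_bytes) samples taken from the
    filesystem that holds 'directory'.
    """
    path = os.path.join(directory, "dummy.bin")
    chunk = b"\xa5" * (1024 * 1024)  # 1 MiB of non-zero data
    samples = [("start", shutil.disk_usage(directory).used)]
    for i in range(rounds):
        with open(path, "wb") as f:
            for _ in range(size_mib):
                f.write(chunk)
            f.flush()
            os.fsync(f.fileno())  # make sure the data reaches the disk
        samples.append((f"written {i}", shutil.disk_usage(directory).used))
        os.remove(path)
        samples.append((f"deleted {i}", shutil.disk_usage(directory).used))
    return samples


if __name__ == "__main__":
    # Demo on a temporary directory; point it at the test drive's mount
    # point to exercise the affected disk instead.
    with tempfile.TemporaryDirectory() as tmp:
        for label, used in fill_and_delete(tmp, size_mib=8, rounds=2):
            print(f"{label:12s} {used / 2**20:10.1f} MiB used")
```

On a healthy disk the "deleted" samples should stay close to the "start" value, allowing for other activity on the same filesystem.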

Thanks for your help!

Edit: detaching and reattaching the disks sometimes fixes the free space shown, but unfortunately the issue comes back very quickly, sometimes after a single file has been copied.
 
After more investigation:
It is not caused by multipath, because it also happens with LVM volumes on a single path.
There is a huge performance difference between discard on and off: 2 GB/s vs 500 MB/s. Why does discard affect performance so much?
It is as if discard acts as a cache option, because I get the same performance with discard on as with writeback cache on.

For comparison, I created a volume on local-lvm with writeback cache on. It freezes after a few GB copied, regardless of the amount of RAM. It does not freeze without writeback cache.

So:
External SAS storage:
Gets corrupted when discard is on, and performance with discard is as if cache were on (2 GB/s vs 500 MB/s).
Internal storage (lvm-thin):
Freezes when writeback cache is on, after running at 2 GB/s. Performance with or without discard is 500 MB/s.

Edit: regardless of the VM OS or disk type (SCSI, SATA), the discard option corrupts the drive. dmesg shows extensive bitmap corruption errors.
An ext4 test disk on LVM storage under the same conditions, mounted on the node with discard=on, does not get corrupted. Thin provisioning works correctly: I can see the space being reclaimed in the storage array GUI.
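
To compare what the kernel reports on the node and inside a guest, one option is to read the discard limits that Linux exposes under /sys/block/<device>/queue. The sketch below is only an illustration: the function name `discard_limits` and the `sys_block` parameter (there so it can be pointed at another directory) are my own choices. It reads `discard_granularity` and `discard_max_bytes` for each block device; a `discard_max_bytes` of 0 means the device does not accept discards.

```python
from pathlib import Path


def discard_limits(sys_block="/sys/block"):
    """Return {device: {"granularity", "max_bytes", "supported"}}."""
    limits = {}
    for dev in sorted(Path(sys_block).iterdir()):
        queue = dev / "queue"
        try:
            granularity = int((queue / "discard_granularity").read_text())
            max_bytes = int((queue / "discard_max_bytes").read_text())
        except (OSError, ValueError):
            # Device without a queue directory or with unreadable values.
            continue
        limits[dev.name] = {
            "granularity": granularity,
            "max_bytes": max_bytes,
            "supported": max_bytes > 0,
        }
    return limits


if __name__ == "__main__":
    for name, info in discard_limits().items():
        print(name, info)
```

If the values seen for the virtual disk inside the VM differ a lot from those of the multipath device on the node, that would be one more data point to include in a bug report. It does not by itself explain the corruption.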
 
New test: I tested the array directly with QEMU/KVM on Ubuntu 20.04, using a SCSI device created with discard=unmap. The corruption did not happen, and thin provisioning worked.
So it seems to be related to Proxmox.
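
To line up the disk options used by the Proxmox test VMs against the plain QEMU test, a small parser for the disk lines of a VM configuration can help. This is a sketch only: the function name `parse_disk_line` is invented, and the sample line uses a hypothetical storage ID (`san-lvm`) and VM ID (100), not values from my cluster. It assumes the usual `bus: volume,option=value,...` layout of those lines.

```python
def parse_disk_line(line):
    """Split a VM config disk line into bus, volume and an options dict."""
    key, _, value = line.partition(":")  # split at the first colon only
    volume, *opts = value.strip().split(",")
    options = dict(opt.split("=", 1) for opt in opts if "=" in opt)
    return {"bus": key.strip(), "volume": volume, "options": options}


if __name__ == "__main__":
    # Hypothetical example line; storage and disk names are placeholders.
    sample = "scsi1: san-lvm:vm-100-disk-1,cache=writeback,discard=on,size=100G"
    disk = parse_disk_line(sample)
    print(disk["bus"], disk["volume"])
    print("discard:", disk["options"].get("discard", "not set"))
    print("cache:", disk["options"].get("cache", "not set"))
```

Collecting the `discard` and `cache` values for every test disk this way makes it easier to state exactly which combinations were corrupted and which were not.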
 
Did you ever fix this? I am migrating a customer from ESX to PVE who has some Dell R730xd servers and an ME4024. Should I still expect these issues in PVE 8?

Regards
 
