Discard for SSD with LVM over MD?

alexc

Active Member
Apr 13, 2015
I added two SSDs to my server to get a speedy partition for storing VM images. I created a mirror over those two SSDs with mdadm, then put LVM on top of that mirror (so I can afford to do snapshot backups of the VMs), and then ran mkfs.ext4 on the resulting logical volume.
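For reference, the steps look roughly like this (device names, sizes and volume names below are just placeholders, not my exact values):

Code:
# mirror the two SSDs with mdadm
mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
# LVM on top of the mirror, so snapshots are possible
pvcreate /dev/md1
vgcreate ssd /dev/md1
lvcreate -L 100G -n ssdata ssd
# and ext4 on the logical volume
mkfs.ext4 /dev/ssd/ssdata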

Now I wonder: if I do or don't set the 'discard' option when mounting that filesystem into the directory tree, will it affect the speed and longevity of the SSDs?

P.S. Another approach would be to let LVM do the mirroring itself, but I like mdadm much more.
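(For completeness, the LVM-only variant would be something like the following, again with example names only:)

Code:
# mirrored logical volume directly on the two SSDs, no mdadm
vgcreate ssd /dev/sda1 /dev/sdb1
lvcreate --type raid1 -m 1 -L 100G -n ssdata ssd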
 
I have PVE 3.4 with 2x SSD in an mdadm RAID mirror, in read/write use for a certain workload for two months already, with these options:

# cat /proc/mounts | grep discard
/dev/mapper/pve-ssdata /mnt/ssdata ext4 rw,relatime,barrier=0,data=ordered,discard 0 0

The fstrim utility reports 0 bytes when invoked, and the speed of backups doesn't degrade over time.

So with discard set it works fine.

I suppose your performance would be affected more by the amount of unallocated blocks on the SSD, a.k.a. over-provisioning (OP).
I've read articles that suggest about 30% for OP, saying their tests showed it helps more than TRIM/discard, even on HW RAID controllers which don't support TRIM.
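One simple way to reserve that OP space is to leave part of each disk unpartitioned when building the array, for example (device name and percentage are only an illustration):

Code:
# partition only ~70% of the SSD; the rest stays untouched for the controller
parted -s /dev/sdb mklabel gpt
parted -s /dev/sdb mkpart primary 1MiB 70%

For the controller to actually use that area it must be free from its point of view, i.e. never written to, or trimmed/secure-erased before partitioning.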
 

Thank you, I'll try to follow that. But aren't newer SSDs already over-provisioned out of the box? I mean, aren't they built to stay fast even when used to 100% of their advertised capacity? Older ones were as you describe, but newer ones?.. I use Intel S3610 disks, which are enterprise-targeted (and a bit expensive compared to the Intel 535, too :) ), so I hope I don't have to plan on giving up another 30% (besides the general OS and FS overhead)...

But the question is still there:

If I put ext4 over LVM over GPT over MD over SSD, should I mount that ext4 with the discard option, or is it pointless in such a "layered" case? The speed of this sandwich scares me, really; the raw disks are much faster than the numbers I see on ext4...
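One thing that can be checked is whether discard requests are passed down through all the layers at all; a minimal check, assuming the MD device is md1 (adjust the device names to your setup):

Code:
# a value greater than 0 means that layer passes discard requests down
cat /sys/block/md1/queue/discard_max_bytes
cat /sys/block/dm-*/queue/discard_max_bytes
# newer util-linux can show the same via: lsblk --discard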
 
alexc, I've researched the OP issue because my HP DL380 G7 has a P410i which doesn't support TRIM (shame on HP!), and I concluded that the amount of OP free space should correspond to the size of your typical I/O bursts rather than to a percentage of the SSD size.

I figured that keeping 16 GB spare on the SSD, regardless of its size, would help the SSD controller keep the flight smooth.
But to stay within budget I left this OP space as unallocated LVM physical extents, which are unused most of the time and only take load during snapshot backups of VMs/CTs.

If backup speed isn't too important to you, I'd advise the same trick.
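You can see how much space is left unallocated in the volume group with vgs, e.g. (the VG name pve is just my case):

Code:
# VFree is the space not allocated to any LV, available for snapshots
vgs -o vg_name,vg_size,vg_free pve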

As for the speed of the sandwich (LVM inside GPT inside MD RAID on SSD): for me it flies perfectly.
I can run a benchmark on this sandwich if you give me an idea which benchmark to use.

A dumb dd only gives a rough result because it is synchronous.
 
Reading an image from the sandwich:

# dd if=/mnt/ssdata/images/201/vm-201-disk-1.qcow2 of=/dev/null bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 0.149031 s, 7.2 GB/s
 
Writes:
# dd if=/mnt/ssdata/images/201/vm-201-disk-1.qcow2 of=/mnt/ssdata/test.file bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 10.4791 s, 102 MB/s

But right now the server is busy with user load.
 
I guess you'd like this speed :)

# dd if=/mnt/ssdata/images/201/vm-201-disk-1.qcow2 of=/mnt/ssdata/test.file bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 0.666782 s, 1.6 GB/s
 
Quite some numbers, indeed!

I just wonder what good the 'discard' flag does for a virtual disk in a KVM VM; should we care about it when there's an SSD underneath?
 

In theory, when the KVM VM does its writes and deletes inside, they always end up as new writes to the QCOW2 image file, with deletes marking blocks as freed; those blocks have corresponding blocks in the host filesystem that can be marked freed as well.
With discard=on, these freed blocks are reported to the host, which in turn passes them on as TRIM commands to the SATA drive.
The SSD can then benefit from it.
The cost is a probable delay for sending the extra commands to the drive.

But you see, it is a beefed-up sandwich: from deleted files in the FS inside the VM, through the QCOW2 image, to the host FS mounted with 'discard', down to the SATA drive; but it should work.
With RAW images there should be fewer layers and more benefit to the SSD, in theory.

In practice you'd better test under your own conditions: SSD, SATA controller, over-provisioning size, FS options, VM options, usage load.
There are too many variables to predict it for every case.
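In PVE the flag is set per virtual disk; if I remember the syntax right, it is something like this (VM id and volume name are only an example), and the guest still has to issue its own TRIMs, i.e. mount with discard or run fstrim inside:

Code:
# pass discard from the guest down to the qcow2/host layer (works with IDE/SCSI disks, not virtio-blk)
qm set 201 -scsi0 local:201/vm-201-disk-1.qcow2,discard=on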
 
Using the discard mount option is costly for I/O performance: every low-level SCSI UNMAP command effectively becomes a synchronous operation. For the performance and durability of the SSD it is much better to avoid this mount option and instead periodically run the fstrim command on the filesystem.
 
How often would you recommend?

I'll try it that way and compare.
 
I guess you'd like this speed :)

# dd if=/mnt/ssdata/images/201/vm-201-disk-1.qcow2 of=/mnt/ssdata/test.file bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 0.666782 s, 1.6 GB/s
Hi,
sorry, but it looks like you are measuring caching, not SSD speed.

If you drop the caches beforehand and use conv=fdatasync you should get more realistic values... or better, use a tool like fio.
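For example, something along these lines (file name, size and runtime are only placeholders; adjust to your free space):

Code:
# random 4k writes, bypassing the page cache, for 30 seconds
fio --name=ssdtest --filename=/mnt/ssdata/fio.test --size=4G \
    --rw=randwrite --bs=4k --direct=1 --ioengine=libaio --iodepth=32 \
    --runtime=30 --time_based --group_reporting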

Udo
 
Using the discard mount option is costly for I/O performance: every low-level SCSI UNMAP command effectively becomes a synchronous operation. For the performance and durability of the SSD it is much better to avoid this mount option and instead periodically run the fstrim command on the filesystem.
I changed the mount options to drop discard; the nightly backup completed in 6:54 vs the usual 7:52. Users didn't notice any difference.
I added a cron job to run fstrim daily in the early morning and write the number of trimmed bytes to a log file.
After a while I'll check it :)
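The cron entry is roughly this (time and log file path are my own choice):

Code:
# /etc/cron.d/fstrim - trim the SSD filesystem every morning and log how much was trimmed
30 5 * * * root /sbin/fstrim -v /mnt/ssdata >> /var/log/fstrim.log 2>&1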
 
Hi,
sorry, but it looks like you are measuring caching, not SSD speed.

If you drop the caches beforehand and use conv=fdatasync you should get more realistic values... or better, use a tool like fio.

Udo
Code:
# sync && echo 3 > /proc/sys/vm/drop_caches && dd if=/mnt/ssdata/images/202/vm-202-disk-1.qcow2 of=/mnt/ssdata/test.file bs=1M count=1024 conv=fdatasync
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 16.2512 s, 66.1 MB/s

Is that more realistic?
Then it's awful for an SSD.
Or is something wrong?
 
Hi,
no, that's right. What kind of SSD do you use?

Udo
 
2x 60 GB Silicon Power SSDs in mdadm RAID 1.
From smartctl:
Code:
Device Model:  SPCC Solid State Disk
Serial Number:  EB84075A1BB200801379
Firmware Version: S9FM02.6
User Capacity:  60 022 480 896 bytes [60,0 GB]
Sector Size:  512 bytes logical/physical
From dmesg:
Code:
sd 2:0:0:0: [sdc] 117231408 512-byte logical blocks: (60.0 GB/55.8 GiB)
scsi 3:0:0:0: Direct-Access  ATA  SPCC Solid State S9FM PQ: 0 ANSI: 5
sd 2:0:0:0: [sdc] Write Protect is off
sd 2:0:0:0: [sdc] Mode Sense: 00 3a 00 00
sd 2:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

One of the cheapest in the local store.
 
