Using SSDs with Proxmox 2.0 -- How To?!?

oeginc

I know I'm not the only one in this group looking to do something similar; I've seen lots of questions asked about SSDs, but not many concrete answers.

For now, let's set aside the fact that I already have 4 Proxmox servers up and running with more than 55 virtual hosts; the plan is to migrate those onto these two new servers.

I have two new servers, each with dual quad-core 2.5 GHz CPUs (8 cores total) and 24 GB RAM, currently configured with one 512 GB SSD and one 2 TB SATA2 drive.

Our existing servers are hitting ~1.5K IOPS with four 2.5" SATA drives in a RAID-10 setup, and I think we've hit our limit, which is why we're going the SSD route (we have no more physical space in the servers to add drives, and an iSCSI SAN is not an option when we're pushing 100 MB+ of data at our peaks). Our existing setup boots off the 3.5" SATA drive, has a second SATA drive for backups, and uses four 2.5" SATA drives for VMs. We back up to the local drive first, then fire off a script to scp the backups to the other two machines, so all three machines hold copies of all the backups at all times; that makes it faster to restore a machine in the event of a failure. Some of our VMs are 80+ GB, so copying them at the time of failure isn't really an option.
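(For illustration, a minimal sketch of the kind of copy script described above - the hostnames and the /backup path are placeholders, not our actual ones:)

Code:
#!/bin/bash
# Push the latest local vzdump backups to the two peer nodes so every node
# holds a copy of every backup. Hostnames and paths are placeholders.
for host in pve02 pve03; do
    scp /backup/vzdump-*.tar.gz root@${host}:/backup/
done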

In the new setup, we were thinking of booting the server off the 2 TB drive, repurposing the existing /var/lib/vz volume for backups, creating a new /var/lib/vz on the SSD for the VMs, and then setting up DRBD between the two machines for /var/lib/vz.
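(For illustration, a minimal two-node DRBD resource along those lines - hostnames, IP addresses and the backing device are placeholders and would have to match the real LVM layout:)

Code:
# /etc/drbd.d/vz.res -- sketch only; hostnames, IPs and the backing LV are placeholders
resource vz {
    protocol C;                      # synchronous replication between the two nodes
    on nodeA {
        device    /dev/drbd0;
        disk      /dev/pve2/data;    # whatever LV ends up backing /var/lib/vz
        address   192.168.10.1:7788;
        meta-disk internal;
    }
    on nodeB {
        device    /dev/drbd0;
        disk      /dev/pve2/data;
        address   192.168.10.2:7788;
        meta-disk internal;
    }
}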

A couple questions though...

1. Does this sound like a good way to handle this situation?

2. Will ProxMox HA work in this configuration (with DRBD)?

3. Can DRBD mirror to more than one other host? (We will be adding another 3 or so similarly configured servers within the next 6-8 months.)

4. I am not familiar with SSDs, but I've read several things that concern me (should they?):
a. You should enable TRIM support... From what I understand, this is only available in Linux kernel 2.6.33+, and Proxmox 2.0 runs 2.6.32, correct? What is the recommended way of solving this?

b. You should 'align' your SSD partitions... Again, I'm not really sure how we'd go about doing that; is it even necessary? (A couple of check commands are sketched after this list.)

c. You should disable atime. You should NOT disable atime. I've read conflicting stories on this. I understand why disabling it would be good, but I've also read that mail servers (and possibly other software) need the atime stamp in order to function properly?
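(A quick, rough way to check points a and b, assuming the SSD shows up as /dev/sdb:)

Code:
# does the drive advertise TRIM? hdparm lists "Data Set Management TRIM supported"
hdparm -I /dev/sdb | grep -i trim
# list partitions with start sectors; a start that is a multiple of 2048 (1 MiB) is aligned
fdisk -lu /dev/sdb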

Any help would be greatly appreciated..

-- Rob
 
@Telesight - That's good, so it seems to suggest that even though the Proxmox 2.0 kernel is only 2.6.32, it has the TRIM functionality backported to it? Also, it doesn't clearly say whether you have to use ext4 to get TRIM support, or whether it doesn't matter which filesystem you use...
 
Hi,
pve supports trim, but you need ext4 with the right mount option (discard).

Udo
 
Ok, just a quick update on what I've done so far (for those interested, and also to get feedback on my procedures).

1. I setup the 2TB drive as the boot drive (/dev/sda) and the 512GB SSD as the secondary disk (/dev/sdb).

2. I installed ProxMox as normal (didn't change any parameters during install).

3. I did an "aptitude update; aptitude safe-upgrade -y"

4. Backed up /var/lib/vz

5. Unmounted /var/lib/vz

6. lvdisplay - Made note of the LE from the /dev/pve/data volume

7. lvremove /dev/pve/data

8. lvcreate --extents (previous LE from above) --name backup /dev/pve

9. mkfs.ext3 /dev/pve/backup

10. mkdir /backup

11. Added entry to /etc/fstab

12. fdisk /dev/sdb (for the SSD)
a. u - Turn on sector mode
b. o - Create new partition table
c. n - New partition
d. Start sector: 2048 (the default; a 1 MiB boundary, which should be good alignment for SSDs)
e. End sector: 1000215215 (default - last sector of the drive)
f. w - Write the changes out to disk

13. pvcreate /dev/sdb1, then vgcreate pve2 /dev/sdb1 (create the physical volume and the pve2 volume group)

14. vgdisplay pve2 (Make note of Total PE)

15. lvcreate -l (Total PE from above minus 4096 extents, i.e. 16 GB reserved for snapshots) -n data pve2

16. mkfs.ext4 /dev/pve2/data

17. Add entry to /etc/fstab (example lines are sketched after this list)
a. Add the discard option to enable TRIM support
b. Add the relatime option to cut down on the (otherwise ridiculous) atime updates

18. mount /dev/pve2/data /var/lib/vz

19. Restore /var/lib/vz backup

20. Change the default scheduler for the SSD to noop in /etc/rc.local (see the sketch after this list)

21. Reboot

This gets me boot and backups on my 2 TB HDD, and all VMs/templates/etc. on my SSD.
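(For reference, the fstab and rc.local additions from steps 11, 17 and 20 look roughly like this - a sketch rather than my exact files, but the mount options match what shows up in the test output further down:)

Code:
# /etc/fstab additions (steps 11 and 17)
/dev/pve/backup   /backup       ext3   defaults                    0   2
/dev/pve2/data    /var/lib/vz   ext4   defaults,relatime,discard   0   2

# /etc/rc.local, before "exit 0" (step 20)
echo noop > /sys/block/sdb/queue/scheduler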
 
@Oeginc
A few remarks on using the SSD:

An SSD uses TRIM, so you need the ext4 filesystem to make use of TRIM, and the SSD must be mounted with the right option (discard). You can also add some other useful settings to optimize the use of the SSD:
Code:
cp /etc/fstab /etc/fstab.bak   # make a backup of fstab
nano /etc/fstab
Add the options to the SSD's line, for example:
Code:
UUID=9e04f12c-4447-4e69-b1a7-48160eecafac /boot ext4 discard,noatime,nodiratime 0 1
Add a new line to fstab to let the system write temporary data to RAM and not to the SSD:
Code:
tmpfs /tmp tmpfs defaults,noatime,mode=1777 0 0
Then save fstab.
 
@Telesight - I already covered that in my post above: in step 16 I format the partition ext4, and in step 17 I modify /etc/fstab... Thanks though! ;)
 
Ok, just running some tests here, clearly something doesn't make sense...


This is on the regular HDD:

root@svr01:~# pveperf /
CPU BOGOMIPS: 36587.39
REGEX/SECOND: 673156
HD SIZE: 94.49 GB (/dev/mapper/pve-root)
BUFFERED READS: 111.93 MB/sec
AVERAGE SEEK TIME: 10.56 ms
FSYNCS/SECOND: 949.96
DNS EXT: 77.59 ms
DNS INT: 81.62 ms



This is on the SSD:

root@svr01:~# pveperf /var/lib/vz
CPU BOGOMIPS: 36587.39
REGEX/SECOND: 702718
HD SIZE: 453.70 GB (/dev/mapper/pve2-data)
BUFFERED READS: 191.28 MB/sec
AVERAGE SEEK TIME: 0.09 ms
FSYNCS/SECOND: 164.86
DNS EXT: 96.94 ms
DNS INT: 71.85 ms

Seek times on the SSD are lightning fast, but the MB/sec and the fsyncs/sec seem really slow. When I had installed ProxMox directly on the SSD, I was getting 250+ MB/sec and 2000+ fsyncs/sec...

And in fact, when I run this test, I am getting VERY slow writes...

root@svr01:/var/lib/vz# dd if=/dev/urandom of=test.out count=204800
204800+0 records in
204800+0 records out
104857600 bytes (105 MB) copied, 12.8109 s, 8.2 MB/s

(Just for the record, I get almost the EXACT same performance as I do with my regular HDD for writes)

root@svr01:/# dd if=/dev/urandom of=test.out count=204800
204800+0 records in
204800+0 records out
104857600 bytes (105 MB) copied, 12.9158 s, 8.1 MB/s

I've checked my mount options:

root@svr01:/# mount|grep mapper
/dev/mapper/pve-root on / type ext3 (rw,errors=remount-ro)
/dev/mapper/pve2-data on /var/lib/vz type ext4 (rw,relatime,discard)
/dev/mapper/pve-backup on /backup type ext3 (rw)


And I've checked my scheduler (changing it didn't seem to make a difference):

root@svr01:/# cat /sys/block/sdb/queue/scheduler
[noop] anticipatory deadline cfq


Any suggestions? That partition is ext4 mounted with defaults,relatime,discard options on a Crucial M4 512GB SSD.
 
Speed testing information...

Some additional information I found which helps explain some things...

Code:
root@svr01:/var/lib/vz# dd if=/dev/urandom of=/dev/null bs=1K count=1024000
1024000+0 records in
1024000+0 records out
1048576000 bytes (1.0 GB) copied, 118.12 s, 8.9 MB/s

So it looks as though /dev/urandom is REALLY slow - the random-number generation itself tops out around 9 MB/s here, so it was the bottleneck in my earlier write tests, not the disks. I switched to writing zeros instead, as follows...

The regular HDD:
Code:
root@svr01:/var/lib/vz# dd if=/dev/zero of=/test.out bs=1K count=1024000
1024000+0 records in
1024000+0 records out
1048576000 bytes (1.0 GB) copied, 5.44764 s, 192 MB/s

The SSD:
Code:
root@svr01:/var/lib/vz# dd if=/dev/zero of=test.out bs=1K count=1024000
1024000+0 records in
1024000+0 records out
1048576000 bytes (1.0 GB) copied, 6.96131 s, 151 MB/s

As you can see, I am still getting significantly faster speeds with the regular HDD (+41MB/sec).

My read speeds appear to be pretty good, here is the HDD:

Code:
root@svr01:/var/lib/vz# dd if=/test.out of=/dev/null
2048000+0 records in
2048000+0 records out
1048576000 bytes (1.0 GB) copied, 1.67675 s, 625 MB/s

My read speeds on the SSD:
Code:
root@svr01:/var/lib/vz# dd if=test.out of=/dev/null 
2048000+0 records in
2048000+0 records out
1048576000 bytes (1.0 GB) copied, 1.68963 s, 621 MB/s

(Yes, I understand that 1GB file is probably getting cached)...
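(One way to take the page cache out of the read test - a sketch, not something I ran above:)

Code:
sync
echo 3 > /proc/sys/vm/drop_caches   # drop page cache, dentries and inodes (as root)
dd if=test.out of=/dev/null bs=1M   # now the read really has to come from the disk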
 
Re: Speed testing information...

Some additional information I found which helps explain some things...

Code:
root@svr01:/var/lib/vz# dd if=/dev/urandom of=/dev/null bs=1K count=1024000
1024000+0 records in
1024000+0 records out
1048576000 bytes (1.0 GB) copied, 118.12 s, 8.9 MB/s

So it looks as though /dev/urandom is REALLY slow - the random-number generation itself tops out around 9 MB/s here, so it was the bottleneck in my earlier write tests, not the disks. I switched to writing zeros instead, as follows...

The regular HDD:
Code:
root@svr01:/var/lib/vz# dd if=/dev/zero of=/test.out bs=1K count=1024000
1024000+0 records in
1024000+0 records out
1048576000 bytes (1.0 GB) copied, 5.44764 s, 192 MB/s

...

(Yes, I understand that 1GB file is probably getting cached)...
Hi,
not only the reads are cached - the writes are too!
To get comparable values use something like this:
Code:
dd if=/dev/zero of=bigfile bs=1024k count=8192 conv=fdatasync
You can use other mount options for ext4 to be faster (many more fsyncs/s), but then you are not as safe against power loss or hardware failure. This is the reason why the pve staff use ext3!

Udo
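(To compare the two disks with Udo's command, one could run it once in each mount point, e.g.:)

Code:
cd /backup     && dd if=/dev/zero of=bigfile bs=1024k count=8192 conv=fdatasync && rm bigfile
cd /var/lib/vz && dd if=/dev/zero of=bigfile bs=1024k count=8192 conv=fdatasync && rm bigfile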
 
Any suggestions? That partition is ext4 mounted with defaults,relatime,discard options on a Crucial M4 512GB SSD.

As Telesight said, your mount options should look like:
UUID=9e04f12c-4447-4e69-b1a7-48160eecafac /boot ext4 discard,noatime,nodiratime 0 1

So try replacing your defaults,relatime,discard options with what Telesight suggested.

You can use other mount options for ext4 to be faster (many more fsyncs/s), but then you are not as safe against power loss or hardware failure. This is the reason why the pve staff use ext3!

I'm curious, as I use SSDs along with a UPS for my Proxmox setup - which options would these be? :)
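(For what it's worth, the 'faster but less safe' options Udo alludes to are presumably things like turning off write barriers - his own mount output further down shows barrier=0. A sketch of such a line, reusing the device names from earlier in the thread; only sensible behind a trusted UPS/BBU:)

Code:
/dev/pve2/data   /var/lib/vz   ext4   noatime,barrier=0,discard   0   2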
 
Re: Speed testing information...

Hi,
not only the reads are cached - the writes are too!
To get comparable values use something like this:
Code:
dd if=/dev/zero of=bigfile bs=1024k count=8192 conv=fdatasync
You can use other mount options for ext4 to be faster (many more fsyncs/s), but then you are not as safe against power loss or hardware failure. This is the reason why the pve staff use ext3!

Udo

@Udo - I understand, but if ext3 doesn't support TRIM functionality, what am I supposed to do?
 
Re: Speed testing information...

@Udo - I understand, but if ext3 doesn't support TRIM functionality, what am I supposed to do?
Hi,
use the right SSD ;).

Here are two different SSDs on one system (used as backup spool disks - that is the reason for the readahead setting):

Corsair Performa:
Code:
# pveperf /export/spool1
CPU BOGOMIPS:      24080.33
REGEX/SECOND:      1176818
HD SIZE:           234.73 GB (/dev/sdb1)
BUFFERED READS:    321.72 MB/sec
AVERAGE SEEK TIME: 0.05 ms
FSYNCS/SECOND:     3658.38
DNS EXT:           102.06 ms
DNS INT:           0.81 ms
# grep sdb /proc/mounts
/dev/sdb1 /export/spool1 ext4 rw,noatime,barrier=0,data=ordered,inode_readahead_blks=128,discard 0 0

OCZ-VERTEX3:
Code:
# pveperf /export/spool2
CPU BOGOMIPS:      24080.33
REGEX/SECOND:      1201175
HD SIZE:           110.03 GB (/dev/sdc1)
BUFFERED READS:    360.68 MB/sec
AVERAGE SEEK TIME: 0.16 ms
FSYNCS/SECOND:     295.45
DNS EXT:           87.07 ms
DNS INT:           0.84 ms
# grep sdc /proc/mounts
/dev/sdc1 /export/spool2 ext4 rw,noatime,barrier=0,data=ordered,inode_readahead_blks=128,discard 0 0
The Corsair should also be fast in a RAID config (even without TRIM).

BTW, SSDs are not the best choice as spool disks (not reliable enough for this kind of heavy data throughput).

Udo
 
Re: Speed testing information...

Hi,
use the right SSD ;).

...

Udo

Correct me if I am wrong, but both of those ARE using EXT4, correct? I'm not sure I am following you...
 
Re: Speed testing information...

Correct me if I am wrong, but both of those ARE using EXT4, correct? I'm not sure I am following you...

Right,
that should show that you can reach good speed with ext4 too. 330 MB/s with over 3000 fsyncs/s is not too bad, no?

I haven't tested the speed with ext3 - I can do that tomorrow (right now the disks are in use for backups).

Udo
 
Re: Speed testing information...

Right,
that should show that you can reach good speed with ext4 too. 330 MB/s with over 3000 fsyncs/s is not too bad, no?

I haven't tested the speed with ext3 - I can do that tomorrow (right now the disks are in use for backups).

Udo

I see... No, 330MB/s with 3000+ fsync/sec is NOT bad... I would be happy with that... I'll try mounting mine with the same options and see what kind of speed I get just for comparison.
 
Ran some more tests, apparently this drive does not like EXT4...

When I formatted the partition as EXT3 on the SSD, this is what I got...

BUFFERED READS: 241.19 MB/sec (about the same as EXT4)
AVG. SEEK TIME : 0.08ms (about the same as EXT4)
FSYNCS/SECOND : 2827.41 (almost 20x the speed of EXT4)

What could the SSD dislike so much about ext4 that fsync performance is almost 20x slower than with ext3?
 
it's not so much that ext3 > ext4, it's more like ext4 without trim > ext4 with.

try with ext4 as filesystem and disable discard (read: do not put discard in fstab). test. compare with ext4 & discard enabled.

not that i care much (i've everything on a san, so if an ssd in one of the pve cluster servers dies, ha-failover comes to the rescue), but i suspect there might be a problem with trim & the current kernel used for pve.
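(A quick way to run that comparison without reformatting - drop 'discard' from the /var/lib/vz line in /etc/fstab, remount cleanly, and re-run pveperf:)

Code:
umount /var/lib/vz && mount /var/lib/vz   # re-mounts with the options now in fstab (nothing may be using the mount point)
pveperf /var/lib/vz                       # compare FSYNCS/SECOND with and without discard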
 
it's not so much that ext3 > ext4, it's more like ext4 without trim > ext4 with.

try with ext4 as filesystem and disable discard (read: do not put discard in fstab). test. compare with ext4 & discard enabled.

not that i care much (i've everything on a san, so if an ssd in one of the pve cluster servers dies, ha-failover comes to the rescue), but i suspect there might be a problem with trim & the current kernel used for pve.

I ran the tests again (with and without the discard option) and the results were fairly consistent... :(
 
