I am currently doing test builds for a new cluster consisting of three compute nodes and three Ceph nodes. This is my first go with the 4.x branch, but I have been working with Proxmox for some time (1.x-3.x).

During compute node testing (each compute node is an X10SLL-F board with a single hard drive plugged directly in as AHCI) I found something very troubling. When installing onto a test drive (Samsung 1 TB, 7200 RPM) with either ext3 or ext4, the FSYNC numbers are terrible. Buffered reads look correct for that drive at around 120 MB/s, but FSYNCS/SECOND comes in under 60 (on ext3 it is even worse, around 25). It all felt very wrong (and I wondered if it was hardware related), so I grabbed the 3.4 ISO, installed it on the same hardware, and the ext3 numbers were orders of magnitude better (details below).
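For anyone who wants to reproduce this without pveperf itself in the picture, a plain fio run should show the same pattern (this assumes fio is installed from the stock Debian repos; the absolute IOPS will not match pveperf's FSYNCS/SECOND exactly, but the relative gap between nodes should still show up):

Code:
# rough cross-check of the fsync rate, independent of pveperf:
# every 4k write is followed by an fsync(), so the reported write
# IOPS approximates the fsync rate the disk/filesystem can sustain
fio --name=fsync-test --directory=/var/lib/vz --rw=write --bs=4k \
    --size=64M --fsync=1 --runtime=30 --time_based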
I have now set up the following three identical nodes for testing:
tn01 - 4.1 ext4
tn02 - 4.1 ext3
tn03 - 3.4 ext3
tn01 - 4.1 ext4
Code:
root@tn01:~# pveversion -v
proxmox-ve: 4.1-26 (running kernel: 4.2.6-1-pve)
pve-manager: 4.1-1 (running version: 4.1-1/2f9650d4)
pve-kernel-4.2.6-1-pve: 4.2.6-26
lvm2: 2.02.116-pve2
corosync-pve: 2.3.5-2
libqb0: 0.17.2-1
pve-cluster: 4.0-29
qemu-server: 4.0-41
pve-firmware: 1.1-7
libpve-common-perl: 4.0-41
libpve-access-control: 4.0-10
libpve-storage-perl: 4.0-38
pve-libspice-server1: 0.12.5-2
vncterm: 1.2-1
pve-qemu-kvm: 2.4-17
pve-container: 1.0-32
pve-firewall: 2.0-14
pve-ha-manager: 1.0-14
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u1
lxc-pve: 1.1.5-5
lxcfs: 0.13-pve1
cgmanager: 0.39-pve1
criu: 1.6.0-1
zfsutils: 0.6.5-pve6~jessie
Code:
root@tn01:~# pveperf /var/lib/vz
CPU BOGOMIPS: 54399.60
REGEX/SECOND: 2891995
HD SIZE: 860.54 GB (/dev/mapper/pve-data)
BUFFERED READS: 128.17 MB/sec
AVERAGE SEEK TIME: 13.59 ms
FSYNCS/SECOND: 44.87
DNS EXT: 125.84 ms
DNS INT: 129.98 ms (dev.lmbx.net)
tn02 - 4.1 ext3
Code:
root@tn02:~# pveversion -v
proxmox-ve: 4.1-26 (running kernel: 4.2.6-1-pve)
pve-manager: 4.1-1 (running version: 4.1-1/2f9650d4)
pve-kernel-4.2.6-1-pve: 4.2.6-26
lvm2: 2.02.116-pve2
corosync-pve: 2.3.5-2
libqb0: 0.17.2-1
pve-cluster: 4.0-29
qemu-server: 4.0-41
pve-firmware: 1.1-7
libpve-common-perl: 4.0-41
libpve-access-control: 4.0-10
libpve-storage-perl: 4.0-38
pve-libspice-server1: 0.12.5-2
vncterm: 1.2-1
pve-qemu-kvm: 2.4-17
pve-container: 1.0-32
pve-firewall: 2.0-14
pve-ha-manager: 1.0-14
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u1
lxc-pve: 1.1.5-5
lxcfs: 0.13-pve1
cgmanager: 0.39-pve1
criu: 1.6.0-1
zfsutils: 0.6.5-pve6~jessie
Code:
root@tn02:~# pveperf /var/lib/vz
CPU BOGOMIPS: 54400.80
REGEX/SECOND: 2917401
HD SIZE: 860.54 GB (/dev/mapper/pve-data)
BUFFERED READS: 105.81 MB/sec
AVERAGE SEEK TIME: 15.50 ms
FSYNCS/SECOND: 23.63
DNS EXT: 161.34 ms
DNS INT: 121.51 ms (dev.lmbx.net)
tn03 - 3.4 ext3
Code:
root@tn03:~# pveversion -v
proxmox-ve-2.6.32: 3.4-156 (running kernel: 2.6.32-39-pve)
pve-manager: 3.4-6 (running version: 3.4-6/102d4547)
pve-kernel-2.6.32-39-pve: 2.6.32-156
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.7-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.10-2
pve-cluster: 3.0-17
qemu-server: 3.4-6
pve-firmware: 1.1-4
libpve-common-perl: 3.0-24
libpve-access-control: 3.0-16
libpve-storage-perl: 3.0-33
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-8
vzctl: 4.0-1pve6
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 2.2-10
ksm-control-daemon: 1.1-1
glusterfs-client: 3.5.2-1
Code:
root@tn03:~# pveperf /var/lib/vz
CPU BOGOMIPS: 54400.40
REGEX/SECOND: 2652530
HD SIZE: 860.67 GB (/dev/mapper/pve-data)
BUFFERED READS: 112.90 MB/sec
AVERAGE SEEK TIME: 18.49 ms
FSYNCS/SECOND: 2136.83
DNS EXT: 109.51 ms
DNS INT: 77.84 ms (dev.lmbx.net)
I should mention all of my fstab mounts are using the defaults.
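One thing I am not sure about is what "defaults" actually resolves to on each kernel. If I remember correctly, ext3 on the old 2.6.32 kernel was mounted with write barriers off by default, while ext4 and the newer kernels enable barriers, which costs a lot of fsync throughput on a drive without a battery-backed cache. The options actually in effect can be checked, and barriers toggled for a quick A/B comparison, like this (throwaway test install only; running without barriers risks corruption on power loss):

Code:
# show the options actually in effect for the pve-data mount
grep pve-data /proc/mounts

# quick A/B test: remount without barriers, then re-run pveperf
# and compare FSYNCS/SECOND (do NOT leave a production box like this)
mount -o remount,barrier=0 /var/lib/vz
pveperf /var/lib/vz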
Has anyone seen this with 4.1, or does anyone have an idea what might be going on? Any help would be greatly appreciated.