ZFS on Proxmox and rsync

kobuki

I know it's not strictly a Proxmox question, but many folks here seem to use ZFS on Proxmox. I have noticed that, compared to running ZFS on the stock Wheezy kernel (3.2), run times more than double on the latest Proxmox/RHEL 2.6.32 kernel. This happens even when using rsync with the -n option, where no actual file operations or transfers are performed but everything else is. We're talking about several million files (around 7M, spanning several backup runs of various servers), and a run time of about 2 hours increasing to more than 4 hours.

Does anyone have an idea why? Tunable kernel parameters, maybe? Intrinsic differences affecting performance on the RHEL kernel? No VMs are running; I just installed the default setup over an existing Wheezy installation.
 
I know it's not strictly a Proxmox question, but many folks here seem to use ZFS on Proxmox.

I would have to disagree with this comment, unless you meant that many folks run ZFS on a separate server used as shared storage for Proxmox and not on the Proxmox host itself. A Proxmox host performs best when it is used for nothing but the Proxmox platform. ZFS uses as much RAM as is available in a host, along with other CPU-intensive resources, which may starve Proxmox's operations and cause major random host failures.

Did you try disabling atime to see if performance improves?
 
You may disagree, but both scenarios can be seen in the forums. That's not the subject at hand, though. Please, let us not discuss here what the best usage scenario is for PVE or ZFS; that is another question.

In my scenario atime is irrelevant, since the only thing that changed is the kernel. There are of course some PVE processes running in the background, but I wouldn't expect those to cause any performance issues, and they use the boot disks on ext3, which are separate from the zpool disks. The question is probably more like: what's the difference between the two kernels, or their default kernel/scheduler settings? At least that's what first comes to mind. ATM, the server is doing only one task: running rsync between PVE and remote hosts.
 
I've run several tests on 2.6.32-26-pve, see below.
Code:
time /usr/bin/rsync -n --stats -aH --delete --numeric-ids --inplace --exclude /proc --exclude /sys --exclude /var/log --exclude /var/lib/mysql --exclude /mnt backupuser@remotehost:/ /destvol/

remote -> ext3:
real    1m51.565s
user    0m2.369s
sys     0m3.216s

remote -> zfs, atime=on:
real    18m29.933s
user    0m2.808s
sys     0m26.198s

remote -> zfs, atime=off:
real    16m12.875s
user    0m2.595s
sys     0m25.579s

destvol is either a ZFS dataset or a mounted ext3 directory under /mnt. The ext3 mount parameters are the defaults for Wheezy.

And then locally:

Code:
time /usr/bin/rsync -n --stats -aH --delete --numeric-ids --inplace /local_ext3_vol/ /zfsvol/

local ext3 -> zfs:
real    2m18.630s
user    0m2.885s
sys     0m12.507s

I'm totally clueless. Remote to local ext3 is very quick, so it's not a networking problem. Remote to local ZFS is very slow. Local ext3 to local ZFS is also very quick. I've run the tests multiple times and they're consistent. On 3.2.0-4-amd64 (the Wheezy default kernel), the "remote -> zfs, atime=off" test takes half as long... What the heck. This could well be a regression in the RHEL kernel. I'm still thinking of something related to process schedulers. The disk elevator settings are the same for both kernels. There's a serious chance this problem affects a lot more than just a simple zfs+rsync scenario. Still looking for ideas.
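
To put the timings above side by side, here is a minimal sketch that converts the `real` values printed by `time` into seconds and computes how much slower one run is than another. The function names `to_seconds` and `ratio` are made up for illustration, and it assumes the `NmS.SSSs` format shown in the results above.

```shell
#!/bin/sh
# Convert a `time` value such as "18m29.933s" into seconds.
# Assumes the "<minutes>m<seconds>s" format shown in the test results.
to_seconds() {
    echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# Print how many times longer the first run took than the second.
ratio() {
    a=$(to_seconds "$1")
    b=$(to_seconds "$2")
    awk -v a="$a" -v b="$b" 'BEGIN { printf "%.1f\n", a / b }'
}

ratio 18m29.933s 1m51.565s   # remote -> zfs (atime=on)  vs. remote -> ext3
ratio 16m12.875s 1m51.565s   # remote -> zfs (atime=off) vs. remote -> ext3
```

Running the same helper on results from both kernels should make it easy to see whether the slowdown factor stays the same between boots.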
 
Sure, here it is:

Code:
# zfs get all backup1
NAME     PROPERTY              VALUE                  SOURCE
backup1  type                  filesystem             -
backup1  creation              Thu Nov  7  0:50 2013  -
backup1  used                  871G                   -
backup1  available             2.72T                  -
backup1  referenced            152K                   -
backup1  compressratio         1.19x                  -
backup1  mounted               yes                    -
backup1  quota                 none                   default
backup1  reservation           none                   default
backup1  recordsize            128K                   default
backup1  mountpoint            /backup1               default
backup1  sharenfs              off                    default
backup1  checksum              on                     default
backup1  compression           lz4                    local
backup1  atime                 off                    local
backup1  devices               on                     default
backup1  exec                  on                     default
backup1  setuid                on                     default
backup1  readonly              off                    default
backup1  zoned                 off                    default
backup1  snapdir               hidden                 default
backup1  aclinherit            restricted             default
backup1  canmount              on                     default
backup1  xattr                 on                     default
backup1  copies                1                      default
backup1  version               5                      -
backup1  utf8only              off                    -
backup1  normalization         none                   -
backup1  casesensitivity       sensitive              -
backup1  vscan                 off                    default
backup1  nbmand                off                    default
backup1  sharesmb              off                    default
backup1  refquota              none                   default
backup1  refreservation        none                   default
backup1  primarycache          all                    default
backup1  secondarycache        all                    default
backup1  usedbysnapshots       0                      -
backup1  usedbydataset         152K                   -
backup1  usedbychildren        871G                   -
backup1  usedbyrefreservation  0                      -
backup1  logbias               latency                default
backup1  dedup                 off                    default
backup1  mlslabel              none                   default
backup1  sync                  standard               default
backup1  refcompressratio      1.00x                  -
backup1  written               152K                   -
backup1  snapdev               hidden                 default

# zpool status
  pool: backup1
 state: ONLINE
  scan: none requested
config:

        NAME                                          STATE     READ WRITE CKSUM
        backup1                                       ONLINE       0     0     0
          mirror-0                                    ONLINE       0     0     0
            ata-WDC_WD20EZRX-serial-omitted           ONLINE       0     0     0
            ata-WDC_WD20EZRX-serial-omitted           ONLINE       0     0     0
          mirror-1                                    ONLINE       0     0     0
            ata-SAMSUNG_HD204UI_serial-omitted        ONLINE       0     0     0
            ata-SAMSUNG_HD204UI_serial-omitted        ONLINE       0     0     0

errors: No known data errors
 
I can see your disks are a mix of SATA 2 and SATA 3; whether this means anything is hard to tell. Also, your SATA 3 WDC disks are from the Green line, which are not exactly performance monsters.

What CPU and how much RAM does the computer with the ZFS pool have?

What does zpool iostat show when you run rsync?
 
Well, this is a cheap backup server. It has PVE now because we thought of also running some simple, not very important or demanding services in OVZ, like a backup caching DNS for a few hundred users. All disks are low-end "green" ones at 5400 RPM, on a SATA2 interface, ashift=12. RAM is 6GB, CPU is a quad-core Intel Q8200. Not exactly server HW, but more than adequate for the purpose. I'm pleased with the performance it provides. Well, almost: it's struggling on the RHEL kernel, as written previously.

I'll run a few short tests to see zpool iostat. But again: the only thing that's changing is the kernel I boot from. Everything else is the same.
 
"zpool iostat backup1 1" is like this and stays the same:

Code:
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
...
backup1      871G  2.77T    484      0   804K      0
backup1      871G  2.77T    584      0  1003K      0
backup1      871G  2.77T    268      0   510K      0
backup1      871G  2.77T    565      0  1.15M      0
backup1      871G  2.77T    305      0   601K      0
backup1      871G  2.77T    228      0   310K      0
...

CPU load is ~1.0, wait is ~25, CPU usage is below 5% total. Moderate to high load and wait like this, or higher, is normal from what I've observed on working ZoL instances. The server isn't breaking a sweat. The problem is definitely not the load or the amount of I/O.
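
For comparing the two kernels on more than an eyeballed sample, a rough sketch like the following could average the read operations column of `zpool iostat` output. The helper name `avg_read_ops` is hypothetical, and it assumes the column layout shown above (pool, alloc, free, read ops, write ops, read bandwidth, write bandwidth). Rows where the ops value carries a unit suffix such as `1.2K` are skipped rather than converted.

```shell
#!/bin/sh
# Average the "read operations" column (4th field) of zpool iostat output
# for one pool. Reads the iostat text on stdin; header, separator and "..."
# lines are ignored because their first field is not the pool name.
# Values with a unit suffix (e.g. 1.2K) are skipped, not converted.
avg_read_ops() {
    awk -v pool="$1" '
        $1 == pool && $4 ~ /^[0-9]+$/ { sum += $4; n++ }
        END { if (n) printf "%.0f\n", sum / n }'
}

# Usage (60 one-second samples while rsync is running):
#   zpool iostat backup1 1 60 | avg_read_ops backup1
```

As far as I know, the first row `zpool iostat` prints is an average since boot rather than a live sample, so it may be worth discarding before comparing numbers between kernels.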
 
I ran a few tests with the kernel that the PVE kernel is based on: 2.6.32-openvz-042stab083.2-amd64 (though it's probably not the newest one PVE uses), from here: http://openvz.livejournal.com/45345.html

It's actually slower than the PVE kernel, by about 10%. I'm going back to the default 3.2 kernel in Wheezy for now. The OVZ kernels seem to have a performance problem on some workloads, which is sad.
 
ZFS is probably starved for RAM. I wouldn't bother trying to run it without 8GB of ECC allocated just to ZFS. (The recommended amount is probably 4GB.) Unless you want to put together your own kernel, that's the easiest thing to try.

I know you don't want to hear it, but if you want to run a backup server with low specs and run some vms as well, ZFS is at cross purposes with your task.
 
I know all the basic rules for setting up ZFS; that is not the problem at hand. For the Nth time: the problem is that the PVE 2.6.32 kernel takes about twice as long for the same specific task as the default Debian Wheezy 3.2 kernel. That's my issue. No memory problems. No overloading, no swapping. For a production NAS using ZFS I wouldn't really recommend anything below 16-32 gigs of RAM, anyway. This is not that case.
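
For anyone who wants to check the memory side on both kernels rather than take it on faith, a minimal sketch follows. It assumes ZoL exposes ARC statistics at /proc/spl/kstat/zfs/arcstats as `name type value` lines; the helper names `arc_field` and `to_mib` are made up for this example.

```shell
#!/bin/sh
# Print the value column for one arcstats key, reading arcstats text on stdin.
# Assumes lines of the form "<name> <type> <value>".
arc_field() {
    awk -v key="$1" '$1 == key { print $3 }'
}

# Convert a byte count to whole MiB.
to_mib() {
    awk -v b="$1" 'BEGIN { printf "%d\n", b / 1048576 }'
}

stats=/proc/spl/kstat/zfs/arcstats   # assumed location under ZoL
if [ -r "$stats" ]; then
    echo "ARC size:  $(to_mib "$(arc_field size < "$stats")") MiB"
    echo "ARC c_max: $(to_mib "$(arc_field c_max < "$stats")") MiB"
fi
```

If the ARC size and its limit look the same under both kernels during the rsync run, memory is unlikely to explain the difference.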
 
Remember that while PVE is based on the latest Debian, the kernel is totally different (based on Red Hat).
A better comparison would probably be Red Hat with the same kernel version...?

Marco
 
I know all the basic rules for setting up ZFS; that is not the problem at hand. For the Nth time: the problem is that the PVE 2.6.32 kernel takes about twice as long for the same specific task as the default Debian Wheezy 3.2 kernel. That's my issue. No memory problems. No overloading, no swapping. For a production NAS using ZFS I wouldn't really recommend anything below 16-32 gigs of RAM, anyway. This is not that case.


Hi, the Proxmox kernel uses the deadline scheduler by default, and the Debian kernel uses cfq. Maybe this is the difference?
 
RAM is cheap and ZFS likes more. I guess what is left is kernel hacking then?
 
Remember that while PVE is based on the latest Debian, the kernel is totally different (based on Red Hat).
A better comparison would probably be Red Hat with the same kernel version...?

Marco
Already did that, see my previous posts. The official OVZ-patched RHEL kernel performs equally badly; TBH, even worse than the one in PVE.
 
RAM is cheap and ZFS likes more. I guess what is left is kernel hacking then?
No, adding RAM here won't help. The problem is not serious enough to dig into the land of kernel hacking, though (I think).
 
Hi, the Proxmox kernel uses the deadline scheduler by default, and the Debian kernel uses cfq. Maybe this is the difference?
If you mean these, then they're the same on the zpool for all kernels tested:

Code:
# cat /sys/block/sd?/queue/scheduler
noop deadline [cfq]
noop deadline [cfq]
[noop] deadline cfq
[noop] deadline cfq
[noop] deadline cfq
[noop] deadline cfq

ZoL automatically sets the elevator to noop on all whole disks that are part of the zpool. The sd[ab] disks are on mdadm RAID and are not part of the tests (running the tests on them with ext3 produces very good performance, though). If you meant the kernel task scheduler, that might be a good idea, but I have no idea how to tune that. Any suggestions?
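
To compare the elevators between boots without reading brackets by eye, a small sketch like this could print the active scheduler per disk. It relies only on the `/sys/block/sd?/queue/scheduler` files shown above, where the active entry is the one in square brackets; the helper name `active_scheduler` is made up for illustration.

```shell
#!/bin/sh
# Extract the active elevator (the bracketed entry) from a scheduler line
# such as "noop deadline [cfq]".
active_scheduler() {
    echo "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Print "<device>: <active elevator>" for every sdX disk present.
for f in /sys/block/sd?/queue/scheduler; do
    [ -r "$f" ] || continue          # skip if the glob matched nothing
    dev=${f#/sys/block/}
    dev=${dev%%/*}
    printf '%s: %s\n' "$dev" "$(active_scheduler "$(cat "$f")")"
done
```

Saving the output under each kernel and diffing the two files would confirm that the I/O elevators really are identical.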
 
