nobarrier (or the equivalent barrier=0) makes a big difference in terms of fsyncs/sec.
With barrier=0, from /proc/mounts:
/dev/mapper/pve-data /var/lib/vz ext3 rw,relatime,errors=continue,user_xattr,acl,barrier=0,data=ordered 0 0
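If you want to check this from a script rather than by eye, here is a minimal sketch that reads /proc/mounts-style text and reports whether barriers were explicitly turned off for a mount point. The function names (mount_options, barriers_disabled) are made up for illustration. It only looks for an explicit nobarrier or barrier=0; if neither is listed, the filesystem default applies and the sketch does not try to guess it.

```python
def mount_options(mounts_text, mount_point):
    """Return the option list for mount_point from /proc/mounts-style text.

    Returns None if the mount point is not listed.
    """
    found = None
    for line in mounts_text.splitlines():
        # Fields: device, mount point, fs type, options, dump, pass
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            # Keep the last match: a later mount shadows an earlier one.
            found = fields[3].split(",")
    return found


def barriers_disabled(options):
    """True only if barriers are explicitly disabled in the option list."""
    return "nobarrier" in options or "barrier=0" in options


def check_mount(mount_point, mounts_path="/proc/mounts"):
    """Read the live mount table and report on one mount point."""
    with open(mounts_path) as f:
        options = mount_options(f.read(), mount_point)
    if options is None:
        return None  # not mounted
    return barriers_disabled(options)
```

For the mount above, check_mount("/var/lib/vz") should return True when run on that machine, and None for a path that is not a mount point.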
# pveperf /var/lib/vz
CPU BOGOMIPS: 38397.44...