v3.4 zfs txg_sync

chalan

Hello, I have a ProLiant ML110 G7 server with 8 GB RAM, 2x 500 GB SATA2 disks and Proxmox 3.4. During installation I set up RAID1, so ZFS was installed. After adding some VMs I noticed very bad performance. In iotop I can see a process named txg_sync show up roughly every 3 seconds. Sometimes the VMs just freeze (lag), and cloning a VM is a nightmare: it takes far too long, and during the clone all running VMs and the server itself freeze and lag.

Please help me solve this problem. Do I have to reinstall with ext3 or ext4 and then assemble an mdadm array? Or do I need to add more RAM? Does ZFS need more RAM than mdadm? And what is that txg_sync thing? Thank you very much...
 
txg_sync is when ZFS flushes the data from the ZFS journal to the main ZFS data.

And yes, ZFS uses more resources than mdadm (ZFS is memory hungry).

The writes are done twice: once to the journal, once to the data.
(That's great if you have a dedicated SSD for the journal; it's bad if you have the journal and the data on the same drives.)
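For reference, the interval between those flushes is controlled by the zfs_txg_timeout module parameter. A quick way to inspect and, if the parameter is writable on your version, change it at runtime (paths as on a stock ZFS-on-Linux install; the value 10 is only an example):

# current transaction group flush interval in seconds (the ZoL default is usually 5)
cat /sys/module/zfs/parameters/zfs_txg_timeout
# try a longer interval on the running system (example value only, revert if it makes things worse)
echo 10 > /sys/module/zfs/parameters/zfs_txg_timeout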


(What cache setting do you use in your VM config?)


 
Hello, thank you for the reply. I have 4 VMs (together they use 6.5 GB RAM out of 8 GB) and I set writethrough cache, because writeback was marked (unsafe), so I decided on writethrough... I wasn't able to use no cache; the VMs won't start with cache=none :( Do I have to be afraid of losing data when it comes to a power outage?

Do I have to set writeback to get rid of txg_sync?
By SSD, do you mean I have to buy another hard disk (an SSD) and somehow put the ZFS journal on it?

Thank you...
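As an illustration only (the VM ID and volume name below are made up, not taken from this setup), the cache mode of a disk can also be changed from the CLI, roughly like this:

# switch an existing virtio disk of VM 100 to writeback cache (IDs and volume are placeholders)
qm set 100 --virtio0 local:100/vm-100-disk-1.raw,cache=writeback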
 
First ask yourself what you want from ZFS.

If you need traditional RAID, use mdadm. If you need a more secure RAID, ZFS will help you, but it requires more resources.
 
During installation I just selected RAID1, and I was looking forward to not having to set up mdadm after installation as I did in earlier versions of Proxmox.
But when I booted from the hard disk I found out it was ZFS, which I did not even know about at that point. I was expecting the Proxmox installation to build RAID1 with mdadm :) So I told myself OK, the Proxmox team knows what is right, and I started to install VMs...

But now I have a problem with freezing and lagging VMs and do not know what the cause is or how to solve it.

So I ask you for help:

Would it help to add RAM?
Or should I buy an SSD and somehow move the journal onto it?
Or should I install Proxmox again without ZFS and use mdadm?

Thank you
 
Would it help to add RAM?
Or should I buy an SSD and somehow move the journal onto it?

ZFS needs at least 4 GB RAM. And it also helps if you have an SSD cache (see the sketch at the end of this post).

Or should I install Proxmox again without ZFS and use mdadm?

We usually recommend using a real HW RAID controller instead of mdadm. But that is up to you.
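For illustration only (the device paths below are placeholders, not from this server), an SSD is usually attached to an existing pool as a separate log and/or cache device roughly like this:

# add an SSD partition as a dedicated ZIL/SLOG device (placeholder device path)
zpool add rpool log /dev/disk/by-id/ata-EXAMPLE-SSD-part1
# optionally add a second partition as L2ARC read cache
zpool add rpool cache /dev/disk/by-id/ata-EXAMPLE-SSD-part2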
 
OK, but is there any solution for me that does not mean reinstalling the whole system and using ext3/ext4 with mdadm or HW RAID? My Proxmox lags a lot (all VMs); is this really caused by ZFS (lack of memory)? Graphs and tests below...

[attached graphs: cpu.png, load.png, mem.png, traf.png]

As you can see, there is a lot of iowait; CPU, load and traffic are in my opinion OK...

And here are some HDD tests:

dd if=/dev/zero of=/tmp/output.img bs=8k count=256k
262144+0 records in
262144+0 records out
2147483648 bytes (2,1 GB) copied, 34,0803 s, 63,0 MB/s

Sometimes very slow...

/dev/sda:
Timing cached reads: 19634 MB in 2.00 seconds = 9825.81 MB/sec
Timing buffered disk reads: 50 MB in 3.15 seconds = 15.89 MB/sec

Sometimes quite OK:

/dev/sda:
Timing cached reads: 22010 MB in 2.00 seconds = 11015.43 MB/sec
Timing buffered disk reads: 252 MB in 3.14 seconds = 80.37 MB/sec

root@pve:~# pveperf
CPU BOGOMIPS: 24743.60
REGEX/SECOND: 1545035
HD SIZE: 449.27 GB (rpool/ROOT/pve-1)
FSYNCS/SECOND: 62.04
DNS EXT: 36.56 ms
DNS INT: 2.41 ms (elson.sk)

So what is causing these lags? Really the low memory and ZFS?
 
ZFS writes to the HDDs in a synchronized way. If you have a single slow HDD in your pool, it will slow down the whole pool. And that is not related to the ZIL (write log) or the pool's sync setting.

If you have not added a separate ZIL (external write log) and your pool's sync setting is not disabled, information is written twice: first to the ZIL, then as normal data.

As a test, try `zfs set sync=disabled rpool/ROOT/pve-1` and watch your server's iowait.
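For completeness, the current value can be checked first and the test reverted afterwards (dataset name as used above; a sketch only):

# show the current sync setting
zfs get sync rpool/ROOT/pve-1
# revert to the default after the test
zfs set sync=standard rpool/ROOT/pve-1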
 
Hello, thank you for your reply... I set `zfs set sync=disabled rpool/ROOT/pve-1` on Friday afternoon, as you recommended, but no change... all VMs lag as before, see the iowait:

[attached graph: cpu.png]

I'm helpless...
 
Can you print the result of `cat /proc/spl/kstat/zfs/arcstats`?

root@pve:~# cat /proc/spl/kstat/zfs/arcstats
5 1 0x01 85 4080 2323710577 1961538360646691
name type data
hits 4 426276676
misses 4 59396609
demand_data_hits 4 383816840
demand_data_misses 4 21460583
demand_metadata_hits 4 32670434
demand_metadata_misses 4 1121997
prefetch_data_hits 4 9285542
prefetch_data_misses 4 36730618
prefetch_metadata_hits 4 503860
prefetch_metadata_misses 4 83411
mru_hits 4 71686478
mru_ghost_hits 4 13408360
mfu_hits 4 344800798
mfu_ghost_hits 4 1430917
deleted 4 61009379
recycle_miss 4 760983
mutex_miss 4 753205
evict_skip 4 209969292
evict_l2_cached 4 0
evict_l2_eligible 4 1547850310144
evict_l2_ineligible 4 533319933440
hash_elements 4 180844
hash_elements_max 4 269064
hash_collisions 4 58519362
hash_chains 4 53201
hash_chain_max 4 12
p 4 106503680
c 4 1079207368
c_min 4 4194304
c_max 4 4160884736
size 4 1079067056
hdr_size 4 59500344
data_size 4 789247488
meta_size 4 200232960
other_size 4 30086264
anon_size 4 5783552
anon_evict_data 4 0
anon_evict_metadata 4 0
mru_size 4 110174720
mru_evict_data 4 86139392
mru_evict_metadata 4 4057088
mru_ghost_size 4 921012736
mru_ghost_evict_data 4 849706496
mru_ghost_evict_metadata 4 71306240
mfu_size 4 873522176
mfu_evict_data 4 697340928
mfu_evict_metadata 4 148361216
mfu_ghost_size 4 19025920
mfu_ghost_evict_data 4 19005440
mfu_ghost_evict_metadata 4 20480
l2_hits 4 0
l2_misses 4 0
l2_feeds 4 0
l2_rw_clash 4 0
l2_read_bytes 4 0
l2_write_bytes 4 0
l2_writes_sent 4 0
l2_writes_done 4 0
l2_writes_error 4 0
l2_writes_hdr_miss 4 0
l2_evict_lock_retry 4 0
l2_evict_reading 4 0
l2_free_on_write 4 0
l2_abort_lowmem 4 0
l2_cksum_bad 4 0
l2_io_error 4 0
l2_size 4 0
l2_asize 4 0
l2_hdr_size 4 0
l2_compress_successes 4 0
l2_compress_zeros 4 0
l2_compress_failures 4 0
memory_throttle_count 4 0
duplicate_buffers 4 0
duplicate_buffers_size 4 0
duplicate_reads 4 0
memory_direct_count 4 37106
memory_indirect_count 4 4272270
arc_no_grow 4 0
arc_tempreserve 4 0
arc_loaned_bytes 4 0
arc_prune 4 0
arc_meta_used 4 289819568
arc_meta_limit 4 3120663552
arc_meta_max 4 342435656
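A convenience one-liner for anyone reading along (not part of the original post) to turn those counters into an overall hit ratio:

awk '$1=="hits"{h=$3} $1=="misses"{m=$3} END{printf "overall ARC hit ratio: %.1f%%\n", 100*h/(h+m)}' /proc/spl/kstat/zfs/arcstats

With the numbers above (426276676 hits, 59396609 misses) that works out to roughly 87.8%, while the ARC size (~1.0 GB) sits well below its c_max (~3.9 GB).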
 
=== START OF INFORMATION SECTION ===
Device Model: MB0500EBZQA
Serial Number: Z1M0EGEJ
LU WWN Device Id: 5 000c50 04d05f2ab
Firmware Version: HPG1
User Capacity: 500 107 862 016 bytes [500 GB]
Sector Size: 512 bytes logical/physical
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: 8
ATA Standard is: ATA-8-ACS revision 6
Local Time is: Sat Mar 28 17:15:39 2015 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
 
In general terms:
To have reasonable performance with ZFS, the bare minimum is:
4 SATA 3 disks (RAID10 (striped RAID1), or RAID5/RAIDZ1)
8 GB RAM exclusively for ZFS

To have OK and painless performance, the bare minimum is:
4 SATA 3 disks (RAID10 (striped RAID1), or RAID5/RAIDZ1)
16 GB RAM exclusively for ZFS
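If RAM is tight, the memory ZFS grabs for its ARC can also be capped with a module option; a sketch only, the 4 GiB value is just an example:

# /etc/modprobe.d/zfs.conf -- cap the ARC at 4 GiB (example value, size it to your workload)
options zfs zfs_arc_max=4294967296

With ZFS on the root filesystem the initramfs normally has to be refreshed (update-initramfs -u) and the host rebooted before the new limit applies.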
 
So the only solution for me is to reinstall with ext3/ext4 and then set up mdadm? The server itself supports max 16 GB RAM :)
 
And is this OK?

/dev/sda

ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 083 063 044 Pre-fail Always - 221038186
3 Spin_Up_Time 0x0003 096 095 070 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 170
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 7
7 Seek_Error_Rate 0x000f 089 060 030 Pre-fail Always - 881147207
9 Power_On_Hours 0x0032 074 074 000 Old_age Always - 23261
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 170
180 Unused_Rsvd_Blk_Cnt_Tot 0x003b 100 100 030 Pre-fail Always - 151712808
184 End-to-End_Error 0x0032 100 100 003 Old_age Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
188 Command_Timeout 0x0032 100 100 000 Old_age Always - 0
189 High_Fly_Writes 0x003a 100 100 000 Old_age Always - 0
190 Airflow_Temperature_Cel 0x0022 069 050 045 Old_age Always - 31 (Min/Max 19/35)
191 G-Sense_Error_Rate 0x0032 100 100 000 Old_age Always - 0
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 123
193 Load_Cycle_Count 0x0032 100 100 000 Old_age Always - 170
194 Temperature_Celsius 0x0022 031 050 000 Old_age Always - 31 (0 18 0 0)
195 Hardware_ECC_Recovered 0x001a 119 099 000 Old_age Always - 221038186
196 Reallocated_Event_Count 0x0033 100 100 036 Pre-fail Always - 7
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 2

I ran a long test with no errors:

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 23240 -
# 2 Extended offline Completed without error 00% 17446 -
# 3 Short offline Completed without error 00% 17443 -

/dev/sdb is the same.
 
