Following up on this post, I have some updates.
https://forum.proxmox.com/threads/p...-blocked-for-more-than-120s.57765/post-277073
System Information:
Code:
CPU: Xeon E3-1285L V3
RAM: 32GB DDR3-1600 ECC, 50GB swap partition (not on ZFS) on the same SSD as rpool
HBA: LSI9305-24i
Two SSD pools: rpool and S860; one hard disk pool: NAS
zpool status
  pool: NAS
 state: ONLINE
  scan: scrub repaired 0B in 0 days 07:58:47 with 0 errors on Sun Nov 10 08:22:49 2019
config:

        NAME                                           STATE     READ WRITE CKSUM
        NAS                                            ONLINE       0     0     0
          raidz1-0                                     ONLINE       0     0     0
            wwn-0x5000cca26bd9cc69                     ONLINE       0     0     0
            ata-HGST_HUH721010ALE600_7JJ3GG4C          ONLINE       0     0     0
            ata-HGST_HUH721010ALE600_7JJ535MC          ONLINE       0     0     0

errors: No known data errors

  pool: S860
 state: ONLINE
  scan: scrub repaired 0B in 0 days 00:06:37 with 0 errors on Sun Nov 10 00:30:44 2019
config:

        NAME                                           STATE     READ WRITE CKSUM
        S860                                           ONLINE       0     0     0
          ata-Samsung_SSD_860_EVO_4TB_S3YPNB0KA00732Y  ONLINE       0     0     0
          ata-Samsung_SSD_860_EVO_4TB_S3YPNB0KA00919A  ONLINE       0     0     0

errors: No known data errors

  pool: rpool
 state: ONLINE
  scan: scrub repaired 0B in 0 days 00:16:00 with 0 errors on Sun Nov 10 00:40:14 2019
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            sda3    ONLINE       0     0     0
            sdh3    ONLINE       0     0     0

errors: No known data errors
Code:
proxmox-ve: 6.0-2 (running kernel: 5.0.21-4-pve)
pve-manager: 6.0-11 (running version: 6.0-11/2140ef37)
pve-kernel-helper: 6.0-11
pve-kernel-5.0: 6.0-10
pve-kernel-5.0.21-4-pve: 5.0.21-9
pve-kernel-5.0.21-3-pve: 5.0.21-7
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.2-pve4
criu: 3.11-3
glusterfs-client: 5.5-3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.13-pve1
libpve-access-control: 6.0-3
libpve-apiclient-perl: 3.0-2
libpve-common-perl: 6.0-7
libpve-guest-common-perl: 3.0-2
libpve-http-server-perl: 3.0-3
libpve-storage-perl: 6.0-9
libqb0: 1.0.5-1
lvm2: 2.03.02-pve3
lxc-pve: 3.2.1-1
lxcfs: 3.0.3-pve60
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.0-8
pve-cluster: 6.0-7
pve-container: 3.0-10
pve-docs: 6.0-8
pve-edk2-firmware: 2.20190614-1
pve-firewall: 4.0-7
pve-firmware: 3.0-4
pve-ha-manager: 3.0-2
pve-i18n: 2.0-3
pve-qemu-kvm: 4.0.1-5
pve-xtermjs: 3.13.2-1
qemu-server: 6.0-13
smartmontools: 7.0-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.2-pve2
ZFS parameters, suggested by somebody to resolve similar symptoms:
options zfs zfs_arc_max=25769803776
options zfs zfetch_max_distance=268435456
options zfs zfs_prefetch_disable=1
options zfs zfs_nocacheflush=1
options zfs zfs_vdev_scheduler=none
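For reference, these options are made persistent through a modprobe config; a minimal sketch, assuming a root-on-ZFS install where the zfs module is loaded from the initramfs (the file name zfs.conf is just the usual convention):
Code:
# put the "options zfs ..." lines above into this file
nano /etc/modprobe.d/zfs.conf
# rebuild the initramfs so the options apply to the zfs module
# loaded at early boot, then reboot
update-initramfs -u -k all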
Problem:
The Proxmox server hangs during a fio benchmark run on the S860 and NAS pools at the same time, with no automatic recovery even after hours. No VM is running during the test; the job file and a launch sketch follow.
Code:
[global]
name=fio-seq-write
filename=fio-seq-write
rw=write
bs=256K
direct=0
numjobs=1
time_based=1
runtime=300
[file1]
size=50G
ioengine=libaio
iodepth=16
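To trigger the hang, the job has to run on both pools at the same time; a minimal launch sketch, assuming the pools are mounted at /S860 and /NAS and the job file is saved as /root/fio-seq-write.fio (both paths are assumptions):
Code:
# filename= in the job file is relative, so cd into each pool's
# mountpoint and start the same job there concurrently
(cd /S860 && fio /root/fio-seq-write.fio) &
(cd /NAS && fio /root/fio-seq-write.fio) &
wait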
A txg_sync hang shows up in the system log; if the IPMI watchdog or softdog is enabled, the server is rebooted within 10 seconds.
Code:
[ 968.292727] INFO: task txg_sync:2130 blocked for more than 120 seconds.
[  968.292762]       Tainted: P           O      5.0.21-4-pve #1
[  968.292784] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  968.292815] txg_sync        D    0  2130      2 0x80000000
[  968.292837] Call Trace:
[  968.292855]  __schedule+0x2d4/0x870
[  968.292872]  schedule+0x2c/0x70
[  968.292887]  schedule_timeout+0x157/0x360
[  968.292906]  ? __next_timer_interrupt+0xd0/0xd0
[  968.292925]  io_schedule_timeout+0x1e/0x50
[  968.292946]  __cv_timedwait_common+0x12f/0x170 [spl]
[  968.292968]  ? wait_woken+0x80/0x80
[  968.292985]  __cv_timedwait_io+0x19/0x20 [spl]
[  968.293049]  zio_wait+0x13a/0x280 [zfs]
[  968.293067]  ? _cond_resched+0x19/0x30
[  968.293115]  dsl_pool_sync+0xdc/0x4f0 [zfs]
[  968.293168]  spa_sync+0x5b2/0xfc0 [zfs]
[  968.293221]  ? spa_txg_history_init_io+0x106/0x110 [zfs]
[  968.293278]  txg_sync_thread+0x2d9/0x4c0 [zfs]
[  968.293331]  ? txg_thread_exit.isra.11+0x60/0x60 [zfs]
[  968.293355]  thread_generic_wrapper+0x74/0x90 [spl]
[  968.293376]  kthread+0x120/0x140
[  968.293393]  ? __thread_exit+0x20/0x20 [spl]
[  968.293410]  __kthread_parkme+0x70/0x70
[  968.293428]  ret_from_fork+0x35/0x40
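If a shell still responds during the hang, the per-pool txg kstats show whether the sync stage is really stuck; a quick check, assuming the standard ZoL 0.8 kstat paths:
Code:
# txg history per pool; a wedged sync shows a txg that stays in
# state S (syncing) and never reaches C (committed)
cat /proc/spl/kstat/zfs/S860/txgs
cat /proc/spl/kstat/zfs/NAS/txgs
# the 120 s threshold in the message above comes from this knob
cat /proc/sys/kernel/hung_task_timeout_secs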
Update 11/13:
I've disabled swap, and the system now has plenty of free RAM at idle:
Code:
              total        used        free      shared  buff/cache   available
Mem:       32894512     6502496    22191116       54452      200900    21998720
Swap:             0           0           0
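For completeness, disabling swap is just the usual two steps (the fstab line shown is an assumption based on the partition layout further down):
Code:
swapoff -a   # deactivate all active swap immediately
# then comment out the swap entry in /etc/fstab so it stays off
# across reboots, e.g. a line like: /dev/sdh4 none swap sw 0 0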
During the test, RAM usage is between 24 and 30 GB. CPU usage is quite high, presumably because there is no SIMD support(?); see the check after the process list:
Code:
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 2337 root       1 -19       0      0      0 S  45.0  0.0  10:56.39 z_wr_iss
 2340 root       1 -19       0      0      0 S  45.0  0.0  10:58.38 z_wr_iss
 2338 root       1 -19       0      0      0 S  44.7  0.0  10:58.54 z_wr_iss
 2339 root       1 -19       0      0      0 S  44.7  0.0  10:58.87 z_wr_iss
 2341 root       1 -19       0      0      0 S  44.7  0.0  10:55.02 z_wr_iss
 2342 root       1 -19       0      0      0 S  44.7  0.0  10:56.07 z_wr_iss
32655 root      20   0  789228   8808    636 D  17.2  0.0   0:43.01 fio
  453 root      39  19       0      0      0 S   4.3  0.0   0:35.99 dbuf_evict
  447 root       0 -20       0      0      0 S   2.6  0.0   0:09.61 spl_kmem_cache
 2350 root       0 -20       0      0      0 R   2.0  0.0   0:24.96 z_wr_int
 2344 root       0 -20       0      0      0 S   1.7  0.0   0:24.77 z_wr_int
 2345 root       0 -20       0      0      0 S   1.7  0.0   0:24.85 z_wr_int
 2348 root       0 -20       0      0      0 S   1.7  0.0   0:24.97 z_wr_int
 2349 root       0 -20       0      0      0 S   1.7  0.0   0:24.75 z_wr_int
 2351 root       0 -20       0      0      0 S   1.7  0.0   0:25.33 z_wr_int
 2346 root       0 -20       0      0      0 S   1.3  0.0   0:24.82 z_wr_int
 2347 root       0 -20       0      0      0 R   1.3  0.0   0:24.95 z_wr_int
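On the SIMD question: kernel 5.0 dropped the FPU symbols ZoL used, so checksumming can fall back to the scalar implementation, which would fit the high z_wr_iss CPU usage. The active implementation can be checked like this (standard ZoL 0.8 paths):
Code:
# the selected fletcher4 implementation is shown in [brackets];
# "scalar" instead of an SSE/AVX variant means no SIMD in use
cat /sys/module/zcommon/parameters/zfs_fletcher_4_impl
# per-implementation benchmark results collected at module load
cat /proc/spl/kstat/zfs/fletcher_4_bench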
I've run the test multiple times and no hang is observed. Swap seems to be the issue, but I can't explain why, since swap is not on a zvol:
Code:
Device           Start        End    Sectors   Size Type
/dev/sdh1           34       2047       2014  1007K BIOS boot
/dev/sdh2         2048    1050623    1048576   512M EFI System
/dev/sdh3      1050624 3649044480 3647993857   1.7T Solaris /usr & Apple ZFS
/dev/sdh4   3649044488 3750748814  101704327  48.5G Linux swap
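To rule out the well-known swap-on-zvol deadlock, one can confirm (before disabling swap) that the active swap device really is the raw partition; a quick check:
Code:
# /dev/sdh4 here means plain partition swap; a zvol would show
# up as a /dev/zdN device instead
swapon --show
lsblk -o NAME,TYPE,MOUNTPOINT /dev/sdh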