My problem started after updating Proxmox to version 6.
I had four nodes running Proxmox 5.x with an uptime of about 600 days. After upgrading to 6.3, one node (and only that one) started freezing about every two days.
This is the syslog output:
Mar 2 11:06:38 genespx4 kernel: [145843.806614] INFO: task zvol:607 blocked for more than 120 seconds.
Mar 2 11:06:38 genespx4 kernel: [145843.806679] Tainted: P O 5.4.98-1-pve #1
Mar 2 11:06:38 genespx4 kernel: [145843.806723] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 2 11:06:38 genespx4 kernel: [145843.806779] zvol D 0 607 2 0x80004000
Mar 2 11:06:38 genespx4 kernel: [145843.806784] Call Trace:
Mar 2 11:06:38 genespx4 kernel: [145843.806802] __schedule+0x2e6/0x6f0
Mar 2 11:06:38 genespx4 kernel: [145843.806805] schedule+0x33/0xa0
Mar 2 11:06:38 genespx4 kernel: [145843.806819] cv_wait_common+0x104/0x130 [spl]
Mar 2 11:06:38 genespx4 kernel: [145843.806827] ? wait_woken+0x80/0x80
Mar 2 11:06:38 genespx4 kernel: [145843.806836] __cv_wait+0x15/0x20 [spl]
Mar 2 11:06:38 genespx4 kernel: [145843.806964] zil_commit_impl+0x241/0xdb0 [zfs]
Mar 2 11:06:38 genespx4 kernel: [145843.807076] zil_commit+0x3d/0x60 [zfs]
Mar 2 11:06:38 genespx4 kernel: [145843.807181] zvol_write+0x325/0x4e0 [zfs]
Mar 2 11:06:38 genespx4 kernel: [145843.807192] taskq_thread+0x2f7/0x4e0 [spl]
Mar 2 11:06:38 genespx4 kernel: [145843.807200] ? wake_up_q+0x80/0x80
Mar 2 11:06:38 genespx4 kernel: [145843.807306] ? zvol_os_create_minor+0x7a0/0x7a0 [zfs]
Mar 2 11:06:38 genespx4 kernel: [145843.807312] kthread+0x120/0x140
Mar 2 11:06:38 genespx4 kernel: [145843.807321] ? task_done+0xb0/0xb0 [spl]
Mar 2 11:06:38 genespx4 kernel: [145843.807324] ? kthread_park+0x90/0x90
Mar 2 11:06:38 genespx4 kernel: [145843.807329] ret_from_fork+0x35/0x40
Mar 2 11:06:38 genespx4 kernel: [145843.807347] INFO: task z_wr_iss:1058 blocked for more than 120 seconds.
Mar 2 11:06:38 genespx4 kernel: [145843.807398] Tainted: P O 5.4.98-1-pve #1
Mar 2 11:06:38 genespx4 kernel: [145843.807441] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 2 11:06:38 genespx4 kernel: [145843.807496] z_wr_iss D 0 1058 2 0x80004000
Mar 2 11:08:44 genespx4 zed: eid=1464 class=deadman pool='rpool' vdev=sdb1 size=49152 offset=578887012352 priority=3 err=0 flags=0x180880 bookmark=142:1:2:4
Mar 2 11:08:44 genespx4 zed: eid=1465 class=deadman pool='rpool' vdev=sdb1 size=49152 offset=773487357952 priority=3 err=0 flags=0x180880 bookmark=142:1:2:4
Mar 2 11:08:44 genespx4 zed: eid=1466 class=deadman pool='rpool' vdev=sdb1 size=24576 offset=581936488448 priority=3 err=0 flags=0x180880 bookmark=142:1:1:4943
Mar 2 11:08:45 genespx4 zed: eid=1467 class=deadman pool='rpool' vdev=sdb1 size=24576 offset=780181196800 priority=3 err=0 flags=0x180880 bookmark=142:1:1:4943
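To see at a glance whether the deadman events always point at the same device, the zed lines can be tallied per pool and vdev. This is only a minimal sketch: the function names (`parse_zed_line`, `deadman_counts`) are made up for illustration, the sample lines are copied from the log above, and it assumes the zed messages keep the `key=value` layout shown there.

```python
import re
from collections import Counter

# Matches key=value pairs such as class=deadman or pool='rpool'.
PAIR_RE = re.compile(r"(\w+)=('[^']*'|\S+)")

# Sample lines copied from the syslog excerpt in this post.
SAMPLE = [
    "Mar 2 11:08:44 genespx4 zed: eid=1464 class=deadman pool='rpool' vdev=sdb1 size=49152 offset=578887012352 priority=3 err=0 flags=0x180880 bookmark=142:1:2:4",
    "Mar 2 11:08:44 genespx4 zed: eid=1465 class=deadman pool='rpool' vdev=sdb1 size=49152 offset=773487357952 priority=3 err=0 flags=0x180880 bookmark=142:1:2:4",
    "Mar 2 11:08:44 genespx4 zed: eid=1466 class=deadman pool='rpool' vdev=sdb1 size=24576 offset=581936488448 priority=3 err=0 flags=0x180880 bookmark=142:1:1:4943",
    "Mar 2 11:08:45 genespx4 zed: eid=1467 class=deadman pool='rpool' vdev=sdb1 size=24576 offset=780181196800 priority=3 err=0 flags=0x180880 bookmark=142:1:1:4943",
]


def parse_zed_line(line):
    """Return the key=value fields of a zed syslog line, or None for other lines."""
    if " zed: " not in line:
        return None
    _, _, payload = line.partition(" zed: ")
    # Strip the single quotes zed puts around the pool name.
    return {key: value.strip("'") for key, value in PAIR_RE.findall(payload)}


def deadman_counts(lines):
    """Count deadman events per (pool, vdev) pair."""
    counts = Counter()
    for line in lines:
        fields = parse_zed_line(line)
        if fields and fields.get("class") == "deadman":
            counts[(fields.get("pool"), fields.get("vdev"))] += 1
    return counts


if __name__ == "__main__":
    for (pool, vdev), n in deadman_counts(SAMPLE).items():
        print(f"{pool}/{vdev}: {n} deadman event(s)")
```

Feeding it the full syslog instead of `SAMPLE` (for example `deadman_counts(open("/var/log/syslog"))`) would show whether every event lands on the same vdev or whether other disks are affected as well.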
Code:
root@genespx4:~# pveversion -v
proxmox-ve: 6.3-1 (running kernel: 5.4.98-1-pve)
pve-manager: 6.3-4 (running version: 6.3-4/0a38c56f)
pve-kernel-5.4: 6.3-5
pve-kernel-helper: 6.3-5
pve-kernel-5.4.98-1-pve: 5.4.98-1
pve-kernel-5.4.78-2-pve: 5.4.78-2
pve-kernel-4.15: 5.4-19
pve-kernel-4.15.18-30-pve: 4.15.18-58
pve-kernel-4.15.18-21-pve: 4.15.18-48
pve-kernel-4.15.18-20-pve: 4.15.18-46
pve-kernel-4.15.18-12-pve: 4.15.18-36
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.1.0-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.20-pve1
libproxmox-acme-perl: 1.0.7
libproxmox-backup-qemu0: 1.0.3-1
libpve-access-control: 6.1-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.3-4
libpve-guest-common-perl: 3.1-5
libpve-http-server-perl: 3.1-1
libpve-storage-perl: 6.3-7
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.6-2
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
openvswitch-switch: 2.12.3-1
proxmox-backup-client: 1.0.8-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.4-5
pve-cluster: 6.2-1
pve-container: 3.3-4
pve-docs: 6.3-1
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.2-2
pve-ha-manager: 3.1-1
pve-i18n: 2.2-2
pve-qemu-kvm: 5.2.0-2
pve-xtermjs: 4.7.0-3
qemu-server: 6.3-5
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 2.0.3-pve1
I tried giving ZFS more RAM through zfs.conf in modprobe.d, and also set the ZFS option zfs_vdev_scheduler to none, as suggested in another Proxmox forum post.
Of course I also checked the disks and their SMART data and ran performance tests.
I can't reproduce this error on demand; it just happens over time.
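To double-check that the module options from zfs.conf are really in effect after a reboot, the values of the loaded module can be read back from sysfs. This is a rough sketch under some assumptions: `read_module_params` is a helper name made up here, `zfs_arc_max` is only a guess at the option used to give ZFS more RAM (the exact option is not named above), and a parameter that the installed ZFS version does not expose is simply reported as missing.

```python
from pathlib import Path


def read_module_params(names, param_dir="/sys/module/zfs/parameters"):
    """Read the current values of loaded kernel module parameters.

    Returns a dict mapping each name to its value as a string, or to None
    if the parameter file is missing or unreadable.
    """
    base = Path(param_dir)
    values = {}
    for name in names:
        try:
            values[name] = (base / name).read_text().strip()
        except OSError:
            # Parameter not exposed by this module version, or module not loaded.
            values[name] = None
    return values


if __name__ == "__main__":
    # zfs_arc_max is an assumption; zfs_vdev_scheduler is the option named above.
    wanted = ["zfs_arc_max", "zfs_vdev_scheduler"]
    for name, value in read_module_params(wanted).items():
        print(f"{name} = {value if value is not None else '(not available)'}")
```

If a value printed here differs from what zfs.conf sets, the option was probably not applied at module load time.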
Does anyone know what may be happening?
Thanks in advance.