Hello,
I'm stress-testing a new cluster before putting it into production, and I've run into a problem on one node.

The node is a brand-new Dell R740 with a PERC H740P in HBA mode (firmware updated last month), currently using ~50 GB of 190 GB RAM
4x 10 GbE NICs
Proxmox is installed on a BOSS card with ext4 / LVM
mechanical 10k SAS disks and a set of SSDs
pools named data, data02 and poolSDD
I've read multiple threads and thought I had found the solution, but the crash still happens:
upgraded from the installed PVE 6.3 to 6.4
upgraded the zpools from ZFS 0.8 to ZFS 2.0 (via zpool upgrade)
Bash:
# currently
5.4.124-1-pve
# zpool status
zfs-2.0.4-pve1
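In case the exact commands matter, this is roughly what I run to check the running kernel, the ZFS versions, and whether the pools still have features left to enable after the zpool upgrade (nothing exotic, just the standard tools):
Bash:
# running kernel and ZFS userland / kernel module versions
uname -r
zfs version

# pool health for the three pools
zpool status data data02 poolSDD

# with no arguments this only lists pools that still have features to enable
zpool upgrade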
The problem occurs when I restore a big server from PBS to data02 and at the same time install a pfSense to the SSD array (load is around 40, but the server is not really slowed down and operates normally).
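To reproduce it I basically start both operations at once, roughly like this from the CLI (the PBS backup volume ID and the target VM ID below are placeholders, not the real ones):
Bash:
# restore the big VM from the PBS storage onto the data02 pool
# (backup volume ID and target VM ID are placeholders)
qmrestore pbs:backup/vm/2001/2021-07-15T12:00:00Z 2001 --storage data02

# at the same time, boot the pfSense test VM and run its installer
qm start 1199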
Then kernel problems occur on the Proxmox node:
Bash:
Jul 15 17:13:40 pveA01 kernel: [ 4231.341788] INFO: task kvm:61475 blocked for more than 120 seconds.
Jul 15 17:13:40 pveA01 kernel: [ 4231.341843] Tainted: P O 5.4.124-1-pve #1
Jul 15 17:13:40 pveA01 kernel: [ 4231.341883] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 15 17:13:40 pveA01 kernel: [ 4231.341937] kvm D 0 61475 1 0x00004000
Jul 15 17:13:40 pveA01 kernel: [ 4231.341941] Call Trace:
Jul 15 17:13:40 pveA01 kernel: [ 4231.341951] __schedule+0x2e6/0x6f0
Jul 15 17:13:40 pveA01 kernel: [ 4231.341955] schedule+0x33/0xa0
Jul 15 17:13:40 pveA01 kernel: [ 4231.341958] io_schedule+0x16/0x40
Jul 15 17:13:40 pveA01 kernel: [ 4231.341963] wait_on_page_bit+0x141/0x210
Jul 15 17:13:40 pveA01 kernel: [ 4231.341967] ? file_fdatawait_range+0x30/0x30
Jul 15 17:13:40 pveA01 kernel: [ 4231.341972] wait_on_page_writeback+0x43/0x90
Jul 15 17:13:40 pveA01 kernel: [ 4231.341975] __filemap_fdatawait_range+0xae/0x120
Jul 15 17:13:40 pveA01 kernel: [ 4231.341980] file_write_and_wait_range+0xa0/0xc0
Jul 15 17:13:40 pveA01 kernel: [ 4231.341985] blkdev_fsync+0x1b/0x50
Jul 15 17:13:40 pveA01 kernel: [ 4231.341989] vfs_fsync_range+0x48/0x80
Jul 15 17:13:40 pveA01 kernel: [ 4231.341992] ? __fget_light+0x59/0x70
Jul 15 17:13:40 pveA01 kernel: [ 4231.341995] do_fsync+0x3d/0x70
Jul 15 17:13:40 pveA01 kernel: [ 4231.341998] __x64_sys_fdatasync+0x17/0x20
Jul 15 17:13:40 pveA01 kernel: [ 4231.342004] do_syscall_64+0x57/0x190
Jul 15 17:13:40 pveA01 kernel: [ 4231.342009] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 15 17:13:40 pveA01 kernel: [ 4231.342013] RIP: 0033:0x7f13e27e42e7
Jul 15 17:13:40 pveA01 kernel: [ 4231.342020] Code: Bad RIP value.
Jul 15 17:13:40 pveA01 kernel: [ 4231.342022] RSP: 002b:00007f10a67f8d30 EFLAGS: 00000293 ORIG_RAX: 000000000000004b
Jul 15 17:13:40 pveA01 kernel: [ 4231.342026] RAX: ffffffffffffffda RBX: 000000000000001a RCX: 00007f13e27e42e7
Jul 15 17:13:40 pveA01 kernel: [ 4231.342028] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000000001a
Jul 15 17:13:40 pveA01 kernel: [ 4231.342029] RBP: 0000560e11263b92 R08: 0000000000000000 R09: 00000000ffffffff
Jul 15 17:13:40 pveA01 kernel: [ 4231.342031] R10: 00007f10a67f8d20 R11: 0000000000000293 R12: 0000560e115de2e8
Jul 15 17:13:40 pveA01 kernel: [ 4231.342033] R13: 0000560e12813b58 R14: 0000560e12813ae0 R15: 0000560e12dfa260
Jul 15 17:13:53 pveA01 kernel: [ 4244.462599] vmbr192: port 2(tap1999i1) entered disabled state
Jul 15 17:13:53 pveA01 kernel: [ 4244.463171] vmbr192: port 2(tap1999i1) entered disabled state
Jul 15 17:14:36 pveA01 kernel: [ 4287.392408] vmbr10: port 2(tap1999i0) entered disabled state
Jul 15 17:14:36 pveA01 kernel: [ 4287.395316] vmbr10: port 2(tap1999i0) entered disabled state
On a machine running on the SSD pool (poolSDD):
Bash:
Jul 15 17:30:55 debian03-dsi kernel: [ 4472.009653] INFO: task jbd2/sda1-8:190 blocked for more than 120 seconds.
Jul 15 17:30:55 debian03-dsi kernel: [ 4472.009673] Not tainted 4.19.0-17-amd64 #1 Debian 4.19.194-2
Jul 15 17:30:55 debian03-dsi kernel: [ 4472.009677] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 15 17:30:55 debian03-dsi kernel: [ 4472.009686] jbd2/sda1-8 D 0 190 2 0x80000000
Jul 15 17:30:55 debian03-dsi kernel: [ 4472.009707] Call Trace:
Jul 15 17:30:55 debian03-dsi kernel: [ 4472.009801] __schedule+0x29f/0x840
Jul 15 17:30:55 debian03-dsi kernel: [ 4472.009836] ? __switch_to_asm+0x41/0x70
Jul 15 17:30:55 debian03-dsi kernel: [ 4472.009840] ? __switch_to_asm+0x35/0x70
Jul 15 17:30:55 debian03-dsi kernel: [ 4472.009844] ? bit_wait_timeout+0x90/0x90
Jul 15 17:30:55 debian03-dsi kernel: [ 4472.009847] schedule+0x28/0x80
Jul 15 17:30:55 debian03-dsi kernel: [ 4472.009850] io_schedule+0x12/0x40
Jul 15 17:30:55 debian03-dsi kernel: [ 4472.009853] bit_wait_io+0xd/0x50
Jul 15 17:30:55 debian03-dsi kernel: [ 4472.009857] __wait_on_bit+0x73/0x90
Jul 15 17:30:55 debian03-dsi kernel: [ 4472.009862] out_of_line_wait_on_bit+0x91/0xb0
Jul 15 17:30:55 debian03-dsi kernel: [ 4472.009871] ? init_wait_var_entry+0x40/0x40
Jul 15 17:30:55 debian03-dsi kernel: [ 4472.009939] jbd2_journal_commit_transaction+0x144f/0x1840 [jbd2]
Jul 15 17:30:55 debian03-dsi kernel: [ 4472.009967] ? lock_timer_base+0x4c/0x80
Jul 15 17:30:55 debian03-dsi kernel: [ 4472.009984] kjournald2+0xbd/0x270 [jbd2]
Jul 15 17:30:55 debian03-dsi kernel: [ 4472.009998] ? finish_wait+0x80/0x80
Jul 15 17:30:55 debian03-dsi kernel: [ 4472.010012] ? commit_timeout+0x10/0x10 [jbd2]
Jul 15 17:30:55 debian03-dsi kernel: [ 4472.010020] kthread+0x112/0x130
Jul 15 17:30:55 debian03-dsi kernel: [ 4472.010036] ? kthread_bind+0x30/0x30
Jul 15 17:30:55 debian03-dsi kernel: [ 4472.010042] ret_from_fork+0x35/0x40
Another machine becomes unresponsive and is now unbootable (after upgrade / reboot).
PVE logs:
Jul 15 17:12:58 pveA01 pvestatd[5477]: VM 1999 qmp command failed - VM 1999 qmp command 'query-proxmox-support' failed - unable to connect to VM 1999 qmp socket - timeout after 31 retries
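While the node is in this state, this is more or less how I look at the stuck VM and at the node-side messages (VM 1999 is the one from the log line above):
Bash:
# VM state as seen by Proxmox
qm status 1999 --verbose

# hung-task messages from the node's kernel log for this boot
journalctl -k -b | grep -i 'blocked for more than'

# recent kernel messages with readable timestamps
dmesg -T | tail -n 50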
The pfSense installer itself also throws multiple CDB errors...
I've tried multiple variations on the pfSense machine (writeback/no cache, install on ZFS or LVM, UFS vs ZFS in the guest) without success.
Load is around 10 during the restore alone but rises to around 40 during restore + install.
I will probably not do this in production, but I don't want to lose data after a crash.
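For completeness, the cache/storage variations on the pfSense VM were switched roughly like this between installer runs (VM 1199; 'local-lvm' stands in for my actual LVM storage name):
Bash:
# toggle the disk cache mode between runs (writeback vs. no cache)
qm set 1199 --scsi0 data:vm-1199-disk-0,iothread=1,ssd=1,cache=writeback
qm set 1199 --scsi0 data:vm-1199-disk-0,iothread=1,ssd=1,cache=none

# for the LVM variant, move the test disk to the LVM-backed storage
qm move_disk 1199 scsi0 local-lvm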

root@pveA01:/# pveversion -v
proxmox-ve: 6.4-1 (running kernel: 5.4.124-1-pve)
pve-manager: 6.4-13 (running version: 6.4-13/9f411e79)
pve-kernel-5.4: 6.4-4
pve-kernel-helper: 6.4-4
pve-kernel-5.4.124-1-pve: 5.4.124-1
pve-kernel-5.4.119-1-pve: 5.4.119-1
pve-kernel-5.4.114-1-pve: 5.4.114-1
pve-kernel-5.4.73-1-pve: 5.4.73-1
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.1.2-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: residual config
ifupdown2: 3.0.0-1+pve4~bpo10
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.20-pve1
libproxmox-acme-perl: 1.1.0
libproxmox-backup-qemu0: 1.1.0-1
libpve-access-control: 6.4-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.4-3
libpve-guest-common-perl: 3.1-5
libpve-http-server-perl: 3.2-3
libpve-storage-perl: 6.4-1
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.6-2
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.1.12-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.6-1
pve-cluster: 6.4-1
pve-container: 3.3-6
pve-docs: 6.4-2
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-4
pve-firmware: 3.2-4
pve-ha-manager: 3.1-1
pve-i18n: 2.3-1
pve-qemu-kvm: 5.2.0-6
pve-xtermjs: 4.7.0-3
qemu-server: 6.4-2
smartmontools: 7.2-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 2.0.4-pve1
# the pfSense VM
/etc/pve/qemu-server/1199.conf
boot: order=scsi0;ide2;net0
cores: 2
ide2: local:iso/pfSense-CE-2.5.2-RELEASE-amd64.iso,media=cdrom
memory: 2048
name: pflab
net0: virtio=FA:CB:60:33:F3:7C,bridge=vmbr10
net1: virtio=6E:CD:DE:B6:D4:ED,bridge=vmbr192
numa: 0
ostype: l26
scsi0: data:vm-1199-disk-0,iothread=1,size=32G,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=d50354f0-3455-45ce-b3ae-63fa83c81dcb
sockets: 2
vmgenid: 3f5469af-a614-45eb-93c5-160ebbe9bdad
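If it helps to reproduce the exact guest, this is approximately the qm create call that matches the config above (the disk is freshly allocated and the MAC addresses will of course differ):
Bash:
qm create 1199 --name pflab \
  --cores 2 --sockets 2 --memory 2048 --numa 0 --ostype l26 \
  --scsihw virtio-scsi-single \
  --scsi0 data:32,iothread=1,ssd=1 \
  --ide2 local:iso/pfSense-CE-2.5.2-RELEASE-amd64.iso,media=cdrom \
  --net0 virtio,bridge=vmbr10 \
  --net1 virtio,bridge=vmbr192 \
  --boot 'order=scsi0;ide2;net0'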
Thanks in advance for any guidance you can provide,
and please excuse my English.