After 6 days of working fine with Linux proxmox 5.11.22-3-pve #1 SMP PVE 5.11.22-6 (Wed, 28 Jul 2021 10:51:12 +0200) x86_64 GNU/Linux,
the same kernel crash has appeared again (always in the "pvesr" task) on the main node:
Main node:
Supermicro X10SDV-TLN4F motherboard
64GB RAM
16 x Intel(R) Xeon(R) CPU D-1541 @ 2.10GHz (1 Socket)
rpool 2 x Intel SSD D3-S4510 960GB (sata)
pool1 2 x HGST Ultrastar DC H520 12TB (sata)
zfs logs 1 x Intel Optane SSD 800p 60GB
Load average is low and IO delay is 0.00% most of the time
Secondary node:
HP Microserver G8
16GB RAM
8 x Intel(R) Xeon(R) CPU E3-1265L V2 @ 2.50GHz (1 Socket)
rpool 2 x Intel SSD D3-S4510 960GB (sata)
pool1 2 x Western Digital Gold 2TB (sata)
Both nodes are running:
Linux 5.11.22-3-pve #1 SMP PVE 5.11.22-6 (Wed, 28 Jul 2021 10:51:12 +0200)
Containers and virtual machines are on rpool and are replicated every 30 minutes with "Proxmox replication" from the Supermicro node to the HP Microserver.
Some ZFS volumes on pool1 are replicated every hour using "syncoid" from the Supermicro node to the HP Microserver.
The Supermicro node also pulls some containers from an external server to itself using "syncoid".
zpool status on Supermicro server:
Code:
root@proxmox:~# zpool status
  pool: pool1
 state: ONLINE
  scan: scrub repaired 0B in 04:03:43 with 0 errors on Sun Aug 8 04:27:44 2021
config:

        NAME        STATE     READ WRITE CKSUM
        pool1       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            sdc     ONLINE       0     0     0
            sdd     ONLINE       0     0     0
        logs
          nvme0n1   ONLINE       0     0     0

errors: No known data errors

  pool: rpool
 state: ONLINE
  scan: scrub repaired 0B in 01:07:15 with 0 errors on Sun Aug 8 01:31:19 2021
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            sda3    ONLINE       0     0     0
            sdb3    ONLINE       0     0     0

errors: No known data errors
zpool status on HP Microserver:
Code:
root@proxmox2:~# zpool status
  pool: pool1
 state: ONLINE
  scan: scrub repaired 0B in 03:23:48 with 0 errors on Sun Aug 8 03:47:49 2021
config:

        NAME        STATE     READ WRITE CKSUM
        pool1       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            sdc     ONLINE       0     0     0
            sdd     ONLINE       0     0     0

errors: No known data errors

  pool: rpool
 state: ONLINE
  scan: scrub repaired 0B in 00:53:39 with 0 errors on Sun Aug 8 01:17:41 2021
config:

        NAME                                                  STATE     READ WRITE CKSUM
        rpool                                                 ONLINE       0     0     0
          mirror-0                                            ONLINE       0     0     0
            ata-INTEL_SSDSC2KB960G8_PHYF912000JF960CGN-part3  ONLINE       0     0     0
            ata-INTEL_SSDSC2KB960G8_PHYF912000KL960CGN-part3  ONLINE       0     0     0

errors: No known data errors
ZFS packages installed:
libzfs4linux/stable,now 2.0.5-pve1 amd64 [installed,automatic]
zfs-initramfs/stable,now 2.0.5-pve1 all [installed]
zfs-zed/stable,now 2.0.5-pve1 amd64 [installed]
zfsutils-linux/stable,now 2.0.5-pve1 amd64 [installed]
It only crashes on the Supermicro node (main node):
Code:
[556186.483567] INFO: task pvesr:3931554 blocked for more than 120 seconds.
[556186.483601] Tainted: P O 5.11.22-3-pve #1
[556186.483620] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[556186.483645] task:pvesr state:D stack: 0 pid:3931554 ppid:3931526 flags:0x00000000
[556186.483650] Call Trace:
[556186.483653] __schedule+0x2ca/0x880
[556186.483660] schedule+0x4f/0xc0
[556186.483663] rwsem_down_write_slowpath+0x212/0x590
[556186.483669] down_write+0x43/0x50
[556186.483672] filename_create+0x7e/0x160
[556186.483677] do_mkdirat+0x58/0x140
[556186.483681] __x64_sys_mkdir+0x1b/0x20
[556186.483684] do_syscall_64+0x38/0x90
[556186.483688] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[556186.483692] RIP: 0033:0x7fcb8cbc7b07
[556186.483695] RSP: 002b:00007fffb911de98 EFLAGS: 00000246 ORIG_RAX: 0000000000000053
[556186.483697] RAX: ffffffffffffffda RBX: 000055e3ac9a62a0 RCX: 00007fcb8cbc7b07
[556186.483699] RDX: 000055e3ac455a65 RSI: 00000000000001ff RDI: 000055e3b0778880
[556186.483700] RBP: 0000000000000000 R08: 000055e3b09fe018 R09: 0000000000000000
[556186.483702] R10: 0000000000000008 R11: 0000000000000246 R12: 000055e3b0778880
[556186.483703] R13: 000055e3adc9f028 R14: 000055e3b0c37ef8 R15: 00000000000001ff
[556307.314854] INFO: task pvesr:3931554 blocked for more than 241 seconds.
[556307.314940] Tainted: P O 5.11.22-3-pve #1
[556307.315014] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[556307.315037] task:pvesr state:D stack: 0 pid:3931554 ppid:3931526 flags:0x00000000
[556307.315041] Call Trace:
[556307.315045] __schedule+0x2ca/0x880
[556307.315052] schedule+0x4f/0xc0
[556307.315055] rwsem_down_write_slowpath+0x212/0x590
[556307.315061] down_write+0x43/0x50
[556307.315064] filename_create+0x7e/0x160
[556307.315070] do_mkdirat+0x58/0x140
[556307.315073] __x64_sys_mkdir+0x1b/0x20
[556307.315076] do_syscall_64+0x38/0x90
[556307.315080] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[556307.315084] RIP: 0033:0x7fcb8cbc7b07
[556307.315087] RSP: 002b:00007fffb911de98 EFLAGS: 00000246 ORIG_RAX: 0000000000000053
[556307.315089] RAX: ffffffffffffffda RBX: 000055e3ac9a62a0 RCX: 00007fcb8cbc7b07
[556307.315091] RDX: 000055e3ac455a65 RSI: 00000000000001ff RDI: 000055e3b0778880
[556307.315092] RBP: 0000000000000000 R08: 000055e3b09fe018 R09: 0000000000000000
[556307.315094] R10: 0000000000000008 R11: 0000000000000246 R12: 000055e3b0778880
[556307.315095] R13: 000055e3adc9f028 R14: 000055e3b0c37ef8 R15: 00000000000001ff
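To make it easier to compare these reports across occurrences, here is a rough Python sketch that summarizes hung-task messages from dmesg text. It is only a helper I would use for triage, not anything from Proxmox itself; the function name parse_hung_tasks and the tiny syscall table are my own choices. The one decoding it relies on is that ORIG_RAX holds the syscall number in hex, and 0x53 (83) is mkdir on x86_64, which matches the do_mkdirat frame in the trace. The sample text is a shortened excerpt of the log above.

```python
import re

# Patterns for the three kinds of lines we care about in a hung-task report.
HUNG_RE = re.compile(
    r"INFO: task (?P<name>\S+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)
FRAME_RE = re.compile(r"\]\s+(?:\? )?(?P<func>[A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+")
ORIG_RAX_RE = re.compile(r"ORIG_RAX: (?P<nr>[0-9a-f]+)")

# Deliberately partial table: only the x86_64 syscall seen in this report.
SYSCALL_NAMES = {83: "mkdir"}


def parse_hung_tasks(log_text):
    """Return one dict per 'blocked for more than N seconds' report."""
    reports = []
    current = None
    for line in log_text.splitlines():
        m = HUNG_RE.search(line)
        if m:
            current = {
                "task": m["name"],
                "pid": int(m["pid"]),
                "blocked_secs": int(m["secs"]),
                "frames": [],
                "syscall_nr": None,
            }
            reports.append(current)
            continue
        if current is None:
            continue  # ignore anything before the first report
        m = FRAME_RE.search(line)
        if m:
            current["frames"].append(m["func"])
            continue
        m = ORIG_RAX_RE.search(line)
        if m:
            current["syscall_nr"] = int(m["nr"], 16)
    return reports


# Shortened excerpt of the first report from this post.
SAMPLE = """\
[556186.483567] INFO: task pvesr:3931554 blocked for more than 120 seconds.
[556186.483650] Call Trace:
[556186.483653] __schedule+0x2ca/0x880
[556186.483669] down_write+0x43/0x50
[556186.483672] filename_create+0x7e/0x160
[556186.483677] do_mkdirat+0x58/0x140
[556186.483695] RSP: 002b:00007fffb911de98 EFLAGS: 00000246 ORIG_RAX: 0000000000000053
"""

if __name__ == "__main__":
    for r in parse_hung_tasks(SAMPLE):
        name = SYSCALL_NAMES.get(r["syscall_nr"], "unknown")
        print(r["task"], r["pid"], r["blocked_secs"], name, " <- ".join(r["frames"]))
```

Run against the full dmesg output (for example by passing the captured text to parse_hung_tasks), this should show whether it is always the same pvesr PID stuck in the same mkdir call. As far as I can tell from the frames, the task is waiting in down_write inside filename_create, which would mean it is blocked on the lock of the parent directory, but I have not confirmed which directory that is.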