Warning: do not remove ZFS cache device remotely (machine may hang)

niziak

Apr 18, 2020
Over the last few days I decided to improve the performance of my experimental Ceph cluster (4 x PVE = 4 x OSD = 4 x 2TB HDD) by placing the DB on a small NVMe partition.
To do this, I need to free up some space from the existing L2ARC partition on the NVMe.
Every PVE host has 2 x HDD for rpool, and rpool's ZIL and L2ARC are located on NVMe partitions.
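
A rough sketch of the sequence this implies, for anyone following along. It only prints the commands instead of running them, so it is safe to paste. The function name plan_l2arc_shrink is made up for illustration, the repartitioning step is left open because the tool and sizes were not part of this post, and it assumes the cache partition keeps the same device name after resizing.

```shell
#!/bin/sh
# Print (do NOT execute) the steps for shrinking an L2ARC partition.
# plan_l2arc_shrink is a hypothetical helper, not part of ZFS or PVE.
plan_l2arc_shrink() {
    pool="$1"        # e.g. rpool
    cache_dev="$2"   # e.g. /dev/nvme0n1p3

    # 1. Check the pool layout and confirm which device is the cache.
    echo "zpool status $pool"
    # 2. Detach the cache device (the step that hung on one host).
    echo "zpool remove $pool $cache_dev"
    # 3. Placeholder: shrink the partition with your preferred tool.
    echo "# repartition the NVMe here"
    # 4. Re-attach the (now smaller) partition as cache.
    echo "zpool add $pool cache $cache_dev"
}

plan_l2arc_shrink rpool /dev/nvme0n1p3
```

Given the hang described below, step 2 is the one to run from a local console or IPMI rather than over SSH.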

Everything went smoothly on 2 machines, but on the 3rd machine, after issuing the command zpool remove rpool /dev/nvme0n1p3, the machine "hangs".
On the 2 previous machines the same command completed without any hang or lag.
I can still ping the 3rd machine. nmap shows some of the normal PVE ports as open, but there is no response from any service on this machine.
I don't think it is a kernel panic, because I've added panic=30 to the kernel cmdline on every PVE host.
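
For context, panic=N makes the kernel reboot N seconds after a panic, so a host that stays pingable but unresponsive for longer than that looks more like a hang than a panic. A small sketch for confirming that the option is actually active on a host follows. The function name panic_timeout and the sample command line are illustrative assumptions, not copied from these machines.

```shell
#!/bin/sh
# Extract the panic= timeout from a kernel command line string.
# panic_timeout is a hypothetical helper written for this example.
panic_timeout() {
    # Rely on word splitting to walk the space-separated arguments.
    for arg in $1; do
        case "$arg" in
            panic=*) echo "${arg#panic=}" ;;
        esac
    done
}

# Sample command line (assumed for illustration).
panic_timeout "root=ZFS=rpool/ROOT/pve-1 boot=zfs panic=30"
```

On a live host the same check would be panic_timeout "$(cat /proc/cmdline)". Empty output means the option is not set for the running kernel.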

I will post an update when I have more information. There is a similar issue reported to the OpenZFS project: removing L2ARC device from pool sometimes causes a freeze