Removing a ZFS SLOG device during a backup to PBS produces kernel stack traces

May 18, 2024
So I forgot that rebooting the PVE host makes backups run longer. I was wondering why the SLOG was not doing anything and tried to remove it during a running backup.
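For reference, removing a SLOG boils down to a zpool remove; the pool and device names here are taken from the zed events further down in the log:

Code:
# remove the SLOG (log vdev) from the pool; device name matches the
# vdev_remove_dev zed event below
zpool remove hddpooli3 nvme-WD_Red_SN700_1000GB_23403L800524-part1
# check whether the removal has finished
zpool status hddpooli3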

It took a while, but eventually the removal went through and nothing crashed. It did leave the following in the log and on the console (I had removed "quiet" from the GRUB defaults so kernel messages show up on the console).
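In case anyone wants the same console output: that just means clearing "quiet" from the kernel command line in /etc/default/grub and running update-grub afterwards, roughly:

Code:
# /etc/default/grub -- "quiet" removed so kernel messages reach the console
GRUB_CMDLINE_LINUX_DEFAULT=""
# apply with: update-grub, then reboot

Anyway, this is what showed up: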

Code:
Feb 19 21:03:55 pve kernel: INFO: task l2arc_feed:2077 blocked for more than 122 seconds.
Feb 19 21:03:55 pve kernel:       Tainted: P           O       6.8.12-8-pve #1
Feb 19 21:03:55 pve kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 19 21:03:55 pve kernel: task:l2arc_feed      state:D stack:0     pid:2077  tgid:2077  ppid:2      flags:0x00004000
Feb 19 21:03:55 pve kernel: Call Trace:
Feb 19 21:03:55 pve kernel:  <TASK>
Feb 19 21:03:55 pve kernel:  __schedule+0x42b/0x1500
Feb 19 21:03:55 pve kernel:  ? __timer_delete_sync+0x8c/0x100
Feb 19 21:03:55 pve kernel:  schedule+0x33/0x110
Feb 19 21:03:55 pve kernel:  schedule_preempt_disabled+0x15/0x30
Feb 19 21:03:55 pve kernel:  __mutex_lock.constprop.0+0x3f8/0x7a0
Feb 19 21:03:55 pve kernel:  ? sigprocmask+0xb4/0xe0
Feb 19 21:03:55 pve kernel:  __mutex_lock_slowpath+0x13/0x20
Feb 19 21:03:55 pve kernel:  mutex_lock+0x3c/0x50
Feb 19 21:03:55 pve kernel:  l2arc_feed_thread+0x102/0x1ab0 [zfs]
Feb 19 21:03:55 pve kernel:  ? __pfx_l2arc_feed_thread+0x10/0x10 [zfs]
Feb 19 21:03:55 pve kernel:  ? __pfx_thread_generic_wrapper+0x10/0x10 [spl]
Feb 19 21:03:55 pve kernel:  thread_generic_wrapper+0x5c/0x70 [spl]
Feb 19 21:03:55 pve kernel:  kthread+0xef/0x120
Feb 19 21:03:55 pve kernel:  ? __pfx_kthread+0x10/0x10
Feb 19 21:03:55 pve kernel:  ret_from_fork+0x44/0x70
Feb 19 21:03:55 pve kernel:  ? __pfx_kthread+0x10/0x10
Feb 19 21:03:55 pve kernel:  ret_from_fork_asm+0x1b/0x30
Feb 19 21:03:55 pve kernel:  </TASK>
Feb 19 21:03:55 pve kernel: INFO: task zfs:303952 blocked for more than 122 seconds.
Feb 19 21:03:55 pve kernel:       Tainted: P           O       6.8.12-8-pve #1
Feb 19 21:03:55 pve kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 19 21:03:55 pve kernel: task:zfs             state:D stack:0     pid:303952 tgid:303952 ppid:4692   flags:0x00000006
Feb 19 21:03:55 pve kernel: Call Trace:
Feb 19 21:03:55 pve kernel:  <TASK>
Feb 19 21:03:55 pve kernel:  __schedule+0x42b/0x1500
Feb 19 21:03:55 pve kernel:  schedule+0x33/0x110
Feb 19 21:03:55 pve kernel:  schedule_preempt_disabled+0x15/0x30
Feb 19 21:03:55 pve kernel:  __mutex_lock.constprop.0+0x3f8/0x7a0
Feb 19 21:03:55 pve kernel:  __mutex_lock_slowpath+0x13/0x20
Feb 19 21:03:55 pve kernel:  mutex_lock+0x3c/0x50
Feb 19 21:03:55 pve kernel:  spa_open_common+0x61/0x450 [zfs]
Feb 19 21:03:55 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Feb 19 21:03:55 pve kernel:  ? __kmalloc_large_node+0xb6/0x130
Feb 19 21:03:55 pve kernel:  spa_open+0x13/0x30 [zfs]
Feb 19 21:03:55 pve kernel:  pool_status_check.constprop.0+0x6d/0x110 [zfs]
Feb 19 21:03:55 pve kernel:  zfsdev_ioctl_common+0x42e/0x9f0 [zfs]
Feb 19 21:03:55 pve kernel:  ? kvmalloc_node+0x5d/0x100
Feb 19 21:03:55 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Feb 19 21:03:55 pve kernel:  ? __check_object_size+0x9d/0x300
Feb 19 21:03:55 pve kernel:  zfsdev_ioctl+0x57/0xf0 [zfs]
Feb 19 21:03:55 pve kernel:  __x64_sys_ioctl+0xa0/0xf0
Feb 19 21:03:55 pve kernel:  x64_sys_call+0xa71/0x2480
Feb 19 21:03:55 pve kernel:  do_syscall_64+0x81/0x170
Feb 19 21:03:55 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Feb 19 21:03:55 pve kernel:  ? __mod_memcg_lruvec_state+0x87/0x140
Feb 19 21:03:55 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Feb 19 21:03:55 pve kernel:  ? __mod_lruvec_state+0x36/0x50
Feb 19 21:03:55 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Feb 19 21:03:55 pve kernel:  ? __lruvec_stat_mod_folio+0x70/0xc0
Feb 19 21:03:55 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Feb 19 21:03:55 pve kernel:  ? set_ptes.constprop.0+0x2b/0xb0
Feb 19 21:03:55 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Feb 19 21:03:55 pve kernel:  ? do_anonymous_page+0x3a8/0x740
Feb 19 21:03:55 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Feb 19 21:03:55 pve kernel:  ? __pte_offset_map+0x1c/0x1b0
Feb 19 21:03:55 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Feb 19 21:03:55 pve kernel:  ? __handle_mm_fault+0xbca/0xf70
Feb 19 21:03:55 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Feb 19 21:03:55 pve kernel:  ? syscall_exit_to_user_mode+0x86/0x260
Feb 19 21:03:55 pve kernel:  ? do_syscall_64+0x8d/0x170
Feb 19 21:03:55 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Feb 19 21:03:55 pve kernel:  ? __count_memcg_events+0x6f/0xe0
Feb 19 21:03:55 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Feb 19 21:03:55 pve kernel:  ? count_memcg_events.constprop.0+0x2a/0x50
Feb 19 21:03:55 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Feb 19 21:03:55 pve kernel:  ? handle_mm_fault+0xad/0x380
Feb 19 21:03:55 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Feb 19 21:03:55 pve kernel:  ? do_user_addr_fault+0x33e/0x660
Feb 19 21:03:55 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Feb 19 21:03:55 pve kernel:  ? irqentry_exit_to_user_mode+0x7b/0x260
Feb 19 21:03:55 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Feb 19 21:03:55 pve kernel:  ? irqentry_exit+0x43/0x50
Feb 19 21:03:55 pve kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Feb 19 21:03:55 pve kernel:  ? exc_page_fault+0x94/0x1b0
Feb 19 21:03:55 pve kernel:  entry_SYSCALL_64_after_hwframe+0x78/0x80
Feb 19 21:03:55 pve kernel: RIP: 0033:0x7d6b4f9aacdb
Feb 19 21:03:55 pve kernel: RSP: 002b:00007ffd1bfdfee0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Feb 19 21:03:55 pve kernel: RAX: ffffffffffffffda RBX: 00007ffd1bfdff60 RCX: 00007d6b4f9aacdb
Feb 19 21:03:55 pve kernel: RDX: 00007ffd1bfdff60 RSI: 0000000000005a12 RDI: 0000000000000003
Feb 19 21:03:55 pve kernel: RBP: 00007ffd1bfdff50 R08: 00000000ffffffff R09: 0000000000000000
Feb 19 21:03:55 pve kernel: R10: 0000000000000022 R11: 0000000000000246 R12: 000061fedf40b2c0
Feb 19 21:03:55 pve kernel: R13: 000061fedf40b2c0 R14: 000061fedf416650 R15: 00007d6b4f0d4750
Feb 19 21:03:55 pve kernel:  </TASK>
Feb 19 21:04:49 pve ovs-vsctl[304846]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl del-port fwln1000i0
Feb 19 21:04:49 pve ovs-vsctl[304846]: ovs|00002|db_ctl_base|ERR|no port named fwln1000i0
Feb 19 21:04:49 pve ovs-vsctl[304847]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl del-port tap1000i0
Feb 19 21:04:49 pve qmeventd[3734]: read: Connection reset by peer
Feb 19 21:04:49 pve systemd[1]: 1000.scope: Deactivated successfully.
Feb 19 21:04:49 pve systemd[1]: 1000.scope: Consumed 1min 31.504s CPU time.
Feb 19 21:04:49 pve (udev-worker)[304830]: zd176: Failed to process device, ignoring: Unknown error 512
Feb 19 21:04:49 pve qmeventd[304853]: Starting cleanup for 1000
Feb 19 21:04:49 pve qmeventd[304853]: Finished cleanup for 1000
Feb 19 21:04:50 pve (udev-worker)[304830]: zd176p1: Failed to process device, ignoring: Unknown error 512
Feb 19 21:04:51 pve (udev-worker)[304851]: zd176p2: Failed to process device, ignoring: Unknown error 512
Feb 19 21:04:52 pve (udev-worker)[304829]: zd176p3: Failed to process device, ignoring: Unknown error 512
Feb 19 21:04:53 pve (udev-worker)[304920]: zd176p4: Failed to process device, ignoring: Unknown error 512
Feb 19 21:04:54 pve zed[304954]: eid=37 class=config_sync pool='hddpooli3'
Feb 19 21:04:54 pve zed[304980]: eid=38 class=vdev_remove_dev pool='hddpooli3' vdev=nvme-WD_Red_SN700_1000GB_23403L800524-part1 vdev_state=ONLINE
Feb 19 21:04:55 pve pvestatd[4692]: command 'zfs get -o value -Hp available,used hddpooli3' failed: got timeout
Feb 19 21:04:55 pve zed[304982]: eid=39 class=config_sync pool='hddpooli3'
Feb 19 21:04:55 pve pvestatd[4692]: status update time (237.501 seconds)
Feb 19 21:04:55 pve pvescheduler[220801]: INFO: Finished Backup of VM 1000 (00:37:47)
Feb 19 21:04:55 pve pvescheduler[220801]: INFO: Starting Backup of VM 2001 (qemu)
Feb 19 21:05:01 pve CRON[305175]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Feb 19 21:05:01 pve CRON[305176]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Feb 19 21:05:01 pve CRON[305175]: pam_unix(cron:session): session closed for user root
Feb 19 21:06:05 pve zed[305769]: eid=45 class=config_sync pool='hddpooli3'
Feb 19 21:06:06 pve zed[305781]: eid=47 class=vdev_add pool='hddpooli3'
Feb 19 21:06:06 pve zed[305780]: eid=46 class=config_sync pool='hddpooli3'

Backup fleecing is in use. I don't know if it still works as intended after this, but nothing seems to have broken.
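For context, fleecing can be enabled per backup job or node-wide in /etc/vzdump.conf; the syntax looks like this (the storage name is just an example):

Code:
# /etc/vzdump.conf -- node-wide default; "local-zfs" is an example storage
fleecing: enabled=1,storage=local-zfs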

So I am happy.

I upgraded to kernel 6.8.12-8-pve, and maybe that changed the SLOG and ZIL activity: I used to see gigabytes per day written synchronously, now it is only a few megabytes.
The old kernel was 6.8.12-6-pve.
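For anyone who wants to watch their own SLOG traffic, the per-vdev statistics list the log device under "logs":

Code:
# per-vdev I/O statistics, refreshed every 5 seconds;
# the SLOG shows up under the "logs" section
zpool iostat -v hddpooli3 5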

Code:
proxmox-ve: 8.3.0 (running kernel: 6.8.12-8-pve)
pve-manager: 8.3.3 (running version: 8.3.3/f157a38b211595d6)
proxmox-kernel-helper: 8.1.0
pve-kernel-5.15: 7.4-4
proxmox-kernel-6.8: 6.8.12-8
proxmox-kernel-6.8.12-8-pve-signed: 6.8.12-8
proxmox-kernel-6.8.12-6-pve-signed: 6.8.12-6
proxmox-kernel-6.8.12-4-pve-signed: 6.8.12-4
proxmox-kernel-6.8.8-4-pve-signed: 6.8.8-4
proxmox-kernel-6.8.8-2-pve-signed: 6.8.8-2
proxmox-kernel-6.5.13-6-pve-signed: 6.5.13-6
proxmox-kernel-6.5: 6.5.13-6
pve-kernel-5.15.108-1-pve: 5.15.108-1
pve-kernel-5.15.74-1-pve: 5.15.74-1
amd64-microcode: 3.20240820.1~deb12u1
ceph-fuse: 16.2.15+ds-0+deb12u1
corosync: 3.1.7-pve3
criu: 3.17.1-2+deb12u1
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx11
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-5
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.1
libproxmox-backup-qemu0: 1.5.1
libproxmox-rs-perl: 0.3.4
libpve-access-control: 8.2.0
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.0.10
libpve-cluster-perl: 8.0.10
libpve-common-perl: 8.2.9
libpve-guest-common-perl: 5.1.6
libpve-http-server-perl: 5.2.0
libpve-network-perl: 0.10.0
libpve-rs-perl: 0.9.1
libpve-storage-perl: 8.3.3
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.5.0-1
openvswitch-switch: 3.1.0-2+deb12u1
proxmox-backup-client: 3.3.2-1
proxmox-backup-file-restore: 3.3.2-2
proxmox-firewall: 0.6.0
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.3.1
proxmox-mini-journalreader: 1.4.0
proxmox-widget-toolkit: 4.3.4
pve-cluster: 8.0.10
pve-container: 5.2.3
pve-docs: 8.3.1
pve-edk2-firmware: 4.2023.08-4
pve-esxi-import-tools: 0.7.2
pve-firewall: 5.1.0
pve-firmware: 3.14-3
pve-ha-manager: 4.0.6
pve-i18n: 3.3.3
pve-qemu-kvm: 9.0.2-5
pve-xtermjs: 5.3.0-3
qemu-server: 8.3.7
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.7-pve1