Zfs unmount process stuck while rollback

Petr.114

Active Member
Jun 25, 2019
35
2
28
32
Hello,
previous week i faced the issue of process zfs unmount stuck while proceeding rollback of LXC. After that, also incoming replication got stuck (zfs receive).

I though allright, its been long time since restart and there was new kernel and zfs installed but system was not rebooted. So i have restarted the system and the running pve-kernel version changed from 5.15.83-1 to 5.15.107-2 and ZFS changed from 2.1.7 to 2.1.11.

Unfortunatelly, it took just few days and i got into this issue again. I was hoping after rebooting the server, the new version of kernel and zfs, will solve the problem and it will not appear again. Unfortunatelly i was wrong.

The reroll issue happened for the same LXC - 194. Since the reboot i have used rollback only once for LXC 194 and it went allright, but today, the second time it got stuck again.

Since the process is in D state, it cannot be killed.

I will appreciate any hints for possible solution or process to reveal more details about this issue.


zfs unmount.png
zfs recv.png

proxmox-ve: 7.4-1 (running kernel: 5.15.107-2-pve) pve-manager: 7.4-3 (running version: 7.4-3/9002ab8a) pve-kernel-5.15: 7.4-3 pve-kernel-5.15.107-2-pve: 5.15.107-2 pve-kernel-5.15.107-1-pve: 5.15.107-1 pve-kernel-5.15.102-1-pve: 5.15.102-1 pve-kernel-5.15.85-1-pve: 5.15.85-1 pve-kernel-5.15.83-1-pve: 5.15.83-1 pve-kernel-5.15.74-1-pve: 5.15.74-1 ceph-fuse: 15.2.17-pve1 corosync: 3.1.7-pve1 criu: 3.15-1+pve-1 glusterfs-client: 9.2-1 ifupdown2: 3.1.0-1+pmx4 ksm-control-daemon: 1.4-1 libjs-extjs: 7.0.0-1 libknet1: 1.24-pve2 libproxmox-acme-perl: 1.4.4 libproxmox-backup-qemu0: 1.3.1-1 libproxmox-rs-perl: 0.2.1 libpve-access-control: 7.4-3 libpve-apiclient-perl: 3.2-1 libpve-common-perl: 7.4-1 libpve-guest-common-perl: 4.2-4 libpve-http-server-perl: 4.2-3 libpve-rs-perl: 0.7.6 libpve-storage-perl: 7.4-2 libspice-server1: 0.14.3-2.1 lvm2: 2.03.11-2.1 lxc-pve: 5.0.2-2 lxcfs: 5.0.3-pve1 novnc-pve: 1.4.0-1 proxmox-backup-client: 2.4.2-1 proxmox-backup-file-restore: 2.4.2-1 proxmox-kernel-helper: 7.4-1 proxmox-mail-forward: 0.1.1-1 proxmox-mini-journalreader: 1.3-1 proxmox-widget-toolkit: 3.7.0 pve-cluster: 7.3-3 pve-container: 4.4-3 pve-docs: 7.4-2 pve-edk2-firmware: 3.20230228-2 pve-firewall: 4.3-2 pve-firmware: 3.6-5 pve-ha-manager: 3.6.1 pve-i18n: 2.12-1 pve-qemu-kvm: 7.2.0-8 pve-xtermjs: 4.16.0-1 qemu-server: 7.4-3 smartmontools: 7.2-pve3 spiceterm: 3.2-2 swtpm: 0.8.0~bpo11+3 vncterm: 1.7-1 zfsutils-linux: 2.1.11-pve1

11:12:09 up 4 days, 13:23, 2 users, load average: 12.89, 11.37, 11.12

total used free shared buff/cache available Mem: 188Gi 122Gi 60Gi 2.6Gi 5.7Gi 62Gi Swap: 54Gi 1.6Gi 53Gi

root 3459569 0.0 0.0 0 0 ? D 09:51 0:00 [z_zrele] root 3459708 0.0 0.0 8876 4572 ? D 09:51 0:00 zfs unmount rpool/data/subvol-194-disk-0 root 3821079 0.0 0.0 8880 4696 ? D 10:14 0:00 zfs recv -F -- rpool/data/subvol-100-disk-0

[Jun 6 09:55] INFO: task z_zrele:3459569 blocked for more than 120 seconds. [ +0.001258] Tainted: P O 5.15.107-2-pve #1 [ +0.001135] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ +0.001185] task:z_zrele state:D stack: 0 pid:3459569 ppid: 2 flags:0x00004000 [ +0.001122] Call Trace: [ +0.001098] <TASK> [ +0.001058] __schedule+0x34e/0x1740 [ +0.001088] ? __mod_memcg_lruvec_state+0x63/0xe0 [ +0.001044] ? __count_memcg_events+0x40/0xa0 [ +0.000979] schedule+0x69/0x110 [ +0.000925] io_schedule+0x46/0x80 [ +0.000970] wait_on_page_bit_common+0x114/0x3e0 [ +0.000924] ? filemap_invalidate_unlock_two+0x50/0x50 [ +0.000861] __lock_page+0x4c/0x60 [ +0.000945] truncate_inode_pages_range+0x352/0x6a0 [ +0.000925] truncate_pagecache+0x49/0x70 [ +0.000923] truncate_setsize+0x27/0x60 [ +0.000908] zpl_evict_inode+0x33/0x60 [zfs] [ +0.001147] evict+0xd2/0x190 [ +0.000947] iput.part.0+0x174/0x1d0 [ +0.000926] iput+0x1c/0x30 [ +0.000911] zfs_rele_async_task+0xe/0x20 [zfs] [ +0.001015] taskq_thread+0x29f/0x4d0 [spl] [ +0.000896] ? wake_up_q+0x90/0x90 [ +0.000932] ? taskq_thread_spawn+0x60/0x60 [spl] [ +0.000939] kthread+0x12a/0x150 [ +0.000819] ? set_kthread_struct+0x50/0x50 [ +0.000794] ret_from_fork+0x22/0x30 [ +0.000798] </TASK> [ +0.000762] INFO: task zfs:3459708 blocked for more than 120 seconds. [ +0.000743] Tainted: P O 5.15.107-2-pve #1 [ +0.000734] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ +0.000706] task:zfs state:D stack: 0 pid:3459708 ppid:3456877 flags:0x00004000 [ +0.000702] Call Trace: [ +0.000710] <TASK> [ +0.000676] __schedule+0x34e/0x1740 [ +0.000676] ? wake_up_process+0x15/0x20 [ +0.000644] ? __queue_work+0x20c/0x490 [ +0.000650] schedule+0x69/0x110 [ +0.000676] taskq_wait_outstanding+0x90/0xd0 [spl] [ +0.000647] ? wait_woken+0x70/0x70 [ +0.000598] zfs_preumount+0x4b/0x70 [zfs] [ +0.000683] zpl_kill_sb+0x13/0x30 [zfs] [ +0.000658] deactivate_locked_super+0x3b/0xb0 [ +0.000572] deactivate_super+0x44/0x50 [ +0.000631] cleanup_mnt+0x150/0x1d0 [ +0.000606] __cleanup_mnt+0x12/0x20 [ +0.000558] task_work_run+0x6d/0xb0 [ +0.000537] exit_to_user_mode_prepare+0x1a8/0x1b0 [ +0.000556] syscall_exit_to_user_mode+0x27/0x50 [ +0.000531] do_syscall_64+0x69/0xc0 [ +0.000531] entry_SYSCALL_64_after_hwframe+0x61/0xcb [ +0.000506] RIP: 0033:0x7f67e5288a67 [ +0.000532] RSP: 002b:00007ffc71e39b18 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6 [ +0.000555] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00007f67e5288a67 [ +0.000554] RDX: 0009200008010000 RSI: 0000000000000000 RDI: 000055da7cd74510 [ +0.000542] RBP: 00007ffc71e39b80 R08: 0000000000000020 R09: 0000000000000000 [ +0.000536] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 [ +0.000529] R13: 000055da7cd74510 R14: 000055da7cd65320 R15: 000055da7cd678a0 [ +0.000532] </TASK> [Jun 6 09:59] INFO: task z_zrele:3459569 blocked for more than 362 seconds. [ +0.000743] Tainted: P O 5.15.107-2-pve #1 [ +0.000536] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ +0.000550] task:z_zrele state:D stack: 0 pid:3459569 ppid: 2 flags:0x00004000 [ +0.000604] Call Trace: [ +0.000575] <TASK> [ +0.000659] __schedule+0x34e/0x1740 [ +0.000627] ? __mod_memcg_lruvec_state+0x63/0xe0 [ +0.000614] ? __count_memcg_events+0x40/0xa0 [ +0.000543] schedule+0x69/0x110 [ +0.000562] io_schedule+0x46/0x80 [ +0.000607] wait_on_page_bit_common+0x114/0x3e0 [ +0.000608] ? filemap_invalidate_unlock_two+0x50/0x50 [ +0.000649] __lock_page+0x4c/0x60 [ +0.000596] truncate_inode_pages_range+0x352/0x6a0 [ +0.000571] truncate_pagecache+0x49/0x70 [ +0.000598] truncate_setsize+0x27/0x60 [ +0.000576] zpl_evict_inode+0x33/0x60 [zfs] [ +0.000686] evict+0xd2/0x190 [ +0.000535] iput.part.0+0x174/0x1d0 [ +0.000601] iput+0x1c/0x30 [ +0.000598] zfs_rele_async_task+0xe/0x20 [zfs] [ +0.000649] taskq_thread+0x29f/0x4d0 [spl] [ +0.000563] ? wake_up_q+0x90/0x90 [ +0.000556] ? taskq_thread_spawn+0x60/0x60 [spl] [ +0.000564] kthread+0x12a/0x150 [ +0.000569] ? set_kthread_struct+0x50/0x50 [ +0.000572] ret_from_fork+0x22/0x30 [ +0.000563] </TASK> [ +0.000516] INFO: task zfs:3459708 blocked for more than 362 seconds. [ +0.000575] Tainted: P O 5.15.107-2-pve #1 [ +0.000570] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ +0.000572] task:zfs state:D stack: 0 pid:3459708 ppid:3456877 flags:0x00004000 [ +0.000617] Call Trace: [ +0.000603] <TASK> [ +0.000582] __schedule+0x34e/0x1740 [ +0.000586] ? wake_up_process+0x15/0x20 [ +0.000609] ? __queue_work+0x20c/0x490 [ +0.000581] schedule+0x69/0x110 [ +0.000582] taskq_wait_outstanding+0x90/0xd0 [spl] [ +0.000626] ? wait_woken+0x70/0x70 [ +0.000618] zfs_preumount+0x4b/0x70 [zfs] [ +0.000697] zpl_kill_sb+0x13/0x30 [zfs] [ +0.000646] deactivate_locked_super+0x3b/0xb0 [ +0.000582] deactivate_super+0x44/0x50 [ +0.000586] cleanup_mnt+0x150/0x1d0 [ +0.000564] __cleanup_mnt+0x12/0x20 [ +0.000607] task_work_run+0x6d/0xb0 [ +0.000548] exit_to_user_mode_prepare+0x1a8/0x1b0 [ +0.000546] syscall_exit_to_user_mode+0x27/0x50 [ +0.000535] do_syscall_64+0x69/0xc0 [ +0.000506] entry_SYSCALL_64_after_hwframe+0x61/0xcb [ +0.000523] RIP: 0033:0x7f67e5288a67 [ +0.000506] RSP: 002b:00007ffc71e39b18 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6 [ +0.000561] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00007f67e5288a67 [ +0.000569] RDX: 0009200008010000 RSI: 0000000000000000 RDI: 000055da7cd74510 [ +0.000531] RBP: 00007ffc71e39b80 R08: 0000000000000020 R09: 0000000000000000 [ +0.000561] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 [ +0.000566] R13: 000055da7cd74510 R14: 000055da7cd65320 R15: 000055da7cd678a0 [ +0.000555] </TASK> [Jun 6 10:03] INFO: task z_zrele:3459569 blocked for more than 604 seconds. [ +0.001032] Tainted: P O 5.15.107-2-pve #1 [ +0.000793] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ +0.000740] task:z_zrele state:D stack: 0 pid:3459569 ppid: 2 flags:0x00004000 [ +0.000818] Call Trace: [ +0.000805] <TASK> [ +0.000789] __schedule+0x34e/0x1740 [ +0.000787] ? __mod_memcg_lruvec_state+0x63/0xe0 [ +0.000758] ? __count_memcg_events+0x40/0xa0 [ +0.000786] schedule+0x69/0x110 [ +0.000851] io_schedule+0x46/0x80 [ +0.000755] wait_on_page_bit_common+0x114/0x3e0 [ +0.000773] ? filemap_invalidate_unlock_two+0x50/0x50 [ +0.000765] __lock_page+0x4c/0x60 [ +0.000768] truncate_inode_pages_range+0x352/0x6a0 [ +0.000805] truncate_pagecache+0x49/0x70 [ +0.000782] truncate_setsize+0x27/0x60 [ +0.000764] zpl_evict_inode+0x33/0x60 [zfs] [ +0.000955] evict+0xd2/0x190 [ +0.000746] iput.part.0+0x174/0x1d0 [ +0.000761] iput+0x1c/0x30 [ +0.000690] zfs_rele_async_task+0xe/0x20 [zfs] [ +0.000837] taskq_thread+0x29f/0x4d0 [spl] [ +0.000721] ? wake_up_q+0x90/0x90 [ +0.000735] ? taskq_thread_spawn+0x60/0x60 [spl] [ +0.000776] kthread+0x12a/0x150 [ +0.000728] ? set_kthread_struct+0x50/0x50 [ +0.000720] ret_from_fork+0x22/0x30 [ +0.000714] </TASK> [ +0.000692] INFO: task zfs:3459708 blocked for more than 604 seconds. [ +0.000733] Tainted: P O 5.15.107-2-pve #1 [ +0.000791] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ +0.000771] task:zfs state:D stack: 0 pid:3459708 ppid:3456877 flags:0x00004000 [ +0.000773] Call Trace: [ +0.000764] <TASK> [ +0.000773] __schedule+0x34e/0x1740 [ +0.000852] ? wake_up_process+0x15/0x20 [ +0.000792] ? __queue_work+0x20c/0x490 [ +0.000757] schedule+0x69/0x110 [ +0.000777] taskq_wait_outstanding+0x90/0xd0 [spl] [ +0.000791] ? wait_woken+0x70/0x70 [ +0.000823] zfs_preumount+0x4b/0x70 [zfs] [ +0.000902] zpl_kill_sb+0x13/0x30 [zfs] [ +0.000836] deactivate_locked_super+0x3b/0xb0 [ +0.000772] deactivate_super+0x44/0x50 [ +0.000735] cleanup_mnt+0x150/0x1d0 [ +0.000809] __cleanup_mnt+0x12/0x20 [ +0.000720] task_work_run+0x6d/0xb0 [ +0.000658] exit_to_user_mode_prepare+0x1a8/0x1b0 [ +0.000744] syscall_exit_to_user_mode+0x27/0x50 [ +0.000638] do_syscall_64+0x69/0xc0 [ +0.000712] entry_SYSCALL_64_after_hwframe+0x61/0xcb [ +0.000664] RIP: 0033:0x7f67e5288a67 [ +0.000701] RSP: 002b:00007ffc71e39b18 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6 [ +0.000690] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00007f67e5288a67 [ +0.000708] RDX: 0009200008010000 RSI: 0000000000000000 RDI: 000055da7cd74510 [ +0.000693] RBP: 00007ffc71e39b80 R08: 0000000000000020 R09: 0000000000000000 [ +0.000730] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 [ +0.000712] R13: 000055da7cd74510 R14: 000055da7cd65320 R15: 000055da7cd678a0

arch: amd64 cores: 4 hostname: thingsboard-testing lock: rollback memory: 10240 net0: name=eth0,bridge=vmbr0,firewall=1,hwaddr=3E:05:71:41:7F:20,ip=dhcp,type=veth onboot: 1 ostype: ubuntu parent: pred_pouzitim_java_convertoru rootfs: local-zfs:subvol-194-disk-0,size=180G swap: 2048 [before_dump] arch: amd64 cores: 2 hostname: thingsboard-testing memory: 5120 net0: name=eth0,bridge=vmbr0,firewall=1,hwaddr=3E:05:71:41:7F:20,ip=dhcp,type=veth onboot: 1 ostype: ubuntu parent: pred_nahratim_cql rootfs: local-zfs:subvol-194-disk-0,size=120G snaptime: 1685104752 swap: 2048 [nainstalovana_cassandra] arch: amd64 cores: 2 hostname: thingsboard-testing memory: 5120 net0: name=eth0,bridge=vmbr0,firewall=1,hwaddr=3E:05:71:41:7F:20,ip=dhcp,type=veth onboot: 1 ostype: ubuntu parent: po_buildu_jar rootfs: local-zfs:subvol-194-disk-0,size=120G snaptime: 1685021857 swap: 2048 [po_buildu_jar] arch: amd64 cores: 2 hostname: thingsboard-testing memory: 5120 net0: name=eth0,bridge=vmbr0,firewall=1,hwaddr=3E:05:71:41:7F:20,ip=dhcp,type=veth onboot: 1 ostype: ubuntu parent: snap_working_community rootfs: local-zfs:subvol-194-disk-0,size=120G snaptime: 1685021009 swap: 2048 [po_dumpu] arch: amd64 cores: 2 hostname: thingsboard-testing memory: 5120 net0: name=eth0,bridge=vmbr0,firewall=1,hwaddr=3E:05:71:41:7F:20,ip=dhcp,type=veth onboot: 1 ostype: ubuntu parent: before_dump rootfs: local-zfs:subvol-194-disk-0,size=180G snaptime: 1685354172 swap: 2048 [pred_nahratim_cql] arch: amd64 cores: 2 hostname: thingsboard-testing memory: 5120 net0: name=eth0,bridge=vmbr0,firewall=1,hwaddr=3E:05:71:41:7F:20,ip=dhcp,type=veth onboot: 1 ostype: ubuntu parent: nainstalovana_cassandra rootfs: local-zfs:subvol-194-disk-0,size=120G snaptime: 1685104372 swap: 2048 [pred_pouzitim_java_convertoru] arch: amd64 cores: 2 hostname: thingsboard-testing memory: 5120 net0: name=eth0,bridge=vmbr0,firewall=1,hwaddr=3E:05:71:41:7F:20,ip=dhcp,type=veth onboot: 1 ostype: ubuntu parent: po_dumpu rootfs: local-zfs:subvol-194-disk-0,size=180G snaptime: 1685355177 swap: 2048 [snap_1] arch: amd64 cores: 2 hostname: thingsboard-testing memory: 5120 net0: name=eth0,bridge=vmbr0,firewall=1,hwaddr=3E:05:71:41:7F:20,ip=dhcp,type=veth onboot: 1 ostype: ubuntu rootfs: local-zfs:subvol-194-disk-0,size=120G snaptime: 1685013073 swap: 2048 [snap_working_community] arch: amd64 cores: 2 hostname: thingsboard-testing memory: 5120 net0: name=eth0,bridge=vmbr0,firewall=1,hwaddr=3E:05:71:41:7F:20,ip=dhcp,type=veth onboot: 1 ostype: ubuntu parent: snap_1 rootfs: local-zfs:subvol-194-disk-0,size=120G snaptime: 1685014924 swap: 2048

total 4.5K drwxr----- 2 root root 2 May 25 11:21 . drwxr-xr-x 50 root root 50 May 25 11:21 ..
 
Hi,
is there anything interesting in the system logs before the stuck task messages? What does zpool status -v show? I'd also check the health of the underlying physical disks with e.g. smartctl just to be sure.
 
All disks completed the test without error.
I have used command smartctl -t long /dev/sd[a-f]

Right now, there is no issue, because since restart of the server, i have not made a rollbacke on that LXC.

Code:
  pool: rpool
 state: ONLINE
  scan: scrub repaired 0B in 05:52:17 with 0 errors on Sun Jun 11 06:16:18 2023
config:

        NAME                                                     STATE     READ WRITE CKSUM
        rpool                                                    ONLINE       0     0     0
          raidz1-0                                               ONLINE       0     0     0
            ata-Samsung_SSD_860_EVO_1TB_S3Z9NA0MC00282B-part3    ONLINE       0     0     0
            ata-Samsung_SSD_860_EVO_1TB_S4CSNF0M949079E-part3    ONLINE       0     0     0
            ata-SAMSUNG_MZ7LM960HMJP-00005_S2TZNX0J312819-part3  ONLINE       0     0     0
            ata-SAMSUNG_MZ7LM960HMJP-00005_S2TZNX0J312822-part3  ONLINE       0     0     0
            ata-Samsung_SSD_860_PRO_1TB_S42NNF0M902570N-part3    ONLINE       0     0     0
            ata-Samsung_SSD_860_PRO_1TB_S42NNF0K901427Y-part3    ONLINE       0     0     0

errors: No known data errors
 
Can you share the system log from before the first
Code:
INFO: task z_zrele:3459569 blocked for more than 120 seconds
message?
 

About

The Proxmox community has been around for many years and offers help and support for Proxmox VE, Proxmox Backup Server, and Proxmox Mail Gateway.
We think our community is one of the best thanks to people like you!

Get your subscription!

The Proxmox team works very hard to make sure you are running the best software and getting stable updates and security enhancements, as well as quick enterprise support. Tens of thousands of happy customers have a Proxmox subscription. Get yours easily in our online shop.

Buy now!