pct snapshot gets stuck

ohmantics

When I issue a "pct snapshot 103 foo" there is no progress, no apparent log messages, no nothing. I've tried waiting three hours. Ctrl-C recovers just fine.

It's a slightly older box: pve-manager/8.4.14/b502d23c55afcba1 (running kernel: 6.8.12-16-pve)

Code:
arch: amd64
cores: 4
features: fuse=1,keyctl=1,nesting=1,mount=cifs
hostname: foobar
memory: 16384
net0: name=eth0,bridge=vmbr0,hwaddr=xx:yy:zz:aa:bb:cc,ip=dhcp,type=veth
net1: name=eth1,bridge=vmbr1,hwaddr=xx:yy:zz:aa:bb:cc,ip=172.31.0.2/24,type=veth
onboot: 1
ostype: debian
rootfs: local-zfs:subvol-103-disk-1,mountoptions=noatime,size=100G
startup: order=100
swap: 16384
lxc.cap.drop:
lxc.apparmor.profile: unconfined

Issuing a direct zfs snapshot command for the subvol-103-disk-1 dataset is instant.

Any ideas?
 
Hi,
what does the process tree look like while it hangs? E.g. the subtree of ps faxl starting with the pct command. If you check strace -p PID with the PID of the most nested command in the tree, what does it show?

Can you try without the FUSE feature and/or CIFS? Maybe it's one of those.
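
One way to find the most nested process below the pct command is sketched here, assuming a procps-style ps. The helper name deepest_descendant is made up for illustration. It reads "PID PPID" pairs on stdin and follows the first child of each process until it reaches one without children.

```shell
#!/bin/sh
# deepest_descendant ROOT_PID (hypothetical helper)
# Reads "PID PPID" pairs on stdin and walks down from ROOT_PID,
# always taking the first child seen, until a childless process is reached.
deepest_descendant() {
    awk -v pid="$1" '
        { if (!($2 in child)) child[$2] = $1 }   # remember the first child of each parent
        END {
            while (pid in child) pid = child[pid]
            print pid
        }'
}

# Usage on a live system, where PCT_PID is the PID of the hanging pct process:
#   ps -eo pid=,ppid= | deepest_descendant "$PCT_PID"
```

The PID it prints is the one to hand to strace -p.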
 
Code:
4     0 3834818 3834817  20   0   9132  5480 do_wai Ss   pts/1      0:00  |       \_ -bash
4     0 3839421 3834818  20   0 196888 138644 poll_s S   pts/1      0:00  |           \_ /usr/bin/perl -T /usr/sbin/pct snapshot 103 foo
1     0 3839499 3839421  20   0 204120 121176 do_wai S+  pts/1      0:00  |               \_ task UPID:proxmox:003A960B:026B0553:69598C73:vzsnapshot:103:root@pam:
5     0 3839501 3839499  20   0 204120 119332 reques D+  pts/1      0:00  |                   \_ task UPID:proxmox:003A960B:026B0553:69598C73:vzsnapshot:103:root@pam:

strace -p 3839501 produces no output after attaching and has to be killed because ^C doesn't work.

I need FUSE and CIFS. zfs snapshot works instantly. Beyond suspending/resuming I'm not immediately clear what other operations are needed for a snapshot here.
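
Since strace stays silent on the task in D state, the kernel's own view under /proc is another place to look. This is a minimal sketch, assuming a Linux /proc; the helper name inspect_task and the PROC_ROOT override are invented for illustration. It prints the state letter, the full wait channel name (the WCHAN column in the ps output above is cut off after six characters) and, when the file is readable (normally only as root), the kernel stack.

```shell
#!/bin/sh
# PROC_ROOT is a hypothetical override so the helper can be pointed at a test
# directory; on a real system it stays at /proc.
PROC_ROOT="${PROC_ROOT:-/proc}"

# inspect_task PID (hypothetical helper)
inspect_task() {
    pid="$1"
    dir="$PROC_ROOT/$pid"
    [ -d "$dir" ] || { echo "no such task: $pid" >&2; return 1; }
    # The State line looks like "State:  D (disk sleep)"; print only the letter.
    awk '/^State:/ { print "state: " $2 }' "$dir/status"
    # wchan holds the untruncated kernel symbol the task is sleeping in.
    [ -r "$dir/wchan" ] && printf 'wchan: %s\n' "$(cat "$dir/wchan")"
    # The kernel stack is normally readable by root only.
    [ -r "$dir/stack" ] && { echo "stack:"; cat "$dir/stack"; }
    return 0
}

# Usage on a live system, with the PID from the ps output:
#   inspect_task 3839501
```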
 
Okay, so the task is in uninterruptible D state, but we don't know which operation exactly. What does pvesm status say? Once before the hang and once while the hang is occurring might be interesting. Are there any messages in the journal from around the time of the operation?
 
Before the command:
Code:
root@proxmox:~# pvesm status
Name             Type     Status     Total (KiB)      Used (KiB) Available (KiB)        %
local             dir     active       741996672        35941120       706055552    4.84%
local-zfs     zfspool     active       857593892       151538228       706055664   17.67%
After the command:
Code:
root@proxmox:~# pvesm status
Name             Type     Status     Total (KiB)      Used (KiB) Available (KiB)        %
local             dir     active       741996672        35941120       706055552    4.84%
local-zfs     zfspool     active       857593864       151538228       706055636   17.67%
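
Both runs show every storage as active; only the Total and Available figures of local-zfs differ, by 28 KiB. To make that comparison mechanical, a small filter can list any storage whose Status column is not active. This is a sketch that assumes the column layout shown above; the helper name inactive_storages is hypothetical.

```shell
#!/bin/sh
# inactive_storages (hypothetical helper)
# Reads `pvesm status` output on stdin, skips the header line and prints the
# name of every storage whose third column (Status) is not "active".
inactive_storages() {
    awk 'NR > 1 && $3 != "active" { print $1 }'
}

# Usage on a live system:
#   pvesm status | inactive_storages
```

Empty output means every storage reported active.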

From journalctl -f during the command:
Code:
Jan 05 13:56:36 proxmox sudo[3717833]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=112)
Jan 05 13:56:36 proxmox sudo[3717833]: pam_unix(sudo:session): session closed for user root
Jan 05 13:57:31 proxmox pct[3721703]: <root@pam> starting task UPID:proxmox:0038CA28:037463C8:695C33CB:vzsnapshot:103:root@pam:
Jan 05 13:57:31 proxmox pct[3721768]: <root@pam> snapshot container 103: foo
Jan 05 13:57:36 proxmox pct[3721768]: received interrupt
Jan 05 13:57:36 proxmox pct[3721703]: <root@pam> end task UPID:proxmox:0038CA28:037463C8:695C33CB:vzsnapshot:103:root@pam: received interrupt