[SOLVED] Nested Proxmox; blocked for more than 120 seconds.

leesteken

Distinguished Member
Found this in the Syslog while running automated updates on the only 3 containers inside a nested PVE 8.0.4 (ext4 and LVM, on top of PVE 8.0.4 with ZFS):
Code:
nov 08 21:51:00 pve9 kernel: INFO: task dmeventd:337 blocked for more than 120 seconds.
nov 08 21:51:00 pve9 kernel:       Tainted: P           O       6.2.16-19-pve #1
nov 08 21:51:00 pve9 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
nov 08 21:51:00 pve9 kernel: task:dmeventd        state:D stack:0     pid:337   ppid:1      flags:0x00000002
nov 08 21:51:00 pve9 kernel: Call Trace:
nov 08 21:51:00 pve9 kernel:  <TASK>
nov 08 21:51:00 pve9 kernel:  __schedule+0x402/0x1510
nov 08 21:51:00 pve9 kernel:  ? srso_alias_return_thunk+0x5/0x7f
nov 08 21:51:00 pve9 kernel:  ? post_alloc_hook+0xcc/0x120
nov 08 21:51:00 pve9 kernel:  schedule+0x63/0x110
nov 08 21:51:00 pve9 kernel:  schedule_preempt_disabled+0x15/0x30
nov 08 21:51:00 pve9 kernel:  rwsem_down_read_slowpath+0x284/0x4d0
nov 08 21:51:00 pve9 kernel:  down_read+0x48/0xc0
nov 08 21:51:00 pve9 kernel:  dm_pool_get_metadata_transaction_id+0x23/0x60 [dm_thin_pool]
nov 08 21:51:00 pve9 kernel:  pool_status+0x1c4/0x810 [dm_thin_pool]
nov 08 21:51:00 pve9 kernel:  retrieve_status+0x15a/0x220
nov 08 21:51:00 pve9 kernel:  table_status+0x9b/0x150
nov 08 21:51:00 pve9 kernel:  ctl_ioctl+0x349/0x690
nov 08 21:51:00 pve9 kernel:  ? __pfx_table_status+0x10/0x10
nov 08 21:51:00 pve9 kernel:  dm_ctl_ioctl+0xe/0x20
nov 08 21:51:00 pve9 kernel:  __x64_sys_ioctl+0xa0/0xe0
nov 08 21:51:00 pve9 kernel:  do_syscall_64+0x5b/0x90
nov 08 21:51:00 pve9 kernel:  ? srso_alias_return_thunk+0x5/0x7f
nov 08 21:51:00 pve9 kernel:  ? do_syscall_64+0x67/0x90
nov 08 21:51:00 pve9 kernel:  ? srso_alias_return_thunk+0x5/0x7f
nov 08 21:51:00 pve9 kernel:  ? irqentry_exit+0x43/0x50
nov 08 21:51:00 pve9 kernel:  ? srso_alias_return_thunk+0x5/0x7f
nov 08 21:51:00 pve9 kernel:  ? common_interrupt+0x54/0xb0
nov 08 21:51:00 pve9 kernel:  entry_SYSCALL_64_after_hwframe+0x73/0xdd
nov 08 21:51:00 pve9 kernel: RIP: 0033:0x7f398c8f7b5b
nov 08 21:51:00 pve9 kernel: RSP: 002b:00007f398bcaeaa0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
nov 08 21:51:00 pve9 kernel: RAX: ffffffffffffffda RBX: 00005595fb7f28b8 RCX: 00007f398c8f7b5b
nov 08 21:51:00 pve9 kernel: RDX: 00007f3984042300 RSI: 00000000c138fd0c RDI: 0000000000000007
nov 08 21:51:00 pve9 kernel: RBP: 00007f398bcaebc0 R08: 0000000000000004 R09: 00007f398ca39518
nov 08 21:51:00 pve9 kernel: R10: 00007f398ca38842 R11: 0000000000000246 R12: 0000000000000000
nov 08 21:51:00 pve9 kernel: R13: 00007f398ca38842 R14: 00007f398ca38842 R15: 00007f398ca38842
nov 08 21:51:00 pve9 kernel:  </TASK>
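As an aside: the knob the kernel mentions above can be inspected with sysctl, but it only controls the warning and does not fix the underlying stall (values here are just for illustration):
Code:
# read the current hung-task timeout (defaults to 120 seconds)
sysctl kernel.hung_task_timeout_secs
# setting it to 0 merely silences the message; the blocked dmeventd would still hang
sysctl -w kernel.hung_task_timeout_secs=0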
See the attached file for all blocked tasks and stacktraces. This is the VM configuration:
Code:
agent: 1
args: -global scsi-hd.physical_block_size=4k -global scsi-hd.logical_block_size=4k
balloon: 2048
bios: ovmf
boot: order=scsi0;scsi1
cores: 1
cpu: EPYC-Milan,flags=-pcid;+ibpb;+virt-ssbd;+amd-ssbd
efidisk0: qpool-zfs:vm-109-disk-0,efitype=4m,size=1M
hotplug: 0
memory: 4096
meta: creation-qemu=7.2.0,ctime=1689939036
name: pve9
net0: virtio=52:54:56:17:02:09,bridge=vmbr2,firewall=1
net1: virtio=52:54:54:17:03:09,bridge=vmbr3
numa: 1
onboot: 1
ostype: l26
rng0: source=/dev/urandom
scsi0: none,media=cdrom
scsi1: qpool-zfs:vm-109-disk-1,cache=writeback,discard=on,size=9G,ssd=1
scsi2: qpool-zfs:vm-109-disk-2,backup=0,cache=writeback,discard=on,size=24G,ssd=1
scsihw: virtio-scsi-pci
serial0: socket
smbios1: uuid=dda0dacd-e7c3-4179-bd2d-c8906871b360
sockets: 2
startup: down=45
tablet: 0
vga: serial0
vmgenid: 0a443592-b16d-4769-8ec2-81d426b8a2c1
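For completeness, the args line forces 4k logical/physical sectors on the SCSI disks; inside the nested PVE this can be double-checked like so (sda is just an example device name):
Code:
# sector sizes as seen by the nested guest
lsblk -o NAME,PHY-SEC,LOG-SEC,SIZE
cat /sys/block/sda/queue/logical_block_size /sys/block/sda/queue/physical_block_size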
All containers showed question marks and would not stop. A forced stop of the pve9 VM did finally work; otherwise the nested Proxmox GUI kept working. It only happened today and I can't seem to reproduce the issue. No problems were found in the Syslog on the (outer) host. Does anyone happen to recognize this issue and know how to prevent it?
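If it happens again, these are the things I would look at inside the nested PVE while the tasks are stuck, since dmeventd was waiting on the thin pool metadata (the volume group name pve below is the installer default, adjust if yours differs):
Code:
# all hung-task reports from the kernel log
journalctl -k | grep -i 'blocked for more than'
# thin pool data/metadata usage on the default pve volume group
lvs -a -o lv_name,data_percent,metadata_percent pve
# device-mapper status of all targets (the thin pool shows up with a -tpool suffix)
dmsetup status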

EDIT: I guess nobody knows and the problem is not reproducible, so I don't expect any answer ever.
EDIT2: Maybe it's one of those rare edge cases:
Code:
pve-qemu-kvm (8.1.2-5) bookworm; urgency=medium

* backport workaround for stuck guest IO with iothread and VirtIO block/SCSI
in some rare edge cases
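If that changelog entry is indeed the cause, it should be enough to check whether the outer host (where the pve9 VM actually runs) already has the fixed build and upgrade it if not (assuming the stock repositories):
Code:
# show the installed QEMU build; 8.1.2-5 or newer contains the workaround
pveversion -v | grep pve-qemu-kvm
# upgrade if still on an older build
apt update && apt full-upgrade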
 
