[SOLVED] Nested Proxmox; blocked for more than 120 seconds.

leesteken

Distinguished Member
Found this in the syslog while running automated updates on the only three containers inside a nested PVE 8.0.4 (ext4 and LVM, on top of PVE 8.0.4 with ZFS):
Code:
nov 08 21:51:00 pve9 kernel: INFO: task dmeventd:337 blocked for more than 120 seconds.
nov 08 21:51:00 pve9 kernel:       Tainted: P           O       6.2.16-19-pve #1
nov 08 21:51:00 pve9 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
nov 08 21:51:00 pve9 kernel: task:dmeventd        state:D stack:0     pid:337   ppid:1      flags:0x00000002
nov 08 21:51:00 pve9 kernel: Call Trace:
nov 08 21:51:00 pve9 kernel:  <TASK>
nov 08 21:51:00 pve9 kernel:  __schedule+0x402/0x1510
nov 08 21:51:00 pve9 kernel:  ? srso_alias_return_thunk+0x5/0x7f
nov 08 21:51:00 pve9 kernel:  ? post_alloc_hook+0xcc/0x120
nov 08 21:51:00 pve9 kernel:  schedule+0x63/0x110
nov 08 21:51:00 pve9 kernel:  schedule_preempt_disabled+0x15/0x30
nov 08 21:51:00 pve9 kernel:  rwsem_down_read_slowpath+0x284/0x4d0
nov 08 21:51:00 pve9 kernel:  down_read+0x48/0xc0
nov 08 21:51:00 pve9 kernel:  dm_pool_get_metadata_transaction_id+0x23/0x60 [dm_thin_pool]
nov 08 21:51:00 pve9 kernel:  pool_status+0x1c4/0x810 [dm_thin_pool]
nov 08 21:51:00 pve9 kernel:  retrieve_status+0x15a/0x220
nov 08 21:51:00 pve9 kernel:  table_status+0x9b/0x150
nov 08 21:51:00 pve9 kernel:  ctl_ioctl+0x349/0x690
nov 08 21:51:00 pve9 kernel:  ? __pfx_table_status+0x10/0x10
nov 08 21:51:00 pve9 kernel:  dm_ctl_ioctl+0xe/0x20
nov 08 21:51:00 pve9 kernel:  __x64_sys_ioctl+0xa0/0xe0
nov 08 21:51:00 pve9 kernel:  do_syscall_64+0x5b/0x90
nov 08 21:51:00 pve9 kernel:  ? srso_alias_return_thunk+0x5/0x7f
nov 08 21:51:00 pve9 kernel:  ? do_syscall_64+0x67/0x90
nov 08 21:51:00 pve9 kernel:  ? srso_alias_return_thunk+0x5/0x7f
nov 08 21:51:00 pve9 kernel:  ? irqentry_exit+0x43/0x50
nov 08 21:51:00 pve9 kernel:  ? srso_alias_return_thunk+0x5/0x7f
nov 08 21:51:00 pve9 kernel:  ? common_interrupt+0x54/0xb0
nov 08 21:51:00 pve9 kernel:  entry_SYSCALL_64_after_hwframe+0x73/0xdd
nov 08 21:51:00 pve9 kernel: RIP: 0033:0x7f398c8f7b5b
nov 08 21:51:00 pve9 kernel: RSP: 002b:00007f398bcaeaa0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
nov 08 21:51:00 pve9 kernel: RAX: ffffffffffffffda RBX: 00005595fb7f28b8 RCX: 00007f398c8f7b5b
nov 08 21:51:00 pve9 kernel: RDX: 00007f3984042300 RSI: 00000000c138fd0c RDI: 0000000000000007
nov 08 21:51:00 pve9 kernel: RBP: 00007f398bcaebc0 R08: 0000000000000004 R09: 00007f398ca39518
nov 08 21:51:00 pve9 kernel: R10: 00007f398ca38842 R11: 0000000000000246 R12: 0000000000000000
nov 08 21:51:00 pve9 kernel: R13: 00007f398ca38842 R14: 00007f398ca38842 R15: 00007f398ca38842
nov 08 21:51:00 pve9 kernel:  </TASK>
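For context: the 120 seconds come from the kernel's hung-task watchdog, which the message itself points at. Just to illustrate that knob (inside the nested PVE), roughly:
Code:
# Show the current hung-task warning threshold (120 s by default):
cat /proc/sys/kernel/hung_task_timeout_secs
# Setting it to 0 only silences the warning; dmeventd would still be blocked:
echo 0 > /proc/sys/kernel/hung_task_timeout_secs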
See the attached file for all blocked tasks and stacktraces. This is the VM configuration:
Code:
agent: 1
args: -global scsi-hd.physical_block_size=4k -global scsi-hd.logical_block_size=4k
balloon: 2048
bios: ovmf
boot: order=scsi0;scsi1
cores: 1
cpu: EPYC-Milan,flags=-pcid;+ibpb;+virt-ssbd;+amd-ssbd
efidisk0: qpool-zfs:vm-109-disk-0,efitype=4m,size=1M
hotplug: 0
memory: 4096
meta: creation-qemu=7.2.0,ctime=1689939036
name: pve9
net0: virtio=52:54:56:17:02:09,bridge=vmbr2,firewall=1
net1: virtio=52:54:54:17:03:09,bridge=vmbr3
numa: 1
onboot: 1
ostype: l26
rng0: source=/dev/urandom
scsi0: none,media=cdrom
scsi1: qpool-zfs:vm-109-disk-1,cache=writeback,discard=on,size=9G,ssd=1
scsi2: qpool-zfs:vm-109-disk-2,backup=0,cache=writeback,discard=on,size=24G,ssd=1
scsihw: virtio-scsi-pci
serial0: socket
smbios1: uuid=dda0dacd-e7c3-4179-bd2d-c8906871b360
sockets: 2
startup: down=45
tablet: 0
vga: serial0
vmgenid: 0a443592-b16d-4769-8ec2-81d426b8a2c1
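For reference, the args: line with the 4k block-size overrides can be set from the host shell, roughly like this (assuming VMID 109, as in the disk names):
Code:
qm set 109 --args '-global scsi-hd.physical_block_size=4k -global scsi-hd.logical_block_size=4k'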
All containers showed question marks and would not stop. A forced stop of the pve9 VM did finally work, and the nested Proxmox GUI was otherwise still working. It only happened today and I don't seem to be able to reproduce the issue. No problems were found in the syslog on the host. Does anyone happen to recognize this issue and know how to prevent it?
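In case someone runs into the same hang: the force stop can also be done from the outer host's CLI; a rough sketch (VMID 109 assumed, this only gets the VM down, it does not explain the hang):
Code:
# On the outer PVE host: try a clean shutdown first, force-stop if it hangs:
qm shutdown 109 --timeout 60 || qm stop 109
# Inside the nested PVE, once it responds again, stuck containers can be stopped with pct:
pct stop <ctid>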

EDIT: I guess nobody knows and the problem is not reproducible, so I don't expect any answer ever.
EDIT2: Maybe it's one of those rare edge cases mentioned in the pve-qemu-kvm changelog:
Code:
pve-qemu-kvm (8.1.2-5) bookworm; urgency=medium

  * backport workaround for stuck guest IO with iothread and VirtIO block/SCSI
    in some rare edge cases
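To check whether that build is actually installed on the host, something like this should do (the changelog path may differ):
Code:
# Installed pve-qemu-kvm version:
pveversion -v | grep pve-qemu-kvm
# Changelog of the installed package:
zcat /usr/share/doc/pve-qemu-kvm/changelog.Debian.gz | head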
 
