Hi,
I have been having an issue with the Proxmox web interface showing my node and all VMs/containers running on it with an "unknown" status and grey ? marks.
This seems to happen a few hours after every reboot of the server. Restarting pvedaemon, pveproxy, pvestatd does not seem to help.
From what I can tell from the syslog, this seems to be a problem with pvestatd; the blocked task in the trace below is lvs (PID 4707):
Dec 03 06:32:09 charlie kernel: INFO: task lvs:4707 blocked for more than 120 seconds.
Dec 03 06:32:09 charlie kernel: Tainted: P O 5.0.21-5-pve #1
Dec 03 06:32:09 charlie kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 03 06:32:09 charlie kernel: lvs D 0 4707 1656 0x00000000
Dec 03 06:32:09 charlie kernel: Call Trace:
Dec 03 06:32:09 charlie kernel: __schedule+0x2d4/0x870
Dec 03 06:32:09 charlie kernel: ? get_page_from_freelist+0xefe/0x1440
Dec 03 06:32:09 charlie kernel: schedule+0x2c/0x70
Dec 03 06:32:09 charlie kernel: schedule_timeout+0x258/0x360
Dec 03 06:32:09 charlie kernel: wait_for_completion+0xb7/0x140
Dec 03 06:32:09 charlie kernel: ? wake_up_q+0x80/0x80
Dec 03 06:32:09 charlie kernel: __flush_work+0x138/0x1f0
Dec 03 06:32:09 charlie kernel: ? worker_detach_from_pool+0xb0/0xb0
Dec 03 06:32:09 charlie kernel: ? get_work_pool+0x40/0x40
Dec 03 06:32:09 charlie kernel: __cancel_work_timer+0x115/0x190
Dec 03 06:32:09 charlie kernel: ? exact_lock+0x11/0x20
Dec 03 06:32:09 charlie kernel: cancel_delayed_work_sync+0x13/0x20
Dec 03 06:32:09 charlie kernel: disk_block_events+0x78/0x80
Dec 03 06:32:09 charlie kernel: __blkdev_get+0x73/0x550
Dec 03 06:32:09 charlie kernel: ? bd_acquire+0xd0/0xd0
Dec 03 06:32:09 charlie kernel: blkdev_get+0x10c/0x330
Dec 03 06:32:09 charlie kernel: ? bd_acquire+0xd0/0xd0
Dec 03 06:32:09 charlie kernel: blkdev_open+0x92/0x100
Dec 03 06:32:09 charlie kernel: do_dentry_open+0x143/0x3a0
Dec 03 06:32:09 charlie kernel: vfs_open+0x2d/0x30
Dec 03 06:32:09 charlie kernel: path_openat+0x2bf/0x1570
Dec 03 06:32:09 charlie kernel: ? filename_lookup.part.61+0xe0/0x170
Dec 03 06:32:09 charlie kernel: ? strncpy_from_user+0x57/0x1c0
Dec 03 06:32:09 charlie kernel: do_filp_open+0x93/0x100
Dec 03 06:32:09 charlie kernel: ? strncpy_from_user+0x57/0x1c0
Dec 03 06:32:09 charlie kernel: ? __alloc_fd+0x46/0x150
Dec 03 06:32:09 charlie kernel: do_sys_open+0x177/0x280
Dec 03 06:32:09 charlie kernel: __x64_sys_openat+0x20/0x30
Dec 03 06:32:09 charlie kernel: do_syscall_64+0x5a/0x110
Dec 03 06:32:09 charlie kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Dec 03 06:32:09 charlie kernel: RIP: 0033:0x7fae96c151ae
Dec 03 06:32:09 charlie kernel: Code: Bad RIP value.
Dec 03 06:32:09 charlie kernel: RSP: 002b:00007ffe72629630 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
Dec 03 06:32:09 charlie kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fae96c151ae
Dec 03 06:32:09 charlie kernel: RDX: 0000000000044000 RSI: 0000557630d80698 RDI: 00000000ffffff9c
Dec 03 06:32:09 charlie kernel: RBP: 00007ffe72629790 R08: 0000557630dbe010 R09: 0000000000000000
Dec 03 06:32:09 charlie kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe7262be95
Dec 03 06:32:09 charlie kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
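To check whether the same task shows up every time this happens, the hung-task warnings can be pulled out of a saved syslog. The following is only a minimal sketch: the function name `find_hung_tasks` and the regular expression are my own, written against the single "blocked for more than 120 seconds" line shown above, so other kernel versions may word the message differently.

```python
import re

# Matches the kernel's hung-task warning as it appears in the log above, e.g.
# "... kernel: INFO: task lvs:4707 blocked for more than 120 seconds."
# The pattern is an assumption based on that one line.
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<name>.+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(log_text):
    """Return (task_name, pid, seconds) for every hung-task warning in log_text."""
    hits = []
    for line in log_text.splitlines():
        match = HUNG_TASK_RE.search(line)
        if match:
            hits.append(
                (match.group("name"), int(match.group("pid")), int(match.group("secs")))
            )
    return hits


if __name__ == "__main__":
    sample = (
        "Dec 03 06:32:09 charlie kernel: INFO: task lvs:4707 "
        "blocked for more than 120 seconds."
    )
    print(find_hung_tasks(sample))
```

Feeding it the output of `journalctl -k` saved to a file would show whether it is always `lvs` that gets stuck or whether other tasks are affected too.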
Does anyone know what could be causing this? I have updated everything to the latest versions.
I am also having an issue with scheduled backups failing, which seems to be related: snapshots work fine when the web GUI is showing green status symbols, but fail when showing grey ? marks. The backup log goes as far as "starting backup of VM 10x (lxc)" but does not progress to writing to the backup location. It will stay in this status until I reboot the server (and only then will it produce an interrupt error in the log). I then have to manually unlock/start the container to get it running again.
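Since the backup appears to hang in the same way as `lvs`, it may help to list every process that is currently in uninterruptible sleep (state `D`, as in the `lvs D` line of the trace). This is a rough sketch, assuming a Linux `/proc` filesystem; the helper names `parse_stat_line` and `list_blocked_tasks` are hypothetical and not part of any Proxmox tooling.

```python
import os


def parse_stat_line(stat_line):
    """Parse the leading fields of a /proc/<pid>/stat line.

    Returns (pid, comm, state, ppid). The comm field is wrapped in
    parentheses and may itself contain spaces or ')', so it is delimited
    by the first '(' and the last ')' instead of by whitespace.
    """
    open_idx = stat_line.index("(")
    close_idx = stat_line.rindex(")")
    pid = int(stat_line[:open_idx])
    comm = stat_line[open_idx + 1:close_idx]
    rest = stat_line[close_idx + 1:].split()
    return pid, comm, rest[0], int(rest[1])


def list_blocked_tasks(proc_root="/proc"):
    """Return (pid, comm, ppid) for every process currently in state 'D'."""
    blocked = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state, ppid = parse_stat_line(fh.read())
        except (OSError, ValueError):
            continue  # process exited meanwhile, or the line was unreadable
        if state == "D":
            blocked.append((pid, comm, ppid))
    return blocked


if __name__ == "__main__":
    for pid, comm, ppid in list_blocked_tasks():
        print(f"{comm} (pid {pid}, parent {ppid}) is in uninterruptible sleep")
```

A process can be in state `D` briefly during normal I/O, so only entries that persist across several runs are interesting here.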
Package versions:
proxmox-ve: 6.0-2 (running kernel: 5.0.21-5-pve)
pve-manager: 6.0-15 (running version: 6.0-15/52b91481)
pve-kernel-helper: 6.0-12
pve-kernel-5.0: 6.0-11
pve-kernel-5.0.21-5-pve: 5.0.21-10
pve-kernel-5.0.21-3-pve: 5.0.21-7
pve-kernel-5.0.21-2-pve: 5.0.21-7
pve-kernel-5.0.15-1-pve: 5.0.15-1
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.2-pve4
criu: 3.11-3
glusterfs-client: 5.5-3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.13-pve1
libpve-access-control: 6.0-5
libpve-apiclient-perl: 3.0-2
libpve-common-perl: 6.0-9
libpve-guest-common-perl: 3.0-3
libpve-http-server-perl: 3.0-3
libpve-storage-perl: 6.0-12
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve3
lxc-pve: 3.2.1-1
lxcfs: 3.0.3-pve60
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.1-1
pve-cluster: 6.0-9
pve-container: 3.0-14
pve-docs: 6.0-9
pve-edk2-firmware: 2.20191002-1
pve-firewall: 4.0-8
pve-firmware: 3.0-4
pve-ha-manager: 3.0-5
pve-i18n: 2.0-3
pve-qemu-kvm: 4.1.1-2
pve-xtermjs: 3.13.2-1
qemu-server: 6.1-1
smartmontools: 7.0-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.2-pve2