[SOLVED] Proxmox VE 8.2.2 - pvescheduler not running and cannot be started on STANDALONE install (yet no quorum?)

inertle

New Member
Apr 30, 2024
Hello,

I've seen similar threads on here about this, but they all involve clusters and cluster-specific solutions. My installation is NOT and has NEVER been set up as part of a cluster, and yet pvescheduler has been vomiting the errors below into the syslog for days and refuses to restart (systemctl restart hangs forever).

Note that I corrected the initial out-of-storage issue and the system has otherwise been running fine, but it seems like that was the straw that broke pvescheduler's back.

Additionally, I have tried removing the /var/lib/pve-manager/pve-replication-state* files and restarting, but it does not make a difference.

Any ideas on how to get pvescheduler to start and run?

Code:
Apr 27 02:04:59 BLDC-PMX1 pvescheduler[428974]: jobs: cfs-lock 'file-jobs_cfg' error: got lock request timeout
Apr 27 04:08:34 BLDC-PMX1 pvescheduler[458573]: jobs: cfs-lock 'file-jobs_cfg' error: got lock request timeout
Apr 27 04:12:07 BLDC-PMX1 pvescheduler[460308]: replication: unable to open file '/var/lib/pve-manager/pve-replication-state.json.tmp.460308' - No space left on device
Apr 27 04:12:16 BLDC-PMX1 pvescheduler[460309]: jobs: cfs-lock 'file-jobs_cfg' error: got lock request timeout
Apr 27 04:13:03 BLDC-PMX1 pvescheduler[460702]: replication: unable to open file '/var/lib/pve-manager/pve-replication-state.json.tmp.460702' - No space left on device
Apr 27 04:13:11 BLDC-PMX1 pvescheduler[302938]: VM 101 qmp command failed - VM 101 qmp command 'query-backup' failed - client closed connection
Apr 27 04:13:11 BLDC-PMX1 pvescheduler[302938]: VM 101 qmp command failed - VM 101 not running
Apr 27 04:13:11 BLDC-PMX1 pvescheduler[302938]: VM 101 qmp command failed - VM 101 not running
Apr 27 04:13:11 BLDC-PMX1 pvescheduler[302938]: unable to open file '/etc/pve/nodes/BLDC-PMX1/qemu-server/101.conf.tmp.302938' - Input/output error
Apr 27 04:13:11 BLDC-PMX1 pvescheduler[302938]: ERROR: Backup of VM 101 failed - VM 101 not running
Apr 27 04:13:11 BLDC-PMX1 pvescheduler[302938]: INFO: Backup job finished with errors
Apr 27 04:13:11 BLDC-PMX1 postfix/postdrop[461005]: warning: mail_queue_enter: create file maildrop/567004.461005: No space left on device
Apr 27 04:13:12 BLDC-PMX1 pvescheduler[460703]: jobs: cfs-lock 'file-jobs_cfg' error: got lock request timeout
Apr 27 04:13:21 BLDC-PMX1 pvescheduler[302938]: job errors
...
Apr 29 08:00:14 BLDC-PMX1 pvescheduler[3119653]: replication: cfs-lock 'file-replication_cfg' error: got lock request timeout
Apr 29 08:01:14 BLDC-PMX1 pvescheduler[3120459]: jobs: cfs-lock 'file-jobs_cfg' error: got lock request timeout
Apr 29 08:01:14 BLDC-PMX1 pvescheduler[3120458]: replication: cfs-lock 'file-replication_cfg' error: got lock request timeout
Apr 29 08:02:14 BLDC-PMX1 pvescheduler[3121357]: replication: cfs-lock 'file-replication_cfg' error: got lock request timeout
Apr 29 08:02:14 BLDC-PMX1 pvescheduler[3121358]: jobs: cfs-lock 'file-jobs_cfg' error: got lock request timeout
Apr 29 08:03:05 BLDC-PMX1 pvescheduler[3122195]: jobs: cfs-lock 'file-jobs_cfg' error: no quorum!
Apr 29 08:03:15 BLDC-PMX1 pvescheduler[3122194]: replication: Connection refused
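
With days of repeated messages it can help to count the failure signatures rather than read the log line by line. The following is a minimal sketch, not an official Proxmox tool: `count_signatures` is a hypothetical helper name, and the three patterns are simply the error strings visible in the excerpt above.

```shell
# Hypothetical helper: read syslog/journal lines on stdin and count the
# pvescheduler failure signatures seen in the log excerpt.
count_signatures() {
  awk '
    /pvescheduler\[/ {
      if ($0 ~ /got lock request timeout/)     timeout++
      else if ($0 ~ /No space left on device/) nospace++
      else if ($0 ~ /no quorum!/)              noquorum++
      else                                     other++
    }
    END {
      # Counters that were never incremented print as 0.
      printf "lock_timeout=%d no_space=%d no_quorum=%d other=%d\n",
             timeout, nospace, noquorum, other
    }'
}

# Example usage on the affected host (not run here):
#   journalctl -u pvescheduler.service --no-pager | count_signatures
```

Lines from other programs (such as the postfix warning above) are ignored, so the counts reflect pvescheduler only.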

Output of pvecm status, as expected:
Code:
Error: Corosync config '/etc/pve/corosync.conf' does not exist - is this node part of a cluster?

Output of pveversion -v:
Code:
proxmox-ve: 8.2.0 (running kernel: 6.5.13-1-pve)
pve-manager: 8.2.2 (running version: 8.2.2/9355359cd7afbae4)
proxmox-kernel-helper: 8.1.0
pve-kernel-6.2: 8.0.5
proxmox-kernel-6.8: 6.8.4-2
proxmox-kernel-6.8.4-2-pve-signed: 6.8.4-2
proxmox-kernel-6.5.13-5-pve-signed: 6.5.13-5
proxmox-kernel-6.5: 6.5.13-5
proxmox-kernel-6.5.13-1-pve-signed: 6.5.13-1
proxmox-kernel-6.5.11-7-pve-signed: 6.5.11-7
proxmox-kernel-6.2.16-20-pve: 6.2.16-20
proxmox-kernel-6.2: 6.2.16-20
proxmox-kernel-6.2.16-15-pve: 6.2.16-15
pve-kernel-6.2.16-3-pve: 6.2.16-3
ceph-fuse: 17.2.6-pve1+3
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx8
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.1
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.3
libpve-access-control: 8.1.4
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.0.6
libpve-cluster-perl: 8.0.6
libpve-common-perl: 8.2.1
libpve-guest-common-perl: 5.1.1
libpve-http-server-perl: 5.1.0
libpve-network-perl: 0.9.8
libpve-rs-perl: 0.8.8
libpve-storage-perl: 8.2.1
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.4.0-3
proxmox-backup-client: 3.2.2-1
proxmox-backup-file-restore: 3.2.2-1
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.2.3
proxmox-mini-journalreader: 1.4.0
proxmox-widget-toolkit: 4.2.3
pve-cluster: 8.0.6
pve-container: 5.0.11
pve-docs: 8.2.2
pve-edk2-firmware: 4.2023.08-4
pve-esxi-import-tools: 0.7.0
pve-firewall: 5.0.6
pve-firmware: 3.11-1
pve-ha-manager: 4.0.4
pve-i18n: 3.2.2
pve-qemu-kvm: 8.1.5-6
pve-xtermjs: 5.3.0-3
qemu-server: 8.2.1
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.3-pve2
 
Hi,
please share the output of the following commands:
Code:
df -h
df -i
stat /etc/pve/local
systemctl status pve-cluster.service
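
The two df calls check for exhausted disk space and exhausted inodes respectively. As a rough sketch of how to pick out the nearly full filesystems from that output, assuming the POSIX `df -P` column layout (usage percentage in column 5, mount point in column 6); `flag_full` is a hypothetical helper name and the 90% default is an arbitrary choice:

```shell
# Hypothetical helper: read `df -P` (or `df -Pi`) output on stdin and print
# every mount point whose usage is at or above the given percentage.
flag_full() {
  awk -v limit="${1:-90}" '
    NR > 1 {                      # skip the header line
      use = $5
      sub(/%/, "", use)           # "95%" -> "95"
      if (use + 0 >= limit) print $6, use "%"
    }'
}

# Example usage on the affected host (not run here):
#   df -P  | flag_full 90    # block usage
#   df -Pi | flag_full 90    # inode usage
```

Mount points containing spaces would be truncated by this simple column split, so treat the result as a quick hint only.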
 
I've since had a moment to troubleshoot and restart the system, and in the process figured out the culprit:

Canceling the "Bulk startup of VMs and containers" task while it is waiting out a particular VM's startup delay makes it hang rather than exit gracefully. This seems to be new as of a recent update (I remember doing this a while ago without issue). The hang prevents all of the downstream services from completing startup: pvescheduler was waiting on pve-guests, which in turn was waiting on the startup delay script, which for some reason would never exit on its own.

pve-cluster.service was inactive before I rebooted, which aligns with its dependencies including pve-guests.
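
For anyone hitting the same hang, one way to see which unit is actually blocking startup is to look at systemd's job queue. This is a minimal sketch, assuming the usual `systemctl list-jobs` column layout (job ID, unit, type, state); `pending_jobs` is a hypothetical helper name.

```shell
# Hypothetical helper: read `systemctl list-jobs` output on stdin and print
# "<unit> <state>" for every queued job, skipping header and summary lines.
pending_jobs() {
  awk '$1 ~ /^[0-9]+$/ && NF >= 4 { print $2, $4 }'
}

# Example usage on the affected host (not run here):
#   systemctl list-jobs --no-pager | pending_jobs
```

A job that stays in the "running" state for a long time while others sit in "waiting" is a likely candidate for the unit everything else is queued behind.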
 
