VMs not starting at boot on 3-node Proxmox cluster

tommisan

We have a 3-node production cluster (enterprise repo) on Dell PowerEdge R740xd servers with both local and shared storage.
After a power loss in the datacenter, the VMs on the Proxmox cluster didn't start at boot, even though "Start at boot" is set to yes.

pveversion -v
proxmox-ve: 7.1-1 (running kernel: 5.13.19-3-pve)
pve-manager: 7.1-10 (running version: 7.1-10/6ddebafe)
pve-kernel-helper: 7.1-8
pve-kernel-5.13: 7.1-6
pve-kernel-5.11: 7.0-10
pve-kernel-5.13.19-3-pve: 5.13.19-7
pve-kernel-5.11.22-7-pve: 5.11.22-12
pve-kernel-5.11.22-5-pve: 5.11.22-10
pve-kernel-5.11.22-4-pve: 5.11.22-9

The servers are configured to power on automatically after power is restored.
The VMs start fine when powered on manually.


Any ideas?
Thanks
 
can you post the journal from such a boot where it does not work?
 
can you post the journal from such a boot where it does not work?
-- Journal begins at Mon 2021-09-20 16:22:32 CEST, ends at Mon 2022-02-28 12:17:01 CET. --
Feb 24 17:15:01 pve2 systemd[1]: Starting The Proxmox VE cluster filesystem...
Feb 24 17:15:01 pve2 pmxcfs[2218]: [quorum] crit: quorum_initialize failed: 2
Feb 24 17:15:01 pve2 pmxcfs[2218]: [quorum] crit: can't initialize service
Feb 24 17:15:01 pve2 pmxcfs[2218]: [confdb] crit: cmap_initialize failed: 2
Feb 24 17:15:01 pve2 pmxcfs[2218]: [confdb] crit: can't initialize service
Feb 24 17:15:01 pve2 pmxcfs[2218]: [dcdb] crit: cpg_initialize failed: 2
Feb 24 17:15:01 pve2 pmxcfs[2218]: [dcdb] crit: can't initialize service
Feb 24 17:15:01 pve2 pmxcfs[2218]: [status] crit: cpg_initialize failed: 2
Feb 24 17:15:01 pve2 pmxcfs[2218]: [status] crit: can't initialize service
Feb 24 17:15:02 pve2 systemd[1]: Started The Proxmox VE cluster filesystem.
Feb 24 17:15:07 pve2 pmxcfs[2218]: [status] notice: update cluster info (cluster name clusterdisia, version = 3)
Feb 24 17:15:07 pve2 pmxcfs[2218]: [dcdb] notice: members: 2/2218
Feb 24 17:15:07 pve2 pmxcfs[2218]: [dcdb] notice: all data is up to date
Feb 24 17:15:07 pve2 pmxcfs[2218]: [status] notice: members: 2/2218
Feb 24 17:15:07 pve2 pmxcfs[2218]: [status] notice: all data is up to date
Feb 24 17:16:10 pve2 pmxcfs[2218]: [dcdb] notice: members: 2/2218, 3/2197
Feb 24 17:16:10 pve2 pmxcfs[2218]: [dcdb] notice: starting data syncronisation
Feb 24 17:16:10 pve2 pmxcfs[2218]: [dcdb] notice: cpg_send_message retried 1 times
Feb 24 17:16:10 pve2 pmxcfs[2218]: [status] notice: node has quorum
Feb 24 17:16:10 pve2 pmxcfs[2218]: [status] notice: members: 2/2218, 3/2197
Feb 24 17:16:10 pve2 pmxcfs[2218]: [status] notice: starting data syncronisation
Feb 24 17:16:10 pve2 pmxcfs[2218]: [dcdb] notice: received sync request (epoch 2/2218/00000002)
Feb 24 17:16:10 pve2 pmxcfs[2218]: [status] notice: received sync request (epoch 2/2218/00000002)
Feb 24 17:16:10 pve2 pmxcfs[2218]: [dcdb] notice: received all states
Feb 24 17:16:10 pve2 pmxcfs[2218]: [dcdb] notice: leader is 2/2218
Feb 24 17:16:10 pve2 pmxcfs[2218]: [dcdb] notice: synced members: 2/2218, 3/2197
Feb 24 17:16:10 pve2 pmxcfs[2218]: [dcdb] notice: start sending inode updates
Feb 24 17:16:10 pve2 pmxcfs[2218]: [dcdb] notice: sent all (0) updates
Feb 24 17:16:10 pve2 pmxcfs[2218]: [dcdb] notice: all data is up to date
Feb 24 17:16:10 pve2 pmxcfs[2218]: [status] notice: received all states
Feb 24 17:16:10 pve2 pmxcfs[2218]: [status] notice: all data is up to date
Feb 24 17:16:11 pve2 pmxcfs[2218]: [status] notice: received log
Feb 24 17:16:11 pve2 pmxcfs[2218]: [dcdb] notice: members: 1/2227, 2/2218, 3/2197
Feb 24 17:16:11 pve2 pmxcfs[2218]: [dcdb] notice: starting data syncronisation
Feb 24 17:16:11 pve2 pmxcfs[2218]: [status] notice: members: 1/2227, 2/2218, 3/2197
Feb 24 17:16:11 pve2 pmxcfs[2218]: [status] notice: starting data syncronisation
Feb 24 17:16:12 pve2 pmxcfs[2218]: [dcdb] notice: received sync request (epoch 1/2227/00000002)
Feb 24 17:16:12 pve2 pmxcfs[2218]: [status] notice: received sync request (epoch 1/2227/00000002)
Feb 24 17:16:12 pve2 pmxcfs[2218]: [dcdb] notice: received all states
Feb 24 17:16:12 pve2 pmxcfs[2218]: [dcdb] notice: leader is 1/2227
Feb 24 17:16:12 pve2 pmxcfs[2218]: [dcdb] notice: synced members: 1/2227, 2/2218, 3/2197
Feb 24 17:16:12 pve2 pmxcfs[2218]: [dcdb] notice: all data is up to date
Feb 24 17:16:12 pve2 pmxcfs[2218]: [status] notice: received all states
Feb 24 17:16:12 pve2 pmxcfs[2218]: [status] notice: all data is up to date
Feb 24 17:16:12 pve2 pmxcfs[2218]: [status] notice: dfsm_deliver_queue: queue length 3
Feb 24 17:16:12 pve2 pmxcfs[2218]: [status] notice: received log
Feb 24 17:16:18 pve2 pmxcfs[2218]: [status] notice: received log
Feb 24 17:16:19 pve2 pmxcfs[2218]: [status] notice: received log
Feb 24 17:16:26 pve2 pmxcfs[2218]: [status] notice: received log
Feb 24 17:33:45 pve2 pmxcfs[2218]: [status] notice: received log
Feb 24 17:35:28 pve2 pmxcfs[2218]: [status] notice: received log
Feb 24 17:35:28 pve2 pmxcfs[2218]: [status] notice: received log
Feb 24 17:35:32 pve2 pmxcfs[2218]: [status] notice: received log
Feb 24 17:35:33 pve2 pmxcfs[2218]: [status] notice: received log
Feb 24 17:36:08 pve2 pmxcfs[2218]: [status] notice: received log
Feb 24 17:36:09 pve2 pmxcfs[2218]: [status] notice: received log
Feb 24 17:48:21 pve2 pmxcfs[2218]: [status] notice: received log
Feb 24 18:14:47 pve2 pmxcfs[2218]: [dcdb] notice: data verification successful
Feb 24 19:14:47 pve2 pmxcfs[2218]: [dcdb] notice: data verification successful
Feb 24 20:14:47 pve2 pmxcfs[2218]: [dcdb] notice: data verification successful
Feb 24 21:14:47 pve2 pmxcfs[2218]: [dcdb] notice: data verification successful
Feb 24 22:14:47 pve2 pmxcfs[2218]: [dcdb] notice: data verification successful
Feb 24 23:14:47 pve2 pmxcfs[2218]: [dcdb] notice: data verification successful
Feb 25 00:14:47 pve2 pmxcfs[2218]: [dcdb] notice: data verification successful
Feb 25 01:14:47 pve2 pmxcfs[2218]: [dcdb] notice: data verification successful
Feb 25 02:14:47 pve2 pmxcfs[2218]: [dcdb] notice: data verification successful
Feb 25 03:14:47 pve2 pmxcfs[2218]: [dcdb] notice: data verification successful
 
mhmm... that is not the whole journal since boot, i'd need the output of
Code:
journalctl -b -0
for example (the -0 is the current boot, -1 the one before that, etc.)
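if the full journal is too long to post, you could also narrow it down to the services involved in guest autostart and cluster/storage state (just a suggestion, assuming the standard unit names):
Code:
journalctl -b -0 -u pve-guests -u pve-cluster -u pvestatd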
 
it seems that the storage is not online during the boot (or not reachable), i see many lines like this:

Feb 24 17:16:31 pve2 pve-guests[2779]: storage 'dm5000h' is not online

can you show your storage config? (/etc/pve/storage.cfg)
 
it seems that the storage is not online during the boot (or not reachable), i see many lines like this:

Feb 24 17:16:31 pve2 pve-guests[2779]: storage 'dm5000h' is not online

can you show your storage config? (/etc/pve/storage.cfg)
The shared storage was also shut down uncleanly during the power loss, so it may have taken some time to boot and recover properly.

# cat /etc/pve/storage.cfg
dir: local
path /var/lib/vz
content iso,vztmpl,backup

lvmthin: local-lvm
thinpool data
vgname pve
content rootdir,images

dir: bck1
path /mnt/pve/bck1
content iso,rootdir,backup,snippets,images,vztmpl
is_mountpoint 1
nodes pve1

dir: bck2
path /mnt/pve/bck2
content vztmpl,images,snippets,backup,rootdir,iso
is_mountpoint 1
nodes pve2

dir: bck3
path /mnt/pve/bck3
content iso,rootdir,backup,snippets,images,vztmpl
is_mountpoint 1
nodes pve3

nfs: dm5000h
export /vm_nfs
path /mnt/pve/dm5000h
server 10.0.34.11
content images,rootdir
prune-backups keep-all=1


# pvesm status --storage dm5000h
Name Type Status Total Used Available %
dm5000h nfs active 4294967360 321790400 3973176960 7.49%
 
The shared storage was also shut down uncleanly during the power loss, so it may have taken some time to boot and recover properly.
if the necessary storage is not online when trying to start the vm, that cannot work...
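as a quick check you can also query the NFS server's exports directly from the node once it is back up (a sketch, using the server and export from your storage.cfg):
Code:
showmount -e 10.0.34.11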
 
if the necessary storage is not online when trying to start the vm, that cannot work...
I agree, but when different systems, such as servers and storage, start at the same time, they can come up several minutes apart.
How long after a PVE node boots does it keep retrying to mount the NFS storage and start the VMs?
 
the nfs will be retried every 10 seconds, but once the 'startall' task was run (even with errors) it will not be run again (until the next reboot)

you can set an 'onboot delay' for the startall, but that will then unconditionally happen every time (regardless of whether the storage is online or not)

Code:
pvenode config set -startall-onboot-delay <seconds>
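for example, to delay the autostart by two minutes (the 120 seconds are just an example value, pick whatever your storage needs) and to verify the node config afterwards:
Code:
pvenode config set -startall-onboot-delay 120
pvenode config get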
 
the nfs will be retried every 10 seconds, but once the 'startall' task was run (even with errors) it will not be run again (until the next reboot)

you can set an 'onboot delay' for the startall, but that will then unconditionally happen every time (regardless of whether the storage is online or not)

Code:
pvenode config set -startall-onboot-delay <seconds>
Could it also be set in a configuration file instead?
What is the default startall-onboot-delay?
Could startall-onboot-delay be set per storage instead of globally?
Thanks for your support
 
Could it also be set in a configuration file instead?
what do you mean? if you execute that command this is saved until you set it to a different value

What is the default startall-onboot-delay?
the default is no delay at all

Could startall-onboot-delay be set per storage instead of globally?
no, that's not how that works currently

before starting the vms, the 'startall' call has no idea which storages are needed; that is handled deeper in the code by the vm start itself
 
what do you mean? if you execute that command this is saved until you set it to a different value
I mean: can I edit the file (where is it?) in which the parameter is stored, and then reload it somehow, instead of setting it at runtime on the command line? Just so I remember that I set it, in case of a cluster upgrade/migration.
But the command line is fine too.


the default is no delay at all


no, that's not how that works currently

before starting the vms, the 'startall' call has no idea which storages are needed; that is handled deeper in the code by the vm start itself

Ok
 
I mean: can I edit the file (where is it?) in which the parameter is stored, and then reload it somehow, instead of setting it at runtime on the command line? Just so I remember that I set it, in case of a cluster upgrade/migration.
the file where it is saved is '/etc/pve/nodes/<NODENAME>/config'
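after setting the delay via 'pvenode config set' it should show up in that file as a simple key/value line, roughly like this (hypothetical example with a 120 second delay):
Code:
startall-onboot-delay: 120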
 
