/run/pve ??

DocMAX

Lately I get an error when starting a CT.
Starting CT 100 ... problem with monitor socket, but continuing anyway: got timeout

I found out that on every reboot the /run/pve folder is missing. Creating it manually fixes the problem.
What happened?
 
Code:
root@pve:~# systemctl status pve-cluster
● pve-cluster.service - The Proxmox VE cluster filesystem
     Loaded: loaded (/usr/lib/systemd/system/pve-cluster.service; enabled; preset: enabled)
     Active: active (running) since Wed 2026-01-07 07:28:25 CET; 10min ago
 Invocation: d9c3e321c1ea473b8fc6020a707b7b54
    Process: 1739 ExecStart=/usr/bin/pmxcfs (code=exited, status=0/SUCCESS)
   Main PID: 1780 (pmxcfs)
      Tasks: 7 (limit: 76305)
     Memory: 61.9M (peak: 69M)
        CPU: 1.161s
     CGroup: /system.slice/pve-cluster.service
             └─1780 /usr/bin/pmxcfs

Jan 07 07:28:24 pve pmxcfs[1780]: [dcdb] crit: cpg_initialize failed: CS_ERR_LIBRARY (failed to connect to corosync)
Jan 07 07:28:24 pve pmxcfs[1780]: [dcdb] crit: can't initialize service
Jan 07 07:28:24 pve pmxcfs[1780]: [status] crit: cpg_initialize failed: CS_ERR_LIBRARY (failed to connect to corosync)
Jan 07 07:28:24 pve pmxcfs[1780]: [status] crit: can't initialize service
Jan 07 07:28:30 pve pmxcfs[1780]: [status] notice: update cluster info (cluster name  pve-cluster, version = 3)
Jan 07 07:28:30 pve pmxcfs[1780]: [status] notice: node has quorum
Jan 07 07:28:30 pve pmxcfs[1780]: [dcdb] notice: members: 1/1780
Jan 07 07:28:30 pve pmxcfs[1780]: [dcdb] notice: all data is up to date
Jan 07 07:28:30 pve pmxcfs[1780]: [status] notice: members: 1/1780
Jan 07 07:28:30 pve pmxcfs[1780]: [status] notice: all data is up to date
 
Code:
root@pve:~# systemctl status corosync
● corosync.service - Corosync Cluster Engine
     Loaded: loaded (/usr/lib/systemd/system/corosync.service; enabled; preset: enabled)
     Active: active (running) since Wed 2026-01-07 07:28:26 CET; 13min ago
 Invocation: ac2bc46a5b8441eb870aa51f41e116c9
       Docs: man:corosync
             man:corosync.conf
             man:corosync_overview
   Main PID: 2600 (corosync)
      Tasks: 9 (limit: 76305)
     Memory: 150.1M (peak: 150.6M)
        CPU: 5.864s
     CGroup: /system.slice/corosync.service
             └─2600 /usr/sbin/corosync -f

Jan 07 07:28:26 pve corosync[2600]:   [WD    ] resource memory_used missing a recovery key.
Jan 07 07:28:26 pve corosync[2600]:   [WD    ] no resources configured.
Jan 07 07:28:26 pve corosync[2600]:   [SERV  ] Service engine loaded: corosync watchdog service [7]
Jan 07 07:28:26 pve corosync[2600]:   [QUORUM] Using quorum provider corosync_votequorum
Jan 07 07:28:26 pve corosync[2600]:   [KNET  ] host: host: 2 has no active links
Jan 07 07:28:26 pve corosync[2600]:   [KNET  ] host: host: 2 has no active links
Jan 07 07:28:26 pve corosync[2600]:   [KNET  ] host: host: 2 has no active links
Jan 07 07:28:26 pve corosync[2600]:   [KNET  ] host: host: 3 has no active links
Jan 07 07:28:26 pve corosync[2600]:   [KNET  ] host: host: 3 has no active links
Jan 07 07:28:26 pve corosync[2600]:   [KNET  ] host: host: 3 has no active links
root@pve:~# journalctl -u corosync
Sep 12 16:11:02 pve corosync[3170]:   [MAIN  ] Corosync Cluster Engine  starting up
Sep 12 16:11:02 pve corosync[3170]:   [MAIN  ] Corosync built-in features: dbus monitoring watchdog systemd xmlconf vqsim nozzle snmp pie relro bindnow
Sep 12 16:11:02 pve corosync[3170]:   [TOTEM ] Initializing transport (Kronosnet).
Sep 12 16:11:03 pve corosync[3170]:   [TOTEM ] totemknet initialized
Sep 12 16:11:03 pve corosync[3170]:   [KNET  ] pmtud: MTU manually set to: 0
Sep 12 16:11:03 pve corosync[3170]:   [KNET  ] common: crypto_nss.so has been loaded from /usr/lib/x86_64-linux-gnu/kronosnet/crypto_nss.so
Sep 12 16:11:03 pve corosync[3170]:   [SERV  ] Service engine loaded: corosync configuration map access [0]
Sep 12 16:11:03 pve corosync[3170]:   [QB    ] server name: cmap
Sep 12 16:11:03 pve corosync[3170]:   [SERV  ] Service engine loaded: corosync configuration service [1]
Sep 12 16:11:03 pve corosync[3170]:   [QB    ] server name: cfg
Sep 12 16:11:03 pve corosync[3170]:   [SERV  ] Service engine loaded: corosync cluster closed process group service v1.01 [2]
Sep 12 16:11:03 pve corosync[3170]:   [QB    ] server name: cpg
Sep 12 16:11:03 pve corosync[3170]:   [SERV  ] Service engine loaded: corosync profile loading service [4]
Sep 12 16:11:03 pve corosync[3170]:   [SERV  ] Service engine loaded: corosync resource monitoring service [6]
Sep 12 16:11:03 pve corosync[3170]:   [WD    ] Watchdog not enabled by configuration
Sep 12 16:11:03 pve corosync[3170]:   [WD    ] resource load_15min missing a recovery key.
Sep 12 16:11:03 pve corosync[3170]:   [WD    ] resource memory_used missing a recovery key.
Sep 12 16:11:03 pve corosync[3170]:   [WD    ] no resources configured.
Sep 12 16:11:03 pve corosync[3170]:   [SERV  ] Service engine loaded: corosync watchdog service [7]
Sep 12 16:11:03 pve corosync[3170]:   [QUORUM] Using quorum provider corosync_votequorum
Sep 12 16:11:03 pve corosync[3170]:   [QUORUM] This node is within the primary component and will provide service.
Sep 12 16:11:03 pve corosync[3170]:   [QUORUM] Members[0]:
Sep 12 16:11:03 pve corosync[3170]:   [SERV  ] Service engine loaded: corosync vote quorum service v1.0 [5]
Sep 12 16:11:03 pve corosync[3170]:   [QB    ] server name: votequorum
Sep 12 16:11:03 pve corosync[3170]:   [SERV  ] Service engine loaded: corosync cluster quorum service v0.1 [3]
Sep 12 16:11:03 pve corosync[3170]:   [QB    ] server name: quorum
Sep 12 16:11:03 pve corosync[3170]:   [TOTEM ] Configuring link 0
Sep 12 16:11:03 pve corosync[3170]:   [TOTEM ] Configured link number 0: local addr: 192.168.1.10, port=5405
Sep 12 16:11:03 pve corosync[3170]:   [KNET  ] host: host: 2 (passive) best link: 0 (pri: 1)
Sep 12 16:11:03 pve corosync[3170]:   [KNET  ] host: host: 2 has no active links
Sep 12 16:11:03 pve corosync[3170]:   [KNET  ] host: host: 2 (passive) best link: 0 (pri: 1)
Sep 12 16:11:03 pve corosync[3170]:   [KNET  ] host: host: 2 has no active links
Sep 12 16:11:03 pve corosync[3170]:   [KNET  ] host: host: 2 has no active links
Sep 12 16:11:03 pve corosync[3170]:   [KNET  ] host: host: 3 has no active links
Sep 12 16:11:03 pve corosync[3170]:   [KNET  ] host: host: 3 has no active links
Sep 12 16:11:03 pve corosync[3170]:   [KNET  ] host: host: 3 has no active links
Sep 12 23:33:38 pve systemd-journald[465]: [] Suppressed 10 messages from corosync.service
Sep 12 23:33:38 pve corosync-cfgtool[906418]: Shutting down corosync
Sep 12 23:33:38 pve corosync[3170]:   [MAIN  ] Node was shut down by a signal
Sep 12 23:33:38 pve corosync[3170]:   [SERV  ] Unloading all Corosync service engines.
Sep 12 23:33:38 pve corosync[3170]:   [QB    ] withdrawing server sockets
Sep 12 23:33:38 pve corosync[3170]:   [SERV  ] Service engine unloaded: corosync vote quorum service v1.0
Sep 12 23:33:38 pve corosync[3170]:   [CFG   ] Node 1 was shut down by sysadmin
Sep 12 23:33:38 pve corosync[3170]:   [QB    ] withdrawing server sockets
Sep 12 23:33:38 pve corosync[3170]:   [SERV  ] Service engine unloaded: corosync configuration map access
Sep 12 23:33:38 pve corosync[3170]:   [QB    ] withdrawing server sockets
Sep 12 23:33:38 pve corosync[3170]:   [SERV  ] Service engine unloaded: corosync configuration service
Sep 12 23:33:38 pve corosync[3170]:   [QB    ] withdrawing server sockets
Sep 12 23:33:38 pve corosync[3170]:   [SERV  ] Service engine unloaded: corosync cluster closed process group service v1.01
Sep 12 23:33:38 pve corosync[3170]:   [QB    ] withdrawing server sockets
Sep 12 23:33:38 pve corosync[3170]:   [SERV  ] Service engine unloaded: corosync cluster quorum service v0.1
Sep 12 23:33:38 pve corosync[3170]:   [SERV  ] Service engine unloaded: corosync profile loading service
Sep 12 23:33:38 pve corosync[3170]:   [SERV  ] Service engine unloaded: corosync resource monitoring service
Sep 12 23:33:38 pve corosync[3170]:   [SERV  ] Service engine unloaded: corosync watchdog service
Sep 12 23:33:39 pve corosync[3170]:   [KNET  ] link: Resetting MTU for link 0 because host 1 joined
Sep 12 23:33:39 pve corosync[3170]:   [MAIN  ] Corosync Cluster Engine exiting normally
-- Boot b6dd5ef05f15484d9bfb46ad05dc517a --
Sep 12 23:35:53 pve corosync[1999]:   [MAIN  ] Corosync Cluster Engine  starting up
Sep 12 23:35:53 pve corosync[1999]:   [MAIN  ] Corosync built-in features: dbus monitoring watchdog systemd xmlconf vqsim nozzle snmp pie relro bindnow
Sep 12 23:35:53 pve corosync[1999]:   [TOTEM ] Initializing transport (Kronosnet).
Sep 12 23:35:53 pve corosync[1999]:   [TOTEM ] totemknet initialized
Sep 12 23:35:53 pve corosync[1999]:   [KNET  ] pmtud: MTU manually set to: 0
Sep 12 23:35:53 pve corosync[1999]:   [KNET  ] common: crypto_nss.so has been loaded from /usr/lib/x86_64-linux-gnu/kronosnet/crypto_nss.so
Sep 12 23:35:53 pve corosync[1999]:   [SERV  ] Service engine loaded: corosync configuration map access [0]
Sep 12 23:35:53 pve corosync[1999]:   [QB    ] server name: cmap
Sep 12 23:35:53 pve corosync[1999]:   [SERV  ] Service engine loaded: corosync configuration service [1]
Sep 12 23:35:53 pve corosync[1999]:   [QB    ] server name: cfg
Sep 12 23:35:53 pve corosync[1999]:   [SERV  ] Service engine loaded: corosync cluster closed process group service v1.01 [2]
Sep 12 23:35:53 pve corosync[1999]:   [QB    ] server name: cpg
Sep 12 23:35:53 pve corosync[1999]:   [SERV  ] Service engine loaded: corosync profile loading service [4]
Sep 12 23:35:53 pve corosync[1999]:   [SERV  ] Service engine loaded: corosync resource monitoring service [6]
 
Test your network config. Do you use the same network for the cluster and for other services?
There are signs of a misconfiguration or of congestion in the cluster network:

Code:
Jan 07 07:28:26 pve corosync[2600]:   [KNET  ] host: host: 2 has no active links
Code:
Sep 12 23:33:38 pve corosync-cfgtool[906418]: Shutting down corosync
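To narrow that down, the knet link state and basic reachability can be checked directly on the node; corosync-cfgtool is part of the standard corosync package, and the addresses below are only placeholders for your cluster ring IPs:
Code:
# show the local node id and the state of each knet link to every other node
corosync-cfgtool -s

# basic reachability test towards the other cluster nodes (example addresses)
ping -c 3 192.168.1.11
ping -c 3 192.168.1.12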
 
The problem is that this node is part of a cluster, but cannot connect to the remaining cluster nodes on any of the configured networks:
host: host: 2 has no active links

The result is that the service/protocol for the cluster communication is not working, and because of that the Proxmox Cluster File System (pmxcfs), which provides the /etc/pve directory containing all Proxmox VE specific configuration, will not work either.

So if this node is supposed to be part of a cluster, make sure it can talk to the remaining cluster nodes.
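Whether this node currently sees the other members and has quorum can be verified with pvecm; the output will of course depend on the actual cluster:
Code:
# vote, membership and quorum information as seen from this node
pvecm status

# list the configured nodes and their online state from this node's point of view
pvecm nodes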
 
This is intended. I have 3 PVE hosts and 2 of them are down most of the time. I updated the quorum value accordingly.
But does this have something to do with a missing /run/pve dir?
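For context, the usual way to do that on a node that intentionally runs alone is something like the following; the exact value depends on how many nodes are actually up, and it only makes sense if the other nodes really stay down, otherwise there is a risk of a split brain:
Code:
# tell votequorum to expect only one vote, so this single node stays quorate
pvecm expected 1
As far as I know this is only a runtime setting, so it has to be repeated after a reboot or corosync restart.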
 
My use case is to share a VM with my other PCs. This way I don't have to update three OSes, just one.
When the VM is started on another host, it uses the disk image from the "main" node.
Well, that's another chapter, but I never had issues with that.
The VM uses GPU passthrough from the host.
 
This is intended. I have 3 PVE hosts and 2 of them are down most of the time. I updated the quorum value accordingly.
Well, that is a setup that is not officially supported, so there might be some edge cases, and we don't have a lot of experience with it.

Regarding the /run/pve directory: is there a chance that the permissions are not as expected at some level?

The /run directory should have the following permissions:
Code:
ls -la / | grep run
drwxr-xr-x  53 root root 1880 Jan  7 00:18 run
and /run/pve:
Code:
ls -la /run | grep pve
drwxr-x---  2 root              www-data     80 Jan  7 09:35 pve
If you create it manually, chances are that the group owner is not correct, and that can interfere when a service running as www-data wants to write something in there.

It should be created automatically though! So please check that the packages are at the latest version and that there are no modifications to the host that could interfere here.
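If the directory was already created by hand with a different owner, it can be brought in line with the listing above (just a sketch, adjust as needed):
Code:
mkdir -p /run/pve
chown root:www-data /run/pve
chmod 0750 /run/pve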
 
I create /run/pve as root and the CT startup works, but if I reboot now, the /run/pve dir will disappear again. I can guarantee that.
And yes, it's an edge case, but a good one for having a single image for X number of hosts. Maybe someone can come up with another idea.
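One idea I am considering as a stopgap is a systemd-tmpfiles entry that recreates the dir on every boot with the ownership from the listing above (untested, the file name is my own choice):
Code:
# /etc/tmpfiles.d/pve-run.conf
# type  path       mode  user  group     age
d       /run/pve   0750  root  www-data  -
It can be applied immediately with systemd-tmpfiles --create /etc/tmpfiles.d/pve-run.conf and is processed automatically at boot.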
 
Also, having PVE nodes offline shouldn't be an "edge case" at all, since there is a WOL option built in.
A regular Proxmox VE cluster needs the majority of the nodes to be online to be fully functional. With 2/3 nodes down, that is not the case.
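The arithmetic is easy to check on the node itself; corosync-quorumtool ships with corosync:
Code:
# shows expected votes, total votes and the computed quorum threshold
corosync-quorumtool -s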
 
Code:
root@pve:~# systemctl cat pve-container@             

# /usr/lib/systemd/system/pve-container@.service
# based on lxc@.service, but without an install section because
# starting and stopping should be initiated by PVE code, not
# systemd.
[Unit]
Description=PVE LXC Container: %i
DefaultDependencies=No
After=lxc.service
Wants=lxc.service
Documentation=man:lxc-start man:lxc man:pct

[Service]
Type=simple
Delegate=yes
KillMode=mixed
TimeoutStopSec=120s
ExecStart=/usr/bin/lxc-start -F -n %i
ExecStop=/usr/share/lxc/pve-container-stop-wrapper %i
# Environment=BOOTUP=serial
# Environment=CONSOLETYPE=serial
# Prevent container init from putting all its output into the journal
StandardOutput=null
StandardError=file:/run/pve/ct-%i.stderr

I see this is the default service template, which wants to write to /run/pve. /run is a tmpfs, so /run/pve is gone after every reboot. So what is responsible for creating that dir on boot?
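I guess I can check whether anything on the system is supposed to create it via systemd like this (read-only diagnostics, nothing is changed):
Code:
# look for a tmpfiles.d entry covering /run/pve
grep -rs "/run/pve" /etc/tmpfiles.d/ /run/tmpfiles.d/ /usr/lib/tmpfiles.d/

# check whether any of the PVE units declare a RuntimeDirectory or reference /run/pve
systemctl cat pve-container@ pve-cluster pve-guests pvedaemon 2>/dev/null | grep -i "RuntimeDirectory\|/run/pve"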