command 'systemctl show pve-cluster' failed: exit code 1

Mar 26, 2023
Hi,

Proxmox 7.2, Dell PowerEdge R640, first member of cluster of 5.

Since this morning I have been seeing the errors listed below in the syslog file. Running the commands manually produces the following error:

root@pm1:~# systemctl show chrony
Failed to get properties: Transport endpoint is not connected

Backups to local disk are failing too, with this error:

ERROR: Backup of VM 108 failed - start failed: org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.

Virtual machines are all running fine, network connectivity seems fine, network mount points are fine (I can write to the mounted filesystems), and I can see nothing useful in dmesg.

Any ideas where to start to fix this, other than a reboot? (I am currently moving all the VMs off the server.)


Mar 26 10:44:19 pm1-bt pvedaemon[31780]: command 'systemctl show chrony' failed: exit code 1
Mar 26 10:45:49 pm1-bt pvedaemon[31780]: command 'systemctl show corosync' failed: exit code 1
Mar 26 10:47:01 pm1-bt pmxcfs[687781]: [dcdb] notice: data verification successful
Mar 26 10:47:19 pm1-bt pvedaemon[31780]: command 'systemctl show cron' failed: exit code 1
Mar 26 10:48:49 pm1-bt pvedaemon[31780]: command 'systemctl show ksmtuned' failed: exit code 1
Mar 26 10:50:19 pm1-bt pvedaemon[31780]: command 'systemctl show postfix@-' failed: exit code 1
Mar 26 10:51:49 pm1-bt pvedaemon[31780]: command 'systemctl show pve-cluster' failed: exit code 1
Mar 26 10:53:19 pm1-bt pvedaemon[31780]: command 'systemctl show pve-firewall' failed: exit code 1
Mar 26 10:54:49 pm1-bt pvedaemon[31780]: command 'systemctl show pve-ha-crm' failed: exit code 1
Mar 26 10:56:19 pm1-bt pvedaemon[31780]: command 'systemctl show pve-ha-lrm' failed: exit code 1
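
For anyone hitting the same pattern, a rough sketch of how the affected units could be pulled out of a log like the one above. The function name failed_show_units is made up for illustration, and the log path in the usage comment is an assumption:

```shell
#!/bin/sh
# Read syslog-style lines on stdin and print each unit named in a
# "command 'systemctl show <unit>' failed" message once, sorted.
# failed_show_units is a hypothetical helper name, not a Proxmox tool.
failed_show_units() {
    sed -n "s/.*command 'systemctl show \([^']*\)' failed: exit code [0-9][0-9]*.*/\1/p" | sort -u
}

# Usage (path is an assumption; adjust to where your syslog lives):
#   failed_show_units < /var/log/syslog
```

If the list covers many unrelated units, as it does here, that hints that systemd itself is not answering rather than any one service being broken.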


Note: This server runs Proxmox 7.2, whereas all the other nodes run 6.3. It's been working for 245 days without issue, but could that difference be the reason for the failure?

Best Regards,
Nigel
 
This sounds like you are running into the current systemd daylight saving issue - is your timezone in Ireland by chance?

If yes, changing the timezone to UTC should help as a workaround. You can do so by running:
Code:
dpkg-reconfigure tzdata
Select "none of the above" and then UTC.

A fix is already available in the no-subscription repository and should reach the subscription repository as well this week.
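
If you want to check a node without going through the interactive dialog, something like the following might work as a starting point. It is only a sketch: the function names are invented here, it assumes /etc/localtime is a symlink into a zoneinfo directory (the usual Debian layout), and the non-interactive switch to UTC assumes tzdata picks up the new symlink on reconfigure. It reads the symlink directly so it does not depend on systemd answering.

```shell
#!/bin/sh
# Print the configured timezone name, or "unknown".
# Reads the /etc/localtime symlink directly, so no systemd/D-Bus call is needed.
# Assumption: the symlink target contains ".../zoneinfo/<Area>/<Zone>".
current_tz() {
    localtime="${1:-/etc/localtime}"
    target=$(readlink -f "$localtime" 2>/dev/null) || target=""
    case "$target" in
        */zoneinfo/*) printf '%s\n' "${target#*/zoneinfo/}" ;;
        *) printf 'unknown\n' ;;
    esac
}

# Return 0 if the given name is one of the tz database names for Ireland.
is_irish_tz() {
    case "$1" in
        Europe/Dublin|Eire) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical non-interactive equivalent of choosing UTC in the dialog.
# Needs root; defined here but deliberately not called.
set_tz_utc() {
    ln -sf /usr/share/zoneinfo/Etc/UTC /etc/localtime
    dpkg-reconfigure -f noninteractive tzdata
}

# Usage:
#   if is_irish_tz "$(current_tz)"; then echo "affected timezone"; fi
```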
 
Thank you. I did suspect a possible daylight saving issue, as I received one email with "Failed to restart atop.service: Transport endpoint is not connected" at 3 minutes past midnight, with midnight being the time change for daylight saving.

Unfortunately the server did not come back up cleanly after a reboot. I could ping the server, but nothing else was accessible (e.g. ssh or web) and the console was non-interactive. For now, I'm happy to wipe the server and start afresh with Proxmox 7.4.

Thanks again, always good to know what the reason for failure is and reading the following

https://news.ycombinator.com/item?id=35308796&ref=upstract.com

suggests setting time to UTC is the best course of action to prevent issues like this going forward.

Regards,
Nigel
 
Update: Run dpkg-reconfigure tzdata and change to UTC BEFORE you do your server updates. On another server with the same issue, the updates had lots of problems restarting things, e.g.

Setting up udev (247.3-7+1-pmx11u1) ...
Failed to reload daemon: Transport endpoint is not connected
Failed to restart udev.service: Transport endpoint is not connected
See system logs and 'systemctl status udev.service' for details.
invoke-rc.d: initscript udev, action "restart" failed.
Failed to get properties: Transport endpoint is not connected
dpkg: error processing package udev (--configure):
installed udev package post-installation script subprocess returned error exit status 1
Setting up pve-i18n (2.11-1) ...
Setting up libgnutlsxx28:amd64 (3.7.1-5+deb11u3) ...
Setting up lxc-pve (5.0.2-2) ...
Installing new version of config file /etc/apparmor.d/abstractions/lxc/start-container ...
Installing new version of config file /etc/apparmor.d/lxc/lxc-default-cgns ...
Failed to reload daemon: Transport endpoint is not connected
Failed to get unit file state for lxc-monitord.service: Transport endpoint is not connected
Failed to retrieve unit state: Transport endpoint is not connected
lxc-monitord.service is a disabled or a static unit, not starting it.
Failed to get unit file state for lxc-net.service: Transport endpoint is not connected
Failed to retrieve unit state: Transport endpoint is not connected
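
To avoid ending up with half-configured packages like the above, one option is a small guard that checks whether systemd answers before starting the upgrade. This is a sketch, not a tested procedure: pid1_responds and the SYSTEMCTL override are names made up here, and the 10 second limit is an arbitrary choice.

```shell
#!/bin/sh
# Return 0 if systemd answers a simple property query within 10 seconds,
# non-zero if the call fails (e.g. "Transport endpoint is not connected")
# or hangs until the timeout.
# SYSTEMCTL is a hypothetical override, mainly so the check can be tried
# against a stand-in command.
pid1_responds() {
    timeout 10 "${SYSTEMCTL:-systemctl}" show --property=Version >/dev/null 2>&1
}

# Usage:
#   if pid1_responds; then
#       apt-get update && apt-get dist-upgrade
#   else
#       echo "systemd is not responding; fix the timezone and reboot first" >&2
#   fi
```

If the upgrade has already failed part-way, dpkg --configure -a is the usual Debian way to finish configuring the leftover packages once systemd is responding again.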
 
Just spent ages trying to firefight this. Proxmox was hosting its containers etc., but when I tried to log in to it this morning everything was hanging. I could not reboot, so I physically powered it off and on, but it's been tortuous trying to chase the issue with such slow response to commands. The bash-completion package also didn't help, as auto completion was ridiculously slow too.

Yes I'm in Ireland and yes, the "dpkg-reconfigure tzdata" to UTC worked.

I also took out bash-completion: apt purge bash-completion

Thanks for your help - I'm currently verbally abusing systemd for wasting most of today. I might have to go out for a walk around the block.
 
OK, this is still a bug! Thanks so much for creating this thread a year ago.
Two hours over the past 24 trying to work away on this looking at logs, trying to figure out why nothing would work.
Learned lots and something as simple as this was the issue.

Adding some extra symptoms here to help with keywords for the next person in 2025!

Symptoms started after trying to destroy a container. The whole web GUI started to run slowly and bug out. I couldn't start or stop anything, everything taking an age to load.
TASK ERROR: can't lock file '/var/lock/qemu-server/lock-110.conf' - got timeout
All devices showing a small grey question mark
SSH was very slow to connect
Rebooting wasn't working from the UI or SSH
Physically pulling the plug eventually got SSH running a little faster - bad idea but I ran out of choices and it's only a home lab
No new logs and Proxmox UI still not loading after reboot. Only ports 22 and 111 available (3128 squid-http not open)
Eventually, when trying to run service pve-cluster stop, I got

Failed to stop pve-cluster.service: Transport endpoint is not connected

This led me to this thread, which happened to just say 'is your timezone in Ireland by chance?'
Fixed with a bit of the aul dpkg-reconfigure tzdata to UTC, apt purge bash-completion and a reboot now

Now I'm back to where I was - cheers lads!
 
