pvedaemon crashed and won't start

wahmed

Hello,
While I was trying to stop an LXC container, the pvedaemon and pvestatd services crashed. The node has several KVM VMs, which seem to be running just fine, but the web GUI showed the node as offline. I restarted pveproxy and pvestatd, but when I try to restart pvedaemon it gives this error:
Code:
start failed - unable to create socket - Address already in use
I tried to kill pvedaemon with:
# pkill pvedaemon
# killall pvedaemon
But both of them said that no pvedaemon process exists. htop shows no pvedaemon running either. After restarting pvestatd I can see the node online in the GUI, but it goes offline again after several minutes. SSH access and the running VMs are unaffected.
Any idea how to restart pvedaemon?
 
The issue seems to be narrowing down to LXC containers in my case. I was able to reproduce the error on a different node with no VMs. I tried to create a new LXC container on that node with storage on a Ceph pool with KRBD enabled. The node got stuck creating the container and services were killed. The node can still be accessed through SSH, but that's about it.
I tried the same thing on a Proxmox+Ceph node. It also got stuck on LXC creation and crashed all services, including the Ceph OSDs, which were marked down and out.

Any clue why this is happening with LXC? Has anybody else had this issue?
 
That means pvedaemon is not running. What is displayed if you run pvedaemon from the command line?
pvedaemon --debug
pvedaemon won't run from the command line. It fails with an instruction to see journalctl -xn for details. Here is a screenshot of the journalctl output:
pvedaemon-crash.PNG
 
What do you see when running ps -ef |grep pvedaemon before and after you have tried to start pvedaemon from the command line?
 
Do you have something else running which binds to the same port as pvedaemon?
In Proxmox 4 pvedaemon binds to port 85 which means it can only be run as root.
 
Nothing else is running as far as I know. All nodes are standard Proxmox installations. No services or programs have been installed manually.
 
...
Code:
start failed - unable to create socket - Address already in use
...
Hi Wasim,
Which process is using that address (port)?
Code:
# ss -pat | grep :85
LISTEN     0      128    127.0.0.1:85                       *:*                     users:(("pvedaemon worke",pid=1591,fd=6),("pvedaemon worke",pid=1590,fd=6),("pvedaemon worke",pid=1589,fd=6),("pvedaemon",pid=1588,fd=6))
Is localhost correctly defined in /etc/hosts?

Udo
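
When many processes share the socket, the users:(...) field in the output above can be reduced to one PID and process name per line. This is only a minimal sketch: port_holders is a hypothetical helper name, not part of Proxmox or iproute2, and it assumes nothing beyond the ("name",pid=N,fd=M) layout seen in the output above.

```shell
# Hypothetical helper (not a Proxmox or iproute2 tool).
# Reads `ss -p` style lines on stdin and prints "PID NAME" for every
# process listed in the users:(...) field.
port_holders() {
    # Pull out each ("name",pid=N fragment, then reorder it as "N name".
    grep -o '("[^"]*",pid=[0-9]*' \
        | sed -E 's/^\("([^"]*)",pid=([0-9]+)$/\2 \1/'
}

# Usage, as root, with the same command as in the post above:
#   ss -pat | grep :85 | port_holders
```

Fed with the ss command from the post above, this lists whichever processes currently hold the socket, including ones that are not named pvedaemon and so would not be matched by pkill pvedaemon.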
 
I'm going to piggy-back onto this thread as I have the same problem and symptoms, but if asked to I will start my own thread.
I started a new LXC container and then stopped it, and both tasks hung.
I then tried service pvedaemon restart, exactly like symmcom.

ss -pat | grep :85 | more
LISTEN 80 128 127.0.0.1:85 *:* users:(("lxc-info",pid=20751,fd=6),("lxc-info",pid=20452,fd=6),("lxc-info",pid=19574,fd=6))
CLOSE-WAIT 839 0 127.0.0.1:85 127.0.0.1:49644
CLOSE-WAIT 860 0 127.0.0.1:85 127.0.0.1:48178
CLOSE-WAIT 965 0 127.0.0.1:85 127.0.0.1:48624
CLOSE-WAIT 878 0 127.0.0.1:85 127.0.0.1:48426
CLOSE-WAIT 876 0 127.0.0.1:85 127.0.0.1:48458
CLOSE-WAIT 878 0 127.0.0.1:85 127.0.0.1:48384
CLOSE-WAIT 892 0 127.0.0.1:85 127.0.0.1:48608
CLOSE-WAIT 843 0 127.0.0.1:85 127.0.0.1:48656
CLOSE-WAIT 865 0 127.0.0.1:85 127.0.0.1:48278
CLOSE-WAIT 1031 0 127.0.0.1:85 127.0.0.1:48640
CLOSE-WAIT 880 0 127.0.0.1:85 127.0.0.1:48460
CLOSE-WAIT 878 0 127.0.0.1:85 127.0.0.1:48268
CLOSE-WAIT 862 0 127.0.0.1:85 127.0.0.1:48610
CLOSE-WAIT 892 0 127.0.0.1:85 127.0.0.1:48466
CLOSE-WAIT 862 0 127.0.0.1:85 127.0.0.1:48552
CLOSE-WAIT 876 0 127.0.0.1:85 127.0.0.1:48628
CLOSE-WAIT 860 0 127.0.0.1:85 127.0.0.1:48214
.
.
.
.

My localhost is correctly defined and nothing has changed. Everything was running fine; then I created a new LXC container and started it, it stalled, I stopped it, and then this happened.
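
For a listing like the one above, it may help to reduce the CLOSE-WAIT lines to a single number that can be compared between checks. A minimal sketch, assuming the column order shown above (state, Recv-Q, Send-Q, local address, peer address); count_close_wait is a hypothetical helper name, not an existing tool.

```shell
# Hypothetical helper (not a Proxmox or iproute2 tool).
# $1    = local port to look at
# stdin = `ss -at` style lines: state recv-q send-q local-addr peer-addr
count_close_wait() {
    # Count lines in CLOSE-WAIT whose local address ends in ":<port>".
    awk -v port="$1" '
        $1 == "CLOSE-WAIT" && $4 ~ (":" port "$") { n++ }
        END { print n + 0 }
    '
}

# Usage, with the same command as in the post above:
#   ss -pat | grep :85 | count_close_wait 85
```

CLOSE-WAIT means the peer has closed the connection and the local side has not yet closed its end, so a count that keeps growing would suggest that whatever holds the listening socket is not servicing its connections.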
 
The command also showed me many CLOSE-WAIT lines on localhost.
Looks like Ovidiu has exactly the same problem. Mine also started after stopping a newly created LXC container. I assume I should not be seeing the CLOSE-WAIT entries shown in the screenshot below:
pvedaemon-crash2.PNG
I'm trying my best to find a solution without a reboot, as this could also happen on a Proxmox node heavily populated with VMs.
 
Thanks for confirming symmcom, but guess what: I woke up right now, checked and everything is working fine. HUH!?

root@james:~# service pvedaemon status
● pvedaemon.service - PVE API Daemon
Loaded: loaded (/lib/systemd/system/pvedaemon.service; enabled)
Active: active (running) since Mon 2015-12-07 06:25:04 CET; 5min ago
Process: 25178 ExecStop=/usr/bin/pvedaemon stop (code=exited, status=0/SUCCESS)
Process: 31436 ExecStart=/usr/bin/pvedaemon start (code=exited, status=0/SUCCESS)
Main PID: 31446 (pvedaemon)
CGroup: /system.slice/pvedaemon.service
├─31446 pvedaemon
├─31447 pvedaemon worker
├─31448 pvedaemon worker
└─31449 pvedaemon worker

Dec 07 06:25:04 james pvedaemon[31446]: starting server
Dec 07 06:25:04 james pvedaemon[31446]: starting 3 worker(s)
Dec 07 06:25:04 james pvedaemon[31446]: worker 31447 started
Dec 07 06:25:04 james pvedaemon[31446]: worker 31448 started
Dec 07 06:25:04 james pvedaemon[31446]: worker 31449 started
Dec 07 06:25:04 james systemd[1]: Started PVE API Daemon.
Dec 07 06:29:26 james pvedaemon[31447]: successful auth for user 'root@pam'
 
Unfortunately, even after a few days mine has still not recovered on its own. pvedaemon remains stubborn and won't start. All VMs on the node are still working, though. I can't even migrate them to another node, since the cluster thinks the node is down.
I'm open to more suggestions. I'm really trying to figure this out without a reboot.
 
We uploaded new packages to pve-no-subscription today - please can you test?
My package upgrade seems to be stuck at
"Preparing to unpack .../lxc-pve_1.1.5-5_amd64.deb ..."
Should I press Ctrl+Z and see if I can restart the upgrade process without breaking the node any further? All VMs on the node are still working and SSH access is still available.
 
