pvedaemon cannot start

marson · Nov 16, 2017

Hello

First of all, this is my first post, so hello to everybody. I am also sorry for my poor English; it is not my native language, so I apologize in advance.

I have a problem and I kindly ask for your help. Yesterday I started a backup of the VMs and it got stuck at some point. I think this was because my storage VPS had an outage at the same time. Today I cancelled the backup through the GUI by clicking the stop button in the backup detail window, and that is when the problem started. I can no longer use the Proxmox VE GUI because it keeps showing my node as offline, but all VPSes and services on it work fine, and I can still SSH to the node and to the VMs.

First, for clarity, here is the pveversion --verbose output:

root@node:~# pveversion --verbose
proxmox-ve: 4.4-96 (running kernel: 4.4.83-1-pve)
pve-manager: 4.4-18 (running version: 4.4-18/ef2610e8)
pve-kernel-4.4.44-1-pve: 4.4.44-84
pve-kernel-4.4.83-1-pve: 4.4.83-96
lvm2: 2.02.116-pve3
corosync-pve: 2.4.2-2~pve4+1
libqb0: 1.0.1-1
pve-cluster: 4.0-53
qemu-server: 4.0-112
pve-firmware: 1.1-11
libpve-common-perl: 4.0-96
libpve-access-control: 4.0-23
libpve-storage-perl: 4.0-76
pve-libspice-server1: 0.12.8-2
vncterm: 1.3-2
pve-docs: 4.4-4
pve-qemu-kvm: 2.9.0-5~pve4
pve-container: 1.0-101
pve-firewall: 2.0-33
pve-ha-manager: 1.0-41
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u3
lxc-pve: 2.0.7-4
lxcfs: 2.0.6-pve1
criu: 1.6.0-1
novnc-pve: 0.5-9
smartmontools: 6.5+svn4324-1~pve80
root@node:~#

After some time searching this forum for a solution and debugging, I noticed that the pvedaemon process is down and I cannot start it again. When I try, I get:

root@node:~# systemctl start pvedaemon
Job for pvedaemon.service failed. See 'systemctl status pvedaemon.service' and 'journalctl -xn' for details.

and systemctl status pvedaemon.service gives me the following:

root@node:~# systemctl status pvedaemon.service
● pvedaemon.service - PVE API Daemon
Loaded: loaded (/lib/systemd/system/pvedaemon.service; enabled)
Active: failed (Result: timeout) since czw 2017-11-16 14:38:26 CET; 4min 15s ago
Process: 702 ExecStop=/usr/bin/pvedaemon stop (code=exited, status=0/SUCCESS)
Process: 21317 ExecStart=/usr/bin/pvedaemon start (code=exited, status=255)
Main PID: 15217 (code=exited, status=0/SUCCESS)

lis 16 14:35:26 node pvedaemon[21317]: start failed - unable to create socket - Address already in use
lis 16 14:35:26 node pvedaemon[21317]: start failed - unable to create socket - Address already in use
lis 16 14:35:26 node systemd[1]: pvedaemon.service: control process exited, code=exited status=255
lis 16 14:36:56 node systemd[1]: pvedaemon.service stop-final-sigterm timed out. Killing.
lis 16 14:38:26 node systemd[1]: pvedaemon.service still around after final SIGKILL. Entering failed mode.
lis 16 14:38:26 node systemd[1]: Failed to start PVE API Daemon.
lis 16 14:38:26 node systemd[1]: Unit pvedaemon.service entered failed state.
root@node:~#

So I assumed that something was already bound to the pvedaemon port, and I tried:

root@node:~# ps -aux |grep pvedaemon
root 15218 0.0 0.5 338792 97280 ? D lis15 0:02 pvedaemon worker
root 15220 0.0 0.5 345224 97256 ? D lis15 0:00 pvedaemon worker
root 21778 0.0 0.0 13968 2088 pts/3 S+ 14:45 0:00 grep pvedaemon
root@node:~#

Then I tried to kill the PIDs listed above with kill -9 15218 and kill -9 15220, but no luck: those processes are still alive. I also tried the pkill command, with the same result.

I read somewhere on this forum that pvedaemon runs on port 85, so I checked it with:

root@node:~# netstat -lntp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:36490 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:111 0.0.0.0:* LISTEN 1289/rpcbind
tcp 0 0 127.0.0.1:85 0.0.0.0:* LISTEN 15218/pvedaemon wor
tcp 0 0 188.165.238.227:53 0.0.0.0:* LISTEN 1317/named
tcp 0 0 127.0.0.1:53 0.0.0.0:* LISTEN 1317/named
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 1321/sshd
tcp 0 0 0.0.0.0:3128 0.0.0.0:* LISTEN 1718/spiceproxy
tcp 0 0 127.0.0.1:25 0.0.0.0:* LISTEN 1509/master
tcp 0 0 127.0.0.1:953 0.0.0.0:* LISTEN 1317/named
tcp 0 0 0.0.0.0:40066 0.0.0.0:* LISTEN 1299/rpc.statd
tcp6 0 0 :::56204 :::* LISTEN 1299/rpc.statd
tcp6 0 0 :::111 :::* LISTEN 1289/rpcbind
tcp6 0 0 :::53 :::* LISTEN 1317/named
tcp6 0 0 :::22 :::* LISTEN 1321/sshd
tcp6 0 0 ::1:25 :::* LISTEN 1509/master
tcp6 0 0 ::1:953 :::* LISTEN 1317/named
tcp6 0 0 :::33020 :::* LISTEN -

As you can see, a pvedaemon worker is listed there as well. However, when I check its status I get:

root@node:~# pvedaemon status
stopped

At this point I am lost. I also tried restarting pvestatd and pveproxy, and both restarted fine, although pveproxy needed a long while. Can anyone help me?

Edit: in case it makes any difference, I have only 2 VPSes on this node, and both are KVM based.

dcsapak · Nov 17, 2017

If your backup target was NFS, it can happen that the hanging process cannot be stopped (not even with kill -9).
If that happens, you have to reboot your host to get rid of those processes.
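
The D in the STAT column of the ps output earlier in the thread means uninterruptible sleep: the process is blocked inside the kernel, typically on I/O, and a signal such as kill -9 is generally not acted on until that call returns. A minimal sketch for listing such processes together with the kernel function they are waiting in, assuming the procps version of ps; the function names filter_d_state and list_d_state are made up for illustration and are not part of Proxmox:

```shell
#!/bin/sh
# Keep the header line plus every row whose STAT column starts with "D"
# (uninterruptible sleep). Reads "ps -eo pid,stat,..." style output on stdin.
filter_d_state() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# List D-state processes with the kernel wait channel (WCHAN) and command line.
# wchan:32 widens the WCHAN column so longer function names are not cut off.
list_d_state() {
    ps -eo pid,stat,wchan:32,args | filter_d_state
}
```

Running list_d_state as root on the affected node should show the stuck pvedaemon workers. If the WCHAN column points at NFS or RPC functions, that would be consistent with a hung NFS mount, although the exact names depend on the kernel version.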

marson · Nov 17, 2017

dcsapak said:
If your backup target was NFS, it can happen that the hanging process cannot be stopped (not even with kill -9).
If that happens, you have to reboot your host to get rid of those processes.

Yes, it is NFS storage. Does this problem still exist in the Proxmox 5 branch? Also, can anyone recommend a better solution for this, maybe sshfs or something else? If it matters, my backup storage is a KVM VPS which is based on Ceph storage.
