pvedaemon cannot start

marson

Nov 16, 2017
Hello

First of all, this is my first post, so hello to everybody. I am also sorry for my poor English; it is not my native language, so I apologize in advance.

I have a problem and kindly ask for your help. Yesterday I started a backup of the VMs and it got stuck at some point, I think because my storage VPS had an outage at the same time. Today I cancelled the backup through the GUI (I simply clicked the stop button in the backup detail window), and that is when the problem started. I can no longer use the Proxmox VE GUI because it keeps showing my node as offline, but all VPSes and services on it work fine, and I can still SSH to the node and to the VMs.

First, for clarity, below is the pveversion --verbose output:

root@node:~# pveversion --verbose
proxmox-ve: 4.4-96 (running kernel: 4.4.83-1-pve)
pve-manager: 4.4-18 (running version: 4.4-18/ef2610e8)
pve-kernel-4.4.44-1-pve: 4.4.44-84
pve-kernel-4.4.83-1-pve: 4.4.83-96
lvm2: 2.02.116-pve3
corosync-pve: 2.4.2-2~pve4+1
libqb0: 1.0.1-1
pve-cluster: 4.0-53
qemu-server: 4.0-112
pve-firmware: 1.1-11
libpve-common-perl: 4.0-96
libpve-access-control: 4.0-23
libpve-storage-perl: 4.0-76
pve-libspice-server1: 0.12.8-2
vncterm: 1.3-2
pve-docs: 4.4-4
pve-qemu-kvm: 2.9.0-5~pve4
pve-container: 1.0-101
pve-firewall: 2.0-33
pve-ha-manager: 1.0-41
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u3
lxc-pve: 2.0.7-4
lxcfs: 2.0.6-pve1
criu: 1.6.0-1
novnc-pve: 0.5-9
smartmontools: 6.5+svn4324-1~pve80
root@node:~#

After some time searching this forum for a solution and debugging, I noticed that the pvedaemon process is down and I cannot start it again. When I try, I get:

root@node:~# systemctl start pvedaemon
Job for pvedaemon.service failed. See 'systemctl status pvedaemon.service' and 'journalctl -xn' for details.

and systemctl status pvedaemon.service gives me the following:

root@node:~# systemctl status pvedaemon.service
● pvedaemon.service - PVE API Daemon
Loaded: loaded (/lib/systemd/system/pvedaemon.service; enabled)
Active: failed (Result: timeout) since czw 2017-11-16 14:38:26 CET; 4min 15s ago
Process: 702 ExecStop=/usr/bin/pvedaemon stop (code=exited, status=0/SUCCESS)
Process: 21317 ExecStart=/usr/bin/pvedaemon start (code=exited, status=255)
Main PID: 15217 (code=exited, status=0/SUCCESS)

lis 16 14:35:26 node pvedaemon[21317]: start failed - unable to create socket - Address already in use
lis 16 14:35:26 node pvedaemon[21317]: start failed - unable to create socket - Address already in use
lis 16 14:35:26 node systemd[1]: pvedaemon.service: control process exited, code=exited status=255
lis 16 14:36:56 node systemd[1]: pvedaemon.service stop-final-sigterm timed out. Killing.
lis 16 14:38:26 node systemd[1]: pvedaemon.service still around after final SIGKILL. Entering failed mode.
lis 16 14:38:26 node systemd[1]: Failed to start PVE API Daemon.
lis 16 14:38:26 node systemd[1]: Unit pvedaemon.service entered failed state.
root@node:~#

so I assumed that something is bound to the pvedaemon port, and I tried:

root@node:~# ps -aux |grep pvedaemon
root 15218 0.0 0.5 338792 97280 ? D lis15 0:02 pvedaemon worker
root 15220 0.0 0.5 345224 97256 ? D lis15 0:00 pvedaemon worker
root 21778 0.0 0.0 13968 2088 pts/3 S+ 14:45 0:00 grep pvedaemon
root@node:~#

and then tried to kill the PIDs listed above with kill -9 15218 and kill -9 15220, but no luck: those processes are still alive. I also tried the pkill command, with the same result.

I read somewhere on this forum that pvedaemon runs on port 85, so I checked it with:

root@node:~# netstat -lntp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:36490 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:111 0.0.0.0:* LISTEN 1289/rpcbind
tcp 0 0 127.0.0.1:85 0.0.0.0:* LISTEN 15218/pvedaemon wor
tcp 0 0 188.165.238.227:53 0.0.0.0:* LISTEN 1317/named
tcp 0 0 127.0.0.1:53 0.0.0.0:* LISTEN 1317/named
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 1321/sshd
tcp 0 0 0.0.0.0:3128 0.0.0.0:* LISTEN 1718/spiceproxy
tcp 0 0 127.0.0.1:25 0.0.0.0:* LISTEN 1509/master
tcp 0 0 127.0.0.1:953 0.0.0.0:* LISTEN 1317/named
tcp 0 0 0.0.0.0:40066 0.0.0.0:* LISTEN 1299/rpc.statd
tcp6 0 0 :::56204 :::* LISTEN 1299/rpc.statd
tcp6 0 0 :::111 :::* LISTEN 1289/rpcbind
tcp6 0 0 :::53 :::* LISTEN 1317/named
tcp6 0 0 :::22 :::* LISTEN 1321/sshd
tcp6 0 0 ::1:25 :::* LISTEN 1509/master
tcp6 0 0 ::1:953 :::* LISTEN 1317/named
tcp6 0 0 :::33020 :::* LISTEN -

and as you can see, the pvedaemon worker is listed there as well. However, when I check its status I get:

root@node:~# pvedaemon status
stopped
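
The mismatch here (a listening socket but a "stopped" status) can be confirmed by filtering the listener list for a single port. This is a minimal sketch, assuming `netstat -lntp` output in the column layout shown above; `port_owner` is a hypothetical helper name, not a Proxmox tool:

```shell
# Hypothetical helper: print the "PID/Program name" column for listeners on a
# given TCP port, reading `netstat -lntp` style output from stdin.
port_owner() {
    # $4 is "Local Address"; match an address ending in ":<port>".
    # Fields 7..NF hold "PID/Program name", which may contain spaces.
    awk -v port="$1" '
        $1 ~ /^tcp/ && $4 ~ (":" port "$") {
            out = $7
            for (i = 8; i <= NF; i++) out = out " " $i
            print out
        }'
}

# Usage (run as root so the PID column is filled in):
#   netstat -lntp | port_owner 85
```

If the PID printed belongs to an old worker rather than a freshly started daemon, that would be consistent with the "Address already in use" error in the journal.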

At this point I am lost. I also tried restarting pvestatd and pveproxy, and both restarted fine, although pveproxy needed a long while. Can anyone help me?

Edit: if this makes any difference, I have only 2 VPSes on this node, and both are KVM based.
 
If your backup target was NFS, it can happen that the hanging process cannot be stopped (not even with kill -9).
If that happens, you have to reboot your host to get rid of those processes.
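
Processes stuck this way show state D (uninterruptible sleep) in ps, as both pvedaemon workers do in the ps output earlier in the thread. Signals, including SIGKILL, are generally not acted on until the blocking I/O call returns. Here is a minimal sketch for listing such processes, assuming a procps-style `ps`; `d_state_rows` is a hypothetical helper name:

```shell
# Hypothetical helper: read `ps -eo pid,stat,...` style output from stdin and
# print every row whose STAT column (field 2) starts with D.
d_state_rows() {
    # NR > 1 skips the header line printed by ps.
    awk 'NR > 1 && $2 ~ /^D/'
}

# Usage (wchan shows the kernel function the process is sleeping in):
#   ps -eo pid,stat,wchan:32,args | d_state_rows
```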
 

Yes, it is NFS storage. Does this problem still exist on the Proxmox 5 branch? Also, can anyone recommend a better solution for this? Maybe sshfs or something else? If it matters, my backup storage is a KVM VPS which is based on Ceph storage.