PVEPROXY will not start after system update, again

brainsoft

Hi everyone,
I was struggling with this last night as I am trying to learn the ropes. I have a couple of machines (single nodes for now, no cluster, no HA) and I did a system update. After that I could not access the PVE site anymore. I did a bunch of digging and found that the guest service and everything dependent on it were no longer running. I messed around with pinning the previous kernel in GRUB but didn't have any luck. Since this is all just training, I just wiped it and started again.
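
(For what it's worth, I've since read that pinning a kernel on PVE is normally done with proxmox-boot-tool rather than by editing GRUB directly. Something like this, if I've understood the tool correctly; the version string below is just an example, not necessarily the one you'd want:)

Code:
# list the kernels the boot tool knows about
proxmox-boot-tool kernel list
# pin one of them as the default for every future boot
proxmox-boot-tool kernel pin 6.8.12-2-pve
# and undo it again later
proxmox-boot-tool kernel unpin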

Today, I did an update on the other machine, this one running a pfSense test, and the exact same thing happened. What am I missing here? Once again this appears to be the exact same problem on a nearly identical setup. I have full SSH into the machine just like last time, so I can run any commands except for the obvious ones that should bring things up. I did find some big breadcrumbs in my digging yesterday, but just couldn't google my way through it.

I know I will need to provide a bunch of info, but I did so much reading yesterday that I don't even know where to start. Running PVE 8.2.8, but I'm not sure what the previous version was... very recent though.

Code:
root@regulus:~# sysctl -a | grep pve
kernel.osrelease = 6.8.12-3-pve


I have no problem reinstalling again, I'm just trying to find the landmines as I continue on my journey.
 
With a clear head, let's try something new. I attempted to reinstall all PVE components in case something had failed in the previous upgrade:

apt install --reinstall $(apt list --installed | grep pve | sed 's/\/.*$//g')
Things seemed to move along fine until:
Code:
Setting up pve-manager (8.2.8) ...
Job for pveproxy.service failed because the control process exited with error code.
See "systemctl status pveproxy.service" and "journalctl -xeu pveproxy.service" for details.
Job for spiceproxy.service failed because the control process exited with error code.
See "systemctl status spiceproxy.service" and "journalctl -xeu spiceproxy.service" for details.

Code:
root@regulus:~# systemctl status pveproxy.service
× pveproxy.service - PVE API Proxy Server
     Loaded: loaded (/lib/systemd/system/pveproxy.service; enabled; preset: enabled)
     Active: failed (Result: exit-code) since Tue 2024-11-19 08:16:28 EST; 3min 33s ago
    Process: 153595 ExecStartPre=/usr/bin/pvecm updatecerts --silent (code=exited, status=0/SUCCESS)
    Process: 153603 ExecStart=/usr/bin/pveproxy start (code=exited, status=255/EXCEPTION)
        CPU: 1.309s

Nov 19 08:16:28 regulus systemd[1]: pveproxy.service: Scheduled restart job, restart counter is at 4.
Nov 19 08:16:28 regulus systemd[1]: Stopped pveproxy.service - PVE API Proxy Server.
Nov 19 08:16:28 regulus systemd[1]: pveproxy.service: Consumed 1.309s CPU time.
Nov 19 08:16:28 regulus systemd[1]: pveproxy.service: Start request repeated too quickly.
Nov 19 08:16:28 regulus systemd[1]: pveproxy.service: Failed with result 'exit-code'.
Nov 19 08:16:28 regulus systemd[1]: Failed to start pveproxy.service - PVE API Proxy Server.

Code:
root@regulus:~# journalctl -xeu pveproxy.service
░░ The job identifier is 3149 and the job result is failed.
Nov 19 08:16:28 regulus systemd[1]: pveproxy.service: Consumed 1.309s CPU time.
░░ Subject: Resources consumed by unit runtime
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ The unit pveproxy.service completed and consumed the indicated resources.
Nov 19 08:16:28 regulus systemd[1]: pveproxy.service: Scheduled restart job, restart counter is at 4.
░░ Subject: Automatic restarting of a unit has been scheduled
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ Automatic restarting of the unit pveproxy.service has been scheduled, as the result for
░░ the configured Restart= setting for the unit.
Nov 19 08:16:28 regulus systemd[1]: Stopped pveproxy.service - PVE API Proxy Server.
░░ Subject: A stop job for unit pveproxy.service has finished
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ A stop job for unit pveproxy.service has finished.
░░
░░ The job identifier is 3331 and the job result is done.
Nov 19 08:16:28 regulus systemd[1]: pveproxy.service: Consumed 1.309s CPU time.
░░ Subject: Resources consumed by unit runtime
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ The unit pveproxy.service completed and consumed the indicated resources.
Nov 19 08:16:28 regulus systemd[1]: pveproxy.service: Start request repeated too quickly.
Nov 19 08:16:28 regulus systemd[1]: pveproxy.service: Failed with result 'exit-code'.
░░ Subject: Unit failed
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ The unit pveproxy.service has entered the 'failed' state with result 'exit-code'.
Nov 19 08:16:28 regulus systemd[1]: Failed to start pveproxy.service - PVE API Proxy Server.
░░ Subject: A start job for unit pveproxy.service has failed
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ A start job for unit pveproxy.service has finished with a failure.
░░
░░ The job identifier is 3331 and the job result is failed.
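
If I understand systemd correctly, once it hits "Start request repeated too quickly" it stops retrying altogether, so the journal fills up with restart-counter noise while the actual error scrolls away. I think something like this should clear the rate limit, force one fresh start attempt, and show whatever it complains about first:

Code:
systemctl reset-failed pveproxy.service
systemctl start pveproxy.service
journalctl -u pveproxy.service -n 50 --no-pager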

Now let's check the SPICE proxy. This one has a different error, about not being able to create a socket.

Code:
root@regulus:~# systemctl status spiceproxy.service
× spiceproxy.service - PVE SPICE Proxy Server
     Loaded: loaded (/lib/systemd/system/spiceproxy.service; enabled; preset: enabled)
     Active: failed (Result: exit-code) since Tue 2024-11-19 08:16:25 EST; 7min ago
    Process: 153573 ExecStart=/usr/bin/spiceproxy start (code=exited, status=255/EXCEPTION)
        CPU: 337ms

Nov 19 08:16:25 regulus systemd[1]: Starting spiceproxy.service - PVE SPICE Proxy Server...
Nov 19 08:16:25 regulus spiceproxy[153573]: start failed - unable to create socket - Cannot assign requested address
Nov 19 08:16:25 regulus spiceproxy[153573]: start failed - unable to create socket - Cannot assign requested address
Nov 19 08:16:25 regulus systemd[1]: spiceproxy.service: Control process exited, code=exited, status=255/EXCEPTION
Nov 19 08:16:25 regulus systemd[1]: spiceproxy.service: Failed with result 'exit-code'.
Nov 19 08:16:25 regulus systemd[1]: Failed to start spiceproxy.service - PVE SPICE Proxy Server.

Code:
root@regulus:~# journalctl -xeu spiceproxy.service
Nov 18 21:32:48 regulus systemd[1]: Starting spiceproxy.service - PVE SPICE Proxy Server...
░░ Subject: A start job for unit spiceproxy.service has begun execution
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ A start job for unit spiceproxy.service has begun execution.
░░
░░ The job identifier is 97.
Nov 18 21:32:48 regulus spiceproxy[867]: start failed - unable to create socket - Cannot assign requested address
Nov 18 21:32:48 regulus spiceproxy[867]: start failed - unable to create socket - Cannot assign requested address
Nov 18 21:32:48 regulus systemd[1]: spiceproxy.service: Control process exited, code=exited, status=255/EXCEPTION
░░ Subject: Unit process exited
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ An ExecStart= process belonging to unit spiceproxy.service has exited.
░░
░░ The process' exit code is 'exited' and its exit status is 255.
Nov 18 21:32:48 regulus systemd[1]: spiceproxy.service: Failed with result 'exit-code'.
░░ Subject: Unit failed
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ The unit spiceproxy.service has entered the 'failed' state with result 'exit-code'.
Nov 18 21:32:48 regulus systemd[1]: Failed to start spiceproxy.service - PVE SPICE Proxy Server.
░░ Subject: A start job for unit spiceproxy.service has failed
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ A start job for unit spiceproxy.service has finished with a failure.
░░
░░ The job identifier is 97 and the job result is failed.
Nov 19 08:16:25 regulus systemd[1]: Starting spiceproxy.service - PVE SPICE Proxy Server...
░░ Subject: A start job for unit spiceproxy.service has begun execution
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ A start job for unit spiceproxy.service has begun execution.
░░
░░ The job identifier is 2849.
Nov 19 08:16:25 regulus spiceproxy[153573]: start failed - unable to create socket - Cannot assign requested address
Nov 19 08:16:25 regulus spiceproxy[153573]: start failed - unable to create socket - Cannot assign requested address
Nov 19 08:16:25 regulus systemd[1]: spiceproxy.service: Control process exited, code=exited, status=255/EXCEPTION
░░ Subject: Unit process exited
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ An ExecStart= process belonging to unit spiceproxy.service has exited.
░░
░░ The process' exit code is 'exited' and its exit status is 255.
Nov 19 08:16:25 regulus systemd[1]: spiceproxy.service: Failed with result 'exit-code'.
░░ Subject: Unit failed
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ The unit spiceproxy.service has entered the 'failed' state with result 'exit-code'.
Nov 19 08:16:25 regulus systemd[1]: Failed to start spiceproxy.service - PVE SPICE Proxy Server.
░░ Subject: A start job for unit spiceproxy.service has failed
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ A start job for unit spiceproxy.service has finished with a failure.
░░
░░ The job identifier is 2849 and the job result is failed.
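
That "Cannot assign requested address" message reads like the proxy is trying to bind to an IP address that isn't actually configured on the box. A quick sanity check (assuming the node name is regulus, as above) would be to compare what the name resolves to against the addresses the host really has:

Code:
hostname
getent hosts "$(hostname)"
ip -br addr show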


Both of these systems were basic installs with tteck post-install scripts and a single LXC or VM, so I would have expected an upgrade not to break everything. My experience so far over the various attempts in the last few months has been alternately absolutely amazing and also rather roguelike instant death. I'm sure this death was caused by upgrading 40 packages simultaneously, and like I said, restarting is no problem at all; this is not production. I'm just looking for the land mines so I can advance a bit further in the next iteration.
 
Turns out my guest is not affected this time; the pfSense instance is still operational and accessible via the web, it's just the PVE GUI that is not available. The other machine had both pve-guests and pvescheduler failing as well.

This is better than the other machine, at least:

Code:
root@regulus:~# systemctl | grep pve
  etc-pve.mount                                                                            loaded active     mounted   /etc/pve
  pve-cluster.service                                                                      loaded active     running   The Proxmox VE cluster filesystem
  pve-firewall.service                                                                     loaded active     running   Proxmox VE firewall
  pve-guests.service                                                                       loaded active     exited    PVE guests
  pve-ha-crm.service                                                                       loaded active     running   PVE Cluster HA Resource Manager Daemon
  pve-ha-lrm.service                                                                       loaded active     running   PVE Local HA Resource Manager Daemon
  pve-lxc-syscalld.service                                                                 loaded active     running   Proxmox VE LXC Syscall Daemon
  pvebanner.service                                                                        loaded active     exited    Proxmox VE Login Banner
  pvedaemon.service                                                                        loaded active     running   PVE API Daemon
  pvefw-logger.service                                                                     loaded active     running   Proxmox VE firewall logger
  pvenetcommit.service                                                                     loaded active     exited    Commit Proxmox VE network changes
● pveproxy.service                                                                         loaded failed     failed    PVE API Proxy Server
  pvescheduler.service                                                                     loaded active     running   Proxmox VE scheduler
  pvestatd.service                                                                         loaded active     running   PVE Status Daemon
  pve-storage.target                                                                       loaded active     active    PVE Storage Target
  pve-daily-update.timer                                                                   loaded active     waiting   Daily PVE download activities

I'm going to assume it will be easier to just reinstall but if anyone sees anything obvious please let me know.
 
Yeah this is the main API service, so guests will still run, but you cannot manage stuff.

What's the output of:

journalctl -u pveproxy.service --since -1h

Please ignore the "Scheduled restart job, restart counter is at 4." messages, look for the first error or copy the whole output.
 
I upgraded a few other systems of mine, some also on no-subscription for dogfooding, and did not run into pveproxy failing to start after the upgrade. So this doesn't seem to be a very wide/generic issue. It still could be a regression from the upgrade for some specific setting or hardware or the like, so if you do not mind, it would be great to get more information out of this system. If it is indeed a (targeted) regression from PVE, then we can hopefully fix it for you and avoid others running into it.
 
Yeah this is the main API service, so guests will still run, but you cannot manage stuff.

What's the output of:

journalctl -u pveproxy.service --since -1h

Please ignore the "Scheduled restart job, restart counter is at 4." messages, look for the first error or copy the whole output.
No entries in the last hour, but there are some red lines about inotify being disabled, immediately after the SHA fingerprint info.

Code:
Jul 22 14:07:07 regulus pvecm[837]: got inotify poll request in wrong process - disabling inotify
Jul 22 14:07:08 regulus pveproxy[882]: starting server
Jul 22 14:07:08 regulus pveproxy[882]: starting 3 worker(s)
Jul 22 14:07:08 regulus pveproxy[882]: worker 883 started
Jul 22 14:07:08 regulus pveproxy[882]: worker 884 started
Jul 22 14:07:08 regulus pveproxy[882]: worker 885 started
Jul 22 14:07:08 regulus systemd[1]: Started pveproxy.service - PVE API Proxy Server.
Jul 22 14:15:50 regulus systemd[1]: Reloading pveproxy.service - PVE API Proxy Server...
Jul 22 14:15:51 regulus pveproxy[15511]: send HUP to 882
Jul 22 14:15:51 regulus pveproxy[882]: received signal HUP
Jul 22 14:15:51 regulus pveproxy[882]: server closing
Jul 22 14:15:51 regulus pveproxy[882]: server shutdown (restart)
Jul 22 14:15:51 regulus systemd[1]: Reloaded pveproxy.service - PVE API Proxy Server.
Jul 22 14:15:52 regulus pveproxy[882]: restarting server
Jul 22 14:15:52 regulus pveproxy[882]: starting 3 worker(s)
Jul 22 14:15:52 regulus pveproxy[882]: worker 15524 started
Jul 22 14:15:52 regulus pveproxy[882]: worker 15525 started
Jul 22 14:15:52 regulus pveproxy[882]: worker 15526 started
Jul 22 14:15:57 regulus pveproxy[882]: worker 884 finished
Jul 22 14:15:57 regulus pveproxy[882]: worker 883 finished
Jul 22 14:15:57 regulus pveproxy[882]: worker 885 finished
Jul 22 14:15:58 regulus pveproxy[16936]: got inotify poll request in wrong process - disabling inotify
Jul 22 14:15:58 regulus pveproxy[16937]: got inotify poll request in wrong process - disabling inotify
Jul 22 14:15:58 regulus pveproxy[16935]: got inotify poll request in wrong process - disabling inotify
Jul 22 14:16:00 regulus pveproxy[16936]: worker exit
Jul 22 14:16:00 regulus pveproxy[16935]: worker exit
Jul 22 14:47:08 regulus pveproxy[16937]: worker exit
Jul 22 14:47:13 regulus systemd[1]: Stopping pveproxy.service - PVE API Proxy Server...
Jul 22 14:47:13 regulus pveproxy[882]: received signal TERM
Jul 22 14:47:13 regulus pveproxy[882]: server closing
Jul 22 14:47:13 regulus pveproxy[15526]: worker exit
Jul 22 14:47:13 regulus pveproxy[882]: worker 15526 finished
Jul 22 14:47:13 regulus pveproxy[882]: worker 15525 finished
Jul 22 14:47:13 regulus pveproxy[882]: worker 15524 finished
Jul 22 14:47:13 regulus pveproxy[882]: server stopped
Jul 22 14:47:14 regulus systemd[1]: pveproxy.service: Deactivated successfully.
Jul 22 14:47:14 regulus systemd[1]: Stopped pveproxy.service - PVE API Proxy Server.
Jul 22 14:47:14 regulus systemd[1]: pveproxy.service: Consumed 12.953s CPU time.
-- Boot b963c4888837490c9dbf858a63213581 --
Jul 22 14:47:37 regulus systemd[1]: Starting pveproxy.service - PVE API Proxy Server...
Jul 22 14:47:39 regulus pveproxy[847]: starting server
Jul 22 14:47:39 regulus pveproxy[847]: starting 3 worker(s)
Jul 22 14:47:39 regulus pveproxy[847]: worker 848 started
Jul 22 14:47:39 regulus pveproxy[847]: worker 849 started
Jul 22 14:47:39 regulus pveproxy[847]: worker 850 started
Jul 22 14:47:39 regulus systemd[1]: Started pveproxy.service - PVE API Proxy Server.
Jul 22 15:02:02 regulus pveproxy[849]: multipart upload complete (size: 874672128B time: 7.396s rate: 112.79MiB/s filename: pfSense-CE-2.7.2-RELEASE-amd64.iso)
Jul 22 15:36:52 regulus pveproxy[847]: worker 850 finished
Jul 22 15:36:52 regulus pveproxy[847]: starting 1 worker(s)
Jul 22 15:36:52 regulus pveproxy[847]: worker 7061 started
Jul 22 15:36:54 regulus pveproxy[7060]: got inotify poll request in wrong process - disabling inotify
Jul 22 15:38:04 regulus pveproxy[848]: worker exit
Jul 22 15:38:04 regulus pveproxy[847]: worker 848 finished
Jul 22 15:38:04 regulus pveproxy[847]: starting 1 worker(s)
Jul 22 15:38:04 regulus pveproxy[847]: worker 7204 started
Jul 22 15:44:04 regulus pveproxy[849]: worker exit

After this it just keeps running, with workers starting, exiting, and finishing until now, with one more "inotify in wrong process" message.

Looking at the ISO upload in the log, it appears my upgrade path was from 2.7.2 to 2.8.2. I'll make sure to grab the current ISO at least, but this is the same ISO I used to reformat the other machine, which ran the full update prior to recreating the PBS LXC but didn't break the second time.
 
Jul 22 14:47:37
But this is a bit older log I think? Or is your server time behind by a lot?
No entries in the last hour, but there are some red lines about inotify being disabled
Yeah, I should have used a longer time period for the proposed command to hedge against you seeing my reply only a while later.

Let's go for the last two days, it should contain the time when things went south, additionally include the log of some other services as well:

journalctl -u pveproxy.service -u pvedaemon.service -u pve-cluster.service --since -2d
 
What's the content of:
hostname
your hosts file
/etc/network/interfaces

I was researching how to add a second virtual IP to the LAN interface vmbr1 from PVE, but I don't think I got to the point of making any actual changes. That research is actually how I ended up running the upgrade. The bridge is labelled VLAN trunk, but there is no VLAN set up yet in pfSense or elsewhere. Dynamic PPPoE and a static LAN connection to the ISP hardware are on one interface, and the pfSense LAN gateway at 192.168.10.1/24 is on the other, with PCIe passthrough to pfSense.

Code:
auto lo
iface lo inet loopback


iface enp0s31f6 inet manual


iface enp2s0 inet manual


auto vmbr0
iface vmbr0 inet static
        address 192.168.2.10/24
        gateway 192.168.2.1
        bridge-ports enp0s31f6
        bridge-stp off
        bridge-fd 0
#PPPoE WAN


auto vmbr1
iface vmbr1 inet manual
        bridge-ports enp2s0
        bridge-stp off
        bridge-fd 0
#new lan / VLAN trunk


source /etc/network/interfaces.d/*
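
(As an aside, the second virtual IP I was researching shouldn't need net-tools at all; as far as I can tell it's either a one-off ip command or an extra stanza in /etc/network/interfaces. The 192.168.10.2/24 below is just a made-up example address on the pfSense LAN subnet:)

Code:
# temporary, lost on reboot
ip addr add 192.168.10.2/24 dev vmbr1

# or persistent, by giving vmbr1 a static address in /etc/network/interfaces
auto vmbr1
iface vmbr1 inet static
        address 192.168.10.2/24
        bridge-ports enp2s0
        bridge-stp off
        bridge-fd 0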

I upgraded a few other systems of mine, some also on no-subscription for dogfooding, and did not run into pveproxy failing to start after the upgrade. So this doesn't seem to be a very wide/generic issue. It still could be a regression from the upgrade for some specific setting or hardware or the like, so if you do not mind, it would be great to get more information out of this system. If it is indeed a (targeted) regression from PVE, then we can hopefully fix it for you and avoid others running into it.

I can't pay for the software, but I will gladly support the community in the only way I can: testing and reporting. Until I started playing with Docker and backdoor SSH into my Synology about a year ago, I'd been out of Linux since computer programming in college (~20 years), so I've got a sheer cliff of a learning curve ahead of me. I can make some great copy pasta though, but I try to at least read the recipe first. Don't ask me to craft my own regular expressions beyond * though!
 
But this is a bit older log I think? Or is your server time behind by a lot?

Yeah, I should have used a longer time period for the proposed command to hedge against you seeing my reply only a while later.

Let's go for the last two days, it should contain the time when things went south, additionally include the log of some other services as well:

journalctl -u pveproxy.service -u pvedaemon.service -u pve-cluster.service --since -2d
Okay, my log starts Sunday evening. I recall there was a bad gateway or IP/hostname mapping assigned that I had missed when I changed subnets on this little lab, but I see log entries like this: pmxcfs[153212]: [main] notice: resolved node name 'regulus' to '192.168.1.235' for default node IP address. This was causing the upgrade process to fail initially, IIRC, but I fixed that and was able to proceed with the upgrade.

I feel the issue was network related, there may still be a few networking gremlins hiding out after moving subnets.


Here is the whole log from sign-in, through upgrade and failure, bed time Sunday night.

The log is a bit too verbose, see attachment.

Gotcha, you little jerk, hiding in the hosts file!

192.168.1.235 regulus.local regulus

I'll move that to the ISP hardware's static LAN IP, reboot, and see if that changes anything.
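
If that theory holds, my understanding is the fix is just pointing the hosts entry at the address the node actually uses now (192.168.2.10 on vmbr0, going by the config I posted above; that part is my assumption) and restarting the affected services:

Code:
# /etc/hosts - assuming 192.168.2.10 on vmbr0 is the node's current address
192.168.2.10 regulus.local regulus

# then restart pmxcfs and the API daemons
systemctl restart pve-cluster.service
systemctl restart pvedaemon.service pveproxy.service spiceproxy.service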
 

Attachments

  • 4days.log (23.5 KB)
No immediate change, but the hosts file kept the changes at least.

Checking my command history, I see I actually had to run dhclient on the night in question in order to get something to proceed. I was trying to install net-tools to add the aforementioned second LAN IP at the time.

Code:
root@regulus:~# apt list | grep net-tools
ddnet-tools/stable 16.4-1.1 amd64
net-tools/stable 2.10-0.1 amd64

Looks like the install was successful, but I don't believe I actually used it. I wish I could remember the exact order; that would probably give a good clue.
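
Since dhclient was run by hand that night, it might be worth checking whether it left a lease or a stray address behind; something along these lines should show it (the lease file path is the standard Debian one, so it may differ):

Code:
ip -br addr show
ps aux | grep '[d]hclient'
cat /var/lib/dhcp/dhclient.leases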

Off to work, I'll check back in this afternoon.
 
Code:
auto lo
iface lo inet loopback


iface enp0s31f6 inet manual


iface enp2s0 inet manual


auto vmbr0
iface vmbr0 inet static
        address 192.168.2.10/24
        gateway 192.168.2.1
        bridge-ports enp0s31f6
        bridge-stp off
        bridge-fd 0
#PPPoE WAN


auto vmbr1
iface vmbr1 inet manual
        bridge-ports enp2s0
        bridge-stp off
        bridge-fd 0
#new lan / VLAN trunk


source /etc/network/interfaces.d/*
Those other two bits of info are important.
 
I think they meant:
hostname
your hosts file

I.e.:

Code:
hostname
# which normally comes from
cat /etc/hostname
cat /etc/hosts

A mistake or missing entry there could cause the pmxcfs configuration file system to fail to start, which in turn would explain why the HTTP API daemons cannot start, as they read their certificates from pmxcfs.
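
If pmxcfs is the culprit, its own log and the certificate files it provides should show it. For example, something like:

Code:
systemctl status pve-cluster.service
journalctl -u pve-cluster.service -b --no-pager
ls -l /etc/pve/local/pve-ssl.pem /etc/pve/local/pve-ssl.key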
 
