ERR_CONNECTION_REFUSED to the web UI after kernel update

rmh240

Member
Feb 24, 2022
Hey everyone,

Since updating to the latest kernel, I have been experiencing a Proxmox outage. None of the VMs seem to be online. If I log onto the web UI, I get the error ERR_CONNECTION_REFUSED.

What I have done so far
  • I checked for duplicate IP addresses, and the server is the only one with this given address.
  • I can get into the server using ssh, but I can't run apt-get update && apt-get upgrade; it reports that there is no internet connection.
I hope someone can help me and/or point me in the right direction.

Thanks very much already.
 
hi,

If I log onto the web UI, I get the error ERR_CONNECTION_REFUSED.
* are you connecting like https://your.ip.address.here:8006 ?

Since updating to the latest kernel, I have been experiencing a Proxmox outage.
* have you done a reboot after the kernel upgrade was complete? (if not, please do that first!)
* which version are you on? pveversion -v

I checked for duplicate IP addresses, and the server is the only one with this given address.
okay
I can get into the server using ssh, but I can't run apt-get update && apt-get upgrade; it reports that there is no internet connection.
* are the pve services running? systemctl | grep pve
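
A note on the error itself: ERR_CONNECTION_REFUSED generally means that nothing accepted the TCP connection on that port, so it is worth checking whether anything is listening on 8006 at all. A minimal sketch, assuming the default column layout of `ss -tln` (local address in the 4th column); the helper name `has_listener` is made up for this example:

```shell
#!/bin/sh
# has_listener PORT: succeed if the `ss -tln` output on stdin shows a
# socket in LISTEN state on PORT. Assumes the 4th column of `ss -tln`
# is "Local Address:Port".
has_listener() {
    awk -v port="$1" '
        $1 == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
        END { exit !found }
    '
}

# Intended use on the host (8006 is the web UI port from the URL above):
#   ss -tln | has_listener 8006 && echo "listening" || echo "nothing on 8006"
```

If nothing is listening, the problem is on the host side (the proxy service is not up) rather than in the browser or the URL.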
 
Thanks so much for getting back to me so quickly.
Yes indeed.
* have you done a reboot after the kernel upgrade was complete? (if not, please do that first!)
I always reboot after a kernel update. Interestingly enough, this occurred after the reboot. To be more specific, after initiating the reboot I was never able to log on again.
* which version are you on? pveversion -v
proxmox-ve: 6.4-1 (running kernel: 5.4.166-1-pve)
pve-manager: 6.4-13 (running version: 6.4-13/9f411e79)
pve-kernel-5.4: 6.4-13
pve-kernel-helper: 6.4-13
pve-kernel-5.4.166-1-pve: 5.4.166-1
pve-kernel-5.4.162-1-pve: 5.4.162-2
pve-kernel-5.4.157-1-pve: 5.4.157-1
pve-kernel-5.4.151-1-pve: 5.4.151-1
pve-kernel-5.4.34-1-pve: 5.4.34-2
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.1.5-pve2~bpo10+1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.22-pve2~bpo10+1
libproxmox-acme-perl: 1.1.0
libproxmox-backup-qemu0: 1.1.0-1
libpve-access-control: 6.4-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.4-4
libpve-guest-common-perl: 3.1-5
libpve-http-server-perl: 3.2-3
libpve-storage-perl: 6.4-1
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.6-2
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.1.13-2
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.6-1
pve-cluster: 6.4-1
pve-container: 3.3-6
pve-docs: 6.4-2
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-4
pve-firmware: 3.3-2
pve-ha-manager: 3.1-1
pve-i18n: 2.3-1
pve-qemu-kvm: 5.2.0-6
pve-xtermjs: 4.7.0-3
qemu-server: 6.4-2
smartmontools: 7.2-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 2.0.7-pve1
* are the pve services running? systemctl | grep pve
Some of them are not running but I am not sure what is meant to run. Here is the printout.
Code:
  etc-pve.mount             loaded active       mounted             /etc/pve
  pve-cluster.service       loaded active       running             The Proxmox VE cluster filesystem
  pve-firewall.service      loaded deactivating final-sigterm start Proxmox VE firewall
  pve-guests.service        loaded inactive     dead          start PVE guests
  pve-ha-crm.service        loaded inactive     dead          start PVE Cluster HA Resource Manager Daemon
  pve-ha-lrm.service        loaded inactive     dead          start PVE Local HA Resource Manager Daemon
  pve-lxc-syscalld.service  loaded active       running             Proxmox VE LXC Syscall Daemon
  pvebanner.service         loaded active       exited              Proxmox VE Login Banner
  pvedaemon.service         loaded deactivating final-sigterm start PVE API Daemon
  pvefw-logger.service      loaded active       running             Proxmox VE firewall logger
  pvenetcommit.service      loaded active       exited              Commit Proxmox VE network changes
  pveproxy.service          loaded inactive     dead          start PVE API Proxy Server
  pvesr.service             loaded activating   start         start Proxmox VE replication runner
  pvestatd.service          loaded deactivating final-sigterm start PVE Status Daemon
  dev-pve-swap.swap         loaded active       active              /dev/pve/swap
  pve-storage.target        loaded active       active              PVE Storage Target
  pve-daily-update.timer    loaded active       waiting             Daily PVE download activities
  pvesr.timer               loaded active       running             Proxmox VE replication runner
 
thanks for the outputs.

Some of them are not running but I am not sure what is meant to run. Here is the printout.
looks like some important services aren't working (pvestatd, pvedaemon and pveproxy)
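
For anyone unsure how to read such a listing: the third column is the ACTIVE state, and anything other than "active" on a service is worth a look. A small sketch that filters the listing accordingly; the function name `not_active` is only an illustration, and it assumes the column order shown in the printout above:

```shell
#!/bin/sh
# not_active: read systemctl unit listings on stdin and print every
# .service unit whose ACTIVE column (3rd field) is anything but "active",
# together with its SUB state (4th field).
not_active() {
    awk '$1 ~ /\.service$/ && $3 != "active" { print $1 ": " $3 " (" $4 ")" }'
}

# Intended use on the host:
#   systemctl list-units --all --plain --no-legend 'pve*' | not_active
```

On the printout above this would flag pvedaemon, pveproxy and pvestatd, among others. Not every inactive unit is a fault on its own, so treat the result as a starting point only.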

could you also post the resulting output file from journalctl -e -u pveproxy -u pvedaemon -u pvestatd > output.txt
 
thanks for the outputs.
You are welcome, and thanks again for your time. This is very much appreciated :)
looks like some important services aren't working (pvestatd, pvedaemon and pveproxy)
That's correct. And for some reason it is shutting down some more services.
could you also post the resulting output file from journalctl -e -u pveproxy -u pvedaemon -u pvestatd > output.txt
I have attached the output.txt file to this reply
 

Attachments

  • output.txt
    18.8 KB · Views: 4
can't find much of a reason in the output, but thanks for sending.
That's correct. And for some reason it is shutting down some more services.
in that case could you also post the output from journalctl -xe -u 'pve*' > output2.txt
 
Of course :) See the file attached
 

Attachments

  • output2.txt
    105 KB · Views: 5
Of course :) See the file attached
really weird... could you send us your whole journal? journalctl -b > journal.txt
also post the outputs from:
* cat /etc/hosts
* ping -c2 $(uname -n)
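
These two checks are presumably meant to confirm that the host name resolves to the machine's LAN address. A rough sketch of the same lookup done directly against the hosts file; `hosts_ip` is a hypothetical helper, and it only looks at the file, not at DNS:

```shell
#!/bin/sh
# hosts_ip NAME [FILE]: print the first address that FILE (default
# /etc/hosts) maps NAME to. Text after a "#" is treated as a comment.
hosts_ip() {
    awk -v name="$1" '
        { sub(/#.*/, "") }
        { for (i = 2; i <= NF; i++) if ($i == name) { print $1; exit } }
    ' "${2:-/etc/hosts}"
}

# Intended use on the host:
#   hosts_ip "$(uname -n)"
```

An empty result or a loopback address (127.x.x.x) for the host's own name would be the thing to look into; as far as I understand, Proxmox VE expects the name to resolve to a real, non-loopback address.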
 
really weird... could you send us your whole journal? journalctl -b > journal.txt
Please see the journal file attached. I hope this helps as it doesn't seem to go far back in time.
also post the outputs from:
* cat /etc/hosts
Code:
127.0.0.1 localhost.localdomain localhost
192.168.0.10 HomeServer.local HomeServer

# The following lines are desirable for IPv6 capable hosts

::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
* ping -c2 $(uname -n)
Code:
PING HomeServer.local (192.168.0.10) 56(84) bytes of data.
64 bytes from HomeServer.local (192.168.0.10): icmp_seq=1 ttl=64 time=0.026 ms
64 bytes from HomeServer.local (192.168.0.10): icmp_seq=2 ttl=64 time=0.034 ms

--- HomeServer.local ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 23ms
rtt min/avg/max/mdev = 0.026/0.030/0.034/0.004 ms
 

Attachments

  • journal.txt
    297.5 KB · Views: 6
Please see the journal file attached. I hope this helps as it doesn't seem to go far back in time.
thanks, that should be alright (it's the journal from this boot)

and it seems like there's a kernel oops trace in there, so we'll have to take a look at that.

in the meantime could you try downgrading or pinning to the older kernel version to see if things start working again?
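
For readers who want to look for such a trace in their own saved journal, a minimal sketch follows. The pattern list is an assumption covering a few common kernel crash markers and is not exhaustive; `find_oops` is just an illustrative name:

```shell
#!/bin/sh
# find_oops FILE: print the numbered lines of a saved journal that contain
# common kernel oops/crash markers (assumed, non-exhaustive pattern list).
find_oops() {
    grep -n -E 'Oops:|BUG:|Call Trace:|general protection fault' "$1"
}

# Intended use, with the file created earlier in this thread:
#   find_oops journal.txt
```

The line numbers in the output make it easier to open the file at the right place and read the surrounding context, which usually names the module involved.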
 
That's interesting. How can I do this?
 
That's interesting? how can I do this?
depends on your bootloader [0], and since you're on PVE6 it might be slightly different.

but in essence you can either:
* manually select the older kernel when booting (if you have physical access to the host)
* edit your bootloader config (grub/systemd-boot) to point to the older kernel

if you use grub check this post here [1]
and if you're using systemd-boot then you can use our tool proxmox-boot-tool

[0]: https://pve.proxmox.com/wiki/Host_Bootloader
[1]: https://forum.proxmox.com/threads/revert-to-prior-kernel.100310/#post-434580
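
For the grub route, the general idea is to point GRUB_DEFAULT at the title of the older kernel's menu entry instead of at an index. A sketch for listing the available titles, assuming the generated config lives at /boot/grub/grub.cfg and quotes titles in single quotes; `grub_titles` is a made-up helper name:

```shell
#!/bin/sh
# grub_titles [FILE]: list the titles of menuentry and submenu lines in a
# generated GRUB config (default /boot/grub/grub.cfg). The title is taken
# from the first single-quoted string on each such line.
grub_titles() {
    awk -F"'" '/^[ \t]*(menuentry|submenu) / { print $2 }' \
        "${1:-/boot/grub/grub.cfg}"
}

# Intended use on the host:
#   grub_titles
```

An entry inside a submenu is addressed as GRUB_DEFAULT="<submenu title>><entry title>" in /etc/default/grub, followed by update-grub. The exact titles differ between installations, so copy them from the listing rather than from memory.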
 
Thank you very much for your assistance! This has solved this problem. Selecting the older kernel manually didn't work for me.
The grub method has worked now, and all is working again. Once there is a new kernel, I should revert the setting to GRUB_DEFAULT=0?
 
The grub method has worked now, and all is working again.
great to hear that :)

Once there is a new kernel, I should revert the setting to GRUB_DEFAULT=0?
yes, you could just uncomment that one and comment out the hardcoded value (just in case)

the kernel oops looks to be related to sound devices. after the oops, your system keeps booting but proceeds to timeout on almost every service since it's in an unstable state.

are you doing any kind of device passthrough to your VMs (especially sound related) ? if yes please post the corresponding VM configs here.

it would also be helpful to know what kind of hardware you have: lspci -nnk | grep -i audio -A 2
 
are you doing any kind of device passthrough to your VMs (especially sound related) ? if yes please post the corresponding VM configs here.
I am not, although I was thinking of doing so. It is a Lenovo ThinkCentre M710 which I use to run various server products on.
it would also be helpful to know what kind of hardware you have: lspci -nnk | grep -i audio -A 2
I hope this helps:
Code:
00:1f.3 Audio device [0403]: Intel Corporation Cannon Lake PCH cAVS [8086:a348] (rev 10)
        Subsystem: Lenovo Cannon Lake PCH cAVS [17aa:312d]
        Kernel driver in use: sof-audio-pci
        Kernel modules: snd_hda_intel, snd_sof_pci
00:1f.4 SMBus [0c05]: Intel Corporation Cannon Lake PCH SMBus Controller [8086:a323] (rev 10)
 
thanks for the output :)

by the way, PVE6 will be end of life at some point, so you should think about upgrading your server to PVE 7 [0] :)
I am aware of this but I haven't had the courage so far to update it. I have taken the time today and upgraded to PVE 7. Initially, after upgrading I had an issue with VMs not booting - i.e. they were looking for the boot disk. I have then reverted to the most current kernel, i.e. uncommented the original command as you have suggested on another forum post and then it all worked again as before.
we also have an opt-in kernel version 5.15 [1] that could potentially work fine with your hardware (the default kernel version on PVE7 at the time of writing is 5.13)

[0]: https://pve.proxmox.com/wiki/Upgrade_from_6.x_to_7.0
[1]: https://forum.proxmox.com/threads/opt-in-linux-kernel-5-15-for-proxmox-ve-7-x-available.100936/
Once again thank you so much for your assistance. It has helped me greatly.
 
Once again thank you so much for your assistance. It has helped me greatly.
gladly :)

i can't reproduce your oops trace here at the moment, and haven't found other users reporting on this (most people have already moved on to PVE7 and newer kernels).
in the meantime you can just keep booting the older working kernel and wait until a new kernel version is available.

for testing purposes, you could try to blacklist the offending kernel modules unless you need sound on your host (see the lspci output, which lists the kernel modules in use).

you can add them to /etc/modprobe.d/blacklist.conf and reboot your machine to the buggy kernel (edit your bootloader config as before).

if you're still getting errors with that, we can then deduce that the issue is not related to your firmware/hardware but is a different bug.
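
The blacklist entries can be derived from the lspci output posted earlier. A minimal sketch, assuming the "Kernel modules:" line format shown there; `blacklist_lines` is an illustrative name, and the output should be reviewed before it goes into the config file:

```shell
#!/bin/sh
# blacklist_lines: read `lspci -nnk` output on stdin and print one
# "blacklist <module>" line for each module named on a
# "Kernel modules:" line.
blacklist_lines() {
    sed -n 's/^.*Kernel modules: //p' | tr ',' '\n' | sed 's/^ *//; s/^/blacklist /'
}

# Intended use on the host (review the output before saving it):
#   lspci -nnk | grep -i audio -A 2 | blacklist_lines
```

For the audio device in this thread that gives lines for snd_hda_intel and snd_sof_pci, which can then be appended to /etc/modprobe.d/blacklist.conf. Depending on whether the modules are loaded from the initramfs, running update-initramfs -u before the reboot may also be needed.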
 
