Proxmox cluster lost GUI access

adminkc

Member
Sep 28, 2020
Hi,

Our 13-node cluster loses access to the cluster login (GUI) every second day.
Our workaround is to restart the services (service corosync stop && service pveproxy restart && service corosync start), after which it works again.
This has been happening since the upgrade to Proxmox 7.0. (The upgrade itself went well.)

Does anybody have an idea how to fix this? Restarting the services every second day is not a solution.

BR,
KC IT Team
 

Attachments

  • Gui fehler.PNG (12.9 KB)
Please provide your corosync config /etc/pve/corosync.conf, as well as your interfaces file /etc/network/interfaces.
Could you also provide the journal/syslog for the time frame in which this happens? If possible, include ~15 minutes before and after restarting the services.
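
The time window requested above could be captured as plain text roughly like this. This is only a sketch: it assumes GNU `date` and a systemd journal, and the function names, the restart time, and the output path are all made up for illustration.

```shell
# Print a timestamp shifted by a number of seconds (GNU date assumed).
# $1 = base time, e.g. "2021-10-12 08:00"; $2 = offset in seconds (may be negative)
shift_time() {
    base=$(date -d "$1" +%s) || return 1
    date -d "@$((base + $2))" '+%Y-%m-%d %H:%M:%S'
}

# Save the journal from 15 minutes before to 15 minutes after the restart.
# $1 = time of the service restart, $2 = output text file
collect_journal() {
    journalctl --since "$(shift_time "$1" -900)" \
               --until "$(shift_time "$1" 900)" \
               --no-pager > "$2"
}

# Example, run on an affected node (hypothetical time and path):
# collect_journal "2021-10-12 08:00" "/tmp/journal-$(hostname).txt"
```

The resulting text file can be attached directly, which avoids screenshots of the terminal.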
 
Are the same nodes always `offline`? If so, please provide the logs from one of each (one offline node and one online node).
 
Then it doesn't matter which one.
Could you also provide the output of systemctl status pveproxy.service when the issue appears?
 
The error happened again this morning. I checked the journalctl output on all nodes and found no errors.
Attached are the corosync and network configs.
systemctl status pveproxy.service also shows nothing.
 

Attachments

  • corosync_121021.jpg (645.2 KB)
  • corosync_12102021.jpg (904.3 KB)
  • network interfaces.PNG (31.6 KB)
Without the logs (journal) there's nothing to see here.
Please provide it as mentioned previously.
 
Here is the pveproxy service status from one of our nodes, and the journalctl output, which now shows some errors.
 

Attachments

  • pve proxy service.PNG (23.4 KB)
  • journalctl -xfe error.PNG (139.7 KB)
Two other nodes, proxmox 05 & 06, also give me errors.
 

Attachments

  • proxmox 06 error.PNG (133.2 KB)
  • proxmox 05 error.PNG (106.7 KB)
Were there any changes on the switch?
Is the NIC firmware up-to-date? Is the BIOS up-to-date?


When providing logs and configs, please provide them either as text in code tags, or attach them as text files.
This makes it a lot easier.
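
As a rough illustration of providing configs as text rather than screenshots, the two files requested earlier in the thread could be bundled into one text attachment. The helper name `dump_files` and the output path are hypothetical.

```shell
# Print each file with a header line so several configs fit in one text file.
dump_files() {
    for f in "$@"; do
        printf '===== %s =====\n' "$f"
        if [ -r "$f" ]; then
            cat "$f"
        else
            echo "(not readable)"
        fi
        printf '\n'
    done
}

# Example (paths from the request above; output path is hypothetical):
# dump_files /etc/pve/corosync.conf /etc/network/interfaces > /tmp/cluster-config.txt
```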
 
We didn't change anything on the switches, and the firmware is up to date.
It first happened when we updated our 2 new servers and added them to the cluster.

Is it possible that the error is caused by the kernel version, since not all nodes run the same kernel version (not all of them were rebooted after the update)?
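
One way to check the kernel question above is to compare the running kernel (`uname -r`) on every node. The following is a minimal sketch, assuming root SSH access to each node; the function names and node names are hypothetical.

```shell
# Print "node kernel-version" for each node name given as an argument.
# Assumes SSH access as root to every node.
running_kernels() {
    for node in "$@"; do
        printf '%s %s\n' "$node" "$(ssh "root@$node" uname -r)"
    done
}

# Read "node kernel-version" lines on stdin and print the distinct versions.
distinct_kernels() {
    awk '{ print $2 }' | sort -u
}

# Example (hypothetical node names):
# running_kernels proxmox05 proxmox06 | distinct_kernels
```

More than one line of output means the nodes are running different kernels. A node that was updated but not rebooted still reports its old kernel here.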