[SOLVED] Proxmox Server random reboot

Edison · Apr 2, 2023

I previously ran version 5.x, and the server restarted randomly without any cause I could find. After I upgraded to 7.x there was no problem at first, but recently it has been restarting randomly at least once a day, and I still can't find the reason. Please help.

Code:

proxmox-ve: 7.3-1 (running kernel: 5.15.74-1-pve)
pve-manager: 7.3-3 (running version: 7.3-3/c3928077)
pve-kernel-5.15: 7.2-14
pve-kernel-helper: 7.2-14
pve-kernel-5.15.74-1-pve: 5.15.74-1
ceph-fuse: 15.2.17-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.2
libproxmox-backup-qemu0: 1.3.1-1
libpve-access-control: 7.2-5
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.2-8
libpve-guest-common-perl: 4.2-3
libpve-http-server-perl: 4.1-5
libpve-storage-perl: 7.2-12
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.0-3
lxcfs: 4.0.12-pve1
novnc-pve: 1.3.0-3
proxmox-backup-client: 2.2.7-1
proxmox-backup-file-restore: 2.2.7-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.5.3
pve-cluster: 7.3-1
pve-container: 4.4-2
pve-docs: 7.3-1
pve-edk2-firmware: 3.20220526-1
pve-firewall: 4.2-7
pve-firmware: 3.5-6
pve-ha-manager: 3.5.1
pve-i18n: 2.8-1
pve-qemu-kvm: 7.1.0-4
pve-xtermjs: 4.16.0-1
qemu-server: 7.3-1
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+2
vncterm: 1.7-1
zfsutils-linux: 2.1.6-pve1

Moayad · Apr 3, 2023

Hello,

Thank you for the output of syslog.

There is no clear indication of the cause of the reboot in it. I would instead check:
- dmesg
- The hardware temperature, and try updating the BIOS firmware (sometimes the issue is hardware-related)
- If you monitor the PVE host, can you see high I/O just before the restart?
- Lastly, the power side: maybe the power supply is the problem?
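
A minimal sketch of how the log-related checks above might be scripted, assuming a systemd host with a persistent journal (without one, `journalctl -b -1` has nothing to show). The function names are hypothetical; the commands are ordinary `journalctl`, `dmesg`, and `last` invocations.

```python
import subprocess


def reboot_diagnostic_commands():
    """Return the diagnostic commands as argv lists (hypothetical helper)."""
    return [
        # List known boots so unexpected reboots can be matched to timestamps.
        ["journalctl", "--list-boots", "--no-pager"],
        # Last 100 journal lines of the previous boot, i.e. just before the reset.
        ["journalctl", "-b", "-1", "-n", "100", "--no-pager"],
        # Kernel messages of priority "err" or worse from the previous boot.
        ["journalctl", "-k", "-b", "-1", "-p", "err", "--no-pager"],
        # Kernel ring buffer of the current boot, errors and warnings only.
        ["dmesg", "-T", "--level=err,warn"],
        # Reboot and shutdown records; a reboot without a preceding
        # shutdown entry points to an unclean restart.
        ["last", "-x", "reboot", "shutdown"],
    ]


def run_diagnostics():
    """Run each command and print its output; skip tools that are missing."""
    for argv in reboot_diagnostic_commands():
        print("$ " + " ".join(argv))
        try:
            result = subprocess.run(argv, capture_output=True, text=True)
        except FileNotFoundError:
            print("(command not found)")
            continue
        print(result.stdout or result.stderr)
```

Calling `run_diagnostics()` as root on the affected node prints everything in one pass. If the previous boot's journal simply stops without any shutdown messages, that is consistent with a hardware or power cause rather than a software-initiated reboot.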

Edison · Apr 4, 2023

Thank you for the suggestions. I will work through them over the coming days and report back.

amengus · Apr 4, 2023

Hello,

I think I'm facing the same issue on only one node.

- It's a new node (added 7 days ago) in a production cluster (3 members + this one), so there are no VMs running on it yet
- No high I/O before the reboot
- Nothing related in dmesg
- The node is using the latest packages available (Community Edition)

The last crash occurred yesterday, April 3rd, at 19:30 (Europe/Paris):

Moayad · Apr 4, 2023

Do you have HA Proxy enabled?
At the time the node rebooted, did you see anything related to corosync in the syslog/journalctl output?

amengus · Apr 4, 2023

No HA Proxy enabled on my side.
Sorry, I've read the logs twice and there is no corosync error in them.

The only setting that differed was the time zone: 3 nodes are on Europe/Paris and the last one (the one that crashes) was on UTC.
I changed the setting this morning, but I don't think it could lead to a crash.

I've just migrated a VM to the crashing node to check whether the reboot occurs again in the next few hours.

Moayad · Apr 4, 2023

Sorry, I meant High Availability (HA), not ~~HA proxy~~.

Well, the next step is to check for a hardware issue such as the power supply, and I would also check whether a BIOS update is available.

amengus · Apr 5, 2023

Update:
* No issue with RAM / CPU.
* Still checking whether it's a PSU issue

Another difference I see is that the node is listed with its public address instead of its private one:

But that shouldn't have any effect, should it?

Falk R. · Apr 5, 2023

Can you post your Corosync.conf?

amengus · Apr 5, 2023

Of course :

Code:

logging {
  debug: off
  to_syslog: yes
}


nodelist {
  node {
    name: oc0
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 192.168.0.10
  }
  node {
    name: oc1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 192.168.0.1
  }
  node {
    name: oc2
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 192.168.0.2
  }
  node {
    name: ocr
    nodeid: 4
    quorum_votes: 1
    ring0_addr: 192.168.0.11
  }
}


quorum {
  provider: corosync_votequorum
}


totem {
  cluster_name: OC
  config_version: 6
  interface {
    linknumber: 0
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  version: 2
}

Moayad · Apr 5, 2023

We recommend having a separate network for corosync and/or adding a redundant ring (link) to the corosync configuration [0].

[0] https://pve.proxmox.com/pve-docs/pve-admin-guide.html#pvecm_redundancy
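
As a rough illustration of the redundancy point, the sketch below parses the `nodelist` of a `corosync.conf` and reports nodes that have no second link address. It assumes the second link is declared as `ring1_addr`, as in the linked documentation; the function names and the trimmed `SAMPLE` (two nodes copied from the configuration posted above) are only for illustration.

```python
import re


def parse_nodes(conf_text):
    """Return each `node { ... }` block of a corosync.conf as a dict."""
    nodes = []
    # \bnode\s*\{ matches "node {" but not "nodelist {" or "nodeid:".
    for body in re.findall(r"\bnode\s*\{([^{}]*)\}", conf_text):
        entry = {}
        for line in body.splitlines():
            # Split on the first colon only, so IPv6 addresses stay intact.
            key, sep, value = line.partition(":")
            if sep:
                entry[key.strip()] = value.strip()
        nodes.append(entry)
    return nodes


def nodes_without_second_link(conf_text):
    """Names of nodes that define no ring1_addr (no redundant link)."""
    return [n.get("name", "?") for n in parse_nodes(conf_text)
            if "ring1_addr" not in n]


# Two nodes taken from the configuration posted in this thread.
SAMPLE = """
nodelist {
  node {
    name: oc0
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 192.168.0.10
  }
  node {
    name: oc1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 192.168.0.1
  }
}
"""

print(nodes_without_second_link(SAMPLE))
```

For the configuration posted above, every node is reported, because only `ring0_addr` is defined. On a real node the text would typically be read from `/etc/pve/corosync.conf`.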

amengus · Apr 7, 2023

Our hosting provider (OVHcloud) tells us that we are affected by this issue:
https://bugzilla.proxmox.com/show_bug.cgi?id=2569

That's strange, because we only use 3 VLANs, but we will check whether we have made a configuration error.

Edison · Apr 20, 2023

After none of your suggestions resolved the problem, I shifted my focus to the hardware and eventually discovered that the cause was the SATA data cable. After I replaced it, the problem never occurred again. Thank you again for your help.
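
For readers chasing a similar fault: a failing SATA data cable often, though not always, shows up as a rising raw value of SMART attribute 199 (`UDMA_CRC_Error_Count`). The sketch below assumes `smartctl -A` from smartmontools prints its usual attribute table and extracts that value. The function name and the sample text are illustrative and are not taken from this thread.

```python
def udma_crc_errors(smartctl_output):
    """Return the raw value of SMART attribute 199, or None if not listed."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with the numeric ID; RAW_VALUE is column 10.
        if len(fields) >= 10 and fields[0] == "199":
            try:
                return int(fields[9])
            except ValueError:
                return None
    return None


# Made-up sample in the layout of `smartctl -A` (not output from the thread).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       4312
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       57
"""

print(udma_crc_errors(SAMPLE))
```

On a live system the input would come from something like `subprocess.run(["smartctl", "-A", "/dev/sda"], capture_output=True, text=True).stdout`, where `/dev/sda` is a placeholder. A value that keeps increasing between checks is the signal worth acting on; a fixed non-zero value may only reflect a past problem.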

Moayad · Apr 20, 2023

Glad to read that you fixed the issue yourself!

I will mark your thread as [SOLVED] to help other people who have a similar issue.
