Unsolicited reboot

Znuf

Member
Mar 29, 2021
Hello, I have a problem with my Proxmox server. For some reason, about once a month the server reboots on its own.

Today it happened between 17:00 and 17:04.



My syslog:

Code:
Mar 29 16:57:00 pve systemd[1]: Started Proxmox VE replication runner.
Mar 29 16:58:00 pve systemd[1]: Starting Proxmox VE replication runner...
Mar 29 16:58:00 pve systemd[1]: pvesr.service: Succeeded.
Mar 29 16:58:00 pve systemd[1]: Started Proxmox VE replication runner.
Mar 29 16:59:00 pve systemd[1]: Starting Proxmox VE replication runner...
Mar 29 16:59:00 pve systemd[1]: pvesr.service: Succeeded.
Mar 29 16:59:00 pve systemd[1]: Started Proxmox VE replication runner.
Mar 29 17:00:00 pve systemd[1]: Starting Proxmox VE replication runner...
Mar 29 17:00:00 pve systemd[1]: pvesr.service: Succeeded.
Mar 29 17:00:00 pve systemd[1]: Started Proxmox VE replication runner.
Mar 29 17:01:00 pve systemd[1]: Starting Proxmox VE replication runner...
Mar 29 17:01:00 pve systemd[1]: pvesr.service: Succeeded.
Mar 29 17:01:00 pve systemd[1]: Started Proxmox VE replication runner.
Mar 29 17:04:21 pve systemd-modules-load[1602]: Inserted module 'iscsi_tcp'
Mar 29 17:04:21 pve systemd-modules-load[1602]: Inserted module 'ib_iser'
Mar 29 17:04:21 pve systemd-modules-load[1602]: Inserted module 'vhost_net'
Mar 29 17:04:21 pve systemd[1]: Starting Flush Journal to Persistent Storage...
Mar 29 17:04:21 pve systemd[1]: Started Flush Journal to Persistent Storage.
Mar 29 17:04:21 pve systemd[1]: Started udev Coldplug all Devices.
Mar 29 17:04:21 pve systemd[1]: Starting udev Wait for Complete Device Initialization...
Mar 29 17:04:21 pve systemd[1]: Starting Helper to synchronize boot up for ifupdown...
Mar 29 17:04:21 pve systemd[1]: Started udev Kernel Device Manager.
Mar 29 17:04:21 pve systemd[1]: Started Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling.
Mar 29 17:04:21 pve systemd[1]: Reached target Local File Systems (Pre).
Mar 29 17:04:21 pve systemd-udevd[1667]: Using default interface naming scheme 'v240'.
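
Note that the log jumps straight from normal operation at 17:01 to early-boot messages at 17:04, with no shutdown sequence in between, so it looks like a hard reset rather than a clean reboot. Assuming persistent journaling is enabled, the tail of the previous boot's journal can be checked like this (a minimal sketch):

Code:
# list all recorded boots with their time ranges
journalctl --list-boots
# show the last 50 journal lines from the boot before the current one
journalctl -b -1 -n 50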

Do you know what the problem could be? Thanks.

Hardware:
CPU: AMD Ryzen 7 3700X 8-Core Processor
MB: Asus PRIME B450M-A
RAM: 32 GB
HDD: 2 TB, ZFS
 
hi,

* is the server clustered?

* is HA enabled?

* is it regularly once a month? maybe it's a cron job or systemd timer? (a few commands to check are sketched below)

* pveversion -v
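
to rule out a scheduled job, the usual places to look are the user crontabs, the system cron directories, and the systemd timers; a quick sketch:

Code:
# per-user and system crontabs
crontab -l
cat /etc/crontab
ls /etc/cron.d /etc/cron.daily /etc/cron.weekly /etc/cron.monthly
# active systemd timers
systemctl list-timers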
 
Hi,

Thanks for your reply.
No, it's not clustered.
Yes, HA is enabled.
No, it's not regular.

pveversion -v:
Code:
proxmox-ve: 6.3-1 (running kernel: 5.4.106-1-pve)
pve-manager: 6.3-6 (running version: 6.3-6/2184247e)
pve-kernel-5.4: 6.3-8
pve-kernel-helper: 6.3-8
pve-kernel-5.3: 6.1-6
pve-kernel-5.4.106-1-pve: 5.4.106-1
pve-kernel-5.4.78-2-pve: 5.4.78-2
pve-kernel-5.4.78-1-pve: 5.4.78-1
pve-kernel-5.3.18-3-pve: 5.3.18-3
pve-kernel-5.3.10-1-pve: 5.3.10-1
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.1.0-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: residual config
ifupdown2: 3.0.0-1+pve3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.20-pve1
libproxmox-acme-perl: 1.0.8
libproxmox-backup-qemu0: 1.0.3-1
libpve-access-control: 6.1-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.3-5
libpve-guest-common-perl: 3.1-5
libpve-http-server-perl: 3.1-1
libpve-storage-perl: 6.3-7
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.6-2
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.0.11-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.4-9
pve-cluster: 6.2-1
pve-container: 3.3-4
pve-docs: 6.3-1
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.2-2
pve-ha-manager: 3.1-1
pve-i18n: 2.2-2
pve-qemu-kvm: 5.2.0-4
pve-xtermjs: 4.7.0-3
qemu-server: 6.3-8
smartmontools: 7.2-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 2.0.4-pve1

My crontab has only one line: 0 4 * * * rsync -av --delete /var/lib/vz/dump/ /home/network/
 
No, it's not regular.
if it's not happening regularly, then we can rule out a cron job or timer



HA is enabled but you don't have a cluster? I'm not sure that's really the case
can you post the output from ha-manager status --verbose?
 
ha-manager status --verbose
Code:
quorum OK
master pve (active, Tue Mar 30 17:01:41 2021)
lrm pve (active, Tue Mar 30 17:01:38 2021)
service vm:100 (pve, started)
service vm:104 (pve, started)
service vm:103 (pve, ignored)
service vm:105 (pve, ignored)
full cluster state:
{
   "lrm_status" : {
      "pve" : {
         "mode" : "active",
         "results" : {
            "X3vJPWMZG9yQmKPTVHZTbg" : {
               "exit_code" : 0,
               "sid" : "vm:104",
               "state" : "started"
            },
            "uSkhQE6v+jLEHyIVZGxSXw" : {
               "exit_code" : 0,
               "sid" : "vm:100",
               "state" : "started"
            }
         },
         "state" : "active",
         "timestamp" : 1617116498
      }
   },
   "manager_status" : {
      "master_node" : "pve",
      "node_status" : {
         "pve" : "online"
      },
      "service_status" : {
         "vm:100" : {
            "node" : "pve",
            "running" : 1,
            "state" : "started",
            "uid" : "sj1AxEho+X111fPZSZKFhw"
         },
         "vm:104" : {
            "node" : "pve",
            "running" : 1,
            "state" : "started",
            "uid" : "0xSd9FuSUOVWy5yd58jYwQ"
         }
      },
      "timestamp" : 1617116501
   },
   "quorum" : {
      "node" : "pve",
      "quorate" : "1"
   }
}
 
without a cluster, setting up HA will only complicate things (your node could be getting fenced, which would explain the reboots)
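
if fencing is what happened, the watchdog should show up in the logs right before the reset. a sketch of what to check, and how to take the guests out of HA management (using the sids from your ha-manager output; removing a resource does not stop the VM):

Code:
# watchdog activity from the boot before the reset (needs persistent journal)
journalctl -b -1 -u watchdog-mux
# remove the guests from HA management; the VMs themselves keep running
ha-manager remove vm:100
ha-manager remove vm:104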

just to be sure, could you also check systemctl list-timers?
 
In my case, HA is useful when my Windows VM does an upgrade. If I don't activate HA, sometimes the machine doesn't reboot.

systemctl list-timers:

Code:
NEXT                          LEFT     LAST                          PASSED   UNIT                         ACTIVATES
Fri 2021-04-09 17:27:00 CEST  8s left  Fri 2021-04-09 17:26:00 CEST  51s ago  pvesr.timer                  pvesr.service
Sat 2021-04-10 00:00:00 CEST  6h left  Fri 2021-04-09 00:00:00 CEST  17h ago  logrotate.timer              logrotate.service
Sat 2021-04-10 00:00:00 CEST  6h left  Fri 2021-04-09 00:00:00 CEST  17h ago  man-db.timer                 man-db.service
Sat 2021-04-10 00:25:14 CEST  6h left  Fri 2021-04-09 07:13:46 CEST  10h ago  apt-daily.timer              apt-daily.service
Sat 2021-04-10 05:34:52 CEST  12h left Fri 2021-04-09 02:53:46 CEST  14h ago  pve-daily-update.timer       pve-daily-update.service
Sat 2021-04-10 06:41:21 CEST  13h left Fri 2021-04-09 06:02:00 CEST  11h ago  apt-daily-upgrade.timer      apt-daily-upgrade.service
Sat 2021-04-10 17:21:00 CEST  23h left Fri 2021-04-09 17:21:00 CEST  5min ago systemd-tmpfiles-clean.timer systemd-tmpfiles-clean.service

7 timers listed.
Pass --all to see loaded but inactive timers, too.
 
I found the problem, but not the solution.

It's the I/O: if the SSD is put under too much load, the server dies and reboots.
If I copy a large file from one VM to another, or if I run CrystalDiskMark, I'm sure the server will fail.
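
A possible mitigation (not a fix for the underlying crash, just a sketch) is to cap the disk bandwidth of the busy VMs and to limit the ZFS ARC so it doesn't compete with the guests for RAM. The disk and storage names below (scsi0, local-zfs, vm-100-disk-0) are placeholders and have to be adjusted to the actual setup:

Code:
# cap reads and writes on the VM disk to ~200 MB/s
qm set 100 -scsi0 local-zfs:vm-100-disk-0,mbps_rd=200,mbps_wr=200
# limit the ZFS ARC to 8 GiB (value in bytes), then rebuild the initramfs and reboot
echo "options zfs zfs_arc_max=8589934592" > /etc/modprobe.d/zfs.conf
update-initramfs -u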