Question marks on all nodes but 1 in Proxmox VE 6

Mor H.

New Member
Jun 3, 2019
Hi everyone,

We recently installed a cluster with Proxmox VE 6 and restored hundreds of VPSs onto it.
The cluster has 6 virtualization nodes and 4 storage nodes.

It was running fine for about 20 hours with no issues at all; every node was showing a green checkmark.

All of a sudden, two of the nodes showed a red X.
We restarted corosync on those two nodes, and about 5 minutes later all nodes except one were showing question marks.
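
For reference, this is roughly what we ran on each of the two affected nodes (as root; the exact commands and order are from memory):
Code:
# restart cluster communication on the affected node
systemctl restart corosync
# then check that corosync and the cluster filesystem (pve-cluster) came back up
systemctl status corosync pve-cluster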

The virtual machines themselves seem to be intact and online, and the cluster seems fine: it is quorate (Quorate = yes) and all 10 nodes show as online.
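
This is how we are checking the cluster state (output omitted here; it reports Quorate: Yes and lists all 10 nodes as members):
Code:
root@hyp08:~# pvecm status
root@hyp08:~# pvecm nodes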

We can't see anything out of the ordinary in syslog; we did, however, see this:
Code:
[ 1489.040938] HTB: quantum of class 10001 is big. Consider r2q change
[ 4049.114183] perf: interrupt took too long (2509 > 2500), lowering kernel.perf_event_max_sample_rate to 79500
[ 4808.865178] perf: interrupt took too long (3143 > 3136), lowering kernel.perf_event_max_sample_rate to 63500
[ 5760.579921] perf: interrupt took too long (3940 > 3928), lowering kernel.perf_event_max_sample_rate to 50750
[ 7373.209122] perf: interrupt took too long (4949 > 4925), lowering kernel.perf_event_max_sample_rate to 40250
[ 8590.353579] INFO: NMI handler (ghes_notify_nmi) took too long to run: 1.538 msecs
[10160.077215] perf: interrupt took too long (6205 > 6186), lowering kernel.perf_event_max_sample_rate to 32000
[15252.770846] INFO: NMI handler (ghes_notify_nmi) took too long to run: 1.670 msecs
[15726.728383] perf: interrupt took too long (7766 > 7756), lowering kernel.perf_event_max_sample_rate to 25750
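
Besides dmesg, this is roughly how we have been searching the logs for corosync/pmxcfs/pvestatd messages around the time the question marks appeared (the time window below is just an example):
Code:
journalctl -u corosync -u pve-cluster -u pvestatd --since "2019-08-09 21:00"
grep -iE 'corosync|pmxcfs|pvestatd' /var/log/syslog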

Any clue what we should do to troubleshoot this?

Any and all help will be greatly appreciated.
 
Also, we're seeing this:
Code:
root@hyp08:~# systemctl status pvestatd
● pvestatd.service - PVE Status Daemon
  Loaded: loaded (/lib/systemd/system/pvestatd.service; enabled; vendor preset: enabled)
  Active: active (running) since Thu 2019-08-08 18:31:19 CEST; 1 day 3h ago
Main PID: 2217 (pvestatd)
   Tasks: 1 (limit: 13516)
  Memory: 196.8M
  CGroup: /system.slice/pvestatd.service
           └─2217 pvestatd

Aug 09 21:34:15 hyp08 pvestatd[2217]: could not activate storage 'local-zfs', zfs error: cannot open 'rpool': no such pool
Aug 09 21:34:25 hyp08 pvestatd[2217]: zfs error: cannot open 'rpool': no such pool
Aug 09 21:34:25 hyp08 pvestatd[2217]: zfs error: cannot open 'rpool': no such pool
Aug 09 21:34:25 hyp08 pvestatd[2217]: could not activate storage 'local-zfs', zfs error: cannot open 'rpool': no such pool
Aug 09 21:34:35 hyp08 pvestatd[2217]: zfs error: cannot open 'rpool': no such pool
Aug 09 21:34:35 hyp08 pvestatd[2217]: zfs error: cannot open 'rpool': no such pool
Aug 09 21:34:35 hyp08 pvestatd[2217]: could not activate storage 'local-zfs', zfs error: cannot open 'rpool': no such pool
Aug 09 21:34:45 hyp08 pvestatd[2217]: zfs error: cannot open 'rpool': no such pool
Aug 09 21:34:45 hyp08 pvestatd[2217]: zfs error: cannot open 'rpool': no such pool
Aug 09 21:34:45 hyp08 pvestatd[2217]: could not activate storage 'local-zfs', zfs error: cannot open 'rpool': no such pool

But we do not use ZFS storage at all, so why is it throwing that error? How can we fix this?
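
If it matters, this is what we are planning to look at next. We are guessing that a 'local-zfs' entry is defined cluster-wide in /etc/pve/storage.cfg even though these nodes have no rpool, and that restricting or removing it would stop the errors; the pvesm commands below are our assumption of the right fix, please correct us if not:
Code:
# see which storages are defined cluster-wide and whether they are active
cat /etc/pve/storage.cfg
pvesm status
# restrict 'local-zfs' to the nodes that actually have an rpool ...
pvesm set local-zfs --nodes <nodes-with-rpool>
# ... or drop the storage definition entirely if nothing uses it
pvesm remove local-zfs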
 
