Question marks on all nodes but 1 in Proxmox VE 6

Mor H.

New Member
Jun 3, 2019
Hi everyone,

We recently installed a cluster with Proxmox VE 6 and restored hundreds of VPSs onto it.
The cluster has 6 virtualization nodes and 4 storage nodes.

It was running fine for about 20 hours with no issues at all; every node was showing a green checkmark.

All of a sudden, two of the nodes showed a red X.
We restarted corosync on those two nodes, and about 5 minutes later all nodes except one were showing question marks.
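
For reference, this is roughly what we ran on each of the two affected nodes (as root; the exact commands and order are from memory):
Code:
# restart cluster communication on the affected node
systemctl restart corosync
# then check that corosync and the cluster filesystem (pve-cluster) came back up
systemctl status corosync pve-cluster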

The virtual machines themselves seem to be intact and online, and the cluster seems fine: it is quorate (Quorate = yes) and all 10 nodes show as online.
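
This is how we are checking the cluster state (output omitted here; it reports Quorate: Yes and lists all 10 nodes as members):
Code:
root@hyp08:~# pvecm status
root@hyp08:~# pvecm nodes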

We can't see anything out of the ordinary in syslog; we did, however, see this:
Code:
[ 1489.040938] HTB: quantum of class 10001 is big. Consider r2q change
[ 4049.114183] perf: interrupt took too long (2509 > 2500), lowering kernel.perf_event_max_sample_rate to 79500
[ 4808.865178] perf: interrupt took too long (3143 > 3136), lowering kernel.perf_event_max_sample_rate to 63500
[ 5760.579921] perf: interrupt took too long (3940 > 3928), lowering kernel.perf_event_max_sample_rate to 50750
[ 7373.209122] perf: interrupt took too long (4949 > 4925), lowering kernel.perf_event_max_sample_rate to 40250
[ 8590.353579] INFO: NMI handler (ghes_notify_nmi) took too long to run: 1.538 msecs
[10160.077215] perf: interrupt took too long (6205 > 6186), lowering kernel.perf_event_max_sample_rate to 32000
[15252.770846] INFO: NMI handler (ghes_notify_nmi) took too long to run: 1.670 msecs
[15726.728383] perf: interrupt took too long (7766 > 7756), lowering kernel.perf_event_max_sample_rate to 25750
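
Besides dmesg, this is roughly how we have been searching the logs for corosync/pmxcfs/pvestatd messages around the time the question marks appeared (the time window below is just an example):
Code:
journalctl -u corosync -u pve-cluster -u pvestatd --since "2019-08-09 21:00"
grep -iE 'corosync|pmxcfs|pvestatd' /var/log/syslog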

Any clue what we should do to troubleshoot this?

Any and all help will be greatly appreciated.
 
Also, we're seeing this:
Code:
root@hyp08:~# systemctl status pvestatd
● pvestatd.service - PVE Status Daemon
  Loaded: loaded (/lib/systemd/system/pvestatd.service; enabled; vendor preset: enabled)
  Active: active (running) since Thu 2019-08-08 18:31:19 CEST; 1 day 3h ago
Main PID: 2217 (pvestatd)
   Tasks: 1 (limit: 13516)
  Memory: 196.8M
  CGroup: /system.slice/pvestatd.service
           └─2217 pvestatd

Aug 09 21:34:15 hyp08 pvestatd[2217]: could not activate storage 'local-zfs', zfs error: cannot open 'rpool': no such pool
Aug 09 21:34:25 hyp08 pvestatd[2217]: zfs error: cannot open 'rpool': no such pool
Aug 09 21:34:25 hyp08 pvestatd[2217]: zfs error: cannot open 'rpool': no such pool
Aug 09 21:34:25 hyp08 pvestatd[2217]: could not activate storage 'local-zfs', zfs error: cannot open 'rpool': no such pool
Aug 09 21:34:35 hyp08 pvestatd[2217]: zfs error: cannot open 'rpool': no such pool
Aug 09 21:34:35 hyp08 pvestatd[2217]: zfs error: cannot open 'rpool': no such pool
Aug 09 21:34:35 hyp08 pvestatd[2217]: could not activate storage 'local-zfs', zfs error: cannot open 'rpool': no such pool
Aug 09 21:34:45 hyp08 pvestatd[2217]: zfs error: cannot open 'rpool': no such pool
Aug 09 21:34:45 hyp08 pvestatd[2217]: zfs error: cannot open 'rpool': no such pool
Aug 09 21:34:45 hyp08 pvestatd[2217]: could not activate storage 'local-zfs', zfs error: cannot open 'rpool': no such pool

But we do not use ZFS storage at all, so why is it throwing that error? How can we fix this?
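
If it matters, this is what we are planning to look at next. We are guessing that a 'local-zfs' entry is defined cluster-wide in /etc/pve/storage.cfg even though these nodes have no rpool, and that restricting or removing it would stop the errors; the pvesm commands below are our assumption of the right fix, please correct us if not:
Code:
# see which storages are defined cluster-wide and whether they are active
cat /etc/pve/storage.cfg
pvesm status
# restrict 'local-zfs' to the nodes that actually have an rpool ...
pvesm set local-zfs --nodes <nodes-with-rpool>
# ... or drop the storage definition entirely if nothing uses it
pvesm remove local-zfs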
 
