Node with question mark

decibel83

Active Member
Oct 15, 2008
159
1
38
Hi,

suddenly one node in my Proxmox 5.1 cluster became unavailable in the web interface and its icon turned grey with a question mark, like in the following screenshot:

[Screenshot: Screen Shot 2018-02-05 at 09.18.32.png]

This has happened three times on three different nodes (I rebooted node01 and node02 to solve the problem).

All virtual machines on the node are running fine, and the web interface on the failed node is reachable, but it shows the same picture (if I connect to the node03 web interface, all nodes are green except node03 itself).

All nodes are correctly pingable from node03.

In the datacenter summary all 11 nodes are Online.

I don't see any error in /var/log/syslog on node03.

The call to the status API (https://node1:8006/api2/json/cluster/status) returns all nodes as online, but node03 has "level":null instead of "level":"":

Code:
{"data":[{"nodes":11,"id":"cluster","type":"cluster","name":"mycluster","version":11,"quorate":1},
{"nodeid":6,"id":"node/node10","local":0,"name":"node10","online":1,"ip":"192.168.60.10","level":"","type":"node"},
{"ip":"192.168.60.2","level":"","type":"node","nodeid":10,"id":"node/node02","local":0,"name":"node02","online":1},
{"ip":"192.168.60.11","level":"","type":"node","nodeid":2,"id":"node/node11","local":0,"name":"node11","online":1},
{"ip":"192.168.60.3","level":null,"type":"node","nodeid":7,"id":"node/node03","local":0,"name":"node03","online":1},
{"online":1,"nodeid":8,"id":"node/node01","name":"node01","local":1,"ip":"192.168.60.1","level":"","type":"node"},
{"name":"node09","local":0,"nodeid":11,"id":"node/node09","online":1,"type":"node","level":"","ip":"192.168.60.9"},
{"level":"","type":"node","ip":"192.168.60.5","id":"node/node05","nodeid":1,"local":0,"name":"node05","online":1},
{"ip":"192.168.60.6","level":"","type":"node","id":"node/node06","nodeid":3,"name":"node06","local":0,"online":1},
{"ip":"192.168.60.8","type":"node","level":"","online":1,"local":0,"name":"node08","id":"node/node08","nodeid":5},
{"level":"","type":"node","ip":"192.168.60.4","nodeid":9,"id":"node/node04","local":0,"name":"node04","online":1},
{"ip":"192.168.60.7","type":"node","level":"","name":"node07","local":0,"id":"node/node07","nodeid":4,"online":1}]}
This is the pvecm status output from node03:

Code:
root@node03:~# pvecm status
Quorum information
------------------
Date:             Mon Feb  5 10:09:54 2018
Quorum provider:  corosync_votequorum
Nodes:            11
Node ID:          0x00000007
Ring ID:          8/1256
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   11
Highest expected: 11
Total votes:      11
Quorum:           6
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000008          1 192.168.60.1
0x0000000a          1 192.168.60.2
0x00000007          1 192.168.60.3 (local)
0x00000009          1 192.168.60.4
0x00000001          1 192.168.60.5
0x00000003          1 192.168.60.6
0x00000004          1 192.168.60.7
0x00000005          1 192.168.60.8
0x0000000b          1 192.168.60.9
0x00000006          1 192.168.60.10
0x00000002          1 192.168.60.11
All nodes are updated to the latest version of Proxmox and the latest kernel (I updated all packages yesterday):

Code:
root@node03:~# pveversion -v
proxmox-ve: 5.1-38 (running kernel: 4.13.13-5-pve)
pve-manager: 5.1-43 (running version: 5.1-43/bdb08029)
pve-kernel-4.13.4-1-pve: 4.13.4-26
pve-kernel-4.13.13-2-pve: 4.13.13-33
pve-kernel-4.13.13-5-pve: 4.13.13-38
pve-kernel-4.13.13-3-pve: 4.13.13-34
libpve-http-server-perl: 2.0-8
lvm2: 2.02.168-pve6
corosync: 2.4.2-pve3
libqb0: 1.0.1-1
pve-cluster: 5.0-19
qemu-server: 5.0-20
pve-firmware: 2.0-3
libpve-common-perl: 5.0-25
libpve-guest-common-perl: 2.0-14
libpve-access-control: 5.0-7
libpve-storage-perl: 5.0-17
pve-libspice-server1: 0.12.8-3
vncterm: 1.5-3
pve-docs: 5.1-16
pve-qemu-kvm: 2.9.1-6
pve-container: 2.0-18
pve-firewall: 3.0-5
pve-ha-manager: 2.0-4
ksm-control-daemon: 1.2-2
glusterfs-client: 3.8.8-1
lxc-pve: 2.1.1-2
lxcfs: 2.0.8-1
criu: 2.11.1-1~bpo90
novnc-pve: 0.6-4
smartmontools: 6.5+svn4324-1
zfsutils-linux: 0.7.4-pve2~bpo9
Could you help me please?

Thanks!
 

masterdaweb

Member
Apr 17, 2017
78
3
13
27
Same problem here; I've just posted another thread about this.

I think this is caused by the latest updates.
 

fadmedi

New Member
Feb 17, 2018
2
0
1
32
Hello everyone,

I am on version 5.1 and have the same problem: it has happened 3 times since last month, on 2 different nodes. I have to reboot the node to rectify the issue, which is not a permanent resolution.
In the console, my cluster status is OK. The cluster is composed of 3 nodes.
1 node is not responding in the web interface, but its containers work fine.
Please help me, I would like to find a permanent resolution for this problem.

[Screenshots: upload_2018-2-17_17-43-36.png, upload_2018-2-17_17-42-26.png]
 

dcsapak

Proxmox Staff Member
Staff member
Feb 1, 2016
4,019
365
88
31
Vienna
You should check whether the pvestatd daemon is still running, and whether there is a storage that blocks (e.g. NFS). pvestatd is responsible for collecting and sending that status information across the cluster; if it hangs or crashes (most often because of an error with a storage), it stops sending that information.
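A minimal way to check this from the node's shell (assuming the standard systemd service names on PVE 5):

Code:
# is pvestatd still alive?
systemctl status pvestatd

# a hung or crashed daemon can usually be brought back with a restart
systemctl restart pvestatd

# processes stuck in uninterruptible sleep ("D" state) often point at a blocked storage
ps -eo pid,stat,comm | awk '$2 ~ /D/'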
 

lastb0isct

Member
Dec 29, 2015
61
0
6
34
I've been having this issue as well... whenever I initiate a backup, the system faults and is thrown into this state. There really isn't anything in the logs to go off of. Restarting services doesn't seem to fix the issue either. My backup is being sent to an NFS share, but it NEVER had issues like this before the 5.x versions of Proxmox.

pvestatd is still running and a restart doesn't solve the issue.
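One way to tell whether the share itself is hanging (the mount point below is only an example; substitute your backup storage's path):

Code:
# a blocked NFS mount makes this time out instead of listing the directory
timeout 5 ls /mnt/pve/backup && echo "share responsive" || echo "share blocked or slow"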
 

masterdaweb

Member
Apr 17, 2017
78
3
13
27
I've been having this issue as well... whenever I initiate a backup, the system faults and is thrown into this state. There really isn't anything in the logs to go off of. Restarting services doesn't seem to fix the issue either. My backup is being sent to an NFS share, but it NEVER had issues like this before the 5.x versions of Proxmox.

pvestatd is still running and a restart doesn't solve the issue.
Same here, every week.
 

fadmedi

New Member
Feb 17, 2018
2
0
1
32
While waiting for an update from the Proxmox team, I have decided to downgrade 1 node to version 5.0. I have not had the issue for 4 days now.
 

masterdaweb

Member
Apr 17, 2017
78
3
13
27
While waiting for an update from the Proxmox team, I have decided to downgrade 1 node to version 5.0. I have not had the issue for 4 days now.
In my case everything runs fine for 2-3 weeks, and then it happens.

I'm investigating whether it could be caused by using Unicast instead of Multicast.

Are you using Unicast too?
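For anyone who wants to check: the transport is visible in the corosync config; no transport line means the multicast default on corosync 2.x:

Code:
grep -i transport /etc/pve/corosync.conf
# "transport: udpu" => unicast; no match => multicast (the default)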
 

lastb0isct

Member
Dec 29, 2015
61
0
6
34
I'm having issues with this constantly now, not just when backing up. I'm really not able to find any explanation as to why this is happening. Even simple power-offs of CTs cause this to happen now. Is there anyone on the Proxmox team who would be able to help us with this?!
 

tom

Proxmox Staff Member
Staff member
Aug 29, 2006
13,816
447
103
Without knowing your setup, I assume the issue is somewhere in your cluster network. Most issues arise when the storage and cluster networks are not separated and/or the network does not meet the requirements regarding latency and reliability, or when the storage is overloaded.

Do you have a separate cluster network? Test with omping whether it works reliably.
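A quick test along the lines of the reference documentation (the hostnames are placeholders; run the same command on all listed nodes at the same time):

Code:
omping -c 10000 -i 0.001 -F -q node1 node2 node3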

We can deeply analyse your setup by logging into your cluster via SSH, please contact our enterprise support team (subscription needed).
 

venk25

New Member
Feb 5, 2018
6
0
1
46
I ran into this grey question mark situation yesterday on PVE 5.1-43. Single-node setup (installed PVE fresh 2 weeks ago) with all default options; no cluster. Rebooting the node fixed the issue.

After reboot, I applied all updates - now at 5.1-46. Let’s see if this happens again.
 

Jospeh Huber

Member
Apr 18, 2016
76
3
8
40
On 5.1-46, this has now happened for the second time this week, on the same node in my cluster.
When I restart pvestatd on the node, the KVMs become visible again.
Some of the LXC containers are running and some are dead.
"pct list" hangs...
 

masterdaweb

Member
Apr 17, 2017
78
3
13
27
It's happening every week for me too. I have 12 nodes, and when it happens, I have to stop all Proxmox services on every node:

Code:
service pve-cluster stop
service corosync stop
service pvestatd stop
service pveproxy stop
service pvedaemon stop

and then

Code:
service pve-cluster start
service corosync start
service pvestatd start
service pveproxy start
service pvedaemon start
 

Jospeh Huber

Member
Apr 18, 2016
76
3
8
40
It's happening every week for me too. I have 12 nodes, and when it happens, I have to stop all Proxmox services on every node:

Code:
service pve-cluster stop
service corosync stop
service pvestatd stop
service pveproxy stop
service pvedaemon stop

and then

Code:
service pve-cluster start
service corosync start
service pvestatd start
service pveproxy start
service pvedaemon start
Thanks for the hint, I will also try the "solution" from here... we don't have ZFS:
https://forum.proxmox.com/threads/proxmox-ve-5-1-zfs-kernel-tainted-pvestatd-frozen.38408/#post-189727
 

Jospeh Huber

Member
Apr 18, 2016
76
3
8
40
Unfortunately, neither solution works for me.
Every day one node crashes... unusable.
@tom Is there an update planned for this issue?
 

Vasu Sreekumar

Active Member
Mar 3, 2018
123
34
28
50
St Louis MO USA
Yes, I also face the same issue.

I have been having sleepless nights for the last week.

Proxmox is a nightmare now; every day 2 or 3 nodes crash for me. I have 25 nodes with LXC.

No reply from Proxmox so far.
 

Kaijia Feng

New Member
Mar 8, 2017
5
0
1
26
Same issue here on a 16-node LXC cluster. It only began with a recent update and reboot, so I also suspect this is an issue with the kernel. But I also notice the issue persists for an hour or two every time, then everything goes back to normal. So instead of rebooting every node, I just wait (of course, this is not a solution for a hosting provider).
 
