Node with question mark

Vasu Sreekumar

Active Member
Mar 3, 2018
St Louis MO USA
There is another way.

Migrate the guest to another node in the cluster and start it there. The target node must be one that has never had the same issue; otherwise it will not work.

Not a good way, but it works as a temporary solution. I have 5 nodes in each cluster, so I move guests around.
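The rule above (only move to a node that has never shown the problem) can be written down as a small selection helper. This is a minimal sketch, not a Proxmox tool: the function name `pick_target` and the `known_bad` set are hypothetical, and the admin has to supply which nodes have already had the issue. The node names are the ones from this thread.

```python
# Hypothetical helper: choose a migration target for a guest, skipping the
# source node, offline nodes, and any node that has already shown the issue.
from typing import Iterable, Optional


def pick_target(nodes: dict[str, str], source: str,
                known_bad: Iterable[str]) -> Optional[str]:
    """nodes maps node name -> status ("online" or anything else).

    Returns the first eligible node name in sorted order, or None.
    """
    bad = set(known_bad)
    for name in sorted(nodes):
        if name == source or name in bad:
            continue  # never migrate back onto the source or a node that had the issue
        if nodes[name] == "online":
            return name
    return None


if __name__ == "__main__":
    cluster = {"S034": "online", "S048": "online", "S070": "online", "S074": "online"}
    target = pick_target(cluster, source="S034", known_bad={"S048"})
    # The actual move would then be done with e.g. "qm migrate <vmid> <target>".
    print(target)
```

Sorting the names just makes the choice deterministic; a real setup might prefer the node with the most free memory instead.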
 

Jospeh Huber

Member
Apr 18, 2016
Migration was no solution for us; the problem "migrated" to the other nodes as well...

The cause of our "nodes with question mark" was a nightly offline backup of some LXC containers in stop/start mode.

Our workaround:
We switched the backup mode to "snapshot" and have had no freezes since ;)
Hint: the workaround is only valid if the storage supports snapshots...
P.S. Never restart a container...
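The workaround amounts to changing the vzdump backup mode from stop to snapshot. A minimal sketch of that decision, assuming the caller already knows whether the backing storage supports snapshots (that check is not shown). The helper name `vzdump_cmd` and VMID 101 are hypothetical; `--mode stop|suspend|snapshot` is the real vzdump option.

```python
# Sketch: build a vzdump command line that uses snapshot mode when the
# backing storage supports snapshots, and otherwise keeps the original
# stop mode. Snapshot support is passed in by the caller, not detected.
def vzdump_cmd(vmid: int, storage_supports_snapshots: bool) -> list[str]:
    mode = "snapshot" if storage_supports_snapshots else "stop"
    return ["vzdump", str(vmid), "--mode", mode]


if __name__ == "__main__":
    print(vzdump_cmd(101, True))   # snapshot mode: no stop/start cycle of the guest
    print(vzdump_cmd(101, False))  # no snapshot support: falls back to stop mode
```

The resulting list can be passed to `subprocess.run` on a node, or the same `--mode` value can be set in the scheduled backup job.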
 

kroem

Member
Jul 12, 2016
I'm having the same problem. Upgrading the cluster now to see if it helps.
 

Vasu Sreekumar

I also started facing the same issue with kernel 4.15.10 and KVM guests.

I have 4 nodes; only one shows green, the other three are grey.

But all nodes and guests respond to ping fine.
 

dcsapak

Proxmox Staff Member
Feb 1, 2016
Vienna
What is the output of 'pvesh get /cluster/resources'?
 

Vasu Sreekumar

root@S034:~# pvesh get /cluster/resources
200 OK
[
{
"cpu" : 0.0297653311349122,
"disk" : 0,
"diskread" : 9547070464,
"diskwrite" : 31136726528,
"id" : "qemu/204",
"maxcpu" : 8,
"maxdisk" : 53687091200,
"maxmem" : 8589934592,
"mem" : 1270067200,
"name" : "250",
"netin" : 331129685373,
"netout" : 366792946,
"node" : "S070",
"status" : "running",
"template" : 0,
"type" : "qemu",
"uptime" : 1407401,
"vmid" : 204
},
{
"cpu" : 0.0318802305705756,
"disk" : 0,
"diskread" : 25530366464,
"diskwrite" : 21063669760,
"id" : "qemu/294",
"maxcpu" : 8,
"maxdisk" : 53687091200,
"maxmem" : 8589934592,
"mem" : 2142736384,
"name" : "S984",
"netin" : 66459327158,
"netout" : 96076890,
"node" : "S048",
"status" : "running",
"template" : 0,
"type" : "qemu",
"uptime" : 236272,
"vmid" : 294
},
{
"cpu" : 0,
"disk" : 0,
"diskread" : 0,
"diskwrite" : 0,
"id" : "qemu/185",
"maxcpu" : 8,
"maxdisk" : 53687091200,
"maxmem" : 8589934592,
"mem" : 0,
"name" : "S582",
"netin" : 0,
"netout" : 0,
"node" : "S048",
"status" : "stopped",
"template" : 0,
"type" : "qemu",
"uptime" : 0,
"vmid" : 185
},
{
"cpu" : 0,
"disk" : 0,
"diskread" : 0,
"diskwrite" : 0,
"id" : "qemu/186",
"maxcpu" : 8,
"maxdisk" : 53687091200,
"maxmem" : 8589934592,
"mem" : 0,
"name" : "S628",
"netin" : 0,
"netout" : 0,
"node" : "S048",
"status" : "stopped",
"template" : 0,
"type" : "qemu",
"uptime" : 0,
"vmid" : 186
},
{
"cpu" : 0,
"disk" : 0,
"diskread" : 0,
"diskwrite" : 0,
"id" : "qemu/251",
"maxcpu" : 8,
"maxdisk" : 53687091200,
"maxmem" : 8589934592,
"mem" : 0,
"name" : "250",
"netin" : 0,
"netout" : 0,
"node" : "S034",
"status" : "stopped",
"template" : 1,
"type" : "qemu",
"uptime" : 0,
"vmid" : 251
},
{
"cpu" : 0.0249752746229761,
"disk" : 0,
"diskread" : 25065144320,
"diskwrite" : 18865004032,
"id" : "qemu/177",
"maxcpu" : 8,
"maxdisk" : 53687091200,
"maxmem" : 8589934592,
"mem" : 1593655296,
"name" : "S984",
"netin" : 38174364094,
"netout" : 41832197,
"node" : "S074",
"status" : "running",
"template" : 0,
"type" : "qemu",
"uptime" : 147775,
"vmid" : 177
},
{
"cpu" : 0,
"disk" : 0,
"diskread" : 0,
"diskwrite" : 0,
"id" : "qemu/252",
"maxcpu" : 8,
"maxdisk" : 53687091200,
"maxmem" : 8589934592,
"mem" : 0,
"name" : "250",
"netin" : 0,
"netout" : 0,
"node" : "S070",
"status" : "stopped",
"template" : 1,
"type" : "qemu",
"uptime" : 0,
"vmid" : 252
},
{
"cpu" : 0.0443813920325582,
"disk" : 0,
"diskread" : 19203254784,
"diskwrite" : 59236661248,
"id" : "qemu/201",
"maxcpu" : 8,
"maxdisk" : 53687091200,
"maxmem" : 8589934592,
"mem" : 2363322368,
"name" : "S142.datasoft.ws",
"netin" : 377437060060,
"netout" : 1123426692,
"node" : "S034",
"status" : "running",
"template" : 0,
"type" : "qemu",
"uptime" : 1657180,
"vmid" : 201
},
{
"cpu" : 0.337669773123858,
"disk" : 0,
"diskread" : 4564233728,
"diskwrite" : 2287325696,
"id" : "qemu/295",
"maxcpu" : 8,
"maxdisk" : 53687091200,
"maxmem" : 8589934592,
"mem" : 2390536192,
"name" : "X058.datasoft.ws",
"netin" : 353928892,
"netout" : 19262729,
"node" : "S070",
"status" : "running",
"template" : 0,
"type" : "qemu",
"uptime" : 1231,
"vmid" : 295
},
{
"cpu" : 0.0308204111337209,
"disk" : 0,
"diskread" : 3461092864,
"diskwrite" : 2121150976,
"id" : "qemu/203",
"maxcpu" : 8,
"maxdisk" : 107374182400,
"maxmem" : 8589934592,
"mem" : 1117327360,
"name" : "U110.datasoft.ws",
"netin" : 13659200225,
"netout" : 19041087,
"node" : "S034",
"status" : "running",
"template" : 0,
"type" : "qemu",
"uptime" : 56080,
"vmid" : 203
},
{
"cpu" : 0,
"disk" : 0,
"diskread" : 0,
"diskwrite" : 0,
"id" : "qemu/250",
"maxcpu" : 8,
"maxdisk" : 53687091200,
"maxmem" : 8589934592,
"mem" : 0,
"name" : "250",
"netin" : 0,
"netout" : 0,
"node" : "S048",
"status" : "stopped",
"template" : 1,
"type" : "qemu",
"uptime" : 0,
"vmid" : 250
},
{
"cpu" : 0,
"disk" : 0,
"diskread" : 0,
"diskwrite" : 0,
"id" : "qemu/182",
"maxcpu" : 8,
"maxdisk" : 53687091200,
"maxmem" : 8589934592,
"mem" : 0,
"name" : "S485",
"netin" : 0,
"netout" : 0,
"node" : "S048",
"status" : "stopped",
"template" : 0,
"type" : "qemu",
"uptime" : 0,
"vmid" : 182
},
{
"cpu" : 0,
"disk" : 0,
"diskread" : 0,
"diskwrite" : 0,
"id" : "qemu/184",
"maxcpu" : 8,
"maxdisk" : 53687091200,
"maxmem" : 8589934592,
"mem" : 0,
"name" : "S894",
"netin" : 0,
"netout" : 0,
"node" : "S048",
"status" : "stopped",
"template" : 0,
"type" : "qemu",
"uptime" : 0,
"vmid" : 184
},
{
"cpu" : 0.0951464245710976,
"disk" : 0,
"diskread" : 14175080448,
"diskwrite" : 13538449920,
"id" : "qemu/274",
"maxcpu" : 8,
"maxdisk" : 53687091200,
"maxmem" : 8589934592,
"mem" : 2401140736,
"name" : "P245.datasoft.ws",
"netin" : 140886100371,
"netout" : 434310726,
"node" : "S048",
"status" : "running",
"template" : 0,
"type" : "qemu",
"uptime" : 503630,
"vmid" : 274
},
{
"cpu" : 0,
"disk" : 0,
"diskread" : 0,
"diskwrite" : 0,
"id" : "qemu/180",
"maxcpu" : 8,
"maxdisk" : 53687091200,
"maxmem" : 8589934592,
"mem" : 0,
"name" : "S748",
"netin" : 0,
"netout" : 0,
"node" : "S048",
"status" : "stopped",
"template" : 0,
"type" : "qemu",
"uptime" : 0,
"vmid" : 180
},
{
"cpu" : 0,
"disk" : 0,
"diskread" : 0,
"diskwrite" : 0,
"id" : "qemu/253",
"maxcpu" : 8,
"maxdisk" : 53687091200,
"maxmem" : 8589934592,
"mem" : 0,
"name" : "250",
"netin" : 0,
"netout" : 0,
"node" : "S074",
"status" : "stopped",
"template" : 1,
"type" : "qemu",
"uptime" : 0,
"vmid" : 253
},
{
"cpu" : 0,
"disk" : 0,
"diskread" : 0,
"diskwrite" : 0,
"id" : "qemu/181",
"maxcpu" : 8,
"maxdisk" : 53687091200,
"maxmem" : 8589934592,
"mem" : 0,
"name" : "S389",
"netin" : 0,
"netout" : 0,
"node" : "S048",
"status" : "stopped",
"template" : 0,
"type" : "qemu",
"uptime" : 0,
"vmid" : 181
},
{
"cpu" : 0.0420527387468992,
"disk" : 0,
"diskread" : 10289083904,
"diskwrite" : 24731916288,
"id" : "qemu/202",
"maxcpu" : 8,
"maxdisk" : 53687091200,
"maxmem" : 12884901888,
"mem" : 2014048256,
"name" : "S963",
"netin" : 126177079232,
"netout" : 264081588,
"node" : "S034",
"status" : "running",
"template" : 0,
"type" : "qemu",
"uptime" : 415840,
"vmid" : 202
},
{
"cpu" : 0,
"disk" : 0,
"diskread" : 0,
"diskwrite" : 0,
"id" : "qemu/179",
"maxcpu" : 8,
"maxdisk" : 53687091200,
"maxmem" : 8589934592,
"mem" : 0,
"name" : "S784",
"netin" : 0,
"netout" : 0,
"node" : "S048",
"status" : "stopped",
"template" : 0,
"type" : "qemu",
"uptime" : 0,
"vmid" : 179
},
{
"cpu" : 0,
"disk" : 0,
"diskread" : 0,
"diskwrite" : 0,
"id" : "qemu/183",
"maxcpu" : 8,
"maxdisk" : 53687091200,
"maxmem" : 8589934592,
"mem" : 0,
"name" : "S748",
"netin" : 0,
"netout" : 0,
"node" : "S048",
"status" : "stopped",
"template" : 0,
"type" : "qemu",
"uptime" : 0,
"vmid" : 183
},
{
"cpu" : 0.18999071218876,
"disk" : 0,
"diskread" : 47669778432,
"diskwrite" : 73533008896,
"id" : "qemu/261",
"maxcpu" : 8,
"maxdisk" : 53687091200,
"maxmem" : 8589934592,
"mem" : 2913050624,
"name" : "V131.datasoft.ws",
"netin" : 195453681545,
"netout" : 693492590,
"node" : "S034",
"status" : "running",
"template" : 0,
"type" : "qemu",
"uptime" : 738122,
"vmid" : 261
},
{
"cpu" : 0.0824615562046685,
"disk" : 110436286464,
"id" : "node/S048",
"level" : "",
"maxcpu" : 24,
"maxdisk" : 1921430585344,
"maxmem" : 67541700608,
"mem" : 24357752832,
"node" : "S048",
"status" : "online",
"type" : "node",
"uptime" : 3524600
},
{
"cpu" : 0.12062177031573,
"disk" : 55469277184,
"id" : "node/S070",
"level" : "",
"maxcpu" : 24,
"maxdisk" : 1921453260800,
"maxmem" : 50629959680,
"mem" : 23707160576,
"node" : "S070",
"status" : "online",
"type" : "node",
"uptime" : 1826219
},
{
"cpu" : 0.0150834663624653,
"disk" : 33034076160,
"id" : "node/S074",
"level" : "",
"maxcpu" : 24,
"maxdisk" : 1921452736512,
"maxmem" : 50629967872,
"mem" : 15279632384,
"node" : "S074",
"status" : "online",
"type" : "node",
"uptime" : 1442498
},
{
"cpu" : 0.10414983044466,
"disk" : 135120945152,
"id" : "node/S034",
"level" : "",
"maxcpu" : 24,
"maxdisk" : 1850711736320,
"maxmem" : 67541381120,
"mem" : 30377394176,
"node" : "S034",
"status" : "online",
"type" : "node",
"uptime" : 1839311
},
{
"disk" : 98304,
"id" : "storage/S048/local-zfs",
"maxdisk" : 1810994483200,
"node" : "S048",
"status" : "available",
"storage" : "local-zfs",
"type" : "storage"
},
{
"disk" : 98304,
"id" : "storage/S070/local-zfs",
"maxdisk" : 1865984126976,
"node" : "S070",
"status" : "available",
"storage" : "local-zfs",
"type" : "storage"
},
{
"disk" : 98304,
"id" : "storage/S074/local-zfs",
"maxdisk" : 1888418775040,
"node" : "S074",
"status" : "available",
"storage" : "local-zfs",
"type" : "storage"
},
{
"disk" : 70697439232,
"id" : "storage/S034/local-zfs",
"maxdisk" : 1786288304128,
"node" : "S034",
"status" : "available",
"storage" : "local-zfs",
"type" : "storage"
},
{
"disk" : 110436286464,
"id" : "storage/S048/local2",
"maxdisk" : 1921430585344,
"node" : "S048",
"status" : "available",
"storage" : "local2",
"type" : "storage"
},
{
"disk" : 55469277184,
"id" : "storage/S070/local2",
"maxdisk" : 1921453260800,
"node" : "S070",
"status" : "available",
"storage" : "local2",
"type" : "storage"
},
{
"disk" : 33034076160,
"id" : "storage/S074/local2",
"maxdisk" : 1921452736512,
"node" : "S074",
"status" : "available",
"storage" : "local2",
"type" : "storage"
},
{
"disk" : 135120945152,
"id" : "storage/S034/local2",
"maxdisk" : 1850711736320,
"node" : "S034",
"status" : "available",
"storage" : "local2",
"type" : "storage"
},
{
"disk" : 110436286464,
"id" : "storage/S048/local",
"maxdisk" : 1921430585344,
"node" : "S048",
"status" : "available",
"storage" : "local",
"type" : "storage"
},
{
"disk" : 55469277184,
"id" : "storage/S070/local",
"maxdisk" : 1921453260800,
"node" : "S070",
"status" : "available",
"storage" : "local",
"type" : "storage"
},
{
"disk" : 33034076160,
"id" : "storage/S074/local",
"maxdisk" : 1921452736512,
"node" : "S074",
"status" : "available",
"storage" : "local",
"type" : "storage"
},
{
"disk" : 135120945152,
"id" : "storage/S034/local",
"maxdisk" : 1850711736320,
"node" : "S034",
"status" : "available",
"storage" : "local",
"type" : "storage"
},
{
"disk" : 419865346048,
"id" : "storage/S048/xtemplates",
"maxdisk" : 495058247680,
"node" : "S048",
"status" : "available",
"storage" : "xtemplates",
"type" : "storage"
},
{
"disk" : 113438121984,
"id" : "storage/S070/xtemplates",
"maxdisk" : 482630778880,
"node" : "S070",
"status" : "available",
"storage" : "xtemplates",
"type" : "storage"
},
{
"disk" : 76128968704,
"id" : "storage/S074/xtemplates",
"maxdisk" : 482637598720,
"node" : "S074",
"status" : "available",
"storage" : "xtemplates",
"type" : "storage"
},
{
"disk" : 148937924608,
"id" : "storage/S034/xtemplates",
"maxdisk" : 495023693824,
"node" : "S034",
"status" : "available",
"storage" : "xtemplates",
"type" : "storage"
}
]
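To compare what the API reports with what the GUI shows (in the output above all four nodes are "online" even though three are grey in the GUI), the JSON can be summarized per node. A minimal sketch; the function name `summarize` is hypothetical. It assumes the output has been captured as a string and skips the "200 OK" status line that this pvesh version prints before the JSON array.

```python
import json
from collections import Counter


def summarize(raw: str) -> tuple[dict[str, str], Counter]:
    """Return (node -> status, node -> number of running guests).

    Everything before the first "[" (e.g. a "200 OK" line) is skipped.
    """
    resources = json.loads(raw[raw.index("["):])
    nodes = {r["node"]: r["status"] for r in resources if r["type"] == "node"}
    running = Counter(
        r["node"] for r in resources
        if r["type"] in ("qemu", "lxc") and r.get("status") == "running"
    )
    return nodes, running


if __name__ == "__main__":
    sample = """200 OK
[
{"id": "node/S034", "node": "S034", "status": "online", "type": "node"},
{"id": "qemu/201", "node": "S034", "status": "running", "type": "qemu", "vmid": 201},
{"id": "qemu/251", "node": "S034", "status": "stopped", "type": "qemu", "vmid": 251}
]"""
    nodes, running = summarize(sample)
    for name, status in sorted(nodes.items()):
        print(f"{name}: {status}, {running[name]} running guest(s)")
```

Storage entries are ignored here; they could be filtered the same way on `"type" == "storage"` if a missing or unavailable storage is suspected.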
 

dcsapak

Can you also post a screenshot, and open the JavaScript console to check whether there are any errors?
 

Vasu Sreekumar

There were no error messages.

Running 'service pve-cluster restart' cleared the grey status.

No reboot was required.

All 4 nodes are on the same Cisco 1000 Mbps switch.
 

re-host.eu

New Member
Oct 3, 2017
Thank you; however, pvedaemon doesn't restart. There is no error; the command "service pvedaemon restart" has now been hanging for at least 5 minutes.

EDIT:

The following error message shows in systemctl: start failed - unable to create socket - Address already in use
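"Address already in use" (EADDRINUSE) means some process still holds the listening socket the daemon wants, most likely a leftover pvedaemon from the hung restart. pvedaemon usually listens on 127.0.0.1:85; verify that on your node, for example with `ss -tlnp`. The sketch below only reproduces the error with a stale socket on an ephemeral port; the helper name `port_in_use` is hypothetical.

```python
import errno
import socket


def port_in_use(host: str, port: int) -> bool:
    """Return True if binding host:port fails with EADDRINUSE.

    Other errors (e.g. EACCES when binding a port below 1024 as non-root)
    are re-raised rather than reported as "in use".
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return True
            raise
    return False


if __name__ == "__main__":
    # Simulate a stale daemon: hold a listening socket on an ephemeral port.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as stale:
        stale.bind(("127.0.0.1", 0))
        stale.listen()
        port = stale.getsockname()[1]
        # A second bind to the same address fails, like the restarted daemon.
        print(port_in_use("127.0.0.1", port))
```

Once the process holding the port is gone, the bind succeeds and the service can start again.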
 

Kaijia Feng

New Member
Mar 8, 2017
The problem seems to be getting worse: now I cannot even run an update because of a lock in pve-ha-lrm, even though I don't have an HA setup.
 

barad1tos

New Member
Mar 20, 2018
Same problem here. Any solutions? Neither upgrading the kernel nor restarting services works for me.
 

Kaijia Feng

In short, no. Basically, you need to restart every single node to solve the problem. I believe it happens when one node sustains transfers so large, for so long, that they disrupt corosync communication on that node. At that point, only this node should show a question mark. However, a bug in corosync 2.4.2 (fixed in 2.4.3) might be what brought down the whole cluster. I filed a bug report with Proxmox earlier, and the developer said they plan to upgrade corosync to 2.4.4 "soon".

I'm not entirely sure that bug is the cause of the problem, but it has to be something with corosync. So I guess we might just want to wait for the 2.4.4 update.

BTW: I now manually limit transfer rates, e.g. to 95 Mbps on bottleneck servers, and the problem is now rare (it happened a few times but self-healed quickly).
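The post does not say which tool applies the 95 Mbps cap. If it is set through a Proxmox bandwidth limit such as vzdump's `--bwlimit`, which is documented in KBytes per second, the link rate has to be converted first. A minimal sketch, assuming 1 Mbps = 10^6 bits/s and 1 KiB = 1024 bytes; the function name is hypothetical.

```python
def mbps_to_kib_per_s(mbps: float) -> int:
    """Convert a rate in megabits/s (10**6 bits) to whole KiB/s (1024 bytes), rounded down."""
    bytes_per_s = mbps * 1_000_000 / 8
    return int(bytes_per_s // 1024)


if __name__ == "__main__":
    print(mbps_to_kib_per_s(95))  # 11596
```

Rounding down keeps the configured limit at or just below the intended rate, which leaves a little headroom for corosync traffic on the same link.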
 

iFargle

New Member
Mar 22, 2018
It turns out mine was mostly caused by a borked DNS server (which, in my case, was caused by borked backend storage due to a mistake on my part).
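Since a broken DNS server can be behind grey nodes, a quick check is whether every node name still resolves. A minimal sketch using the system resolver; the helper name `resolve_all` is hypothetical, and it only reports the first IPv4 address per name.

```python
import socket
from typing import Optional


def resolve_all(names: list[str]) -> dict[str, Optional[str]]:
    """Map each hostname to its first IPv4 address, or None if lookup fails."""
    result: dict[str, Optional[str]] = {}
    for name in names:
        try:
            result[name] = socket.gethostbyname(name)
        except OSError:  # socket.gaierror is a subclass of OSError
            result[name] = None
    return result
```

For example, `resolve_all(["S034", "S048", "S070", "S074"])` with the node names from the output earlier in the thread; any `None` value points at a name the resolver (DNS or /etc/hosts) cannot answer.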
 
