Node with question mark

There is another way.

Migrate the guest to another node in the cluster and start it there. The target node must be one that has never had the same issue; otherwise it will not work.

Not a good way, but a temporary solution. I have 5 nodes in each cluster, so I move things around.
 
Migration was no solution for us; the problem "migrated" to the other nodes as well...

The cause of our "nodes with question mark" was a nightly offline backup of some LXC containers in stop/start mode.

Our workaround:
We switched the backup mode to "snapshot" and have had no freezes since ;)
Hint: the workaround only works if the storage supports snapshots...
P.S. Never restart a container...
 
I'm having the same problem - upgrading the cluster now to see if it helps.
 
What is the output of 'pvesh get /cluster/resources'?
 
root@S034:~# pvesh get /cluster/resources
200 OK
[
{
"cpu" : 0.0297653311349122,
"disk" : 0,
"diskread" : 9547070464,
"diskwrite" : 31136726528,
"id" : "qemu/204",
"maxcpu" : 8,
"maxdisk" : 53687091200,
"maxmem" : 8589934592,
"mem" : 1270067200,
"name" : "250",
"netin" : 331129685373,
"netout" : 366792946,
"node" : "S070",
"status" : "running",
"template" : 0,
"type" : "qemu",
"uptime" : 1407401,
"vmid" : 204
},
{
"cpu" : 0.0318802305705756,
"disk" : 0,
"diskread" : 25530366464,
"diskwrite" : 21063669760,
"id" : "qemu/294",
"maxcpu" : 8,
"maxdisk" : 53687091200,
"maxmem" : 8589934592,
"mem" : 2142736384,
"name" : "S984",
"netin" : 66459327158,
"netout" : 96076890,
"node" : "S048",
"status" : "running",
"template" : 0,
"type" : "qemu",
"uptime" : 236272,
"vmid" : 294
},
{
"cpu" : 0,
"disk" : 0,
"diskread" : 0,
"diskwrite" : 0,
"id" : "qemu/185",
"maxcpu" : 8,
"maxdisk" : 53687091200,
"maxmem" : 8589934592,
"mem" : 0,
"name" : "S582",
"netin" : 0,
"netout" : 0,
"node" : "S048",
"status" : "stopped",
"template" : 0,
"type" : "qemu",
"uptime" : 0,
"vmid" : 185
},
{
"cpu" : 0,
"disk" : 0,
"diskread" : 0,
"diskwrite" : 0,
"id" : "qemu/186",
"maxcpu" : 8,
"maxdisk" : 53687091200,
"maxmem" : 8589934592,
"mem" : 0,
"name" : "S628",
"netin" : 0,
"netout" : 0,
"node" : "S048",
"status" : "stopped",
"template" : 0,
"type" : "qemu",
"uptime" : 0,
"vmid" : 186
},
{
"cpu" : 0,
"disk" : 0,
"diskread" : 0,
"diskwrite" : 0,
"id" : "qemu/251",
"maxcpu" : 8,
"maxdisk" : 53687091200,
"maxmem" : 8589934592,
"mem" : 0,
"name" : "250",
"netin" : 0,
"netout" : 0,
"node" : "S034",
"status" : "stopped",
"template" : 1,
"type" : "qemu",
"uptime" : 0,
"vmid" : 251
},
{
"cpu" : 0.0249752746229761,
"disk" : 0,
"diskread" : 25065144320,
"diskwrite" : 18865004032,
"id" : "qemu/177",
"maxcpu" : 8,
"maxdisk" : 53687091200,
"maxmem" : 8589934592,
"mem" : 1593655296,
"name" : "S984",
"netin" : 38174364094,
"netout" : 41832197,
"node" : "S074",
"status" : "running",
"template" : 0,
"type" : "qemu",
"uptime" : 147775,
"vmid" : 177
},
{
"cpu" : 0,
"disk" : 0,
"diskread" : 0,
"diskwrite" : 0,
"id" : "qemu/252",
"maxcpu" : 8,
"maxdisk" : 53687091200,
"maxmem" : 8589934592,
"mem" : 0,
"name" : "250",
"netin" : 0,
"netout" : 0,
"node" : "S070",
"status" : "stopped",
"template" : 1,
"type" : "qemu",
"uptime" : 0,
"vmid" : 252
},
{
"cpu" : 0.0443813920325582,
"disk" : 0,
"diskread" : 19203254784,
"diskwrite" : 59236661248,
"id" : "qemu/201",
"maxcpu" : 8,
"maxdisk" : 53687091200,
"maxmem" : 8589934592,
"mem" : 2363322368,
"name" : "S142.datasoft.ws",
"netin" : 377437060060,
"netout" : 1123426692,
"node" : "S034",
"status" : "running",
"template" : 0,
"type" : "qemu",
"uptime" : 1657180,
"vmid" : 201
},
{
"cpu" : 0.337669773123858,
"disk" : 0,
"diskread" : 4564233728,
"diskwrite" : 2287325696,
"id" : "qemu/295",
"maxcpu" : 8,
"maxdisk" : 53687091200,
"maxmem" : 8589934592,
"mem" : 2390536192,
"name" : "X058.datasoft.ws",
"netin" : 353928892,
"netout" : 19262729,
"node" : "S070",
"status" : "running",
"template" : 0,
"type" : "qemu",
"uptime" : 1231,
"vmid" : 295
},
{
"cpu" : 0.0308204111337209,
"disk" : 0,
"diskread" : 3461092864,
"diskwrite" : 2121150976,
"id" : "qemu/203",
"maxcpu" : 8,
"maxdisk" : 107374182400,
"maxmem" : 8589934592,
"mem" : 1117327360,
"name" : "U110.datasoft.ws",
"netin" : 13659200225,
"netout" : 19041087,
"node" : "S034",
"status" : "running",
"template" : 0,
"type" : "qemu",
"uptime" : 56080,
"vmid" : 203
},
{
"cpu" : 0,
"disk" : 0,
"diskread" : 0,
"diskwrite" : 0,
"id" : "qemu/250",
"maxcpu" : 8,
"maxdisk" : 53687091200,
"maxmem" : 8589934592,
"mem" : 0,
"name" : "250",
"netin" : 0,
"netout" : 0,
"node" : "S048",
"status" : "stopped",
"template" : 1,
"type" : "qemu",
"uptime" : 0,
"vmid" : 250
},
{
"cpu" : 0,
"disk" : 0,
"diskread" : 0,
"diskwrite" : 0,
"id" : "qemu/182",
"maxcpu" : 8,
"maxdisk" : 53687091200,
"maxmem" : 8589934592,
"mem" : 0,
"name" : "S485",
"netin" : 0,
"netout" : 0,
"node" : "S048",
"status" : "stopped",
"template" : 0,
"type" : "qemu",
"uptime" : 0,
"vmid" : 182
},
{
"cpu" : 0,
"disk" : 0,
"diskread" : 0,
"diskwrite" : 0,
"id" : "qemu/184",
"maxcpu" : 8,
"maxdisk" : 53687091200,
"maxmem" : 8589934592,
"mem" : 0,
"name" : "S894",
"netin" : 0,
"netout" : 0,
"node" : "S048",
"status" : "stopped",
"template" : 0,
"type" : "qemu",
"uptime" : 0,
"vmid" : 184
},
{
"cpu" : 0.0951464245710976,
"disk" : 0,
"diskread" : 14175080448,
"diskwrite" : 13538449920,
"id" : "qemu/274",
"maxcpu" : 8,
"maxdisk" : 53687091200,
"maxmem" : 8589934592,
"mem" : 2401140736,
"name" : "P245.datasoft.ws",
"netin" : 140886100371,
"netout" : 434310726,
"node" : "S048",
"status" : "running",
"template" : 0,
"type" : "qemu",
"uptime" : 503630,
"vmid" : 274
},
{
"cpu" : 0,
"disk" : 0,
"diskread" : 0,
"diskwrite" : 0,
"id" : "qemu/180",
"maxcpu" : 8,
"maxdisk" : 53687091200,
"maxmem" : 8589934592,
"mem" : 0,
"name" : "S748",
"netin" : 0,
"netout" : 0,
"node" : "S048",
"status" : "stopped",
"template" : 0,
"type" : "qemu",
"uptime" : 0,
"vmid" : 180
},
{
"cpu" : 0,
"disk" : 0,
"diskread" : 0,
"diskwrite" : 0,
"id" : "qemu/253",
"maxcpu" : 8,
"maxdisk" : 53687091200,
"maxmem" : 8589934592,
"mem" : 0,
"name" : "250",
"netin" : 0,
"netout" : 0,
"node" : "S074",
"status" : "stopped",
"template" : 1,
"type" : "qemu",
"uptime" : 0,
"vmid" : 253
},
{
"cpu" : 0,
"disk" : 0,
"diskread" : 0,
"diskwrite" : 0,
"id" : "qemu/181",
"maxcpu" : 8,
"maxdisk" : 53687091200,
"maxmem" : 8589934592,
"mem" : 0,
"name" : "S389",
"netin" : 0,
"netout" : 0,
"node" : "S048",
"status" : "stopped",
"template" : 0,
"type" : "qemu",
"uptime" : 0,
"vmid" : 181
},
{
"cpu" : 0.0420527387468992,
"disk" : 0,
"diskread" : 10289083904,
"diskwrite" : 24731916288,
"id" : "qemu/202",
"maxcpu" : 8,
"maxdisk" : 53687091200,
"maxmem" : 12884901888,
"mem" : 2014048256,
"name" : "S963",
"netin" : 126177079232,
"netout" : 264081588,
"node" : "S034",
"status" : "running",
"template" : 0,
"type" : "qemu",
"uptime" : 415840,
"vmid" : 202
},
{
"cpu" : 0,
"disk" : 0,
"diskread" : 0,
"diskwrite" : 0,
"id" : "qemu/179",
"maxcpu" : 8,
"maxdisk" : 53687091200,
"maxmem" : 8589934592,
"mem" : 0,
"name" : "S784",
"netin" : 0,
"netout" : 0,
"node" : "S048",
"status" : "stopped",
"template" : 0,
"type" : "qemu",
"uptime" : 0,
"vmid" : 179
},
{
"cpu" : 0,
"disk" : 0,
"diskread" : 0,
"diskwrite" : 0,
"id" : "qemu/183",
"maxcpu" : 8,
"maxdisk" : 53687091200,
"maxmem" : 8589934592,
"mem" : 0,
"name" : "S748",
"netin" : 0,
"netout" : 0,
"node" : "S048",
"status" : "stopped",
"template" : 0,
"type" : "qemu",
"uptime" : 0,
"vmid" : 183
},
{
"cpu" : 0.18999071218876,
"disk" : 0,
"diskread" : 47669778432,
"diskwrite" : 73533008896,
"id" : "qemu/261",
"maxcpu" : 8,
"maxdisk" : 53687091200,
"maxmem" : 8589934592,
"mem" : 2913050624,
"name" : "V131.datasoft.ws",
"netin" : 195453681545,
"netout" : 693492590,
"node" : "S034",
"status" : "running",
"template" : 0,
"type" : "qemu",
"uptime" : 738122,
"vmid" : 261
},
{
"cpu" : 0.0824615562046685,
"disk" : 110436286464,
"id" : "node/S048",
"level" : "",
"maxcpu" : 24,
"maxdisk" : 1921430585344,
"maxmem" : 67541700608,
"mem" : 24357752832,
"node" : "S048",
"status" : "online",
"type" : "node",
"uptime" : 3524600
},
{
"cpu" : 0.12062177031573,
"disk" : 55469277184,
"id" : "node/S070",
"level" : "",
"maxcpu" : 24,
"maxdisk" : 1921453260800,
"maxmem" : 50629959680,
"mem" : 23707160576,
"node" : "S070",
"status" : "online",
"type" : "node",
"uptime" : 1826219
},
{
"cpu" : 0.0150834663624653,
"disk" : 33034076160,
"id" : "node/S074",
"level" : "",
"maxcpu" : 24,
"maxdisk" : 1921452736512,
"maxmem" : 50629967872,
"mem" : 15279632384,
"node" : "S074",
"status" : "online",
"type" : "node",
"uptime" : 1442498
},
{
"cpu" : 0.10414983044466,
"disk" : 135120945152,
"id" : "node/S034",
"level" : "",
"maxcpu" : 24,
"maxdisk" : 1850711736320,
"maxmem" : 67541381120,
"mem" : 30377394176,
"node" : "S034",
"status" : "online",
"type" : "node",
"uptime" : 1839311
},
{
"disk" : 98304,
"id" : "storage/S048/local-zfs",
"maxdisk" : 1810994483200,
"node" : "S048",
"status" : "available",
"storage" : "local-zfs",
"type" : "storage"
},
{
"disk" : 98304,
"id" : "storage/S070/local-zfs",
"maxdisk" : 1865984126976,
"node" : "S070",
"status" : "available",
"storage" : "local-zfs",
"type" : "storage"
},
{
"disk" : 98304,
"id" : "storage/S074/local-zfs",
"maxdisk" : 1888418775040,
"node" : "S074",
"status" : "available",
"storage" : "local-zfs",
"type" : "storage"
},
{
"disk" : 70697439232,
"id" : "storage/S034/local-zfs",
"maxdisk" : 1786288304128,
"node" : "S034",
"status" : "available",
"storage" : "local-zfs",
"type" : "storage"
},
{
"disk" : 110436286464,
"id" : "storage/S048/local2",
"maxdisk" : 1921430585344,
"node" : "S048",
"status" : "available",
"storage" : "local2",
"type" : "storage"
},
{
"disk" : 55469277184,
"id" : "storage/S070/local2",
"maxdisk" : 1921453260800,
"node" : "S070",
"status" : "available",
"storage" : "local2",
"type" : "storage"
},
{
"disk" : 33034076160,
"id" : "storage/S074/local2",
"maxdisk" : 1921452736512,
"node" : "S074",
"status" : "available",
"storage" : "local2",
"type" : "storage"
},
{
"disk" : 135120945152,
"id" : "storage/S034/local2",
"maxdisk" : 1850711736320,
"node" : "S034",
"status" : "available",
"storage" : "local2",
"type" : "storage"
},
{
"disk" : 110436286464,
"id" : "storage/S048/local",
"maxdisk" : 1921430585344,
"node" : "S048",
"status" : "available",
"storage" : "local",
"type" : "storage"
},
{
"disk" : 55469277184,
"id" : "storage/S070/local",
"maxdisk" : 1921453260800,
"node" : "S070",
"status" : "available",
"storage" : "local",
"type" : "storage"
},
{
"disk" : 33034076160,
"id" : "storage/S074/local",
"maxdisk" : 1921452736512,
"node" : "S074",
"status" : "available",
"storage" : "local",
"type" : "storage"
},
{
"disk" : 135120945152,
"id" : "storage/S034/local",
"maxdisk" : 1850711736320,
"node" : "S034",
"status" : "available",
"storage" : "local",
"type" : "storage"
},
{
"disk" : 419865346048,
"id" : "storage/S048/xtemplates",
"maxdisk" : 495058247680,
"node" : "S048",
"status" : "available",
"storage" : "xtemplates",
"type" : "storage"
},
{
"disk" : 113438121984,
"id" : "storage/S070/xtemplates",
"maxdisk" : 482630778880,
"node" : "S070",
"status" : "available",
"storage" : "xtemplates",
"type" : "storage"
},
{
"disk" : 76128968704,
"id" : "storage/S074/xtemplates",
"maxdisk" : 482637598720,
"node" : "S074",
"status" : "available",
"storage" : "xtemplates",
"type" : "storage"
},
{
"disk" : 148937924608,
"id" : "storage/S034/xtemplates",
"maxdisk" : 495023693824,
"node" : "S034",
"status" : "available",
"storage" : "xtemplates",
"type" : "storage"
}
]
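Output this long is easier to read when grouped per node. The helper below is a minimal sketch, not part of Proxmox: it assumes the only non-JSON content is a leading status line such as the "200 OK" printed above, and the function names are hypothetical.

```python
import json
from collections import defaultdict


def parse_resources(text):
    """Parse `pvesh get /cluster/resources` output into a list of dicts.

    Assumption: anything before the first "[" (e.g. a "200 OK" status
    line) is not JSON and can be skipped.
    """
    start = text.find("[")
    if start == -1:
        raise ValueError("no JSON array found in pvesh output")
    return json.loads(text[start:])


def summarize_by_node(resources):
    """Group guests per node and record each node's own reported status."""
    summary = defaultdict(lambda: {"node_status": None, "running": [], "other": []})
    for res in resources:
        node = res.get("node")
        if node is None:
            # Some resource types (e.g. pools) are not tied to a node.
            continue
        entry = summary[node]
        if res["type"] == "node":
            entry["node_status"] = res["status"]
        elif res["type"] in ("qemu", "lxc"):
            # Templates show up as stopped guests and land in "other".
            bucket = "running" if res["status"] == "running" else "other"
            entry[bucket].append(res["vmid"])
        # Storage entries are ignored in this summary.
    return dict(summary)
```

Applied to the output above, all four node entries (S034, S048, S070, S074) report "online".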
 
Can you also post a screenshot, open the JavaScript console and check whether there are any errors?
 
Thank you. However, pvedaemon doesn't restart: there is no error, the command "service pvedaemon restart" has just been hanging for at least 5 minutes now.

EDIT:

The following error message shows up in systemctl: start failed - unable to create socket - Address already in use
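"Address already in use" means something still holds the address/port the daemon tries to bind. A minimal sketch for checking that, assuming pvedaemon listens on 127.0.0.1 port 85 (an assumption, verify on your own install); `port_in_use` is a hypothetical helper name:

```python
import errno
import socket


def port_in_use(host, port):
    """Return True if binding host:port fails with EADDRINUSE.

    Ports below 1024 (such as 85) need root to bind; without it the
    bind fails with EACCES, which is re-raised rather than hidden.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return True
            raise
    return False
```

If `port_in_use("127.0.0.1", 85)` returns True while the service is stopped, that would suggest a leftover process is still holding the socket.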
 
The problem seems to be getting worse: now I cannot even run an update because of some lock in pve-ha-lrm, even though I don't have an HA setup.
 
Same problem right here. Any solutions? Neither upgrading the kernel nor restarting services works for me.
 

In short, no. Basically, you need to restart every single node to solve the problem. I believe it happens when one node runs large transfers for so long that they disrupt the corosync communication on that node. At that point, only this node should be marked with a question mark. However, a bug in corosync 2.4.2 (fixed in 2.4.3) might be what brought down the whole cluster. I filed a bug report with Proxmox earlier, and the dev said they plan to upgrade corosync to 2.4.4 "soon".
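To check whether a node runs a version in the range described here (2.4.2 affected, fixed in 2.4.3, per this post), version strings should be compared numerically, not as text, so that e.g. "2.4.10" sorts after "2.4.3". A minimal sketch; how you obtain the version string is left open, and the helper names are hypothetical:

```python
def parse_version(text):
    """Turn "2.4.2" or a package-style "2.4.2-pve3" into (2, 4, 2)."""
    upstream = text.strip().split("-")[0]  # drop a distro/package suffix
    return tuple(int(part) for part in upstream.split("."))


def older_than_fix(version, fixed_in="2.4.3"):
    """True if `version` predates the release said to contain the fix."""
    return parse_version(version) < parse_version(fixed_in)
```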

I'm not entirely sure that bug is the cause of the problem, but it has to be something to do with corosync. So I guess we might just want to wait for the 2.4.4 update.

BTW: I now manually limit transfers, e.g. to 95 Mbps on bottleneck servers, and the problem is now rare (it happened a few times but healed itself quickly).
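The post doesn't say which tool the limit was set in. If it is one that takes a bandwidth limit in KiB/s (vzdump's bwlimit option is documented that way), the megabit figure has to be converted first. A minimal sketch of that arithmetic, with a hypothetical function name:

```python
def mbps_to_kib_per_s(mbps):
    """Convert megabits per second (10**6 bit/s) to kibibytes per second."""
    bytes_per_s = mbps * 1_000_000 / 8  # 8 bits per byte
    return bytes_per_s / 1024           # 1 KiB = 1024 bytes
```

For 95 Mbps this gives 11875000 bytes/s, i.e. about 11596.7 KiB/s (roughly 11.9 MB/s).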
 
Any more updates here? I've just started seeing this problem myself.
 
Turns out mine was mostly caused by a borked DNS server (which, in my case, was caused by borked backend storage due to a mistake on my part).
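If the cluster configuration refers to nodes by hostname, a quick sanity check is whether every node name still resolves. A minimal sketch using the system resolver; the node names would be your own (e.g. S034, S048 from the output above), and `unresolved_hosts` is a hypothetical helper. Note that the system resolver also consults /etc/hosts, so a name resolving here does not prove the DNS server itself is healthy.

```python
import socket


def unresolved_hosts(hostnames):
    """Return the hostnames that fail to resolve via the system resolver."""
    failed = []
    for name in hostnames:
        try:
            socket.getaddrinfo(name, None)
        except socket.gaierror:
            failed.append(name)
    return failed
```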
 