Node with question mark

Discussion in 'Proxmox VE: Installation and configuration' started by decibel83, Feb 5, 2018.

  1. Vasu Sreekumar

    Vasu Sreekumar Active Member

    Joined:
    Mar 3, 2018
    Messages:
    123
    Likes Received:
    34
    There is another way.

    Migrate the guest to another node in the cluster and start it there. The target node must be one that has never had the same issue; otherwise it will not work.

    Not a good way, but a temporary solution. I have 5 nodes in each cluster, so I move things around.
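    For reference, roughly how such a move can be done from the CLI; a hedged sketch only, where the VMIDs (100/200) and "targetnode" are placeholders for your own setup:

    # live-migrate a running KVM guest to another cluster node
    qm migrate 100 targetnode --online
    # migrate a (stopped) LXC container the same way
    pct migrate 200 targetnode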
     
  2. Jospeh Huber

    Jospeh Huber Member

    Joined:
    Apr 18, 2016
    Messages:
    75
    Likes Received:
    3
    Again my question: @tom, is there an upgrade planned with the fixed kernel?
     
  3. Jospeh Huber

    Jospeh Huber Member

    Joined:
    Apr 18, 2016
    Messages:
    75
    Likes Received:
    3
    Migration was no solution for us; the problem "migrated" to the other nodes as well...

    The reason for our "nodes with question mark" was a nightly offline backup of some LXC containers in stop/start mode.

    Our workaround:
    We switched the backups to "snapshot" mode and have had no freezes since ;)
    Hint: the workaround is only valid if the storage supports snapshots...
    P.S. Never restart a container...
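    If someone wants to switch the same way, a rough sketch (the VMID and storage name are placeholders, and the underlying storage must support snapshots):

    # one-off backup of VMID 101 in snapshot mode
    vzdump 101 --mode snapshot --storage backupstore
    # or make it the default for scheduled backups in /etc/vzdump.conf:
    # mode: snapshot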
     
  4. Vasu Sreekumar

    Vasu Sreekumar Active Member

    Joined:
    Mar 3, 2018
    Messages:
    123
    Likes Received:
    34
    Try with the new 4.15 kernel.
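    Roughly how to pull it in on PVE 5.x, assuming the 4.15 meta-package in your configured repository is named pve-kernel-4.15:

    apt update
    apt install pve-kernel-4.15
    # reboot, then verify which kernel is running
    uname -r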
     
  5. Easying_Freeman

    Easying_Freeman New Member

    Joined:
    Mar 23, 2018
    Messages:
    1
    Likes Received:
    0
    I tried, but it didn't work... (screenshot attached: daf.png)
     
  6. kroem

    kroem Member

    Joined:
    Jul 12, 2016
    Messages:
    44
    Likes Received:
    0
    I'm having the same problem - upgrading the cluster now to see if it helps.
     
  7. Vasu Sreekumar

    Vasu Sreekumar Active Member

    Joined:
    Mar 3, 2018
    Messages:
    123
    Likes Received:
    34
    I also started facing the same issue with kernel 4.15.10 and KVM guests.

    I have 4 nodes; only one shows green, the other three are grey.

    But all nodes and guests are pinging fine.
     
  8. dcsapak

    dcsapak Proxmox Staff Member
    Staff Member

    Joined:
    Feb 1, 2016
    Messages:
    3,515
    Likes Received:
    318
    What is the output of 'pvesh get /cluster/resources'?
     
  9. Vasu Sreekumar

    Vasu Sreekumar Active Member

    Joined:
    Mar 3, 2018
    Messages:
    123
    Likes Received:
    34
    root@S034:~# pvesh get /cluster/resources
    200 OK
    [
    {
    "cpu" : 0.0297653311349122,
    "disk" : 0,
    "diskread" : 9547070464,
    "diskwrite" : 31136726528,
    "id" : "qemu/204",
    "maxcpu" : 8,
    "maxdisk" : 53687091200,
    "maxmem" : 8589934592,
    "mem" : 1270067200,
    "name" : "250",
    "netin" : 331129685373,
    "netout" : 366792946,
    "node" : "S070",
    "status" : "running",
    "template" : 0,
    "type" : "qemu",
    "uptime" : 1407401,
    "vmid" : 204
    },
    {
    "cpu" : 0.0318802305705756,
    "disk" : 0,
    "diskread" : 25530366464,
    "diskwrite" : 21063669760,
    "id" : "qemu/294",
    "maxcpu" : 8,
    "maxdisk" : 53687091200,
    "maxmem" : 8589934592,
    "mem" : 2142736384,
    "name" : "S984",
    "netin" : 66459327158,
    "netout" : 96076890,
    "node" : "S048",
    "status" : "running",
    "template" : 0,
    "type" : "qemu",
    "uptime" : 236272,
    "vmid" : 294
    },
    {
    "cpu" : 0,
    "disk" : 0,
    "diskread" : 0,
    "diskwrite" : 0,
    "id" : "qemu/185",
    "maxcpu" : 8,
    "maxdisk" : 53687091200,
    "maxmem" : 8589934592,
    "mem" : 0,
    "name" : "S582",
    "netin" : 0,
    "netout" : 0,
    "node" : "S048",
    "status" : "stopped",
    "template" : 0,
    "type" : "qemu",
    "uptime" : 0,
    "vmid" : 185
    },
    {
    "cpu" : 0,
    "disk" : 0,
    "diskread" : 0,
    "diskwrite" : 0,
    "id" : "qemu/186",
    "maxcpu" : 8,
    "maxdisk" : 53687091200,
    "maxmem" : 8589934592,
    "mem" : 0,
    "name" : "S628",
    "netin" : 0,
    "netout" : 0,
    "node" : "S048",
    "status" : "stopped",
    "template" : 0,
    "type" : "qemu",
    "uptime" : 0,
    "vmid" : 186
    },
    {
    "cpu" : 0,
    "disk" : 0,
    "diskread" : 0,
    "diskwrite" : 0,
    "id" : "qemu/251",
    "maxcpu" : 8,
    "maxdisk" : 53687091200,
    "maxmem" : 8589934592,
    "mem" : 0,
    "name" : "250",
    "netin" : 0,
    "netout" : 0,
    "node" : "S034",
    "status" : "stopped",
    "template" : 1,
    "type" : "qemu",
    "uptime" : 0,
    "vmid" : 251
    },
    {
    "cpu" : 0.0249752746229761,
    "disk" : 0,
    "diskread" : 25065144320,
    "diskwrite" : 18865004032,
    "id" : "qemu/177",
    "maxcpu" : 8,
    "maxdisk" : 53687091200,
    "maxmem" : 8589934592,
    "mem" : 1593655296,
    "name" : "S984",
    "netin" : 38174364094,
    "netout" : 41832197,
    "node" : "S074",
    "status" : "running",
    "template" : 0,
    "type" : "qemu",
    "uptime" : 147775,
    "vmid" : 177
    },
    {
    "cpu" : 0,
    "disk" : 0,
    "diskread" : 0,
    "diskwrite" : 0,
    "id" : "qemu/252",
    "maxcpu" : 8,
    "maxdisk" : 53687091200,
    "maxmem" : 8589934592,
    "mem" : 0,
    "name" : "250",
    "netin" : 0,
    "netout" : 0,
    "node" : "S070",
    "status" : "stopped",
    "template" : 1,
    "type" : "qemu",
    "uptime" : 0,
    "vmid" : 252
    },
    {
    "cpu" : 0.0443813920325582,
    "disk" : 0,
    "diskread" : 19203254784,
    "diskwrite" : 59236661248,
    "id" : "qemu/201",
    "maxcpu" : 8,
    "maxdisk" : 53687091200,
    "maxmem" : 8589934592,
    "mem" : 2363322368,
    "name" : "S142.datasoft.ws",
    "netin" : 377437060060,
    "netout" : 1123426692,
    "node" : "S034",
    "status" : "running",
    "template" : 0,
    "type" : "qemu",
    "uptime" : 1657180,
    "vmid" : 201
    },
    {
    "cpu" : 0.337669773123858,
    "disk" : 0,
    "diskread" : 4564233728,
    "diskwrite" : 2287325696,
    "id" : "qemu/295",
    "maxcpu" : 8,
    "maxdisk" : 53687091200,
    "maxmem" : 8589934592,
    "mem" : 2390536192,
    "name" : "X058.datasoft.ws",
    "netin" : 353928892,
    "netout" : 19262729,
    "node" : "S070",
    "status" : "running",
    "template" : 0,
    "type" : "qemu",
    "uptime" : 1231,
    "vmid" : 295
    },
    {
    "cpu" : 0.0308204111337209,
    "disk" : 0,
    "diskread" : 3461092864,
    "diskwrite" : 2121150976,
    "id" : "qemu/203",
    "maxcpu" : 8,
    "maxdisk" : 107374182400,
    "maxmem" : 8589934592,
    "mem" : 1117327360,
    "name" : "U110.datasoft.ws",
    "netin" : 13659200225,
    "netout" : 19041087,
    "node" : "S034",
    "status" : "running",
    "template" : 0,
    "type" : "qemu",
    "uptime" : 56080,
    "vmid" : 203
    },
    {
    "cpu" : 0,
    "disk" : 0,
    "diskread" : 0,
    "diskwrite" : 0,
    "id" : "qemu/250",
    "maxcpu" : 8,
    "maxdisk" : 53687091200,
    "maxmem" : 8589934592,
    "mem" : 0,
    "name" : "250",
    "netin" : 0,
    "netout" : 0,
    "node" : "S048",
    "status" : "stopped",
    "template" : 1,
    "type" : "qemu",
    "uptime" : 0,
    "vmid" : 250
    },
    {
    "cpu" : 0,
    "disk" : 0,
    "diskread" : 0,
    "diskwrite" : 0,
    "id" : "qemu/182",
    "maxcpu" : 8,
    "maxdisk" : 53687091200,
    "maxmem" : 8589934592,
    "mem" : 0,
    "name" : "S485",
    "netin" : 0,
    "netout" : 0,
    "node" : "S048",
    "status" : "stopped",
    "template" : 0,
    "type" : "qemu",
    "uptime" : 0,
    "vmid" : 182
    },
    {
    "cpu" : 0,
    "disk" : 0,
    "diskread" : 0,
    "diskwrite" : 0,
    "id" : "qemu/184",
    "maxcpu" : 8,
    "maxdisk" : 53687091200,
    "maxmem" : 8589934592,
    "mem" : 0,
    "name" : "S894",
    "netin" : 0,
    "netout" : 0,
    "node" : "S048",
    "status" : "stopped",
    "template" : 0,
    "type" : "qemu",
    "uptime" : 0,
    "vmid" : 184
    },
    {
    "cpu" : 0.0951464245710976,
    "disk" : 0,
    "diskread" : 14175080448,
    "diskwrite" : 13538449920,
    "id" : "qemu/274",
    "maxcpu" : 8,
    "maxdisk" : 53687091200,
    "maxmem" : 8589934592,
    "mem" : 2401140736,
    "name" : "P245.datasoft.ws",
    "netin" : 140886100371,
    "netout" : 434310726,
    "node" : "S048",
    "status" : "running",
    "template" : 0,
    "type" : "qemu",
    "uptime" : 503630,
    "vmid" : 274
    },
    {
    "cpu" : 0,
    "disk" : 0,
    "diskread" : 0,
    "diskwrite" : 0,
    "id" : "qemu/180",
    "maxcpu" : 8,
    "maxdisk" : 53687091200,
    "maxmem" : 8589934592,
    "mem" : 0,
    "name" : "S748",
    "netin" : 0,
    "netout" : 0,
    "node" : "S048",
    "status" : "stopped",
    "template" : 0,
    "type" : "qemu",
    "uptime" : 0,
    "vmid" : 180
    },
    {
    "cpu" : 0,
    "disk" : 0,
    "diskread" : 0,
    "diskwrite" : 0,
    "id" : "qemu/253",
    "maxcpu" : 8,
    "maxdisk" : 53687091200,
    "maxmem" : 8589934592,
    "mem" : 0,
    "name" : "250",
    "netin" : 0,
    "netout" : 0,
    "node" : "S074",
    "status" : "stopped",
    "template" : 1,
    "type" : "qemu",
    "uptime" : 0,
    "vmid" : 253
    },
    {
    "cpu" : 0,
    "disk" : 0,
    "diskread" : 0,
    "diskwrite" : 0,
    "id" : "qemu/181",
    "maxcpu" : 8,
    "maxdisk" : 53687091200,
    "maxmem" : 8589934592,
    "mem" : 0,
    "name" : "S389",
    "netin" : 0,
    "netout" : 0,
    "node" : "S048",
    "status" : "stopped",
    "template" : 0,
    "type" : "qemu",
    "uptime" : 0,
    "vmid" : 181
    },
    {
    "cpu" : 0.0420527387468992,
    "disk" : 0,
    "diskread" : 10289083904,
    "diskwrite" : 24731916288,
    "id" : "qemu/202",
    "maxcpu" : 8,
    "maxdisk" : 53687091200,
    "maxmem" : 12884901888,
    "mem" : 2014048256,
    "name" : "S963",
    "netin" : 126177079232,
    "netout" : 264081588,
    "node" : "S034",
    "status" : "running",
    "template" : 0,
    "type" : "qemu",
    "uptime" : 415840,
    "vmid" : 202
    },
    {
    "cpu" : 0,
    "disk" : 0,
    "diskread" : 0,
    "diskwrite" : 0,
    "id" : "qemu/179",
    "maxcpu" : 8,
    "maxdisk" : 53687091200,
    "maxmem" : 8589934592,
    "mem" : 0,
    "name" : "S784",
    "netin" : 0,
    "netout" : 0,
    "node" : "S048",
    "status" : "stopped",
    "template" : 0,
    "type" : "qemu",
    "uptime" : 0,
    "vmid" : 179
    },
    {
    "cpu" : 0,
    "disk" : 0,
    "diskread" : 0,
    "diskwrite" : 0,
    "id" : "qemu/183",
    "maxcpu" : 8,
    "maxdisk" : 53687091200,
    "maxmem" : 8589934592,
    "mem" : 0,
    "name" : "S748",
    "netin" : 0,
    "netout" : 0,
    "node" : "S048",
    "status" : "stopped",
    "template" : 0,
    "type" : "qemu",
    "uptime" : 0,
    "vmid" : 183
    },
    {
    "cpu" : 0.18999071218876,
    "disk" : 0,
    "diskread" : 47669778432,
    "diskwrite" : 73533008896,
    "id" : "qemu/261",
    "maxcpu" : 8,
    "maxdisk" : 53687091200,
    "maxmem" : 8589934592,
    "mem" : 2913050624,
    "name" : "V131.datasoft.ws",
    "netin" : 195453681545,
    "netout" : 693492590,
    "node" : "S034",
    "status" : "running",
    "template" : 0,
    "type" : "qemu",
    "uptime" : 738122,
    "vmid" : 261
    },
    {
    "cpu" : 0.0824615562046685,
    "disk" : 110436286464,
    "id" : "node/S048",
    "level" : "",
    "maxcpu" : 24,
    "maxdisk" : 1921430585344,
    "maxmem" : 67541700608,
    "mem" : 24357752832,
    "node" : "S048",
    "status" : "online",
    "type" : "node",
    "uptime" : 3524600
    },
    {
    "cpu" : 0.12062177031573,
    "disk" : 55469277184,
    "id" : "node/S070",
    "level" : "",
    "maxcpu" : 24,
    "maxdisk" : 1921453260800,
    "maxmem" : 50629959680,
    "mem" : 23707160576,
    "node" : "S070",
    "status" : "online",
    "type" : "node",
    "uptime" : 1826219
    },
    {
    "cpu" : 0.0150834663624653,
    "disk" : 33034076160,
    "id" : "node/S074",
    "level" : "",
    "maxcpu" : 24,
    "maxdisk" : 1921452736512,
    "maxmem" : 50629967872,
    "mem" : 15279632384,
    "node" : "S074",
    "status" : "online",
    "type" : "node",
    "uptime" : 1442498
    },
    {
    "cpu" : 0.10414983044466,
    "disk" : 135120945152,
    "id" : "node/S034",
    "level" : "",
    "maxcpu" : 24,
    "maxdisk" : 1850711736320,
    "maxmem" : 67541381120,
    "mem" : 30377394176,
    "node" : "S034",
    "status" : "online",
    "type" : "node",
    "uptime" : 1839311
    },
    {
    "disk" : 98304,
    "id" : "storage/S048/local-zfs",
    "maxdisk" : 1810994483200,
    "node" : "S048",
    "status" : "available",
    "storage" : "local-zfs",
    "type" : "storage"
    },
    {
    "disk" : 98304,
    "id" : "storage/S070/local-zfs",
    "maxdisk" : 1865984126976,
    "node" : "S070",
    "status" : "available",
    "storage" : "local-zfs",
    "type" : "storage"
    },
    {
    "disk" : 98304,
    "id" : "storage/S074/local-zfs",
    "maxdisk" : 1888418775040,
    "node" : "S074",
    "status" : "available",
    "storage" : "local-zfs",
    "type" : "storage"
    },
    {
    "disk" : 70697439232,
    "id" : "storage/S034/local-zfs",
    "maxdisk" : 1786288304128,
    "node" : "S034",
    "status" : "available",
    "storage" : "local-zfs",
    "type" : "storage"
    },
    {
    "disk" : 110436286464,
    "id" : "storage/S048/local2",
    "maxdisk" : 1921430585344,
    "node" : "S048",
    "status" : "available",
    "storage" : "local2",
    "type" : "storage"
    },
    {
    "disk" : 55469277184,
    "id" : "storage/S070/local2",
    "maxdisk" : 1921453260800,
    "node" : "S070",
    "status" : "available",
    "storage" : "local2",
    "type" : "storage"
    },
    {
    "disk" : 33034076160,
    "id" : "storage/S074/local2",
    "maxdisk" : 1921452736512,
    "node" : "S074",
    "status" : "available",
    "storage" : "local2",
    "type" : "storage"
    },
    {
    "disk" : 135120945152,
    "id" : "storage/S034/local2",
    "maxdisk" : 1850711736320,
    "node" : "S034",
    "status" : "available",
    "storage" : "local2",
    "type" : "storage"
    },
    {
    "disk" : 110436286464,
    "id" : "storage/S048/local",
    "maxdisk" : 1921430585344,
    "node" : "S048",
    "status" : "available",
    "storage" : "local",
    "type" : "storage"
    },
    {
    "disk" : 55469277184,
    "id" : "storage/S070/local",
    "maxdisk" : 1921453260800,
    "node" : "S070",
    "status" : "available",
    "storage" : "local",
    "type" : "storage"
    },
    {
    "disk" : 33034076160,
    "id" : "storage/S074/local",
    "maxdisk" : 1921452736512,
    "node" : "S074",
    "status" : "available",
    "storage" : "local",
    "type" : "storage"
    },
    {
    "disk" : 135120945152,
    "id" : "storage/S034/local",
    "maxdisk" : 1850711736320,
    "node" : "S034",
    "status" : "available",
    "storage" : "local",
    "type" : "storage"
    },
    {
    "disk" : 419865346048,
    "id" : "storage/S048/xtemplates",
    "maxdisk" : 495058247680,
    "node" : "S048",
    "status" : "available",
    "storage" : "xtemplates",
    "type" : "storage"
    },
    {
    "disk" : 113438121984,
    "id" : "storage/S070/xtemplates",
    "maxdisk" : 482630778880,
    "node" : "S070",
    "status" : "available",
    "storage" : "xtemplates",
    "type" : "storage"
    },
    {
    "disk" : 76128968704,
    "id" : "storage/S074/xtemplates",
    "maxdisk" : 482637598720,
    "node" : "S074",
    "status" : "available",
    "storage" : "xtemplates",
    "type" : "storage"
    },
    {
    "disk" : 148937924608,
    "id" : "storage/S034/xtemplates",
    "maxdisk" : 495023693824,
    "node" : "S034",
    "status" : "available",
    "storage" : "xtemplates",
    "type" : "storage"
    }
    ]
     
  10. dcsapak

    dcsapak Proxmox Staff Member
    Staff Member

    Joined:
    Feb 1, 2016
    Messages:
    3,515
    Likes Received:
    318
    Can you also post a screenshot, open the JavaScript console, and check if there are any errors?
     
  11. Vasu Sreekumar

    Vasu Sreekumar Active Member

    Joined:
    Mar 3, 2018
    Messages:
    123
    Likes Received:
    34
    There were no error messages.

    Running "service pve-cluster restart" cleared the grey issue.

    No reboot was required.

    All 4 nodes are on the same Cisco 1000 Mbps switch.
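    For anyone else comparing notes, a small diagnostic sketch of what can be checked before and after such a restart (standard PVE tools only):

    # cluster membership and quorum as seen from this node
    pvecm status
    # state of the daemons that feed the GUI status
    systemctl status pve-cluster pvestatd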
     
  12. re-host.eu

    re-host.eu New Member

    Joined:
    Oct 3, 2017
    Messages:
    29
    Likes Received:
    0
    For me this didn't work...
     
  13. Vasu Sreekumar

    Vasu Sreekumar Active Member

    Joined:
    Mar 3, 2018
    Messages:
    123
    Likes Received:
    34
    Try all four:

    service pve-cluster restart
    service pveproxy restart
    service pvedaemon restart
    service pvestatd restart
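    On a systemd-based install, the same thing in one go should be equivalent to the service commands above:

    systemctl restart pve-cluster pveproxy pvedaemon pvestatd
    systemctl status pve-cluster pveproxy pvedaemon pvestatd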
     
  14. re-host.eu

    re-host.eu New Member

    Joined:
    Oct 3, 2017
    Messages:
    29
    Likes Received:
    0
    Thank you, however pvedaemon doesn't restart... no error, it just hangs, for now at least 5 minutes, at the command "service pvedaemon restart".

    EDIT:

    The following error message shows in systemctl: start failed - unable to create socket - Address already in use
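    That error usually means something is still bound to the listener. A cautious sketch of how to track it down, assuming pvedaemon's default listener on 127.0.0.1:85 (only kill a PID you have verified is a leftover pvedaemon worker):

    systemctl stop pvedaemon
    # see which process is still holding the socket
    ss -tlnp | grep ':85 '
    # kill the stale PID shown above, then start again
    kill <PID>
    systemctl start pvedaemon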
     
  15. Kaijia Feng

    Kaijia Feng New Member

    Joined:
    Mar 8, 2017
    Messages:
    5
    Likes Received:
    0
    The problem seems to be getting worse; now I cannot even run an update because of a lock held by pve-ha-lrm, even though I don't have an HA setup.
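    Not necessarily the same cause, but when an upgrade hangs on pve-ha-lrm it is worth checking whether the HA daemons are stuck before forcing anything; a small, non-destructive diagnostic sketch:

    # both should be idle if no HA resources are configured
    systemctl status pve-ha-lrm pve-ha-crm
    journalctl -u pve-ha-lrm --since "1 hour ago"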
     
  16. barad1tos

    barad1tos New Member

    Joined:
    Mar 20, 2018
    Messages:
    1
    Likes Received:
    0
    Same problem right here. Any solutions? Neither upgrading the kernel nor restarting the services works for me.
     
  17. Kaijia Feng

    Kaijia Feng New Member

    Joined:
    Mar 8, 2017
    Messages:
    5
    Likes Received:
    0
    In short, no. Basically, you need to restart every single node to solve the problem, which, I believe, happens when one node handles very large transfers for a long time and that disrupts the corosync communication on that node. At that point only that node should go question-marked; however, a bug in corosync 2.4.2 (fixed in 2.4.3) might be what brought down the whole cluster. I filed a bug report with Proxmox earlier and the dev said they plan to upgrade corosync to 2.4.4 "soon".

    I'm not entirely sure that bug is the cause of the problem, but nevertheless it has to be something with corosync, so I guess we might just want to wait for the 2.4.4 update.

    BTW: I now manually limit transfers, e.g. to 95 Mbps on bottleneck servers, and the problem is now rare (it happened a few times but self-healed quickly).
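    One way to apply such a cap on a bottleneck NIC, as a sketch only (tc with a token bucket filter; eth0 and the numbers are placeholders, this is not necessarily how it was done here, and it limits all egress traffic on that interface, not just backup/migration traffic):

    # cap egress on eth0 to roughly 95 Mbit/s
    tc qdisc add dev eth0 root tbf rate 95mbit burst 1mbit latency 50ms
    # inspect or remove the limit again
    tc qdisc show dev eth0
    tc qdisc del dev eth0 root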
     
  18. iFargle

    iFargle New Member

    Joined:
    Mar 22, 2018
    Messages:
    9
    Likes Received:
    0
    Any more updates here? I've just started seeing this problem myself.
     
  19. Hugues Depasse

    Hugues Depasse New Member

    Joined:
    Mar 2, 2016
    Messages:
    7
    Likes Received:
    1
    Same issue for me, with random nodes going "offline". I have already tried restarting all the services; it does not work.

    Help please :(
     
  20. iFargle

    iFargle New Member

    Joined:
    Mar 22, 2018
    Messages:
    9
    Likes Received:
    0
    Turns out mine was mostly caused by a borked DNS server (which, in my case, was caused by borked backend storage due to a mistake on my part).
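    In case someone else lands here with the same symptom, the basic name-resolution checks that would catch this kind of issue (node names below are just examples):

    # every cluster node name should resolve consistently on every node
    getent hosts node1 node2 node3
    cat /etc/hosts
    # and the general resolver configuration
    cat /etc/resolv.conf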
     