Cluster without quorum, nodes don't join the cluster

Discussion in 'Proxmox VE: Installation and configuration' started by yena, Feb 14, 2019.

  1. yena

    yena Member

    Hello,
    I have a 3-node cluster.
    This morning the VPS were up, but the cluster status was down with no quorum.
    I rebooted 2 of the 3 nodes (the ones without VPS), but that did not solve it.

    The LAN is OK and the hostnames in /etc/hosts are OK.

    I have also tried to restart the services on all 3 nodes:

    systemctl restart pve-cluster
    systemctl restart pvedaemon
    systemctl restart pveproxy
    systemctl restart pvestatd
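
    For completeness, I believe the corosync service itself can also be restarted the same way (this one was not in my list above):

    systemctl restart corosync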

    But still there is no quorum:

    pvecm status (same on all 3 nodes)
    Quorum information
    ------------------
    Date: Thu Feb 14 14:09:46 2019
    Quorum provider: corosync_votequorum
    Nodes: 1
    Node ID: 0x00000001
    Ring ID: 1/4161200
    Quorate: No

    Votequorum information
    ----------------------
    Expected votes: 3
    Highest expected: 3
    Total votes: 1
    Quorum: 2 Activity blocked
    Flags:

    Membership information
    ----------------------
    Nodeid Votes Name
    0x00000001 1 10.10.10.1 (local)
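
    As I understand it, with 3 expected votes the quorum threshold works out to:

    quorum = floor(expected_votes / 2) + 1 = floor(3 / 2) + 1 = 2

    so this node's single vote is below the threshold and activity stays blocked.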

    -------------------------------------------------------------
    pve-cluster seems OK on all 3 nodes:

    systemctl status corosync pve-cluster
    ● corosync.service - Corosync Cluster Engine
    Loaded: loaded (/lib/systemd/system/corosync.service; enabled; vendor preset: enabled)
    Active: active (running) since Thu 2019-02-14 13:31:04 CET; 33min ago
    Docs: man:corosync
    man:corosync.conf
    man:corosync_overview
    Main PID: 9977 (corosync)
    Tasks: 2 (limit: 6144)
    Memory: 46.3M
    CPU: 35.491s
    CGroup: /system.slice/corosync.service
    └─9977 /usr/sbin/corosync -f

    Feb 14 13:31:04 n1 corosync[9977]: [QUORUM] Members[1]: 1
    Feb 14 13:31:04 n1 corosync[9977]: [MAIN ] Completed service synchronization, ready to provide service.
    Feb 14 13:46:05 n1 corosync[9977]: notice [TOTEM ] A new membership (10.10.10.1:4158332) was formed. Members
    Feb 14 13:46:05 n1 corosync[9977]: warning [CPG ] downlist left_list: 0 received
    Feb 14 13:46:05 n1 corosync[9977]: notice [QUORUM] Members[1]: 1
    Feb 14 13:46:05 n1 corosync[9977]: [TOTEM ] A new membership (10.10.10.1:4158332) was formed. Members
    Feb 14 13:46:05 n1 corosync[9977]: notice [MAIN ] Completed service synchronization, ready to provide service.
    Feb 14 13:46:05 n1 corosync[9977]: [CPG ] downlist left_list: 0 received
    Feb 14 13:46:05 n1 corosync[9977]: [QUORUM] Members[1]: 1
    Feb 14 13:46:05 n1 corosync[9977]: [MAIN ] Completed service synchronization, ready to provide service.

    ● pve-cluster.service - The Proxmox VE cluster filesystem
    Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled; vendor preset: enabled)
    Active: active (running) since Thu 2019-02-14 14:03:41 CET; 58s ago
    Process: 1375 ExecStartPost=/usr/bin/pvecm updatecerts --silent (code=exited, status=0/SUCCESS)
    Process: 1354 ExecStart=/usr/bin/pmxcfs (code=exited, status=0/SUCCESS)
    Main PID: 1355 (pmxcfs)
    Tasks: 5 (limit: 6144)
    Memory: 38.2M
    CPU: 522ms
    CGroup: /system.slice/pve-cluster.service
    └─1355 /usr/bin/pmxcfs

    ------------------------------------------------------------------------------------------------------------------

    pveversion -v
    proxmox-ve: 5.2-2 (running kernel: 4.15.18-7-pve)
    pve-manager: 5.2-10 (running version: 5.2-10/6f892b40)
    pve-kernel-4.15: 5.2-10
    pve-kernel-4.15.18-7-pve: 4.15.18-27
    pve-kernel-4.13.13-2-pve: 4.13.13-33
    corosync: 2.4.2-pve5
    criu: 2.11.1-1~bpo90
    glusterfs-client: 3.8.8-1
    ksm-control-daemon: 1.2-2
    libjs-extjs: 6.0.1-2
    libpve-access-control: 5.0-8
    libpve-apiclient-perl: 2.0-5
    libpve-common-perl: 5.0-41
    libpve-guest-common-perl: 2.0-18
    libpve-http-server-perl: 2.0-11
    libpve-storage-perl: 5.0-30
    libqb0: 1.0.1-1
    lvm2: 2.02.168-pve6
    lxc-pve: 3.0.2+pve1-3
    lxcfs: 3.0.2-2
    novnc-pve: 1.0.0-2
    proxmox-widget-toolkit: 1.0-20
    pve-cluster: 5.0-30
    pve-container: 2.0-29
    pve-docs: 5.2-9
    pve-firewall: 3.0-14
    pve-firmware: 2.0-5
    pve-ha-manager: 2.0-5
    pve-i18n: 1.0-6
    pve-libspice-server1: 0.14.1-1
    pve-qemu-kvm: 2.12.1-1
    pve-xtermjs: 1.0-5
    pve-zsync: 1.7-1
    qemu-server: 5.0-38
    smartmontools: 6.5+svn4324-1
    spiceterm: 3.0-5
    vncterm: 1.5-3
    zfsutils-linux: 0.7.11-pve1~bpo1

    -----------------------------------------------------------------------------------------------------------------
    I can't stop the VM:
    root@n1:~# qm shutdown 301
    VM is locked (snapshot-delete)

    and I can't unlock it:
    root@n1:~# qm unlock 301
    unable to open file '/etc/pve/nodes/n1/qemu-server/301.conf.tmp.956' - Permission denied
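
    As far as I understand, /etc/pve (pmxcfs) becomes read-only while the node has no quorum, so any write, including the unlock, fails. A quick check of that assumption:

    pvecm status | grep Quorate
    touch /etc/pve/test    # should fail while pmxcfs is read-only (no quorum)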

    But I can still read /etc/pve:
    ls -la /etc/pve
    total 13
    drwxr-xr-x 2 root www-data 0 Jan 1 1970 .
    drwxr-xr-x 92 root root 184 Nov 7 09:46 ..
    -r--r----- 1 root www-data 451 Oct 24 21:06 authkey.pub
    -r--r----- 1 root www-data 859 Jan 1 1970 .clusterlog
    ......

    Please help!
    Thanks!!
     
  2. dlimbeck

    dlimbeck Proxmox Staff Member

    Can you post the corosync config? (/etc/corosync/corosync.conf) If possible for all 3 nodes.
     
  3. yena

    yena Member

    root@n1:~# cat /etc/corosync/corosync.conf
    logging {
      debug: off
      to_syslog: yes
    }

    nodelist {
      node {
        name: n1
        nodeid: 1
        quorum_votes: 1
        ring0_addr: 10.10.10.1
      }
      node {
        name: n2
        nodeid: 2
        quorum_votes: 1
        ring0_addr: 10.10.10.2
      }
      node {
        name: n3
        nodeid: 3
        quorum_votes: 1
        ring0_addr: 10.10.10.3
      }
    }

    quorum {
      provider: corosync_votequorum
    }

    totem {
      cluster_name: civitavecchia
      config_version: 3
      interface {
        bindnetaddr: 10.10.10.1
        ringnumber: 0
      }
      ip_version: ipv4
      secauth: on
      version: 2
    }

    --------------------------------------------------------------------------

    cat /etc/corosync/corosync.conf
    logging {
      debug: off
      to_syslog: yes
    }

    nodelist {
      node {
        name: n1
        nodeid: 1
        quorum_votes: 1
        ring0_addr: 10.10.10.1
      }
      node {
        name: n2
        nodeid: 2
        quorum_votes: 1
        ring0_addr: 10.10.10.2
      }
      node {
        name: n3
        nodeid: 3
        quorum_votes: 1
        ring0_addr: 10.10.10.3
      }
    }

    quorum {
      provider: corosync_votequorum
    }

    totem {
      cluster_name: civitavecchia
      config_version: 3
      interface {
        bindnetaddr: 10.10.10.1
        ringnumber: 0
      }
      ip_version: ipv4
      secauth: on
      version: 2
    }

    ------------------------------------------------------------------------

    root@n3:~# cat /etc/corosync/corosync.conf
    logging {
      debug: off
      to_syslog: yes
    }

    nodelist {
      node {
        name: n1
        nodeid: 1
        quorum_votes: 1
        ring0_addr: 10.10.10.1
      }
      node {
        name: n2
        nodeid: 2
        quorum_votes: 1
        ring0_addr: 10.10.10.2
      }
      node {
        name: n3
        nodeid: 3
        quorum_votes: 1
        ring0_addr: 10.10.10.3
      }
    }

    quorum {
      provider: corosync_votequorum
    }

    totem {
      cluster_name: civitavecchia
      config_version: 3
      interface {
        bindnetaddr: 10.10.10.1
        ringnumber: 0
      }
      ip_version: ipv4
      secauth: on
      version: 2
    }

    ------------------------------------------------------------
    hosts file (same on all 3 nodes)

    root@n1:~# cat /etc/hosts
    10.10.10.1 n1.civitavecchia.local n1
    10.10.10.2 n2.civitavecchia.local n2
    10.10.10.3 n3.civitavecchia.local n3

    ------------------------------------------------------------

    I can ping all nodes via LAN

    Many Thanks!!
     
  4. dlimbeck

    dlimbeck Proxmox Staff Member

    The config looks good. Can you try running 'omping -c 10000 -i 0.001 -F -q <node1> <node2> <node3>' to see if multicast works?
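
    If that quick test shows loss, a longer run (around 10 minutes) can also help to catch IGMP snooping / multicast querier timeouts, for example:

    omping -c 600 -i 1 -q <node1> <node2> <node3>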
     
  5. yena

    yena Member

    This is the test:

    root@n1:~# omping -c 10000 -i 0.001 -F -q 10.10.10.1 10.10.10.2 10.10.10.3
    10.10.10.2 : waiting for response msg
    10.10.10.3 : waiting for response msg
    10.10.10.2 : waiting for response msg
    10.10.10.3 : waiting for response msg
    10.10.10.2 : waiting for response msg
    10.10.10.3 : waiting for response msg
    10.10.10.2 : waiting for response msg
    10.10.10.3 : waiting for response msg
    10.10.10.2 : waiting for response msg
    10.10.10.3 : waiting for response msg
    10.10.10.2 : waiting for response msg
    10.10.10.3 : waiting for response msg
    10.10.10.2 : waiting for response msg
    10.10.10.3 : waiting for response msg
    10.10.10.2 : waiting for response msg
    10.10.10.3 : waiting for response msg
    10.10.10.2 : waiting for response msg
    10.10.10.3 : waiting for response msg
    ^C
    10.10.10.2 : response message never received
    10.10.10.3 : response message never received
     
  6. dlimbeck

    dlimbeck Proxmox Staff Member

    Oh, sorry, I forgot to mention that it needs to be run on all 3 nodes simultaneously.
     
  7. yena

    yena Member

    Seems OK :)

    10.10.10.1 : unicast, xmt/rcv/%loss = 10000/10000/0%, min/avg/max/std-dev = 0.023/0.096/0.342/0.048
    10.10.10.1 : multicast, xmt/rcv/%loss = 10000/10000/0%, min/avg/max/std-dev = 0.032/0.120/0.362/0.050
    10.10.10.2 : unicast, xmt/rcv/%loss = 10000/10000/0%, min/avg/max/std-dev = 0.027/0.092/1.028/0.044
    10.10.10.2 : multicast, xmt/rcv/%loss = 10000/10000/0%, min/avg/max/std-dev = 0.032/0.105/1.033/0.046

    Now it seems to be back online, but everything is very slow!

    Feb 14 15:55:17 n1 corosync[10145]: notice [TOTEM ] A processor failed, forming new configuration.
    Feb 14 15:55:17 n1 corosync[10145]: [TOTEM ] A processor failed, forming new configuration.
    Feb 14 15:56:00 n1 systemd[1]: Starting Proxmox VE replication runner...
    Feb 14 15:56:33 n1 pvedaemon[1863]: VM 301 qmp command failed - VM 301 qmp command 'guest-ping' failed - got timeout

    I think I have to reboot the node with the VPS...
     
    #7 yena, Feb 14, 2019
    Last edited: Feb 14, 2019
  8. yena

    yena Member

    Maybe it's a LAN network problem... I don't know:

    If I try a 'qm unlock <vmid>', everything stops:

    [ 20.500796] vmbr1: port 2(tap305i1) entered blocking state
    [ 20.500799] vmbr1: port 2(tap305i1) entered disabled state
    [ 20.500972] vmbr1: port 2(tap305i1) entered blocking state
    [ 20.500974] vmbr1: port 2(tap305i1) entered forwarding state
    [ 242.433126] INFO: task pve-firewall:5579 blocked for more than 120 seconds.
    [ 242.433154] Tainted: P O 4.15.18-7-pve #1
    [ 242.433172] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    [ 242.433195] pve-firewall D 0 5579 1 0x00000000
    [ 242.433210] Call Trace:
    [ 242.433216] __schedule+0x3e0/0x870
    [ 242.433218] schedule+0x36/0x80
    [ 242.433219] rwsem_down_read_failed+0x10a/0x170
    [ 242.433221] ? _cond_resched+0x1a/0x50
    [ 242.433223] call_rwsem_down_read_failed+0x18/0x30
    [ 242.433225] ? call_rwsem_down_read_failed+0x18/0x30
    [ 242.433226] down_read+0x20/0x40
    [ 242.433228] path_openat+0x897/0x14a0
    [ 242.433230] do_filp_open+0x99/0x110
    [ 242.433233] ? simple_attr_release+0x20/0x20
    [ 242.433236] do_sys_open+0x135/0x280
    [ 242.433237] ? do_sys_open+0x135/0x280
    [ 242.433239] SyS_open+0x1e/0x20
    [ 242.433241] do_syscall_64+0x73/0x130
    [ 242.433243] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
    [ 242.433245] RIP: 0033:0x7f0cecf9c820
    [ 242.433246] RSP: 002b:00007ffeac82c678 EFLAGS: 00000246 ORIG_RAX: 0000000000000002
    [ 242.433247] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f0cecf9c820
    [ 242.433248] RDX: 00000000000001b6 RSI: 0000000000000000 RDI: 0000562da050e780
    [ 242.433249] RBP: 0000000000000000 R08: 00007ffeac82c880 R09: 0000562da050e780
    [ 242.433250] R10: 0000562d9cff81e0 R11: 0000000000000246 R12: 0000000000000000
    [ 242.433250] R13: 0000562d9e288010 R14: 00007ffeac82c881 R15: 0000562d9e2a2320
    [ 242.433256] INFO: task qm:14740 blocked for more than 120 seconds.
    [ 242.433275] Tainted: P O 4.15.18-7-pve #1
    [ 242.433292] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    [ 242.433315] qm D 0 14740 13798 0x00000000
    [ 242.433316] Call Trace:
    [ 242.433318] __schedule+0x3e0/0x870
    [ 242.433319] ? path_parentat+0x3e/0x80
    [ 242.433320] schedule+0x36/0x80
    [ 242.433322] rwsem_down_write_failed+0x208/0x390
    [ 242.433324] call_rwsem_down_write_failed+0x17/0x30
    [ 242.433325] ? call_rwsem_down_write_failed+0x17/0x30
    [ 242.433327] down_write+0x2d/0x40
    [ 242.433328] filename_create+0x7e/0x160
    [ 242.433342] SyS_mkdir+0x51/0x100
    [ 242.433344] do_syscall_64+0x73/0x130
    [ 242.433345] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
    [ 242.433346] RIP: 0033:0x7f72a4fe7447
    [ 242.433347] RSP: 002b:00007ffde8469718 EFLAGS: 00000246 ORIG_RAX: 0000000000000053
    [ 242.433348] RAX: ffffffffffffffda RBX: 000055e1b69d9010 RCX: 00007f72a4fe7447
    [ 242.433349] RDX: 000000000000001c RSI: 00000000000001ff RDI: 000055e1b9c72a10
    [ 242.433350] RBP: 0000000000000000 R08: 0000000000000200 R09: 000055e1b69d9028
    [ 242.433351] R10: 0000000000000000 R11: 0000000000000246 R12: 000055e1b7246798
    [ 242.433351] R13: 000055e1b9c61768 R14: 000055e1b9c72a10 R15: 00000000000001ff
    [ 242.433354] INFO: task pvesr:15815 blocked for more than 120 seconds.
    [ 242.433372] Tainted: P O 4.15.18-7-pve #1
    [ 242.433389] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    [ 242.433422] pvesr D 0 15815 1 0x00000000
    [ 242.433424] Call Trace:
    [ 242.433425] __schedule+0x3e0/0x870
    [ 242.433426] ? path_parentat+0x3e/0x80
    [ 242.433428] schedule+0x36/0x80
    [ 242.433429] rwsem_down_write_failed+0x208/0x390
    [ 242.433431] call_rwsem_down_write_failed+0x17/0x30
    [ 242.433432] ? call_rwsem_down_write_failed+0x17/0x30
    [ 242.433434] down_write+0x2d/0x40
    [ 242.433435] filename_create+0x7e/0x160
    [ 242.433436] SyS_mkdir+0x51/0x100
    [ 242.433438] do_syscall_64+0x73/0x130
    [ 242.433440] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
    [ 242.433441] RIP: 0033:0x7f651fb92447
    [ 242.433441] RSP: 002b:00007fff0839cab8 EFLAGS: 00000246 ORIG_RAX: 0000000000000053
    [ 242.433442] RAX: ffffffffffffffda RBX: 0000557e44437010 RCX: 00007f651fb92447
    [ 242.433443] RDX: 0000557e43aee9a4 RSI: 00000000000001ff RDI: 0000557e47bc4910
    [ 242.433444] RBP: 0000000000000000 R08: 0000000000000200 R09: 0000557e44437028
    [ 242.433444] R10: 0000000000000000 R11: 0000000000000246 R12: 0000557e45a4d698
    [ 242.433445] R13: 0000557e47b0dd28 R14: 0000557e47bc4910 R15: 00000000000001ff
    [ 363.270230] INFO: task pve-firewall:5579 blocked for more than 120 seconds.
    [ 363.270258] Tainted: P O 4.15.18-7-pve #1
    [ 363.270276] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    [ 363.270300] pve-firewall D 0 5579 1 0x00000000
    [ 363.270303] Call Trace:
    [ 363.270309] __schedule+0x3e0/0x870
    [ 363.270311] schedule+0x36/0x80
    [ 363.270313] rwsem_down_read_failed+0x10a/0x170
    [ 363.270314] ? _cond_resched+0x1a/0x50
    [ 363.270317] call_rwsem_down_read_failed+0x18/0x30
    [ 363.270318] ? call_rwsem_down_read_failed+0x18/0x30
    [ 363.270320] down_read+0x20/0x40
    [ 363.270322] path_openat+0x897/0x14a0
    [ 363.270324] do_filp_open+0x99/0x110
    [ 363.270327] ? simple_attr_release+0x20/0x20
    [ 363.270329] do_sys_open+0x135/0x280
    [ 363.270331] ? do_sys_open+0x135/0x280
    [ 363.270332] SyS_open+0x1e/0x20
    [ 363.270335] do_syscall_64+0x73/0x130
    [ 363.270337] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
    [ 363.270338] RIP: 0033:0x7f0cecf9c820
    [ 363.270339] RSP: 002b:00007ffeac82c678 EFLAGS: 00000246 ORIG_RAX: 0000000000000002
    [ 363.270341] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f0cecf9c820
    [ 363.270342] RDX: 00000000000001b6 RSI: 0000000000000000 RDI: 0000562da050e780
    [ 363.270343] RBP: 0000000000000000 R08: 00007ffeac82c880 R09: 0000562da050e780
    [ 363.270343] R10: 0000562d9cff81e0 R11: 0000000000000246 R12: 0000000000000000
    [ 363.270344] R13: 0000562d9e288010 R14: 00007ffeac82c881 R15: 0000562d9e2a2320
    [ 363.270346] INFO: task pvedaemon worke:5706 blocked for more than 120 seconds.
    [ 363.270368] Tainted: P O 4.15.18-7-pve #1
    [ 363.270386] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    [ 363.270409] pvedaemon worke D 0 5706 5703 0x00000004
    [ 363.270410] Call Trace:
    [ 363.270412] __schedule+0x3e0/0x870
    [ 363.270413] ? path_parentat+0x3e/0x80
    [ 363.270414] schedule+0x36/0x80
    [ 363.270416] rwsem_down_write_failed+0x208/0x390
    [ 363.270418] call_rwsem_down_write_failed+0x17/0x30
    [ 363.270419] ? call_rwsem_down_write_failed+0x17/0x30
    [ 363.270421] down_write+0x2d/0x40
    [ 363.270422] filename_create+0x7e/0x160
    [ 363.270424] SyS_mkdir+0x51/0x100
    [ 363.270425] do_syscall_64+0x73/0x130
    [ 363.270427] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
    [ 363.270428] RIP: 0033:0x7fdb99009447
    [ 363.270429] RSP: 002b:00007fff3d1743c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000053
    [ 363.270430] RAX: ffffffffffffffda RBX: 0000564e0ed8d010 RCX: 00007fdb99009447
    [ 363.270431] RDX: 0000564e0e3259a4 RSI: 00000000000001ff RDI: 0000564e151172f0
    [ 363.270444] RBP: 0000000000000000 R08: 0000564e0e324608 R09: 0000000000000008
    [ 363.270445] R10: 0000000000000000 R11: 0000000000000246 R12: 0000564e151223a8
    [ 363.270446] R13: 0000564e108b5e08 R14: 0000564e151172f0 R15: 00000000000001ff
    [ 363.270451] INFO: task qm:14740 blocked for more than 120 seconds.
    [ 363.270470] Tainted: P O 4.15.18-7-pve #1
    [ 363.270487] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    [ 363.270522] qm D 0 14740 13798 0x00000004
    [ 363.270523] Call Trace:
    [ 363.270525] __schedule+0x3e0/0x870
    [ 363.270526] ? path_parentat+0x3e/0x80
    [ 363.270527] schedule+0x36/0x80
    [ 363.270529] rwsem_down_write_failed+0x208/0x390
    [ 363.270531] call_rwsem_down_write_failed+0x17/0x30
    [ 363.270532] ? call_rwsem_down_write_failed+0x17/0x30
    [ 363.270533] down_write+0x2d/0x40
    [ 363.270535] filename_create+0x7e/0x160
    [ 363.270536] SyS_mkdir+0x51/0x100
    [ 363.270538] do_syscall_64+0x73/0x130
    [ 363.270539] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
    [ 363.270540] RIP: 0033:0x7f72a4fe7447
    [ 363.270541] RSP: 002b:00007ffde8469718 EFLAGS: 00000246 ORIG_RAX: 0000000000000053
    [ 363.270542] RAX: ffffffffffffffda RBX: 000055e1b69d9010 RCX: 00007f72a4fe7447
    [ 363.270543] RDX: 000000000000001c RSI: 00000000000001ff RDI: 000055e1b9c72a10
    [ 363.270543] RBP: 0000000000000000 R08: 0000000000000200 R09: 000055e1b69d9028
    [ 363.270544] R10: 0000000000000000 R11: 0000000000000246 R12: 000055e1b7246798
    [ 363.270545] R13: 000055e1b9c61768 R14: 000055e1b9c72a10 R15: 00000000000001ff
    [ 363.270547] INFO: task pvesr:15815 blocked for more than 120 seconds.
    [ 363.270567] Tainted: P O 4.15.18-7-pve #1
    [ 363.270584] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    [ 363.270607] pvesr D 0 15815 1 0x00000000
    [ 363.270608] Call Trace:
    [ 363.270609] __schedule+0x3e0/0x870
    [ 363.270611] ? path_parentat+0x3e/0x80
    [ 363.270612] schedule+0x36/0x80
    [ 363.270613] rwsem_down_write_failed+0x208/0x390
    [ 363.270615] call_rwsem_down_write_failed+0x17/0x30
    [ 363.270616] ? call_rwsem_down_write_failed+0x17/0x30
    [ 363.270618] down_write+0x2d/0x40
    [ 363.270619] filename_create+0x7e/0x160
    [ 363.270620] SyS_mkdir+0x51/0x100
    [ 363.270622] do_syscall_64+0x73/0x130
    [ 363.270624] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
    [ 363.270624] RIP: 0033:0x7f651fb92447
    [ 363.270625] RSP: 002b:00007fff0839cab8 EFLAGS: 00000246 ORIG_RAX: 0000000000000053
    [ 363.270626] RAX: ffffffffffffffda RBX: 0000557e44437010 RCX: 00007f651fb92447
    [ 363.270627] RDX: 0000557e43aee9a4 RSI: 00000000000001ff RDI: 0000557e47bc4910
    [ 363.270628] RBP: 0000000000000000 R08: 0000000000000200 R09: 0000557e44437028
    [ 363.270628] R10: 0000000000000000 R11: 0000000000000246 R12: 0000557e45a4d698
    [ 363.270629] R13: 0000557e47b0dd28 R14: 0000557e47bc4910 R15: 00000000000001ff
     
  9. yena

    yena Member

    Can I use a single node "standalone"?
    I would like to temporarily exclude the other two nodes, because I think it's a shared-filesystem problem.
     
  10. dlimbeck

    dlimbeck Proxmox Staff Member

    If the other 2 nodes are offline you can use 'pvecm expect 1' to set the required votes to 1 temporarily.
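
    For example, on the remaining node:

    pvecm expect 1

    Expected votes should go back up on their own once the other nodes rejoin, so this is only a temporary workaround.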
     
  11. yena

    yena Member

    Thank you very much!
    I have finally found the problem:
    the third node has a network problem, but it is intermittent,
    so the PVE shared filesystem locks everything up.

    Now I have powered off that node and everything seems good!

    Thanks!!
     