Cluster loss

Antonino89

Jul 13, 2017
Guys,

Here I am again :D

After some network troubleshooting I needed to reload the network and then the server itself.

After that I lost my whole cluster: I have 3 servers and currently they are unable to see each other...

I tried to reload everything...

service pve-cluster restart <--- Nothing changed.

systemctl stop pve-cluster
systemctl stop corosync
systemctl start pve-cluster <----- An error occurred.


root@Server1:/etc# systemctl start pve-cluster
Job for pve-cluster.service failed because the control process exited with error code.
See "systemctl status pve-cluster.service" and "journalctl -xe" for details.


root@Server1:/etc# systemctl status pve-cluster.service
● pve-cluster.service - The Proxmox VE cluster filesystem
Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Thu 2017-08-10 10:51:23 CEST; 29s ago
Process: 30477 ExecStartPost=/usr/bin/pvecm updatecerts --silent (code=exited, status=0/SUCCESS)
Process: 30735 ExecStart=/usr/bin/pmxcfs $DAEMON_OPTS (code=exited, status=255)
Main PID: 30475 (code=exited, status=0/SUCCESS)

Aug 10 10:51:13 Server1 systemd[1]: Starting The Proxmox VE cluster filesystem...
Aug 10 10:51:13 Server1 pmxcfs[30735]: [main] notice: unable to aquire pmxcfs lock - trying again
Aug 10 10:51:13 Server1 pmxcfs[30735]: [main] notice: unable to aquire pmxcfs lock - trying again
Aug 10 10:51:23 Server1 pmxcfs[30735]: [main] crit: unable to aquire pmxcfs lock: Resource temporarily unavailable
Aug 10 10:51:23 Server1 pmxcfs[30735]: [main] crit: unable to aquire pmxcfs lock: Resource temporarily unavailable
Aug 10 10:51:23 Server1 pmxcfs[30735]: [main] notice: exit proxmox configuration filesystem (-1)
Aug 10 10:51:23 Server1 systemd[1]: pve-cluster.service: Control process exited, code=exited status=255
Aug 10 10:51:23 Server1 systemd[1]: Failed to start The Proxmox VE cluster filesystem.
Aug 10 10:51:23 Server1 systemd[1]: pve-cluster.service: Unit entered failed state.
Aug 10 10:51:23 Server1 systemd[1]: pve-cluster.service: Failed with result 'exit-code'.
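A minimal Python sketch of what the "unable to aquire pmxcfs lock: Resource temporarily unavailable" lines most likely mean. It rests on an assumption the thread does not confirm: that pmxcfs refuses to start a second instance by taking an exclusive, non-blocking flock() on a lock file. A non-blocking flock() on a lock that another open file already holds fails with EWOULDBLOCK (the same value as EAGAIN on Linux), and strerror() of that errno is exactly "Resource temporarily unavailable". The lock file here is a hypothetical temp file, not the real pmxcfs one, and try_lock / acquire_with_retry are illustrative names only.

```python
import errno
import fcntl
import os
import tempfile
import time


def try_lock(path):
    """Open path and try a non-blocking exclusive flock().

    Returns (fd, None) on success, or (None, errno) when another open
    file description already holds the lock.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError as exc:
        # flock() with LOCK_NB reports a held lock as EWOULDBLOCK/EAGAIN.
        os.close(fd)
        return None, exc.errno
    return fd, None


def acquire_with_retry(path, attempts=3, delay=0.1):
    """Retry a few times, loosely mimicking the 'trying again' notices."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    err = None
    for attempt in range(attempts):
        fd, err = try_lock(path)
        if fd is not None:
            return fd, None
        if attempt < attempts - 1:
            print("unable to acquire lock - trying again")
            time.sleep(delay)
    # os.strerror(errno.EAGAIN) is "Resource temporarily unavailable".
    print(f"unable to acquire lock: {os.strerror(err)}")
    return None, err


if __name__ == "__main__":
    # Hypothetical lock file in a temp dir, NOT the real pmxcfs lock file.
    lockfile = os.path.join(tempfile.mkdtemp(), "demo.lockfile")
    holder, _ = try_lock(lockfile)  # stands in for a leftover instance
    print("first instance locked:", holder is not None)
    newcomer, _ = acquire_with_retry(lockfile)  # stands in for the new start
    print("second instance locked:", newcomer is not None)
    os.close(holder)  # closing the only fd on that open file releases the flock
    newcomer, _ = acquire_with_retry(lockfile)
    print("after release, second instance locked:", newcomer is not None)
    os.close(newcomer)
```

If that reading holds, a pmxcfs process left over from the earlier stop/restart sequence would be the first thing to look for, since it would keep holding the lock while the new one gives up.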


root@Server1:/etc# systemctl status corosync
● corosync.service - Corosync Cluster Engine
Loaded: loaded (/lib/systemd/system/corosync.service; enabled; vendor preset: enabled)
Active: active (running) since Thu 2017-08-10 10:51:23 CEST; 6min ago
Docs: man:corosync
man:corosync.conf
man:corosync_overview
Main PID: 30752 (corosync)
Tasks: 2 (limit: 4915)
CGroup: /system.slice/corosync.service
└─30752 /usr/sbin/corosync -f

Aug 10 10:51:23 Server1 corosync[30752]: notice [MAIN ] Completed service synchronization, ready to provide service.
Aug 10 10:51:23 Server1 corosync[30752]: [QUORUM] This node is within the primary component and will provide service.
Aug 10 10:51:23 Server1 corosync[30752]: [QUORUM] Members[2]: 1 3
Aug 10 10:51:23 Server1 corosync[30752]: [MAIN ] Completed service synchronization, ready to provide service.
Aug 10 10:51:25 Server1 corosync[30752]: notice [TOTEM ] A new membership (192.168.100.11:108) was formed. Members joined: 2
Aug 10 10:51:25 Server1 corosync[30752]: [TOTEM ] A new membership (192.168.100.11:108) was formed. Members joined: 2
Aug 10 10:51:25 Server1 corosync[30752]: notice [QUORUM] Members[3]: 1 2 3
Aug 10 10:51:25 Server1 corosync[30752]: notice [MAIN ] Completed service synchronization, ready to provide service.
Aug 10 10:51:25 Server1 corosync[30752]: [QUORUM] Members[3]: 1 2 3
Aug 10 10:51:25 Server1 corosync[30752]: [MAIN ] Completed service synchronization, ready to provide service.
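The corosync log shows this node already "within the primary component" with only Members[2]: 1 3, before node 2 rejoined. A small sketch of the majority arithmetic behind that, assuming corosync votequorum defaults (one vote per node, 3 expected votes, no two_node or wait_for_all style options): quorum is expected_votes // 2 + 1, so 2 of 3 nodes are enough. The regex parser for the Members[N] lines is a hypothetical helper written only for the log format pasted above.

```python
import re

# Matches corosync lines such as "[QUORUM] Members[2]: 1 3".
MEMBERS_RE = re.compile(r"\[QUORUM\] Members\[(\d+)\]:((?: \d+)*)")


def parse_members(line):
    """Return the node IDs listed in a corosync Members[...] line, or None."""
    match = MEMBERS_RE.search(line)
    if match is None:
        return None
    return [int(node_id) for node_id in match.group(2).split()]


def quorum_threshold(expected_votes):
    """Votes needed for quorum under a simple majority rule."""
    if expected_votes < 1:
        raise ValueError("expected_votes must be at least 1")
    return expected_votes // 2 + 1


def is_quorate(present_votes, expected_votes):
    """True when the present votes reach the majority threshold."""
    return present_votes >= quorum_threshold(expected_votes)


if __name__ == "__main__":
    EXPECTED_VOTES = 3  # assumption: 3 nodes with one vote each
    log_lines = [
        "Aug 10 10:51:23 Server1 corosync[30752]: [QUORUM] Members[2]: 1 3",
        "Aug 10 10:51:25 Server1 corosync[30752]: [QUORUM] Members[3]: 1 2 3",
    ]
    for line in log_lines:
        members = parse_members(line)
        print(members, "quorate:", is_quorate(len(members), EXPECTED_VOTES))
```

Under these assumptions corosync itself looks healthy on Server1 (quorate with 2 nodes, then all 3), which points the problem back at pve-cluster rather than at cluster membership.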
 
Well, maybe an obvious question: did you reboot the server(s) in question? ;) Reloading the network services might have left something behind that is now giving you a hard time.
 


I'm going to try...

Anyway, just so you know: I changed /etc/resolv.conf and put my internal DNS server in it...
 