proxmox node offline

j0k4b0

Active Member
Apr 1, 2016
Hi,

I know there are many threads like this, but I could not find a solution.

I have a Proxmox cluster with two hosts. Now I want to add a third one.

So I logged in to the new host and entered "pvecm add <ClusterIp>".

I named my hosts by colors: cluster node = admin, second node = blue, my new third node = green.

So I added these names to my /etc/hosts file.
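For reference, the entries look roughly like this (IPs redacted as placeholders; the hostnames are just my naming scheme):
Code:
<admin-ip>  admin.node  admin
<blue-ip>   blue.node   blue
<green-ip>  green.node  green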


pvecm add worked successfully. The node was added to the cluster (I can see the server in the GUI), but the node is offline!

So I checked the status on the admin and green hosts with "pvecm status":
On the admin host I can see two hosts (admin and blue).
On the green node I can see only the local node.

The admin and blue servers are on the same subnet; the green one is not. I asked my provider, and he told me it is not a problem because all servers are in the same VLAN.
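For completeness, plain routing between the subnets can be sanity-checked like this (a sketch, using the same placeholders as above, run on the admin node):
Code:
ip route get <green-ip>    # shows which interface/gateway the traffic uses
ip -br addr show vmbr0     # confirm the bridge address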

So I checked the multicast functionality by running this command on every server:
Code:
 omping -c 10000 -i 0.001 -F -q <admin-ip> <blue-ip> <green-ip>

I got this result (copied from the admin node), so in my opinion it is correct:
Code:
<blue-ip>  :   unicast, xmt/rcv/%loss = 10000/10000/0%, min/avg/max/std-dev = 0.141/0.438/8.162/0.240
<blue-ip>  : multicast, xmt/rcv/%loss = 10000/10000/0%, min/avg/max/std-dev = 0.172/0.469/8.187/0.243
<green-ip> :   unicast, xmt/rcv/%loss = 10000/10000/0%, min/avg/max/std-dev = 0.131/0.454/4.009/0.211
<green-ip> : multicast, xmt/rcv/%loss = 10000/10000/0%, min/avg/max/std-dev = 0.141/0.467/4.018/0.211

So I checked the logs on my green node, where I can see a critical error:
Code:
root@green:~# journalctl -u corosync -u pve-cluster
-- Logs begin at Sa 2017-11-04 20:25:09 CET, end at So 2017-11-05 11:58:15 CET. --
Nov 04 20:25:18 green.node.<server-ip> systemd[1]: Starting The Proxmox VE cluster filesystem...
Nov 04 20:25:18 green.node.<server-ip> pmxcfs[1053]: [quorum] crit: quorum_initialize failed: 2
Nov 04 20:25:18 green.node.<server-ip> pmxcfs[1053]: [quorum] crit: can't initialize service
Nov 04 20:25:18 green.node.<server-ip> pmxcfs[1053]: [confdb] crit: cmap_initialize failed: 2
Nov 04 20:25:18 green.node.<server-ip> pmxcfs[1053]: [confdb] crit: can't initialize service
Nov 04 20:25:18 green.node.<server-ip> pmxcfs[1053]: [dcdb] crit: cpg_initialize failed: 2
Nov 04 20:25:18 green.node.<server-ip> pmxcfs[1053]: [dcdb] crit: can't initialize service
Nov 04 20:25:18 green.node.<server-ip> pmxcfs[1053]: [status] crit: cpg_initialize failed: 2
Nov 04 20:25:18 green.node.<server-ip> pmxcfs[1053]: [status] crit: can't initialize service
Nov 04 20:25:19 green.node.<server-ip> systemd[1]: Started The Proxmox VE cluster filesystem.
Nov 04 20:25:19 green.node.<server-ip> systemd[1]: Starting Corosync Cluster Engine...
Nov 04 20:25:19 green.node.<server-ip> corosync[1156]: [MAIN  ] Corosync Cluster Engine ('2.4.2'): started and ready to provide service.
Nov 04 20:25:19 green.node.<server-ip> corosync[1156]: [MAIN  ] Corosync built-in features: augeas systemd pie relro bindnow
Nov 04 20:25:19 green.node.<server-ip> corosync[1157]: [TOTEM ] waiting_trans_ack changed to 1
Nov 04 20:25:19 green.node.<server-ip> corosync[1157]: [TOTEM ] Token Timeout (1650 ms) retransmit timeout (392 ms)
Nov 04 20:25:19 green.node.<server-ip> corosync[1157]: [TOTEM ] token hold (303 ms) retransmits before loss (4 retrans)
Nov 04 20:25:19 green.node.<server-ip> corosync[1157]: [TOTEM ] join (50 ms) send_join (0 ms) consensus (1980 ms) merge (200 ms)
Nov 04 20:25:19 green.node.<server-ip> corosync[1157]: [TOTEM ] downcheck (1000 ms) fail to recv const (2500 msgs)
Nov 04 20:25:19 green.node.<server-ip> corosync[1157]: [TOTEM ] seqno unchanged const (30 rotations) Maximum network MTU 1401
Nov 04 20:25:19 green.node.<server-ip> corosync[1157]: [TOTEM ] window size per rotation (50 messages) maximum messages per rotation (17 messages)
Nov 04 20:25:19 green.node.<server-ip> corosync[1157]: [TOTEM ] missed count const (5 messages)
Nov 04 20:25:19 green.node.<server-ip> corosync[1157]: [TOTEM ] send threads (0 threads)
Nov 04 20:25:19 green.node.<server-ip> corosync[1157]: [TOTEM ] RRP token expired timeout (392 ms)
Nov 04 20:25:19 green.node.<server-ip> corosync[1157]: [TOTEM ] RRP token problem counter (2000 ms)
Nov 04 20:25:19 green.node.<server-ip> corosync[1157]: [TOTEM ] RRP threshold (10 problem count)
Nov 04 20:25:19 green.node.<server-ip> corosync[1157]: [TOTEM ] RRP multicast threshold (100 problem count)
Nov 04 20:25:19 green.node.<server-ip> corosync[1157]: [TOTEM ] RRP automatic recovery check timeout (1000 ms)
Nov 04 20:25:19 green.node.<server-ip> corosync[1157]: [TOTEM ] RRP mode set to none.
Nov 04 20:25:19 green.node.<server-ip> corosync[1157]: [TOTEM ] heartbeat_failures_allowed (0)
Nov 04 20:25:19 green.node.<server-ip> corosync[1157]: [TOTEM ] max_network_delay (50 ms)
Nov 04 20:25:19 green.node.<server-ip> corosync[1157]: [TOTEM ] HeartBeat is Disabled. To enable set heartbeat_failures_allowed > 0
Nov 04 20:25:19 green.node.<server-ip> corosync[1157]: [TOTEM ] Initializing transport (UDP/IP Multicast).
Nov 04 20:25:19 green.node.<server-ip> corosync[1157]: [TOTEM ] Initializing transmit/receive security (NSS) crypto: aes256 hash: sha1
Nov 04 20:25:19 green.node.<server-ip> corosync[1157]: [TOTEM ] Receive multicast socket recv buffer size (320000 bytes).
Nov 04 20:25:19 green.node.<server-ip> corosync[1157]: [TOTEM ] Transmit multicast socket send buffer size (320000 bytes).
Nov 04 20:25:19 green.node.<server-ip> corosync[1157]: [TOTEM ] Local receive multicast loop socket recv buffer size (320000 bytes).
Nov 04 20:25:19 green.node.<server-ip> corosync[1157]: [TOTEM ] Local transmit multicast loop socket send buffer size (320000 bytes).
Nov 04 20:25:19 green.node.<server-ip> corosync[1157]: [TOTEM ] The network interface is down.
Nov 04 20:25:19 green.node.<server-ip> corosync[1157]: [TOTEM ] Created or loaded sequence id 14.127.0.0.1 for this ring.
Nov 04 20:25:19 green.node.<server-ip> corosync[1157]: [SERV  ] Service engine loaded: corosync configuration map access [0]
Nov 04 20:25:19 green.node.<server-ip> corosync[1157]: [MAIN  ] Initializing IPC on cmap [0]
Nov 04 20:25:19 green.node.<server-ip> corosync[1157]: [MAIN  ] No configured qb.ipc_type. Using native ipc
Nov 04 20:25:19 green.node.<server-ip> corosync[1157]: [QB    ] server name: cmap

Any idea how to fix this?

Thanks for your help!

**Edit:**
If I click on the green node in the web GUI, I get the following error: ssl3_get_server_certificate: certificate verify failed (596)

I think this is because /etc/pve is not in sync.
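To see whether the node actually has quorum, the standard tools can be run on the green node (a sketch; nothing Proxmox-specific here beyond pvecm itself):
Code:
corosync-quorumtool -s                  # quorum state as corosync sees it
pvecm status                            # Proxmox view of the cluster
systemctl status corosync pve-cluster   # are the services even running?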
 
Strange...
  1. So the nodes are in the same VLAN/subnet?
  2. If you ping manually, does it work with vmbr0 and with the cluster IPs?
  3. Did you force-create the certs again?
 
Hi,

thanks for your answer!

1. The nodes are on the same VLAN, but not the same subnet. My hosting provider told me it is not a problem (he uses it the same way).
2. I can ping and SSH to all nodes from all nodes; there is no trouble.
3. You mean "pvecm updatecerts"? Sure, I ran that on the cluster (admin) and on the node. It is still not working.
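For reference, after updatecerts the cluster services can be restarted on green and the journal rechecked like this (standard systemd units on PVE):
Code:
systemctl restart corosync pve-cluster
journalctl -u corosync -u pve-cluster -n 50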

Hope you have any other idea! :)