unable to start server: unable to create socket - PVE::APIDaemon: Address already in

charnov · Aug 8, 2012

Re: unable to start server: unable to create socket - PVE::APIDaemon: Address already

dietmar said:
And the stop was successful?

Figured out the other node was somehow still interfering. Killed cman on it, cleared /etc/pve/, and simultaneously restarted cman (and other services) on all nodes in the cluster. I can get back into the cluster from one node, but the other gives me login errors and cannot be managed from the first one. It's the one that really needs to be rebooted at this point (but I can't until Sunday).

Boy, I wish I had a third node (management wouldn't let me get another one)...
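For reference, the "Address already in use" in the thread title is the EADDRINUSE error from bind(): another process (often a previous daemon instance that did not stop cleanly) still holds the listening socket. Below is a minimal Python sketch of a pre-start check, assuming you only want to know whether a given host/port can be bound right now. The function name port_in_use and the port 8006 in the usage line are illustrative assumptions, not part of Proxmox. On a real node, `ss -ltnp` shows which process owns the port.

```python
import errno
import socket


def port_in_use(host: str, port: int) -> bool:
    """Return True if (host, port) cannot be bound because it is already taken.

    Illustrative pre-start check (hypothetical helper): the result is only
    valid at the moment of the call, since another process may grab the
    port right afterwards.
    """
    probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        probe.bind((host, port))
    except OSError as exc:
        if exc.errno == errno.EADDRINUSE:
            return True   # someone (e.g. a stale daemon) still holds the port
        raise             # other errors (permissions, bad address) are real failures
    finally:
        probe.close()     # release the probe socket either way
    return False


if __name__ == "__main__":
    # 8006 is only an example port; substitute whatever your daemon binds.
    print(port_in_use("0.0.0.0", 8006))
```

The check is inherently racy, so it is a diagnostic aid rather than a locking mechanism; stopping the old daemon properly is still the actual fix.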

charnov · Aug 8, 2012

Stopped pve-cluster on both and started again... looks sort of good, now.

dietmar · Aug 9, 2012

BTW, what network card/driver do you use (igb?). And do you use bonding?

techguys · Aug 9, 2012

I have 3 different networks in use on the server. eth0 is used for 'infrastructure' and is the NIC/interface used to connect to the server for ssh, the web interface, etc. It's one of the onboard NICs (Intel):
hyper4:~# dmesg | grep eth1
e1000e 0000:13:00.1: eth1: (PCI Express:2.5GT/s:Width x4) 00:15:17:21:30:17
e1000e 0000:13:00.1: eth1: Intel(R) PRO/1000 Network Connection
e1000e 0000:13:00.1: eth1: MAC: 1, PHY: 4, PBA No: C57721-005
e1000e 0000:13:00.1: eth1: changing MTU from 1500 to 9000

bond0 uses eth1 and eth2 and is used for accessing the 'shared storage' vlan.
eth3 is used for vlan20 which is a private vlan for some vms on this server.

All are Gb connections and plugged into a Cisco 3750.

charnov · Aug 12, 2012

Had to completely rebuild the node and the cluster. It appears whatever changed in the clustering in 2.1 is dramatically different from the previous version. I had to make a considerable number of changes on our L3 core switch to accommodate the traffic, and I am not too happy about it. It seems extremely "chatty".

I will say the network traffic with my nodes, which is typical for a VM server with a SAN, saturates multiple 1 gigabit links with dedicated NICs for the storage arrays. This is definitely not a friendly environment for multicast traffic, especially latency sensitive traffic.

dietmar · Aug 13, 2012

charnov said:
I will say the network traffic with my nodes, which is typical for a VM server with a SAN, saturates multiple 1 gigabit links with dedicated NICs for the storage arrays. This is definitely not a friendly environment for multicast traffic, especially latency sensitive traffic.

You need reliable multicast; otherwise the cluster will not work. Maybe you can separate cluster traffic from SAN traffic?

charnov · Aug 13, 2012

It had been working, but something changed with the current build. I had to turn on multicast PIM sparse mode and IGMP snooping to prevent flooding. The nodes are already in their own VLAN to prevent flooding across segments, but it is still really chatty. From reading about Totem and corosync, it looks like it is extremely sensitive to latency and should be treated like VoIP or the old Microsoft cluster services. If possible, give it its own dedicated NIC and segment.

Maybe switch from CMan to Pacemaker or using corosync over TCP? Probably more trouble than it is worth... The root cause of my previous issue (as near as I can tell) was the multicast heartbeat being disrupted by heavy traffic on the ports, and the corruption of the cluster FS was due to my trying to fix stuff by hand (mucking with clustered files on one disconnected node is bad... oops). The fix was changing the switch multicast mode to Sparse instead of Dense, which limits flooding, restricts multicast to specific ports, and builds a shortest path tree to keep it that way. Traffic is now properly segmented and zipping along again. I just live migrated one of the busiest server instances and no one noticed (while hitting 78MB/s on the migrate).
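To illustrate why heavy traffic on shared ports can break cluster membership: membership is decided by timeouts, so a stall caused by congestion looks exactly like a dead node. The Python sketch below is a simplified fixed-timeout failure detector, NOT corosync's actual Totem token-ring protocol; it is only loosely analogous to the totem token timeout. The function name lost_at and all millisecond values are hypothetical, chosen to show how one gap in heartbeats trips the timeout.

```python
from typing import Iterable, List


def lost_at(arrivals_ms: Iterable[int], timeout_ms: int, start_ms: int = 0) -> List[int]:
    """Return the times (ms) at which a peer would be declared lost.

    Simplified model (hypothetical, not Totem): the peer is considered lost
    whenever the gap since the previous heartbeat (or since start_ms)
    exceeds timeout_ms. A gap exactly equal to timeout_ms is tolerated.
    """
    lost: List[int] = []
    last = start_ms
    for t in sorted(arrivals_ms):
        if t - last > timeout_ms:
            # The detector fires timeout_ms after the last heartbeat it saw.
            lost.append(last + timeout_ms)
        last = t
    return lost


# Heartbeats every 200 ms on a quiet link: no timeouts.
quiet = [200, 400, 600, 800, 1000]
# Same sender, but heartbeats stall for 1.3 s (dropped or queued behind storage traffic).
congested = [200, 400, 600, 1900, 2100]

print(lost_at(quiet, timeout_ms=1000))      # []
print(lost_at(congested, timeout_ms=1000))  # [1600]
```

The sender never stopped working in the congested case; the detector simply cannot tell a saturated link from a failed node, which is why separating cluster traffic from storage traffic matters.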

dietmar · Aug 13, 2012

charnov said:
Maybe switch from CMan to Pacemaker or using corosync over TCP?

Sorry, but CMan uses corosync (CMAN and Pacemaker use the same corosync cluster engine).

charnov · Aug 13, 2012

True but you don't need CMan for cluster management. You can use pacemaker directly.

dietmar · Aug 13, 2012

charnov said:
True but you don't need CMan for cluster management. You can use pacemaker directly.

But that is no advantage at all, because you still have to run the whole corosync cluster stack.
