CLUSTER problem: pvedaemon[481993]: WARNING: ipcc_send_rec failed: Transport endpoint

coffe

Guest
One of the 6 nodes in my cluster keeps dropping out of the cluster.
pvedaemon[481993]: WARNING: ipcc_send_rec failed: Transport endpoint is not connected

I noticed that cman stops when this happens.
One way of getting the node back for 1-30 minutes is to restart the cman and pve-cluster services.

Is there any way of debugging the pvedaemon error?

Best Regards
Coffe
 
Re: CLUSTER problem: pvedaemon[481993]: WARNING: ipcc_send_rec failed: Transport endp

You need to debug cman (corosync). Any errors in /var/log/syslog?

# grep corosync /var/log/syslog
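
If the plain grep returns too much, a narrower filter may help. The following is only a sketch: the function name `corosync_events` and the keyword list are assumptions chosen for illustration, not something Proxmox or corosync ships.

```shell
#!/bin/sh
# Hypothetical helper: print only the corosync lines from a syslog file
# that mention totem events or failures. The keyword list is an assumption.
corosync_events() {
    logfile="${1:-/var/log/syslog}"
    # First keep lines logged by corosync, then keep the interesting ones.
    grep 'corosync\[' "$logfile" | grep -E 'TOTEM|FAILED|error|crit'
}
```

For example, `corosync_events /var/log/syslog | tail -n 50` would show the most recent matches.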
 
Re: CLUSTER problem: pvedaemon[481993]: WARNING: ipcc_send_rec failed: Transport endp

There is a lot of output. This is the first sign, after restarting the services, that it is going to fail again:
corosync[535241]: [TOTEM ] FAILED TO RECEIVE
 
Re: CLUSTER problem: pvedaemon[481993]: WARNING: ipcc_send_rec failed: Transport endp

There is a lot of output. This is the first sign, after restarting the services, that it is going to fail again:
corosync[535241]: [TOTEM ] FAILED TO RECEIVE

A problem with multicast? Is there high load on the network? Do you use a separate network for cluster communication?
 
Re: CLUSTER problem: pvedaemon[481993]: WARNING: ipcc_send_rec failed: Transport endp

I tested multicast and could not see any problem. It is only this node; the other 5 are working fine.
The network is not heavily loaded. I did try to use the dedicated network for cluster traffic, but the cluster selected this one instead. All nodes have 2 networks, .17. and .99.; eth0 is connected to .17., but I would like the cluster to use .99.
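
One thing that may be worth checking here, on the assumption that cman picks the cluster interface from the address the node name resolves to: which address the local node name actually resolves to. The sketch below uses a hypothetical helper name, `node_cluster_ip`, and relies only on `uname` and `getent`.

```shell
#!/bin/sh
# Hypothetical check: print the first address that a node name resolves to.
# Assumption: the cluster stack binds to the network this address belongs to.
node_cluster_ip() {
    name="${1:-$(uname -n)}"
    getent hosts "$name" | awk '{ print $1; exit }'
}
```

If `node_cluster_ip` prints an address on .17. rather than .99., name resolution would be a plausible reason why the dedicated network is not used.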

Best Regards
Coffe
 
Re: CLUSTER problem: pvedaemon[481993]: WARNING: ipcc_send_rec failed: Transport endp

I tested multicast and could not see any problem. It is only this node; the other 5 are working fine.

What is different on this node?
 
Re: CLUSTER problem: pvedaemon[481993]: WARNING: ipcc_send_rec failed: Transport endp

It was the first node I set up in this cluster. Apart from that, it is the same hardware as the rest of them.

Best Regards
Coffe
 
Re: CLUSTER problem: pvedaemon[481993]: WARNING: ipcc_send_rec failed: Transport endp

Oct 9 14:07:22 cloud904 kernel: sd 5:0:0:0: [sdb] Very big device. Trying to use READ CAPACITY(16).
Oct 9 14:07:26 cloud904 corosync[551453]: [TOTEM ] FAILED TO RECEIVE
Oct 9 14:07:38 cloud904 pmxcfs[551618]: [quorum] crit: quorum_dispatch failed: 2
Oct 9 14:07:38 cloud904 dlm_controld[551534]: cluster is down, exiting
Oct 9 14:07:38 cloud904 fenced[551521]: cluster is down, exiting
Oct 9 14:07:38 cloud904 dlm_controld[551534]: daemon cpg_dispatch error 2
Oct 9 14:07:38 cloud904 fenced[551521]: daemon cpg_dispatch error 2
Oct 9 14:07:38 cloud904 pmxcfs[551618]: [libqb] warning: epoll_ctl(del): Bad file descriptor (9)
Oct 9 14:07:38 cloud904 pmxcfs[551618]: [confdb] crit: confdb_dispatch failed: 2
Oct 9 14:07:40 cloud904 pmxcfs[551618]: [libqb] warning: epoll_ctl(del): Bad file descriptor (9)
Oct 9 14:07:40 cloud904 pmxcfs[551618]: [dcdb] crit: cpg_dispatch failed: 2
Oct 9 14:07:40 cloud904 kernel: dlm: closing connection to node 6
Oct 9 14:07:40 cloud904 kernel: dlm: closing connection to node 5
Oct 9 14:07:40 cloud904 kernel: dlm: closing connection to node 4
Oct 9 14:07:40 cloud904 kernel: dlm: closing connection to node 3
Oct 9 14:07:40 cloud904 kernel: dlm: closing connection to node 2
Oct 9 14:07:40 cloud904 kernel: dlm: closing connection to node 1
Oct 9 14:07:42 cloud904 pmxcfs[551618]: [dcdb] crit: cpg_leave failed: 2
Then it spams syslog with:
Oct 9 14:09:21 cloud904 pmxcfs[551618]: [status] crit: cpg_send_message failed: 9
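
To see the same sequence on another node without scrolling through the whole log, something like the following could be used. It is a sketch only: the helper names `first_failure_onward` and `count_by_daemon` are made up for illustration, and it assumes the classic syslog layout shown above (month, day, time, host, then the daemon tag).

```shell
#!/bin/sh
# Hypothetical helper: print the first "FAILED TO RECEIVE" line and
# everything logged after it.
first_failure_onward() {
    logfile="${1:-/var/log/syslog}"
    sed -n '/FAILED TO RECEIVE/,$p' "$logfile"
}

# Hypothetical helper: count how many lines each daemon logged from the
# first failure onward. Field 5 is the tag, e.g. "pmxcfs[551618]:".
count_by_daemon() {
    first_failure_onward "$1" |
        awk '{ sub(/\[.*/, "", $5); sub(/:$/, "", $5); n[$5]++ }
             END { for (d in n) print n[d], d }' |
        sort -rn
}
```

Running `count_by_daemon /var/log/syslog` gives a quick picture of which daemons reacted to the totem failure and how noisy each one became.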
 
Re: CLUSTER problem: pvedaemon[481993]: WARNING: ipcc_send_rec failed: Transport endp

Oct 9 14:07:22 cloud904 kernel: sd 5:0:0:0: [sdb] Very big device. Trying to use READ CAPACITY(16).
Oct 9 14:07:26 cloud904 corosync[551453]: [TOTEM ] FAILED TO RECEIVE

This is the problem (a corosync crash). We have observed it on several installations, but we are unable to reproduce the bug.
 
Re: CLUSTER problem: pvedaemon[481993]: WARNING: ipcc_send_rec failed: Transport endp

Are you using the latest version? Maybe you can try the latest code from the pvetest repository?
 
Re: CLUSTER problem: pvedaemon[481993]: WARNING: ipcc_send_rec failed: Transport endp

It was working without a problem for 2 weeks. Then, after the latest update, it started to fail.
 
Re: CLUSTER problem: pvedaemon[481993]: WARNING: ipcc_send_rec failed: Transport endp

No, I am using the normal pve repo.

I also have the problem that the node lost its connections to the VMs' sockets.