CLUSTER problem: pvedaemon[481993]: WARNING: ipcc_send_rec failed: Transport endpoint

coffe

Guest
One of the 6 nodes in my cluster keeps falling out.
pvedaemon[481993]: WARNING: ipcc_send_rec failed: Transport endpoint is not connected

I noticed that it stops cman.
One way of getting it back for 1-30 min is to restart the cman and pve-cluster services.

Is there any way of debugging the pvedaemon error?

Best Regards
Coffe
 
Re: CLUSTER problem: pvedaemon[481993]: WARNING: ipcc_send_rec failed: Transport endp

You need to debug cman (corosync). Any errors in /var/log/syslog?

# grep corosync /var/log/syslog
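
To get an overview before reading the raw output, the corosync lines can be grouped by the subsystem tag in square brackets (for example `[TOTEM ]`). This is only a minimal sketch that assumes the usual syslog line layout; the function name `corosync_tag_counts` is made up for illustration.

```shell
# Count corosync syslog messages per subsystem tag, e.g. "[TOTEM ]".
# Hypothetical helper: corosync_tag_counts [FILE]
corosync_tag_counts() {
    grep 'corosync\[' "${1:-/var/log/syslog}" \
        | sed -n 's/.*corosync\[[0-9]*]: *\[\([A-Za-z]*\) *].*/\1/p' \
        | sort | uniq -c | sort -rn
}
# Example: corosync_tag_counts /var/log/syslog
```

A tag that dominates the counts shortly before the node drops out is probably the place to start reading.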
 
Re: CLUSTER problem: pvedaemon[481993]: WARNING: ipcc_send_rec failed: Transport endp

There is a lot of info. This is the first sign, after a restart of the services, that it is going to fail again:
corosync[535241]: [TOTEM ] FAILED TO RECEIVE
 
Re: CLUSTER problem: pvedaemon[481993]: WARNING: ipcc_send_rec failed: Transport endp

There is a lot of info. This is the first sign, after a restart of the services, that it is going to fail again:
corosync[535241]: [TOTEM ] FAILED TO RECEIVE

Problem with multicast? Is there high load on the network? Do you use a separate network for cluster communication?
 
Re: CLUSTER problem: pvedaemon[481993]: WARNING: ipcc_send_rec failed: Transport endp

I tested multicast and could not see any problem. It is only this node; the other 5 are working well.
The network is not heavily loaded. I did try to use the dedicated network for them, but it chose to use this one. All nodes have 2 networks, .17. and .99. eth0 is connected to .17., but I would like it to use .99.
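
On cman-based clusters the interface used for cluster traffic generally follows the address that the node name resolves to, so it may be worth checking what /etc/hosts says for each node. Whether that explains the choice of .17. here is an assumption, and `hosts_addr_for` is a hypothetical helper written only for this sketch.

```shell
# Print the address(es) that a hosts-format file maps a name to.
# Hypothetical helper: hosts_addr_for NAME [FILE]
hosts_addr_for() {
    awk -v name="$1" '
        { sub(/#.*/, "") }   # drop comments before matching
        { for (i = 2; i <= NF; i++) if ($i == name) print $1 }
    ' "${2:-/etc/hosts}"
}
# Example: hosts_addr_for "$(hostname)"
```

If the printed address is on the .17. network, that would be consistent with the cluster traffic ending up on eth0.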

Best Regards
Coffe
 
Re: CLUSTER problem: pvedaemon[481993]: WARNING: ipcc_send_rec failed: Transport endp

I tested multicast and could not see any problem. It is only this node; the other 5 are working well.

What is different on this node?
 
Re: CLUSTER problem: pvedaemon[481993]: WARNING: ipcc_send_rec failed: Transport endp

It was the first node I set up in this cluster. Apart from that, it is the same hardware as the rest of them.

Best Regards
Coffe
 
Re: CLUSTER problem: pvedaemon[481993]: WARNING: ipcc_send_rec failed: Transport endp

Oct 9 14:07:22 cloud904 kernel: sd 5:0:0:0: [sdb] Very big device. Trying to use READ CAPACITY(16).
Oct 9 14:07:26 cloud904 corosync[551453]: [TOTEM ] FAILED TO RECEIVE
Oct 9 14:07:38 cloud904 pmxcfs[551618]: [quorum] crit: quorum_dispatch failed: 2
Oct 9 14:07:38 cloud904 dlm_controld[551534]: cluster is down, exiting
Oct 9 14:07:38 cloud904 fenced[551521]: cluster is down, exiting
Oct 9 14:07:38 cloud904 dlm_controld[551534]: daemon cpg_dispatch error 2
Oct 9 14:07:38 cloud904 fenced[551521]: daemon cpg_dispatch error 2
Oct 9 14:07:38 cloud904 pmxcfs[551618]: [libqb] warning: epoll_ctl(del): Bad file descriptor (9)
Oct 9 14:07:38 cloud904 pmxcfs[551618]: [confdb] crit: confdb_dispatch failed: 2
Oct 9 14:07:40 cloud904 pmxcfs[551618]: [libqb] warning: epoll_ctl(del): Bad file descriptor (9)
Oct 9 14:07:40 cloud904 pmxcfs[551618]: [dcdb] crit: cpg_dispatch failed: 2
Oct 9 14:07:40 cloud904 kernel: dlm: closing connection to node 6
Oct 9 14:07:40 cloud904 kernel: dlm: closing connection to node 5
Oct 9 14:07:40 cloud904 kernel: dlm: closing connection to node 4
Oct 9 14:07:40 cloud904 kernel: dlm: closing connection to node 3
Oct 9 14:07:40 cloud904 kernel: dlm: closing connection to node 2
Oct 9 14:07:40 cloud904 kernel: dlm: closing connection to node 1
Oct 9 14:07:42 cloud904 pmxcfs[551618]: [dcdb] crit: cpg_leave failed: 2
Then it spams syslog with:
Oct 9 14:09:21 cloud904 pmxcfs[551618]: [status] crit: cpg_send_message failed: 9
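
An excerpt like the one above can be cut out of a large syslog by printing everything from the first `FAILED TO RECEIVE` entry onward. A rough sketch, assuming the message text matches the lines shown here; `from_first_totem_failure` is a hypothetical name.

```shell
# Print up to N lines starting at the first "[TOTEM ] FAILED TO RECEIVE" entry.
# Hypothetical helper: from_first_totem_failure [FILE] [N]
from_first_totem_failure() {
    awk '/\[TOTEM *\] FAILED TO RECEIVE/ { found = 1 } found' \
        "${1:-/var/log/syslog}" | head -n "${2:-40}"
}
# Example: from_first_totem_failure /var/log/syslog 20
```

This keeps the kernel and pmxcfs lines that follow the failure, which a plain grep for corosync would drop.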
 
Re: CLUSTER problem: pvedaemon[481993]: WARNING: ipcc_send_rec failed: Transport endp

Oct 9 14:07:22 cloud904 kernel: sd 5:0:0:0: [sdb] Very big device. Trying to use READ CAPACITY(16).
Oct 9 14:07:26 cloud904 corosync[551453]: [TOTEM ] FAILED TO RECEIVE

This is the problem (corosync crash). We observed that on several installations, but are unable to reproduce the bug.
 
Re: CLUSTER problem: pvedaemon[481993]: WARNING: ipcc_send_rec failed: Transport endp

Do you use the latest version? Maybe you can try the latest code from the pvetest repository?
 
Re: CLUSTER problem: pvedaemon[481993]: WARNING: ipcc_send_rec failed: Transport endp

It was working without a problem for 2 weeks. Then, after the latest update, it started to fail.
 
Re: CLUSTER problem: pvedaemon[481993]: WARNING: ipcc_send_rec failed: Transport endp

No, I use the normal pve repo.

I also have the problem that it lost connections to the VMs' sockets.
 
