It didn't help. But the corosync.conf is changed back after a restart... I don't know why the token config disappears.
I can't edit the corosync.conf in the /etc/pve folder because of missing rights, so I edited the one in /etc/corosync/corosync.conf. Was that wrong?
I forgot to increase the version number; that's a good point.
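For reference, a minimal sketch of what the edited totem section could look like (cluster name and token value here are placeholders, not recommendations); config_version has to be increased on every change, otherwise the cluster-synced copy wins, which is presumably why the edit kept disappearing:

totem {
  version: 2
  cluster_name: mycluster   # placeholder name
  config_version: 4         # must be higher than the previous value
  token: 10000              # example token timeout in ms
}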
systemctl stop corosync pve-cluster                   # stop the cluster stack
pmxcfs -l                                             # start pmxcfs in local mode so /etc/pve is writable without quorum
cp working-corosync.conf /etc/pve/corosync.conf       # cluster-wide copy
cp working-corosync.conf /etc/corosync/corosync.conf  # local copy used by corosync itself
killall pmxcfs                                        # stop the local-mode instance again
systemctl start pve-cluster corosync                  # bring the stack back up normally
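Once the services are back up, quorum and link state can be checked with the standard tools, for example:

pvecm status
corosync-cfgtool -s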
$ sha256sum libknet1_1.10-pve2~test1_amd64.deb
64521083486b6b2683826cc95f6d869ab3fde01e9b6d1ae90fae49fcac305478 libknet1_1.10-pve2~test1_amd64.deb
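To verify a download against that sum, a minimal sketch (the published hash is written to a file first; note that sha256sum -c expects two spaces between hash and filename):

echo "64521083486b6b2683826cc95f6d869ab3fde01e9b6d1ae90fae49fcac305478  libknet1_1.10-pve2~test1_amd64.deb" > SHA256SUMS
sha256sum -c SHA256SUMS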
http://download.proxmox.com/debian/pve/dists/buster/pvetest/binary-amd64/libknet-dev_1.10-pve2_amd64.deb
http://download.proxmox.com/debian/pve/dists/buster/pvetest/binary-amd64/libknet-doc_1.10-pve2_all.deb
http://download.proxmox.com/debian/pve/dists/buster/pvetest/binary-amd64/libknet1-dbgsym_1.10-pve2_amd64.deb
http://download.proxmox.com/debian/pve/dists/buster/pvetest/binary-amd64/libknet1_1.10-pve2_amd64.deb
http://download.proxmox.com/debian/pve/dists/buster/pvetest/binary-amd64/libnozzle-dev_1.10-pve2_amd64.deb
http://download.proxmox.com/debian/pve/dists/buster/pvetest/binary-amd64/libnozzle1-dbgsym_1.10-pve2_amd64.deb
http://download.proxmox.com/debian/pve/dists/buster/pvetest/binary-amd64/libnozzle1_1.10-pve2_amd64.deb
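A sketch of pulling the runtime library from that list onto a node (the -dev/-doc/-dbgsym packages are not needed at runtime; corosync has to be restarted afterwards so it picks up the new libknet):

wget http://download.proxmox.com/debian/pve/dists/buster/pvetest/binary-amd64/libknet1_1.10-pve2_amd64.deb
dpkg -i libknet1_1.10-pve2_amd64.deb
systemctl restart corosync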
With the second set of packages the nodes are unstable. This is the third time one of the nodes isn't pingable anymore, and the only way to get it back is a hard reset.
192.168.131.20 ii libknet1:amd64 1.10-pve2~test1 amd64 kronosnet core switching implementation
192.168.131.21 ii libknet1:amd64 1.10-pve2~test1 amd64 kronosnet core switching implementation
192.168.131.22 ii libknet1:amd64 1.10-pve2~test1 amd64 kronosnet core switching implementation
192.168.131.23 ii libknet1:amd64 1.10-pve2~test1 amd64 kronosnet core switching implementation
192.168.131.27 ii libknet1:amd64 1.10-pve2~test1 amd64 kronosnet core switching implementation
192.168.131.28 ii libknet1:amd64 1.10-pve2~test1 amd64 kronosnet core switching implementation
192.168.131.29 ii libknet1:amd64 1.10-pve2~test1 amd64 kronosnet core switching implementation
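For reference, output like the above can be collected with a small loop over the node IPs (assuming root SSH access between the nodes):

for h in 192.168.131.20 192.168.131.21 192.168.131.22 192.168.131.23 192.168.131.27 192.168.131.28 192.168.131.29; do
    echo -n "$h "
    ssh root@$h 'dpkg -l libknet1 | grep ^ii'
done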
I think I have a similar or the same issue here.
Two-node cluster (new R740 and R340) over one switch, losing quorum within a day or so. Once I restarted corosync.service on one node; today I had to restart it on both.
Ask me anything.
Same again today; I restarted both corosync services to reconnect the nodes.
What should I post, configuration-wise or regarding the status of these "events"?
Are you up to date on this system? We pushed out a kronosnet (libknet) and kernel update a few days ago, currently still only available through the no-subscription repository as it's quite recent. It would be great if some people experiencing those issues could test those packages.
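For PVE 6 on Buster, a sketch of pulling those updates via the no-subscription repository (the list filename is arbitrary; skip the first line if the repository is already configured):

echo "deb http://download.proxmox.com/debian/pve buster pve-no-subscription" > /etc/apt/sources.list.d/pve-no-subscription.list
apt update
apt full-upgrade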
Aug 30 02:16:32 vmhost02 systemd[1]: Started Proxmox VE replication runner.
Aug 30 02:16:32 vmhost02 systemd[1]: Starting Proxmox VE replication runner...
Aug 30 02:16:33 vmhost02 systemd[1]: pvesr.service: Succeeded.
Aug 30 02:16:33 vmhost02 systemd[1]: Started Proxmox VE replication runner.
Aug 30 02:16:56 vmhost02 corosync[3959]: [KNET ] link: host: 2 link: 0 is down
Aug 30 02:16:56 vmhost02 corosync[3959]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)
Aug 30 02:16:56 vmhost02 corosync[3959]: [KNET ] host: host: 2 has no active links
Aug 30 02:16:57 vmhost02 corosync[3959]: [TOTEM ] Token has not been received in 36 ms
Aug 30 02:16:57 vmhost02 corosync[3959]: [KNET ] rx: host: 2 link: 0 is up
Aug 30 02:16:57 vmhost02 corosync[3959]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)
Aug 30 02:17:00 vmhost02 systemd[1]: Starting Proxmox VE replication runner...
Aug 30 02:17:01 vmhost02 systemd[1]: pvesr.service: Succeeded.
Aug 30 02:17:01 vmhost02 systemd[1]: Started Proxmox VE replication runner.
Aug 30 02:17:01 vmhost02 CRON[15092]: pam_unix(cron:session): session opened for user root by (uid=0)
Aug 30 02:17:01 vmhost02 CRON[15093]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Aug 30 02:17:01 vmhost02 CRON[15092]: pam_unix(cron:session): session closed for user root
Aug 30 02:17:44 vmhost02 corosync[3959]: [TOTEM ] Token has not been received in 750 ms
Aug 30 02:17:44 vmhost02 corosync[3959]: [TOTEM ] A processor failed, forming new configuration.
Aug 30 02:17:45 vmhost02 corosync[3959]: [TOTEM ] A new membership (1:389760) was formed. Members
Aug 30 02:17:45 vmhost02 corosync[3959]: [CPG ] downlist left_list: 0 received
Aug 30 02:17:45 vmhost02 corosync[3959]: [CPG ] downlist left_list: 0 received
Aug 30 02:17:45 vmhost02 corosync[3959]: [QUORUM] Members[2]: 1 2
Aug 30 02:17:45 vmhost02 corosync[3959]: [MAIN ] Completed service synchronization, ready to provide service.
Aug 30 02:17:48 vmhost02 corosync[3959]: [TOTEM ] Token has not been received in 750 ms
Aug 30 02:17:49 vmhost02 corosync[3959]: [TOTEM ] A processor failed, forming new configuration.
Aug 30 02:17:50 vmhost02 corosync[3959]: [TOTEM ] A new membership (1:389764) was formed. Members left: 2
Aug 30 02:17:50 vmhost02 corosync[3959]: [TOTEM ] Failed to receive the leave message. failed: 2
Aug 30 02:17:50 vmhost02 corosync[3959]: [CPG ] downlist left_list: 1 received
Aug 30 02:17:50 vmhost02 pmxcfs[13984]: [dcdb] notice: members: 1/13984
Aug 30 02:17:50 vmhost02 corosync[3959]: [QUORUM] This node is within the non-primary component and will NOT provide any services.
Aug 30 02:17:50 vmhost02 corosync[3959]: [QUORUM] Members[1]: 1
Aug 30 02:17:50 vmhost02 corosync[3959]: [MAIN ] Completed service synchronization, ready to provide service.
Aug 30 02:17:50 vmhost02 pmxcfs[13984]: [status] notice: node lost quorum
Aug 30 02:17:50 vmhost02 pmxcfs[13984]: [status] notice: members: 1/13984
Aug 30 02:18:00 vmhost02 systemd[1]: Starting Proxmox VE replication runner...
Aug 30 02:18:01 vmhost02 pvesr[18172]: trying to acquire cfs lock 'file-replication_cfg' ...
Aug 30 02:18:02 vmhost02 pvesr[18172]: trying to acquire cfs lock 'file-replication_cfg' ...
Aug 30 02:18:02 vmhost02 corosync[3959]: [KNET ] link: host: 2 link: 0 is down
Aug 30 02:18:02 vmhost02 corosync[3959]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)
Aug 30 02:18:02 vmhost02 corosync[3959]: [KNET ] host: host: 2 has no active links
Aug 30 02:18:03 vmhost02 pvesr[18172]: trying to acquire cfs lock 'file-replication_cfg' ...
Aug 30 02:18:03 vmhost02 corosync[3959]: [KNET ] rx: host: 2 link: 0 is up
Aug 30 02:18:03 vmhost02 corosync[3959]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)
Aug 30 02:18:04 vmhost02 pvesr[18172]: trying to acquire cfs lock 'file-replication_cfg' ...
Aug 30 02:18:05 vmhost02 pvesr[18172]: trying to acquire cfs lock 'file-replication_cfg' ...
Aug 30 02:18:06 vmhost02 pvesr[18172]: trying to acquire cfs lock 'file-replication_cfg' ...
Aug 30 02:18:07 vmhost02 pvesr[18172]: trying to acquire cfs lock 'file-replication_cfg' ...
Aug 30 02:18:08 vmhost02 pvesr[18172]: trying to acquire cfs lock 'file-replication_cfg' ...
Aug 30 02:18:09 vmhost02 pvesr[18172]: trying to acquire cfs lock 'file-replication_cfg' ...
Aug 30 02:18:10 vmhost02 pvesr[18172]: error with cfs lock 'file-replication_cfg': no quorum!
Aug 30 02:18:10 vmhost02 systemd[1]: pvesr.service: Main process exited, code=exited, status=13/n/a
Aug 30 02:18:10 vmhost02 systemd[1]: pvesr.service: Failed with result 'exit-code'.
Aug 30 02:18:10 vmhost02 systemd[1]: Failed to start Proxmox VE replication runner.
Aug 30 02:19:00 vmhost02 systemd[1]: Starting Proxmox VE replication runner...
Aug 30 02:19:01 vmhost02 pvesr[25696]: trying to acquire cfs lock 'file-replication_cfg' ...
Aug 30 02:19:02 vmhost02 pvesr[25696]: trying to acquire cfs lock 'file-replication_cfg' ...
Aug 30 02:19:03 vmhost02 pvesr[25696]: trying to acquire cfs lock 'file-replication_cfg' ...
Aug 30 02:19:04 vmhost02 pvesr[25696]: trying to acquire cfs lock 'file-replication_cfg' ...
Aug 30 02:19:05 vmhost02 pvesr[25696]: trying to acquire cfs lock 'file-replication_cfg' ...
Aug 30 02:19:06 vmhost02 pvesr[25696]: trying to acquire cfs lock 'file-replication_cfg' ...
Aug 30 02:19:07 vmhost02 pvesr[25696]: trying to acquire cfs lock 'file-replication_cfg' ...
Aug 30 02:19:08 vmhost02 pvesr[25696]: trying to acquire cfs lock 'file-replication_cfg' ...
Aug 30 02:19:09 vmhost02 pvesr[25696]: trying to acquire cfs lock 'file-replication_cfg' ...
Aug 30 02:19:10 vmhost02 pvesr[25696]: error with cfs lock 'file-replication_cfg': no quorum!
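To pull just the relevant corosync lines out of the journal when correlating these drops, something like the following works:

journalctl -u corosync | grep -E 'TOTEM|KNET|QUORUM'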