Cluster die after adding the 39th node! Proxmox is not stable!

lazypaul · Sep 7, 2020

My cluster have 38 nodes with ceph, yesterday, I added the 39th node, the hole cluster die!!!!

I don't have HA enabled, I think the node should not reboot, but my cluserter has 2 node reboot, and the cluster spit to different cluster quorum, like node 1, 3, 5, 7 , another is 2,4, 6,8. Really a disaster!

How I slove it :

1. Reboot all node not work!
2. I shutdown all the 39 nodes, start 3 nodes that start one by one, that works !

one reboot node log:
last reboot 0:11- 17 rebooting

Code:

root@g8kvm02:/var/log# last | grep -i boot
reboot   system boot  5.4.55-1-pve     Mon Sep  7 00:17   still running
reboot   system boot  5.4.55-1-pve     Mon Aug 31 10:43   still running
reboot   system boot  5.4.34-1-pve     Wed Aug 19 20:27 - 10:40 (11+14:12)
reboot   system boot  5.4.34-1-pve     Wed Aug 19 17:45 - 10:40 (11+16:54)
reboot   system boot  5.4.34-1-pve     Tue Aug 18 12:37 - 10:40 (12+22:02)
reboot   system boot  5.4.34-1-pve     Tue Aug 18 12:13 - 10:40 (12+22:26)
reboot   system boot  5.4.34-1-pve     Tue Aug 18 09:56 - 12:05  (02:09)
reboot   system boot  5.4.34-1-pve     Tue Aug 18 00:37 - 12:05  (11:27)
reboot   system boot  5.4.34-1-pve     Mon Aug 17 19:32 - 12:05  (16:32)
reboot   system boot  5.4.34-1-pve     Mon Aug 17 19:14 - 19:29  (00:14)
reboot   system boot  5.4.34-1-pve     Mon Aug 17 19:00 - 19:29  (00:28)
reboot   system boot  5.4.34-1-pve     Fri Aug  7 14:56 - 19:29 (10+04:33)
reboot   system boot  5.4.34-1-pve     Wed Jun 24 23:54 - 14:51 (43+14:57)
reboot   system boot  5.4.34-1-pve     Thu Jun 25 07:10 - 23:49  (-7:20)

reboot log

Code:

Sep  7 00:12:10 g8kvm02 ceph-osd[2292]: 2020-09-07 00:12:10.885 7fa29b6fa700 -1 osd.8 96815 heartbeat_check: no reply from 10.0.141.36:6851 osd.302 since back 2020-09-07 00:10:45.448456 front 2020-09-07 00:10:52.950887 (oldest deadline 2020-09-07 00:11:07.747487)
Sep  7 00:12:10 g8kvm02 ceph-osd[2292]: 2020-09-07 00:12:10.885 7fa29b6fa700 -1 osd.8 96815 heartbeat_check: no reply from 10.0.141.36:6838 osd.303 since back 2020-09-07 00:10:52.950433 front 2020-09-07 00:10:52.950764 (oldest deadline 2020-09-07 00:11:15.249640)
Sep  7 00:12:10 g8kvm02 ceph-osd[2292]: 2020-09-07 00:12:10.885 7fa29b6fa700 -1 osd.8 96815 heartbeat_check: no reply from 10.0.141.37:6850 osd.305 since back 2020-09-07 00:10:52.950589 front 2020-09-07 00:10:52.950805 (oldest deadline 2020-09-07 00:11:15.249640)
Sep  7 00:12:10 g8kvm02 ceph-osd[2292]: 2020-09-07 00:12:10.885 7fa29b6fa700 -1 osd.8 96815 heartbeat_check: no reply from 10.0.141.37:6810 osd.307 since back 2020-09-07 00:10:52.950811 front 2020-09-07 00:10:52.950563 (oldest deadline 2020-09-07 00:11:15.249640)
Sep  7 00:12:10 g8kvm02 ceph-osd[2292]: 2020-09-07 00:12:10.885 7fa29b6fa700 -1 osd.8 96815 heartbeat_check: no reply from 10.0.141.38:6835 osd.313 since back 2020-09-07 00:10:39.646821 front 2020-09-07 00:10:39.647357 (oldest deadline 2020-09-07 00:11:04.346043)
Sep  7 00:12:10 g8kvm02 ceph-osd[2292]: 2020-09-07 00:12:10.885 7fa29b6fa700 -1 osd.8 96815 heartbeat_check: no reply from 10.0.141.38:6868 osd.314 since back 2020-09-07 00:10:39.647466 front 2020-09-07 00:10:39.646843 (oldest deadline 2020-09-07 00:11:04.346043)
Sep  7 00:12:10 g8kvm02 ceph-osd[2292]: 2020-09-07 00:12:10.885 7fa29b6fa700 -1 osd.8 96815 heartbeat_check: no reply from 10.0.141.38:6814 osd.319 since back 2020-09-07 00:10:39.647415 front 2020-09-07 00:10:39.647523 (oldest deadline 2020-09-07 00:11:04.346043)
Sep  7 00:12:10 g8kvm02 ceph-osd[2292]: 2020-09-07 00:12:10.885 7fa29b6fa700 -1 osd.8 96815 heartbeat_check: no reply from 10.0.141.38:6854 osd.320 since back 2020-09-07 00:10:39.647401 front 2020-09-07 00:10:39.647537 (oldest deadline 2020-09-07 00:11:04.346043)
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@Sep  7 00:17:12 g8kvm02 dmeventd[755]: dmeventd ready for processing.
Sep  7 00:17:12 g8kvm02 lvm[748]:   1 logical volume(s) in volume group "ceph-e6fd3e3c-8853-4dd2-9e0e-6399af0ba30b" monitored
Sep  7 00:17:12 g8kvm02 kernel: [    0.000000] Linux version 5.4.55-1-pve (root@nora) (gcc version 8.3.0 (Debian 8.3.0-6)) #1 SMP PVE 5.4.55-1 (Mon, 10 Aug 2020 10:26:27 +0200) ()
Sep  7 00:17:12 g8kvm02 lvm[748]:   1 logical volume(s) in volume group "ceph-d07a5580-6e25-4c92-8a4a-76e949751b87" monitored
Sep  7 00:17:12 g8kvm02 kernel: [    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-5.4.55-1-pve root=/dev/mapper/pve-root ro quiet nmi_watchdog=0
Sep  7 00:17:12 g8kvm02 lvm[748]:   1 logical volume(s) in volume group "ceph-c732bd53-0f25-40c7-b726-fb67866d8176" monitored
Sep  7 00:17:12 g8kvm02 kernel: [    0.000000] KERNEL supported cpus:
Sep  7 00:17:12 g8kvm02 kernel: [    0.000000]   Intel GenuineIntel
Sep  7 00:17:12 g8kvm02 kernel: [    0.000000]   AMD AuthenticAMD
Sep  7 00:17:12 g8kvm02 lvm[748]:   1 logical volume(s) in volume group "ceph-9e42c888-294f-4c48-a575-285e72ee4114" monitored
Sep  7 00:17:12 g8kvm02 kernel: [    0.000000]   Hygon HygonGenuine
Sep  7 00:17:12 g8kvm02 kernel: [    0.000000]   Centaur CentaurHauls
Sep  7 00:17:12 g8kvm02 kernel: [    0.000000]   zhaoxin   Shanghai
Sep  7 00:17:12 g8kvm02 lvm[748]:   1 logical volume(s) in volume group "ceph-bfc41a84-297e-42f0-8170-0b2dc3d9f2cf" monitored
Sep  7 00:17:12 g8kvm02 kernel: [    0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
Sep  7 00:17:12 g8kvm02 kernel: [    0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
Sep  7 00:17:12 g8kvm02 kernel: [    0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
Sep  7 00:17:12 g8kvm02 kernel: [    0.000000] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
Sep  7 00:17:12 g8kvm02 lvm[748]:   1 logical volume(s) in volume group "ceph-e908d34c-5cae-4dfa-9e1c-33a5c3925c73" monitored

spirit · Sep 7, 2020

Hi,

I think you have enabled HA previously ?

https://forum.proxmox.com/threads/p...node-auto-reboot-need-help.74643/#post-333093

Did you have only remove HA from the vm since this time ?
or do you have also stop/start pve-ha-lrm service ? (or maybe reboot the nodes)

(Because, once you have HA enable vm, the lrm service enable the watchdog. and only at lrm service stop, the watchdog is disabled).

could you send /var/log/daemon.log from each node ? (interesting logs are corosync,pve-ha-lrm,pve-ha-crm, pmxcfs,...)

lazypaul · Sep 7, 2020

spirit said:
Hi,

I think you have enabled HA previously ?

https://forum.proxmox.com/threads/p...node-auto-reboot-need-help.74643/#post-333093

Did you have only remove HA from the vm since this time ?
or do you have also stop/start pve-ha-lrm service ? (or maybe reboot the nodes)

(Because, once you have HA enable vm, the lrm service enable the watchdog. and only at lrm service stop, the watchdog is disabled).

could you send /var/log/daemon.log from each node ? (interesting logs are corosync,pve-ha-lrm,pve-ha-crm, pmxcfs,...)

I had close HA already, but the HA web interface show quorum OK, master g8kvm31 idle, all other node are idle.

They told just remove HA groups the HA will stop, does it right? Still the HA problem ?

And the reboot log has nothing about the die node info, just told disconnect

lazypaul · Sep 7, 2020

Other node log:

Code:

Sep  7 00:02:01 g8kvm01 pvesr[3114968]: trying to acquire cfs lock 'file-replication_cfg' ...
Sep  7 00:02:02 g8kvm01 systemd[1]: pvesr.service: Succeeded.
Sep  7 00:02:02 g8kvm01 systemd[1]: Started Proxmox VE replication runner.
Sep  7 00:03:00 g8kvm01 systemd[1]: Starting Proxmox VE replication runner...
Sep  7 00:03:01 g8kvm01 systemd[1]: pvesr.service: Succeeded.
Sep  7 00:03:01 g8kvm01 systemd[1]: Started Proxmox VE replication runner.
Sep  7 00:03:30 g8kvm01 pvedaemon[2925731]: <root@pam> end task UPID:g8kvm01:002F4988:035F1BD4:5F5504AE:vncshell::root@pam: OK
Sep  7 00:03:30 g8kvm01 pveproxy[3113107]: worker exit
Sep  7 00:03:31 g8kvm01 pmxcfs[1635]: [status] notice: received log
Sep  7 00:03:31 g8kvm01 pmxcfs[1635]: [status] notice: received log
Sep  7 00:03:37 g8kvm01 pmxcfs[1635]: [dcdb] notice: members: 1/1635, 2/1655, 3/1711, 4/1625, 5/1547, 6/1707, 7/1709, 8/1458, 9/1454, 10/1507, 11/1500, 12/1474, 13/1438, 14/1479, 15/3825, 16/1684, 17/1739, 18/1376, 19/1445, 20/1456, 21/1448, 22/1626, 23/1575, 24/1610, 25/1509, 26/1504, 27/1542, 28/1861, 29/1591, 30/1618, 31/1589, 32/1588, 33/1576, 34/1969, 35/41728, 36/42607, 37/42948, 38/43877
Sep  7 00:03:37 g8kvm01 pmxcfs[1635]: [dcdb] notice: starting data syncronisation
Sep  7 00:03:37 g8kvm01 pmxcfs[1635]: [status] notice: members: 1/1635, 2/1655, 3/1711, 4/1625, 5/1547, 6/1707, 7/1709, 8/1458, 9/1454, 10/1507, 11/1500, 12/1474, 13/1438, 14/1479, 15/3825, 16/1684, 17/1739, 18/1376, 19/1445, 20/1456, 21/1448, 22/1626, 23/1575, 24/1610, 25/1509, 26/1504, 27/1542, 28/1861, 29/1591, 30/1618, 31/1589, 32/1588, 33/1576, 34/1969, 35/41728, 36/42607, 37/42948, 38/43877
Sep  7 00:03:37 g8kvm01 pmxcfs[1635]: [status] notice: starting data syncronisation
Sep  7 00:03:37 g8kvm01 corosync[1711]:   [TOTEM ] A new membership (1.2d9e) was formed. Members left: 39
Sep  7 00:03:38 g8kvm01 corosync[1711]:   [QUORUM] Members[38]: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24\
Sep  7 00:03:38 g8kvm01 corosync[1711]:   [QUORUM] Members[38]: 25 26 27 28 29 30 31 32 33 34 35 36 37 38
Sep  7 00:03:38 g8kvm01 corosync[1711]:   [MAIN  ] Completed service synchronization, ready to provide service.
Sep  7 00:03:38 g8kvm01 pmxcfs[1635]: [dcdb] notice: received sync request (epoch 1/1635/0000001C)
Sep  7 00:03:38 g8kvm01 pmxcfs[1635]: [status] notice: received sync request (epoch 1/1635/00000018)
Sep  7 00:03:54 g8kvm01 corosync[1711]:   [KNET  ] link: host: 39 link: 0 is down
Sep  7 00:03:54 g8kvm01 corosync[1711]:   [KNET  ] host: host: 39 (passive) best link: 0 (pri: 1)
Sep  7 00:03:54 g8kvm01 corosync[1711]:   [KNET  ] host: host: 39 has no active links
Sep  7 00:04:00 g8kvm01 systemd[1]: Starting Proxmox VE replication runner...
Sep  7 00:04:11 g8kvm01 corosync[1711]:   [TOTEM ] Token has not been received in 1573 ms
Sep  7 00:04:22 g8kvm01 corosync[1711]:   [TOTEM ] A processor failed, forming new configuration.
Sep  7 00:05:23 g8kvm01 pvedaemon[2739282]: <root@pam> successful auth for user 'root@pam'
Sep  7 00:06:08 g8kvm01 corosync[1711]:   [TOTEM ] Retransmit List: 7f 80 81 82 83 84 85 86
Sep  7 00:06:29 g8kvm01 corosync[1711]:   [TOTEM ] Retransmit List: 80 81 82 83 84 85 86 b8 b9 ba bb bc bd c7 c8
Sep  7 00:06:51 g8kvm01 corosync[1711]:   [TOTEM ] Retransmit List: c8 e7 e8 e9 ea
Sep  7 00:07:01 g8kvm01 corosync[1711]:   [TOTEM ] Retransmit List: ea
Sep  7 00:07:35 g8kvm01 kernel: [566820.734890] INFO: task pvesr:3117203 blocked for more than 120 seconds.
Sep  7 00:07:35 g8kvm01 kernel: [566820.735492]       Tainted: P           O      5.4.55-1-pve #1
Sep  7 00:07:35 g8kvm01 kernel: [566820.735996] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep  7 00:07:35 g8kvm01 kernel: [566820.736540] pvesr           D    0 3117203      1 0x00000000
Sep  7 00:07:35 g8kvm01 kernel: [566820.736542] Call Trace:
Sep  7 00:07:35 g8kvm01 kernel: [566820.736551]  __schedule+0x2e6/0x6f0
Sep  7 00:07:35 g8kvm01 kernel: [566820.736555]  ? filename_parentat.isra.57.part.58+0xf7/0x180
Sep  7 00:07:35 g8kvm01 kernel: [566820.736556]  schedule+0x33/0xa0
Sep  7 00:07:35 g8kvm01 kernel: [566820.736560]  rwsem_down_write_slowpath+0x2ed/0x4a0
Sep  7 00:07:35 g8kvm01 kernel: [566820.736561]  down_write+0x3d/0x40
Sep  7 00:07:35 g8kvm01 kernel: [566820.736563]  filename_create+0x8e/0x180
Sep  7 00:07:35 g8kvm01 kernel: [566820.736564]  do_mkdirat+0x59/0x110
Sep  7 00:07:36 g8kvm01 kernel: [566820.736566]  __x64_sys_mkdir+0x1b/0x20
Sep  7 00:07:36 g8kvm01 kernel: [566820.736568]  do_syscall_64+0x57/0x190
Sep  7 00:07:36 g8kvm01 kernel: [566820.736570]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Sep  7 00:07:36 g8kvm01 kernel: [566820.736572] RIP: 0033:0x7f7938c5b0d7
Sep  7 00:07:36 g8kvm01 kernel: [566820.736575] Code: Bad RIP value.
Sep  7 00:07:36 g8kvm01 kernel: [566820.736576] RSP: 002b:00007ffe8c18cb08 EFLAGS: 00000246 ORIG_RAX: 0000000000000053
Sep  7 00:07:36 g8kvm01 kernel: [566820.736577] RAX: ffffffffffffffda RBX: 000055cd8eb9b260 RCX: 00007f7938c5b0d7
Sep  7 00:07:36 g8kvm01 kernel: [566820.736578] RDX: 000055cd8db813d4 RSI: 00000000000001ff RDI: 000055cd92c05510
Sep  7 00:07:36 g8kvm01 kernel: [566820.736578] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000008
Sep  7 00:07:36 g8kvm01 kernel: [566820.736579] R10: 0000000000000000 R11: 0000000000000246 R12: 000055cd8ff5aa78
Sep  7 00:07:36 g8kvm01 kernel: [566820.736579] R13: 000055cd92c05510 R14: 000055cd9288d4e0 R15: 00000000000001ff
Sep  7 00:07:50 g8kvm01 corosync[1711]:   [KNET  ] rx: host: 39 link: 0 is up
Sep  7 00:07:50 g8kvm01 corosync[1711]:   [KNET  ] host: host: 39 (passive) best link: 0 (pri: 1)
Sep  7 00:09:29 g8kvm01 systemd[1]: Started Session 179 of user root.
Sep  7 00:09:36 g8kvm01 kernel: [566941.561369] INFO: task pvesr:3117203 blocked for more than 241 seconds.
Sep  7 00:09:36 g8kvm01 kernel: [566941.561983]       Tainted: P           O      5.4.55-1-pve #1
Sep  7 00:09:36 g8kvm01 kernel: [566941.562471] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep  7 00:09:36 g8kvm01 kernel: [566941.562970] pvesr           D    0 3117203      1 0x00000000
Sep  7 00:09:36 g8kvm01 kernel: [566941.562972] Call Trace:
Sep  7 00:09:36 g8kvm01 kernel: [566941.562981]  __schedule+0x2e6/0x6f0
Sep  7 00:09:36 g8kvm01 kernel: [566941.562984]  ? filename_parentat.isra.57.part.58+0xf7/0x180
Sep  7 00:09:36 g8kvm01 kernel: [566941.562986]  schedule+0x33/0xa0
Sep  7 00:09:36 g8kvm01 kernel: [566941.562989]  rwsem_down_write_slowpath+0x2ed/0x4a0
Sep  7 00:09:36 g8kvm01 kernel: [566941.562990]  down_write+0x3d/0x40
Sep  7 00:09:36 g8kvm01 kernel: [566941.562991]  filename_create+0x8e/0x180
Sep  7 00:09:36 g8kvm01 kernel: [566941.562993]  do_mkdirat+0x59/0x110
Sep  7 00:09:36 g8kvm01 kernel: [566941.562994]  __x64_sys_mkdir+0x1b/0x20
Sep  7 00:09:36 g8kvm01 kernel: [566941.562997]  do_syscall_64+0x57/0x190
Sep  7 00:09:36 g8kvm01 kernel: [566941.562998]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Sep  7 00:09:36 g8kvm01 kernel: [566941.563000] RIP: 0033:0x7f7938c5b0d7
Sep  7 00:09:36 g8kvm01 kernel: [566941.563004] Code: Bad RIP value.
Sep  7 00:09:36 g8kvm01 kernel: [566941.563004] RSP: 002b:00007ffe8c18cb08 EFLAGS: 00000246 ORIG_RAX: 0000000000000053
Sep  7 00:09:36 g8kvm01 kernel: [566941.563005] RAX: ffffffffffffffda RBX: 000055cd8eb9b260 RCX: 00007f7938c5b0d7
Sep  7 00:09:36 g8kvm01 kernel: [566941.563006] RDX: 000055cd8db813d4 RSI: 00000000000001ff RDI: 000055cd92c05510
Sep  7 00:09:36 g8kvm01 kernel: [566941.563006] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000008
Sep  7 00:09:36 g8kvm01 kernel: [566941.563007] R10: 0000000000000000 R11: 0000000000000246 R12: 000055cd8ff5aa78
Sep  7 00:09:36 g8kvm01 kernel: [566941.563007] R13: 000055cd92c05510 R14: 000055cd9288d4e0 R15: 00000000000001ff
Sep  7 00:10:14 g8kvm01 corosync[1711]:   [TOTEM ] Retransmit List: 39f 3a0 3a1 3a4
Sep  7 00:10:35 g8kvm01 corosync[1711]:   [TOTEM ] Retransmit List: 3a0 3a1
Sep  7 00:10:38 g8kvm01 corosync[1711]:   [KNET  ] link: host: 39 link: 0 is down
Sep  7 00:10:38 g8kvm01 corosync[1711]:   [KNET  ] host: host: 39 (passive) best link: 0 (pri: 1)
Sep  7 00:10:38 g8kvm01 corosync[1711]:   [KNET  ] host: host: 39 has no active links
Sep  7 00:10:45 g8kvm01 pmxcfs[1635]: [status] notice: cpg_send_message retry 10
Sep  7 00:10:46 g8kvm01 pmxcfs[1635]: [status] notice: cpg_send_message retry 20
Sep  7 00:10:47 g8kvm01 pmxcfs[1635]: [status] notice: cpg_send_message retry 30
Sep  7 00:10:48 g8kvm01 pmxcfs[1635]: [status] notice: cpg_send_message retry 40

lazypaul · Sep 7, 2020

Code:

Sep  7 00:10:49 g8kvm01 pmxcfs[1635]: [status] notice: cpg_send_message retry 50

Sep  7 00:10:50 g8kvm01 pmxcfs[1635]: [status] notice: cpg_send_message retry 60

Sep  7 00:10:51 g8kvm01 pmxcfs[1635]: [status] notice: cpg_send_message retry 70

Sep  7 00:10:52 g8kvm01 pmxcfs[1635]: [status] notice: cpg_send_message retry 80

Sep  7 00:10:53 g8kvm01 pmxcfs[1635]: [status] notice: cpg_send_message retry 90

Sep  7 00:10:54 g8kvm01 ceph-osd[2252]: 2020-09-07 00:10:54.309 7f05ba5ba700 -1 osd.3 96815 heartbeat_check: no reply from 10.0.141.8:6830 osd.86 since back 2020-09-07 00:10:40.625996 front 2020-09-07 00:10:31.322959 (oldest deadline 2020-09-07 00:10:54.222485)

Sep  7 00:10:54 g8kvm01 ceph-osd[2251]: 2020-09-07 00:10:54.401 7fa527fdf700 -1 osd.1 96815 heartbeat_check: no reply from 10.0.141.8:6846 osd.83 since back 2020-09-07 00:10:39.167351 front 2020-09-07 00:10:30.464891 (oldest deadline 2020-09-07 00:10:53.964669)

Sep  7 00:10:54 g8kvm01 pmxcfs[1635]: [status] notice: cpg_send_message retry 100

Sep  7 00:10:54 g8kvm01 pmxcfs[1635]: [status] notice: cpg_send_message retried 100 times

Sep  7 00:10:54 g8kvm01 pmxcfs[1635]: [status] crit: cpg_send_message failed: 6

Sep  7 00:10:54 g8kvm01 ceph-osd[2248]: 2020-09-07 00:10:54.601 7f1c324f8700 -1 osd.6 96815 heartbeat_check: no reply from 10.0.141.8:6813 osd.85 since back 2020-09-07 00:10:30.252395 front 2020-09-07 00:10:40.754276 (oldest deadline 2020-09-07 00:10:54.351091)

Sep  7 00:10:55 g8kvm01 ceph-osd[2252]: 2020-09-07 00:10:55.297 7f05ba5ba700 -1 osd.3 96815 heartbeat_check: no reply from 10.0.141.8:6830 osd.86 since back 2020-09-07 00:10:40.625996 front 2020-09-07 00:10:31.322959 (oldest deadline 2020-09-07 00:10:54.222485)

spirit · Sep 7, 2020

lazypaul said:
They told just remove HA groups the HA will stop, does it right? Still the HA problem ?

I think that once 1 vm ha has been started on a node, the watchdog is enabled. (until pve-ha-lrm service is stopped).

This could just explain the reboot.

Now, about corosync itself, as I previously said, you have a lot of nodes for a corosync cluster, and I'm really not sure about the stability of corosync with 30-40 nodes.
in normal time (not here when the problem have occured), in /var/log/daemon.log, do you see corosync logs like "[TOTEM ] Retransmit List ..." ?

lazypaul · Sep 8, 2020

spirit said:
I think that once 1 vm ha has been started on a node, the watchdog is enabled. (until pve-ha-lrm service is stopped).

This could just explain the reboot.

Now, about corosync itself, as I previously said, you have a lot of nodes for a corosync cluster, and I'm really not sure about the stability of corosync with 30-40 nodes.
in normal time (not here when the problem have occured), in /var/log/daemon.log, do you see corosync logs like "[TOTEM ] Retransmit List ..." ?

In the Deamon show the same, a lot of ceph osd error message then reboot

Code:

root@g8kvm02:~# vi /var/log/daemon.log

Sep  7 00:12:10 g8kvm02 ceph-osd[2292]: 2020-09-07 00:12:10.885 7fa29b6fa700 -1 osd.8 96815 heartbeat_check: no reply from 10.0.141.37:6810 osd.307 since back 2020-09-07 00:10:52.950811 front 2020-09-07 00:10:52.950563 (oldest deadline 2020-09-07 00:11:15.249640)
Sep  7 00:12:10 g8kvm02 ceph-osd[2292]: 2020-09-07 00:12:10.885 7fa29b6fa700 -1 osd.8 96815 heartbeat_check: no reply from 10.0.141.38:6835 osd.313 since back 2020-09-07 00:10:39.646821 front 2020-09-07 00:10:39.647357 (oldest deadline 2020-09-07 00:11:04.346043)
Sep  7 00:12:10 g8kvm02 ceph-osd[2292]: 2020-09-07 00:12:10.885 7fa29b6fa700 -1 osd.8 96815 heartbeat_check: no reply from 10.0.141.38:6868 osd.314 since back 2020-09-07 00:10:39.647466 front 2020-09-07 00:10:39.646843 (oldest deadline 2020-09-07 00:11:04.346043)
Sep  7 00:12:10 g8kvm02 ceph-osd[2292]: 2020-09-07 00:12:10.885 7fa29b6fa700 -1 osd.8 96815 heartbeat_check: no reply from 10.0.141.38:6814 osd.319 since back 2020-09-07 00:10:39.647415 front 2020-09-07 00:10:39.647523 (oldest deadline 2020-09-07 00:11:04.346043)
Sep  7 00:12:10 g8kvm02 ceph-osd[2292]: 2020-09-07 00:12:10.885 7fa29b6fa700 -1 osd.8 96815 heartbeat_check: no reply from 10.0.141.38:6854 osd.320 since back 2020-09-07 00:10:39.647401 front 2020-09-07 00:10:39.647537 (oldest deadline 2020-09-07 00:11:04.346043)
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@Sep  7 00:17:12 g8kvm02 dmeventd[755]: dmeventd ready for processing.
Sep  7 00:17:12 g8kvm02 lvm[748]:   1 logical volume(s) in volume group "ceph-e6fd3e3c-8853-4dd2-9e0e-6399af0ba30b" monitored
Sep  7 00:17:12 g8kvm02 lvm[748]:   1 logical volume(s) in volume group "ceph-d07a5580-6e25-4c92-8a4a-76e949751b87" monitored
Sep  7 00:17:12 g8kvm02 lvm[748]:   1 logical volume(s) in volume group "ceph-c732bd53-0f25-40c7-b726-fb67866d8176" monitored
Sep  7 00:17:12 g8kvm02 lvm[748]:   1 logical volume(s) in volume group "ceph-9e42c888-294f-4c48-a575-285e72ee4114" monitored
Sep  7 00:17:12 g8kvm02 lvm[748]:   1 logical volume(s) in volume group "ceph-bfc41a84-297e-42f0-8170-0b2dc3d9f2cf" monitored
Sep  7 00:17:12 g8kvm02 lvm[748]:   1 logical volume(s) in volume group "ceph-e908d34c-5cae-4dfa-9e1c-33a5c3925c73" monitored
Sep  7 00:17:12 g8kvm02 lvm[748]:   1 logical volume(s) in volume group "ceph-6579ec3b-9d1e-4fa1-aa6d-3d9536b0bf6a" monitored
Sep  7 00:17:12 g8kvm02 systemd[1]: Starting Flush Journal to Persistent Storage...
Sep  7 00:17:12 g8kvm02 systemd[1]: Started Flush Journal to Persistent Storage.
Sep  7 00:17:12 g8kvm02 systemd[1]: Started Create Static Device Nodes in /dev.

lazypaul · Sep 8, 2020

spirit said:
I think that once 1 vm ha has been started on a node, the watchdog is enabled. (until pve-ha-lrm service is stopped).

This could just explain the reboot.

Now, about corosync itself, as I previously said, you have a lot of nodes for a corosync cluster, and I'm really not sure about the stability of corosync with 30-40 nodes.
in normal time (not here when the problem have occured), in /var/log/daemon.log, do you see corosync logs like "[TOTEM ] Retransmit List ..." ?

Can I disable pve-ha-lrm service in all nodes? But the strange things is that only 2 node reboot but make the whole cluster die. So in your experiments corosync is not good enough to manage 30-40 nodes?

Proxmox is really good with ceph in Hyper-converged infrastructure, but not stable enough.

spirit · Sep 8, 2020

Can I disable pve-ha-lrm service in all nodes?

yes, first disable pve-ha-lrm on all nodes, then pve-ha-crm on all nodes
But the strange things is that only 2 node reboot but make the whole cluster die.

Code:

So in your experiments corosync is not good enough to manage 30-40 nodes?

Previously, on proxmox 5.0, it's was corosync2, and it have already retransmist with 20 nodes.
now in proxmox 6.0, this is corosync3 (new procotol), so I really can't tell.
But even corosync devs say that you are not going to do 100 nodes.
If you already have retransmit problem currently, and need some timeout tuning, you need to be carefull.
Also I think you have old servers in your cluster? be carefull with them, check corosync cpu usage

Code:

Proxmox is really good with ceph in Hyper-converged infrastructure, but not stable enough.

BTW, do you have dedicated link for your corosync ? (or dedicated for your ceph).
Because with hyperconverged, osd resync can use a lot of bandwith, and your switch really don't need to lost some corosync packets. (be carefull of cheap switchs with small buffers, if you don't have good swtich, it's recommended to have dedicated links/switch for corosync traffic)

(Personnaly, I'm doing multiple clusters of 20 nodes max)

lazypaul · Sep 8, 2020

spirit said:
yes, first disable pve-ha-lrm on all nodes, then pve-ha-crm on all nodes
But the strange things is that only 2 node reboot but make the whole cluster die.

Code:

So in your experiments corosync is not good enough to manage 30-40 nodes?

Previously, on proxmox 5.0, it's was corosync2, and it have already retransmist with 20 nodes.
now in proxmox 6.0, this is corosync3 (new procotol), so I really can't tell.
But even corosync devs say that you are not going to do 100 nodes.
If you already have retransmit problem currently, and need some timeout tuning, you need to be carefull.
Also I think you have old servers in your cluster? be carefull with them, check corosync cpu usage

Code:

Proxmox is really good with ceph in Hyper-converged infrastructure, but not stable enough.

BTW, do you have dedicated link for your corosync ? (or dedicated for your ceph).
Because with hyperconverged, osd resync can use a lot of bandwith, and your switch really don't need to lost some corosync packets. (be carefull of cheap switchs with small buffers, if you don't have good swtich, it's recommended to have dedicated links/switch for corosync traffic)

(Personnaly, I'm doing multiple clusters of 20 nodes max)

I have 2 vsan cluster in the same cisco 4506 switch, Vsan is really stable, I think network is ok. May be the problem is the Network problem,

I am wondering the different with vsan is : All node is only a 10G trunk port, but vsan is using vlan to separate the client and vsan storage.

Is there any way move the ceph to a single vlan, and corosync another vlan?

Dominic · Sep 8, 2020

As @spirit mentioned, it is strongly recommended to have (physically) separated networks for storage and cluster. The faster the network the better. Unfortunately, with a 10G switch that handles both storage and cluster traffic for a 39 node cluster, I wouldn't be surprised about unsatisfying results.

We have tested up to 36 nodes ourselves and have heard of people building Corosync clusters of up to 59 nodes.

lazypaul · Sep 8, 2020

Dominic said:
As @spirit mentioned, it is strongly recommended to have (physically) separated networks for storage and cluster. The faster the network the better. Unfortunately, with a 10G switch that handles both storage and cluster traffic for a 39 node cluster, I wouldn't be surprised about unsatisfying results.

We have tested up to 36 nodes ourselves and have heard of people building Corosync clusters of up to 59 nodes.

See, it is mostly the network cause my problem. Do you know how to change the corosync network, since the cluster is in use. I am going to add nodes up to 100 nodes.

You can see that my Vmware vsan is ok in the same network, So i think proxmox could change if I divide them into 2 vlan.

spirit · Sep 8, 2020

lazypaul said:
Do you know how to change the corosync network, since the cluster is in use.

you can add multiple links (ip address) in corosync

you need to edit /etc/pve/corosync.conf manually
https://pve.proxmox.com/wiki/Cluster_Manager#pvecm_redundancy

add ring1_addr to each node with second ip on different network

Code:

totem {
 ....
  interface {
    bindnetaddr: old network ip
    ringnumber: 0
    linknumber: 0
    knet_link_priority: 10
  }
  interface {
    bindnetaddr: new network ip
    ringnumber: 1
    linknumber: 1
    knet_link_priority: 5  #lower value, have more priority, so it'll use this link
  }

}

(and maybe restart corosync on each node)

Like this, you'll use your new network for main corosync link, and you have failover to old network in case of switch failure.

lazypaul · Sep 8, 2020

spirit said:
you can add multiple links (ip address) in corosync

you need to edit /etc/pve/corosync.conf manually
https://pve.proxmox.com/wiki/Cluster_Manager#pvecm_redundancy

add ring1_addr to each node with second ip on different network

Code:

totem { .... interface { bindnetaddr: old network ip ringnumber: 0 linknumber: 0 knet_link_priority: 10 } interface { bindnetaddr: new network ip ringnumber: 1 linknumber: 1 knet_link_priority: 5 #lower value, have more priority, so it'll use this link } }

(and maybe restart corosync on each node)

Like this, you'll use your new network for main corosync link, and you have failover to old network in case of switch failure.

Before apply to the production envirment I deploy to the Test envirement first, is it right?

Code:

root@backupkvm05:~# pvecm status
Cluster information
-------------------
Name:             BackupKvm
Config Version:   13
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Tue Sep  8 18:34:07 2020
Quorum provider:  corosync_votequorum
Nodes:            5
Node ID:          0x00000005
Ring ID:          1.ca6
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   5
Highest expected: 5
Total votes:      5
Quorum:           3 
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 10.0.142.1
0x00000002          1 10.0.142.2
0x00000003          1 10.0.142.3
0x00000004          1 10.0.142.4
0x00000005          1 10.0.142.5 (local)
root@backupkvm05:~# exit

Code:

root@backupkvm01:~# cat /etc/pve/corosync.conf
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: backupkvm01
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.0.142.1
    ring1_addr: 192.168.142.1
  }
  node {
    name: backupkvm02
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.0.142.2
    ring1_addr: 192.168.142.2
  }
  node {
    name: backupkvm03
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 10.0.142.3
    ring1_addr: 192.168.142.3
  }
  node {
    name: backupkvm04
    nodeid: 4
    quorum_votes: 1
    ring0_addr: 10.0.142.4
    ring1_addr: 192.168.142.4
  }
  node {
    name: backupkvm05
    nodeid: 5
    quorum_votes: 1
    ring0_addr: 10.0.142.5
    ring1_addr: 192.168.142.5
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: BackupKvm
  config_version: 13
  interface {
    ringnumber: 0
    linknumber: 0
    knet_link_priority: 10
  }
    interface {
    ringnumber: 1
    linknumber: 1
    knet_link_priority: 5
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  version: 2
}

root@backupkvm01:~#

lazypaul · Sep 8, 2020

Is there a same way to change the ceph network? I want to put ceph into another vlan. So thoses traffice would not in the same vlan, might be slove my problem.

Dominic · Sep 10, 2020

Is there a same way to change the ceph network?

This thread could help.

lazypaul · Sep 10, 2020

Dominic said:
This thread could help.

Thank you I will try that. But it is difficult for me, there are 39 nodes.

lazypaul · Sep 11, 2020

Dominic said:
This thread could help.

Hi, I have add the new ceph cluster network, do I want to restart in all nodes ? Can I just place the commamd to all 39 nodes?

systemctl restart ceph.target

Search

Search

Cluster die after adding the 39th node! Proxmox is not stable!

lazypaul

Member

spirit

Distinguished Member

lazypaul

Member

lazypaul

Member

lazypaul

Member

spirit

Distinguished Member

lazypaul

Member

lazypaul

Member

spirit

Distinguished Member

lazypaul

Member

Dominic

Proxmox Retired Staff

lazypaul

Member

spirit

Distinguished Member

lazypaul

Member

lazypaul

Member

Dominic

Proxmox Retired Staff

lazypaul

Member

lazypaul

Member

We value your privacy