Network Problems with proxmox-ve 6.0.2

kev1904

Hello guys,

We have been seeing some strange network problems since updating the cluster to version 6.x.
There are 11 nodes in the cluster, and the network on the nodes breaks at random, but always on all of them at the same time, as if there were some strange traffic.
No other servers connected to the same switches have any problems.
In the syslog we see strange things, which I post below; maybe there is a driver or kernel bug? All nodes run the same kernel/PVE version.
The cluster ran fine before upgrading to 6. I hope someone can help me.

[276065.034418] drv_pulse_mb[00001d0b]
[276065.034423] bnx2 0000:1c:00.1 eth5: DEBUG: dev_info_signature[44564903] reset_type[01005254]
[276065.034778] condition[0003e10e]
[276065.034785] bnx2 0000:1c:00.1 eth5: DEBUG: 000001c0: 01005254 42530000 0003e10e 00000000
[276065.035148] bnx2 0000:1c:00.1 eth5: DEBUG: 000003cc: 44444444 44444444 44444444 00000a00
[276065.035513] bnx2 0000:1c:00.1 eth5: DEBUG: 000003dc: 0ffeffff 0000ffff ffffffff ffffffff
[276065.035884] bnx2 0000:1c:00.1 eth5: DEBUG: 000003ec: 00000000 00000000 00000000 00a60630
[276065.036257] bnx2 0000:1c:00.1 eth5: DEBUG: 0x3fc[0000ffff]
[276065.036631] bnx2 0000:1c:00.1 eth5: <--- end MCP states dump --->
[276070.125979] bnx2 0000:1c:00.1 eth5: <--- start FTQ dump --->
[276070.126522] bnx2 0000:1c:00.1 eth5: RV2P_PFTQ_CTL 00010000
[276070.126930] bnx2 0000:1c:00.1 eth5: RV2P_TFTQ_CTL 00020000
[276070.127329] bnx2 0000:1c:00.1 eth5: RV2P_MFTQ_CTL 00004000
[276070.127724] bnx2 0000:1c:00.1 eth5: TBDR_FTQ_CTL 00004002
[276070.128118] bnx2 0000:1c:00.1 eth5: TDMA_FTQ_CTL 00010002
[276070.128513] bnx2 0000:1c:00.1 eth5: TXP_FTQ_CTL 00010002
[276070.128911] bnx2 0000:1c:00.1 eth5: TXP_FTQ_CTL 00010002
[276070.129305] bnx2 0000:1c:00.1 eth5: TPAT_FTQ_CTL 00010000
[276070.129703] bnx2 0000:1c:00.1 eth5: RXP_CFTQ_CTL 00008000
[276070.130118] bnx2 0000:1c:00.1 eth5: RXP_FTQ_CTL 00100000
[276070.130547] bnx2 0000:1c:00.1 eth5: COM_COMXQ_FTQ_CTL 00010000
[276070.130953] bnx2 0000:1c:00.1 eth5: COM_COMTQ_FTQ_CTL 00020000
[276070.131356] bnx2 0000:1c:00.1 eth5: COM_COMQ_FTQ_CTL 00010000
[276070.131762] bnx2 0000:1c:00.1 eth5: CP_CPQ_FTQ_CTL 00004000
[276070.132170] bnx2 0000:1c:00.1 eth5: CPU states:
[276070.132587] bnx2 0000:1c:00.1 eth5: 045000 mode b84c state 80001000 evt_mask 500 pc 800128c pc 8001284 instr 38640001
[276070.133048] bnx2 0000:1c:00.1 eth5: 085000 mode b84c state 80001000 evt_mask 500 pc 8000a48 pc 8000a58 instr 8f820014
[276070.133506] bnx2 0000:1c:00.1 eth5: 0c5000 mode b84c state 80001000 evt_mask 500 pc 8004c20 pc 8004c14 instr 10e00088
[276070.133975] bnx2 0000:1c:00.1 eth5: 105000 mode b8cc state 80000000 evt_mask 500 pc 8000a98 pc 8000b24 instr 1040ffd9
[276070.134436] bnx2 0000:1c:00.1 eth5: 145000 mode b800 state 80000000 evt_mask 500 pc 800af54 pc 800aedc instr 1440000e
[276070.134900] bnx2 0000:1c:00.1 eth5: 185000 mode b8cc state 80004000 evt_mask 500 pc 8000c6c pc 8000920 instr 8ce800e8
[276070.135369] bnx2 0000:1c:00.1 eth5: <--- end FTQ dump --->
[276070.135838] bnx2 0000:1c:00.1 eth5: <--- start TBDC dump --->
[276070.136307] bnx2 0000:1c:00.1 eth5: TBDC free cnt: 32
[276070.136777] bnx2 0000:1c:00.1 eth5: LINE CID BIDX CMD VALIDS
[276070.137259] bnx2 0000:1c:00.1 eth5: 00 001180 49e0 00 [0]
[276070.137741] bnx2 0000:1c:00.1 eth5: 01 001080 44b0 00 [0]
[276070.138230] bnx2 0000:1c:00.1 eth5: 02 001100 45e0 00 [0]
[276070.138701] bnx2 0000:1c:00.1 eth5: 03 001100 45e8 00 [0]
[276070.139162] bnx2 0000:1c:00.1 eth5: 04 001000 3568 00 [0]
[276070.139612] bnx2 0000:1c:00.1 eth5: 05 001000 3570 00 [0]
[276070.140052] bnx2 0000:1c:00.1 eth5: 06 001100 4480 00 [0]
[276070.140486] bnx2 0000:1c:00.1 eth5: 07 001100 43c8 00 [0]
[276070.140914] bnx2 0000:1c:00.1 eth5: 08 001000 2b40 00 [0]
[276070.141337] bnx2 0000:1c:00.1 eth5: 09 001080 4070 00 [0]
[276070.141749] bnx2 0000:1c:00.1 eth5: 0a 001100 6788 00 [0]
[276070.142162] bnx2 0000:1c:00.1 eth5: 0b 001300 4e88 00 [0]
[276070.142553] bnx2 0000:1c:00.1 eth5: 0c 001080 1698 00 [0]
[276070.142936] bnx2 0000:1c:00.1 eth5: 0d 001180 69e8 00 [0]
[276070.143308] bnx2 0000:1c:00.1 eth5: 0e 001100 96d0 00 [0]
[276070.143669] bnx2 0000:1c:00.1 eth5: 0f 001000 abf8 00 [0]
[276070.144022] bnx2 0000:1c:00.1 eth5: 10 001080 4e70 00 [0]
[276070.144369] bnx2 0000:1c:00.1 eth5: 11 001000 fc38 00 [0]
[276070.144709] bnx2 0000:1c:00.1 eth5: 12 001180 9f40 00 [0]
[276070.145043] bnx2 0000:1c:00.1 eth5: 13 001100 e2d0 00 [0]
[276070.145368] bnx2 0000:1c:00.1 eth5: 14 001280 e180 00 [0]
[276070.145688] bnx2 0000:1c:00.1 eth5: 15 001300 b570 00 [0]
[276070.146014] bnx2 0000:1c:00.1 eth5: 16 001280 1110 00 [0]
[276070.146329] bnx2 0000:1c:00.1 eth5: 17 001100 2b88 00 [0]
[276070.146641] bnx2 0000:1c:00.1 eth5: 18 001180 3a48 00 [0]
[276070.146953] bnx2 0000:1c:00.1 eth5: 19 001180 3a50 00 [0]
[276070.147262] bnx2 0000:1c:00.1 eth5: 1a 001080 acf8 00 [0]
[276070.147571] bnx2 0000:1c:00.1 eth5: 1b 001080 ad00 00 [0]
[276070.147878] bnx2 0000:1c:00.1 eth5: 1c 001300 e4f8 00 [0]
[276070.148187] bnx2 0000:1c:00.1 eth5: 1d 001300 e500 00 [0]
[276070.148493] bnx2 0000:1c:00.1 eth5: 1e 000800 94a8 00 [0]
[276070.148798] bnx2 0000:1c:00.1 eth5: 1f 1ffe80 f7a8 fd [0]
[276070.149096] bnx2 0000:1c:00.1 eth5: <--- end TBDC dump --->
[276070.149398] bnx2 0000:1c:00.1 eth5: DEBUG: intr_sem[0] PCI_CMD[00100446]
[276070.149711] bnx2 0000:1c:00.1 eth5: DEBUG: PCI_PM[19002008] PCI_MISC_CFG[92000088]
[276070.150041] bnx2 0000:1c:00.1 eth5: DEBUG: EMAC_TX_STATUS[00000008] EMAC_RX_STATUS[00000000]
[276070.150404] bnx2 0000:1c:00.1 eth5: DEBUG: RPM_MGMT_PKT_CTRL[40000088]
[276070.150733] bnx2 0000:1c:00.1 eth5: DEBUG: HC_STATS_INTERRUPT_STATUS[01800019]
[276070.151051] bnx2 0000:1c:00.1 eth5: DEBUG: PBA[00000000]
[276070.151368] bnx2 0000:1c:00.1 eth5: <--- start MCP states dump --->
[276070.151692] bnx2 0000:1c:00.1 eth5: DEBUG: MCP_STATE_P0[0003e10e] MCP_STATE_P1[0003e10e]
[276070.152030] bnx2 0000:1c:00.1 eth5: DEBUG: MCP mode[0000b800] state[80000000] evt_mask[00000500]
[276070.152377] bnx2 0000:1c:00.1 eth5: DEBUG: pc[0800d974] pc[0800d900] instr[10400033]
[276070.152724] bnx2 0000:1c:00.1 eth5: DEBUG: shmem states:
[276070.153058] bnx2 0000:1c:00.1 eth5: DEBUG: drv_mb[01030009] fw_mb[00000009] link_status[0000006f]
[276070.153407] drv_pulse_mb[00001d10]
[276070.153412] bnx2 0000:1c:00.1 eth5: DEBUG: dev_info_signature[44564903] reset_type[01005254]
[276070.153766] condition[0003e10e]
[276070.153772] bnx2 0000:1c:00.1 eth5: DEBUG: 000001c0: 01005254 42530000 0003e10e 00000000
[276070.154146] bnx2 0000:1c:00.1 eth5: DEBUG: 000003cc: 44444444 44444444 44444444 00000a00
[276070.154559] bnx2 0000:1c:00.1 eth5: DEBUG: 000003dc: 0ffeffff 0000ffff ffffffff ffffffff
[276070.154929] bnx2 0000:1c:00.1 eth5: DEBUG: 000003ec: 00000000 00000000 00000000 00a60630
[276070.155302] bnx2 0000:1c:00.1 eth5: DEBUG: 0x3fc[0000ffff]
[276070.155675] bnx2 0000:1c:00.1 eth5: <--- end MCP states dump --->
[276071.263394] bnx2 0000:1c:00.1 eth5: NIC Copper Link is Down
[276071.383949] bnx2 0000:1c:00.1 eth5: speed changed to 0 for port eth5
[276071.393975] bond1: link status definitely down for interface eth5, disabling it
[276074.489064] bnx2 0000:1c:00.1 eth5: NIC Copper Link is Up, 1000 Mbps full duplex

Errors from another node: https://pastebin.com/raw/rKff9643


pveversion -v
proxmox-ve: 6.0-2 (running kernel: 5.0.21-1-pve)
pve-manager: 6.0-7 (running version: 6.0-7/28984024)
pve-kernel-5.0: 6.0-7
pve-kernel-helper: 6.0-7
pve-kernel-5.0.21-1-pve: 5.0.21-2
pve-kernel-5.0.15-1-pve: 5.0.15-1
ceph: 14.2.2-pve1
ceph-fuse: 14.2.2-pve1
corosync: 3.0.2-pve2
criu: 3.11-3
glusterfs-client: 5.5-3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.11-pve1
libpve-access-control: 6.0-2
libpve-apiclient-perl: 3.0-2
libpve-common-perl: 6.0-4
libpve-guest-common-perl: 3.0-1
libpve-http-server-perl: 3.0-2
libpve-storage-perl: 6.0-8
libqb0: 1.0.5-1
lvm2: 2.03.02-pve3
lxc-pve: 3.1.0-64
lxcfs: 3.0.3-pve60
novnc-pve: 1.0.0-60
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.0-7
pve-cluster: 6.0-7
pve-container: 3.0-7
pve-docs: 6.0-4
pve-edk2-firmware: 2.20190614-1
pve-firewall: 4.0-7
pve-firmware: 3.0-2
pve-ha-manager: 3.0-2
pve-i18n: 2.0-3
pve-qemu-kvm: 4.0.0-5
pve-xtermjs: 3.13.2-1
qemu-server: 6.0-7
smartmontools: 7.0-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.1-pve2
 
Hmm, one node uses a Broadcom NIC and the other Intel gigabit ones, yet both have issues - which seems weird.

pvesr hangs, but the fact that it is specifically this service is probably because it is scheduled quite often (so it has a higher chance of running into issues). It looks like pmxcfs (the cluster configuration file system) hangs.
The only thing I can already tell you is that you still have libknet1 in version 1.11-pve1; a newer version fixes a crash in the cluster communication stack, so updating to it could be worth a shot (it is currently available in all but the enterprise repository, as it's relatively fresh, but it will move there soon).
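
If it helps, checking and pulling in the newer libknet1 is roughly this (a minimal sketch; which repository serves it depends on whether you have pve-no-subscription or pvetest enabled):

dpkg -l libknet1                      # show the currently installed version
apt update && apt install libknet1    # install the newer build
systemctl restart corosync            # restart corosync so it picks up the new library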

What's your general network setup? Multiple networks, with the cluster network separated on its own net?
Do you use Ceph or the like? Does this always happen at the same time of day? Are backup jobs running while it happens?
 
The setup is this: the nodes have 8 copper NICs. We created 3 bond devices (LACP): one for management, one for Ceph, and one for mounted NAS storage; every bond is in its own VLAN. We have 5 compute nodes and 6 Ceph nodes. The NIC errors happen on all nodes at random: on one node it's a NIC from the management bond, on another it's a NIC from the storage bond, so the NIC failures are completely random. We also get crashes when we run multiple disk moves to another Ceph pool; then the nodes all get fenced and rebooted. But during the NIC issue last night nothing was running, no backups, nothing, and the problem occurred at the same time on all nodes. I have to restart corosync and pve-cluster to get the cluster healthy again after the NICs fail.
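
For reference, the recovery on an affected node once the links are back up is roughly this (a sketch of what we run; the status check is just to verify afterwards):

systemctl restart corosync
systemctl restart pve-cluster
pvecm status                          # verify quorum and membership afterwards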


EDIT: I have now updated to kernel 5.0.21-2 and libknet1 1.12-pve1; reboot pending.
 
About the bnx2 error, one user reported that adding

options bnx2 disable_msi=1

to /etc/modprobe.d/bnx2.conf fixes the problem (I have seen the same bug some years ago; it was a kernel bug back then, and this workaround also fixed it).
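
For completeness, applying that workaround persistently would look roughly like this (a sketch; a reboot is the simplest way to reload bnx2 when the ports are bond members):

echo "options bnx2 disable_msi=1" > /etc/modprobe.d/bnx2.conf
update-initramfs -u                   # in case the module is loaded from the initramfs
reboot                                # or reload the bnx2 module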


Another user reported a problem when a VLAN was configured for a VM on the Proxmox side and also a VLAN inside the VM (double VLAN tag). I'm trying to get more info about that setup to reproduce it.
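
For clarity, that double-tag scenario would look roughly like this (hypothetical MAC and VLAN IDs, just to illustrate):

# Proxmox side: the VM NIC already gets a VLAN tag on the bridge
net0: virtio=AA:BB:CC:DD:EE:FF,bridge=vmbr0,tag=100

# Guest side: an additional VLAN interface is created inside the VM
ip link add link eth0 name eth0.200 type vlan id 200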
 
I attach the network config.

The virtual servers also use VLAN tags over bond2.

We will soon update all nodes to 2x 10 GbE LACP.

# Bond 0 Management
auto bond0
iface bond0 inet static
    address 192.168.75.1
    netmask 255.255.255.0
    gateway 192.168.75.254
    bond-mode 4
    bond-miimon 100
    bond-lacp-rate 1
    bond-xmit_hash_policy layer3+4
    slaves eth4 eth0

# Bond 1 Storage (ceph)
auto bond1
iface bond1 inet manual
    bond-mode 4
    bond-miimon 100
    bond-lacp-rate 1
    bond-xmit_hash_policy layer3+4
    slaves eth5 eth1

# VLAN 76 for Storage (ceph)
auto vlan76
iface vlan76 inet static
    address 192.168.76.1
    netmask 255.255.255.0
    vlan-raw-device bond1

# VLAN 302 for NAS Storage (backups)
auto vlan302
iface vlan302 inet static
    address 172.19.76.1
    netmask 255.255.0.0
    vlan-raw-device bond1

# Bond 2 for VM Traffic
auto bond2
iface bond2 inet manual
    bond-mode 4
    bond-miimon 100
    bond-lacp-rate 1
    bond-xmit_hash_policy layer3+4
    slaves eth2 eth3 eth6 eth7
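
For reference, since a different bond member fails each time, the bond and link state right after a drop can be checked with standard Linux tooling (a minimal sketch, nothing Proxmox-specific assumed):

cat /proc/net/bonding/bond1           # LACP and member state; repeat for bond0 and bond2
ip -s link show eth5                  # per-interface counters and carrier transitions
ethtool eth5                          # negotiated speed/duplex once the link is back up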
 
Got the same errors from bnx2 after setting options bnx2 disable_msi=1 in /etc/modprobe.d/bnx2.conf.
 
This is the complete config... the only thing not included is this:

auto vmbr0
iface vmbr0 inet manual
    bridge_ports bond2
    bridge_stp off
    bridge_fd 0
 
We had problems again over the weekend; it seems pmxcfs crashed on one node (proxstore12).

Sep 29 11:48:55 prox01 corosync[19554]: [KNET ] link: host: 10 link: 0 is down
Sep 29 11:48:55 prox01 corosync[19554]: [KNET ] host: host: 10 (passive) best link: 0 (pri: 1)
Sep 29 11:48:55 prox01 corosync[19554]: [KNET ] host: host: 10 has no active links
Sep 29 11:48:56 prox01 corosync[19554]: [TOTEM ] Token has not been received in 177 ms
Sep 29 11:49:00 prox01 systemd[1]: Starting Proxmox VE replication runner...
Sep 29 11:49:03 prox01 corosync[19554]: [TOTEM ] A new membership (1:4196) was formed. Members left: 10
Sep 29 11:49:03 prox01 corosync[19554]: [TOTEM ] Failed to receive the leave message. failed: 10
Sep 29 11:49:03 prox01 corosync[19554]: [CPG ] downlist left_list: 1 received
Sep 29 11:49:03 prox01 corosync[19554]: [CPG ] downlist left_list: 1 received
Sep 29 11:49:03 prox01 corosync[19554]: [CPG ] downlist left_list: 1 received
Sep 29 11:49:03 prox01 corosync[19554]: [CPG ] downlist left_list: 1 received
Sep 29 11:49:03 prox01 corosync[19554]: [CPG ] downlist left_list: 1 received
Sep 29 11:49:03 prox01 corosync[19554]: [CPG ] downlist left_list: 1 received
Sep 29 11:49:03 prox01 corosync[19554]: [CPG ] downlist left_list: 1 received
Sep 29 11:49:03 prox01 pmxcfs[19482]: [dcdb] notice: members: 1/19482, 2/27572, 3/22831, 4/3819, 8/17473, 9/3152441, 11/3910129
Sep 29 11:49:03 prox01 pmxcfs[19482]: [dcdb] notice: starting data syncronisation
Sep 29 11:49:03 prox01 corosync[19554]: [QUORUM] Members[7]: 1 2 3 4 8 9 11
Sep 29 11:49:03 prox01 corosync[19554]: [MAIN ] Completed service synchronization, ready to provide service.
Sep 29 11:49:03 prox01 pmxcfs[19482]: [dcdb] notice: cpg_send_message retried 1 times
Sep 29 11:49:03 prox01 pmxcfs[19482]: [status] notice: members: 1/19482, 2/27572, 3/22831, 4/3819, 8/17473, 9/3152441, 11/3910129
Sep 29 11:49:03 prox01 pmxcfs[19482]: [status] notice: starting data syncronisation
Sep 29 11:49:03 prox01 pmxcfs[19482]: [dcdb] notice: received sync request (epoch 1/19482/00000019)
Sep 29 11:49:03 prox01 pmxcfs[19482]: [status] notice: received sync request (epoch 1/19482/00000019)
Sep 29 11:49:03 prox01 pmxcfs[19482]: [dcdb] notice: received all states
Sep 29 11:49:03 prox01 pmxcfs[19482]: [dcdb] notice: leader is 1/19482
Sep 29 11:49:03 prox01 pmxcfs[19482]: [dcdb] notice: synced members: 1/19482, 2/27572, 3/22831, 4/3819, 8/17473, 9/3152441, 11/3910129
Sep 29 11:49:03 prox01 pmxcfs[19482]: [dcdb] notice: start sending inode updates
Sep 29 11:49:03 prox01 pmxcfs[19482]: [dcdb] notice: sent all (0) updates
Sep 29 11:49:03 prox01 pmxcfs[19482]: [dcdb] notice: all data is up to date
Sep 29 11:49:03 prox01 pmxcfs[19482]: [dcdb] notice: dfsm_deliver_queue: queue length 9
Sep 29 11:49:03 prox01 pmxcfs[19482]: [status] notice: received all states
Sep 29 11:49:03 prox01 pmxcfs[19482]: [status] notice: all data is up to date
Sep 29 11:49:03 prox01 pmxcfs[19482]: [status] notice: dfsm_deliver_queue: queue length 122
Sep 29 11:49:03 prox01 systemd[1]: pvesr.service: Succeeded.
Sep 29 11:49:03 prox01 systemd[1]: Started Proxmox VE replication runner.
Sep 29 11:49:13 prox01 pve-ha-crm[2112]: node 'proxstore12': state changed from 'online' => 'unknown'

Sep 29 11:48:52 proxstore12 pmxcfs[32247]: [status] crit: rrdentry_hash_set: assertion 'data[len-1] == 0' failed
Sep 29 11:48:53 proxstore12 pmxcfs[32247]: [quorum] crit: quorum_dispatch failed: 2
Sep 29 11:48:53 proxstore12 pmxcfs[32247]: [status] notice: node lost quorum
Sep 29 11:48:53 proxstore12 systemd[1]: corosync.service: Main process exited, code=killed, status=11/SEGV
Sep 29 11:48:53 proxstore12 systemd[1]: corosync.service: Failed with result 'signal'.
Sep 29 11:48:53 proxstore12 pmxcfs[32247]: [dcdb] crit: cpg_dispatch failed: 2
Sep 29 11:48:53 proxstore12 pmxcfs[32247]: [dcdb] crit: cpg_leave failed: 2
Sep 29 11:48:53 proxstore12 pmxcfs[32247]: [confdb] crit: cmap_dispatch failed: 2
Sep 29 11:48:53 proxstore12 pmxcfs[32247]: [status] crit: cpg_dispatch failed: 2
Sep 29 11:48:53 proxstore12 pmxcfs[32247]: [status] crit: cpg_leave failed: 2
Sep 29 11:48:53 proxstore12 pmxcfs[32247]: [quorum] crit: quorum_initialize failed: 2
Sep 29 11:48:53 proxstore12 pmxcfs[32247]: [quorum] crit: can't initialize service
Sep 29 11:48:53 proxstore12 pmxcfs[32247]: [confdb] crit: cmap_initialize failed: 2
Sep 29 11:48:53 proxstore12 pmxcfs[32247]: [confdb] crit: can't initialize service
Sep 29 11:48:53 proxstore12 pmxcfs[32247]: [dcdb] notice: start cluster connection
Sep 29 11:48:53 proxstore12 pmxcfs[32247]: [dcdb] crit: cpg_initialize failed: 2
Sep 29 11:48:53 proxstore12 pmxcfs[32247]: [dcdb] crit: can't initialize service
Sep 29 11:48:53 proxstore12 pmxcfs[32247]: [status] notice: start cluster connection
Sep 29 11:48:53 proxstore12 pmxcfs[32247]: [status] crit: cpg_initialize failed: 2
Sep 29 11:48:53 proxstore12 pmxcfs[32247]: [status] crit: can't initialize service
Sep 29 11:48:59 proxstore12 pmxcfs[32247]: [quorum] crit: quorum_initialize failed: 2
Sep 29 11:48:59 proxstore12 pmxcfs[32247]: [confdb] crit: cmap_initialize failed: 2
Sep 29 11:48:59 proxstore12 pmxcfs[32247]: [status] crit: cpg_initialize failed: 2
Sep 29 11:49:00 proxstore12 systemd[1]: Starting Proxmox VE replication runner...
Sep 29 11:49:00 proxstore12 pvesr[2918316]: trying to acquire cfs lock 'file-replication_cfg' ...
Sep 29 11:49:01 proxstore12 pvesr[2918316]: trying to acquire cfs lock 'file-replication_cfg' ...
Sep 29 11:49:02 proxstore12 pvesr[2918316]: trying to acquire cfs lock 'file-replication_cfg' ...
Sep 29 11:49:03 proxstore12 pvesr[2918316]: trying to acquire cfs lock 'file-replication_cfg' ...
Sep 29 11:49:04 proxstore12 pvesr[2918316]: trying to acquire cfs lock 'file-replication_cfg' ...
Sep 29 11:49:05 proxstore12 pmxcfs[32247]: [quorum] crit: quorum_initialize failed: 2
Sep 29 11:49:05 proxstore12 pmxcfs[32247]: [confdb] crit: cmap_initialize failed: 2
Sep 29 11:49:05 proxstore12 pmxcfs[32247]: [dcdb] crit: cpg_initialize failed: 2
Sep 29 11:49:05 proxstore12 pmxcfs[32247]: [status] crit: cpg_initialize failed: 2
Sep 29 11:49:05 proxstore12 pvesr[2918316]: trying to acquire cfs lock 'file-replication_cfg' ...
Sep 29 11:49:06 proxstore12 pvesr[2918316]: trying to acquire cfs lock 'file-replication_cfg' ...
Sep 29 11:49:07 proxstore12 pvesr[2918316]: trying to acquire cfs lock 'file-replication_cfg' ...
Sep 29 11:49:08 proxstore12 pvesr[2918316]: trying to acquire cfs lock 'file-replication_cfg' ...
Sep 29 11:49:09 proxstore12 pvesr[2918316]: error with cfs lock 'file-replication_cfg': no quorum!
Sep 29 11:49:09 proxstore12 systemd[1]: pvesr.service: Main process exited, code=exited, status=13/n/a
Sep 29 11:49:09 proxstore12 systemd[1]: pvesr.service: Failed with result 'exit-code'.
Sep 29 11:49:09 proxstore12 systemd[1]: Failed to start Proxmox VE replication runner.
Sep 29 11:49:11 proxstore12 pmxcfs[32247]: [quorum] crit: quorum_initialize failed: 2
 
@kev1904

Regarding the corosync crash, it seems there is still a bug (we have fixed another segfault recently).

Can you post your logs here, so we can centralize them with other users' reports:

https://bugzilla.proxmox.com/show_bug.cgi?id=2326

Also, if you are able to reproduce it, can you enable corosync debug logging in corosync.conf and install systemd-coredump ('apt install systemd-coredump')?
That way we'll have more info on the next crash.
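
For reference, enabling debug output means setting debug: on in the logging section of /etc/pve/corosync.conf (and bumping config_version in the totem section so the change propagates to all nodes), roughly:

logging {
  debug: on
  to_syslog: yes
}

apt install systemd-coredump          # so the next segfault leaves a core dump (inspect with coredumpctl)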
 
