Proxmox is now updated to 6.4.
But our two smaller clusters are still running separately - we avoid building one big cluster again.
Is there any news or comments about this bug? Is it really fixed in the 6.3 / 6.4 versions?
I hit the long login delay on Debian 11 in LXC too.
The new container was created from a downloaded template as unprivileged.
systemctl mask systemd-logind
solves the issue without editing the container config.
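If you prefer running it from the node instead of from inside the container, something like this should also work (container ID 101 is just an example):

pct exec 101 -- systemctl mask systemd-logind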
Thank you very much.
We have now switched to SCTP and increased the timeouts. We also set this sysctl option:
net.core.netdev_max_backlog = 50000
Working over SCTP produces more readable and useful logs, so we are staying with this set of options.
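For reference, the transport and timeouts live in the totem section of corosync.conf; the snippet below is only a sketch with illustrative values, not our exact configuration:

totem {
  # existing cluster_name, config_version, interface blocks stay unchanged
  knet_transport: sctp   # switch the knet links from UDP to SCTP
  token: 10000           # illustrative increased token timeout, in ms
}

The sysctl value above can be persisted in a file under /etc/sysctl.d/ and loaded with sysctl --system.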
We also try to stay on current versions, but we are not ready to install a test version...
During recovery, and while trying to understand the issue, we performed these steps:
0. Check for errors on the switch ports and options like storm control - these were correct;
1. Disable encryption for traffic analysis:
crypto_cipher: none
crypto_hash: none
and found this:
Sep 21 10:20:32 vps4 corosync[9641]: [TOTEM...
The flood can appear after a corosync/pve-cluster restart with a random delay, even 2-3 days after the restart.
No. We previously had 22 nodes in the cluster, and after stopping corosync and doing repeated one-by-one restarts some split-brains remained: some nodes saw only themselves - no quorum and blocked activity. But when we...
1. Yes, we tried stopping pve-cluster and killing pmxcfs (pkill pmxcfs).
This stops the flood only for a short time; some time after restarting the service the flood appears again.
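For clarity, the per-node restart boils down to roughly this sequence (assuming nothing else keeps pmxcfs open):

systemctl stop pve-cluster corosync
pkill pmxcfs                        # make sure no stale pmxcfs process is left
systemctl start corosync pve-cluster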
In normal operation the logs contain no retransmits; retransmits appear only under the storm or shaper overload.
We are continuing diagnostics and trying to understand...
I avoid touching /etc/pve/corosync.conf.
But on one of the problem nodes I stopped corosync, enabled debug in /etc/corosync/corosync.conf and started corosync again.
I collected the logs from journalctl -f and attach them here.
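For reference, enabling debug there is just the standard logging switch (a minimal sketch; the rest of the file stays untouched):

logging {
  debug: on
  to_syslog: yes
}

The output was then collected with journalctl -f -u corosync.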
Some interesting logs:
Sep 17 11:12:59 vps4 corosync[24235]: [KNET ] link...
No, the NIC was not saturated or overloaded before.
Some nodes transmit up to 80 Mbps (maximum) of traffic - that is far from saturating a 1G link.
When the storm occurs we only exhaust the routing hardware and the links;
only applying the iptables script restricts the flood level and allows us to keep working and survive...
corosync is in status active (running);
the cluster is split; total votes is 1 or 5 on some nodes, not 22 as usual; some nodes have 13 votes and stay quorate;
pvecm status answers with a lag;
the pve-cluster service on the nodes is in status active / running, but the status on some nodes contains rows like this:
Sep...
Today we caught the UDP impact again.
I simply removed some old VMs; they were removed correctly without problems, and 30-40 minutes later we caught the UDP impact spontaneously.
I applied OUTPUT hashlimit rules to suppress the UDP flood from corosync on each node in the cluster:
iptables -P OUTPUT ACCEPT
iptables...
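The real script and limits are site-specific (and truncated above); purely as an illustration of the rule shape, a hashlimit rule for corosync's default UDP port 5405 would look roughly like this (the rate and burst values are placeholders, not our real numbers):

iptables -A OUTPUT -p udp --dport 5405 -m hashlimit \
    --hashlimit-above 5000/sec --hashlimit-burst 1000 \
    --hashlimit-mode dstip --hashlimit-name corosync_out -j DROP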
Is inter-VLAN routing enabled on the router for the corresponding VLANs?
Have you checked the firewall on the border / gateway, or a similar security solution?
Try installing:
- tcpdump on the Proxmox node (NOT in the VM!)
- wireshark on your workstation
Capture some traffic with tcpdump on the node that hosts the problem VM/CT.
Copy the pcap file to...
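A minimal capture sketch (the bridge name vmbr0 and the output path are assumptions, adjust them to your setup):

tcpdump -ni vmbr0 -w /tmp/problem-vm.pcap

Then open the pcap in wireshark on the workstation.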
Yes, we had a split too.
We stopped corosync on all nodes and started it again one-by-one, with a 2-second pause before starting it on the next node.
After this the cluster restored quorum correctly.
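The staggered start was essentially the loop below (node names are placeholders, and it assumes passwordless ssh from an admin host):

for node in node01 node02 node03; do
    ssh "$node" systemctl start corosync
    sleep 2
done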
But we were badly surprised by the amount of UDP flood traffic - we really had some nodes in a flood tsunami.
On the earlier version...
I also can't reproduce it - I don't know the preconditions for this. =[
The problem occurs with corosync v3.0.4;
# apt info libknet*
Package: libknet1
Version: 1.16-pve1
Priority: optional
Section: libs
Source: kronosnet
Maintainer: Proxmox Support Team <support@proxmox.com>
Installed-Size: 329...
- update everything first
- do not modify any configs by hand, neither in Proxmox nor inside the CT
- simply set the VLAN tag value in the web interface for your LXC container (a CLI equivalent is sketched below this list)
- check that your switch is configured correctly to pass tagged traffic with the corresponding VLAN ID,
e.g. on Cisco check switchport trunk...
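The CLI equivalent of setting the tag in the GUI would be roughly this (VMID 101, bridge vmbr0 and VLAN 30 are placeholders):

pct set 101 -net0 name=eth0,bridge=vmbr0,ip=dhcp,tag=30

Note that -net0 replaces the whole net0 line, so keep your existing name/bridge/ip values when you run it.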
Hello.
We have a Proxmox VE 6.2 cluster with 22 nodes, spread over an L2 network (2 physical server rooms, a 10G link between them, 1G to each node).
All VMs/CTs use only local storage; we use NFS only for backups. HA containers are not used.
We host only our own services, which are inspected and trusted; no infected VMs or...