One Proxmox node spontaneously drops out of the cluster and rejoins

BelokonevAS

New Member
May 18, 2020
Hi!
I have a Proxmox cluster of 3 nodes (node1, node2, node3), but 2 days ago the cluster started intermittently losing node2. The node itself is up: its VMs keep running, and I can reach it via SSH and the web interface. All nodes can see it on the network. The problem prevents VM replication and VM migration.
The node2 log is attached.
From syslog (pve-ha-lrm):
Code:
Oct 28 10:20:17 node2 pve-ha-lrm[51076]: unable to write lrm status file - unable to open file '/etc/pve/nodes/node2/lrm_status.tmp.51076' - Permission denied
Oct 28 10:21:20 node2 pve-ha-lrm[51076]: unable to write lrm status file - unable to open file '/etc/pve/nodes/node2/lrm_status.tmp.51076' - Permission denied
What could it be?
 

Attachments

  • node2.log (24.5 KB)
Hi,

Please post the output of pvecm status and pveversion -v as well.
Problem node (node2):
Bash:
root@node2:~# pvecm status
Cluster information
-------------------
Name:             Cluster
Config Version:   5
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Wed Oct 28 10:37:00 2020
Quorum provider:  corosync_votequorum
Nodes:            3
Node ID:          0x00000002
Ring ID:          1.c7e3
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           2
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 172.16.100.10
0x00000002          1 172.16.100.11 (local)
0x00000004          1 172.16.100.13
node1
Bash:
root@node1:~# pvecm status
Cluster information
-------------------
Name:             Cluster
Config Version:   5
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Wed Oct 28 10:38:05 2020
Quorum provider:  corosync_votequorum
Nodes:            3
Node ID:          0x00000001
Ring ID:          1.c7e3
Quorate:          Yes


Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           2
Flags:            Quorate


Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 172.16.100.10 (local)
0x00000002          1 172.16.100.11
0x00000004          1 172.16.100.13
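
When comparing pvecm status output across nodes, the lines worth checking first are Quorate and Ring ID. In the two outputs above, both nodes report Quorate: Yes and the same Ring ID (1.c7e3), so membership was intact at the moment of capture. /etc/pve is only writable while the node is quorate, so a temporary loss of quorum on node2 would be one plausible explanation for the "Permission denied" messages. A minimal sketch for pulling those two fields out of the output; the function name quorum_summary is made up for illustration:

```bash
# Print "<Quorate> <Ring ID>" from `pvecm status` output read on stdin.
# quorum_summary is a hypothetical helper, not part of Proxmox.
quorum_summary() {
    awk -F': *' '
        $1 == "Quorate" { quorate = $2 }   # e.g. "Quorate:          Yes"
        $1 == "Ring ID" { ring = $2 }      # e.g. "Ring ID:          1.c7e3"
        END { print quorate, ring }
    '
}
```

It could be run as `pvecm status | quorum_summary` on each node. A "No", or Ring IDs that differ between nodes, would suggest the nodes disagree about cluster membership at that moment.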

Bash:
root@node2:~# pveversion -v
proxmox-ve: 6.1-2 (running kernel: 5.3.18-2-pve)
pve-manager: 6.1-7 (running version: 6.1-7/13e58d5e)
pve-kernel-helper: 6.1-6
pve-kernel-5.3: 6.1-5
pve-kernel-5.3.18-2-pve: 5.3.18-2
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.3-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.15-pve1
libpve-access-control: 6.0-6
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.0-13
libpve-guest-common-perl: 3.0-3
libpve-http-server-perl: 3.0-4
libpve-storage-perl: 6.1-5
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 3.2.1-1
lxcfs: 3.0.3-pve60
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.1-3
pve-cluster: 6.1-4
pve-container: 3.0-21
pve-docs: 6.1-6
pve-edk2-firmware: 2.20200229-1
pve-firewall: 4.0-10
pve-firmware: 3.0-6
pve-ha-manager: 3.0-8
pve-i18n: 2.0-4
pve-qemu-kvm: 4.1.1-3
pve-xtermjs: 4.3.0-1
qemu-server: 6.1-6
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.3-pve1
 
The log file indicates that Corosync does not have a stable network connection. How is the network configured for Corosync? Does it have its own physical network, or does it share one with other services?

Corosync needs the lowest possible latency, ideally just a few milliseconds. If other services run over the same cable (even in a separate VLAN), they can saturate the link and thus increase latency for everything else on it, including Corosync.
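
One rough way to get a feel for the latency between nodes is to look at the average round-trip time that ping reports. This is only a sketch: ICMP round-trip time is not the same thing as what Corosync measures internally, the helper names avg_rtt_ms and rtt_below are hypothetical, and the 5 ms default limit is an assumption standing in for "a few ms".

```bash
# Print the average RTT (ms) from the summary line of `ping` output on stdin,
# i.e. the line of the form "rtt min/avg/max/mdev = a/b/c/d ms".
avg_rtt_ms() {
    awk -F'=' '/min\/avg\/max/ { split($2, a, "/"); print a[2] }'
}

# Exit 0 if the average RTT read from stdin is below the limit in ms.
# The default of 5 ms is an assumed value, not an official threshold.
rtt_below() {
    local limit="${1:-5}"
    avg_rtt_ms | awk -v limit="$limit" '
        NF { ok = ($1 + 0 < limit + 0) }
        END { exit (ok ? 0 : 1) }
    '
}
```

For example, `ping -c 20 -q 172.16.100.11 | rtt_below 5 || echo "latency to node2 looks high"`. Running `corosync-cfgtool -s` on a node additionally shows Corosync's own view of its links.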


Also please consider upgrading to the latest version of PVE :)
 
I have not configured Corosync in any special way; however, node1 and node2 are connected by a separate optical link at 10 Gb/s. The rest of the network is a regular 1 Gb/s LAN.
Screenshots of the network configuration are attached.
 

Attachments

  • node1.host.PNG (12.3 KB)
  • node1.network.PNG (21.2 KB)
  • node2.host.PNG (12.9 KB)
  • node2.network.PNG (17.9 KB)
Hi again,

Please post output of the following command:
Bash:
cat /etc/pve/corosync.conf
 
When you said the problem was in the network, I started investigating and saw a different MAC address in the ARP table. I found a new device in the Proxmox subnet with an identical IP address. I don't know why the problem only showed up now, since the system had been working for 9 months. Thanks for the help!
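
For readers who hit the same symptom, the check described above (looking for an unexpected MAC address behind a node's IP) can be sketched roughly as follows. It reads `ip neigh show` output from stdin and compares the MAC recorded for an IP with the one you expect. The helper names macs_for_ip and ip_has_expected_mac are hypothetical, and the expected MAC has to be taken from the node's real NIC.

```bash
# Print the unique MAC address(es), lower-cased, recorded for IP $1
# in `ip neigh show` output read from stdin.
macs_for_ip() {
    awk -v ip="$1" '
        $1 == ip {
            for (i = 2; i < NF; i++)
                if ($i == "lladdr") print tolower($(i + 1))
        }
    ' | sort -u
}

# Exit 0 if the only MAC seen for IP $1 equals the expected MAC $2;
# otherwise print a warning to stderr and exit 1.
ip_has_expected_mac() {
    local ip="$1" expected seen
    expected=$(printf '%s' "$2" | tr 'A-F' 'a-f')
    seen=$(macs_for_ip "$ip")
    if [ "$seen" = "$expected" ]; then
        return 0
    fi
    printf 'possible IP conflict for %s: expected %s, saw %s\n' \
        "$ip" "$expected" "${seen:-nothing}" >&2
    return 1
}
```

From node1, this might be used as `ip neigh show | ip_has_expected_mac 172.16.100.11 <MAC of node2's NIC>`. Since a neighbour table only holds the most recently learned MAC, repeating the check over time is more telling than a single run. The iputils version of arping also offers a duplicate address detection mode (`arping -D -I <interface> <ip>`), which probes for another host answering on the same address.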
 
