1 node offline after changing host hardware

Magneto

Well-Known Member
Jul 30, 2017
I have a 3-host cluster and changed the hardware of one of the nodes. The new machine didn't want to boot from the Proxmox OS drive, so I decided to install Proxmox onto another drive.

Before I did this, however, I removed the 3rd node (SRV3) from the cluster.
Then I wanted to re-add the node, but now the first node (SRV1) is offline.

What I found, and what I think could be the problem, is that the SSH key from the previous host node is still in /root/.ssh/authorized_keys:

Code:
root@SRV1:~# more /root/.ssh/authorized_keys
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDKWzBxavMN0F4uwrYIN49bmSNwllbBZGzPwhVuDZyk8Z1ZrTA4CNZyaAf0glYoVaRzg3Le2EGCLl2maNRVROmxPxAbxIeIN9M4twK3LghPjuc5oK7I3pl+mPvXQ+lEjSM1xgOp5AQeVbrh+WlmwkOpg0iLtlC9NjWNe9xMg3EWpV/ybi45ixvPQD2X+o8S3ZcBnYWxTvQ1fprUkRLDpqnVwW3qQqfB7IjoRuWEBdDxZjrBw6calwS8zczfbB6gm03/MDyvVdktJcnRsXs3kx0b/9N5yhwse3POQRk8bCKw1WhLXed9Kd6mN8EqKRlFtwI6hyqiKOcXhRVTyCxa3Vm9 root@SRV1
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDLueRL/xOOQkGeN2RpUBSDVbyZ4YELd/MmTBPm5u8K8RVLtkjnG1YxKkuJBGQ1w8s29752CzQSA75GHy+TGLXMvNUxRB2HtzLAY1XK6XhJ4zlvKI60EWbk3gDUXTN5fQIHclptooJaMOPSfivOhyWBKKcKJsu7Amh9HZz4DJGoMC1xOP9Ya77NgDrWuHu7AxFP3v8JDV/3OlYgjbsdmXckxg5err2CoBRXchcMfi6I67nnfLsSESjpfhI/JIAFvopSjZ5CGW/d2J9ReXB8tE3pBszdyrEC1zSeTWi3lec9Xg3qTBWMgspsJWHsqOBHZgZONGT0eGDdANO3mZh+tJa3 root@SRV2
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDJ2L87LdcVaPrFL6NVwuoq/qfnAHvvhXP8D1P5502bTdTofHuKOkaRv7widIhTmi+YobH3yCMhVBWIfbK/Jit0bKdCTCElEee64irSvm5eGo2MjpPG/SOqevKQbNJuFxvtboGr3nI8ni8rXlIvgpZb4OSMlwOLypUgzll0eIKv++x4gGbH1nyaY+OezRnM5yBK68AlUr3htf/8KPvnhGoLoWAun0MlJVCuTY6GE61YVYDtPooYdEebIvtJgPGhZdsCNsp/XwvdzvTXpUEtsW9Mf9enAtc4E5JCP54DiysOdk00R5UwjE3DQuK4xAfrojXOQywgmuIb5XzCF44JZ24N root@SRV3
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCUY4kKQMMIztU5mvUOpit3pv0zUSy9EiprkuwTz+bdr/OO3YBYaoJ+9TmockCAnPADAJoWTM6uOD/ogMLiVs6isM7mHs0B1zNwCtDA+8Bq4/klgoqZ/6M5FJwgzqOqaHatfr2Z/uQFV19Ta+trOIIBKSUiyM4vYMZEfSQtiaoCp2aiSgkjLSs0lufMA94UPhwdqWfX8609NWV9K+3o45+izd7rii8kyvhnAz9Pxu0n+9WCdVG2s59v7ZhftN8dvrJ0BtBm4jqH6pl4TxKbR3GhXcvYAiN39R+vEeMQ2DH9Ac6SB7IOg865lf/qPLr94BwoSMS8zCFFdxunpiqrXJkf root@SRV3

Note the 2 keys for SRV3 - one of them must be from the old installation.
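
I assume the cleanup would be roughly something like this (just my guess, not applied yet - the old key would first have to be identified by comparing fingerprints):

Bash:
# hypothetical cleanup sketch
# on the reinstalled SRV3: show the fingerprint of its current root key
ssh-keygen -lf /root/.ssh/id_rsa.pub

# on a quorate node: list the fingerprints of the keys in the cluster-wide file
# (/root/.ssh/authorized_keys is normally a symlink to this file on PVE)
ssh-keygen -lf /etc/pve/priv/authorized_keys

# remove the root@SRV3 line whose fingerprint does NOT match, e.g.
nano /etc/pve/priv/authorized_keys

# then redistribute the keys/certs
pvecm updatecerts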

I then tried to run "pvecm updatecerts -f" on all 3 nodes, but it doesn't run on the 1st node:

Code:
root@SRV1:~# pvecm updatecerts -f
no quorum - unable to update files
root@SRV1:~#


How do I fix this?
 
You need a minimum of two nodes (online) to do that, because of quorum [0]
That's the problem.

When I log in to SRV1, only SRV1 is online. When I log in to SRV2 and SRV3, both SRV2 and SRV3 are online - almost as if there are 2 clusters.



You need a minimum of two nodes (online) to do that, because of quorum [0]

Can you please post the output of the pvecm status command?

[0] https://pve.proxmox.com/pve-docs/chapter-pvecm.html#_quorum
Code:
root@SRV1:~# pvecm status
Cluster information
-------------------
Name:             WHZ
Config Version:   5
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Mon Mar  1 11:08:02 2021
Quorum provider:  corosync_votequorum
Nodes:            2
Node ID:          0x00000001
Ring ID:          1.58d2
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      2
Quorum:           2
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 192.168.11.241 (local)
0x00000002          1 192.168.10.242

Now it seems to have changed, depending on the sequence in which I powered on the servers.

So, SRV3 is now in its own cluster.

Code:
root@SRV3:~# pvecm status
Cluster information
-------------------
Name:             WHZ
Config Version:   5
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Mon Mar  1 11:08:14 2021
Quorum provider:  corosync_votequorum
Nodes:            1
Node ID:          0x00000003
Ring ID:          3.5b92
Quorate:          No

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      1
Quorum:           2 Activity blocked
Flags:

Membership information
----------------------
    Nodeid      Votes Name
0x00000003          1 192.168.10.243 (local)
root@SRV3:~#


I cannot log in to either the SRV2 or SRV3 web interface since booting up the servers this morning.


(screenshots attached)
 
Can you SSH to the SRV1 node? If so, please check the syslog on the SRV1 node for any hints.
 
Can you SSH to the SRV1 node? If so, please check the syslog on the SRV1 node for any hints.
OK, I can now log in. I had to run the following commands on all 3 servers:


  • On every node:
    systemctl stop pve-cluster
    (this may take a while)
  • On every node:
    sudo rm -f /var/lib/pve-cluster/.pmxcfs.lockfile
  • Then, on each node, one by one:
    systemctl start pve-cluster
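
I only checked the web UI afterwards, but I assume the usual sanity checks at this point would be something like:

Bash:
# hypothetical follow-up checks (not something I ran, just what I'd expect)
systemctl status pve-cluster corosync
pvecm status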



And then it's like this again:


(screenshot attached)
 
Also, please send the output of pveversion -v:
Code:
root@SRV1:~# pveversion -v

proxmox-ve: 6.3-1 (running kernel: 5.4.98-1-pve)
pve-manager: 6.3-4 (running version: 6.3-4/0a38c56f)
pve-kernel-5.4: 6.3-5
pve-kernel-helper: 6.3-5
pve-kernel-5.4.98-1-pve: 5.4.98-1
pve-kernel-5.4.65-1-pve: 5.4.65-1
pve-kernel-5.4.34-1-pve: 5.4.34-2
ceph: 14.2.16-pve1
ceph-fuse: 14.2.16-pve1
corosync: 3.1.0-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: residual config
ifupdown2: 3.0.0-1+pve3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.20-pve1
libproxmox-acme-perl: 1.0.7
libproxmox-backup-qemu0: 1.0.3-1
libpve-access-control: 6.1-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.3-4
libpve-guest-common-perl: 3.1-5
libpve-http-server-perl: 3.1-1
libpve-storage-perl: 6.3-7
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.6-2
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.0.8-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.4-5
pve-cluster: 6.2-1
pve-container: 3.3-4
pve-docs: 6.3-1
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.2-2
pve-ha-manager: 3.1-1
pve-i18n: 2.2-2
pve-qemu-kvm: 5.2.0-2
pve-xtermjs: 4.7.0-3
qemu-server: 6.3-5
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 2.0.3-pve1

Bash:
root@SRV2:~#  pveversion -v
proxmox-ve: 6.3-1 (running kernel: 5.4.98-1-pve)
pve-manager: 6.3-4 (running version: 6.3-4/0a38c56f)
pve-kernel-5.4: 6.3-5
pve-kernel-helper: 6.3-5
pve-kernel-5.4.98-1-pve: 5.4.98-1
pve-kernel-5.4.65-1-pve: 5.4.65-1
pve-kernel-5.4.34-1-pve: 5.4.34-2
ceph: 14.2.16-pve1
ceph-fuse: 14.2.16-pve1
corosync: 3.1.0-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: residual config
ifupdown2: 3.0.0-1+pve3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.20-pve1
libproxmox-acme-perl: 1.0.7
libproxmox-backup-qemu0: 1.0.3-1
libpve-access-control: 6.1-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.3-4
libpve-guest-common-perl: 3.1-5
libpve-http-server-perl: 3.1-1
libpve-storage-perl: 6.3-7
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.6-2
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.0.8-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.4-5
pve-cluster: 6.2-1
pve-container: 3.3-4
pve-docs: 6.3-1
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.2-2
pve-ha-manager: 3.1-1
pve-i18n: 2.2-2
pve-qemu-kvm: 5.2.0-2
pve-xtermjs: 4.7.0-3
qemu-server: 6.3-5
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 2.0.3-pve1




Code:
root@SRV3:~#  pveversion -v
proxmox-ve: 6.2-1 (running kernel: 5.4.34-1-pve)
pve-manager: 6.2-4 (running version: 6.2-4/9824574a)
pve-kernel-5.4: 6.2-1
pve-kernel-helper: 6.2-1
pve-kernel-5.4.34-1-pve: 5.4.34-2
ceph: 14.2.16-pve1
ceph-fuse: 14.2.16-pve1
corosync: 3.0.3-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.15-pve1
libproxmox-acme-perl: 1.0.3
libpve-access-control: 6.1-1
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.1-2
libpve-guest-common-perl: 3.0-10
libpve-http-server-perl: 3.0-5
libpve-storage-perl: 6.1-7
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.2-1
lxcfs: 4.0.3-pve2
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.2-1
pve-cluster: 6.1-8
pve-container: 3.1-5
pve-docs: 6.2-4
pve-edk2-firmware: 2.20200229-1
pve-firewall: 4.1-2
pve-firmware: 3.1-1
pve-ha-manager: 3.0-9
pve-i18n: 2.1-2
pve-qemu-kvm: 5.0.0-2
pve-xtermjs: 4.3.0-1
qemu-server: 6.2-2
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.3-pve1
 
Check the syslog using the following command: journalctl -u pve-cluster -b

The SRV3 node uses an older version of Proxmox (and Corosync as well). Try to upgrade it to the latest version of Proxmox, and after a successful upgrade please reboot the SRV3 node.
Bash:
apt update && apt dist-upgrade
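
After the upgrade it is also worth confirming that the Corosync and pve-cluster versions match the other nodes, for example:

Bash:
# example check, compare the output across all three nodes
pveversion -v | grep -E 'proxmox-ve|corosync|pve-cluster'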
 
Check the syslog using the following command: journalctl -u pve-cluster -b

The SRV3 node uses an older version of Proxmox (and Corosync as well). Try to upgrade it to the latest version of Proxmox, and after a successful upgrade please reboot the SRV3 node.
Bash:
apt update && apt dist-upgrade
Thanx

I ran the update and rebooted. Now SRV3 is on its own, and SRV1 and SRV2 are in the cluster:


Bash:
root@192.168.10.241's password:
Linux SRV1 5.4.98-1-pve #1 SMP PVE 5.4.98-1 (Mon, 15 Feb 2021 16:33:27 +0100) x86_64

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Mon Mar  1 11:05:51 2021 from 192.168.10.105
root@SRV1:~# journalctl -u pve-cluster -b
-- Logs begin at Mon 2021-03-01 12:49:56 SAST, end at Mon 2021-03-01 13:01:15 SAST. --
Mar 01 12:50:00 SRV1 systemd[1]: Starting The Proxmox VE cluster filesystem...
Mar 01 12:50:00 SRV1 pmxcfs[1512]: [quorum] crit: quorum_initialize failed: 2
Mar 01 12:50:00 SRV1 pmxcfs[1512]: [quorum] crit: can't initialize service
Mar 01 12:50:00 SRV1 pmxcfs[1512]: [confdb] crit: cmap_initialize failed: 2
Mar 01 12:50:00 SRV1 pmxcfs[1512]: [confdb] crit: can't initialize service
Mar 01 12:50:00 SRV1 pmxcfs[1512]: [dcdb] crit: cpg_initialize failed: 2
Mar 01 12:50:00 SRV1 pmxcfs[1512]: [dcdb] crit: can't initialize service
Mar 01 12:50:00 SRV1 pmxcfs[1512]: [status] crit: cpg_initialize failed: 2
Mar 01 12:50:00 SRV1 pmxcfs[1512]: [status] crit: can't initialize service
Mar 01 12:50:01 SRV1 systemd[1]: Started The Proxmox VE cluster filesystem.
Mar 01 12:50:06 SRV1 pmxcfs[1512]: [status] notice: update cluster info (cluster name  WHZ, version = 5)
Mar 01 12:50:06 SRV1 pmxcfs[1512]: [dcdb] notice: members: 1/1512
Mar 01 12:50:06 SRV1 pmxcfs[1512]: [dcdb] notice: all data is up to date
Mar 01 12:50:06 SRV1 pmxcfs[1512]: [status] notice: members: 1/1512
Mar 01 12:50:06 SRV1 pmxcfs[1512]: [status] notice: all data is up to date
Mar 01 12:50:11 SRV1 pmxcfs[1512]: [dcdb] notice: members: 1/1512, 2/10811
Mar 01 12:50:11 SRV1 pmxcfs[1512]: [dcdb] notice: starting data syncronisation
Mar 01 12:50:11 SRV1 pmxcfs[1512]: [dcdb] notice: cpg_send_message retried 1 times
Mar 01 12:50:11 SRV1 pmxcfs[1512]: [status] notice: node has quorum
Mar 01 12:50:11 SRV1 pmxcfs[1512]: [status] notice: members: 1/1512, 2/10811
Mar 01 12:50:11 SRV1 pmxcfs[1512]: [status] notice: starting data syncronisation
Mar 01 12:50:11 SRV1 pmxcfs[1512]: [dcdb] notice: received sync request (epoch 1/1512/00000002)
Mar 01 12:50:11 SRV1 pmxcfs[1512]: [status] notice: received sync request (epoch 1/1512/00000002)
Mar 01 12:50:11 SRV1 pmxcfs[1512]: [dcdb] notice: received all states
Mar 01 12:50:11 SRV1 pmxcfs[1512]: [dcdb] notice: leader is 2/10811
Mar 01 12:50:11 SRV1 pmxcfs[1512]: [dcdb] notice: synced members: 2/10811
Mar 01 12:50:11 SRV1 pmxcfs[1512]: [dcdb] notice: waiting for updates from leader
Mar 01 12:50:11 SRV1 pmxcfs[1512]: [status] notice: received all states
Mar 01 12:50:11 SRV1 pmxcfs[1512]: [status] notice: all data is up to date
Mar 01 12:50:11 SRV1 pmxcfs[1512]: [dcdb] notice: update complete - trying to commit (got 6 inode updates)
Mar 01 12:50:11 SRV1 pmxcfs[1512]: [dcdb] notice: all data is up to date
Mar 01 12:50:36 SRV1 pmxcfs[1512]: [status] notice: cpg_send_message retry 10
Mar 01 12:50:37 SRV1 pmxcfs[1512]: [status] notice: cpg_send_message retry 20
Mar 01 12:50:38 SRV1 pmxcfs[1512]: [status] notice: cpg_send_message retry 30
Mar 01 12:50:39 SRV1 pmxcfs[1512]: [status] notice: cpg_send_message retry 40
Mar 01 12:50:40 SRV1 pmxcfs[1512]: [status] notice: cpg_send_message retry 50
Mar 01 12:50:41 SRV1 pmxcfs[1512]: [status] notice: cpg_send_message retry 60
Mar 01 12:50:42 SRV1 pmxcfs[1512]: [status] notice: cpg_send_message retry 70
Mar 01 12:50:43 SRV1 pmxcfs[1512]: [status] notice: cpg_send_message retry 80
Mar 01 12:50:44 SRV1 pmxcfs[1512]: [status] notice: cpg_send_message retry 90
Mar 01 12:50:45 SRV1 pmxcfs[1512]: [status] notice: cpg_send_message retry 100
Mar 01 12:50:45 SRV1 pmxcfs[1512]: [status] notice: cpg_send_message retried 100 times
Mar 01 12:50:45 SRV1 pmxcfs[1512]: [status] crit: cpg_send_message failed: 6




Bash:
root@SRV3:~# journalctl -u pve-cluster -b
-- Logs begin at Mon 2021-03-01 12:49:51 SAST, end at Mon 2021-03-01 13:00:20 SAST. --
Mar 01 12:50:21 SRV3 systemd[1]: Starting The Proxmox VE cluster filesystem...
Mar 01 12:50:22 SRV3 pmxcfs[1130]: [quorum] crit: quorum_initialize failed: 2
Mar 01 12:50:22 SRV3 pmxcfs[1130]: [quorum] crit: can't initialize service
Mar 01 12:50:22 SRV3 pmxcfs[1130]: [confdb] crit: cmap_initialize failed: 2
Mar 01 12:50:22 SRV3 pmxcfs[1130]: [confdb] crit: can't initialize service
Mar 01 12:50:22 SRV3 pmxcfs[1130]: [dcdb] crit: cpg_initialize failed: 2
Mar 01 12:50:22 SRV3 pmxcfs[1130]: [dcdb] crit: can't initialize service
Mar 01 12:50:22 SRV3 pmxcfs[1130]: [status] crit: cpg_initialize failed: 2
Mar 01 12:50:22 SRV3 pmxcfs[1130]: [status] crit: can't initialize service
Mar 01 12:50:23 SRV3 systemd[1]: Started The Proxmox VE cluster filesystem.
Mar 01 12:50:28 SRV3 pmxcfs[1130]: [status] notice: update cluster info (cluster name  WHZ, version = 5)
Mar 01 12:50:28 SRV3 pmxcfs[1130]: [dcdb] notice: members: 3/1130
Mar 01 12:50:28 SRV3 pmxcfs[1130]: [dcdb] notice: all data is up to date
Mar 01 12:50:28 SRV3 pmxcfs[1130]: [status] notice: members: 3/1130
Mar 01 12:50:28 SRV3 pmxcfs[1130]: [status] notice: all data is up to date
 
thanks for the output :)

Can you run the commands below on all nodes and post the output, and provide us your network configuration and Corosync config as well?


Bash:
~ corosync-cfgtool -s
~ cat /etc/pve/corosync.conf
~ cat /etc/network/interfaces


Also, it would be helpful if you send us the syslog for Corosync:

Bash:
journalctl -u corosync.service
 
thanks for the output :)

Can you run the commands below on all nodes and post the output, and provide us your network configuration and Corosync config as well?


Bash:
~ corosync-cfgtool -s
~ cat /etc/pve/corosync.conf
~ cat /etc/network/interfaces

Code:
root@SRV1:~# corosync-cfgtool -s
Local node ID 1, transport knet
LINK ID 0
        addr    = 192.168.11.241
        status:
                nodeid:   1:    localhost
                nodeid:   2:    connected
                nodeid:   3:    disconnected

root@SRV1:~# cat /etc/pve/corosync.conf
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: SRV1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 192.168.11.241
  }
  node {
    name: SRV2
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 192.168.10.242
  }
  node {
    name: SRV3
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 192.168.10.243
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: WHZ
  config_version: 5
  interface {
    linknumber: 0
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  version: 2
}



root@SRV1:~# cat /etc/network/interfaces
# network interface settings; autogenerated
# Please do NOT modify this file directly, unless you know what
# you're doing.
#
# If you want to manage parts of the network configuration manually,
# please utilize the 'source' or 'source-directory' directives to do
# so.
# PVE will preserve these directives, but will NOT read its network
# configuration from sourced files, so do not attempt to move any of
# the PVE managed interfaces into external files!

auto lo
iface lo inet loopback

iface enp6s0f0 inet manual

auto enp6s0f1
iface enp6s0f1 inet manual

auto enp6s0f2
iface enp6s0f2 inet manual

auto enp6s0f3
iface enp6s0f3 inet static
        address 192.168.11.241/24
#Storage

auto vmbr0
iface vmbr0 inet static
        address 192.168.10.241/24
        gateway 192.168.10.1
        bridge-ports enp6s0f0
        bridge-stp off
        bridge-fd 0

Code:
root@SRV2:~# corosync-cfgtool -s
Local node ID 2, transport knet
LINK ID 0
        addr    = 192.168.10.242
        status:
                nodeid:   1:    connected
                nodeid:   2:    localhost
                nodeid:   3:    connected

root@SRV2:~# cat /etc/pve/corosync.conf
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: SRV1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 192.168.11.241
  }
  node {
    name: SRV2
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 192.168.10.242
  }
  node {
    name: SRV3
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 192.168.10.243
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: WHZ
  config_version: 5
  interface {
    linknumber: 0
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  version: 2
}


root@SRV2:~#  cat /etc/network/interfaces
# network interface settings; autogenerated
# Please do NOT modify this file directly, unless you know what
# you're doing.
#
# If you want to manage parts of the network configuration manually,
# please utilize the 'source' or 'source-directory' directives to do
# so.
# PVE will preserve these directives, but will NOT read its network
# configuration from sourced files, so do not attempt to move any of
# the PVE managed interfaces into external files!

auto lo
iface lo inet loopback

auto enp0s25
iface enp0s25 inet manual

auto eno1
iface eno1 inet static
        address 192.168.11.242/24
#Storage

auto vmbr0
iface vmbr0 inet static
        address 192.168.10.242/24
        gateway 192.168.10.1
        bridge-ports enp0s25
        bridge-stp off
        bridge-fd 0

Code:
root@SRV3:~# corosync-cfgtool -s
Printing link status.
Local node ID 3
LINK ID 0
        addr    = 192.168.10.243
        status:
                nodeid  1:      link enabled:1  link connected:0
                nodeid  2:      link enabled:1  link connected:1
                nodeid  3:      link enabled:1  link connected:1


root@SRV3:~# cat /etc/pve/corosync.conf
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: SRV1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 192.168.11.241
  }
  node {
    name: SRV2
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 192.168.10.242
  }
  node {
    name: SRV3
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 192.168.10.243
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: WHZ
  config_version: 5
  interface {
    linknumber: 0
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  version: 2
}


root@SRV3:~# cat /etc/network/interfaces
# network interface settings; autogenerated
# Please do NOT modify this file directly, unless you know what
# you're doing.
#
# If you want to manage parts of the network configuration manually,
# please utilize the 'source' or 'source-directory' directives to do
# so.
# PVE will preserve these directives, but will NOT read its network
# configuration from sourced files, so do not attempt to move any of
# the PVE managed interfaces into external files!

auto lo
iface lo inet loopback

iface enp1s0 inet manual

auto enp2s0
iface enp2s0 inet static
        address 192.168.11.243/24
#Storage

auto vmbr0
iface vmbr0 inet static
        address 192.168.10.243/24
        gateway 192.168.10.1
        bridge-ports enp1s0
        bridge-stp off
        bridge-fd 0

Also, it would be helpful if you send us the syslog for Corosync:

Bash:
journalctl -u corosync.service
This is VERY LONG, so I will attach it as a TXT file.
 

Attachments

  • corosync.service.txt (652.7 KB)
Hi again!

Thanks for the output.

May I ask why the SRV1 node is on a different IP subnet? The issue might be on the network side, so try to put all nodes on the same subnet; also check here [0].

Are you running HA in the cluster?

Have you tried rebooting your nodes or restarting the corosync service?


[0] https://pve.proxmox.com/pve-docs/chapter-pvecm.html#pvecm_redundancy
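
For reference, the redundancy setup from [0] essentially means giving every node a second ring address and adding a second link to /etc/pve/corosync.conf while increasing config_version - a rough, hypothetical sketch, not tailored to your cluster:

Code:
  node {
    name: SRV2
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 192.168.10.242
    ring1_addr: 192.168.11.242
  }
...
totem {
  cluster_name: WHZ
  config_version: 6
  interface {
    linknumber: 0
  }
  interface {
    linknumber: 1
  }
  ...
}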
 
Hi again!

Thanks for the output.

May I ask why the SRV1 node is on a different IP subnet? The issue might be on the network side, so try to put all nodes on the same subnet; also check here [0].

Are you running HA in the cluster?

Have you tried rebooting your nodes or restarting the corosync service?


[0] https://pve.proxmox.com/pve-docs/chapter-pvecm.html#pvecm_redundancy
I want / wanted to move CEPH to the 2nd IP subnet, but that failed. Both IP subnets can communicate, and everything worked fine until I had to reinstall Proxmox onto another drive.


So, shortly after my last reply, I added the 2nd IP subnet (rather, 192.168.11.243) to SRV3 and now all 3 nodes can see each other in the web UI.

I was under the impression that SRV1 was on the same subnet, as per /etc/hosts:


Code:
root@SRV1:~# more /etc/hosts
127.0.0.1 localhost.localdomain localhost
192.168.10.241 SRV1.local SRV1
192.168.10.242 SRV2.local SRV2
192.168.10.243 SRV3.local SRV3


(screenshot attached)
How do I get SRV1 back on 192.168.10.241?
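
I'm guessing this means editing /etc/pve/corosync.conf so that SRV1's ring0_addr points at 192.168.10.241 and increasing config_version, roughly like the snippet below, but I'd like to confirm before touching it:

Code:
  node {
    name: SRV1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 192.168.10.241
  }
...
totem {
  ...
  config_version: 6
}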


And this raises the question, how do I safely move CEPH to another IP subnet - let's say 172.16.16.10/24 for that matter?
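
For the Ceph part, I assume it at least involves the network settings in /etc/pve/ceph.conf (taking 172.16.16.0/24 as the subnet) and probably re-creating the monitors on the new addresses, e.g. something like:

Code:
# /etc/pve/ceph.conf (sketch only)
[global]
  cluster_network = 172.16.16.0/24
  public_network = 172.16.16.0/24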
 
