1 node offline after changing host hardware

Magneto

I have a 3-host cluster and changed the hardware of one of the nodes. The new machine didn't want to boot from the existing Proxmox OS drive, so I decided to install Proxmox onto another drive.

Before I did this, however, I removed the 3rd node (SRV3) from the cluster.
Then I wanted to re-add the node, but now the first node (SRV1) is offline.

What I found, and what I think could be the problem, is that the SSH key from the previous host node is still in /root/.ssh/authorized_keys:

Code:
more /root/.ssh/authorized_keys
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDKWzBxavMN0F4uwrYIN49bmSNwllbBZGzPwhVuDZyk8Z1ZrTA4CNZyaAf0glYoVaRzg3Le2EGCLl2maNRVROmxPxAbxIeIN9M4twK3LghPjuc5oK7I3pl+mPvXQ+lEjSM1xgOp5AQeVbrh+WlmwkOpg0iLtlC9NjWNe9xMg3EWpV/ybi45ixvPQD2X+o8S3ZcBnYWxTvQ1fprUkRLDpqnVwW3qQqfB7IjoRuWEBdDxZjrBw6calwS8zczfbB6gm03/MDyvVdktJcnRsXs3kx0b/9N5yhwse3POQRk8bCKw1WhLXed9Kd6mN8EqKRlFtwI6hyqiKOcXhRVTyCxa3Vm9 root@SRV1
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDLueRL/xOOQkGeN2RpUBSDVbyZ4YELd/MmTBPm5u8K8RVLtkjnG1YxKkuJBGQ1w8s29752CzQSA75GHy+TGLXMvNUxRB2HtzLAY1XK6XhJ4zlvKI60EWbk3gDUXTN5fQIHclptooJaMOPSfivOhyWBKKcKJsu7Amh9HZz4DJGoMC1xOP9Ya77NgDrWuHu7AxFP3v8JDV/3OlYgjbsdmXckxg5err2CoBRXchcMfi6I67nnfLsSESjpfhI/JIAFvopSjZ5CGW/d2J9ReXB8tE3pBszdyrEC1zSeTWi3lec9Xg3qTBWMgspsJWHsqOBHZgZONGT0eGDdANO3mZh+tJa3 root@SRV2
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDJ2L87LdcVaPrFL6NVwuoq/qfnAHvvhXP8D1P5502bTdTofHuKOkaRv7widIhTmi+YobH3yCMhVBWIfbK/Jit0bKdCTCElEee64irSvm5eGo2MjpPG/SOqevKQbNJuFxvtboGr3nI8ni8rXlIvgpZb4OSMlwOLypUgzll0eIKv++x4gGbH1nyaY+OezRnM5yBK68AlUr3htf/8KPvnhGoLoWAun0MlJVCuTY6GE61YVYDtPooYdEebIvtJgPGhZdsCNsp/XwvdzvTXpUEtsW9Mf9enAtc4E5JCP54DiysOdk00R5UwjE3DQuK4xAfrojXOQywgmuIb5XzCF44JZ24N root@SRV3
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCUY4kKQMMIztU5mvUOpit3pv0zUSy9EiprkuwTz+bdr/OO3YBYaoJ+9TmockCAnPADAJoWTM6uOD/ogMLiVs6isM7mHs0B1zNwCtDA+8Bq4/klgoqZ/6M5FJwgzqOqaHatfr2Z/uQFV19Ta+trOIIBKSUiyM4vYMZEfSQtiaoCp2aiSgkjLSs0lufMA94UPhwdqWfX8609NWV9K+3o45+izd7rii8kyvhnAz9Pxu0n+9WCdVG2s59v7ZhftN8dvrJ0BtBm4jqH6pl4TxKbR3GhXcvYAiN39R+vEeMQ2DH9Ac6SB7IOg865lf/qPLr94BwoSMS8zCFFdxunpiqrXJkf root@SRV3

Note that there are two keys for SRV3.
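
For anyone who runs into the same thing later, here is a minimal sketch of cleaning the stale key out by hand, assuming you can tell which of the two SRV3 lines belongs to the old hardware (OLD_KEY_FRAGMENT below is only a placeholder for a unique piece of the obsolete key). Note that on a standard PVE install /root/.ssh/authorized_keys is a symlink into /etc/pve, which is read-only while the node has no quorum, so this only works once quorum is back:

Bash:
# keep a backup of the current file
cp /root/.ssh/authorized_keys /root/authorized_keys.bak
# drop the line containing a unique fragment of the obsolete SRV3 key
grep -v 'OLD_KEY_FRAGMENT' /root/authorized_keys.bak > /root/.ssh/authorized_keys
# afterwards let PVE merge/refresh the keys and certificates again
pvecm updatecerts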

I then tried to run "pvecm updatecerts -f" on all 3 nodes, but it doesn't run on the 1st node:

Code:
root@SRV1:~# pvecm updatecerts -f
no quorum - unable to update files
root@SRV1:~#


How do I fix this?
 
you need a minimum of two nodes (online) to do that, because of quorum [0]
That's the problem.

When I log in to SRV1, only SRV1 is online. When I log in to SRV2 and SRV3, both SRV2 and SRV3 are online, almost as if there are two clusters.



you need a minimum of two nodes (online) to do that, because of quorum [0]

Can you please post the output of the pvecm status command?

[0] https://pve.proxmox.com/pve-docs/chapter-pvecm.html#_quorum
Code:
root@SRV1:~# pvecm status
Cluster information
-------------------
Name:             WHZ
Config Version:   5
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Mon Mar  1 11:08:02 2021
Quorum provider:  corosync_votequorum
Nodes:            2
Node ID:          0x00000001
Ring ID:          1.58d2
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      2
Quorum:           2
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 192.168.11.241 (local)
0x00000002          1 192.168.10.242

Now it seems to have changed, depending on the sequence in which I powered on the servers.

So SRV3 is now in its own cluster:

Code:
root@SRV3:~# pvecm status
Cluster information
-------------------
Name:             WHZ
Config Version:   5
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Mon Mar  1 11:08:14 2021
Quorum provider:  corosync_votequorum
Nodes:            1
Node ID:          0x00000003
Ring ID:          3.5b92
Quorate:          No

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      1
Quorum:           2 Activity blocked
Flags:

Membership information
----------------------
    Nodeid      Votes Name
0x00000003          1 192.168.10.243 (local)
root@SRV3:~#


I cannot log in to either the SRV2 or SRV3 web interface since booting up the servers this morning.


[screenshots of the web UI attached]
 
OK,
can you SSH to the SRV1 node? If so, please check the syslog on the SRV1 node for any hints.
OK, I can now log in. I had to run the following commands on all 3 servers:


  • On every node do
    systemctl stop pve-cluster
    This may take a while
  • On every node do
    sudo rm -f /var/lib/pve-cluster/.pmxcfs.lockfile
  • On each node – one by one do
    systemctl start pve-cluster
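
For reference, the same three steps as one combined sketch run from a single machine, assuming root SSH between the nodes still works (node names as used in this thread):

Bash:
# stop pve-cluster and clear the stale lockfile on every node
for h in SRV1 SRV2 SRV3; do
    ssh root@$h 'systemctl stop pve-cluster && rm -f /var/lib/pve-cluster/.pmxcfs.lockfile'
done
# then start pve-cluster again, one node at a time
for h in SRV1 SRV2 SRV3; do
    ssh root@$h 'systemctl start pve-cluster'
done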



And then it's like this again:


[screenshot of the web UI attached]
 
Also, please send the output of pveversion -v.
Code:
root@SRV1:~# pveversion -v

proxmox-ve: 6.3-1 (running kernel: 5.4.98-1-pve)
pve-manager: 6.3-4 (running version: 6.3-4/0a38c56f)
pve-kernel-5.4: 6.3-5
pve-kernel-helper: 6.3-5
pve-kernel-5.4.98-1-pve: 5.4.98-1
pve-kernel-5.4.65-1-pve: 5.4.65-1
pve-kernel-5.4.34-1-pve: 5.4.34-2
ceph: 14.2.16-pve1
ceph-fuse: 14.2.16-pve1
corosync: 3.1.0-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: residual config
ifupdown2: 3.0.0-1+pve3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.20-pve1
libproxmox-acme-perl: 1.0.7
libproxmox-backup-qemu0: 1.0.3-1
libpve-access-control: 6.1-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.3-4
libpve-guest-common-perl: 3.1-5
libpve-http-server-perl: 3.1-1
libpve-storage-perl: 6.3-7
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.6-2
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.0.8-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.4-5
pve-cluster: 6.2-1
pve-container: 3.3-4
pve-docs: 6.3-1
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.2-2
pve-ha-manager: 3.1-1
pve-i18n: 2.2-2
pve-qemu-kvm: 5.2.0-2
pve-xtermjs: 4.7.0-3
qemu-server: 6.3-5
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 2.0.3-pve1

Bash:
root@SRV2:~#  pveversion -v
proxmox-ve: 6.3-1 (running kernel: 5.4.98-1-pve)
pve-manager: 6.3-4 (running version: 6.3-4/0a38c56f)
pve-kernel-5.4: 6.3-5
pve-kernel-helper: 6.3-5
pve-kernel-5.4.98-1-pve: 5.4.98-1
pve-kernel-5.4.65-1-pve: 5.4.65-1
pve-kernel-5.4.34-1-pve: 5.4.34-2
ceph: 14.2.16-pve1
ceph-fuse: 14.2.16-pve1
corosync: 3.1.0-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: residual config
ifupdown2: 3.0.0-1+pve3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.20-pve1
libproxmox-acme-perl: 1.0.7
libproxmox-backup-qemu0: 1.0.3-1
libpve-access-control: 6.1-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.3-4
libpve-guest-common-perl: 3.1-5
libpve-http-server-perl: 3.1-1
libpve-storage-perl: 6.3-7
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.6-2
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.0.8-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.4-5
pve-cluster: 6.2-1
pve-container: 3.3-4
pve-docs: 6.3-1
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.2-2
pve-ha-manager: 3.1-1
pve-i18n: 2.2-2
pve-qemu-kvm: 5.2.0-2
pve-xtermjs: 4.7.0-3
qemu-server: 6.3-5
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 2.0.3-pve1




Code:
root@SRV3:~#  pveversion -v
proxmox-ve: 6.2-1 (running kernel: 5.4.34-1-pve)
pve-manager: 6.2-4 (running version: 6.2-4/9824574a)
pve-kernel-5.4: 6.2-1
pve-kernel-helper: 6.2-1
pve-kernel-5.4.34-1-pve: 5.4.34-2
ceph: 14.2.16-pve1
ceph-fuse: 14.2.16-pve1
corosync: 3.0.3-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.15-pve1
libproxmox-acme-perl: 1.0.3
libpve-access-control: 6.1-1
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.1-2
libpve-guest-common-perl: 3.0-10
libpve-http-server-perl: 3.0-5
libpve-storage-perl: 6.1-7
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.2-1
lxcfs: 4.0.3-pve2
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.2-1
pve-cluster: 6.1-8
pve-container: 3.1-5
pve-docs: 6.2-4
pve-edk2-firmware: 2.20200229-1
pve-firewall: 4.1-2
pve-firmware: 3.1-1
pve-ha-manager: 3.0-9
pve-i18n: 2.1-2
pve-qemu-kvm: 5.0.0-2
pve-xtermjs: 4.3.0-1
qemu-server: 6.2-2
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.3-pve1
 
Check the syslog using the following command: journalctl -u pve-cluster -b

The SRV3 node uses an old version of Proxmox (and of corosync as well), so try to upgrade it to the latest version of Proxmox. After a successful upgrade, please reboot the SRV3 node.
Bash:
apt update && apt dist-upgrade
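
A quick way to compare the relevant versions across the three nodes before and after the upgrade could look like this (just a sketch, assuming root SSH works between the machines):

Bash:
# print the PVE and corosync related package versions per node
for h in SRV1 SRV2 SRV3; do
    echo "== $h =="
    ssh root@$h "pveversion -v | grep -E '^(proxmox-ve|pve-manager|corosync|libknet1):'"
done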
 
Check the syslog using the following command: journalctl -u pve-cluster -b

The SRV3 node uses an old version of Proxmox (and of corosync as well), so try to upgrade it to the latest version of Proxmox. After a successful upgrade, please reboot the SRV3 node.
Bash:
apt update && apt dist-upgrade
Thanx

I ran the update and rebooted. Now SRV3 is on its own, and SRV1 and SRV2 are in the cluster:


Bash:
root@192.168.10.241's password:
Linux SRV1 5.4.98-1-pve #1 SMP PVE 5.4.98-1 (Mon, 15 Feb 2021 16:33:27 +0100) x86_64

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Mon Mar  1 11:05:51 2021 from 192.168.10.105
root@SRV1:~# journalctl -u pve-cluster -b
-- Logs begin at Mon 2021-03-01 12:49:56 SAST, end at Mon 2021-03-01 13:01:15 SAST. --
Mar 01 12:50:00 SRV1 systemd[1]: Starting The Proxmox VE cluster filesystem...
Mar 01 12:50:00 SRV1 pmxcfs[1512]: [quorum] crit: quorum_initialize failed: 2
Mar 01 12:50:00 SRV1 pmxcfs[1512]: [quorum] crit: can't initialize service
Mar 01 12:50:00 SRV1 pmxcfs[1512]: [confdb] crit: cmap_initialize failed: 2
Mar 01 12:50:00 SRV1 pmxcfs[1512]: [confdb] crit: can't initialize service
Mar 01 12:50:00 SRV1 pmxcfs[1512]: [dcdb] crit: cpg_initialize failed: 2
Mar 01 12:50:00 SRV1 pmxcfs[1512]: [dcdb] crit: can't initialize service
Mar 01 12:50:00 SRV1 pmxcfs[1512]: [status] crit: cpg_initialize failed: 2
Mar 01 12:50:00 SRV1 pmxcfs[1512]: [status] crit: can't initialize service
Mar 01 12:50:01 SRV1 systemd[1]: Started The Proxmox VE cluster filesystem.
Mar 01 12:50:06 SRV1 pmxcfs[1512]: [status] notice: update cluster info (cluster name  WHZ, version = 5)
Mar 01 12:50:06 SRV1 pmxcfs[1512]: [dcdb] notice: members: 1/1512
Mar 01 12:50:06 SRV1 pmxcfs[1512]: [dcdb] notice: all data is up to date
Mar 01 12:50:06 SRV1 pmxcfs[1512]: [status] notice: members: 1/1512
Mar 01 12:50:06 SRV1 pmxcfs[1512]: [status] notice: all data is up to date
Mar 01 12:50:11 SRV1 pmxcfs[1512]: [dcdb] notice: members: 1/1512, 2/10811
Mar 01 12:50:11 SRV1 pmxcfs[1512]: [dcdb] notice: starting data syncronisation
Mar 01 12:50:11 SRV1 pmxcfs[1512]: [dcdb] notice: cpg_send_message retried 1 times
Mar 01 12:50:11 SRV1 pmxcfs[1512]: [status] notice: node has quorum
Mar 01 12:50:11 SRV1 pmxcfs[1512]: [status] notice: members: 1/1512, 2/10811
Mar 01 12:50:11 SRV1 pmxcfs[1512]: [status] notice: starting data syncronisation
Mar 01 12:50:11 SRV1 pmxcfs[1512]: [dcdb] notice: received sync request (epoch 1/1512/00000002)
Mar 01 12:50:11 SRV1 pmxcfs[1512]: [status] notice: received sync request (epoch 1/1512/00000002)
Mar 01 12:50:11 SRV1 pmxcfs[1512]: [dcdb] notice: received all states
Mar 01 12:50:11 SRV1 pmxcfs[1512]: [dcdb] notice: leader is 2/10811
Mar 01 12:50:11 SRV1 pmxcfs[1512]: [dcdb] notice: synced members: 2/10811
Mar 01 12:50:11 SRV1 pmxcfs[1512]: [dcdb] notice: waiting for updates from leader
Mar 01 12:50:11 SRV1 pmxcfs[1512]: [status] notice: received all states
Mar 01 12:50:11 SRV1 pmxcfs[1512]: [status] notice: all data is up to date
Mar 01 12:50:11 SRV1 pmxcfs[1512]: [dcdb] notice: update complete - trying to commit (got 6 inode updates)
Mar 01 12:50:11 SRV1 pmxcfs[1512]: [dcdb] notice: all data is up to date
Mar 01 12:50:36 SRV1 pmxcfs[1512]: [status] notice: cpg_send_message retry 10
Mar 01 12:50:37 SRV1 pmxcfs[1512]: [status] notice: cpg_send_message retry 20
Mar 01 12:50:38 SRV1 pmxcfs[1512]: [status] notice: cpg_send_message retry 30
Mar 01 12:50:39 SRV1 pmxcfs[1512]: [status] notice: cpg_send_message retry 40
Mar 01 12:50:40 SRV1 pmxcfs[1512]: [status] notice: cpg_send_message retry 50
Mar 01 12:50:41 SRV1 pmxcfs[1512]: [status] notice: cpg_send_message retry 60
Mar 01 12:50:42 SRV1 pmxcfs[1512]: [status] notice: cpg_send_message retry 70
Mar 01 12:50:43 SRV1 pmxcfs[1512]: [status] notice: cpg_send_message retry 80
Mar 01 12:50:44 SRV1 pmxcfs[1512]: [status] notice: cpg_send_message retry 90
Mar 01 12:50:45 SRV1 pmxcfs[1512]: [status] notice: cpg_send_message retry 100
Mar 01 12:50:45 SRV1 pmxcfs[1512]: [status] notice: cpg_send_message retried 100 times
Mar 01 12:50:45 SRV1 pmxcfs[1512]: [status] crit: cpg_send_message failed: 6




Bash:
root@SRV3:~# journalctl -u pve-cluster -b
-- Logs begin at Mon 2021-03-01 12:49:51 SAST, end at Mon 2021-03-01 13:00:20 SAST. --
Mar 01 12:50:21 SRV3 systemd[1]: Starting The Proxmox VE cluster filesystem...
Mar 01 12:50:22 SRV3 pmxcfs[1130]: [quorum] crit: quorum_initialize failed: 2
Mar 01 12:50:22 SRV3 pmxcfs[1130]: [quorum] crit: can't initialize service
Mar 01 12:50:22 SRV3 pmxcfs[1130]: [confdb] crit: cmap_initialize failed: 2
Mar 01 12:50:22 SRV3 pmxcfs[1130]: [confdb] crit: can't initialize service
Mar 01 12:50:22 SRV3 pmxcfs[1130]: [dcdb] crit: cpg_initialize failed: 2
Mar 01 12:50:22 SRV3 pmxcfs[1130]: [dcdb] crit: can't initialize service
Mar 01 12:50:22 SRV3 pmxcfs[1130]: [status] crit: cpg_initialize failed: 2
Mar 01 12:50:22 SRV3 pmxcfs[1130]: [status] crit: can't initialize service
Mar 01 12:50:23 SRV3 systemd[1]: Started The Proxmox VE cluster filesystem.
Mar 01 12:50:28 SRV3 pmxcfs[1130]: [status] notice: update cluster info (cluster name  WHZ, version = 5)
Mar 01 12:50:28 SRV3 pmxcfs[1130]: [dcdb] notice: members: 3/1130
Mar 01 12:50:28 SRV3 pmxcfs[1130]: [dcdb] notice: all data is up to date
Mar 01 12:50:28 SRV3 pmxcfs[1130]: [status] notice: members: 3/1130
Mar 01 12:50:28 SRV3 pmxcfs[1130]: [status] notice: all data is up to date
 
Thanks for the output :)

Can you run the commands below on all nodes and post the output, and provide your network configuration and corosync config as well?


Bash:
~ corosync-cfgtool -s
~ cat /etc/pve/corosync.conf
~ cat /etc/network/interfaces


Also, it would be helpful if you could send us the syslog for corosync:

Bash:
journalctl -u corosync.service
 
Thanks for the output :)

Can you run the commands below on all nodes and post the output, and provide your network configuration and corosync config as well?


Bash:
~ corosync-cfgtool -s
~ cat /etc/pve/corosync.conf
~ cat /etc/network/interfaces

Code:
root@SRV1:~# corosync-cfgtool -s
Local node ID 1, transport knet
LINK ID 0
        addr    = 192.168.11.241
        status:
                nodeid:   1:    localhost
                nodeid:   2:    connected
                nodeid:   3:    disconnected

root@SRV1:~# cat /etc/pve/corosync.conf
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: SRV1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 192.168.11.241
  }
  node {
    name: SRV2
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 192.168.10.242
  }
  node {
    name: SRV3
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 192.168.10.243
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: WHZ
  config_version: 5
  interface {
    linknumber: 0
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  version: 2
}



root@SRV1:~# cat /etc/network/interfaces
# network interface settings; autogenerated
# Please do NOT modify this file directly, unless you know what
# you're doing.
#
# If you want to manage parts of the network configuration manually,
# please utilize the 'source' or 'source-directory' directives to do
# so.
# PVE will preserve these directives, but will NOT read its network
# configuration from sourced files, so do not attempt to move any of
# the PVE managed interfaces into external files!

auto lo
iface lo inet loopback

iface enp6s0f0 inet manual

auto enp6s0f1
iface enp6s0f1 inet manual

auto enp6s0f2
iface enp6s0f2 inet manual

auto enp6s0f3
iface enp6s0f3 inet static
        address 192.168.11.241/24
#Storage

auto vmbr0
iface vmbr0 inet static
        address 192.168.10.241/24
        gateway 192.168.10.1
        bridge-ports enp6s0f0
        bridge-stp off
        bridge-fd 0

Code:
root@SRV2:~# corosync-cfgtool -s
Local node ID 2, transport knet
LINK ID 0
        addr    = 192.168.10.242
        status:
                nodeid:   1:    connected
                nodeid:   2:    localhost
                nodeid:   3:    connected

root@SRV2:~# cat /etc/pve/corosync.conf
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: SRV1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 192.168.11.241
  }
  node {
    name: SRV2
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 192.168.10.242
  }
  node {
    name: SRV3
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 192.168.10.243
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: WHZ
  config_version: 5
  interface {
    linknumber: 0
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  version: 2
}


root@SRV2:~#  cat /etc/network/interfaces
# network interface settings; autogenerated
# Please do NOT modify this file directly, unless you know what
# you're doing.
#
# If you want to manage parts of the network configuration manually,
# please utilize the 'source' or 'source-directory' directives to do
# so.
# PVE will preserve these directives, but will NOT read its network
# configuration from sourced files, so do not attempt to move any of
# the PVE managed interfaces into external files!

auto lo
iface lo inet loopback

auto enp0s25
iface enp0s25 inet manual

auto eno1
iface eno1 inet static
        address 192.168.11.242/24
#Storage

auto vmbr0
iface vmbr0 inet static
        address 192.168.10.242/24
        gateway 192.168.10.1
        bridge-ports enp0s25
        bridge-stp off
        bridge-fd 0

Code:
root@SRV3:~# corosync-cfgtool -s
Printing link status.
Local node ID 3
LINK ID 0
        addr    = 192.168.10.243
        status:
                nodeid  1:      link enabled:1  link connected:0
                nodeid  2:      link enabled:1  link connected:1
                nodeid  3:      link enabled:1  link connected:1


root@SRV3:~# cat /etc/pve/corosync.conf
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: SRV1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 192.168.11.241
  }
  node {
    name: SRV2
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 192.168.10.242
  }
  node {
    name: SRV3
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 192.168.10.243
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: WHZ
  config_version: 5
  interface {
    linknumber: 0
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  version: 2
}


root@SRV3:~# cat /etc/network/interfaces
# network interface settings; autogenerated
# Please do NOT modify this file directly, unless you know what
# you're doing.
#
# If you want to manage parts of the network configuration manually,
# please utilize the 'source' or 'source-directory' directives to do
# so.
# PVE will preserve these directives, but will NOT read its network
# configuration from sourced files, so do not attempt to move any of
# the PVE managed interfaces into external files!

auto lo
iface lo inet loopback

iface enp1s0 inet manual

auto enp2s0
iface enp2s0 inet static
        address 192.168.11.243/24
#Storage

auto vmbr0
iface vmbr0 inet static
        address 192.168.10.243/24
        gateway 192.168.10.1
        bridge-ports enp1s0
        bridge-stp off
        bridge-fd 0

Also, it would be helpful if you could send us the syslog for corosync:

Bash:
journalctl -u corosync.service
This is VERY LONG. I will attach a TXT file.
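
(For anyone collecting these logs: journalctl can also limit the output before you attach it, for example to the current boot or to a time window.)

Bash:
# only the current boot
journalctl -u corosync.service -b
# or a specific time window, written straight to a file for attaching
journalctl -u corosync.service --since "2021-03-01 12:45" --until "2021-03-01 13:05" > corosync.log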
 

Hi again!

Thanks for the output.

May I ask why the SRV1 node is on a different IP subnet? The issue might be on the network side, so try to put all nodes on the same IP subnet; also check here [0].

Are you running HA in the cluster?

Have you tried to reboot your nodes or restart the corosync service?


[0] https://pve.proxmox.com/pve-docs/chapter-pvecm.html#pvecm_redundancy
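
For illustration only, the nodelist from this thread could look roughly like this with the 192.168.10.x subnet as link 0 and the 192.168.11.x storage subnet as a redundant link 1; the config_version would need to be increased and the procedure in [0] followed, so treat this as a sketch, not a drop-in config:

Code:
nodelist {
  node {
    name: SRV1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 192.168.10.241
    ring1_addr: 192.168.11.241
  }
  node {
    name: SRV2
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 192.168.10.242
    ring1_addr: 192.168.11.242
  }
  node {
    name: SRV3
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 192.168.10.243
    ring1_addr: 192.168.11.243
  }
}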
 
Hi again!

Thanks for the output.

May I ask why the SRV1 node is on a different IP subnet? The issue might be on the network side, so try to put all nodes on the same IP subnet; also check here [0].

Are you running HA in the cluster?

Have you tried to reboot your nodes or restart the corosync service?


[0] https://pve.proxmox.com/pve-docs/chapter-pvecm.html#pvecm_redundancy
I wanted to move Ceph to the 2nd IP subnet (and still want to), but that failed. Both IP subnets can communicate with each other, and everything worked fine until I had to reinstall Proxmox onto another drive.


So, shortly after my last reply, I added the 2nd IP subnet (specifically 192.168.11.243) to SRV3, and now all 3 nodes can see each other in the web UI.

I was under the impression that SRV1 was on the same subnet, as per /etc/hosts:


Code:
root@SRV1:~# more /etc/hosts
127.0.0.1 localhost.localdomain localhost
192.168.10.241 SRV1.local SRV1
192.168.10.242 SRV2.local SRV2
192.168.10.243 SRV3.local SRV3


[screenshot of the web UI attached]
How do I get SRV1 back on 192.168.10.241?


And this raises the question: how do I safely move Ceph to another IP subnet, let's say 172.16.16.10/24?
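
Not an authoritative answer, but the usual route for the corosync address seems to be editing the cluster-wide corosync.conf from a node that currently has quorum, roughly like this sketch (double-check against the pvecm documentation before applying anything):

Bash:
# work on a copy inside /etc/pve, e.g. from SRV2 while it is quorate
cp /etc/pve/corosync.conf /etc/pve/corosync.conf.new
# edit /etc/pve/corosync.conf.new:
#   - change SRV1's ring0_addr from 192.168.11.241 to 192.168.10.241
#   - increase config_version (here: 5 -> 6)
mv /etc/pve/corosync.conf.new /etc/pve/corosync.conf
# then restart corosync on the nodes so they pick up the new membership address
systemctl restart corosync

Moving Ceph to another subnet is a separate change (public_network / cluster_network in the Ceph config, plus the monitors) and is probably worth its own thread.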
 
