Some PVE-(CEPH)-Cluster questions

kokel

Hello,

we have a fresh 3-node PVE Ceph cluster running PVE 6.4-4, and yesterday we ran some redundancy tests.

Each node has the following separated networks:
  • 1 GBit/s port -> GUI VLAN (routed) -> 10.1.0.0/27
  • 1 GBit/s port -> Corosync VLAN (only L2) -> 10.1.1.0/27
  • 10 GBit/s port
    • Ceph public VLAN (routed, fallback for the corosync link; see the corosync.conf sketch below) -> 10.1.2.0/27
    • Ceph cluster VLAN (only L2, jumbo frames) -> 10.1.3.0/27
    • Migration VLAN (only L2, jumbo frames) -> 10.1.4.0/27
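The fallback link in /etc/pve/corosync.conf would look roughly like the following; a minimal sketch, not our exact file (node name and nodeid are illustrative):

Code:
nodelist {
  node {
    # link0: dedicated corosync VLAN, link1: Ceph public VLAN as fallback
    name: node1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.1.1.1
    ring1_addr: 10.1.2.1
  }
  # node2 and node3 follow the same pattern with nodeid 2/3
  # and the matching addresses in both subnets
}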
We deactivated the 10 GBit/s link of one node to see what would happen (see the sketch after this list):
  • The PVE cluster (corosync) said everything was fine, as expected.
  • The VMs on that node kept running, but without Ceph storage no read/write operations were possible, as expected.
  • Ceph commands (ceph and pveceph) hung in a timeout forever without any response
    • we were not able to get any Ceph status from that node
  • Live migration of VMs from the node with the failed Ceph link to the other nodes hung forever ... until we activated the 10 GBit/s link again
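"Deactivating" the link can be reproduced directly on the node with something like the following; the interface name eno2 is hypothetical:

Code:
# take the 10G port down to simulate the failure (interface name is hypothetical)
ip link set eno2 down
# ... observe cluster and Ceph behaviour ...
# bring the link back up afterwards
ip link set eno2 up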
After activating the link again:
  • Ceph seems to heal itself, but CephFS does not. Our CephFS for OS ISO files did not come up again; only a umount /mnt/pve/vm_isos helped (see the sketch after the log below).
Code:
May 05 17:25:59 node2 kernel: libceph: mds0 (1)10.1.2.4:6801 socket closed (con state OPEN)
May 05 17:26:00 node2 kernel: libceph: mds0 (1)10.1.2.4:6801 connection reset
May 05 17:26:00 node2 kernel: libceph: reset on mds0
May 05 17:26:00 node2 kernel: ceph: mds0 closed our session
May 05 17:26:00 node2 kernel: ceph: mds0 reconnect start
May 05 17:26:00 node2 kernel: libceph: mds0 (1)10.1.2.4:6801 socket closed (con state NEGOTIATING)
May 05 17:26:00 node2 systemd[1]: Starting Proxmox VE replication runner...
May 05 17:26:00 node2 systemd[1]: pvesr.service: Succeeded.
May 05 17:26:00 node2 systemd[1]: Started Proxmox VE replication runner.
May 05 17:26:01 node2 pvestatd[4000]: mkdir /mnt/pve/vm_isos: File exists at /usr/share/perl5/PVE/Storage/Plugin.pm line 1175.
May 05 17:26:01 node2 kernel: ceph: mds0 rejected session
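The plain umount was all we needed, since PVE re-mounts the configured storage on its own afterwards. If the mount point had been stuck completely, forced and lazy unmounts would have been the usual escalation steps (we did not need those):

Code:
# what actually helped in our case; PVE re-mounts the storage afterwards
umount /mnt/pve/vm_isos
# escalation steps for a stuck mount point (not needed here):
umount -f /mnt/pve/vm_isos
umount -l /mnt/pve/vm_isos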

Another thing is that we stumbled over the "Server Address" shown in the GUI via Server View -> Datacenter -> Summary -> Nodes. The IP address shown there was in a different subnet for each of the three nodes. In /etc/hosts we have configured all local IP addresses for the hostname (Ansible does this for us). We assume the "Server Address" is the first IP that is resolved for the hostname at daemon start or server boot.
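If that assumption is right, the displayed address simply follows whichever hosts entry resolves first. A sketch of an /etc/hosts where the GUI VLAN address deliberately comes first and the other VLANs get distinct aliases (the domain and alias names are hypothetical):

Code:
127.0.0.1   localhost
# the first match for the node name wins, so the GUI VLAN entry goes first
10.1.0.4    node2.example.com node2
# other VLAN addresses under separate aliases instead of the plain hostname
10.1.2.4    node2-cephpub
10.1.3.4    node2-cephclu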

Our questions:
  1. Why do the ceph commands hang in a timeout forever? It should be possible to get a Ceph status to see what's wrong. At the very least, an endless timeout (we didn't test forever :-D) is not a good choice, from my point of view. We can't even monitor the node with the failed Ceph link, because our monitoring agent runs ceph commands and they hang as well (see the wrapper sketch after this list).
  2. Why isn't it possible to live migrate the VMs to a node with a healthy Ceph connection?
  3. Is there any way to configure the PVE cluster so that a node with failed Ceph links fences itself, in order to get the VMs started on the remaining nodes?
  4. Is the "Server Address" configurable in any way, so that it does not rely on /etc/hosts?
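For question 1, our current stopgap idea is to bound the monitoring call with coreutils timeout, so the agent fails fast instead of hanging; a minimal sketch (the 10-second budget and the output format are arbitrary):

Code:
#!/bin/sh
# bound a possibly-hanging ceph status call; timeout exits non-zero on expiry
if ! out=$(timeout 10 ceph -s 2>&1); then
    echo "CRITICAL: ceph -s failed or timed out: $out"
    exit 2
fi
echo "OK: $out"
exit 0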
Thanks in advance,
kokel
 
Kindly bumping this thread ... does anyone have some hints for us?
 