[SOLVED] Issues Moving CEPH Traffic to Fiber Cards on Proxmox - RBD Unavailable After Switch

erathia

Member
Oct 17, 2019
Hi everyone, long-time lurker, but hey, there's a first time for everything, right?
I originally posted about this problem on the Proxmox subreddit, and someone PM'd me suggesting I post it here as well. So here's the original post, updated with everything they had me try over there. I hope some folks here can help!

I'm attempting to switch my CEPH traffic from an RJ45 bond to fiber cards using a Dell S5048F-ON switch, but I've hit a roadblock. After switching the OVS interface to the second bridge (fiber) and restarting an OSD for testing, RBD becomes inaccessible from the swapped host.

Goal: Move CEPH traffic from 4x1G RJ45 (bonded) to 2x25G Fiber connections for improved performance.

Issue: Post-switch, RBD can't be accessed, although the OSD stays UP.

Steps Taken:

  1. Down all OSDs on the host and wait for the cluster to return to HEALTH_OK.
  2. Shutdown other CEPH services (manager, monitor, metadata server).
  3. Update network configuration to move the interface hosting CEPH's public & cluster network IP.
  4. Execute ifreload -a
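For reference, the same steps as a rough shell sketch (the service instance names here are examples for px06, not a verbatim transcript):

Code:
# stop every OSD on this host, then wait for the cluster to settle
systemctl stop ceph-osd@32.service     # repeat per OSD on the host
ceph -s                                # wait until health: HEALTH_OK

# stop the remaining Ceph services on this host
systemctl stop ceph-mgr@px06.service ceph-mon@px06.service ceph-mds@px06.service

# edit /etc/network/interfaces to move int242 from vmbr0 to vmbr1
# (see the before/after configs below), then apply it
ifreload -a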
Symptoms: After making the switch, RBD becomes inaccessible. Here's the syslog when bringing up the OSD:

Code:
Mar 22 17:16:30 px06 pvestatd[2781]: status update time (5.943 seconds)
Mar 22 17:16:37 px06 pvedaemon[2827]: <root@pam> starting task UPID:px06:0000DFF9:000938B2:65FDAEE5:srvstart:osd.32:root@pam:
Mar 22 17:16:37 px06 systemd[1]: Starting ceph-osd@32.service - Ceph object storage daemon osd.32...
Mar 22 17:16:37 px06 systemd[1]: Started ceph-osd@32.service - Ceph object storage daemon osd.32.
Mar 22 17:16:38 px06 pvedaemon[2827]: <root@pam> end task UPID:px06:0000DFF9:000938B2:65FDAEE5:srvstart:osd.32:root@pam: OK
Mar 22 17:16:39 px06 pvestatd[2781]: got timeout
Mar 22 17:16:40 px06 pvestatd[2781]: status update time (5.952 seconds)
Mar 22 17:16:48 px06 ceph-osd[57345]: 2024-03-22T17:16:48.172+0100 7ff5fa5606c0 -1 osd.32 68903 log_to_monitors true
Mar 22 17:16:49 px06 pvestatd[2781]: got timeout
Mar 22 17:16:50 px06 pvestatd[2781]: status update time (5.949 seconds)
Mar 22 17:16:51 px06 pmxcfs[2270]: [status] notice: received log
Mar 22 17:16:56 px06 pmxcfs[2270]: [status] notice: received log
Mar 22 17:16:59 px06 pvestatd[2781]: got timeout
Mar 22 17:17:00 px06 pvestatd[2781]: status update time (5.939 seconds)
Mar 22 17:17:01 px06 CRON[58110]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Mar 22 17:17:01 px06 CRON[58111]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Mar 22 17:17:01 px06 CRON[58110]: pam_unix(cron:session): session closed for user root
Mar 22 17:17:09 px06 pvestatd[2781]: got timeout
Mar 22 17:17:10 px06 pvestatd[2781]: status update time (5.947 seconds)
Mar 22 17:17:18 px06 sudo[58327]:   zabbix : PWD=/ ; USER=root ; COMMAND=/usr/sbin/qm list
Mar 22 17:17:18 px06 sudo[58327]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=112)
Mar 22 17:17:19 px06 pvestatd[2781]: got timeout
Mar 22 17:17:19 px06 sudo[58327]: pam_unix(sudo:session): session closed for user root
Mar 22 17:17:19 px06 pvestatd[2781]: status update time (5.946 seconds)
Mar 22 17:17:30 px06 pvestatd[2781]: got timeout
Mar 22 17:17:30 px06 pvestatd[2781]: status update time (5.943 seconds)
Mar 22 17:17:49 px06 pvestatd[2781]: got timeout
Mar 22 17:17:50 px06 pvestatd[2781]: status update time (5.936 seconds)
Mar 22 17:17:53 px06 ceph-osd[57345]: 2024-03-22T17:17:53.957+0100 7ff5ee2b26c0 -1 osd.32 69657 set_numa_affinity unable to identify public interface '' numa node: (2) No such file or directory
Mar 22 17:17:59 px06 pvestatd[2781]: got timeout
Mar 22 17:18:00 px06 pvestatd[2781]: status update time (5.938 seconds)
Mar 22 17:18:09 px06 pvestatd[2781]: got timeout
Mar 22 17:18:10 px06 pvestatd[2781]: status update time (5.944 seconds)
Mar 22 17:18:17 px06 sudo[59204]:   zabbix : PWD=/ ; USER=root ; COMMAND=/usr/sbin/qm list
Mar 22 17:18:17 px06 sudo[59204]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=112)
Mar 22 17:18:18 px06 ceph-osd[57345]: 2024-03-22T17:18:18.783+0100 7ff5efab56c0 -1 osd.32 69668 heartbeat_check: no reply from 10.0.0.17:6846 osd.8 ever on either front or back, first ping sent 2024-03-22T17:17:58.686812+0100 (oldest deadline 2024-03-22T17:18:18.686812+0100)
Mar 22 17:18:18 px06 ceph-osd[57345]: 2024-03-22T17:18:18.783+0100 7ff5efab56c0 -1 osd.32 69668 heartbeat_check: no reply from 10.0.0.15:6836 osd.31 ever on either front or back, first ping sent 2024-03-22T17:17:58.686812+0100 (oldest deadline 2024-03-22T17:18:18.686812+0100)
Mar 22 17:18:18 px06 ceph-osd[57345]: 2024-03-22T17:18:18.783+0100 7ff5efab56c0 -1 osd.32 69668 heartbeat_check: no reply from 10.0.0.17:6806 osd.35 ever on either front or back, first ping sent 2024-03-22T17:17:58.686812+0100 (oldest deadline 2024-03-22T17:18:18.686812+0100)
Mar 22 17:18:18 px06 ceph-osd[57345]: 2024-03-22T17:18:18.783+0100 7ff5efab56c0 -1 osd.32 69668 heartbeat_check: no reply from 10.0.0.31:6804 osd.44 ever on either front or back, first ping sent 2024-03-22T17:17:58.686812+0100 (oldest deadline 2024-03-22T17:18:18.686812+0100)
Mar 22 17:18:18 px06 ceph-osd[57345]: 2024-03-22T17:18:18.783+0100 7ff5efab56c0 -1 osd.32 69668 heartbeat_check: no reply from 10.0.0.22:6820 osd.48 ever on either front or back, first ping sent 2024-03-22T17:17:58.686812+0100 (oldest deadline 2024-03-22T17:18:18.686812+0100)
Mar 22 17:18:19 px06 sudo[59204]: pam_unix(sudo:session): session closed for user root
Mar 22 17:18:19 px06 ceph-osd[57345]: 2024-03-22T17:18:19.763+0100 7ff5efab56c0 -1 osd.32 69669 heartbeat_check: no reply from 10.0.0.17:6846 osd.8 ever on either front or back, first ping sent 2024-03-22T17:17:58.686812+0100 (oldest deadline 2024-03-22T17:18:18.686812+0100)

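Two lines stand out to me: the set_numa_affinity one (the OSD can't tell which local interface carries the public network) and the heartbeat_check ones (no replies from any peer, ever). A few checks that should show what osd.32 actually bound to (a sketch, run on px06):

Code:
# addresses the cluster map has for osd.32
ceph osd find 32

# what the running daemon thinks its addresses are (via the admin socket)
ceph daemon osd.32 config show | grep -E 'public_addr|cluster_addr'

# TCP sockets the OSD process actually holds
ss -tnp | grep ceph-osd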

Hardware Configuration:

  • Dell nodes with a mix of SSDs and HDDs, WALs on SSDs for HDD OSDs.
  • 4x 1GbE RJ45 links bonded in OVS (Intel I350)
  • 2x 25GbE fiber, Broadcom BCM57414 (recent firmware update, also bonded in OVS)
  • Both networks pass traffic and VLANs are propagated correctly (see the OVS bond check below)
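To double-check that on the OVS side, the bond membership and LACP negotiation with the S5048F-ON can be inspected like this (a sketch for bond1, the fiber bond):

Code:
# member state and active load-balancing mode of the fiber bond
ovs-appctl bond/show bond1

# LACP state; both members should show up as negotiated/attached
ovs-appctl lacp/show bond1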
Network Configurations:

  • Proxmox Cluster config:
Code:
| Nodename | ID | Votes | Link 0   | Link 1       |
|----------|----|-------|-----------|---------------|
| px01     | 1  | 1     | 10.0.0.11 | 10.197.241.11 |
| px02     | 2  | 1     | 10.0.0.12 | 10.197.241.12 |
| px03     | 3  | 1     | 10.0.0.13 | 10.197.241.13 |
| px04     | 4  | 1     | 10.0.0.14 | 10.197.241.14 |
| px05     | 5  | 1     | 10.0.0.15 | 10.197.241.15 |
| px06     | 6  | 1     | 10.0.0.16 | 10.197.241.16 |
| px07     | 7  | 1     | 10.0.0.17 | 10.197.241.17 |
| px08     | 8  | 1     | 10.0.0.18 | 10.197.241.18 |
| px09     | 9  | 1     | 10.0.0.19 | 10.197.241.19 |

...etc.
  • Ceph cluster config:
Code:
[global]
     auth_client_required = cephx
     auth_cluster_required = cephx
     auth_service_required = cephx
     cluster_network = 10.0.0.11/24
     fsid = 15247e0b-e91d-49a3-85a7-0e5c62da99e6
     mon_allow_pool_delete = true
     mon_host = 10.0.0.11 10.0.0.16 10.0.0.17 10.0.0.23 10.0.0.25 10.0.0.27 10.0.0.13
     ms_bind_ipv4 = true
     ms_bind_ipv6 = false
     osd_pool_default_min_size = 2
     osd_pool_default_size = 3
     public_network = 10.0.0.11/24
  • Before the switch:
Code:
auto lo
iface lo inet loopback
 
auto ens6f0np0
iface ens6f0np0 inet manual
 
auto ens6f1np1
iface ens6f1np1 inet manual
 
auto eno4
iface eno4 inet manual
 
auto eno3
iface eno3 inet manual
 
auto eno2
iface eno2 inet manual
 
auto eno1
iface eno1 inet manual
 
auto int241
iface int241 inet static
        address 10.197.241.16/24
        gateway 10.197.241.1
        ovs_type OVSIntPort
        ovs_bridge vmbr0
        ovs_mtu 9000
        ovs_options tag=241
 
auto int242
iface int242 inet static
        address 10.0.0.16/24
        ovs_type OVSIntPort
        ovs_bridge vmbr0
        ovs_mtu 9000
        ovs_options tag=242
 
auto bond0
iface bond0 inet manual
        ovs_bonds eno1 eno2 eno3 eno4
        ovs_type OVSBond
        ovs_bridge vmbr0
        ovs_mtu 9000
        ovs_options lacp=active bond_mode=balance-tcp
 
auto bond1
iface bond1 inet manual
        ovs_bonds ens6f0np0 ens6f1np1
        ovs_type OVSBond
        ovs_bridge vmbr1
        ovs_mtu 9000
        ovs_options lacp=active bond_mode=balance-tcp
 
auto vmbr0
iface vmbr0 inet manual
        ovs_type OVSBridge
        ovs_ports bond0 int241 int242
        ovs_mtu 9000
 
auto vmbr1
iface vmbr1 inet manual
        ovs_type OVSBridge
        ovs_ports bond1
        ovs_mtu 9000
  • After the switch:
Code:
auto lo
iface lo inet loopback
 
auto ens6f0np0
iface ens6f0np0 inet manual
 
auto ens6f1np1
iface ens6f1np1 inet manual
 
auto eno4
iface eno4 inet manual
 
auto eno3
iface eno3 inet manual
 
auto eno2
iface eno2 inet manual
 
auto eno1
iface eno1 inet manual
 
auto int241
iface int241 inet static
        address 10.197.241.16/24
        gateway 10.197.241.1
        ovs_type OVSIntPort
        ovs_bridge vmbr0
        ovs_mtu 9000
        ovs_options tag=241
 
auto int242
iface int242 inet static
        address 10.0.0.16/24
        ovs_type OVSIntPort
        ovs_bridge vmbr1
        ovs_mtu 9000
        ovs_options tag=242
 
auto bond0
iface bond0 inet manual
        ovs_bonds eno1 eno2 eno3 eno4
        ovs_type OVSBond
        ovs_bridge vmbr0
        ovs_mtu 9000
        ovs_options lacp=active bond_mode=balance-tcp
 
auto bond1
iface bond1 inet manual
        ovs_bonds ens6f0np0 ens6f1np1
        ovs_type OVSBond
        ovs_bridge vmbr1
        ovs_mtu 9000
        ovs_options lacp=active bond_mode=balance-tcp
 
auto vmbr0
iface vmbr0 inet manual
        ovs_type OVSBridge
        ovs_ports bond0 int241
        ovs_mtu 9000
 
auto vmbr1
iface vmbr1 inet manual
        ovs_type OVSBridge
        ovs_ports bond1 int242
        ovs_mtu 9000

  • Both the Ceph cluster and public networks sit on int242; int241 is the frontend for the Proxmox web UI
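Since public_network and cluster_network are both 10.0.0.0/24, the daemons bind to whichever local interface holds an address in that subnet, which after the switch should be int242 hanging off vmbr1. A quick sanity check that the address really moved (a sketch):

Code:
# int242 should still hold 10.0.0.16/24 ...
ip -4 addr show int242

# ... but now be a port of vmbr1 instead of vmbr0
ovs-vsctl list-ports vmbr1    # expect: bond1 int242
ovs-vsctl list-ports vmbr0    # expect: bond0 int241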

What I've tried:

Checked communication between the swapped host and the other hosts:

  • Ping between the swapped host and the others works:
Code:
root@px06:~# ping 10.0.0.14
PING 10.0.0.14 (10.0.0.14) 56(84) bytes of data.
64 bytes from 10.0.0.14: icmp_seq=1 ttl=64 time=0.258 ms
64 bytes from 10.0.0.14: icmp_seq=2 ttl=64 time=0.188 ms
--- 10.0.0.14 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1021ms
rtt min/avg/max/mdev = 0.188/0.223/0.258/0.035 ms
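Note that a default ping carries only 56 bytes of payload, so it can succeed even when jumbo frames are dropped somewhere on the fiber path; with ovs_mtu 9000 everywhere, a full-size non-fragmenting ping is worth ruling out (a sketch):

Code:
# 8972 = 9000 - 20 (IPv4 header) - 8 (ICMP header); -M do forbids fragmentation
ping -M do -s 8972 -c 2 10.0.0.14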
  • Captured packets on the px06 int242 interface (the Ceph public and cluster interface) after the switch:
(attachment: screenshot of a packet capture on int242)

... and the packet capture shows that the hosts (10.0.0.16 and 10.0.0.25 in this example) are communicating correctly on the Ceph service ports.
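For reference, the capture was along these lines (assuming the default Ceph ports: 6789 for the monitors, 6800-7300 for the other daemons):

Code:
# capture Ceph traffic on the new interface; -n skips name resolution
tcpdump -ni int242 'port 6789 or portrange 6800-7300'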


Also checked that the Ceph daemons on the other nodes are listening on the expected 10.0.0.0/24 addresses (px15 here):

Code:
root@px15:~# netstat -tulpn | grep  6789
tcp        0      0 10.0.0.25:6789          0.0.0.0:*               LISTEN      2140379/ceph-mon   
root@px15:~# netstat -tulpn | grep  6881
tcp        0      0 10.0.0.25:6881          0.0.0.0:*               LISTEN      3288007/ceph-mds   
root@px15:~# netstat -tulpn | grep  6800
tcp        0      0 10.0.0.25:6800          0.0.0.0:*               LISTEN      2097923/ceph-osd
 
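Since ICMP works but the OSD heartbeats still time out, a plain TCP connect to the exact ports from the heartbeat_check lines above should rule out filtering or a one-way path (a sketch, from px06):

Code:
# the peers that never answered heartbeats in the syslog above
nc -zv 10.0.0.17 6846
nc -zv 10.0.0.15 6836
nc -zv 10.0.0.31 6804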
