Hi, sorry for the delay, I've been crazy busy.
Watchguard Firebox M200 ?
Spec sheet says:
Firewall throughput: 3.2 Gbps
VPN throughput: 1.2 Gbps
The LB4Ms and the Dell switches have plenty of capacity.
Just to get this straight:
Right so far.
You are running 3 separate Proxmox-Clusters
Each of those "Proxmox-nodes" has 1x 10G for Ceph traffic via Dell PowerConnect switches.
Each of those "Proxmox-nodes" has 1x 1G for VM traffic via a Quanta LB4M.
Your M200's connect to your Quanta LB4Ms in order for you to get outside access to the datacenter.
You run all your public VM traffic via the Watchguard M200's?
We actually have 2 clusters of 4 nodes per C6100 (so two physical C6100 servers) running in HA mode.
Each HA node has 3x 1TB 7200 RPM drives.
Q1: How many M200's do you have? 1 per Proxmox-Cluster?
No. We have 5 racks. Generally 2 clusters (as above) per rack. Each switch (Quanta) is tied to the M200 we use for VPN traffic for our staff use. There are 5 distinct subnets, 1 per rack.
We have 2 MPLS links, 1 external connection, and another external connection... sigh. Long story short:
One datacenter is closing. We purchased the Watchguard firewalls, and literally the day after we installed the HA M300s into our primary site we were told it was closing. So I've been scrambling for months to prep for the end-of-February closure of that datacenter, but basically what we have now is not what we will have.
Current:
Internal company VPN access: M200 (also main connectivity).
MPLS x1, with HA M300s per link.
Planned:
MPLS x2, with HA M300s per link.
Main connectivity: HA M300s.
Internal company VPN access: M200.
As you can see, I'm running our external traffic out through the M200 but the production servers are still in the other DC. We are virtualizing that junk into our new primary DC, and we'll move/reconfigure the two sets of HA M300s from there to the new primary DC once we go live with the VMs. So basically besides internal traffic, we have no traffic. lol
Q2: Are the M200's connected to the Dell PowerConnects via the Quanta LB4M 10G uplink ports?
No. The PowerConnects are isolated, Ceph-traffic-only switches.
The 10G uplink ports are daisy-chained across each rack from switch to switch, i.e. LB4M to LB4M.
At the end of the line, we have 5 gigabit ports on the M200 bonded to 5 ports on the LB4M in that rack.
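Per node, by the way, the split is nothing fancy; in /etc/network/interfaces it looks roughly like this (interface names and addresses below are made up, just to show the Ceph/VM separation):
Code:
# 1G NIC -> Quanta LB4M, bridged for VM traffic
auto vmbr0
iface vmbr0 inet static
        address 10.10.1.11
        netmask 255.255.255.0
        gateway 10.10.1.1
        bridge_ports eth0
        bridge_stp off
        bridge_fd 0

# 10G NIC -> Dell PowerConnect, Ceph only (no gateway, isolated subnet)
auto eth2
iface eth2 inet static
        address 10.10.100.11
        netmask 255.255.255.0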
Q3: Are you talking 4 C6100's per cluster (16 Proxmox nodes each), or 4 Proxmox nodes per cluster (1 C6100)?
8 total nodes, with 4 nodes per server. We started with 1, and added a 2nd C6100 for redundancy.
Q4: How many OSDs do you have per "Proxmox-Cluster"? What capacities?
Each OSD is 1 TB, so an entire drive is being used. 2 per node, so 14 OSDs, with 7 monitors.
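For what it's worth, these are the standard Ceph commands I'd use to double-check those counts and see whether anything is backfilling (nothing specific to our setup):
Code:
# overall health, mon quorum, and whether PGs are recovering/backfilling
ceph -s
# OSD layout per host and up/in state
ceph osd tree
# raw vs. used capacity per pool
ceph df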
Q5: Do any of the VMs do inter-VM communication?
Yes, they are a server stack: 3 Windows VMs talking to each other (JBoss, SQL).
Q6: Do you have the chance to "isolate" a single one of those nodes and have a look at what it is doing when it comes online? E.g. inbound/outbound traffic, syslog messages, etc.
Goodness. I wish my company had hired the extra 2 VM staff I requested 5 months ago. Sadly they have not. So I probably could do so on a distinct setup here in the office, but hey we're moving offices next week and I've been scrambling to make sure the new office is ready.
I'd like to point out we also have about 10 older C6100 Proxmox servers that are running as individual Proxmox nodes.
Two of our systems in two racks are working fine; these are production machines running 1 of the 2 C6100 servers. When we add the 2nd server, things go belly up.
Q7: Do you have a monitoring system you could use to see which "VM / node / service" generates the high traffic, or even better yet where the high traffic on your M200's comes from?
There doesn't appear to be any high traffic. I looked; all of the gigabit links are running at 1% or less. lol That's why this is driving me crazy.
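If I ever do need to pin down which VM or node a spike comes from, the quick-and-dirty check from a node itself would be something like this (assuming iftop is installed; vmbr0 is just the usual default bridge name):
Code:
# cumulative RX/TX byte and error counters per interface
ip -s link show vmbr0
# live per-connection view of traffic crossing the VM bridge
iftop -i vmbr0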
Q8: Do you have a link to the Manufacturer site/model where one could look at the specs for the LB4M's ?
Sort of. These are the best places for info:
https://forums.servethehome.com/ind...-lb4m-48-port-gigabit-switch-discussion.3248/
First post is a link to the manual.
Q9: Linux native vmbrX's or Open vSwitch used on the Proxmox nodes?
Windows. I'm in a Windows shop. The only Linux servers are IT ones (Proxmox, etc.).
Q10: You do not have a single node of the Proxmox cluster configured to run any sort of backend via your M200's, right? Like, e.g., a single node in your office basement used as disaster recovery or some such thing?
Nope. We will eventually do this in our secondary datacenter, once the old (primary) one closes and we get into the new one.
Q11: You do not use the M200's to filter traffic between your clusters' Proxmox nodes, right?
Code:
Proxmox1(VPN) <-> LB4M <-> M200 <-> Lb4M <-> Proxmox2 (VPN)
They are on VLANs.
Be that Proxmox or Ceph traffic. You use the M200's solely as a perimeter firewall to guard your internal (private) datacenter network from the outside (other clients in the datacenter, the regular www), right?
Correct. No other clients in DC are sharing our internet.
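The separation is handled with VLANs on the switches and nodes rather than by the M200s. Where a tag is carried all the way down to a Proxmox node, it looks something like this in /etc/network/interfaces (VLAN ID made up; this is the generic ifupdown style, not necessarily exactly what every rack here does):
Code:
# VLAN 20 tagged on the 1G NIC, bridged for VMs in that VLAN
auto eth0.20
iface eth0.20 inet manual

auto vmbr0v20
iface vmbr0v20 inet manual
        bridge_ports eth0.20
        bridge_stp off
        bridge_fd 0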
Some thoughts I have on what it could be:
1. Could be Ceph trying to balance the Cluster via your 1G links or even worse via your M200s (suggests broken or misconfigured configs)
Some of that was happening on the initial setup, because the Ceph HA guide we bought is sadly incorrect on this point. We have built the others correctly and removed that one from service.
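The thing to verify there, roughly speaking, is that public_network and cluster_network in ceph.conf both sit on the isolated 10G subnet riding the PowerConnects, so nothing can rebalance over the 1G/VM side; the subnet below is just a stand-in:
Code:
# /etc/ceph/ceph.conf (relevant lines only; 10.10.100.0/24 stands in for the 10G Ceph subnet)
[global]
        public_network = 10.10.100.0/24
        cluster_network = 10.10.100.0/24

# quick check of what is actually configured
grep -E 'public_network|cluster_network' /etc/ceph/ceph.conf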
2. Could be Proxmox-HA trying to move VMs back to where they were supposed to be running (once you move the "rogue" node back into the cluster) via the 1G links over your M200s (suggests misconfiguration)
The M200s do not resolve DNS. I suggested we isolate the individual servers to talk directly to the LB4Ms (which I believe can resolve DNS) and isolate each rack from each other. This was always planned, but being overloaded with tasks I haven't been able to get ahead of anything and actually get a federated AD system up to help with this.
3. Could be the Proxmox-node spamming your M200s via LB4M's with some sort of mass multicast traffic (could be all sorts of things)
I think it might have been, but I really have no idea. We have more clusters up now than ever before, and the box isn't blowing up.
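If it ever resurfaces, the usual sanity check for multicast between cluster nodes (corosync typically relies on it) is omping, run on all the nodes at the same time; the hostnames below are placeholders:
Code:
# run simultaneously on every node in the cluster
omping -c 600 -i 1 -q node1 node2 node3
# 0% loss on the multicast lines means multicast is healthy between those nodes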
4. Could be a couple of VMs (for whatever reason) located on the "cluster" trying to move mass amounts of traffic via your M200's.
Agreed. I think the incorrect firmware combined with a bad scanner update to take things down. Once patched, the problem seemed to go away, but we had already blown the 2nd offending Proxmox server cluster away as a precaution.
In any case, it's obvious it's traffic coming from behind your M200's, not from outside, based on your symptom description.
I agree.
Thanks for the excellent questions; my apologies for not answering sooner.