Proxmox cluster doing memory upgrade

liemba

New Member
Feb 14, 2019
Hi,

I have a 3-node cluster, fully upgraded to 6.2, in production, and it works great.

VMs and CTs are on shared storage, and HA is working well.

I want to do a memory upgrade of the nodes.

When I shut down node1 (after moving all VMs and CTs off it), the other two nodes seem to fail or restart all services - not handy at all. I tried removing all entries from HA, in case that was messing it up.

My question is then; What is the procedure for this?

I want to shut down a node, do the memory upgrade, and start the node again without affecting the other nodes.
(Naturally, I move the live VMs and CTs to another node beforehand.)
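Assuming Proxmox VE 6.x, a maintenance sequence along these lines should work; the VM/CT IDs and the target node name below are placeholders:

```shell
# Sketch only - VM ID 100, CT ID 101 and node2 are placeholders.

# Live-migrate a VM and a container off the node being serviced:
qm migrate 100 node2 --online
pct migrate 101 node2 --restart

# Once the node is empty, power it off for the hardware work:
shutdown -h now
```

You can verify the node is really empty first with `qm list` and `pct list` on that node. (These commands need a running Proxmox cluster, so treat this as a reference sketch rather than something to paste blindly.)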

Anders
 
Hi,

I guess you ran out of memory on the other two nodes, because the restarting of services is not normal. So I suspect the OOM killer killed your services.
Another possibility is that you overloaded the network.
Is the corosync network the same as the storage network?
 
Hi,

Well, I moved all VMs and CTs to the other nodes, and they had been running nicely for hours. So node1 was totally empty and node2 and node3 were running fine - until I shut down node1. I did this three times with the same result: both node2 and node3 restarted all VMs and CTs.

Corosync and storage - yes they are on the same network.

So the network might be congested, resulting in the restarts?

I did the same with node2 - moved the VMs and CTs away and shut it down. That worked nicely.

This seems to be isolated to node1. Is there a corosync master I can actively move?
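When corosync shares a congested link, its token traffic can time out and trigger fencing. A few commands to check this on any node, using standard PVE/corosync tooling:

```shell
# Show corosync link status per node (faulty links are flagged):
corosync-cfgtool -s

# Show quorum state and cluster membership:
pvecm status

# Look for retransmit/token warnings, a typical sign of congestion:
journalctl -u corosync | grep -iE 'retransmit|token'
```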
 
So the network might be congested, resulting in the restarts?
Yes, for sure.
If your network does not work properly, HA causes more problems than it solves.
Use a dedicated network for corosync.
Is there a corosync master I can actively move?
No, there is no master in corosync.
 
Ok.

But I removed all the HA entries and still had the same result?

So I should have one network for corosync, one network for SAN/storage, and one network for VM/CT traffic?
 
But I removed all the HA entries and still had the same result?
The LRM does not stop the watchdog, so it makes no difference if corosync fails.
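In practice this means that removing the HA resources alone is not enough; to safely service a node, you stop the HA services themselves, which lets the LRM release the watchdog. A sketch using the standard PVE service names:

```shell
# Check the HA state first:
ha-manager status

# Stop the HA stack on the node before shutting it down;
# the LRM closes the watchdog once it is idle:
systemctl stop pve-ha-lrm
systemctl stop pve-ha-crm

# After the node is back up, start them again:
systemctl start pve-ha-lrm pve-ha-crm
```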

So I should have one network for corosync, one network for SAN/storage and one network for VM+CT?
Yes, for HA, definitely.
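For illustration, a hypothetical excerpt of /etc/pve/corosync.conf with corosync moved to its own subnet (the 10.10.10.0/24 addresses are placeholders; when editing this file, remember to increment config_version in the totem section):

```
nodelist {
  node {
    name: node1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.10.10.1   # address on the dedicated corosync NIC
  }
  # ... node2 and node3 with 10.10.10.2 / 10.10.10.3 ...
}
```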