Proxmox cluster doing memory upgrade

liemba

New Member
Feb 14, 2019
Hi,

I have a 3-node cluster, fully upgraded to 6.2, in production, and it works great.

VMs and CTs are on shared storage, and HA is working well.

I want to do a memory upgrade of the nodes.

When I shut down node1 (after moving all VMs and CTs off it), the other two nodes seem to fail or restart all services - not handy at all. I tried removing all entries from HA, in case that was messing it up.

My question is then; What is the procedure for this?

I want to shut down a node, do the memory upgrade, and start the node again without affecting the other nodes.
(Naturally, I move the live VMs and CTs to another node beforehand.)
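Assuming Proxmox VE 6.x, a maintenance sequence along these lines should work; the VM/CT IDs and the target node name below are placeholders:

```shell
# Sketch only - VM ID 100, CT ID 101 and node2 are placeholders.

# Live-migrate a VM and a container off the node being serviced:
qm migrate 100 node2 --online
pct migrate 101 node2 --restart

# Once the node is empty, power it off for the hardware work:
shutdown -h now
```

You can verify the node is really empty first with `qm list` and `pct list` on that node. (These commands need a running Proxmox cluster, so treat this as a reference sketch rather than something to paste blindly.)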

Anders
 
Hi,

I guess you ran out of memory on the other two nodes, because the restarting of services is not normal. So I suspect the OOM killer killed your services.
Another possibility is that you overloaded the network.
Is the corosync network the same as the storage network?
 
Hi,

Well, I moved all VMs and CTs to the other nodes, and they had been running nicely for hours. So node1 was totally empty and node2 and node3 were running fine - until I shut down node1. I did this three times with the same result: both node2 and node3 restarted all VMs and CTs.

Corosync and storage - yes they are on the same network.

So the network might be congested, resulting in the restarts?

I did the same with node2 - moved the VMs and CTs away and shut it down. That worked nicely.

This seems to be isolated to node1. Is there a corosync master I can actively move?
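When corosync shares a congested link, its token traffic can time out and trigger fencing. A few commands to check this on any node, using standard PVE/corosync tooling:

```shell
# Show corosync link status per node (faulty links are flagged):
corosync-cfgtool -s

# Show quorum state and cluster membership:
pvecm status

# Look for retransmit/token warnings, a typical sign of congestion:
journalctl -u corosync | grep -iE 'retransmit|token'
```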
 
So the network might be congested, resulting in the restarts?
Yes, for sure.
If your network does not work properly, HA causes more problems than it solves.
Use a dedicated network for corosync.
Is there a corosync master I can actively move?
No, there is no master in corosync.
 
Ok.

But I removed all the HA entries and still had the same result?

So I should have one network for corosync, one network for SAN/storage, and one network for VM/CT traffic?
 
But I removed all the HA entries and still had the same result?
The LRM does not stop the watchdog, so it makes no difference if corosync fails.
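In practice this means that removing the HA resources alone is not enough; to safely service a node, you stop the HA services themselves, which lets the LRM release the watchdog. A sketch using the standard PVE service names:

```shell
# Check the HA state first:
ha-manager status

# Stop the HA stack on the node before shutting it down;
# the LRM closes the watchdog once it is idle:
systemctl stop pve-ha-lrm
systemctl stop pve-ha-crm

# After the node is back up, start them again:
systemctl start pve-ha-lrm pve-ha-crm
```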

So I should have one network for corosync, one network for SAN/storage and one network for VM+CT?
Yes, for HA, definitely.
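For illustration, a hypothetical excerpt of /etc/pve/corosync.conf with corosync moved to its own subnet (the 10.10.10.0/24 addresses are placeholders; when editing this file, remember to increment config_version in the totem section):

```
nodelist {
  node {
    name: node1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.10.10.1   # address on the dedicated corosync NIC
  }
  # ... node2 and node3 with 10.10.10.2 / 10.10.10.3 ...
}
```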