Proxmox cluster during memory upgrade

liemba

New Member
Feb 14, 2019
Hi,

I have a 3-node cluster, fully upgraded to 6.2, in production, and it works great.

VMs and CTs are on shared storage, and HA is working great.

I want to do a memory upgrade on the nodes.

When I shut down node1 (after moving all VMs and CTs off it), the other two nodes seem to fail or restart all their services. Not handy at all. I also tried removing all HA entries, in case those were causing the problem.

So my question is: what is the procedure for this?

I want to shut down a node, do the memory upgrade, and start the node again without affecting the other nodes.
(Naturally, I migrate the running VMs and CTs to another node first.)

Anders
 
Hi,

My guess is that you ran out of memory on the other two nodes, because services restarting is not normal. So I suspect the OOM killer killed your services.
Another possibility is that you are overloading the network.
Is the corosync network the same as the storage network?
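To test the out-of-memory theory before the next maintenance window, it helps to confirm that the two nodes left running can actually hold every guest. The following is a minimal Python sketch, not a Proxmox tool: the node names, RAM sizes, guest allocations and the 4 GiB host reserve are all made-up example values, and the placement is a simple largest-first heuristic.

```python
"""Rough check: can the nodes that stay up hold every guest while one node is down?

All node names, RAM sizes and guest allocations are hypothetical examples.
"""


def plan_placement(node_ram_gib, guest_ram_gib, down_node, reserve_gib=4):
    """Greedily place guests (largest first) on the node with the most free RAM.

    Returns {guest: node}, or None if some guest does not fit. This is a
    heuristic: it can return None even when some other packing would fit,
    and it ignores ballooning, KSM and any host overhead beyond reserve_gib.
    """
    # Free RAM per surviving node, after keeping reserve_gib for the host itself.
    free = {n: ram - reserve_gib for n, ram in node_ram_gib.items() if n != down_node}
    placement = {}
    for guest, mem in sorted(guest_ram_gib.items(), key=lambda kv: kv[1], reverse=True):
        target = max(free, key=free.get)  # node with the most free RAM right now
        if free[target] < mem:
            return None  # even the emptiest node cannot take this guest
        free[target] -= mem
        placement[guest] = target
    return placement


NODES = {"node1": 64, "node2": 64, "node3": 64}  # GiB of physical RAM (example values)
GUESTS = {"vm100": 24, "vm101": 16, "vm102": 32, "ct200": 8, "ct201": 4}  # GiB assigned

plan = plan_placement(NODES, GUESTS, down_node="node1")
print(plan)
```

If this returns None for the real numbers, the OOM killer explanation becomes more plausible; if every guest fits with room to spare, the network is the better lead.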
 
Hi,

Well, I moved all VMs and CTs to the other nodes, and they had been running fine for hours before. So node1 is completely empty and node2 and node3 are running fine, until I shut down node1. I did this 3 times with the same result: both node2 and node3 restart all VMs and CTs.

Corosync and storage: yes, they are on the same network.

So the network might be congested, resulting in the restarts?

I did the same with node2 (moved the VMs and CTs away and shut it down), and that worked fine.

This seems to be isolated to node1. Is there a corosync master I can actively move?
 
So the network might be congested and resulting in restarts?
Yes, for sure.
If your network does not work properly, HA causes more problems than it solves.
Use a dedicated network for corosync.
There is no corosync master i can move actively?
No, there is no master in corosync.
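Without a master, cluster membership is decided by votes: a group of nodes is quorate only if it holds a strict majority of the expected votes. The sketch below works through that arithmetic for this three-node cluster, assuming one vote per node and no QDevice (the votequorum defaults). It is illustrative only and does not talk to corosync.

```python
"""Majority-quorum arithmetic, as used by corosync's votequorum with one vote per node.

Assumes every node has exactly one vote and no QDevice is configured.
"""


def quorum_threshold(expected_votes: int) -> int:
    # A partition needs strictly more than half of all expected votes.
    return expected_votes // 2 + 1


def is_quorate(votes_in_partition: int, expected_votes: int) -> bool:
    return votes_in_partition >= quorum_threshold(expected_votes)


EXPECTED = 3  # three-node cluster, one vote each

# node1 shut down cleanly, node2 and node3 still see each other: 2 of 3 votes
healthy_pair = is_quorate(2, EXPECTED)

# node1 down AND corosync traffic between node2 and node3 lost:
# each node sits alone in its own partition with 1 of 3 votes
isolated_node = is_quorate(1, EXPECTED)

print(f"threshold={quorum_threshold(EXPECTED)} healthy_pair={healthy_pair} isolated_node={isolated_node}")
# prints: threshold=2 healthy_pair=True isolated_node=False
```

This shows why shutting down node1 alone should be harmless: node2 and node3 together keep 2 of 3 votes. If corosync traffic between node2 and node3 is also disrupted (for example by a congested shared network), each node drops to 1 of 3 votes and loses quorum, and a node whose HA watchdog is armed is then reset by that watchdog. That would be consistent with the restarts described above, though the thread does not confirm it.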
 
Ok.

But I removed all HA entries and still had the same result.

So I should have one network for corosync, one network for SAN/storage and one network for VM+CT?
 
But I removed all HA entries?
The LRM will not stop the watchdog,
so this makes no difference if corosync fails.

So I should have one network for corosync, one network for SAN/storage and one network for VM+CT?
Yes, for HA, definitely.
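A quick sanity check for that layout is to confirm that corosync, storage and guest traffic do not share a subnet. Below is a minimal Python sketch using the standard `ipaddress` module; the role names and every subnet are hypothetical placeholders, not values from this cluster.

```python
"""Flag cluster roles that share an IP subnet.

The subnets below are hypothetical; substitute the cluster's real addressing.
Separate subnets are necessary but not sufficient: two VLANs on the same
physical NIC or switch port still compete for bandwidth and latency.
"""
from ipaddress import ip_network
from itertools import combinations


def shared_networks(networks):
    """Return (role_a, role_b) pairs whose subnets overlap."""
    return [
        (a, b)
        for a, b in combinations(sorted(networks), 2)
        if networks[a].overlaps(networks[b])
    ]


# Setup like the one described in the thread: corosync and storage on one network.
current = {
    "corosync": ip_network("10.0.0.0/24"),
    "storage": ip_network("10.0.0.0/24"),
    "guests": ip_network("192.168.1.0/24"),
}

# Suggested split: one network each for corosync, SAN/storage and VM/CT traffic.
proposed = {
    "corosync": ip_network("10.10.10.0/24"),
    "storage": ip_network("10.10.20.0/24"),
    "guests": ip_network("192.168.1.0/24"),
}

print(shared_networks(current))   # [('corosync', 'storage')]
print(shared_networks(proposed))  # []
```

A clean result only rules out shared IP ranges. Since separate VLANs on one physical link still share bandwidth, a dedicated corosync network is generally taken to mean a separate physical link as well.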
 
