How to properly maintain the nodes in the cluster

sky_me

Hi,
I am looking for information about the correct shutdown procedure when a node in the cluster has a hardware failure. Can I simply click the Shutdown button for that node in the cluster, or do I need to stop some services first? Assume the node's operating system is still working normally.
Here are the version and status details of my current cluster:
root@wg-node21:~# pveversion -v
proxmox-ve: 7.4-1 (running kernel: 5.15.102-1-pve)
pve-manager: 7.4-3 (running version: 7.4-3/9002ab8a)
pve-kernel-5.15: 7.3-3
pve-kernel-5.15.102-1-pve: 5.15.102-1
ceph-fuse: 15.2.17-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.4
libproxmox-backup-qemu0: 1.3.1-1
libproxmox-rs-perl: 0.2.1
libpve-access-control: 7.4-1
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.3-3
libpve-guest-common-perl: 4.2-4
libpve-http-server-perl: 4.2-1
libpve-rs-perl: 0.7.5
libpve-storage-perl: 7.4-2
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.2-2
lxcfs: 5.0.3-pve1
novnc-pve: 1.4.0-1
proxmox-backup-client: 2.3.3-1
proxmox-backup-file-restore: 2.3.3-1
proxmox-kernel-helper: 7.4-1
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.6.3
pve-cluster: 7.3-3
pve-container: 4.4-3
pve-docs: 7.4-2
pve-edk2-firmware: 3.20221111-1
pve-firewall: 4.3-1
pve-firmware: 3.6-4
pve-ha-manager: 3.6.0
pve-i18n: 2.11-1
pve-qemu-kvm: 7.2.0-8
pve-xtermjs: 4.16.0-1
qemu-server: 7.4-2
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+3
vncterm: 1.7-1
zfsutils-linux: 2.1.9-pve1
root@wg-node21:~# pvecm status
Cluster information
-------------------
Name: WGTEST02
Config Version: 24
Transport: knet
Secure auth: on

Quorum information
------------------
Date: Wed Jul 24 16:57:24 2024
Quorum provider: corosync_votequorum
Nodes: 23
Node ID: 0x00000001
Ring ID: 1.11ad
Quorate: Yes

Votequorum information
----------------------
Expected votes: 24
Highest expected: 24
Total votes: 23
Quorum: 13
Flags: Quorate

Membership information
----------------------
Nodeid Votes Name

1721810807919.png
1721811067786.png
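For reference, the "Quorum: 13" line in the output above is consistent with a simple majority of the 24 expected votes, i.e. floor(24 / 2) + 1. Below is a minimal sketch of that arithmetic. The helper names `quorum_for` and `spare_votes` are made up for illustration, and it assumes every node carries exactly one vote.

```shell
# Majority quorum for a given number of expected votes:
# floor(expected / 2) + 1 (shell integer division floors for positive values).
quorum_for() {
    echo $(( $1 / 2 + 1 ))
}

# How many more votes could go offline before quorum is lost.
# Arguments: current total votes, quorum threshold.
spare_votes() {
    echo $(( $1 - $2 ))
}

expected=24   # "Expected votes" from pvecm status
total=23      # "Total votes" from pvecm status
quorum=$(quorum_for "$expected")
echo "quorum=$quorum spare=$(spare_votes "$total" "$quorum")"
# prints: quorum=13 spare=10
```

So with these numbers, taking one more node down for maintenance should leave the cluster comfortably quorate.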
 
Can I directly click the shutdown button for the node that needs maintenance? I have already migrated the virtual machine.
1721811695855.png
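One way to double-check that nothing is still running on the node before shutting it down is to count the running entries in `qm list`. This is only a sketch: `count_running` is a hypothetical helper, and it assumes the status is the third column of the `qm list` output with a single header line. Containers would need a similar check against `pct list`.

```shell
# Count lines whose third column is "running", skipping the header line.
# Reads "qm list"-style text on stdin (column layout is an assumption).
count_running() {
    awk 'NR > 1 && $3 == "running" { n++ } END { print n + 0 }'
}

# Only call qm when it is actually available (i.e. on a Proxmox VE node).
if command -v qm >/dev/null 2>&1; then
    echo "running VMs on this node: $(qm list | count_running)"
fi
```

If the count is 0, the VMs on that node have either been migrated away or are stopped.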
 
I have already migrated the virtual machine.
As a data point, you can tell Proxmox to automatically migrate VMs to new hosts by choosing the migrate shutdown policy in Datacenter options:

1721812679857.png

With that set, when you press the "Shutdown" button for a host, it will automatically migrate all High Availability VMs to other hosts before shutting down.
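As far as I know, that GUI option ends up as an `ha: shutdown_policy=migrate` line in `/etc/pve/datacenter.cfg`. Here is a small sketch for checking which policy is currently set from the shell. `shutdown_policy` is a hypothetical helper name, and the sketch assumes the policy sits on the `ha:` line in that form.

```shell
# Extract the shutdown_policy value from datacenter.cfg-style text on stdin.
# Prints nothing if no policy is set on the "ha:" line.
shutdown_policy() {
    sed -n 's/^ha:.*shutdown_policy=\([a-z]*\).*/\1/p'
}

# Only read the cluster-wide config when it exists (i.e. on a cluster node).
if [ -r /etc/pve/datacenter.cfg ]; then
    echo "current policy: $(shutdown_policy < /etc/pve/datacenter.cfg)"
fi
```

An empty result would mean no explicit policy is configured, so the default applies.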

For "can I just press the shutdown button?", yes you can. At least, you can if your host isn't running Ceph storage on itself.

I don't use Ceph much personally at this stage, but when I was initially trying it out it seemed to want the OSD disks on a host evacuated before shutting down that host. If you're not using Ceph though, it's probably not something to care about. :)
Thanks, bro.
I have not enabled HA yet because my resources are limited and the virtual machines are oversubscribed. I am currently using GlusterFS. Yesterday I went ahead with the maintenance and shut the node down directly. It went smoothly.