Hi All,
We've been running Proxmox at work for a while now. As we phase out the legacy equipment in our rack, we've temporarily cloned some of those machines onto our cluster (currently five nodes). The issue: when we ran a full backup yesterday, the job started on the largest temporary VM we have (a 5+ TB disk), and after about four hours our 100 GB of swap had filled up, slowing the system almost to a standstill. We stopped the backup and did what we could to bring swap usage back down, although running swapoff && swapon locked the node up completely once it started flushing swap, forcing a hard reboot.
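For reference, this is roughly what we ran to try to push swapped pages back into RAM after stopping the backup (from memory, so the exact invocation may have differed slightly):

Code:
root@vhost1:~# swapoff -a && swapon -a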
My questions are: why is this happening, can I prevent it, and should I be setting up my cluster differently? My supervisor is now worried that Proxmox won't scale well enough to handle 4+ TB VMs. Since our main file server now runs on Proxmox, we can't have these systems locking up and forcing a reboot because of a single backup.
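If it happens again, I'd like to catch which processes are actually sitting in swap before touching anything. A rough sketch of what I have in mind (not something we ran during the incident):

Code:
root@vhost1:~# grep VmSwap /proc/*/status | sort -k2 -n -r | head
root@vhost1:~# free -h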
Let me know what info/screenshots are needed and I'll happily provide them!
System Info:
Code:
root@vhost1:~# pveversion -v
proxmox-ve: 6.4-1 (running kernel: 5.4.114-1-pve)
pve-manager: 6.4-6 (running version: 6.4-6/be2fa32c)
pve-kernel-5.4: 6.4-2
pve-kernel-helper: 6.4-2
pve-kernel-5.4.114-1-pve: 5.4.114-1
pve-kernel-5.4.106-1-pve: 5.4.106-1
pve-kernel-5.4.78-2-pve: 5.4.78-2
pve-kernel-5.4.73-1-pve: 5.4.73-1
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.1.2-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: residual config
ifupdown2: 3.0.0-1+pve3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.20-pve1
libproxmox-acme-perl: 1.1.0
libproxmox-backup-qemu0: 1.0.3-1
libpve-access-control: 6.4-1
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.4-3
libpve-guest-common-perl: 3.1-5
libpve-http-server-perl: 3.2-2
libpve-storage-perl: 6.4-1
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.6-2
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.1.6-2
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.5-5
pve-cluster: 6.4-1
pve-container: 3.3-5
pve-docs: 6.4-2
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.2-3
pve-ha-manager: 3.1-1
pve-i18n: 2.3-1
pve-qemu-kvm: 5.2.0-6
pve-xtermjs: 4.7.0-3
qemu-server: 6.4-2
smartmontools: 7.2-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 0.8.5-pve1
Code:
root@vhost1:~# pvecm status
Cluster information
-------------------
Name: cluster
Config Version: 5
Transport: knet
Secure auth: on
Quorum information
------------------
Date: Tue May 25 11:09:45 2021
Quorum provider: corosync_votequorum
Nodes: 5
Node ID: 0x00000003
Ring ID: 1.a1b
Quorate: Yes
Votequorum information
----------------------
Expected votes: 5
Highest expected: 5
Total votes: 5
Quorum: 3
Flags: Quorate
Membership information
----------------------
Nodeid Votes Name
0x00000001 1 10.60.4.123
0x00000002 1 10.60.4.124
0x00000003 1 10.60.4.121 (local)
0x00000004 1 10.60.4.122
0x00000005 1 10.60.4.132