I am in the process of upgrading a cluster from Proxmox 3 to 4.3.
I am doing the upgrade by migrating all VMs off a node, removing that node from the cluster, and doing a clean install of 4.3, partly just so it is clean and partly because I am changing how I set up the volumes for VM disk images; I am moving to lvm-thin.
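In case it matters, the node removal and the lvm-thin setup are just the standard steps, roughly like this (a sketch; the node name, pool size, and storage names are examples, adjust to your layout):

```bash
# On a remaining cluster node: drop the emptied node from the cluster
pvecm delnode oldnode

# On the freshly installed 4.3 node: create a thin pool in the pve VG
# (size and names are examples; a default 4.3 install already ships a pve/data thin pool)
lvcreate -L 200G --thinpool data pve

# Register it as an lvm-thin storage for VM disks and container rootfs
pvesm add lvmthin local-lvm-thin --vgname pve --thinpool data --content images,rootdir
```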
When I do a restore to ANY of the volumes, either the SSD lvm-thin volume or the HDD local LVM volume, I see the load on the server and on every VM spike very, very high. This does NOT occur while the actual restore from the backup is running, but after the restore hits 100%. From that point until it finally finishes is about ten minutes, and during that time the load spikes to 25 or higher and VMs go unresponsive.
It appears to me that it occurs when the restore starts the process of actually adding the VM or container to the cluster. It occurs even if all the existing VMs are on one SSD volume and I am restoring to a different SSD volume, so it is not disk contention.
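For context, the restores themselves are just the stock commands, something like this (the VMIDs, backup file names, and storage name below are examples, not my exact ones):

```bash
# Restore a VM backup onto the new lvm-thin storage
qmrestore /var/lib/vz/dump/vzdump-qemu-101-2016_11_01-02_00_01.vma.lzo 101 --storage local-lvm-thin

# Same idea for a container
pct restore 102 /var/lib/vz/dump/vzdump-lxc-102-2016_11_01-02_30_01.tar.lzo --storage local-lvm-thin
```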
I was assuming the load starts on the base server, but it seems to get worse the more running VMs there are on the node. I.e. when I restored the second VM the load spiked to 7; when I did the third it spiked to 10. Now, with 10 VMs, it is spiking to 25 on the base server and to about 10 or 15 on each VM.
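I am just reading the load from uptime/top on the host and inside each guest while the restore finalizes, roughly like this, nothing fancy:

```bash
# Watch the load average and the busiest processes every 5 seconds
watch -n 5 'uptime; ps -eo pid,comm,%cpu --sort=-%cpu | head -n 8'
```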
So now I am wondering whether something at the cluster level is driving the load way up in each VM.
After the restore finishes and everything settles down, it is all perfect: no performance issues, no errors in the logs.
Anyone seen this?