Slow KVM backup times and HA issue during backup

ejc317

Member
Oct 18, 2012
Hi all,

We're getting there, slowly but surely.

So we have the cluster up - nodes 1-4 - and a VM (KVM) running, and during a backup it:

a) shuts the node off - so no live backups?
b) as it's backing up, HA kicks in and says "OK" that the VM has moved from node 3 to node 1 ... but in the GUI it still shows up under node 3. When I click to start the VM, it runs the HA start for VM 101 again and says OK that it has moved.
c) this 32GB hard drive is taking more than 30 minutes to back up?
d) it's only backing up the IDE drive, not the virtio drive (the VM has a 100GB virtio drive and a 30GB IDE drive) ... is there a reason, or do I have to select it separately?

Is this normal, or is there something we need to configure specifically?

Thank you very much in advance
 
a) shuts the node off - so no live backups?

What? The node or the VM?

b) as it's backing up, HA kicks in and says "OK" that the VM has moved from node 3 to node 1 ... but in the GUI it still shows up under node 3. When I click to start the VM, it runs the HA start for VM 101 again and says OK that it has moved.

What happens exactly? That requires further analysis. My guess is that your network is overloaded and that breaks cluster communication. For HA you should use a separate network for cluster traffic to avoid such things.
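A minimal sketch of what that could look like, assuming the cluster stack binds to the address the node's hostname resolves to (interface names and addresses are only placeholders, adjust for your hardware):

# /etc/network/interfaces on each node: give the cluster its own NIC/subnet
auto eth2
iface eth2 inet static
        address 10.10.2.11
        netmask 255.255.255.0

# /etc/hosts: point the node's hostname at the dedicated cluster subnet
10.10.2.11   node1.yourdomain.local node1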

c) this 32GB hard drive is taking more than 30 minutes to back up?

That depends on the speed of the storage.
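If you want to narrow it down, something along these lines usually shows whether the source disks or the backup target is the bottleneck (the paths are just examples, use your own mount points):

pveperf /var/lib/vz       # quick benchmark of the local VM storage
pveperf /mnt/backup       # same for the backup target, if it is a mount point
dd if=/dev/zero of=/mnt/backup/ddtest bs=1M count=4096 conv=fdatasync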


d) it's only backing up the IDE drive, not the virtio drive (the VM has a 100GB virtio drive and a 30GB IDE drive) ... is there a reason, or do I have to select it separately?

This is not normal. Any hint in the backup log?
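One common cause (just a guess here) is that the virtio disk was added with backups disabled, in which case vzdump skips it. You can check the VM config, e.g.:

qm config 101
# if the virtio disk line shows an option like backup=no (or backup=0),
# vzdump will skip that disk; remove the option to include it in backups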
 
1) #1 is solved - seems like it was user error
2) It says the server restarted on node 1 and the logs say it's fine with no errors, but the server never actually moved. I actually have a question: we have 2 public NICs bonded in active/failover, and 2 NICs currently unbonded that we will bond either with 802.3ad or balance-alb ... any suggestions? (a sketch of what I'm considering is below this list) The only issue I see with 802.3ad is that our 2 private NICs go to separate switches and I can't bond them on the switch side (the switches are stacked but can't trunk across the stack)

3) Storage is an SSD array with 8 x Samsung 830 SSDs behind an LSI 9266-8i with 1GB cache and a CacheVault, so the speed is there. I checked: it was around 25% IO wait and the network port was saturated
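For the record, this is roughly the balance-alb setup I have in mind for the two private NICs, since it needs no switch-side configuration (interface names and addresses are placeholders):

# /etc/network/interfaces (sketch)
auto bond1
iface bond1 inet static
        address 10.10.1.11
        netmask 255.255.255.0
        slaves eth2 eth3
        bond_miimon 100
        bond_mode balance-alb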

It seems that the cluster hostname and its related IP become the initiator address, and the target is whatever target was originally added to Proxmox. I.e. if I have 2 NICs, unless they're bonded, traffic won't go out the other private NIC. Similarly for the target, it just chooses 10.10.1.100 (for example) as the SAN IP, even though I have 15 other 1Gbps interfaces that aren't getting any traffic (I've checked the switch logs)

I will try to get multipathing to work, but it seems that the SCSI device no longer shows up (I assume the LVM group is busy)
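In case it matters, what I plan to try for the multipathing is binding a separate open-iscsi iface to each private NIC, roughly like this (interface names, portal IP and target name are placeholders):

iscsiadm -m iface -I iface_eth2 --op=new
iscsiadm -m iface -I iface_eth2 --op=update -n iface.net_ifacename -v eth2
iscsiadm -m iface -I iface_eth3 --op=new
iscsiadm -m iface -I iface_eth3 --op=update -n iface.net_ifacename -v eth3
iscsiadm -m discovery -t sendtargets -p 10.10.1.100 -I iface_eth2 -I iface_eth3
iscsiadm -m node -T iqn.2012-10.com.example:san1 -p 10.10.1.100 --login
# with two sessions up, multipath -ll should show both paths to the LUN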
 
Or conversely, how do I have Proxmox use a different NIC for iSCSI traffic than it uses for everything else?

We had the Proxmox IP on the public network, but we were afraid that transfers would take place over the public network (67.x.x.x), so we changed the Proxmox IP to one on the private network (10.10.1.x).

Ideally, we'd have a public IP used for cluster communication and a private network for iSCSI, but how do we accomplish that via the cluster config? The other reason we kept the cluster IP private is that if we ever need to renumber, it won't mess up the cluster - but do you suggest we go back to using public IPs for the cluster?
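Concretely, what I'm picturing is something like this (addresses and names are made up) - the node hostname resolves to the network I want the cluster on, and the iSCSI storage entry points at the SAN subnet. Is that the right way to do it?

# /etc/hosts on each node: cluster network
10.10.2.11   node1.yourdomain.local node1

# /etc/pve/storage.cfg: SAN portal on the iSCSI subnet
iscsi: san1
        portal 10.10.3.100
        target iqn.2012-10.com.example:san1
        content none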
 
