[SOLVED] Proxmox live migration problems with local storage

Ahmet Bas

Well-Known Member
Aug 3, 2018
Hi,

We are experiencing an issue with Proxmox live migration with local storage.

We have the following Proxmox setup:

1x Proxmox 6.3-4
Hardware
Dell Poweredge R430
2x E5-2640 V3 @2.6
128 GB RAM
2x 1TB SATA in ZFS RAID1
5x 2TB SSD PM883 in RAIDZ

1x Proxmox 6.3-4
Hardware
Dell PowerEdge R730xd
2x E5-2620 V4 @2.1
128 GB RAM
2x 1TB SATA in ZFS RAID1
5x 2TB SSD PM883 in RAIDZ


We are using local storage with ZFS, but when we migrate VMs online (through the Proxmox interface) from HV1 to HV2 or from HV2 to HV1, it causes problems. The VMs on the target node become unavailable and we see an increase in CPU load and an IO delay of 20-25% in the Proxmox web UI. The VMs use Westmere as CPU type, the hard disks are set to writeback (unsafe) cache with discard and SSD emulation enabled, and the controller is VirtIO SCSI.

We tried the following options, but without success:
- Limiting the migration bandwidth (see the example after this list)
- Limiting the CPU limit of the VMs
- Limiting the read/write bandwidth of the VMs
- Using different CPU types for the VMs: Westmere, kvm64, qemu64
What could be the cause of this problem?
 
So the migration itself works, but afterwards the migrated VM and all other VMs on the node become unresponsive and CPU load and IO delay increase?

Please provide your storage config (/etc/pve/storage.cfg), the VM config, the task log of a migration which leads to this problem and if possible the syslog since slightly before starting the migration.
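
For reference, these can usually be collected like this (VMID 2708 is the one from this thread; the journalctl time window is just an example):

Code:
cat /etc/pve/storage.cfg
qm config 2708
# the migration task log can also be copied from the task viewer in the web UI
journalctl --since "2021-02-25 13:40:00" > migration-syslog.txt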
 
>So the migration itself works, but afterwards the migrated VM and all other VMs on the node become unresponsive and CPU load and IO delay increase?
>
>Please provide your storage config (/etc/pve/storage.cfg), the VM config, the task log of a migration which leads to this problem and if possible the syslog since slightly before starting the migration.
Yes, once the migration is started the CPU load goes from 8 to 160+ on the target node. All other VMs on the target node become unresponsive due to the high CPU load. Below are the requested logs. I had to cancel the task to prevent unresponsive VMs.

>Please provide your storage config (/etc/pve/storage.cfg)

Code:
zfspool: pve1
    pool pve1
    content rootdir,images
    mountpoint /pve1
    nodes pve1
    sparse 1

zfspool: pve2
    pool pve2
    content rootdir,images
    mountpoint /pve2
    nodes pve2
    sparse 1

>the VM config,


Code:
agent: 1
bootdisk: scsi0
cores: 4
cpu: Westmere
ide0: none,media=cdrom
memory: 8192
name: rdp
net0: virtio=2e:e0:8d:82:56:3d,bridge=vmbr3043,rate=125
net1: virtio=EA:CD:80:F1:CA:BC,bridge=vmbr1338
net2: virtio=B6:75:C1:CD:9B:EE,bridge=vmbr1339
net3: virtio=0E:5C:56:49:16:8C,bridge=vmbr3049
net4: virtio=46:28:FB:73:7E:AB,bridge=vmbr1340,firewall=1
net5: virtio=22:1B:84:66:9E:47,bridge=vmbr3909,firewall=1
numa: 1
onboot: 1
ostype: win10
scsi0: pve1:vm-2708-disk-0,cache=unsafe,discard=on,format=raw,size=200G,ssd=1
scsi1: pve1:vm-2708-disk-1,cache=unsafe,discard=on,format=raw,size=1T,ssd=1
scsihw: virtio-scsi-pci
smbios1: uuid=9729b8f4-57a5-4efd-843a-df3a5b3ea528
sockets: 1
vmgenid: 62171785-fe08-4cbf-aef3-8eebdb5674dc

>the task log of a migration

Code:
2021-02-25 13:44:15 use dedicated network address for sending migration traffic (10.28.28.102)
2021-02-25 13:44:15 starting migration of VM 2708 to node 'pve2' (10.28.28.102)
2021-02-25 13:44:15 found local disk 'pve1:vm-2708-disk-0' (in current VM config)
2021-02-25 13:44:15 found local disk 'pve1:vm-2708-disk-1' (in current VM config)
2021-02-25 13:44:15 copying local disk images
2021-02-25 13:44:15 starting VM 2708 on remote node 'pve2'
2021-02-25 13:44:25 start remote tunnel
2021-02-25 13:44:26 ssh tunnel ver 1
2021-02-25 13:44:26 starting storage migration
2021-02-25 13:44:26 scsi1: start migration to nbd:10.28.28.102:60001:exportname=drive-scsi1
drive mirror is starting for drive-scsi1 with bandwidth limit: 35840 KB/s
drive-scsi1: Cancelling block job

syslog


Code:
Feb 25 13:42:56 pve2 pveproxy[30071]: worker exit
Feb 25 13:42:56 pve2 pveproxy[7391]: worker 30071 finished
Feb 25 13:42:56 pve2 pveproxy[7391]: starting 1 worker(s)
Feb 25 13:42:56 pve2 pveproxy[7391]: worker 18742 started
Feb 25 13:43:00 pve2 systemd[1]: Starting Proxmox VE replication runner...
Feb 25 13:43:01 pve2 systemd[1]: pvesr.service: Succeeded.
Feb 25 13:43:01 pve2 systemd[1]: Started Proxmox VE replication runner.
Feb 25 13:43:15 pve2 pveproxy[17829]: worker exit
Feb 25 13:43:15 pve2 pveproxy[7391]: worker 17829 finished
Feb 25 13:43:15 pve2 pveproxy[7391]: starting 1 worker(s)
Feb 25 13:43:15 pve2 pveproxy[7391]: worker 21557 started
Feb 25 13:43:20 pve2 pveproxy[18742]: Clearing outdated entries from certificate cache
Feb 25 13:44:00 pve2 systemd[1]: Starting Proxmox VE replication runner...
Feb 25 13:44:01 pve2 systemd[1]: pvesr.service: Succeeded.
Feb 25 13:44:01 pve2 systemd[1]: Started Proxmox VE replication runner.
Feb 25 13:44:14 pve2 pmxcfs[6803]: [status] notice: received log
Feb 25 13:44:14 pve2 systemd[1]: Started Session 430 of user root.
Feb 25 13:44:15 pve2 systemd[1]: session-430.scope: Succeeded.
Feb 25 13:44:15 pve2 systemd[1]: Started Session 431 of user root.
Feb 25 13:44:15 pve2 systemd[1]: session-431.scope: Succeeded.
Feb 25 13:44:15 pve2 systemd[1]: Started Session 432 of user root.
Feb 25 13:44:15 pve2 systemd[1]: session-432.scope: Succeeded.
Feb 25 13:44:15 pve2 systemd[1]: Started Session 433 of user root.
Feb 25 13:44:16 pve2 qm[26406]: <root@pam> starting task UPID:pve2:00006799:032F9C29:60379BA0:qmstart:2708:root@pam:
Feb 25 13:44:16 pve2 qm[26521]: start VM 2708: UPID:pve2:00006799:032F9C29:60379BA0:qmstart:2708:root@pam:
Feb 25 13:44:19 pve2 systemd[1]: Started 2708.scope.
Feb 25 13:44:19 pve2 systemd-udevd[26373]: Using default interface naming scheme 'v240'.
Feb 25 13:44:19 pve2 systemd-udevd[26373]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
Feb 25 13:44:19 pve2 systemd-udevd[26373]: Could not generate persistent MAC address for tap2708i0: No such file or directory
Feb 25 13:44:20 pve2 kernel: [534522.279035] device tap2708i0 entered promiscuous mode
Feb 25 13:44:20 pve2 kernel: [534522.291055] vmbr3043: port 11(tap2708i0) entered blocking state
Feb 25 13:44:20 pve2 kernel: [534522.291057] vmbr3043: port 11(tap2708i0) entered disabled state
Feb 25 13:44:20 pve2 kernel: [534522.291213] vmbr3043: port 11(tap2708i0) entered blocking state
Feb 25 13:44:20 pve2 kernel: [534522.291215] vmbr3043: port 11(tap2708i0) entered forwarding state
Feb 25 13:44:20 pve2 kernel: [534522.357032] HTB: quantum of class 10001 is big. Consider r2q change.
Feb 25 13:44:20 pve2 systemd-udevd[26373]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
Feb 25 13:44:20 pve2 systemd-udevd[26373]: Could not generate persistent MAC address for tap2708i1: No such file or directory
Feb 25 13:44:20 pve2 kernel: [534523.002573] device tap2708i1 entered promiscuous mode
 
If you wait long enough, do you get some progress reported on the task?
I can't really tell how long the task was running before it was canceled based on the output you sent.

Have you tried migrating a VM with smaller disks (10G or so)?
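
For reference, such a test migration can also be started from the CLI while watching the load on the target node (VMID 9999 is a placeholder for a small test VM; pve2 is the pool/node name from this thread):

Code:
# on the source node: migrate a small test VM together with its local disks
qm migrate 9999 pve2 --online --with-local-disks
# on the target node: watch pool I/O while the disk is allocated and mirrored
zpool iostat -v pve2 2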
 
>If you wait long enough, do you get some progress reported on the task?
>I can't really tell how long the task was running before it was canceled based on the output you sent.
>
>Have you tried migrating a VM with smaller disks (10G or so)?

Yes, it worked before; if you wait long enough it will start. The problem seems to occur while the VM disk is being created on the target.
 
Writing to a RAIDZ can be slow and CPU/disk intensive, especially with a 1 TB disk.
Are you using an HBA or are the disks connected to a RAID controller?
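
For reference, one way to check how the disks are attached and how the pool is laid out (the pool name pve2 is taken from the storage config above):

Code:
lspci | grep -Ei 'raid|sas|hba'
zpool status pve2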
 
>Writing to a RAIDZ can be slow and CPU/disk intensive, especially with a 1 TB disk.
>Are you using an HBA or are the disks connected to a RAID controller?
We are now using an HBA, but we had the same issue with the server's HW RAID controller.

What do you advise if live migration is a must? What would your setup be?
 
If you have enough space, you could rename the pool on one of the nodes and use storage replication [0]. This would mean the disks are already available on the other node and just need another sync with the changes since the last replication.

[0] https://pve.proxmox.com/pve-docs/pve-admin-guide.html#chapter_pvesr
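
For reference, a replication job can be created per VM in the web UI or on the CLI, assuming the pool/storage has the same name on both nodes as described above; the 15-minute schedule is only an example:

Code:
# replicate VM 2708 to node pve2 every 15 minutes (example schedule)
pvesr create-local-job 2708-0 pve2 --schedule "*/15"
# check the state of the configured replication jobs
pvesr status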

Thanks for your reply, but replication is not real-time, right? I also see this in the article:

Code:
High-Availability is allowed in combination with storage replication, but it has the following implications:
- as live-migrations are currently not possible, redistributing services after a more preferred node comes online does not work. Keep that in mind when configuring your HA groups and their priorities for replicated guests.
- recovery works, but there may be some data loss between the last synced time and the time a node failed.

I would rather resolve this issue than implement a workaround. I hope there are other customers using ZFS with local storage whose live migrations do not result in unresponsive VMs, and I hope you can help us sort out the root cause, which is still unknown to me. I can understand that writing to ZFS RAIDZ can be slow and CPU/disk intensive, but it should not lead to unresponsive VMs.
 
The root issue is the allocation of the disk on the target node. I'm not sure if it needs to be zeroed as well because of nbd, which is used for live storage migration. The same happens with LVM thin.
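
For reference, the allocation step can be reproduced in isolation on the target node to see whether it alone drives up CPU load and IO delay; the volume name is a placeholder and should be destroyed afterwards:

Code:
# on pve2: create a sparse 1 TB zvol, similar to what the live migration allocates
time zfs create -s -V 1T pve2/testvol
# in a second shell, watch pool I/O while this runs
zpool iostat -v pve2 2
# clean up
zfs destroy pve2/testvol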

Live migration is supported with replication, so this seems to be outdated information.
Replication runs in intervals (e.g. every 10 minutes), so everything that has changed since the last replication will be lost if the node shuts down, crashes, etc. The highest replication frequency you can choose is every minute.

With your current setup you can't use HA and you can't simply start a VM on another node if the current one is down. With storage replication you can, and, more importantly, the disks already exist on the target, so a live migration only needs to copy the parts of the disk that changed since the last replication, not everything.
 
