Live migration almost freezes target node

m.witt

Hi,

we have a problem with live migrations on Proxmox:

Whenever we try to live-migrate a VM from Node A to Node B, Node B gets high IOwait and its load constantly increases.
VMs on Node B which have their disks in the same ZFS pool as the migrating VM also become unresponsive due to the high IOwait (80-100%).

When doing an offline migration (same everything, just offline), everything works without any problems or symptoms of high IOwait.
The problem becomes noticeable when the actual transfer of data starts, and it disappears if we either stop the migration AND kill the target QEMU process or let it finish its transfer.

Has anybody ever had a similar problem?

root@prx002:~# pveversion -v
proxmox-ve: 6.2-2 (running kernel: 5.4.73-1-pve)
pve-manager: 6.2-15 (running version: 6.2-15/48bd51b6)
pve-kernel-5.4: 6.3-1
pve-kernel-helper: 6.3-1
pve-kernel-5.3: 6.1-6
pve-kernel-5.4.73-1-pve: 5.4.73-1
pve-kernel-5.4.65-1-pve: 5.4.65-1
pve-kernel-5.4.60-1-pve: 5.4.60-2
pve-kernel-5.3.18-3-pve: 5.3.18-3
pve-kernel-5.3.18-2-pve: 5.3.18-2
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.4-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: residual config
ifupdown2: 3.0.0-1+pve3
ksmtuned: 4.20150325+b1
libjs-extjs: 6.0.1-10
libknet1: 1.16-pve1
libproxmox-acme-perl: 1.0.5
libpve-access-control: 6.1-3
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.2-4
libpve-guest-common-perl: 3.1-3
libpve-http-server-perl: 3.0-6
libpve-storage-perl: 6.2-10
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.3-1
lxcfs: 4.0.3-pve3
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.0.1-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.3-10
pve-cluster: 6.2-1
pve-container: 3.2-3
pve-docs: 6.2-6
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.1-3
pve-ha-manager: 3.1-1
pve-i18n: 2.2-2
pve-qemu-kvm: 5.1.0-6
pve-xtermjs: 4.7.0-2
qemu-server: 6.2-20
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 0.8.5-pve1
 
You can try to set a bandwidth limit for your migration. Either directly using qm migrate ... --bwlimit 100 or by editing /etc/pve/datacenter.cfg.
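For example (values are only placeholders; the limit is given in KiB/s, and the datacenter.cfg syntax below assumes PVE 6):

Code:
# one-off limit for a single migration, ~50 MiB/s
qm migrate 100 prx002 --online --with-local-disks --bwlimit 51200

# cluster-wide default for all migrations in /etc/pve/datacenter.cfg
bwlimit: migration=51200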
 
Already tried that; the problem became less noticeable but still occurred. We limited it to ~500 Mbit/s while the nodes are connected with 10 GbE.

Is this a known issue? If so, are there any recommended setups/configurations where this problem doesn't exist?

Regards,
 
ZFS online migration uses a QEMU drive mirror with an NBD server on the target, while offline migration uses zfs send. What speeds does your zpool achieve? Just to narrow the problem down: what happens when you set the limit really low?
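Roughly, the two data paths look like this (a simplified sketch; dataset, snapshot and host names are only examples):

Code:
# offline: one sequential stream per disk
zfs snapshot data/vm-100-disk-0@__migration__
zfs send data/vm-100-disk-0@__migration__ | ssh root@<target> zfs receive data/vm-100-disk-0

# online: the target VM is started first and exports its (still empty) disk via NBD;
# the source QEMU then runs a drive-mirror block job against that export, so the data
# arrives as many small block-sized writes through the target's QEMU/zvol layer
# instead of one zfs receive stream.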
 
The following are benchmark results from a few days ago:

root@prx001:~# pveperf /data/
CPU BOGOMIPS: 96019.44
REGEX/SECOND: 1478871
HD SIZE: 598.03 GB (data)
FSYNCS/SECOND: 3661.94
DNS EXT: 276.31 ms
DNS INT: 3.59 ms (xxxxx.de)

root@prx002:~# pveperf /data/
CPU BOGOMIPS: 220840.56
REGEX/SECOND: 2058114
HD SIZE: 17828.08 GB (data)
FSYNCS/SECOND: 3365.49
DNS EXT: 335.62 ms
DNS INT: 4.09 ms (xxxxx.de)

root@prx003:~# pveperf /data/
CPU BOGOMIPS: 220817.52
REGEX/SECOND: 2931948
HD SIZE: 21206.91 GB (data)
FSYNCS/SECOND: 6799.00
DNS EXT: 320.61 ms
DNS INT: 3.57 ms (xxxxx.de)

Prx001 :
fio --filename=/data/test/fio-test --filesize=32G --direct=1 --sync=1 --rw=write --bs=4k --numjobs=1 --iodepth=1 --group_reporting --name=test
Run status group 0 (all jobs):
WRITE: bw=38.1MiB/s (39.0MB/s), 38.1MiB/s-38.1MiB/s (39.0MB/s-39.0MB/s), io=32.0GiB (34.4GB), run=859125-859125msec

Prx002 :
fio --filename=/data/test/fio-test --filesize=32G --direct=1 --sync=1 --rw=write --bs=4k --numjobs=1 --iodepth=1 --group_reporting --name=test
Run status group 0 (all jobs):
WRITE: bw=29.8MiB/s (31.3MB/s), 29.8MiB/s-29.8MiB/s (31.3MB/s-31.3MB/s), io=32.0GiB (34.4GB), run=1097847-1097847msec

Prx003 :
fio --filename=/data/test/fio-test --filesize=32G --direct=1 --sync=1 --rw=write --bs=4k --numjobs=1 --iodepth=1 --group_reporting --name=test
Run status group 0 (all jobs):
WRITE: bw=54.1MiB/s (56.7MB/s), 54.1MiB/s-54.1MiB/s (56.7MB/s-56.7MB/s), io=32.0GiB (34.4GB), run=605893-605893msec


#without --direct=1 and --sync=1
fio --filename=/data/test/fio-test --filesize=32G --rw=write --bs=4k --numjobs=1 --iodepth=1 --group_reporting --name=journal-test
Run status group 0 (all jobs):
WRITE: bw=301MiB/s (315MB/s), 301MiB/s-301MiB/s (315MB/s-315MB/s), io=32.0GiB (34.4GB), run=108971-108971msec


I'm not exactly sure whether this is the correct and best way to benchmark the pool.
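For comparison, a sequential sync-write run at a larger block size might be closer to what the migration target sees than the 4k test above (just a sketch, reusing the same path and size):

Code:
fio --filename=/data/test/fio-seq --filesize=32G --direct=1 --sync=1 --rw=write --bs=1M --numjobs=1 --iodepth=1 --group_reporting --name=seq-sync-test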

I will test the migration with lower values and report back again.
 
I tested the following bandwidth settings:

Code:
qm migrate 100 prx001 --with-local-disks --online --bwlimit 100    # ~ 100 KiB/s | 0.8 Mbit/s = inconspicuous
qm migrate 100 prx001 --with-local-disks --online --bwlimit 1221   # ~ 1 MB/s    |   8 Mbit/s = inconspicuous
qm migrate 100 prx001 --with-local-disks --online --bwlimit 9765   # ~ 10 MB/s   |  80 Mbit/s = 1-5% IOwait for a few seconds

#online
qm migrate 100 prx001 --with-local-disks --online --bwlimit 48825  # ~ 50 MB/s   | 400 Mbit/s = 80-90% IOwait -> dead
qm migrate 100 prx001 --with-local-disks --online --bwlimit 97656  # ~ 100 MB/s  | 800 Mbit/s = 80-90% IOwait -> dead


#offline
qm migrate 100 prx001 --with-local-disks --bwlimit 48825           # ~ 50 MB/s   | 400 Mbit/s = 1-5% IOwait for a few seconds
qm migrate 100 prx002 --with-local-disks --bwlimit 97656           # ~ 100 MB/s  | 800 Mbit/s = 1-5% IOwait for a few seconds


I'm wondering what qemu drive mirroring does so much worse compared to the zfs send/receive approach.
 
What disks do you use exactly? How are they attached to the host? How is your pool set up (zpool status)?

Somewhat guessed explanation: in contrast to ZFS send/receive, QEMU probably doesn't send a single stream. While the VM is online, small parts of the image change, so those parts have to be sent again. This could be especially bad if your VMs write a lot while online or if you have spinning disks.

Could you try to measure how much data is sent during an offline and an online migration, using something like nload or iftop?
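For example, on the target node while the migration is running (vmbr1 here stands for whichever interface carries the migration traffic):

Code:
vnstat -l -i vmbr1    # live totals; stop with Ctrl+C once the migration has finished
iftop -i vmbr1 -B     # per-connection view in bytes/s
nload vmbr1           # simple in/out bandwidth view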
 
Current hardware:

Code:
Prx001, Dell R420 2 x Intel Xeon E5-2430L, 64 GB RAM, 8 x 300GB SAS 15k rpm (Seagate Savvio 15K.3 ST9300653SS) ZFS - RAID-Z1 + SLOG + L2ARC (NVMe), Connected to Perc H710P Mini (each disk exported as Raid0)
Prx002, Dell R730xd 2 x Intel Xeon E5-2670v3, 256 GB RAM, 24 x 1.8TB SAS 10k rpm (HGST Ultrastar C10K1800 / HUC101818CS4204 ) ZFS - stripe over mirror + SLOG + L2ARC (NVMe), Connected to Dell HBA330 Mini
Prx003, Dell R730xd 2 x Intel Xeon E5-2670v3, 256 GB RAM, 12 x 4TB SAS 7.2k rpm ( Western Digital Ultrastar DC HC310 / HUS726T4TAL5204 ) ZFS - stripe over mirror + SLOG + L2ARC (NVMe), Connected to Dell HBA330 Mini

Each system also has 2 SSDs in a software RAID 1 for boot.

pool: data
state: ONLINE
scan: scrub repaired 0B in 0 days 00:34:16 with 0 errors on Sun Nov 8 00:58:17 2020
config:

NAME STATE READ WRITE CKSUM
data ONLINE 0 0 0
  raidz1-0 ONLINE 0 0 0
    scsi-36b8ca3a0e6c99600266cdb170486270e ONLINE 0 0 0
    wwn-0x6b8ca3a0e6c99600266cdb20050ec6b4 ONLINE 0 0 0
    wwn-0x6b8ca3a0e6c99600266cdb2605664d5d ONLINE 0 0 0
    wwn-0x6b8ca3a0e6c99600266cdb2e05e347a0 ONLINE 0 0 0
    wwn-0x6b8ca3a0e6c99600266cdb3306306e41 ONLINE 0 0 0
    wwn-0x6b8ca3a0e6c99600266cdb38067ed38a ONLINE 0 0 0
logs
  mirror-1 ONLINE 0 0 0
    nvme-eui.343335304e4144920025384100000001-part2 ONLINE 0 0 0
    nvme-eui.343335304e4143000025384100000001-part2 ONLINE 0 0 0
cache
  nvme1n1p1 ONLINE 0 0 0
  nvme0n1p1 ONLINE 0 0 0

errors: No known data errors

pool: rpool
state: ONLINE
scan: scrub repaired 0B in 0 days 00:10:16 with 0 errors on Sun Nov 8 00:34:21 2020
config:

NAME STATE READ WRITE CKSUM
rpool ONLINE 0 0 0
  mirror-0 ONLINE 0 0 0
    scsi-36b8ca3a0e6c99600266cdb070390a7d3-part3 ONLINE 0 0 0
    scsi-36b8ca3a0e6c99600266cdb10042019ba-part3 ONLINE 0 0 0

errors: No known data errors

pool: data
state: ONLINE
scan: scrub repaired 0B in 0 days 00:17:33 with 0 errors on Sun Nov 8 00:41:35 2020
config:

NAME STATE READ WRITE CKSUM
data ONLINE 0 0 0
  mirror-0 ONLINE 0 0 0
    sdb ONLINE 0 0 0
    sdc ONLINE 0 0 0
  mirror-1 ONLINE 0 0 0
    sdd ONLINE 0 0 0
    sde ONLINE 0 0 0
  mirror-2 ONLINE 0 0 0
    sdf ONLINE 0 0 0
    sdg ONLINE 0 0 0
  mirror-3 ONLINE 0 0 0
    sdh ONLINE 0 0 0
    sdi ONLINE 0 0 0
  mirror-4 ONLINE 0 0 0
    sdj ONLINE 0 0 0
    sdk ONLINE 0 0 0
  mirror-5 ONLINE 0 0 0
    sdl ONLINE 0 0 0
    sdm ONLINE 0 0 0
  mirror-6 ONLINE 0 0 0
    sdn ONLINE 0 0 0
    sdo ONLINE 0 0 0
  mirror-7 ONLINE 0 0 0
    sdp ONLINE 0 0 0
    sdq ONLINE 0 0 0
  mirror-8 ONLINE 0 0 0
    sdr ONLINE 0 0 0
    sds ONLINE 0 0 0
  mirror-9 ONLINE 0 0 0
    sdt ONLINE 0 0 0
    sdu ONLINE 0 0 0
  mirror-10 ONLINE 0 0 0
    sdv ONLINE 0 0 0
    sdw ONLINE 0 0 0
  mirror-11 ONLINE 0 0 0
    sdx ONLINE 0 0 0
    sdy ONLINE 0 0 0
logs
  mirror-12 ONLINE 0 0 0
    nvme0n1p2 ONLINE 0 0 0
    nvme1n1p2 ONLINE 0 0 0
cache
  nvme0n1p1 ONLINE 0 0 0
  nvme1n1p1 ONLINE 0 0 0

errors: No known data errors

pool: rpool
state: ONLINE
scan: scrub repaired 0B in 0 days 00:12:57 with 0 errors on Sun Nov 8 00:37:00 2020
config:

NAME STATE READ WRITE CKSUM
rpool ONLINE 0 0 0
  mirror-0 ONLINE 0 0 0
    ata-Micron_M500DC_MTFDDAK800MBB_17251ACA1CBB-part3 ONLINE 0 0 0
    ata-Micron_M500DC_MTFDDAK800MBB_17281AC47BD8-part3 ONLINE 0 0 0

errors: No known data errors

pool: data
state: ONLINE
scan: scrub repaired 0B in 0 days 00:09:47 with 0 errors on Sun Nov 8 00:33:49 2020
config:

NAME STATE READ WRITE CKSUM
data ONLINE 0 0 0
  mirror-0 ONLINE 0 0 0
    sdb ONLINE 0 0 0
    sdc ONLINE 0 0 0
  mirror-1 ONLINE 0 0 0
    sdd ONLINE 0 0 0
    sde ONLINE 0 0 0
  mirror-2 ONLINE 0 0 0
    sdf ONLINE 0 0 0
    sdg ONLINE 0 0 0
  mirror-3 ONLINE 0 0 0
    sdh ONLINE 0 0 0
    sdi ONLINE 0 0 0
  mirror-4 ONLINE 0 0 0
    sdj ONLINE 0 0 0
    sdk ONLINE 0 0 0
  mirror-5 ONLINE 0 0 0
    sdl ONLINE 0 0 0
    sdm ONLINE 0 0 0
logs
  mirror-6 ONLINE 0 0 0
    nvme0n1p2 ONLINE 0 0 0
    nvme1n1p2 ONLINE 0 0 0
cache
  nvme0n1p1 ONLINE 0 0 0
  nvme1n1p1 ONLINE 0 0 0

errors: No known data errors

pool: rpool
state: ONLINE
scan: scrub repaired 0B in 0 days 00:00:44 with 0 errors on Sun Nov 8 00:24:47 2020
config:

NAME STATE READ WRITE CKSUM
rpool ONLINE 0 0 0
  mirror-0 ONLINE 0 0 0
    ata-Micron_M500DC_MTFDDAK800MBB_17281AC54E1D-part3 ONLINE 0 0 0
    ata-Micron_M500DC_MTFDDAK800MBB_17251ACA1D41-part3 ONLINE 0 0 0

errors: No known data errors

I will try to do your suggested measurements today and report back.

Edit: I've added the zpool status as a text file since it's more readable that way.
 

Attachments

  • prx00-zpool-status.txt (7.1 KB)
A colleague mentioned that ZFS can differentiate between ZFS send I/O (offline migration) and I/O coming from a VM (online), so the former can be given less priority => less freezing. Additionally, ZFS send (offline) will probably handle holes/zero blocks more efficiently than the NBD online migration. We should see this in the traffic statistics.

What cache settings do you have for your VMs and ZFS? The output of zpool iostat -r might be interesting, too.
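The relative weight of those I/O classes is governed by the vdev scheduler's module parameters; just as a pointer for where to look (assuming ZFS on Linux 0.8, read-only inspection, not a tuning recommendation):

Code:
# each class shown by zpool iostat -r (sync/async read/write, scrub, trim) has its
# own per-vdev min/max active limits; lowering the async write limits is one knob
# that changes how hard a bulk writer may push relative to guest sync I/O
grep . /sys/module/zfs/parameters/zfs_vdev_*_active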
 
I tested offline and online migration again and used vnstat -l -i vmbr1 for measuring. A few seconds of overhead are included, but not much.

It looks like the qemu drive mirror way really does transfer all the zero blocks. The real disk usage on this system is about 4 GB.
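(For reference, the provisioned vs. actually allocated size of the zvol could be compared with something like the following; the dataset name is the one from the migration log.)

Code:
zfs list -o name,volsize,used,referenced,compressratio data/vm-100-disk-0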

Results below

Offline :
Code:
                           rx         |       tx
--------------------------------------+------------------
  bytes                    61.11 MiB  |        4.57 GiB
--------------------------------------+------------------
          max           12.40 Mbit/s  |   922.94 Mbit/s
      average            7.77 Mbit/s  |   595.43 Mbit/s
          min              11 kbit/s  |       18 kbit/s
--------------------------------------+------------------
  packets                    1220459  |          538564
--------------------------------------+------------------
          max              29743 p/s  |       16268 p/s
      average              18491 p/s  |        8160 p/s
          min                 12 p/s  |          16 p/s
--------------------------------------+------------------
  time                  1.10 minutes

Last log lines :
2020-12-01 14:24:47 14:24:47   4.45G   data/vm-100-disk-0@__migration__
2020-12-01 14:24:49 [prx002] successfully imported 'data:vm-100-disk-0'
2020-12-01 14:24:49 volume 'data:vm-100-disk-0' is 'data:vm-100-disk-0' on the target
2020-12-01 14:24:50 migration finished successfully (duration 00:00:54)


Online :

Code:
                           rx         |       tx
--------------------------------------+------------------
  bytes                   365.15 MiB  |       51.48 GiB
--------------------------------------+------------------
          max            5.99 Mbit/s  |   822.10 Mbit/s
      average            4.76 Mbit/s  |   686.69 Mbit/s
          min              14 kbit/s  |       10 kbit/s
--------------------------------------+------------------
  packets                    7213726  |         1030623
--------------------------------------+------------------
          max              13964 p/s  |        3456 p/s
      average              11201 p/s  |        1600 p/s
          min                 15 p/s  |           9 p/s
--------------------------------------+------------------
  time                 10.73 minutes

Last log lines :

2020-12-02 12:22:50 migration status: completed
drive-scsi0: transferred: 53692334080 bytes remaining: 0 bytes total: 53692334080 bytes progression: 100.00 % busy: 0 ready: 1 
all mirroring jobs are ready 
drive-scsi0: Completing block job...
drive-scsi0: Completed successfully.
drive-scsi0 : finished
2020-12-02 12:22:51 stopping NBD storage migration server on target.
2020-12-02 12:23:05 migration finished successfully (duration 00:10:50)
TASK OK

All VMs are set to writethrough

Code:
agent: 1
balloon: 3072
boot: c
bootdisk: scsi0
cores: 4
hotplug: disk,network,usb,memory,cpu
memory: 4096
name: mw-migratest
nameserver: 1.1.1.1
net0: virtio=5A:C0:3D:DC:74:DA,bridge=vmbr0,firewall=1
numa: 1
onboot: 1
ostype: l26
scsi0: data:vm-100-disk-0,cache=writethrough,format=raw,size=50G
scsihw: virtio-scsi-pci
searchdomain: x-xxxxx.de
smbios1: uuid=10a9734e-eb19-462b-b27b-f50f9bd4bdfa
sockets: 1

Code:
root@prx001:~# zpool iostat -r
data          sync_read    sync_write    async_read    async_write      scrub         trim   
req_size      ind    agg    ind    agg    ind    agg    ind    agg    ind    agg    ind    agg
----------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----
512             0      0      0      0      0      0      0      0      0      0      0      0
1K              0      0      0      0      0      0      0      0      0      0      0      0
2K              0      0      0      0      0      0      0      0      0      0      0      0
4K          34.6M      0  2.28M      0  1.65M      0  52.6M      0    224      0      0      0
8K           422K  1.30M  6.98M      0  52.0K  58.2K  21.2M  18.4M     55     43      0      0
16K             0  1.15M  1.35M      0      0  56.9K  11.2K  11.2M      0     44      0      0
32K             0   773K  1.54M      0      0  47.8K  23.1K  5.22M      0     51      0      0
64K            32   405K  1.37M      0      0  31.3K  91.8K  1.90M      0     26      0      0
128K            0   208K  1.40M      0      0  25.8K  1.49M   413K      0      6      0      0
256K            0  70.7K      0      0      0  7.86K      0  17.5K      0      3      0      0
512K            0  12.8K      0      0      0  2.79K      0     18      0      0      0      0
1M              0   1003      0      0      0    470      0      0      0      0      0      0
2M              0      0      0      0      0      0      0      0      0      0      0      0
4M              0      0      0      0      0      0      0      0      0      0      0      0
8M              0      0      0      0      0      0      0      0      0      0      0      0
16M             0      0      0      0      0      0      0      0      0      0      0      0
----------------------------------------------------------------------------------------------

root@prx002:~# zpool iostat -r
data          sync_read    sync_write    async_read    async_write      scrub         trim    
req_size      ind    agg    ind    agg    ind    agg    ind    agg    ind    agg    ind    agg
----------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----
512             0      0      0      0      0      0      0      0      0      0      0      0
1K              0      0      0      0      0      0      0      0      0      0      0      0
2K              0      0      0      0      0      0      0      0      0      0      0      0
4K          66.6M      0  4.94M      0  2.41M      0   103M      0  1.14K      0      0      0
8K           194M   397K  67.5M      0  7.16M  69.0K  40.1M  31.6M    226    125      0      0
16K         84.1K  2.14M  43.8M      0  1.99K   275K  7.55M  35.1M     35    107      0      0
32K         1.38M  3.46M  15.7M      0   100K   417K  61.9M  24.7M  1.10K     83      0      0
64K           602  2.98M  6.63M      0      4   342K  3.98M  26.5M     16     56      0      0
128K            0  1.03M  17.7M      0      0   137K  9.94M  16.6M      0     25      0      0
256K            0   339K      0      0      0  44.4K      0  7.87M      0      6      0      0
512K            0  79.4K      0      0      0  15.8K      0  2.00M      0      1      0      0
1M              0     44      0      0      0     18      0  27.5K      0      0      0      0
2M              0      0      0      0      0      0      0      0      0      0      0      0
4M              0      0      0      0      0      0      0      0      0      0      0      0
8M              0      0      0      0      0      0      0      0      0      0      0      0
16M             0      0      0      0      0      0      0      0      0      0      0      0
----------------------------------------------------------------------------------------------

root@prx003:~# zpool iostat -r
data          sync_read    sync_write    async_read    async_write      scrub         trim    
req_size      ind    agg    ind    agg    ind    agg    ind    agg    ind    agg    ind    agg
----------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----
512             0      0      0      0      0      0      0      0      0      0      0      0
1K              0      0      0      0      0      0      0      0      0      0      0      0
2K              0      0      0      0      0      0      0      0      0      0      0      0
4K          25.5M      0  4.88M      0   529K      0  58.1M      0  1.37K      0      0      0
8K          56.3M   198K  36.0M      0   891K  14.1K  25.6M  16.3M    246    139      0      0
16K         35.4K   692K  9.02M      0    851  44.2K  3.43M  17.0M     44     95      0      0
32K          557K  1.00M  4.75M      0  20.8K  64.4K  12.6M  11.9M  1.15K     44      0      0
64K         1.04K   914K  2.84M      0      4  58.4K  2.42M  10.9M     16     27      0      0
128K            0   472K  17.2M      0      0  33.1K  23.6M  9.51M      0     11      0      0
256K            0   135K      0      0      0  11.1K      0  6.74M      0      5      0      0
512K            0  19.4K      0      0      0  3.12K      0  1.40M      0      1      0      0
1M              0    292      0      0      0    173      0  11.7K      0      0      0      0
2M              0      0      0      0      0      0      0      0      0      0      0      0
4M              0      0      0      0      0      0      0      0      0      0      0      0
8M              0      0      0      0      0      0      0      0      0      0      0      0
16M             0      0      0      0      0      0      0      0      0      0      0      0
----------------------------------------------------------------------------------------------
 
Thank you for the detailed output!

So roughly 10x the time and data for the online migration (10.73 vs. 1.10 minutes, 51.48 vs. 4.57 GiB transferred). That looks like a good reason to me why it performs so much worse than the offline migration.
 
What does this prove to you?
10x more data at 10x the time doesn't seem unusual.

I think online migrations still shouldn't be able to kill the target system with high IOwait.

Did I maybe misunderstand some of your previous questions?
 
Can someone help me?

I have a 4-node 6.4-8 cluster with 1% average IO and 80% free memory, but when I start an online migration (LVM-thin local disks) the load goes up to 45%, causing responsiveness issues on the VMs on that node.

At the moment it occurs, only IO goes up.


[screenshot attachment]
 
Discovered here: when the VM doesn't have the Discard option enabled, the online migration works as expected.

The question is: is trim inside the VM capable of flushing unused space on the LVM-thin storage of the host?
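One way to check would be to compare the thin pool's data usage before and after a trim from inside the guest (just a sketch; VG/LV and VM names are placeholders, and the virtual disk needs discard=on in the VM config so the guest's UNMAP requests reach the host at all):

Code:
# on the host: data usage of the thin pool and the thin LV
lvs -o lv_name,lv_size,data_percent pve/data pve/vm-100-disk-0

# inside the guest: release unused blocks
fstrim -av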
 
@m.witt did you ever solve this? I've been facing the same or a very similar issue with live migrations since we started using Proxmox on 6.1 a year ago. Now on 6.3-3 and still having the same issue: live migration slows everything down and IO delay goes way up, but the cluster is still usable and VMs are usable. However, if I start a second live migration, everything completely starts hanging. Dedicated 10G network for migrations, insecure migration is turned on (we had secure enabled before but it was way too slow).

To work around this issue, we use scheduled replication every 15 seconds to 15 minutes, depending on how critical the data on the VM is, and it greatly speeds up migrations. However, it's not an ideal solution, since I have to periodically reboot both nodes in the cluster because the scheduled replications just start playing up (one or two might start failing with different issues) and live migrations stop working altogether. By periodically, I mean perhaps every 1 or 2 months.
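(For reference, such a replication job can also be created on the CLI, roughly like this; VM ID, target node and schedule are only examples.)

Code:
pvesr create-local-job 100-0 <targetnode> --schedule "*/15"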
 
@JamesT We didn't solve it.
For a while we used a bandwidth limit of 125 MiB/s. That made the issue almost unnoticeable, at the cost of longer migration durations.
However, we have since moved away from ZFS and switched to an all-flash Ceph setup.
 
A colleague mentioned that ZFS can differentiate between ZFS send I/O (offline migration) and I/O coming from a VM (online), so the former can be given less priority => less freezing. Additionally, ZFS send (offline) will probably handle holes/zero blocks more efficiently than the NBD online migration. We should see this in the traffic statistics.

What cache settings do you have for your VMs and ZFS? The output of zpool iostat -r might be interesting, too.
Hi @Dominic, I've searched a bit but cannot find out how to decrease the ZFS send I/O priority. Can you help me out with the right search terms, a link to an article, or even the command itself?

Many thanks
James
 
Is there any fix for this issue? I experienced the same thing when doing a live migration: the target node froze and the graph showed no activity for around 5 minutes (it might depend on the VM disk size).

This problem is kinda scary in production, and there is no bandwidth limit option in the web GUI migrate dialog.
Code:
proxmox-ve: 7.1-1 (running kernel: 5.13.19-5-pve)
pve-manager: 7.1-10 (running version: 7.1-10/6ddebafe)
pve-kernel-helper: 7.1-12
pve-kernel-5.13: 7.1-8
pve-kernel-5.13.19-5-pve: 5.13.19-12
pve-kernel-5.13.19-2-pve: 5.13.19-4
ceph-fuse: 15.2.15-pve1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve2
libproxmox-acme-perl: 1.4.1
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.1-6
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.1-3
libpve-guest-common-perl: 4.1-1
libpve-http-server-perl: 4.1-1
libpve-storage-perl: 7.1-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.11-1
lxcfs: 4.0.11-pve1
novnc-pve: 1.3.0-2
proxmox-backup-client: 2.1.5-1
proxmox-backup-file-restore: 2.1.5-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.4-6
pve-cluster: 7.1-3
pve-container: 4.1-4
pve-docs: 7.1-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.3-5
pve-ha-manager: 3.3-3
pve-i18n: 2.6-2
pve-qemu-kvm: 6.1.1-2
pve-xtermjs: 4.16.0-1
qemu-server: 7.1-4
smartmontools: 7.2-1
spiceterm: 3.2-2
swtpm: 0.7.0~rc1+2
vncterm: 1.7-1
zfsutils-linux: 2.1.2-pve1

[screenshot attachment]
 
