Busy web host VM with 3 ZFS volumes. It migrates with an OK status, and the VM seems to run fine both before and after the migration. ZFS is set to sync=standard. All volumes are replicated to all hosts, so the zfs snapshot and incremental send before migration take only a little time.
After some hours, one volume gets remounted read-only because of errors. While setting up a new machine I moved the VM around, and it happened 2 out of 2 times. Other, less busy VMs migrate fine. Can anything be done here? Maybe setting sync=always before the live migration would help?
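For reference, the sync=always experiment I have in mind would look roughly like this (a sketch only; the dataset names are taken from the migration log below, and I would revert to sync=standard afterwards since sync=always costs write latency):
Code:
# on the source node, before migrating VM 113
zfs set sync=always storage/vm-113-disk-0
zfs set sync=always storage/vm-113-disk-1
zfs set sync=always storage/vm-113-disk-2

# ... live-migrate the VM ...

# on the target node, once the VM has settled
zfs set sync=standard storage/vm-113-disk-0
zfs set sync=standard storage/vm-113-disk-1
zfs set sync=standard storage/vm-113-disk-2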
In VM:
Code:
Apr 24 22:22:35 xxx kernel: EXT4-fs error (device sdb1): ext4_lookup:1442: inode #19006119: comm rsync: deleted inode referenced: 23650227
Apr 24 22:22:35 xxx kernel: Aborting journal on device sdb1-8.
Apr 24 22:22:35 xxx kernel: EXT4-fs (sdb1): Remounting filesystem read-only
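For what it's worth, the extent of the damage can be inspected from inside the guest with a read-only fsck (sdb1 as in the log above):
Code:
# inside the VM, ideally with sdb1 unmounted
# -f: force a full check, -n: read-only, answer 'no' to all prompts
e2fsck -fn /dev/sdb1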
ZFS shows no issues anywhere.
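By "no issues" I mean the usual host-side checks come back clean on every node, along these lines:
Code:
# on each node: no read/write/checksum or data errors reported
zpool status -v storage
# a scrub also completes without finding anything
zpool scrub storage
# and the volume properties are as expected
zfs get sync,volsize storage/vm-113-disk-1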
Migration log:
Code:
2025-04-24 19:46:33 use dedicated network address for sending migration traffic (10.10.10.117)
2025-04-24 19:46:33 starting migration of VM 113 to node 'pve2' (10.10.10.117)
2025-04-24 19:46:33 found local, replicated disk 'storage:vm-113-disk-0' (attached)
2025-04-24 19:46:33 found local, replicated disk 'storage:vm-113-disk-1' (attached)
2025-04-24 19:46:33 found local, replicated disk 'storage:vm-113-disk-2' (attached)
2025-04-24 19:46:34 scsi0: start tracking writes using block-dirty-bitmap 'repl_scsi0'
2025-04-24 19:46:34 scsi2: start tracking writes using block-dirty-bitmap 'repl_scsi2'
2025-04-24 19:46:34 scsi3: start tracking writes using block-dirty-bitmap 'repl_scsi3'
2025-04-24 19:46:34 replicating disk images
2025-04-24 19:46:34 start replication job
2025-04-24 19:46:34 guest => VM 113, running => 1093821
2025-04-24 19:46:34 volumes => storage:vm-113-disk-0,storage:vm-113-disk-1,storage:vm-113-disk-2
2025-04-24 19:46:36 create snapshot '__replicate_113-0_1745516794__' on storage:vm-113-disk-0
2025-04-24 19:46:36 create snapshot '__replicate_113-0_1745516794__' on storage:vm-113-disk-1
2025-04-24 19:46:36 create snapshot '__replicate_113-0_1745516794__' on storage:vm-113-disk-2
2025-04-24 19:46:36 using insecure transmission, rate limit: none
2025-04-24 19:46:36 incremental sync 'storage:vm-113-disk-0' (__replicate_113-0_1745516709__ => __replicate_113-0_1745516794__)
2025-04-24 19:46:39 send from @__replicate_113-0_1745516709__ to storage/vm-113-disk-0@__replicate_113-0_1745516794__ estimated size is 103M
2025-04-24 19:46:39 total estimated size is 103M
2025-04-24 19:46:39 TIME SENT SNAPSHOT storage/vm-113-disk-0@__replicate_113-0_1745516794__
2025-04-24 19:46:40 [pve2] successfully imported 'storage:vm-113-disk-0'
2025-04-24 19:46:40 incremental sync 'storage:vm-113-disk-1' (__replicate_113-0_1745516709__ => __replicate_113-0_1745516794__)
2025-04-24 19:46:43 send from @__replicate_113-0_1745516709__ to storage/vm-113-disk-1@__replicate_113-0_1745516794__ estimated size is 56.4M
2025-04-24 19:46:43 total estimated size is 56.4M
2025-04-24 19:46:43 TIME SENT SNAPSHOT storage/vm-113-disk-1@__replicate_113-0_1745516794__
2025-04-24 19:46:43 [pve2] successfully imported 'storage:vm-113-disk-1'
2025-04-24 19:46:43 incremental sync 'storage:vm-113-disk-2' (__replicate_113-0_1745516709__ => __replicate_113-0_1745516794__)
2025-04-24 19:46:47 send from @__replicate_113-0_1745516709__ to storage/vm-113-disk-2@__replicate_113-0_1745516794__ estimated size is 252M
2025-04-24 19:46:47 total estimated size is 252M
2025-04-24 19:46:47 TIME SENT SNAPSHOT storage/vm-113-disk-2@__replicate_113-0_1745516794__
2025-04-24 19:46:48 [pve2] successfully imported 'storage:vm-113-disk-2'
2025-04-24 19:46:48 delete previous replication snapshot '__replicate_113-0_1745516709__' on storage:vm-113-disk-0
2025-04-24 19:46:48 delete previous replication snapshot '__replicate_113-0_1745516709__' on storage:vm-113-disk-1
2025-04-24 19:46:48 delete previous replication snapshot '__replicate_113-0_1745516709__' on storage:vm-113-disk-2
2025-04-24 19:46:49 (remote_finalize_local_job) delete stale replication snapshot '__replicate_113-0_1745516709__' on storage:vm-113-disk-0
2025-04-24 19:46:49 (remote_finalize_local_job) delete stale replication snapshot '__replicate_113-0_1745516709__' on storage:vm-113-disk-1
2025-04-24 19:46:49 (remote_finalize_local_job) delete stale replication snapshot '__replicate_113-0_1745516709__' on storage:vm-113-disk-2
2025-04-24 19:46:50 end replication job
2025-04-24 19:46:50 starting VM 113 on remote node 'pve2'
2025-04-24 19:46:53 volume 'storage:vm-113-disk-0' is 'storage:vm-113-disk-0' on the target
2025-04-24 19:46:54 volume 'storage:vm-113-disk-1' is 'storage:vm-113-disk-1' on the target
2025-04-24 19:46:54 volume 'storage:vm-113-disk-2' is 'storage:vm-113-disk-2' on the target
2025-04-24 19:46:54 start remote tunnel
2025-04-24 19:46:55 ssh tunnel ver 1
2025-04-24 19:46:55 starting storage migration
2025-04-24 19:46:55 scsi2: start migration to nbd:10.10.10.117:60001:exportname=drive-scsi2
drive mirror re-using dirty bitmap 'repl_scsi2'
drive mirror is starting for drive-scsi2 with bandwidth limit: 10240000 KB/s
drive-scsi2: transferred 11.0 MiB of 21.9 MiB (50.14%) in 0s
drive-scsi2: transferred 22.0 MiB of 22.0 MiB (100.00%) in 1s, ready
all 'mirror' jobs are ready
2025-04-24 19:46:56 scsi3: start migration to nbd:10.10.10.117:60001:exportname=drive-scsi3
drive mirror re-using dirty bitmap 'repl_scsi3'
drive mirror is starting for drive-scsi3 with bandwidth limit: 10240000 KB/s
drive-scsi3: transferred 25.8 MiB of 97.9 MiB (26.31%) in 0s
drive-scsi3: transferred 98.0 MiB of 98.0 MiB (100.00%) in 1s, ready
all 'mirror' jobs are ready
2025-04-24 19:46:57 scsi0: start migration to nbd:10.10.10.117:60001:exportname=drive-scsi0
drive mirror re-using dirty bitmap 'repl_scsi0'
drive mirror is starting for drive-scsi0 with bandwidth limit: 10240000 KB/s
drive-scsi0: transferred 320.0 KiB of 59.8 MiB (0.52%) in 0s
drive-scsi0: transferred 60.6 MiB of 60.6 MiB (100.00%) in 1s, ready
all 'mirror' jobs are ready
2025-04-24 19:46:58 starting online/live migration on tcp:10.10.10.117:60000
2025-04-24 19:46:58 set migration capabilities
2025-04-24 19:46:58 migration speed limit: 9.8 GiB/s
2025-04-24 19:46:58 migration downtime limit: 100 ms
2025-04-24 19:46:58 migration cachesize: 8.0 GiB
2025-04-24 19:46:58 set migration parameters
2025-04-24 19:46:58 start migrate command to tcp:10.10.10.117:60000
2025-04-24 19:46:59 migration active, transferred 244.7 MiB of 50.0 GiB VM-state, 350.1 MiB/s
2025-04-24 19:47:00 migration active, transferred 541.6 MiB of 50.0 GiB VM-state, 302.0 MiB/s
2025-04-24 19:47:01 migration active, transferred 874.8 MiB of 50.0 GiB VM-state, 270.0 MiB/s
2025-04-24 19:47:02 migration active, transferred 1.2 GiB of 50.0 GiB VM-state, 328.6 MiB/s
2025-04-24 19:47:03 migration active, transferred 1.5 GiB of 50.0 GiB VM-state, 345.3 MiB/s
2025-04-24 19:47:04 migration active, transferred 1.7 GiB of 50.0 GiB VM-state, 300.6 MiB/s
[snip ... ]
2025-04-24 19:49:30 migration active, transferred 51.0 GiB of 50.0 GiB VM-state, 441.2 MiB/s
2025-04-24 19:49:31 migration active, transferred 51.4 GiB of 50.0 GiB VM-state, 456.4 MiB/s
2025-04-24 19:49:32 migration active, transferred 51.8 GiB of 50.0 GiB VM-state, 473.1 MiB/s
2025-04-24 19:49:33 migration active, transferred 52.3 GiB of 50.0 GiB VM-state, 441.8 MiB/s
2025-04-24 19:49:34 migration active, transferred 52.5 GiB of 50.0 GiB VM-state, 309.4 MiB/s
2025-04-24 19:49:34 xbzrle: send updates to 53413 pages in 114.6 MiB encoded memory, cache-miss 98.87%, overflow 4347
2025-04-24 19:49:35 migration active, transferred 52.7 GiB of 50.0 GiB VM-state, 376.0 MiB/s
2025-04-24 19:49:35 xbzrle: send updates to 141473 pages in 279.7 MiB encoded memory, cache-miss 98.87%, overflow 11287
2025-04-24 19:49:36 migration active, transferred 53.0 GiB of 50.0 GiB VM-state, 365.8 MiB/s
2025-04-24 19:49:36 xbzrle: send updates to 202843 pages in 405.5 MiB encoded memory, cache-miss 98.87%, overflow 17005
2025-04-24 19:49:37 migration active, transferred 53.4 GiB of 50.0 GiB VM-state, 715.1 MiB/s
2025-04-24 19:49:37 xbzrle: send updates to 288209 pages in 451.1 MiB encoded memory, cache-miss 98.87%, overflow 19434
2025-04-24 19:49:38 migration active, transferred 53.7 GiB of 50.0 GiB VM-state, 397.6 MiB/s
2025-04-24 19:49:38 xbzrle: send updates to 318708 pages in 514.2 MiB encoded memory, cache-miss 98.87%, overflow 23483
2025-04-24 19:49:39 migration active, transferred 54.0 GiB of 50.0 GiB VM-state, 372.3 MiB/s
2025-04-24 19:49:39 xbzrle: send updates to 367155 pages in 570.5 MiB encoded memory, cache-miss 50.16%, overflow 26584
2025-04-24 19:49:40 migration active, transferred 54.3 GiB of 50.0 GiB VM-state, 472.9 MiB/s
2025-04-24 19:49:40 xbzrle: send updates to 464339 pages in 638.7 MiB encoded memory, cache-miss 50.16%, overflow 29037
2025-04-24 19:49:42 migration active, transferred 54.7 GiB of 50.0 GiB VM-state, 374.9 MiB/s
2025-04-24 19:49:42 xbzrle: send updates to 539857 pages in 691.8 MiB encoded memory, cache-miss 50.16%, overflow 33014
2025-04-24 19:49:43 migration active, transferred 54.9 GiB of 50.0 GiB VM-state, 979.7 MiB/s
2025-04-24 19:49:43 xbzrle: send updates to 657579 pages in 767.3 MiB encoded memory, cache-miss 47.82%, overflow 35364
2025-04-24 19:49:44 migration active, transferred 55.2 GiB of 50.0 GiB VM-state, 477.5 MiB/s
2025-04-24 19:49:44 xbzrle: send updates to 754413 pages in 864.4 MiB encoded memory, cache-miss 37.55%, overflow 39128
2025-04-24 19:49:45 auto-increased downtime to continue migration: 200 ms
2025-04-24 19:49:45 migration active, transferred 55.4 GiB of 50.0 GiB VM-state, 427.2 MiB/s
2025-04-24 19:49:45 xbzrle: send updates to 885805 pages in 954.2 MiB encoded memory, cache-miss 24.63%, overflow 43453
2025-04-24 19:49:46 migration active, transferred 55.6 GiB of 50.0 GiB VM-state, 542.4 MiB/s
2025-04-24 19:49:46 xbzrle: send updates to 1025560 pages in 1.0 GiB encoded memory, cache-miss 21.88%, overflow 48909
2025-04-24 19:49:47 migration active, transferred 55.8 GiB of 50.0 GiB VM-state, 588.1 MiB/s, VM dirties lots of memory: 600.0 MiB/s
2025-04-24 19:49:47 xbzrle: send updates to 1202711 pages in 1.1 GiB encoded memory, cache-miss 18.01%, overflow 53232
2025-04-24 19:49:47 auto-increased downtime to continue migration: 400 ms
2025-04-24 19:49:48 migration active, transferred 55.9 GiB of 50.0 GiB VM-state, 770.2 MiB/s
2025-04-24 19:49:48 xbzrle: send updates to 1402794 pages in 1.1 GiB encoded memory, cache-miss 8.87%, overflow 58743
2025-04-24 19:49:49 average migration speed: 299.5 MiB/s - downtime 325 ms
2025-04-24 19:49:49 migration status: completed
all 'mirror' jobs are ready
drive-scsi0: Completing block job_id...
drive-scsi0: Completed successfully.
drive-scsi2: Completing block job_id...
drive-scsi2: Completed successfully.
drive-scsi3: Completing block job_id...
drive-scsi3: Completed successfully.
drive-scsi0: mirror-job finished
drive-scsi2: mirror-job finished
drive-scsi3: mirror-job finished
2025-04-24 19:49:50 # /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=pve2' root@10.10.10.117 pvesr set-state 113 \''{"local/pve1":{"storeid_list":["storage"],"last_try":1745516794,"fail_count":0,"last_sync":1745516794,"duration":15.936542,"last_iteration":1745516794,"last_node":"pve1"},"local/pve3":{"last_iteration":1745510407,"last_node":"pve1","last_try":1745510431,"fail_count":0,"last_sync":1745510431,"duration":44.758007,"storeid_list":["storage"]}}'\'
2025-04-24 19:49:53 stopping NBD storage migration server on target.
2025-04-24 19:50:13 migration finished successfully (duration 00:03:41)
TASK OK
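Since the errors only surface hours after the migration, when reproducing this I leave a watch running inside the VM, something like:
Code:
# inside the VM: follow the kernel log for filesystem errors
journalctl -kf | grep -iE 'ext4|journal|i/o error'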
Version info:
Code:
proxmox-ve: 8.3.0 (running kernel: 6.8.12-9-pve)
pve-manager: 8.3.5 (running version: 8.3.5/dac3aa88bac3f300)
proxmox-kernel-helper: 8.1.1
pve-kernel-5.15: 7.4-9
proxmox-kernel-6.8: 6.8.12-9
proxmox-kernel-6.8.12-9-pve-signed: 6.8.12-9
proxmox-kernel-6.5.13-6-pve-signed: 6.5.13-6
proxmox-kernel-6.5: 6.5.13-6
proxmox-kernel-6.5.11-7-pve-signed: 6.5.11-7
pve-kernel-5.4: 6.4-20
pve-kernel-5.15.131-2-pve: 5.15.131-3
pve-kernel-5.15.107-2-pve: 5.15.107-2
pve-kernel-5.4.203-1-pve: 5.4.203-1
pve-kernel-4.13.13-5-pve: 4.13.13-38
pve-kernel-4.13.13-2-pve: 4.13.13-33
ceph-fuse: 16.2.15+ds-0+deb12u1
corosync: 3.1.9-pve1
criu: 3.17.1-2+deb12u1
glusterfs-client: 10.3-5
ifupdown: residual config
ifupdown2: 3.2.0-1+pmx11
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-5
libknet1: 1.30-pve1
libproxmox-acme-perl: 1.6.0
libproxmox-backup-qemu0: 1.5.1
libproxmox-rs-perl: 0.3.5
libpve-access-control: 8.2.1
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.1.0
libpve-cluster-perl: 8.1.0
libpve-common-perl: 8.3.0
libpve-guest-common-perl: 5.2.0
libpve-http-server-perl: 5.2.0
libpve-network-perl: 0.10.1
libpve-rs-perl: 0.9.3
libpve-storage-perl: 8.3.5
libqb0: 1.0.5-1
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.6.0-2
openvswitch-switch: 3.1.0-2+deb12u1
proxmox-backup-client: 3.3.7-1
proxmox-backup-file-restore: 3.3.7-1
proxmox-firewall: 0.6.0
proxmox-kernel-helper: 8.1.1
proxmox-mail-forward: 0.3.1
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.7
proxmox-widget-toolkit: 4.3.8
pve-cluster: 8.1.0
pve-container: 5.2.5
pve-docs: 8.3.1
pve-edk2-firmware: 4.2025.02-3
pve-esxi-import-tools: 0.7.2
pve-firewall: 5.1.0
pve-firmware: 3.15-3
pve-ha-manager: 4.0.6
pve-i18n: 3.4.1
pve-qemu-kvm: 9.2.0-5
pve-xtermjs: 5.5.0-1
pve-zsync: 2.3.1
qemu-server: 8.3.10
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.7-pve2