Hi,
We're testing the new 8.3 version on our freshly upgraded test environment, and we're running into problems during VM migrations.
A physical machine is installed with an up-to-date Proxmox 8.3. Three VMs, also running Proxmox 8.3, form a Ceph cluster that provides the shared storage. Three other VMs, also on Proxmox 8.3, form a PVE cluster. This PVE cluster uses the Ceph cluster for RBD shared storage.
For the test, I set up 5 Debian 12 VMs on the PVE cluster, launched a CPU and memory stress on each of them, and live-migrated them between PVE cluster members one at a time. Some of the migrations failed.
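For reference, the test loop looked roughly like the sketch below (the stress-ng parameters are illustrative, not the exact values I used):
Code:
# Inside each Debian 12 guest: sustained CPU and memory load
# (stress-ng parameters here are illustrative)
stress-ng --cpu 2 --vm 2 --vm-bytes 1G --timeout 0

# On a PVE node: live-migrate one VM at a time between cluster members
qm migrate 103 test-pve-03 --online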
Here are the PVE logs of a failed migration:
Code:
2024-11-25 15:34:45 starting migration of VM 103 to node 'test-pve-03' (10.0.0.2)
2024-11-25 15:34:45 starting VM 103 on remote node 'test-pve-03'
2024-11-25 15:34:49 start remote tunnel
2024-11-25 15:34:51 ssh tunnel ver 1
2024-11-25 15:34:51 starting online/live migration on unix:/run/qemu-server/103.migrate
2024-11-25 15:34:51 set migration capabilities
2024-11-25 15:34:51 migration downtime limit: 100 ms
2024-11-25 15:34:51 migration cachesize: 512.0 MiB
2024-11-25 15:34:51 set migration parameters
2024-11-25 15:34:51 start migrate command to unix:/run/qemu-server/103.migrate
2024-11-25 15:34:52 migration active, transferred 111.6 MiB of 3.0 GiB VM-state, 154.3 MiB/s
2024-11-25 15:34:53 migration active, transferred 216.4 MiB of 3.0 GiB VM-state, 562.0 MiB/s
2024-11-25 15:34:54 migration active, transferred 342.5 MiB of 3.0 GiB VM-state, 245.3 MiB/s
2024-11-25 15:34:55 migration active, transferred 452.6 MiB of 3.0 GiB VM-state, 350.1 MiB/s
2024-11-25 15:34:56 migration active, transferred 549.6 MiB of 3.0 GiB VM-state, 477.7 MiB/s
2024-11-25 15:34:58 migration active, transferred 694.6 MiB of 3.0 GiB VM-state, 243.2 MiB/s
query migrate failed: VM 103 not running
2024-11-25 15:34:59 query migrate failed: VM 103 not running
query migrate failed: VM 103 not running
2024-11-25 15:35:00 query migrate failed: VM 103 not running
query migrate failed: VM 103 not running
2024-11-25 15:35:01 query migrate failed: VM 103 not running
query migrate failed: VM 103 not running
2024-11-25 15:35:02 query migrate failed: VM 103 not running
query migrate failed: VM 103 not running
2024-11-25 15:35:03 query migrate failed: VM 103 not running
query migrate failed: VM 103 not running
2024-11-25 15:35:04 query migrate failed: VM 103 not running
2024-11-25 15:35:04 ERROR: online migrate failure - too many query migrate failures - aborting
2024-11-25 15:35:04 aborting phase 2 - cleanup resources
2024-11-25 15:35:04 migrate_cancel
2024-11-25 15:35:04 migrate_cancel error: VM 103 not running
2024-11-25 15:35:04 ERROR: query-status error: VM 103 not running
2024-11-25 15:35:08 ERROR: migration finished with problems (duration 00:00:23)
TASK ERROR: migration problems
Here is what I found in the source PVE server's system logs during the failed migrations. Note that the timestamp of the last segfault (15:34:59) matches the moment the migration above started failing:
Code:
2024-11-25T11:45:50.208525+01:00 test-pve-01 kernel: [ 3128.811536] kvm[17244]: segfault at 41b8 ip 00005afcf1cdbb00 sp 00007ca9043fff38 error 4 in qemu-system-x86_64[5afcf17f8000+6a4000] likely on CPU 1 (core 1, socket 0)
2024-11-25T11:46:14.828141+01:00 test-pve-01 kernel: [ 3153.430456] kvm[17025]: segfault at 41b8 ip 0000637f34093b00 sp 00007b86894fef38 error 4 in qemu-system-x86_64[637f33bb0000+6a4000] likely on CPU 3 (core 3, socket 0)
2024-11-25T13:26:03.043440+01:00 test-pve-01 kernel: [ 480.749017] kvm[2219]: segfault at 41b8 ip 000060ab8de25b00 sp 00007284055d5f38 error 4 in qemu-system-x86_64[60ab8d942000+6a4000] likely on CPU 2 (core 2, socket 0)
2024-11-25T13:26:28.676741+01:00 test-pve-01 kernel: [ 506.380955] kvm[2058]: segfault at 41b8 ip 00005df40925eb00 sp 000070abe59fff38 error 4 in qemu-system-x86_64[5df408d7b000+6a4000] likely on CPU 2 (core 2, socket 0)
2024-11-25T14:15:13.045343+01:00 test-pve-01 kernel: [ 313.101418] kvm[1829]: segfault at 41b8 ip 000060607b764b00 sp 00007b33f9fe1f38 error 4 in qemu-system-x86_64[60607b281000+6a4000] likely on CPU 2 (core 2, socket 0)
2024-11-25T14:39:18.392760+01:00 test-pve-01 kernel: [ 676.009453] kvm[2609]: segfault at 41b8 ip 00005b71bdecbb00 sp 00007222f02ccf38 error 4 in qemu-system-x86_64[5b71bd9e8000+6a4000] likely on CPU 1 (core 1, socket 0)
2024-11-25T15:34:59.208184+01:00 test-pve-01 kernel: [ 1373.633994] kvm[7867]: segfault at 41b8 ip 00005817d7fcfb00 sp 00007979361fff38 error 4 in qemu-system-x86_64[80eb00,5817d7aec000+6a4000] likely on CPU 0 (core 0, socket 0)
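All seven segfaults land at the same offset inside the qemu-system-x86_64 binary: in every line, the instruction pointer minus the mapping base works out to 0x4e3b00, so it looks like a reproducible crash at a single instruction. If it helps, here is a rough sketch of how that offset could be resolved to a QEMU function on the source node, assuming gdb and a matching debug-symbol package for pve-qemu-kvm are installed (the package name in the comment is an assumption):
Code:
# The faulting offset is ip minus the mapping base,
# e.g. for the last crash: 0x5817d7fcfb00 - 0x5817d7aec000 = 0x4e3b00
printf '0x%x\n' $((0x5817d7fcfb00 - 0x5817d7aec000))

# Resolve the offset to a symbol in the (PIE) binary; needs debug symbols,
# e.g. apt install pve-qemu-kvm-dbgsym (package name assumed)
gdb -batch -ex 'info symbol 0x4e3b00' /usr/bin/qemu-system-x86_64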
Thanks for your help.
Fabien