VM freezes during storage migration (Ceph)

Sourcenux

Member
Feb 3, 2020
Hello everyone,

we run a Proxmox cluster consisting of several hosts, each with an Epyc 7702P + 512GB RAM + 2x 960GB SSDs (boot) + 4x 3.84TB DC SSDs (storage/Ceph).

On three of the hosts I have now installed Ceph (via the GUI installer). The hosts are redundantly connected to each other through a 25Gb switch, and the 3.84TB SSDs are used as OSDs for Ceph. The Ceph hosts run a current PVE 6.4-8. The hosts that were already in use previously ran 6.3-6; they were updated as part of the Ceph installation and now also run 6.4-8. So far, one of the non-Ceph hosts provides the storage via NFS (on a ZFS RAID10). Most of the VM images currently reside on that NFS storage.

If you start a storage migration from the local SSD or from NFS to Ceph (RBD) on one of the updated hosts, the VM freezes and can only be killed (qm stop goes all the way down to 'using SIGKILL'). The migration log shows the following messages:
create full clone of drive scsi0 (remote-nvme:164/vm-164-disk-0.qcow2)
drive mirror is starting for drive-scsi0
drive-scsi0: Cancelling block job

Once the VM has been killed and restarted, the same operation works without problems. The problem does not occur when migrating from local SSD to NFS. The host holding the affected VMs has not been rebooted since the update to 6.4 and the installation of Ceph (the VMs have to be migrated away first). Does anyone know this problem? VMs that already reside on NFS can be migrated to another host without problems and afterwards also moved to Ceph (presumably because the QEMU process is restarted during the migration?).
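To check the guess that the affected VMs are still running a QEMU process started before the update, one could compare the version of the running process with the installed package. A minimal sketch, assuming that `qm status <vmid> --verbose` prints a `running-qemu:` line (the exact output format is an assumption here) and that `pveversion -v` prints `pve-qemu-kvm:` as shown in this post. The helper names are made up for illustration:

```python
import re
import subprocess
from typing import Optional


def running_qemu_version(status_text: str) -> Optional[str]:
    """Version of the running QEMU process, taken from `qm status <vmid> --verbose` output.

    Assumes a line of the form `running-qemu: 5.1.0`.
    """
    match = re.search(r"^running-qemu:\s*(\S+)", status_text, re.MULTILINE)
    return match.group(1) if match else None


def installed_qemu_version(pveversion_text: str) -> Optional[str]:
    """Upstream version of the installed pve-qemu-kvm package (package revision stripped)."""
    match = re.search(r"^pve-qemu-kvm:\s*(\d+(?:\.\d+)*)", pveversion_text, re.MULTILINE)
    return match.group(1) if match else None


def is_stale(status_text: str, pveversion_text: str) -> bool:
    """True if both versions are known and the running process differs from the installed package."""
    running = running_qemu_version(status_text)
    installed = installed_qemu_version(pveversion_text)
    return running is not None and installed is not None and running != installed


def check_vm(vmid: int) -> bool:
    """Run both commands on the PVE host and compare (hypothetical helper, needs root on the host)."""
    status = subprocess.run(["qm", "status", str(vmid), "--verbose"],
                            capture_output=True, text=True, check=True).stdout
    versions = subprocess.run(["pveversion", "-v"],
                              capture_output=True, text=True, check=True).stdout
    return is_stale(status, versions)
```

Note that this only compares the upstream version number. A process started from an older package revision of the same QEMU version would not be detected, and a mismatch by itself does not prove that it causes the freeze.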

When the hanging Move Disk task is stopped via the GUI, the following is logged:

Code:
VM 164 qmp command 'query-block-jobs' failed - interrupted by signal

Removing image: 1% complete...
Removing image: 2% complete...
Removing image: 3% complete...
Removing image: 4% complete...
Removing image: 5% complete...
Removing image: 6% complete...
Removing image: 7% complete...
Removing image: 8% complete...
Removing image: 9% complete...
Removing image: 10% complete...
Removing image: 11% complete...
Removing image: 12% complete...
Removing image: 13% complete...
Removing image: 14% complete...
Removing image: 15% complete...
Removing image: 16% complete...
Removing image: 17% complete...
Removing image: 18% complete...
Removing image: 19% complete...
Removing image: 20% complete...
Removing image: 21% complete...
Removing image: 22% complete...
Removing image: 23% complete...
Removing image: 24% complete...
Removing image: 25% complete...
Removing image: 26% complete...
Removing image: 27% complete...
Removing image: 28% complete...
Removing image: 29% complete...
Removing image: 30% complete...
Removing image: 31% complete...
Removing image: 32% complete...
Removing image: 33% complete...
Removing image: 34% complete...
Removing image: 35% complete...
Removing image: 36% complete...
Removing image: 37% complete...
Removing image: 38% complete...
Removing image: 39% complete...
Removing image: 40% complete...
Removing image: 41% complete...
Removing image: 42% complete...
Removing image: 43% complete...
Removing image: 44% complete...
Removing image: 45% complete...
Removing image: 46% complete...
Removing image: 47% complete...
Removing image: 48% complete...
Removing image: 49% complete...
Removing image: 50% complete...
Removing image: 51% complete...
Removing image: 52% complete...
Removing image: 53% complete...
Removing image: 54% complete...
Removing image: 55% complete...
Removing image: 56% complete...
Removing image: 57% complete...
Removing image: 58% complete...
Removing image: 59% complete...
Removing image: 60% complete...
Removing image: 61% complete...
Removing image: 62% complete...
Removing image: 63% complete...
Removing image: 64% complete...
Removing image: 65% complete...
Removing image: 66% complete...
Removing image: 67% complete...
Removing image: 68% complete...
Removing image: 69% complete...
Removing image: 70% complete...
Removing image: 71% complete...
Removing image: 72% complete...
Removing image: 73% complete...
Removing image: 74% complete...
Removing image: 75% complete...
Removing image: 76% complete...
Removing image: 77% complete...
Removing image: 78% complete...
Removing image: 79% complete...
Removing image: 80% complete...
Removing image: 81% complete...
Removing image: 82% complete...
Removing image: 83% complete...
Removing image: 84% complete...
Removing image: 85% complete...
Removing image: 86% complete...
Removing image: 87% complete...
Removing image: 88% complete...
Removing image: 89% complete...
Removing image: 90% complete...
Removing image: 91% complete...
Removing image: 92% complete...
Removing image: 93% complete...
Removing image: 94% complete...
Removing image: 95% complete...
Removing image: 96% complete...
Removing image: 97% complete...
Removing image: 98% complete...
Removing image: 99% complete...
Removing image: 100% complete...done.
TASK ERROR: storage migration failed: mirroring error: VM 164 qmp command 'drive-mirror' failed - got timeout

A reset triggered via the GUI at this point results in the following message:

Code:
TASK ERROR: VM 164 qmp command 'system_reset' failed - unable to connect to VM 164 qmp socket - timeout after 31 retries

The VM is not reset in the process.

Only a stop works:

Code:
VM quit/powerdown failed - terminating now with SIGTERM
VM still running - terminating now with SIGKILL
TASK OK

pveversion -v:

Code:
proxmox-ve: 6.4-1 (running kernel: 5.4.106-1-pve)
pve-manager: 6.4-8 (running version: 6.4-8/185e14db)
pve-kernel-5.4: 6.4-3
pve-kernel-helper: 6.4-3
pve-kernel-5.4.119-1-pve: 5.4.119-1
pve-kernel-5.4.106-1-pve: 5.4.106-1
pve-kernel-5.4.73-1-pve: 5.4.73-1
ceph: 15.2.13-pve1~bpo10
ceph-fuse: 15.2.13-pve1~bpo10
corosync: 3.1.2-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: residual config
ifupdown2: 3.0.0-1+pve3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.20-pve1
libproxmox-acme-perl: 1.1.0
libproxmox-backup-qemu0: 1.0.3-1
libpve-access-control: 6.4-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.4-3
libpve-guest-common-perl: 3.1-5
libpve-http-server-perl: 3.2-3
libpve-storage-perl: 6.4-1
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.6-2
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.1.10-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.5-6
pve-cluster: 6.4-1
pve-container: 3.3-5
pve-docs: 6.4-2
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-4
pve-firmware: 3.2-4
pve-ha-manager: 3.1-1
pve-i18n: 2.3-1
pve-qemu-kvm: 5.2.0-6
pve-xtermjs: 4.7.0-3
qemu-server: 6.4-2
smartmontools: 7.2-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 2.0.4-pve1

qm config 164 (after reboot & migration to Ceph):
Code:
agent: 1
boot: order=scsi0;ide2;net0
cores: 1
ide2: iso:iso/debian-10.6.0-amd64-netinst.iso,media=cdrom,size=349M
machine: q35
memory: 2048
name: dev-xx
net0: virtio=5E:88:63:FF:XX:XX,bridge=vmbr0,firewall=1
numa: 0
ostype: l26
scsi0: remote-ceph:vm-164-disk-0,size=40G,ssd=1
scsihw: virtio-scsi-pci
smbios1: uuid=76554c99-f50b-4fa8-b753-XXXX
sockets: 1
vmgenid: 35090c50-1d9e-4493-87a0-XXXX

Similar VM before the Ceph migration:

Code:
agent: 1
boot: order=scsi0;ide2;net0
cores: 1
ide2: iso:iso/debian-10.6.0-amd64-netinst.iso,media=cdrom,size=349M
machine: q35
memory: 2048
name: dev-as
net0: virtio=2A:BD:BC:8E:XX:XX,bridge=vmbr0,firewall=1
numa: 0
ostype: l26
scsi0: local-nvme:vm-163-disk-0,format=raw,size=40G,ssd=1
scsihw: virtio-scsi-pci
smbios1: uuid=345ba9f8-9ee7-4710-9411-XXXX
sockets: 1
vmgenid: 7b14bb54-be00-4412-868e-XXXX
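For comparing two such configs without reading them line by line, a small sketch that parses `qm config` output into a dict and reports only the keys that differ. The function names and the default list of ignored per-VM keys are my own choice, not anything Proxmox provides:

```python
from typing import Dict, Iterable, Optional, Tuple


def parse_qm_config(text: str) -> Dict[str, str]:
    """Parse `key: value` lines as printed by `qm config <vmid>`."""
    config = {}
    for line in text.splitlines():
        # Split at the first colon only; values such as scsi0 contain further colons.
        key, sep, value = line.partition(":")
        if sep:
            config[key.strip()] = value.strip()
    return config


def diff_configs(
    a: Dict[str, str],
    b: Dict[str, str],
    ignore: Iterable[str] = ("name", "net0", "smbios1", "vmgenid"),
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {key: (value_in_a, value_in_b)} for keys whose values differ.

    Keys listed in `ignore` are unique per VM and are skipped by default.
    """
    skipped = set(ignore)
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if key not in skipped and a.get(key) != b.get(key)
    }
```

Applied to the two configs above, the only remaining difference should be the `scsi0` line (RBD volume on `remote-ceph` versus a raw image on `local-nvme`).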