Move disk on live VM causes cluster node to reboot

Binary Bandit

Hello all,

I'm trying to figure out why using Move disk on a live VM causes one of our cluster nodes to reboot. We want to move a running VM's disk from Ceph to LVM storage, so that we can then live migrate the VM to a node that isn't attached to Ceph.

When we do this, things start off on the right track:
Code:
moving disk with snapshots, snapshots will not be moved!
create full clone of drive scsi0 (pool1:vm-100-disk-0)
  Logical volume "vm-100-disk-0" created.
drive mirror is starting for drive-scsi0

Unfortunately, a few minutes into the move, the node reboots.

Here's what the logs look like during the migration attempt:
Code:
Nov  3 08:18:00 ait9 systemd[1]: Starting Proxmox VE replication runner...
Nov  3 08:18:01 ait9 systemd[1]: pvesr.service: Succeeded.
Nov  3 08:18:01 ait9 systemd[1]: Finished Proxmox VE replication runner.
Nov  3 08:18:01 ait9 systemd[1]: pvesr.service: Consumed 1.121s CPU time.
Nov  3 08:18:05 ait9 pmxcfs[3177]: [status] notice: received log
Nov  3 08:18:07 ait9 pmxcfs[3177]: [status] notice: received log
Nov  3 08:18:11 ait9 pmxcfs[3177]: [status] notice: received log
Nov  3 08:18:12 ait9 systemd[1]: Created slice User Slice of UID 0.
Nov  3 08:18:12 ait9 systemd[1]: Starting User Runtime Directory /run/user/0...
Nov  3 08:18:12 ait9 systemd[1]: Finished User Runtime Directory /run/user/0.
Nov  3 08:18:12 ait9 systemd[1]: Starting User Manager for UID 0...
Nov  3 08:18:12 ait9 systemd[447417]: gpgconf: error running '/usr/lib/gnupg/scdaemon': probably not installed
Nov  3 08:18:12 ait9 systemd[447412]: Queued start job for default target Main User Target.
Nov  3 08:18:12 ait9 systemd[447412]: Created slice User Application Slice.
Nov  3 08:18:12 ait9 systemd[447412]: Reached target Paths.
Nov  3 08:18:12 ait9 systemd[447412]: Reached target Timers.
Nov  3 08:18:12 ait9 systemd[447412]: Listening on GnuPG network certificate management daemon.
Nov  3 08:18:12 ait9 systemd[447412]: Listening on GnuPG cryptographic agent and passphrase cache (access for web browsers).
Nov  3 08:18:12 ait9 systemd[447412]: Listening on GnuPG cryptographic agent and passphrase cache (restricted).
Nov  3 08:18:12 ait9 systemd[447412]: Listening on GnuPG cryptographic agent (ssh-agent emulation).
Nov  3 08:18:12 ait9 systemd[447412]: Listening on GnuPG cryptographic agent and passphrase cache.
Nov  3 08:18:12 ait9 systemd[447412]: Reached target Sockets.
Nov  3 08:18:12 ait9 systemd[447412]: Reached target Basic System.
Nov  3 08:18:12 ait9 systemd[447412]: Reached target Main User Target.
Nov  3 08:18:12 ait9 systemd[447412]: Startup finished in 262ms.
Nov  3 08:18:12 ait9 systemd[1]: Started User Manager for UID 0.
Nov  3 08:18:12 ait9 systemd[1]: Started Session 32 of user root.
Nov  3 08:18:12 ait9 systemd[1]: session-32.scope: Succeeded.
Nov  3 08:18:12 ait9 systemd[1]: Started Session 34 of user root.
Nov  3 08:18:14 ait9 qm[447443]: <root@pam> starting task UPID:ait9:0006D3D4:007AE0E8:6182A836:qmstart:100:root@pam:
Nov  3 08:18:14 ait9 qm[447444]: start VM 100: UPID:ait9:0006D3D4:007AE0E8:6182A836:qmstart:100:root@pam:
Nov  3 08:18:14 ait9 systemd[1]: Created slice qemu.slice.
Nov  3 08:18:14 ait9 systemd[1]: Started 100.scope.
Nov  3 08:18:14 ait9 systemd-udevd[447485]: Using default interface naming scheme 'v247'.
Nov  3 08:18:14 ait9 systemd-udevd[447485]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
Nov  3 08:18:15 ait9 kernel: [80530.496406] device tap100i0 entered promiscuous mode
Nov  3 08:18:15 ait9 systemd-udevd[447484]: Using default interface naming scheme 'v247'.
Nov  3 08:18:15 ait9 systemd-udevd[447484]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
Nov  3 08:18:15 ait9 systemd-udevd[447485]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
Nov  3 08:18:15 ait9 systemd-udevd[447484]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
Nov  3 08:18:15 ait9 kernel: [80530.575291] fwbr100i0: port 1(fwln100i0) entered blocking state
Nov  3 08:18:15 ait9 kernel: [80530.575300] fwbr100i0: port 1(fwln100i0) entered disabled state
Nov  3 08:18:15 ait9 kernel: [80530.575422] device fwln100i0 entered promiscuous mode
Nov  3 08:18:15 ait9 kernel: [80530.575521] fwbr100i0: port 1(fwln100i0) entered blocking state
Nov  3 08:18:15 ait9 kernel: [80530.575525] fwbr100i0: port 1(fwln100i0) entered forwarding state
Nov  3 08:18:15 ait9 kernel: [80530.583706] vmbr2: port 2(fwpr100p0) entered blocking state
Nov  3 08:18:15 ait9 kernel: [80530.583714] vmbr2: port 2(fwpr100p0) entered disabled state
Nov  3 08:18:15 ait9 kernel: [80530.583839] device fwpr100p0 entered promiscuous mode
Nov  3 08:18:15 ait9 kernel: [80530.583923] vmbr2: port 2(fwpr100p0) entered blocking state
Nov  3 08:18:15 ait9 kernel: [80530.583927] vmbr2: port 2(fwpr100p0) entered forwarding state
Nov  3 08:18:15 ait9 kernel: [80530.592069] fwbr100i0: port 2(tap100i0) entered blocking state
Nov  3 08:18:15 ait9 kernel: [80530.592077] fwbr100i0: port 2(tap100i0) entered disabled state
Nov  3 08:18:15 ait9 kernel: [80530.592246] fwbr100i0: port 2(tap100i0) entered blocking state
Nov  3 08:18:15 ait9 kernel: [80530.592251] fwbr100i0: port 2(tap100i0) entered forwarding state
Nov  3 08:18:15 ait9 qm[447443]: <root@pam> end task UPID:ait9:0006D3D4:007AE0E8:6182A836:qmstart:100:root@pam: OK
Nov  3 08:18:15 ait9 systemd[1]: session-34.scope: Succeeded.
Nov  3 08:18:15 ait9 systemd[1]: session-34.scope: Consumed 1.396s CPU time.
Nov  3 08:18:16 ait9 systemd[1]: Started Session 35 of user root.
Nov  3 08:18:43 ait9 systemd[1]: session-35.scope: Succeeded.
Nov  3 08:18:43 ait9 systemd[1]: session-35.scope: Consumed 21.635s CPU time.
Nov  3 08:18:45 ait9 systemd[1]: Started Session 36 of user root.
Nov  3 08:18:45 ait9 systemd[1]: session-36.scope: Succeeded.
Nov  3 08:18:46 ait9 systemd[1]: Started Session 37 of user root.
Nov  3 08:18:47 ait9 systemd[1]: session-37.scope: Succeeded.
Nov  3 08:18:47 ait9 systemd[1]: session-37.scope: Consumed 1.502s CPU time.
Nov  3 08:18:47 ait9 pmxcfs[3177]: [status] notice: received log
Nov  3 08:18:58 ait9 systemd[1]: Stopping User Manager for UID 0...
Nov  3 08:18:58 ait9 systemd[447412]: Stopped target Main User Target.
Nov  3 08:18:58 ait9 systemd[447412]: Stopped target Basic System.
Nov  3 08:18:58 ait9 systemd[447412]: Stopped target Paths.
Nov  3 08:18:58 ait9 systemd[447412]: Stopped target Sockets.
Nov  3 08:18:58 ait9 systemd[447412]: Stopped target Timers.
Nov  3 08:18:58 ait9 systemd[447412]: dirmngr.socket: Succeeded.
Nov  3 08:18:58 ait9 systemd[447412]: Closed GnuPG network certificate management daemon.
Nov  3 08:18:58 ait9 systemd[447412]: gpg-agent-browser.socket: Succeeded.
Nov  3 08:18:58 ait9 systemd[447412]: Closed GnuPG cryptographic agent and passphrase cache (access for web browsers).
Nov  3 08:18:58 ait9 systemd[447412]: gpg-agent-extra.socket: Succeeded.
Nov  3 08:18:58 ait9 systemd[447412]: Closed GnuPG cryptographic agent and passphrase cache (restricted).
Nov  3 08:18:58 ait9 systemd[447412]: gpg-agent-ssh.socket: Succeeded.
Nov  3 08:18:58 ait9 systemd[447412]: Closed GnuPG cryptographic agent (ssh-agent emulation).
Nov  3 08:18:58 ait9 systemd[447412]: gpg-agent.socket: Succeeded.
Nov  3 08:18:58 ait9 systemd[447412]: Closed GnuPG cryptographic agent and passphrase cache.
Nov  3 08:18:58 ait9 systemd[447412]: Removed slice User Application Slice.
Nov  3 08:18:58 ait9 systemd[447412]: Reached target Shutdown.
Nov  3 08:18:58 ait9 systemd[447412]: systemd-exit.service: Succeeded.
Nov  3 08:18:58 ait9 systemd[447412]: Finished Exit the Session.
Nov  3 08:18:58 ait9 systemd[447412]: Reached target Exit the Session.
Nov  3 08:18:58 ait9 systemd[1]: user@0.service: Succeeded.
Nov  3 08:18:58 ait9 systemd[1]: Stopped User Manager for UID 0.
Nov  3 08:18:58 ait9 systemd[1]: Stopping User Runtime Directory /run/user/0...
Nov  3 08:18:58 ait9 systemd[1]: run-user-0.mount: Succeeded.
Nov  3 08:18:58 ait9 systemd[1]: user-runtime-dir@0.service: Succeeded.
Nov  3 08:18:58 ait9 systemd[1]: Stopped User Runtime Directory /run/user/0.
Nov  3 08:18:58 ait9 systemd[1]: Removed slice User Slice of UID 0.
Nov  3 08:18:58 ait9 systemd[1]: user-0.slice: Consumed 24.892s CPU time.
Nov  3 08:19:00 ait9 systemd[1]: Starting Proxmox VE replication runner...
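
In case it helps anyone reading the log above: most of it is systemd user-session and GnuPG socket noise. Below is a minimal Python sketch for filtering a syslog excerpt like this one down to a handful of processes. The names `LINE_RE`, `KEEP`, and `filter_log` are hypothetical, and the set of processes to keep is only an assumption about what might be worth looking at, not a confirmed cause of the reboot.

```python
import re

# Matches the classic syslog prefix seen in the excerpt, e.g.
# "Nov  3 08:18:14 ait9 qm[447443]: message" or "... ait9 kernel: message".
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s[\d:]{8})\s"   # timestamp: "Nov  3 08:18:14"
    r"(?P<host>\S+)\s"                     # hostname: "ait9"
    r"(?P<proc>[^\[:\s]+)(?:\[\d+\])?:\s"  # process name with optional [pid]
    r"(?P<msg>.*)$"                        # rest of the line
)

# Assumption: processes that may be worth reading first on a cluster node.
# Adjust this set freely; it is a guess, not a diagnosis.
KEEP = {"kernel", "qm", "pmxcfs", "corosync", "watchdog-mux",
        "pve-ha-lrm", "pve-ha-crm"}


def filter_log(lines, keep=KEEP):
    """Return only the lines whose process name is in `keep`."""
    kept = []
    for line in lines:
        match = LINE_RE.match(line)
        if match and match.group("proc") in keep:
            kept.append(line)
    return kept
```

To use it, read the syslog into a list of lines, pass the list to `filter_log`, and print the result. Lines that don't match the syslog prefix are dropped.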

I'd appreciate any insights into this. It's the first time we've tried this sort of migration, so there may be a better way to do it, or this could be a misconfiguration or a bug.

It's worth noting that the move works if we shut the VM down first instead of doing it live.

best,

James
 

Attachments

  • 1635954913938.png
Hi All,

To keep to our timeline, we're going to back up and restore from shared storage ... I'm not planning on troubleshooting this any further. Just an FYI for anyone trying to help.

best,
James
 
