[SOLVED] What's the difference between Clone and Move Disk? fs-freeze gets stuck during scheduled snapshot-mode backups

Spirog

Member
Jan 31, 2022
Hello, I have another drive set up as a directory on Proxmox VE 7.1.
I planned on using this drive as a backup drive. It is an HDD (not SSD).

The main server has 2x 1TB SSDs in RAID.

Here is my version info:
Code:
proxmox-ve: 7.1-1 (running kernel: 5.13.19-6-pve)
pve-manager: 7.1-12 (running version: 7.1-12/b3c09de3)
pve-kernel-helper: 7.1-14
pve-kernel-5.13: 7.1-9
pve-kernel-5.4: 6.4-11
pve-kernel-5.13.19-6-pve: 5.13.19-15
pve-kernel-5.13.19-5-pve: 5.13.19-13
pve-kernel-5.13.19-2-pve: 5.13.19-4
pve-kernel-5.4.157-1-pve: 5.4.157-1
pve-kernel-5.4.73-1-pve: 5.4.73-1
ceph-fuse: 16.2.7
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: 0.8.36+pve1
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve2
libproxmox-acme-perl: 1.4.1
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.1-7
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.1-5
libpve-guest-common-perl: 4.1-1
libpve-http-server-perl: 4.1-1
libpve-storage-perl: 7.1-1
libqb0: 1.0.5-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.11-1
lxcfs: 4.0.11-pve1
novnc-pve: 1.3.0-2
proxmox-backup-client: 2.1.5-1
proxmox-backup-file-restore: 2.1.5-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.4-7
pve-cluster: 7.1-3
pve-container: 4.1-4
pve-docs: 7.1-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.3-6
pve-ha-manager: 3.3-3
pve-i18n: 2.6-2
pve-qemu-kvm: 6.1.1-2
pve-xtermjs: 4.16.0-1
qemu-server: 7.1-4
smartmontools: 7.2-pve2
spiceterm: 3.2-2
swtpm: 0.7.1~bpo11+1
vncterm: 1.7-1
zfsutils-linux: 2.1.4-pve1



I am testing scheduled snapshot backups with a particular VM that has AlmaLinux and cPanel installed.
So I attempted a Move Disk of VM 103 to the backup drive and selected qcow2 instead of raw,

but it got to 93.04% and stalled... IO was 5-12%, bouncing up and down.
It stayed like this for 30 minutes and I finally cancelled the move.

Now I am attempting a Clone to the backup drive, again selecting qcow2 instead of the raw disk format.

(The reason is that my scheduled snapshot backups freeze when the QEMU guest agent is enabled on this particular VM with cPanel.)

So I am trying to test whether, if I Clone or Move and reformat to qcow2, the scheduled backups will work in snapshot mode with the QEMU Guest Agent on.

So my question is: what is the difference between these two options, or are they the same?

Clone?
Move Disk?

Do they do the same thing, or is there a difference?


Thanks for your reply in advance,

Kind Regards,
Spiro
 
Clone creates a clone (copy) of the disk, while `Move Disk` moves the disk to a different storage.
A move will update the VM config so that the disk references the new storage.

So only use `Move Disk` if you actually want to move the whole VM to a different storage.
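
For reference, here is a rough sketch of the CLI equivalents (the storage name `backup-hdd` is just an example, substitute your directory storage's name):

Code:
# Full clone of VM 103 into a new VM ID 203, writing the disk as qcow2 on the backup storage
qm clone 103 203 --full --storage backup-hdd --format qcow2

# Move only the scsi0 disk of VM 103 to the backup storage as qcow2;
# the VM config is updated so scsi0 points at the new storage
qm move_disk 103 scsi0 backup-hdd --format qcow2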

In your case, couldn't you simply create a backup of the VM and select the backup drive as backup storage?
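
A minimal sketch of that, assuming the backup drive is a directory storage named `backup-hdd` with the Backup content type enabled:

Code:
# One-off snapshot-mode backup of VM 103 to the backup storage
vzdump 103 --storage backup-hdd --mode snapshot --compress zstd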
 
In your case, couldn't you simply create a backup of the VM and select the backup drive as backup storage?
I was trying to change VM 103 to qcow2 format, then move it back, to test scheduled backups in snapshot mode and see if it still freezes up and never completes.
EDIT: It finally backed up in snapshot mode via the schedule - BUT it took 5 hours plus from fs-freeze to fs-thaw, then 8 minutes to complete.

So it is getting stuck at fs-freeze, see the image below.

Screenshot 2022-04-12 111448.jpg


For some reason this happens to this VM in snapshot-mode backups. Here is its config:

Code:
agent: 1,fstrim_cloned_disks=1
boot: order=scsi0;ide2;net0
cores: 50
cpu: host
ide2: local:iso/AlmaLinux-8.5-x86_64-minimal.iso,media=cdrom
memory: 92160
meta: creation-qemu=6.1.0,ctime=1643453974
name: Alma-cP
net0: virtio=B6:0C:91:75:75:D3,bridge=vmbr0,firewall=1
numa: 0
onboot: 1
ostype: l26
scsi0: local-lvm:vm-103-disk-0,discard=on,size=500G
scsihw: virtio-scsi-pci
smbios1: uuid=96a1e770-46b1-463b-9eb1-ba5d0abfd6b7
sockets: 1
vmgenid: 8463c3dd-8e40-4b1d-90f1-b2baf8b2359c
#qmdump#map:scsi0:drive-scsi0:local-lvm:raw:

I noticed that this VM shows qemu 6.1.0
meta: creation-qemu=6.1.0,ctime=1643453974




but a newer VM I created uses 6.1.1, shown below:


Code:
agent: 1,fstrim_cloned_disks=1
boot: order=scsi0;ide2;net0
cores: 8
cpu: host
ide2: local:iso/AlmaLinux-8.5-x86_64-minimal.iso,media=cdrom
memory: 12288
meta: creation-qemu=6.1.1,ctime=1644877524
name: Alma-cP-greekvids
net0: virtio=96:64:9D:25:32:81,bridge=vmbr0,firewall=1
numa: 0
onboot: 1
ostype: l26
scsi0: local-lvm:vm-100-disk-0,discard=on,size=96G
scsihw: virtio-scsi-pci
smbios1: uuid=c1380626-e3bb-4fec-abd2-57bd94d48999
sockets: 1
vmgenid: 626d2b9d-570e-4781-ba02-c2be26e1d81c
#qmdump#map:scsi0:drive-scsi0:local-lvm:raw:

@mira, could this be an issue with the different version that is making fs-freeze take so long?
It should not freeze for 5 hours plus when the other VMs only freeze for 2 seconds, then thaw and start the backup... literally.

Is there a way to change that VM to 6.1.1, or do I have to create a new VM with AlmaLinux, install cPanel, and then transfer my accounts over to test it?

thanks so much
This is a production VM, by the way (the server is hosted at a datacenter).
Spiro



Thanks
 
Last edited:
Does it write lots of data that has to be synced down to the disk?
fsfreeze basically does a sync to make sure the disk data is in a consistent state before the backup.
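
If you want to see how long the freeze itself takes, a rough sketch of triggering it manually through the guest agent (using VM 103 as the example; note that writes inside the guest will block while it is frozen):

Code:
# Freeze the guest filesystems via the agent, check the status, then thaw again
qm guest cmd 103 fsfreeze-freeze
qm guest cmd 103 fsfreeze-status
qm guest cmd 103 fsfreeze-thaw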
 
Does it write lots of data that has to be synced down to the disk?
fsfreeze basically does a sync to make sure the disk data is in a consistent state before the backup.
Hello, thanks for the reply.
Answer:
No, it really has no traffic; I only set it up to go live a couple of weeks ago.

So really no traffic at all. It is weird that this happens to this VM.


I had this issue before in PVE prior to the update to 7.1, then I reinstalled 2 VMs a month ago and it seems those VMs work. It takes no time at all for those VMs.

This is the only VM causing the issue so far.
So that is why I thought the QEMU 6.1.0 vs 6.1.1 difference showing in my config above might be the issue? I will now try to create a new VM, install AlmaLinux and then cPanel, re-upload my websites to cPanel from backups, and run the snapshot backup from the scheduler to see if it still takes really long from fs-freeze to fs-thaw. Will report back shortly. I'm typing as I'm recreating the VM :)
Thanks
SPIRO
 
Does it write lots of data that has to be synced down to the disk?
fsfreeze basically does a sync to make sure the disk data is in a consistent state before the backup.
Here I found a bunch of people with this issue. There is a lot more info than I am knowledgeable enough to understand, so maybe if you look at these posts you can see what the issue is. It has been reported to QEMU - maybe you can review it with your team and come up with something to fix this issue. It still happens on my end: as soon as I added 2 of my domains/websites and redid the scheduler for snapshot-mode backups, fs-freeze got stuck.


thanks so much @mira :)
 
If it is related to the guest agent, did you try to disable it in the VM options?
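
For reference, a sketch of toggling that option from the host CLI (VM 103 as the example); the same setting is available in the VM's Options tab in the GUI, and a change only takes effect after a full stop/start of the VM:

Code:
# Disable the QEMU guest agent option for VM 103
qm set 103 --agent 0

# Re-enable it later, keeping the fstrim setting from the original config
qm set 103 --agent 1,fstrim_cloned_disks=1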
 
If it is related to the guest agent, did you try to disable it in the VM options?
Yes... when it is disabled it works, but I was trying to be able to use snapshot mode for backups :( with the QEMU guest agent.

Let me ask, is there a downside to disabling the QEMU guest agent when doing snapshot-mode backups?

Is the backup not as good?
Or could I use suspend or stop mode, and how would that be different from snapshot mode with the QEMU guest agent disabled?

Thanks
Spiro
 
Last edited:
OK, I found a couple of solutions and the reason why this happens when you have cPanel installed... Not a fix, but as I said, there is a bug report open for this issue with QEMU.

Here are 3 solutions to work with.

When using the automated backup feature on a VPS which is running cPanel, you may experience cases where your VPS is stuck in backup status for a long time and may not be accessible. The root cause of this issue is cPanel users using Jailed Shell access, which creates a virtfs on your filesystem.

When you use snapshot mode with cPanel (and schedule automated backups or snapshots), the hypervisor communicates with your VM through the QEMU Guest Agent to freeze the filesystem on the VM before it takes the backup. This mechanism is there to ensure no writes happen to your disk while the backup is running, and therefore ensures the consistency of the backup.

However, if Jailed Shell access is enabled, cPanel creates a virtfs which cannot be frozen in this way. It will therefore lock up and cause a kernel panic as soon as the hypervisor initiates a freeze on the VPS. There are three ways to avoid this, and we explore them below.

  1. Disable QEMU Guest Agent
  2. Do not allow Jailed Shell
  3. Disable /tmp partition security (not recommended by cPanel, but it is an available option)

Requirements

  • cPanel installed on your server

Instructions

Decide which of the 3 options above you wish to proceed with and follow the section of the guide that corresponds to your choice. You only have to do one of the three.

Please choose carefully as they each have their pros and cons.
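
Before picking one, a quick way to check whether jailed-shell virtfs mounts are actually present inside the guest (a sketch; /home/virtfs is where cPanel normally creates these bind mounts):

Code:
# Any output here means virtfs bind mounts exist, which is what fs-freeze gets stuck on
mount | grep virtfs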

Disable QEMU Guest Agent

Firstly, you need to check if the QEMU Guest Agent is running on your server. You can check this with the following command:

Code:
systemctl status qemu-guest-agent

The status of the service is indicated next to “Active:”. If it is active/running, then we need to stop the service and disable it from starting again in the future. You can do this with the following commands:

Code:
systemctl stop qemu-guest-agent
systemctl disable qemu-guest-agent

Switch from Jailed Shell to Normal Shell

You can read about what Jailed and Normal Shell are here.

To disable a jailed shell environment for all new and modified users, you will need to disable the jailshell by default option in WHM’s Tweak Settings interface (WHM >> Home >> Server Configuration >> Tweak Settings).

This option allows you to enable/disable the use of a jailed shell for new accounts and accounts that you subsequently edit in the following interfaces:

  1. WHM’s Modify an Account interface (WHM >> Home >> Account Functions >> Modify An Account).
  2. WHM’s Upgrade/Downgrade an Account interface (WHM >> Home >> Account Functions >> Upgrade/Downgrade An Account).
This option does not affect accounts that already exist on the server but that you have not edited in these interfaces.

To disable a jailed shell environment for a specific user, use WHM’s Manage Shell Access interface (WHM >> Home >> Account Functions >> Manage Shell Access).
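
To double-check which existing accounts still use the jailed shell, one rough way is to look at their login shells (an extra sketch, not part of the guide above; cPanel normally installs the jailed shell as /usr/local/cpanel/bin/jailshell):

Code:
# List accounts whose login shell is cPanel's jailshell
grep '/usr/local/cpanel/bin/jailshell' /etc/passwd | cut -d: -f1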

Disable cPanel /tmp partition security

Please note this is not recommended by cPanel and is done at your own risk. Should you wish to continue with this option, you can read the exact steps on the following cPanel page.


Thought I would share this info above for anyone else facing this, and hopefully the team at QEMU will find a way to avoid this issue with cPanel or CloudLinux when qemu-guest-agent is enabled.

Thanks, Mira.
I just tested it (using no Jailed Shell for the accounts in cPanel): I scheduled a backup via the scheduler in snapshot mode and it worked perfectly, like the other VMs.

So for now the problem is solved (with this workaround).

Thank you for your replies. I appreciate you and the Proxmox team :)

Kind Regards,
Spiro
 
