3.4 Ceph snapshots hang VM

brad.lanham

Active Member
Nov 26, 2015
Hi,

Firstly thanks for an excellent product!

I have a 3-node cluster that utilises Ceph. Up until recently (say a month ago) snapshots were working well. Now, however, each time I take a snapshot the VM hangs. To be more precise, from the VM's point of view it appears that the HDD gets ejected.

To recover, a forced stop and start is needed. I should note that the snapshots themselves remain valid.
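For reference, the recovery cycle looks like this (a minimal sketch; VM ID 100 is just an example, substitute your own):

```shell
# Hard-stop the hung VM (a clean shutdown no longer responds), then start it again
qm stop 100
qm start 100
```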

An upgrade to version 4.0 is not an option just at the moment, but is desired later.

I am also using Ceph's own wheezy Hammer repository. Could this be the issue?

Some PVE version details:

Code:
# pveversion -v
proxmox-ve-2.6.32: not correctly installed (running kernel: 3.10.0-12-pve)
pve-manager: 3.4-11 (running version: 3.4-11/6502936f)
pve-kernel-3.10.0-5-pve: 3.10.0-19
pve-kernel-3.10.0-12-pve: 3.10.0-37
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.7-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.10-3
pve-cluster: 3.0-19
qemu-server: 3.4-6
pve-firmware: 1.1-4
libpve-common-perl: 3.0-24
libpve-access-control: 3.0-16
libpve-storage-perl: 3.0-34
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-8
vzctl: 4.0-1pve6
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 2.2-13
ksm-control-daemon: 1.1-1
glusterfs-client: 3.5.2-1

Has anyone else experienced similar behaviour? Nothing appears in any of the host node logs.

Cheers,
Brad.
 

Hi,

There is a bug in the current Ceph Hammer release: it occurs if you take a snapshot while the disk cache is not set to writeback (rbd_cache=false).
Is that your case?

My Ceph bug report is here:
http://tracker.ceph.com/issues/13726

It is already fixed in master, but not yet backported to Hammer.

(Setting cache=writeback works around the problem.)


Edit: it will be fixed in Hammer 0.94.6.
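As a minimal sketch of the workaround, assuming the VM is ID 100 with its disk as virtio0 on an RBD storage named ceph-rbd (all example names; keep your own volume spec from the config):

```shell
# Show the current disk line for the VM
grep virtio0 /etc/pve/qemu-server/100.conf

# Re-set the same disk with cache=writeback added
qm set 100 --virtio0 ceph-rbd:vm-100-disk-1,cache=writeback
```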
 
Hi,

Yes, that does appear to be the cause. I changed the cache setting to 'writeback' and the snapshots now work without crashing the VM.

Thank you!

Cheers,
Brad.
 

But you should know that using "writeback" is dangerous if you have critical information: the write cache lives in RAM, and data is not written to the HDDs (or to the RAID controller) immediately, so if you do a hard reset or something similar, you are sure to lose data.


Hi Spirit

I am interested to know whether, when Ceph (or PVE) takes a Ceph snapshot, the write cache is flushed to the HDDs (or to the RAID controller) while the snapshot is in progress. Otherwise the snapshot will be inconsistent, which is dangerous if the information is critical (this flush is standard in backups with Veeam Backup for VMware or Hyper-V).

Best regards
Cesar
 

If you install the QEMU guest agent and enable it in the VM config (agent: 1), it will flush the data and freeze the filesystem before taking the snapshot.

(The QEMU guest agent option will be available in the GUI soon.)
 

Oh, that's good :-) ... that is how Veeam Backup works.

Please let me ask some questions about it:

1) Is the agent stable enough for use in a production environment?
2) On which Windows versions does the agent work? (I guess since Win2k.)
3) If the agent is not installed, I guess the backup will work in "copy-on-write" mode, with the problem that the backup will be inconsistent, right?
4) How much free disk space is needed in the VM and on the host to do a live backup?
5) If the VM runs Linux, how will a live backup avoid losing data not yet written to the disks?

Best regards
Cesar
 
>>1) Is the agent stable enough for use in a production environment?
Yes.

>>2) On which Windows versions does the agent work?
It's available in the virtio-win ISO.

>>3) If the agent is not installed, will the backup be inconsistent?
>>4) How much free disk space does a live backup need in the VM and on the host?
>>5) If the VM runs Linux, how does a live backup avoid losing data not yet written to disk?

I was talking about snapshots, not backups.

For backups, Proxmox doesn't take a snapshot. When a backup starts, it keeps a bitmap of which disk blocks have already been backed up. If a new write arrives while the backup is running and would overwrite a not-yet-backed-up block, the old block is written to the backup storage first, and then the new block replaces it on disk.

I'm not sure about consistency, though: if there are writes pending before the backup starts, they won't be written to the backup.
 

Thanks for the answers... :-)

Just as a comment:
In VMware, before taking a snapshot, the write cache is first flushed to disk.
For backup snapshots, Veeam Backup first flushes the write cache to disk using the agent in VMware Tools and then does the backup.

In conclusion, whether the VM is Linux, Windows, Solaris, etc., it always behaves the same in both cases if VMware Tools is installed, so any kind of snapshot is always consistent.

So I would like PVE to always behave this way. For the moment, as I understand it, if I want a consistent snapshot or a consistent backup, my only option is to configure the VM in "directsync" or "writethrough" mode and also disable the file write cache inside the VM's operating system (as long as the snapshot does not include RAM)... :-(

Best regards
Cesar
 
For snapshots, it's OK with the agent.

For backups, it's not OK yet, but using the QEMU agent there should work too (calling the agent to freeze the filesystem, starting the backup, then unfreezing the filesystem; that should be enough, I think).

I'll try to run tests this week.


Edit: Sorry, I was wrong; the guest agent is already implemented for backups too! So backups are consistent if you have agent: 1.
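On recent PVE versions you can check that the agent is actually reachable before relying on it for consistent snapshots and backups (a sketch; VM ID 100 is an example, and the `qm agent` subcommand may not exist on older releases):

```shell
# Ping the guest agent; fails if the agent is not running inside the guest
qm agent 100 ping && echo "agent reachable"
```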
 
Hi Spirit

Please let me ask a couple of questions:
- From which version of PVE can I configure the agent?
- In a Linux VM, can I configure the agent, and how?

Best regards
 
>>From which version of PVE can I configure the agent?
Proxmox 3.4 already supports it, but not in the GUI (you need to add agent: 1 to the VM conf file).
Proxmox 4.1 has it in the GUI.

>>In a Linux VM, can I configure the agent, and how?
You need to install the QEMU guest agent software (on Debian: apt-get install qemu-guest-agent).

That's all ;)
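Putting those steps together, a minimal sketch (VM ID 100 is an example; on PVE 3.4 the option must go straight into the conf file):

```shell
# On the PVE host: enable the agent for the VM
echo "agent: 1" >> /etc/pve/qemu-server/100.conf

# Inside a Debian guest: install and start the agent
apt-get install qemu-guest-agent
service qemu-guest-agent start
```

The VM needs a full stop/start afterwards so QEMU adds the virtio-serial device the agent talks over.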
 
Many thanks Spirit ... :-)
 
Hi all, thanks for your input!
I can confirm that the very recent Ceph 0.94.7 release does indeed fix this issue.

Cheers,
Brad.
 
