I have just converted a 2-node cluster to use the Ceph file store. The migration seemed to go well, with all the data written onto Ceph.
When it came time to fire up the VMs, I noticed that they just sit there trying to start.
I created some backups on the CephFS (which worked fine), then tried to debug why the VMs wouldn't start.
I noticed that the Ceph dashboard in Proxmox showed around 10MB/s write speed (about what I expected) and 0B/s read speed.
So I tried to migrate the VMs back out to local disks and found that there is a short burst of read speed (up to 300MB/s) for about 2-3 seconds, then it drops back to 0B/s, where it just sits trying to migrate the VM until I stop it.
There is not a lot of data on the Ceph file store (~700GB / 10% used).
I have tried backups, snapshots, and migrating through the web GUI, all with the same result.
I have also tried rbd export and copying from the command line.
I have also tried to rsync the backups from the CephFS with the same result: there is a burst of data, then it hangs.
Recovery and rebalancing are working between the nodes at around 10-20MB/s.
ceph health is showing 1 slow metadata IO:
Code:
mds.alpha(mds.0): 1 slow metadata IOs are blocked > 30 secs, oldest blocked for 72899 secs
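To put the "oldest blocked" figure in perspective, here is a minimal Python sketch that parses a health line of the form shown above and converts the blocked time into hours, minutes, and seconds. The function name `parse_slow_metadata_io` and the regular expression are my own assumptions for illustration; they are not part of Ceph or Proxmox, and the pattern is only written against the single line quoted in this post.

```python
import re

# Hypothetical helper for illustration only; not part of Ceph or Proxmox.
# The pattern is an assumption based on the one health line quoted above.
SLOW_IO_PATTERN = re.compile(
    r"(?P<mds>mds\.\S+?)\(.*?\): "
    r"(?P<count>\d+) slow metadata IOs are blocked > (?P<threshold>\d+) secs, "
    r"oldest blocked for (?P<oldest>\d+) secs"
)


def parse_slow_metadata_io(line):
    """Return details of a slow-metadata-IO health line, or None if it does not match."""
    match = SLOW_IO_PATTERN.search(line)
    if match is None:
        return None
    oldest_secs = int(match.group("oldest"))
    # Break the blocked time down into hours, minutes, and seconds.
    hours, remainder = divmod(oldest_secs, 3600)
    minutes, seconds = divmod(remainder, 60)
    return {
        "mds": match.group("mds"),
        "count": int(match.group("count")),
        "threshold_secs": int(match.group("threshold")),
        "oldest_secs": oldest_secs,
        "oldest_hms": (hours, minutes, seconds),
    }


if __name__ == "__main__":
    line = (
        "mds.alpha(mds.0): 1 slow metadata IOs are blocked > 30 secs, "
        "oldest blocked for 72899 secs"
    )
    info = parse_slow_metadata_io(line)
    h, m, s = info["oldest_hms"]
    print(f"{info['mds']}: oldest request blocked for {h}h {m}m {s}s")
```

For the line above this works out to 20h 14m 59s, so the oldest metadata request has been stuck for most of a day, not just briefly delayed.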
Hardware setup:
2-node cluster, 3 OSDs per node.
Proxmox versions:
Code:
proxmox-ve: 7.1-1 (running kernel: 5.13.19-2-pve)
pve-manager: 7.1-8 (running version: 7.1-8/5b267f33)
pve-kernel-helper: 7.1-6
pve-kernel-5.13: 7.1-5
pve-kernel-5.13.19-2-pve: 5.13.19-4
ceph: 16.2.7
ceph-fuse: 16.2.7
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve2
libproxmox-acme-perl: 1.4.0
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.1-5
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.0-14
libpve-guest-common-perl: 4.0-3
libpve-http-server-perl: 4.0-4
libpve-storage-perl: 7.0-15
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.11-1
lxcfs: 4.0.11-pve1
novnc-pve: 1.3.0-1
proxmox-backup-client: 2.1.2-1
proxmox-backup-file-restore: 2.1.2-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.4-4
pve-cluster: 7.1-3
pve-container: 4.1-3
pve-docs: 7.1-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.3-4
pve-ha-manager: 3.3-1
pve-i18n: 2.6-2
pve-qemu-kvm: 6.1.0-3
pve-xtermjs: 4.12.0-1
qemu-server: 7.1-4
smartmontools: 7.2-1
spiceterm: 3.2-2
swtpm: 0.7.0~rc1+2
vncterm: 1.7-1
zfsutils-linux: 2.1.1-pve3
Any help would be appreciated.
At this point I would be happy to be able to read any data, even if some of it is corrupted or missing.