Hello!
I have a Proxmox cluster of 3 machines, each with Ceph installed.
I configured one Ceph pool with 3 OSDs, one per node, then created an RBD storage on top of that pool. I created about ten virtual machines that use this RBD storage for their disks (roughly the setup sketched below).
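For reference, the setup was roughly this (a sketch from memory, not my exact shell history; the Ceph network CIDR is a guess, and the OSD device name is taken from the log further down and may differ per node):

Code:
# on every node
pveceph install

# once, on the first node (the cluster network CIDR here is an assumption)
pveceph init --network 10.53.100.0/24

# on every node: one monitor and one OSD each (/dev/sdc taken from the log below)
pveceph createmon
pveceph createosd /dev/sdc

# once: the pool used by the RBD storage (128 PGs matches "ceph -s" below)
pveceph createpool ceph-vm --pg_num 128
# the "ceph-vm" RBD storage was then added in the GUI;
# the resulting /etc/pve/storage.cfg is shown below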
Everything worked fine until this morning.
At one point, all my virtual machines restarted for no apparent reason, and since then the data on their disks seems to be gone: there is no bootable disk any more when I try to start the virtual machines. I don't know why or how this happened.
Is there a way to recover from this, or at least to prevent it from happening again? Has anybody seen a similar problem?
This is very worrying, because I planned to use this setup in a production environment and this kind of incident is a real concern...
Thank you!
Some command-line outputs:
Code:
root@be1proxmox1:~# cat /etc/pve/storage.cfg
rbd: ceph-vm
        monhost 10.53.100.211 10.53.100.212 10.53.100.213
        content images
        krbd 0
        pool ceph-vm

dir: local
        path /var/lib/vz
        content images,rootdir,vztmpl,iso
        maxfiles 0
Code:
root@be1proxmox1:~# rbd ls ceph-vm
vm-100-disk-1
vm-101-disk-1
vm-102-disk-1
vm-103-disk-1
vm-104-disk-1
vm-105-disk-1
vm-106-disk-2
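Since the images are still listed, I was also going to check each one's metadata and allocated size, roughly like this (not run yet; vm-100-disk-1 is just an example):

Code:
rbd info ceph-vm/vm-100-disk-1
rbd du ceph-vm/vm-100-disk-1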
Code:
root@be1proxmox1:~# ceph -s
  cluster:
    id:     4e1fa8c2-ed36-46a4-9830-04d5575cc97f
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum be1proxmox1,be1proxmox2,be1proxmox3
    mgr: be1proxmox1(active), standbys: be1proxmox3, be1proxmox2
    osd: 3 osds: 3 up, 3 in

  data:
    pools:   1 pools, 128 pgs
    objects: 31480 objects, 122 GB
    usage:   368 GB used, 4389 GB / 4758 GB avail
    pgs:     128 active+clean

  io:
    client:   810 kB/s wr, 0 op/s rd, 102 op/s wr
Does this next output look suspicious?
Code:
Sep 18 12:13:11 be1proxmox1 sh[3227]: mount: Mounting /dev/disk/by-partuuid/80cb587b-94c6-47c7-b22a-855859bec57d on /var/lib/ceph/tmp/mnt.sjeO6z with options noatime,inode64
Sep 18 12:13:11 be1proxmox1 sh[3227]: command_check_call: Running command: /bin/mount -t xfs -o noatime,inode64 -- /dev/disk/by-partuuid/80cb587b-94c6-47c7-b22a-855859bec57d /var/lib/ceph/tmp/mnt.sjeO6z
Sep 18 12:13:11 be1proxmox1 sh[3227]: File "/usr/sbin/ceph-disk", line 11, in <module>
Sep 18 12:13:11 be1proxmox1 sh[3227]: load_entry_point('ceph-disk==1.0.0', 'console_scripts', 'ceph-disk')()
Sep 18 12:13:11 be1proxmox1 sh[3227]: File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 5699, in run
Sep 18 12:13:11 be1proxmox1 sh[3227]: File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 5650, in main
Sep 18 12:13:11 be1proxmox1 sh[3227]: File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 5400, in <lambda>
Sep 18 12:13:11 be1proxmox1 sh[3227]: File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 4141, in main_activate_space
Sep 18 12:13:11 be1proxmox1 sh[3227]: File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 3494, in mount_activate
Sep 18 12:13:11 be1proxmox1 sh[3227]: File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 1446, in mount
Sep 18 12:13:11 be1proxmox1 sh[3227]: ceph_disk.main.MountError: Mounting filesystem failed: Command '['/bin/mount', '-t', u'xfs', '-o', 'noatime,inode64', '--', '/dev/disk/by-partuuid/80cb587b-94c6-47c7-b22a-855859bec57d', '/var/lib/ceph/tmp/mnt.sjeO6z']' returned non-zero exit status 32
Sep 18 12:13:11 be1proxmox1 sh[3227]: File "/usr/sbin/ceph-disk", line 11, in <module>
Sep 18 12:13:11 be1proxmox1 sh[3227]: load_entry_point('ceph-disk==1.0.0', 'console_scripts', 'ceph-disk')()
Sep 18 12:13:11 be1proxmox1 sh[3227]: File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 5699, in run
Sep 18 12:13:11 be1proxmox1 sh[3227]: File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 5650, in main
Sep 18 12:13:11 be1proxmox1 sh[3227]: File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 4853, in main_trigger
Sep 18 12:13:11 be1proxmox1 sh[3227]: ceph_disk.main.Error: Error: return code 1
Sep 18 12:13:11 be1proxmox1 systemd[1]: ceph-disk@dev-sdc2.service: Main process exited, code=exited, status=1/FAILURE
Sep 18 12:13:11 be1proxmox1 systemd[1]: ceph-disk@dev-sdc2.service: Unit entered failed state.
Sep 18 12:13:11 be1proxmox1 systemd[1]: ceph-disk@dev-sdc2.service: Failed with result 'exit-code'.
Sep 18 12:13:11 be1proxmox1 sh[3221]: command: Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mount_options_xfs
Sep 18 12:13:11 be1proxmox1 sh[3221]: command: Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_fs_mount_options_xfs
Sep 18 12:13:11 be1proxmox1 sh[3221]: mount: Mounting /dev/sdc1 on /var/lib/ceph/tmp/mnt.f5aEho with options noatime,inode64
Sep 18 12:13:11 be1proxmox1 sh[3221]: command_check_call: Running command: /bin/mount -t xfs -o noatime,inode64 -- /dev/sdc1 /var/lib/ceph/tmp/mnt.f5aEho
Sep 18 12:13:11 be1proxmox1 sh[3221]: File "/usr/sbin/ceph-disk", line 11, in <module>
Sep 18 12:13:11 be1proxmox1 sh[3221]: load_entry_point('ceph-disk==1.0.0', 'console_scripts', 'ceph-disk')()
Sep 18 12:13:11 be1proxmox1 sh[3221]: File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 5699, in run
Sep 18 12:13:11 be1proxmox1 sh[3221]: File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 5650, in main
Sep 18 12:13:11 be1proxmox1 sh[3221]: File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 3754, in main_activate
Sep 18 12:13:11 be1proxmox1 sh[3221]: File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 3494, in mount_activate
Sep 18 12:13:11 be1proxmox1 sh[3221]: File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 1446, in mount
Sep 18 12:13:11 be1proxmox1 sh[3221]: ceph_disk.main.MountError: Mounting filesystem failed: Command '['/bin/mount', '-t', u'xfs', '-o', 'noatime,inode64', '--', '/dev/sdc1', '/var/lib/ceph/tmp/mnt.f5aEho']' returned non-zero exit status 32
Sep 18 12:13:11 be1proxmox1 sh[3221]: File "/usr/sbin/ceph-disk", line 11, in <module>
Sep 18 12:13:11 be1proxmox1 sh[3221]: load_entry_point('ceph-disk==1.0.0', 'console_scripts', 'ceph-disk')()
Sep 18 12:13:11 be1proxmox1 sh[3221]: File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 5699, in run
Sep 18 12:13:11 be1proxmox1 sh[3221]: File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 5650, in main
Sep 18 12:13:11 be1proxmox1 sh[3221]: File "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 4853, in main_trigger
Sep 18 12:13:11 be1proxmox1 sh[3221]: ceph_disk.main.Error: Error: return code 1
Sep 18 12:13:11 be1proxmox1 systemd[1]: ceph-disk@dev-sdc1.service: Main process exited, code=exited, status=1/FAILURE
Sep 18 12:13:11 be1proxmox1 systemd[1]: ceph-disk@dev-sdc1.service: Unit entered failed state.
Sep 18 12:13:11 be1proxmox1 systemd[1]: ceph-disk@dev-sdc1.service: Failed with result 'exit-code'.
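For what it's worth, this is how I intend to check the failing OSD mount from that log by hand, starting with read-only checks (/dev/sdc1 is the partition named in the log above):

Code:
# anything from the kernel about the disk or XFS?
dmesg | grep -iE 'sdc|xfs' | tail -n 50

# is the partition / filesystem signature still visible?
lsblk -o NAME,SIZE,FSTYPE,PARTUUID /dev/sdc
blkid /dev/sdc1

# XFS check in no-modify mode (does not write anything)
xfs_repair -n /dev/sdc1

# retry the exact mount that ceph-disk attempted
mkdir -p /mnt/osd-test
mount -t xfs -o noatime,inode64 /dev/sdc1 /mnt/osd-test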
Versions:
Code:
root@be1proxmox1:~# pveversion --verbose
proxmox-ve: 5.0-20 (running kernel: 4.10.17-2-pve)
pve-manager: 5.0-30 (running version: 5.0-30/5ab26bc)
pve-kernel-4.10.17-2-pve: 4.10.17-20
libpve-http-server-perl: 2.0-6
lvm2: 2.02.168-pve3
corosync: 2.4.2-pve3
libqb0: 1.0.1-1
pve-cluster: 5.0-12
qemu-server: 5.0-15
pve-firmware: 2.0-2
libpve-common-perl: 5.0-16
libpve-guest-common-perl: 2.0-11
libpve-access-control: 5.0-6
libpve-storage-perl: 5.0-14
pve-libspice-server1: 0.12.8-3
vncterm: 1.5-2
pve-docs: 5.0-9
pve-qemu-kvm: 2.9.0-4
pve-container: 2.0-15
pve-firewall: 3.0-2
pve-ha-manager: 2.0-2
ksm-control-daemon: not correctly installed
glusterfs-client: 3.8.8-1
lxc-pve: 2.0.8-3
lxcfs: 2.0.7-pve4
criu: 2.11.1-1~bpo90
novnc-pve: 0.6-4
smartmontools: 6.5+svn4324-1
ceph: 12.1.2-pve1
Tell me if you need more details.
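For example, I can post the output of any of these (just ask):

Code:
ceph osd tree
ceph osd df
ceph-disk list
journalctl -b -u 'ceph-osd@*'   # OSD service logs from this boot
pvesm status
qm config 100                   # config of one of the affected VMs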