Hello,
We had a strange issue: we got a call asking for assistance on a Proxmox cluster. Apparently the system restarted today (one node was already down for "problems", and the other two nodes seem to have gone down around the same time).
When they restarted the system they tried to start the VMs again, but after a few seconds the console timed out. Even though the VM was started, nothing seemed to work, and trying to navigate inside the Ceph storage resulted in another timeout.
Here is the output of ceph -s:
Bash:
root@SV3:~# ceph -s
  cluster:
    id:     89fd82e2-031d-4309-bbf9-454dcc2a4021
    health: HEALTH_WARN
            Reduced data availability: 345 pgs inactive
            Degraded data redundancy: 5956939/13902540 objects degraded (42.848%), 1003 pgs degraded, 1003 pgs undersized
            1023 pgs not deep-scrubbed in time
            1023 pgs not scrubbed in time

  services:
    mon: 3 daemons, quorum SV1,SV2,SV3 (age 90m)
    mgr: SV2(active, since 90m), standbys: SV3
    osd: 18 osds: 18 up (since 88m), 18 in (since 115m); 1003 remapped pgs

  data:
    pools:   1 pools, 1024 pgs
    objects: 4.63M objects, 18 TiB
    usage:   47 TiB used, 51 TiB / 98 TiB avail
    pgs:     33.691% pgs not active
             5956939/13902540 objects degraded (42.848%)
             656 active+undersized+degraded+remapped+backfill_wait
             344 undersized+degraded+remapped+backfill_wait+peered
             21  active+clean
             2   active+undersized+degraded+remapped+backfilling
             1   undersized+degraded+remapped+backfilling+peered

  io:
    recovery: 43 MiB/s, 10 objects/s
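A quick sanity check on the figures above (a sketch; the size=3 replication is my inference from the numbers, not something the output states directly):

```python
# Cross-check the percentages reported by `ceph -s` above.
# Assumption: replicated pool with size=3, inferred because the
# denominator 13902540 is exactly 3 x 4634180 (~ the 4.63M objects shown).

total_copies = 13_902_540    # denominator from "objects degraded"
degraded = 5_956_939         # degraded object copies
pgs_total = 1024             # pool PG count
pgs_inactive = 345           # "Reduced data availability: 345 pgs inactive"

print(f"{degraded / total_copies * 100:.3f}% objects degraded")  # 42.848
print(f"{pgs_inactive / pgs_total * 100:.3f}% pgs not active")   # 33.691
print(f"inferred pool size = {total_copies / 4_634_180:.0f}")    # 3
```

So roughly a third of the PGs are inactive, and almost half of all object copies are degraded, which fits losing one of three hosts' worth of data.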
Bash:
root@SV3:~# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 98.24387 root default
-3 32.74796 host SV1
0 hdd 5.45799 osd.0 up 1.00000 1.00000
1 hdd 5.45799 osd.1 up 1.00000 1.00000
2 hdd 5.45799 osd.2 up 1.00000 1.00000
3 hdd 5.45799 osd.3 up 1.00000 1.00000
4 hdd 5.45799 osd.4 up 1.00000 1.00000
15 hdd 5.45799 osd.15 up 1.00000 1.00000
-5 32.74796 host SV2
5 hdd 5.45799 osd.5 up 1.00000 1.00000
6 hdd 5.45799 osd.6 up 1.00000 1.00000
7 hdd 5.45799 osd.7 up 1.00000 1.00000
8 hdd 5.45799 osd.8 up 1.00000 1.00000
9 hdd 5.45799 osd.9 up 1.00000 1.00000
16 hdd 5.45799 osd.16 up 1.00000 1.00000
-7 32.74796 host SV3
10 hdd 5.45799 osd.10 up 1.00000 1.00000
11 hdd 5.45799 osd.11 up 1.00000 1.00000
12 hdd 5.45799 osd.12 up 1.00000 1.00000
13 hdd 5.45799 osd.13 up 1.00000 1.00000
14 hdd 5.45799 osd.14 up 1.00000 1.00000
17 hdd 5.45799 osd.17 up 1.00000 1.00000
Not the best situation, but they absolutely need to access the files inside the storage:
Bash:
root@SV3:~# rbd list Storage
vm-101-disk-0
vm-101-disk-1
vm-101-disk-2
vm-102-disk-0
vm-102-disk-1
vm-103-disk-0
vm-103-disk-1
vm-103-disk-2
But it is not possible to do a backup or a move. Any ideas?
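My guess is that the 344+1 "peered" PGs above are what is blocking I/O: an undersized, peered PG has fewer surviving copies than the pool's min_size and refuses client reads and writes, which would explain the console and RBD timeouts. A sketch of the first things I would check (assuming the pool is named Storage, as in the rbd list above):

```shell
# Which PGs are stuck inactive, and why:
ceph health detail
ceph pg dump_stuck inactive

# Replication settings -- undersized+peered usually means
# surviving copies < min_size, which blocks client I/O:
ceph osd pool get Storage size
ceph osd pool get Storage min_size

# If data has to be backfilled before those PGs go active again,
# watch recovery progress with:
ceph -w
```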
Thank you