Hello,
Sorry if this has been covered before, but I can't seem to find a solution.
I have a 5-node Proxmox (5.4) cluster running Ceph: 3 nodes with 3 OSDs each and 2 nodes with 2 OSDs each (to be increased shortly). When a server goes offline, or I shut down 2 OSDs, some of my VMs run into issues (nothing in the VM logs, but webpages stop loading, etc.). I have also tried setting noout beforehand. All OSDs are 1.7TB PM883s.
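For reference, this is roughly how I set noout before planned maintenance (standard ceph CLI, nothing cluster-specific):

# stop CRUSH from marking OSDs out and rebalancing during planned maintenance
ceph osd set noout
# ...shut down the node / stop the OSDs, do the work...
# re-enable normal out/recovery behaviour afterwards
ceph osd unset noout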
The VMs in question are running on the "ceph-ssd" pool (ceph-nvme is just for testing).
If anyone has any suggestions, it would be greatly appreciated. I want to upgrade to Proxmox 6 and Nautilus, but I don't feel confident doing so while this keeps happening.
Crushmap https://pastebin.com/raw/uG6PACxv
Logs https://pastebin.com/raw/mkDPVTtQ
Config https://pastebin.com/raw/P22t4phw
root@hv2:~# pveceph pool ls
Name       size min_size pg_num %-used          used
ceph-nvme     2        1    128  18.01   73815558239
ceph-ssd      3        2    256  55.27 3844619901849
root@hv2:~#
root@hv2:/var/log/ceph# ceph -s
  cluster:
    id:     40b9a33d-25c9-42b8-aa49-5a73c4bfa879
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum hv2,hv4,hv5
    mgr: hv2(active), standbys: hv5, hv4
    osd: 17 osds: 17 up, 17 in

  data:
    pools:   2 pools, 384 pgs
    objects: 939.60k objects, 3.57TiB
    usage:   10.6TiB used, 13.0TiB / 23.6TiB avail
    pgs:     384 active+clean

  io:
    client:   58.1MiB/s rd, 10.4MiB/s wr, 1.36kop/s rd, 238op/s wr
Is my pg_num too low? Should I set min_size to 1? Do I not have enough OSDs? Is it because the OSD counts per node are uneven?
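For reference, these are the commands I'd expect to use to check or change those settings (standard ceph CLI; 512 is just an example target for pg_num, not something I've settled on):

# check the current replication and PG settings on the VM pool
ceph osd pool get ceph-ssd size
ceph osd pool get ceph-ssd min_size
ceph osd pool get ceph-ssd pg_num
# if pg_num does turn out to be too low, raise it (and pgp_num to match), e.g.:
ceph osd pool set ceph-ssd pg_num 512
ceph osd pool set ceph-ssd pgp_num 512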
Thanks