Thank you Jarek for your time and your reply. I really need to read a manual on Ceph.
I changed the setting. I think the redundancy will be rebuilt in about 2 hours, then I will run a new test.
I was desperate; I even thought of rolling back to version 4.
I added a new, powerful node full of disks and removed the old one, which had only a few OSDs.
As soon as I stop 1 OSD, the Ceph pool freezes.
2019-02-20 09:00:00.000189 mon.bluehub-prox01 mon.0 10.9.9.1:6789/0 82642 : cluster [INF] overall HEALTH_OK
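One thing that can make a pool freeze when a single OSD stops is the pool's size/min_size pair: Ceph only serves I/O on a placement group while at least min_size of its replicas are up. Whether that is what happens here is only a guess. The sketch below just illustrates the rule; `pg_io_state` is a made-up helper, not a Ceph command, the numbers are examples, and it ignores recovery and remapping to other OSDs.

```shell
#!/bin/sh
# Hypothetical helper (not part of Ceph): given a pool's size, its min_size,
# and how many replicas of a placement group are down, report whether the
# placement group would still accept I/O.
pg_io_state() {
    size=$1
    min_size=$2
    down=$3
    up=$((size - down))
    if [ "$up" -ge "$min_size" ]; then
        echo active
    else
        echo blocked
    fi
}

pg_io_state 3 2 1   # 2 replicas left, min_size 2
pg_io_state 2 2 1   # 1 replica left, min_size 2
```

The real values for a pool can be read with `ceph osd pool get <pool> size` and `ceph osd pool get <pool> min_size`.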
Good morning Jarek,
thank you for your advice.
Here it is:
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1...
I masked systemd-timesyncd and installed ntpd.
No clock skew is detected, but the issue is the same.
NO I/O, everything is blocked.
There is no info in the OSD logs. I think I will either roll back to Proxmox version 4 or switch from Ceph to another shared storage.
After some tests, I discovered that if 1 of the 4 nodes goes down, disk I/O gets stuck.
The VMs and CTs are still up, but none of their disks are available for I/O.
I have 3 ceph monitors.
When I reboot the node, the Ceph log shows:
2019-01-24 10:28:08.240463 mon.bluehub-prox02 mon.0 10.9.9.2:6789/0...
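When reading cluster logs like this around a node reboot, it can help to pull out only the lines that carry an overall health state. A minimal sketch, assuming the log format of the HEALTH_OK line quoted above; `health_events` is a made-up name, and the sample input is that same line.

```shell
#!/bin/sh
# Hypothetical filter: for every log line read from stdin that contains a
# HEALTH_* token, print the date, the time, and the health state.
health_events() {
    awk '{
        for (i = 1; i <= NF; i++) {
            if (match($i, /HEALTH_[A-Z]+/)) {
                print $1, $2, substr($i, RSTART, RLENGTH)
                break
            }
        }
    }'
}

# Sample input: the HEALTH_OK cluster log line from the earlier post.
health_events <<'EOF'
2019-02-20 09:00:00.000189 mon.bluehub-prox01 mon.0 10.9.9.1:6789/0 82642 : cluster [INF] overall HEALTH_OK
EOF
```

On a monitor node the cluster log is normally /var/log/ceph/ceph.log, so `health_events < /var/log/ceph/ceph.log` would be the assumed invocation.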
I just installed Proxmox 5.3, updated only with the Debian updates. No pve-nosubs repository. I created a 3-node cluster.
I am using a separate 10 Gb network for the cluster. The hosts file is correct.
Immediately after creation of the cluster:
I have a routing problem. None of my VMs and CTs can reach the outside after the latest upgrade.
root@bluehub-prox02:~# pveversion -v
proxmox-ve: 4.3-72 (running kernel: 4.4.24-1-pve)
pve-manager: 4.3-12 (running version: 4.3-12/6894c9d9)