Hi,
I have a weird problem:
When I benchmark Ceph with fio (4K block random write/read), Ceph crashes on HP GL380 Gen 9 servers.
- test1: Ceph worked fine when benchmarking on 3xDell R730XD servers.
- test2: Ceph crashed when benchmarking on 3xHP GL380 servers.
- test3: Ceph crashed when benchmarking on 2xDell R730XD + 1 HP GL380. Only the GL380 crashed.
- test4: If benchmarking was done on the GL380's local disk instead of on Ceph, it worked fine.
- test5: I changed the GL380's CPU to an E5-2696v3 and switched to another type of RAID card; it still crashed when benchmarking.
- test6: I swapped in several different HP GL380 servers in test3; the result was always the same (only the GL380 crashed).
Sometimes the PVE host crashed as well, and I had to reboot it.
Micron 5300 Pro 3.84TB SSDs were used for Ceph, and Micron 5100 Pro 480GB SSDs were used for the PVE hosts. These SSDs were tested separately without any performance problems.
I installed Zabbix on the PVE hosts and noticed that, during benchmarking, the GL380's SSDs had a much higher disk latency (~2000 ms), while the Dell R730XD's was much lower (a few ms).
I swapped the SSDs between the R730XD and the GL380, and again the GL380 crashed when benchmarking.
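For anyone who wants to cross-check the latency figures without Zabbix, here is a minimal Python sketch of how the average per-I/O latency can be derived from two snapshots of /proc/diskstats on Linux. This is not what Zabbix does internally as far as I know; it is just the usual "time spent on I/O divided by I/Os completed" calculation. The function names and the 5-second interval are my own choices for illustration.

```python
import time


def parse_diskstats(text):
    """Return {device: (ios_completed, ms_spent_on_io)} from /proc/diskstats text."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 14:
            continue  # skip blank or unexpected lines
        # Layout: major minor name reads_completed reads_merged sectors_read
        #         ms_reading writes_completed writes_merged sectors_written
        #         ms_writing ...
        reads, ms_reading = int(fields[3]), int(fields[6])
        writes, ms_writing = int(fields[7]), int(fields[10])
        stats[fields[2]] = (reads + writes, ms_reading + ms_writing)
    return stats


def average_latency_ms(before, after):
    """Average milliseconds per completed I/O between two snapshots."""
    result = {}
    for dev, (ios_after, ms_after) in after.items():
        if dev not in before:
            continue
        ios_before, ms_before = before[dev]
        done = ios_after - ios_before
        result[dev] = (ms_after - ms_before) / done if done > 0 else 0.0
    return result


def sample_latency(interval_s=5.0, path="/proc/diskstats"):
    """Take two snapshots interval_s apart and return per-device latency."""
    with open(path) as f:
        before = parse_diskstats(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        after = parse_diskstats(f.read())
    return average_latency_ms(before, after)
```

Calling `sample_latency()` on a GL380 and on an R730XD while fio is running should show whether the gap is visible at the block-device level on the OSD disks themselves.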
It seems the GL380 has some problem. I'm interested in finding out the cause, but I don't know how to troubleshoot it.
There is little information on Google.
Can anybody give some advice?
Thanks in advance.