Hello.
We recently installed 3 new Proxmox servers and now run 6 Proxmox servers in total. 2 of the new servers are identical (HP DL360p G8).
The only difference between these two servers is that the one with problems boots from Seagate 1TB Firecuda SSHD disks in RAIDZ1.
All of these servers run 4x SM863A SSDs for Ceph. This has been working great.
Now one of those two servers is causing problems with Ceph.
The server called proxmox4 is timing out in the web interface.
It times out when we try to add the OSDs. Sometimes adding the OSDs on proxmox4 just takes a long time.
Then we added a monitor on proxmox4:
1 slow ops, oldest one blocked for 109 sec, mon.proxmox4 has slow ops
After some hours:
mon.proxmox4 crashed on host proxmox4 at 2020-03-09 23:29:31.307248Z
020-03-10 07:05:15.607837 mon.proxmox4 (mon.5) 504 : cluster [INF] mon.proxmox4 calling monitor election
Syslog from proxmox1:
Mar 9 22:49:41 proxmox1 pmxcfs[2316]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/proxmox4/nas1-10gbit: -1
Mar 9 22:49:41 proxmox1 pmxcfs[2316]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/proxmox4/ceph-ssd: -1
Something strange is going on here. iLO is not reporting anything wrong.
Unsure what to look for, but this server is behaving strangely.
Proxmox4: syslog:
https://paste.ubuntu.com/p/xRf8PRbgpk/
Proxmox4: ceph-mon:
https://paste.ubuntu.com/p/NGqtgZpTkX/
Proxmox1: syslog:
https://paste.ubuntu.com/p/Hj9sKwyqCQ/
Some files were too large for pastebin.
Thanks
EDIT: It might be the SSHD drives we use for boot on this server.
We bought 6 of them, but only 3 were detected by the server, so we used 2 of them on it. They were all new Seagate Firecuda 1TB drives.