[SOLVED] Ceph - mon.proxmox4 has slow ops - mon.proxmox4 crashed

Hello.

We recently installed 3 new Proxmox servers, so we are now running 6 Proxmox servers in total. Two of the new servers are identical (HP DL360p G8).
The only difference between these two new servers is that the one with problems is running Seagate 1TB Firecuda SSHD boot disks in RAIDZ1.
All of these servers run 4x SM863A SSDs for Ceph, and this has been working great.
Now one of those two servers is causing problems with Ceph.

The server called proxmox4 is timing out in the web interface.
It also times out when trying to add the OSDs; sometimes adding the OSDs on proxmox4 just takes a very long time.

Then we added a monitor on proxmox4 and got:
1 slow ops, oldest one blocked for 109 sec, mon.proxmox4 has slow ops

After some hours:
mon.proxmox4 crashed on host proxmox4 at 2020-03-09 23:29:31.307248Z

2020-03-10 07:05:15.607837 mon.proxmox4 (mon.5) 504 : cluster [INF] mon.proxmox4 calling monitor election

Syslog from proxmox1:

Mar 9 22:49:41 proxmox1 pmxcfs[2316]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/proxmox4/nas1-10gbit: -1
Mar 9 22:49:41 proxmox1 pmxcfs[2316]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/proxmox4/ceph-ssd: -1



There is something strange here. iLO is not reporting anything wrong.
I'm unsure what to look for, but this server is behaving strangely.

Proxmox4: syslog
https://paste.ubuntu.com/p/xRf8PRbgpk/

Proxmox4: ceph-mon:
https://paste.ubuntu.com/p/NGqtgZpTkX/

Proxmox1: syslog
https://paste.ubuntu.com/p/Hj9sKwyqCQ/

Some files were too large for pastebin.

Thanks :)

EDIT: It might be those SSHD drives we use for boot on the server.
We bought 6 of them, but only 3 would be detected by the server, so we used 2 of them in this server. They were all new Seagate Firecuda 1TB drives.
 

Attachments

  • proxmox4-ceph.log.zip (400.1 KB)
  • proxmox1-ceph.log.1.zip (940 KB)
  • proxmox1-ceph.zip (388.8 KB)
  • ceph-mon.proxmox1.log.zip (378.4 KB)
  • ceph-mon.proxmox1.log.1.gz (995.7 KB)
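For reference, the crash ID used below came from the crash module's listing, roughly like this (just a sketch, commands from a 14.2.x node):

Code:
# list all crashes recorded by the crash module (available since Nautilus)
ceph crash ls
# optionally archive a crash so it no longer shows up as new
ceph crash archive <crash_id>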
root@proxmox4:~# ceph crash info 2020-03-09_23:29:31.307248Z_825c9fed-00ec-4917-bb39-e13dc2fed5bb
{
"os_version_id": "10",
"utsname_release": "5.3.13-3-pve",
"os_name": "Debian GNU/Linux 10 (buster)",
"entity_name": "mon.proxmox4",
"timestamp": "2020-03-09 23:29:31.307248Z",
"process_name": "ceph-mon",
"utsname_machine": "x86_64",
"utsname_sysname": "Linux",
"os_version": "10 (buster)",
"os_id": "10",
"utsname_version": "#1 SMP PVE 5.3.13-3 (Fri, 31 Jan 2020 08:17:11 +0100)",
"backtrace": [
"(()+0x12730) [0x7fb2fe142730]",
"(ProtocolV2::write_message(Message*, bool)+0x6ae) [0x7fb2ff5959ce]",
"(ProtocolV2::write_event()+0x175) [0x7fb2ff5ac075]",
"(AsyncConnection::handle_write()+0x43) [0x7fb2ff56b863]",
"(EventCenter::process_events(unsigned int, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*)+0x133f) [0x7fb2ff5ba22f]",
"(()+0x5c61db) [0x7fb2ff5c01db]",
"(()+0xbbb2f) [0x7fb2fe007b2f]",
"(()+0x7fa3) [0x7fb2fe137fa3]",
"(clone()+0x3f) [0x7fb2fdce74cf]"
],
"utsname_hostname": "proxmox4",
"crash_id": "2020-03-09_23:29:31.307248Z_825c9fed-00ec-4917-bb39-e13dc2fed5bb",
"ceph_version": "14.2.6"
}
root@proxmox4:~#
 
Might be those SSHD boot drives :(

root@proxmox4:~# ceph crash info 2020-03-10_08:11:03.445795Z_adc310f8-4172-42f1-ada1-d612e4d5006b
{
"os_version_id": "10",
"assert_condition": "abort",
"utsname_release": "5.3.13-3-pve",
"os_name": "Debian GNU/Linux 10 (buster)",
"entity_name": "mon.proxmox4",
"assert_file": "/mnt/npool/tlamprecht/pve-ceph/ceph-14.2.6/src/mon/MonitorDBStore.h",
"timestamp": "2020-03-10 08:11:03.445795Z",
"process_name": "ceph-mon",
"utsname_machine": "x86_64",
"assert_line": 324,
"utsname_sysname": "Linux",
"os_version": "10 (buster)",
"os_id": "10",
"assert_thread_name": "ms_dispatch",
"utsname_version": "#1 SMP PVE 5.3.13-3 (Fri, 31 Jan 2020 08:17:11 +0100)",
"backtrace": [
"(()+0x12730) [0x7fe7428e7730]",
"(gsignal()+0x10b) [0x7fe7423ca7bb]",
"(abort()+0x121) [0x7fe7423b5535]",
"(ceph::__ceph_abort(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x1b0) [0x7fe743a215af]",
"(MonitorDBStore::apply_transaction(std::shared_ptr<MonitorDBStore::Transaction>)+0xb3e) [0x5574c7e5a02e]",
"(Paxos::store_state(MMonPaxos*)+0x7b0) [0x5574c7f59400]",
"(Paxos::handle_commit(boost::intrusive_ptr<MonOpRequest>)+0x2ea) [0x5574c7f59a5a]",
"(Paxos::dispatch(boost::intrusive_ptr<MonOpRequest>)+0x223) [0x5574c7f5f213]",
"(Monitor::dispatch_op(boost::intrusive_ptr<MonOpRequest>)+0x131c) [0x5574c7e94b1c]",
"(Monitor::_ms_dispatch(Message*)+0x4aa) [0x5574c7e9510a]",
"(Monitor::ms_dispatch(Message*)+0x26) [0x5574c7ec4a36]",
"(Dispatcher::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x26) [0x5574c7ec0f66]",
"(DispatchQueue::entry()+0x1a49) [0x7fe743c59e69]",
"(DispatchQueue::DispatchThread::entry()+0xd) [0x7fe743d079ed]",
"(()+0x7fa3) [0x7fe7428dcfa3]",
"(clone()+0x3f) [0x7fe74248c4cf]"
],
"utsname_hostname": "proxmox4",
"assert_msg": "/mnt/npool/tlamprecht/pve-ceph/ceph-14.2.6/src/mon/MonitorDBStore.h: In function 'int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionRef)' thread 7fe7399a3700 time 2020-03-10 09:11:03.437193\n/mnt/npool/tlamprecht/pve-ceph/ceph-14.2.6/src/mon/MonitorDBStore.h: 324: ceph_abort_msg(\"failed to write to db\")\n",
"crash_id": "2020-03-10_08:11:03.445795Z_adc310f8-4172-42f1-ada1-d612e4d5006b",
"assert_func": "int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionRef)",
"ceph_version": "14.2.6"
}
 
Code:
Mar  9 17:57:35 proxmox4 pvedaemon[33222]: <root@pam> starting task UPID:proxmox4:00347739:05387D59:5E66757F:cephcreatemon:mon.proxmox4:root@pam:
Mar  9 17:57:37 proxmox4 systemd[1]: Started Ceph cluster monitor daemon.
Have there been updates? It seems the MON was restarted already before the crash report.

Code:
Mar  9 18:02:01 proxmox4 pvestatd[2020]: got timeout
Mar  9 18:02:01 proxmox4 pvestatd[2020]: status update time (5.458 seconds)
Mar  9 18:02:31 proxmox4 pvestatd[2020]: got timeout
Mar  9 18:02:32 proxmox4 pvestatd[2020]: status update time (5.148 seconds)
Mar  9 18:02:41 proxmox4 pvestatd[2020]: got timeout
Mar  9 18:02:41 proxmox4 pvestatd[2020]: status update time (5.159 seconds)
And soon after the storage throws the timeouts. At this point, something is already not working with the storage.

"assert_msg": "/mnt/npool/tlamprecht/pve-ceph/ceph-14.2.6/src/mon/MonitorDBStore.h: In function 'int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionRef)' thread 7fe7399a3700 time 2020-03-10 09:11:03.437193\n/mnt/npool/tlamprecht/pve-ceph/ceph-14.2.6/src/mon/MonitorDBStore.h: 324: ceph_abort_msg(\"failed to write to db\")\n",
The MON couldn't write a transaction to its database. The DB is located on the OS disks. These drives (Seagate 1TB Firecuda SSHD) are non-enterprise and may not be able to handle the IO load (many small reads/writes).
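A quick way to verify this is to look at where the MON keeps its store and how the OS disks behave under load (just a sketch, assuming the default paths and that the MON is named after the host):

Code:
# size and location of the monitor's RocksDB store (default path)
du -sh /var/lib/ceph/mon/ceph-$(hostname)/store.db
# watch latency and utilisation of the boot disks while the MON runs
# (iostat is part of the sysstat package)
iostat -x 5
# raw fsync performance of the root filesystem
pveperf /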

Two of the new servers are identical (HP DL360p G8).
How is the network setup with that hardware?
 
Code:
Mar  9 17:57:35 proxmox4 pvedaemon[33222]: <root@pam> starting task UPID:proxmox4:00347739:05387D59:5E66757F:cephcreatemon:mon.proxmox4:root@pam:
Mar  9 17:57:37 proxmox4 systemd[1]: Started Ceph cluster monitor daemon.
Have there been updates? It seems the MON was restarted already before the crash report.

Potetpro: This was the last time I created a monitor on proxmox4, to collect more error messages. The servers were first installed on 28 Feb, and soon after the first crash we removed the monitor from proxmox4.

Code:
Mar  9 18:02:01 proxmox4 pvestatd[2020]: got timeout
Mar  9 18:02:01 proxmox4 pvestatd[2020]: status update time (5.458 seconds)
Mar  9 18:02:31 proxmox4 pvestatd[2020]: got timeout
Mar  9 18:02:32 proxmox4 pvestatd[2020]: status update time (5.148 seconds)
Mar  9 18:02:41 proxmox4 pvestatd[2020]: got timeout
Mar  9 18:02:41 proxmox4 pvestatd[2020]: status update time (5.159 seconds)
And soon after the storage throws the timeouts. At this point, something is already not working with the storage.


The MON couldn't write a transaction to its database. The DB is located on the OS disks. These drives (Seagate 1TB Firecuda SSHD) are non-enterprise and may not be able to handle the IO load (many small reads/writes).

Potetpro: This might be the case. There are no errors in dmesg or smartctl.

How is the network setup with that hardware?

Potetpro: Each server runs two 10Gbit cards in active/backup, on a separate 10Gbit storage network. This is the same hardware and setup we have been using for all the servers, and it has been working great.
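For completeness, the bond state can be checked like this (a sketch; the bond and NIC names bond0/eno1 are just examples):

Code:
# show active/backup state and slave status of the storage bond
cat /proc/net/bonding/bond0
# check that the links negotiated 10Gbit
ethtool eno1 | grep Speed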
 
Potetpro: This was the last time I created a monitor on proxmox4, to collect more error messages. The servers were first installed on 28 Feb, and soon after the first crash we removed the monitor from proxmox4.

Potetpro: This might be the case. There are no errors in dmesg or smartctl.
This seems to point in the direction of the OS disks. Please try replacing them with one of the SM863 or a similar drive, to see if the issue resolves itself.

Potetpro: Each server runs two 10Gbit cards in active/backup, on a separate 10Gbit storage network. This is the same hardware and setup we have been using for all the servers, and it has been working great.
Is there any other traffic besides Ceph running on them? And could you please post the ceph.conf?
 
This seems to point in the direction of the OS disks. Please try replacing them with one of the SM863 or a similar drive, to see if the issue resolves itself.
I was thinking of shutting the server off and cloning each of the SSHDs to new 1.2TB SAS drives we have, just putting them into another server and running dd to copy the entire disk. Then I don't need to think about shrinking the filesystem.
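Roughly what I have in mind (only a sketch; the device names are examples and both disks would be attached to another machine, not the running node):

Code:
# double-check which disk is which before writing anything
lsblk -o NAME,SIZE,MODEL,SERIAL
# clone the whole SSHD (example /dev/sdb) onto the 1.2TB SAS drive (example /dev/sdc)
dd if=/dev/sdb of=/dev/sdc bs=1M status=progress conv=noerror,sync
sync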

Is there any other traffic besides Ceph running on them? And could you please post the ceph.conf?
No. This is a separate network for Ceph.

Even from proxmox1, I get the timeout messages only when a monitor is running on proxmox4:
Code:
Mar 10 08:23:22 proxmox4 pvestatd[2020]: status update time (21.897 seconds)
Mar 10 08:24:57 proxmox4 pvestatd[2020]: got timeout
Mar 10 08:24:57 proxmox4 pvestatd[2020]: status update time (5.149 seconds)
Mar 10 08:25:07 proxmox4 pvestatd[2020]: got timeout
Mar 10 08:25:08 proxmox4 pvestatd[2020]: status update time (5.148 seconds)
Mar 10 08:25:42 proxmox4 pvestatd[2020]: status update time (30.293 seconds)
Mar 10 08:27:48 proxmox4 pvestatd[2020]: status update time (16.720 seconds)
Mar 10 08:30:04 proxmox4 pvestatd[2020]: status update time (25.879 seconds)
Mar 10 08:32:21 proxmox4 pvestatd[2020]: status update time (26.805 seconds)

Code:
[global]
         auth_client_required = cephx
         auth_cluster_required = cephx
         auth_service_required = cephx
         cluster_network = 10.10.10.0/24
         fsid = 1f6d8776-39b3-44c6-b484-111d3c8b8372
         mon_allow_pool_delete = true
         mon_host = 10.10.10.11 10.10.10.13 10.10.10.12 10.10.10.15 10.10.10.16
         osd_journal_size = 5120
         osd_memory_target = 2073741824
         osd_pool_default_min_size = 2
         osd_pool_default_size = 3
         public_network = 10.10.10.0/24

[client]
         keyring = /etc/pve/priv/$cluster.$name.keyring

[mds]
         keyring = /var/lib/ceph/mds/ceph-$id/keyring

[osd]
         keyring = /var/lib/ceph/osd/ceph-$id/keyring
 
I was thinking of shutting the server off and cloning each of the SSHDs to new 1.2TB SAS drives we have, just putting them into another server and running dd to copy the entire disk. Then I don't need to think about shrinking the filesystem.
Better to run a new installation, as it will then be a clean system. Otherwise you might copy some unwanted (or old) configuration over as well. Removing and re-adding a node should not interfere with the VMs/CTs on the cluster.
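In rough terms the procedure would look like this (only a sketch; check the cluster manager documentation before doing it on a production cluster):

Code:
# on one of the remaining nodes, once proxmox4 has been shut down for good
pvecm delnode proxmox4
# reinstall proxmox4 from the ISO, then join it again from the fresh node
pvecm add <IP-of-an-existing-cluster-node>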

mon_host = 10.10.10.11 10.10.10.13 10.10.10.12 10.10.10.15 10.10.10.16
You only need 3x MONs. Any greater number is only needed if you have 1000s of Ceph clients/services.
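The extra MONs can be removed via the GUI (Ceph -> Monitor) or on the CLI, roughly like this (sketch):

Code:
# run on the node whose monitor should be removed
pveceph mon destroy proxmox4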

osd_memory_target = 2073741824
Since you decreased the memory_target, I assume there is not enough memory on the system. As the MON also needs memory, your system might already be taxed too much.
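To get an idea of how much headroom is left, something like this can be checked on the node (sketch; osd.0 is just an example ID):

Code:
# overall memory situation on the host
free -h
# what an OSD on this node is actually configured to use (via its admin socket)
ceph daemon osd.0 config get osd_memory_target
# current resident memory of the Ceph daemons
ps -o rss,comm -C ceph-osd,ceph-mon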

Even from proxmox1, I get the timeout messages only when a monitor is running on proxmox4:
Depending on the resources used, the MON might not have enough to work with. Anyway, as only 3x MONs are needed, you don't need to add one on proxmox4.
 
Better, run a new installation, as it will then be a clean system. Otherwise you might copy some unwanted (or old) configurations over as well. A removal/addition of a node should not interfere with the VM/CT on the cluster.
What about leaving old traces behind if I remove proxmox4, reinstall it, and then add the node again with the same name?
I have removed a temporary node named proxmox9, and it is still listed in the Ceph -> OSD tree, even though I removed all OSDs before removing the node. It is similar to network storage: if you remove a network storage, the folder still exists, so if you want to re-add the same network storage later you need to use a new name or remove the folder manually.
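(I assume the leftover proxmox9 entry could be removed from the CRUSH map with something like the following, once the bucket is empty; just my guess:)

Code:
# remove the empty host bucket of the old node from the CRUSH map
ceph osd crush remove proxmox9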

You only need 3x MONs. Any greater number is only needed if you have 1000s of Ceph clients/services.
Noted :)

Since you decreased the memory_target, I assume there is not enough memory on the system. As the MON also needs memory, your system might already be taxed too much.
We have enough for now, but as we are going to upgrade to 6 OSDs per server, we are using 12GB per system for Ceph instead of 24GB, since our VMs need the RAM and Ceph performance is not an issue (other than the errors :p ).
Is there an easy way to monitor if Ceph is low on memory?

Depending on the resources used, the MON might not have enough to work with. Anyway, as only 3x MONs are needed, you don't need to add one on proxmox4.
The errors stopped as soon as I removed proxmox4 as a monitor.

I think this has to do with timeouts due to slow small-size writes, as you can see here:

2x 300GB SAS in RAIDZ1
Code:
root@proxmox5:~# pveperf
CPU BOGOMIPS:      124510.92
REGEX/SECOND:      1804596
HD SIZE:           269.31 GB (rpool/ROOT/pve-1)
FSYNCS/SECOND:     148.23
DNS EXT:           49.37 ms
DNS INT:           18.31 ms (localdomain)
root@proxmox5:~# pveperf
CPU BOGOMIPS:      124510.92
REGEX/SECOND:      1718595
HD SIZE:           269.31 GB (rpool/ROOT/pve-1)
FSYNCS/SECOND:     169.95
DNS EXT:           55.43 ms
DNS INT:           19.32 ms (localdomain)
And then the Seagate Firecuda 1TB SSHD in RAIDZ1
Code:
root@proxmox4:~# pveperf
CPU BOGOMIPS:      124522.92
REGEX/SECOND:      1856490
HD SIZE:           898.99 GB (rpool/ROOT/pve-1)
FSYNCS/SECOND:     3.92
DNS EXT:           48.64 ms
DNS INT:           18.59 ms (localdomain)
root@proxmox4:~# pveperf
CPU BOGOMIPS:      124522.92
REGEX/SECOND:      1709153
HD SIZE:           898.99 GB (rpool/ROOT/pve-1)
FSYNCS/SECOND:     0.57
DNS EXT:           53.81 ms
DNS INT:           18.97 ms (localdomain)
root@proxmox4:~# pveperf
CPU BOGOMIPS:      124522.92
REGEX/SECOND:      1783012
HD SIZE:           898.99 GB (rpool/ROOT/pve-1)
FSYNCS/SECOND:     53.24
DNS EXT:           53.10 ms
DNS INT:           19.38 ms (localdomain)
root@proxmox4:~#

EDIT: Thanks again for the help.
 
What about leaving old traces behind if I remove proxmox4, reinstall it, and then add the node again with the same name?
The node name in /etc/pve/ will be left on the cluster, but all the configs will be re-synced.

Is there an easy way to monitor if Ceph is low on memory?
You can grab all the stats from the MGR. It has plugins for some of the common monitoring systems.
https://docs.ceph.com/docs/nautilus/mgr/
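For example, the prometheus module can be enabled and scraped (a minimal sketch; 9283 is the module's default port):

Code:
# enable the built-in prometheus exporter on the active MGR
ceph mgr module enable prometheus
# the metrics are then exposed at
curl http://<mgr-host>:9283/metrics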

The errors stopped as soon as I removed proxmox4 as a monitor.
I think this has to do with timeouts due to slow small-size writes, as you can see here:
The fsyncs/s are already a good indication. ;)
 
