dockerd: sync duration of 1.x, expected less than 1s

Hyacin

Member
May 6, 2020
I've had this problem for a while now, and I've turned the internet upside down trying to figure it out, but there is very, very little information out there about this specific warning. It is extremely consistent and reproducible in my PVE setup though, so I figured I'd ask here!

Code:
May 28, 2020 @ 20:26:06.000  manager        dockerd  4  sync duration of 1.00586989s, expected less than 1s
May 28, 2020 @ 20:26:06.000  dockernode-ha  dockerd  4  sync duration of 1.062095179s, expected less than 1s
May 28, 2020 @ 20:26:05.000  manager        dockerd  4  sync duration of 1.181777401s, expected less than 1s
May 28, 2020 @ 20:23:37.000  manager        dockerd  4  sync duration of 1.221711715s, expected less than 1s
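
As far as I can tell, that warning is just dockerd timing how long a single sync (fsync) of its internal state store takes - the swarm raft write-ahead log, if I'm reading it right - and complaining whenever one takes longer than a second. To convince myself it's really storage and not something docker-specific, I put together a little sketch that does the same kind of timed append-and-fsync inside the VM (the test path is just a placeholder - point it at the Ceph-backed disk):

Code:
import os
import time

# Placeholder path - put this on the Ceph-backed disk inside the VM.
TEST_PATH = "/var/lib/docker/sync-test.bin"
THRESHOLD = 1.0  # dockerd warns when a single sync takes longer than 1s


def timed_sync_loop(iterations=300, chunk=4096):
    """Append a small chunk and fsync it, timing each sync."""
    with open(TEST_PATH, "ab", buffering=0) as f:
        for _ in range(iterations):
            f.write(os.urandom(chunk))
            start = time.monotonic()
            os.fsync(f.fileno())
            took = time.monotonic() - start
            if took > THRESHOLD:
                print(f"sync duration of {took:.9f}s, expected less than 1s")
            time.sleep(0.1)  # roughly mimic periodic small WAL writes


if __name__ == "__main__":
    timed_sync_loop()
    os.remove(TEST_PATH)

If that prints the same kind of warnings at roughly the same times as dockerd does, it points at disk sync latency rather than anything docker itself is doing.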

These two boxes live on my Ceph pool. I moved one off to test, and the messages stopped - all except when backups (all the backups) were being done at night. As soon as I moved it back to the Ceph pool, the messages returned!

The Ceph pool is running off ADATA SU655 SSDs, on 3x Intel NUC10i3s, with GigE via a USB-C to GigE adapter (formerly through the onboard Intel NICs, but they have the issue where the node will eventually get all screwy if all the offloading isn't disabled - and it was happening then too).

They're running through an 8 port Netgear Smart Managed Pro switch with the latest firmware, and I believe they were showing the same behaviour previously running through an unmanaged D-Link GigE switch.

Ceph stats all seem low/fine -

90-110 KiB/s writes
10-40 write IOPS
(all 0s on reads)

~1-11 ms Apply/Commit Latency.
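
Those are averages though, so I guess they could be hiding the occasional sync that stalls for a second or more. Here's another rough sketch (again, the path is just a placeholder - it should sit on the RBD-backed filesystem inside the VM) that does small synchronous writes and prints the latency percentiles, to see whether the tail looks anything like those dockerd warnings:

Code:
import os
import statistics
import time

# Placeholder path - point it at the RBD/Ceph-backed filesystem inside the VM.
TEST_PATH = "/var/tmp/ceph-latency-test.bin"


def sample_sync_write_latency(samples=500, chunk=4096):
    """Do small O_DSYNC writes and collect per-write latency in milliseconds."""
    latencies = []
    fd = os.open(TEST_PATH, os.O_WRONLY | os.O_CREAT | os.O_DSYNC, 0o600)
    try:
        payload = os.urandom(chunk)
        for _ in range(samples):
            start = time.monotonic()
            os.write(fd, payload)
            latencies.append((time.monotonic() - start) * 1000.0)
    finally:
        os.close(fd)
        os.remove(TEST_PATH)
    return sorted(latencies)


if __name__ == "__main__":
    lat = sample_sync_write_latency()
    print(f"p50 {statistics.median(lat):.1f} ms")
    print(f"p99 {lat[int(len(lat) * 0.99)]:.1f} ms")
    print(f"max {lat[-1]:.1f} ms")

A 1-11 ms average could still come with a p99 or max way above that (especially during backups), which is the kind of thing the dashboard graphs won't show.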

The two VMs mentioned above are both HA - IIRC (I've been chasing this for a while before coming here, so the memories are fading a little) it doesn't matter which node they're living on at the time, this still happens - and also IIRC, I took one of my normally stapled-down VMs, moved it over to Ceph, and it too started complaining. I'll probably re-test those last two points to be sure, but I figured it was time to reach out and ask if anyone has seen this before, or if anyone can think of anything else I can check!

Oh, and I *think* this was happening previously as well when my Ceph pool was actually on my SX8200 NVMes, so I really don't think it's a drive thing :-/
 
These two boxes live on my Ceph pool. I moved one off to test, and the messages stopped - all except when backups (all the backups) were being done at night. As soon as I moved it back to the Ceph pool, the messages returned!
Judging from that, Ceph is just too slow. Did you try running ZFS with storage replication? It can also provide HA, but it has different hardware requirements.
 
