I/O Issues when booting Kernel 5

Linden · New Member · Jul 5, 2019
Hi everyone,

we upgraded to PVE 6 (Proxmox VE 6) about a week ago to test it.

Since then our DRBD resource has become incredibly slow (around 105 Mbit/s). Sometimes even "ls" takes around 20 seconds, ext4 reports errors, and NFS stops responding.

If we boot kernel 4.15, our speeds and everything else return to normal (max 15 Gbit/s).

We also experience high I/O delay and very poor write/read latency when booting kernel 5.


Here's some data that might help:


~# cat /sys/kernel/debug/drbd/resources/r0/connections/node2/0/proc_drbd ; echo -e "\n\n" ; uname -a ; echo -e "\n\n" ; dpkg -l | grep 'pve-kernel\|drbd' ; echo -e "\n\n" ; drbdadm dump
0: cs:SyncTarget ro:Secondary/Primary ds:Inconsistent/UpToDate C r-----
    ns:0 nr:12242948 dw:12242948 dr:110085144 al:0 bm:0 lo:0 pe:[0;93] ua:0 ap:[0;0] ep:1 wo:2 oos:7352525548
        [>....................] sync'ed: 1.5% (7180200/7287584)M
        finish: 4:03:52 speed: 502,476 (530,020 -- 484,408) want: 2,000,000 K/sec
        1% sector pos: 297469952/15002273264
    resync: used:2/61 hits:214057 misses:1684 starving:0 locked:0 changed:842
    act_log: used:0/1237 hits:0 misses:0 starving:0 locked:0 changed:0
    blocked on activity log: 0/0/0



Linux node1 4.15.18-18-pve #1 SMP PVE 4.15.18-44 (Wed, 03 Jul 2019 11:19:13 +0200) x86_64 GNU/Linux



ii  drbd-dkms                  9.0.19-1    all    RAID 1 over TCP/IP for Linux module source
ii  drbd-utils                 9.10.0-1    amd64  RAID 1 over TCP/IP for Linux (user utilities)
ii  drbdtop                    0.2.1-1     amd64  like top, but for drbd
ii  pve-firmware               3.0-2       all    Binary firmware code for the pve-kernel
ii  pve-kernel-4.15            5.4-6       all    Latest Proxmox VE Kernel Image
ii  pve-kernel-4.15.18-12-pve  4.15.18-36  amd64  The Proxmox PVE Kernel Image
ii  pve-kernel-4.15.18-16-pve  4.15.18-41  amd64  The Proxmox PVE Kernel Image
ii  pve-kernel-4.15.18-18-pve  4.15.18-44  amd64  The Proxmox PVE Kernel Image
ii  pve-kernel-5.0             6.0-5       all    Latest Proxmox VE Kernel Image
ii  pve-kernel-5.0.15-1-pve    5.0.15-1    amd64  The Proxmox PVE Kernel Image
ii  pve-kernel-helper          6.0-5       all    Function for various kernel maintenance tasks.



# /etc/drbd.conf
# resource r0 on node1: not ignored, not stacked
# defined at /etc/drbd.d/r0.res:1
resource r0 {
    on node1 {
        node-id 1;
        volume 0 {
            device    /dev/drbd0 minor 0;
            disk      /dev/disk/by-uuid/8a879a82-3880-4998-b5cb-70a95ce4bf79;
            meta-disk internal;
        }
        address ipv4 192.168.99.1:7788;
    }
    on node2 {
        node-id 0;
        volume 0 {
            device    /dev/drbd0 minor 0;
            disk      /dev/disk/by-uuid/8a879a82-3880-4998-b5cb-70a95ce4bf79;
            meta-disk internal;
        }
        address ipv4 192.168.99.2:7788;
    }
    net {
        after-sb-0pri  discard-zero-changes;
        after-sb-1pri  discard-secondary;
        after-sb-2pri  disconnect;
        csums-alg      sha1;
        max-buffers    36864;
        max-epoch-size 20000;
        rcvbuf-size    2097152;
        sndbuf-size    1048576;
        verify-alg     sha1;
    }
    disk {
        c-fill-target 10240;
        c-max-rate    2237280;
        c-min-rate    204800;
        c-plan-ahead  0;
        resync-rate   2000000;
    }
}
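As a side note on the config (my sanity check, not part of the original output): the disk-section throttles are in KiB/s, and assuming DRBD interprets them that way, none of them should cap us anywhere near the 105 Mbit/s we observe:

```python
# DRBD disk-section rate settings from the config above, in KiB/s
rates = {
    "c-max-rate": 2_237_280,
    "c-min-rate": 204_800,
    "resync-rate": 2_000_000,
}

LINK_GBIT = 25  # 25 Gbit/s direct-attached DRBD network

for name, kib_s in rates.items():
    gbit = kib_s * 1024 * 8 / 1e9  # KiB/s -> Gbit/s
    print(f"{name:12s} {gbit:6.2f} Gbit/s (within link capacity: {gbit < LINK_GBIT})")
```

All three limits sit between roughly 1.7 and 18.3 Gbit/s, well above the throughput we actually see on kernel 5.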



If we put I/O on the DRBD resource we get a maximum of 14.19 Gbit/s on our bond interface with the 4.15 kernel (we have a 25 Gbit/s direct-attached network).






~# cat /sys/kernel/debug/drbd/resources/r0/connections/node2/0/proc_drbd ; echo -e "\n\n" ; uname -a ; echo -e "\n\n" ; dpkg -l | grep 'pve-kernel\|drbd' ; echo -e "\n\n" ; drbdadm dump
0: cs:SyncTarget ro:Secondary/Primary ds:Inconsistent/UpToDate C r-----
    ns:0 nr:541700 dw:541700 dr:10270724 al:0 bm:0 lo:0 pe:[0;107] ua:0 ap:[0;0] ep:1 wo:2 oos:7334758124
        [>....................] sync'ed: 0.2% (7162848/7172768)M
        finish: 14:05:20 speed: 144,596 (154,112 -- 166,556) want: 2,000,000 K/sec
        2% sector pos: 332978176/15002273264
    resync: used:2/61 hits:19875 misses:162 starving:0 locked:0 changed:81
    act_log: used:0/1237 hits:0 misses:0 starving:0 locked:0 changed:0
    blocked on activity log: 0/0/0
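DRBD reports resync speed in K/sec (KiB/s), which is easy to misread next to the Mbit/s and Gbit/s figures. A quick conversion of the two "speed:" values (assuming K means KiB, DRBD's usual binary unit) shows the regression at the resync level as well:

```python
def kib_per_sec_to_gbit(kib_s: float) -> float:
    """Convert DRBD's K/sec (KiB/s) speed figures to Gbit/s."""
    return kib_s * 1024 * 8 / 1e9

# "speed:" values from the two proc_drbd dumps above
print(f"kernel 4.15 resync: {kib_per_sec_to_gbit(502_476):.2f} Gbit/s")
print(f"kernel 5.0  resync: {kib_per_sec_to_gbit(144_596):.2f} Gbit/s")
```

So even the resync path drops from roughly 4.1 to 1.2 Gbit/s, while normal I/O on the resource collapses much further.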



Linux node1 5.0.15-1-pve #1 SMP PVE 5.0.15-1 (Wed, 03 Jul 2019 10:51:57 +0200) x86_64 GNU/Linux



(dpkg package list identical to the one above — the same DRBD and kernel packages are installed.)



# /etc/drbd.conf — drbdadm dump output identical to the configuration shown above.

If we put I/O on the DRBD resource we get a maximum of 107 Mbit/s on our bond interface with the 5.0.15 kernel (we have a 25 Gbit/s direct-attached network).

Hardware information:
HPE DL385 Gen10
HPE Smart Array P816i with 9 Toshiba 1.92 TB SSDs attached, configured as RAID 5 with SmartPath
2 x EPYC 7401
256 GB DDR4 RAM
2 x 25 Gbit/s SFP28 (DRBD network, direct attached)
2 x 10 Gbit/s RJ45
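For context on the storage side (my arithmetic, not from the vendor spec): a 9-disk RAID 5 keeps one disk's worth of capacity for parity, so the usable space of the array works out as:

```python
# Usable capacity of the 9-disk RAID 5 described above — a sketch that
# ignores controller overhead and decimal-vs-binary TB differences.
disks = 9
disk_tb = 1.92  # Toshiba 1.92 TB SSDs

usable_tb = (disks - 1) * disk_tb  # RAID 5: one disk of parity
print(f"usable capacity: {usable_tb:.2f} TB")
```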


Maybe someone has a clue what changed in kernel 5 that slows us down this much.

Maybe someone even knows a solution.



Kind Regards,

Alexander Karamanlidis
 
