Hello everyone,
Any help with the issue below is highly appreciated!
Thanks!
Christian
[In short]
After upgrading from Proxmox 6.2 to 6.3 in mid-December I noticed a massive performance drop on my HDD-based ZFS pool.
In my Samba use case the network copy speed dropped from 110 MByte/s to below 1 MByte/s. Copy jobs from the HDD pool to the SSD-based ZFS pool and vice versa show similar performance hits.
[Symptoms in detail]
I regularly create full images of my workstation so that I can get back on track quickly in case of a hard drive defect. The images are first stored on a local USB drive and then copied via robocopy to the Samba server running on my Proxmox host. Before the upgrade to Proxmox 6.3 the robocopy run to the Samba server finished in roughly 80-90 minutes for an image of 490 GByte. Since the upgrade the copy takes forever. In the Windows Task Manager you can see that the data is first copied at full line speed, around 110 MByte/s. After a few seconds the rate drops to below 1 MByte/s and stays there for roughly a minute, while the IO delay on the server shoots up to over 60%, sometimes 80%. Then the rate recovers for a few seconds before it drops again - a sawtooth pattern.
To pinpoint the problem I ran copy tests from the SSD pool to the HDD pool and vice versa with no CTs or VMs running. The effect is the same: the copy starts at a high rate and after a few seconds drops to unusable speeds below 1 MByte/s.
Copy with pv
Code:
root@proxmox02:/hdd_zfs_guests# head -c 100G </dev/urandom > /hdd_zfs_ssd/random_data.ran
root@proxmox02:/hdd_zfs_guests# pv -pra /hdd_zfs_ssd/random_data.ran > ./random_data.ran
[1.42MiB/s] [33.5MiB/s] [===========> ] 8%
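To make the sawtooth visible at the pool level, per-vdev throughput and latency can be watched in a second shell while such a copy runs (plain zpool iostat; the 1-second interval is arbitrary):
Code:
# per-vdev bandwidth and IOPS of the HDD pool, refreshed every second
zpool iostat -v hdd_zfs_guests 1
# the same view including average I/O wait times per vdev
zpool iostat -lv hdd_zfs_guests 1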
Another symptom is a low FSYNCS/SECOND value on the HDD pool after a few minutes of uptime - even with all CTs and VMs stopped:
Directly after a reboot, with CTs and VMs stopped:
Code:
root@proxmox02:/hdd_zfs_ssd# pveperf /hdd_zfs_guests
CPU BOGOMIPS: 83997.84
REGEX/SECOND: 4209528
HD SIZE: 2947.06 GB (hdd_zfs_guests)
FSYNCS/SECOND: 151.42
DNS EXT: 34.98 ms
DNS INT: 88.97 ms ()
After a few minutes:
Code:
root@proxmox02:~/zfs_debug# pveperf /hdd_zfs_guests
CPU BOGOMIPS: 83997.84
REGEX/SECOND: 4475092
HD SIZE: 2983.47 GB (hdd_zfs_guests)
FSYNCS/SECOND: 75.64
DNS EXT: 58.20 ms
DNS INT: 67.37 ms ()
Sometimes you see FSYNCS/SECOND values between 50 and 60.
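To measure sync-write behaviour a bit more directly than pveperf does, a small fio run against the HDD pool could be used (fio is not installed by default; the block size, file size and test path below are only examples):
Code:
# 4k sync writes into a test file on the HDD pool, with an fsync after every write
fio --name=synctest --directory=/hdd_zfs_guests --ioengine=psync \
    --rw=write --bs=4k --size=256M --fsync=1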
For reference, I've attached screenshots from the web interface.
[Hardware and Software Configuration]
Proxmox 6.3-3
Kernel Version Linux: 5.4.78-2-pve #1 SMP PVE 5.4.78-2 (Thu, 03 Dec 2020 14:26:17 +0100)
PVE Manager Version: pve-manager/6.3-3/eee5f901
CPU: Intel Xeon E-2146G
Memory: 64 GB Kingston ECC
Mainboard: Fujitsu (now Kontron): D3644-B
Network: 1x 1 GbE Intel I219-LM onboard for maintenance and the web interface
1x 10 GbE Intel X550T (second port not used)
HDD: 4x Seagate ST4000VN008 (IronWolf), configured as ZFS RAID-10 (2x2 mirrors) for bind mounts (pool name: hdd_zfs_guests)
SSD: 2x Crucial CT2000MX500, configured as ZFS RAID-1 (mirror) for VMs and CTs (pool name: hdd_zfs_ssd)
NVMe: 1x Samsung 970 EVO 250 GB as boot device and Proxmox installation drive
Server build date: spring 2019; upgraded with 2x SSDs for the second pool in early November 2020
Number of typically used LXC containers:
9 (8 turned off for the Samba tests, all turned off for the tests without Samba)
Number of typically running VMs:
2 (turned off for the tests)
Samba CT:
Privileged LXC container with Debian 10
Samba version: 4.9.5-Debian
Samba data is stored on the HDD ZFS pool and mounted into the CT via bind mount
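For completeness, the bind mount is a plain mpX entry in the container config under /etc/pve/lxc/; it looks roughly like the line below (the paths shown here are placeholders, not my exact ones):
Code:
mp0: /hdd_zfs_guests/shares,mp=/srv/shares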
ZPOOL Status
Code:
root@proxmox02:~# zpool status
  pool: hdd_zfs_guests
 state: ONLINE
  scan: scrub repaired 0B in 0 days 06:14:17 with 0 errors on Sun Jan 3 00:50:29 2021
config:

        NAME                        STATE     READ WRITE CKSUM
        hdd_zfs_guests              ONLINE       0     0     0
          mirror-0                  ONLINE       0     0     0
            wwn-0x5000c500b3a2d8c4  ONLINE       0     0     0
            wwn-0x5000c500b3a2edef  ONLINE       0     0     0
          mirror-1                  ONLINE       0     0     0
            wwn-0x5000c500b38ee3ed  ONLINE       0     0     0
            wwn-0x5000c500b3a2e636  ONLINE       0     0     0

errors: No known data errors

  pool: hdd_zfs_ssd
 state: ONLINE
  scan: scrub repaired 0B in 0 days 00:14:13 with 0 errors on Sat Jan 2 18:50:52 2021
config:

        NAME                        STATE     READ WRITE CKSUM
        hdd_zfs_ssd                 ONLINE       0     0     0
          mirror-0                  ONLINE       0     0     0
            wwn-0x500a0751e4a94a86  ONLINE       0     0     0
            wwn-0x500a0751e4a94af8  ONLINE       0     0     0

errors: No known data errors
root@proxmox02:~#
ZFS List
Code:
NAME                                    USED  AVAIL  REFER  MOUNTPOINT
hdd_zfs_guests                         4.49T  2.91T   176K  /hdd_zfs_guests
hdd_zfs_guests/home                    12.1G  2.91T  12.1G  /hdd_zfs_guests/home
hdd_zfs_guests/shares                   112K  2.91T   112K  /hdd_zfs_guests/shares
hdd_zfs_guests/shares-client_backups   2.33T  2.91T   937G  /hdd_zfs_guests/shares-client_backups
hdd_zfs_guests/shares-incoming          897G  2.91T   897G  /hdd_zfs_guests/shares-incoming
hdd_zfs_guests/shares-install           104K  2.91T   104K  /hdd_zfs_guests/shares-install
hdd_zfs_guests/shares-iso-images       9.76G  2.91T  9.76G  /hdd_zfs_guests/shares-iso-images
hdd_zfs_guests/shares-lost-n-found     75.3G  2.91T  75.3G  /hdd_zfs_guests/shares-lost-n-found
hdd_zfs_guests/shares-maintenance        96K  2.91T    96K  /hdd_zfs_guests/shares-maintenance
hdd_zfs_guests/shares-media             129G  2.91T   129G  /hdd_zfs_guests/shares-media
hdd_zfs_guests/shares-nextcloud         206G  2.91T   206G  /hdd_zfs_guests/shares-nextcloud
hdd_zfs_guests/shares-photos            112K  2.91T   112K  /hdd_zfs_guests/shares-photos
hdd_zfs_guests/shares-plex-library     11.6G  2.91T  9.65G  /hdd_zfs_guests/shares-plex-library
hdd_zfs_guests/shares-server_backup     925M  2.91T   925M  /hdd_zfs_guests/shares-server_backup
hdd_zfs_guests/timemachine              856G  2.91T   765G  /hdd_zfs_guests/timemachine
hdd_zfs_ssd                             349G  1.42T   144K  /hdd_zfs_ssd
hdd_zfs_ssd/subvol-301-disk-0           677M  19.3G   677M  /hdd_zfs_ssd/subvol-301-disk-0
hdd_zfs_ssd/subvol-302-disk-0           735M  7.28G   735M  /hdd_zfs_ssd/subvol-302-disk-0
hdd_zfs_ssd/subvol-401-disk-0           643M  7.37G   643M  /hdd_zfs_ssd/subvol-401-disk-0
hdd_zfs_ssd/subvol-404-disk-0          1.09G  28.9G  1.09G  /hdd_zfs_ssd/subvol-404-disk-0
hdd_zfs_ssd/subvol-406-disk-0          1.36G  6.64G  1.36G  /hdd_zfs_ssd/subvol-406-disk-0
hdd_zfs_ssd/subvol-407-disk-0          1.40G   149G  1.40G  /hdd_zfs_ssd/subvol-407-disk-0
hdd_zfs_ssd/subvol-408-disk-0          1.13G  18.9G  1.13G  /hdd_zfs_ssd/subvol-408-disk-0
hdd_zfs_ssd/subvol-409-disk-0          3.04G  6.96G  3.04G  /hdd_zfs_ssd/subvol-409-disk-0
hdd_zfs_ssd/subvol-410-disk-0          1.33G  6.67G  1.33G  /hdd_zfs_ssd/subvol-410-disk-0
hdd_zfs_ssd/subvol-501-disk-0          3.10G  12.9G  3.10G  /hdd_zfs_ssd/subvol-501-disk-0
hdd_zfs_ssd/vm-100-disk-0              33.0G  1.44T  10.3G  -
hdd_zfs_ssd/vm-1001-disk-0             33.0G  1.45T  1.37G  -
hdd_zfs_ssd/vm-1001-disk-1              103G  1.51T  11.2G  -
hdd_zfs_ssd/vm-1002-disk-0             33.0G  1.45T  1.86G  -
hdd_zfs_ssd/vm-114-disk-0               132G  1.46T  89.1G  -
ZFS get
See zfs_get_all.txt
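If the full dump is too much, the properties most relevant for write behaviour can be pulled directly (standard zfs get; the selection of properties is just my starting point):
Code:
# per-dataset properties that influence write performance on the HDD pool
zfs get -r recordsize,compression,atime,xattr,sync,logbias hdd_zfs_guests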
ARC summary
See arc_summary.txt
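For a quick look at the ARC without the full report, the live counters can be read directly (standard OpenZFS kstats; arcstat ships with the ZFS utilities on PVE):
Code:
# current ARC size, target and configured maximum (in bytes)
awk '$1 == "size" || $1 == "c" || $1 == "c_max"' /proc/spl/kstat/zfs/arcstats
cat /sys/module/zfs/parameters/zfs_arc_max
# ARC size and hit/miss rates, one line per second
arcstat 1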
dmesg
Code:
dmesg | grep -i ahci
[ 1.544825] ahci 0000:00:17.0: version 3.0
[ 1.555236] ahci 0000:00:17.0: AHCI 0001.0301 32 slots 6 ports 6 Gbps 0x3f impl SATA mode
[ 1.555237] ahci 0000:00:17.0: flags: 64bit ncq sntf pm clo only pio slum part ems deso sadm sds apst
[ 1.620251] scsi host0: ahci
[ 1.620478] scsi host1: ahci
[ 1.620708] scsi host2: ahci
[ 1.620881] scsi host3: ahci
[ 1.620947] scsi host4: ahci
[ 1.621070] scsi host5: ahci
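To rule out a drive that negotiated a slow SATA link or is throwing ATA errors, the rest of the kernel log can be checked as well (simple greps over dmesg; the patterns are only a starting point):
Code:
# negotiated link speed per SATA port
dmesg | grep -i 'sata link'
# any ATA exceptions, resets or failed commands
dmesg | grep -iE 'ata[0-9]+(\.[0-9]+)?: .*(exception|reset|failed)'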
lspci
Code:
root@proxmox02:~# lspci
00:00.0 Host bridge: Intel Corporation 8th Gen Core Processor Host Bridge/DRAM Registers (rev 07)
00:02.0 VGA compatible controller: Intel Corporation Device 3e96
00:12.0 Signal processing controller: Intel Corporation Cannon Lake PCH Thermal Controller (rev 10)
00:14.0 USB controller: Intel Corporation Cannon Lake PCH USB 3.1 xHCI Host Controller (rev 10)
00:14.2 RAM memory: Intel Corporation Cannon Lake PCH Shared SRAM (rev 10)
00:16.0 Communication controller: Intel Corporation Cannon Lake PCH HECI Controller (rev 10)
00:17.0 SATA controller: Intel Corporation Cannon Lake PCH SATA AHCI Controller (rev 10)
00:1b.0 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port (rev f0)
00:1b.4 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port (rev f0)
00:1d.0 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port (rev f0)
00:1f.0 ISA bridge: Intel Corporation Device a309 (rev 10)
00:1f.4 SMBus: Intel Corporation Cannon Lake PCH SMBus Controller (rev 10)
00:1f.5 Serial bus controller [0c80]: Intel Corporation Cannon Lake PCH SPI Controller (rev 10)
00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (7) I219-LM (rev 10)
02:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981
03:00.0 Ethernet controller: Intel Corporation Ethernet Controller 10G X550T (rev 01)
03:00.1 Ethernet controller: Intel Corporation Ethernet Controller 10G X550T (rev 01)
SMART Values
See smart_sd*.txt
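The SMART attributes I would check first for a mechanical or cabling problem can also be pulled live (smartctl from smartmontools; sdb-sde are the four IronWolf drives in this box):
Code:
# reallocated/pending sectors and interface CRC errors for each HDD
for d in /dev/sd{b,c,d,e}; do
    echo "== $d =="
    smartctl -A "$d" | grep -E 'Reallocated_Sector_Ct|Current_Pending_Sector|UDMA_CRC_Error_Count'
done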
System Temperatures
Code:
root@proxmox02:~/zfs_debug# sensors
coretemp-isa-0000
Adapter: ISA adapter
Package id 0: +38.0°C (high = +80.0°C, crit = +100.0°C)
Core 0: +35.0°C (high = +80.0°C, crit = +100.0°C)
Core 1: +35.0°C (high = +80.0°C, crit = +100.0°C)
Core 2: +38.0°C (high = +80.0°C, crit = +100.0°C)
Core 3: +36.0°C (high = +80.0°C, crit = +100.0°C)
Core 4: +36.0°C (high = +80.0°C, crit = +100.0°C)
Core 5: +36.0°C (high = +80.0°C, crit = +100.0°C)
pch_cannonlake-virtual-0
Adapter: Virtual device
temp1: +47.0°C
Attachments
- atop_samba_copy.png (69.8 KB)
- atop-pv-copy.png (69.5 KB)
- io_delay_samba.png (58.1 KB)
- iodelay_pv-copy.png (61.6 KB)
- smart_sdb.txt (5.5 KB)
- smart_sdc.txt (5.2 KB)
- smart_sdd.txt (5.2 KB)
- smart_sde.txt (5.2 KB)
- arc_summary.txt (21.8 KB)