Performance Issues During Updates/Backups in a Linux VM on NVMe Mirror Pool

saygonka

New Member
Sep 22, 2024
Hi everyone,

We recently took on a client with a Proxmox-based cluster. One of the servers hosts a virtual machine running Nextcloud, which stores over 500 GB of files, from small ones of about 30 KB up to 4 GB. The VM disk is in RAW format.

During the Nextcloud update (while a backup was being created), we noticed significant delays in both read and write operations. Read and write IOPS fluctuate between 230 and 700, which is far too low for a pool of mirrored NVMe drives (10 drives in five 2-way mirrors, 3.5 TB each).

[Attached image: 1737541135221.png]
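To put those IOPS numbers in perspective, here is a rough conversion to throughput. It is only a sketch: the function name `iops_to_mibs` is made up for illustration, and it assumes every I/O is 16 KiB (matching the recordsize listed below), which may not be what the guest actually issues.

```shell
#!/bin/sh
# Convert an IOPS figure into approximate throughput in MiB/s.
# Assumption: every I/O is io_kib KiB in size (default 16, matching
# recordsize=16K); real request sizes inside the guest may differ.
iops_to_mibs() {
    iops="$1"
    io_kib="${2:-16}"
    awk -v iops="$iops" -v kib="$io_kib" \
        'BEGIN { printf "%.1f\n", iops * kib / 1024 }'
}

iops_to_mibs 230   # lower bound seen during the update, prints 3.6
iops_to_mibs 700   # upper bound seen during the update, prints 10.9
```

Under that assumption the pool is moving only about 3.6 to 10.9 MiB/s, which is why the numbers look wrong for NVMe.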

The current dataset settings are as follows:

recordsize: 16K
atime: off
sync: standard
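For anyone who wants to compare, the settings above can be listed in one go with something like the following. This is a sketch, not a definitive check: `show_vm_props` is a hypothetical helper name, `VM` is the pool name used in this thread, and the property list is just a guess at what tends to matter for VM disks. If the VM disk is stored as a zvol (an assumption, though common for Proxmox ZFS storage), `volblocksize` rather than `recordsize` is the value that applies to it.

```shell
#!/bin/sh
# List the properties that usually matter for VM-disk I/O.
# "VM" is the pool name from this thread; pass another name as $1 if needed.
show_vm_props() {
    pool="${1:-VM}"
    # -r recurses into child datasets and zvols,
    # -t limits the listing to filesystems and volumes,
    # -o selects the output columns.
    zfs get -r -t filesystem,volume \
        -o name,property,value,source \
        recordsize,volblocksize,compression,sync,atime,primarycache,logbias \
        "$pool"
}

# Usage on the Proxmox host:
#   show_vm_props VM
```

Properties that do not apply to a given dataset type show up as "-" in the value column, so filesystems and zvols can be told apart at a glance.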

The Proxmox server is equipped with two Intel Xeon Gold 6542Y processors and over 370 GB of RAM.

Do you have any suggestions on how to optimize the dataset to prevent future updates from taking ages?
 
What is the make/model of the NVMe drives, and what is the zpool configuration? Are you using dedup or something?
Intel D7-P5520 (INTEL SSDPF2KX038T1O)
No, we don't use any deduplication. It is the initial ZFS configuration with only the small customizations I mentioned earlier.
 
Here are more details:

We use NVMe with Intel VROC. We don't use any hardware RAID or HBA controller.

root@pve01:~# zpool get all VM
NAME PROPERTY VALUE SOURCE
VM size 17.4T -
VM capacity 0% -
VM altroot - default
VM health ONLINE -
VM guid 16600123178369850094 -
VM version - default
VM bootfs - default
VM delegation on default
VM autoreplace off default
VM cachefile - default
VM failmode wait default
VM listsnapshots off default
VM autoexpand off default
VM dedupratio 1.00x -
VM free 17.3T -
VM allocated 144G -
VM readonly off -
VM ashift 12 local
VM comment - default
VM expandsize - -
VM freeing 0 -
VM fragmentation 0% -
VM leaked 0 -
VM multihost off default
VM checkpoint - -
VM load_guid 18286241696273580506 -
VM autotrim off default
VM compatibility off default
VM bcloneused 0 -
VM bclonesaved 0 -
VM bcloneratio 1.00x -
VM feature@async_destroy enabled local
VM feature@empty_bpobj active local
VM feature@lz4_compress active local
VM feature@multi_vdev_crash_dump enabled local
VM feature@spacemap_histogram active local
VM feature@enabled_txg active local
VM feature@hole_birth active local
VM feature@extensible_dataset active local
VM feature@embedded_data active local
VM feature@bookmarks enabled local
VM feature@filesystem_limits enabled local
VM feature@large_blocks enabled local
VM feature@large_dnode enabled local
VM feature@sha512 enabled local
VM feature@skein enabled local
VM feature@edonr enabled local
VM feature@userobj_accounting active local
VM feature@encryption enabled local
VM feature@project_quota active local
VM feature@device_removal enabled local
VM feature@obsolete_counts enabled local
VM feature@zpool_checkpoint enabled local
VM feature@spacemap_v2 active local
VM feature@allocation_classes enabled local
VM feature@resilver_defer enabled local
VM feature@bookmark_v2 enabled local
VM feature@redaction_bookmarks enabled local
VM feature@redacted_datasets enabled local
VM feature@bookmark_written enabled local
VM feature@log_spacemap active local
VM feature@livelist enabled local
VM feature@device_rebuild enabled local
VM feature@zstd_compress enabled local
VM feature@draid enabled local
VM feature@zilsaxattr enabled local
VM feature@head_errlog active local
VM feature@blake3 enabled local
VM feature@block_cloning enabled local
VM feature@vdev_zaps_v2 active local

root@pve01:~# zpool status VM
  pool: VM
 state: ONLINE
config:

        NAME           STATE     READ WRITE CKSUM
        VM             ONLINE       0     0     0
          mirror-0     ONLINE       0     0     0
            nvme0n1p1  ONLINE       0     0     0
            nvme1n1p1  ONLINE       0     0     0
          mirror-1     ONLINE       0     0     0
            nvme2n1p1  ONLINE       0     0     0
            nvme3n1p1  ONLINE       0     0     0
          mirror-2     ONLINE       0     0     0
            nvme4n1p1  ONLINE       0     0     0
            nvme5n1p1  ONLINE       0     0     0
          mirror-3     ONLINE       0     0     0
            nvme6n1p1  ONLINE       0     0     0
            nvme7n1p1  ONLINE       0     0     0
          mirror-4     ONLINE       0     0     0
            nvme8n1p1  ONLINE       0     0     0
            nvme9n1p1  ONLINE       0     0     0
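
As a quick sanity check on the layout, the status output can be summarised with a small filter. This is only a sketch: `summarise_layout` is a hypothetical helper, and it assumes the vdevs are named `mirror-N` and the leaf devices look like `nvmeXnYpZ`, as they do in the output above.

```shell
#!/bin/sh
# Count mirror vdevs and NVMe leaf devices in `zpool status` text on stdin.
# Assumption: vdevs are named mirror-N and leaves match nvme<X>n<Y>[p<Z>].
summarise_layout() {
    awk '
        $1 ~ /^mirror-[0-9]+$/               { mirrors++ }
        $1 ~ /^nvme[0-9]+n[0-9]+(p[0-9]+)?$/ { leaves++ }
        END { printf "mirrors=%d leaves=%d\n", mirrors, leaves }
    '
}

# Usage on the Proxmox host:
#   zpool status VM | summarise_layout
```

For the status output above this gives mirrors=5 leaves=10, which fits the 17.4T pool size (five mirrors of roughly 3.5T each).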

root@pve01:~# zfs get all -r VM
The output was too long for this post, so I put it on Pastebin: https://pastebin.com/La2q3247