zfs problems on simple rsync

freebee

Member
Hi.
My setup is simple.
A Proxmox VE host (ZFS on all disks), with a virtualized Proxmox Backup Server.
The Proxmox Backup Server is only running inside it temporarily.

So, inside the virtualized Proxmox Backup Server I need to rsync one datastore to another (the first disk is on one SSD and the second on another, and both pass smartctl checks).
Inside the Proxmox Backup Server both disks use XFS.
When I run rsync from the first disk to the second, the VM freezes completely within a few seconds.
When I try to shut the VM down from Proxmox VE, I get a timeout.
A forced stop works and it shuts down.
When I try to start it again, I get a timeout from systemd.
Only rebooting the whole server brings everything back.
I read some posts here saying this ZFS bug is fixed, but it is not.
I have other nodes (6.x and 7.4) and none of them have such problems. I don't remember a Proxmox version with so many ZFS problems.

Kernel Version Linux 6.5.11-7-pve (2023-12-05T09:44Z)
Manager Version pve-manager/8.1.3/b46aac3b42da5d15
#zfs --version
zfs-2.2.2-pve1
zfs-kmod-2.2.2-pve1

Log from dmesg:

[ 245.101161] INFO: task txg_sync:1915 blocked for more than 120 seconds.
[ 245.101184] Tainted: P IO 6.5.11-7-pve #1
[ 245.101197] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 245.101210] task:txg_sync state: D stack:0 pid:1915 ppid:2 flags:0x00004000
[ 245.101214] Call Trace:
[ 245.101217] <TASK>
[ 245.101221] __schedule+0x3fd/0x1450
[ 245.101229] schedule+0x63/0x110
[ 245.101232] schedule_timeout+0x95/0x170
[ 245.101236] ? __pfx_process_timeout+0x10/0x10
[ 245.101241] io_schedule_timeout+0x51/0x80
[ 245.101246] __cv_timedwait_common+0x140/0x180 [spl]
[ 245.101280] ? __pfx_autoremove_wake_function+0x10/0x10
[ 245.101284] __cv_timedwait_io+0x19/0x30 [spl]
[ 245.101295] zio_wait+0x13a/0x2c0 [zfs]
[ 245.101620] dsl_pool_sync+0xce/0x4e0 [zfs]
[ 245.101800] spa_sync+0x57a/0x1030 [zfs]
[ 245.101977] ? spa_txg_history_init_io+0x120/0x130 [zfs]
[ 245.102150] txg_sync_thread+0x1fd/0x390 [zfs]
[ 245.102322] ? __pfx_txg_sync_thread+0x10/0x10 [zfs]
[ 245.102493] ? __pfx_thread_generic_wrapper+0x10/0x10 [spl]
[ 245.102505] thread_generic_wrapper+0x5c/0x70 [spl]
[ 245.102516] kthread+0xef/0x120
[ 245.102518] ? __pfx_kthread+0x10/0x10
[ 245.102521] ret_from_fork+0x44/0x70
[ 245.102524] ? __pfx_kthread+0x10/0x10
[ 245.102527] ret_from_fork_asm+0x1b/0x30
[ 245.102532] </TASK>
 
Did you find a solution?
Yes.
What worked for me:

SSDs. Some SSDs do not really use 4K sectors internally. When you create a ZFS pool on an SSD the default is ashift=12 (4K sectors); I changed it to ashift=9 (512-byte sectors).
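ashift can only be chosen when a pool (or vdev) is created, not changed afterwards. A minimal sketch, assuming "rpool" as the usual Proxmox pool name, "testpool" as a throw-away pool and a placeholder device path:

Code:
# check what an existing pool was created with
zpool get ashift rpool
zdb -C rpool | grep ashift
# create a new pool forcing 512-byte sectors (ashift=9) instead of the 4K default (ashift=12)
zpool create -o ashift=9 testpool /dev/disk/by-id/ata-EXAMPLE-SSD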

RAID controllers: some RAID controllers simply do not give ZFS the write performance it needs, so when ZFS writes faster than the disk can absorb, a timeout or lockup occurs.
I believe the ZFS write throttle fails to detect the actual write speed of the disk, and the process ends up locked to the point of becoming unresponsive.
The theoretical solution would be for ZFS to adjust its write rate to the IOPS the underlying disk can actually handle. Perhaps there is a mismatch between what it sends to the hardware and what the hardware can actually deliver. In these cases the write load effectively DDoSes the disk, and the file system ends up stuck in a loop. This may be related to ZFS being CoW ("copy on write"): data is never overwritten in place; every time a file changes, ZFS writes the new data to a new location and only then updates the pointers to it. This helps maintain data integrity and prevents corruption during crashes or power outages.
The same disk and controller operate "normally" with ext4 or XFS.
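If you suspect the write throttle is outrunning the disk, one thing to experiment with is the OpenZFS module parameter that limits how much dirty data is buffered before writers are throttled. This is only a sketch; the 256 MiB value is an arbitrary example, not a recommendation, so test it on non-production hardware first:

Code:
# current limits
cat /sys/module/zfs/parameters/zfs_dirty_data_max
cat /sys/module/zfs/parameters/zfs_vdev_async_write_max_active
# temporarily cap dirty data at 256 MiB so ZFS cannot buffer far more than the disk can absorb
echo $((256*1024*1024)) > /sys/module/zfs/parameters/zfs_dirty_data_max
# make it persistent across reboots via a modprobe option
echo "options zfs zfs_dirty_data_max=268435456" > /etc/modprobe.d/zfs-write-throttle.conf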

RAID incompatibility.
ZFS really does not get along with hardware RAID, on older or even new controllers. Passing the disks through directly (HBA/IT mode) avoids these performance issues. I believe a note from the project about this would prevent many of the problems reported in the forums.
 
UPDATE:

NOOP: A simple scheduler that operates on a FIFO queue without any additional reordering, ideal for devices that already have their own scheduler, such as SSDs.
Deadline: Aims to minimize the response time for any I/O operation, giving each operation a deadline before which it must be completed.
CFQ (Completely Fair Queuing): Seeks to distribute I/O bandwidth equally among all processes requesting I/O.
MQ-Deadline or BFQ (Budget Fair Queueing): used on newer kernels with the multi-queue (blk-mq) block layer.
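Which of these schedulers is actually offered depends on your kernel and device; you can list them per disk, with the active one shown in brackets. sda is just an example device here:

Code:
cat /sys/block/sda/queue/scheduler
# example output on a recent multi-queue kernel, active scheduler in brackets: none [mq-deadline] bfq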

You can also try changing the Linux disk scheduler in these cases:
echo deadline > /sys/block/sda/queue/scheduler
Note: sda here is the disk that backs the ZFS pool; you can see which devices the pool uses with zpool status. On newer multi-queue kernels the scheduler is called mq-deadline instead of deadline (see below).


Older Linux kernels used CFQ by default; current kernels pick mq-deadline or none depending on the device. For slower disks (or slower controllers) the wrong scheduler can be a nightmare when combined with ZFS.
You can try this and see if it helps when updating the RAID controller or the disk is not possible.
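Note that echoing into /sys only lasts until the next reboot. A minimal sketch of a udev rule to make the choice persistent, assuming SATA/SAS disks named sdX; the file name and the match pattern are examples, adjust them to your devices:

Code:
# contents of /etc/udev/rules.d/60-ioscheduler.rules
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/scheduler}="mq-deadline"
# reload and re-apply the rules without rebooting
udevadm control --reload-rules && udevadm trigger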
 
If I check /sys/block/nvme0n1/queue/scheduler..

Code:
cat /sys/block/nvme0n1/queue/scheduler
[none] mq-deadline

... it doesn't show the deadline option. Is this expected? I assume I cannot set it to deadline then.
 
You are using NVMe, so you can try mq-deadline. The none scheduler is the default for NVMe disks. But it is uncommon to have these problems on NVMe.
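For reference, a quick sketch of switching and verifying it on that NVMe device (nvme0n1 taken from the post above):

Code:
echo mq-deadline > /sys/block/nvme0n1/queue/scheduler
cat /sys/block/nvme0n1/queue/scheduler
# the active scheduler is now shown in brackets, e.g. none [mq-deadline]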
 
