zfs problems on simple rsync

freebee · Jan 14, 2024

Hi.
My setup is simple:
a Proxmox VE host (ZFS on all disks) running a virtualized Proxmox Backup Server.
Running the Proxmox Backup Server inside the host is a temporary arrangement.

Inside the virtualized Proxmox Backup Server I need to rsync one datastore to another (the first virtual disk is on one SSD, the second is on another, and both SSDs pass their smartctl checks).
Inside the Proxmox Backup Server both disks use XFS.
When I rsync from the first disk to the second, the VM freezes completely within a few seconds.
When I try to shut down the VM from Proxmox VE, I get a timeout.
A forced stop works and the VM shuts down.
When I try to start it again, I get a timeout on systemd start.
Only rebooting the server brings everything back.
I read some posts here saying the ZFS bug has been fixed, but it has not.
I have other nodes (6.x and 7.4) and none of them has this problem. I don't remember a Proxmox version with so many ZFS problems.

Kernel Version Linux 6.5.11-7-pve (2023-12-05T09:44Z)

Manager Version pve-manager/8.1.3/b46aac3b42da5d15

#zfs --version
zfs-2.2.2-pve1
zfs-kmod-2.2.2-pve1

Log from dmesg:

[ 245.101161] INFO: task txg_sync:1915 blocked for more than 120 seconds.
[ 245.101184] Tainted: P IO 6.5.11-7-pve #1
[ 245.101197] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 245.101210] task:txg_sync state: D stack:0 pid:1915 ppid:2 flags:0x00004000
[ 245.101214] Call Trace:
[ 245.101217] <TASK>
[ 245.101221] __schedule+0x3fd/0x1450
[ 245.101229] schedule+0x63/0x110
[ 245.101232] schedule_timeout+0x95/0x170
[ 245.101236] ? __pfx_process_timeout+0x10/0x10
[ 245.101241] io_schedule_timeout+0x51/0x80
[ 245.101246] __cv_timedwait_common+0x140/0x180 [spl]
[ 245.101280] ? __pfx_autoremove_wake_function+0x10/0x10
[ 245.101284] __cv_timedwait_io+0x19/0x30 [spl]
[ 245.101295] zio_wait+0x13a/0x2c0 [zfs]
[ 245.101620] dsl_pool_sync+0xce/0x4e0 [zfs]
[ 245.101800] spa_sync+0x57a/0x1030 [zfs]
[ 245.101977] ? spa_txg_history_init_io+0x120/0x130 [zfs]
[ 245.102150] txg_sync_thread+0x1fd/0x390 [zfs]
[ 245.102322] ? __pfx_txg_sync_thread+0x10/0x10 [zfs]
[ 245.102493] ? __pfx_thread_generic_wrapper+0x10/0x10 [spl]
[ 245.102505] thread_generic_wrapper+0x5c/0x70 [spl]
[ 245.102516] kthread+0xef/0x120
[ 245.102518] ? __pfx_kthread+0x10/0x10
[ 245.102521] ret_from_fork+0x44/0x70
[ 245.102524] ? __pfx_kthread+0x10/0x10
[ 245.102527] ret_from_fork_asm+0x1b/0x30
[ 245.102532] </TASK>

ksb · May 1, 2024

Did you find a solution?

freebee · May 1, 2024

ksb said:
Did you find a solution?

Yes.
What worked for me:

SSD: some SSDs do not really use 4K sectors. When you create a ZFS pool on an SSD, the usual default is ashift=12 (4096-byte sectors). I changed it to 9 (512-byte sectors).

RAID controllers: some RAID controllers simply do not give ZFS the write performance it needs. So when ZFS writes faster than the disk can handle, a timeout or lock-up occurs.
I believe that the ZFS algorithm fails to detect the actual write speed of the disk, so the process locks up to the point of becoming unresponsive.
The theoretical solution would be for ZFS to adjust its write rate to the IOPS the disk can actually handle. Perhaps there is a mismatch between what it sends to the hardware and what the hardware can deliver. In these cases the write process can act like a kind of denial of service, congesting the disk while the file system itself goes into a loop.
This may be because ZFS is CoW, "Copy on Write". In this type of system, data is not overwritten in its original storage location when it is modified. Instead, every time a file changes, ZFS writes the new data to a new location and only then updates the pointers to the new data. This helps maintain data integrity and prevents data corruption during system failures or power outages.
The same disk and controller operate "normally" with ext4 or XFS.

RAID incompatibility: in my experience ZFS is definitely not compatible with hardware RAID, on older or even newer hardware. Direct (IT) mode addresses these performance issues. I believe a note from the project about this would solve many of the problems reported in the forums.
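
On the ashift point above: ashift is the base-2 logarithm of the sector size ZFS assumes, so 9 means 512-byte sectors and 12 means 4096-byte sectors. A small sketch of that arithmetic, assuming a POSIX shell; ashift_for is a hypothetical helper name, not a ZFS command, and the commented commands at the end are only suggestions for checking a real host:

```shell
#!/bin/sh
# ashift_for: print the ashift (log2) that matches a sector size in bytes.
# Hypothetical helper for illustration; expects a power of two.
ashift_for() {
    size=$1
    result=0
    # Halve the size until it reaches 1, counting the steps.
    while [ "$size" -gt 1 ]; do
        size=$((size / 2))
        result=$((result + 1))
    done
    echo "$result"
}

ashift_for 512     # prints 9
ashift_for 4096    # prints 12

# On a real host, the sector sizes the disks report and the pool's
# ashift can be compared with (pool name is a placeholder):
#   lsblk -o NAME,PHY-SEC,LOG-SEC
#   zpool get ashift <pool>
```

Note that ashift is fixed when a vdev is created, so changing it means recreating the pool or vdev.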

freebee · May 1, 2024

UPDATE:

NOOP: A simple scheduler that operates on a FIFO queue without any additional reordering, ideal for devices that already have their own scheduler, such as SSDs.
Deadline: Aims to minimize the response time for any I/O operation, giving each operation a deadline before which it must be completed.
CFQ (Completely Fair Queuing): Seeks to distribute I/O bandwidth equally among all processes requesting I/O.
MQ-Deadline or BFQ (Budget Fair Queuing): available on newer systems that use the multi-queue (blk-mq) model.

You can also try changing the Linux disk scheduler in these cases:
echo deadline > /sys/block/sda/queue/scheduler
Note: sda here is the disk that holds the ZFS pool. You can list the pool's disks with zpool status at the command prompt.

On older kernels Linux uses CFQ by default. For slower disks (or slower controllers) it can be a nightmare when used with ZFS.
Try this and see whether it helps if upgrading the RAID controller or disk is not possible.
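
Since the scheduler names on offer differ between kernels, it may help to check the list before writing to the scheduler file. A minimal sketch, assuming a POSIX shell; pick_scheduler is a hypothetical helper, and the sample lines passed to it are illustrative rather than taken from a specific machine:

```shell
#!/bin/sh
# pick_scheduler: given the contents of /sys/block/<disk>/queue/scheduler,
# print the first deadline-style scheduler the kernel offers.
# Returns non-zero if none is offered. Hypothetical helper for illustration.
pick_scheduler() {
    # Strip the [brackets] that mark the active scheduler, then scan the names.
    for name in $(printf '%s\n' "$1" | sed 's/[][]//g'); do
        case "$name" in
            deadline|mq-deadline)
                echo "$name"
                return 0
                ;;
        esac
    done
    return 1
}

pick_scheduler "noop deadline [cfq]"    # prints: deadline
pick_scheduler "[none] mq-deadline"     # prints: mq-deadline

# On a real host (as root), assuming sda is the ZFS disk:
#   sched=$(pick_scheduler "$(cat /sys/block/sda/queue/scheduler)") &&
#       echo "$sched" > /sys/block/sda/queue/scheduler
```

A change made this way does not survive a reboot, so it is easy to try and easy to undo.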

ksb · May 3, 2024

If I check /sys/block/nvme0n1/queue/scheduler..

Code:

cat /sys/block/nvme0n1/queue/scheduler
[none] mq-deadline

... it doesn't show the deadline option? Is this expected? So I assume I cannot set it to deadline.

freebee · May 3, 2024

You are using NVMe, so you can try mq-deadline. "none" is the default for NVMe disks. But it is uncommon to have these problems on NVMe.
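
In the scheduler file, the name in square brackets is the scheduler currently in use and the other names are the alternatives the kernel offers. A small sketch for reading the active one, assuming a POSIX shell with sed; active_scheduler is a hypothetical helper name:

```shell
#!/bin/sh
# active_scheduler: print the name shown in [brackets], i.e. the scheduler in use.
# Hypothetical helper for illustration only.
active_scheduler() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

active_scheduler "[none] mq-deadline"    # prints: none

# On a real host:
#   active_scheduler "$(cat /sys/block/nvme0n1/queue/scheduler)"
```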
