Random freezes, maybe ZFS related

ksb

Hi all,

I have some serious issues with random freezes. The server was freshly installed with Proxmox VE 8.1.10.x a couple of weeks ago.
Proxmox runs on two Samsung MZQL21T9HCJR-00A07 enterprise NVMe SSDs in a ZFS mirror (RAID1), with an Intel i9-13900 and 64 GB of DDR5 ECC RAM.

Maybe a ZFS issue?
Scrub runs without any issues. Compression is on, Dedup off.

Code:
[...]
Apr 21 08:36:35 srv02 kernel: VERIFY3(remove_reference(hdr, hdr) > 0) failed (0 > 0)
Apr 21 08:36:35 srv02 kernel: PANIC at arc.c:6610:arc_write_done()
Apr 21 08:36:35 srv02 kernel: Showing stack for process 779
Apr 21 08:36:35 srv02 kernel: CPU: 0 PID: 779 Comm: z_wr_int_0 Tainted: P           O       6.5.13-5-pve #1
Apr 21 08:36:35 srv02 kernel: Hardware name: ASUSTeK COMPUTER INC. System Product Name/W680/MB DC, BIOS 2007-HET0003-24020101 02/01/2024
Apr 21 08:36:35 srv02 kernel: Call Trace:
Apr 21 08:36:35 srv02 kernel:  <TASK>
Apr 21 08:36:35 srv02 kernel:  dump_stack_lvl+0x48/0x70
Apr 21 08:36:35 srv02 kernel:  dump_stack+0x10/0x20
Apr 21 08:36:35 srv02 kernel:  spl_dumpstack+0x29/0x40 [spl]
Apr 21 08:36:35 srv02 kernel:  spl_panic+0xfc/0x120 [spl]
Apr 21 08:36:35 srv02 kernel:  arc_write_done+0x44f/0x550 [zfs]
Apr 21 08:36:35 srv02 kernel:  zio_done+0x289/0x10b0 [zfs]
Apr 21 08:36:35 srv02 kernel:  ? kfree+0x78/0x120
Apr 21 08:36:35 srv02 kernel:  zio_execute+0x88/0x130 [zfs]
Apr 21 08:36:35 srv02 kernel:  taskq_thread+0x27f/0x490 [spl]
Apr 21 08:36:35 srv02 kernel:  ? __pfx_default_wake_function+0x10/0x10
Apr 21 08:36:35 srv02 kernel:  ? __pfx_zio_execute+0x10/0x10 [zfs]
Apr 21 08:36:35 srv02 kernel:  ? __pfx_taskq_thread+0x10/0x10 [spl]
Apr 21 08:36:35 srv02 kernel:  kthread+0xef/0x120
Apr 21 08:36:35 srv02 kernel:  ? __pfx_kthread+0x10/0x10
Apr 21 08:36:35 srv02 kernel:  ret_from_fork+0x44/0x70
Apr 21 08:36:35 srv02 kernel:  ? __pfx_kthread+0x10/0x10
Apr 21 08:36:35 srv02 kernel:  ret_from_fork_asm+0x1b/0x30
Apr 21 08:36:35 srv02 kernel:  </TASK>
[...]
(I attached the complete logs)

Code:
uname -a
Linux srv02 6.5.13-5-pve #1 SMP PREEMPT_DYNAMIC PMX 6.5.13-5 (2024-04-05T11:03Z) x86_64 GNU/Linux

Code:
cat /proc/cmdline
initrd=\EFI\proxmox\6.5.13-5-pve\initrd.img-6.5.13-5-pve root=ZFS=rpool/ROOT/pve-1 boot=zfs pcie_aspm.policy=performance split_lock_detect=off

Code:
cat /sys/module/zfs/parameters/zfs_arc_max
6714032128
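
The zfs_arc_max value is printed in bytes, which is hard to read at a glance. A minimal sketch for converting it to MiB (the helper name bytes_to_mib is made up for illustration):

```shell
#!/bin/sh
# Convert a byte count, as printed by
# /sys/module/zfs/parameters/zfs_arc_max, to whole MiB.
# bytes_to_mib is a hypothetical helper name.
bytes_to_mib() {
    echo $(( $1 / 1024 / 1024 ))
}

# Value taken from the output above.
bytes_to_mib 6714032128   # prints 6403
```

So the ARC limit on this host is 6403 MiB, roughly 6.25 GiB of the 64 GB installed.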

Code:
zpool status
  pool: rpool
 state: ONLINE
  scan: scrub repaired 0B in 00:06:56 with 0 errors on Sun Apr 21 10:22:48 2024
config:

        NAME                                                 STATE     READ WRITE CKSUM
        rpool                                                ONLINE       0     0     0
          mirror-0                                           ONLINE       0     0     0
            nvme-eui.36344730571023490025385300000001-part3  ONLINE       0     0     0
            nvme-eui.36344730571023550025385300000001-part3  ONLINE       0     0     0

errors: No known data errors

Has anyone run into the same problem?
 

Attachments:
  • CrashApr21.txt (27.5 KB)
  • CrashApr24.txt (7.6 KB)
  • dpkg.txt (11.3 KB)

Two days ago I had another freeze and decided to upgrade to 8.2.2.
Unfortunately the issue is not resolved:

Code:
[69993.693244] VERIFY3(remove_reference(hdr, hdr) > 0) failed (0 > 0)
[69993.693250] PANIC at arc.c:6610:arc_write_done()
[69993.693252] Showing stack for process 795
[69993.693253] CPU: 0 PID: 795 Comm: z_wr_int_1 Tainted: P      D    O       6.8.4-2-pve #1
[69993.693256] Hardware name: ASUSTeK COMPUTER INC. System Product Name/W680/MB DC, BIOS 2007-HET0003-24020101 02/01/2024
[69993.693257] Call Trace:
[69993.693259]  <TASK>
[69993.693261]  dump_stack_lvl+0x48/0x70
[69993.693267]  dump_stack+0x10/0x20
[69993.693269]  spl_dumpstack+0x29/0x40 [spl]
[69993.693278]  spl_panic+0xfc/0x120 [spl]
[69993.693287]  arc_write_done+0x44f/0x550 [zfs]
[69993.693404]  zio_done+0x289/0x10b0 [zfs]
[69993.693509]  zio_execute+0x88/0x130 [zfs]
[69993.693610]  taskq_thread+0x27f/0x490 [spl]
[69993.693619]  ? __pfx_default_wake_function+0x10/0x10
[69993.693623]  ? __pfx_zio_execute+0x10/0x10 [zfs]
[69993.693725]  ? __pfx_taskq_thread+0x10/0x10 [spl]
[69993.693733]  kthread+0xef/0x120
[69993.693735]  ? __pfx_kthread+0x10/0x10
[69993.693737]  ret_from_fork+0x44/0x70
[69993.693739]  ? __pfx_kthread+0x10/0x10
[69993.693741]  ret_from_fork_asm+0x1b/0x30
[69993.693744]  </TASK>

Should I maybe disable ARC?
 
A couple of logs
 

Attachments:
  • CrashMay01.txt (24.8 KB)
  • CrashApr30.txt (10.3 KB)
  • CrashApr29.txt (21.2 KB)
  • CrashAprxx2.txt (5.7 KB)
  • CrashAprxx.txt (5.1 KB)

I am nowhere near qualified enough to understand these logs, but since you got no answer, I will give it a try.
Regarding "Should I maybe disable ARC?":
Is your system running out of RAM? Maybe that is what happens, and you have no swap.
How much of the 64 GB is assigned to VMs?
I also have 64 GB; my VMs can use up to 32 GB, and ARC is set to 24 GB max.

Maybe you could try lowering your max ARC size?
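
For anyone who wants to try this, here is a rough sketch of how the ARC limit is usually set on Proxmox. The helper name arc_max_option and the 4 GiB figure are assumptions for illustration, not a recommendation for this host:

```shell
#!/bin/sh
# Print the modprobe option line for an ARC limit given in GiB.
# arc_max_option is a hypothetical helper name.
arc_max_option() {
    echo "options zfs zfs_arc_max=$(( $1 * 1024 * 1024 * 1024 ))"
}

arc_max_option 4   # prints: options zfs zfs_arc_max=4294967296

# To apply it as root (not executed here):
#   arc_max_option 4 > /etc/modprobe.d/zfs.conf   # overwrites an existing file
#   update-initramfs -u -k all                    # needed when root is on ZFS
#   echo 4294967296 > /sys/module/zfs/parameters/zfs_arc_max   # runtime change
```

The modprobe.d entry only takes effect after the initramfs is rebuilt and the host is rebooted; the write to /sys changes the limit on the running system.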
 
Code:
Apr 29 15:02:01 srv02 kernel: INFO: task txg_sync:777 blocked for more than 122 seconds.
Apr 29 15:02:01 srv02 kernel:       Tainted: P           O       6.8.4-2-pve #1
Apr 29 15:02:01 srv02 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 29 15:02:01 srv02 kernel: task:txg_sync        state:D stack:0     pid:777   tgid:777   ppid:2      flags:0x00004000
Apr 29 15:02:01 srv02 kernel: Call Trace:
Apr 29 15:02:01 srv02 kernel:  <TASK>
Apr 29 15:02:01 srv02 kernel:  __schedule+0x401/0x15e0
Apr 29 15:02:01 srv02 kernel:  schedule+0x33/0x110
Apr 29 15:02:01 srv02 kernel:  schedule_timeout+0x95/0x170
Apr 29 15:02:01 srv02 kernel:  ? __pfx_process_timeout+0x10/0x10
Apr 29 15:02:01 srv02 kernel:  io_schedule_timeout+0x51/0x80
Apr 29 15:02:01 srv02 kernel:  __cv_timedwait_common+0x140/0x180 [spl]
Apr 29 15:02:01 srv02 kernel:  ? __pfx_autoremove_wake_function+0x10/0x10
Apr 29 15:02:01 srv02 kernel:  __cv_timedwait_io+0x19/0x30 [spl]
Apr 29 15:02:01 srv02 kernel:  zio_wait+0x13a/0x2c0 [zfs]
Apr 29 15:02:01 srv02 kernel:  dsl_pool_sync+0xce/0x4e0 [zfs]
Apr 29 15:02:01 srv02 kernel:  spa_sync+0x578/0x1030 [zfs]
Apr 29 15:02:01 srv02 kernel:  ? spa_txg_history_init_io+0x120/0x130 [zfs]
Apr 29 15:02:01 srv02 kernel:  txg_sync_thread+0x1fd/0x390 [zfs]
Apr 29 15:02:01 srv02 kernel:  ? __pfx_txg_sync_thread+0x10/0x10 [zfs]
Apr 29 15:02:01 srv02 kernel:  ? __pfx_thread_generic_wrapper+0x10/0x10 [spl]
Apr 29 15:02:01 srv02 kernel:  thread_generic_wrapper+0x5c/0x70 [spl]
Apr 29 15:02:01 srv02 kernel:  kthread+0xef/0x120
Apr 29 15:02:01 srv02 kernel:  ? __pfx_kthread+0x10/0x10
Apr 29 15:02:01 srv02 kernel:  ret_from_fork+0x44/0x70
Apr 29 15:02:01 srv02 kernel:  ? __pfx_kthread+0x10/0x10
Apr 29 15:02:01 srv02 kernel:  ret_from_fork_asm+0x1b/0x30
Apr 29 15:02:01 srv02 kernel:  </TASK>

Code:
Apr 29 15:02:01 srv02 kernel: INFO: task z_wr_int_4:805 blocked for more than 122 seconds.
Apr 29 15:02:01 srv02 kernel:       Tainted: P           O       6.8.4-2-pve #1
Apr 29 15:02:01 srv02 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 29 15:02:01 srv02 kernel: task:z_wr_int_4      state:D stack:0     pid:805   tgid:805   ppid:2      flags:0x00004000
Apr 29 15:02:01 srv02 kernel: Call Trace:
Apr 29 15:02:01 srv02 kernel:  <TASK>
Apr 29 15:02:01 srv02 kernel:  __schedule+0x401/0x15e0
Apr 29 15:02:01 srv02 kernel:  ? ret_from_fork_asm+0x1b/0x30
Apr 29 15:02:01 srv02 kernel:  schedule+0x33/0x110
Apr 29 15:02:01 srv02 kernel:  spl_panic+0x112/0x120 [spl]
Apr 29 15:02:01 srv02 kernel:  arc_write_done+0x44f/0x550 [zfs]
Apr 29 15:02:01 srv02 kernel:  ? mutex_lock+0x12/0x50
Apr 29 15:02:01 srv02 kernel:  zio_done+0x289/0x10b0 [zfs]
Apr 29 15:02:01 srv02 kernel:  zio_execute+0x88/0x130 [zfs]
Apr 29 15:02:01 srv02 kernel:  taskq_thread+0x27f/0x490 [spl]
Apr 29 15:02:01 srv02 kernel:  ? __pfx_default_wake_function+0x10/0x10
Apr 29 15:02:01 srv02 kernel:  ? __pfx_zio_execute+0x10/0x10 [zfs]
Apr 29 15:02:01 srv02 kernel:  ? __pfx_taskq_thread+0x10/0x10 [spl]
Apr 29 15:02:01 srv02 kernel:  kthread+0xef/0x120
Apr 29 15:02:01 srv02 kernel:  ? __pfx_kthread+0x10/0x10
Apr 29 15:02:01 srv02 kernel:  ret_from_fork+0x44/0x70
Apr 29 15:02:01 srv02 kernel:  ? __pfx_kthread+0x10/0x10
Apr 29 15:02:01 srv02 kernel:  ret_from_fork_asm+0x1b/0x30
Apr 29 15:02:01 srv02 kernel:  </TASK>
 
Code:
Apr 29 15:02:01 srv02 kernel: INFO: task zvol:2993 blocked for more than 122 seconds.
Apr 29 15:02:01 srv02 kernel:       Tainted: P           O       6.8.4-2-pve #1
Apr 29 15:02:01 srv02 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 29 15:02:01 srv02 kernel: task:zvol            state:D stack:0     pid:2993  tgid:2993  ppid:2      flags:0x00004000
Apr 29 15:02:01 srv02 kernel: Call Trace:
Apr 29 15:02:01 srv02 kernel:  <TASK>
Apr 29 15:02:01 srv02 kernel:  __schedule+0x401/0x15e0
Apr 29 15:02:01 srv02 kernel:  ? __alloc_pages+0x251/0x1320
Apr 29 15:02:01 srv02 kernel:  schedule+0x33/0x110
Apr 29 15:02:01 srv02 kernel:  schedule_preempt_disabled+0x15/0x30
Apr 29 15:02:01 srv02 kernel:  __mutex_lock.constprop.0+0x3f8/0x7a0
Apr 29 15:02:01 srv02 kernel:  __mutex_lock_slowpath+0x13/0x20
Apr 29 15:02:01 srv02 kernel:  mutex_lock+0x3c/0x50
Apr 29 15:02:01 srv02 kernel:  buf_hash_find+0x80/0x140 [zfs]
Apr 29 15:02:01 srv02 kernel:  arc_read+0x513/0x17c0 [zfs]
Apr 29 15:02:01 srv02 kernel:  ? __pfx_dbuf_read_done+0x10/0x10 [zfs]
Apr 29 15:02:01 srv02 kernel:  dbuf_read_impl.constprop.0+0x57b/0x890 [zfs]
Apr 29 15:02:01 srv02 kernel:  ? mutex_lock+0x12/0x50
Apr 29 15:02:01 srv02 kernel:  dbuf_read+0xf3/0x620 [zfs]
Apr 29 15:02:01 srv02 kernel:  dmu_tx_check_ioerr+0xa0/0x110 [zfs]
Apr 29 15:02:01 srv02 kernel:  dmu_tx_count_write+0x1b6/0x1d0 [zfs]
Apr 29 15:02:01 srv02 kernel:  dmu_tx_hold_write_by_dnode+0x3a/0x60 [zfs]
Apr 29 15:02:01 srv02 kernel:  zvol_write+0x223/0x670 [zfs]
Apr 29 15:02:01 srv02 kernel:  zvol_write_task+0x12/0x30 [zfs]
Apr 29 15:02:01 srv02 kernel:  taskq_thread+0x27f/0x490 [spl]
Apr 29 15:02:01 srv02 kernel:  ? __pfx_default_wake_function+0x10/0x10
Apr 29 15:02:01 srv02 kernel:  ? __pfx_zvol_write_task+0x10/0x10 [zfs]
Apr 29 15:02:01 srv02 kernel:  ? __pfx_taskq_thread+0x10/0x10 [spl]
Apr 29 15:02:01 srv02 kernel:  kthread+0xef/0x120
Apr 29 15:02:01 srv02 kernel:  ? __pfx_kthread+0x10/0x10
Apr 29 15:02:01 srv02 kernel:  ret_from_fork+0x44/0x70
Apr 29 15:02:01 srv02 kernel:  ? __pfx_kthread+0x10/0x10
Apr 29 15:02:01 srv02 kernel:  ret_from_fork_asm+0x1b/0x30
Apr 29 15:02:01 srv02 kernel:  </TASK>
 
Sorry, I forgot to update the thread.

I disabled ARC by setting primarycache=none, but I still have the crashes.

Resources:
1714554629797.png
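
A note on primarycache=none: as far as I understand it, this property only controls what ZFS keeps cached in the ARC for a dataset. Writes still go through the ARC code paths, which would fit with the panic still showing up in arc_write_done(). A small sketch for checking which datasets have the property changed (the helper name and the sample dataset names are made up):

```shell
#!/bin/sh
# List datasets whose primarycache is not "all".
# Input: tab-separated "name<TAB>value" lines, as produced by
#   zfs get -H -o name,value -t filesystem,volume primarycache
# non_default_primarycache is a hypothetical helper name.
non_default_primarycache() {
    awk -F '\t' '$2 != "all" { print $1 " -> " $2 }'
}

# Sample input; the dataset names are invented for illustration.
printf 'rpool\tall\nrpool/data/vm-108-disk-0\tnone\n' | non_default_primarycache

# On the host itself (not executed here):
#   zfs get -H -o name,value -t filesystem,volume primarycache | non_default_primarycache
#   zfs inherit primarycache rpool    # revert to the default once testing is done
```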
 
Sorry, I can't really help you any further. The only idea I have left is to live-boot memtest86 to see whether something is wrong with your RAM.
 
Would that report errors in the Proxmox shell? I am seriously asking, I have no idea. I only know that memtest itself reports errors.
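
On that question: memtest86 runs from its own boot medium and shows results on its own screen, so nothing from it appears in the Proxmox shell. With ECC RAM, though, the running kernel may log corrected or uncorrected memory errors through EDAC or machine-check messages, provided the platform driver is loaded. A rough sketch for filtering the kernel log for such lines; the pattern list is an assumption and certainly not exhaustive, and the sample lines are invented:

```shell
#!/bin/sh
# Keep only kernel log lines that usually point to hardware memory errors.
# memory_error_lines is a hypothetical helper; the patterns are a guess
# at common EDAC/MCE wording, not a complete list.
memory_error_lines() {
    grep -i -E 'edac|mce:|machine check|hardware error'
}

# Invented sample input (not taken from the attached logs):
printf '%s\n' \
    'kernel: EDAC MC0: 1 CE memory read error on DIMM0' \
    'kernel: zio_done+0x289/0x10b0 [zfs]' | memory_error_lines

# On the host (not executed here):
#   journalctl -k --no-pager | memory_error_lines
```

An empty result does not prove the RAM is fine; it only means the kernel did not log anything matching these patterns.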
 
Code:
May 02 01:15:01 srv02 CRON[365546]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
May 02 01:15:01 srv02 CRON[365545]: pam_unix(cron:session): session closed for user root
May 02 01:17:01 srv02 CRON[366334]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
May 02 01:17:01 srv02 CRON[366335]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
May 02 01:17:01 srv02 CRON[366334]: pam_unix(cron:session): session closed for user root
May 02 01:23:03 srv02 kernel: VERIFY3(remove_reference(hdr, hdr) > 0) failed (0 > 0)
May 02 01:23:03 srv02 kernel: PANIC at arc.c:6610:arc_write_done()
May 02 01:23:03 srv02 kernel: Showing stack for process 803
May 02 01:23:03 srv02 kernel: CPU: 19 PID: 803 Comm: z_wr_int_4 Tainted: P           O       6.8.4-2-pve #1
May 02 01:23:03 srv02 kernel: Hardware name: ASUSTeK COMPUTER INC. System Product Name/W680/MB DC, BIOS 2007-HET0003-24020101 02/01/2024
May 02 01:23:03 srv02 kernel: Call Trace:
May 02 01:23:03 srv02 kernel:  <TASK>
May 02 01:23:03 srv02 kernel:  dump_stack_lvl+0x48/0x70
May 02 01:23:03 srv02 kernel:  dump_stack+0x10/0x20
May 02 01:23:03 srv02 kernel:  spl_dumpstack+0x29/0x40 [spl]
May 02 01:23:03 srv02 kernel:  spl_panic+0xfc/0x120 [spl]
May 02 01:23:03 srv02 kernel:  arc_write_done+0x44f/0x550 [zfs]
May 02 01:23:03 srv02 kernel:  zio_done+0x289/0x10b0 [zfs]
May 02 01:23:03 srv02 kernel:  zio_execute+0x88/0x130 [zfs]
May 02 01:23:03 srv02 kernel:  taskq_thread+0x27f/0x490 [spl]
May 02 01:23:03 srv02 kernel:  ? __pfx_default_wake_function+0x10/0x10
May 02 01:23:03 srv02 kernel:  ? __pfx_zio_execute+0x10/0x10 [zfs]
May 02 01:23:03 srv02 kernel:  ? __pfx_taskq_thread+0x10/0x10 [spl]
May 02 01:23:03 srv02 kernel:  kthread+0xef/0x120
May 02 01:23:03 srv02 kernel:  ? __pfx_kthread+0x10/0x10
May 02 01:23:03 srv02 kernel:  ret_from_fork+0x44/0x70
May 02 01:23:03 srv02 kernel:  ? __pfx_kthread+0x10/0x10
May 02 01:23:03 srv02 kernel:  ret_from_fork_asm+0x1b/0x30
May 02 01:23:03 srv02 kernel:  </TASK>
May 02 01:25:01 srv02 CRON[368604]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
May 02 01:25:01 srv02 CRON[368605]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
May 02 01:25:01 srv02 CRON[368604]: pam_unix(cron:session): session closed for user root
May 02 01:25:23 srv02 kernel: INFO: task z_wr_int_0:629 blocked for more than 122 seconds.
May 02 01:25:23 srv02 kernel:       Tainted: P           O       6.8.4-2-pve #1
May 02 01:25:23 srv02 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 02 01:25:23 srv02 kernel: task:z_wr_int_0      state:D stack:0     pid:629   tgid:629   ppid:2      flags:0x00004000
May 02 01:25:23 srv02 kernel: Call Trace:
May 02 01:25:23 srv02 kernel:  <TASK>
May 02 01:25:23 srv02 kernel:  __schedule+0x401/0x15e0
May 02 01:25:23 srv02 kernel:  ? __slab_free+0xdf/0x310
May 02 01:25:23 srv02 kernel:  schedule+0x33/0x110
May 02 01:25:23 srv02 kernel:  schedule_preempt_disabled+0x15/0x30
May 02 01:25:23 srv02 kernel:  __mutex_lock.constprop.0+0x3f8/0x7a0
May 02 01:25:23 srv02 kernel:  __mutex_lock_slowpath+0x13/0x20
May 02 01:25:23 srv02 kernel:  mutex_lock+0x3c/0x50
May 02 01:25:23 srv02 kernel:  buf_hash_insert+0x56/0x1a0 [zfs]
May 02 01:25:23 srv02 kernel:  arc_write_done+0x153/0x550 [zfs]
May 02 01:25:23 srv02 kernel:  zio_done+0x289/0x10b0 [zfs]
May 02 01:25:23 srv02 kernel:  zio_execute+0x88/0x130 [zfs]
May 02 01:25:23 srv02 kernel:  taskq_thread+0x27f/0x490 [spl]
May 02 01:25:23 srv02 kernel:  ? __pfx_default_wake_function+0x10/0x10
May 02 01:25:23 srv02 kernel:  ? __pfx_zio_execute+0x10/0x10 [zfs]
May 02 01:25:23 srv02 kernel:  ? __pfx_taskq_thread+0x10/0x10 [spl]
May 02 01:25:23 srv02 kernel:  kthread+0xef/0x120
May 02 01:25:23 srv02 kernel:  ? __pfx_kthread+0x10/0x10
May 02 01:25:23 srv02 kernel:  ret_from_fork+0x44/0x70
May 02 01:25:23 srv02 kernel:  ? __pfx_kthread+0x10/0x10
May 02 01:25:23 srv02 kernel:  ret_from_fork_asm+0x1b/0x30
May 02 01:25:23 srv02 kernel:  </TASK>
 
Code:
May 02 01:25:23 srv02 kernel: INFO: task txg_sync:765 blocked for more than 122 seconds.
May 02 01:25:23 srv02 kernel:       Tainted: P           O       6.8.4-2-pve #1
May 02 01:25:23 srv02 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 02 01:25:23 srv02 kernel: task:txg_sync        state:D stack:0     pid:765   tgid:765   ppid:2      flags:0x00004000
May 02 01:25:23 srv02 kernel: Call Trace:
May 02 01:25:23 srv02 kernel:  <TASK>
May 02 01:25:23 srv02 kernel:  __schedule+0x401/0x15e0
May 02 01:25:23 srv02 kernel:  schedule+0x33/0x110
May 02 01:25:23 srv02 kernel:  schedule_timeout+0x95/0x170
May 02 01:25:23 srv02 kernel:  ? __pfx_process_timeout+0x10/0x10
May 02 01:25:23 srv02 kernel:  io_schedule_timeout+0x51/0x80
May 02 01:25:23 srv02 kernel:  __cv_timedwait_common+0x140/0x180 [spl]
May 02 01:25:23 srv02 kernel:  ? __pfx_autoremove_wake_function+0x10/0x10
May 02 01:25:23 srv02 kernel:  __cv_timedwait_io+0x19/0x30 [spl]
May 02 01:25:23 srv02 kernel:  zio_wait+0x13a/0x2c0 [zfs]
May 02 01:25:23 srv02 kernel:  dsl_pool_sync+0xce/0x4e0 [zfs]
May 02 01:25:23 srv02 kernel:  spa_sync+0x578/0x1030 [zfs]
May 02 01:25:23 srv02 kernel:  ? spa_txg_history_init_io+0x120/0x130 [zfs]
May 02 01:25:23 srv02 kernel:  txg_sync_thread+0x1fd/0x390 [zfs]
May 02 01:25:23 srv02 kernel:  ? __pfx_txg_sync_thread+0x10/0x10 [zfs]
May 02 01:25:23 srv02 kernel:  ? __pfx_thread_generic_wrapper+0x10/0x10 [spl]
May 02 01:25:23 srv02 kernel:  thread_generic_wrapper+0x5c/0x70 [spl]
May 02 01:25:23 srv02 kernel:  kthread+0xef/0x120
May 02 01:25:23 srv02 kernel:  ? __pfx_kthread+0x10/0x10
May 02 01:25:23 srv02 kernel:  ret_from_fork+0x44/0x70
May 02 01:25:23 srv02 kernel:  ? __pfx_kthread+0x10/0x10
May 02 01:25:23 srv02 kernel:  ret_from_fork_asm+0x1b/0x30
May 02 01:25:23 srv02 kernel:  </TASK>
May 02 01:25:23 srv02 kernel: INFO: task z_wr_int_4:803 blocked for more than 122 seconds.
May 02 01:25:23 srv02 kernel:       Tainted: P           O       6.8.4-2-pve #1
May 02 01:25:23 srv02 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 02 01:25:23 srv02 kernel: task:z_wr_int_4      state:D stack:0     pid:803   tgid:803   ppid:2      flags:0x00004000
May 02 01:25:23 srv02 kernel: Call Trace:
May 02 01:25:23 srv02 kernel:  <TASK>
May 02 01:25:23 srv02 kernel:  __schedule+0x401/0x15e0
May 02 01:25:23 srv02 kernel:  schedule+0x33/0x110
May 02 01:25:23 srv02 kernel:  spl_panic+0x112/0x120 [spl]
May 02 01:25:23 srv02 kernel:  arc_write_done+0x44f/0x550 [zfs]
May 02 01:25:23 srv02 kernel:  zio_done+0x289/0x10b0 [zfs]
May 02 01:25:23 srv02 kernel:  zio_execute+0x88/0x130 [zfs]
May 02 01:25:23 srv02 kernel:  taskq_thread+0x27f/0x490 [spl]
May 02 01:25:23 srv02 kernel:  ? __pfx_default_wake_function+0x10/0x10
May 02 01:25:23 srv02 kernel:  ? __pfx_zio_execute+0x10/0x10 [zfs]
May 02 01:25:23 srv02 kernel:  ? __pfx_taskq_thread+0x10/0x10 [spl]
May 02 01:25:23 srv02 kernel:  kthread+0xef/0x120
May 02 01:25:23 srv02 kernel:  ? __pfx_kthread+0x10/0x10
May 02 01:25:23 srv02 kernel:  ret_from_fork+0x44/0x70
May 02 01:25:23 srv02 kernel:  ? __pfx_kthread+0x10/0x10
May 02 01:25:23 srv02 kernel:  ret_from_fork_asm+0x1b/0x30
May 02 01:25:23 srv02 kernel:  </TASK>
May 02 01:27:26 srv02 kernel: INFO: task z_wr_int_0:629 blocked for more than 245 seconds.
May 02 01:27:26 srv02 kernel:       Tainted: P           O       6.8.4-2-pve #1
May 02 01:27:26 srv02 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 02 01:27:26 srv02 kernel: task:z_wr_int_0      state:D stack:0     pid:629   tgid:629   ppid:2      flags:0x00004000
May 02 01:27:26 srv02 kernel: Call Trace:
May 02 01:27:26 srv02 kernel:  <TASK>
May 02 01:27:26 srv02 kernel:  __schedule+0x401/0x15e0
May 02 01:27:26 srv02 kernel:  ? __slab_free+0xdf/0x310
May 02 01:27:26 srv02 kernel:  schedule+0x33/0x110
May 02 01:27:26 srv02 kernel:  schedule_preempt_disabled+0x15/0x30
May 02 01:27:26 srv02 kernel:  __mutex_lock.constprop.0+0x3f8/0x7a0
May 02 01:27:26 srv02 kernel:  __mutex_lock_slowpath+0x13/0x20
May 02 01:27:26 srv02 kernel:  mutex_lock+0x3c/0x50
May 02 01:27:26 srv02 kernel:  buf_hash_insert+0x56/0x1a0 [zfs]
May 02 01:27:26 srv02 kernel:  arc_write_done+0x153/0x550 [zfs]
May 02 01:27:26 srv02 kernel:  zio_done+0x289/0x10b0 [zfs]
May 02 01:27:26 srv02 kernel:  zio_execute+0x88/0x130 [zfs]
May 02 01:27:26 srv02 kernel:  taskq_thread+0x27f/0x490 [spl]
May 02 01:27:26 srv02 kernel:  ? __pfx_default_wake_function+0x10/0x10
May 02 01:27:26 srv02 kernel:  ? __pfx_zio_execute+0x10/0x10 [zfs]
May 02 01:27:26 srv02 kernel:  ? __pfx_taskq_thread+0x10/0x10 [spl]
May 02 01:27:26 srv02 kernel:  kthread+0xef/0x120
May 02 01:27:26 srv02 kernel:  ? __pfx_kthread+0x10/0x10
May 02 01:27:26 srv02 kernel:  ret_from_fork+0x44/0x70
May 02 01:27:26 srv02 kernel:  ? __pfx_kthread+0x10/0x10
May 02 01:27:26 srv02 kernel:  ret_from_fork_asm+0x1b/0x30
May 02 01:27:26 srv02 kernel:  </TASK>
 
Code:
May 02 01:27:26 srv02 kernel: INFO: task txg_sync:765 blocked for more than 245 seconds.
May 02 01:27:26 srv02 kernel:       Tainted: P           O       6.8.4-2-pve #1
May 02 01:27:26 srv02 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 02 01:27:26 srv02 kernel: task:txg_sync        state:D stack:0     pid:765   tgid:765   ppid:2      flags:0x00004000
May 02 01:27:26 srv02 kernel: Call Trace:
May 02 01:27:26 srv02 kernel:  <TASK>
May 02 01:27:26 srv02 kernel:  __schedule+0x401/0x15e0
May 02 01:27:26 srv02 kernel:  schedule+0x33/0x110
May 02 01:27:26 srv02 kernel:  schedule_timeout+0x95/0x170
May 02 01:27:26 srv02 kernel:  ? __pfx_process_timeout+0x10/0x10
May 02 01:27:26 srv02 kernel:  io_schedule_timeout+0x51/0x80
May 02 01:27:26 srv02 kernel:  __cv_timedwait_common+0x140/0x180 [spl]
May 02 01:27:26 srv02 kernel:  ? __pfx_autoremove_wake_function+0x10/0x10
May 02 01:27:26 srv02 kernel:  __cv_timedwait_io+0x19/0x30 [spl]
May 02 01:27:26 srv02 kernel:  zio_wait+0x13a/0x2c0 [zfs]
May 02 01:27:26 srv02 kernel:  dsl_pool_sync+0xce/0x4e0 [zfs]
May 02 01:27:26 srv02 kernel:  spa_sync+0x578/0x1030 [zfs]
May 02 01:27:26 srv02 kernel:  ? spa_txg_history_init_io+0x120/0x130 [zfs]
May 02 01:27:26 srv02 kernel:  txg_sync_thread+0x1fd/0x390 [zfs]
May 02 01:27:26 srv02 kernel:  ? __pfx_txg_sync_thread+0x10/0x10 [zfs]
May 02 01:27:26 srv02 kernel:  ? __pfx_thread_generic_wrapper+0x10/0x10 [spl]
May 02 01:27:26 srv02 kernel:  thread_generic_wrapper+0x5c/0x70 [spl]
May 02 01:27:26 srv02 kernel:  kthread+0xef/0x120
May 02 01:27:26 srv02 kernel:  ? __pfx_kthread+0x10/0x10
May 02 01:27:26 srv02 kernel:  ret_from_fork+0x44/0x70
May 02 01:27:26 srv02 kernel:  ? __pfx_kthread+0x10/0x10
May 02 01:27:26 srv02 kernel:  ret_from_fork_asm+0x1b/0x30
May 02 01:27:26 srv02 kernel:  </TASK>
May 02 01:27:26 srv02 kernel: INFO: task z_wr_int_4:803 blocked for more than 245 seconds.
May 02 01:27:26 srv02 kernel:       Tainted: P           O       6.8.4-2-pve #1
May 02 01:27:26 srv02 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 02 01:27:26 srv02 kernel: task:z_wr_int_4      state:D stack:0     pid:803   tgid:803   ppid:2      flags:0x00004000
May 02 01:27:26 srv02 kernel: Call Trace:
May 02 01:27:26 srv02 kernel:  <TASK>
May 02 01:27:26 srv02 kernel:  __schedule+0x401/0x15e0
May 02 01:27:26 srv02 kernel:  schedule+0x33/0x110
May 02 01:27:26 srv02 kernel:  spl_panic+0x112/0x120 [spl]
May 02 01:27:26 srv02 kernel:  arc_write_done+0x44f/0x550 [zfs]
May 02 01:27:26 srv02 kernel:  zio_done+0x289/0x10b0 [zfs]
May 02 01:27:26 srv02 kernel:  zio_execute+0x88/0x130 [zfs]
May 02 01:27:26 srv02 kernel:  taskq_thread+0x27f/0x490 [spl]
May 02 01:27:26 srv02 kernel:  ? __pfx_default_wake_function+0x10/0x10
May 02 01:27:26 srv02 kernel:  ? __pfx_zio_execute+0x10/0x10 [zfs]
May 02 01:27:26 srv02 kernel:  ? __pfx_taskq_thread+0x10/0x10 [spl]
May 02 01:27:26 srv02 kernel:  kthread+0xef/0x120
May 02 01:27:26 srv02 kernel:  ? __pfx_kthread+0x10/0x10
May 02 01:27:26 srv02 kernel:  ret_from_fork+0x44/0x70
May 02 01:27:26 srv02 kernel:  ? __pfx_kthread+0x10/0x10
May 02 01:27:26 srv02 kernel:  ret_from_fork_asm+0x1b/0x30
May 02 01:27:26 srv02 kernel:  </TASK>
May 02 01:27:26 srv02 kernel: INFO: task zvol:2798 blocked for more than 122 seconds.
May 02 01:27:26 srv02 kernel:       Tainted: P           O       6.8.4-2-pve #1
May 02 01:27:26 srv02 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 02 01:27:26 srv02 kernel: task:zvol            state:D stack:0     pid:2798  tgid:2798  ppid:2      flags:0x00004000
May 02 01:27:26 srv02 kernel: Call Trace:
May 02 01:27:26 srv02 kernel:  <TASK>
May 02 01:27:26 srv02 kernel:  __schedule+0x401/0x15e0
May 02 01:27:26 srv02 kernel:  ? __cv_init+0x42/0x70 [spl]
May 02 01:27:26 srv02 kernel:  ? spl_kmem_cache_alloc+0xad/0x680 [spl]
May 02 01:27:26 srv02 kernel:  schedule+0x33/0x110
May 02 01:27:26 srv02 kernel:  cv_wait_common+0x109/0x140 [spl]
May 02 01:27:26 srv02 kernel:  ? __pfx_autoremove_wake_function+0x10/0x10
May 02 01:27:26 srv02 kernel:  __cv_wait+0x15/0x30 [spl]
May 02 01:27:26 srv02 kernel:  dbuf_read+0x3eb/0x620 [zfs]
May 02 01:27:26 srv02 kernel:  dmu_tx_check_ioerr+0xa0/0x110 [zfs]
May 02 01:27:26 srv02 kernel:  dmu_tx_count_write+0xe2/0x1d0 [zfs]
May 02 01:27:26 srv02 kernel:  ? dmu_tx_hold_dnode_impl+0x57/0x130 [zfs]
May 02 01:27:26 srv02 kernel:  dmu_tx_hold_write_by_dnode+0x3a/0x60 [zfs]
May 02 01:27:26 srv02 kernel:  zvol_write+0x223/0x670 [zfs]
May 02 01:27:26 srv02 kernel:  zvol_write_task+0x12/0x30 [zfs]
May 02 01:27:26 srv02 kernel:  taskq_thread+0x27f/0x490 [spl]
May 02 01:27:26 srv02 kernel:  ? __pfx_default_wake_function+0x10/0x10
May 02 01:27:26 srv02 kernel:  ? __pfx_zvol_write_task+0x10/0x10 [zfs]
May 02 01:27:26 srv02 kernel:  ? __pfx_taskq_thread+0x10/0x10 [spl]
May 02 01:27:26 srv02 kernel:  kthread+0xef/0x120
May 02 01:27:26 srv02 kernel:  ? __pfx_kthread+0x10/0x10
May 02 01:27:26 srv02 kernel:  ret_from_fork+0x44/0x70
May 02 01:27:26 srv02 kernel:  ? __pfx_kthread+0x10/0x10
May 02 01:27:26 srv02 kernel:  ret_from_fork_asm+0x1b/0x30
May 02 01:27:26 srv02 kernel:  </TASK>
May 02 01:27:26 srv02 kernel: INFO: task zvol:2799 blocked for more than 122 seconds.
May 02 01:27:26 srv02 kernel:       Tainted: P           O       6.8.4-2-pve #1
May 02 01:27:26 srv02 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 02 01:27:26 srv02 kernel: task:zvol            state:D stack:0     pid:2799  tgid:2799  ppid:2      flags:0x00004000
May 02 01:27:26 srv02 kernel: Call Trace:
May 02 01:27:26 srv02 kernel:  <TASK>
May 02 01:27:26 srv02 kernel:  __schedule+0x401/0x15e0
May 02 01:27:26 srv02 kernel:  ? mutex_lock+0x12/0x50
May 02 01:27:26 srv02 kernel:  schedule+0x33/0x110
May 02 01:27:26 srv02 kernel:  schedule_preempt_disabled+0x15/0x30
May 02 01:27:26 srv02 kernel:  __mutex_lock.constprop.0+0x3f8/0x7a0
May 02 01:27:26 srv02 kernel:  __mutex_lock_slowpath+0x13/0x20
May 02 01:27:26 srv02 kernel:  mutex_lock+0x3c/0x50
May 02 01:27:26 srv02 kernel:  buf_hash_find+0x80/0x140 [zfs]
May 02 01:27:26 srv02 kernel:  arc_read+0x513/0x17c0 [zfs]
May 02 01:27:26 srv02 kernel:  ? __pfx_dbuf_read_done+0x10/0x10 [zfs]
May 02 01:27:26 srv02 kernel:  ? __mutex_lock.constprop.0+0x6d6/0x7a0
May 02 01:27:26 srv02 kernel:  ? raw_spin_rq_unlock+0x10/0x40
May 02 01:27:26 srv02 kernel:  dbuf_read_impl.constprop.0+0x57b/0x890 [zfs]
May 02 01:27:26 srv02 kernel:  ? mutex_lock+0x12/0x50
May 02 01:27:26 srv02 kernel:  dbuf_read+0xf3/0x620 [zfs]
May 02 01:27:26 srv02 kernel:  dmu_tx_check_ioerr+0xa0/0x110 [zfs]
May 02 01:27:26 srv02 kernel:  dmu_tx_count_write+0x1b6/0x1d0 [zfs]
May 02 01:27:26 srv02 kernel:  dmu_tx_hold_write_by_dnode+0x3a/0x60 [zfs]
May 02 01:27:26 srv02 kernel:  zvol_write+0x223/0x670 [zfs]
May 02 01:27:26 srv02 kernel:  zvol_write_task+0x12/0x30 [zfs]
May 02 01:27:26 srv02 kernel:  taskq_thread+0x27f/0x490 [spl]
May 02 01:27:26 srv02 kernel:  ? __pfx_default_wake_function+0x10/0x10
May 02 01:27:26 srv02 kernel:  ? __pfx_zvol_write_task+0x10/0x10 [zfs]
May 02 01:27:26 srv02 kernel:  ? __pfx_taskq_thread+0x10/0x10 [spl]
May 02 01:27:26 srv02 kernel:  kthread+0xef/0x120
May 02 01:27:26 srv02 kernel:  ? __pfx_kthread+0x10/0x10
May 02 01:27:26 srv02 kernel:  ret_from_fork+0x44/0x70
May 02 01:27:26 srv02 kernel:  ? __pfx_kthread+0x10/0x10
May 02 01:27:26 srv02 kernel:  ret_from_fork_asm+0x1b/0x30
May 02 01:27:26 srv02 kernel:  </TASK>
May 02 01:27:26 srv02 kernel: INFO: task zvol:2806 blocked for more than 122 seconds.
May 02 01:27:26 srv02 kernel:       Tainted: P           O       6.8.4-2-pve #1
May 02 01:27:26 srv02 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 02 01:27:26 srv02 kernel: task:zvol            state:D stack:0     pid:2806  tgid:2806  ppid:2      flags:0x00004000
May 02 01:27:26 srv02 kernel: Call Trace:
May 02 01:27:26 srv02 kernel:  <TASK>
May 02 01:27:26 srv02 kernel:  __schedule+0x401/0x15e0
May 02 01:27:26 srv02 kernel:  schedule+0x33/0x110
May 02 01:27:26 srv02 kernel:  cv_wait_common+0x109/0x140 [spl]
May 02 01:27:26 srv02 kernel:  ? __pfx_autoremove_wake_function+0x10/0x10
May 02 01:27:26 srv02 kernel:  __cv_wait+0x15/0x30 [spl]
May 02 01:27:26 srv02 kernel:  dbuf_read+0x3eb/0x620 [zfs]
May 02 01:27:26 srv02 kernel:  dmu_tx_check_ioerr+0xa0/0x110 [zfs]
May 02 01:27:26 srv02 kernel:  dmu_tx_count_write+0xe2/0x1d0 [zfs]
May 02 01:27:26 srv02 kernel:  ? dmu_tx_hold_dnode_impl+0x57/0x130 [zfs]
May 02 01:27:26 srv02 kernel:  dmu_tx_hold_write_by_dnode+0x3a/0x60 [zfs]
May 02 01:27:26 srv02 kernel:  zvol_write+0x223/0x670 [zfs]
May 02 01:27:26 srv02 kernel:  zvol_write_task+0x12/0x30 [zfs]
May 02 01:27:26 srv02 kernel:  taskq_thread+0x27f/0x490 [spl]
May 02 01:27:26 srv02 kernel:  ? __pfx_default_wake_function+0x10/0x10
May 02 01:27:26 srv02 kernel:  ? __pfx_zvol_write_task+0x10/0x10 [zfs]
May 02 01:27:26 srv02 kernel:  ? __pfx_taskq_thread+0x10/0x10 [spl]
May 02 01:27:26 srv02 kernel:  kthread+0xef/0x120
May 02 01:27:26 srv02 kernel:  ? __pfx_kthread+0x10/0x10
May 02 01:27:26 srv02 kernel:  ret_from_fork+0x44/0x70
May 02 01:27:26 srv02 kernel:  ? __pfx_kthread+0x10/0x10
May 02 01:27:26 srv02 kernel:  ret_from_fork_asm+0x1b/0x30
May 02 01:27:26 srv02 kernel:  </TASK>
May 02 01:27:26 srv02 kernel: INFO: task zvol:3100 blocked for more than 122 seconds.
May 02 01:27:26 srv02 kernel:       Tainted: P           O       6.8.4-2-pve #1
May 02 01:27:26 srv02 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 02 01:27:26 srv02 kernel: task:zvol            state:D stack:0     pid:3100  tgid:3100  ppid:2      flags:0x00004000
May 02 01:27:26 srv02 kernel: Call Trace:
May 02 01:27:26 srv02 kernel:  <TASK>
May 02 01:27:26 srv02 kernel:  __schedule+0x401/0x15e0
May 02 01:27:26 srv02 kernel:  ? __alloc_pages+0x251/0x1320
May 02 01:27:26 srv02 kernel:  schedule+0x33/0x110
May 02 01:27:26 srv02 kernel:  schedule_preempt_disabled+0x15/0x30
May 02 01:27:26 srv02 kernel:  __mutex_lock.constprop.0+0x3f8/0x7a0
May 02 01:27:26 srv02 kernel:  __mutex_lock_slowpath+0x13/0x20
May 02 01:27:26 srv02 kernel:  mutex_lock+0x3c/0x50
May 02 01:27:26 srv02 kernel:  buf_hash_find+0x80/0x140 [zfs]
May 02 01:27:26 srv02 kernel:  arc_read+0x513/0x17c0 [zfs]
May 02 01:27:26 srv02 kernel:  ? __pfx_dbuf_read_done+0x10/0x10 [zfs]
May 02 01:27:26 srv02 kernel:  dbuf_read_impl.constprop.0+0x57b/0x890 [zfs]
May 02 01:27:26 srv02 kernel:  ? mutex_lock+0x12/0x50
May 02 01:27:26 srv02 kernel:  dbuf_read+0xf3/0x620 [zfs]
May 02 01:27:26 srv02 kernel:  dmu_tx_check_ioerr+0xa0/0x110 [zfs]
May 02 01:27:26 srv02 kernel:  dmu_tx_count_write+0x1b6/0x1d0 [zfs]
May 02 01:27:26 srv02 kernel:  dmu_tx_hold_write_by_dnode+0x3a/0x60 [zfs]
May 02 01:27:26 srv02 kernel:  zvol_write+0x223/0x670 [zfs]
May 02 01:27:26 srv02 kernel:  zvol_write_task+0x12/0x30 [zfs]
May 02 01:27:26 srv02 kernel:  taskq_thread+0x27f/0x490 [spl]
May 02 01:27:26 srv02 kernel:  ? __pfx_default_wake_function+0x10/0x10
May 02 01:27:26 srv02 kernel:  ? __pfx_zvol_write_task+0x10/0x10 [zfs]
May 02 01:27:26 srv02 kernel:  ? __pfx_taskq_thread+0x10/0x10 [spl]
May 02 01:27:26 srv02 kernel:  kthread+0xef/0x120
May 02 01:27:26 srv02 kernel:  ? __pfx_kthread+0x10/0x10
May 02 01:27:26 srv02 kernel:  ret_from_fork+0x44/0x70
May 02 01:27:26 srv02 kernel:  ? __pfx_kthread+0x10/0x10
May 02 01:27:26 srv02 kernel:  ret_from_fork_asm+0x1b/0x30
May 02 01:27:26 srv02 kernel:  </TASK>
May 02 01:27:26 srv02 kernel: Future hung task reports are suppressed, see sysctl kernel.hung_task_warnings
May 02 01:28:16 srv02 pvestatd[2382]: VM 108 qmp command failed - VM 108 qmp command 'query-proxmox-support' failed - got timeout
May 02 01:28:16 srv02 pvestatd[2382]: status update time (8.283 seconds)
May 02 01:28:26 srv02 pvestatd[2382]: VM 108 qmp command failed - VM 108 qmp command 'query-proxmox-support' failed - unable to connect to VM 108 qmp socket - timeout after 51 retries
May 02 01:28:27 srv02 pvestatd[2382]: status update time (8.300 seconds)
[...]
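
Reading the reports together: z_wr_int_4 (PID 803) is the thread that hit the VERIFY3 panic and is now parked in spl_panic, while the other tasks appear to be waiting in mutex_lock under buf_hash_find/buf_hash_insert or in zio_wait, so they look like a consequence of the panic rather than separate problems. A small sketch for summarising which tasks were reported as hung in a log file (the helper name is made up):

```shell
#!/bin/sh
# Count how often each task name shows up in hung-task reports.
# hung_task_summary is a hypothetical helper name.
hung_task_summary() {
    sed -n 's/.*INFO: task \([^:]*\):[0-9]* blocked for more than.*/\1/p' |
        sort | uniq -c | sort -rn | awk '{ print $2 ": " $1 }'
}

# Three lines copied from the log above.
printf '%s\n' \
    'May 02 01:25:23 srv02 kernel: INFO: task z_wr_int_0:629 blocked for more than 122 seconds.' \
    'May 02 01:25:23 srv02 kernel: INFO: task txg_sync:765 blocked for more than 122 seconds.' \
    'May 02 01:27:26 srv02 kernel: INFO: task z_wr_int_0:629 blocked for more than 245 seconds.' |
    hung_task_summary

# On a saved log or the previous boot (not executed here):
#   hung_task_summary < CrashMay01.txt
#   journalctl -k -b -1 --no-pager | hung_task_summary
```

Because the kernel suppresses further hung-task reports after a few messages (see the kernel.hung_task_warnings line above), the counts are a lower bound.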
 
During today's crash I was able to check "top".
The kvm processes show >100% CPU load.

1714624041088.png

Yesterday I set "Async IO" on all virtual disks to aio=threads.
I will now change the disk cache from writeback to none (cache=none) on all VMs.

@freebee, FYI
 
Probably not your problem, since the error seems to be ARC related, but did you try the new BIOS that follows the Intel Baseline Profile? That could explain freezes under high load.
 
As far as I know, only K and KS CPUs are affected. I have an i9-13900 (non-K).
I disabled ARC using primarycache=none, so I think ZFS/ARC is not the root cause.
I cannot update the BIOS because it is a root server (EX101) from Hetzner.

Here is a German thread that looks quite similar; I have the same issues.
https://forum.proxmox.com/threads/proxmox-freeze-nach-kernel-update-to-6-8-4-2-pve.145920/
One small difference is that I also had issues with kernel 6.5.13-x before.
Now I'm on 6.8.4-2, which may have fixed some issues and introduced new ones.


@Bierfassl, I forgot to mention you before.
 
I am also having issues, on a Xeon Scalable Gen1. I don't have any good traces at this time, but when mine dies, the server doesn't freeze. It seems like the NICs (Intel 500 Series) lose carrier. Then my whole dashboard just shows blank machines. I went back to the 6.5 kernel, and the problem is gone.
 
@benyamin, did you also use this?
If you meant aio=threads, then yes. You also need iothread=1.

As for cache=none (default), I rely on what the physical disk controller provides. In my analysis, the performance improvement of also enabling various QEMU cache types was negligible. It can also make troubleshooting difficult by shifting the problem around and can cause host memory pressure and undesirable paging IO. Stefan speaks to this in https://bugzilla.kernel.org/show_bug.cgi?id=199727#c12 and https://bugzilla.kernel.org/show_bug.cgi?id=199727#c16.

Are you using any other SCSI Controller types on any other VMs, or are they all VirtIO SCSI single?

It might be helpful to post the contents of an example VM configuration from its *.conf file in /etc/pve/qemu-server/.
It might also be informative to grab the PID of one of the affected kvm processes from top and post the output of ps aux | grep <pid>.
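
A rough sketch of how that information could be collected. The helper name vm_disk_lines is made up, and the sample config is entirely hypothetical; only the VM ID 108 comes from the pvestatd messages earlier in the thread:

```shell
#!/bin/sh
# Print the controller type and disk lines of a Proxmox VM config file,
# so that aio=, cache= and iothread= settings are visible at a glance.
# vm_disk_lines is a hypothetical helper name.
vm_disk_lines() {
    grep -E '^(scsihw|(scsi|virtio|sata|ide)[0-9]+):' "$1"
}

# Demo on a made-up config; every value below is invented.
sample=$(mktemp)
cat > "$sample" <<'EOF'
cores: 4
memory: 8192
scsihw: virtio-scsi-single
scsi0: local-zfs:vm-108-disk-0,aio=threads,iothread=1,size=32G
net0: virtio=AA:BB:CC:DD:EE:FF,bridge=vmbr0
EOF
vm_disk_lines "$sample"
rm -f "$sample"

# On the host (not executed here; <pid> is taken from top):
#   vm_disk_lines /etc/pve/qemu-server/108.conf
#   ps -o pid,pcpu,pmem,etime,args -p <pid>
```

Options left at their defaults do not appear in the config line, so a missing cache= entry is expected when the default is in use.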
 