pve host summary shows 100% io, but iotop shows only low io

Kodey

Member
Oct 26, 2021
The summary data for the host shows IO delay at nearly 100% constantly, but iotop doesn't show any significant IO.
All VMs and CTs are stopped and all external drives are disconnected, but the summary still shows an almost solid block of blue.
Memory usage is down to about 24% and CPU hardly tops 1%.
I've stopped corosync and all ZFS backups, but there's no change.
I've tried to find out where PVE gets its summary info for IO delay, and stopping rrdcached makes no difference either.
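For reference, the IO delay figure in the graphs is presumably just the kernel's iowait counter, so it can be watched outside the GUI as well (a quick sketch; I'm assuming the summary tracks /proc/stat rather than something else):
Code:
# iowait is the 5th value after "cpu" in /proc/stat (user nice system idle iowait ...)
awk '/^cpu /{print "iowait jiffies:", $6}' /proc/stat
# or just watch the "wa" column:
vmstat 5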
None of the zpools report any problems.
iotop reports a little intermittent IO from rrdcached, pmxcfs, a pveproxy worker, systemd-journald, [zvol], [z_rd_int_0], and iotop itself.
When I bring up the VMs and CTs, everything seems to function without problems.
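Since IO delay is really iowait, my understanding is that even a single task stuck in uninterruptible sleep can pin it near 100% with almost no throughput, so listing D-state tasks might point at the culprit (assuming ps uses D for uninterruptible sleep, which I believe it does):
Code:
# list tasks stuck in uninterruptible sleep (state D)
ps -eo state,pid,comm | awk '$1 == "D"'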

Can anyone suggest a way to find out what's going on?
 
I hadn't tried that:
Code:
$ zpool iostat 5
              capacity     operations     bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
backuppool  81.5G  1.73T      0      0    983  8.07K
rpool        250G  1.57T     51     12  4.34M   197K
zfs10-pool  9.09T  5.44T    337     27   262M   509K
----------  -----  -----  -----  -----  -----  -----
backuppool  81.5G  1.73T      0      0      0      0
rpool        250G  1.57T      0      0      0      0
zfs10-pool  9.09T  5.44T      0     47      0  1.12M
----------  -----  -----  -----  -----  -----  -----
backuppool  81.5G  1.73T      0      0      0      0
rpool        250G  1.57T      0      0      0      0
zfs10-pool  9.09T  5.44T      0     38      0   983K
----------  -----  -----  -----  -----  -----  -----
backuppool  81.5G  1.73T      0      0      0      0
rpool        250G  1.57T      0      0      0      0
zfs10-pool  9.09T  5.44T      0     31      0   720K
----------  -----  -----  -----  -----  -----  -----
backuppool  81.5G  1.73T      0      0      0      0
rpool        250G  1.57T      0      0      0      0
zfs10-pool  9.09T  5.44T      0     32      0   403K
----------  -----  -----  -----  -----  -----  -----
backuppool  81.5G  1.73T      0      0      0      0
rpool        250G  1.57T      0      0      0      0
zfs10-pool  9.09T  5.44T      0     34      0   859K
----------  -----  -----  -----  -----  -----  -----

I'm not sure how that helps.
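For completeness, per-device utilisation could also be watched with iostat from the sysstat package (which may need installing first); I haven't captured that output here:
Code:
# extended per-device stats every 5 seconds; %util and await should show
# whether any single disk is actually saturated
iostat -x 5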

I started up a Windows VM to create this thread and it was working fine. When I ran this command from another PC there was no problem, but once I tried to wake the VM to respond to you, it didn't wake up. I used pveproxy to stop the VM, and the host became unstable and stopped responding. I got this from the logs before the connection was lost:
Code:
Nov 17 00:21:21 pmhost pvedaemon[1167386]: stop VM 101: UPID:pmhost:0011D01A:006F74CD:65562561:qmstop:101:root@pam:
Nov 17 00:21:21 pmhost pvedaemon[753850]: <root@pam> starting task UPID:pmhost:0011D01A:006F74CD:65562561:qmstop:101:root@pam:
Nov 17 00:21:29 pmhost pvedaemon[742182]: VM 101 qmp command failed - VM 101 qmp command 'guest-ping' failed - got timeout
Nov 17 00:21:48 pmhost pvedaemon[730433]: VM 101 qmp command failed - VM 101 qmp command 'guest-ping' failed - got timeout
Nov 17 00:21:51 pmhost pveproxy[1160311]: proxy detected vanished client connection
Nov 17 00:21:51 pmhost pvedaemon[1167386]: VM still running - terminating now with SIGTERM
Nov 17 00:21:51 pmhost pveproxy[1160311]: proxy detected vanished client connection
Nov 17 00:22:01 pmhost CRON[1168036]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Nov 17 00:22:01 pmhost CRON[1168037]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Nov 17 00:22:01 pmhost pvedaemon[1167386]: VM still running - terminating now with SIGKILL
Nov 17 00:23:01 pmhost CRON[1169089]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Nov 17 00:23:01 pmhost CRON[1169090]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)

One of the containers, which isn't using only zfs10-pool, still runs normally, serving media.
 
When rebooting, I see this message:
Code:
kernel:[   13.874001] PANIC: zfs: adding existent segment to range tree (offset=9ee8d6d000 size=1540000)
It boots, and zpool status -x shows all pools are healthy.
Once booted, pveproxy shows IO delay as normal, and the logs report several messages like this:
Code:
Nov 17 00:51:40 pmhost kernel: INFO: task metaslab_group_:1762 blocked for more than 120 seconds.
Nov 17 00:51:40 pmhost kernel:       Tainted: P           O       6.2.16-19-pve #1
Nov 17 00:51:40 pmhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 17 00:51:40 pmhost kernel: task:metaslab_group_ state:D stack:0     pid:1762  ppid:2      flags:0x00004000
Nov 17 00:51:40 pmhost kernel: Call Trace:
Nov 17 00:51:40 pmhost kernel:  <TASK>
Nov 17 00:51:40 pmhost kernel:  __schedule+0x402/0x1510
Nov 17 00:51:40 pmhost kernel:  ? default_wake_function+0x1a/0x40
Nov 17 00:51:40 pmhost kernel:  ? srso_alias_return_thunk+0x5/0x7f
Nov 17 00:51:40 pmhost kernel:  ? autoremove_wake_function+0x12/0x50
Nov 17 00:51:40 pmhost kernel:  schedule+0x63/0x110
Nov 17 00:51:40 pmhost kernel:  cv_wait_common+0x109/0x140 [spl]
Nov 17 00:51:40 pmhost kernel:  ? __pfx_autoremove_wake_function+0x10/0x10
Nov 17 00:51:40 pmhost kernel:  __cv_wait+0x15/0x30 [spl]
Nov 17 00:51:40 pmhost kernel:  metaslab_load+0x49/0x9a0 [zfs]
Nov 17 00:51:40 pmhost kernel:  ? srso_alias_return_thunk+0x5/0x7f
Nov 17 00:51:40 pmhost kernel:  ? __wake_up_common_lock+0x8b/0xd0
Nov 17 00:51:40 pmhost kernel:  metaslab_preload+0x54/0xb0 [zfs]
Nov 17 00:51:40 pmhost kernel:  taskq_thread+0x2af/0x4d0 [spl]
Nov 17 00:51:40 pmhost kernel:  ? __pfx_default_wake_function+0x10/0x10
Nov 17 00:51:40 pmhost kernel:  ? __pfx_taskq_thread+0x10/0x10 [spl]
Nov 17 00:51:40 pmhost kernel:  kthread+0xe9/0x110
Nov 17 00:51:40 pmhost kernel:  ? __pfx_kthread+0x10/0x10
Nov 17 00:51:40 pmhost kernel:  ret_from_fork+0x2c/0x50
Nov 17 00:51:40 pmhost kernel:  </TASK>

Is there any way to see what's failing?
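The only non-destructive checks I can think of trying next are a scrub and zdb's metaslab dump (I'm going from memory on the zdb flag, so it may not be exactly right):
Code:
# full scrub of the pool that triggered the panic
zpool scrub zfs10-pool
# dump metaslab / spacemap info without modifying the pool
# (the -m flag is from memory; not certain it's the right one)
zdb -m zfs10-pool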
 
