pve host summary shows 100% io, but iotop shows only low io

Kodey

Member
Oct 26, 2021
The summary data for the host shows IO delay at nearly 100% constantly, but iotop doesn't show any significant IO.
All VMs and CTs are stopped and all external drives are disconnected, but the summary still shows an almost solid block of blue.
Memory usage is down to about 24% and CPU hardly tops 1%.
I've stopped corosync and all ZFS backups, but there's no change.
I've tried to find out where PVE gets its summary info for IO delay, and stopping rrdcached makes no difference either.
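For reference, the IO delay figure in the graphs is presumably just the kernel's iowait counter, so it can be watched outside the GUI as well (a quick sketch; I'm assuming the summary tracks /proc/stat rather than something else):
Code:
# iowait is the 5th value after "cpu" in /proc/stat (user nice system idle iowait ...)
awk '/^cpu /{print "iowait jiffies:", $6}' /proc/stat
# or just watch the "wa" column:
vmstat 5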
None of the zpools report any problems.
iotop reports a little intermittent IO from rrdcached, pmxcfs, a pveproxy worker, systemd-journald, [zvol], [z_rd_int_0], and iotop itself.
When I bring up the VMs and CTs, everything seems to function without problems.
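Since IO delay is really iowait, my understanding is that even a single task stuck in uninterruptible sleep can pin it near 100% with almost no throughput, so listing D-state tasks might point at the culprit (assuming ps uses D for uninterruptible sleep, which I believe it does):
Code:
# list tasks stuck in uninterruptible sleep (state D)
ps -eo state,pid,comm | awk '$1 == "D"'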

Can anyone suggest a way to find out what's going on?
 
I hadn't tried that:
Code:
$ zpool iostat 5
              capacity     operations     bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
backuppool  81.5G  1.73T      0      0    983  8.07K
rpool        250G  1.57T     51     12  4.34M   197K
zfs10-pool  9.09T  5.44T    337     27   262M   509K
----------  -----  -----  -----  -----  -----  -----
backuppool  81.5G  1.73T      0      0      0      0
rpool        250G  1.57T      0      0      0      0
zfs10-pool  9.09T  5.44T      0     47      0  1.12M
----------  -----  -----  -----  -----  -----  -----
backuppool  81.5G  1.73T      0      0      0      0
rpool        250G  1.57T      0      0      0      0
zfs10-pool  9.09T  5.44T      0     38      0   983K
----------  -----  -----  -----  -----  -----  -----
backuppool  81.5G  1.73T      0      0      0      0
rpool        250G  1.57T      0      0      0      0
zfs10-pool  9.09T  5.44T      0     31      0   720K
----------  -----  -----  -----  -----  -----  -----
backuppool  81.5G  1.73T      0      0      0      0
rpool        250G  1.57T      0      0      0      0
zfs10-pool  9.09T  5.44T      0     32      0   403K
----------  -----  -----  -----  -----  -----  -----
backuppool  81.5G  1.73T      0      0      0      0
rpool        250G  1.57T      0      0      0      0
zfs10-pool  9.09T  5.44T      0     34      0   859K
----------  -----  -----  -----  -----  -----  -----

I'm not sure how that helps.
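For completeness, per-device utilisation could also be watched with iostat from the sysstat package (which may need installing first); I haven't captured that output here:
Code:
# extended per-device stats every 5 seconds; %util and await should show
# whether any single disk is actually saturated
iostat -x 5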

I started up a Windows VM to create this thread and it was working fine. When I ran this command from another PC there was no problem, but once I tried to wake the VM to respond to you, it didn't wake up. I used pveproxy to stop the VM, and the host became unstable and stopped responding. I got this from the logs before the connection was lost:
Code:
Nov 17 00:21:21 pmhost pvedaemon[1167386]: stop VM 101: UPID:pmhost:0011D01A:006F74CD:65562561:qmstop:101:root@pam:
Nov 17 00:21:21 pmhost pvedaemon[753850]: <root@pam> starting task UPID:pmhost:0011D01A:006F74CD:65562561:qmstop:101:root@pam:
Nov 17 00:21:29 pmhost pvedaemon[742182]: VM 101 qmp command failed - VM 101 qmp command 'guest-ping' failed - got timeout
Nov 17 00:21:48 pmhost pvedaemon[730433]: VM 101 qmp command failed - VM 101 qmp command 'guest-ping' failed - got timeout
Nov 17 00:21:51 pmhost pveproxy[1160311]: proxy detected vanished client connection
Nov 17 00:21:51 pmhost pvedaemon[1167386]: VM still running - terminating now with SIGTERM
Nov 17 00:21:51 pmhost pveproxy[1160311]: proxy detected vanished client connection
Nov 17 00:22:01 pmhost CRON[1168036]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Nov 17 00:22:01 pmhost CRON[1168037]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Nov 17 00:22:01 pmhost pvedaemon[1167386]: VM still running - terminating now with SIGKILL
Nov 17 00:23:01 pmhost CRON[1169089]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Nov 17 00:23:01 pmhost CRON[1169090]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)

One of the containers, which isn't using only zfs10-pool, still runs normally, serving media.
 
When rebooting, I see this message:
Code:
kernel:[   13.874001] PANIC: zfs: adding existent segment to range tree (offset=9ee8d6d000 size=1540000)
It boots, and zpool status -x shows all pools are healthy.
Once booted, pveproxy shows IO delay as normal, and the logs report several messages like this:
Code:
Nov 17 00:51:40 pmhost kernel: INFO: task metaslab_group_:1762 blocked for more than 120 seconds.
Nov 17 00:51:40 pmhost kernel:       Tainted: P           O       6.2.16-19-pve #1
Nov 17 00:51:40 pmhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 17 00:51:40 pmhost kernel: task:metaslab_group_ state:D stack:0     pid:1762  ppid:2      flags:0x00004000
Nov 17 00:51:40 pmhost kernel: Call Trace:
Nov 17 00:51:40 pmhost kernel:  <TASK>
Nov 17 00:51:40 pmhost kernel:  __schedule+0x402/0x1510
Nov 17 00:51:40 pmhost kernel:  ? default_wake_function+0x1a/0x40
Nov 17 00:51:40 pmhost kernel:  ? srso_alias_return_thunk+0x5/0x7f
Nov 17 00:51:40 pmhost kernel:  ? autoremove_wake_function+0x12/0x50
Nov 17 00:51:40 pmhost kernel:  schedule+0x63/0x110
Nov 17 00:51:40 pmhost kernel:  cv_wait_common+0x109/0x140 [spl]
Nov 17 00:51:40 pmhost kernel:  ? __pfx_autoremove_wake_function+0x10/0x10
Nov 17 00:51:40 pmhost kernel:  __cv_wait+0x15/0x30 [spl]
Nov 17 00:51:40 pmhost kernel:  metaslab_load+0x49/0x9a0 [zfs]
Nov 17 00:51:40 pmhost kernel:  ? srso_alias_return_thunk+0x5/0x7f
Nov 17 00:51:40 pmhost kernel:  ? __wake_up_common_lock+0x8b/0xd0
Nov 17 00:51:40 pmhost kernel:  metaslab_preload+0x54/0xb0 [zfs]
Nov 17 00:51:40 pmhost kernel:  taskq_thread+0x2af/0x4d0 [spl]
Nov 17 00:51:40 pmhost kernel:  ? __pfx_default_wake_function+0x10/0x10
Nov 17 00:51:40 pmhost kernel:  ? __pfx_taskq_thread+0x10/0x10 [spl]
Nov 17 00:51:40 pmhost kernel:  kthread+0xe9/0x110
Nov 17 00:51:40 pmhost kernel:  ? __pfx_kthread+0x10/0x10
Nov 17 00:51:40 pmhost kernel:  ret_from_fork+0x2c/0x50
Nov 17 00:51:40 pmhost kernel:  </TASK>

Is there any way to see what's failing?
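The only non-destructive checks I can think of trying next are a scrub and zdb's metaslab dump (I'm going from memory on the zdb flag, so it may not be exactly right):
Code:
# full scrub of the pool that triggered the panic
zpool scrub zfs10-pool
# dump metaslab / spacemap info without modifying the pool
# (the -m flag is from memory; not certain it's the right one)
zdb -m zfs10-pool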
 
