I'm looking for ideas on tracking down the cause of seemingly random high IO. It happens on varying nodes, lasts anywhere from 30 minutes to a few hours, and then goes away on its own. I thought the problem had gone away with the last large update, but it's back. The only other pattern I see is that it seems to happen on nodes with active VMs.
High IO node in 4 node cluster
top - 16:18:06 up 12 days, 1:01, 1 user, load average: 1380.81, 1351.33, 1279.42
Tasks: 5012 total, 1 running, 5002 sleeping, 0 stopped, 9 zombie
%Cpu(s): 0.2 us, 0.3 sy, 0.0 ni, 8.9 id, 90.5 wa, 0.0 hi, 0.1 si, 0.0 st
MiB Mem : 257832.5 total, 40414.4 free, 43956.2 used, 173461.9 buff/cache
MiB Swap: 10240.0 total, 10101.5 free, 138.5 used. 211666.4 avail Mem
Another node in cluster without high IO
top - 16:18:39 up 12 days, 1:44, 3 users, load average: 0.66, 0.80, 0.93
Tasks: 1988 total, 2 running, 1986 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.5 us, 0.4 sy, 0.1 ni, 98.8 id, 0.1 wa, 0.0 hi, 0.1 si, 0.0 st
MiB Mem : 257832.5 total, 88293.1 free, 107006.6 used, 62532.8 buff/cache
MiB Swap: 10240.0 total, 10127.2 free, 112.8 used. 148806.7 avail Mem
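One way to narrow this down while a node is in the bad state: on Linux the load average counts tasks in uninterruptible sleep (state D) as well as runnable ones, so a load above 1000 with about 90% wa suggests a large number of tasks blocked on IO. Below is a minimal sketch, not a tested diagnostic, that walks /proc and counts D-state tasks grouped by command name. The function names `parse_stat_line` and `blocked_tasks` are made up for illustration, and it assumes it is run as root on the affected node so every task is readable.

```python
import os
from collections import Counter


def parse_stat_line(text):
    """Split a /proc/<pid>/stat line into (pid, comm, state)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    left = text.index("(")
    right = text.rindex(")")
    pid = int(text[:left])
    comm = text[left + 1:right]
    state = text[right + 1:].split()[0]
    return pid, comm, state


def blocked_tasks(proc_root="/proc"):
    """Count tasks in uninterruptible sleep (state D), grouped by command name."""
    counts = Counter()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        task_dir = os.path.join(proc_root, entry, "task")
        try:
            tids = os.listdir(task_dir)
        except OSError:
            continue  # process exited while we were scanning
        for tid in tids:
            try:
                with open(os.path.join(task_dir, tid, "stat")) as fh:
                    _, comm, state = parse_stat_line(fh.read())
            except (OSError, ValueError, IndexError):
                continue  # thread exited or the line was unreadable
            if state == "D":
                counts[comm] += 1
    return counts


if __name__ == "__main__":
    # Print the 20 command names with the most blocked tasks.
    for comm, n in blocked_tasks().most_common(20):
        print(f"{n:6d}  {comm}")
```

Running this a few times during an episode and comparing it with the output from a healthy node should show whether the blocked tasks belong to the VMs, to a storage daemon, or to kernel threads.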