weird high IO issue on my proxmox server.

dirtbag2k

Something weird is going on with my Proxmox server. It's running on a beefy Cisco UCS server with a lot of RAM.

Code:
root@newrock:~# free
              total        used        free      shared  buff/cache   available
Mem:      131991876    10108404   120522024       70072     1361448   120824004
Swap:       8388604           0     8388604
root@newrock:~#


Code:
root@newrock:~# pveversion -v
proxmox-ve: 6.1-2 (running kernel: 5.3.18-2-pve)
pve-manager: 6.1-7 (running version: 6.1-7/13e58d5e)
pve-kernel-helper: 6.1-7
pve-kernel-5.3: 6.1-5
pve-kernel-5.3.18-2-pve: 5.3.18-2
pve-kernel-5.3.13-1-pve: 5.3.13-1
pve-kernel-5.3.10-1-pve: 5.3.10-1
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.3-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.15-pve1
libpve-access-control: 6.0-6
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.0-13
libpve-guest-common-perl: 3.0-3
libpve-http-server-perl: 3.0-4
libpve-storage-perl: 6.1-5
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 3.2.1-1
lxcfs: 3.0.3-pve60
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.1-3
pve-cluster: 6.1-4
pve-container: 3.0-21
pve-docs: 6.1-6
pve-edk2-firmware: 2.20200229-1
pve-firewall: 4.0-10
pve-firmware: 3.0-6
pve-ha-manager: 3.0-8
pve-i18n: 2.0-4
pve-qemu-kvm: 4.1.1-3
pve-xtermjs: 4.3.0-1
qemu-server: 6.1-6
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.3-pve1
root@newrock:~# uname -a
Linux newrock 5.3.18-2-pve #1 SMP PVE 5.3.18-2 (Sat, 15 Feb 2020 15:11:52 +0100) x86_64 GNU/Linux
root@newrock:~# time w
13:14:36 up  3:07,  3 users,  load average: 24.75, 24.09, 20.69
USER     TTY      FROM             LOGIN@   IDLE   JCPU   PCPU WHAT
root     pts/1    74.213.159.129   10:37    2:12   0.30s  0.00s w
root     pts/2    74.213.159.129   11:06   26:34   6.20s  6.14s top
root     pts/3    74.213.159.129   12:53    6:36   0.08s  0.08s -bash

real    2m11.627s
user    0m0.010s
sys    0m0.008s
root@newrock:~# qm list
      VMID NAME                 STATUS     MEM(MB)    BOOTDISK(GB) PID     
       101 winblows7            stopped    6096              25.00 0       
       107 monsterjam           running    8192             302.00 15560   
       111 ubuntu               stopped    2048              32.00 0       
root@newrock:~#
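Only one VM is actually running. In case it helps, its per-process IO counters can be checked directly to rule it in or out (15560 is the kvm PID from the qm list above):
Code:
# cumulative read/write byte counters for the kvm process (PID 15560 from qm list above)
cat /proc/15560/io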
But the server is busy doing some kind of IO:
Code:
Total DISK READ:         0.00 B/s | Total DISK WRITE:       528.40 K/s
Current DISK READ:       0.00 B/s | Current DISK WRITE:       3.29 M/s
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND                                                               
1185 be/4 root        0.00 B/s    0.00 B/s  0.00 % 42.93 % [txg_sync]
2614 be/4 root        0.00 B/s  352.27 K/s  0.00 %  0.00 % pmxcfs
3373 be/4 root        0.00 B/s  176.13 K/s  0.00 %  0.00 % pmxcfs
    1 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % init
    2 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kthreadd]
    3 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [rcu_gp]
    4 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [rcu_par_gp]
    6 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kworker/0:0H-kblockd]
    7 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kworker/u48:0-dm-thin]
    9 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [mm_percpu_wq]

It seems to be the Proxmox server itself... but what's it doing?!
My main ZFS pool seems to be OK, and I don't see any failing disks in my hardware RAID.

Code:
root@newrock:~# zpool list
NAME      SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
zfspool  1.81T   387G  1.43T        -         -    29%    20%  1.00x  ONLINE  -
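For anyone trying to narrow this kind of mystery IO down, watching the pool and the ZFS transaction groups directly is probably the quickest way to see where the writes are going (zfspool is the pool name from above; the kstat path assumes OpenZFS on Linux):
Code:
# live per-vdev read/write throughput, refreshed every 2 seconds
zpool iostat -v zfspool 2
# recent transaction group history -- shows how often txg_sync flushes and how much data
tail /proc/spl/kstat/zfs/zfspool/txgs
# list processes stuck in uninterruptible (D) sleep, i.e. blocked on IO
ps -eo state,pid,wchan:32,comm | awk '$1 == "D"'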

Code:
top - 13:19:39 up  3:12,  3 users,  load average: 20.76, 21.86, 20.71
Tasks: 546 total,   1 running, 544 sleeping,   0 stopped,   1 zombie
%Cpu(s):  0.2 us,  0.4 sy,  0.0 ni, 87.9 id, 11.5 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem : 128898.3 total, 117555.4 free,  10011.5 used,   1331.5 buff/cache
MiB Swap:   8192.0 total,   8192.0 free,      0.0 used. 117849.0 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                    
15560 root      20   0 8874196   2.9g   9116 S   4.7   2.3  10:37.25 kvm                                                        
12007 root      20   0 2665720 623348   9488 S   3.7   0.5   0:29.58 kvm                                                        
  607 root       0 -20       0      0      0 S   0.7   0.0   6:53.67 spl_dynamic_tas                                            
1086 root       0 -20       0      0      0 S   0.7   0.0  11:01.99 z_wr_int                                                   
1089 root       0 -20       0      0      0 S   0.7   0.0  11:02.01 z_wr_int                                                   
    2 root      20   0       0      0      0 S   0.3   0.0  17:13.50 kthreadd                                                   
   11 root      20   0       0      0      0 I   0.3   0.0   0:22.11 rcu_sched                                                  
1066 root       1 -19       0      0      0 S   0.3   0.0  11:18.71 z_wr_iss                                                   
1071 root       1 -19       0      0      0 S   0.3   0.0  11:19.31 z_wr_iss                                                   
1073 root       1 -19       0      0      0 S   0.3   0.0  11:18.81 z_wr_iss
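The 11.5% iowait with an otherwise idle CPU points the same way; per-device stats make it easier to see which disk is actually saturated (iostat here comes from the sysstat package, which may need installing first):
Code:
# per-device utilisation and latency, refreshed every 2 seconds (skips idle devices)
iostat -xz 2
# the 'b' column counts processes blocked waiting on IO
vmstat 2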

Any ideas?

-db
 
So I finally figured out, by trial and error, that it was caused by an LXC container whose root disk was full.
When I shut down that container, everything went back to normal.
I expanded the disk and brought it back up, and everything is happier now.

It's still weird that a full root disk on a container would make the Proxmox host go nuts like that.
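In case anyone else runs into this, the fix boiled down to finding the full container and growing its root disk. Something along these lines should do it (the VMID 108 below is just a placeholder, and +8G is an arbitrary amount):
Code:
# show how full each running container's root filesystem is
for id in $(pct list | awk 'NR>1 && $2=="running" {print $1}'); do echo "CT $id:"; pct exec $id -- df -h /; done
# stop the offending container, grow its root disk by 8 GiB, and start it again
pct shutdown 108
pct resize 108 rootfs +8G
pct start 108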

-db