Proxmox server blocked by heavy I/O

Feb 14, 2021
This morning, the VMs on my Proxmox server weren't available. I remember seeing a message like "No space left on device", but I'm not quite sure. I've tried several reboots. The machine boots, but the VMs and the GUI don't start.

df -h shows

Code:
Filesystem        Size  Used Avail Use% Mounted on
udev               16G     0   16G   0% /dev
tmpfs             3.2G  9.0M  3.2G   1% /run
rpool/ROOT/pve-1  512G  512G  4.8M 100% /
tmpfs              16G   34M   16G   1% /dev/shm
tmpfs             5.0M     0  5.0M   0% /run/lock
rpool             4.9M  128K  4.8M   3% /rpool
rpool/data        4.9M  128K  4.8M   3% /rpool/data
rpool/ROOT        4.9M  128K  4.8M   3% /rpool/ROOT
tmpfs             3.2G     0  3.2G   0% /run/user/0
/dev/fuse         128M   24K  128M   1% /etc/pve

When I start the machine, the load is very high, and even simple commands like "ls" take a long time. There seems to be a lot of disk activity.

Here is what top shows

Code:
top - 09:32:55 up 35 min,  2 users,  load average: 4.36, 3.36, 3.53
Tasks: 213 total,   1 running, 212 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.1 us,  0.2 sy,  0.0 ni, 29.6 id, 70.2 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :  31957.5 total,  30641.5 free,   1169.0 used,    147.1 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.  30425.0 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
      2 root      20   0       0      0      0 S   0.3   0.0   0:04.74 kthreadd
     49 root      20   0       0      0      0 I   0.3   0.0   0:01.42 kworker/1:1-events
    274 root       0 -20       0      0      0 S   0.3   0.0   0:05.95 spl_dynamic_tas
    349 root       1 -19       0      0      0 S   0.3   0.0   0:02.27 z_wr_iss
    351 root       0 -20       0      0      0 S   0.3   0.0   0:14.49 z_wr_int
    476 root      20   0       0      0      0 D   0.3   0.0   0:01.61 txg_sync
  11522 root      20   0   10224   3656   2892 S   0.3   0.0   0:02.12 top
  21378 root      20   0  271232  86048   4076 S   0.3   0.3   0:01.36 pve-firewall
 114265 root      20   0   10228   3664   2888 R   0.3   0.0   0:00.01 top
      1 root      20   0  311828   8560   5472 S   0.0   0.0   0:03.94 systemd
      3 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 rcu_gp
      4 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 rcu_par_gp
      6 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 kworker/0:0H-events_highpri
      9 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 mm_percpu_wq
     10 root      20   0       0      0      0 S   0.0   0.0   0:00.00 rcu_tasks_rude_
     11 root      20   0       0      0      0 S   0.0   0.0   0:00.00 rcu_tasks_trace
     12 root      20   0       0      0      0 S   0.0   0.0   0:00.03 ksoftirqd/0

Any help appreciated :)

Jesper, Denmark
 
OK, I managed to fix it, I hope. Unfortunately, the external disk I use for nightly backups wasn't mounted, so last night Proxmox wrote the backup to the local filesystem, filling it completely. Removing the local backup file solved the problem.
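
In case it helps someone else, here is a rough sketch of how one could look for whatever is filling a filesystem. The function name largest_entries is made up for this example, and it assumes a du that supports -x, -a, -k and -d (GNU coreutils does).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the n largest entries directly under dir (sizes in KiB).
# -x keeps du on one filesystem, so other mounted disks are not counted.
# The first line is normally dir itself, i.e. the total.
largest_entries() {
  local dir=$1 n=${2:-10}
  du -x -a -k -d 1 "$dir" 2>/dev/null | sort -rn | head -n "$n"
}

# Example: largest_entries / 15
```

Run it on / first, then on the largest directory it reports, and repeat until the oversized file turns up. On a system that is already struggling with I/O this can take a while.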
 
Great! Here's what I've now done:
Put this script in /usr/local/bin/backup_hook.sh (my external disk is mounted as /mnt/intenso)
Code:
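#!/usr/bin/env bash
# vzdump hook script: refuse to run the backup if the external disk is not mounted.
# The spaces around the path match the mount point as a whole field in /proc/mounts,
# so a similarly named path such as /mnt/intenso2 does not count.
if ! grep -qs ' /mnt/intenso ' /proc/mounts; then
  # skip backup: a non-zero exit status reports failure to vzdump
  echo "Skipping backup - disk not mounted"
  exit 1
fi
# continue
echo "All ok"
exit 0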

Added a "script" line to /etc/pve/jobs.cfg

Code:
vzdump: 5d63c803ba05596f83d3f6e93851fcbcdfd51e03:1
        schedule 02:15
        compress zstd
        enabled 1
        mailnotification always
        mailto jesper.holck@xsxsxsxsxs.dk
        mode snapshot
        node sonja
        quiet 1
        script /usr/local/bin/backup_hook.sh
        storage intenso
        vmid 102

It seems to work :) It's probably a good idea to also check that the destination is writable and has enough free space, as VictorSTS suggests.
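
A minimal sketch of what those extra checks could look like, untested on my server. The function name check_backup_target and the 50 GiB threshold are assumptions I picked for illustration; /mnt/intenso is the mount point from above. It relies on mountpoint (util-linux) and on df -Pk printing the available space in KiB in the fourth column. vzdump passes the current phase as the first argument to the hook script, so the check only runs at job-start.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds only if target is a mount point, is writable,
# and has at least min_kib KiB available.
check_backup_target() {
  local target=$1 min_kib=$2 avail_kib

  # mountpoint -q exits non-zero if nothing is mounted at the directory
  if ! mountpoint -q "$target"; then
    echo "Skipping backup - $target is not mounted"
    return 1
  fi

  if [ ! -w "$target" ]; then
    echo "Skipping backup - $target is not writable"
    return 1
  fi

  # df -Pk prints a header plus one data line; column 4 is the available KiB
  avail_kib=$(df -Pk "$target" | awk 'NR == 2 { print $4 }')
  if [ "$avail_kib" -lt "$min_kib" ]; then
    echo "Skipping backup - only ${avail_kib} KiB free on $target"
    return 1
  fi

  echo "All ok"
}

# Only check once, when the backup job starts.
# Assumed values: 50 GiB minimum free space, expressed in KiB.
if [ "${1:-}" = "job-start" ]; then
  check_backup_target /mnt/intenso $((50 * 1024 * 1024)) || exit 1
fi
```

The threshold should be at least the size of a typical backup of the VM; 50 GiB is just a placeholder.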
 