I/O Delay 100% - Unable to do backups with PBS

Jurny_Katecheta

Nov 5, 2022
Hi,
Thank you for clicking; perhaps you'll be able to help.

I've installed the latest Proxmox Backup Server (2.2-1) on a mini PC called the Acepc T11 (https://www.priceboon.com/product/acepc-t11/), as I just need this thing to run backups.

Unfortunately, there is no way to install PBS directly on the eMMC drive, because the installer crashes (it stops at 2% while trying to create a partition), so I bought an external USB SSD just for the system (yes, not my first choice either, but I just wanted to make it work on this Acepc).

I've used the available HDD slot inside the device for a 2 TB HDD that holds only the backups. I know that running the system from an SSD over USB will make it a lot slower, but it should still work?

As soon as I start the backup task, the I/O delay builds up to 100% (within about a minute), and shortly after that the CPU usage follows and stays there.


[screenshot]

During that time the main 2 TB drive only shows a short spike of activity:


[screenshot]

Unfortunately, the task logs don't show anything special:
PVE
Code:
INFO: scsi0: dirty-bitmap status: created new
INFO:   0% (404.0 MiB of 80.0 GiB) in 3s, read: 134.7 MiB/s, write: 130.7 MiB/s
ERROR: VM 100 qmp command 'query-backup' failed - got timeout
INFO: aborting backup job


PBS
Code:
2022-11-05T15:13:11+00:00: starting new backup on datastore 'Bakcups': "vm/100/2022-11-05T15:13:03Z"
2022-11-05T15:13:11+00:00: GET /previous: 400 Bad Request: no valid previous backup
2022-11-05T15:13:11+00:00: created new fixed index 1 ("vm/100/2022-11-05T15:13:03Z/drive-scsi0.img.fidx")
2022-11-05T15:13:11+00:00:  add blob   "/mnt/datastore/Bakcups/vm/100/2022-11-05T15:13:03Z/qemu-server.conf.blob"  (449 bytes, comp: 449)

So the PVE task just gets cancelled due to the timeout, and the PBS task just hangs there like that for hours.

Here is the funny part: once I cancel all the tasks, the CPU usage and I/O delay stay at 100% for hours, then slowly drop to 60-70% and stay there with no load.

I just wanted a small, completely silent device where I can save the backups. Many people recommend Intel NUCs or similar machines like the Dell Micro (not exactly silent), etc.

I really wanted this Acepc to work, as it has the perfect size and it's silent, but at this point I don't really know whether swapping it for a different small PC (like this) will change anything.

Is there any solution for this? Can I still make it work with the current hardware, or do I have to choose something different (and if so, what exactly)?

I need something the size of a NUC, but at most 0.7 of 1U high (it's sitting in the bottom 1U of the cabinet), for about $200.
 
I see some major problems here:
- the CPU is ridiculously slow - look at the "Load average" on your screenshot, the box is completely overwhelmed
- you are using an HDD, a single one, and possibly a consumer one
- you are using a single disk for your backups - the slightest failure and your backups are corrupted
- looking at the other hardware, you probably don't have ECC memory

Having a slow disk and a very slow CPU will result in something like this.
I assume you are using the default ZFS for storage. The continuing IO wait you see could be some buffers slowly being emptied by ZFS.
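
To watch that IO wait outside the web UI, something like the following could be used. It is only a rough sketch: it assumes a Linux host where the aggregate `cpu` line of `/proc/stat` is readable and that the "I/O delay" figure on the dashboard is based on the same iowait counter. The function names are made up for illustration.

```python
import time


def parse_cpu_line(stat_text):
    """Return the counters of the aggregate 'cpu' line of /proc/stat as ints."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(value) for value in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def iowait_percent(before, after):
    """Share of CPU time spent in iowait between two samples, in percent."""
    deltas = [b - a for a, b in zip(before, after)]
    # Field order: user, nice, system, idle, iowait, irq, softirq, steal.
    # guest and guest_nice are already counted in user/nice, so skip them.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total


def measure(interval=5.0, path="/proc/stat"):
    """Sample /proc/stat twice and return the iowait share in between."""
    with open(path) as stat_file:
        before = parse_cpu_line(stat_file.read())
    time.sleep(interval)
    with open(path) as stat_file:
        after = parse_cpu_line(stat_file.read())
    return iowait_percent(before, after)
```

Calling `print(measure())` in a loop after the backup task has been cancelled would show whether the IO wait really drains over time or stays stuck.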

The setup is far from ideal, but you could possibly improve it a bit by ditching ZFS in this case and using something like XFS.
Having a single disk makes redundancy impossible anyway, so you are not losing much.

You should check the clock speeds of your CPU and make sure ANY "energy saving" policies are disabled, so that you at least get no further throttling.
And with a less demanding basic file system you *may* see better results. But not much better.
Garbage collection could take ages once you fill up that 2 TB HDD. Beware.
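
For the clock speed check, a small sketch like this could list the active governor and current frequency per core. It assumes a Linux kernel that exposes the cpufreq interface under `/sys/devices/system/cpu`; on boxes without cpufreq support it simply returns an empty result. `read_cpufreq` is a hypothetical helper name.

```python
from pathlib import Path


def read_cpufreq(base="/sys/devices/system/cpu"):
    """Return {cpu_name: (governor, current_mhz)} for cores exposing cpufreq."""
    result = {}
    for cpufreq in sorted(Path(base).glob("cpu[0-9]*/cpufreq")):
        gov_file = cpufreq / "scaling_governor"
        freq_file = cpufreq / "scaling_cur_freq"
        if not (gov_file.is_file() and freq_file.is_file()):
            continue
        governor = gov_file.read_text().strip()
        mhz = int(freq_file.read_text().strip()) / 1000  # sysfs reports kHz
        result[cpufreq.parent.name] = (governor, mhz)
    return result
```

Printing `read_cpufreq()` a few times while a backup is running shows whether a core sits on a power-saving governor or drops well below its rated clock under load.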

If you are willing to spend a little cash, get a 2 TB SSD instead of that HDD; that would help GC tasks and possibly reduce the io_wait in general.
A consumer SSD should be fine in that case; a datacenter SSD would be bottlenecked by the CPU anyway.
 
Yep, Atoms are really bad when using PBS. I just added an Atom J3710 as a PVE node running on an enterprise SSD ZFS mirror. It's a 6.5W TDP quad-core with 1.6-2.6 GHz and without HT. When running the workload (3 LXCs + 5 VMs) the load average is usually between 2 and 3.5, so it can barely run the guests. When I then start a backup to the PBS, the load average goes up to 24. Any read/write in general hits the CPU very hard because of ZFS and its massive overhead and calculations. But with PBS it's even worse, as PBS is doing all the work client side. So the Atom additionally has to do all the hashing, zstd compression, encryption and so on.
I guess your Atom has its problems too. While PBS is doing most of the work client side during a backup/restore, the PBS itself will still need to do the decompression and hashing when running verify jobs.

So yes, it looks like PBS and/or ZFS are too demanding for an Atom CPU.
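
To get a rough feeling for how much the hashing alone costs on such a CPU, a quick single-core measurement could look like this. It is only a sketch under assumptions: SHA-256 and the 4 MiB chunk size are assumed here for illustration, and compression and encryption are not measured at all, so the real backup throughput will be lower than this number.

```python
import hashlib
import os
import time

CHUNK_SIZE = 4 * 1024 * 1024  # assumed chunk size, for illustration only


def sha256_throughput(chunks=16, chunk_size=CHUNK_SIZE):
    """Hash `chunks` random buffers and return single-core throughput in MiB/s."""
    data = os.urandom(chunk_size)
    start = time.perf_counter()
    for _ in range(chunks):
        hashlib.sha256(data).digest()
    # Guard against a zero reading on very coarse timers.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return chunks * chunk_size / elapsed / (1024 * 1024)
```

If `sha256_throughput()` reports less than what the disk can write, the CPU is the bottleneck before compression and encryption are even added on top.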

"You should check the clock speeds of your CPU and make sure ANY "energy saving" policies are disabled, so you get at least no further throttling."
Here that won't work. When I set it to performance, it just throttles from 2.6 to 1.6 GHz once the CPU reaches 90 °C, because the passive heatsink can't handle the continuous boosting. It got a bit better after adding a 60mm fan blowing air on that passive heatsink.
 