Turns out we have two of these. I thought I had fixed the other one before this one ever came up, but it got rebooted today and did the same thing. (Probably what happened is I did "systemctl restart networking" after moving the file back and left it like that figuring it must be fixed.)
Four...
I have one Proxmox VE server that hangs on boot. With "quiet" disabled, it hangs at "Job networking.service/start running (7min 30s / no limit)."
If I move /etc/network/interfaces to /etc/network/interfaces.no, the machine will boot up.
If I then log in to the machine and rename the file back...
In our case, the poor network speeds were caused by the PVE servers incorrectly routing to the PBS server through a software router that is only supposed to be used as an out-of-band backup. We were eventually able to get the bandwidth up to about 4 Gbps, which is reasonable given other traffic...
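For a sense of scale, here is a rough sketch of what a link rate like that means for moving a large disk image. The 1 TB size is an assumption borrowed from the large-disk VMs mentioned elsewhere in these posts, and the calculation ignores protocol overhead, compression, and deduplication, so treat the result as a floor for a full transfer rather than a prediction.

```python
def transfer_seconds(size_bytes: int, rate_bits_per_second: float) -> float:
    """Ideal time to move size_bytes over a link, ignoring all overhead."""
    return size_bytes * 8 / rate_bits_per_second


# Assumed figures: a 1 TB (decimal) virtual disk and a sustained 4 Gbps link.
disk_bytes = 10**12
link_rate = 4 * 10**9

seconds = transfer_seconds(disk_bytes, link_rate)
print(f"{seconds:.0f} s, about {seconds / 60:.1f} minutes")
```

If a full backup of a disk that size takes many hours, the raw link rate alone does not explain it.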
There are over a dozen.
I think the point (for me) is that PBS introduces a new single point of failure that randomly bounces around between VMs. As described in the thread you linked, problems with the backup can lead to stalls, an outright hang of the VM requiring a reboot, or write errors...
In these cases, the VMs have not been shut down.
Oh. Yikes. So, if I understand it correctly, one should expect guest IO performance during a snapshot backup to be based entirely on the backup server's storage pool, and not its own. So, in our case, to run backups on every PVE host at once...
It doesn't. The fifth backup was running on a server that does not have any VMs with large disks. Those consistently work fine. The VM configs are basically the same, except the virtual disks are 16-32 GiB, tops, instead of 1 TB. The backup process bebops through about 15 of those in about the way...
This is absolutely not the case. The storage performance only becomes poor on these specific VMs, only while the backup is running. The storage, which is all U.2 NVMe drives hyperconverged on the Proxmox VE nodes, is very fast.
The cluster is healthy, and performs quite well.
4k random read...
The Proxmox VE machines don't have enough non-ceph storage in them to do a 1TB local backup. But there's no Realtek here and, as far as I know, the network is working fine:
Connecting to host 192.168.19.56, port 5201
[ 5] local 192.168.19.54 port 36446 connected to 192.168.19.56 port 5201
[...
PBS runs on a dedicated 16-core server with 256 GiB of RAM and a directly attached ZFS storage array.
It isn't PBS that's being slow; it has plenty of juice. Whatever the problem is, it appears to be on the Proxmox VE side.
We have been testing Proxmox Backup Server, because it's got a lot of really strong features. In principle, I really like it.
In practice, it's causing some pretty serious problems.
On VMs with large disks, the backups are positively glacial.
Example:
INFO: Starting Backup of VM 1914...
Hello from the future!
I had/have a very similar issue. A Proxmox backup job runs constantly. It finishes, and a minute later, it starts again. This happened on only one host out of seven in the cluster. Our one backup job's schedule is 2,14:30.
I eventually determined that the affected...
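As an aside on the schedule quoted above: a value like 2,14:30 reads as a comma-separated list of hours sharing one minute field, so the job should fire twice a day rather than continuously. The sketch below only illustrates that reading. `expand_schedule` is a hypothetical helper, not part of Proxmox, and it handles only this simple "hours:minute" shape, not the full calendar-event syntax (ranges, weekdays, repetitions).

```python
def expand_schedule(spec: str) -> list[str]:
    """Expand a simple 'H1,H2:MM' schedule into HH:MM start times.

    Hypothetical helper for illustration; it does not implement the full
    Proxmox calendar-event grammar.
    """
    hours_part, minute_part = spec.split(":")
    minute = int(minute_part)
    if not 0 <= minute <= 59:
        raise ValueError(f"minute out of range: {minute}")

    times = []
    for hour_text in hours_part.split(","):
        hour = int(hour_text)
        if not 0 <= hour <= 23:
            raise ValueError(f"hour out of range: {hour}")
        times.append(f"{hour:02d}:{minute:02d}")
    return times


print(expand_schedule("2,14:30"))  # ['02:30', '14:30']
```

Under that reading, a job that restarts a minute after finishing is not something the schedule itself asks for.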