KVM guests freeze (hung tasks) during backup/restore/migrate

One thing: we seem to have restored on this host before without issues. Could this be the cause?
WARNING: Sum of all thin volume sizes (2.20 TiB) exceeds the size of thin pool fst/vms and the size of whole volume group (2.18 TiB)!
 
I'm new to Proxmox in general, but I'm having a similar problem on Proxmox 5.2-5. Upon restore, all guests froze after 2 minutes and reported SATA failures. One guest was hit pretty badly and lost the partition tables on two disks. There is no indication of any issue in the host logs. How about setting no cache on guest disks? Can this be a cause?

This looks exactly like the problem that I (and many others) reported for years in this thread and others. The problem is most likely a Linux kernel / KVM issue.

What seems to happen is that when local disks are busy due to a restore operation, the dirty page IO of the host system is probably blocked, leading to CPU hangs / lockups in KVM guests, which sometimes result in guest kernel panics. Unfortunately, putting the swap partition on a different SSD than the array being restored to does not solve the problem. The problem has existed since (at least) Proxmox 3, and affects many hardware and software configurations and filesystems (ext4 on LVM, ZFS, etc.).
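
One way to check whether dirty pages are piling up on the host during a restore is to watch the Dirty and Writeback counters in /proc/meminfo. The following is only a minimal sketch and not something posted in this thread: the helper name dirty_kb is made up for illustration, and it just reports the counters; it does not prove that they cause the guest lockups.

```shell
#!/bin/sh
# dirty_kb is a hypothetical helper: it sums the "Dirty:" and "Writeback:"
# lines (both reported in kB) from a meminfo-style file.
# With no argument it reads /proc/meminfo.
dirty_kb() {
    awk '/^(Dirty|Writeback):/ { sum += $2 } END { print sum + 0 }' "${1:-/proc/meminfo}"
}

# Print the current value once if /proc/meminfo is available.
if [ -r /proc/meminfo ]; then
    echo "dirty+writeback: $(dirty_kb) kB"
fi
```

Calling dirty_kb in a `while sleep 1` loop while a restore is running would show whether the value keeps climbing at the moment the guests start to hang.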

The issue even appeared for me a few days ago on a fully updated Proxmox 5 node when restoring a VM from Ceph to local-zfs, even though I set a restore limit of 100 MB/s on a 4 disk RAIDZ1 SATA SSD pool. Even though the system's IO capacity is several times higher than that, websites hosted on KVM guests on this node timed out for minutes.

Unfortunately, neither the Proxmox nor the KVM developers acknowledge the issue, let alone own (and investigate) it, so no one works on a solution. (Also, people fail to read the starter post of this thread, so the discussion quickly gets derailed.)

I have opened a bug in the Proxmox bugzilla, but it was closed by @wolfgang as a "load issue" while it is clearly a bug in the kernel or QEMU/KVM code:
https://bugzilla.proxmox.com/show_bug.cgi?id=1453

Mitigations
I don't see any real solution to this problem happening, apart from the tweaks that I have posted many times before, which don't solve the issue but lessen its impact:
- bandwidth limit backups, restores and migrations
- put swap on an NVMe SSD, also ZIL+L2ARC if you use ZFS
- use recommended swap settings from the ZFS wiki
- use vm.swappiness=1 on host and in guests
- increase vm.min_free_kbytes on both hosts and guests
- decrease vm.dirty_ratio to 2, vm.dirty_background_ratio to 1 on both hosts and guests
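
For the bandwidth limit in the first bullet, the Proxmox tools take the value in KiB/s, as far as I know both for the bwlimit setting in /etc/vzdump.conf and for the --bwlimit option of vzdump and qmrestore. A small sketch of the conversion follows; the helper name mib_to_kib, the VMID 100 and the 100 MiB/s figure are only examples, and the commands are printed rather than run.

```shell
#!/bin/sh
# mib_to_kib is a hypothetical helper: converts MiB/s to the KiB/s value
# that bwlimit expects (assumption: the limit is given in KiB/s).
mib_to_kib() {
    echo $(( $1 * 1024 ))
}

limit=$(mib_to_kib 100)

# Only print what could be used; VMID 100 is a placeholder.
echo "command line:     vzdump 100 --bwlimit $limit"
echo "/etc/vzdump.conf: bwlimit: $limit"
```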
 
Hello, I am new to the forum.
I am pretty happy with Proxmox in my lab and also have it in production; a few months ago I migrated to it from my VirtualBox setup on the same server.
The server is a Supermicro with a 2x NVMe SSD ZFS mirror, 12 x Intel(R) Xeon(R) CPU E5-1650 v4 @ 3.60GHz (1 Socket) and 128GB of RAM.
For backups I attached a folder on Synology storage through NFS.
Kernel Version Linux 5.0.21-5-pve #1 SMP PVE 5.0.21-10 (Wed, 13 Nov 2019 08:27:10 +0100)
PVE Manager Version pve-manager/6.1-5/9bf06119

But last week I enabled backups and this became a nightmare.
It does not matter which compression I choose for the backup (none/lzo/gzip); all seem to have similar issues.
Gzip seems to cause a lot more errors than lzo and none, but all of them have the problem.
The last thing I tried was limiting the backup speed to 500mbps; this decreased the number of CPU stuck errors but still does not resolve the problem.
All of the VMs have this CPU stuck problem. On one we have GitLab on Arch Linux; it gets stuck every night and a restart of the VM is required. After the CPU lockup the file system switches to read-only and no one can connect to it anymore.
Another crash we had during this time was on a Windows DC: the file system crashed so badly that after a restart it was not possible to boot it again. Of course, the restore from backup was not successful because the crashed state was already in the oldest backup...
One pfSense router also crashed, but it seems to be more robust than Linux and Windows.
The screenshot is from one of the pfSense router VMs.

Was anyone able to resolve this problem without just disabling backups? 500mbps is pretty slow.

I will configure backups with zfs send, but I do not see it as a real solution.
 

[Attachment: 1580291850329.png]

Read my previous posts... this is a several-year-old bug in the kernel scheduler or KVM (or both), and nobody cares about solving it. The Proxmox devs killed my bug report, and the KVM devs do not even acknowledge it.

What happens is that during heavy IO load on the host (disk or network), KVM guests get starved of memory accesses (or CPU time), even when the load is not using all system resources. For example, a 100 MB/s restore from an NFS backup or local disk to an SSD can completely starve a KVM guest for minutes at a time, even though the SSD can write at least 400 MB/s sequentially.

There is no real solution to this bug; you can only mitigate its effects by limiting the load on the host and turning down the page cache activity:
- bandwidth limit backups, restores and migrations
- use vm.swappiness=1 on host and in guests
- decrease vm.dirty_ratio to 2, vm.dirty_background_ratio to 1 on both hosts and guests
- increase vm.min_free_kbytes on both hosts and guests (at least 262144 helps)
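
The three sysctl bullets above can be collected into one file so that they survive a reboot. This is a sketch using the values quoted in this thread; the helper name and the file name 90-vm-io.conf are arbitrary choices for the example, and the right numbers will depend on your RAM and workload.

```shell
#!/bin/sh
# print_vm_tuning is a hypothetical helper that emits a sysctl.d fragment
# with the values quoted in this thread.
print_vm_tuning() {
    cat <<'EOF'
vm.swappiness = 1
vm.dirty_ratio = 2
vm.dirty_background_ratio = 1
vm.min_free_kbytes = 262144
EOF
}

# Review the output first; as root it could then be installed with:
#   print_vm_tuning > /etc/sysctl.d/90-vm-io.conf && sysctl --system
print_vm_tuning
```

The same file would be needed inside each guest as well, since the list asks for these settings on both hosts and guests.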
 
Wolfgang in his last but one post (comment 7) said that new features have solved the issue and no one replied that it was not the case, so he closed the issue. Maybe if you add comments there about this it could help to reopen and solve the problem?
 
Good point @mmenaz! Unfortunately, none of the new features solved this problem completely. Also, his observation about this being a load issue is superficial in light of the facts.

I have reopened the bug report and commented with our latest experience. I encourage everyone to do the same.
 
@gkovacs After reading every comment in this thread I am in agreement with you. This problem exists, has existed for years now and while the sysctl settings mentioned help they do not entirely resolve it.
I am tired of having outages because of it. The whole point of being able to move disks to different storage and restore VMs is to avoid outages, not create them!

I just started moving to Proxmox 6.x and I've seen the issue with it too.

I've mostly seen this when using disk move or restoring a VM; bandwidth limits do not help.
Ideally, if a single process, like the restore, is consuming all of the IO, that process should be throttled and slowed down to allow other processes to also access storage. But this does not happen. What I find most troubling is that once the kernel gets into this state, not only is disk IO an issue, but things like networking and other tasks that require no disk IO do not get scheduled either. Additionally, if I am doing heavy writes to slow SATA rust and trigger this issue, why would that also halt IO to my NVMe and disrupt networking?

It does not matter whether I'm using LVM, ext4 or ZFS; it happens.
I've seen it on systems with 16GB of RAM and 256GB RAM, fast/slow CPU does not matter.

I would like to see this fixed, how can I help?
 
I'm hitting this on my production node as well, but I actually attacked it from a ZFS perspective and found the following:
https://github.com/openzfs/zfs/issues/7631

As you can see I did some commenting and I've managed to help a bit, but it's still not ideal. For reference my hardware is as follows:

CPU(s): 64 x AMD EPYC 7502P 32-Core Processor (1 Socket)
Kernel Version: Linux 5.3.13-1-pve #1 SMP PVE 5.3.13-1 (Thu, 05 Dec 2019 07:18:14 +0100)
PVE Manager Version: pve-manager/6.1-5/9bf06119

Node         SN                 Model                      Namespace Usage                 Format      FW Rev
------------ ------------------ -------------------------- --------- --------------------- ----------- --------
/dev/nvme0n1 S437NY0M701561     SAMSUNG MZQLB960HAJR-00007 1         953.48 GB / 960.20 GB 512 B + 0 B EDA5202Q
/dev/nvme1n1 S437NY0M701840     SAMSUNG MZQLB960HAJR-00007 1         953.48 GB / 960.20 GB 512 B + 0 B EDA5202Q
/dev/nvme2n1 S437NY0M701866     SAMSUNG MZQLB960HAJR-00007 1         953.48 GB / 960.20 GB 512 B + 0 B EDA5202Q
/dev/nvme3n1 S437NY0M701498     SAMSUNG MZQLB960HAJR-00007 1         953.48 GB / 960.20 GB 512 B + 0 B EDA5202Q
/dev/nvme4n1 S35ENA0K621779     SAMSUNG MZVLW256HEHP-000L7 1         254.16 GB / 256.06 GB 512 B + 0 B 5L7QCXB7
/dev/nvme5n1 S35ENA0K621968     SAMSUNG MZVLW256HEHP-000L7 1         254.16 GB / 256.06 GB 512 B + 0 B 5L7QCXB7
/dev/nvme6n1 CVPF6414004G1P2NGN INTEL SSDPE2MX012T7        1         1.20 TB / 1.20 TB     4 KiB + 0 B MDV1NX27
/dev/nvme7n1 CVPF641400MY1P2NGN INTEL SSDPE2MX012T7        1         1.20 TB / 1.20 TB     4 KiB + 0 B MDV1NX27
/dev/nvme8n1 BTPF7516030T1P2NGN INTEL SSDPE2MX012T7        1         1.20 TB / 1.20 TB     4 KiB + 0 B MDV10271
/dev/nvme9n1 BTPF751603801P2NGN INTEL SSDPE2MX012T7        1         1.20 TB / 1.20 TB     4 KiB + 0 B MDV10271
 
Is anyone from proxmox working on this problem?

I installed proxmox 6.2-4 on an HP Proliant DL360 G7 server 2 weeks ago with 4 new SSDs (I had to make each SSD a Raid0 so that I could install proxmox with ZFS so that I will be able to use the replication feature on a second node once I make a cluster).

I began restoring my VMs and started having these problems with the running VMs locking up, getting read/write errors, crashing and then some of them would not reboot because they would say the disk was missing.

I thought the problem might be the RAID0 or the Smart Array P410i, so I put in PCIe cards with SATA ports to bypass the Smart Array altogether, and still had these problems. I tested the RAM, and it all passed. I had been restoring over the network, so I tried copying to the hard drives first, and still had these problems. I changed the configuration to have the OS on a separate boot device from the ZFS pool, and still had these problems. I even gave up on ZFS and used ext4 with the Smart Array, and still had these problems.

So, after reading a dozen forums and trying many different things I was starting to suspect that either proxmox could not keep up with the read/write speeds and it was crashing or 1 or more of my new SSDs was actually bad, or maybe 1 of the 2 physical processors in the server was bad.

I have the exact same model of server that has been running these VMs using a smart array card and an external storage array for the last 2 years without any problems so I decided to try these 4 new SSDs in it. I installed proxmox 6.2-4 on the other HP Proliant DL360 G7 server that I have, made each SSD a raid0 and restored my first VM, then booted it, then restored my 2nd VM and as soon as the 1st VM crashed I shut it down and gave up, and put that working server back to the way it was.

I pulled the 4 SSDs and took the original server home to do more testing. I put the 4 SSDs in the original HP Proliant DL360 G7 server that I thought might have a bad processor because now I was sure it worked fine. I booted up the server because I was curious to see if the Proxmox install that I had done would boot up in a different machine without me needing to manually restore the RAID or recover the pool or any of that stuff. I was happy to see that it booted up just fine... but here is where I am lost and more confused than ever: NOW IT WORKS PERFECTLY!!!

I didn't change anything; basically my original setup should have worked 2 weeks ago. I wrote this whole story because maybe someone can tell me how this is possible: why would installing it on one machine, then pulling the disks and putting them in a different machine, make all these crash problems stop? Now I can't make it crash no matter what I do. I restore VMs at full speed and have not had any crash, freeze or read/write error. I took it back to work and put it on the original network, thinking maybe the network at my home was the variable, BUT it still works perfectly.

I started to think I was crazy, BUT, I just installed the same proxmox 6.2-4 on another server, an HP proliant DL185, on western digital red hard drives and now I am back to square one. I'm getting these crashes on the VMs if I do a restore of another VM. It is consistently not working right so I came back to the forums and this forum seemed to have the most to say on this issue.

Did I stumble upon a solution that I can't explain?
 
This package was pushed to stable yesterday: libpve-storage-perl (6.2-9) over (6.2-8)
Maybe there is something in the patch(es) in libpve-storage-perl (6.2-9) which solves some of your problems?
 
I upgraded and it did not fix the problem. Consistently, when I have a running VM and I start a restore of another VM, the restore gets to about the same point and then the running VM starts to have read/write errors, freezes, locks up and crashes.
 
OK, I absolutely solved my problem and I know why it suddenly started working as I described in my previous post.

So, in MY case the setting that I had to turn on was in the configuration of the P410i Smart Array. When I boot up my HP Proliant DL360 G7, I tap F5, and once the P410i Smart Array initializes I press F8 to go into the configuration utility. One of the options in the utility is to turn on the cache. Because I didn't have a battery attached to the P410i in this server, it prompted me with a warning that this is not recommended because a power outage could cause data loss, but I turned it on anyway and, BOOM, I can now restore VMs in Proxmox and NOT crash the other running VMs. I am 100% sure that setting fixed MY issue.

Also, it now makes sense why the other HP Proliant DL360 G7 suddenly started working. When removing components and switching things around, I definitely had the RAM chip out of the P410i and the battery disconnected at some point. Reconnecting the RAM chip and battery must have been the last thing I did, and that must have made the cache feature start working; maybe it was disabled or not working when I first unpacked the server.

So now I don't even know if my problem was exactly the same as the one that this thread is about, it certainly seemed like it, but now I'm not sure... anyway, maybe my info can help save someone else 3 weeks of their life.
 
I have the same card, except I'm using a DL360 G6, and my cache is already enabled (I have the capacitor instead of the battery).
I verified that the cache is enabled both in the BIOS and using the ssacli command.
The issue persists for me regardless of whether the cache is on.

So far the only "hack" that somewhat improved things was, as previous people mentioned, changing the sysctl vm dirty/swap ratios.
Updating to the latest Proxmox hasn't fixed it either.
 
I solved these issues for me - or at least made them a lot better!

I spent the last couple of days reading through these forums because of similar problems. I run three servers, each with server-grade SATA disks (spinning disks) in a raidz2. Whenever there was a heavy write job, no matter whether from within a VM or from the host directly, the server load became quite high and the VMs felt unresponsive...

What completely changed the behavior of all three servers was checking via hdparm -W /dev/sdX whether the internal write cache of each disk is turned on. If it is turned off, make sure to turn it on with hdparm -W1 /dev/sdX. That raised the read and write performance of the whole raidz2 from less than 1 MB/s for 4K blocks to around 20 MB/s (checked via fio with --direct=1).

The performance for reading and writing with blocksize=1M went up to around 320 MB/s, without any cache or log SSD.

Make sure to enable the write cache for every disk in the zpool. I accidentally forgot to do so for a single disk (out of 8) and it slowed down the whole system...
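
A loop makes it harder to miss a disk. The following is a rough sketch: the function name enable_wcache and the DRY_RUN switch are made up for this example, and the device names are placeholders for the members of your own pool. By default it only prints what it would run.

```shell
#!/bin/sh
# enable_wcache is a hypothetical helper. For each device given it shows the
# current write-cache state (hdparm -W) and then enables it (hdparm -W1).
# With DRY_RUN=1 (the default) it only prints the commands.
enable_wcache() {
    for dev in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "hdparm -W $dev"
            echo "hdparm -W1 $dev"
        else
            hdparm -W "$dev"
            hdparm -W1 "$dev"
        fi
    done
}

# Placeholder device names; replace them with the disks in your zpool.
enable_wcache /dev/sda /dev/sdb
```

Setting DRY_RUN=0 and running it as root would actually apply the change.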

I don't know why these were disabled by default.

What I still have to check is whether the RAID controller (it's a Dell PERC H730P) can be tuned (cache on/off) via the BIOS. Maybe the information above can help somebody.

PS: What is also notable: I prevent a single VM from writing too much data at once into the zpool by limiting its bandwidth in the Proxmox hard disk preferences, especially for writes. I am still testing whether limiting the read bandwidth is necessary.
 
Hi @shaneshort

Just curious to get a little more info on your server, as I have a theory and wanted to cross-check a few configs.

Are you still experiencing this issue?

Are you OK to share the following info:

  • Brand of server
  • RAID / HBA card
  • Model of server
  • Which RAID configuration (ZFS mirror, RAIDZ, etc.)?

Cheers
G
 
Addition to my last post:

I checked the RAID controller BIOS settings: deactivating the card's cache brought us some extra speed. We are using the Dell PERC H730P controller in HBA mode with all of the card's cache turned off but the disks' write cache turned on (hdparm -W1 /dev/DISK).

That did the trick for us.
 
Thanks for confirming.

I did a check on our new cluster and all the drive caches are on by default.

From what I've discovered so far, these proprietary cards from Dell and HP seem to be a common thread for these types of issues.

Cheers
g