> That's the nested PVE config, the L1 guest

That's your guest's guest. Does your primary guest also have ballooning disabled?
The L2 Ubuntu guest VM has ballooning enabled, yes.
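For reference, ballooning can be checked and toggled per VM from the PVE host with qm (VMID 100 below is just a placeholder):

```
# Show the balloon setting of a VM; no "balloon" line means the default (ballooning on)
qm config 100 | grep -i balloon

# Disable ballooning for that VM by setting the balloon target to 0
qm set 100 --balloon 0
```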
> Don't worry, not a big deal

My bad, I forgot to check whether it's already available in the pve-no-subscription repository.
> Thanks @Neobin !!

It is currently in the pvetest repository:
https://pve.proxmox.com/wiki/Package_Repositories#sysadmin_test_repo
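For anyone following along, enabling pvetest boils down to adding one APT source line, as described on that wiki page (this assumes PVE 8 on Debian Bookworm; adjust the suite name for other releases):

```
# Add the pvetest repository and refresh the package index
echo "deb http://download.proxmox.com/debian/pve bookworm pvetest" \
    > /etc/apt/sources.list.d/pvetest.list
apt update
```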
> YOU FOUND IT!!!

Anyway, the change in the kernel that causes the crash was added in kernel 5.8 - see GitHub (for easier readability in the browser) and the Linux kernel mailing list. You'll thus need kernel 5.7 or older, and those are not available on PVE 7 either. Looking at the Proxmox VE Roadmap, you'd need to go back to the even older Proxmox VE 6.4 (download here) to see whether it works. The thing is, async page faults in kernel mode are not allowed for a good reason, so you'll have to see whether it actually behaves better. Last but not least, I want to stress that the suggestion of trying out PVE 6.4 is for testing purposes only, since that version reached its end of life in September 2022.
Also, would it be possible to temporarily disable swap on the host to check whether that improves the situation? Again, this is not a general recommendation, since using swap has its benefits, but it would at least confirm our current assumptions about your issue.
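A minimal way to run that test on the host, assuming you can tolerate keeping all guest memory resident for a while:

```
# Check what swap is currently in use
swapon --show
free -h

# Temporarily disable all swap (re-enable later with `swapon -a`, or reboot)
swapoff -a
```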
Yeah, probably all my other VMs have older kernels than 5.8. I will check later.
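For a quick check, `uname -r` inside each guest does the trick, and on a PVE host `pveversion` also reports the running kernel:

```
# Inside each guest: print the running kernel version
uname -r

# On a PVE host: pveversion includes the running kernel in its output
pveversion
```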
But yeah, it makes total sense to restrict the async PF mechanism to user space, where it is actually useful. I don't see it being as useful in kernel space.
The thing is, when PF# was disabled in kernel space, were any changes made to QEMU/KVM so that it knows which faults should be handled with the async PF approach and which ones with the halt approach?
Because the commit you mentioned means that every distro running kernel 5.8 or newer will be susceptible to the same problem as PVE 8, and that this problem has little to do with nested virtualization; it's all about memory management between host and guest.
And yes, I can certainly try PVE 6 to confirm whether it is really unaffected, but that is not really a solution to this problem. Neither would disabling swap on the host be, since swap is very useful: not all guest memory regions must be kept active and in RAM all the time...
I will also try running an Ubuntu VM on the host with the latest stable kernel, to see whether it exhibits the same behavior as PVE.
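In case it helps anyone reproduce this, here is a rough sketch of spinning up such a test VM from an Ubuntu cloud image; the VMID 9000, the storage name local-lvm, and the bridge vmbr0 are assumptions to adjust to your setup:

```
# Download a current Ubuntu cloud image (24.04 "noble" as an example)
wget https://cloud-images.ubuntu.com/noble/current/noble-server-cloudimg-amd64.img

# Create the VM and import the image as its boot disk
qm create 9000 --name ubuntu-test --memory 4096 --cores 2 --net0 virtio,bridge=vmbr0
qm importdisk 9000 noble-server-cloudimg-amd64.img local-lvm
qm set 9000 --scsi0 local-lvm:vm-9000-disk-0 --boot order=scsi0
qm start 9000
```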
So in which layer should the problem be tackled? QEMU, the Linux kernel on the host, or the Linux kernel on the guest?
Again, thank you all so much for your time, I really appreciate it!