[TTM] Buffer eviction failed

I am also running into this problem on a Proxmox machine running 7.4-17, with a VM running Linux Mint 22 Cinnamon 6.2.9 on kernel version 6.8.0-47-generic. This Proxmox machine has previously had no issues running VMs for weeks at a time, so I'm inclined to think it's something wrong with the VM: when I start VMs with older OSes, they will still run for long periods of time (weeks/months) without error. This issue has persisted on a second Proxmox machine running 8.2.7, to which the first Linux Mint VM was copied. As a test I installed a VM with Pop!_OS 22.04 LTS with kernel version 6.9.3-76060903-generic, and the QXL error has occurred there too.

I'm going to transfer one of the VMs that I have not had a problem with over to the 8.2.7 Proxmox machine and see if I get a QXL error.

If anyone has some recommended tests they would like me to do to help solve this problem I would be more than happy to assist!
 
No joy on any permutation or combination of RAM/VRAM and vgamem settings - for me, the QXL error occurs in all cases and still seems to be random.

Edit:
Assuming I'm reading it right, after reading through the kernel changelog for Ubuntu's 6.8.0-48 kernel (covering e.g. Ubuntu 24.04, Linux Mint 22, and others if the most up-to-date kernel is installed), it seems that the following occurred with regard to the alleged QXL driver bug fix, the discussion of which I previously linked to:

14 Jun 2024
Reverted "drm/qxl: simplify qxl_fence_wait" in upstream kernel 6.8.7, which was pulled into Ubuntu 6.8.0-1008.8-22.04.1 [6.8.0-38.38]

19 Jul 2024
Reapplied "drm/qxl: simplify qxl_fence_wait" in upstream kernel 6.8.10, which was pulled into Ubuntu 6.8.0-1010.10-22.04.1 [6.8.0-40.40]

What's not clear is whether the bug had actually been fixed when the code was reapplied (in 6.8.0-40) or whether it was reapplied in original (buggy) form awaiting a future fix.

To answer my own question, from kernel.org's changelog for upstream kernel 6.8.10:
commit 3dfe35d8683daf9ba69278643efbabe40000bbf6
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date: Mon May 6 13:28:59 2024 -0700

Reapply "drm/qxl: simplify qxl_fence_wait"

commit 3628e0383dd349f02f882e612ab6184e4bb3dc10 upstream.

This reverts commit 07ed11afb68d94eadd4ffc082b97c2331307c5ea.

Stephen Rostedt reports:
"I went to run my tests on my VMs and the tests hung on boot up.
Unfortunately, the most I ever got out was:

[ 93.607888] Testing event system initcall: OK
[ 93.667730] Running tests on all trace events:
[ 93.669757] Testing all events: OK
[ 95.631064] ------------[ cut here ]------------
Timed out after 60 seconds"

and further debugging points to a possible circular locking dependency
between the console_owner locking and the worker pool locking.

Reverting the commit allows Steve's VM to boot to completion again.

[ This may obviously result in the "[TTM] Buffer eviction failed"
messages again, which was the reason for that original revert. But at
this point this seems preferable to a non-booting system... ]

Reported-and-bisected-by: Steven Rostedt <rostedt@goodmis.org>
Link: https://lore.kernel.org/all/20240502081641.457aa25f@gandalf.local.home/

So, any downstream (distro) kernel that pulls from an upstream Linux kernel <6.8.7 or >=6.8.10 will have the buggy QXL code. That's for the 6.8 series; other kernel series probably also carry the buggy code (e.g. 5.15).
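To make that version rule concrete, here is a small, purely illustrative shell helper (not from the thread; the function name is made up) encoding the boundaries from the changelog entries quoted above, i.e. that the revert shipped only in upstream 6.8.7 through 6.8.9:

```shell
# Illustrative helper: for an upstream 6.8.x kernel, report whether it
# carries the buggy simplified qxl_fence_wait code. The revert was in
# 6.8.7 (pulled into Ubuntu 6.8.0-38) and undone again in 6.8.10.
qxl_68_status() {
  patch="$1"   # the x in 6.8.x
  if [ "$patch" -ge 7 ] && [ "$patch" -lt 10 ]; then
    echo "reverted (working)"
  else
    echo "buggy"
  fi
}

qxl_68_status 6    # buggy
qxl_68_status 8    # reverted (working)
qxl_68_status 10   # buggy
```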

For Ubuntu and derivatives, it looks like kernels 6.8.0-38 and 6.8.0-39 have the reverted code, so I'll see if I can test those.
 

I have seen this in Debian bullseye, bookworm, trixie, and sid. It has been around for a lot of different kernel versions.
Yes, originally some 3-4 years ago when the QXL driver was first simplified. It's probably in most kernel series since then. But I'm only testing the 6.8 series at the moment.
 
Yes, originally some 3-4 years ago when the QXL driver was first simplified. It's probably in most kernel series since then. But I'm only testing the 6.8 series at the moment.
Hi THX1138 - do you have any updates on your testing of the specific kernels? Did -38 or -39 fix the issue?
 
Hi THX1138 - do you have any updates on your testing of the specific kernels? Did -38 or -39 fix the issue?
Yes, the 6.8.0-38 and -39 kernels both work perfectly: over 200 hours of testing without the bug reappearing. In the middle of that, I re-tested the 6.8.0-49 and -50 kernels and they both failed within a few hours.

Just to be clear (for the benefit of people just finding this): it's the guest kernel we're talking about (the host kernel doesn't seem to matter at all), and the problem is much broader than the 6.8 kernel series. The bug was introduced in the upstream Linux kernel over 3 years ago through a simplification of the kvm/qemu QXL guest video driver and has propagated from there. The bug was briefly removed by reverting the prior change (in the upstream kernel, and pulled into the kernels of many distros, e.g. 6.8.0-38 and -39 in Ubuntu and derivatives) before being re-introduced, because the developer said the reverted, unsimplified code caused crashes in their testing environments. I have seen no crashes or other issues whatsoever.

It seems as though we're stuck with it, since the developer expressed the opinion that people would just shift to virtio or some other guest video driver (virtio works, but is extremely slow - so much so as to be essentially unusable for me).

Hope that helps.
 
Thank you for testing and sharing the results!

I will switch to one of those kernels and continue testing. My VMs last about half a day before crashing, unfortunately.
For anyone reading: if I don't report back, the older kernels also worked for me :)
 
First of all: thank you for the intensive test!
I've been following this thread for quite some time now, as I'm facing the same issue. I'm not using Proxmox but Debian with QEMU for work. I've had this issue on Debian and Kali guests. The issue is in the QXL kernel driver; not much has changed in the source code since the working version. Would anything prevent me from just compiling the driver in a working state and using it with a new kernel? If I just clone the Linux kernel repo, roll back the QXL driver to the working files, and compile them, would that work? I'm not that deep into how the kernel works, so maybe this is a stupid question.
 
I bumped into this issue a few weeks ago, and recently it has been appearing more frequently. I'm using QEMU/KVM on Ubuntu 22.04, kernel version 6.8.0-51-generic, and the guest OS is also Ubuntu 22.04. The graphics console froze initially and then stopped responding to the keyboard and mouse. Since I can still access the VM via SSH, I can see errors in dmesg like the following:
Code:
[Tue Jan 14 16:27:55 2025] [TTM] Buffer eviction failed
[Tue Jan 14 16:27:55 2025] qxl 0000:00:01.0: object_init failed for (262144, 0x00000001)
[Tue Jan 14 16:27:55 2025] [drm:qxl_gem_object_create [qxl]] *ERROR* Failed to allocate GEM object (260772, 1, 4096, -12)
[Tue Jan 14 16:27:55 2025] [drm:qxl_alloc_ioctl [qxl]] *ERROR* qxl_alloc_ioctl: failed to create gem ret=-12
...
[Tue Jan 14 16:28:10 2025] [TTM] Buffer eviction failed
[Tue Jan 14 16:28:10 2025] qxl 0000:00:01.0: object_init failed for (3149824, 0x00000001)
[Tue Jan 14 16:28:10 2025] [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to allocate VRAM BO
...
[Tue Jan 14 16:28:11 2025] p4v.bin[201907]: segfault at 7463ab25ea30 ip 00007463ab25ea30 sp 00007fff42f591d8 error 15 in libQt6Core.so.6[7463ab247000+206000] likely on CPU 0 (core 0, socket 0)
...

I was using p4v, but at the moment of the crash there were no operations in p4v at all. I have several crash logs, and every time the trigger seemed to come from p4v. This is another one:
Code:
[Fri Jan  3 13:35:50 2025] [TTM] Buffer eviction failed
[Fri Jan  3 13:35:50 2025] qxl 0000:00:01.0: object_init failed for (258048, 0x00000001)
[Fri Jan  3 13:35:50 2025] [drm:qxl_gem_object_create [qxl]] *ERROR* Failed to allocate GEM object (256020, 1, 4096, -12)
[Fri Jan  3 13:35:50 2025] [drm:qxl_alloc_ioctl [qxl]] *ERROR* qxl_alloc_ioctl: failed to create gem ret=-12
[Fri Jan  3 13:36:05 2025] [TTM] Buffer eviction failed
[Fri Jan  3 13:36:05 2025] qxl 0000:00:01.0: object_init failed for (3149824, 0x00000001)
[Fri Jan  3 13:36:05 2025] [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to allocate VRAM BO
[Fri Jan  3 13:36:06 2025] p4v.bin[312116]: segfault at 711c6e85ea30 ip 0000711c6e85ea30 sp 00007ffe1fba31f8 error 15 in libQt6Core.so.6[711c6e847000+206000] likely on CPU 2 (core 0, socket 2)
[Fri Jan  3 13:36:06 2025] Code: 65 64 28 51 4f 62 6a 65 63 74 20 2a 29 00 32 64 65 73 74 72 6f 79 65 64 28 51 4f 62 6a 65 63 74 20 2a 29 00 00 00 00 00 00 00 <32> 31 51 4f 62 6a 65 63 74 43 6c 65 61 6e 75 70 48 61 6e 64 6c 65

I'm posting here to ask: is there a simple fix other than downgrading the kernel? Thanks.
 
I was using p4v, but at the moment of the crash there were no operations in p4v at all. I have several crash logs, and every time the trigger seemed to come from p4v.
If you look at your logs, the p4v segfault occurs several seconds after the QXL driver (TTM Buffer eviction failed) error. The p4v binary may have its own bug that is triggered by the QXL driver crash but, since the majority of systems experiencing this problem don't have p4v (Helix visual client) installed, I'd be fairly certain that p4v is not the cause.

I'm posting here to ask: is there a simple fix other than downgrading the kernel? Thanks.
The problem is in the simplified version of the QXL guest video driver (used with qemu/kvm). You can use an alternate guest video driver if that works for you.
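For anyone who wants to try the alternate-driver route, this is roughly what switching away from QXL looks like; the VM ID and domain name below are placeholders, and exact option values should be checked against your own setup:

```shell
# Proxmox host: switch the guest's display device from qxl to virtio
qm set <vmid> --vga virtio

# Plain QEMU/libvirt host: edit the domain XML and change the video
# model, e.g.  <model type='qxl' .../>  ->  <model type='virtio'/>
virsh edit <domain>
```

As noted above, virtio may be noticeably slower for some workloads, so test before committing to it.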
 
First of all: thank you for the intensive test!
You're welcome. I had some spare time while the cricket was on. :)
Would anything prevent me from just compiling the driver in a working state and using it with a new kernel? If I just clone the Linux kernel repo, roll back the QXL driver to the working files, and compile them, would that work? I'm not that deep into how the kernel works, so maybe this is a stupid question.
I don't know, but theoretically I suppose it should. I've never done that myself (patching the kernel and recompiling, that is); I'm not that deep into kernel development either. Maybe a better solution would be for a lot of us to petition the developer (via kernel bug reports) to fix the bug?
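For what it's worth, here is a rough, untested sketch of what that workflow might look like. The commit ID is the "Reapply" commit quoted earlier in this thread; the tag, paths, and build steps are assumptions to verify against your own distro (and Secure Boot module signing, if enabled, adds further steps):

```shell
# UNTESTED sketch: revert the qxl change in a kernel tree matching your
# running kernel, then rebuild and install just the qxl module.
git clone https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
cd linux
git checkout v6.8.12                  # pick the tag matching your kernel
git revert 3628e0383dd3               # Reapply "drm/qxl: simplify qxl_fence_wait"
cp /boot/config-$(uname -r) .config
make olddefconfig && make modules_prepare
make M=drivers/gpu/drm/qxl modules
sudo cp drivers/gpu/drm/qxl/qxl.ko \
  /lib/modules/$(uname -r)/kernel/drivers/gpu/drm/qxl/qxl.ko
sudo depmod -a && sudo update-initramfs -u
```

A module built this way has to match the running kernel's version magic exactly, and many distros ship modules compressed (e.g. qxl.ko.zst), so treat this purely as a starting point, not a recipe.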
 
Maybe a better solution would be if a lot of us petition the developer (via kernel bug reports) to fix the bug?
Yes, I like that solution way more. I just thought this could be a temporary fix for the issue; it's really bugging me at work.
 