[SOLVED] GPU passthrough broken with kernel 5.0 (PVE 6.0)

Blais

Hello

Since upgrading from PVE 5.4 to 6.0, I have a rather strange problem that I didn't have before the upgrade.

It seems to happen at random.

In short, it's not a kernel panic, but the web interface shows a ? next to the LXC containers and VMs.

upload_2019-7-17_20-58-56.png

The screenshot is above; the dmesg output follows.



[ 3867.253302] INFO: task vgs:30733 blocked for more than 120 seconds.
[ 3867.253304] Tainted: P O 5.0.15-1-pve #1
[ 3867.253306] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 3867.253308] vgs D 0 30733 2770 0x00000000
[ 3867.253310] Call Trace:
[ 3867.253314] __schedule+0x2d4/0x870
[ 3867.253317] schedule+0x2c/0x70
[ 3867.253319] schedule_timeout+0x258/0x360
[ 3867.253322] ? ttwu_do_activate+0x67/0x90
[ 3867.253325] wait_for_completion+0xb7/0x140
[ 3867.253328] ? wake_up_q+0x80/0x80
[ 3867.253331] __flush_work+0x138/0x200
[ 3867.253333] ? worker_detach_from_pool+0xb0/0xb0
[ 3867.253335] ? get_work_pool+0x40/0x40
[ 3867.253338] __cancel_work_timer+0x115/0x190
[ 3867.253340] ? exact_lock+0x11/0x20
[ 3867.253343] cancel_delayed_work_sync+0x13/0x20
[ 3867.253345] disk_block_events+0x78/0x80
[ 3867.253348] __blkdev_get+0x73/0x550
[ 3867.253350] ? bd_acquire+0xd0/0xd0
[ 3867.253352] blkdev_get+0x10c/0x330
[ 3867.253355] ? bd_acquire+0xd0/0xd0
[ 3867.253357] blkdev_open+0x92/0x100
[ 3867.253359] do_dentry_open+0x143/0x3a0
[ 3867.253361] vfs_open+0x2d/0x30
[ 3867.253363] path_openat+0x2d4/0x16d0
[ 3867.253367] ? __do_page_fault+0x25a/0x4c0
[ 3867.253369] ? mem_cgroup_try_charge+0x8b/0x190
[ 3867.253372] do_filp_open+0x93/0x100
[ 3867.253375] ? strncpy_from_user+0x56/0x1b0
[ 3867.253377] ? __alloc_fd+0x46/0x150
[ 3867.253380] do_sys_open+0x177/0x280
[ 3867.253382] ? __x64_sys_io_submit+0xa9/0x190
[ 3867.253385] __x64_sys_openat+0x20/0x30
[ 3867.253388] do_syscall_64+0x5a/0x110
[ 3867.253390] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 3867.253392] RIP: 0033:0x7fb29b4c41ae
[ 3867.253394] Code: Bad RIP value.
[ 3867.253395] RSP: 002b:00007ffff5fa8000 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
[ 3867.253398] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fb29b4c41ae
[ 3867.253399] RDX: 0000000000044000 RSI: 000055bf67f79b58 RDI: 00000000ffffff9c
[ 3867.253401] RBP: 00007ffff5fa8160 R08: 000055bf680ea000 R09: 0000000000000000
[ 3867.253402] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffff5fa8edf
[ 3867.253404] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000

ps aux =>
root 30733 0.0 0.0 20420 9748 ? D 20:21 0:00 /sbin/vgs --separator : --noheadings --units b --unbuffered --nosuffix --options vg_name,vg_size,vg_free,lv_cou
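
The D in the STAT column above marks a process in uninterruptible sleep, which is the state the hung-task warning in dmesg reports for vgs. A minimal sketch for picking such processes out of a process listing, assuming a procps-style ps; the name filter_blocked is made up for illustration and is not a Proxmox tool:

```shell
# Keep the header line plus every process whose STAT field (column 2)
# starts with D, i.e. uninterruptible sleep, usually waiting on I/O.
# filter_blocked is a hypothetical helper name.
filter_blocked() {
  awk 'NR == 1 || $2 ~ /^D/'
}
```

It could be used as `ps -eo pid,stat,wchan:32,args | filter_blocked`, where the wchan column hints at what the process is waiting on in the kernel.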

GRUB options:
GRUB_CMDLINE_LINUX_DEFAULT="pcie_acs_override=downstream,multifunction video=efifb:off pcie_aspm=off"
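
On a GRUB-booted host this variable normally lives in /etc/default/grub and only takes effect after update-grub and a reboot. A small sketch for checking that a given option actually reached the running kernel, assuming a Linux host where /proc/cmdline is readable; has_param is a hypothetical helper name:

```shell
# has_param is a hypothetical helper, not part of any distribution.
# $1: full kernel command line, $2: option to look for (whole-word match).
has_param() {
  case " $1 " in
    *" $2 "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example against the running kernel; skipped where /proc/cmdline is missing.
if [ -r /proc/cmdline ]; then
  if has_param "$(cat /proc/cmdline)" "pcie_aspm=off"; then
    echo "pcie_aspm=off is active"
  else
    echo "pcie_aspm=off is not active"
  fi
fi
```

Padding both strings with spaces keeps the match to whole options, so pcie_aspm=off does not match a longer option that merely starts with the same text.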


The only way to restart the machine is to press the power button.

Any help would be welcome; I'm hoping I won't have to reinstall everything.

Sincerely.

Julien
 
I think it was more of a scare than real harm.

When I ran fdisk -l on all the disks, sdc did not respond, so I removed it. I'll see how it goes.

However, I have another problem, with GPU passthrough, which used to work very well. Since the upgrade, game streaming no longer works at all.

If I remove the card, I don't have a problem.

I don't remember seeing an "Extended features (0xf77ef22294ada)" line before when running the following command:

dmesg | grep -e DMAR -e IOMMU -e AMD-V
[ 0.000000] Warning: PCIe ACS overrides enabled; This may allow non-IOMMU protected peer-to-peer DMA
[ 0.696535] AMD-Vi: IOMMU performance counters supported
[ 0.700902] AMD-Vi: Found IOMMU at 0000:00:00.2 cap 0x40
[ 0.700904] AMD-Vi: Extended features (0xf77ef22294ada):
[ 0.700908] AMD-Vi: Interrupt remapping enabled
[ 0.700910] AMD-Vi: Virtual APIC enabled
[ 0.701009] AMD-Vi: Lazy IO/TLB flushing enabled
[ 0.702226] perf/amd_iommu: Detected AMD IOMMU #0 (2 banks, 4 counters/bank).
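
The ACS override warning in this output is relevant because pcie_acs_override changes how devices are split into IOMMU groups. A common way to inspect the resulting grouping is to walk /sys/kernel/iommu_groups. The sketch below does that; list_iommu_groups is an illustrative name, and the optional argument exists only so the function can be pointed at another directory:

```shell
# list_iommu_groups is a hypothetical helper name.
# $1 (optional): directory to scan, defaults to the sysfs IOMMU tree.
list_iommu_groups() {
  root=${1:-/sys/kernel/iommu_groups}
  for dev in "$root"/*/devices/*; do
    # If the glob matched nothing, the literal pattern does not exist: skip it.
    [ -e "$dev" ] || continue
    group=${dev#"$root"/}   # strip the leading directory
    group=${group%%/*}      # keep only the group number
    printf 'IOMMU group %s: %s\n' "$group" "${dev##*/}"
  done
}

list_iommu_groups
```

A GPU that is to be passed through would normally be expected to share its group only with its own functions (for example its HDMI audio device).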


Does anyone have a lead, or is anyone in the same situation?


Sincerely.

Julien
 
Replacing q35 with pc-q35-3.1 works!

All that is missing is to add this entry to the existing list of machine types, unless this is only meant as a workaround for the time being.

Thank you.
 
"All that is missing is to add this entry to the existing list of machine types, unless this is only meant as a workaround for the time being."
This is only a workaround for now, but can you try the other fix, with kernel_irqchip in the args? This one will definitely be fixed when we ship either QEMU 4.0.1 or 4.1.
 
Status report:

Test with the graphics card in slot 2, without a ROM image, with "args: -machine type=q35,kernel_irqchip=on" added to the VM config file => works, and game streaming tested OK :p
Test with the graphics card in slot 2, without a ROM image, replacing "machine: q35" with "machine: pc-q35-3.1" => works
Test with the graphics card in slot 1, with a ROM image, with "args: -machine type=q35,kernel_irqchip=on" added to the VM config file => works
Test with the graphics card in slot 1, with a ROM image, replacing "machine: q35" with "machine: pc-q35-3.1" => works
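
The two workarounds tested above correspond to the machine and args options of the VM configuration, which can also be set with qm set. The sketch below only prints the matching command so it can be reviewed before being run by hand; the VMID 100 and the helper name workaround_cmd are placeholders, not taken from this thread:

```shell
# workaround_cmd is a hypothetical helper: it prints a qm command, it does not run it.
# $1: VMID (placeholder), $2: "machine" or "irqchip".
workaround_cmd() {
  vmid=$1
  mode=$2
  case "$mode" in
    machine)
      # Pin the VM to the older q35 machine version.
      printf '%s\n' "qm set $vmid --machine pc-q35-3.1" ;;
    irqchip)
      # Keep "machine: q35" and pass kernel_irqchip=on through args.
      printf '%s\n' "qm set $vmid --args '-machine type=q35,kernel_irqchip=on'" ;;
    *)
      echo "usage: workaround_cmd VMID machine|irqchip" >&2
      return 1 ;;
  esac
}

workaround_cmd 100 machine
workaround_cmd 100 irqchip
```

Only one of the two is needed at a time, and according to the replies in this thread both are meant to be removed again once QEMU 4.0.1 or 4.1 ships.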


By "works" I mean that no hardware errors occur and the VM starts and is accessible.
upload_2019-7-19_19-29-50.png

Compared with yesterday's first test, I was able to use game streaming. Everything seems OK in all the tests; the cards run at PCIe 3.0 x8 since there are two of them.

Since I don't pay for a subscription, I can at least contribute with this kind of testing. If you want me to run another test, don't hesitate to ask.

Thank you for your help.

With kind regards.

Julien
 
Thanks :) In your case, as soon as we ship QEMU 4.0.1 or 4.1, you should be able to remove the workarounds again.
 
