PVE Kernel 5.3.18-3 PCI passthrough error - unable to read tail (got 0 bytes)

frotella

New Member
May 8, 2020
Hello, I'm having a problem with PCIe passthrough (Intel NIC) on the latest Proxmox 6.1 with kernel 5.3.18-3 (EFI boot).
On 5.3.18-2 everything works as expected (FreeBSD/Linux/Windows VMs), but on 5.3.18-3, as soon as I boot a VM with passthrough, I get 'unable to read tail (got 0 bytes)'.

Bonus hint request: in the meantime, how do I edit /etc/kernel/cmdline to stick with 5.3.18-2?

Here's a trace
Code:
May 12 06:12:30 pv0-it kernel: [  125.937690] invalid opcode: 0000 [#1] SMP PTI
May 12 06:12:30 pv0-it kernel: [  125.937700] CPU: 2 PID: 3618 Comm: task UPID:pv0-i Tainted: P           O      5.3.18-3-pve #1
May 12 06:12:30 pv0-it kernel: [  125.937717] Hardware name: IBM IBM xSeries High Volume Towers x3100 M4  -[2582K1G]-/00D8867, BIOS -[JQE164AUS-1.07]- 12/09/2013
May 12 06:12:30 pv0-it kernel: [  125.937742] RIP: 0010:free_msi_irqs+0x17b/0x1b0
May 12 06:12:30 pv0-it kernel: [  125.937752] Code: 84 e1 fe ff ff 45 31 f6 eb 11 41 83 c6 01 44 39 73 14 0f 86 ce fe ff ff 8b 7b 10 44 01 f7 e8 6c 1f b8 ff 48 83 78 70 00 74 e0 <0f> 0b 49 8d b5 b0 00 00 00 e8 07 da b8 ff e9 cf fe ff ff 48 8b 78
May 12 06:12:30 pv0-it kernel: [  125.937787] RSP: 0018:ffffb6e915b5bcf8 EFLAGS: 00010286
May 12 06:12:30 pv0-it kernel: [  125.937798] RAX: ffff937df98d8400 RBX: ffff937e0a765d80 RCX: 0000000000000000
May 12 06:12:30 pv0-it kernel: [  125.937812] RDX: 0000000000000000 RSI: 0000000000000024 RDI: ffffffffa5466940
May 12 06:12:30 pv0-it kernel: [  125.937826] RBP: ffffb6e915b5bd28 R08: ffff937e1c001ff0 R09: ffff937e1c002138
May 12 06:12:30 pv0-it kernel: [  125.937840] R10: 0000000000000000 R11: ffffffffa5466948 R12: ffff937e1b68f2c0
May 12 06:12:30 pv0-it kernel: [  125.937854] R13: ffff937e1b68f000 R14: 0000000000000000 R15: fffffffffffffff2
May 12 06:12:30 pv0-it kernel: [  125.937869] FS:  00007ff8dadc21c0(0000) GS:ffff937e1fa80000(0000) knlGS:0000000000000000
May 12 06:12:30 pv0-it kernel: [  125.937884] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 12 06:12:30 pv0-it kernel: [  125.937896] CR2: 000055fcc8db0d3c CR3: 000000081a1e4005 CR4: 00000000001606e0
May 12 06:12:30 pv0-it kernel: [  125.937910] Call Trace:
May 12 06:12:30 pv0-it kernel: [  125.937920]  pci_disable_msi+0xfa/0x120
May 12 06:12:30 pv0-it kernel: [  125.937935]  e1000e_reset_interrupt_capability+0x52/0x60 [e1000e]
May 12 06:12:30 pv0-it kernel: [  125.937951]  e1000_remove+0xb9/0x170 [e1000e]
May 12 06:12:30 pv0-it kernel: [  125.937962]  pci_device_remove+0x3e/0xc0
May 12 06:12:30 pv0-it kernel: [  125.937971]  device_release_driver_internal+0xe0/0x1b0
May 12 06:12:30 pv0-it kernel: [  125.937983]  device_driver_detach+0x14/0x20
May 12 06:12:30 pv0-it kernel: [  125.937993]  unbind_store+0xf9/0x130
May 12 06:12:30 pv0-it kernel: [  125.938001]  drv_attr_store+0x27/0x40
May 12 06:12:30 pv0-it kernel: [  125.938011]  sysfs_kf_write+0x3b/0x40
May 12 06:12:30 pv0-it kernel: [  125.938019]  kernfs_fop_write+0xda/0x1c0
May 12 06:12:30 pv0-it kernel: [  125.938029]  __vfs_write+0x1b/0x40
May 12 06:12:30 pv0-it kernel: [  125.938037]  vfs_write+0xab/0x1b0
May 12 06:12:30 pv0-it kernel: [  125.938045]  ksys_write+0x61/0xe0
May 12 06:12:30 pv0-it kernel: [  125.938052]  __x64_sys_write+0x1a/0x20
May 12 06:12:30 pv0-it kernel: [  125.938062]  do_syscall_64+0x5a/0x130
May 12 06:12:30 pv0-it kernel: [  125.938072]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
May 12 06:12:30 pv0-it kernel: [  125.938083] RIP: 0033:0x7ff8dafcf471
May 12 06:12:30 pv0-it kernel: [  125.938092] Code: 00 00 75 05 48 83 c4 58 c3 e8 0b 4d ff ff 66 2e 0f 1f 84 00 00 00 00 00 90 8b 05 da ef 00 00 85 c0 75 16 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 57 c3 66 0f 1f 44 00 00 41 54 49 89 d4 55 48
May 12 06:12:30 pv0-it kernel: [  125.938127] RSP: 002b:00007fff4f3f18a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
May 12 06:12:30 pv0-it kernel: [  125.938142] RAX: ffffffffffffffda RBX: 000055e4a69eb260 RCX: 00007ff8dafcf471
May 12 06:12:30 pv0-it kernel: [  125.938156] RDX: 000000000000000c RSI: 000055e4ad84ced0 RDI: 000000000000000d
May 12 06:12:30 pv0-it kernel: [  125.938170] RBP: 000055e4ad84ced0 R08: 0000000000000000 R09: aaaaaaaaaaaaaaab
May 12 06:12:30 pv0-it kernel: [  125.938184] R10: 000055e4ad842458 R11: 0000000000000246 R12: 000000000000000c
May 12 06:12:30 pv0-it kernel: [  125.938198] R13: 000055e4a69eb260 R14: 000000000000000d R15: 000055e4ad84a980
May 12 06:12:30 pv0-it kernel: [  125.938212] Modules linked in: nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace fscache ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter bpfilter bonding softdog nfnetlink_log nfnetlink intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp mgag200 drm_vram_helper ttm kvm_intel drm_kms_helper kvm drm i2c_algo_bit fb_sys_fops syscopyarea ipmi_ssif sysfillrect crct10dif_pclmul crc32_pclmul ghash_clmulni_intel sysimgblt cdc_ether aesni_intel usbnet input_leds joydev mii aes_x86_64 crypto_simd cryptd ie31200_edac glue_helper ipmi_si ipmi_devintf mac_hid pcspkr intel_cstate ipmi_msghandler intel_rapl_perf sch_fq vhost_net vhost tap ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi vfio_pci sunrpc vfio_virqfd irqbypass vfio_iommu_type1 vfio ip_tables x_tables autofs4 zfs(PO) zunicode(PO) zlua(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) btrfs xor zstd_compress
May 12 06:12:30 pv0-it kernel: [  125.938237]  raid6_pq libcrc32c wmi hid_sunplus hid_generic usbkbd usbmouse usbhid gpio_ich ahci i2c_i801 hid libahci lpc_ich e1000e
May 12 06:12:30 pv0-it kernel: [  125.941666] ---[ end trace 6d6d6578e1c43408 ]---
May 12 06:12:30 pv0-it kernel: [  125.942380] RIP: 0010:free_msi_irqs+0x17b/0x1b0
May 12 06:12:30 pv0-it kernel: [  125.943070] Code: 84 e1 fe ff ff 45 31 f6 eb 11 41 83 c6 01 44 39 73 14 0f 86 ce fe ff ff 8b 7b 10 44 01 f7 e8 6c 1f b8 ff 48 83 78 70 00 74 e0 <0f> 0b 49 8d b5 b0 00 00 00 e8 07 da b8 ff e9 cf fe ff ff 48 8b 78
May 12 06:12:30 pv0-it kernel: [  125.944529] RSP: 0018:ffffb6e915b5bcf8 EFLAGS: 00010286
May 12 06:12:30 pv0-it kernel: [  125.945261] RAX: ffff937df98d8400 RBX: ffff937e0a765d80 RCX: 0000000000000000
May 12 06:12:30 pv0-it kernel: [  125.946018] RDX: 0000000000000000 RSI: 0000000000000024 RDI: ffffffffa5466940
May 12 06:12:30 pv0-it kernel: [  125.946779] RBP: ffffb6e915b5bd28 R08: ffff937e1c001ff0 R09: ffff937e1c002138
May 12 06:12:30 pv0-it kernel: [  125.947509] R10: 0000000000000000 R11: ffffffffa5466948 R12: ffff937e1b68f2c0
May 12 06:12:30 pv0-it kernel: [  125.948240] R13: ffff937e1b68f000 R14: 0000000000000000 R15: fffffffffffffff2
May 12 06:12:30 pv0-it kernel: [  125.948960] FS:  00007ff8dadc21c0(0000) GS:ffff937e1fa80000(0000) knlGS:0000000000000000
May 12 06:12:30 pv0-it kernel: [  125.949698] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 12 06:12:30 pv0-it kernel: [  125.950415] CR2: 000055fcc8db0d3c CR3: 000000081a1e4005 CR4: 00000000001606e0
May 12 06:12:30 pv0-it pvedaemon[3168]: <root@pam> end task UPID:pv0-it:00000E22:00003117:5EBA3E4E:qmstart:100:root@pam: unable to read tail (got 0 bytes)
 
Try updating to 6.2, which ships with a 5.4-based kernel. I remember someone having a similar issue and that kernel fixing it, so it's worth a try.

To edit the kernel command line, put your changes in /etc/kernel/cmdline, as you say, or in /etc/default/grub if you use GRUB as your bootloader. Don't forget to run pve-efiboot-tool refresh or update-grub afterwards (again, depending on which bootloader you use). After rebooting, check /proc/cmdline to see if it worked.
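
For the last step, checking /proc/cmdline, here is a minimal sketch of a helper that tests whether one exact parameter is present. The function name has_cmdline_param and the example parameter intel_iommu=on are assumptions for illustration only; neither comes from this thread. The file to inspect defaults to /proc/cmdline but can be overridden, which makes it easy to try against a copy.

```shell
#!/bin/sh
# Hypothetical helper (not part of Proxmox): succeed if parameter $1 appears
# as a whole whitespace-separated word in the command line file $2.
# $2 defaults to /proc/cmdline, the command line of the running kernel.
has_cmdline_param() {
    # Put every word on its own line, then require an exact full-line match
    # so that e.g. "iommu=on" does not match "intel_iommu=on".
    tr -s ' \t' '\n\n' < "${2:-/proc/cmdline}" | grep -Fxq -- "$1"
}
```

Called as has_cmdline_param intel_iommu=on after a reboot, it returns success only if the running kernel was actually booted with that parameter. If it fails, the refresh step for your bootloader probably has not been run yet.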
 
Just updated (two days ago this update wasn't available yet) and it seems that has solved the problem.
Solved for now; I'll keep testing. Thank you!
 
Hi, I am having the same problem when I want to start a normal VM. After issues with the network in my rack, the VM won't boot anymore and gives this error message.
Rebooting my nodes is not a solution.
 
Please open a new thread for such issues instead of replying to an old one. We're currently shipping a 5.11 kernel, which makes the previous response way outdated anyway, so I doubt you're experiencing the *same* issue. When opening a new thread, please provide more details, such as your network configuration ('/etc/network/interfaces'), hardware config ('ip link', 'ip a', etc.) and VM config ('qm config <vmid>').
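
One way to gather the details listed above in a single pass is sketched below. The function names section and collect_report are hypothetical; the commands they run are only the ones named in the post, and a command that is not installed is reported rather than aborting the run. The VM ID is passed as an argument because it differs per setup.

```shell
#!/bin/sh
# Hypothetical helper: print a labelled section with a command's output,
# or a note if the command is not available on this machine.
section() {
    printf '### %s\n' "$*"
    if command -v "$1" >/dev/null 2>&1; then
        "$@" 2>&1
    else
        printf '(command not found: %s)\n' "$1"
    fi
}

# Collect the network, link and VM configuration requested above.
# $1 is the ID of the VM that fails to start.
collect_report() {
    section cat /etc/network/interfaces
    section ip link
    section ip a
    section qm config "$1"
}
```

Running collect_report <vmid> > report.txt on the affected node produces one text file that can be pasted into the new thread.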