Issues after upgrading to 6.17.4-1-pve

daviddanko

New Member
Apr 29, 2024
Today I upgraded the kernel from 6.17.2-2 to 6.17.4-1, and after rebooting, my server didn't come back up. Checking the logs, the cause seemed to be my media disk (rows starting with ata5):

[screenshot: boot log showing the ata5 errors]
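For anyone who wants the same lines as text rather than a screenshot, they can be pulled from the journal (a sketch, assuming persistent journaling is enabled; -b -1 selects the previous, failed boot, and the ata port number may differ per system):

Code:
journalctl -k -b -1 | grep 'ata5'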
When I went back to the previous kernel, all seemed to be fine, but my TrueNAS instance did not boot up.

When I unplugged my media disk and plugged it back in, these errors kept repeating:

Code:
Dec 19 22:49:09 homeserver-01 kernel: ata5.00: cmd ca/00:10:10:12:40/00:00:00:00:00/e0 tag 22 dma 8192 out
                                                res 51/04:10:10:12:40/00:00:00:00:00/e0 Emask 0x1 (device error)
Dec 19 22:49:09 homeserver-01 kernel: ata5.00: status: { DRDY ERR }
Dec 19 22:49:09 homeserver-01 kernel: ata5.00: error: { ABRT }
Dec 19 22:49:09 homeserver-01 kernel: ahci 10000:e0:17.0: port does not support device sleep
Dec 19 22:49:09 homeserver-01 kernel: ata5.00: supports DRM functions and may not be fully accessible
Dec 19 22:49:09 homeserver-01 kernel: ata5.00: failed to enable AA (error_mask=0x1)
Dec 19 22:49:09 homeserver-01 kernel: ata5.00: supports DRM functions and may not be fully accessible
Dec 19 22:49:09 homeserver-01 kernel: ata5.00: failed to enable AA (error_mask=0x1)
Dec 19 22:49:09 homeserver-01 kernel: ata5.00: configured for UDMA/133 (device error ignored)
Dec 19 22:49:09 homeserver-01 kernel: ahci 10000:e0:17.0: port does not support device sleep
Dec 19 22:49:09 homeserver-01 kernel: ata5: EH complete
Dec 19 22:49:09 homeserver-01 kernel: ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Dec 19 22:49:09 homeserver-01 kernel: ata5.00: irq_stat 0x40000001
Dec 19 22:49:09 homeserver-01 kernel: ata5.00: failed command: WRITE DMA EXT
Dec 19 22:49:09 homeserver-01 kernel: ata5.00: cmd 35/00:10:10:84:e0/00:00:e8:00:00/e0 tag 23 dma 8192 out
                                                res 51/04:10:10:84:e0/00:00:e8:00:00/e0 Emask 0x1 (device error)


To me it seemed like my disk had just died. So I disabled my TrueNAS VM, clicked Detach on the media disk in the VM options, and removed the entry for the disk from fstab in Proxmox (the CLI equivalents are sketched after the log below), but it still doesn't boot with the latest kernel. It boots fine with 6.17.2-2, however. When I plug the disk in, the Proxmox syslogs show this:

Code:
Dec 19 23:18:27 homeserver-01 kernel: sd 4:0:0:0: [sda] tag#21 CDB: Read(10) 28 00 00 00 00 00 00 01 00 00
Dec 19 23:18:27 homeserver-01 kernel: blk_print_req_error: 2 callbacks suppressed
Dec 19 23:18:27 homeserver-01 kernel: I/O error, dev sda, sector 0 op 0x0:(READ) flags 0x0 phys_seg 32 prio class 2
Dec 19 23:18:27 homeserver-01 kernel: sd 4:0:0:0: [sda] tag#14 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
Dec 19 23:18:27 homeserver-01 kernel: sd 4:0:0:0: [sda] tag#14 CDB: Read(10) 28 00 00 00 00 00 00 01 00 00
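For completeness, roughly the same cleanup can be done from the shell (a sketch; 100 and sata1 are placeholders for the VM ID and disk slot, substitute your own):

Code:
qm set 100 --delete sata1   # detach the passed-through media disk from the VM config
nano /etc/fstab             # comment out or remove the disk's mount entry
systemctl daemon-reload     # rebuild the mount units systemd generates from fstab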

Perhaps it's worth mentioning that just before this, I had also upgraded TrueNAS to 25.10.1.

Currently, even though there should be no trace of that disk, the newest kernel still doesn't boot:

[screenshots of the failed boot]
So my question is: did my media disk die? If so, and the new kernel was hanging because of the disk, why does the old kernel boot with the faulty disk? And why doesn't the new kernel boot even though I removed, I believe, every reference to that disk?
 
What SATA controller / HBA are you using?
I had to disable the ROM-BAR (and upgrade the firmware) for my controller to boot TrueNAS after upgrading from kernel 6.8.14.
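For reference, the ROM-BAR can be turned off on a passed-through controller from the Proxmox shell roughly like this (a sketch; the VM ID 100 and the PCI address are placeholders for your own values):

Code:
qm set 100 --hostpci0 0000:01:00.0,rombar=0   # pass the controller through with its option ROM hidden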
 

Code:
root@homeserver-01:~# lspci -nnk | egrep -A3 -i 'sata|raid|sas|storage'
0000:00:0e.0 RAID bus controller [0104]: Intel Corporation Volume Management Device NVMe RAID Controller [8086:467f]
        Subsystem: Dell Device [1028:0be5]
        Kernel driver in use: vmd
        Kernel modules: vmd, ahci
--
10000:e0:17.0 SATA controller [0106]: Intel Corporation Alder Lake-S PCH SATA Controller [AHCI Mode] [8086:7ae2] (rev 11)
        Subsystem: Dell Device [1028:0be5]
        Kernel driver in use: ahci
        Kernel modules: ahci
So I believe I am not affected by that ROMbar issue, right?
 
I can't say, as I have no experience with those controllers and no idea what devices you have connected to each. I would try disabling the ROM-BAR one device at a time, then look into firmware updates, then roll back the kernel to the earlier version, and if it still fails, investigate further from that point. There is no simple answer to these sorts of cases; it's going to be diagnosis by trial and error.

It's fairly easy to diagnose whether the drive is faulty: just connect it to any other machine.
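If moving the drive around is inconvenient, a SMART check from the Proxmox host gives a first indication (a sketch; replace /dev/sda with whatever device node the disk actually gets):

Code:
smartctl -a /dev/sda         # dump SMART health status, attributes and the error log
smartctl -t short /dev/sda   # run a short self-test; results show up in -a after a few minutes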
 
OK, weird. I have exactly the same SATA controller in my Dell OptiPlex 7020, with exactly the same issue on the new kernel (6.17.4-1-pve).
So I pinned the 6.17.2-1 kernel and everything is working again (command used: proxmox-boot-tool kernel pin 6.17.2-1-pve).
My machine(s) won't boot with that newer kernel: proxmox-kernel-6.17.4-1-pve.
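For anyone else using pinning as a workaround, the full round trip with proxmox-boot-tool looks something like this (a sketch):

Code:
proxmox-boot-tool kernel list               # show installed kernels and the current pin
proxmox-boot-tool kernel pin 6.17.2-1-pve   # keep booting the known-good kernel
proxmox-boot-tool kernel unpin              # later, once the regression is fixed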

After your "solution" I also updated the BIOS (1.20, the latest one), reset the BIOS to defaults, reinstalled the Proxmox server from scratch, and did the apt update/upgrade.
Again, the machine won't boot with the proxmox-kernel-6.17.4-1-pve kernel.

Are we the only ones?
M.
 
While doing some chores during the boot, I realized it actually did boot for me; it just took a really long time. There was also a workaround that worked for me before the BIOS update.

Edit /etc/default/grub and add zfs_import_skip=1 to the GRUB_CMDLINE_LINUX_DEFAULT line.

For example:
Code:
GRUB_CMDLINE_LINUX_DEFAULT="quiet zfs_import_skip=1"
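Then regenerate the boot configuration so the change actually takes effect (on hosts booting via GRUB; systems managed by proxmox-boot-tool would run proxmox-boot-tool refresh instead):

Code:
update-grub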

But currently with the newest BIOS I do not need this anymore.
 
OK, thanks man! That was not the issue, but you helped me (sort of).
The issue was a default setting in the BIOS: when updating the BIOS, the SATA/NVMe mode gets set to "RAID On" instead of AHCI/NVMe.
See the attached picture.
Now the systems are working again. Thanks for pointing me in the right direction.
[attached photo of the BIOS SATA setting]
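You can also verify the mode from Linux after booting (a sketch, based on the lspci output earlier in the thread: with "RAID On" the SATA controller sits behind the Intel VMD RAID controller [8086:467f] in the 10000: domain, while with AHCI/NVMe it shows up as a plain AHCI device):

Code:
lspci -nn | grep -Ei 'vmd|sata|raid'   # check whether VMD or plain AHCI owns the ports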
 
Hmm, I also changed that setting from RAID On to AHCI/NVMe, but it was so I could install Windows on the host and use the Windows update tool to update the BIOS. Maybe this change helped me as well? I tried so many things that I’m not sure what made the difference.

Regarding the BIOS update itself, can I ask how you managed to update it? For me the OTA tool didn’t work, creating a FAT32 flash drive and copying the BIOS update .exe didn’t work, and the fwupdate command was also unsuccessful.
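For reference, the usual fwupd flow (the CLI is fwupdmgr) is sketched below; it can only offer firmware that Dell publishes to LVFS, which may be why this route failed for me:

Code:
fwupdmgr refresh       # pull the latest firmware metadata from LVFS
fwupdmgr get-updates   # list updates available for detected devices
fwupdmgr update        # download and stage the update, applied on next reboot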
 
Sure @daviddanko,
You asked me: regarding the BIOS update itself, can I ask how you managed to update it?
Just download the Dell Optiplex_*.exe to a USB drive, in my case a FAT32 pen drive.
Then open the BIOS Update screen via F12 (on the right side of the screen).
Then select the inserted USB pen drive with just the Optiplex_*.exe file on it, browse the pen drive and select that file.
Then click UPDATE BIOS twice and confirm with the OK button.
That's it! Then load the BIOS defaults (just to be sure), search for AHCI, change the setting to AHCI/NVMe, save and reboot. Done!
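After the reboot you can confirm from the Proxmox shell that the flash took (a sketch; dmidecode reads the version straight from the SMBIOS tables):

Code:
dmidecode -s bios-version   # should print the new version, e.g. 1.20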
I am now on:
CPU(s) 20 x Intel(R) Core(TM) i5-14500T (1 Socket)
Kernel Version Linux 6.17.4-1-pve (2025-12-03T15:42Z)
Boot Mode EFI (Secure Boot)
Manager Version pve-manager/9.1.2/9d436f37a0ac4172


Good luck! M.
 