PVE 9.1 running on a BOSS-S1 causing I/O errors and filesystem remounts as R/O

LerryV2

New Member
Jan 3, 2025
I have 3 Dell R640s; they all have BOSS-S1 cards with Intel SSDSCKKB240G8 M.2 SATA drives. I have verified the M.2 drives are in good health by reading the SMART data, and you could not ask for better drives unless you were handed brand new ones.

I have verified that all 3 machines have the latest iDRAC/BIOS/Lifecycle Controller/BOSS firmware. I've looked on Dell's website and tried to do an update through the machines themselves, and everything says they have the latest firmware.

I get the following errors on a fresh install of PVE on the BOSS card, on first login:

exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
failed command: READ SECTOR(S) EXT and READ FPDMA QUEUED
  • I/O errors on /dev/sda (BOSS virtual disk)
  • ext4 journal aborts
  • Filesystem remounting read-only
  • System becomes unusable

I've tried adding
libata.force=noncq
and
libata.dma=0
to the kernel command line in GRUB and still have the same issue.
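For anyone following along, these parameters go into GRUB_CMDLINE_LINUX_DEFAULT. A minimal sketch of the edit — it works on a temp copy here purely for illustration; on a real PVE host the file is /etc/default/grub and you must run update-grub (or proxmox-boot-tool refresh on ZFS/UEFI installs) afterwards:

```shell
# Illustrative only: operate on a temp copy of the config file.
grub_file=$(mktemp)
echo 'GRUB_CMDLINE_LINUX_DEFAULT="quiet"' > "$grub_file"

# Append the libata workarounds just before the closing quote.
sed -i '/^GRUB_CMDLINE_LINUX_DEFAULT=/ s/"$/ libata.force=noncq libata.dma=0"/' "$grub_file"

grep '^GRUB_CMDLINE_LINUX_DEFAULT=' "$grub_file"
# GRUB_CMDLINE_LINUX_DEFAULT="quiet libata.force=noncq libata.dma=0"
rm -f "$grub_file"
```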

dmesg gives me these errors:
ata15.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
ata15.00: failed command: READ SECTOR(S) EXT
ata15.00: cmd 24/00:00:20:50:35/00:20:01:00:00/e0 tag 22 pio 4194304 in
I/O error, dev sda, sector 2027136 op 0x0:(READ) flags 0x84700 phys_seg 66 prio class 2
EXT4-fs error: ext4_journal_check_start:87: Detected aborted journal
EXT4-fs (dm-1): Remounting filesystem read-only

What are my options outside of not using the BOSS card and installing PVE on either one of the U.2 drives or an external drive?
 
I should note that the same issue happens with Proxmox Backup Server as well. Same errors, so I assume it's a kernel/OS/hardware issue, but I'm not sure where the breakdown is.
 
What are my options outside of not using the BOSS card and installing PVE on either one of the U.2 drives or an external drive?
The problem is with those Intel drives, not the BOSS card itself. I exchanged my Intel SSDSCKKB480G8 drives for a pair of Micron MTFDDAV480TDS drives and those work well.

You can also run kernel 6.14, as the issue was introduced with 6.17.
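Staying on the older kernel series can be done with proxmox-boot-tool. A sketch, assuming a stock PVE host — the package name and version string below are examples only; check what is actually installed with the list subcommand:

```shell
apt install proxmox-kernel-6.14            # install/keep the older kernel series
proxmox-boot-tool kernel list              # see which versions are on disk
proxmox-boot-tool kernel pin 6.14.8-2-pve  # hypothetical version string - use one from the list
reboot
```

Unpinning later is `proxmox-boot-tool kernel unpin`.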
 
I ordered those same Micron drives you got, so let's hope that fixes it. I downgraded to kernel 6.12 and all the I/O errors went away, and the system runs much faster too, so I'm not sure what the major differences are, but fingers crossed that fixes them. I'll post an update when they come in. Thanks for the help.
 
I ran into the exact same issue today when upgrading two 4-node clusters that use BOSS-S1 adapters for the OS.
However, only the second cluster gets the R/O errors :oops: The servers are PowerEdge R740.

I'm still investigating, but so far the only difference I've found is:
Cluster 1: BOSS-S1 firmware 2.5.13.3022 (Works fine)
Cluster 2: BOSS-S1 firmware 2.5.13.3024 (IO errors)

edit:
Firmware seems unrelated, as two servers in cluster 1 are running 2.5.13.3024.
 
And 2.5.13.3024 is the version that I'm running. Interesting.
So maybe it's not an M.2 SATA issue but a BOSS firmware issue.
Not sure how feasible it is to downgrade, but my first try will be the Micron M.2 SATA drives.
If those don't work, I might try dumping the BOSS cards completely and using a PCIe card with 2 M.2 drives on it and just use software RAID.
From what I can tell, the PCIe ports support bifurcation.

I should also note I tried updating the firmware on the Intel drives but had zero luck. I tried several methods, including Intel and Solidigm/SK hynix software. No luck. It could have been the StarTech card I was using or other issues, but after dealing with it for 2 days I was over it.

I'm curious: when you look in iDRAC on both the 3022 and 3024 firmware, do they show a drive health of 0%? All mine are 3024 and they show 0% drive health left, but when connecting the drives to a machine to read the SMART data, they are in perfect shape. Zero issues.
 
I should have looked closer at the SSDs in the clusters. I just noticed what Cyberishf mentioned above.

The working hosts are using MICRON - MTFDDAV240TCB drives.
While the ones with the issue are using INTEL - SSDSCKJB240G7R drives.

Guess the only option at the moment is downgrading the kernel...

Regarding the drive health, in my iDRAC version it is listed as "Remaining Rated Write Endurance", and it is between 94% and 100%.
 
The Intel drives all show 0% life left. It's a new-to-us server install, so when the drives come I'll swap them out and see what happens. Thanks.
 
Alright, I have some progress on one of the failing nodes.

BOSS-S1 adapter card:
Drive  Model           Firmware
0      SSDSCKJB240G7R  DL43
1      SSDSCKKB240G8R  DL6P

I upgraded the firmware on drive 1 from DL6P to DL6R and the errors are gone :D
The firmware upgrade was performed using the iDRAC.
 
All mine are running firmware XC311151; not sure what that translates to from a Dell part number, but that's what the drives displayed when I read the SMART data on them. Mine are the SSDSCKKB240G8, non-R versions, so maybe that's why they show a different revision name.
 
The Intel drives all show 0% life left. It's a new-to-us server install, so when the drives come I'll swap them out and see what happens. Thanks.
For what it's worth, on my machines the Intel drives report accurately and the Micron drives show 0% life left; however, they are all perfectly cromulent.
It must be the BOSS card's or iDRAC's inability to correctly parse the SMART readouts from the drives.
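One plausible mechanism for that parsing failure (purely illustrative, not confirmed against any firmware): some SSDs report a wear counter as "percent lifetime used" while others report "percent remaining", and a management layer that assumes the wrong convention will show a brand-new drive of the other kind as 0% health. A minimal sketch:

```python
def percent_remaining(raw_value: int, firmware_reports_used: bool) -> int:
    """Normalize a drive's wear counter to 'percent life remaining'.

    raw_value:             the percentage the drive's firmware reports
    firmware_reports_used: True if the firmware counts wear *used*,
                           False if it already reports life *remaining*
    """
    return 100 - raw_value if firmware_reports_used else raw_value

# A healthy drive whose firmware reports 100 (percent *remaining*):
print(percent_remaining(100, firmware_reports_used=False))  # 100 -> correct
# The same value misinterpreted as percent *used* by the management layer:
print(percent_remaining(100, firmware_reports_used=True))   # 0 -> "0% life left"
```

The drive itself is fine in both cases; only the interpretation of the counter differs, which would match SMART looking perfect when the drives are read directly.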