[SOLVED] /var/ read-only

FuriousRage

I just installed 9.0-1 on an older computer (an i5 from 2014, DDR3 era).

9.2 couldn't install.

I managed to update to pve-manager/9.2.2/b9984c6d90a4bd80 (running kernel: 7.0.2-6-pve).
---

However, when I try to do anything, /var/ seems to be read-only. PVE is installed on ext4.

The last lines of dmesg are mostly red:

[ 81.834535] ata1.00: status: { DRDY }
[ 81.834537] ata1: hard resetting link
[ 82.139899] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 82.159148] ata1.00: configured for UDMA/133
[ 82.159214] scsi_io_completion_action: 3 callbacks suppressed
[ 82.159218] sd 0:0:0:0: [sda] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=3s
[ 82.159222] sd 0:0:0:0: [sda] tag#0 Sense Key : Aborted Command [current]
[ 82.159223] sd 0:0:0:0: [sda] tag#0 Add. Sense: No additional sense information
[ 82.159225] sd 0:0:0:0: [sda] tag#0 CDB: Write(10) 2a 00 03 a8 70 00 00 20 00 00
[ 82.159226] blk_print_req_error: 4 callbacks suppressed
[ 82.159227] I/O error, dev sda, sector 61370368 op 0x1:(WRITE) flags 0x0 phys_seg 64 prio class 2
[ 82.159233] EXT4-fs warning (device dm-1): ext4_end_bio:369: I/O error 10 writing to inode 1311748 starting block 5310464)
[ 82.159244] sd 0:0:0:0: [sda] tag#2 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=2s
[ 82.159246] sd 0:0:0:0: [sda] tag#2 Sense Key : Aborted Command [current]
[ 82.159247] sd 0:0:0:0: [sda] tag#2 Add. Sense: No additional sense information
[ 82.159248] sd 0:0:0:0: [sda] tag#2 CDB: Write(10) 2a 00 03 64 10 00 00 00 08 00
[ 82.159249] I/O error, dev sda, sector 56889344 op 0x1:(WRITE) flags 0x9800 phys_seg 1 prio class 2
[ 82.159251] I/O error, dev sda, sector 56889344 op 0x1:(WRITE) flags 0x9800 phys_seg 1 prio class 2
[ 82.159254] Buffer I/O error on dev dm-1, logical block 4751360, lost sync page write
[ 82.159259] sd 0:0:0:0: [sda] tag#30 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=3s
[ 82.159260] sd 0:0:0:0: [sda] tag#30 Sense Key : Aborted Command [current]
[ 82.159272] sd 0:0:0:0: [sda] tag#30 Add. Sense: No additional sense information
[ 82.159273] sd 0:0:0:0: [sda] tag#30 CDB: Read(10) 28 00 00 20 08 00 00 01 00 00
[ 82.159274] I/O error, dev sda, sector 2099200 op 0x0:(READ) flags 0x0 phys_seg 32 prio class 2
[ 82.159280] JBD2: I/O error when updating journal superblock for dm-1-8.
[ 82.159282] ata1: EH complete
[ 82.159419] EXT4-fs (dm-1): Remounting filesystem read-only
[ 82.160043] EXT4-fs (dm-1): ext4_do_writepages: jbd2_start: 13312 pages, ino 1314130; err -30
[ 82.165016] buffer_io_error: 11254 callbacks suppressed
[ 82.165018] Buffer I/O error on device dm-1, logical block 5310464
[ 82.165021] Buffer I/O error on device dm-1, logical block 5310465
[ 82.165022] Buffer I/O error on device dm-1, logical block 5310466
[ 82.165023] Buffer I/O error on device dm-1, logical block 5310467
[ 82.165024] Buffer I/O error on device dm-1, logical block 5310468
[ 82.165025] Buffer I/O error on device dm-1, logical block 5310469
[ 82.165026] Buffer I/O error on device dm-1, logical block 5310470
[ 82.165027] Buffer I/O error on device dm-1, logical block 5310471
[ 82.165028] Buffer I/O error on device dm-1, logical block 5310472
[ 82.165029] Buffer I/O error on device dm-1, logical block 5310473
 
Please try a different SATA cable and a different port (or, if possible, a port on a different SATA controller). It looks like something goes wrong between the CPU and the drive (storage medium), but maybe it's not the drive itself.

EDIT: This is not specific to Proxmox, and other (Linux) drive troubleshooting guides will probably apply.
 
[ 82.159249] I/O error, dev sda, sector 56889344 op 0x1:(WRITE) flags 0x9800 phys_seg 1 prio class 2
[ 82.159251] I/O error, dev sda, sector 56889344 op 0x1:(WRITE) flags 0x9800 phys_seg 1 prio class 2
This is usually a hardware defect. Try another disk, after first checking all cabling and connectors - as already recommended by @leesteken.

I would run an extensive self-test; search for "smartctl selftest" for this.

If you are brave and crazy enough, just leave the first ~40 GB unused. (Sector 56889344 is approx. 29.1 GB from the beginning and 61370368 is ~31.4 GB, assuming 512 bytes per sector.)
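
The sector-to-offset arithmetic is easy to check in the shell. This is only a minimal sketch: it assumes 512-byte logical sectors (which `blockdev --getss /dev/sda` would confirm), and the function names are made up for illustration.

```shell
# Convert a sector number from a kernel "I/O error" line into a byte offset.
# $1 = sector number, $2 = sector size in bytes (assumed 512 if omitted).
sector_to_bytes() {
    echo $(( $1 * ${2:-512} ))
}

# Same offset in whole MiB (integer division, rounds down).
sector_to_mib() {
    echo $(( $1 * ${2:-512} / 1024 / 1024 ))
}

# The three failing sectors reported in the dmesg output above.
for s in 56889344 61370368 2099200; do
    echo "sector $s -> $(sector_to_bytes "$s") bytes ($(sector_to_mib "$s") MiB)"
done
```

With 512-byte sectors the two failed writes come out at 29127344128 and 31421628416 bytes, i.e. roughly 29.1 GB and 31.4 GB into the disk, and the failed read at 1074790400 bytes (about 1 GiB).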

In any case, you don't want to live with persistent I/O errors...

Good luck!
 
Did:

smartctl -t short /dev/sda
smartctl -t long /dev/sda
smartctl -t conveyance /dev/sda
smartctl -t select,123+345 /dev/sda


smartctl -H /dev/sda
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED


It *seems* SMART passes and the drive is OK. I'll check the cables tomorrow; I did rummage around in the case earlier trying to fit the 10 Gbps NIC into the tiny case, which might have disturbed the cable to the SSD.
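
One caveat on the commands above: `smartctl -t` only starts a test in the background and returns immediately, and starting another test before the previous one has finished typically aborts it. `smartctl -H` prints the drive's overall health assessment, not the outcome of the self-tests; those end up in the self-test log. A rough sketch of running one test and reading the result afterwards (the helper names are hypothetical, the smartctl options are the standard ones):

```shell
# Start a single SMART self-test on a device.
# $1 = device (e.g. /dev/sda), $2 = test type: short, long or conveyance.
start_selftest() {
    smartctl -t "${2:-long}" "$1"
}

# Print the overall health verdict, the self-test log and the error log.
show_smart_results() {
    smartctl -H "$1"
    smartctl -l selftest "$1"
    smartctl -l error "$1"
}

# Usage (as root); wait for the duration smartctl announces in between:
#   start_selftest /dev/sda long
#   show_smart_results /dev/sda
```

The attribute table from `smartctl -A /dev/sda` is worth a look as well, since a rising UDMA CRC error count often points at the cable rather than the drive.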
 
Switched to a different SATA cable and a different port on the mainboard.

I have now run the following many times, with different offsets:

for i in {1..15}; do hdparm --offset 50 -t /dev/sda; done
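
For anyone repeating this, the read test can be wrapped so the offsets are passed as arguments. A small sketch, with a made-up function name; with `-t`, hdparm's `--offset` value is in GiB from the start of the device:

```shell
# Run hdparm's timed read test at several offsets (in GiB) on one device.
# $1 = device, remaining arguments = offsets to test.
read_test_at_offsets() {
    dev=$1
    shift
    for off in "$@"; do
        hdparm --offset "$off" -t "$dev"
    done
}

# Usage (as root), then look at what the kernel logged in the meantime:
#   read_test_at_offsets /dev/sda 0 1 27 29 50
#   dmesg --level=err,warn | tail -n 20
```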

dmesg screams:

[ 733.437796] dmar_fault: 2401198 callbacks suppressed
[ 733.437801] DMAR: DRHD: handling fault status reg 3
[ 733.437804] DMAR: [DMA Write NO_PASID] Request device [00:1f.2] fault addr 0xd5200000 [fault reason 0x0c] non-zero reserved fields in PTE
[ 733.437812] DMAR: DRHD: handling fault status reg 3
[ 733.437813] DMAR: [DMA Write NO_PASID] Request device [00:1f.2] fault addr 0xd5205000 [fault reason 0x0c] non-zero reserved fields in PTE
[ 733.437827] DMAR: DRHD: handling fault status reg 3
[ 733.437828] DMAR: [DMA Write NO_PASID] Request device [00:1f.2] fault addr 0xd5207000 [fault reason 0x0c] non-zero reserved fields in PTE
[ 733.437879] DMAR: DRHD: handling fault status reg 3
[ 739.458028] dmar_fault: 1827565 callbacks suppressed
[ 739.458032] DMAR: DRHD: handling fault status reg 3
[ 739.458035] DMAR: [DMA Write NO_PASID] Request device [00:1f.2] fault addr 0xd5600000 [fault reason 0x0c] non-zero reserved fields in PTE
[ 739.458075] DMAR: DRHD: handling fault status reg 3
[ 739.458077] DMAR: [DMA Write NO_PASID] Request device [00:1f.2] fault addr 0xd5605000 [fault reason 0x0c] non-zero reserved fields in PTE
[ 739.458108] DMAR: DRHD: handling fault status reg 3
[ 739.458111] DMAR: [DMA Write NO_PASID] Request device [00:1f.2] fault addr 0xd5608000 [fault reason 0x0c] non-zero reserved fields in PTE
[ 739.458114] DMAR: DRHD: handling fault status reg 3
[ 745.483470] dmar_fault: 1920020 callbacks suppressed
[ 745.483475] DMAR: DRHD: handling fault status reg 3
[ 745.483478] DMAR: [DMA Write NO_PASID] Request device [00:1f.2] fault addr 0xd5400000 [fault reason 0x0c] non-zero reserved fields in PTE
[ 745.483488] DMAR: DRHD: handling fault status reg 3
[ 745.483490] DMAR: [DMA Write NO_PASID] Request device [00:1f.2] fault addr 0xd5405000 [fault reason 0x0c] non-zero reserved fields in PTE
[ 745.483556] DMAR: DRHD: handling fault status reg 3
[ 745.483560] DMAR: [DMA Write NO_PASID] Request device [00:1f.2] fault addr 0xd5408000 [fault reason 0x0c] non-zero reserved fields in PTE
[ 745.483565] DMAR: DRHD: handling fault status reg 3
[ 751.505786] dmar_fault: 1953720 callbacks suppressed
[ 751.505791] DMAR: DRHD: handling fault status reg 3
[ 751.505794] DMAR: [DMA Write NO_PASID] Request device [00:1f.2] fault addr 0xd5600000 [fault reason 0x0c] non-zero reserved fields in PTE
[ 751.505820] DMAR: DRHD: handling fault status reg 3
[ 751.505822] DMAR: [DMA Write NO_PASID] Request device [00:1f.2] fault addr 0xd5605000 [fault reason 0x0c] non-zero reserved fields in PTE
[ 751.505907] DMAR: DRHD: handling fault status reg 3
[ 751.505909] DMAR: [DMA Write NO_PASID] Request device [00:1f.2] fault addr 0xd5608000 [fault reason 0x0c] non-zero reserved fields in PTE
[ 751.505912] DMAR: DRHD: handling fault status reg 3


So I'm going to assume the SSD is not in great shape, right?
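
A note on reading these lines: DMAR faults are reported by the IOMMU (Intel VT-d) and name the PCI device whose DMA request was rejected, so they say more about DMA remapping than about the disk. A small sketch for pulling the faulting device addresses out of the log (the function name is made up):

```shell
# Read kernel log text on stdin and print each distinct PCI address
# that appears as "Request device [..]" in a DMAR fault line.
dmar_fault_devices() {
    sed -n 's/.*Request device \[\([0-9a-f:.]*\)\].*/\1/p' | sort -u
}

# Usage (as root):
#   dmesg | dmar_fault_devices
#   lspci -s 00:1f.2
```

On Intel boards of this generation 00:1f.2 is usually the onboard SATA controller, which the `lspci` call would confirm.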
 
Tested a mechanical drive. Same errors.

After some more web searching, I added intel_iommu=off to the GRUB config, updated GRUB and rebooted. The SSD now seems to work too: no screaming red dmesg output after the hdparm tests I've run.

So it seems the IOMMU was my problem.
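
For reference, the change amounts to roughly the following. This is a sketch assuming a GRUB-booted install with the stock /etc/default/grub; the `quiet` option is just the usual default, and the helper function name is made up.

```shell
# Assumed edit in /etc/default/grub (other options on the line may differ):
#   GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=off"
# Then, as root:
#   update-grub
#   reboot

# After the reboot, check that the parameter actually reached the kernel.
# $1 = parameter to look for, $2 = command line (defaults to /proc/cmdline).
has_kernel_param() {
    cmdline=${2:-$(cat /proc/cmdline)}
    case " $cmdline " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage:
#   has_kernel_param intel_iommu=off && echo "intel_iommu=off is active"
```

Keep in mind that with the IOMMU switched off, PCI passthrough to VMs is not available on that host.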
 