error after update from 7 to 8: EXT4-fs error (device dm-4): ext4_journal_check_start:84 comm pvescheduler: Detected aborted journal | MegaRAID SAS

Kevin Smith

Jan 15, 2018
Today I updated from Proxmox 7.2 to the newest version (8.2).
Everything went smoothly: the server was working, and so were the virtual machines. But after a few minutes the server hung; I could still ping it but could not connect to it remotely. On the physical monitor I get these messages:


Bash:
EXT4-fs error (device dm-4): ext4_journal_check_start:84 comm pvescheduler: Detected aborted journal
EXT4-fs error (device dm-4): ext4_journal_check_start:84 comm rs:main Q:Reg: Detected aborted journal
EXT4-fs error (device dm-4): Remounting filesystem read-only

So I restarted the machine, but a few minutes after the restart I had the same problem.
I haven't changed the server hardware, and the machine had been running for dozens of months without problems.

While updating my server from version 7 to 8 I of course ran pve7to8 (no warnings), but by mistake I used "apt-get upgrade" instead of "apt-get dist-upgrade". Could that be the reason for my problems?
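For reference: a plain `apt-get upgrade` never removes installed packages, so across a major release it can leave packages held back, which is why the Proxmox VE 7-to-8 upgrade guide uses `apt dist-upgrade`. A minimal sketch of that final step as a shell function; the function name is hypothetical, and it assumes `pve7to8 --full` has already been run and the APT sources already point at the Proxmox VE 8 / Debian bookworm repositories:

```bash
# Hypothetical helper for the last step of a Proxmox VE 7 -> 8 upgrade.
# Assumes "pve7to8 --full" reported no problems and /etc/apt/sources.list*
# already reference the bookworm / PVE 8 repositories.
pve_major_upgrade() {
    apt update || return 1
    # dist-upgrade (same as "apt full-upgrade") may install new packages and
    # remove obsolete ones; a plain "upgrade" never removes packages.
    apt dist-upgrade || return 1
    # pve-manager should now report version 8.x
    pveversion
}
```

Stopping at the first failing step keeps a broken `apt update` from being followed by an upgrade against stale package lists.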

I've attached some logs; if you need any others, please let me know.

Do you have any idea what the reason might be?
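To find out which volume `dm-4` actually is (on a default Proxmox install the device-mapper devices are usually LVM volumes such as `pve-root`, but that is worth checking rather than assuming), you can read the mapper name the kernel exposes in sysfs. A minimal sketch; the `dm_name` function and the `SYSFS_ROOT` override are hypothetical and only exist so the lookup can be tried against a fake directory:

```bash
# Print the device-mapper name (for LVM volumes: "<vg>-<lv>") behind a dm-N node.
# SYSFS_ROOT is a hypothetical override used only for testing; default is /sys.
dm_name() {
    name_file="${SYSFS_ROOT:-/sys}/block/$1/dm/name"
    if [ -r "$name_file" ]; then
        cat "$name_file"
    else
        echo "no device-mapper name found for $1" >&2
        return 1
    fi
}

# Usage on the host:
#   dm_name dm-4
```

`lsblk -o NAME,KNAME` or `dmsetup ls` show the same mapping for all devices at once.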
 

So I reinstalled the whole server with the newest Proxmox version. I managed to restore the VMs from backups, and then I got the same error, with more information in the log:
Bash:
Oct 12 00:16:00 host6 kernel: megaraid_sas 0000:43:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000e address=0xa518d000 flags=0x0020]
Oct 12 00:16:00 host6 kernel: megaraid_sas 0000:43:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000e address=0xa518dd00 flags=0x0020]
Oct 12 00:16:00 host6 kernel: megaraid_sas 0000:43:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000e address=0xa518e000 flags=0x0020]
Oct 12 00:16:00 host6 kernel: megaraid_sas 0000:43:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000e address=0xa518ed00 flags=0x0020]
Oct 12 00:16:00 host6 kernel: megaraid_sas 0000:43:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000e address=0xa518f000 flags=0x0020]

lspci info about the RAID controller:
Bash:
43:00.0 RAID bus controller: Broadcom / LSI MegaRAID Tri-Mode SAS3408 (rev 01)

Bash:
uname -a
Linux host6 6.8.4-2-pve #1 SMP PREEMPT_DYNAMIC PMX 6.8.4-2 (2024-04-10T17:36Z) x86_64 GNU/Linux

Controller: AVAGO MegaRAID SAS 9440-8i
Firmware version: 5.220.01-3691
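To see whether the faults all come from the RAID controller and how often they occur, you can tally them per driver and PCI address. A minimal sketch, assuming the kernel log format shown above (`<driver> <PCI address>: AMD-Vi: Event logged [IO_PAGE_FAULT ...]`); the function name is hypothetical:

```bash
# Count AMD-Vi IO_PAGE_FAULT events per driver/PCI device in kernel log text
# read from stdin, e.g. lines such as:
#   ... kernel: megaraid_sas 0000:43:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT ...]
count_io_page_faults() {
    grep 'IO_PAGE_FAULT' |
        # keep only "<driver> <domain:bus:dev.fn>" from each matching line
        sed -n 's/.* \([^ ]*\) \([0-9a-f]\{4\}:[0-9a-f]\{2\}:[0-9a-f]\{2\}\.[0-7]\): AMD-Vi:.*/\1 \2/p' |
        sort | uniq -c
}

# Usage on the host (current boot):
#   journalctl -k -b | count_io_page_faults
```

If every fault points at `megaraid_sas 0000:43:00.0`, that supports looking at the controller/IOMMU interaction rather than at the disks.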
 
Did you install storcli (download it from Broadcom) for your MegaRAID controller?
What is the output of "storcli /call show"?
 
Thanks for the help, but storcli doesn't show anything special.

Bash:
root@host6:~/kontroler/storcli_rel/Unified_storcli_all_os/Ubuntu# /opt/MegaRAID/storcli/storcli64 show
CLI Version = 007.2310.0000.0000 Nov 02, 2022
Operating system = Linux 6.8.4-2-pve
Status Code = 0
Status = Success
Description = None


Number of Controllers = 1
Host Name = host6
Operating System  = Linux 6.8.4-2-pve


System Overview :
===============


------------------------------------------------------------------------------------
Ctl Model                   Ports PDs DGs DNOpt VDs VNOpt BBU sPR DS  EHS ASOs Hlth
------------------------------------------------------------------------------------
  0 AVAGOMegaRAIDSAS9440-8i     8   8   3     0   3     0 N/A On  1&2 Y      1 Opt
------------------------------------------------------------------------------------


Ctl=Controller Index|DGs=Drive groups|VDs=Virtual drives|Fld=Failed
PDs=Physical drives|DNOpt=Array NotOptimal|VNOpt=VD NotOptimal|Opt=Optimal
Msng=Missing|Dgd=Degraded|NdAtn=Need Attention|Unkwn=Unknown
sPR=Scheduled Patrol Read|DS=DimmerSwitch|EHS=Emergency Spare Drive
Y=Yes|N=No|ASOs=Advanced Software Options|BBU=Battery backup unit/CV
Hlth=Health|Safe=Safe-mode boot|CertProv-Certificate Provision mode
Chrg=Charging | MsngCbl=Cable Failure


modinfo megaraid_sas

Bash:
filename:       /lib/modules/6.8.4-2-pve/kernel/drivers/scsi/megaraid/megaraid_sas.ko
description:    Broadcom MegaRAID SAS Driver
author:         megaraidlinux.pdl@broadcom.com
version:        07.727.03.00-rc1
license:        GPL
srcversion:     6FAE4049BC4B4F28CCA3B4C
alias:          pci:v00001000d000010E7sv*sd*bc*sc*i*
alias:          pci:v00001000d000010E4sv*sd*bc*sc*i*
alias:          pci:v00001000d000010E3sv*sd*bc*sc*i*
alias:          pci:v00001000d000010E0sv*sd*bc*sc*i*
alias:          pci:v00001000d000010E6sv*sd*bc*sc*i*
alias:          pci:v00001000d000010E5sv*sd*bc*sc*i*
alias:          pci:v00001000d000010E2sv*sd*bc*sc*i*
alias:          pci:v00001000d000010E1sv*sd*bc*sc*i*
alias:          pci:v00001000d0000001Csv*sd*bc*sc*i*
alias:          pci:v00001000d0000001Bsv*sd*bc*sc*i*
alias:          pci:v00001000d00000017sv*sd*bc*sc*i*
alias:          pci:v00001000d00000016sv*sd*bc*sc*i*
alias:          pci:v00001000d00000015sv*sd*bc*sc*i*
alias:          pci:v00001000d00000014sv*sd*bc*sc*i*
alias:          pci:v00001000d00000053sv*sd*bc*sc*i*
alias:          pci:v00001000d00000052sv*sd*bc*sc*i*
alias:          pci:v00001000d000000CFsv*sd*bc*sc*i*
alias:          pci:v00001000d000000CEsv*sd*bc*sc*i*
alias:          pci:v00001000d0000005Fsv*sd*bc*sc*i*
alias:          pci:v00001000d0000005Dsv*sd*bc*sc*i*
alias:          pci:v00001000d0000002Fsv*sd*bc*sc*i*
alias:          pci:v00001000d0000005Bsv*sd*bc*sc*i*
alias:          pci:v00001028d00000015sv*sd*bc*sc*i*
alias:          pci:v00001000d00000413sv*sd*bc*sc*i*
alias:          pci:v00001000d00000071sv*sd*bc*sc*i*
alias:          pci:v00001000d00000073sv*sd*bc*sc*i*
alias:          pci:v00001000d00000079sv*sd*bc*sc*i*
alias:          pci:v00001000d00000078sv*sd*bc*sc*i*
alias:          pci:v00001000d0000007Csv*sd*bc*sc*i*
alias:          pci:v00001000d00000060sv*sd*bc*sc*i*
alias:          pci:v00001000d00000411sv*sd*bc*sc*i*
depends:
retpoline:      Y
intree:         Y
name:           megaraid_sas
vermagic:       6.8.4-2-pve SMP preempt mod_unload modversions
sig_id:         PKCS#7
signer:         Build time autogenerated kernel key
sig_key:        18:F8:E0:A8:57:52:1C:85:DF:C8:08:47:94:11:01:8A:01:C3:85:E9
sig_hashalgo:   sha512
signature:      7E:CE:04:4E:0D:96:B5:77:DC:67:9B:33:F0:EA:03:F8:B9:BF:02:C3:
                CB:DC:7C:34:1B:35:93:80:68:20:AA:B2:AE:77:ED:67:EE:10:A8:A2:
                D2:A0:76:8D:65:4C:96:D2:7D:BC:A5:69:E8:34:DA:6A:15:8B:F5:EA:
                9E:CB:D3:1B:6C:B9:5B:4B:3E:B6:EF:6F:88:20:D4:28:4C:0C:8A:3F:
                89:E6:9F:14:DB:04:F5:E3:51:28:F6:B2:BC:C1:1D:33:CB:FF:A5:0D:
                98:F8:D8:A4:EB:F2:CF:E9:25:37:E4:AA:DE:C9:52:50:2E:31:3A:DD:
                ED:B1:9B:3E:D9:F1:3C:2D:01:08:EC:DE:3F:00:91:AA:38:2E:FD:78:
                E5:5B:2F:C6:E0:BF:70:A1:0E:DA:62:62:31:F9:ED:14:7C:1F:71:F3:
                7E:F2:98:A0:AE:B7:78:26:26:8D:5C:1C:77:77:C9:25:A8:E3:37:CA:
                20:0D:21:04:87:1A:3D:ED:C1:E4:F5:6F:34:95:9C:29:C9:01:E8:34:
                0A:69:CF:82:F0:C1:89:11:0A:B4:5E:8F:68:CE:3F:97:85:09:31:A3:
                5A:14:70:9B:E0:E8:B3:99:1E:F8:EC:CE:B9:28:BE:76:5E:FF:F3:D6:
                29:97:26:0A:0D:DA:50:81:2D:9A:C1:23:86:74:29:0F:92:B0:8B:19:
                93:7E:17:A7:91:36:52:0B:58:30:14:1E:D5:AD:AD:58:BE:BD:38:B5:
                08:98:F3:B1:EE:5B:1F:2F:E3:57:40:B3:8F:80:5A:8D:B3:14:B2:63:
                89:CE:98:F5:9C:7E:7E:F6:7E:40:E0:D9:CA:E5:B9:8D:D0:99:94:06:
                E5:A1:D1:9D:20:81:38:80:9F:29:F3:65:12:77:05:F7:16:11:15:8F:
                BF:71:53:7C:34:E6:6F:44:D5:80:AF:11:FB:8C:B9:7B:53:94:80:CA:
                EB:DD:67:06:2A:4E:24:79:77:58:26:CC:A1:1B:7E:36:EB:F9:B3:97:
                85:22:C5:5E:36:F8:A3:5D:F6:CC:C7:44:87:C2:A3:4D:BB:74:B7:63:
                56:3C:C9:81:B3:75:22:48:AF:6F:E1:BF:A4:8A:87:FC:22:51:0A:90:
                07:3C:01:2A:85:8E:35:8C:F5:38:09:71:D1:6F:57:B1:3A:85:8F:7A:
                D0:A0:85:3D:11:C1:C6:B7:12:6B:E1:24:5E:23:53:24:52:AC:31:8F:
                02:4E:17:AD:3E:8A:41:D4:B8:15:01:43:32:4A:A2:25:22:E0:E0:A0:
                6A:76:41:DB:47:0A:3A:28:6D:75:AE:94:A1:FF:E0:D7:FA:76:6A:6B:
                44:48:25:2B:4B:23:9B:05:DD:4D:3F:B4
parm:           lb_pending_cmds:Change raid-1 load balancing outstanding threshold. Valid Values are 1-128. Default: 4 (int)
parm:           max_sectors:Maximum number of sectors per IO command (int)
parm:           msix_disable:Disable MSI-X interrupt handling. Default: 0 (int)
parm:           msix_vectors:MSI-X max vector count. Default: Set by FW (int)
parm:           allow_vf_ioctls:Allow ioctls in SR-IOV VF mode. Default: 0 (int)
parm:           throttlequeuedepth:Adapter queue depth when throttled due to I/O timeout. Default: 16 (int)
parm:           resetwaittime:Wait time in (1-180s) after I/O timeout before resetting adapter. Default: 180s (int)
parm:           smp_affinity_enable:SMP affinity feature enable/disable Default: enable(1) (int)
parm:           rdpq_enable:Allocate reply queue in chunks for large queue depth enable/disable Default: enable(1) (int)
parm:           dual_qdepth_disable:Disable dual queue depth feature. Default: 0 (int)
parm:           scmd_timeout:scsi command timeout (10-90s), default 90s. See megasas_reset_timer. (int)
parm:           perf_mode:Performance mode (only for Aero adapters), options:
                0 - balanced: High iops and low latency queues are allocated &
                interrupt coalescing is enabled only on high iops queues
                1 - iops: High iops queues are not allocated &
                interrupt coalescing is enabled on all queues
                2 - latency: High iops queues are not allocated &
                interrupt coalescing is disabled on all queues
                default mode is 'balanced' (int)
parm:           event_log_level:Asynchronous event logging level- range is: -2(CLASS_DEBUG) to 4(CLASS_DEAD), Default: 2(CLASS_CRITICAL) (int)
parm:           enable_sdev_max_qd:Enable sdev max qd as can_queue. Default: 0 (int)
parm:           poll_queues:Number of queues to be use for io_uring poll mode.
                This parameter is effective only if host_tagset_enable=1 &
                It is not applicable for MFI_SERIES. &
                Driver will work in latency mode. &
                High iops queues are not allocated &
                 (int)
parm:           host_tagset_enable:Shared host tagset enable/disable Default: enable(1) (int)
 
"storcli64 /call show", i.e. controller "/c" plus "all", not without it :)

Shame on me... Here's the output:
Bash:
Generating detailed summary of the adapter, it may take a while to complete.

CLI Version = 007.2310.0000.0000 Nov 02, 2022
Operating system = Linux 6.8.4-2-pve
Controller = 0
Status = Success
Description = None

Product Name = AVAGO MegaRAID SAS 9440-8i
Serial Number = SPC1040698
SAS Address =  500605b012226000
PCI Address = 00:43:00:00
System Time = 10/12/2024 13:44:54
Mfg. Date = 03/20/22
Controller Time = 10/12/2024 11:44:53
FW Package Build = 51.22.0-4545
BIOS Version = 7.22.00.0_0x07160300
FW Version = 5.220.01-3691
Driver Name = megaraid_sas
Driver Version = 07.727.03.00-rc1
Current Personality = RAID-Mode
Vendor Id = 0x1000
Device Id = 0x17
SubVendor Id = 0x1000
SubDevice Id = 0x9440
Host Interface = PCI-E
Device Interface = SAS-12G
Bus Number = 67
Device Number = 0
Function Number = 0
Domain ID = 0
Security Protocol = None
Drive Groups = 3

TOPOLOGY :
========

------------------------------------------------------------------------------
DG Arr Row EID:Slot DID Type   State BT       Size PDC  PI SED DS3  FSpace TR
------------------------------------------------------------------------------
 0 -   -   -        -   RAID1  Optl  N    3.492 TB dflt N  N   dflt N      N 
 0 0   -   -        -   RAID1  Optl  N    3.492 TB dflt N  N   dflt N      N 
 0 0   0   69:0     0   DRIVE  Onln  N    3.492 TB dflt N  N   dflt -      N 
 0 0   1   69:1     1   DRIVE  Onln  N    3.492 TB dflt N  N   dflt -      N 
 1 -   -   -        -   RAID10 Optl  N   14.553 TB dflt N  N   dflt N      N 
 1 0   -   -        -   RAID1  Optl  N    7.276 TB dflt N  N   dflt N      N 
 1 0   0   69:4     3   DRIVE  Onln  Y    7.276 TB dflt N  N   dflt -      N 
 1 0   1   69:5     2   DRIVE  Onln  Y    7.276 TB dflt N  N   dflt -      N 
 1 1   -   -        -   RAID1  Optl  Y    7.276 TB dflt N  N   dflt N      N 
 1 1   0   69:6     4   DRIVE  Onln  Y    7.276 TB dflt N  N   dflt -      N 
 1 1   1   69:7     5   DRIVE  Onln  Y    7.276 TB dflt N  N   dflt -      N 
 2 -   -   -        -   RAID1  Optl  N  446.625 GB dflt N  N   dflt N      N 
 2 0   -   -        -   RAID1  Optl  N  446.625 GB dflt N  N   dflt N      N 
 2 0   0   69:2     6   DRIVE  Onln  N  446.625 GB dflt N  N   dflt -      N 
 2 0   1   69:3     7   DRIVE  Onln  N  446.625 GB dflt N  N   dflt -      N 
------------------------------------------------------------------------------

DG=Disk Group Index|Arr=Array Index|Row=Row Index|EID=Enclosure Device ID
DID=Device ID|Type=Drive Type|Onln=Online|Rbld=Rebuild|Optl=Optimal|Dgrd=Degraded
Pdgd=Partially degraded|Offln=Offline|BT=Background Task Active
PDC=PD Cache|PI=Protection Info|SED=Self Encrypting Drive|Frgn=Foreign
DS3=Dimmer Switch 3|dflt=Default|Msng=Missing|FSpace=Free Space Present
TR=Transport Ready

Virtual Drives = 3

VD LIST :
=======

----------------------------------------------------------------
DG/VD TYPE   State Access Consist Cache Cac sCC       Size Name
----------------------------------------------------------------
0/0   RAID1  Optl  RW     No      NRWTD -   ON    3.492 TB     
1/1   RAID10 Optl  RW     No      NRWTD -   ON   14.553 TB     
2/2   RAID1  Optl  RW     No      NRWTD -   ON  446.625 GB     
----------------------------------------------------------------

VD=Virtual Drive| DG=Drive Group|Rec=Recovery
Cac=CacheCade|OfLn=OffLine|Pdgd=Partially Degraded|Dgrd=Degraded
Optl=Optimal|dflt=Default|RO=Read Only|RW=Read Write|HD=Hidden|TRANS=TransportReady
B=Blocked|Consist=Consistent|R=Read Ahead Always|NR=No Read Ahead|WB=WriteBack
AWB=Always WriteBack|WT=WriteThrough|C=Cached IO|D=Direct IO|sCC=Scheduled
Check Consistency

Physical Drives = 8

PD LIST :
=======

----------------------------------------------------------------------------------
EID:Slt DID State DG       Size Intf Med SED PI SeSz Model                Sp Type
----------------------------------------------------------------------------------
69:0      0 Onln   0   3.492 TB SAS  SSD Y   N  512B MZILT3T8HBLS/007     U  -   
69:1      1 Onln   0   3.492 TB SAS  SSD Y   N  512B MZILT3T8HBLS/007     U  -   
69:2      6 Onln   2 446.625 GB SAS  SSD N   N  512B PX04SVB048           U  -   
69:3      7 Onln   2 446.625 GB SAS  SSD N   N  512B PX04SVB048           U  -   
69:4      3 Onln   1   7.276 TB SATA HDD N   N  512B HGST HUS728T8TALE6L4 U  -   
69:5      2 Onln   1   7.276 TB SATA HDD N   N  512B HGST HUS728T8TALE6L4 U  -   
69:6      4 Onln   1   7.276 TB SATA HDD N   N  512B HGST HUS728T8TALE6L4 U  -   
69:7      5 Onln   1   7.276 TB SATA HDD N   N  512B HGST HUS728T8TALE6L4 U  -   
----------------------------------------------------------------------------------

EID=Enclosure Device ID|Slt=Slot No|DID=Device ID|DG=DriveGroup
DHS=Dedicated Hot Spare|UGood=Unconfigured Good|GHS=Global Hotspare
UBad=Unconfigured Bad|Sntze=Sanitize|Onln=Online|Offln=Offline|Intf=Interface
Med=Media Type|SED=Self Encryptive Drive|PI=Protection Info
SeSz=Sector Size|Sp=Spun|U=Up|D=Down|T=Transition|F=Foreign
UGUnsp=UGood Unsupported|UGShld=UGood shielded|HSPShld=Hotspare shielded
CFShld=Configured shielded|Cpybck=CopyBack|CBShld=Copyback Shielded
UBUnsp=UBad Unsupported|Rbld=Rebuild

Enclosures = 1

Enclosure LIST :
==============

------------------------------------------------------------------------
EID State Slots PD PS Fans TSs Alms SIM Port# ProdID     VendorSpecific
------------------------------------------------------------------------
 69 OK        8  8  0    0   0    0   0 -     VirtualSES               
------------------------------------------------------------------------

EID=Enclosure Device ID | PD=Physical drive count | PS=Power Supply count
TSs=Temperature sensor count | Alms=Alarm count | SIM=SIM Count | ProdID=Product ID

The server is now running in rescue mode, and I restored the VMs to a backup host.
If I don't find a solution, I'll buy a new controller that works with the newest kernels, since the server is rather new.

Maybe you could recommend one? I'm using a simple RAID 1 of both SSDs on a Supermicro H11DSi-NT motherboard (AMD).
 
So all 8 disks of different types are OK and all 3 RAID sets are OK.
I would switch virtual drive 1 (the RAID10 of 4x SATA) from NRWTD to RWBC (3 commands, see help), raise the six "rate" values from 30 to 90% (6 commands) and set the cache flush interval from 3 (or 4) seconds to 1 second (1 command).
NR=no read ahead, WT=write through, D=direct --> R=read ahead, WB=write back, C=cached
Bash:
storcli64 /c0/v1 set rdcache=ra
storcli64 /c0/v1 set wrcache=wb
storcli64 /c0/v1 set iopolicy=cached
storcli64 /c0 set bgirate=90
storcli64 /c0 set ccrate=90
storcli64 /c0 set migraterate=90
storcli64 /c0 set prrate=90
storcli64 /c0 set rebuildrate=90
storcli64 /c0 set reconrate=90
storcli64 /c0 set cacheflushint=1
# What do patrol read and consistency check show?
storcli64 /c0 show pr
storcli64 /c0 show cc
 
Thanks for the quick info.
Here's the command output:

Bash:
root@host6:~# /opt/MegaRAID/storcli/storcli64 /c0 show pr
CLI Version = 007.2310.0000.0000 Nov 02, 2022
Operating system = Linux 6.8.4-2-pve
Controller = 0
Status = Success
Description = None


Controller Properties :
=====================

---------------------------------------------
Ctrl_Prop               Value
---------------------------------------------
PR Mode                 Auto
PR Execution Delay      168 hours
PR iterations completed 99
PR Next Start time      10/12/2024, 04:00:00
PR on SSD               Disabled
PR Current State        Active 0
PR Excluded VDs         None
PR MaxConcurrentPd      32
---------------------------------------------



root@host6:~# /opt/MegaRAID/storcli/storcli64 /c0 show cc
CLI Version = 007.2310.0000.0000 Nov 02, 2022
Operating system = Linux 6.8.4-2-pve
Controller = 0
Status = Success
Description = None


Controller Properties :
=====================

-----------------------------------------------
Ctrl_Prop                 Value
-----------------------------------------------
CC Operation Mode         Concurrent
CC Execution Delay        168 hours
CC Next Starttime         10/19/2024, 04:00:00
CC Current State          Stopped
CC Number of iterations   96
CC Number of VD completed 0
CC Excluded VDs           None
-----------------------------------------------
 
So after a week this thread disappeared... and now it's back again.
Patrol read (PR) and consistency check (CC) are enabled and scheduled to run once a week, which is good.
You can check the controller logs for further errors with:
Bash:
storcli64 /c0 show eventloginfo
storcli64 /c0 show termlog
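To keep that output for later comparison, or to attach it here, both logs can be written to files. A minimal sketch; the function name and output file names are hypothetical, and the storcli64 path is the one used earlier in this thread (override it with the hypothetical `STORCLI` variable):

```bash
# Hypothetical helper: save the event log info and the firmware terminal log
# of controller 0 into a directory (default: current directory).
collect_megaraid_logs() {
    outdir="${1:-.}"
    storcli="${STORCLI:-/opt/MegaRAID/storcli/storcli64}"
    mkdir -p "$outdir" || return 1
    "$storcli" /c0 show eventloginfo > "$outdir/c0-eventloginfo.txt" || return 1
    "$storcli" /c0 show termlog > "$outdir/c0-termlog.txt" || return 1
    echo "logs written to $outdir"
}

# Usage: collect_megaraid_logs /root/megaraid-logs
```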
Against your AMD-Vi IO_PAGE_FAULT errors you could try one of 4 options.
Edit /etc/default/grub (e.g. with vi) and add one of these iommu options to this line:
GRUB_CMDLINE_LINUX_DEFAULT="quiet iommu=soft"
or
GRUB_CMDLINE_LINUX_DEFAULT="quiet iommu=off"
or
GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=soft"
or
GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=off"
Then run "update-grub" and reboot.
Check dmesg and your system log files, and once you have found a working command line, finally check that your VMs and LXC containers still work.
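The edit step can be scripted so that trying the options one after another is less error-prone. A minimal sketch; the `set_grub_cmdline` function is hypothetical, it replaces the whole `GRUB_CMDLINE_LINUX_DEFAULT` line with `quiet` plus one option, and it keeps a `.bak` copy of the file:

```bash
# Hypothetical helper: set GRUB_CMDLINE_LINUX_DEFAULT="quiet <option>" in a
# grub defaults file (default /etc/default/grub), keeping a .bak copy.
# The option must not contain "/" or "&" because it is used in a sed expression.
set_grub_cmdline() {
    opt="$1"
    file="${2:-/etc/default/grub}"
    cp "$file" "$file.bak" || return 1
    sed -i "s/^GRUB_CMDLINE_LINUX_DEFAULT=.*/GRUB_CMDLINE_LINUX_DEFAULT=\"quiet $opt\"/" "$file"
}

# Usage, one option at a time:
#   set_grub_cmdline iommu=soft && update-grub && reboot
# After the reboot, "cat /proc/cmdline" shows the options actually in use.
```

This only applies if the host boots through GRUB. A Proxmox VE install that boots via systemd-boot (for example ZFS root on UEFI) takes its kernel command line from /etc/kernel/cmdline and needs `proxmox-boot-tool refresh` instead of `update-grub`.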