ioatdma bug

proxmox.user287

May 11, 2011
Hi! I'm using Proxmox at work. After the last kernel upgrade (it was installed a while ago, but I only rebooted the server a few days ago) I'm having trouble with the ioatdma driver; it even hangs the system.
Now some info:
Code:
ppve:~# uname -a
Linux ppve 2.6.32-4-pve #1 SMP Tue Mar 29 09:08:37 CEST 2011 x86_64 GNU/Linux
ppve:~# lspci -vnvn | grep 'System peripheral' -A25
00:08.0 System peripheral [0880]: Intel Corporation 5000 Series Chipset DMA Engine [8086:1a38] (rev b1)
    Subsystem: Intel Corporation Device [8086:3484]
    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0
    Interrupt: pin A routed to IRQ 56
    Region 0: Memory at fe700000 (64-bit, non-prefetchable) [size=1K]
    Capabilities: [50] Power Management version 2
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
        Status: D0 PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [58] Message Signalled Interrupts: Mask- 64bit- Queue=0/0 Enable+
        Address: feeff00c  Data: 41c9
    Capabilities: [6c] Express (v1) Root Complex Integrated Endpoint, MSI 00
        DevCap:    MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
            ExtTag- RBE- FLReset-
        DevCtl:    Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported-
            RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+
            MaxPayload 128 bytes, MaxReadReq 128 bytes
        DevSta:    CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
        LnkCap:    Port #0, Speed unknown, Width x0, ASPM unknown, Latency L0 <64ns, L1 <1us
            ClockPM- Suprise- LLActRep- BwNot-
        LnkCtl:    ASPM L1 Enabled; Disabled- Retrain- CommClk-
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta:    Speed unknown, Width x0, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
    Kernel driver in use: ioatdma
    Kernel modules: ioatdma
ppve:~# dmesg | tail
ioatdma 0000:00:08.0: Channel halted, chanerr = 0
ioatdma 0000:00:08.0: Channel halted, chanerr = 0
ioatdma 0000:00:08.0: Channel halted, chanerr = 0
ioatdma 0000:00:08.0: Channel halted, chanerr = 0
ioatdma 0000:00:08.0: Channel halted, chanerr = 0
ioatdma 0000:00:08.0: Channel halted, chanerr = 0
ioatdma 0000:00:08.0: Channel halted, chanerr = 0
ioatdma 0000:00:08.0: Channel halted, chanerr = 0
ioatdma 0000:00:08.0: Channel halted, chanerr = 0
ioatdma 0000:00:08.0: Channel halted, chanerr = 0
After a day of uptime with these errors the system could no longer work properly: the file systems were remounted read-only, and I couldn't even reboot over ssh (none of reboot, shutdown -r now, or telinit 6 worked).
I couldn't find a Bugzilla, so I'm posting it here. Sorry for my English ;)
 
Could it be a hardware issue? Test your disks using a live CD.
 
The bug report is for Ubuntu kernel 2.6.35; you are using 2.6.32 (Squeeze-based).
 
I still think there was a regression in the ioatdma module. I checked my disk subsystem (hardware RAID 5, 4 disks) and almost everything seems OK. So, after the second system hang (right after 1 day of uptime), I blacklisted this module, and at the moment everything seems fine: no errors in the logs. I'm just curious: I have lost some I/O performance (network and disk), right?
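
For anyone who hits the same messages, here is a rough sketch of how a module can be blacklisted on a Debian-based system. The helper name blacklist_module and the file name blacklist-ioatdma.conf are just my choice for illustration (modprobe reads any .conf file under /etc/modprobe.d), and whether update-initramfs is needed depends on whether the module is in your initramfs, so treat this as a starting point rather than a tested recipe for every setup.
Code:
```shell
# Write a modprobe blacklist entry for one module.
# $1 = module name, $2 = target directory (defaults to /etc/modprobe.d).
# The function and file names are hypothetical, chosen for this example.
blacklist_module() {
    module="$1"
    dir="${2:-/etc/modprobe.d}"
    printf 'blacklist %s\n' "$module" > "$dir/blacklist-$module.conf"
}

# Typical use, as root (left commented out so that pasting the
# snippet does not change anything by itself):
# blacklist_module ioatdma
# update-initramfs -u   # assumption: only needed if the module is in the initramfs
# rmmod ioatdma         # unload it now, or simply reboot
```
Afterwards, lsmod | grep ioatdma should print nothing once the module is unloaded or the machine has been rebooted.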