kernel: aacraid: Host adapter reset request. SCSI hang ?

A few VMs went down on our master node, and I found the following in /var/log/syslog just before they went down:

Code:
Apr 27 19:29:49 co-ve-001 kernel: aacraid: Host adapter abort request (1,0,0,0)
Apr 27 19:29:49 co-ve-001 kernel: aacraid: Host adapter abort request (1,0,0,0)
Apr 27 19:29:49 co-ve-001 kernel: aacraid: Host adapter abort request (1,0,0,0)
Apr 27 19:29:49 co-ve-001 kernel: aacraid: Host adapter abort request (1,0,0,0)
Apr 27 19:29:49 co-ve-001 kernel: aacraid: Host adapter reset request. SCSI hang ?
Apr 27 19:29:55 co-ve-001 pvemirror[10199]: syncing vzlist from 'x.x.x.x' failed: 500 read timeout
Apr 27 19:30:49 co-ve-001 kernel: aacraid: SCSI bus appears hung
Apr 27 19:31:20 co-ve-001 kernel: IRQ 30/aacraid: IRQF_DISABLED is not guaranteed on shared IRQs
Apr 27 19:31:43 co-ve-001 pvemirror[10199]: syncing templates
Apr 27 19:31:43 co-ve-001 pvemirror[10199]: cluster syncronization finished (118.55 seconds (files 0.00, config 0.00))
Apr 27 19:31:43 co-ve-001 pvemirror[10199]: starting cluster syncronization
Apr 27 19:31:44 co-ve-001 kernel: vmbr0: port 19(tap115i0d0) entering disabled state
Apr 27 19:31:44 co-ve-001 kernel: vmbr0: port 19(tap115i0d0) entering disabled state
Apr 27 19:31:44 co-ve-001 pvemirror[10199]: syncing templates
Apr 27 19:31:44 co-ve-001 pvemirror[10199]: cluster syncronization finished (0.32 seconds (files 0.00, config 0.00))
Apr 27 19:31:44 co-ve-001 kernel: vmbr0: port 13(tap106i0d0) entering disabled state
Apr 27 19:31:44 co-ve-001 kernel: vmbr0: port 13(tap106i0d0) entering disabled state
Apr 27 19:31:44 co-ve-001 kernel: vmbr0: port 12(tap118i0d0) entering disabled state
Apr 27 19:31:44 co-ve-001 kernel: vmbr0: port 12(tap118i0d0) entering disabled state
Apr 27 19:31:44 co-ve-001 kernel: vmbr30: port 3(tap118i30d0) entering disabled state
Apr 27 19:31:44 co-ve-001 kernel: vmbr30: port 3(tap118i30d0) entering disabled state
Apr 27 19:31:44 co-ve-001 kernel: vmbr0: port 22(tap104i0d0) entering disabled state
Apr 27 19:31:44 co-ve-001 kernel: vmbr0: port 22(tap104i0d0) entering disabled state
Apr 27 19:31:44 co-ve-001 kernel: vmbr10: port 5(tap104i10d0) entering disabled state
Apr 27 19:31:44 co-ve-001 kernel: vmbr10: port 5(tap104i10d0) entering disabled state
Apr 27 19:31:44 co-ve-001 kernel: vmbr0: port 14(tap116i0d0) entering disabled state
Apr 27 19:31:44 co-ve-001 kernel: vmbr0: port 14(tap116i0d0) entering disabled state
Apr 27 19:31:44 co-ve-001 kernel: vmbr0: port 15(tap105i0d0) entering disabled state
Apr 27 19:31:44 co-ve-001 kernel: vmbr0: port 15(tap105i0d0) entering disabled state
Apr 27 19:31:44 co-ve-001 kernel: vmbr0: port 8(tap117i0d0) entering disabled state
Apr 27 19:31:44 co-ve-001 kernel: vmbr0: port 8(tap117i0d0) entering disabled state
Apr 27 19:31:44 co-ve-001 kernel: vmbr0: port 23(tap103i0d0) entering disabled state
Apr 27 19:31:44 co-ve-001 kernel: vmbr0: port 23(tap103i0d0) entering disabled state
Apr 27 19:31:44 co-ve-001 kernel: vmbr0: port 21(tap119i0d0) entering disabled state
Apr 27 19:31:44 co-ve-001 kernel: vmbr0: port 21(tap119i0d0) entering disabled state
Apr 27 19:31:44 co-ve-001 kernel: vmbr30: port 5(tap115i30d0) entering disabled state
Apr 27 19:31:44 co-ve-001 kernel: vmbr30: port 5(tap115i30d0) entering disabled state
Apr 27 19:31:44 co-ve-001 kernel: vmbr10: port 7(tap106i10d0) entering disabled state
Apr 27 19:31:44 co-ve-001 kernel: vmbr10: port 7(tap106i10d0) entering disabled state
Apr 27 19:31:44 co-ve-001 kernel: vmbr30: port 4(tap116i30d0) entering disabled state
Apr 27 19:31:44 co-ve-001 kernel: vmbr30: port 4(tap116i30d0) entering disabled state
Apr 27 19:31:44 co-ve-001 kernel: vmbr10: port 3(tap105i10d0) entering disabled state
Apr 27 19:31:44 co-ve-001 kernel: vmbr10: port 3(tap105i10d0) entering disabled state
Apr 27 19:31:45 co-ve-001 kernel: vmbr30: port 2(tap117i30d0) entering disabled state
Apr 27 19:31:45 co-ve-001 kernel: vmbr30: port 2(tap117i30d0) entering disabled state
Apr 27 19:31:45 co-ve-001 kernel: vmbr10: port 6(tap103i10d0) entering disabled state
Apr 27 19:31:45 co-ve-001 kernel: vmbr10: port 6(tap103i10d0) entering disabled state
Apr 27 19:31:45 co-ve-001 kernel: vmbr30: port 6(tap119i30d0) entering disabled state
Apr 27 19:31:45 co-ve-001 kernel: vmbr30: port 6(tap119i30d0) entering disabled state

Does anyone have an idea about this issue?
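
For anyone triaging the same messages, here is a minimal sketch (not a Proxmox or Adaptec tool) that counts the three aacraid messages seen in the excerpt above. The function name `count_aacraid_events` and the labels `abort`, `reset`, and `hung` are made up for this example; the patterns are copied from the log lines in this post and may not match other driver versions.

```python
import re
from collections import Counter

# Message fragments copied from the syslog excerpt in this thread.
# The labels on the left are hypothetical names used only in this sketch.
PATTERNS = {
    "abort": re.compile(r"aacraid: Host adapter abort request"),
    "reset": re.compile(r"aacraid: Host adapter reset request"),
    "hung": re.compile(r"aacraid: SCSI bus appears hung"),
}


def count_aacraid_events(lines):
    """Count how many lines match each aacraid message pattern."""
    counts = Counter()
    for line in lines:
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[label] += 1
    return counts
```

Passing an open file works because the function only iterates over lines, for example `count_aacraid_events(open("/var/log/syslog", errors="replace"))`, assuming the log lives at that path as it does here.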
 
Are you running an old version? Which Adaptec card and mainboard do you use? Check for BIOS upgrades.

And post the output of pveversion -v.
 
Hello,

The server went down again, with the same output in the logs.

It's an Adaptec 6504 on a SuperMicro board (almost new, with dual X5672 processors). I did a firmware update on the RAID card a few months ago. Tomorrow I will check whether any other updates are available for the card and the mainboard.

Code:
pve-manager: 1.9-26 (pve-manager/1.9/6567)
running kernel: 2.6.32-6-pve
proxmox-ve-2.6.32: 1.9-55+ovzfix-1
pve-kernel-2.6.32-4-pve: 2.6.32-33
pve-kernel-2.6.32-6-pve: 2.6.32-55+ovzfix-1
qemu-server: 1.1-32
pve-firmware: 1.0-14
libpve-storage-perl: 1.0-19
vncterm: 0.9-2
vzctl: 3.0.29-3pve1
vzdump: 1.2-16
vzprocps: 2.0.11-2
vzquota: 3.0.11-1
pve-qemu-kvm: 0.15.0-2
ksm-control-daemon: 1.0-6

Cheers!
 
You are running an old Proxmox VE 1.9 version; upgrade at least to the latest 1.9.

But everybody should move to 2.x. 1.9 is outdated.
 
Yeah, it's a production cluster with 4 nodes running 1.9. We are awaiting new HP servers along with an HP SAN. We will build the new cluster with 2.x and migrate the VMs to it.

We are currently testing 2.x in our lab on old equipment. Until then, the production servers are critical and we can't afford more downtime, so I was just wondering and poking around about this issue...

Cheers!
 
We are having a similar problem (kernel: aacraid: Host adapter abort request) on a newly built server using an Adaptec 6805E controller.

Tom, can you give me the aacraid driver versions used in the 1.9 and 2.1 kernels?
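
In case it helps compare versions across kernels, here is a minimal sketch that reads the loaded driver's version from sysfs. It assumes the aacraid module is loaded and exports a version string under /sys/module/aacraid/version; the function name `aacraid_version` and the `sysfs_root` parameter are hypothetical names for this example.

```python
from pathlib import Path


def aacraid_version(sysfs_root="/sys/module"):
    """Return the loaded aacraid driver version, or None if it is not exposed."""
    version_file = Path(sysfs_root) / "aacraid" / "version"
    try:
        return version_file.read_text().strip()
    except OSError:
        # Module not loaded, or it does not export a version attribute.
        return None
```

Running `modinfo aacraid` on each node should show the same information for the installed module, whether or not it is loaded.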
 
Hi all,

My hardware is an Adaptec 6405E, which uses the aacraid driver.

I ran into this error too, not with Proxmox but with Ubuntu and Debian (with a new kernel). The solution is the same for any Linux distribution that shows these errors (SCSI hang, abort request, I/O error, disks disappearing from the output of fdisk, etc.).

Upgrade the firmware of your card; it is available on the Adaptec website. I upgraded from 18668 to 19076 and 19109, and it now works very well.

I hope this helps ;)
Regards
 
