Areca 1882ix hang after kernel upgrade from 2.6.32-11-pve to 2.6.32-14-pve

obrienmd (Member, joined Oct 14, 2009)
After upgrading from 2.6.32-11-pve to 2.6.32-14-pve, I get 100% I/O wait and zero throughput (an I/O 'hang') on my Areca 1882ix SAS/SATA RAID controller card, and any VMs using that card's arrays hang on I/O as well.
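To put a number on the 100% I/O wait symptom while reproducing the hang, here is a minimal Python sketch that samples the aggregate `cpu` line of `/proc/stat` twice and reports the share of elapsed CPU time spent in iowait. It assumes the standard field order (user, nice, system, idle, iowait, irq, softirq, steal); the function names are illustrative only.

```python
"""Estimate system-wide I/O wait from two samples of /proc/stat."""
import os
import time


def parse_cpu_line(stat_text: str) -> list[int]:
    """Return the numeric fields of the aggregate 'cpu' line."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(x) for x in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def iowait_percent(before: list[int], after: list[int]) -> float:
    """Share of elapsed CPU time (in %) spent in iowait between two samples."""
    # Assumed field order: user nice system idle iowait irq softirq steal ...
    # Only the first 8 fields are summed; guest time is already included in user.
    deltas = [a - b for a, b in zip(after[:8], before[:8])]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total


def sample(path: str = "/proc/stat") -> list[int]:
    with open(path) as f:
        return parse_cpu_line(f.read())


if __name__ == "__main__":
    if os.path.exists("/proc/stat"):
        first = sample()
        time.sleep(1.0)
        second = sample()
        print(f"iowait: {iowait_percent(first, second):.1f}%")
    else:
        print("/proc/stat not found (not a Linux host?)")
```

Run during the hang, a value close to 100 while the array makes no progress matches the behaviour described above.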

Immediately before the hang, the kernel logs these messages:
Aug 30 01:10:05 vmserver kernel: usb 1-1.1: USB disconnect, device number 3
Aug 30 01:10:55 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:10:58 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:11:01 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:11:03 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:11:06 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:11:09 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 1
Aug 30 01:11:11 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 1
Aug 30 01:11:14 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:11:16 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:11:19 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:11:22 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:11:24 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:11:27 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:11:30 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:11:32 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:11:35 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:11:38 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:11:40 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:11:43 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 1
Aug 30 01:11:45 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 1
Aug 30 01:11:48 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 1
Aug 30 01:11:51 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 1
Aug 30 01:11:53 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 1
Aug 30 01:11:56 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:11:59 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:12:01 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:12:04 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:12:06 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:12:09 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:12:12 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:12:14 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:12:17 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:12:20 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:12:22 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:12:25 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:12:27 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:12:30 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:12:33 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:12:35 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:12:37 vmserver kernel: vgs D ffff88081235a040 0 2362 1762 0 0x00000000
Aug 30 01:12:37 vmserver kernel: ffff880837d39aa8 0000000000000082 ffff880837d39a38 ffffffff8107f28d
Aug 30 01:12:37 vmserver kernel: 0000000000000008 0000000000001000 ffff880837fba138 ffff880837fba048
Aug 30 01:12:37 vmserver kernel: ffff880836da8eb0 ffff88081235a5e0 ffff880837d39fd8 ffff880837d39fd8
Aug 30 01:12:37 vmserver kernel: Call Trace:
Aug 30 01:12:37 vmserver kernel: [<ffffffff8107f28d>] ? del_timer+0x7d/0xe0
Aug 30 01:12:37 vmserver kernel: [<ffffffff81525ed3>] io_schedule+0x73/0xc0
Aug 30 01:12:37 vmserver kernel: [<ffffffff811d2bce>] __blockdev_direct_IO_newtrunc+0x6ee/0xb80
Aug 30 01:12:37 vmserver kernel: [<ffffffff811d30be>] __blockdev_direct_IO+0x5e/0xd0
Aug 30 01:12:37 vmserver kernel: [<ffffffff811cf940>] ? blkdev_get_blocks+0x0/0xc0
Aug 30 01:12:37 vmserver kernel: [<ffffffff811d0797>] blkdev_direct_IO+0x57/0x60
Aug 30 01:12:37 vmserver kernel: [<ffffffff811cf940>] ? blkdev_get_blocks+0x0/0xc0
Aug 30 01:12:37 vmserver kernel: [<ffffffff81123f38>] mapping_direct_IO+0x48/0x70
Aug 30 01:12:37 vmserver kernel: [<ffffffff8112713b>] generic_file_read_iter+0x60b/0x680
Aug 30 01:12:37 vmserver kernel: [<ffffffff811d0fe9>] ? __blkdev_get+0x1a9/0x3c0
Aug 30 01:12:37 vmserver kernel: [<ffffffff811d1220>] ? blkdev_open+0x0/0xc0
Aug 30 01:12:37 vmserver kernel: [<ffffffff811d1210>] ? blkdev_get+0x10/0x20
Aug 30 01:12:37 vmserver kernel: [<ffffffff8112723b>] generic_file_aio_read+0x8b/0xa0
Aug 30 01:12:37 vmserver kernel: [<ffffffff811949da>] do_sync_read+0xfa/0x140
Aug 30 01:12:37 vmserver kernel: [<ffffffff81095be0>] ? autoremove_wake_function+0x0/0x40
Aug 30 01:12:37 vmserver kernel: [<ffffffff811cfd7c>] ? block_ioctl+0x3c/0x40
Aug 30 01:12:37 vmserver kernel: [<ffffffff811a852a>] ? do_vfs_ioctl+0x8a/0x5d0
Aug 30 01:12:37 vmserver kernel: [<ffffffff811953c5>] vfs_read+0xb5/0x1a0
Aug 30 01:12:37 vmserver kernel: [<ffffffff81195501>] sys_read+0x51/0x90
Aug 30 01:12:37 vmserver kernel: [<ffffffff8100b182>] system_call_fastpath+0x16/0x1b
Aug 30 01:12:38 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:12:41 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:12:43 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:12:46 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:12:48 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:12:51 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:12:54 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:12:56 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:12:59 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:13:02 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:13:04 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:13:07 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 1
Aug 30 01:13:09 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 1
Aug 30 01:13:12 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:13:15 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:13:17 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:13:20 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:13:23 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:13:25 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:13:51 vmserver kernel: arcmsr0: wait 'abort all outstanding command' timeout
Aug 30 01:13:51 vmserver kernel: arcmsr0: executing hw bus reset .....
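The abort messages are easier to reason about once counted. The following is a minimal Python sketch, assuming the classic syslog line format shown above, that tallies `abort device command` messages per SCSI id/LUN and measures the time from the first abort to the `executing hw bus reset` line. Syslog omits the year, so a placeholder year is assumed and only time differences are meaningful; the script name and log path in the usage comment are hypothetical.

```python
"""Summarize arcmsr abort messages in a syslog-style kernel log."""
import re
import sys
from collections import Counter
from datetime import datetime

# Assumed line format: "Aug 30 01:10:55 host kernel: arcmsr0: ..."
STAMP = r"(\w{3}\s+\d{1,2} \d\d:\d\d:\d\d) \S+ kernel: "
ABORT_RE = re.compile(STAMP + r"arcmsr\d+: abort device command of "
                      r"scsi id = (\d+) lun = (\d+)")
RESET_RE = re.compile(STAMP + r"arcmsr\d+: executing hw bus reset")


def to_time(stamp: str) -> datetime:
    # Syslog has no year; 2000 is a placeholder (a leap year, so "Feb 29" parses).
    return datetime.strptime("2000 " + stamp, "%Y %b %d %H:%M:%S")


def summarize(lines):
    """Count aborts per (scsi id, lun) and time the run-up to the first bus reset."""
    per_lun = Counter()
    first = last = reset = None
    for line in lines:
        m = ABORT_RE.search(line)
        if m:
            when = to_time(m.group(1))
            per_lun[(int(m.group(2)), int(m.group(3)))] += 1
            if first is None:
                first = when
            last = when
            continue
        r = RESET_RE.search(line)
        if r and reset is None:
            reset = to_time(r.group(1))
    total = sum(per_lun.values())
    return {
        "aborts_per_lun": dict(per_lun),
        "total_aborts": total,
        # Mean gap between consecutive abort messages, in seconds.
        "mean_gap_s": (last - first).total_seconds() / (total - 1) if total > 1 else None,
        # Seconds from the first abort to the first "executing hw bus reset".
        "first_abort_to_reset_s": (
            (reset - first).total_seconds()
            if reset is not None and first is not None else None
        ),
    }


if __name__ == "__main__":
    # Usage (hypothetical name and path): python arcmsr_aborts.py /var/log/kern.log
    if len(sys.argv) > 1:
        with open(sys.argv[1], errors="replace") as f:
            print(summarize(f))
    else:
        print("usage: arcmsr_aborts.py LOGFILE")
```

On the excerpt above, this would show aborts on both LUN 0 and LUN 1, mostly 2 to 3 seconds apart, for 176 seconds (01:10:55 to 01:13:51) before the controller reset.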


The only commit I can find that touches the Areca configuration is 2355a7395845b48e18b9f1a52e11f6abc4044d6f.
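One way to check whether the two kernel builds actually differ in their Areca settings is to compare their build configurations. This minimal Python sketch assumes each kernel package installs its config as `/boot/config-<version>` (the usual Debian convention; adjust the paths if yours differ) and reports any `CONFIG_*ARCMSR*` option whose value changed. A driver source update without a config change would not show up here.

```python
"""Show Areca-related kernel config options that differ between two builds."""
import os
import re
import sys

# Matches "CONFIG_X=value" and "# CONFIG_X is not set".
OPTION_RE = re.compile(r"^(?:# )?(CONFIG_\w+)(?:=(.*)| is not set)$")


def load_options(text: str, pattern: str = "ARCMSR") -> dict[str, str]:
    """Map CONFIG_ names containing `pattern` to their value ('n' if not set)."""
    options = {}
    for line in text.splitlines():
        m = OPTION_RE.match(line.strip())
        if m and pattern in m.group(1):
            options[m.group(1)] = m.group(2) if m.group(2) is not None else "n"
    return options


def diff_options(old: dict[str, str], new: dict[str, str]) -> dict[str, tuple]:
    """Return {name: (old_value, new_value)} for every option that differs."""
    names = sorted(old.keys() | new.keys())
    return {n: (old.get(n), new.get(n)) for n in names if old.get(n) != new.get(n)}


if __name__ == "__main__":
    # Default paths are assumptions; pass two config files to override them.
    if len(sys.argv) >= 3:
        old_path, new_path = sys.argv[1], sys.argv[2]
    else:
        old_path = "/boot/config-2.6.32-11-pve"
        new_path = "/boot/config-2.6.32-14-pve"
    if os.path.exists(old_path) and os.path.exists(new_path):
        with open(old_path) as f_old, open(new_path) as f_new:
            changes = diff_options(load_options(f_old.read()), load_options(f_new.read()))
        for name, (before, after) in changes.items():
            print(f"{name}: {before} -> {after}")
        if not changes:
            print("no ARCMSR-related config differences")
    else:
        print("config file(s) not found:", old_path, new_path)
```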

Has anyone else seen anything like this? After rolling back to 2.6.32-11, everything seems fine again.

root@vmserver:~# pveversion --verbose
pve-manager: 2.1-14 (pve-manager/2.1/f32f3f46)
running kernel: 2.6.32-11-pve
proxmox-ve-2.6.32: 2.1-74
pve-kernel-2.6.32-10-pve: 2.6.32-64
pve-kernel-2.6.32-11-pve: 2.6.32-66
pve-kernel-2.6.32-13-pve: 2.6.32-72
pve-kernel-2.6.32-12-pve: 2.6.32-68
pve-kernel-2.6.32-14-pve: 2.6.32-74
pve-kernel-2.6.32-6-pve: 2.6.32-55
lvm2: 2.02.95-1pve2
clvm: 2.02.95-1pve2
corosync-pve: 1.4.3-1
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.92-3
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.8-1
pve-cluster: 1.0-27
qemu-server: 2.0-52
pve-firmware: 1.0-18
libpve-common-perl: 1.0-30
libpve-access-control: 1.0-24
libpve-storage-perl: 2.0-31
vncterm: 1.0-3
vzctl: 3.0.30-2pve5
vzprocps: 2.0.11-2
vzquota: 3.0.12-3
pve-qemu-kvm: 1.1-8
ksm-control-daemon: 1.1-1
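`pveversion --verbose` does not list the installed kernels in order (13 appears before 12 above). The minimal Python sketch below, which assumes the line format shown in that output, sorts the `pve-kernel-*` entries numerically and marks the running one, which makes it easy to confirm which kernel was actually booted after a rollback.

```python
"""Sort the installed pve-kernel packages listed by `pveversion --verbose`."""
import re
import subprocess

# Line formats assumed from the output in the post.
KERNEL_RE = re.compile(r"^pve-kernel-(\d+(?:\.\d+)*)-(\d+)-pve: (\S+)$")
RUNNING_RE = re.compile(r"^running kernel: (\S+)$")


def parse(text: str):
    """Return (running_kernel, [(kernel_name, package_version), ...]) sorted by version."""
    running = None
    found = []
    for line in text.splitlines():
        line = line.strip()
        if m := RUNNING_RE.match(line):
            running = m.group(1)
        elif m := KERNEL_RE.match(line):
            base, abi, pkg_version = m.groups()
            # Sort key: numeric base version, then the ABI number (-6, -11, -14, ...).
            key = (tuple(int(p) for p in base.split(".")), int(abi))
            found.append((key, f"{base}-{abi}-pve", pkg_version))
    found.sort()
    return running, [(name, pkg) for _, name, pkg in found]


if __name__ == "__main__":
    try:
        out = subprocess.run(["pveversion", "--verbose"], capture_output=True,
                             text=True, check=True).stdout
    except (OSError, subprocess.CalledProcessError) as exc:
        print("could not run pveversion:", exc)
    else:
        running, kernels = parse(out)
        for name, pkg in kernels:
            mark = "*" if name == running else " "
            print(f"{mark} {name} ({pkg})")
        if kernels:
            print("newest installed:", kernels[-1][0], "| running:", running)
```

With the output above, it would list 2.6.32-6-pve, then 2.6.32-10-pve through 2.6.32-14-pve, marking 2.6.32-11-pve as running while 2.6.32-14-pve is the newest installed.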
 
I have more than a dozen 1880ix-12 cards and they all work fine.

In the past I had similar I/O hangs with the 1880 cards. They were caused by:
1. bugs in the firmware of my WD RE3 disks (fixed with unpublished firmware from WD)
2. bugs in the firmware of the 1880 cards (are you using the latest firmware?)

Maybe there is a known issue with the driver version in the Proxmox kernel.
I would suggest contacting Areca to see whether they have any advice.
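Both questions (which firmware is on the card, and which driver the kernel is using) can be checked on the running host. The minimal Python sketch below finds SCSI hosts whose `proc_name` is `arcmsr` and reads their `host_driver_version` and `host_fw_version` attributes, plus `/sys/module/arcmsr/version`. Those attribute names are an assumption based on the arcmsr driver's sysfs attributes and may not exist on every kernel, so missing ones are reported as `None` rather than raising an error.

```python
"""Report arcmsr driver and firmware versions exposed in sysfs."""
import os


def read_attr(path: str):
    """Return a stripped sysfs attribute value, or None if it is absent."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None


def arcmsr_versions(sysfs_root: str = "/sys"):
    """Return (module_version, {scsi_host: {"driver": ..., "firmware": ...}})."""
    module_version = read_attr(os.path.join(sysfs_root, "module", "arcmsr", "version"))
    hosts = {}
    hosts_dir = os.path.join(sysfs_root, "class", "scsi_host")
    if os.path.isdir(hosts_dir):
        for host in sorted(os.listdir(hosts_dir)):
            base = os.path.join(hosts_dir, host)
            # Only look at hosts registered by the arcmsr driver.
            if read_attr(os.path.join(base, "proc_name")) != "arcmsr":
                continue
            # Attribute names assumed from the arcmsr driver; None if missing.
            hosts[host] = {
                "driver": read_attr(os.path.join(base, "host_driver_version")),
                "firmware": read_attr(os.path.join(base, "host_fw_version")),
            }
    return module_version, hosts


if __name__ == "__main__":
    module_version, hosts = arcmsr_versions()
    print("arcmsr module version:", module_version)
    for host, info in hosts.items():
        print(f"{host}: driver={info['driver']} firmware={info['firmware']}")
    if not hosts:
        print("no arcmsr SCSI hosts found")
```

The firmware string reported this way can then be compared against the latest release Areca lists for the card.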