After upgrading between these kernels, I get 100% I/O wait and zero throughput (an I/O 'hang') on my Areca 1882ix SAS/SATA RAID controller card, and any VMs that touch that card's arrays hang on I/O as well.
Immediately prior to the hang, I get these log messages:
Aug 30 01:10:05 vmserver kernel: usb 1-1.1: USB disconnect, device number 3
Aug 30 01:10:55 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:10:58 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:11:01 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:11:03 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:11:06 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:11:09 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 1
Aug 30 01:11:11 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 1
Aug 30 01:11:14 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:11:16 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:11:19 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:11:22 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:11:24 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:11:27 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:11:30 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:11:32 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:11:35 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:11:38 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:11:40 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:11:43 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 1
Aug 30 01:11:45 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 1
Aug 30 01:11:48 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 1
Aug 30 01:11:51 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 1
Aug 30 01:11:53 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 1
Aug 30 01:11:56 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:11:59 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:12:01 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:12:04 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:12:06 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:12:09 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:12:12 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:12:14 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:12:17 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:12:20 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:12:22 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:12:25 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:12:27 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:12:30 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:12:33 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:12:35 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:12:37 vmserver kernel: vgs D ffff88081235a040 0 2362 1762 0 0x00000000
Aug 30 01:12:37 vmserver kernel: ffff880837d39aa8 0000000000000082 ffff880837d39a38 ffffffff8107f28d
Aug 30 01:12:37 vmserver kernel: 0000000000000008 0000000000001000 ffff880837fba138 ffff880837fba048
Aug 30 01:12:37 vmserver kernel: ffff880836da8eb0 ffff88081235a5e0 ffff880837d39fd8 ffff880837d39fd8
Aug 30 01:12:37 vmserver kernel: Call Trace:
Aug 30 01:12:37 vmserver kernel: [<ffffffff8107f28d>] ? del_timer+0x7d/0xe0
Aug 30 01:12:37 vmserver kernel: [<ffffffff81525ed3>] io_schedule+0x73/0xc0
Aug 30 01:12:37 vmserver kernel: [<ffffffff811d2bce>] __blockdev_direct_IO_newtrunc+0x6ee/0xb80
Aug 30 01:12:37 vmserver kernel: [<ffffffff811d30be>] __blockdev_direct_IO+0x5e/0xd0
Aug 30 01:12:37 vmserver kernel: [<ffffffff811cf940>] ? blkdev_get_blocks+0x0/0xc0
Aug 30 01:12:37 vmserver kernel: [<ffffffff811d0797>] blkdev_direct_IO+0x57/0x60
Aug 30 01:12:37 vmserver kernel: [<ffffffff811cf940>] ? blkdev_get_blocks+0x0/0xc0
Aug 30 01:12:37 vmserver kernel: [<ffffffff81123f38>] mapping_direct_IO+0x48/0x70
Aug 30 01:12:37 vmserver kernel: [<ffffffff8112713b>] generic_file_read_iter+0x60b/0x680
Aug 30 01:12:37 vmserver kernel: [<ffffffff811d0fe9>] ? __blkdev_get+0x1a9/0x3c0
Aug 30 01:12:37 vmserver kernel: [<ffffffff811d1220>] ? blkdev_open+0x0/0xc0
Aug 30 01:12:37 vmserver kernel: [<ffffffff811d1210>] ? blkdev_get+0x10/0x20
Aug 30 01:12:37 vmserver kernel: [<ffffffff8112723b>] generic_file_aio_read+0x8b/0xa0
Aug 30 01:12:37 vmserver kernel: [<ffffffff811949da>] do_sync_read+0xfa/0x140
Aug 30 01:12:37 vmserver kernel: [<ffffffff81095be0>] ? autoremove_wake_function+0x0/0x40
Aug 30 01:12:37 vmserver kernel: [<ffffffff811cfd7c>] ? block_ioctl+0x3c/0x40
Aug 30 01:12:37 vmserver kernel: [<ffffffff811a852a>] ? do_vfs_ioctl+0x8a/0x5d0
Aug 30 01:12:37 vmserver kernel: [<ffffffff811953c5>] vfs_read+0xb5/0x1a0
Aug 30 01:12:37 vmserver kernel: [<ffffffff81195501>] sys_read+0x51/0x90
Aug 30 01:12:37 vmserver kernel: [<ffffffff8100b182>] system_call_fastpath+0x16/0x1b
Aug 30 01:12:38 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:12:41 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:12:43 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:12:46 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:12:48 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:12:51 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:12:54 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:12:56 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:12:59 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:13:02 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:13:04 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:13:07 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 1
Aug 30 01:13:09 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 1
Aug 30 01:13:12 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:13:15 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:13:17 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:13:20 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:13:23 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:13:25 vmserver kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
Aug 30 01:13:51 vmserver kernel: arcmsr0: wait 'abort all outstanding command' timeout
Aug 30 01:13:51 vmserver kernel: arcmsr0: executing hw bus reset .....
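When comparing kernels, it can help to tally how many aborts each LUN accumulates before the driver gives up and resets the bus. Below is a minimal sketch, assuming the message wording matches the excerpt above exactly. `summarize_arcmsr` is a hypothetical helper written for illustration, not part of any Proxmox or Areca tool.

```python
import re
from collections import Counter

# Assumed to match the arcmsr abort/reset lines exactly as worded in the excerpt above.
ABORT_RE = re.compile(
    r"arcmsr(\d+): abort device command of scsi id = (\d+) lun = (\d+)"
)
RESET_RE = re.compile(r"arcmsr(\d+): executing hw bus reset")


def summarize_arcmsr(lines):
    """Count abort messages per (adapter, scsi id, lun) and collect reset adapters.

    `lines` is any iterable of log lines (a list, or an open file object).
    Returns (Counter keyed by (adapter, scsi_id, lun), set of adapter numbers).
    """
    aborts = Counter()
    resets = set()
    for line in lines:
        m = ABORT_RE.search(line)
        if m:
            adapter, scsi_id, lun = (int(g) for g in m.groups())
            aborts[(adapter, scsi_id, lun)] += 1
            continue
        m = RESET_RE.search(line)
        if m:
            # The driver escalated from per-command aborts to a full bus reset.
            resets.add(int(m.group(1)))
    return aborts, resets
```

For example, `summarize_arcmsr(open("/var/log/kern.log"))` on a Debian-based host; the log path is an assumption and depends on the syslog setup. Unrelated lines (such as the USB disconnect) are simply skipped.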
The only commit I can find that touches the Areca configuration is 2355a7395845b48e18b9f1a52e11f6abc4044d6f.
Has anyone else seen anything like this? After rolling back to 2.6.32-11, everything works fine again.
root@vmserver:~# pveversion --verbose
pve-manager: 2.1-14 (pve-manager/2.1/f32f3f46)
running kernel: 2.6.32-11-pve
proxmox-ve-2.6.32: 2.1-74
pve-kernel-2.6.32-10-pve: 2.6.32-64
pve-kernel-2.6.32-11-pve: 2.6.32-66
pve-kernel-2.6.32-13-pve: 2.6.32-72
pve-kernel-2.6.32-12-pve: 2.6.32-68
pve-kernel-2.6.32-14-pve: 2.6.32-74
pve-kernel-2.6.32-6-pve: 2.6.32-55
lvm2: 2.02.95-1pve2
clvm: 2.02.95-1pve2
corosync-pve: 1.4.3-1
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.92-3
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.8-1
pve-cluster: 1.0-27
qemu-server: 2.0-52
pve-firmware: 1.0-18
libpve-common-perl: 1.0-30
libpve-access-control: 1.0-24
libpve-storage-perl: 2.0-31
vncterm: 1.0-3
vzctl: 3.0.30-2pve5
vzprocps: 2.0.11-2
vzquota: 3.0.12-3
pve-qemu-kvm: 1.1-8
ksm-control-daemon: 1.1-1