Disk I/O errors on all SATA disks, disks dropped out

check-ict

Hello all,

I'm running Proxmox 1.8 on a Supermicro server with 8 hot-swap bays on hardware RAID and 2 normal bays for SATA disks (used for backups).

Last night, when the backups started, both SATA disks suddenly crashed.

VMID  NAME    STATUS  TIME      SIZE     FILENAME
101   X       OK      00:08:36  11.07GB  /mnt/backup2/vzdump-qemu-101-2011_11_11-02_00_01.tar
105   XX      OK      00:02:16  11.40GB  /mnt/backup2/vzdump-qemu-105-2011_11_11-02_08_37.tar
106   XXX     OK      00:03:41  7.29GB   /mnt/backup2/vzdump-qemu-106-2011_11_11-02_10_53.tar
107   VM 107  FAILED  00:00:00  -        unable to create temporary directory '/mnt/backup2/vzdump-qemu-107-2011_11_11-02_14_34.tmp' at /usr/share/perl5/PVE/VZDump.pm line 830.


When I checked the disk, it was mounted as read-only. I could still see the data, but nothing was working.

When I checked /mnt/backup1, which isn't used for snapshot backups, I noticed it was also mounted read-only.

I unmounted both SATA disks and tried to mount them again:
proxmox01:/mnt# mount -a
mount: wrong fs type, bad option, bad superblock on /dev/sdc1,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail or so

mount: wrong fs type, bad option, bad superblock on /dev/sdb1,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail or so

After this I ran dmesg | tail:
proxmox01:/mnt# dmesg | tail
sd 5:0:1:0: [sdc] Unhandled error code
sd 5:0:1:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 5:0:1:0: [sdc] CDB: Read(10): 28 00 00 00 00 24 00 00 02 00
end_request: I/O error, dev sdc, sector 36
EXT3-fs (sdc1): error: unable to read superblock
sd 5:0:0:0: [sdb] Unhandled error code
sd 5:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 5:0:0:0: [sdb] CDB: Read(10): 28 00 00 00 00 24 00 00 02 00
end_request: I/O error, dev sdb, sector 36
EXT3-fs (sdb1): error: unable to read superblock

I tried to recover with fsck:
proxmox01:/mnt# fsck /dev/sdb1
fsck 1.41.3 (12-Oct-2008)
e2fsck 1.41.3 (12-Oct-2008)
fsck.ext2: Attempt to read block from filesystem resulted in short read while trying to open /dev/sdb1
Could this be a zero-length partition?

I located the backup superblocks:
mke2fs -n /dev/sdb1
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
102400000, 214990848

I tried to restore from a backup superblock:
fsck -b 32768 /dev/sdb1
fsck -b 214990848 /dev/sdb1
Same error as with the normal superblock.
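Worth noting: the "short read" from fsck means the block device itself can no longer be read, so no backup superblock will help. A minimal, hedged check of whether any sector is readable at all (assuming the same device name as above):
Code:
# try to read a single block straight from the partition; if even this fails,
# the problem sits below the filesystem layer
dd if=/dev/sdb1 of=/dev/null bs=4096 count=1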

After this I installed smartctl and checked the disks' SMART status; it's OK on both disks.

I tried to read the partition tables; fdisk -l showed nothing for these disks:
proxmox01:/mnt# fdisk /dev/sdb
Unable to read /dev/sdb

I tried to format the disks:
Warning: could not read block 0: Attempt to read block from filesystem resulted in short read

Nothing works, not even formatting the disks. Somehow the server lost those two SATA disks without an actual disk failure.

I had this problem before on a smaller server. When I was doing snapshots plus normal file backups, the disks crashed constantly. Maybe this is the same problem I have now.

At the moment I can't shut down the Proxmox host; it's live and located in a datacenter. Can anyone tell me what I can try to recover those drives? I don't care if they need to be formatted, I just need them back for good backups.
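For reference, a minimal sketch (not something tried in this thread) of asking the kernel to drop and re-probe the failed disks without a reboot. The host number 5 is taken from the "sd 5:0:0:0" / "sd 5:0:1:0" lines in dmesg above; if libata has disabled the ports, the rescan may find nothing until the drives are power-cycled:
Code:
# delete the stale SCSI devices for the failed disks
echo 1 > /sys/block/sdb/device/delete
echo 1 > /sys/block/sdc/device/delete
# ask the SCSI host they were attached to (host5, from "sd 5:0:x:0") to rescan
echo "- - -" > /sys/class/scsi_host/host5/scan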

proxmox01:/mnt# pveversion -v
pve-manager: 1.8-18 (pve-manager/1.8/6070)
running kernel: 2.6.35-1-pve
proxmox-ve-2.6.35: 1.8-11
pve-kernel-2.6.32-4-pve: 2.6.32-33
pve-kernel-2.6.35-1-pve: 2.6.35-11
qemu-server: 1.1-30
pve-firmware: 1.0-11
libpve-storage-perl: 1.0-17
vncterm: 0.9-2
vzctl: 3.0.28-1pve1
vzdump: 1.2-14
vzprocps: 2.0.11-2
vzquota: 3.0.11-1
pve-qemu-kvm: 0.14.1-1
ksm-control-daemon: 1.0-6

proxmox01:/mnt# pveperf
CPU BOGOMIPS: 32003.38
REGEX/SECOND: 660707
HD SIZE: 9.17 GB (/dev/mapper/pve-os)
BUFFERED READS: 293.76 MB/sec
AVERAGE SEEK TIME: 9.94 ms
FSYNCS/SECOND: 1288.16
DNS EXT: 57.91 ms

The hardware is: http://www.supermicro.com/Aplus/system/2U/2022/AS-2022G-URF.cfm
with 64 GB RAM and one 8-core AMD CPU, running around 20+ VMs
8 SATA disks on an Adaptec 2805 hardware RAID controller
2 SATA disks connected to the motherboard SATA ports (non-RAID)
 
Hi,
any errors from your disks?
You can look with smartmontools:
Code:
apt-get install smartmontools
smartctl --all /dev/sdb
smartctl --all /dev/sdc
Perhaps something strange with the mainboard (because both disks had problems at the same time)...

Udo
 
proxmox01:/mnt# smartctl --all /dev/sdb
smartctl version 5.38 [x86_64-unknown-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

Short INQUIRY response, skip product id
A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.
proxmox01:/mnt# smartctl --all /dev/sdb -T permissive
smartctl version 5.38 [x86_64-unknown-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

Short INQUIRY response, skip product id
SMART Health Status: OK
Read defect list: asked for grown list but didn't get it

Error Counter logging not supported
Device does not support Self Test logging
 
That doesn't look so good - the devices can't be accessed at all. Power cycle??

Udo

PS: Are both disks connected to one power cord? Perhaps the drive power is gone?
 
I can try a power cycle, but as said, I'd rather avoid downtime. I would have to go to the datacenter etc., which I hate :-)

The disks are connected to the same power cable, but the power supply is redundant. I also think some other devices are connected to the same power cable, like the backplane expanders and the CD drive.
 
Nov 11 02:10:53 proxmox01 vzdump[20951]: INFO: Starting Backup of VM 106 (qemu)
Nov 11 02:10:54 proxmox01 kernel: EXT3-fs: barriers not enabled
Nov 11 02:10:54 proxmox01 kernel: kjournald starting. Commit interval 5 seconds
Nov 11 02:10:54 proxmox01 kernel: EXT3-fs (dm-3): using internal journal
Nov 11 02:10:54 proxmox01 kernel: EXT3-fs (dm-3): mounted filesystem with ordered data mode
Nov 11 02:13:12 proxmox01 proxwww[22933]: Starting new child 22933
Nov 11 02:13:26 proxmox01 kernel: ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
Nov 11 02:13:26 proxmox01 kernel: ata5.00: BMDMA stat 0x64
Nov 11 02:13:26 proxmox01 kernel: ata5.00: failed command: READ DMA
Nov 11 02:13:26 proxmox01 kernel: ata5.00: cmd c8/00:08:ba:f5:a7/00:00:00:00:00/e4 tag 0 dma 4096 in
Nov 11 02:13:26 proxmox01 kernel: res 51/b4:02:c0:f5:a7/00:00:00:00:00/e4 Emask 0x10 (ATA bus error)
Nov 11 02:13:26 proxmox01 kernel: ata5.00: status: { DRDY ERR }
Nov 11 02:13:26 proxmox01 kernel: ata5.00: error: { ICRC IDNF ABRT }
Nov 11 02:13:26 proxmox01 kernel: ata5: soft resetting link
Nov 11 02:13:31 proxmox01 kernel: ata5: link is slow to respond, please be patient (ready=0)
Nov 11 02:13:36 proxmox01 kernel: ata5: SRST failed (errno=-16)
Nov 11 02:13:36 proxmox01 kernel: ata5: soft resetting link
Nov 11 02:13:41 proxmox01 kernel: ata5: link is slow to respond, please be patient (ready=0)
Nov 11 02:13:46 proxmox01 kernel: ata5: SRST failed (errno=-16)
Nov 11 02:13:46 proxmox01 kernel: ata5: soft resetting link
Nov 11 02:13:51 proxmox01 kernel: ata5: link is slow to respond, please be patient (ready=0)
Nov 11 02:14:21 proxmox01 kernel: ata5: SRST failed (errno=-16)
Nov 11 02:14:21 proxmox01 kernel: ata5: soft resetting link
Nov 11 02:14:26 proxmox01 kernel: ata5: SRST failed (errno=-16)
Nov 11 02:14:26 proxmox01 kernel: ata5: reset failed, giving up
Nov 11 02:14:26 proxmox01 kernel: ata5.00: disabled
Nov 11 02:14:26 proxmox01 kernel: ata5.01: disabled
Nov 11 02:14:26 proxmox01 kernel: ata5: EH complete
Nov 11 02:14:26 proxmox01 kernel: sd 5:0:0:0: [sdb] Unhandled error code
Nov 11 02:14:26 proxmox01 kernel: sd 5:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Nov 11 02:14:26 proxmox01 kernel: sd 5:0:0:0: [sdb] CDB: Read(10): 28 00 04 a7 f5 ba 00 00 08 00
Nov 11 02:14:26 proxmox01 kernel: end_request: I/O error, dev sdb, sector 78116282
Nov 11 02:14:26 proxmox01 kernel: EXT3-fs error (device sdb1): ext3_free_branches:
Nov 11 02:14:26 proxmox01 kernel: sd 5:0:0:0: [sdb] Unhandled error code
Nov 11 02:14:26 proxmox01 kernel: sd 5:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Nov 11 02:14:26 proxmox01 kernel: sd 5:0:0:0: [sdb] CDB: Write(10): 2a 00 16 67 88 5a 00 00 08 00
Nov 11 02:14:26 proxmox01 kernel: end_request: I/O error, dev sdb, sector 375883866

And later sdc:
Nov 11 02:14:33 proxmox01 kernel: sd 5:0:1:0: [sdc] Unhandled error code
Nov 11 02:14:33 proxmox01 kernel: sd 5:0:1:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Nov 11 02:14:33 proxmox01 kernel: sd 5:0:1:0: [sdc] CDB: Read(10): 28 00 00 00 00 22 00 00 08 00
Nov 11 02:14:33 proxmox01 kernel: end_request: I/O error, dev sdc, sector 34
Nov 11 02:14:34 proxmox01 kernel: EXT3-fs (sdb1): error: ext3_journal_start_sb: Detected aborted journal
Nov 11 02:14:34 proxmox01 kernel: EXT3-fs (sdb1): error: remounting filesystem read-only
Nov 11 02:14:34 proxmox01 vzdump[20951]: INFO: Finished Backup of VM 106 (00:03:41)

And some kernel stuff:
Nov 11 02:14:56 proxmox01 kernel: ------------[ cut here ]------------
Nov 11 02:14:56 proxmox01 kernel: WARNING: at fs/ext3/inode.c:1534 ext3_ordered_writepage+0x53/0x1ac()
Nov 11 02:14:56 proxmox01 kernel: Hardware name: H8DGU
Nov 11 02:14:56 proxmox01 kernel: Modules linked in: ipmi_si ipmi_msghandler i2c_dev iptable_filter ip_tables x_tables vhost_net kvm_amd kvm bridge stp snd_pcm snd_timer snd i2c_piix4 soundcore psmouse snd_p$
Nov 11 02:14:56 proxmox01 kernel: Pid: 21972, comm: flush-8:16 Tainted: G W 2.6.35-1-pve #1
Nov 11 02:14:56 proxmox01 kernel: Call Trace:
Nov 11 02:14:56 proxmox01 kernel: [<ffffffff8117a681>] ? ext3_ordered_writepage+0x53/0x1ac
Nov 11 02:14:56 proxmox01 kernel: [<ffffffff81051838>] warn_slowpath_common+0x85/0xb3
Nov 11 02:14:56 proxmox01 kernel: [<ffffffff81051880>] warn_slowpath_null+0x1a/0x1c
Nov 11 02:14:56 proxmox01 kernel: [<ffffffff8117a681>] ext3_ordered_writepage+0x53/0x1ac
Nov 11 02:14:56 proxmox01 kernel: [<ffffffff810dc03b>] __writepage+0x17/0x34
Nov 11 02:14:56 proxmox01 kernel: [<ffffffff810dc730>] write_cache_pages+0x20a/0x2f3
Nov 11 02:14:56 proxmox01 kernel: [<ffffffff810dc024>] ? __writepage+0x0/0x34
Nov 11 02:14:56 proxmox01 kernel: [<ffffffff810dc83d>] generic_writepages+0x24/0x2a
Nov 11 02:14:56 proxmox01 kernel: [<ffffffff810dc86b>] do_writepages+0x28/0x2a
Nov 11 02:14:56 proxmox01 kernel: [<ffffffff81135b0d>] writeback_single_inode+0xe8/0x30f
Nov 11 02:14:56 proxmox01 kernel: [<ffffffff811360f6>] writeback_sb_inodes+0x153/0x22c
Nov 11 02:14:56 proxmox01 kernel: [<ffffffff81136a11>] writeback_inodes_wb+0x153/0x163
Nov 11 02:14:56 proxmox01 kernel: [<ffffffff81136bdb>] wb_writeback+0x1ba/0x241
Nov 11 02:14:56 proxmox01 kernel: [<ffffffff8102e0e7>] ? default_spin_lock_flags+0x9/0xe
Nov 11 02:14:56 proxmox01 kernel: [<ffffffff814b50bb>] ? _raw_spin_lock_irqsave+0x27/0x31
Nov 11 02:14:56 proxmox01 kernel: [<ffffffff81136da6>] wb_do_writeback+0x144/0x15a
Nov 11 02:14:56 proxmox01 kernel: [<ffffffff814b3e04>] ? schedule_timeout+0xb7/0xe7
Nov 11 02:14:56 proxmox01 kernel: [<ffffffff81136dff>] bdi_writeback_task+0x43/0x118
Nov 11 02:14:56 proxmox01 kernel: [<ffffffff8106afe4>] ? bit_waitqueue+0x17/0xa8
Nov 11 02:14:56 proxmox01 kernel: [<ffffffff810ea3ad>] ? bdi_start_fn+0x0/0xdf
Nov 11 02:14:56 proxmox01 kernel: [<ffffffff810ea427>] bdi_start_fn+0x7a/0xdf
Nov 11 02:14:56 proxmox01 kernel: [<ffffffff810ea3ad>] ? bdi_start_fn+0x0/0xdf
Nov 11 02:14:56 proxmox01 kernel: [<ffffffff8106abe8>] kthread+0x82/0x8a
Nov 11 02:14:56 proxmox01 kernel: [<ffffffff8100ab24>] kernel_thread_helper+0x4/0x10
Nov 11 02:14:56 proxmox01 kernel: [<ffffffff8106ab66>] ? kthread+0x0/0x8a
Nov 11 02:14:56 proxmox01 kernel: [<ffffffff8100ab20>] ? kernel_thread_helper+0x0/0x10
Nov 11 02:14:56 proxmox01 kernel: ---[ end trace 6d5f91b88b04e547 ]---
 
I just rebooted the Proxmox host. The disks are not visible anymore. You are probably right about the power/cables. I will have to check the server hardware :-(
 
I would be keen to hear if this did actually fix the problem. I have virtually the same Supermicro hardware with the same issue. A reboot fixes the problem, but it happens again. I've now stopped write caching on the drives, and that seems to have made a difference (the backups are very slow, but the server doesn't stop the sd* devices).
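For reference, a minimal sketch of how write caching is usually disabled, assuming the backup disks are sdb/sdc as in the original post; the setting does not survive a reboot unless made persistent (e.g. via /etc/hdparm.conf on Debian):
Code:
/sbin/hdparm -W0 /dev/sdb   # turn off the drive's volatile write cache
/sbin/hdparm -W0 /dev/sdc
/sbin/hdparm -W /dev/sdb    # verify: should report write-caching = 0 (off)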
 
Hello Daveo,

I thought I had resolved it by rebooting the server and checking the cables; however, it crashed again after a week.

I'm going to migrate all VM's to a new server soon, so I will be able to analyse the problem.

At the moment the server is too important to experiment with, so I'm creating backups to an external disk in the meantime.
 
Ah, well, we should stay in contact perhaps - mine *isn't* in production yet, so I'm trying:

a) changing the setup so that the whole system is on a separate LV from the VPS data (maybe snapshotting the system volume is a "bad idea")
b) trying a re-install of Debian Lenny with no LVM, except for the VPS images

If neither of those works, I'm going to pursue the idea that there is a bug in a kernel driver, so I'm going to re-compile the Proxmox kernel.

If none of these work I might regret buying a Supermicro (;

PS: I've re-cabled, addressed the PSU issue, etc. - I'm 100% confident that this is a software <-> hardware incompatibility, or perhaps just my software install.
 
My disks didn't have any LVM. After the first problem, I tried to use rsync instead of snapshots.

I had this problem before with a normal tower server. The 4-disk RAID10 (Proxmox LVM) did snapshot backups to an internal 2TB backup disk. Every week the disk somehow crashed. I solved it by switching to rsync instead of snapshots; however, that workaround didn't work for the Supermicro server.
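A minimal sketch of the rsync-style alternative, assuming the default Proxmox 1.x data path /var/lib/vz and a backup disk mounted at /mnt/backup2 (not the poster's exact command); note that copying images of running VMs this way is not crash-consistent:
Code:
# copy the VM data to the backup disk without touching LVM snapshots
rsync -a --delete /var/lib/vz/ /mnt/backup2/vz-copy/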
 
OK - I think I have all but solved this problem...

How to reproduce

Run vzrestore from a .tar with both source and destination on the same RAID1 device while there is a simultaneous "lighter" write operation.
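The reproduction command would look roughly like this; the archive name and VMID are placeholders, and on Proxmox 1.x qmrestore is the equivalent for KVM guests:
Code:
# restore a container dump onto the same RAID1 device that holds the source archive
vzrestore /mnt/backup/vzdump-openvz-101.tar 101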

Fix attempts

I wasn't clever enough to think of these myself; this is just a list of things that other folks with similar errors said worked for them (see the consolidated sketch after the list).


  1. acpi=off noapic - didn't help, it still happened. (I didn't try libata.noacpi=1.)
  2. Disable the write cache with /sbin/hdparm -W0 /dev/sdX - helped, maybe, but still not perfect.
  3. With everything on the root volume I thought perhaps the snapshot was an issue; there seemed to be some notion that snapshotting the system volume was the problem. I repartitioned and mounted the VPS images on a separate LV. The situation improved, but still not perfect.
  4. Turn off NCQ with echo 1 > /sys/block/sdX/device/queue_depth - still no luck. (I didn't try libata.force=noncq.)
  5. libata.force=1.5Gbps seemed to have worked, but I don't really know why. The more I look, the less consensus there seems to be, but for me this worked - I'm now 2 days in, with 2 RAID rebuilds and 12 LVM-snapshotted backups (during the [forced] rebuilds) - not one hint of a problem.
  6. I've noticed this thread http://www.msfn.org/board/topic/128092-seagate-barracuda-720011-troubles/ that could be a clue, but I don't think my drives are affected. I have just replaced one of them (about the same time as forcing 1.5Gbps) with a Hitachi, but the other is still in there.
  7. Also added libata.noncq and libata.noacpi=1.
  8. OK, it's (d) NONE OF THE ABOVE. All these were doing was avoiding the load that caused the first error. A certain operation yields the graph below. The Seagates clearly keep the CPUs waiting for over 2 seconds for IO to return, whereas the Hitachi doesn't. This results in a "task blocked for more than 120 seconds" message, and invariably the system hangs up for a few seconds.
  9. Replacing the Seagate drives looks to be the real solution. They seem to respond badly to heavy random IO, but were faster for vaguely sequential IO - still, the failure mode was horrendous, and I believe it was due to over-optimisation within the firmware. After replacing these with either Hitachi or Western Digital drives, the restore completes in 5 minutes with no hangs or other messages in /var/log/syslog.
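A consolidated sketch of the workarounds from the list above, assuming a Lenny-era Proxmox install with the affected disks on sdb/sdc; as item 8 notes, none of these turned out to be the real fix:
Code:
# boot-time libata options - append to the kernel line in /boot/grub/menu.lst
# (GRUB legacy) or to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub and run
# update-grub:
#   acpi=off noapic libata.noacpi=1 libata.force=1.5Gbps,noncq
#
# runtime equivalents (lost on reboot):
/sbin/hdparm -W0 /dev/sdb /dev/sdc            # disable the drives' write cache
echo 1 > /sys/block/sdb/device/queue_depth    # queue depth 1 effectively disables NCQ
echo 1 > /sys/block/sdc/device/queue_depth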


Here is a graph that might help the understanding:
[attached graph: ioops.png]
The system is now stable, and has a WD Caviar Blue and a Hitachi drive where there were two Seagates (ST500DM002-1BD142).
 
I don't see how this could make both drives die in the same second.

I use Samsung F4EG drives (the green ones). These work fine in a lot of other (non-Supermicro) systems, even in mdadm soft-RAID.

I will soon be able to analyse my system. I will try other disks first, so I can check if it's the same problem.
 
Your problem seems very familiar to me. I have two Proxmox systems with several ST32000644NS drives (all with firmware SN11) on Supermicro mainboards. Both systems are running md softraid (RAID level 1) with LVM. Randomly I get drive errors while the Proxmox backup task (with LVM snapshots) is running. I tried everything I could imagine: swapping drives, checking them with different tests, and so on. I even swapped the drives with a system that has 8 ST32000644NS disks on an md softraid level 5 with an LSI controller. That system never shows any kind of trouble. Even swapping all disks in that system with my "trouble" disks didn't cause any problems. So I'm nearly out of options now.

My last try was to rebuild the md softraid after running dd if=/dev/zero over both drives, waiting for the md rebuild, and restoring the VM only after the sync was complete. This system has been stable since last Saturday (7 days today). Meanwhile I checked another setup on the backup system: md softraid but without LVM. That system hung twice within the last week, so LVM seems not to be involved at all.

After reading your post about the problem being related to the Seagate drives, I found this notice: http://www-947.ibm.com/support/entry/portal/docdisplay?lndocid=MIGR-62832 with notes:
ST3500514NS, ST31000524NS, ST32000644NS to BB29
  • Fixes for intermittent drive hangs
This would perfectly match your findings and would also fit mine. I doubt that Seagate support will be helpful, but I will try to get SN12 for my ST32000644NS disks. To make sure my problems are disk related, I may also try using 4x 500 GB drives with an Adaptec RAID controller in one system.

So, thanks a lot for posting your findings.
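A minimal way to compare the installed drive firmware against the fix levels named in that advisory (the device names are assumptions; adjust to the md member disks on your system):
Code:
# print model and firmware revision for each member disk
smartctl -i /dev/sda
smartctl -i /dev/sdb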
 
