Problems with IDE disk in KVM

DirkB
Aug 13, 2012

Proxmox 2, fully updated to the latest version, kernel 2.6.32-13-pve.

All VMs run under KVM and use raw disk images only.

A SUSE Linux Enterprise Server 11 SP1 guest (3 disks) has problems with its system disk (IDE, no cache).
After a few hours the root filesystem suddenly switches to read-only, and the messages file (/var/log/messages) shows:

kernel: [89254.816157] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
kernel: [89254.816157] ata1.00: failed command: WRITE DMA
kernel: [89254.816157] ata1.00: cmd ca/00:10:f0:3a:30/00:00:00:00:00/e0 tag 0 dma 8192 out
kernel: [89254.816157] res 40/00:01:01:4f:c2/00:00:00:00:00/a0 Emask 0x4 (timeout)
kernel: [89254.816157] ata1.00: status: { DRDY }
kernel: [89254.816157] ata1: soft resetting link
kernel: [89254.972483] ata1.01: NODEV after polling detection
kernel: [89254.974483] ata1.00: configured for MWDMA2
kernel: [89254.974483] ata1: EH complete
kernel: [89315.804243] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
kernel: [89315.804248] ata1.00: failed command: WRITE DMA
kernel: [89315.804252] ata1.00: cmd ca/00:10:f0:3a:30/00:00:00:00:00/e0 tag 0 dma 8192 out
kernel: [89315.804253] res 40/00:01:01:4f:c2/00:00:00:00:00/a0 Emask 0x4 (timeout)
kernel: [89315.804255] ata1.00: status: { DRDY }
kernel: [89315.804322] ata1: soft resetting link
kernel: [89315.960467] ata1.01: NODEV after polling detection
kernel: [89315.961265] ata1.00: configured for MWDMA2
kernel: [89315.961297] ata1: EH complete
kernel: [89376.804742] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
kernel: [89376.804748] ata1.00: failed command: WRITE DMA
kernel: [89376.804755] ata1.00: cmd ca/00:10:f0:3a:30/00:00:00:00:00/e0 tag 0 dma 8192 out
kernel: [89376.804756] res 40/00:01:01:4f:c2/00:00:00:00:00/a0 Emask 0x4 (timeout)
kernel: [89376.804759] ata1.00: status: { DRDY }
kernel: [89376.804862] ata1: soft resetting link
kernel: [89376.960472] ata1.01: NODEV after polling detection
kernel: [89376.961130] ata1.00: configured for MWDMA2
kernel: [89376.961158] ata1: EH complete
kernel: [89563.816146] ata1.00: limiting speed to MWDMA1:PIO4
kernel: [89563.816147] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
kernel: [89563.816150] ata1.00: failed command: WRITE DMA
kernel: [89563.816154] ata1.00: cmd ca/00:10:20:3d:30/00:00:00:00:00/e0 tag 0 dma 8192 out
kernel: [89563.816155] res 40/00:01:01:4f:c2/00:00:00:00:00/a0 Emask 0x4 (timeout)
kernel: [89563.816157] ata1.00: status: { DRDY }
kernel: [89563.816225] ata1: soft resetting link
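
The `cmd` field in these messages is the ATA taskfile of the command that timed out, so it can be decoded to see what the guest was writing. A minimal sketch, assuming libata's usual `command/feature:nsect:lbal:lbam:lbah/hob.../device` layout, 28-bit LBA addressing and 512-byte sectors; the `ATA_COMMANDS` table is a hypothetical stub that only knows the opcode seen here:

```python
import re
from dataclasses import dataclass

# Assumption: 512-byte logical sectors on the emulated IDE disk.
SECTOR_SIZE = 512

# Hypothetical lookup table; only the opcode that appears in this log is listed.
ATA_COMMANDS = {0xCA: "WRITE DMA"}

# Assumed taskfile layout printed by libata:
# command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device
CMD_RE = re.compile(
    r"cmd ([0-9a-f]{2})/((?:[0-9a-f]{2}:){4}[0-9a-f]{2})/"
    r"(?:[0-9a-f]{2}:){4}[0-9a-f]{2}/([0-9a-f]{2})"
)


@dataclass
class AtaCommand:
    opcode: int
    name: str
    sectors: int
    lba: int

    @property
    def nbytes(self) -> int:
        return self.sectors * SECTOR_SIZE


def decode_cmd(line: str) -> AtaCommand:
    """Decode the 'cmd ...' taskfile of a libata error message (28-bit LBA only)."""
    m = CMD_RE.search(line)
    if m is None:
        raise ValueError("no libata 'cmd' field found")
    opcode = int(m.group(1), 16)
    _feature, nsect, lbal, lbam, lbah = (int(b, 16) for b in m.group(2).split(":"))
    device = int(m.group(3), 16)
    # In 28-bit LBA mode the low nibble of the device register holds LBA bits 24-27.
    lba = ((device & 0x0F) << 24) | (lbah << 16) | (lbam << 8) | lbal
    # For 28-bit commands a sector count of 0 means 256 sectors.
    sectors = nsect if nsect else 256
    return AtaCommand(opcode, ATA_COMMANDS.get(opcode, "unknown"), sectors, lba)


for text in (
    "ata1.00: cmd ca/00:10:f0:3a:30/00:00:00:00:00/e0 tag 0 dma 8192 out",
    "ata1.00: cmd ca/00:10:20:3d:30/00:00:00:00:00/e0 tag 0 dma 8192 out",
):
    c = decode_cmd(text)
    print(f"{c.name}: LBA {c.lba:#x}, {c.sectors} sectors ({c.nbytes} bytes)")
```

Both failed commands are 16-sector writes, which matches the `dma 8192` in the log; the first three retries all target LBA 0x303af0, the last one 0x303d20.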

I tried using SCSI or virtio for the system disk, but then the VM won't even boot from it, so that's not an option.
I have to reboot to get the filesystem back to read-write, because the disk is flagged as write-protected inside the KVM guest.
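
Since the switch to read-only is only noticed after the fact, a small check run periodically inside the guest could flag it early. A minimal sketch, assuming the standard `/proc/mounts` line format; the sample device names and options below are made up for illustration:

```python
from typing import List


def mount_options(mounts_text: str, mountpoint: str = "/") -> List[str]:
    """Return the mount options for `mountpoint` from /proc/mounts-style text.

    Each line is: device, mountpoint, fstype, options, dump, pass. If the
    mountpoint is listed more than once (older kernels list a 'rootfs' entry
    for / first), the last entry is used, since later mounts sit on top.
    """
    options = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mountpoint:
            options = fields[3].split(",")
    if options is None:
        raise LookupError(f"{mountpoint} not found in mount table")
    return options


def is_read_only(mounts_text: str, mountpoint: str = "/") -> bool:
    """True if the mount's option list contains 'ro'."""
    return "ro" in mount_options(mounts_text, mountpoint)


# Made-up sample: root already flipped to read-only, a data disk still rw.
SAMPLE = """rootfs / rootfs rw 0 0
/dev/sda2 / ext3 ro,relatime 0 0
/dev/vdb1 /data ext3 rw,relatime 0 0
"""

print("root read-only:", is_read_only(SAMPLE))
```

On the guest itself you would pass in the text of `/proc/mounts` instead of `SAMPLE`. This only detects the condition; it does not address the underlying I/O timeouts.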

Thanks for any suggestions
Dirk
 
Please post the full output of:


  • pveversion -v
  • proxperf
  • VMID.conf of your guest
  • any info about your physical hardware: storage/RAID setup, ECC RAM?

Also check the syslog on the host.
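
When going through the host syslog, it can help to count storage error messages instead of scrolling. A minimal sketch; the `PATTERNS` set is hypothetical and built only from messages quoted in this thread, and the log path in the usage comment is an assumption (Debian-based hosts usually log to /var/log/syslog):

```python
import re
from collections import Counter

# Hypothetical pattern set, taken from messages quoted in this thread.
PATTERNS = {
    "ata_exception": re.compile(r"ata\d+(?:\.\d+)?: exception Emask"),
    "scsi_task_abort": re.compile(r"attempting task abort"),
    "scsi_abort_failed": re.compile(r"task abort: FAILED"),
    "scsi_target_reset": re.compile(r"attempting target reset"),
}


def scan(lines):
    """Count and collect log lines that match any of PATTERNS."""
    counts = Counter()
    hits = []
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
                hits.append((name, line.rstrip("\n")))
    return counts, hits


# Usage on the host (path is an assumption):
# with open("/var/log/syslog", errors="replace") as f:
#     counts, hits = scan(f)
```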
 
pveversion:
pve-manager: 2.1-13 (pve-manager/2.1/bdd3663d)
running kernel: 2.6.32-13-pve
proxmox-ve-2.6.32: 2.1-72
pve-kernel-2.6.32-11-pve: 2.6.32-66
pve-kernel-2.6.32-13-pve: 2.6.32-72
lvm2: 2.02.95-1pve2
clvm: 2.02.95-1pve2
corosync-pve: 1.4.3-1
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.92-2
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.8-1
pve-cluster: 1.0-27
qemu-server: 2.0-47
pve-firmware: 1.0-17
libpve-common-perl: 1.0-28
libpve-access-control: 1.0-24
libpve-storage-perl: 2.0-29
vncterm: 1.0-2
vzctl: 3.0.30-2pve5
vzprocps: 2.0.11-2
vzquota: 3.0.12-3
pve-qemu-kvm: 1.1-6
ksm-control-daemon: 1.1-1
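
One thing this output lets you confirm is that the host really booted the newest kernel it has installed (here 2.6.32-13-pve). A minimal sketch, assuming `pveversion -v` prints one `package: version` pair per line as above and that kernel releases sort by their numeric parts:

```python
import re


def parse_pveversion(text):
    """Turn `pveversion -v` output ('package: version' per line) into a dict."""
    versions = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            versions[key.strip()] = value.strip()
    return versions


def kernel_sort_key(release):
    """Numeric sort key for releases like '2.6.32-13-pve' -> (2, 6, 32, 13)."""
    return tuple(int(n) for n in re.findall(r"\d+", release))


def booted_newest_kernel(versions):
    """True if 'running kernel' equals the newest installed pve-kernel-* package."""
    installed = [k[len("pve-kernel-"):] for k in versions if k.startswith("pve-kernel-")]
    return versions["running kernel"] == max(installed, key=kernel_sort_key)


SAMPLE = """pve-manager: 2.1-13 (pve-manager/2.1/bdd3663d)
running kernel: 2.6.32-13-pve
pve-kernel-2.6.32-11-pve: 2.6.32-66
pve-kernel-2.6.32-13-pve: 2.6.32-72
pve-qemu-kvm: 1.1-6
"""

print(booted_newest_kernel(parse_pveversion(SAMPLE)))
```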


pveperf (I guess proxperf was a typo):
CPU BOGOMIPS: 76798.12
REGEX/SECOND: 923841
HD SIZE: 94.49 GB (/dev/mapper/pve-root)
BUFFERED READS: 388.39 MB/sec
AVERAGE SEEK TIME: 7.01 ms
FSYNCS/SECOND: 1847.56
DNS EXT: 47.27 ms
DNS INT: 0.74 ms
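
FSYNCS/SECOND is easier to compare against guest-side timeouts when turned into a time per fsync: its reciprocal gives the average, here roughly 0.54 ms. A minimal sketch, assuming every pveperf line has the form `KEY: number [unit ...]` as above:

```python
def parse_pveperf(text):
    """Parse pveperf output ('KEY: value [unit] ...') into {KEY: float}."""
    results = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if not sep or not parts:
            continue
        try:
            results[key.strip()] = float(parts[0])
        except ValueError:
            continue  # skip lines whose first token is not a number
    return results


def mean_fsync_ms(results):
    """Average time per fsync in milliseconds, the reciprocal of FSYNCS/SECOND."""
    return 1000.0 / results["FSYNCS/SECOND"]


SAMPLE = """CPU BOGOMIPS: 76798.12
REGEX/SECOND: 923841
HD SIZE: 94.49 GB (/dev/mapper/pve-root)
BUFFERED READS: 388.39 MB/sec
AVERAGE SEEK TIME: 7.01 ms
FSYNCS/SECOND: 1847.56
DNS EXT: 47.27 ms
DNS INT: 0.74 ms
"""

print(f"{mean_fsync_ms(parse_pveperf(SAMPLE)):.2f} ms per fsync")
```

Note that this figure describes the filesystem pveperf was run on (/dev/mapper/pve-root, per the HD SIZE line), which is not necessarily where the VM images live.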

VM.conf:
boot: c
bootdisk: ide0
cores: 2
ide0: local:103/vm-103-disk-1.raw
ide3: none,media=cdrom
memory: 4096
name=xyz
net0: e1000=EA:C6:98:34:0E:36,bridge=vmbr0
onboot: 1
ostype: l26
sockets: 2
virtio1: local:103/vm-103-disk-2.raw
virtio2: local:103/vm-103-disk-3.raw
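
In this config only the system disk (ide0, which is also the bootdisk) is on the emulated IDE bus; both data disks already use virtio. When comparing configs across several VMs, a sketch like this lists each disk and its bus. It assumes the `key: value` line format shown above, treats only `ide`, `scsi` and `virtio` keys as disks (a deliberately small subset), and skips CD-ROM drives and lines such as `name=xyz` that don't fit the pattern:

```python
import re

# Assumed subset of disk buses in a Proxmox VE 2.x VM config.
DISK_KEY = re.compile(r"^(ide|scsi|virtio)(\d+)$")


def list_disks(conf_text):
    """Map disk keys (e.g. 'ide0') to (bus, volume), skipping CD-ROM drives."""
    disks = {}
    for line in conf_text.splitlines():
        key, sep, value = line.partition(":")
        match = DISK_KEY.match(key.strip())
        if not sep or not match:
            continue  # not a 'key: value' disk line
        volume, *options = value.strip().split(",")
        if "media=cdrom" in options:
            continue
        disks[key.strip()] = (match.group(1), volume)
    return disks


SAMPLE = """boot: c
bootdisk: ide0
cores: 2
ide0: local:103/vm-103-disk-1.raw
ide3: none,media=cdrom
memory: 4096
name=xyz
net0: e1000=EA:C6:98:34:0E:36,bridge=vmbr0
virtio1: local:103/vm-103-disk-2.raw
virtio2: local:103/vm-103-disk-3.raw
"""

for key, (bus, volume) in sorted(list_disks(SAMPLE).items()):
    print(f"{key}: {bus} -> {volume}")
```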

Storage is an Adaptec 6805 controller with four 2 TB disks in RAID 10, 4 TB net.
It's a dual-CPU Intel system on a Supermicro motherboard with 24 GB ECC RAM.
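
The 4 TB net figure follows from RAID 10 mirroring pairs of disks and striping across the pairs, so half the raw capacity is usable. Assuming the 2 TB is the vendor's decimal size, the same space shows up as roughly 3.64 TiB in tools that use binary units. A minimal sketch of that arithmetic (classic striped mirrors only):

```python
TB = 10**12   # decimal terabyte, as used on disk labels
TIB = 2**40   # binary tebibyte, as many tools report sizes


def raid10_usable_bytes(disk_count, disk_bytes):
    """Usable capacity of RAID 10 built from mirrored pairs: half the raw total."""
    if disk_count < 4 or disk_count % 2:
        raise ValueError("this sketch assumes an even number of disks, at least 4")
    return disk_count // 2 * disk_bytes


usable = raid10_usable_bytes(4, 2 * TB)
print(f"{usable / TB:.2f} TB = {usable / TIB:.2f} TiB")
```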

Thanks

Dirk
 
I've seen the same sort of issue on a Dell PowerEdge 1950 with RAID 1 on a PERC 5/i controller.

Several CentOS 6.2 VMs are showing this error, and the following appears in the Proxmox host syslog:

Sep 4 03:32:43 proxmox kernel: mptscsih: ioc0: attempting task abort! (sc=ffff880082c9aec0)
Sep 4 03:32:43 proxmox kernel: sd 2:1:0:0: [sda] CDB: Write(10): 2a 00 1e dc da 80 00 00 28 00
Sep 4 03:32:43 proxmox kernel: mptscsih: ioc0: task abort: FAILED (rv=2003) (sc=ffff880082c9aec0)
Sep 4 03:32:43 proxmox kernel: mptscsih: ioc0: attempting task abort! (sc=ffff880082c9a0c0)
Sep 4 03:32:43 proxmox kernel: sd 2:1:0:0: [sda] CDB: Write(10): 2a 00 09 9c 38 90 00 00 08 00
Sep 4 03:32:43 proxmox kernel: mptscsih: ioc0: task abort: FAILED (rv=2003) (sc=ffff880082c9a0c0)
Sep 4 03:32:43 proxmox kernel: mptscsih: ioc0: attempting task abort! (sc=ffff88007b4bb380)
Sep 4 03:32:43 proxmox kernel: sd 2:1:0:0: [sda] CDB: Write(10): 2a 00 09 9c 50 20 00 00 08 00
Sep 4 03:32:43 proxmox kernel: mptscsih: ioc0: task abort: FAILED (rv=2003) (sc=ffff88007b4bb380)
Sep 4 03:32:43 proxmox kernel: mptscsih: ioc0: attempting task abort! (sc=ffff88007b4bb280)
Sep 4 03:32:43 proxmox kernel: sd 2:1:0:0: [sda] CDB: Write(10): 2a 00 09 8a 0c 28 00 00 08 00
Sep 4 03:32:43 proxmox kernel: mptscsih: ioc0: task abort: FAILED (rv=2003) (sc=ffff88007b4bb280)
Sep 4 03:32:43 proxmox kernel: mptscsih: ioc0: attempting target reset! (sc=ffff880082c9aec0)
Sep 4 03:32:43 proxmox kernel: sd 2:1:0:0: [sda] CDB: Write(10): 2a 00 1e dc da 80 00 00 28 00
Sep 4 03:32:44 proxmox kernel: mptscsih: ioc0: target reset: SUCCESS (sc=ffff880082c9aec0)
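
The `CDB` lines are SCSI WRITE(10) commands (opcode 0x2a) that the host controller failed to complete. In a WRITE(10) CDB, bytes 2-5 hold the logical block address and bytes 7-8 the transfer length, both big-endian. A minimal sketch that decodes them, assuming 512-byte logical blocks on the RAID volume:

```python
from dataclasses import dataclass

BLOCK_SIZE = 512  # assumption: 512-byte logical blocks on the PERC 5/i volume


@dataclass
class Write10:
    lba: int
    blocks: int

    @property
    def nbytes(self):
        return self.blocks * BLOCK_SIZE


def decode_write10(cdb_hex):
    """Decode a SCSI WRITE(10) CDB as printed by the kernel ('2a 00 ...')."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a WRITE(10) CDB")
    # Bytes 2-5: logical block address; bytes 7-8: transfer length (big-endian).
    lba = int.from_bytes(cdb[2:6], "big")
    blocks = int.from_bytes(cdb[7:9], "big")
    return Write10(lba, blocks)


for cdb_text in ("2a 00 1e dc da 80 00 00 28 00", "2a 00 09 9c 38 90 00 00 08 00"):
    w = decode_write10(cdb_text)
    print(f"LBA {w.lba:#x}, {w.blocks} blocks ({w.nbytes} bytes)")
```

The first CDB appears twice: in the first task abort and again right before the target reset (same `sc=` pointer), so the reset was issued for that 40-block write.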

I am running Proxmox 2.1-14/f32f3f46.

Does anybody have any ideas what could be causing this?