KVM gets frozen every day, hardware error?

tabbi

New Member
Jul 30, 2011
Hello,
I'm using Proxmox VE 1.9 in a production environment with nearly 30 OpenVZ containers and one KVM machine. The KVM machine runs the Zimbra Collaboration server. The problem is that the Zimbra server becomes unreachable for nearly two hours every day. In the syslog of that KVM guest I found the following errors:

Apr 10 08:28:27 mail2 kernel: [81977.835890] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Apr 10 08:28:27 mail2 kernel: [81977.921766] ata1.00: failed command: READ DMA EXT
Apr 10 08:28:27 mail2 kernel: [81977.921772] ata1.00: cmd 25/00:00:60:22:54/00:02:01:00:00/e0 tag 0 dma 262144 in
Apr 10 08:28:27 mail2 kernel: [81977.921774] res 40/00:00:00:00:00/00:00:00:00:00/e0 Emask 0x4 (timeout)
Apr 10 08:28:27 mail2 kernel: [81977.921777] ata1.00: status: { DRDY }
Apr 10 08:28:27 mail2 kernel: [81977.921848] ata1: soft resetting link
Apr 10 08:28:27 mail2 kernel: [81978.082958] ata1.01: NODEV after polling detection
Apr 10 08:28:27 mail2 kernel: [81978.083682] ata1.00: configured for MWDMA2
Apr 10 08:28:27 mail2 kernel: [81978.083688] ata1.00: device reported invalid CHS sector 0
Apr 10 08:28:27 mail2 kernel: [81978.083714] ata1: EH complete
Apr 10 08:29:39 mail2 kernel: [82050.508549] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Apr 10 08:29:39 mail2 kernel: [82050.593690] ata1.00: BMDMA stat 0x4
Apr 10 08:29:40 mail2 kernel: [82050.677816] ata1.00: failed command: READ DMA EXT
Apr 10 08:29:40 mail2 kernel: [82050.760141] ata1.00: cmd 25/00:00:60:26:54/00:02:01:00:00/e0 tag 0 dma 262144 in
Apr 10 08:29:40 mail2 kernel: [82050.760142] res 41/04:00:60:26:54/04:00:60:26:54/e0 Emask 0x1 (device error)
Apr 10 08:29:40 mail2 kernel: [82051.089726] ata1.00: status: { DRDY ERR }
Apr 10 08:29:40 mail2 kernel: [82051.172676] ata1.00: error: { ABRT }
Apr 10 08:29:40 mail2 kernel: [82051.254731] ata1.00: configured for MWDMA2
Apr 10 08:29:40 mail2 kernel: [82051.254731] ata1: EH complete

This seems to tell me that I have a problem with the hard drive, but the guest's hard drive is virtual. Is this a hint that the physical hard drive of the server has a problem? The OpenVZ containers keep running while the KVM machine is affected!
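To make the guest log easier to read, the `cmd` string of the failed READ DMA EXT can be decoded into an LBA and a transfer size. The following is only a minimal sketch: it assumes the usual libata field order (command/feature:nsect:lbal:lbam:lbah/hob fields/device) and 512-byte logical sectors, and the function name `decode_libata_cmd` is made up for illustration.

```python
def decode_libata_cmd(taskfile, sector_size=512):
    """Decode a libata 'cmd' string such as
    '25/00:00:60:22:54/00:02:01:00:00/e0' (LBA48 commands only).

    Assumed field order: command / feature:nsect:lbal:lbam:lbah /
    hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah / device.
    """
    command, low, hob, device = taskfile.split("/")
    _feature, nsect, lbal, lbam, lbah = (int(x, 16) for x in low.split(":"))
    _hfeature, hnsect, hlbal, hlbam, hlbah = (int(x, 16) for x in hob.split(":"))

    # 48-bit LBA: the "hob" (high order byte) registers carry the upper 24 bits.
    lba = (hlbah << 40) | (hlbam << 32) | (hlbal << 24) | (lbah << 16) | (lbam << 8) | lbal

    # 16-bit sector count; for LBA48 commands a count of 0 means 65536 sectors.
    sectors = ((hnsect << 8) | nsect) or 65536

    return {
        "command": int(command, 16),
        "device": int(device, 16),
        "lba": lba,
        "sectors": sectors,
        "bytes": sectors * sector_size,  # assumes 512-byte logical sectors
    }


first_failure = decode_libata_cmd("25/00:00:60:22:54/00:02:01:00:00/e0")
print(first_failure)
```

Under these assumptions the first failed command is a 512-sector read, which agrees with the `dma 262144 in` shown in the same log line (512 × 512 bytes).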

In the host's syslog I found corresponding entries:
Apr 10 08:28:28 proxmoxhost kernel: ata1: exception Emask 0x40 SAct 0x0 SErr 0x800 action 0x7
Apr 10 08:28:28 proxmoxhost kernel: ata1: SError: { HostInt }
Apr 10 08:28:28 proxmoxhost kernel: ata1: hard resetting link
Apr 10 08:28:29 proxmoxhost kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr 10 08:28:29 proxmoxhost kernel: ata1.00: configured for UDMA/133
Apr 10 08:28:29 proxmoxhost kernel: sd 0:0:0:0: [sda] Unhandled error code
Apr 10 08:28:29 proxmoxhost kernel: sd 0:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
Apr 10 08:28:29 proxmoxhost kernel: sd 0:0:0:0: [sda] CDB: Read(10): 28 00 62 56 28 36 00 01 02 00
Apr 10 08:28:29 proxmoxhost kernel: end_request: I/O error, dev sda, sector 1649813558
Apr 10 08:28:29 proxmoxhost kernel: ata1: EH complete
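As a cross-check on the host log, the `CDB: Read(10)` bytes can be decoded to confirm that they refer to the same sector the kernel reports in the `end_request` line. This is a rough sketch based on the standard 10-byte READ layout (opcode, flags, 4-byte big-endian LBA, group number, 2-byte transfer length, control); the helper name `decode_read10` is hypothetical.

```python
def decode_read10(cdb_hex):
    """Return (lba, blocks) for a SCSI READ(10) CDB given as hex bytes."""
    cdb = bytes.fromhex(cdb_hex)  # whitespace between byte pairs is ignored
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2-5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length in blocks
    return lba, blocks


lba, blocks = decode_read10("28 00 62 56 28 36 00 01 02 00")
print(lba, blocks)
```

The decoded LBA is 1649813558, the same sector named in the `end_request: I/O error, dev sda` line, so the timeout on the host belongs to a read that starts at that position on the physical disk `sda`.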

Should I replace the (or a) hard drive? It's a rented server to which I have no physical access.

Thanks in advance
tabbi