virtual disk timeout

vkhera · Sep 9, 2015

Sometime overnight, one of my VMs running FreeBSD 10.2 reported SCSI subsystem timeouts on all four of its virtual drives:

Code:

(da1:vtscsi0:0:0:1): UNMAP. CDB: 42 00 00 00 00 00 00 00 88 00
(da1:vtscsi0:0:0:1): CAM status: Command timeout
(da1:vtscsi0:0:0:1): Retrying command
(da2:vtscsi0:0:0:2): UNMAP. CDB: 42 00 00 00 00 00 00 00 78 00
(da2:vtscsi0:0:0:2): CAM status: Command timeout
(da2:vtscsi0:0:0:2): Retrying command
(da0:vtscsi0:0:0:0): UNMAP. CDB: 42 00 00 00 00 00 00 01 38 00
(da0:vtscsi0:0:0:0): CAM status: Command timeout
(da0:vtscsi0:0:0:0): Retrying command
(da3:vtscsi0:0:0:3): UNMAP. CDB: 42 00 00 00 00 00 00 00 e8 00
(da3:vtscsi0:0:0:3): CAM status: Command timeout
(da3:vtscsi0:0:0:3): Retrying command
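For anyone reading the CDB bytes in these log lines, here is a minimal sketch of how they could be decoded. It assumes the standard 10-byte SBC-3 UNMAP layout (opcode 0x42, ANCHOR bit in byte 1, group number in byte 6, big-endian parameter list length in bytes 7 and 8) and a parameter list made of an 8-byte header followed by 16-byte block descriptors. The function name `decode_unmap_cdb` is illustrative and not part of any FreeBSD tool.

```python
def decode_unmap_cdb(cdb_hex: str) -> dict:
    """Decode a 10-byte UNMAP CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x42:
        raise ValueError("not a 10-byte UNMAP CDB (opcode 0x42)")

    # Bytes 7-8: length in bytes of the parameter list sent with the command.
    param_len = int.from_bytes(cdb[7:9], "big")

    # Assumed layout: 8-byte header, then one 16-byte descriptor per LBA range.
    descriptors = max(param_len - 8, 0) // 16

    return {
        "anchor": bool(cdb[1] & 0x01),
        "group_number": cdb[6] & 0x1F,
        "param_list_length": param_len,
        "block_descriptors": descriptors,
    }


# CDB strings copied from the da1 and da0 log lines.
da1 = decode_unmap_cdb("42 00 00 00 00 00 00 00 88 00")
da0 = decode_unmap_cdb("42 00 00 00 00 00 00 01 38 00")
```

Under those assumptions, each timed-out command was a TRIM/UNMAP request covering several LBA ranges, not a read or a write. That may help narrow down which guest activity was running when the timeouts occurred.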

It seems everything recovered OK, and the VM's kernel did not crash. However, it concerns me that this happened at all.

Last Friday, one of my other VMs, running Switchvox (an embedded Linux-based Asterisk server), seems to have hit a similar error with an IDE virtual disk. It tried to remount the file system read-only and failed horribly. Unfortunately, in that case I had to restore the VM, since the embedded system could not boot to single-user mode for repairs.

My best guess is that several events in the PVE server's log are related, but I really have no clue:

Code:

Sep  9 03:00:17 pve1 kernel: [2299455.789117] Large kmem_alloc(65536, 0x1000), please file an issue at:
Sep  9 03:00:17 pve1 kernel: [2299455.789117] https://github.com/zfsonlinux/zfs/issues/new
Sep  9 03:00:17 pve1 kernel: [2299455.789124] CPU: 14 PID: 4708 Comm: zvol Tainted: P           O    4.1.3-1-pve #1
Sep  9 03:00:17 pve1 kernel: [2299455.789126] Hardware name: Silicon Mechanics Rackform R308.v5/X10DRL-i, BIOS 1.1 04/09/2015
 ... followed by a lot of hex dump and call trace ...

Anyhow, has anyone else seen virtual disk timeouts like this? I'm running 4.0 beta 1, fully updated, and using the virtio-scsi drivers in my guest.

manu · Sep 10, 2015

Hi
The Large kmem_alloc issue seems to be harmless according to the ZFS on Linux bug tracker.

Do you see anything suspicious in the graphs provided in the Summary tab of your hardware node, especially an increase in iowait?
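If the graphs are not handy, iowait can also be estimated directly on the host. The following is a minimal sketch, assuming a Linux host where the first line of `/proc/stat` holds the aggregate counters in the order user, nice, system, idle, iowait, irq, softirq, steal. The helper names `read_cpu_line` and `iowait_percent` are illustrative and not part of any Proxmox tool.

```python
def read_cpu_line(path: str = "/proc/stat") -> str:
    """Return the aggregate 'cpu' line from /proc/stat (Linux only)."""
    with open(path) as f:
        return f.readline()


def iowait_percent(before: str, after: str) -> float:
    """Share of CPU time spent in iowait between two 'cpu' line samples."""
    def fields(line: str) -> list:
        parts = line.split()
        if not parts or parts[0] != "cpu":
            raise ValueError("expected the aggregate 'cpu' line")
        return [int(x) for x in parts[1:]]

    deltas = [b - a for a, b in zip(fields(before), fields(after))]
    # Sum user..steal only; guest time is already included in user/nice.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total  # index 4 is the iowait counter


# Worked example with made-up counter values (not real measurements).
sample = iowait_percent(
    "cpu 100 0 50 800 50 0 0 0 0 0",
    "cpu 200 0 100 1500 200 0 0 0 0 0",
)
```

On a live host, one would call `read_cpu_line()` twice a few seconds apart and pass both samples to `iowait_percent`. A value that stays high while the guests report timeouts would point at storage contention on the node.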

vkhera · Sep 11, 2015

Is there a way to scroll back time on the graphs? I cannot see an obvious way.
