virtual disk timeout

vkhera

Sometime overnight, one of my VMs running FreeBSD 10.2 reported SCSI subsystem timeouts on all four of its virtual drives:

Code:
(da1:vtscsi0:0:0:1): UNMAP. CDB: 42 00 00 00 00 00 00 00 88 00
(da1:vtscsi0:0:0:1): CAM status: Command timeout
(da1:vtscsi0:0:0:1): Retrying command
(da2:vtscsi0:0:0:2): UNMAP. CDB: 42 00 00 00 00 00 00 00 78 00
(da2:vtscsi0:0:0:2): CAM status: Command timeout
(da2:vtscsi0:0:0:2): Retrying command
(da0:vtscsi0:0:0:0): UNMAP. CDB: 42 00 00 00 00 00 00 01 38 00
(da0:vtscsi0:0:0:0): CAM status: Command timeout
(da0:vtscsi0:0:0:0): Retrying command
(da3:vtscsi0:0:0:3): UNMAP. CDB: 42 00 00 00 00 00 00 00 e8 00
(da3:vtscsi0:0:0:3): CAM status: Command timeout
(da3:vtscsi0:0:0:3): Retrying command

Everything seems to have recovered OK, and the VM's kernel did not crash. Still, it is concerning that this happened at all.
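
Since all four timeouts are on UNMAP (TRIM) commands, my plan is to check in the guest which delete method CAM negotiated for each disk, and whether ZFS TRIM is what's issuing them. A rough sketch of what I'd run, assuming stock FreeBSD 10.x sysctl names and disks da0 through da3 as in the log:

Code:
# Which delete method CAM negotiated for each virtual disk
sysctl kern.cam.da.0.delete_method kern.cam.da.1.delete_method
sysctl kern.cam.da.2.delete_method kern.cam.da.3.delete_method

# Whether ZFS TRIM is enabled (it should be on by default in 10.x)
sysctl vfs.zfs.trim.enabled

# To stop the guest from issuing UNMAPs entirely, TRIM can be
# disabled at boot via /boot/loader.conf:
#   vfs.zfs.trim.enabled=0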

Last Friday, another of my VMs, running Switchvox (an embedded Linux-based Asterisk server), appears to have hit a similar error on an IDE virtual disk and tried to remount its file system read-only, only to fail horribly. Unfortunately, in that case I had to restore the VM, since the embedded system could not boot to single-user mode for repairs. :(

My best guess is that several of these events in the PVE server's log are related, but I really have no clue:

Code:
Sep  9 03:00:17 pve1 kernel: [2299455.789117] Large kmem_alloc(65536, 0x1000), please file an issue at:
Sep  9 03:00:17 pve1 kernel: [2299455.789117] https://github.com/zfsonlinux/zfs/issues/new
Sep  9 03:00:17 pve1 kernel: [2299455.789124] CPU: 14 PID: 4708 Comm: zvol Tainted: P           O    4.1.3-1-pve #1
Sep  9 03:00:17 pve1 kernel: [2299455.789126] Hardware name: Silicon Mechanics Rackform R308.v5/X10DRL-i, BIOS 1.1 04/09/2015
 ... followed by a lot of hex dump and call trace ...

Anyhow, has anyone else seen virtual disk timeouts like this? I'm running an up-to-date 4.0 beta 1 and using the virtio-scsi driver in my guests.
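
For context, this is roughly what the disk setup looks like on the PVE side; a sketch assuming VMID 100 and a ZFS-backed storage named local-zfs (both placeholders, not my actual values):

Code:
# On the PVE host: dump the VM's config
qm config 100

# The relevant lines for a virtio-scsi guest look like this
# (storage name, disk name, and size are illustrative):
#   scsihw: virtio-scsi-pci
#   scsi0: local-zfs:vm-100-disk-1,discard=on,size=32G

As far as I understand, with discard enabled on the disk the guest's UNMAP commands get passed down to the backing zvol, which would tie the guest-side timeouts to the host-side ZFS activity in the log above.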
 
Hi
The Large kmem_alloc issue seems to be harmless according to the ZFS on Linux bug tracker.

Do you see anything suspicious in the graphs on the Summary tab of your hardware node, especially an increase in iowait?
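
If the graphs are inconclusive, you can also watch it live from the host's shell. A quick sketch (iostat comes from the sysstat package on Debian-based hosts):

Code:
# Extended per-device utilization and wait times, every 5 seconds
iostat -x 5

# ZFS-level view of pool throughput and latency
zpool iostat 5

# The "wa" column here gives a quick overall iowait reading
vmstat 5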
 
