Sometime overnight, one of my VMs running FreeBSD 10.2 reported SCSI subsystem timeouts on all four of its virtual drives:
Code:
(da1:vtscsi0:0:0:1): UNMAP. CDB: 42 00 00 00 00 00 00 00 88 00
(da1:vtscsi0:0:0:1): CAM status: Command timeout
(da1:vtscsi0:0:0:1): Retrying command
(da2:vtscsi0:0:0:2): UNMAP. CDB: 42 00 00 00 00 00 00 00 78 00
(da2:vtscsi0:0:0:2): CAM status: Command timeout
(da2:vtscsi0:0:0:2): Retrying command
(da0:vtscsi0:0:0:0): UNMAP. CDB: 42 00 00 00 00 00 00 01 38 00
(da0:vtscsi0:0:0:0): CAM status: Command timeout
(da0:vtscsi0:0:0:0): Retrying command
(da3:vtscsi0:0:0:3): UNMAP. CDB: 42 00 00 00 00 00 00 00 e8 00
(da3:vtscsi0:0:0:3): CAM status: Command timeout
(da3:vtscsi0:0:0:3): Retrying command
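In case it helps anyone reading the log above: as far as I understand the SCSI UNMAP command, opcode 0x42 is a 10-byte CDB whose bytes 7-8 hold the parameter list length, and that list is an 8-byte header followed by 16-byte block descriptors. Below is a rough Python sketch that decodes the lines on that assumption. The function name `decode_unmap` and the regex are my own, not part of any FreeBSD or Proxmox tool.

```python
import re

UNMAP_OPCODE = 0x42
HEADER_LEN = 8       # assumed size of the UNMAP parameter list header, in bytes
DESCRIPTOR_LEN = 16  # assumed size of each UNMAP block descriptor, in bytes

# Matches lines such as:
# (da1:vtscsi0:0:0:1): UNMAP. CDB: 42 00 00 00 00 00 00 00 88 00
LOG_LINE = re.compile(
    r"\((?P<dev>\w+):[^)]*\): UNMAP\. CDB: (?P<cdb>(?:[0-9a-fA-F]{2} ?)+)"
)


def decode_unmap(line):
    """Return (device, parameter_list_length, descriptor_count) or None."""
    match = LOG_LINE.search(line)
    if match is None:
        return None
    cdb = bytes.fromhex(match.group("cdb"))
    if len(cdb) != 10 or cdb[0] != UNMAP_OPCODE:
        return None
    # Bytes 7-8 of the CDB are the big-endian parameter list length.
    param_len = int.from_bytes(cdb[7:9], "big")
    descriptors = max(param_len - HEADER_LEN, 0) // DESCRIPTOR_LEN
    return match.group("dev"), param_len, descriptors


if __name__ == "__main__":
    sample = [
        "(da1:vtscsi0:0:0:1): UNMAP. CDB: 42 00 00 00 00 00 00 00 88 00",
        "(da0:vtscsi0:0:0:0): UNMAP. CDB: 42 00 00 00 00 00 00 01 38 00",
    ]
    for entry in sample:
        print(decode_unmap(entry))
```

If that reading is right, each timed-out command was a TRIM/UNMAP request covering a handful of extents, not a regular read or write.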
Everything seems to have recovered OK, and the VM's kernel did not crash. Still, it concerns me that this happened at all.
Last Friday, another of my VMs, running Switchvox (an embedded Linux-based Asterisk server), seems to have hit a similar error on an IDE virtual disk. It tried to remount the file system read-only and failed horribly. Unfortunately, in that case I had to restore the VM from backup, because the embedded system could not boot to single-user mode for repairs.
My best guess is that these events, several of which appear in the PVE server's log, are related, but I really have no clue:
Code:
Sep 9 03:00:17 pve1 kernel: [2299455.789117] Large kmem_alloc(65536, 0x1000), please file an issue at:
Sep 9 03:00:17 pve1 kernel: [2299455.789117] https://github.com/zfsonlinux/zfs/issues/new
Sep 9 03:00:17 pve1 kernel: [2299455.789124] CPU: 14 PID: 4708 Comm: zvol Tainted: P O 4.1.3-1-pve #1
Sep 9 03:00:17 pve1 kernel: [2299455.789126] Hardware name: Silicon Mechanics Rackform R308.v5/X10DRL-i, BIOS 1.1 04/09/2015
... followed by a lot of hex dump and call trace ...
Anyhow, has anyone else seen virtual disk timeouts like this? I'm running PVE 4.0 beta 1 with current updates, and the guest uses the virtio-scsi drivers.