Proxmox 4.4 virtio_scsi regression.

could you post the output of "sg_inq /dev/XYZ" for each of those combinations after installing "sg3-utils"?
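on Debian-based systems, something along these lines should cover installing the tool and querying all disks in one go (a convenience sketch only; adjust the device glob to whatever your setup uses):

Code:
apt-get install sg3-utils
# print the standard INQUIRY data for every /dev/sd* disk
for d in /dev/sd?; do echo "== $d =="; sg_inq "$d"; done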

Here's some output for the SATA controller - SATA disk combination:

Code:
root@omv3-kvm:~# sg_inq /dev/sda
standard INQUIRY:
  PQual=0  Device_type=0  RMB=1  LU_CONG=0  version=0x05  [SPC-3]
  [AERC=0]  [TrmTsk=0]  NormACA=0  HiSUP=0  Resp_data_format=2
  SCCS=0  ACC=0  TPGS=0  3PC=0  Protect=0  [BQue=0]
  EncServ=0  MultiP=0  [MChngr=0]  [ACKREQQ=0]  Addr16=0
  [RelAdr=0]  WBus16=0  Sync=0  [Linked=0]  [TranDis=0]  CmdQue=0
  [SPI: Clocking=0x0  QAS=0  IUS=0]
    length=96 (0x60)   Peripheral device type: disk
Vendor identification: ATA
Product identification: WDC WD20EARX-00P
Product revision level: AB51
Unit serial number:      WD-WCAZA9734804
root@omv3-kvm:~# sg_inq /dev/sdb
standard INQUIRY:
  PQual=0  Device_type=0  RMB=1  LU_CONG=0  version=0x05  [SPC-3]
  [AERC=0]  [TrmTsk=0]  NormACA=0  HiSUP=0  Resp_data_format=2
  SCCS=0  ACC=0  TPGS=0  3PC=0  Protect=0  [BQue=0]
  EncServ=0  MultiP=0  [MChngr=0]  [ACKREQQ=0]  Addr16=0
  [RelAdr=0]  WBus16=0  Sync=0  [Linked=0]  [TranDis=0]  CmdQue=0
  [SPI: Clocking=0x0  QAS=0  IUS=0]
    length=96 (0x60)   Peripheral device type: disk
Vendor identification: ATA
Product identification: ST32000644NS
Product revision level: SN11
Unit serial number:             9WM1PNPB
root@omv3-kvm:~# sg_inq /dev/sdc
standard INQUIRY:
  PQual=0  Device_type=0  RMB=1  LU_CONG=0  version=0x05  [SPC-3]
  [AERC=0]  [TrmTsk=0]  NormACA=0  HiSUP=0  Resp_data_format=2
  SCCS=0  ACC=0  TPGS=0  3PC=0  Protect=0  [BQue=0]
  EncServ=0  MultiP=0  [MChngr=0]  [ACKREQQ=0]  Addr16=0
  [RelAdr=0]  WBus16=0  Sync=0  [Linked=0]  [TranDis=0]  CmdQue=0
  [SPI: Clocking=0x0  QAS=0  IUS=0]
    length=96 (0x60)   Peripheral device type: disk
Vendor identification: ATA
Product identification: WDC WD20EARS-00J
Product revision level: 0A80
Unit serial number:      WD-WCAYY0101692
root@omv3-kvm:~# sg_inq /dev/sdd
standard INQUIRY:
  PQual=0  Device_type=0  RMB=1  LU_CONG=0  version=0x05  [SPC-3]
  [AERC=0]  [TrmTsk=0]  NormACA=0  HiSUP=0  Resp_data_format=2
  SCCS=0  ACC=0  TPGS=0  3PC=0  Protect=0  [BQue=0]
  EncServ=0  MultiP=0  [MChngr=0]  [ACKREQQ=0]  Addr16=0
  [RelAdr=0]  WBus16=0  Sync=0  [Linked=0]  [TranDis=0]  CmdQue=0
  [SPI: Clocking=0x0  QAS=0  IUS=0]
    length=96 (0x60)   Peripheral device type: disk
Vendor identification: ATA
Product identification: ST3000DM001-9YN1
Product revision level: CC9E
Unit serial number:             Z1F0LGFR

I might be able to get some SAS controller - SATA disk results later.
 
no, but I would be interested in whether the problem goes away if you do

Code:
echo "madvise" >  /sys/kernel/mm/transparent_hugepage/enabled

before starting the VM.
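to double-check that the setting took effect, just read the file back - the active value is shown in brackets:

Code:
cat /sys/kernel/mm/transparent_hugepage/enabled
# with madvise active this prints: always [madvise] never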
hi,

with this change it seems my problem is fixed, it no longer has read/write errors


Edit:

Still having read/write errors.
 
Is there any update on this issue? About to deploy Rockstor on Proxmox.

please read the whole thread. updated packages are available on pvetest and will move to the regular repositories soon.
 
yes - but confirmation from more systems is always a good idea.

the situation is as follows:
  • since qemu 2.7, scsi-block uses SG_IO to talk to pass-through disks
  • this can cause issues (failing reads and/or writes) if the hypervisor host has very low free memory or very highly fragmented memory (or both) - see the check sketched below
  • this was worsened by PVE's kernel defaulting to disabling transparent huge pages (small pages => more fragmentation)
there are two countermeasures we will release this week:
  • default to scsi-hd (which is not full pass-through) instead of scsi-block for pass-through, with the possibility to "opt in" to the old behaviour with all the associated risk (until further notice)
  • enable transparent huge pages for programs explicitly requesting them, such as Qemu (to decrease the risk of running into the issue when using scsi-block)
there is unfortunately no upstream fix in sight - we'll investigate further this week to look for more complete solutions, but the above should minimize the risk for now.
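if you want a rough indication of whether a host is in that state, the kernel exposes per-order free-page counts (columns typically run from order 0 up to order 10 on x86_64); consistently low counts in the right-hand columns mean memory is fragmented:

Code:
# free pages per allocation order, per zone - low numbers on the right = fragmented
cat /proc/buddyinfo
# overall free memory on the host
free -m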

I was not aware of the fixes you guys are making (or they were not communicated properly).
From your statement I gather that the fix is essentially making sure that I won't use virtio_scsi, but the less capable and slower virtio_blk (I know it's still virtio_scsi, but it will mask it behind a "file-like" emulation that is the domain of virtio_blk). So this is not a fix per se - it's making sure that the non-functioning driver is not used, at the expense of performance.

Essentially I will try to get a spare server soon; for the time being I can't stop the production servers.

@wbumiller
Don't take this as me picking on you, but your theory is in my case slightly off (unless you mean _specifically_ mdraid). I get corruption on a system where there is:
- 48GB of RAM, and only 1 VM created with 20GB allocated,
- not a single process running on the VM - of course normally there would be a lot running, but while reproducing the error nothing was running, so I could see errors in syslog with nothing writing to / reading from the drive (the drive just sits there with no FS and I still get errors - see the sketch below)
- I don't use swap space - I despise it - so there is none on the host or the guest either.
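for reference, errors like these can be watched for (and raw reads driven against the bare device, no filesystem involved) roughly like this - the device name is just an example:

Code:
# one shell: follow kernel messages and filter for I/O errors
dmesg -wT | grep -i error
# another shell: read directly from the pass-through disk, bypassing the page cache
dd if=/dev/sdb of=/dev/null bs=1M count=4096 iflag=direct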
 

the following two fixes were released:
  • qemu-server >= 4.0-106: only use direct pass-through via scsi-block when explicitly requested, use scsi-hd by default
  • pve-kernel-4.4.35-2-pve >= 4.4.35-79: enable transparent huge pages for programs explicitly requesting it via madvise
the first one means the issue does not occur anymore when running the default configuration, but you lose the full pass-through (which should only be relevant if you really need to issue raw SCSI / ATA commands to the devices). the second one makes the issue less likely to occur when using full pass-through with scsi-block (because it reduces memory fragmentation, which seems to be the root cause).
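to verify which device type a given VM actually ends up with, you can inspect the generated KVM command line (100 is just an example VMID):

Code:
qm showcmd 100 | grep -oE 'scsi-(hd|block)'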

neither of the changes has anything to do with switching from virtio-scsi to virtio-blk. hope this clears the situation up a bit - if you have more questions, feel free to ask!
 
