What is "kvm: Desc next is 3" indicative of?

davemcl

I have had another occurrence of
Jun 11 19:29:24 pve03 QEMU[2198]: kvm: Desc next is 3
on the host. This results in the following in the Windows guest event log:
Reset to device, \Device\RaidPort4, was issued.
The VM then locks up and can't write to its disks until a full reboot is made. Actually, I usually have to stop the VM because it stops responding entirely.

The error message seems to originate here

https://github.com/qemu/qemu/blob/c3f9aa8e488db330197c9217e38555f6772e8f07/hw/virtio/virtio.c#L1057

Anyone have a clue why this is happening?

I also get
Jun 09 19:09:50 pve03 QEMU[2389459]: kvm: virtio: zero sized buffers are not allowed
followed by the same behaviour in Windows guests.
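For readers unfamiliar with where these two messages come from: both are sanity checks QEMU applies while walking a guest-supplied virtio descriptor chain. The following is a simplified Python model of that walk, not QEMU's actual code. The names `Desc`, `walk_chain` and `ChainError` are made up for illustration, and only the two checks relevant here (plus a generic loop guard) are modelled.

```python
from dataclasses import dataclass
from typing import List, Optional


class ChainError(Exception):
    """Raised when this (hypothetical) device model rejects a chain."""


@dataclass
class Desc:
    length: int                 # size of the buffer this descriptor points at
    next: Optional[int] = None  # index of the next descriptor, None = end of chain


def walk_chain(table: List[Desc], head: int) -> List[int]:
    """Follow a descriptor chain starting at `head`; return the visited indices."""
    max_entries = len(table)
    visited: List[int] = []
    idx = head
    while True:
        desc = table[idx]
        # A descriptor must describe a non-empty buffer.
        if desc.length == 0:
            raise ChainError("virtio: zero sized buffers are not allowed")
        visited.append(idx)
        # Generic guard for this sketch: a chain cannot be longer than its table.
        if len(visited) > max_entries:
            raise ChainError("chain longer than table (loop?)")
        if desc.next is None:
            return visited
        # The link must stay inside the table that is being walked.
        if desc.next >= max_entries:
            raise ChainError(f"Desc next is {desc.next}")
        idx = desc.next
```

In this model, a three-entry table whose last descriptor links to index 3 raises "Desc next is 3". In other words, the message suggests the device model was handed a link pointing outside the table it was walking. Whether the bad data comes from the guest driver or from a host-side race is not something the sketch can answer.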
 
Hi,
please share the output of pveversion -v and qm config <ID> with the ID of your VM. Do these errors happen when you do something specific? Can you share a bit more of the surrounding log?
 
Code:
proxmox-ve: 7.4-1 (running kernel: 6.2.11-2-pve)
pve-manager: 7.4-13 (running version: 7.4-13/46c37d9c)
pve-kernel-6.2: 7.4-3
pve-kernel-5.15: 7.4-3
pve-kernel-6.1: 7.3-6
pve-kernel-6.2.11-2-pve: 6.2.11-2
pve-kernel-6.1.15-1-pve: 6.1.15-1
pve-kernel-5.15.107-2-pve: 5.15.107-2
pve-kernel-5.15.74-1-pve: 5.15.74-1
ceph-fuse: 15.2.17-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx4
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.4
libproxmox-backup-qemu0: 1.3.1-1
libproxmox-rs-perl: 0.2.1
libpve-access-control: 7.4.1
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.4-1
libpve-guest-common-perl: 4.2-4
libpve-http-server-perl: 4.2-3
libpve-rs-perl: 0.7.7
libpve-storage-perl: 7.4-3
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.2-2
lxcfs: 5.0.3-pve1
novnc-pve: 1.4.0-1
proxmox-backup-client: 2.4.2-1
proxmox-backup-file-restore: 2.4.2-1
proxmox-kernel-helper: 7.4-1
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.7.2
pve-cluster: 7.3-3
pve-container: 4.4-4
pve-docs: 7.4-2
pve-edk2-firmware: 3.20230228-4~bpo11+1
pve-firewall: 4.3-4
pve-firmware: 3.6-5
pve-ha-manager: 3.6.1
pve-i18n: 2.12-1
pve-qemu-kvm: 7.2.0-8
pve-xtermjs: 4.16.0-2
qemu-server: 7.4-3
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+3
vncterm: 1.7-1
zfsutils-linux: 2.1.11-pve1

Code:
agent: 1
bios: ovmf
boot: order=scsi0;net0;ide0
cores: 12
cpu: Skylake-Server
description: #### Windows Server 2016%0A#### SQL Server 2019%0A
efidisk0: SSD-R10:vm-201-disk-0,efitype=4m,format=raw,pre-enrolled-keys=1,size=528K
ide0: none,media=cdrom
machine: pc-q35-7.1
memory: 61440
meta: creation-qemu=7.1.0,ctime=1676181242
name: IRIS
net0: virtio=86:DB:0D:46:C1:58,bridge=vmbr0,firewall=1,tag=25
numa: 1
onboot: 1
ostype: win10
scsi0: SSD-R10:vm-201-disk-1,discard=on,format=raw,iothread=1,size=96G
scsi1: SSD-R10:vm-201-disk-2,discard=on,iothread=1,size=320G
scsi2: SSD-R10:vm-201-disk-3,discard=on,iothread=1,size=128G
scsi3: SSD-R10:vm-201-disk-4,discard=on,iothread=1,size=64G
scsihw: virtio-scsi-single
smbios1: uuid=973a474f-45aa-4caa-ab0e-bace1c0aa76e
sockets: 1
tags: windows
vmgenid: 62178394-12fe-4df7-ae25-839471658f30

These VMs pretty much have a host all to themselves, as SQL Server and Analysis Services are doing lots of reads/writes.
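Since the per-disk options (iothread, aio, cache, discard) are what tends to get compared when chasing this kind of problem, it can help to pull them out of `qm config` output programmatically. The sketch below is only an illustration: `parse_disks` is a made-up helper, and it assumes the `key: volume,opt=val,...` layout seen in the config above.

```python
import re
from typing import Dict

# Keys such as scsi0, virtio1, sata2, ide0 (but not scsihw or efidisk0).
DISK_KEY = re.compile(r"^(scsi|virtio|sata|ide)\d+$")


def parse_disks(config_text: str) -> Dict[str, Dict[str, str]]:
    """Return {disk key: {"volume": ..., option: value, ...}} from qm config text."""
    disks: Dict[str, Dict[str, str]] = {}
    for line in config_text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if not sep or not DISK_KEY.match(key):
            continue
        # The first comma-separated field is the volume, the rest are options.
        volume, *opts = value.strip().split(",")
        settings = {"volume": volume}
        for opt in opts:
            name, eq, val = opt.partition("=")
            if eq:
                settings[name.strip()] = val.strip()
        disks[key] = settings
    return disks
```

For example, feeding it the output of `qm config 201` would give an entry for each of scsi0 to scsi3, all with `iothread` set to `1` and `discard` set to `on`.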

The host syslog doesn't really point to anything:

Code:
Jun 09 19:06:55 pve03 pmxcfs[1672]: [status] notice: received log
Jun 09 19:07:48 pve03 pveproxy[204371]: worker exit
Jun 09 19:07:48 pve03 pveproxy[1836]: worker 204371 finished
Jun 09 19:07:48 pve03 pveproxy[1836]: starting 1 worker(s)
Jun 09 19:07:48 pve03 pveproxy[1836]: worker 267905 started
Jun 09 19:09:25 pve03 smartd[1402]: Device: /dev/bus/0 [megaraid_disk_00] [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 72 to 73
Jun 09 19:09:25 pve03 smartd[1402]: Device: /dev/bus/0 [megaraid_disk_01] [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 73 to 74
Jun 09 19:09:25 pve03 smartd[1402]: Device: /dev/bus/0 [megaraid_disk_02] [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 73 to 74
Jun 09 19:09:25 pve03 smartd[1402]: Device: /dev/bus/0 [megaraid_disk_03] [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 73 to 74
Jun 09 19:09:50 pve03 QEMU[2389459]: kvm: virtio: zero sized buffers are not allowed
Jun 09 19:14:26 pve03 pmxcfs[1672]: [dcdb] notice: data verification successful
Jun 09 19:17:01 pve03 CRON[274691]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Jun 09 19:17:01 pve03 CRON[274692]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Jun 09 19:17:01 pve03 CRON[274691]: pam_unix(cron:session): session closed for user root

I have also tried all 3 kernels: 5.15, 6.1 and 6.2.

It occurs quite randomly too: two nights in a row, then fine for a few days.

Thanks
 
Is there a straightforward way to monitor disk IO of the 4 VM drives, so I can check whether the issue coincides with periods of high IO?
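One rough option, sketched here under some assumptions, is to sample the kernel's IO counters for the VM's QEMU process from `/proc/<pid>/io`. This gives an aggregate for the whole VM, not per drive; per-drive counters should be available from the QEMU monitor (`info blockstats` via `qm monitor <ID>`). The helper names below are made up, the pid file path is the one used by Proxmox VE for its VMs, and how completely io_uring traffic is reflected in these counters is something I have not verified.

```python
import time
from pathlib import Path
from typing import Dict


def parse_proc_io(text: str) -> Dict[str, int]:
    """Parse the 'key: value' lines of /proc/<pid>/io into a dict of ints."""
    counters: Dict[str, int] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            counters[key.strip()] = int(value)
    return counters


def io_rates(before: Dict[str, int], after: Dict[str, int], seconds: float) -> Dict[str, float]:
    """Bytes per second read and written between two samples."""
    return {
        key: (after[key] - before[key]) / seconds
        for key in ("read_bytes", "write_bytes")
    }


def sample_vm(vmid: int, interval: float = 5.0) -> Dict[str, float]:
    """Take two samples for one VM (run as root on the host) and return its rates."""
    pid = int(Path(f"/var/run/qemu-server/{vmid}.pid").read_text().strip())
    io_path = Path(f"/proc/{pid}/io")
    before = parse_proc_io(io_path.read_text())
    time.sleep(interval)
    after = parse_proc_io(io_path.read_text())
    return io_rates(before, after, interval)
```

Calling `sample_vm(201)` in a loop and logging the result with a timestamp would let you line the numbers up against the time of the next QEMU error in the journal.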
 
I've searched around a bit and I found something that might be related: https://gitlab.com/qemu-project/qemu/-/commit/bbc1c327d7974261c61566cdb950cc5fa0196b41

While the symptoms described there are worse (assertion failure), it's the same code and maybe the issue manifests differently in your case. The fix is included in QEMU 8.0 which is the version used by the upcoming Proxmox VE 8. It might also land in Proxmox VE 7 as part of QEMU 7.2.3, but might take a bit of time.
 
Towards the bottom of the thread io_uring is mentioned quite a bit.
There's also a comment about avoiding io_uring on LVM, but it's from 2021.
 
Yes, there were some issues with io_uring in the beginning. And for certain storages like LVM, it's still not used as the default. But otherwise it should be stable now. If you are using iothread, IO is handled in a dedicated thread and not in the main thread, so it doesn't block virtual CPUs.
 
Got it again - does that trace confirm anything?

QEMU[1961]: kvm: virtio: zero sized buffers are not allowed

Code:
strace -c -p $(cat /var/run/qemu-server/201.pid)
strace: Process 1961 attached
^Cstrace: Process 1961 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 95.41    0.377933         330      1145           ppoll
  2.38    0.009431           2      4080           write
  1.42    0.005609           5      1098           read
  0.69    0.002730           2       998           recvmsg
  0.03    0.000138           2        53         1 futex
  0.03    0.000119           5        20           sendmsg
  0.02    0.000072           2        30           ioctl
  0.01    0.000034           8         4           close
  0.00    0.000015           3         4           accept4
  0.00    0.000010           1         8           fcntl
  0.00    0.000005           1         4           getsockname
------ ----------- ----------- --------- --------- ----------------
100.00    0.396096          53      7444         1 total

Code:
root@pve03:~# qm config 201
agent: 1
bios: ovmf
boot: order=scsi0;net0;ide0
cores: 12
cpu: Skylake-Server
description: #### Windows Server 2016%0A#### SQL Server 2019
efidisk0: SSD-R10:vm-201-disk-0,efitype=4m,format=raw,pre-enrolled-keys=1,size=528K
ide0: none,media=cdrom
machine: pc-q35-7.1
memory: 61440
meta: creation-qemu=7.1.0,ctime=1676181242
name: IRIS
net0: virtio=86:DB:0D:46:C1:58,bridge=vmbr0,firewall=1,tag=25
numa: 1
onboot: 1
ostype: win10
scsi0: SSD-R10:vm-201-disk-1,discard=on,format=raw,iothread=1,size=96G
scsi1: SSD-R10:vm-201-disk-2,discard=on,iothread=1,size=320G
scsi2: SSD-R10:vm-201-disk-3,discard=on,iothread=1,size=160G
scsi3: SSD-R10:vm-201-disk-4,discard=on,iothread=1,size=64G
scsi4: SSD-R10:vm-201-disk-5,discard=on,iothread=1,size=448G
scsihw: virtio-scsi-single
smbios1: uuid=973a474f-45aa-4caa-ab0e-bace1c0aa76e
sockets: 1
tags: ims;windows
vmgenid: 62178394-12fe-4df7-ae25-839471658f30
 
Am now on PVE8, will see how things go...
 
I managed to reproduce the "QEMU[xxxx]: kvm: Desc next is 3" error using Microsoft's DiskSpd tool.
https://github.com/microsoft/diskspd

  1. Create a Windows Server 2019 VM with 8 cores/1 socket and 12GB of RAM, using the latest VirtIO drivers (0.1.229).
    Machine = pc-q35-8.0, Controller = VirtIO SCSI Single
    My drives:
    scsi0: tank:vm-110-disk-2,discard=on,iothread=1,size=48G,ssd=1,aio=io_uring
    scsi1: tank:vm-110-disk-1,discard=on,iothread=1,size=112G,ssd=1,aio=io_uring
    Ballooning is enabled (not sure if relevant), with min/max memory both set to 12GB.
  2. Create the 2nd drive and format it with NTFS and a 64k allocation unit size.
  3. Download DiskSpd and extract the x64 binaries to C:\Windows.
  4. Create 2 test files in D:\diskspd from PowerShell:
    diskspd -c8G D:\diskspd\iotest2.dat
    diskspd -c8G D:\diskspd\iotest3.dat
  5. Fill the files with random data (run from D:\diskspd):
    diskspd.exe -w100 -Zr -d460 .\iotest2.dat
    diskspd.exe -w100 -Zr -d460 .\iotest3.dat
  6. Open a 2nd PowerShell window and run the next 2 commands concurrently:
    diskspd -b8K -d400 -Sh -L -o32 -t8 -r -w0 .\iotest2.dat
    diskspd -b64K -d400 -Sh -L -o32 -t8 -r -w0 .\iotest3.dat
If the last step completes and outputs stats, run it again.
Underlying storage on my test is a RAID10 ZFS pool on 4 x Samsung 883 SSDs (the higher the IOPS, the less often this error occurs for me).
From 5 attempts I got it to throw 4 times... YMMV
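To avoid juggling two PowerShell windows, step 6 could be driven from a small script inside the guest. This is only a sketch: it assumes Python is installed in the guest, that `diskspd` is on the PATH (step 3), and that the test files live in D:\diskspd (step 4). The function names are made up; the DiskSpd flags are copied unchanged from step 6.

```python
import subprocess
from typing import List

# Flags shared by both runs in step 6.
COMMON = ["-d400", "-Sh", "-L", "-o32", "-t8", "-r", "-w0"]


def build_commands(workdir: str = r"D:\diskspd") -> List[List[str]]:
    """Return the two read-only DiskSpd command lines from step 6."""
    return [
        ["diskspd", "-b8K", *COMMON, rf"{workdir}\iotest2.dat"],
        ["diskspd", "-b64K", *COMMON, rf"{workdir}\iotest3.dat"],
    ]


def run_round(commands: List[List[str]]) -> List[int]:
    """Start all commands at once, wait for them, and return their exit codes."""
    procs = [subprocess.Popen(cmd) for cmd in commands]
    return [p.wait() for p in procs]


def run_rounds(rounds: int = 5) -> None:
    """Repeat the concurrent run, printing the exit codes of each round."""
    for n in range(1, rounds + 1):
        codes = run_round(build_commands())
        print(f"round {n}: exit codes {codes}")
```

Calling `run_rounds()` repeats the test five times. If the guest's disks lock up as described, the script will simply hang, so the host journal is still the place to check for the QEMU message.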
 
Hi,
I have a similar problem with a Windows Server 2022 Standard VM running SQL on Proxmox. It reports the following error in the event log and corrupts random databases: "Reset to device, \Device\RaidPort2, was issued."

Code:
root@pve:~# qm config 102
agent: 1
bios: ovmf
boot: order=scsi0;net0;ide0
cores: 4
cpu: host
efidisk0: local-lvm:vm-102-disk-0,efitype=4m,pre-enrolled-keys=1,size=4M
machine: pc-q35-8.0
memory: 32768
meta: creation-qemu=7.1.0,ctime=1673047081
name: db1
net0: virtio=62:CE:10:0B:E8:EC,bridge=vmbr0,firewall=1
numa: 0
onboot: 1
ostype: win11
parent: CzteryBazyuUzkodzone
scsi0: local-lvm:vm-102-disk-1,cache=writeback,discard=on,size=150G
scsi1: local-lvm:vm-102-disk-2,cache=writeback,discard=on,size=300G
scsi2: local-lvm:vm-102-disk-3,cache=writeback,discard=on,size=400G
scsihw: virtio-scsi-single
smbios1: uuid=57f34b1c-b9a6-41fb-aa49-fd44afaec0ae
sockets: 1
startup: order=2,up=5
tpmstate0: local-lvm:vm-102-disk-4,size=4M,version=v2.0
usb0: host=090c:1000
usb1: host=058f:6387
usb2: host=096e:0201
vmgenid: 838f76d0-34e9-4faf-8a73-27a4de3f3b66
root@pve:~#

Code:
root@pve:~#
proxmox-ve: 8.0.2 (running kernel: 6.2.16-19-pve)
pve-manager: 8.0.4 (running version: 8.0.4/d258a813cfa6b390)
pve-kernel-6.2: 8.0.5
proxmox-kernel-helper: 8.0.3
proxmox-kernel-6.2.16-19-pve: 6.2.16-19
proxmox-kernel-6.2: 6.2.16-19
pve-kernel-6.2.16-3-pve: 6.2.16-3
ceph-fuse: 17.2.6-pve1+3
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx5
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.4.6
libproxmox-backup-qemu0: 1.4.0
libproxmox-rs-perl: 0.3.1
libpve-access-control: 8.0.5
libpve-apiclient-perl: 3.3.1
libpve-common-perl: 8.0.9
libpve-guest-common-perl: 5.0.5
libpve-http-server-perl: 5.0.4
libpve-rs-perl: 0.8.5
libpve-storage-perl: 8.0.2
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 5.0.2-4
lxcfs: 5.0.3-pve3
novnc-pve: 1.4.0-2
proxmox-backup-client: 3.0.4-1
proxmox-backup-file-restore: 3.0.4-1
proxmox-kernel-helper: 8.0.3
proxmox-mail-forward: 0.2.0
proxmox-mini-journalreader: 1.4.0
proxmox-widget-toolkit: 4.0.9
pve-cluster: 8.0.4
pve-container: 5.0.5
pve-docs: 8.0.5
pve-edk2-firmware: 3.20230228-4
pve-firewall: 5.0.3
pve-firmware: 3.8-3
pve-ha-manager: 4.0.2
pve-i18n: 3.0.7
pve-qemu-kvm: 8.0.2-7
pve-xtermjs: 4.16.0-3
qemu-server: 8.0.7
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.1.13-pve1

Can someone help me with what to do and where to look for the problem?
Thanks!
 
I've been playing with dropping back the machine type from
pc-q35-8.0
to
pc-q35-5.1
I have done 5 test runs with diskspd and haven't seen this or similar issues occur.
On boot I only had a C: drive. The other drives had to be re-added to the OS via "Disk Management", and you'll need to do a bit of cleanup in Device Manager too.
I'm assuming this would trigger a license re-activation too.
 
Thanks for your reply! We have the same problem. Did you use local storage on ext4? Have you tested on LVM storage?
 
In production it's RAID10 / LVM-Thin.
The testing I did today was on a mirrored pair of SATA SSDs on ZFS.
 
Hello. I'm new to the forum and also new to Proxmox (I used to work with VMware).

I have a brand new server with PVE 8.0.4, and I'm experiencing this same error, on a Windows Server 2022 VM:

Reset to device, \Device\RaidPort0, was issued.

But the source is the storahci driver.

Also, I have a lot of errors from ESENT:

svchost (8068,D,0) SoftwareUsageMetrics-Svc: A request to write to the file "C:\Windows\system32\LogFiles\Sum\Svc.log" at offset 524288 (0x0000000000080000) for 4096 (0x00001000) bytes succeeded, but took an abnormally long time (71 seconds) to be serviced by the OS. In addition, 0 other I/O requests to this file have also taken an abnormally long time to be serviced since the last message regarding this problem was posted 94 seconds ago. This problem is likely due to faulty hardware. Please contact your hardware vendor for further assistance diagnosing the problem.

I have 24 TB of RAID 5 storage on a hardware RAID card, a MegaRAID SAS 9341-4i.

The storage in PVE is set up as LVM.

I'm worried because this is a brand new server, and I'm migrating the domain controller and the file server, from VMware.

Is this something with the drivers?
 
