What is "kvm: Desc next is 3" indicative of?

Hello. I'm new to the forum, and also new to Proxmox (I used to work with VMware).

I have a brand-new server with PVE 8.0.4, and I'm experiencing the same error on a Windows Server 2022 VM:

Reset to device, \Device\RaidPort0, was issued.

But in my case the source is the storahci driver.

Also, I get a lot of errors from ESENT:

svchost (8068,D,0) SoftwareUsageMetrics-Svc: A request to write to the file "C:\Windows\system32\LogFiles\Sum\Svc.log" at offset 524288 (0x0000000000080000) for 4096 (0x00001000) bytes succeeded, but took an abnormally long time (71 seconds) to be serviced by the OS. In addition, 0 other I/O requests to this file have also taken an abnormally long time to be serviced since the last message regarding this problem was posted 94 seconds ago. This problem is likely due to faulty hardware. Please contact your hardware vendor for further assistance diagnosing the problem.

I have 24 TB of RAID 5 storage on a MegaRAID SAS 9341-4i hardware RAID card.

The storage in PVE is set as LVM.

I'm worried because this is a brand-new server, and I'm migrating the domain controller and the file server from VMware.

Is this related to the drivers?

Try rolling back the machine type to 5.1. Try it on a test VM first, though, as you'll need to do a bit of re-configuration.
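In case it helps, here is a rough sketch of how that rollback could be done from the CLI (VM ID 100 is just a placeholder, and the version shown assumes a q35 machine; the same change can be made in the GUI under Hardware > Machine, and it only takes effect after a full power-off and start):

Code:
# power the VM off first, then pin an older machine version (pc-q35-5.1 as an example)
qm set 100 --machine pc-q35-5.1
# verify the change
qm config 100 | grep machine
# start the VM again; Windows may re-detect some devices after the machine type change
qm start 100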
 
We have been seeing the same problem for 4 weeks, but when we look in the logs, it started at the end of November 2023.
That was when we updated to PVE 8.1. We run the VM disks on Ceph. The problematic VMs are MySQL (Linux) and Microsoft SQL Server (Windows) machines.
Last night, for the first time, we had data corruption on the Microsoft SQL Server because of this. All the machines are running on q35-8.0 or q35-8.1.

I will try reverting to q35-7.2 and see what happens.
 
OK, I can give some more info.
Reverting back to machine version q35-7.2 didn't work for us.
(We didn't try 5.2.)

But right now we have had a stable system for 1 week. (In the last month they crashed every 2 days.)
We achieved this by deactivating "IO thread" on the high-I/O VMs.
(q35-8.1)
 
We have been plagued by this issue since PVE 7.2, more than 2 years ago... Did disabling iothread resolve this problem consistently? We had a stable system with iothread enabled for a few months; now it has started to freeze I/O again, daily. (We are using Ceph with NVMe disks.)
 
I can only say that it has for us, for 3 weeks now.
We only disabled iothread on our database servers, but those are also the only VMs that crashed every 2 days in the last 2 months.

I will update this post if one of our database servers crashes again. Until then, it is working for us.
 
Thanks! This is my current VM Config (qm config 104):

Code:
agent: 1
balloon: 0
boot: order=scsi0
cores: 6
cpu: host
description: MSSQL
hotplug: 0
machine: pc-q35-6.2
memory: 32768
meta: creation-qemu=6.2.0,ctime=1657105335
name: DB01
net0: virtio=76:90:CE:B7:21:69,bridge=vmbr4003
numa: 0
onboot: 1
ostype: win11
scsi0: ceph_storage:vm-104-disk-0,aio=native,discard=on,size=100G
scsi1: ceph_storage:vm-104-disk-1,aio=native,discard=on,size=1005G
scsi2: ceph_storage:vm-104-disk-2,aio=native,discard=on,size=1000G
scsi3: ceph_storage:vm-104-disk-3,aio=native,discard=on,size=950G
scsihw: virtio-scsi-single
smbios1: uuid=5829521f-0c23-431c-9806-c7ee8aab6694
sockets: 1
startup: order=3,up=60,down=30
vmgenid: 1c6d5ca2-862b-49c5-848d-453cbe000164

I've now disabled IOthread as well!
 
We also had this error for the first time last weekend, with the same symptoms as described here in the thread.

Eventlog:
Code:
Warning        31.03.2024 23:51:17        vioscsi        129
Reset to device "\Device\RaidPort2" was issued.

Syslog:
Code:
Mar 31 23:51:17 pve2 QEMU[18196]: kvm: Desc next is 12

VM config:
Code:
agent: 1,freeze-fs-on-backup=0
bios: ovmf
boot: order=ide0;scsi0
cores: 2
cpu: host
efidisk0: pool01:vm-112-disk-0,efitype=4m,pre-enrolled-keys=1,size=528K
ide0: none,media=cdrom
machine: pc-q35-7.2
memory: 65536
meta: creation-qemu=7.2.0,ctime=1681376113
name: db02
net0: virtio=96:88:2D:xx:yy:zz,bridge=vmbr0,tag=xx
numa: 0
onboot: 1
ostype: win11
scsi0: pool01:vm-112-disk-1,discard=on,iothread=1,size=100G,ssd=1
scsi1: pool01:vm-112-disk-3,discard=on,iothread=1,size=300G
scsi2: pool01:vm-112-disk-4,discard=on,iothread=1,size=200G
scsi3: pool01:vm-112-disk-5,iothread=1,size=300G
scsihw: virtio-scsi-single
sockets: 2
tags: windows
tpmstate0: pool01:vm-112-disk-2,size=4M,version=v2.0

pveversion -v
Code:
proxmox-ve: 7.4-1 (running kernel: 5.15.131-2-pve)
pve-manager: 7.4-17 (running version: 7.4-17/513c62be)
pve-kernel-5.15: 7.4-9
pve-kernel-5.15.131-2-pve: 5.15.131-3
pve-kernel-5.15.108-1-pve: 5.15.108-2
pve-kernel-5.15.74-1-pve: 5.15.74-1
ceph: 17.2.7-pve2~bpo11+1
ceph-fuse: 17.2.7-pve2~bpo11+1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx4
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.4
libproxmox-backup-qemu0: 1.3.1-1
libproxmox-rs-perl: 0.2.1
libpve-access-control: 7.4.1
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.4-2
libpve-guest-common-perl: 4.2-4
libpve-http-server-perl: 4.2-3
libpve-rs-perl: 0.7.7
libpve-storage-perl: 7.4-3
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.2-2
lxcfs: 5.0.3-pve1
novnc-pve: 1.4.0-1
proxmox-backup-client: 2.4.6-1
proxmox-backup-file-restore: 2.4.6-1
proxmox-kernel-helper: 7.4-1
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.7.3
pve-cluster: 7.3-3
pve-container: 4.4-6
pve-docs: 7.4-2
pve-edk2-firmware: 3.20230228-4~bpo11+2
pve-firewall: 4.3-5
pve-firmware: 3.6-6
pve-ha-manager: 3.6.1
pve-i18n: 2.12-1
pve-qemu-kvm: 7.2.0-8
pve-xtermjs: 4.16.0-2
qemu-server: 7.4-4
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+3
vncterm: 1.7-1
zfsutils-linux: 2.1.14-pve1

We have a 5-node cluster with Ceph running on 4 NVMe drives per node.

We will continue to monitor this for the time being, as it has only occurred once so far.
If it occurs repeatedly or more frequently, we will first switch the vDisks to iothread=0 and see if the problem disappears.
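For reference, switching a disk to iothread=0 from the CLI could look roughly like this, using scsi0 of VM 112 above as an example (the disk has to be re-declared with its existing options, and as far as I know the VM needs a full shutdown and start, not just a guest reboot, for the change to take effect):

Code:
# re-declare scsi0 of VM 112 with the IO thread disabled, keeping the other options
qm set 112 --scsi0 pool01:vm-112-disk-1,discard=on,iothread=0,size=100G,ssd=1
# repeat for scsi1..scsi3, then power-cycle the VM
qm shutdown 112 && qm start 112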
 
We have exactly the same errors on two new Proxmox 8 servers:

First server:
96 x AMD EPYC 9454P 48-Core Processor (1 Socket), 384 GB RAM, 2 x INTEL SSDPF2KX038T1

Windows Server 2019 + MSSQL 2019

Code:
pveversion -v

proxmox-ve: 8.1.0 (running kernel: 6.5.13-3-pve)
pve-manager: 8.1.10 (running version: 8.1.10/4b06efb5db453f29)
proxmox-kernel-helper: 8.1.0
proxmox-kernel-6.5.13-3-pve-signed: 6.5.13-3
proxmox-kernel-6.5: 6.5.13-3
ceph-fuse: 16.2.11+ds-2
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown: residual config
ifupdown2: 3.2.0-1+pmx8
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.0
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.3
libpve-access-control: 8.1.3
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.0.5
libpve-cluster-perl: 8.0.5
libpve-common-perl: 8.1.1
libpve-guest-common-perl: 5.0.6
libpve-http-server-perl: 5.0.6
libpve-rs-perl: 0.8.8
libpve-storage-perl: 8.1.4
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 5.0.2-4
lxcfs: 5.0.3-pve4
novnc-pve: 1.4.0-3
proxmox-backup-client: 3.1.5-1
proxmox-backup-file-restore: 3.1.5-1
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.2.3
proxmox-mini-journalreader: 1.4.0
proxmox-widget-toolkit: 4.1.5
pve-cluster: 8.0.5
pve-container: 5.0.9
pve-docs: 8.1.5
pve-edk2-firmware: not correctly installed
pve-firewall: 5.0.3
pve-firmware: 3.9-2
pve-ha-manager: 4.0.3
pve-i18n: 3.2.1
pve-qemu-kvm: 8.1.5-4
pve-xtermjs: 5.3.0-3
qemu-server: 8.1.1
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.3-pve1

Code:
qm config 110

agent: 1
balloon: 0
bios: ovmf
boot: order=scsi0
cores: 40
cpu: host
efidisk0: system:vm-110-disk-1,efitype=4m,pre-enrolled-keys=1,size=1M
hotplug: disk,network
machine: pc-q35-7.2
memory: 131072
name: pc010
net0: virtio=<hided>,bridge=vmbr1,queues=2
numa: 0
onboot: 1
ostype: win10
scsi0: system:vm-110-disk-0,discard=on,iothread=1,replicate=0,size=200G
scsi1: data:vm-110-disk-0,discard=on,replicate=0,size=1000G
scsi2: swap:vm-110-disk-3,discard=on,replicate=0,size=300G
scsi3: swap:vm-110-disk-4,discard=on,iothread=1,replicate=0,size=500G
scsihw: virtio-scsi-single
smbios1: uuid=<hided>
sockets: 1
startup: order=4
tablet: 0
tpmstate0: system:vm-110-disk-2,size=4M,version=v2.0
vga: qxl
vmgenid: d2c00e94-6810-4994-8001-6f95939d5081

It has been 3 days since the upgrade from Proxmox 7.4 on a 32 x AMD Ryzen 9 5950X 16-Core Processor (1 socket) host, where it worked stably for months without any event 129 errors. Now it hangs very often (3 to 4 times a day), and hangs without any possibility to reboot or shut down from the guest OS. It hangs only on the disks where MSSQL stores and works with its databases (data:vm-110-disk-0,discard=on,replicate=0,size=1000G) and tempdb (swap:vm-110-disk-3,discard=on,replicate=0,size=300G).
I have tried changing the machine type to pc-q35-8.1, but it seems that with 7.2 and iothread=0 for the MSSQL database and tempdb disks it works more stably.

Second server:

48 x Intel(R) Xeon(R) Gold 5412U (1 Socket), 256 GB RAM, 2 x SAMSUNG MZQL21T9HCJR-00A07

Windows Server 2022 + MSSQL 2019

Code:
pveversion -v

proxmox-ve: 8.1.0 (running kernel: 6.5.11-4-pve)
pve-manager: 8.1.8 (running version: 8.1.8/d29041d9f87575d0)
proxmox-kernel-helper: 8.1.0
pve-kernel-6.2: 8.0.5
proxmox-kernel-6.5.13-3-pve-signed: 6.5.13-3
proxmox-kernel-6.5: 6.5.13-3
proxmox-kernel-6.5.13-1-pve-signed: 6.5.13-1
proxmox-kernel-6.5.11-4-pve-signed: 6.5.11-4
proxmox-kernel-6.5.11-3-pve: 6.5.11-3
proxmox-kernel-6.2.16-20-pve: 6.2.16-20
proxmox-kernel-6.2: 6.2.16-20
proxmox-kernel-6.2.16-19-pve: 6.2.16-19
ceph-fuse: 16.2.11+ds-2
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown: residual config
ifupdown2: 3.2.0-1+pmx8
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.0
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.3
libpve-access-control: 8.1.3
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.0.5
libpve-cluster-perl: 8.0.5
libpve-common-perl: 8.1.1
libpve-guest-common-perl: 5.0.6
libpve-http-server-perl: 5.0.6
libpve-rs-perl: 0.8.8
libpve-storage-perl: 8.1.3
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 5.0.2-4
lxcfs: 5.0.3-pve4
novnc-pve: 1.4.0-3
proxmox-backup-client: 3.1.5-1
proxmox-backup-file-restore: 3.1.5-1
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.2.3
proxmox-mini-journalreader: 1.4.0
proxmox-widget-toolkit: 4.1.5
pve-cluster: 8.0.5
pve-container: 5.0.9
pve-docs: 8.1.5
pve-edk2-firmware: 4.2023.08-4
pve-firewall: 5.0.3
pve-firmware: 3.9-2
pve-ha-manager: 4.0.3
pve-i18n: 3.2.1
pve-qemu-kvm: 8.1.5-4
pve-xtermjs: 5.3.0-3
qemu-server: 8.1.1
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.3-pve1


Code:
qm config 311

agent: 1,fstrim_cloned_disks=1
balloon: 0
bios: ovmf
boot: order=scsi0
cores: 24
cpu: host
efidisk0: system:vm-311-disk-0,efitype=4m,pre-enrolled-keys=1,size=1M
hotplug: disk,network
machine: pc-q35-8.1
memory: 83968
meta: creation-qemu=6.1.0,ctime=1638370104
name: VHost-002
net0: virtio=<hided>,bridge=vmbr1,queues=2
numa: 0
onboot: 1
ostype: win11
scsi0: system:vm-311-disk-1,discard=on,iothread=1,size=200G
scsi1: system:vm-311-disk-2,discard=on,iothread=1,size=2000G
scsi2: swap:vm-311-disk-0,discard=on,iothread=1,size=256G
scsi3: data:vm-311-disk-1,discard=on,iothread=1,replicate=0,size=2000G
scsi4: data:vm-311-disk-0,iothread=1,replicate=0,size=2000G
scsihw: virtio-scsi-single
smbios1: uuid=<hided>
sockets: 1
startup: order=5,up=60
tablet: 0
vga: qxl
vmgenid: 0bec1e32-1793-4f54-bf60-cf05cef821e9
vmstatestorage: system

This system also has event 129 errors in its system logs, but it doesn't hang, even with iothread=1.

Is there any way to make the first system more stable?
 
@Gektor I would be interested in what performance hit you take doing that.
Recently a Proxmox employee commented on a GitHub thread that adding a VirtIO-Blk disk seemed to help in their testing.
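In case anyone wants to test that workaround, adding a small spare VirtIO Block disk next to the existing SCSI disks could be sketched like this (VM 110 and the storage name are taken from the config above purely as an example, and I have not verified that this actually helps):

Code:
# allocate a new 1 GiB volume on storage "system" and attach it as a VirtIO Block disk
qm set 110 --virtio0 system:1
# per the comment mentioned above, the point is simply to have a VirtIO-Blk device present;
# the disk does not need to be formatted or used inside the guest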
 
I would be interested in what performance hit you take doing that.
I have no choice, because with VirtIO SCSI the system hangs every hour or two...
I will try a VirtIO Block device instead of SATA, and report the results.
 
@Gektor I would be interested in what performance hit you take doing that.
Recently a Proxmox employee commented on a GitHub thread that adding a VirtIO-Blk disk seemed to help in their testing.
You were right: VirtIO Block with IO thread: ON is working without any issues (but it is very important to set the SCSI Controller to Default).
 
You were right: VirtIO Block with IO thread: ON is working without any issues (but it is very important to set the SCSI Controller to Default).
Was that just adding a spare drive, or did you move the disks from SCSI to VirtIO-Blk?

Did you remove the SCSI controller completely and change it to Default?
 
All disks were moved to VirtIO Block with IO thread: ON, and the SCSI Controller was set to Default (if it is set to VirtIO SCSI Single, even without any disks attached, the system disks will randomly hang). It looks like a weird bug with the VirtIO SCSI drivers/emulation in Proxmox 8.
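For anyone wanting to follow the same route, moving one of the disks of VM 110 above from SCSI to VirtIO Block could be sketched roughly like this (this is only my reading of the qm CLI, so treat it as an outline: take a backup first, power the VM off, and make sure the virtio-win viostor driver is installed in Windows before moving the boot disk):

Code:
# detach the SCSI disk; the volume stays on the storage and shows up as "unused"
qm set 110 --delete scsi1
# re-attach the same volume as a VirtIO Block disk with its own IO thread
qm set 110 --virtio1 data:vm-110-disk-0,discard=on,iothread=1,replicate=0
# switch the SCSI controller back to the default LSI type
qm set 110 --scsihw lsi
# if the boot disk (scsi0) is moved as well, adjust the boot order, e.g.:
qm set 110 --boot order=virtio0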
 
All disks were moved to VirtIO Block with IO thread: ON, and the SCSI Controller was set to Default (if it is set to VirtIO SCSI Single, even without any disks attached, the system disks will randomly hang). It looks like a weird bug with the VirtIO SCSI drivers/emulation in Proxmox 8.

When you switched the disks from VirtIO SCSI to VirtIO Block and changed the SCSI controller, did you need to re-initialize the disks in the Windows guest (assign the drive letters again), or did Windows figure it out on its own?
 
