Q35 v11.0 causes extreme corruption on many VMs Linux/BSD/Windows (STOP 0x8007025d)

THE_BIG_ONE

New Member
Jul 15, 2025
I've been running about 20 Windows guests (using VirtIO SCSI) in my home environment across 8 different PCs running Proxmox for several years. The Proxmox hosts span a wide range of hardware, from an old Intel 4770K to a newer Ryzen 3700X. All systems use NVMe and SATA SSDs as their primary drives.

About a week ago, I experienced major corruption on my router PC (Lenovo M715Q with multiple NICs), which runs Proxmox with OPNsense as a VM. The drive corruption required me to reimage the disk from a backup I had made.

This week, I decided to push the latest Windows updates out to all my home PCs from my WSUS server. During the May 2025 Windows cumulative update, all my home PCs and VMs completed successfully except for four. Those four VMs ended up corrupted, spread across three different machines: two on an Intel 4770K based system, one on an AMD FX based system, and one on an AMD Ryzen 3700X based system. At this point the systems were not able to update, and one was not able to boot. The three that could boot were not even able to perform an 'in-place' upgrade / repair from current Windows 11 ISO media. I began two days of OS repairs and troubleshooting, as I did not have backups of these systems. I also thoroughly tested the NVMe drives in all machines for corruption, bad blocks, bad SMART values, etc., and tested RAM as well.

I also attempted forcefully installing the latest cumulative update by downloading the standalone installer, because I thought maybe something was corrupted when it was downloading from my WSUS server, even though every other system completed successfully. While performing DISM repair commands, all four corrupted systems were failing to repair, failing to clean component store, failing to "/restorehealth". Upon examining CBS.log and DISM.log on these systems, I kept noticing some strange errors that I haven't seen in 20 years of windows update fuckery.
Code:
dism /online /cleanup-image /restorehealth /source:WIM:Z:\26200.8457_amd64_en-us_multi_a90228a3_convert\ISOFOLDER\sources\install.wim:1 /limitaccess

Deployment Image Servicing and Management tool
Version: 10.0.26100.5074

Image Version: 10.0.26200.8246

[=========================  44.5%                          ]
Error: 605

The specified buffer contains ill-formed data.

Then, inspecting DISM.log, I kept seeing mentions of 0x8007025d during update, repair, and DISM repair attempts.
Code:
2026-05-15 05:39:24, Error                 DISM   DISM Package Manager: PID=7332 TID=3300 Failed finalizing changes. - CDISMPackageManager::Internal_Finalize(hr:0x8007025d)

That error code translates to:
HRESULT=0x8007025d (ERROR_BAD_COMPRESSION_BUFFER)
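
For anyone cross-checking the two numbers above: DISM's "Error: 605" and the 0x8007025d in DISM.log are the same Win32 error in two encodings. A minimal Python sketch of the standard HRESULT bit layout (the function name `decode_hresult` is only for illustration, it is not part of any Windows tooling):

```python
def decode_hresult(hr: int) -> dict:
    """Split a 32-bit HRESULT into its severity, facility and code fields."""
    hr &= 0xFFFFFFFF
    return {
        "failure": bool(hr >> 31),       # top bit set means failure
        "facility": (hr >> 16) & 0x7FF,  # 11-bit facility; 7 is FACILITY_WIN32
        "code": hr & 0xFFFF,             # low 16 bits: the underlying error code
    }

info = decode_hresult(0x8007025D)
print(info)  # {'failure': True, 'facility': 7, 'code': 605}
```

Facility 7 with code 605 is Win32 error 605, ERROR_BAD_COMPRESSION_BUFFER, which matches the "specified buffer contains ill-formed data" text DISM printed.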

After hitting walls repairing the corrupted VM(s), I made a Clonezilla image of one of the corrupted VMs and restored it into a VirtualBox VM on my main desktop system. There I attempted an in-place upgrade/repair using Windows media and it succeeded. That confirmed my suspicion that something was happening with Proxmox or a Proxmox SCSI driver. At this point, I went back to two of the bootable but corrupted Win11 guests on two different Proxmox hosts and changed VirtIO SCSI controller drivers, changed to SATA instead of VirtIO SCSI, etc. I spent the next 2 days trying to track down the exact issue causing this instability. Every change I made took about an hour to test on an installation of Windows, so this has been extremely slow and frustrating. Seeing that "Bad Compression Buffer" made me think it was a deep-seated bug related to the SCSI controller drivers or cache settings, so one by one I tried each cache setting and attached the drive to the VM in every combination with each available controller. All attempts failed on a repair install, manual DISM /cleanup-image /restorehealth, etc.

Finally, on two proxmox hosts, I attempted to create fresh new VMs and install Windows 11 from an ISO. Each system failed at ~98% completion with 0x8007025d. This immediately told me something was wrong with recent proxmox updates. After changing settings and trying multiple configurations, I downgraded the machine type from Q35 version 11.0 to Q35 v10.2 and all installs progressed properly. I then attempted repairs on the other corrupted VMs that kept throwing 0x8007025d stop errors after setting Q35 v10.2 and all systems successfully repaired (except for the one so badly damaged that neither a repair, nor DISM tools could repair the corruption).

That said, over the past couple of years, whenever a new machine type version has become available, I've gone through each VM and updated its machine type to it. This time was no exception; however, everything appeared to operate correctly, until it didn't. I never attempted a fresh install on v11.0 because I hadn't had a need until now. I have noticed a few posts on this forum from people reporting that same error on Windows VMs (0x8007025d) with no replies or comments. As I said previously, I also experienced drive corruption on my router PC running OPNsense, which is based on FreeBSD, so it's not 100% limited to Windows. After 5 days of troubleshooting, I can confidently say that whatever is happening, it's happening when the machine type is Q35 and the version is 11.0.

I'm not exactly sure why 11.0 is triggering these failures based on the change log, but I do see memory and IOMMU changes, etc. This has affected multiple VMs running multiple operating systems on vastly different host hardware.
 
Out of curiosity, I tried this on my homelab using a fully updated no-subscription repo + q35 + machine version 11; Windows 11 installed without issue.
 
Which VirtIO SCSI driver version was it?
I tried the following drivers: 0.1.240, 0.1.266, 0.1.271, and 0.1.285. I realized this was not a VirtIO SCSI issue when, as I said before, it did not matter whether the drive was attached using a SATA/IDE controller or a SCSI controller. I am able to reliably reproduce this issue on all Proxmox hosts.
 
Hi @THE_BIG_ONE,
please share your /etc/pve/storage.cfg and the configuration of an affected VM (qm config ID, with ID replaced by the numerical ID of the VM). What storage type and format is used for your virtual disks, qcow2/raw/...? What was the exact error for the FreeBSD or Linux VMs? Do you have any logs/error messages for those too? What exact Windows installer ISO version did you use?
 
What you could also test, does the issue happen when you switch to a different Async IO setting for the virtual disk (in the Advanced settings when editing the disk in the UI)?
 
So, today I tested again, and now I'm not able to complete any fresh installation using the current build of Windows 25H2 (26200.8457). Fiona, I misspoke when I said Linux VMs, as I haven't had a corrupted Linux system yet. For the OPNsense VM, I no longer have any log files or data to support that. As for the cache settings, I've tried each cache setting one by one.

I am using qcow2 for all images or direct PCIe/SATA passthrough. None of the direct PCIe or SATA passthrough VMs are affected. Only ones with the OS install on a qcow2 image.

storage.cfg
Code:
dir: local
        path /var/lib/vz
        content rootdir,vztmpl,images
        preallocation off
        prune-backups keep-all=1
        shared 1

cifs: TBO-M715Q-SSD
        path /mnt/pve/TBO-M715Q-SSD
        server 192.168.100.7
        share TBO-M715Q-SSD
        content backup,iso
        prune-backups keep-all=1
        username root

dir: TBO-TESTBENCH-NVME1
        path /mnt/pve/TBO-TESTBENCH-NVME1
        content vztmpl,rootdir,images
        is_mountpoint 1
        nodes TBO-TESTBENCH-PROXMOX
        preallocation off
        shared 1

cifs: TBO-SERVER
        path /mnt/pve/TBO-SERVER
        server 192.168.100.2
        share Proxmox
        content backup
        prune-backups keep-all=1
        username THE BIG ONE


I'm still trying to figure this out, because now I can't install a fresh system on any host, even after using yesterday's supposed workaround of setting the machine type to 10.2; however, I did also update my kernel and hosts today to the latest 7.0.2-5. I might try rolling back some kernels to see if that's the issue. What is common across all affected VMs is that they're on qcow2 images. It does not seem to matter which cache settings I use or which controller I attach the disk through.

I tried a factory 8457 ISO, an insider ISO via UUP dump, as well, and also my own custom 8457 captured image using DISM and PXE boot.
I also tested each installer on Virtualbox, including my PXE winpe / custom image install. All complete successfully there.

Here is an example of a fresh VM that experiences this issue:
Code:
# /etc/pve/qemu-server/120.conf
balloon: 0
bios: ovmf
boot: order=ide2;net0
cores: 8
cpu: host
efidisk0: TBO-TESTBENCH-NVME1:120/vm-120-disk-0.qcow2,efitype=4m,ms-cert=2023k,pre-enrolled-keys=1,size=528K
ide2: none,media=cdrom
machine: pc-q35-10.0
memory: 8192
meta: creation-qemu=11.0.0,ctime=1779240060
name: TBO-WIN11TEST
net0: virtio=BC:24:11:C3:DE:9D,bridge=vmbr0
numa: 0
ostype: win11
sata0: TBO-TESTBENCH-NVME1:120/vm-120-disk-1.qcow2,size=64G
scsihw: virtio-scsi-single
smbios1: uuid=9a47a937-0a63-4617-89ae-8e9b88540982
sockets: 1
tpmstate0: TBO-TESTBENCH-NVME1:120/vm-120-disk-2.qcow2,size=4M,version=v2.0
vga: qxl
vmgenid: 20cabc88-f5c7-47b3-9623-be1eec69a866
 
As for the cache settings, I've tried each cache setting one by one.
Async IO settings would be interesting to test too. But that's also good to know!

storage.cfg
Code:
dir: local
        path /var/lib/vz
        content rootdir,vztmpl,images
        preallocation off
        prune-backups keep-all=1
        shared 1
Unlikely to be related, but you should not set the shared flag if the storage is not actually shared (or do you really have a networked storage underlying /var/lib/vz?).

From the docs:
shared
Indicate that this is a single storage with the same contents on all nodes (or all listed in the nodes option). It will not make the contents of a local storage automatically accessible to other nodes, it just marks an already shared storage as such!

Code:
dir: TBO-TESTBENCH-NVME1
        path /mnt/pve/TBO-TESTBENCH-NVME1
        content vztmpl,rootdir,images
        is_mountpoint 1
        nodes TBO-TESTBENCH-PROXMOX
        preallocation off
        shared 1
Similar here regarding the shared flag.

I'm still trying to figure this out, because now I can't install a fresh system on any host, even after using yesterday's supposed workaround of setting the machine type to 10.2; however, I did also update my kernel and hosts today to the latest 7.0.2-5. I might try rolling back some kernels to see if that's the issue. What is common across all affected VMs is that they're on qcow2 images. It does not seem to matter which cache settings I use or which controller I attach the disk through.

I tried a factory 8457 ISO, an insider ISO via UUP dump, as well, and also my own custom 8457 captured image using DISM and PXE boot.
Stupid question, but asking just to be sure, did you verify integrity of the ISOs/image with a checksum? Do you see anything in the system logs of the host around the time of the issue? Feel free to share the journal for the current boot journalctl -b > /tmp/journal.txt. Did you already do a health check for the underlying disk where the qcow2 files reside?
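
A minimal sketch of the checksum step asked about above, in case it helps anyone following along. It hashes a file in chunks so a multi-gigabyte ISO does not have to fit in memory; the function name `sha256_of` and any path you pass to it are placeholders, not anything from this thread:

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of the file at `path`, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until read() returns b"" (EOF)
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Compare the returned string against the hash published for the ISO. Running it on the same file from both the host and a known-good machine is a quick way to rule out a bad copy on the storage.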
 
I changed the shared flags; no, I don't have shared local storage across all hosts. They're all independent. I did perform disk checks on all hosts. Journalctl shows nothing when these errors occur. VirtualBox VMs work flawlessly with all test images. One of my test ISOs is one I also used on my Proxmox hosts before this issue cropped up. This is something recent with Proxmox, but I can't figure out what is causing this instability. It seems to be something with the underlying writes to the filesystem / image. Yesterday I tried various cache and async IO settings, but I will try again now, playing with the async settings. I just did another install test and noticed that when the Windows installer crashes out, the drive is corrupted:

Code:
Z:\>setup /product server

Z:\>diskpart

Microsoft DiskPart version 10.0.26100.1

Copyright (C) Microsoft Corporation.
On computer: MININT-OOUUB4L

DISKPART> list disk

  Disk ###  Status         Size     Free     Dyn  Gpt
  --------  -------------  -------  -------  ---  ---
  Disk 0    Online           64 GB      0 B        *

DISKPART> sel disk 0

Disk 0 is now the selected disk.

DISKPART> list par

  Partition ###  Type              Size     Offset
  -------------  ----------------  -------  -------
  Partition 1    System             100 MB  1024 KB
  Partition 2    Reserved            16 MB   101 MB
  Partition 3    Primary             63 GB   117 MB

DISKPART> sel par 3

Partition 3 is now the selected partition.

DISKPART> assign letter=c

DiskPart successfully assigned the drive letter or mount point.

DISKPART> exit

Leaving DiskPart...

Z:\>C:
The file or directory is corrupted and unreadable.

Z:\>chkdsk C: /F
The type of the file system is NTFS.

Chkdsk cannot run because the volume is in use by another
process.  Chkdsk may run if this volume is dismounted first.
ALL OPENED HANDLES TO THIS VOLUME WOULD THEN BE INVALID.
Would you like to force a dismount on this volume? (Y/N) y
Volume dismounted.  All opened handles to this volume are now invalid.

Stage 1: Examining basic file system structure ...
Deleting corrupt attribute record (0x80, "")
from file record segment 0x1BD.
Deleted corrupt attribute list entry
with type code 44524352 in file 17F6B.
Deleted corrupt attribute list entry
with type code 26FC in file 17F6B.
Truncating corrupt attribute list for file 17F6B.
Deleting corrupt attribute record (0x80, WofCompressedData)
from file record segment 0x19594.
Deleting corrupt attribute record (0x80, WofCompressedData)
from file record segment 0x195FF.
Deleting corrupt attribute record (0x80, WofCompressedData)
from file record segment 0x19639.
Deleting corrupt attribute record (0x80, WofCompressedData)
from file record segment 0x1A3AE.
Deleting corrupt attribute record (0x80, WofCompressedData)
from file record segment 0x1A3C0.
Deleting corrupt attribute record (0x80, WofCompressedData)
from file record segment 0x1B142.
Deleting corrupt attribute record (0x80, WofCompressedData)
from file record segment 0x1B9AA.
Deleting corrupt attribute record (0x80, WofCompressedData)
from file record segment 0x1D030.
Deleting corrupt attribute record (0x80, WofCompressedData)
from file record segment 0x1D0C5.
Deleting corrupt attribute record (0x80, "")
from file record segment 0x1D46A.
Deleting corrupt attribute record (0x80, WofCompressedData)
from file record segment 0x1DC53.
Deleted corrupt attribute list entry
with type code 80 in file 1DC54.
Deleting corrupt attribute record (0x80, WofCompressedData)
from file record segment 0x1DC54.
Deleting corrupt attribute record (0x80, "")
from file record segment 0x254BE.
  175616 file records processed.
File verification completed.
 Phase duration (File record verification): 2.33 seconds.
Deleting orphan file record segment 7.
Deleting orphan file record segment C.
Deleting orphan file record segment E.
Deleting orphan file record segment F.
Deleting orphan file record segment 18714.
  8458 large file records processed.
 Phase duration (Orphan file record recovery): 9.60 milliseconds.
  0 bad file records processed.
 Phase duration (Bad file record checking): 0.36 milliseconds.

Stage 2: Examining file name linkage ...
Fixing incorrect information in file record segment 5.
Correcting minor file name errors in file 17F6B.
  50118 reparse records processed.
Deleting index entry storagewmi_passthru.mof in index $I30 of file 5DA6.
Deleting index entry storagewmi_passthru.mof in index $I30 of file 5DA8.
Deleting index entry storagewmi_passthru.mof in index $I30 of file 8336.
Deleting index entry STORAG~1.MOF in index $I30 of file 8336.
  254232 index entries processed.
Index verification completed.
 Phase duration (Index verification): 7.45 seconds.
CHKDSK is creating new root directory.
CHKDSK is scanning unindexed files for reconnect to their original directory.
Recovering orphaned file $MFT (0) into directory file 5.
Recovering orphaned file $MFTMirr (1) into directory file 5.
Recovering orphaned file $LogFile (2) into directory file 5.
Recovering orphaned file $Volume (3) into directory file 5.
Recovering orphaned file $AttrDef (4) into directory file 5.
Fixing incorrect information in file record segment 5.
Recovering orphaned file . (5) into directory file 5.
Recovering orphaned file $Bitmap (6) into directory file 5.
Recovering orphaned file $Boot (7) into directory file 5.
Recovering orphaned file $BadClus (8) into directory file 5.
Recovering orphaned file $Secure (9) into directory file 5.
Skipping further messages about recovering orphans.
  15 unindexed files scanned.
  14 unindexed files recovered to original directory.
 Phase duration (Orphan reconnection): 73.04 milliseconds.
CHKDSK is recovering remaining unindexed files.
  1 unindexed files recovered to lost and found.
    Lost and found is located at \found.000

 Phase duration (Orphan recovery to lost and found): 11.79 milliseconds.
  50118 reparse records processed.
 Phase duration (Reparse point and Object ID verification): 115.94 milliseconds.

Stage 3: Examining security descriptors ...
Security descriptor verification completed.
 Phase duration (Security descriptor verification): 43.17 milliseconds.
Inserting data attribute into file 4.
Inserting data attribute into file 6.
Inserting data attribute into file 7.
Inserting data attribute into file D.
Inserting data attribute into file 1BD.
Inserting data attribute into file 17F6B.
Inserting data attribute into file 1D46A.
Inserting data attribute into file 254BE.
  39317 data files processed.
 Phase duration (Data attribute verification): 1.88 milliseconds.
CHKDSK is verifying Usn Journal...
Usn Journal verification completed.
Correcting errors in the Master File Table (MFT) mirror.
Correcting errors in the Attribute Definition Table.
Correcting errors in the Boot File.
Correcting errors in the master file table's (MFT) BITMAP attribute.
Correcting errors in the Volume Bitmap.

Windows has made corrections to the file system.
No further action is required.

  66988031 KB total disk space.
  11049992 KB in 127645 files.
    103244 KB in 39312 indexes.
         0 KB in bad sectors.
    252667 KB in use by the system.
     65536 KB occupied by the log file.
  55582128 KB available on disk.

      4096 bytes in each allocation unit.
  16747007 total allocation units on disk.
  13895532 allocation units available on disk.
Total duration: 10.14 seconds (10148 ms).
Unable to obtain a handle to the event log.

Changing async from io-uring to native or threads did not seem to help at all.
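
One detail in the chkdsk log above: 11 of the 14 "Deleting corrupt attribute record" entries name the WofCompressedData stream, which, as far as I know, is where Windows stores data for files compressed by the Windows Overlay Filter. That would be consistent with the ERROR_BAD_COMPRESSION_BUFFER seen in DISM. A small Python sketch for tallying these entries from saved chkdsk output (the function and variable names are made up for illustration):

```python
import re
from collections import Counter

# Matches lines such as:
#   Deleting corrupt attribute record (0x80, WofCompressedData)
RECORD_RE = re.compile(r"Deleting corrupt attribute record \((0x[0-9A-Fa-f]+), ([^)]*)\)")

def count_corrupt_streams(chkdsk_output: str) -> Counter:
    """Count deleted attribute records per stream name ('<unnamed>' for "")."""
    counts = Counter()
    for match in RECORD_RE.finditer(chkdsk_output):
        stream = match.group(2).strip('"') or "<unnamed>"
        counts[stream] += 1
    return counts

sample = '''Deleting corrupt attribute record (0x80, "")
from file record segment 0x1BD.
Deleting corrupt attribute record (0x80, WofCompressedData)
from file record segment 0x19594.
Deleting corrupt attribute record (0x80, WofCompressedData)
from file record segment 0x195FF.
'''
print(count_corrupt_streams(sample))
```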
 
Does installation work without issues when you use raw format instead of qcow2?
 
I've finally narrowed down the exact settings / what is causing the issue. I've been repeating installs on 4 different proxmox host machines over and over changing one thing at a time and comparing. It's the SSD and Discard flag being set. Doesn't matter if the drive is connected by SCSI or SATA controller, if the SSD and discard flag is set, the drive gets corrupted and windows install fails / upgrades fail. I have been using the SSD and discard flag for years without issues; however, the only way I can reliably get a complete install is by removing discard and SSD from the disk options.
Tried this with just SSD flag set, and still unreliable.
Example of working / reliable installation settings:
WORKS: scsi0: TBO-TESTBENCH-NVME1:122/vm-122-disk-1.qcow2,iothread=1,size=64G
WORKS: sata1: local:122/vm-122-disk-1.qcow2,size=64G

Examples of failing / corrupted drive during install:
FAIL/CORRUPT: scsi0: TBO-TESTBENCH-NVME1:122/vm-122-disk-1.qcow2,discard=on,iothread=1,size=32G,ssd=1
FAIL/CORRUPT: scsi0: TBO-TESTBENCH-NVME1:122/vm-122-disk-1.qcow2,iothread=1,size=32G,ssd=1
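
To find other VMs with the same combination, the disk lines can be checked mechanically. A rough Python sketch, assuming the disk-line layout shown in the examples above (bus, then volume, then comma-separated key=value options); `parse_disk_line` and `is_risky` are hypothetical helpers, not part of Proxmox. Only ssd=1 alone and ssd=1 with discard=on were tested in this thread, so flagging discard=on by itself is an assumption:

```python
def parse_disk_line(line: str) -> dict:
    """Split a VM config disk line into bus name, volume and an options dict."""
    bus, _, value = line.partition(":")  # split at the first colon only
    volume, *options = value.strip().split(",")
    opts = dict(opt.split("=", 1) for opt in options if "=" in opt)
    return {"bus": bus.strip(), "volume": volume, "options": opts}

def is_risky(line: str) -> bool:
    """True for a qcow2 disk with SSD emulation and/or discard enabled."""
    disk = parse_disk_line(line)
    on_qcow2 = disk["volume"].endswith(".qcow2")
    ssd = disk["options"].get("ssd") == "1"
    discard = disk["options"].get("discard") == "on"  # discard-only: untested assumption
    return on_qcow2 and (ssd or discard)

lines = [
    "scsi0: TBO-TESTBENCH-NVME1:122/vm-122-disk-1.qcow2,iothread=1,size=64G",
    "scsi0: TBO-TESTBENCH-NVME1:122/vm-122-disk-1.qcow2,discard=on,iothread=1,size=32G,ssd=1",
    "scsi0: TBO-TESTBENCH-NVME1:122/vm-122-disk-1.qcow2,iothread=1,size=32G,ssd=1",
]
for line in lines:
    print(is_risky(line), line)
```

The three sample lines are taken from the working and failing examples above, with the WORKS / FAIL labels removed.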

As previously mentioned, almost all my VMs have been set with SSD emulation and Discard enabled. This was also set on the 4 VMs that corrupted during windows updates.

There is still one proxmox host that I have that does not work with any of the above settings when attempting to install windows, even when disabling SSD emulation and discard. All other hosts work when these settings are disabled.

I have been working on this the past 10+ hours. Taking a break now that I've finally narrowed down some issues. I will try raw format later this evening.
 
Update: Using a .raw file image instead of .qcow2 works on all Proxmox hosts, including with the ssd=1 and discard=on flags set. All systems can upgrade/repair/clean install Windows using the .raw file format.

With that said, all my tests now show zero corruption with raw format vs qcow2. Should I begin converting all my VMs from qcow2 images to raw? (I don't use LVM; I set up all my hosts in the Debian base configuration because I dislike LVM, so they are all ext4-based directory storages.) The only issue I see with raw vs qcow2 is that the files are not thinly provisioned: a 64GB disk appears to eat up 64GB of space regardless of internal usage.
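
On the space question, it may be worth checking allocated size rather than apparent size: ext4 supports sparse files, so a raw image can report 64GB in a directory listing while occupying far fewer blocks on disk. Whether that applies to a particular image depends on how it was created and on whether discard is passed through, so treat the following as a sketch for checking, not a claim about these images. The helper name `disk_usage` and the demo file are invented for illustration, and st_blocks is counted in 512-byte units on Linux:

```python
import os
import tempfile

def disk_usage(path: str) -> tuple[int, int]:
    """Return (apparent size, allocated size) of a file in bytes (Linux)."""
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512  # st_blocks uses 512-byte units on Linux

# Demo: a 16 MiB file created with truncate() has its full apparent size,
# but on filesystems with sparse-file support few or no blocks are allocated.
with tempfile.TemporaryDirectory() as tmp:
    image = os.path.join(tmp, "demo.raw")
    with open(image, "wb") as f:
        f.truncate(16 * 1024 * 1024)
    apparent, allocated = disk_usage(image)
    print(f"apparent={apparent} bytes, allocated={allocated} bytes")
```

Pointing `disk_usage` at a real image file on the host shows whether it is actually consuming its full nominal size.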
 
If you want to verify this, please test with the new pve-qemu-kvm=11.0.0-3 package version, which has the revert.
 
I have performed 6 successful installs across 5 different Proxmox PCs with this morning's pve-qemu-kvm v11.0.0-3 using qcow2 with SSD + Trim/Discard enabled, with no issues seen. I am now performing 3 in-place repairs / upgrades across 3 different PCs for extra verification.