NIC disappeared from a QEMU VM

VictorSTS

Looking for some clues about this, or to hear whether anyone else has seen it happen too (it's a first for me, and I have thousands of VMs).

Using PVE 8.4.5. I have a Windows 2019 VM with virtio drivers 0.1.271 that had been running fine for a couple of weeks since its last reboot. This morning, all of a sudden, it lost its network. Things checked:
  • In Device Manager the card showed in grey, as if it wasn't present
  • PowerShell's Get-NetAdapter did not show the interface at all, as if it had been fully removed
  • The VM config did have net0 as expected. qm config also showed the interface configured in QEMU
  • ps -ef | grep VMID also showed the NIC
  • To my surprise, in qm monitor VMID, info qtree did not show the network interface net0 (which explains why Windows didn't see it either)
  • Nothing relevant in the PVE host's dmesg or journal. The host has lots of RAM, CPU and disk space/performance to spare (still migrating from another hypervisor).
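
The comparison in the checks above (config says net0 exists, the running QEMU says it doesn't) can be scripted. This is only a rough sketch: it assumes you have saved the text of `qm config VMID` and of `info qtree` (from `qm monitor VMID`) yourself, and that NICs appear in the qtree as lines like `dev: virtio-net-pci, id "net0"`. The function names are made up for illustration.

```python
import re


def configured_nics(qm_config_text):
    """Return the netN keys found in saved `qm config` output."""
    return {m.group(1) for m in re.finditer(r"^(net\d+):", qm_config_text, re.MULTILINE)}


def live_nics(qtree_text):
    """Return the netN device ids found in saved `info qtree` output.

    Assumption: each NIC is listed with its id in double quotes, e.g. id "net0".
    """
    return set(re.findall(r'id "(net\d+)"', qtree_text))


def missing_nics(qm_config_text, qtree_text):
    """NICs present in the VM config but absent from the running QEMU device tree."""
    return sorted(configured_nics(qm_config_text) - live_nics(qtree_text))
```

A non-empty result from `missing_nics` would point at the situation described here: the device is configured but no longer attached to the running VM.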

I tried adding a second NIC, net1, and it showed up both in info qtree and in Windows. I set an IP and it worked flawlessly. Before rebooting the VM, I tried live migrating it to another host in the cluster, and then both net0 and net1 showed up in info qtree and Windows could use both NICs.

It seems as if QEMU somehow unplugged and fully removed net0 for some reason, and there's no trace of it anywhere :( The only thing that catches my attention is ostype: other, which should be "win10" to match the VM's OS. Could that be the culprit?

Does anyone have a clue?

Some details:
Code:
proxmox-ve: 8.4.0 (running kernel: 6.8.12-12-pve)
pve-manager: 8.4.5 (running version: 8.4.5/57892e8e686cb35b)
proxmox-kernel-helper: 8.1.1
proxmox-kernel-6.8.12-12-pve-signed: 6.8.12-12
proxmox-kernel-6.8: 6.8.12-12
proxmox-kernel-6.8.12-9-pve-signed: 6.8.12-9
amd64-microcode: 3.20240820.1~deb12u1
ceph: 19.2.1-pve3
ceph-fuse: 19.2.1-pve3
corosync: 3.1.9-pve1
criu: 3.17.1-2+deb12u1
frr-pythontools: 10.2.2-1+pve1
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx11
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-5
libknet1: 1.30-pve2
libproxmox-acme-perl: 1.6.0
libproxmox-backup-qemu0: 1.5.2
libproxmox-rs-perl: 0.3.5
libpve-access-control: 8.2.2
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.1.2
libpve-cluster-perl: 8.1.2
libpve-common-perl: 8.3.2
libpve-guest-common-perl: 5.2.2
libpve-http-server-perl: 5.2.2
libpve-network-perl: 0.11.2
libpve-rs-perl: 0.9.4
libpve-storage-perl: 8.3.6
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.6.0-2
proxmox-backup-client: 3.4.3-1
proxmox-backup-file-restore: 3.4.3-1
proxmox-backup-restore-image: 0.7.0
proxmox-firewall: 0.7.1
proxmox-kernel-helper: 8.1.1
proxmox-mail-forward: 0.3.3
proxmox-mini-journalreader: 1.5
proxmox-offline-mirror-helper: 0.6.7
proxmox-widget-toolkit: 4.3.12
pve-cluster: 8.1.2
pve-container: 5.2.7
pve-docs: 8.4.0
pve-edk2-firmware: 4.2025.02-4~bpo12+1
pve-esxi-import-tools: 0.7.4
pve-firewall: 5.1.2
pve-firmware: 3.16-3
pve-ha-manager: 4.0.7
pve-i18n: 3.4.5
pve-qemu-kvm: 9.2.0-7
pve-xtermjs: 5.5.0-2
qemu-server: 8.4.1
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.8-pve1

qm config

Code:
agent: 1,fstrim_cloned_disks=1
bios: ovmf
boot: order=scsi0
cores: 16
cpu: host
efidisk0: ceph--VMs:vm-117-disk-0,efitype=4m,pre-enrolled-keys=1,size=528K
ide2: none,media=cdrom
machine: pc-q35-9.2+pve1
memory: 65536
meta: creation-qemu=9.2.0,ctime=1756479105
name: ts03
net0: virtio=BC:24:11:59:1C:B2,bridge=vmbr0,tag=2
net1: virtio=BC:24:11:33:71:06,bridge=vmbr0,tag=2
numa: 1
onboot: 1
ostype: other
scsi0: ceph--VMs:vm-117-disk-1,discard=on,iothread=1,size=200G,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=89e62f67-06e4-4468-bc27-61e4d15e9fb7
sockets: 1
vmgenid: 0e07729b-e160-4828-915d-c0d186839dd7
 
It is possible to eject the network device from inside the Windows VM. Can you check the Windows Event Log (or similar) to see if and when (and maybe why) this happened?
 
Umm, might be, but AFAIK no user should have permissions for it. I checked a lot of event logs too and didn't find anything relevant (there's a lot of noise in the event log related to networked disk errors).

By chance, do you know in which log exactly would something like that show up?
 
Sorry but I have very little experience with Windows any more. I was hoping that there would be some information or log inside, as you did not see information outside of the VM on the host.
 
Thanks for the tip, it put me on the right track :). This is definitely what happened: someone did eject the nic from Windows:

[Screenshot attachment: 1758031090543.png]

Any user can eject devices, even non-admin ones (come on, Microsoft, it's a server OS!!). It's an RDP host and I bet their GPOs don't restrict user access to that, which means anyone can DoS your machine by disabling the network.

Reproduced it a few times on both Win11 and Win2019. There's no log of these events at the host level (opened a bug asking if that can be improved [1]). In the event log there's no real entry saying "user USER removed device DEVICE", but I at least found two events that might help:

Code:
Event log: System
Source: Service Control Manager
id: 7042
Level: Informational
Text: "Sent stop to NetBIOS over TCP/IP, Reason 0x40030011"

And on Windows 11 there's another one in "NetworkProfile" with id 10001 that mentions "Network disconnected".
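
If you want to look for those two events in bulk, something like the following could work on an exported log. It's a sketch under assumptions, not a tested procedure: it assumes the events were exported to CSV with columns named `TimeCreated`, `ProviderName` and `Id` (for example from PowerShell), and that the provider strings are "Service Control Manager" and "Microsoft-Windows-NetworkProfile". The function name is hypothetical. Keep in mind that these events only hint at a device removal; they don't name the user or the device.

```python
import csv
import io

# (provider, event id) pairs mentioned in this thread.
# The exact provider strings are an assumption about how the export names them.
SUSPECT_EVENTS = {
    ("Service Control Manager", 7042),
    ("Microsoft-Windows-NetworkProfile", 10001),
}


def find_eject_hints(csv_text):
    """Return (time, provider, id) for rows matching the two event ids above."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [
        (row["TimeCreated"], row["ProviderName"], int(row["Id"]))
        for row in rows
        if (row["ProviderName"], int(row["Id"])) in SUSPECT_EVENTS
    ]
```

Matching timestamps against the moment the NIC vanished is still a manual step.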

I've asked their RDP environment admins to restrict access to that part of the GUI for non-admin users (as it should be, IMHO), and meanwhile we've disabled network hotplug in the VM options (spoke too soon: tested it and any user can still unplug devices). Still, any user could unplug things like the VirtIO Balloon driver and leave the host unable to reclaim used memory when it needs it (so another DoS vector).

[1] https://bugzilla.proxmox.com/show_bug.cgi?id=6820
 