I've run into a weird problem recently: some of my VMs fail to start after updating Proxmox. Most start fine, but two always fail. The strange part is that if I copy them over to another Proxmox machine (which has not been updated), they start without any problem, although I do have to make a couple of config changes.
The updated machine is a Ryzen 9 3900X with 128 GB RAM and an Intel X710 NIC; an SR-IOV VF is passed through to each VM. Everything worked perfectly before the update. Package versions for this machine are:
Code:
proxmox-ve: 7.3-1 (running kernel: 5.15.85-1-pve)
pve-manager: 7.3-6 (running version: 7.3-6/723bb6ec)
pve-kernel-helper: 7.3-7
pve-kernel-5.15: 7.3-3
pve-kernel-5.15.102-1-pve: 5.15.102-1
pve-kernel-5.15.85-1-pve: 5.15.85-1
pve-kernel-5.15.83-1-pve: 5.15.83-1
pve-kernel-5.15.30-2-pve: 5.15.30-3
ceph-fuse: 15.2.16-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.4
libproxmox-backup-qemu0: 1.3.1-1
libpve-access-control: 7.3-2
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.3-2
libpve-guest-common-perl: 4.2-3
libpve-http-server-perl: 4.1-6
libpve-storage-perl: 7.3-2
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.2-2
lxcfs: 5.0.3-pve1
novnc-pve: 1.4.0-1
proxmox-backup-client: 2.3.3-1
proxmox-backup-file-restore: 2.3.3-1
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-offline-mirror-helper: 0.5.1-1
proxmox-widget-toolkit: 3.5.5
pve-cluster: 7.3-2
pve-container: 4.4-2
pve-docs: 7.3-1
pve-edk2-firmware: 3.20221111-1
pve-firewall: 4.2-7
pve-firmware: 3.6-4
pve-ha-manager: 3.5.1
pve-i18n: 2.8-3
pve-qemu-kvm: 7.2.0-7
pve-xtermjs: 4.16.0-1
qemu-server: 7.3-4
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+3
vncterm: 1.7-1
zfsutils-linux: 2.1.9-pve1
The non-updated machine is an Intel i7-4770 with 32 GB RAM, using VirtIO for the VM NICs. Package versions for this machine are:
Code:
proxmox-ve: 7.3-1 (running kernel: 5.15.85-1-pve)
pve-manager: 7.3-6 (running version: 7.3-6/723bb6ec)
pve-kernel-helper: 7.3-6
pve-kernel-5.15: 7.3-2
pve-kernel-5.13: 7.1-9
pve-kernel-5.15.85-1-pve: 5.15.85-1
pve-kernel-5.15.83-1-pve: 5.15.83-1
pve-kernel-5.13.19-6-pve: 5.13.19-15
pve-kernel-5.13.19-2-pve: 5.13.19-4
ceph-fuse: 15.2.15-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.4
libproxmox-backup-qemu0: 1.3.1-1
libpve-access-control: 7.3-2
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.3-2
libpve-guest-common-perl: 4.2-3
libpve-http-server-perl: 4.1-6
libpve-storage-perl: 7.3-2
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.2-2
lxcfs: 5.0.3-pve1
novnc-pve: 1.4.0-1
proxmox-backup-client: 2.3.3-1
proxmox-backup-file-restore: 2.3.3-1
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-offline-mirror-helper: 0.5.1-1
proxmox-widget-toolkit: 3.5.5
pve-cluster: 7.3-2
pve-container: 4.4-2
pve-docs: 7.3-1
pve-edk2-firmware: 3.20221111-1
pve-firewall: 4.2-7
pve-firmware: 3.6-3
pve-ha-manager: 3.5.1
pve-i18n: 2.8-3
pve-qemu-kvm: 7.2.0-5
pve-xtermjs: 4.16.0-1
qemu-server: 7.3-4
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+3
vncterm: 1.7-1
zfsutils-linux: 2.1.9-pve1
These are the differences between the two (`diff` of the version lists):
Code:
3,5c3,5
< pve-kernel-helper: 7.3-7
< pve-kernel-5.15: 7.3-3
< pve-kernel-5.15.102-1-pve: 5.15.102-1
---
> pve-kernel-helper: 7.3-6
> pve-kernel-5.15: 7.3-2
> pve-kernel-5.13: 7.1-9
8,9c8,10
< pve-kernel-5.15.30-2-pve: 5.15.30-3
< ceph-fuse: 15.2.16-pve1
---
> pve-kernel-5.13.19-6-pve: 5.13.19-15
> pve-kernel-5.13.19-2-pve: 5.13.19-4
> ceph-fuse: 15.2.15-pve1
41c42
< pve-firmware: 3.6-4
---
> pve-firmware: 3.6-3
44c45
< pve-qemu-kvm: 7.2.0-7
---
> pve-qemu-kvm: 7.2.0-5
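For completeness, the diff above was generated roughly like this; the file names are placeholders, and the real lists come from `pveversion -v` on each host. Recreated here with just two of the differing lines:

```shell
# On each host (placeholder file names):
#   pveversion -v > versions-updated.txt   # on the Ryzen host
#   pveversion -v > versions-old.txt       # on the i7 host
# Recreated locally with two of the lines that actually differ:
cat > versions-updated.txt <<'EOF'
pve-firmware: 3.6-4
pve-qemu-kvm: 7.2.0-7
EOF
cat > versions-old.txt <<'EOF'
pve-firmware: 3.6-3
pve-qemu-kvm: 7.2.0-5
EOF
# diff exits non-zero when the files differ, hence the "|| true"
diff versions-updated.txt versions-old.txt || true
```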
Config for one of the non-working VMs:
Code:
agent: 1
balloon: 0
bios: ovmf
boot: order=scsi0
cores: 1
cpu: host,flags=+aes
cpuunits: 10
efidisk0: nvme-thin:vm-4800-disk-2,efitype=4m,pre-enrolled-keys=1,size=4M
machine: q35
memory: 1024
meta: creation-qemu=7.1.0,ctime=1674682172
name: nvidia-dls
numa: 1
onboot: 1
ostype: l26
scsi0: nvme-thin:vm-4800-disk-1,cache=none,discard=on,iothread=1,size=8G
scsihw: virtio-scsi-single
serial0: socket
smbios1: uuid=<redacted>
sockets: 1
startup: order=16,up=10
tablet: 0
vmgenid: <redacted>
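For what it's worth, this is roughly how I've been starting the failing VM from the CLI and dumping the QEMU command line Proxmox generates (sketch; VMID 4800 matches the config above, and the fallback branch is only there so the snippet fails gracefully off-host):

```shell
# VMID 4800 is the VM whose config is shown above. Both are standard
# qm subcommands; they only do anything useful on the Proxmox host.
VMID=4800
if command -v qm >/dev/null 2>&1; then
    qm showcmd "$VMID" --pretty   # print the full KVM command line Proxmox builds
    qm start "$VMID"              # attempt a start; errors land on stderr
else
    echo "qm not available here; run on the Proxmox host (VMID $VMID)"
fi
```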
Screenshot from the VM during boot (Proxmox itself has plenty of memory available):
Does anyone have any ideas? Although the VMs are running on the other machine for now, I really need to get them back onto my main machine, which, as mentioned, runs several other VMs and LXCs without any problem. The journal doesn't give any info; neither does dmesg.
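These are roughly the log checks that come up empty (VMID 4800 as the example; the grep patterns are just my own guesses at what a failed start would log):

```shell
VMID=4800
# Anything QEMU-related for this VM in the current boot's journal?
{ journalctl -b --no-pager 2>/dev/null || true; } | grep -i "qemu.*${VMID}" | tail -n 5
# Anything in the kernel log (OOM kills, KVM errors, ...)?
{ dmesg 2>/dev/null || true; } | grep -iE "oom|kvm" | tail -n 5
echo "log check finished for VMID ${VMID}"
```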
Thanks in advance.