[SOLVED] VM issues when running simultaneously with another VM

sleeper52

New Member
Mar 24, 2023
New Proxmox user here. I have a TrueNAS Core 13.0-U5.2 VM. It is installed on a WD Blue SSD named VMs_CT_Main. I have passed through a Dell H310 HBA to it with 8 HDDs attached. Below is the config of the TrueNAS VM.
Code:
kristian@pve:~$ sudo qm config 100
[sudo] password for kristian:
agent: 1
balloon: 32768
bios: ovmf
boot: order=scsi0;net0
cores: 8
cpu: host
efidisk0: VMs_CT_Main:100/vm-100-disk-0.vmdk,efitype=4m,size=528K
hostpci0: 0000:0c:00,pcie=1
machine: q35
memory: 49152
meta: creation-qemu=7.2.0,ctime=1683517391
name: TrueNAS
net0: virtio=0E:09:A3:2F:7B:3F,bridge=vmbr0,firewall=1
numa: 0
onboot: 1
ostype: l26
scsi0: VMs_CT_Main:100/vm-100-disk-1.qcow2,discard=on,iothread=1,size=32G,ssd=1,serial=S12PNEAD272742M
scsihw: virtio-scsi-single
smbios1: uuid=dfd7888b-60d9-4817-9c94-e4245371f7eb
sockets: 1
startup: order=1,up=180
unused0: VMs_CT_Main:100/vm-100-disk-2.raw
vga: qxl
vmgenid: 5b7bbce9-5c35-4566-b8b2-828adc14dd7e

I also have a Windows 11 Pro VM installed on the same WD Blue SSD (VMs_CT_Main). If I start the Windows 11 Pro VM, it causes my TrueNAS VM to shut down unexpectedly. As long as the Windows 11 VM is on, I get a "TASK ERROR: start failed: QEMU exited with code 1" message every time I attempt to start the TrueNAS VM. If the Windows VM is off, the TrueNAS VM starts up fine. Below is the config of the Windows 11 Pro VM.
Code:
kristian@pve:~$ sudo qm config 102
agent: 1
balloon: 4096
bios: ovmf
boot: order=scsi0;net0
cores: 4
cpu: host
efidisk0: VMs_CT_Main:102/vm-102-disk-0.raw,efitype=4m,pre-enrolled-keys=1,size=528K
machine: pc-q35-8.0
memory: 8192
meta: creation-qemu=7.2.0,ctime=1682339722
name: windows-vm1
net0: virtio=26:DE:08:F1:40:CB,bridge=vmbr0,firewall=1
numa: 0
ostype: win11
scsi0: VMs_CT_Main:102/vm-102-disk-1.raw,cache=writeback,discard=on,iothread=1,size=244144M,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=4d2d21ca-23ff-468d-8dd9-061f7237ed0a
sockets: 1
startup: order=2,up=60
tpmstate0: VMs_CT_Main:102/vm-102-disk-2.raw,size=4M,version=v2.0
vmgenid: faa260c5-b6b1-4187-9203-43825076926e

Is this issue related to IOMMU? I am still learning about it so I am not sure. I have acquired a script to show the IOMMU groupings as seen below just in case.
Code:
Group:  0   0000:00:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
Group:  1   0000:00:01.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge [1022:1483]   Driver: pcieport
Group:  2   0000:00:02.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
Group:  3   0000:00:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
Group:  4   0000:00:03.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge [1022:1483]   Driver: pcieport
Group:  5   0000:00:03.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge [1022:1483]   Driver: pcieport
Group:  6   0000:00:04.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
Group:  7   0000:00:05.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
Group:  8   0000:00:07.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
Group:  9   0000:00:07.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B] [1022:1484]   Driver: pcieport
Group:  10  0000:00:08.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
Group:  11  0000:00:08.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B] [1022:1484]   Driver: pcieport
Group:  12  0000:00:14.0 SMBus [0c05]: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller [1022:790b] (rev 61)   Driver: piix4_smbus
Group:  12  0000:00:14.3 ISA bridge [0601]: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge [1022:790e] (rev 51)
Group:  13  0000:00:18.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse/Vermeer Data Fabric: Device 18h; Function 0 [1022:1440]
Group:  13  0000:00:18.1 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse/Vermeer Data Fabric: Device 18h; Function 1 [1022:1441]
Group:  13  0000:00:18.2 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse/Vermeer Data Fabric: Device 18h; Function 2 [1022:1442]
Group:  13  0000:00:18.3 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse/Vermeer Data Fabric: Device 18h; Function 3 [1022:1443]   Driver: k10temp
Group:  13  0000:00:18.4 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse/Vermeer Data Fabric: Device 18h; Function 4 [1022:1444]
Group:  13  0000:00:18.5 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse/Vermeer Data Fabric: Device 18h; Function 5 [1022:1445]
Group:  13  0000:00:18.6 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse/Vermeer Data Fabric: Device 18h; Function 6 [1022:1446]
Group:  13  0000:00:18.7 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse/Vermeer Data Fabric: Device 18h; Function 7 [1022:1447]
Group:  14  0000:01:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Matisse Switch Upstream [1022:57ad]   Driver: pcieport
Group:  15  0000:02:04.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge [1022:57a3]   Driver: pcieport
Group:  16  0000:02:05.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge [1022:57a3]   Driver: pcieport
Group:  17  0000:02:08.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge [1022:57a4]   Driver: pcieport
Group:  17  0000:05:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Reserved SPP [1022:1485]
Group:  17  0000:05:00.1 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller [1022:149c]   Driver: xhci_hcd
Group:  17  0000:05:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller [1022:149c]   Driver: xhci_hcd
Group:  18  0000:02:09.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge [1022:57a4]   Driver: pcieport
Group:  18  0000:06:00.0 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] [1022:7901] (rev 51)   Driver: ahci
Group:  19  0000:02:0a.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge [1022:57a4]   Driver: pcieport
Group:  19  0000:07:00.0 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] [1022:7901] (rev 51)   Driver: ahci
Group:  20  0000:03:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8125 2.5GbE Controller [10ec:8125] (rev 05)   Driver: r8169
Group:  21  0000:04:00.0 Ethernet controller [0200]: Intel Corporation I211 Gigabit Network Connection [8086:1539] (rev 03)   Driver: igb
Group:  22  0000:08:00.0 PCI bridge [0604]: Intel Corporation Device [8086:4fa1] (rev 01)   Driver: pcieport
Group:  23  0000:09:01.0 PCI bridge [0604]: Intel Corporation Device [8086:4fa4]   Driver: pcieport
Group:  24  0000:09:04.0 PCI bridge [0604]: Intel Corporation Device [8086:4fa4]   Driver: pcieport
Group:  25  0000:0a:00.0 VGA compatible controller [0300]: Intel Corporation DG2 [Arc A380] [8086:56a5] (rev 05)   Driver: i915
Group:  26  0000:0b:00.0 Audio device [0403]: Intel Corporation DG2 Audio Controller [8086:4f92]   Driver: snd_hda_intel
Group:  27  0000:0c:00.0 Serial Attached SCSI controller [0107]: Broadcom / LSI SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] [1000:0072] (rev 03)   Driver: vfio-pci
Group:  28  0000:0d:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Function [1022:148a]
Group:  29  0000:0e:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Reserved SPP [1022:1485]
Group:  30  0000:0e:00.1 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Cryptographic Coprocessor PSPCPP [1022:1486]   Driver: ccp
Group:  31  0000:0e:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller [1022:149c]   Driver: xhci_hcd
 
bump.
I have other VMs (Ubuntu Server and Pop!_OS) also installed on the same WD Blue SSD (VMs_CT_Main). Both are running fine with no issues. Only these two are conflicting for some reason.
 
Hmm, this feels familiar. Question: are you using ZFS on your Proxmox host as well? Even if you aren't, I had a weird situation a few months back.

I was running VMs close to the max capacity of the node. Proxmox ZFS was eating my RAM (I know it is supposed to). The Linux OOM killer was beating ZFS to the punch: the OOM killer on the Proxmox host was killing VMs before ZFS could release resources to support the running VMs.

Considering the TrueNAS VM is also going to use as much RAM as it can, it's always going to be running near the max you have set. Are you running close to the limits of the RAM on the host? The OOM killer score for that TrueNAS VM is likely high. It will be the first to go when the host thinks it's running out of RAM. That might be why other VMs survive but TrueNAS gets smacked.

I also see you have balloon set on TrueNAS. I may be reading that wrong, but does that not mean:

TrueNAS hits the RAM limit you set in the 'balloon:' parameter > Proxmox gives it more, up to what you have set in the 'memory:' parameter > TrueNAS takes more > loop until you hit the set max > etc.
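
One way to check the OOM-score guess is to read the score the kernel exposes for each VM process. The sketch below is only an illustration: `/proc/<pid>/oom_score` is a standard Linux file, but the pidfile location `/var/run/qemu-server/<vmid>.pid` is an assumption about the Proxmox host, and the helper names are made up for this example.

```python
from pathlib import Path


def read_oom_score(pid, proc_root="/proc"):
    """Return the kernel's current OOM score for a process (higher = killed sooner)."""
    return int(Path(proc_root, str(pid), "oom_score").read_text().strip())


def vm_pid(vmid, run_dir="/var/run/qemu-server"):
    """Read a running VM's PID from its pidfile (the path is an assumption)."""
    return int(Path(run_dir, f"{vmid}.pid").read_text().strip())


def rank_by_oom_score(vmids, proc_root="/proc", run_dir="/var/run/qemu-server"):
    """Return (vmid, score) pairs sorted so the most likely OOM victim comes first."""
    scores = [(vmid, read_oom_score(vm_pid(vmid, run_dir), proc_root)) for vmid in vmids]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

Run as root on the host, `rank_by_oom_score([100, 102])` would compare the TrueNAS VM (100) with the Windows VM (102) from the configs above. Both VMs need to be running, since a stopped VM has no pidfile to read.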
 
Thanks for the reply. I am using a separate SSD, an old 128GB Samsung 840 Pro, with an ext4 filesystem for the Proxmox OS. The WD Blue SSD (VMs_CT_Main) where the TrueNAS and Windows VMs are installed is also on ext4. The HDDs connected to the Dell H310 HBA are running ZFS (1x mirror and 2x RAIDz1). The 8 HDDs in the TrueNAS VM currently hold ~60TB of storage. It is indeed TrueNAS's nature to use as much memory as it can (iirc iXsystems recommends 1GB of RAM for every TB). My RAM allocation is as follows:
TrueNAS VM: ballooning enabled, minimum 32768 MiB, maximum 49152 MiB
Windows VM: ballooning enabled, minimum 4096 MiB, maximum 8192 MiB
Ubuntu Server VM: ballooning enabled, minimum 4096 MiB, maximum 8192 MiB
Pop!_OS VM: ballooning enabled, minimum 4096 MiB, maximum 8192 MiB

This is what my TrueNAS VM memory usage is like:
[Attached screenshot: 1689740441237.png]

My System
CPU = AMD Ryzen 9 5900X
RAM = Corsair Vengeance 64GB (2x32GB) DDR4-3600 CL18
 
OK, no Proxmox ZFS. Cool. I still think we are onto something here.

Alright, some quick napkin math:

64GB total - 48GB TrueNAS = 16GB left for everything else.

Let's say the other 3 VMs run at 4GB (the minimum) all the time: 16GB remaining - 12GB (3 VMs) = 4GB left.

Proxmox needs RAM to run too. I think you are way too close to the limit of the host.

As soon as any of those VMs goes a tad over its minimum assigned RAM, you are out of RAM. The host is going to start killing processes (your VMs). TrueNAS goes first every time because of its OOM killer score.

Windows also loves RAM, so it's going to creep to 8GB way faster than the 2 Linux boxes.
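
The napkin math can be written out as a small calculation. This is only a sketch using the figures posted in this thread; it treats the 64GB of installed RAM as 64 GiB and ignores whatever the Proxmox host itself needs, so the real headroom is smaller. The function name is hypothetical.

```python
GIB_IN_MIB = 1024

# (name, minimum MiB, maximum MiB) as posted in the thread.
VMS = [
    ("TrueNAS", 32768, 49152),
    ("Windows", 4096, 8192),
    ("Ubuntu Server", 4096, 8192),
    ("Pop!_OS", 4096, 8192),
]
HOST_RAM_MIB = 64 * GIB_IN_MIB  # assumption: 64GB installed counted as 64 GiB


def headroom_mib(host_mib, reserved_mib):
    """RAM left on the host after subtracting every VM's reservation."""
    return host_mib - sum(reserved_mib)


# Scenario from the reply: TrueNAS at its maximum, the other three at minimum.
napkin = headroom_mib(HOST_RAM_MIB, [49152, 4096, 4096, 4096])

# Worst case: every VM grows to its configured maximum.
worst = headroom_mib(HOST_RAM_MIB, [maximum for _, _, maximum in VMS])

print(f"napkin scenario: {napkin / GIB_IN_MIB:.0f} GiB left")
print(f"all VMs at maximum: {worst / GIB_IN_MIB:.0f} GiB left")
```

A negative result means the configured maximums add up to more than the host has, which is the overcommit situation described above.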
 
Will try experimenting with RAM allocation on my VMs. Will update this thread on what I find. Thanks again for your replies.
 

Just an update: your hypothesis turned out to be correct. I've been experimenting with RAM allocations with what you said in mind. It turns out that as long as I don't run all 4 VMs at once, I don't experience any issues. With the Pop!_OS VM off, I started and shut down the Windows VM without affecting the TrueNAS VM at all. It makes a lot of sense why these two VMs in particular were conflicting, since they use the most memory and so push the host out of memory. I think I might buy an additional 64GB since DDR4 is quite cheap anyway. Thanks again for your help.
 
Thanks for reporting back. Note that any VM with PCI(e) passthrough will pin all VM memory into actual host RAM (because of possible device initiated DMA) and ballooning won't work for such VMs. Therefore, your TrueNAS will always use 48GiB.
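
To illustrate the point about passthrough, here is a rough sketch that reads `qm config` output and estimates how much host RAM a VM is guaranteed to occupy. The rule it applies (any `hostpciN` entry means the full `memory:` value is pinned, otherwise the `balloon:` minimum counts) is a simplification of the note above, and the function names are hypothetical.

```python
def parse_qm_config(text):
    """Parse `qm config` output ("key: value" per line) into a dict."""
    config = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if sep:  # skips prompts and other lines without a "key: value" pair
            config[key.strip()] = value.strip()
    return config


def guaranteed_host_ram_mib(config):
    """Estimate the host RAM (MiB) this VM holds regardless of ballooning."""
    memory = int(config["memory"])
    balloon = int(config.get("balloon", 0))
    has_passthrough = any(key.startswith("hostpci") for key in config)
    # Assumption: passthrough pins all VM memory; balloon 0 or unset means fixed memory.
    if has_passthrough or balloon == 0:
        return memory
    return balloon
```

Fed the two configs from the first post, this would report 49152 MiB for the TrueNAS VM (because of `hostpci0`) and 4096 MiB for the Windows VM.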
 
Thanks for the useful info.
 
