[SOLVED] Extremely Laggy VMs on PowerEdge R720xd

ewrich

Jan 6, 2025
Hello All,

I've been working on my homelab for a while and have set up both an R720xd and an R730 in a Proxmox Datacenter setup. On the R730, any Windows VM I throw at it runs well with no issues. On the R720, however, VMs run terribly, around 2-10 fps over RDP or in the Proxmox console. The server itself shows no overload on CPU, RAM, or anything else, and the configuration is the same on both servers. It seems to get even worse if I pass a GPU through to one of the VMs. I know the R730 has somewhat better specs, but the R720 shouldn't be running this badly. Does anyone have any ideas?

Specs

R720XD - 2x Intel Xeon E5-2690 V2, 64 GB DDR3 RAM, 1 TB WD Blue SSD as boot drive, RAID 5 array of three 500 GB Samsung SSDs for VM storage, RAID 0 array of two 500 GB HDDs for NAS, GTX 960

R730 - 2x Intel Xeon E5-2697 V4, 316 GB DDR4 RAM, 1 TB WD Blue SSD as boot drive, RAID 5 array of three 500 GB Samsung SSDs for VM storage, RAIDZ2 array on a 48 TB JBOD, RTX 3060
 
Hello ewrich! Just to be sure: is it only a video issue, or do you have general performance issues as well?

I would check the following:
  1. Make sure that virtualization (Intel VT-x and related options) is turned on in the BIOS.
  2. KVM hardware virtualization should be turned on in the VM settings.
  3. In case of Windows VMs, you should install all VirtIO drivers and guest tools, as recommended by the documentation.
  4. If none of the above helps and you don't need to migrate the VM to another machine, you can also try setting the VM's CPU type to host, which exposes all of the host CPU's features to the VM for improved performance.
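For item 1, one host-side sanity check is to look for the `vmx` flag (Intel's VT-x CPU flag) in `/proc/cpuinfo` and to confirm that `/dev/kvm` exists. The flag reflects what the CPU advertises, while `/dev/kvm` typically only appears once the KVM kernel modules have loaded successfully, so checking both is a reasonable first pass. This is a minimal sketch assuming a Linux host such as the Proxmox node itself; `cpu_flags` and `virtualization_ready` are hypothetical helper names, not part of Proxmox:

```python
import os

def cpu_flags(cpuinfo_text):
    """Return the set of CPU flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        # Lines look like "flags\t\t: fpu vme de pse ... vmx ..."
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()

def virtualization_ready(cpuinfo_text, kvm_device="/dev/kvm"):
    """True if the CPU advertises Intel VT-x (vmx) and the KVM device node exists."""
    return "vmx" in cpu_flags(cpuinfo_text) and os.path.exists(kvm_device)
```

On the node, `virtualization_ready(open("/proc/cpuinfo").read())` should return `True` when VT-x is usable by KVM.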
 
A few options to try on the R720 while the VMs are actively running:
1) In the shell, run "dmesg" to see whether the server is complaining about memory, disk problems, etc.
2) Run "ifconfig <your-nic-interface-name>" and look for TX and RX errors.
3) If you have a managed switch, check the R720's uplink ports on the switch for the same errors.
4) Try running an LXC container, just for comparison.
5) Log into your iDRAC and run the hardware diagnostics from the Lifecycle Controller.
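For step 2, the kernel exposes per-interface RX/TX error counters in `/proc/net/dev`, and `ifconfig` displays these same counters. A minimal Python sketch that pulls out just the error columns, assuming the standard `/proc/net/dev` layout (two header lines, then 8 receive and 8 transmit columns per interface); `interface_errors` is a hypothetical helper and `enp3s0` in the test data is a made-up interface name:

```python
def interface_errors(net_dev_text):
    """Parse /proc/net/dev text into {iface: (rx_errs, tx_errs)}."""
    errors = {}
    for line in net_dev_text.splitlines()[2:]:  # first two lines are column headers
        if ":" not in line:
            continue
        iface, counters = line.split(":", 1)
        fields = counters.split()
        # Receive:  bytes packets errs drop fifo frame compressed multicast  (fields 0-7)
        # Transmit: bytes packets errs drop fifo colls carrier compressed    (fields 8-15)
        errors[iface.strip()] = (int(fields[2]), int(fields[10]))
    return errors
```

Running `interface_errors(open("/proc/net/dev").read())` a few times while a VM is lagging shows whether the error counts are climbing.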
 
Thanks for the ideas. I've been going over them.

I'm pretty sure it's a performance issue and not a video issue. Programs lock up and freeze during the lag, even though Task Manager and Proxmox show the VM using nowhere near 100% of its resources. Intel VT-x/virtualization is enabled in the BIOS, KVM hardware virtualization is enabled for all VMs, and VirtIO drivers are installed on all of them. Interestingly, after migrating the VMs from the R720 to the R730, they run perfectly fine. Running a few Debian terminal instances in LXC on the R720 also works without any issues.

I ran dmesg and reviewed the logs. There were only a few errors flagged in red:

  • "Warning: Unqualified SFP+ module detected": This is related to the SFP+ card I'm using for internet.
  • ACPI errors (detailed below): These might be unrelated, but it's the only other thing that showed up as an error.
In addition, iDRAC is reporting recurring drive issues once a month for the three SATA drives in the system (two in RAID 0 and one as the host drive). However, the VMs aren’t installed on these drives. Here's a sample log:

Fri Dec 27 2024 18:27:27 Drive 2 in disk drive bay 1 is operating normally.
Fri Dec 27 2024 18:27:27 Drive 0 in disk drive bay 1 is operating normally.
Fri Dec 27 2024 18:27:24 Drive 1 in disk drive bay 1 is operating normally.
Fri Dec 27 2024 18:16:21 Fault detected on drive 2 in disk drive bay 1.
Fri Dec 27 2024 18:16:21 Fault detected on drive 0 in disk drive bay 1.
Fri Dec 27 2024 18:16:19 Fault detected on drive 1 in disk drive bay 1.
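The iDRAC entries above come in fault/recovery pairs, so it can help to measure how long each drive stayed flagged (for example, to line the faults up against the times the VMs lag). A minimal Python sketch; `fault_durations` is a hypothetical helper, and the regexes assume the exact message wording shown in this log, which may differ on other iDRAC versions:

```python
import re
from datetime import datetime

TIME_FMT = "%a %b %d %Y %H:%M:%S"  # e.g. "Fri Dec 27 2024 18:27:27"
FAULT = re.compile(r"^(.+?) Fault detected on drive (\d+) in disk drive bay (\d+)\.$")
OK = re.compile(r"^(.+?) Drive (\d+) in disk drive bay (\d+) is operating normally\.$")

def fault_durations(log_text):
    """Return {(bay, drive): seconds from a fault to the next 'operating normally' entry}.

    If a drive faults more than once, only its latest fault/recovery pair is kept.
    """
    events = []
    for line in log_text.strip().splitlines():
        for kind, pattern in (("fault", FAULT), ("ok", OK)):
            m = pattern.match(line.strip())
            if m:
                when = datetime.strptime(m.group(1), TIME_FMT)
                events.append((when, kind, (int(m.group(3)), int(m.group(2)))))
    events.sort()  # iDRAC lists newest first; process oldest first
    open_faults, durations = {}, {}
    for when, kind, key in events:
        if kind == "fault":
            open_faults[key] = when
        elif key in open_faults:
            durations[key] = (when - open_faults.pop(key)).total_seconds()
    return durations
```

For the sample log above, each of the three drives in bay 1 returns to normal roughly 11 minutes after its fault.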

Here are the ACPI errors from dmesg:
[ 11.595846] ACPI Error: AE_NOT_EXIST, Returned by Handler for [IPMI] (20230628/evregion-296)
[ 11.595857] ACPI Error: Region IPMI (ID=7) has no handler (20230628/exfldio-261)
[ 11.595872] No Local Variables are initialized for Method [_GHL]
[ 11.595874] No Arguments are initialized for method [_GHL]
[ 11.595877] ACPI Error: Aborting method \_SB.PMI0._GHL due to previous error (AE_NOT_EXIST) (20230628/psparse-529)
[ 11.595888] ACPI Error: Aborting method \_SB.PMI0._PMC due to previous error (AE_NOT_EXIST) (20230628/psparse-529)
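Scanning a long `dmesg` output by eye is error-prone, so a simple keyword filter can surface lines like the ACPI errors above. A minimal Python sketch; `suspicious_lines` is a hypothetical helper and the keyword list is an assumption to adjust for your own hardware:

```python
import re

# Matches dmesg lines such as "[   11.595846] ACPI Error: ..."
DMESG_LINE = re.compile(r"^\[\s*(\d+\.\d+)\]\s*(.*)$")
# Hypothetical keyword list; extend it as needed
KEYWORDS = ("error", "fail", "fault", "i/o", "timeout")

def suspicious_lines(dmesg_text, keywords=KEYWORDS):
    """Yield (seconds_since_boot, message) for dmesg lines containing any keyword."""
    for line in dmesg_text.splitlines():
        m = DMESG_LINE.match(line.strip())
        if not m:
            continue  # skip lines without the "[timestamp]" prefix
        message = m.group(2)
        if any(k in message.lower() for k in keywords):
            yield float(m.group(1)), message
```

The input can come from `subprocess.run(["dmesg"], capture_output=True, text=True).stdout`; on hosts with `kernel.dmesg_restrict` enabled this needs root, which the Proxmox shell normally is.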


I'm still waiting for some downtime so I can run the diagnostics. Now that I'm looking closer, I also speed-tested the RAID 5 array that the VMs use for storage and got about 27k IOPS, so I don't believe it's drive-related.
 
So, I made a mistake: I had misunderstood which drives the VMs were installed on. They were actually all on an old SAS HDD RAID 5 array that, after testing, could only manage 290 IOPS. I've since moved them to the drives they were supposed to be on, and they run fine now. Thank you all!
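A quick way to catch this kind of mix-up is to check where each VM's disks actually live. On Proxmox VE, a VM's configuration is stored in `/etc/pve/qemu-server/<vmid>.conf` (and printed by `qm config <vmid>`), with disk lines such as `scsi0: local-lvm:vm-100-disk-0,size=32G`, where the part before the first colon in the value is the storage ID. The Python sketch below assumes that line format; `disk_storages` is a hypothetical helper and the storage names in the test data are made up:

```python
import re

# Disk-type keys in a Proxmox VE VM config, e.g. "scsi0", "virtio1", "sata0", "ide0", "efidisk0"
DISK_KEY = re.compile(r"^(scsi|virtio|sata|ide|efidisk|tpmstate)\d+$")

def disk_storages(config_text):
    """Map each disk key in the current VM config to its storage ID (or raw device path)."""
    result = {}
    for raw in config_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            break  # snapshot/pending sections ("[name]") follow the current config; ignore them
        key, sep, value = line.partition(":")
        key = key.strip()
        if not sep or not DISK_KEY.match(key) or "media=cdrom" in value:
            continue  # not a disk line, or a CD-ROM drive
        volume = value.strip().split(",", 1)[0]  # e.g. "local-lvm:vm-100-disk-0"
        if volume.startswith("/"):
            result[key] = volume  # passed-through physical disk, no storage ID
        elif ":" in volume:
            result[key] = volume.split(":", 1)[0]  # storage ID before the first colon
    return result
```

For example, `disk_storages(open("/etc/pve/qemu-server/101.conf").read())` (VMID 101 is hypothetical) lists the storage behind each disk, which can then be matched against the storages listed under Datacenter > Storage in the web UI.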