VM disturbing the whole machine

greg

Greetings
I've been using Proxmox for years, mainly for CTs, with a few VMs.
Recently I started developing for Docker, so I created various VMs and CTs for testing.

The last VM I created is problematic: when it's running, the whole machine (host + other guests) is slow and shows messages like:

Code:
NMI watchdog: BUG:soft lockup - CPU#2 stuck for 21s!

The problematic VM itself shows weird messages:
Code:
rcu_sched detected stalls on CPUs/tasks:
6-...0: (1 GPs behind) idle=52a/1/0x4000000000 softirq=2573/2574 fqs=2506
(detected by 3, t=5252 jiffies, g=3089, q=271)
rcu_sched kthread starved for 1366 jiffies! g3089 f0x0 RCU_GP_DOING_FQS(6) ->state=0x0 ->cpu=0

There's nothing running in it besides the Docker daemon. The "Summary" page shows low CPU usage, low memory usage and low network traffic.

I'm puzzled... any idea?

Thanks in advance

Regards
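
Editor's note: to see whether the soft lockups hit one CPU or all of them, the kernel log can be tallied per CPU. Below is a minimal sketch, assuming the log text has already been captured (for example from `dmesg` or the syslog file); the function name `count_soft_lockups` is made up for illustration.

```python
import re
from collections import Counter

# Matches lines such as "NMI watchdog: BUG: soft lockup - CPU#2 stuck for 21s!"
# The optional whitespace after "BUG:" tolerates the spelling quoted in this thread.
LOCKUP_RE = re.compile(r"BUG:\s*soft lockup - CPU#(\d+) stuck for (\d+)s")

def count_soft_lockups(log_text):
    """Return {cpu: (number_of_lockups, longest_stall_in_seconds)}."""
    counts = Counter()
    longest = {}
    for match in LOCKUP_RE.finditer(log_text):
        cpu, secs = int(match.group(1)), int(match.group(2))
        counts[cpu] += 1
        longest[cpu] = max(longest.get(cpu, 0), secs)
    return {cpu: (counts[cpu], longest[cpu]) for cpu in counts}
```

Lockups spread evenly over all CPUs would point more towards a host-wide stall than towards one misbehaving core, but that is only a rough heuristic.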
 
Maybe it is saturating your storage? That would explain why everything else is slow.
 
Thanks for your answer. Storage has 200G left. Would that be a possible explanation for my problem? When the VM is not running, everything works fine.
 
No, I meant that the VM may be generating a high I/O load (i.e. reading and writing a lot), so that the disk is saturated.
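
Editor's note: one way to check for disk saturation on the host is to sample `/proc/diskstats` twice and look at how much of the interval each device spent busy. This is a rough sketch, not a replacement for a proper tool; the helper names are illustrative, and it relies on the 13th field of each line being the milliseconds spent doing I/O.

```python
import time

def parse_io_ticks(diskstats_text):
    """Map device name -> milliseconds spent doing I/O (13th field of each line)."""
    ticks = {}
    for line in diskstats_text.splitlines():
        fields = line.split()
        if len(fields) >= 13:
            ticks[fields[2]] = int(fields[12])
    return ticks

def utilization(before, after, interval_s):
    """Percent of the interval each device was busy, from two snapshots."""
    return {
        dev: 100.0 * (after[dev] - before[dev]) / (interval_s * 1000.0)
        for dev in after
        if dev in before
    }

def sample(interval_s=5.0, path="/proc/diskstats"):
    """Take two snapshots interval_s seconds apart and return per-device busy %."""
    with open(path) as f:
        before = parse_io_ticks(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        after = parse_io_ticks(f.read())
    return utilization(before, after, interval_s)
```

Calling `sample()` on the host while the VM is running and again while it is stopped gives a direct comparison. Values close to 100% on the pool's member disks would support the saturation theory.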
 
Not much else it could be then... anything in the guest/host logs?
 
Only the weird CPU/irq message I posted.
I'm trying to free a lot of space to test whether space is the problem. I know ZFS deals poorly with low disk space.
 
It seems you were right: I carelessly let the ZFS pool go beyond 80% full, and that was probably the root cause. After a lot of cleanup, things seem to be back to normal. Thanks a lot!!
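
Editor's note: pool fill level can be checked from a script. The sketch below assumes the `zpool` command is available on the host and uses `zpool list -Hp -o name,size,allocated` (tab-separated, exact byte values). The 80% threshold is simply the figure mentioned in this thread, and the function names are illustrative.

```python
import subprocess

def parse_zpool_list(text):
    """Parse `zpool list -Hp -o name,size,allocated` output into {pool: percent_used}."""
    usage = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or unexpected lines
        name, size, allocated = parts
        usage[name] = 100.0 * int(allocated) / int(size)
    return usage

def pools_over(usage, threshold=80.0):
    """Names of pools whose used percentage exceeds the threshold."""
    return sorted(name for name, pct in usage.items() if pct > threshold)

def check_pools(threshold=80.0):
    """Run zpool on this host and return the pools above the threshold."""
    out = subprocess.run(
        ["zpool", "list", "-Hp", "-o", "name,size,allocated"],
        check=True, capture_output=True, text=True,
    ).stdout
    return pools_over(parse_zpool_list(out), threshold)
```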
 
Well, it wasn't the cause... I now have 2T of free disk space, the VM is using 1G of the 3G allowed, server load is between 0 and 4 (for 12 cores), and I see this kind of thing:

Jun 1 17:18:24 dkr06 kernel: [10071.847256] clear_huge_page+0x110/0x200
Jun 1 17:19:43 dkr06 kernel: [10150.892113] clear_subpage+0x3b/0x50
Jun 1 17:19:43 dkr06 kernel: [10150.892114] clear_huge_page+0x110/0x200
Jun 1 17:19:43 dkr06 kernel: [10150.892114] do_huge_pmd_anonymous_page+0x1b5/0x740
Jun 1 17:19:43 dkr06 kernel: [10150.892114] __handle_mm_fault+0xdba/0x1270
Jun 1 17:19:43 dkr06 kernel: [10150.892115] handle_mm_fault+0xd6/0x200
Jun 1 17:19:43 dkr06 kernel: [10150.892115] __do_page_fault+0x249/0x4f0
Jun 1 17:19:43 dkr06 kernel: [10150.892115] ? async_page_fault+0x8/0x30
Jun 1 17:19:43 dkr06 kernel: [10150.892115] async_page_fault+0x1e/0x30
Jun 1 17:19:43 dkr06 kernel: [10150.892116] RIP: 0033:0x7f4ca5351346
Jun 1 17:19:43 dkr06 kernel: [10150.892116] Code: 00 00 00 0f 18 8e c0 00 00 00 0f 10 06 0f 10 4e 10 0f 10 56 20 0f 10 5e 30 48 83 c6 40 48 83 ea 40 66 0f e7 07 66 0f e7 4f 10 <66> 0f e7 57 20 66 0f e7 5f 30 48 83 c7 40 48 83 fa 40 77 be 0f ae
Jun 1 17:19:43 dkr06 kernel: [10150.892117] RSP: 002b:00007ffd9a10a4b8 EFLAGS: 00010202
Jun 1 17:19:43 dkr06 kernel: [10150.892117] RAX: 00007f4c8aa00018 RBX: 00007f4c8aa00000 RCX: 00007f4c8f0297b1
Jun 1 17:19:43 dkr06 kernel: [10150.892118] RDX: 00000000008297a1 RSI: 00007f4c93000008 RDI: 00007f4c8e7fffe0
Jun 1 17:19:43 dkr06 kernel: [10150.892118] RBP: 00007f4ca2212050 R08: fffffffffffffff8 R09: 0000000000000000
Jun 1 17:19:43 dkr06 kernel: [10150.892118] R10: 00007f4c8f0297c1 R11: 00007f4c8aa00018 R12: 00000000046297a9
Jun 1 17:19:43 dkr06 kernel: [10150.892119] R13: 00007f4c8f200000 R14: 00007f4ca2211ff0 R15: 00007f4c97b07290


What could I have done wrong for this VM to be able to bring the whole system down???
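
Editor's note: the trace above passes through `do_huge_pmd_anonymous_page` and `clear_huge_page`, which are part of the kernel's transparent huge page (THP) fault path. That does not prove THP is the culprit, but it makes the host's THP settings worth a look. A small sketch, assuming the usual sysfs location; the helper names are illustrative.

```python
import re

THP_DIR = "/sys/kernel/mm/transparent_hugepage"

def active_choice(sysfs_value):
    """Return the bracketed (active) option, e.g. 'madvise' from 'always [madvise] never'."""
    match = re.search(r"\[([^\]]+)\]", sysfs_value)
    return match.group(1) if match else None

def read_thp_settings(base=THP_DIR):
    """Read the active 'enabled' and 'defrag' THP modes on this host."""
    settings = {}
    for name in ("enabled", "defrag"):
        with open(f"{base}/{name}") as f:
            settings[name] = active_choice(f.read())
    return settings
```

Running `read_thp_settings()` on both the working and the problematic host would show whether they differ in this respect.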
 
Interesting follow-up.

I moved a working VM to the host where the problem occurred, and the VM is now running erratically. So it's probably a host-related problem.

Both hosts are running PVE 5.4-15. The working one has an Intel Xeon E3-1245 v2 @ 3.4 GHz, the problematic one an Intel Xeon E5-1650 v2 @ 3.5 GHz.

Could it be a BIOS configuration issue?
 
As further testing, I moved the problematic VM to another node, and it's working.
So it's very likely a host-related issue.

Any idea??
 
