Yes, I've been able to resolve it via Proxmox support.
Basically I had installed a package (vlan) via APT which removed the "proxmox-ve" package and prevented pve kernels from being updated.
After I reinstalled the "proxmox-ve" package, did an apt dist-upgrade and rebooted, everything started working...
No luck. :(
VMs still experience lockups and I had to create a script that runs every 5 minutes, pings the guest agent and power-cycles the VM when it doesn't reply. It's not the best solution, I know, but at least this reduces the downtime.
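In case it helps anyone, this is roughly the idea of the script (a simplified sketch; the VM IDs and timeout are placeholders, and using `qm reset` for the power cycle is my own choice, adapt as needed):

```python
#!/usr/bin/env python3
# Rough sketch of the watchdog, run every 5 minutes.
# Assumptions: the QEMU guest agent is enabled in the guests and the
# VM IDs below are placeholders for my three Windows Server 2016 VMs.
import subprocess

VMIDS = [101, 102, 103]   # hypothetical VM IDs
AGENT_TIMEOUT = 30        # seconds to wait for the agent to answer

def agent_replies(vmid):
    """Ping the guest agent; non-zero exit code or timeout = no reply."""
    try:
        result = subprocess.run(
            ["qm", "guest", "cmd", str(vmid), "ping"],
            capture_output=True, timeout=AGENT_TIMEOUT)
        return result.returncode == 0
    except subprocess.TimeoutExpired:
        return False

for vmid in VMIDS:
    if not agent_replies(vmid):
        # Hard power cycle of the locked-up VM
        subprocess.run(["qm", "reset", str(vmid)], check=False)
```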
Digging into the forum I discovered a few posts which could be related to some APIC issues.
which were both resolved by disabling some power...
Don't know if this could help, but I found some errors in the event log:
Bugcheck 0x00000109 (CRITICAL_STRUCTURE_CORRUPTION) with 0x17 as the 4th parameter means "Local APIC modification".
Looks like the system behaves better, but lockups still happen, though less frequently.
I really don't know what to do next.
I exclude any hardware defect because:
- only Windows Server 2016 VMs are affected
- the iDRAC on the server does not report any issue
For a few days now we've been experiencing frequent lockups on ALL Windows Server 2016 VMs (I've got 3).
These VMs have been running without any issue for months, but now they're all experiencing the same problem.
As you can see from the attached screenshot, the CPU spikes at 100% and there's no way to...
I have the following scenario: a backup task of 5 VMs runs at 09:00 PM and, for simplicity, let's assume each backup is 10 GB. The backup storage is set to keep 5 copies, and at the moment there are already 5 copies, so at the next backup the oldest should be deleted...
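To put numbers on it, this is the rough space math I'm worried about (the prune timing is just my assumption, I don't know exactly when the oldest copy gets deleted):

```python
# Back-of-the-envelope space check for the scenario above.
vms = 5
size_gb = 10    # simplified size of one backup
keep = 5        # copies kept on the backup storage

steady = vms * size_gb * keep                 # 250 GB already present

# if each VM's oldest copy is pruned right after its new backup finishes
peak_prune_per_vm = steady + size_gb          # 260 GB

# if nothing is pruned until all 5 new backups are written
peak_prune_at_end = steady + vms * size_gb    # 300 GB

print(steady, peak_prune_per_vm, peak_prune_at_end)
```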
I have the following configuration on my DELL VRTX:
3x M630 blades attached through a DAS with a Shared PERC8 RAID controller in multipath failover configuration (as described in the DELL documentation).
Thanks for your reply.
I can tell for sure that nobody had directly modified that file... and neither had I, of course.
Anyway, when I was installing the server, I restored that VM from backup onto the SAS storage and then moved the 50GB disk to SSD storage through the UI -> Move disk .. no...
At the beginning of this week, I deployed a new PVE 4.4-13 cluster on a DELL VRTX with 3 blades, with HA active for almost every VM, and everything was working flawlessly. Storage is LVM, of course.
Just one hour ago, for some unknown reason, one (or more? I didn't get...
So it seems that the overall Ceph performance is not as great as one may think...
I mean: can I compare, e.g., a 3-node Ceph cluster with 2 OSDs (15k SAS 600 GB) + 1 journaling SSD per node against a dedicated FC/iSCSI SAN with the same number and kind of disks? I think the SAN would win here in...
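Just to show the naive arithmetic behind my doubt (assuming a replicated pool with size=3, journals on the per-node SSDs, and ~175 random write IOPS per 15k SAS spindle as a ballpark figure):

```python
# Naive write-penalty estimate for the small Ceph cluster above.
osd_disks = 6          # 3 nodes x 2 OSDs
iops_per_disk = 175    # ballpark for a 15k SAS spindle
replication = 3        # assumed replicated pool size

raw_write_iops = osd_disks * iops_per_disk        # ~1050
client_write_iops = raw_write_iops / replication  # ~350, before any other overhead

print(f"~{client_write_iops:.0f} client write IOPS out of {raw_write_iops} raw")
```

A SAN built from the same 6 spindles wouldn't pay that 3x replication penalty on writes, which is why I suspect it would come out ahead on raw numbers.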