I've got a situation right now where one VM, upon reboot, encountered filesystem corruption.
Upon closer examination, I see that the Proxmox GUI doesn't accurately reflect which cluster node that VM is running on.
The GUI shows VM #107 running on node #2, whereas the KVM process with -id 107 is...
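For anyone wanting to check the same thing on their own nodes, here is a minimal sketch that pulls VM IDs out of process command lines by looking for the `-id N` argument mentioned above. The function name is made up for illustration, and the usage line assumes the VM processes have "kvm" in their command line.

```shell
# Read process command lines on stdin and print the VM IDs found in
# "-id N" arguments, one per line, sorted and de-duplicated.
# (vmids_from_cmdlines is a hypothetical helper name.)
vmids_from_cmdlines() {
    grep -oE '(^| )-id [0-9]+' | awk '{ print $NF }' | sort -n -u
}

# Usage on each node (assumes the VM processes contain "kvm" in their args):
# ps -eo args | grep '[k]vm' | vmids_from_cmdlines
```

Running this on every node and comparing the output with what the GUI shows should reveal which node a given VM ID is really running on.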
On a fairly regular basis, but unpredictably, when I attempt to migrate a VM from one host to another, I get failures with no useful error information.
Digging deeper reveals that the problem is at the rgmanager level somehow, with the VM resources being marked as "failed" even though they're...
Workaround: if I disable the BMC's local IPMI interface, the system boots fine. IPMI-LAN still works OK but the local channel breaks. Possibly the IPMI card is starting to fail?
I noticed that one server in my cluster - the one I typically test updates on first - has been down for a little while.
(Kudos to Proxmox for making a technology where 25% of my cluster has been down for days or possibly weeks without me even noticing.)
The reason it's dead in the water is...
Short answer: don't use vmbr with balance-xor. Most of the other modes will work fine. LACP (aka 802.3ad) definitely does work correctly, but that requires configuration in your next-hop (hardware) switch. Active-passive will definitely work correctly. Balance-arp *should* but I've never...
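To see which mode a bond is actually running in, something along these lines should work. It is only a sketch: it assumes the Linux bonding driver's sysfs `mode` file (which holds a name and a number, such as `balance-xor 2`), and `bond0` and both function names are placeholders.

```shell
# Print the mode name of a bond, given the path to its sysfs "mode" file.
# The file holds a name and a number, e.g. "balance-xor 2".
bond_mode() {
    awk '{ print $1; exit }' "$1"
}

# Print a warning if the given mode file reports balance-xor.
warn_if_xor() {
    if [ "$(bond_mode "$1")" = "balance-xor" ]; then
        echo "warning: $1 reports balance-xor"
    fi
}

# Usage (bond0 is an assumed interface name):
# warn_if_xor /sys/class/net/bond0/bonding/mode
```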
I've got a four-node PVE cluster running CEPH (locally, on the PVE nodes).
I see from other posts that in the uncomplicated case, all I would have to do is shut down the nodes, change their IP addresses (/etc/network/interfaces), and reboot them.
But what do I do about nodes that are also...
A scaled-out CEPH cluster inherently uses many, many paths, but not "multipath" as in having redundant paths
You've been very lucky, then.
I see - across all the switches and servers I'm at least partly responsible for - roughly a 1% network-component failure rate per year per installation...
You don't have to set IP addresses at all on ethX or vmbrX interfaces that you aren't using for IP - just leave the address field blank and ensure the auto start check box is turned on.
Sorry, but what you're asking for is so far outside the scope of any virtualization hypervisor, that I don't think you'll ever get a solution, at least not for this exact configuration.
Dietmar already explained the fundamental problem: there's no way to mirror the state of non-virtual hardware...
Regardless, you're (almost certainly) describing a network problem, not a PVE problem. You should find out what causes this behaviour before running production VMs on that network.
-Adam
Yes, I know.
I did not ask for a solution to my migration issue, I asked for suggestions on how to troubleshoot. I am not a QEMU expert, so I am not sure what to look at next, or even how to obtain debug logs.
I already know how to get this information under VMware, but I did not choose VMware...
I actually had to reboot Node#1 to clear the wedged KVM process; qm commands would all fail with some error about being unable to connect. I wound up editing the VM config file and deleting "Locked: migration". (Going from memory, not necessarily an exact quote.)
Clicking "Stop" to cancel the migration produces no additional log output, but the status changes to "stopped: unexpected status".
Even better, all other attempts to control the VM on the original node now report "Error: VM is locked (migrate)".
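A rough sketch of the manual route for a stale migration lock: print the VM config with its lock line removed, so the result can be reviewed before anything is changed. This assumes the lock is stored as a line starting with `lock:` in the VM's config file; the function name and the path in the usage line are assumptions, and `$VMID` is a placeholder.

```shell
# Print a VM config file with any "lock:" line removed.
# Writes to stdout only; the original file is left untouched.
# (config_without_lock is a hypothetical helper name.)
config_without_lock() {
    sed '/^lock:/d' "$1"
}

# Usage (assumed config location; set VMID to the affected VM's ID):
# config_without_lock "/etc/pve/qemu-server/$VMID.conf"
```

Where the qm tool is still responsive, `qm unlock <vmid>` is the less invasive way to clear a lock.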
I've got a PVE 3.1 cluster up and running now, using sheepdog for shared storage. (I've tried on five separate occasions and have yet to successfully build a CEPH cluster, so I just gave up. The whole point, for me, is to run storage and VMs on the same nodes!)
Everything's updated to...
Do you need to use PCI passthrough for performance?
If not, stop using it and create a new bridge that connects only that card to only that VM, and migration will then be allowed.
Generally speaking, I have never thought it made any sense to virtualize a system, and then tie it to a specific...
Obvious solution:
# apt-get install screen
# screen
(then, from inside screen:)
# wget http://whatever
If you get disconnected for any reason, just log back in and run "screen -R" to reconnect to the running session.
Run
# man screen
for more information.
However... if you can't log in, how...
Sorry for the late addition to this discussion, but from the networking side there's one huge problem with the multiple-independent-switches theory of balance-rr. All the switches must be interconnected, since balance-rr assumes a common FIB at the switching layer.
In theory, if every single...
*sigh* This particular server no longer has internet access. I'll admit that 50% of the reason I bought the license was to get rid of the nag screen. (The other 50% is that the product is worth the small amount of $$.)
Where do I configure PVE to use an HTTP proxy?
[Never mind - I just found...
Found it. That server was pointing to a DNS server that no longer exists, so it thought "shop.maurer-it.com" didn't exist.
This, however, does mean that I cannot run PVE in an isolated environment (i.e. no internet access) and have licenses function???
I'm OK with the nagware screen (you...