Okay, good to see I'm not the only one. Better not to restart containers until this is fixed, then. Really bad issue. Luckily I have a second host that I haven't converted to OpenVSwitch yet, so at least I could start the container there. But it's not an ideal situation.
I think the multi-bridge setup is way too complicated. I just use one bridge:
# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
allow-hotplug enp2s0
iface enp2s0 inet manual

auto vmbr0
iface vmbr0 inet dhcp
    bridge-ports enp2s0
    bridge-stp...
Yes, it works. There are two ways to do it: the first is to pass only one interface through to pfSense and set up the VLANs in pfSense itself. That gives you a little more flexibility. Remember to allow the VLANs on that interface by editing the VM's conf file and appending ,trunks=1;2;3 to allow VLANs 1, 2...
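Roughly, the net line in /etc/pve/qemu-server/<vmid>.conf would then look something like this (the MAC address, bridge name and VLAN IDs here are just placeholders, adjust to your setup):

net0: virtio=AA:BB:CC:DD:EE:FF,bridge=vmbr0,trunks=1;2;3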
We do live migrations between AMD and Intel without problems. The key for Windows is to set the CPU type to "Westmere"; then there's no problem. Cold migration (shut down, move, start) works regardless of the CPU setting.
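For reference, the CPU type can be changed in the GUI (VM -> Hardware -> Processors) or from the shell, roughly like this (the VM ID 100 is just an example):

qm set 100 --cpu Westmere

which ends up as cpu: Westmere in the VM's config file.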
I have never had any version of Windows complain about a different...
It's a bug. The Proxmox team knows about it but won't acknowledge it or do anything about it. Live migration worked fine in 5.0 and broke after that.
See https://forum.proxmox.com/threads/live-migration-broken-for-vms-doing-real-work.49380/ and https://bugzilla.proxmox.com/show_bug.cgi?id=1660...
Ok, we got new hardware, and it still didn't work, even between identical machines. We even purchased a PVE subscription to eliminate that as a factor.
I did some more searching then and found this thread:
https://pve.proxmox.com/pipermail/pve-user/2018-February/169238.html
Which describes...
Yes, it would be an interesting experiment (but nothing more!) to set up high redundancy, then take 3 of the servers and start them up at one location and the other 3 in a separate network, and see whether the data can be read in both clusters :confused:
Somebody can correct me if this is wrong, but to my understanding an EC pool can only tolerate as many lost chunks as it has coding chunks, so if you want to survive 6 OSDs down you need m=6.
Shutdown/startup usually works fine if you follow the proper procedure: first shut down all clients (VMs), then shut down all servers, then start up again. Ceph will not start...
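A common extra step before a planned shutdown (not necessarily what everyone here does) is to set a couple of OSD flags so Ceph doesn't start recovery while the hosts are down, roughly:

ceph osd set noout
ceph osd set norebalance
# shut down VMs, then the nodes; once everything is back up:
ceph osd unset norebalance
ceph osd unset noout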
Of course you can, also with RAW. You will need to enable and use TRIM in your VM, and then the backup file will only contain the used blocks.
But PVE 3 is really old; I have no idea whether TRIM is supported there.
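On current PVE versions that would roughly mean enabling discard on the virtual disk and then trimming inside the guest, something like this (VM ID and volume name are just examples):

qm set 100 --scsi0 local-lvm:vm-100-disk-0,discard=on
fstrim -av   # run inside the guest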
It's been a while since I set up our EC pool, but I think k=2, m=1 won't work because the data isn't spread out widely enough to survive one failed host. We have k=4 and m=2, which I think is the minimum (we have 2 or 3 SSDs per host and 5 hosts). It works well, but there's not a lot more I can say about...
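For context, an EC profile and pool along those lines would be created with something like this (profile name, pool name and PG count are just examples):

ceph osd erasure-code-profile set ec-4-2 k=4 m=2 crush-failure-domain=host
ceph osd pool create ecpool 128 128 erasure ec-4-2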
So, is anybody able to confirm or deny that installing Debian 9 in a VM with the parameters outlined above works? I think that could be the first step to find out where the problem is.
Why would you guess that?
The BIOS versions and CPU microcode levels, as well as /proc/cpuinfo, are absolutely identical. These machines were provisioned on the same day, so I can't see why they would differ. Also, how would anything there explain that migration from Ryzen to Ryzen...
That's how I understood it. The problem happens regardless of whether the physical machines have the same or different CPUs. We have 5 machines: 2 Xeon, 2 Ryzen, 1 Epyc. So there are plenty of combinations to test - none of them work.
Here's an interesting test case: I created a VM with these parameters:
And started a basic Debian 9 text install via netboot.xyz. It crashed halfway through the install process.
I could see the error message on console 3 (Alt+F3).
Even on two machines that are virtually identical (same CPU, same RAM) it doesn't work. Also, I wouldn't expect a machine to crash on startup just because I selected the wrong CPU type (Westmere, for example). I'll try to work on a reproducible test case.
We started using Proxmox in production in March this year. Everything was working fine, and live migration worked perfectly (we had the VM CPU type set to "Host" for performance).
Then with some update (around the time the major SPECTRE/MELTDOWN fixes came in) this stopped working for...
Happened again last night with a lesser-used VM:
2018-08-21 22:28:37.298133 7f9fc6eae700 -1 bluestore(/var/lib/ceph/osd/ceph-1) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x3000, got 0x6706be76, expected 0x77722c59, device location [0x40a89bb000~1000], logical extent 0x233000~1000...
The patch seems very trivial to me: it merely retries reads a specified number of times in case of a checksum failure. It will only be "slower" when there actually is a checksum failure. How often do you think a checksum failure will/should occur?
Host memory usage is around 30-40%. How much lower should it...