Proxmox Live Migration: no network after migration is done

Yes, but when we do a live migration between Proxmox servers, the VM needs at least 5 minutes to come back online. Also, I don't know if Hetzner changed anything last weekend, since I'm now having problems with outbound traffic.

Thanks for your questions. I can run any test you need.

Regards.
Hmm, if the migration is between different physical sites, maybe Hetzner has an ARP cache somewhere with a hardcoded TTL.

Just to be sure, you can start a VM, test the network, stop the VM, migrate it offline, and start the VM on the remote node.
If the network is not working after that, Proxmox can't do anything for you; it's a Hetzner problem.
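
For reference, a rough sketch of that test with the qm CLI (VM ID 100 and target node "node2" are placeholders):

    # on the source node: start the VM, then test the network from inside the guest
    qm start 100

    # stop the VM and migrate it offline
    qm stop 100
    qm migrate 100 node2

    # on the target node: start it again and re-test the network
    qm start 100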
 
Coming back to the problem, the point about MTU 1400 makes sense. We are seeing some connection failures, and we have the feeling that not only the Hetzner virtual switch must be configured with MTU 1400, but also all the virtual machines attached to it. Does anyone know how to configure MTU 1400 at the Proxmox virtual machine level (not at the guest OS level)?


Thank you.
 
Hi AngeLinux,

no, it is an OS level setting.
I have seen some hacks editing some Proxmox configuration files.

My virtual machines are receiving their IP addresses via DHCP.
So I have set up my DHCP server to tell the clients to use 1400 as the MTU.
This is working out just fine.
Maybe this is an option for you as well.
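
For example, with ISC dhcpd this is a single option (a minimal sketch; subnet and range are placeholders):

    # /etc/dhcp/dhcpd.conf -- push MTU 1400 to all clients (DHCP option 26)
    subnet 10.0.0.0 netmask 255.255.255.0 {
        range 10.0.0.100 10.0.0.200;
        option interface-mtu 1400;
    }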

Best regards
Sebastian
 
Hello.

I have the same issue with pfSense and Proxmox on Hetzner.

How did you resolve the problem? Is it possible to change the default MTU to 1400 for all of Proxmox?

I think one possibility is VXLAN to replace Hetzner's vSwitch, but I still need to study it. Were you able to resolve it in an easier way?
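
For anyone who wants to experiment with the VXLAN idea, a rough point-to-point sketch with iproute2 (the node IPs 10.0.0.1/10.0.0.2, uplink eth0 and bridge vmbr1 are assumptions):

    # on node 1 (swap local/remote on node 2)
    ip link add vxlan100 type vxlan id 100 local 10.0.0.1 remote 10.0.0.2 dstport 4789 dev eth0
    ip link set vxlan100 master vmbr1   # attach it to the bridge the VMs use
    ip link set vxlan100 up
    # note: VXLAN adds ~50 bytes of overhead, so the inner MTU has to be lowered accordingly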

Thanks!
 
Hello Gaston,

No, our problem is not solved. We still get about 5 minutes of downtime when we move a VM from one Proxmox server to another in the same cluster. We also upgraded to version 6.2, but the problem persists. Hetzner support staff tell us that the problem is in our Proxmox machines/VMs and not in their virtual switches (but I think the problem is in their virtual switches, because they cannot update their ARP tables instantaneously).

Yes, there is an option to set MTU 1400 as the default for every new VM that you add to the cluster, but remember that you lose this modification every time you upgrade your Proxmox server. It also will not work for all VMs (it depends on the VM's network interface type). My recommendation is to set MTU 1400 on every VM that you have (including pfSense), but you can take a look here:

https://forum.proxmox.com/threads/hetzner-vswitch-and-proxmox.48594/#post-229848
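
For what it's worth, recent Proxmox VE versions also expose an mtu= option directly on the VM's NIC (a sketch, assuming VM ID 100 with a VirtIO NIC on vmbr0; re-use the VM's existing MAC, otherwise a new one is generated):

    # set MTU 1400 on the VM's first network device
    qm set 100 --net0 virtio=AA:BB:CC:DD:EE:FF,bridge=vmbr0,mtu=1400

    # equivalent line in /etc/pve/qemu-server/100.conf:
    # net0: virtio=AA:BB:CC:DD:EE:FF,bridge=vmbr0,mtu=1400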

I hope this info is helpful for you. If you find a solution, please share it here!



Regards.
 
Hi @AngeLinuX

I think you are right about your assumption regarding the ARP table of the Hetzner vSwitch. I am testing a setup where two OPNsense machines (running on top of two different Proxmox nodes) use CARP and virtual IPs.

I have the strong impression that they are using Open vSwitch, and the problem seems to be related to the FDB (forwarding database) of that switch. At the moment I am trying to work out with Hetzner whether there is a way around it.

Their proposal: send a GARP and use different MAC addresses. Of course, that is not how CARP works.
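
For reference, sending a gratuitous ARP with iputils arping looks roughly like this (interface and IP are assumptions):

    # announce the current MAC for the shared IP 10.0.0.2
    arping -U -c 3 -I eth0 10.0.0.2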

In the end we see the exact same behaviour: 5 minutes of downtime, which matches the 300-second default cache time of the ARP table on the vSwitch.

As soon as I have more, I will update this thread.

Does anyone else have a setup running on Hetzner with multiple Proxmox nodes using a Hetzner vSwitch and CARP?

Regards,
 
Yes, a customer was running this setup and it was working fine with immediate failover (ping only had latency for 1-2 seconds). BUT it was all running in the same rack, hence the IP was just switched to another VM on a hypervisor most probably connected to the same physical switch.

Another customer is using keepalived (VRRP, if I am not mistaken), which is also running fine. But in that case a dedicated backend physical switch handles the heartbeat between the keepalived instances.
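
For context, a minimal keepalived sketch of such a VRRP instance (interface, router ID and virtual IP are assumptions; the peer uses state BACKUP and a lower priority):

    # /etc/keepalived/keepalived.conf on the master
    vrrp_instance VI_1 {
        state MASTER
        interface eth1          # backend interface carrying the heartbeat
        virtual_router_id 51
        priority 100
        advert_int 1
        virtual_ipaddress {
            10.0.0.10/24
        }
    }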
 
Hi there,

as @Osikx already mentioned, the 5-minute downtime after switching a CARP IP from master to backup is a problem caused by Hetzner's vSwitch. We ran a few tests yesterday after hitting exactly the same issue.
I can also confirm that it is caused by some kind of MAC address caching within the vSwitch when you have servers in different sections of Hetzner's datacenter park. The issue has basically nothing to do with Proxmox or OPNsense; it can also be reproduced with a virtual network interface directly on the dedicated server itself.

What we tried yesterday:
- Host A, B, C and D are connected via Hetzner vSwitch.
- Host A at FSN1-DC16 is configured with IP 10.0.0.1
- Host B at FSN1-DC14 is configured with IP 10.0.0.2 and mac address 00:11:22:33:44:55
- Host C at FSN1-DC4 is not configured
- Host D at FSN1-DC4 is not configured
=> Ping from Host A to 10.0.0.2 (-> B) works perfectly

So we changed the configuration and "moved" the entire virtual network interface (MAC+IP); a command sketch of this move follows after the results:
- Host B: Disable network interface - not configured anymore
- Host C: Add interface with same configuration previously configured at Host B (10.0.0.2 / MAC 00:11:22:33:44:55)
=> Ping from Host A to 10.0.0.2 (-> C) started working after ~5 minutes (the well-known delay)
=> When we tried the same with different MAC addresses on Hosts B/C, ping worked after ~2 seconds.
=> Sending GARP packets didn't help either.
=> Different MAC addresses for CARP don't seem to be a good solution (and won't work with OPNsense)
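
The "move" above boils down to something like this with iproute2 (a sketch; the vSwitch VLAN interface name eth0.4000 is an assumption):

    # on Host B: drop the address and take the interface down
    ip addr del 10.0.0.2/24 dev eth0.4000
    ip link set eth0.4000 down

    # on Host C: bring up the same IP with the same MAC
    ip link set eth0.4000 down
    ip link set eth0.4000 address 00:11:22:33:44:55
    ip link set eth0.4000 up
    ip addr add 10.0.0.2/24 dev eth0.4000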

We played around some more and moved the MAC+IP between hosts in the same way:
- from Host B to C: 5 min delay
- from Host B to D: 5 min delay
- from Host C to B: 5 min delay
- from Host C to D: ping worked immediately
- from Host D to C: ping worked immediately
- from Host D to B: 5 min delay
=> Moving the MAC+IP around within the same building (FSN1-DC4) worked quite well



==> With this knowledge, we have ordered a move of Host B to FSN1-DC16 so that Hosts A and B are in the same DC (costs 39€)
==> It seems that our CARP IPs are working now...
------------------------------------------------------------

Long story short: CARP IPs via a Hetzner vSwitch seem to work if both dedicated hosts are in the same datacenter:
[screenshot: robot.your-server.de, 2021-01-06]
 