We're running a 3-node Proxmox VE 8 cluster. We routinely reboot nodes to apply security and regular updates, and most VMs have fairly high uptimes (15-180 days) because we migrate them away before a node reboots. Most of them run IPv6-only, some dual stack. All VM network devices are created with the firewall off and use VirtIO drivers (Linux guests). Most of our VMs still have a random MAC address (which was the previous default), not yet one from the Proxmox-assigned prefix. The hypervisor firewall is also off. The hardware is HP ProLiant Gen8.
After some time we noticed strange behaviour of one individual VM:
Connections into and out of the VM became spotty: packet loss caused a lot of TCP retransmissions, and connections were limited to a few Mbit/s. This appeared only with IPv6 and was most prominent when the VM was sending large amounts of data, e.g. during an upload. During testing, we noticed that IPv4 was not affected, or at least not as much. IPv6 sending was very unstable at around 15-30 Mbit/s, with hundreds to thousands of retransmissions. Receiving initially yielded 3 Gbit/s for the first second and then dropped to about 1 Gbit/s for the remaining 9 seconds. The hardware node is connected to the network via a 2x 10 Gbit/s bond, and all testing was performed within our network.
We also verified via iperf3 (TCP and UDP) that the hypervisor node hosting the slow VM itself worked fine (around 8 Gbit/s sending and receiving). In addition, we tested the individual network VLAN from other places and through other devices (VMs and hardware) to rule out networking problems outside the hypervisor.
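For anyone who wants to compare runs the same way, here is a minimal Python sketch that pulls throughput and retransmits out of a TCP report produced with iperf3's JSON output (`-J`). It is not the tooling we used. The function name `summarize_iperf3` is made up for illustration. The `retransmits` field is only reported by some senders (e.g. Linux), so it is read as optional.

```python
import json


def summarize_iperf3(report: dict) -> dict:
    """Summarize a parsed `iperf3 -J` TCP report.

    Reads the end-of-test totals: sender and receiver throughput in Mbit/s
    and the sender's retransmit count (None if iperf3 did not report it).
    """
    sent = report["end"]["sum_sent"]
    received = report["end"]["sum_received"]
    return {
        "sent_mbit_s": sent["bits_per_second"] / 1e6,
        "received_mbit_s": received["bits_per_second"] / 1e6,
        "retransmits": sent.get("retransmits"),
    }


def summarize_iperf3_text(text: str) -> dict:
    """Same as summarize_iperf3, but takes the raw JSON text."""
    return summarize_iperf3(json.loads(text))
```

Running the same test once over IPv4 and once over IPv6 (iperf3's `-4` and `-6` options) and comparing the two summaries makes the asymmetry described above easy to see.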
We were able to fix the problem by deleting the MAC address in the web interface, which causes Proxmox to generate a new one from the Proxmox-assigned prefix (https://macaddress.io/statistics/company/32814), and then rebooting the VM. Speeds immediately recovered to 8.5 Gbit/s sending and 7.5 Gbit/s receiving, back to normal.
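To tell which VMs still carry an old random MAC, a small Python sketch like the following could classify an address. The prefix `BC:24:11` is an assumption on my part for the Proxmox-assigned OUI; please verify it against the link above or your datacenter settings. The helper names are made up for illustration. The two flag bits checked are the standard multicast and locally-administered bits in the first octet.

```python
# Assumed Proxmox-assigned OUI; verify before relying on it.
ASSUMED_PROXMOX_PREFIX = "BC:24:11"


def parse_mac(mac: str) -> bytes:
    """Parse 'AA:BB:CC:DD:EE:FF' (or with '-') into raw bytes."""
    parts = mac.strip().replace("-", ":").split(":")
    # bytes() raises ValueError if any value is outside 0..255.
    return bytes(int(p, 16) for p in parts)


def classify_mac(mac: str, prefix: str = ASSUMED_PROXMOX_PREFIX) -> dict:
    """Report the flag bits of a MAC and whether it starts with `prefix`."""
    octets = parse_mac(mac)
    if len(octets) != 6:
        raise ValueError(f"expected 6 octets, got {len(octets)}: {mac!r}")
    prefix_octets = parse_mac(prefix)
    return {
        # Bit 0 of the first octet: group (multicast) address.
        "multicast": bool(octets[0] & 0x01),
        # Bit 1 of the first octet: locally administered address.
        "locally_administered": bool(octets[0] & 0x02),
        "has_prefix": octets[: len(prefix_octets)] == prefix_octets,
    }
```

A NIC MAC with the multicast bit set would be invalid as a source address, so that would be the first thing worth ruling out for an old random MAC.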
We then changed the MAC back to the original random one and rebooted once again: no more problems, everything still normal.
Does this ring a bell, or has anyone seen something similar? Is this related to changing the MAC? Or to changing the MAC from a random one to the Proxmox prefix? Or to rebooting the VM afterwards? Does time play a role in the problem appearing?