Hello,
I have a 4-node PVE 2.3 cluster. PXE booting guests used to work until last week, so I'm perplexed as to why it just stopped working. The only thing that changed was the addition of another PVE node.
We use PXE booting to do our CentOS installs from a kickstart server. Quite often the DHCP/BOOTP process fails to even get an address. When it does get an address, the pxelinux menu may or may not load, and when it does load, the boot always dies somewhere in the TFTP download of vmlinuz or initrd.img.
Both the DHCP server and the TFTP server are on different networks, going through a router. As a test, I configured a DHCP server locally on the same physical segment that the PVE cluster is on, and DHCP worked 98% of the time. (There were still a few occasions where it failed, but I experienced that level of failure when I used to do this by hand with qemu/kvm on another system.)
What I don't understand, though, is why a PHYSICAL server on this network segment has absolutely no problem getting a DHCP address, loading pxelinux, and downloading the installer via TFTP. I tested this by rebooting my 3rd PVE node and having it boot over the network. Repeatedly doing this worked 100% of the time. It's only when guests running on that same node try to do this that the failure rate is nearly 100%.
Any ideas on what is causing this? It seems to be something fundamentally wrong with the bridge. At first I suspected a duplicate IP or arptables having issues, but the IP isn't the issue, and arptables isn't even installed on these Debian systems.
I have attached a network diagram to help (4th node not pictured, btw). Keep in mind, this works just fine when it's a physical server (in this case vm-c1-b3)... It doesn't work when a guest is on vm-c1-b3.
Thanks,
Jason