Hi all, I'm a super noob, so excuse my lack of knowledge.
I was trying to install packages on the host but found that the downloads timed out. Pinging google.com also timed out, but I can ping my gateway IP and local IPs just fine. I have a few LXCs that DO have internet and can ping google.com without issue. The weird thing is that the host did have internet a few days prior; I'm unsure when it stopped having access. To my knowledge there were no power outages or random reboots. Here are the steps I used to try to fix the issue:
Change CIDR from /22 to /24- This change locked me out of the GUI, and I had to hook up a monitor to my server to change the CIDR back to /22.
Check network config- Everything looks good. I'd be more than happy to provide it if you want to see it.
Add DNS servers- Added 1.1.1.1 and 1.0.0.1; tried 8.8.8.8 as well.
Check routing- Again, everything looks good; I can provide the config.
Check firewall settings- No firewall is active.
Check bridge and interface config.
Restart networking service.
Check logs- Everything network-wise looks good.
Traceroute google.com- All 30 hops are dropped.
These are all of the things I tried that I could think of off the top of my head. Again, the host used to have internet access, and my LXCs work and have internet access. I'm thinking of just nuking the Proxmox server and starting over, but I'd prefer not to since I've spent so much time configuring LXCs. Please let me know if you have any idea of what caused this, and any possible solutions!
I wouldn't go nuking the whole cluster just yet; this is a great learning experience! I'd examine the following outputs for anything that doesn't look right. You should be able to compare against the containers/VMs that have working internet access and spot some anomalies. Feel free to post if you get stuck or need a second set of eyes.
cat /etc/network/interfaces #review interfaces
ip route show #review routes
ping -c 3 1.1.1.1 #try to ping 1.1.1.1 3 times, tests for internet connectivity without DNS
ping -c 3 google.com #try to ping google.com, tests whether DNS is working or not
ip a #view ip address information
traceroute -w 1 google.com #traceroute to google.com, see if anything is unexpected. -w 1 to reduce the timeout so it doesn't hang
traceroute -w 1 <gateway> #traceroute to your gateway, see if anything unexpected
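If it helps, the checks above can be bundled into one script so everything lands in a single file you can paste here or diff against a working container. This is just a sketch; the report filename is arbitrary, and traceroute is left out since it may not be installed on a fresh host.

```shell
#!/usr/bin/env bash
# Collect the basic network diagnostics into one report file.
set -u
OUT=net-report.txt
{
  echo "== interfaces ==";      cat /etc/network/interfaces 2>/dev/null
  echo "== addresses ==";       ip a
  echo "== routes ==";          ip route show
  echo "== ping 1.1.1.1 ==";    ping -c 3 -W 2 1.1.1.1 2>&1
  echo "== ping google.com =="; ping -c 3 -W 2 google.com 2>&1
} > "$OUT"
echo "wrote $OUT"
```

Run it on the host and on a working LXC, then diff the two reports.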
auto lo
iface lo inet loopback

iface eno1 inet manual

auto vmbr0
iface vmbr0 inet static
    address 192.168.5.161/22
    gateway 192.168.4.1
    bridge-ports eno1
    bridge-stp off
    bridge-fd 0

auto vmbr1
iface vmbr1 inet manual
    bridge-ports none
    bridge-stp off
    bridge-fd 0

source /etc/network/interfaces.d/*
The ip routes are slightly different between the host and the working LXC:
Host
default via 192.168.4.1 dev vmbr0 proto kernel onlink
10.1.0.0/24 dev wg0 proto kernel scope link src 10.1.0.2
192.168.4.0/22 dev vmbr0 proto kernel scope link src 192.168.5.16
Working lxc
default via 192.168.4.1 dev eth0 proto dhcp src 192.168.5.253 metric 1024
10.1.0.0/24 dev wg0 proto kernel scope link src 10.1.0.2
192.168.4.0/22 dev eth0 proto kernel scope link src 192.168.5.253 metric 1024
192.168.4.1 dev eth0 proto dhcp scope link src 192.168.5.253 metric 1024
There were two more routes under these, but the numbers looked unique and like something I shouldn't be giving out.
Didn't really see anything else out of the ordinary, and I don't know where to go from here. Also, I forgot to mention that I asked ChatGPT for help before posting this thread, and it gave me pretty much the same things to try.
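One more check worth running on the host while comparing routes: `ip route show` only prints the main table, and wg-quick tunnels typically steer traffic with policy-routing rules in a separate table, so a tunnel can hijack traffic even when the main table looks normal. `ip route get` asks the kernel which route it would actually use:

```shell
# Which path would the kernel really take to a public address?
# If the host answers "dev wg0" here, traffic is being diverted into
# the WireGuard tunnel even though `ip route show` looks fine.
ip route get 1.1.1.1
# wg-quick's steering rules live here, not in the main route table:
ip rule show
```

Run both on the host and on a working LXC and compare.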
Is there any chance this interface/tunnel within your LXC is responsible in some way? You should be able to run standard network commands inside your LXC, such as ip a. I'm also curious what the results of a traceroute to 1.1.1.1 from your LXC look like.
2: eth0@if7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether bc:24:11:04:ff:52 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 192.168.5.253/22 metric 1024 brd 192.168.7.255 scope global dynamic eth0
       valid_lft 9045sec preferred_lft 9045sec
    inet6 fe80::be24:11ff:fe04:ff52/64 scope link
       valid_lft forever preferred_lft forever
3: wg0: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1420 qdisc noqueue state UNKNOWN group default qlen 1000
    link/none
    inet 10.1.0.2/24 scope global wg0
       valid_lft forever preferred_lft forever
"1:" starts with a 127.x.x.x IP and "4:" is my Tailscale tunnel. Something interesting I noticed: out of the 10 LXCs I have, 10.1.0.2 was only present in this LXC and on the host. It's a Jellyfin LXC installed with tteck's script.
traceroute to 1.1.1.1 from lxc
traceroute to 1.1.1.1 (1.1.1.1), 30 hops max, 60 byte packets
 1  10.1.0.1 (10.1.0.1)  36.430 ms  36.369 ms  36.377 ms
 2  140.x.x.x (140.x.x.x)  36.343 ms  140.x.x.x (140.x.x.x)  36.316 ms  140.x.x.x (140.x.x.x)  36.296 ms
 3  173.x.x.x (173.x.x.x)  40.789 ms  36.525 ms  40.747 ms
 4  173.x.x.x (173.x.x.x)  59.640 ms  *  *
 5  172.x.x.x (172.x.x.x)  36.922 ms  172.x.x.x (172.x.x.x)  36.889 ms  173.x.x.x (173.x.x.x)  36.888 ms
 6  one.one.one.one (1.1.1.1)  36.272 ms  35.334 ms  35.256 ms
Wasn't sure if I should show the full IP addresses, so I x'd them out.
"2" is an Oracle VPS I set up so I could access Jellyfin without Tailscale. This was a fun little experiment, also done to bypass my ISP's CGNAT.
"3-5" are all Cloudflare IPs.
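That first hop going to 10.1.0.1 is telling. Before you rip anything out, it might be worth a quick look at the tunnel itself on the host (a hedged sketch, assuming the WireGuard tools are installed there): a peer pointing at a dead endpoint with a broad AllowedIPs range would quietly black-hole internet traffic.

```shell
# Inspect WireGuard state on the host, if the tools are present.
# Look for a peer whose endpoint is the old VPS and whose AllowedIPs
# cover 0.0.0.0/0 - that combination swallows all internet traffic.
if command -v wg >/dev/null 2>&1; then
  WG_STATE="$(wg show 2>&1)"
  [ -n "$WG_STATE" ] || WG_STATE="no wireguard interfaces up"
else
  WG_STATE="wg not installed"
fi
echo "$WG_STATE"
```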
I REMEMBER NOW!!! Doing this, I'm recalling more information. I followed this guide https://github.com/mochman/Bypass_CGNAT/wiki/Oracle-Cloud-(Creating) and got all the way to this part https://github.com/mochman/Bypass_CGNAT/wiki/Oracle-Cloud-(Automatic-Installer-Script), where I installed it on my host instead of on the Plex LXC (I was testing out Plex but ended up going with Jellyfin). I got it running, but then I messed something up on the VPS, got locked out of SSH, and shut the Oracle VPS instance down. The Plex LXC no longer exists either. I also never tried to install any packages on the host after this, so I assume this is where the problem started. I did the same thing for Jellyfin too, except I ran the script on the LXC itself and not on the host (that VPS instance is still up and running). My guess is that since the original VPS meant for Plex no longer exists and the tunnel is still on my host, it's definitely causing interference. How would I remove this tunnel? I think we cracked it!
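That would explain it! For removing the tunnel, a gentler option than purging the packages looks roughly like this. It's a sketch that assumes the host's tunnel is named wg0 and is managed by wg-quick (which is how these installer scripts usually set it up); adjust the name if yours differs.

```shell
# Hedged sketch: remove a stale WireGuard tunnel from the host.
# Assumes the interface is named wg0 and is managed by wg-quick.
IFACE=wg0
wg-quick down "$IFACE" 2>/dev/null || true                     # tear it down now
systemctl disable --now "wg-quick@$IFACE" 2>/dev/null || true  # keep it down after reboot
rm -f "/etc/wireguard/$IFACE.conf"                             # drop the stale config
# Purging the packages entirely is the blunt alternative:
#   apt purge wireguard wireguard-tools
echo "stale tunnel $IFACE removed (if it existed)"
```

After that, `ip a` should no longer show a wg0 interface on the host.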
I FIXED IT!!!! After coming up with that hypothesis, I uninstalled wireguard and wireguard-tools on the host, and now I have internet on the host! Thank you for guiding me! Such a simple fix. Thanks for not encouraging a nuke!