Temporary failure in name resolution - Help Needed

PVE is a hypervisor suite built on Debian Linux with an Ubuntu kernel. The userland is nearly identical to a standard Debian installation. PVE’s networking is entirely Linux-based - there’s nothing proprietary or unusual compared to something like ESXi.

There are hundreds of thousands of successful PVE installations out there, from gray-market mini-PCs to enterprise-grade servers. With tens of millions of Debian-based systems in production globally, the odds that you’ve discovered a brand-new Linux/Debian/Ubuntu/PVE networking bug are vanishingly small.

Each week, we see someone come to the forum convinced there’s a fundamental issue with PVE. After a few rounds of “20 Questions,” it turns out they’ve added some undocumented twist during installation or enabled clever behavior on their network gear that disrupts normal functionality.

At least once a month, someone reports “mysterious” network problems that trace back to a "smart" router silently blocking traffic from static IPs placed within the DHCP range.

Your testing and reporting haven’t been especially methodical or consistent. There’s a lot of “but it works over here,” without fully tracing the implications. Clearly, something in your network setup does work - you are, after all, posting on a public forum via the Internet.

A simple next step: install plain vanilla Debian - no PVE, no automation scripts, no clever tricks. Use a static IP if you want to replicate a typical PVE configuration. Once you confirm that works, follow the official guide to install Proxmox on top:
https://pve.proxmox.com/wiki/Install_Proxmox_VE_on_Debian_12_Bookworm


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
Your testing and reporting haven’t been especially methodical or consistent. There’s a lot of “but it works over here,” without fully tracing the implications.
I'm doing what I can with what I know. I did everything I was asked to do, and I'm happy to take extra steps and provide more info if you tell me how!

A simple next step: install plain vanilla Debian - no PVE, no automation scripts, no clever tricks. Use a static IP if you want to replicate a typical PVE configuration. Once you confirm that works, follow the official guide to install Proxmox on top:
Aha. I had already gone in this direction. At the moment I have a plain vanilla Debian (no desktop environment) running on the same machine.

Debian comes out of the box with the DHCP client enabled, so I haven't gotten to the fixed-IP part yet.
But I'm already stuck on the same problem. I can ping the LAN and 1.1.1.1, but pinging google.com gives the usual "Temporary failure in name resolution".

As I said, it's something that only affects Debian machines in my network. Fedora, MacOS, Windows, Android, all work fine.
And I can't imagine it's Debian itself, as you said, thousands of people do this everyday without any problems.
(Though it should be noted "Temporary failure in name resolution" does yield a lot of web results)

Like you, I think it's my router, or, more specifically, the way my network interacts with Debian's default settings.
But then again, I'm at a loss: my router & APs are all OpenWrt 24.10. No smart routers, no fancy firewall rules, just that WireGuard client, which hasn't caused problems elsewhere so far.

What I can't wrap my mind around is how a fresh Proxmox install will work for 30s, then go dark. (Which is why I checked the time & date.)


Edit: one thing I can think of is to install Debian offline, switch to static IP, connect the cable and see if the Proxmox situation repeats itself. Though I'm not sure what that accomplishes.
 
someone reports “mysterious” network problems that trace back to a "smart" router silently blocking traffic from static IPs placed within the DHCP range

This got me thinking. My network isn't all OpenWrt: I do have a smart switch.
So I tried a fresh Proxmox install with the PC connected directly to the router. And unfortunately nothing different this time.
 
My network isn't all OpenWrt: I do have a smart switch.
Plot thickens.
So I tried a fresh Proxmox install with the PC connected directly to the router
So previously you had multiple switch/router layers?
At the moment I have a plain vanilla Debian (no desktop environment) running on the same machine.

Debian comes out of the box with the DHCP client enabled, so I haven't gotten to the fixed-IP part yet.
But I'm already stuck on the same problem. I can ping the LAN and 1.1.1.1, but pinging google.com gives the usual "Temporary failure in name resolution".
If you are getting a failure even with a fresh installation of a widely used OS using DHCP, where all of the information is supplied dynamically by whatever device does your DHCP, then it suggests that messing around with static IPs and config files is premature.

You need to troubleshoot the primary failure with minimal complexity. Troubleshooting may include "strace", "tcpdump", "ltrace", etc.

Considering all of the facts, including:
Replacing nameserver 192.168.10.1 with nameserver 1.1.1.1 does solve the issue on the node.
I feel it is safe to say that the issue is _not_ PVE related and falls out of scope of the topic of this forum: Proxmox VE: Installation and configuration

I suggest that you continue troubleshooting with vanilla OS installation and perhaps /r/debian or /r/techsupport can be of assistance.

Cheers


 
As I said, it's something that only affects Debian machines in my network.
So really your issue does not belong in this forum, but rather in a (vanilla) Debian based one.

BTW what hostname with FQDN (Fully Qualified Domain Name) are you providing on both the Debian & PVE installs?

At the moment I have a plain vanilla Debian (no desktop environment) running on the same machine.
Is that using the same HW as the PVE install?

What I can't wrap my mind around is how a fresh Proxmox install will work for 30s, then go dark

You may have some HW/incompatibility issue?

Maybe start by trying an alternate NIC/cables etc. Then maybe try a different PC completely with vanilla Debian.
 
Plot thickens.
=)

So previously you had multiple switch/router layers?
openwrt router -> tplink smart switch -> openwrt APs
Proxmox PC was connected via cable to the switch
But it's not the switch, connecting directly to the router yielded the same result.

I feel it is safe to say that the issue is _not_ PVE related and falls out of scope of the topic of this forum: Proxmox VE: Installation and configuration

I suggest that you continue troubleshooting with vanilla OS installation and perhaps /r/debian or /r/techsupport can be of assistance.
I figured as much. Again, I'm very appreciative of all the attention and support you guys gave here!

You need to troubleshoot the primary failure with minimal complexity. Troubleshooting may include "strace", "tcpdump", "ltrace", etc.
That's pretty helpful, thanks!


BTW what hostname with FQDN (Fully Qualified Domain Name) are you providing on both the Debian & PVE installs?
Proxmox: pve.local
Debian: just the default "debian"

Is that using the same HW as the PVE install?
Correct.

You may have some HW/incompatibility issue?

Maybe start by trying an alternate NIC/cables etc. Then maybe try a different PC completely with vanilla Debian.
I tested this hypothesis. Two different computers (a new i5 and a 10-year-old AMD), two network cables, two different installation processes (Debian / Proxmox ISOs): exact same results.


Well, I know it's not a Proxmox issue, but a lot of people seem to be interested, so I'll keep posting what I found out, and hopefully a definitive solution for those who come to this thread in search of an answer, whatever the source of the issue is.

strace ping google.com, with DNS = router
Code:
connect(5, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("192.168.10.1")}, 16) = 0
poll([{fd=5, events=POLLOUT}], 1, 0)    = 1 ([{fd=5, revents=POLLOUT}])
sendmmsg(5, [{msg_hdr={msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\35Y\1\0\0\1\0\0\0\0\0\0\6google\3com\0\0\1\0\1", iov_len=28}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, msg_len=28}, {msg_hdr={msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\312Z\1\0\0\1\0\0\0\0\0\0\6google\3com\0\0\34\0\1", iov_len=28}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, msg_len=28}], 2, MSG_NOSIGNAL) = 2
poll([{fd=5, events=POLLIN}], 1, 5000)  = 0 (Timeout)

strace ping google.com, with DNS=1.1.1.1
Code:
connect(5, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("1.1.1.1")}, 16) = 0
poll([{fd=5, events=POLLOUT}], 1, 0)    = 1 ([{fd=5, revents=POLLOUT}])
sendto(5, "w\375\1\0\0\1\0\0\0\0\0\0\003206\003217\003250\003142\7in-"..., 46, MSG_NOSIGNAL, NULL, 0) = 46
poll([{fd=5, events=POLLIN}], 1, 5000)  = 1 ([{fd=5, revents=POLLIN}])

With 1.1.1.1 it uses a different call (sendto instead of sendmmsg) and doesn't time out.
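For anyone comparing the two traces: the iov_base strings are raw DNS query packets and can be decoded by hand. The sketch below (standard library only; the name decode_query is just illustrative) decodes the two 28-byte payloads from the single sendmmsg() call in the router trace. They come out as an A query and an AAAA query for google.com, sent together from one socket. The 1.1.1.1 payload is truncated by strace ("..."), so it isn't decoded here, but its visible labels (206, 217, 250, 142, in-...) suggest it is ping's reverse (PTR) lookup rather than the A/AAAA pair, so the two excerpts may not show directly comparable calls.

```python
import struct

# Query types that appear in this thread (A = 1, PTR = 12, AAAA = 28).
QTYPES = {1: "A", 12: "PTR", 28: "AAAA"}

def decode_query(payload: bytes) -> tuple:
    """Decode a one-question DNS query into (id, name, qtype)."""
    qid, _flags, qdcount = struct.unpack("!HHH", payload[:6])
    if qdcount != 1:
        raise ValueError(f"expected one question, header says {qdcount}")
    labels, pos = [], 12  # the question section starts right after the 12-byte header
    while payload[pos] != 0:  # each label is a length byte followed by that many bytes
        length = payload[pos]
        labels.append(payload[pos + 1 : pos + 1 + length].decode("ascii"))
        pos += 1 + length
    (qtype,) = struct.unpack("!H", payload[pos + 1 : pos + 3])
    return qid, ".".join(labels), QTYPES.get(qtype, str(qtype))

# The two iov_base strings from the single sendmmsg() call (DNS = router), with
# strace's octal escapes rewritten as hex: \35 -> \x1d, \312 -> \xca, \34 -> \x1c.
first = b"\x1dY\x01\x00\x00\x01\x00\x00\x00\x00\x00\x00\x06google\x03com\x00\x00\x01\x00\x01"
second = b"\xcaZ\x01\x00\x00\x01\x00\x00\x00\x00\x00\x00\x06google\x03com\x00\x00\x1c\x00\x01"

for payload in (first, second):
    print(len(payload), decode_query(payload))
```

Running it prints `28 (7513, 'google.com', 'A')` and `28 (51802, 'google.com', 'AAAA')`: two separate queries that glibc hands to the kernel in one call.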


Also: my router doesn't log any firewall rejections for the DNS queries, but its DNS log doesn't register the requests either.

Also: I tried the fresh-install-offline method to capture an strace of a working resolution, but wasn't able to reproduce it this time.
 
I'd be mildly curious about the execution and results of this request:



 
I'm guessing on a standard Proxmox install, that should be:
The packets will flow vmbr > bond > eth and should look the same at every level; I'd prefer to see them making it to the bond, or even to eth.
If they don't make it to bond, the issue is at the bridge layer.

EDIT: I thought I saw bond mentioned in one of the posts but can no longer locate that post.
It may need to be : tcpdump -nnni enp0s31f6 udp port 53
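To work out which interfaces to capture on (the bridge, its ports, and any bond members behind them), the sysfs tree can be walked. This is a minimal sketch assuming the standard Linux sysfs layout: /sys/class/net/<bridge>/brif/ lists bridge ports and /sys/class/net/<bond>/bonding/slaves lists bond members. vmbr0 is the Proxmox default bridge used in this thread; the helper names are hypothetical.

```python
import os

def bridge_ports(bridge: str, sysfs: str = "/sys/class/net") -> list:
    """Interfaces attached to a Linux bridge (empty if it isn't a bridge)."""
    brif = os.path.join(sysfs, bridge, "brif")
    if not os.path.isdir(brif):
        return []
    return sorted(os.listdir(brif))

def capture_chain(bridge: str, sysfs: str = "/sys/class/net") -> list:
    """The bridge, then each port, then (if a port is a bond) its member NICs."""
    chain = [bridge]
    for port in bridge_ports(bridge, sysfs):
        chain.append(port)
        slaves = os.path.join(sysfs, port, "bonding", "slaves")
        if os.path.isfile(slaves):  # only bonds have this file
            with open(slaves) as f:
                chain.extend(f.read().split())
    return chain

if __name__ == "__main__":
    for iface in capture_chain("vmbr0"):
        print(f"tcpdump -nni {iface} udp port 53")
```

On the setup in this thread it would be expected to list vmbr0 followed by enp0s31f6, i.e. the same two interfaces captured below.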


 
Code:
root@pve:~# tcpdump -ni vmbr0 udp port 53
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on vmbr0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
15:57:12.298584 IP 192.168.10.20.56450 > 192.168.10.1.53: 62847+ A? google.com. (28)
15:57:12.298588 IP 192.168.10.20.56450 > 192.168.10.1.53: 39264+ AAAA? google.com. (28)
15:57:17.303905 IP 192.168.10.20.56450 > 192.168.10.1.53: 62847+ A? google.com. (28)
15:57:17.303926 IP 192.168.10.20.56450 > 192.168.10.1.53: 39264+ AAAA? google.com. (28)
15:57:22.309305 IP 192.168.10.20.43557 > 192.168.10.1.53: 43274+ A? google.com.local. (34)
15:57:22.309317 IP 192.168.10.20.43557 > 192.168.10.1.53: 53620+ AAAA? google.com.local. (34)
15:57:27.314580 IP 192.168.10.20.43557 > 192.168.10.1.53: 43274+ A? google.com.local. (34)
15:57:27.314599 IP 192.168.10.20.43557 > 192.168.10.1.53: 53620+ AAAA? google.com.local. (34)

Code:
root@pve:~# tcpdump -nnni enp0s31f6 udp port 53
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on enp0s31f6, link-type EN10MB (Ethernet), snapshot length 262144 bytes
16:01:40.869595 IP 192.168.10.20.44226 > 192.168.10.1.53: 10230+ A? google.com. (28)
16:01:40.869598 IP 192.168.10.20.44226 > 192.168.10.1.53: 14321+ AAAA? google.com. (28)
16:01:45.874924 IP 192.168.10.20.44226 > 192.168.10.1.53: 10230+ A? google.com. (28)
16:01:45.874937 IP 192.168.10.20.44226 > 192.168.10.1.53: 14321+ AAAA? google.com. (28)
16:01:50.880308 IP 192.168.10.20.53138 > 192.168.10.1.53: 19864+ A? google.com.local. (34)
16:01:50.880315 IP 192.168.10.20.53138 > 192.168.10.1.53: 5786+ AAAA? google.com.local. (34)
16:01:55.885663 IP 192.168.10.20.53138 > 192.168.10.1.53: 19864+ A? google.com.local. (34)
16:01:55.885672 IP 192.168.10.20.53138 > 192.168.10.1.53: 5786+ AAAA? google.com.local. (34)
 
And with DNS = 1.1.1.1

Code:
root@pve:~# tcpdump -ni vmbr0 udp port 53
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on vmbr0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
16:04:36.590853 IP 192.168.10.20.46329 > 1.1.1.1.53: 32695+ A? google.com. (28)
16:04:36.590858 IP 192.168.10.20.46329 > 1.1.1.1.53: 2486+ AAAA? google.com. (28)
16:04:36.724006 IP 1.1.1.1.53 > 192.168.10.20.46329: 32695 1/0/0 A 142.250.217.238 (44)
16:04:36.729134 IP 1.1.1.1.53 > 192.168.10.20.46329: 2486 1/0/0 AAAA 2607:f8b0:4008:80a::200e (56)
16:04:36.863536 IP 192.168.10.20.36651 > 1.1.1.1.53: 49952+ PTR? 238.217.250.142.in-addr.arpa. (46)
16:04:36.995370 IP 1.1.1.1.53 > 192.168.10.20.36651: 49952 1/0/0 PTR mia07s62-in-f14.1e100.net. (85)
16:04:37.862985 IP 192.168.10.20.48398 > 1.1.1.1.53: 30596+ PTR? 238.217.250.142.in-addr.arpa. (46)
16:04:37.995727 IP 1.1.1.1.53 > 192.168.10.20.48398: 30596 1/0/0 PTR mia07s62-in-f14.1e100.net. (85)
16:04:38.865053 IP 192.168.10.20.52018 > 1.1.1.1.53: 20243+ PTR? 238.217.250.142.in-addr.arpa. (46)
16:04:38.998156 IP 1.1.1.1.53 > 192.168.10.20.52018: 20243 1/0/0 PTR mia07s62-in-f14.1e100.net. (85)
16:04:39.866388 IP 192.168.10.20.35459 > 1.1.1.1.53: 28613+ PTR? 238.217.250.142.in-addr.arpa. (46)
16:04:40.000121 IP 1.1.1.1.53 > 192.168.10.20.35459: 28613 1/0/0 PTR mia07s62-in-f14.1e100.net. (85)
 
Well, as you can see packets are sent and no replies are received.
Again, pointing to your external infrastructure. We could have saved a lot of typing if this was done yesterday.
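On longer captures, pairing each query with its reply by DNS ID makes this easy to check. The sketch below is a hypothetical helper, not a tcpdump feature: it assumes single-line `tcpdump -n` output in the format seen in this thread and lists the queries that never got a reply with the same ID, client port and server.

```python
import re

# Matches single-line `tcpdump -n` DNS output as seen in this thread, e.g.
#   IP 192.168.10.20.56450 > 192.168.10.1.53: 62847+ A? google.com. (28)
#   IP 1.1.1.1.53 > 192.168.10.20.46329: 32695 1/0/0 A 142.250.217.238 (44)
LINE = re.compile(
    r"IP (?P<src>\S+)\.(?P<sport>\d+) > (?P<dst>\S+)\.(?P<dport>\d+): (?P<id>\d+)\+? (?P<rest>.*)"
)

def unanswered(lines):
    """Return (client, server, id, question) for every query that never got a reply."""
    pending = {}  # dicts keep insertion order, so results follow the capture order
    for line in lines:
        m = LINE.search(line)
        if not m:
            continue  # ARP, IPv6 or anything else this regex doesn't understand
        if m["dport"] == "53":  # client -> server: a query (retransmits share the key)
            pending.setdefault((m["src"], m["sport"], m["dst"], m["id"]), m["rest"])
        elif m["sport"] == "53":  # server -> client: drop the matching query
            pending.pop((m["dst"], m["dport"], m["src"], m["id"]), None)
    return [(f"{c}.{p}", s, int(i), q) for (c, p, s, i), q in pending.items()]
```

Fed the vmbr0 capture with DNS = router, it would report the A and AAAA queries for google.com and google.com.local as unanswered; fed the 1.1.1.1 capture, it reports nothing.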

Next step - network trace on your "DNS" server. Are packets making it there? Are replies being sent?


 

Code:
root@OpenWrt:~# tcpdump -n  host 192.168.10.20
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
16:45:13.404197 IP 192.168.10.20.42166 > 192.168.10.1.53: 12786+ A? google.com. (56)
16:45:18.409514 IP 192.168.10.20.42166 > 192.168.10.1.53: 12786+ A? google.com. (56)
16:45:18.772189 ARP, Request who-has 192.168.10.1 tell 192.168.10.20, length 46
16:45:18.772220 ARP, Reply 192.168.10.1 is-at dc:a6:32:50:68:1c, length 28
16:45:23.414835 IP 192.168.10.20.41113 > 192.168.10.1.53: 56782+ A? google.com.local. (68)
16:45:28.420225 IP 192.168.10.20.41113 > 192.168.10.1.53: 56782+ A? google.com.local. (68)


And this is my notebook, for comparison:
Code:
root@OpenWrt:~# tcpdump -n  host 192.168.10.130 and port 53
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
16:47:50.404664 IP 192.168.10.130.52902 > 192.168.10.1.53: 59338+ A? google.com. (28)
16:47:50.545485 IP 192.168.10.1.53 > 192.168.10.130.52902: 59338 1/0/0 A 142.250.189.142 (44)
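The sizes in these captures are worth a closer look. On vmbr0 and enp0s31f6 the host sent an A and an AAAA query back-to-back from the same source port, 28 bytes each for google.com and 34 bytes each for google.com.local. On the router only one line per attempt shows up, an A query of 56 or 68 bytes, while the notebook's single query arrives as a plain 28-byte packet. Since 56 = 28 + 28 and 68 = 34 + 34, this is consistent with the two datagrams being merged into one before they reach the DNS server (which fits the rx-gro-list fix found later), though the capture alone doesn't prove it. The sketch below only does that arithmetic; the AAAA IDs are hypothetical, and `describe` mimics a parser that trusts the header and therefore sees a single A question.

```python
import struct

A, AAAA = 1, 28

def build_query(qid: int, name: str, qtype: int) -> bytes:
    """Build a minimal recursive DNS query: 12-byte header, one question, class IN."""
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)  # RD set, QDCOUNT=1
    qname = b"".join(bytes([len(label)]) + label.encode("ascii") for label in name.split("."))
    return header + qname + b"\x00" + struct.pack("!HH", qtype, 1)

def describe(datagram: bytes) -> str:
    """What a parser that trusts the header sees: the ID and question count."""
    qid, _flags, qdcount = struct.unpack("!HHH", datagram[:6])
    return f"id={qid} qdcount={qdcount} size={len(datagram)}"

# A-query IDs taken from the router capture; AAAA IDs are made up.
for name, qid in (("google.com", 12786), ("google.com.local", 56782)):
    a = build_query(qid, name, A)
    aaaa = build_query(qid + 1, name, AAAA)
    fused = a + aaaa  # hypothetical: both datagrams delivered as one UDP payload
    print(f"{name}: A={len(a)} AAAA={len(aaaa)} fused -> {describe(fused)}")
```

The first line printed is `google.com: A=28 AAAA=28 fused -> id=12786 qdcount=1 size=56`, matching what the router's tcpdump showed.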
 
& don't forget WireGuard in this murky soup.
Turns out it's not that.

Next step - openwrt forum.
Yup

The answer is here (credit to S456 on the OpenWrt forum):
Details of this resolv.conf option from the resolv.conf manual:
single-request (since glibc 2.10)
Sets RES_SNGLKUP in _res.options. By default, glibc performs IPv4 and IPv6 lookups in parallel since glibc 2.9. Some appliance DNS servers cannot handle these queries properly and make the requests time out. This option disables the behavior and makes glibc perform the IPv6 and IPv4 requests sequentially (at the cost of some slowdown of the resolving process).
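To check whether a given DNS server is affected by the behavior described in this excerpt, one option is to send the A and AAAA queries back-to-back from one socket (roughly what glibc does by default) and compare that with sending them one at a time. The sketch below is a hypothetical diagnostic, not part of glibc or OpenWrt; it uses only the Python standard library, and its one-fresh-socket-per-query mode is an approximation of `options single-request`, not glibc's exact code path.

```python
import os
import socket
import struct

A, AAAA = 1, 28

def build_query(qid: int, name: str, qtype: int) -> bytes:
    """Minimal recursive DNS query: one question, class IN."""
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)  # RD flag, QDCOUNT=1
    qname = b"".join(bytes([len(label)]) + label.encode("ascii") for label in name.split("."))
    return header + qname + b"\x00" + struct.pack("!HH", qtype, 1)

def _ask(sock: socket.socket, queries: dict) -> set:
    """Send every query back-to-back on one socket; return the IDs that were answered."""
    answered = set()
    try:
        for packet in queries.values():
            sock.send(packet)
        while len(answered) < len(queries):
            reply = sock.recv(4096)
            if len(reply) < 3:
                continue
            qid = struct.unpack("!H", reply[:2])[0]
            if qid in queries and reply[2] & 0x80:  # QR bit set: this is a response
                answered.add(qid)
    except OSError:  # timeout, or ICMP port unreachable reported as ECONNREFUSED
        pass
    return answered

def probe(server: str, name: str = "google.com", port: int = 53,
          timeout: float = 3.0, parallel: bool = True) -> bool:
    """Return True if both the A and the AAAA query for `name` were answered.

    parallel=True sends both from one socket, back-to-back (similar to glibc's default).
    parallel=False sends them one at a time, each from a fresh socket.
    """
    first = struct.unpack("!H", os.urandom(2))[0]
    ids = (first, first ^ 1)  # two distinct random IDs
    queries = {ids[0]: build_query(ids[0], name, A), ids[1]: build_query(ids[1], name, AAAA)}
    batches = [queries] if parallel else [{qid: pkt} for qid, pkt in queries.items()]
    for batch in batches:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.settimeout(timeout)
            try:
                sock.connect((server, port))
            except OSError:
                return False
            if len(_ask(sock, batch)) < len(batch):
                return False
    return True
```

Usage: compare `probe("192.168.10.1", parallel=True)` with `probe("192.168.10.1", parallel=False)`. On a router with the problem described in this thread one would expect the first to return False and the second True, although this sketch hasn't been run against that router.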

The solution then is to run ethtool -K eth0 rx-gro-list off on the router (credit to rucketeg)
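To confirm the change took effect, `ethtool -k eth0` (lowercase -k) lists the current offload settings. Below is a small sketch for checking one feature in pasted output; the sample text is illustrative, not taken from the router in this thread, and the parser assumes the usual `name: on|off [fixed]` line format. Changes made with `ethtool -K` are not persistent, so the setting has to be reapplied after a reboot (for example from a startup script).

```python
def offload_state(ethtool_k_output: str, feature: str = "rx-gro-list"):
    """Return (enabled, fixed) for `feature` from `ethtool -k <dev>` text, or None if absent."""
    for line in ethtool_k_output.splitlines():
        name, sep, value = line.strip().partition(":")
        if sep and name == feature:
            words = value.split()
            if not words:
                return None
            return words[0] == "on", "[fixed]" in words
    return None

# Illustrative output only; real output lists many more features.
sample = """Features for eth0:
rx-checksumming: on
generic-receive-offload: on
rx-gro-list: on
rx-udp-gro-forwarding: off
tx-lockless: off [fixed]
"""
print(offload_state(sample))
```

With this sample it prints `(True, False)`: the feature is on and can be changed, so it would be a candidate for `ethtool -K eth0 rx-gro-list off`.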

For people without access to the router (or with a different router with an unknown solution) a workaround is to include either options single-request or options single-request-reopen in /etc/resolv.conf on the clients.
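For the client-side workaround, the change is a single `options` line in /etc/resolv.conf. The helper below is a hypothetical sketch that adds the option to the file's text without duplicating it; it doesn't write the file itself. The `search local` line in the example is an assumption (the captures above show lookups for google.com.local, which suggests a `local` search domain). Keep in mind that DHCP clients and other tools that manage resolv.conf may overwrite manual edits.

```python
def add_resolver_option(resolv_conf: str, option: str = "single-request") -> str:
    """Return resolv.conf text with `option` on an `options` line; running it twice changes nothing."""
    lines = resolv_conf.splitlines()
    for i, line in enumerate(lines):
        fields = line.split()
        if fields and fields[0] == "options":
            if option not in fields[1:]:
                lines[i] = line.rstrip() + " " + option
            break
    else:  # no options line yet: append one
        lines.append(f"options {option}")
    return "\n".join(lines) + "\n"

current = "search local\nnameserver 192.168.10.1\n"
print(add_resolver_option(current), end="")
```

This prints the two original lines followed by `options single-request`; pass `"single-request-reopen"` as the second argument to use the other option instead.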


Thanks for everybody's help and I'm sorry for wasting everyone's time.
But at least there's now an answer here. This topic is high up on search results for "Temporary failure in name resolution" so people will be able to understand the issue and find the solution.
 
So it seems to be specific to a particular Ethernet driver used by RPi4 boards. Which explains why I have never seen this problem, as my OpenWRT is running on a Celeron J1900 with Intel Ethernet.

Nice find!

ETA: On the other hand, the #1 cause of "temporary failure in name resolution" is a bad /etc/resolv.conf or similar, not oddities like this.
 