Difficulty transferring ISOs to Proxmox

thetrystero

May 12, 2020
My Proxmox machine (Intel NUC7i3BNH) is connected to my Netgear router via Ethernet; the other devices on the network use WiFi. File transfers fail at around 1-5% completion with the following error messages.

When I use the GUI to upload an ISO to a directory, I get the error message "Error 0 occured while receiving the document." This happens whether the machine uploading the ISO runs Windows or Ubuntu.

When I use SCP from another Linux machine on the local network, I get "client_loop: send disconnect: Broken pipe" followed by "lost connection".

When I use SCP from the Proxmox machine, I get "Connection to <localip> port 22: message authentication code incorrect" followed by "lost connection".

When I use WinSCP from my Windows machine, I get the error message "Copying to remote side failed"

Could this be a problem with my router settings? Do I need to set up any port forwarding on my router? Thank you in advance.
 
Also, you might wonder why I don't just use wget to download the file directly on the Proxmox box. When I do, I get a large number of interruptions with the message "read error at byte x/y (Decryption has failed). Retrying." Sometimes the download completes successfully, but other times it does not.
 
Honestly, none of that should happen, and IMO it points to a bigger issue. One educated guess would be a hardware failure, i.e., memory corruption (a bad RAM stick).

Maybe let a memtest run on that machine for a while as a starter?
 
That would make a LOT of sense. I ran a few passes of memtest86 last week without any errors, but I will run it for longer this time. I couldn't get the memtest86+ option in the Proxmox boot menu to do anything (it just showed a black screen until I eventually turned the machine off), so I am running Memtest86 (without the plus) from a USB stick. Thank you for the idea.
 
Yeah, memtest86+ doesn't work with UEFI boot, and I have not yet found a working, open source UEFI alternative that does.
 
Thanks, sounds like I can turn off UEFI boot in the BIOS and use legacy boot mode to use Memtest86+. I will do that if Memtest86 doesn't find any errors. None so far, on the second of four passes.
 
I ran Memtest86 for four passes and found no memory errors. The SMART scans for my two SSDs look healthy, and I have hit the error both on an initial Proxmox install on my SATA SSD and again after reinstalling on an NVMe SSD. Are there any other hardware diagnostics I should run, such as a way to test the CPU or motherboard?

I have not been able to boot into Memtest86+, even in Legacy Boot mode, either from the Proxmox boot menu or from a USB drive loaded with Memtest86+. I am aware that Memtest86 not finding errors doesn't fully exonerate the RAM.

Thanks for any other thoughts you might have.
 
I decided to nuke the Proxmox installation for now and installed Ubuntu on the machine. There, wget works without the errors, which makes me think it is neither my network nor failing hardware. Could something have been corrupted in my Proxmox install, both times? I may have flashed the installer to my USB drive only once; if that copy had an error, I could have installed the same corrupted version both times, causing the same type of issues. I will try copying the image to the USB drive again and reinstalling Proxmox.
 
Now I have tried to reinstall Proxmox, and every attempt fails during the package unpacking process. The errors all look the same, though the specific package changes each time, even from the same USB image. Most recently, the error looked like this:

dpkg-deb (subprocess): decompressing archive member: lzma error: compressed data is corrupt
cannot copy archive member from /tmp/longpackagename.deb' to decompressor pipe: failed to write (Broken pipe)
dpkg-deb: error: decompress subprocess returned error exit status 2
dpkg: error processing archive...unexpected end of file or stream
AIGLX: suspending AIGLX clients for VT switch

I'm wondering if I should just send the whole computer in for a warranty replacement, as perhaps there is some hard-to-diagnose issue in the motherboard or CPU. Again, memtest looked fine, and so do the hard drives. Inexplicably, Ubuntu installs fine, but it is the only OS I have tried that does. I have also run memtest on the computer writing the image files, switched to a different computer for writing them, changed the writing method (dd, dd with conv=fdatasync, Ubuntu's image writer, Rufus), changed the USB drive, and changed the USB port the drive is plugged into.
 
I mean, is the USB pen drive you use good? Did you try another one?

Given that Ubuntu works, this error would make me lean more in that direction.

A plain dd, with either conv=fdatasync or a sync command before unplugging, should always work; this is the preferred method for us devs (we are all Linux people, so there is no need for some complicated writer GUI).
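A minimal Python sketch of the "is the USB stick good?" check: after writing the image with dd, read the stick back and compare it byte for byte against the ISO. It returns the offset of the first differing byte, or None if the stick begins with an exact copy of the image. The function name and device path are assumptions (substitute your stick, e.g. a hypothetical /dev/sdX); reading a block device needs root, and it is worth unplugging and replugging the stick first so the reads come from the device rather than from the page cache.

```python
import os
from typing import Optional

CHUNK = 4 * 1024 * 1024  # compare 4 MiB at a time to keep memory use flat


def first_mismatch(iso_path: str, device_path: str, chunk_size: int = CHUNK) -> Optional[int]:
    """Offset of the first byte where the device differs from the ISO, or None if it
    begins with an exact copy. Bytes on the device past the end of the ISO are ignored."""
    size = os.path.getsize(iso_path)
    offset = 0
    with open(iso_path, "rb") as iso, open(device_path, "rb") as dev:
        while offset < size:
            expected = iso.read(min(chunk_size, size - offset))
            if not expected:
                break  # ISO got shorter while reading; stop instead of looping forever
            actual = dev.read(len(expected))
            if actual != expected:
                # Pinpoint the first differing byte; a short read counts as a mismatch too.
                for i, (a, b) in enumerate(zip(expected, actual)):
                    if a != b:
                        return offset + i
                return offset + len(actual)
            offset += len(expected)
    return None
```

For example, `first_mismatch("proxmox-ve_6.2-1.iso", "/dev/sdX")` (both paths illustrative). Any non-None result means the stick does not hold the image you verified, regardless of which tool wrote it.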

Did you check the sha256sum of your Proxmox VE ISO file? See here:
http://download.proxmox.com/iso/
https://www.proxmox.com/en/downloads/item/proxmox-ve-6-2-iso-installer
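For reference, a minimal Python sketch of that check: stream the ISO through SHA-256 and compare it with the matching line of a checksum list. It assumes the list uses the usual `sha256sum` format (`<hex digest>  <filename>`) and that filenames contain no spaces; `iso_ok` and the other names are illustrative, not part of any Proxmox tooling (from the shell, `sha256sum -c` does the same job). Note that a matching checksum only proves the file was intact at the moment it was hashed.

```python
import hashlib
from typing import Optional


def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file through SHA-256 so a multi-GB ISO never has to fit in RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def expected_digest(sums_text: str, filename: str) -> Optional[str]:
    """Find `filename` in a checksum list formatted like `sha256sum` output."""
    for line in sums_text.splitlines():
        parts = line.split()
        # '<hex>  <name>' (text mode) or '<hex> *<name>' (binary mode)
        if len(parts) == 2 and parts[1].lstrip("*") == filename:
            return parts[0].lower()
    return None


def iso_ok(iso_path: str, sums_text: str, filename: str) -> bool:
    """True only if the list has an entry for `filename` and the digest matches."""
    expected = expected_digest(sums_text, filename)
    return expected is not None and sha256_file(iso_path) == expected
```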

I'd check those two things first (the ISO and the USB stick). If that also turns up nothing, you could try the "Install Proxmox VE on Debian Buster" method.
 
Yes, I had been checking the sha256sum after downloading the iso and it has looked good.

Happily, I was able to get Proxmox 6.2-1 installed by using fdisk to make sure my SSD was fully wiped, putting the install ISO on it, and then installing from one internal SSD (sda) onto another (nvme0n1). This bypassed the potentially problematic USB flash drive. I had previously tried using sync as you recommended, but that made no difference with the USB pen drive.

I used ZFS this time in the hope that the error handling features might prevent the issues I was having before, when I had done an install using ext4. The zfs pool is just the one nvme drive that I installed on. I have removed the other SSD for now.

Differences between the two installs have been as follows:
- Previously 6.1, now 6.2
- Previously ext4, now ZFS
- Previously on a SATA SSD, now on a different nvme SSD.

Unfortunately, I am continuing to encounter the same issues as I wrote about in my original post. File uploads via the GUI fail and so does wget (frequent interruptions to the download, every 5-10 seconds, eventually resulting in it giving up).
 
I think I have a new leading theory, that I need to update my NIC driver. My NIC is an Intel I219-v (rev 21) according to lspci.

I installed ethtool and the result of ethtool -i eno1 is as follows:
driver: e1000e
version: 3.2.6-k
firmware-version: 0.1-4

According to Intel, there is an updated driver (v3.8.4) which introduces "initial support for I219-V"
https://downloadcenter.intel.com/download/15817
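A small Python sketch of that comparison: parse `ethtool -i` output like the one above into key/value pairs and compare the numeric part of the version with 3.8.4. The function names are hypothetical, and the in-kernel driver (the `-k` suffix) and Intel's standalone release are versioned separately, so treat the result as a rough hint, not proof that one contains the other's fixes.

```python
import re
from typing import Dict, Tuple


def parse_ethtool_info(text: str) -> Dict[str, str]:
    """Turn `ethtool -i <iface>` output ('key: value' per line) into a dict."""
    info: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            info[key.strip()] = value.strip()
    return info


def version_tuple(version: str) -> Tuple[int, ...]:
    """Keep the leading dotted numbers: '3.2.6-k' -> (3, 2, 6)."""
    match = re.match(r"\d+(?:\.\d+)*", version.strip())
    return tuple(int(part) for part in match.group(0).split(".")) if match else ()


def older_than(info: Dict[str, str], minimum: str = "3.8.4") -> bool:
    """True if the reported version sorts below `minimum` (a missing version counts as older)."""
    return version_tuple(info.get("version", "")) < version_tuple(minimum)
```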

My next task is to figure out how to get this driver .tar.gz onto my machine, and then where to install it.
 
I was able to get the driver installed, but it didn't end up fixing the issue.

For posterity, these are the steps I took to update the driver (I'm not sure all of them are necessary, especially after make install, but ethtool did not show the new driver in use until the last step):
wget the NIC driver
Add the pve test repo
apt-get update
apt install pve-headers-$(uname -r)
apt-get install gcc
make install (this installed the driver in /lib/modules/5.4.34-1-pve/updates/drivers/net/ethernet/intel/e1000e)
Edit /etc/modules-load.d/modules.conf with nano and add the line "alias eno1 e1000e"
Change line 1213 of /usr/lib/modules/5.4.34-1-pve/modules.order (kernel/drivers/net/ethernet/intel/e1000e/e1000e.ko, with "kernel" changed to "updates")
rmmod e1000e
modprobe e1000e
update-initramfs -u -k all (this got the new driver to show up when I ran ethtool -i eno1)

I welcome any other ideas! Thanks again
 
On a hunch (and, if possible, with both the stock and the updated NIC driver): could you try disabling all offloading on the NIC?
(ethtool -K; see man ethtool)

Thanks!
 
Thank you for the suggestion.

It turns out that my wget downloads over plain HTTP now complete, but every time I download an Ubuntu image and check its sha256sum, I get a different result, and never the correct one (after six tries, including two after the changes below).
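The reasoning here (a different checksum on every download) can be made explicit with a short Python sketch that classifies the SHA-256 digests of several downloads of the same file, e.g. the first column of `sha256sum` run over each copy. If every copy is identical but wrong, the source or the published checksum is suspect; if the copies disagree with each other, something between the server and the disk is corrupting data at random. The function name and the three labels are illustrative assumptions.

```python
from typing import List


def classify_digests(digests: List[str], expected: str) -> str:
    """Classify the SHA-256 digests of repeated downloads of the same file."""
    if not digests:
        raise ValueError("need at least one digest")
    distinct = {d.strip().lower() for d in digests}
    if distinct == {expected.strip().lower()}:
        return "all copies correct"
    if len(distinct) == 1:
        # Every copy is byte-identical but wrong: suspect the source or the checksum.
        return "copies agree but differ from expected"
    # Copies differ from each other: random corruption in transit or on this machine.
    return "copies disagree"
```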

On your suggestion, I ran ethtool -K eno1 ufo off gso off gro off tso off lro off rxhash off. Turns out I cannot change ufo or rxhash, but after the other changes, the problem persists. My next step is going to be removing 1 DIMM of RAM at a time and seeing if the problem persists, even though both DIMMs passed Memtest86. I welcome any other thoughts on the ethtool settings. My configuration (ethtool -k) is as follows in case that is helpful:

Features for eno1:
rx-checksumming: on
tx-checksumming: on
tx-checksum-ipv4: off [fixed]
tx-checksum-ip-generic: on
tx-checksum-ipv6: off [fixed]
tx-checksum-fcoe-crc: off [fixed]
tx-checksum-sctp: off [fixed]
scatter-gather: on
tx-scatter-gather: on
tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: off
tx-tcp-segmentation: off
tx-tcp-ecn-segmentation: off [fixed]
tx-tcp-mangleid-segmentation: off
tx-tcp6-segmentation: off
udp-fragmentation-offload: off
generic-segmentation-offload: off
generic-receive-offload: off
large-receive-offload: off [fixed]
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off [fixed]
receive-hashing: off
highdma: on [fixed]
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-gre-csum-segmentation: off [fixed]
tx-ipxip4-segmentation: off [fixed]
tx-ipxip6-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
tx-udp_tnl-csum-segmentation: off [fixed]
tx-gso-partial: off [fixed]
tx-sctp-segmentation: off [fixed]
tx-esp-segmentation: off [fixed]
tx-udp-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off
rx-all: off
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
hw-tc-offload: off [fixed]
esp-hw-offload: off [fixed]
esp-tx-csum-hw-offload: off [fixed]
rx-udp_tunnel-port-offload: off [fixed]
tls-hw-tx-offload: off [fixed]
tls-hw-rx-offload: off [fixed]
rx-gro-hw: off [fixed]
tls-hw-record: off [fixed]
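To see at a glance which offloads are still active, a minimal Python sketch can filter `ethtool -k` output for features that are `on` and not marked `[fixed]`. Applied to the listing above, it yields rx-checksumming, tx-checksumming, tx-checksum-ip-generic, scatter-gather, tx-scatter-gather, rx-vlan-offload and tx-vlan-offload. These are the long names printed by `ethtool -k`; `ethtool -K` takes its own (often shorter) keywords, so check the man page before switching them off. The function name is hypothetical.

```python
from typing import List


def toggleable_on(ethtool_k_output: str) -> List[str]:
    """Return features reported as plain 'on' (i.e. enabled and not '[fixed]')."""
    result = []
    for line in ethtool_k_output.splitlines():
        # Real output indents sub-features with a tab, so strip before splitting.
        name, sep, state = line.strip().partition(": ")
        if not sep:
            continue  # e.g. the 'Features for eno1:' header line
        if state.strip() == "on":
            result.append(name)
    return result
```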
 
Hmm, thanks for trying. Maybe also try disabling the following offloads:
Code:
ethtool -K eno1  rx off tx off sg off rxvlan off txvlan off
(These worked on my NIC. Sadly, the flags you need to pass on the command line do not map 1:1 to the names in the output of `ethtool -k`; it usually helps to check the syntax in the man page.)


Another thought:
* Could the IP be used twice in your network? That would also explain the behaviour (at some point the TCP stream gets redirected to the other holder of the same IP). You could verify this by shutting off the NUC and then trying to ping its IP address.
* Less likely: could the MAC address be used twice in the network segment? (The chances are usually very slim, even in virtualization environments with randomly generated addresses, but who knows.)
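For the second point, a rough Python sketch that scans the NUC's IPv4 neighbour table (the output of `ip -4 neigh show`) for a MAC address that answers for more than one IP. It only sees hosts the NUC has recently talked to, and legitimate setups (e.g. a router or a host with several addresses on one interface) will show up too, so a hit is a lead to investigate, not proof. The function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def macs_with_multiple_ips(neigh_output: str) -> Dict[str, List[str]]:
    """Map each MAC seen for more than one IPv4 neighbour to those IPs."""
    ips_by_mac: Dict[str, List[str]] = defaultdict(list)
    for line in neigh_output.splitlines():
        tokens = line.split()
        if not tokens or ":" in tokens[0]:
            continue  # skip blank lines and IPv6 entries (one host has several v6 addresses)
        if "lladdr" in tokens:
            i = tokens.index("lladdr")
            if i + 1 < len(tokens):
                ips_by_mac[tokens[i + 1].lower()].append(tokens[0])
    return {mac: ips for mac, ips in ips_by_mac.items() if len(ips) > 1}
```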

Last but not least, before trying the memory test, try recording a download with tcpdump into a file and then look at the generated pcap file in Wireshark (network-layer problems are usually quite easy to recognize there):
Code:
# console 1: capture whole packets (-s 0) on eno1 into a file
tcpdump -w corruptdownload.pcap -s 0 -nvi eno1
# console 2: reproduce the download ($iso holds the ISO URL)
wget "$iso"
Then press Ctrl+C in the tcpdump console and copy the corruptdownload.pcap file to a machine with Wireshark.

I hope this helps!
 
Amazing news. I tried removing one DIMM -- it didn't boot.

I removed the other DIMM -- it is working perfectly. The checksum is correct after using wget, AND I am successfully using the upload image feature in the GUI for the first time, with no errors. I plan to try this DIMM in the other slot to figure out if the issue was with the slot or the DIMM.

This was bad RAM the whole time, as t.lamprecht had suggested, despite no Memtest86 errors. I'm wondering if I need to reinstall Proxmox knowing that the current instance was installed with bad RAM in place.
 
This was quite the ride; props to you for sticking with it. Lots of others would have thrown in the towel :)

But yes, I'd honestly suggest a re-installation, just to be sure; bad RAM can cause such subtle issues that I wouldn't risk it.
 
Memtest86's failure to detect the bad RAM is somewhat concerning. It would be interesting to know what the issue with the stick or slot was.
 
I know this post is old, but I wanted to share some potential troubleshooting steps for future readers.
Some motherboards raise the RAM to its rated speed only once they have finished booting into an operating system. Many memory-test tools are detected by the motherboard not as an OS but as a boot loader, so the RAM stays at its reference speed during the test.
Also, if you suspect a RAM failure, check whether the motherboard is setting the correct speed and CAS latency for the RAM you have installed.
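A minimal Python sketch of the speed part of that check, parsing the output of `dmidecode -t memory` (run as root) into the rated and configured speed of each populated slot. It assumes the field names printed by recent dmidecode versions (`Speed`, `Configured Memory Speed`; older versions print `Configured Clock Speed`) and does not cover timings such as CAS latency. A configured speed below the rated one is not necessarily wrong, since the platform may not support the module's full speed; compare it with what the board documents. The function names are hypothetical.

```python
import re
from typing import Dict, List, Optional, Tuple

_SPEED = re.compile(r"(\d+)\s*(?:MT/s|MHz)")


def _speed(value: str) -> Optional[int]:
    """'2400 MT/s' or '2400 MHz' -> 2400; 'Unknown' -> None."""
    m = _SPEED.search(value)
    return int(m.group(1)) if m else None


def memory_speeds(dmidecode_output: str) -> List[Tuple[str, Optional[int], Optional[int]]]:
    """(locator, rated speed, configured speed) for each populated DIMM slot."""
    results = []
    # Each DMI type 17 record starts with a 'Memory Device' title line.
    for block in dmidecode_output.split("Memory Device")[1:]:
        fields: Dict[str, str] = {}
        for line in block.splitlines():
            key, sep, value = line.strip().partition(": ")
            if sep:
                fields[key] = value.strip()
        if fields.get("Size", "").startswith("No Module"):
            continue  # empty slot
        configured = fields.get("Configured Memory Speed") or fields.get("Configured Clock Speed", "")
        results.append((fields.get("Locator", "?"), _speed(fields.get("Speed", "")), _speed(configured)))
    return results
```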
 
