Proxmox has a high load, VMs are disconnecting and getting new IP addresses

Spiderwolf

New Member
Apr 30, 2023
Hi there,

I'm playing around with Proxmox and have my UniFi Cloud Key set up in one container and a vanilla Minecraft server in another container. Everything worked smoothly.
However, since I updated Proxmox this weekend, my containers are suddenly disconnecting (disconnecting... Migration detected) and everything is really sluggish.
Now it is even impossible to log in to the Proxmox console from an external IP address.

Is there a known bug in the latest updates?

With kind regards,

Spiderwolf
 
Hi, from which version did you upgrade? Could you please attach the output of pveversion -v?

Is there anything mentioned in the journal that might be related to the sluggishness? You can check the journal of the current boot using journalctl -b -- you can also attach that file (or an excerpt) here.

Can you check top or htop to find out what is causing the high CPU load?
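A minimal sketch for collecting the three requested outputs in one go, so they can be attached as files. The function name collect_diag, the directory name and the file names are arbitrary choices for illustration; only pveversion -v, journalctl -b and top come from the post above.

```shell
# Hypothetical helper: gather the requested diagnostics into one directory.
collect_diag() {
    out_dir="${1:-./pve-diag}"
    mkdir -p "$out_dir" || return 1

    # Package versions of the Proxmox VE stack.
    if command -v pveversion >/dev/null 2>&1; then
        pveversion -v > "$out_dir/pveversion.txt" 2>&1
    fi

    # Journal of the current boot, written without the interactive pager.
    if command -v journalctl >/dev/null 2>&1; then
        journalctl -b --no-pager > "$out_dir/journal-current-boot.txt" 2>&1
    fi

    # One batch-mode snapshot of top; the first lines show load and the busiest processes.
    if command -v top >/dev/null 2>&1; then
        top -b -n 1 2>&1 | head -n 30 > "$out_dir/top.txt"
    fi

    # List what was collected.
    ls "$out_dir"
}

# Usage (on the Proxmox host):
#   collect_diag /root/pve-diag
```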
 
Thank you for your quick reply.

I cannot log in externally (I get a black screen), but I will get back to you tonight.

With warm regards,
Dennis
 
About the updates: it was just an ordinary update from the Proxmox menu, so I just updated Proxmox. I will still do what you asked, since it is better to give you all the info you need.

With warm regards,

Dennis
 
I now even get a black screen when trying to log in internally. PuTTY is not helpful either: I get a software connection error.
I think all hope is lost :(
 
This is quite strange. Could you elaborate what exactly is happening?
  • Can you still reach the GUI in the browser by opening https://<YOUR IP>:8006?
  • What error exactly does PuTTY print when you try to connect via SSH?
  • Does the machine still respond to pings? What is the output of ping -c 4 <YOUR IP> (on a different machine)?
  • Do you have physical access or remote access via IPMI etc to the machine? If yes, can you check if there is something on the screen and send a screenshot? What happens if you reboot the machine?
 
  • Can you still reach the GUI in the browser by opening https://<YOUR IP>:8006?
    • No I cannot, mostly it gives me unreachable, sometimes I get a black screen.
  • What error exactly does PuTTY print when you try to connect via SSH?
    • Sometimes I can log in, sometimes not, but I get:
      • Network Connection error (on internal ip-address)
      • Remote Side unexpectedly closed Network session
      • Network error: Software caused connection abort
  • Does the machine still respond to pings? What is the output of ping -c 4 <YOUR IP> (on a different machine)?
    • Sometimes it responds, most of the time not
    • Code:
      Pinging 192.168.178.230 with 32 bytes of data:
      Reply from 192.168.178.230: bytes=32 time=1ms TTL=64
      Reply from 192.168.178.230: bytes=32 time=2ms TTL=64
      Reply from 192.168.178.230: bytes=32 time=2ms TTL=64
      Reply from 192.168.178.230: bytes=32 time=2ms TTL=64
      
      Ping statistics for 192.168.178.230:
          Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
      Approximate round trip times in milli-seconds:
          Minimum = 1ms, Maximum = 2ms, Average = 1ms
    • Code:
      Pinging 192.168.178.230 with 32 bytes of data:
      Reply from 192.168.178.230: bytes=32 time=1ms TTL=64
      Request timed out.
      Request timed out.
      Request timed out.
      
      Ping statistics for 192.168.178.230:
          Packets: Sent = 4, Received = 1, Lost = 3 (75% loss),
      Approximate round trip times in milli-seconds:
          Minimum = 1ms, Maximum = 1ms, Average = 1ms
  • I'm going to try to reboot the machine and go in physically, but I need to connect it to a monitor, so it will take some time. Will update you in my next post.
As far as I can tell, it looks like the machine is constantly rebooting. I have no internet or network problems with other devices. I hope I can give more information once I have connected it to a monitor.

With warm regards,

Dennis
 
  • Can you still reach the GUI in the browser by opening https://<YOUR IP>:8006?
    • No I cannot, mostly it gives me unreachable, sometimes I get a black screen.
Do you in fact get a black screen in the browser window? Could you send a screenshot if you see it again?
  • As far as I can tell, it looks like the machine is constantly rebooting. I have no internet or network problems with other devices. I hope I can give more information once I have connected it to a monitor.
Sounds good! Another idea: Could you check whether there is another device on the network that also uses the IP 192.168.178.230? Maybe this IP was originally assigned by a DHCP server, but has since been reused for another device? This could cause all sorts of weird network issues and might be another explanation why you can sometimes reach the GUI and sometimes cannot.
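One way to look for a duplicate IP, sketched under assumptions: ARP-ping the address from another Linux machine on the same network segment and count how many distinct MAC addresses answer. The helper name count_macs and the interface name eth0 are placeholders, and the arping options shown are those of the iputils variant; other arping implementations use different flags.

```shell
# Hypothetical helper: read ARP replies on stdin and print the number of
# distinct MAC addresses found in them (case-insensitive).
count_macs() {
    grep -oiE '([0-9a-f]{2}:){5}[0-9a-f]{2}' \
        | tr 'A-F' 'a-f' \
        | sort -u \
        | wc -l
}

# Usage (assumes iputils arping and that eth0 is the LAN interface of the
# machine you run this on, not the Proxmox host itself):
#   arping -I eth0 -c 4 192.168.178.230 | count_macs
# A result greater than 1 would suggest that two devices answer for the same IP.
```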
 
I got logged in (even from an external IP address), but already got a communication error within 30 seconds.
Trying to open a new session gives me the famous black screen.

I also scanned my network with Advanced IP Scanner to see whether some other device was hijacking the Proxmox IP address. But no, no other device with that IP address is on the list when it disconnects. (Unless a virtual machine/container gets the IP address, but that DHCP pool is far from the Proxmox IP address.)

With warm regards,

Dennis
 

Attachments

  • 2023-09-28 14_17_31-Proxmox_black_screen.png
  • 2023-09-28 14_13_22-proxmox - Proxmox Virtual Environment.png
Attached you will find the syslog I could get while I was logged in. It covers the past 10 days, so everything should be included.
Working in the shell is a pain in the ..... because it is mostly unresponsive (when I type, one character appears every 30 seconds).

I'm trying to plan when I can connect a monitor and keyboard to my server, so I can see what is wrong on the machine itself.

Also the output of pveversion -v:
Code:
proxmox-ve: 8.0.1 (running kernel: 6.2.16-3-pve)
pve-manager: 8.0.3 (running version: 8.0.3/bbf3993334bfa916)
pve-kernel-6.2: 8.0.2
pve-kernel-5.15: 7.4-4
pve-kernel-6.2.16-3-pve: 6.2.16-3
pve-kernel-5.15.108-1-pve: 5.15.108-1
pve-kernel-5.15.102-1-pve: 5.15.102-1
ceph-fuse: 17.2.6-pve1+3
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-3
libknet1: 1.25-pve1
libproxmox-acme-perl: 1.4.6
libproxmox-backup-qemu0: 1.4.0
libproxmox-rs-perl: 0.3.0
libpve-access-control: 8.0.3
libpve-apiclient-perl: 3.3.1
libpve-common-perl: 8.0.5
libpve-guest-common-perl: 5.0.3
libpve-http-server-perl: 5.0.3
libpve-rs-perl: 0.8.3
libpve-storage-perl: 8.0.2
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 5.0.2-4
lxcfs: 5.0.3-pve3
novnc-pve: 1.4.0-2
proxmox-backup-client: 3.0.1-1
proxmox-backup-file-restore: 3.0.1-1
proxmox-kernel-helper: 8.0.2
proxmox-mail-forward: 0.2.0
proxmox-mini-journalreader: 1.4.0
proxmox-widget-toolkit: 4.0.5
pve-cluster: 8.0.1
pve-container: 5.0.4
pve-docs: 8.0.4
pve-edk2-firmware: 3.20230228-4
pve-firewall: 5.0.2
pve-firmware: 3.7-1
pve-ha-manager: 4.0.2
pve-i18n: 3.0.4
pve-qemu-kvm: 8.0.2-3
pve-xtermjs: 4.16.0-3
qemu-server: 8.0.6
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.1.12-pve1
With warm regards,

Dennis
 

Attachments

  • Proxmox .txt
Ever since the last reboot, your network connectivity has been unstable. This can be seen in the following log snippets:
Code:
Sep 22 23:30:44 proxmox kernel: e1000e 0000:00:1f.6 eno1: NIC Link is Up 100 Mbps Full Duplex, Flow Control: Rx/Tx
Sep 22 23:30:44 proxmox kernel: vmbr0: port 1(eno1) entered blocking state
Sep 22 23:30:44 proxmox kernel: vmbr0: port 1(eno1) entered forwarding state
Sep 22 23:30:44 proxmox kernel: e1000e 0000:00:1f.6 eno1: NIC Link is Down
Sep 22 23:30:45 proxmox kernel: vmbr0: port 1(eno1) entered disabled state
Sep 22 23:30:54 proxmox kernel: e1000e 0000:00:1f.6 eno1: NIC Link is Up 100 Mbps Full Duplex, Flow Control: Rx/Tx
Sep 22 23:30:54 proxmox kernel: vmbr0: port 1(eno1) entered blocking state
Sep 22 23:30:54 proxmox kernel: vmbr0: port 1(eno1) entered forwarding state
Sep 22 23:30:54 proxmox kernel: e1000e 0000:00:1f.6 eno1: NIC Link is Down
Sep 22 23:30:55 proxmox kernel: vmbr0: port 1(eno1) entered disabled state
Sep 22 23:31:23 proxmox kernel: e1000e 0000:00:1f.6 eno1: NIC Link is Up 10 Mbps Full Duplex, Flow Control: Rx/Tx
Sep 22 23:31:23 proxmox kernel: vmbr0: port 1(eno1) entered blocking state
Sep 22 23:31:23 proxmox kernel: vmbr0: port 1(eno1) entered forwarding stat

and
Code:
Sep 23 00:21:45 proxmox kernel: ------------[ cut here ]------------
Sep 23 00:21:45 proxmox kernel: NETDEV WATCHDOG: eno1 (e1000e): transmit queue 0 timed out
Sep 23 00:21:45 proxmox kernel: WARNING: CPU: 2 PID: 0 at net/sched/sch_generic.c:525 dev_watchdog+0x23a/0x250
Sep 23 00:21:45 proxmox kernel: Modules linked in: tcp_diag inet_diag ip6t_REJECT nf_reject_ipv6 xt_hl ip6t_rt ipt_REJECT nf_reject_ipv4 xt_LOG nf_log_syslog xt_comment nft_limit xt_limit xt_addrtype xt_tcpudp xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_compat cfg80211 veth ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter bpfilter nf_tables scsi_transport_iscsi bonding tls sunrpc nfnetlink_log nfnetlink binfmt_misc snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_soc_avs snd_soc_hda_codec snd_hda_ext_core snd_soc_core intel_rapl_msr snd_compress intel_rapl_common ac97_bus intel_tcc_cooling x86_pkg_temp_thermal snd_pcm_dmaengine intel_powerclamp coretemp snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi kvm_intel mei_hdcp mei_pxp snd_hda_codec i915 snd_hda_core kvm irqbypass drm_buddy crct10dif_pclmul ttm polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 aesni_intel drm_display_helper crypto_simd
Sep 23 00:21:45 proxmox kernel:  snd_hwdep cec cryptd rc_core snd_pcm drm_kms_helper rapl i2c_algo_bit intel_cstate syscopyarea snd_timer sysfillrect hp_wmi sparse_keymap pcspkr platform_profile wmi_bmof ee1004 sysimgblt snd mei_me tpm_infineon soundcore mei mac_hid intel_pch_thermal acpi_pad zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) vhost_net vhost vhost_iotlb tap drm efi_pstore dmi_sysfs ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_pq simplefb dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c crc32_pclmul xhci_pci ahci i2c_i801 xhci_pci_renesas e1000e i2c_smbus libahci xhci_hcd video wmi
Sep 23 00:21:45 proxmox kernel: CPU: 2 PID: 0 Comm: swapper/2 Tainted: P           O       6.2.16-3-pve #1
Sep 23 00:21:45 proxmox kernel: Hardware name: HP HP EliteDesk 800 G2 DM 65W/8056, BIOS N21 Ver. 02.51 10/20/2020
Sep 23 00:21:45 proxmox kernel: RIP: 0010:dev_watchdog+0x23a/0x250
Sep 23 00:21:45 proxmox kernel: Code: 00 e9 2b ff ff ff 48 89 df c6 05 8a 6f 7d 01 01 e8 6b 08 f8 ff 44 89 f1 48 89 de 48 c7 c7 58 64 e0 ad 48 89 c2 e8 06 ab 30 ff <0f> 0b e9 1c ff ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00
Sep 23 00:21:45 proxmox kernel: RSP: 0018:ffffb11d40174e38 EFLAGS: 00010246

You can try a few troubleshooting avenues:
- as suggested previously - boot into an older kernel
- check/replace the cable
- swap the NIC
- try to upgrade the BIOS and all available hardware firmware
- move the NIC to another PCI port if possible
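To judge whether one of these steps helped, it may be useful to count the link events in the kernel log before and after the change. A rough sketch follows; the helper name link_flap_summary is made up here, and the matched strings are taken from the e1000e log lines quoted above.

```shell
# Hypothetical helper: summarise link up/down events and transmit-queue
# watchdog warnings from kernel log lines read on stdin.
link_flap_summary() {
    awk '
        /NIC Link is Down/ { down++ }
        /NIC Link is Up/   { up++ }
        /NETDEV WATCHDOG/  { hang++ }
        END { printf "up=%d down=%d watchdog=%d\n", up, down, hang }
    '
}

# Usage (kernel messages of the current boot, no pager):
#   journalctl -k -b --no-pager | link_flap_summary
```

On a stable link the counts should stay close to a single "up" per boot; steadily growing "down" or "watchdog" counts would indicate the problem is still there.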

good luck


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
Thank you for your quick reply. I think I know what the culprit is, then. I will check my network cables, especially since I do have some other issues, but those were normally speed-related.

Just to be sure... how long is the journalctl -b output? Due to the disconnects (which you have probably solved), I need to press Enter constantly to keep the list loading.

With warm regards,

Dennis
 
The man pages are a great source of information:
Code:
-b [[ID][±offset]|all], --boot[=[ID][±offset]|all]
           Show messages from a specific boot. This will add a match for "_BOOT_ID=".

           The argument may be empty, in which case logs for the current boot will be shown.

           If the boot ID is omitted, a positive offset will look up the boots starting from the beginning of the journal, and an equal-or-less-than zero offset will look up boots starting from the end of the journal. Thus, 1
           means the first boot found in the journal in chronological order, 2 the second and so on; while -0 is the last boot, -1 the boot before last, and so on. An empty offset is equivalent to specifying -0, except when
           the current boot is not the last boot (e.g. because --directory was specified to look at logs from a different machine).

           If the 32-character ID is specified, it may optionally be followed by offset which identifies the boot relative to the one given by boot ID. Negative values mean earlier boots and positive values mean later boots.
           If offset is not specified, a value of zero is assumed, and the logs for the boot given by ID are shown.

           The special argument all can be used to negate the effect of an earlier use of -b.

You can disable the pager via:
Code:
       --no-pager
           Do not pipe output into a pager.

You can also limit the number of lines via "-n", or write the journal to a file via ">file.log", which also disables the pager.
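Put together, the options above could be used roughly like this. The function names and the default file name are placeholders; the JOURNALCTL variable is only there so the commands can be dry-run with a stand-in, and it defaults to the real journalctl.

```shell
# Hypothetical helper: write the journal of the current boot to a file.
# Redirecting to a file means no pager is involved.
save_boot_journal() {
    out_file="${1:-journal-current-boot.log}"
    "${JOURNALCTL:-journalctl}" -b --no-pager > "$out_file"
}

# Hypothetical helper: print only the last N lines (default 200) of the
# current boot, without the pager.
last_boot_lines() {
    "${JOURNALCTL:-journalctl}" -b --no-pager -n "${1:-200}"
}

# Usage:
#   save_boot_journal /root/journal.log
#   last_boot_lines 50
```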


 
In addition to what @bbgeek17 wrote: It seems like NICs using the e1000e driver occasionally experience such hangs [1], and apparently disabling TSO and GSO offloading can help, see e.g. [2].
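As a sketch of what disabling these offloads could look like with ethtool: the helper name disable_offloads is made up, eno1 is the interface name from the log above, and the ETHTOOL variable only exists to allow a dry run with a stand-in. Note that a change made with ethtool -K does not survive a reboot by itself.

```shell
# Hypothetical helper: turn off TCP segmentation offload (TSO) and generic
# segmentation offload (GSO) on one interface.
disable_offloads() {
    iface="${1:-eno1}"
    "${ETHTOOL:-ethtool}" -K "$iface" tso off gso off
}

# Usage (as root on the Proxmox host):
#   disable_offloads eno1
#   ethtool -k eno1 | grep -E 'tcp-segmentation-offload|generic-segmentation-offload'
```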

If I understand correctly, the issues only started appearing after the update this weekend. Can you check which kernel was running before the update? You can find out by passing the offset of the last boot before the update to journalctl -b (e.g. journalctl -b -3) and reading the kernel version off the "Linux version ..." line near the start of the output.
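A small sketch for reading the kernel version of an earlier boot. The helper name kernel_version_from_log is invented for illustration, and the offset -3 is only the example value from above; journalctl --list-boots shows which offsets exist.

```shell
# Hypothetical helper: print the kernel version from the first
# "Linux version ..." line found on stdin.
kernel_version_from_log() {
    grep -m 1 -o 'Linux version [^ ]*' | awk '{ print $3 }'
}

# Usage:
#   journalctl --list-boots
#   journalctl -k -b -3 --no-pager | kernel_version_from_log
```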

What NIC are you using? Can you post the output of lspci -v | grep -i ethernet?

[1] https://forum.proxmox.com/threads/e1000-driver-hang.58284/page-8#post-390709
[2] https://forum.proxmox.com/threads/e1000-driver-hang.58284/page-10#post-567651
 
