System unresponsive after being left over the weekend. Logs do not show anything obvious, to me at least

4w4i5

New Member
Apr 22, 2024
Hey there, I was wondering if someone can help me with an issue on a PVE machine I have. This is my first post ever on a forum, so guide me if I get something wrong. The PVE machine is meant to run multiple VMs at a time. The CPU is a 16-core Ryzen 9, RAM is 128 GB, and storage is a 500 GB SSD (PVE storage + VM templates) and a 1 TB HDD (VM data), which hosts the main ZFS pool for the VMs (I needed file-level storage with resizability and the option to transfer vmdk/qcow2 images in case we shift to another system). The network is the default one it comes with. I created a few HostOnly and NATNetwork implementations equivalent to what we'd see on VMware but have not used them.

The problem I've been having is that the server goes unresponsive on everything over the weekend, or if I reload the networking with the srv networking reload command. By unresponsive I mean the connection times out on the GUI and SSH, and sometimes, but rarely, I can't ping the server IP either. Last time this happened I combed through the logs, but as posted on another thread here, the ethernet adapter went from a blocking to a forwarding state, so everything's fine. This time, as I'm being prompted to post on the community forums, I have the problem of a SIGHUP being sent to a VM that was running, and after that the server went dark again. I restarted it and everything's back up and running as it should be, but I can tell it will be dead by the weekend again, or, god forbid, when I change the networking config.


I have attached the recent log, but if anything else is required please let me know.
 

Attachments

  • log.txt
    343 KB
Hi,
do you have any kind of monitoring for the server? It sounds a bit like the server is starting to swap heavily, slowing down the system to a halt.
You could add a metric server in the datacenter options to record the resource usage over the weekend, or set up something like Prometheus.
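
If setting up a full metric server is too much for a first check, a small script could record memory and swap usage to a file over the weekend. The following is only a minimal sketch, not a Proxmox feature: the function names, the log path `/root/memlog.csv` and the 60-second interval are all assumptions made for illustration. It reads the standard `/proc/meminfo` fields `MemAvailable`, `SwapTotal` and `SwapFree`.

```python
import time
from pathlib import Path


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> number.

    Most fields are reported in kB; counters such as HugePages_Total
    have no unit and are stored as plain integers.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def swap_used_kib(info):
    """Swap currently in use, in kB (0 if the fields are missing)."""
    return info.get("SwapTotal", 0) - info.get("SwapFree", 0)


def format_sample(info, now=None):
    """Build one CSV row: timestamp, MemAvailable in kB, swap used in kB."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    return f"{stamp},{info.get('MemAvailable', 0)},{swap_used_kib(info)}"


def log_forever(path="/root/memlog.csv", interval=60):
    """Append one sample per interval. Path and interval are hypothetical."""
    while True:
        info = parse_meminfo(Path("/proc/meminfo").read_text())
        # Reopen and close the file for every sample so each row is
        # handed to the kernel before the next sleep.
        with open(path, "a") as fh:
            fh.write(format_sample(info) + "\n")
        time.sleep(interval)
```

Calling `log_forever()` from a shell session that survives logout (for example inside `screen` or `tmux`) would leave a CSV to inspect after the next hang. If the last rows show `MemAvailable` dropping and swap usage climbing, that would support the swapping theory; if they look flat, it would point elsewhere.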
 
Swapping on 128 GB of RAM sounds a bit too harsh, no? (Assuming ZFS just decided to blatantly go past the RAM limit of 8.5 GB I set for it according to the guides.) Regardless, I'll set up metrics; no other lead so far, so I will update over the weekend.
 
Could always be some process going haywire. Could you think of anything out of the ordinary that is happening on the weekend, like backups or bigger sync jobs?

Btw, even if the RAM limit for ZFS wasn't working, ZFS would still free that memory if the system ran low overall. The ZFS ARC is only a problem if you have tasks that very rapidly try to reserve RAM, as ZFS might not get scheduled early enough to free it.
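
To see whether the ARC actually respects the configured limit, the `size` and `c_max` values in `/proc/spl/kstat/zfs/arcstats` can be compared. The snippet below is a rough sketch under that assumption; the helper names are made up for illustration, and it only reads the kstat file, it does not change any ZFS setting.

```python
from pathlib import Path

# Kernel statistics file exposed by the ZFS module on Linux.
ARCSTATS = Path("/proc/spl/kstat/zfs/arcstats")
GIB = 1024 ** 3


def parse_arcstats(text):
    """Parse arcstats text ('name type data' rows) into name -> int."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows have exactly three columns with a numeric value;
        # the two header lines do not match and are skipped.
        if len(fields) == 3 and fields[2].isdigit():
            stats[fields[0]] = int(fields[2])
    return stats


def arc_summary(stats):
    """Compare the current ARC size with its configured maximum."""
    size = stats["size"]
    c_max = stats["c_max"]
    return {
        "size_gib": round(size / GIB, 2),
        "c_max_gib": round(c_max / GIB, 2),
        "over_limit": size > c_max,
    }


def read_arc_summary():
    """Read the live statistics; only works on a host with ZFS loaded."""
    return arc_summary(parse_arcstats(ARCSTATS.read_text()))
```

On the host, `read_arc_summary()` would be expected to report a `c_max_gib` close to 8.5 if the limit from the guides took effect. A much larger value would suggest the setting was never applied.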

Your network config and the output of `pveversion -v` could be helpful.
 
I just switched to the routed config. It better suits my use of PVE as a VM host for multiple users, and if any VM is to serve something outside the network, I need it to come through one interface (masquerading would suit me better here, but the firewall rules might be a pain; routed seems like a decent compromise if it can get me that).

As for tasks or anything like that: no, not yet. This is a fresh install with a backup of the VMs restored and stored in a stopped state. I did have a PBS server connected, but it has been disconnected/removed for the past few weeks in an attempt to diagnose this. Note that the server is not being hit with heavy loads at the moment and is not expected to be pushed past 80%. To put it into numbers, I'd say we would need 5-6 VMs max running at the same time.

Honestly, my first thought was the same, that some process is absolutely hogging the system, but nothing shows up and I ran a fresh install. This server atm just has the VMs and the SDNs I created; however, I'm tempted to purge those due to multiple dnsmasq faults related to it not finding an ethers file.

My network config so far is that I have a static IP from the IT team here on site; beyond that, I've just used the suggested config for the routed mode found online.

Here's the output from `pveversion -v`:

Code:
~# pveversion -v
proxmox-ve: 8.1.0 (running kernel: 6.5.13-3-pve)
pve-manager: 8.1.10 (running version: 8.1.10/4b06efb5db453f29)
proxmox-kernel-helper: 8.1.0
proxmox-kernel-6.5.13-3-pve-signed: 6.5.13-3
proxmox-kernel-6.5: 6.5.13-3
proxmox-kernel-6.5.11-8-pve-signed: 6.5.11-8
ceph-fuse: 17.2.7-pve2
corosync: 3.1.7-pve3
criu: 3.17.1-2
dnsmasq: 2.89-1
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx8
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.0
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.3
libpve-access-control: 8.1.3
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.0.5
libpve-cluster-perl: 8.0.5
libpve-common-perl: 8.1.1
libpve-guest-common-perl: 5.0.6
libpve-http-server-perl: 5.0.6
libpve-network-perl: 0.9.6
libpve-rs-perl: 0.8.8
libpve-storage-perl: 8.1.4
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 5.0.2-4
lxcfs: 5.0.3-pve4
novnc-pve: 1.4.0-3
proxmox-backup-client: 3.1.5-1
proxmox-backup-file-restore: 3.1.5-1
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.2.3
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.5
proxmox-widget-toolkit: 4.1.5
pve-cluster: 8.0.5
pve-container: 5.0.9
pve-docs: 8.1.5
pve-edk2-firmware: 4.2023.08-4
pve-firewall: 5.0.3
pve-firmware: 3.9-2
pve-ha-manager: 4.0.3
pve-i18n: 3.2.1
pve-qemu-kvm: 8.1.5-4
pve-xtermjs: 5.3.0-3
qemu-server: 8.1.1
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.3-pve1

As for the network config, here are the contents of `/etc/network/interfaces`:

Code:
auto lo
iface lo inet loopback

auto enp39s0
iface enp39s0 inet static
        address 172.17.170.194/28
        gateway 172.17.170.1
        post-up echo 1 > /proc/sys/net/ipv4/ip_forward
        post-up echo 1 > /proc/sys/net/ipv4/conf/enp39s0/proxy_arp
        post-up /sbin/ethtool -s enp39s0 wol g


auto vmbr0
iface vmbr0 inet static
        address 192.168.1.1/24
        bridge-ports none
        bridge-stp off
        bridge-fd 0
#Bridge to the internet

source /etc/network/interfaces.d/*
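
One thing that may be worth double-checking in the config above is whether the gateway lies inside the subnet implied by the `/28` prefix on `enp39s0`. This is only a sanity-check sketch using Python's standard `ipaddress` module; the helper name `gateway_on_link` is made up for illustration, and whether the result has anything to do with the hangs is an open question.

```python
import ipaddress


def gateway_on_link(address_cidr, gateway):
    """Return True if the gateway is inside the interface's own subnet."""
    iface = ipaddress.ip_interface(address_cidr)
    return ipaddress.ip_address(gateway) in iface.network


# Values copied from the enp39s0 stanza above.
iface = ipaddress.ip_interface("172.17.170.194/28")
print(iface.network)  # 172.17.170.192/28
print(gateway_on_link("172.17.170.194/28", "172.17.170.1"))  # False
```

With the posted values, the `/28` network covers 172.17.170.192 to 172.17.170.207, so 172.17.170.1 falls outside it. If the addresses were not altered for posting, it could be worth confirming the prefix length and gateway with the IT team.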
 
