[SOLVED] Cannot login to Proxmox GUI under moderate load

Alecz

I am connecting as root using "Linux PAM standard authentication" and getting:

Login failed. Please try again

journalctl confirms the password is correct, but the login still fails.
Code:
journalctl -f
Dec 23 10:46:17 proxmox sshd[1462869]: Accepted password for root from 192.168.5.100 port 54428 ssh2
Dec 23 10:46:17 proxmox sshd[1462869]: pam_unix(sshd:session): session opened for user root(uid=0) by (uid=0)
Dec 23 10:46:22 proxmox systemd-logind[692]: New session 14026 of user root.
Dec 23 10:46:22 proxmox systemd[1]: Started session-14026.scope - Session 14026 of User root.
Dec 23 10:46:24 proxmox sshd[1462869]: pam_env(sshd:session): deprecated reading of user environment enabled
Dec 23 10:46:40 proxmox pvestatd[1092]: status update time (11.530 seconds)
Dec 23 10:46:44 proxmox pve-firewall[1090]: firewall update time (5.780 seconds)
Dec 23 10:47:17 proxmox pvedaemon[1361469]: <root@pam> successful auth for user 'root@pam'
Dec 23 10:47:20 proxmox pveproxy[1280597]: proxy detected vanished client connection

I can connect via SSH (though it is slow), but I cannot connect via the GUI.

This usually happens under moderate or high system load.

Currently top says:

Code:
top - 10:54:03 up 106 days, 21:49,  2 users,  load average: 2.49, 2.65, 1.86
Tasks: 313 total,   1 running, 312 sleeping,   0 stopped,   0 zombie
%Cpu(s):  3.6 us,  3.2 sy,  0.0 ni, 33.5 id, 59.2 wa,  0.0 hi,  0.5 si,  0.0 st
MiB Mem :   7857.1 total,   2318.1 free,   5588.0 used,    206.3 buff/cache
MiB Swap:   7632.0 total,   6341.3 free,   1290.7 used.   2269.1 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 372571 transmi+  20   0  175668 142728   2560 S   4.7   1.8     81,12 transmission-da
   1090 root      20   0  193884   9996   4864 S   2.0   0.1     12,00 pve-firewall
1437265 root      rt   0  562236 169436  52960 S   1.3   2.1     12,58 corosync
    557 root       1 -19       0      0      0 S   0.7   0.0 110:11.84 z_wr_iss

iotop shows
Code:
Total DISK READ:         2.54 M/s | Total DISK WRITE:        27.08 K/s
Current DISK READ:       2.87 M/s | Current DISK WRITE:      10.15 K/s
    TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND
1459871 be/4 root       13.54 K/s    0.00 B/s  0.00 % 15.05 % [kworker/u4:5+dm-thin]
1462949 be/4 root       10.15 K/s    0.00 B/s  0.00 % 13.46 % [kworker/u4:4+flush-252:9]
1465440 be/4 root        3.38 K/s    0.00 B/s  0.00 %  1.93 % python3 /usr/sbin/iotop
   1090 be/4 root      104.93 K/s    0.00 B/s  0.00 %  0.00 % pve-firewall
   1092 be/4 root      253.87 K/s    0.00 B/s  0.00 %  0.00 % pvestatd
 133498 be/4 100000    121.86 K/s    0.00 B/s  0.00 %  0.00 % init

free shows some swap usage:
Code:
# free -h
               total        used        free      shared  buff/cache   available
Mem:           7.7Gi       5.2Gi       2.5Gi        21Mi       204Mi       2.5Gi
Swap:          7.5Gi       1.3Gi       6.2Gi

I did swapoff -a and could log in instantly via the web GUI (my swappiness was 10).
I then changed swappiness to 1.
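
For reference, this is roughly how I checked and changed it (sysctl.d is just one way to persist the setting; the file name is arbitrary):
Code:
# check the current value
cat /proc/sys/vm/swappiness
# change it at runtime
sysctl -w vm.swappiness=1
# persist it across reboots
echo 'vm.swappiness = 1' > /etc/sysctl.d/99-swappiness.conf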

If I turn swap back on, it fills up again, the I/O load spikes, and I cannot log in via the GUI:
Code:
free -h
               total        used        free      shared  buff/cache   available
Mem:           7.7Gi       5.3Gi       2.4Gi        21Mi       189Mi       2.3Gi
Swap:          7.5Gi       1.2Gi       6.2Gi
(with swappiness 1)

(Note: I have a zpool and the ARC is limited to 4G.)
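
For context, the 4G limit is set via the zfs_arc_max module parameter, roughly like this (4 GiB expressed in bytes):
Code:
# /etc/modprobe.d/zfs.conf
options zfs zfs_arc_max=4294967296
# if root is on ZFS, also refresh the initramfs: update-initramfs -u -k all, then reboot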

Why can't I log in via the web GUI under such moderate load when swap is being used?
How come 1.4G of swap is used despite having 2.5G of RAM available right after re-enabling swap?
 
Thanks for the reply, here are the answers:


How big is your zpool, and how many resources do your VMs consume?
Code:
zpool list -v

Code:
qm list

It would also be interesting to know:
Code:
lscpu

Your server has very little RAM; I suspect that this is the problem. The I/O wait is also very high.


Please check the system requirements [0], [1].

[0] https://pve.proxmox.com/pve-docs/pve-admin-guide.html#install_recommended_requirements
[1] https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_hardware_2

The pool is 14TB:
Code:
zpool list -v
NAME               SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
storage14          12.7T  11.2T  1.55T        -         -     7%    87%  1.00x    ONLINE  -
  usb-Seagate_...  12.7T  11.2T  1.55T        -         -     7%  87.8%      -    ONLINE
I know it's a bit overutilized .. I'm working on freeing up some space

qm list is empty

lscpu: this is an older laptop-rated CPU

Code:
lscpu
Architecture:             x86_64
  CPU op-mode(s):         32-bit, 64-bit
  Address sizes:          39 bits physical, 48 bits virtual
  Byte Order:             Little Endian
CPU(s):                   2
  On-line CPU(s) list:    0,1
Vendor ID:                GenuineIntel
  BIOS Vendor ID:         Intel
  Model name:             Intel(R) Celeron(R) 2957U @ 1.40GHz
    BIOS Model name:      Intel(R) Celeron(R) 2957U @ 1.40GHz Fill By OEM CPU @ 1.4GHz
    BIOS CPU family:      15
    CPU family:           6
    Model:                69
    Thread(s) per core:   1
    Core(s) per socket:   2
    Socket(s):            1
    Stepping:             1
    CPU(s) scaling MHz:   100%
    CPU max MHz:          1400.0000
    CPU min MHz:          800.0000
    BogoMIPS:             2793.42
    Flags:                fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc
                          arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg cx16 xtpr pdcm pcid sse4_1 s
                          se4_2 movbe popcnt xsave rdrand lahf_lm abm cpuid_fault epb pti tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust erms invpcid xsaveopt dtherm arat pln
                          pts vnmi
Virtualization features:
  Virtualization:         VT-x
Caches (sum of all):
  L1d:                    64 KiB (2 instances)
  L1i:                    64 KiB (2 instances)
  L2:                     512 KiB (2 instances)
  L3:                     2 MiB (1 instance)
NUMA:
  NUMA node(s):           1
  NUMA node0 CPU(s):      0,1
Vulnerabilities:
  Gather data sampling:   Not affected
  Itlb multihit:          KVM: Mitigation: VMX disabled
  L1tf:                   Mitigation; PTE Inversion; VMX conditional cache flushes, SMT disabled
  Mds:                    Vulnerable: Clear CPU buffers attempted, no microcode; SMT disabled
  Meltdown:               Mitigation; PTI
  Mmio stale data:        Unknown: No mitigations
  Reg file data sampling: Not affected
  Retbleed:               Not affected
  Spec rstack overflow:   Not affected
  Spec store bypass:      Vulnerable
  Spectre v1:             Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:             Mitigation; Retpolines; STIBP disabled; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
  Srbds:                  Vulnerable: No microcode
  Tsx async abort:        Not affected
  Vmscape:                Vulnerable

The system has only 8GB of RAM, but I'm not running any VMs on it, just the one ZFS pool and a few containers (TurnKey Linux for Samba, NFS, Transmission, and Pi-hole).

I am aware the system might not have enough memory given the 14TB pool (I would need 16G, but I will migrate the pool to a different machine).

One important thing to note: since I reduced the ZFS ARC from 4GB to 2GB, I haven't seen any problems.
RAM usage is about 50-60%, and there is no swap usage.
Everyone says that Linux using up all the RAM is a good thing, but I am not convinced: when RAM usage was above 80% (with the 4G ARC), the system was swapping and becoming unresponsive under load. Now, with the lower ARC, RAM usage is <60%, there is no swap usage, and the system is responsive.
I wish I could use more of the RAM, but without risking lock-ups.
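
For what it's worth, the reduction itself was just a matter of writing the new limit to the same zfs_arc_max parameter at runtime (and updating the modprobe.d entry so it survives a reboot), something like:
Code:
# 2 GiB in bytes; the ARC shrinks to the new limit shortly afterwards
echo 2147483648 > /sys/module/zfs/parameters/zfs_arc_max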

I am fine with ZFS performance being reduced due to the lack of RAM, but I'd like the rest of the system to stay responsive.

EDIT: Yes, the I/O wait is very high when swap is being used.
 
I know it's a bit overutilized .. I'm working on freeing up some space
Right.

lscpu: this is an older laptop-rated CPU

Yes, the CPU is indeed very, very weak for that.

The system has only 8GB of RAM, but I'm not running any VMs on it, just the one ZFS pool and a few containers (TurnKey Linux for Samba, NFS, Transmission, and Pi-hole).
I am aware the system might not have enough memory given the 14TB pool (I would need 16G, but I will migrate the pool to a different machine).

I am fine with ZFS performance being reduced due to the lack of RAM, but I'd like the rest of the system to stay responsive.
EDIT: Yes, the I/O wait is very high when swap is being used.
Okay, you know that your system doesn't have enough resources for the installed setup. One option would be to not use ZFS; an alternative would be, for example, a default ext4 filesystem.

To get more usable RAM, I can also recommend zram [0]. I use it on a friend's old Intel NUC, which has only 6 GB of RAM; it effectively gives that mini server about 10 GB. The extra capacity shows up as swap (compressed RAM). You can also use real swap, but it must not be placed on ZFS [1] and should be on fast NVMe/SSD storage if possible.


[0] https://pve.proxmox.com/wiki/Zram#Alternative_Setup_using_zram-tools
[1] https://pve.proxmox.com/pve-docs/pve-admin-guide.html#zfs_swap
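
If it helps, a minimal zram-tools setup (the approach described in [0]) looks roughly like this; the ALGO and PERCENT values are only examples:
Code:
apt install zram-tools
# /etc/default/zramswap
ALGO=lz4
PERCENT=50
# apply the new settings
systemctl restart zramswap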
 
So while I agree this is a low-power system, I find that a poor excuse for it not to work properly under "moderate load" (i.e., not letting me log in and interact with the OS, which is installed on a separate HDD).

To further prove my point, the system has been incredibly snappy since I reduced the ZFS ARC from 4G to 2G.

For the past couple of days, I have been:
  • sending / receiving TBs of snapshots from the USB3 14TB pool to another USB2 24TB pool on the same system (so ~38T of zpools in total on 8G of RAM; roughly the send/receive pipeline sketched after this list)
  • extracting archives on the 14TB pool
  • Streaming videos over the network from the 14TB pool
(All at the same time!)
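
For reference, the transfers were plain zfs send/receive pipelines on the same host, along these lines (the dataset names and the backup24 pool are only placeholders):
Code:
zfs snapshot -r storage14/data@migrate
zfs send -R storage14/data@migrate | zfs receive -u backup24/data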

All this time, the I/O delay and RAM usage have both held steady at 60-70% (swap at 0.01%), but the system has been extremely responsive, with no indication that it was struggling. Logging in via either the GUI or SSH is instant, browsing the 14TB pool is responsive, and streaming is fluid.

This behavior points directly to the abhorrent unresponsiveness being caused by swap usage.
Maybe swap somehow improves performance when ZFS is not used, but with ZFS it seems to backfire, because I think the kernel treats the ZFS ARC as "resident memory" rather than as a cache that should be shrunk before pushing processes to swap.
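
A quick way to see how much the ARC is actually holding versus its configured ceiling (arc_summary ships with the ZFS utilities; the awk one-liner is just a shortcut over the raw counters):
Code:
arc_summary
awk '$1 == "size" || $1 == "c_max" {printf "%s: %.0f MiB\n", $1, $3/1048576}' /proc/spl/kstat/zfs/arcstats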


I know about zram and have used it in the past. My concern is that it will cost more CPU to compress/decompress, but I might consider it regardless, since CPU usage is consistently low.

Anyhow, the conclusion for low-RAM systems with ZFS:
  • reduce ZFS arc to maybe 1/3 or 1/4 of RAM (https://pve.proxmox.com/wiki/ZFS_on_Linux#sysadmin_zfs_limit_memory_usage)
  • reduce swappiness to 0-10 (optional; mine is back to 60 and no issues since reducing arc)
Sure, this won't result in incredible ZFS performance, but at least you can use the system without adding more (now expensive) RAM :cool:

I hope this will help other people who use low-power systems.
 
When you install PVE with ZFS, it doesn't create swap; zram might also be a better option.
 
PROTIP - if you install PVE with Advanced options and leave a couple of GB free at the end of the disk, you can partition it post-install and add swap.
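
For example, something along these lines, assuming the leftover space becomes /dev/sda4 (device name and partition number are placeholders; referencing the partition by UUID in fstab is safer):
Code:
sgdisk -n 4:0:0 -t 4:8200 /dev/sda   # create a Linux-swap partition in the free space (gdisk package)
mkswap /dev/sda4
swapon /dev/sda4
echo '/dev/sda4 none swap sw 0 0' >> /etc/fstab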
 
> I know about zram and have used it in the past. My concern is that it will cost more CPU to compress/decompress, but I might consider it regardless, since CPU usage is consistently low.

zram uses LZO-RLE by default but also supports lz4 compression, which is basically free on just about any processor made in the last 15 years or so. Yours was introduced in late 2013, but it's only 1.4GHz, so YMMV; it's probably still worth experimenting with.
 
Hello,

> reduce ZFS arc to maybe 1/3 or 1/4 of RAM (https://pve.proxmox.com/wiki/ZFS_on_Linux#sysadmin_zfs_limit_memory_usage)

Note that new Proxmox VE installs starting from 8.1 (see the changelog [1] for more details) will by default allocate 10% of the available memory to the ARC.

> reduce swappiness to 0-10 (optional; mine is back to 60 and no issues since reducing arc)

This is not as simple as it sounds; the effect of swappiness depends heavily on what kind of storage is used for swap and on the workload. However, an argument can be made [2] that the defaults are good enough for common use cases and that reducing swappiness might in fact have a detrimental effect.

[1] https://pve.proxmox.com/wiki/Roadmap#Proxmox_VE_8.1
[2] https://chrisdown.name/2018/01/02/in-defence-of-swap.html