[SOLVED] Big problem with my server.

xickwy

Nov 10, 2019
Hello

I have a big problem with my server, and I don't know what's happening. After two weeks of running, two days ago I noticed the server's fans spinning up a bit. That's when the problems started.

The network services work fine (I'm connected right now through the pfSense that runs on this server). I can access the host over SSH and iRMC. I can run commands such as qm to check that pfSense is running; lxc commands don't work, and only some pve commands do. So... not too bad.

Now for the problems. A lot of services are stuck, first of all pveproxy, which loops in dmesg trying to start. Digging into it, I see that systemd-hostnamed is stopped. When I try to start it, I get "Access Denied" on libc.so.6. systemd-nslookup can't access /etc/hostname: again, "Access Denied". The SMPD services report that they can't access hosts.allow/hosts.deny... I've tried running fsck: no errors.
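Since every failure above is an "Access Denied" on an ordinary file or library, one quick check is to compare the current modes with what a stock install would have. A minimal sketch, assuming GNU coreutils stat (-L follows symlinks, %a prints the octal mode); check_mode is a hypothetical helper name, and the expected modes in the usage lines are typical Debian defaults, not values taken from this thread.

```shell
# check_mode PATH EXPECTED: compare PATH's octal mode (following symlinks)
# with EXPECTED. Prints OK/BAD; returns 0 on match, 1 on mismatch,
# 2 if stat fails (e.g. the path does not exist).
check_mode() {
  local path=$1 expected=$2 actual
  actual=$(stat -L -c %a "$path") || return 2
  if [ "$actual" = "$expected" ]; then
    echo "OK   $path ($actual)"
  else
    echo "BAD  $path is $actual, expected $expected"
    return 1
  fi
}

# Usage on the host (assumed typical Debian defaults):
#   check_mode /etc 755
#   check_mode /etc/hostname 644
#   check_mode /etc/hosts.allow 644
#   check_mode /lib/x86_64-linux-gnu 755
```

A directory missing the x bit blocks access to everything beneath it, so it is worth checking the parent directories as well, not only the file named in the error.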

I tried to reinstall some Proxmox packages, and the pve-manager reinstall got stuck because pveproxy can't start. Updating with apt doesn't work either, because DNS lookups fail. I can download the packages, copy them over SSH and install them with dpkg -i, but nothing changes... I noticed that Proxmox runs on top of AppArmor. I disabled it, and it's the same.

I think I've tried everything, but sometimes some help is needed.

I'm off to work now; I'll be back in about 10 hours and will try more tonight.

Thanks in advance

My server is a Fujitsu Primergy RX800 with two Xeon E5-2630 v2 CPUs and 86 GiB of DDR3. The system runs from a SATA disk, and the other disks form a RAID 5 array.

Code:
proxmox-ve: 6.3-1 (running kernel: 5.4.78-2-pve)
pve-manager: not correctly installed (running version: 6.3-3/eee5f901)
pve-kernel-5.4: 6.3-3
pve-kernel-helper: 6.3-3
pve-kernel-5.3: 6.1-6
pve-kernel-5.4.78-2-pve: 5.4.78-2
pve-kernel-5.4.78-1-pve: 5.4.78-1
pve-kernel-5.3.18-3-pve: 5.3.18-3
pve-kernel-5.3.10-1-pve: 5.3.10-1
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.4-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: residual config
ifupdown2: 3.0.0-1+pve3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.16-pve1
libproxmox-acme-perl: 1.0.7
libproxmox-backup-qemu0: 1.0.2-1
libpve-access-control: 6.1-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.3-2
libpve-guest-common-perl: 3.1-3
libpve-http-server-perl: 3.1-1
libpve-storage-perl: 6.3-3
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.3-1
lxcfs: 4.0.3-pve3
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.0.6-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.4-3
pve-cluster: 6.2-1
pve-container: 3.3-2
pve-docs: 6.3-1
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.1-3
pve-ha-manager: 3.1-1
pve-i18n: 2.2-2
pve-qemu-kvm: 5.1.0-7
pve-xtermjs: 4.7.0-3
qemu-server: 6.3-2
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 0.8.5-pve1
 
Have you tried just rebooting the machine?
I have similar old hardware, and for a while now it has needed a reboot every 24 days; otherwise it locks up on the 25th day (ZFS pools suspended)...
 
Yes. I've tried rebooting several times and forced an fsck from GRUB... I don't use ZFS, for no particular reason. I'm using a built-in SAS 6G RAID 5 card, and the speed of that RAID is awesome (for my use). I'll post some of the failed services when I'm off work. But it's strange: I always do the same thing, apt update && apt dist-upgrade.
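Forcing a filesystem check from GRUB usually means adding a kernel parameter; on systemd-based Debian/Proxmox hosts, systemd-fsck honours fsck.mode=force. A minimal sketch of appending such a parameter to GRUB_CMDLINE_LINUX_DEFAULT, assuming the host boots with GRUB configured from /etc/default/grub (not systemd-boot); add_kernel_param is a hypothetical helper, and the same approach would apply to apparmor=0.

```shell
# add_kernel_param FILE PARAM: append PARAM inside the quotes of
# GRUB_CMDLINE_LINUX_DEFAULT="..." in FILE (normally /etc/default/grub).
# PARAM must not contain the characters | & or \ (they are special to sed).
add_kernel_param() {
  local file=$1 param=$2
  if ! grep -q '^GRUB_CMDLINE_LINUX_DEFAULT=' "$file"; then
    echo "no GRUB_CMDLINE_LINUX_DEFAULT line in $file" >&2
    return 1
  fi
  # Nothing to do if the parameter is already on the default command line.
  if grep '^GRUB_CMDLINE_LINUX_DEFAULT=' "$file" | grep -qF -- "$param"; then
    return 0
  fi
  sed -i "s|^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"|\1 ${param}\"|" "$file"
}

# Usage on the host (keep a backup, then regenerate the GRUB config):
#   cp /etc/default/grub /etc/default/grub.bak
#   add_kernel_param /etc/default/grub fsck.mode=force
#   update-grub && reboot
```

The parameter stays active until it is removed again, so remember to take it out and rerun update-grub once the check has run.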
 
The reference to ZFS was just to show how my problem manifests.
I still don't know what causes my issue. The suspended ZFS pools may only be the result, not the source.

My point is: take note of your uptimes and see whether the issue appears after roughly the same number of days. If it does, try to work around the problem with a regular reboot. Maybe it helps.
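To keep track of uptimes as suggested, the uptime in whole days can be read from /proc/uptime, whose first field is the number of seconds since boot. A minimal sketch; uptime_days is a hypothetical helper name, and the cron line in the comments is only an illustrative schedule, not one taken from this thread.

```shell
# uptime_days [FILE]: print whole days since boot, read from the first
# field of /proc/uptime (seconds, possibly fractional). FILE defaults to
# /proc/uptime; passing another file is handy for testing.
uptime_days() {
  awk '{ printf "%d\n", $1 / 86400; exit }' "${1:-/proc/uptime}"
}

# Example workaround: a hypothetical /etc/cron.d/weekly-reboot entry that
# reboots every Sunday at 04:30 (cron.d lines include the user field):
#   30 4 * * 0 root /sbin/shutdown -r now
```

Logging the output of uptime_days each time the problem shows up makes it easy to see whether it always appears after the same number of days.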
 
Do you have any mdadm RAID in your machine?

I've noticed a BIG slowdown when checkarray runs (there's a cron job that runs it monthly, on the first Sunday).
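As far as I know, Debian's mdadm package ships /etc/cron.d/mdadm, which fires on Sundays and only starts checkarray when the day of the month is 7 or less; while a check runs, /proc/mdstat shows a "check =" progress line. A minimal sketch of both tests, assuming GNU date (for -d and %u); is_first_sunday and md_check_running are hypothetical helper names.

```shell
# is_first_sunday [DATE]: succeed if DATE (default: today) is a Sunday
# falling on day 1..7 of the month, i.e. the first Sunday.
is_first_sunday() {
  local when=${1:-today} dow dom
  dow=$(date -d "$when" +%u) || return 2   # 1=Monday ... 7=Sunday
  dom=$(date -d "$when" +%d)               # zero-padded day of month
  dom=${dom#0}                             # "03" -> "3" for plain decimal
  [ "$dow" -eq 7 ] && [ "$dom" -le 7 ]
}

# md_check_running [FILE]: succeed if an md check or resync is in progress
# according to FILE (default /proc/mdstat).
md_check_running() {
  grep -Eq '(check|resync) *=' "${1:-/proc/mdstat}"
}

# Usage on the host:
#   is_first_sunday && echo "checkarray may run tonight"
#   md_check_running && cat /proc/mdstat
```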
 
Nope, it's hardware RAID, and Adaptec MSM shows good health. Those disks hold the VM images and the LXC containers. Backups are stored on an NFS server, and I'm also trying the Tuxis service. The main disk is a 240 GB SSD connected to the backplane of the motherboard's internal RAID (an Intel RSTi).

Tonight I found another service down: man-db. It shows "Access denied" again.

SELinux isn't installed, but AppArmor is. As far as I know, though, AppArmor only applies to LXC containers, right?
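Whatever the answer for LXC, it is easy to check whether AppArmor is active in the running kernel at all (for example after booting with apparmor=0): the file /sys/module/apparmor/parameters/enabled contains Y or N, and aa-status from the apparmor package gives more detail. A minimal sketch, assuming that sysfs path; apparmor_enabled is a hypothetical helper name.

```shell
# apparmor_enabled [FILE]: succeed if AppArmor is enabled in the running
# kernel. FILE defaults to the sysfs parameter, which reads "Y" or "N";
# a missing file (AppArmor not built in or not loaded) counts as disabled.
apparmor_enabled() {
  local f=${1:-/sys/module/apparmor/parameters/enabled}
  [ -r "$f" ] && [ "$(cat "$f")" = "Y" ]
}

# Usage on the host:
#   if apparmor_enabled; then echo "AppArmor: enabled"; else echo "AppArmor: disabled"; fi
```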
 
Resolved. I had to restore the permissions on all folders. Holy $h1t!!! What a headache... I still need to investigate why it happened...

I'm posting all the commands in case anyone else runs into a similar failure.

After doing that, reboot. When you apply this fix, you will see a lot of "denied" or "not applicable" errors; in that case, disable AppArmor in GRUB (apparmor=0) and then apply the fix again.

Code:
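# Main system trees: rwx for root, r-x for everyone else.
# Note: numeric 755 also clears setuid/setgid bits on files (e.g. /usr/bin/passwd, /bin/su)
# and makes files such as /etc/shadow world-readable, so check those afterwards.
chmod -R 755 /bin /boot /dev /etc /home /lib /lib64 /media /mnt /opt /run /sbin /srv /usr /var
# /tmp: world-writable with the sticky bit.
chmod -R 1777 /tmp
# /sys and /proc are virtual filesystems rebuilt at boot; many entries (especially in /proc) refuse chmod.
chmod -R 555 /sys
chmod -R 555 /proc
# root's home: accessible by root only.
chmod -R 700 /root
# systemd units and configs must not be executable (removes the x bit added by 755 above).
chmod -x /lib/systemd/user/*.target
chmod -x /lib/systemd/user/*.socket
chmod -x /lib/systemd/user/*.service
chmod -x /lib/systemd/system/*.target
chmod -x /lib/systemd/system/*.socket
chmod -x /lib/systemd/system/*.service
chmod -x /etc/systemd/*.conf
# SSH config and keys: read-only for root (sshd ignores private host keys readable by others).
chmod 400 /etc/ssh/*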
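A blanket chmod can only approximate the original modes. If a healthy snapshot exists (taken beforehand, or from an identical machine with the same package versions), the exact modes, including setuid bits, can be re-applied instead. A minimal sketch, assuming GNU find (its -printf '%m %p' prints the octal mode and path); save_modes and restore_modes are hypothetical helper names, and paths containing newlines or trailing spaces are not handled by this simple line format.

```shell
# save_modes PATH...: print "MODE PATH" for everything under the given
# paths, staying on one filesystem (-xdev) and skipping symlinks, because
# chmod on a symlink would change its target instead.
save_modes() {
  find "$@" -xdev ! -type l -printf '%m %p\n'
}

# restore_modes: read "MODE PATH" lines from stdin and re-apply each mode,
# skipping paths that no longer exist.
restore_modes() {
  local mode path
  while read -r mode path; do
    if [ -e "$path" ]; then
      chmod -- "$mode" "$path"
    fi
  done
}

# Usage on a healthy system, and later on the broken one:
#   save_modes /etc /usr /var > /root/modes.txt
#   restore_modes < /root/modes.txt
```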
 
