Proxmox crashing when using resources

Baptiste.mrch

New Member
Oct 26, 2023
Hello,

I'm creating this thread because I'm really lost. For some time now my Proxmox host has been crashing, and I've noticed it happens when I use a lot of resources.
I'm running PhotoPrism in a Docker container (set up with the awesome Proxmox helper scripts), and I noticed it uses heavy resources (I allocated 2 CPU threads and 2 GB of RAM). When I run a library indexing, both CPU threads and the RAM reach about 80% usage for this container alone, while the host reports roughly 50-60% CPU and 60-70% RAM.

I wonder if I have faulty hardware, because I've tried many things, including updating Proxmox, the processor microcode, and the CPU scaling governor, all via the Proxmox helper scripts. I still have the issue.

Below are two logs I extracted from Proxmox using rsyslog:
https://pastebin.com/dHair40M
https://pastebin.com/m2KatVHG

Sorry if my English is bad :)
Thank you!

Here is my config:
root@proxmox:~# pveversion --verbose
proxmox-ve: 8.0.2 (running kernel: 6.2.16-18-pve)
pve-manager: 8.0.4 (running version: 8.0.4/d258a813cfa6b390)
pve-kernel-6.2: 8.0.5
proxmox-kernel-helper: 8.0.3
proxmox-kernel-6.2.16-18-pve: 6.2.16-18
proxmox-kernel-6.2: 6.2.16-18
proxmox-kernel-6.2.16-15-pve: 6.2.16-15
pve-kernel-6.2.16-3-pve: 6.2.16-3
ceph-fuse: 17.2.6-pve1+3
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx5
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.4.6
libproxmox-backup-qemu0: 1.4.0
libproxmox-rs-perl: 0.3.1
libpve-access-control: 8.0.5
libpve-apiclient-perl: 3.3.1
libpve-common-perl: 8.0.9
libpve-guest-common-perl: 5.0.5
libpve-http-server-perl: 5.0.4
libpve-rs-perl: 0.8.5
libpve-storage-perl: 8.0.2
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 5.0.2-4
lxcfs: 5.0.3-pve3
novnc-pve: 1.4.0-2
proxmox-backup-client: 3.0.4-1
proxmox-backup-file-restore: 3.0.4-1
proxmox-kernel-helper: 8.0.3
proxmox-mail-forward: 0.2.0
proxmox-mini-journalreader: 1.4.0
proxmox-widget-toolkit: 4.0.9
pve-cluster: 8.0.4
pve-container: 5.0.4
pve-docs: 8.0.5
pve-edk2-firmware: 3.20230228-4
pve-firewall: 5.0.3
pve-firmware: 3.8-3
pve-ha-manager: 4.0.2
pve-i18n: 3.0.7
pve-qemu-kvm: 8.0.2-7
pve-xtermjs: 4.16.0-3
qemu-server: 8.0.7
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.1.13-pve1
 
The Proxmox manual and just about every thread about Docker on this forum warn against running it in a container. Can you run Docker in a VM and see if the problem persists?
 
If that is the case, the script didn't come from the repository you mentioned. The script there installs PhotoPrism in a Linux container, not a Docker container.
Yeah, I'm using the Docker LXC so that I can keep control over the installation. Not sure if it's necessary, but I'm using an NFS mount in /etc/fstab, and I didn't succeed with your PhotoPrism script.
Many thanks for your scripts, by the way :)
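For anyone reading along, an NFS mount in /etc/fstab on the Proxmox host typically looks like the line below (the NAS address and paths are hypothetical, not from this thread):

```shell
# /etc/fstab entry on the Proxmox host (hypothetical NAS IP and export path)
# _netdev waits for the network before mounting; noatime avoids extra writes
192.168.1.50:/export/photos  /mnt/photos  nfs  defaults,_netdev,noatime  0  0
```

The host-side mount point can then be passed into the container, e.g. with a bind mount like `pct set 107 -mp0 /mnt/photos,mp=/photos` (container ID and paths again hypothetical).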
 
It crashed again. Here are the logs:
Oct 26 22:31:45 proxmox kernel: device veth107i0 entered promiscuous mode
Oct 26 22:31:45 proxmox kernel: eth0: renamed from vethRwSAsW
Oct 26 22:31:45 proxmox pct[15797]: <root@pam> end task UPID:proxmox:00003DB6:0003A8BA:653ACCB0:vzstart:107:root@pam: OK
Oct 26 22:31:46 proxmox kernel: IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Oct 26 22:31:46 proxmox kernel: vmbr0: port 8(veth107i0) entered blocking state
Oct 26 22:31:46 proxmox kernel: vmbr0: port 8(veth107i0) entered forwarding state
Oct 26 22:31:53 proxmox postfix/qmgr[909]: 5E84A81012: from=<root@proxmox.home>, size=614, nrcpt=1 (queue active)
Oct 26 22:31:53 proxmox postfix/local[16582]: error: open database /etc/aliases.db: No such file or directory
Oct 26 22:31:53 proxmox postfix/local[16582]: warning: hash:/etc/aliases is unavailable. open database /etc/aliases.db: No such file or directory
Oct 26 22:31:53 proxmox postfix/local[16582]: warning: hash:/etc/aliases: lookup of 'root' failed
Oct 26 22:31:53 proxmox postfix/local[16582]: 5E84A81012: to=<root@proxmox.home>, orig_to=<root>, relay=local, delay=2339, delays=2339/0.01/0/0.03, dsn=4.3.0, status=deferred (alias database unavailable)
Oct 26 22:36:53 proxmox postfix/qmgr[909]: 3198980FC3: from=<root@proxmox.home>, size=614, nrcpt=1 (queue active)
Oct 26 22:36:53 proxmox postfix/local[19515]: error: open database /etc/aliases.db: No such file or directory
Oct 26 22:36:53 proxmox postfix/local[19515]: warning: hash:/etc/aliases is unavailable. open database /etc/aliases.db: No such file or directory
Oct 26 22:36:53 proxmox postfix/local[19515]: warning: hash:/etc/aliases: lookup of 'root' failed
Oct 26 22:36:53 proxmox postfix/local[19515]: 3198980FC3: to=<root@proxmox.home>, orig_to=<root>, relay=local, delay=3331, delays=3331/0.01/0/0.01, dsn=4.3.0, status=deferred (alias database unavailable)
Oct 26 22:40:24 proxmox pveproxy[963]: worker 966 finished
Oct 26 22:40:24 proxmox pveproxy[963]: starting 1 worker(s)
Oct 26 22:40:24 proxmox pveproxy[963]: worker 20604 started
Oct 26 22:40:25 proxmox pveproxy[20600]: worker exit
Oct 26 22:45:42 proxmox pvedaemon[959]: <root@pam> successful auth for user 'root@pam'
Oct 26 22:51:08 proxmox pveproxy[963]: worker 965 finished
Oct 26 22:51:08 proxmox pveproxy[963]: starting 1 worker(s)
Oct 26 22:51:08 proxmox pveproxy[963]: worker 24334 started
Oct 26 22:51:11 proxmox pveproxy[24330]: got inotify poll request in wrong process - disabling inotify
Oct 26 22:51:51 proxmox smartd[601]: Device: /dev/sda [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 75 to 74
-- Reboot --
Oct 26 23:01:05 proxmox kernel: microcode: microcode updated early to revision 0xf4, date = 2023-02-23
Oct 26 23:01:05 proxmox kernel: Linux version 6.2.16-18-pve (build@proxmox) (gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_DYNAMIC PMX 6.2.16-18 (2023-10-11T15:05Z) ()
Oct 26 23:01:05 proxmox kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-6.2.16-18-pve root=/dev/mapper/pve-root ro quiet
Oct 26 23:01:05 proxmox kernel: KERNEL supported cpus:
Oct 26 23:01:05 proxmox kernel: Intel GenuineIntel
Oct 26 23:01:05 proxmox kernel: AMD AuthenticAMD
Oct 26 23:01:05 proxmox kernel: Hygon HygonGenuine
Oct 26 23:01:05 proxmox kernel: Centaur CentaurHauls
Oct 26 23:01:05 proxmox kernel: zhaoxin Shanghai
 
It crashed again. Here are the logs:
Code:
Oct 26 22:31:45 proxmox kernel: device veth107i0 entered promiscuous mode
Oct 26 22:31:45 proxmox kernel: eth0: renamed from vethRwSAsW
Oct 26 22:31:45 proxmox pct[15797]: <root@pam> end task UPID:proxmox:00003DB6:0003A8BA:653ACCB0:vzstart:107:root@pam: OK
Oct 26 22:31:46 proxmox kernel: IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Oct 26 22:31:46 proxmox kernel: vmbr0: port 8(veth107i0) entered blocking state
Oct 26 22:31:46 proxmox kernel: vmbr0: port 8(veth107i0) entered forwarding state
Oct 26 22:31:53 proxmox postfix/qmgr[909]: 5E84A81012: from=<root@proxmox.home>, size=614, nrcpt=1 (queue active)
Oct 26 22:31:53 proxmox postfix/local[16582]: error: open database /etc/aliases.db: No such file or directory
Oct 26 22:31:53 proxmox postfix/local[16582]: warning: hash:/etc/aliases is unavailable. open database /etc/aliases.db: No such file or directory
Oct 26 22:31:53 proxmox postfix/local[16582]: warning: hash:/etc/aliases: lookup of 'root' failed
Oct 26 22:31:53 proxmox postfix/local[16582]: 5E84A81012: to=<root@proxmox.home>, orig_to=<root>, relay=local, delay=2339, delays=2339/0.01/0/0.03, dsn=4.3.0, status=deferred (alias database unavailable)
Oct 26 22:36:53 proxmox postfix/qmgr[909]: 3198980FC3: from=<root@proxmox.home>, size=614, nrcpt=1 (queue active)
Oct 26 22:36:53 proxmox postfix/local[19515]: error: open database /etc/aliases.db: No such file or directory
Oct 26 22:36:53 proxmox postfix/local[19515]: warning: hash:/etc/aliases is unavailable. open database /etc/aliases.db: No such file or directory
Oct 26 22:36:53 proxmox postfix/local[19515]: warning: hash:/etc/aliases: lookup of 'root' failed
Oct 26 22:36:53 proxmox postfix/local[19515]: 3198980FC3: to=<root@proxmox.home>, orig_to=<root>, relay=local, delay=3331, delays=3331/0.01/0/0.01, dsn=4.3.0, status=deferred (alias database unavailable)
Proxmox is trying to tell you something, possibly important, by e-mailing you, but the mail fails because it is not configured properly.
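The deferred-mail errors come from the missing /etc/aliases.db. A minimal fix, assuming you want root's mail delivered to an address you actually read (the address below is a placeholder):

```shell
# Point root's mail at a real mailbox (placeholder address, adjust to yours)
echo "root: admin@example.com" >> /etc/aliases
# Rebuild /etc/aliases.db, which clears the "alias database unavailable" deferrals
newaliases
systemctl reload postfix
```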
Code:
Oct 26 22:40:24 proxmox pveproxy[963]: worker 966 finished
Oct 26 22:40:24 proxmox pveproxy[963]: starting 1 worker(s)
Oct 26 22:40:24 proxmox pveproxy[963]: worker 20604 started
Oct 26 22:40:25 proxmox pveproxy[20600]: worker exit
Oct 26 22:45:42 proxmox pvedaemon[959]: <root@pam> successful auth for user 'root@pam'
Oct 26 22:51:08 proxmox pveproxy[963]: worker 965 finished
Oct 26 22:51:08 proxmox pveproxy[963]: starting 1 worker(s)
Oct 26 22:51:08 proxmox pveproxy[963]: worker 24334 started
Oct 26 22:51:11 proxmox pveproxy[24330]: got inotify poll request in wrong process - disabling inotify
Oct 26 22:51:51 proxmox smartd[601]: Device: /dev/sda [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 75 to 74
-- Reboot --
There is no information in the logs about the crash itself, just that it happened and that the system then restarted. This is usually a hardware issue. Run memtest, start replacing components, update the BIOS, and make sure temperatures are not too high. What kind of hardware are you using?
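A quick way to check the usual suspects from the CLI is sketched below; it assumes lm-sensors and stress-ng are installed (they are not by default, while smartmontools ships with Proxmox):

```shell
# Check temperatures and SSD health, then load the CPU while watching temps
apt install -y lm-sensors stress-ng
sensors                                # current CPU/board temperatures
smartctl -a /dev/sda | grep -i -E "temperature|reallocated|wear"
stress-ng --cpu "$(nproc)" --timeout 10m &
watch -n 2 sensors                     # watch temperatures climb under load
```

If the box locks up or reboots during the stress run while temperatures spike, cooling is the prime suspect; if it dies with normal temperatures, look at PSU/RAM/board instead.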
 
Okay, I think I will try to set up the email configuration so that I can check.

I will run memtest. I bought new RAM sticks a few days ago; I will install them soon and see if things improve.

Thanks for your help!
 
Okay, some news: I configured the email service and it works fine (I tested it via the command line and via backup notifications). I receive absolutely zero emails about the crashes.
Memtest tells me everything is fine. I really don't know what to do.
My config is the following:
- Intel Core i3-8109U @ 3.00 GHz
- 8 GB DDR4 2400 MT/s
- 1 TB Samsung M.2 SSD
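Since the host reboots without leaving anything useful in the logs, it may help to make the systemd journal persistent so the last messages before a crash survive the reboot (standard systemd commands, nothing Proxmox-specific):

```shell
# Keep journal logs across reboots (journald auto-detects this directory)
mkdir -p /var/log/journal
systemd-tmpfiles --create --prefix /var/log/journal
systemctl restart systemd-journald
# After the next crash, jump to the end of the previous boot's log:
journalctl -b -1 -e
```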
 
And you are having these problems when running Docker inside a VM? Or only when running it in a container?
 
I have this issue randomly; I think it happens when I'm using a lot of resources (my PhotoPrism container is only an example).
I've just reinstalled Proxmox, and for now everything seems to stay online even with all my containers/VMs turned on. I'll let a few days go by and see if that fixed the issue. I'll let you know :)
 
Okay, some news. Everything appeared to be working fine until Saturday, when my Proxmox crashed again. The main thing that repeats each time I check the logs is some sort of cron execution (the hourly one). I checked, and I have nothing at all inside the /etc/cron.hourly directory.
https://pastebin.com/k082cTGp

If you have any ideas and/or diagnostics, I'll take them.
Thanks!
 
Cron messages appear regularly anyway and are probably unrelated. It's probably a hardware (or cooling) issue that is triggered by stressing the system.
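One more thing worth checking after a crash is whether the kernel logged any machine-check or thermal events before going down. This requires a persistent journal so the previous boot's messages survive the reboot:

```shell
# Scan the previous boot's kernel messages for hardware error indicators
journalctl -k -b -1 --no-pager \
  | grep -i -E "mce|machine check|thermal|throttl|hardware error"
```

Hits on "machine check" or "hardware error" strongly suggest failing CPU/RAM/board hardware; thermal or throttling messages point at cooling instead.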
 
Hi again,

After a while, the reinstalled Proxmox seemed to work fine. Unfortunately, I still get crashes. I changed the RAM sticks (upgraded to 16 GB) and it still crashes. I think I will try completely different hardware to be sure.

Here are the logs, in case someone finds something interesting:
https://pastebin.com/NLnEMbbx

Thanks