Systemd[1]: Failed to start Journal Service.

Mikonara

New Member
Mar 21, 2022
Hello, I recently tried to open my Proxmox GUI and noticed I couldn't connect to it. When I opened the terminal I was greeted by a few different errors.

1.
INFO: task pmxcfs:1916 blocked for more than 120 seconds
INFO: task java:6969xx blocked for more than 120 seconds (the xx meaning there were a few of these.)
INFO: task kworker/u129:x:xxxxx blocked for more than 120 seconds

At the end of those there's a bunch of "systemd[1]: Failed to start Journal Service." messages.

Because of these errors I can't get into the console to get logs, or SSH in to get them either, since it times out. However, all the VMs that were running in Proxmox are up and working as normal.

Does anyone know where I should go from here?
 

Attachments: rpviewer.png (70.1 KB)
hi,
Because of these errors i cant get into the console to get logs or ssh in to get logs either since it times out, however all the VMs that were running in proxmox are up and working as normal.
* are you sure the login does not work from the console? have you tried pressing enter a couple of times to see if the login prompt comes back?

* is this a cluster or a standalone node?

* can you login via the GUI?

* the other error you're getting is probably from a container running some java app, which is using too much memory
 
Hi Oguz, thank you for the response!
* are you sure the login does not work from the console? have you tried pressing enter a couple of times to see if the login prompt comes back?
Yes, I have tried multiple key combinations and spamming enter a bunch of times, to no avail.

* is this a cluster or a standalone node?

this is a standalone server

* can you login via the GUI?
I cannot log into the GUI. The page does not load.
* the other error you're getting is probably from a container running some java app, which is using too much memory
This one confuses me a little. Maybe it's my misunderstanding of VMs... Don't VMs and containers have a set amount of memory to work with? I have about 8 Windows 10 VMs and 2 Ubuntu containers running. In total they take up only 36 GB, and the machine has 64 GB. How would a VM's memory affect the host (except for the obvious overprovisioning)?
 
Yes i have tried doing multiple key combinations and spamming enter a bunch of times to no avail.
okay

this is a standalone server
alright

I cannot log into the gui. The page does not load.
so the services are not working at all..?

Don't VMs and containers have a set amount of memory to work with?
yes, but your container is running out of memory because of the java app running inside (at least that's what the error says), so the OOM killer kills that process to free up memory (in the container).
normally that shouldn't cause the host to hang so it might be unrelated, though you mention that your VMs are running fine so that's interesting.

* were you doing anything specific on the host before this started happening? such as upgrades, or modifying any configuration files?

* do you remember which PVE version you're on?

if you could boot into the machine in debug mode [0], then you could take a look if there are any errors in journalctl and/or dmesg and post them here.

[0]: https://pve.proxmox.com/pve-docs/pve-admin-guide.html#installation_installer
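For anyone following along, here is a minimal sketch of how that information could be gathered once any root shell on the host responds. The function names hung_tasks and collect_host_info are made up for illustration; the commands they call (pveversion, systemctl, journalctl, dmesg) are the standard ones, but what they print will depend on your system. If you are in the installer's debug shell rather than the installed system, you would need to mount the installed root filesystem first, which is not shown here.

```shell
# Hypothetical helper: keep only kernel hung-task warnings read from stdin,
# i.e. lines like "INFO: task pmxcfs:1916 blocked for more than 120 seconds".
hung_tasks() {
    grep -E 'task .+ blocked for more than [0-9]+ seconds'
}

# Hypothetical helper: print the details asked for in this thread.
collect_host_info() {
    pveversion -v                                 # PVE and package versions
    systemctl status systemd-journald --no-pager  # why the journal service failed
    journalctl -b -p err --no-pager               # errors logged during the current boot
    dmesg | hung_tasks                            # blocked-task warnings from the kernel log
}

# Example (as root on the Proxmox host):
#   collect_host_info > /root/host-info.txt 2>&1
```

Note that journalctl may have little to show if the journal service itself never started, so the dmesg part is probably the more useful one in this situation.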
 
Hey, just wanted to put it out there for anyone else having similar issues: increasing the RAM on containers that are over 75% utilization fixed the problem for me. I had an Ubuntu container running Unifi's controller software. Increasing its RAM from 512 MB to 4 GB has "fixed" (still monitoring) the issue, with a 7-day uptime and no GUI/console lockups.
 
I too am struggling with the same issue. Memory is under 30% utilization on the single container I'm running. The only other thing I've noticed is that my IO delay is more than 90%, so I'm assuming a hard drive failure is coming, or could it be related?
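If the GUI is unreachable, one way to put a number on the IO wait from a shell is to sample the aggregate "cpu" line of /proc/stat twice and compare. This is only a rough sketch: iowait_pct is a made-up helper name, and I am assuming the GUI's "IO delay" figure roughly corresponds to the kernel's iowait counter.

```shell
# Hypothetical helper: reads two (or more) aggregate "cpu" lines from
# /proc/stat on stdin and prints the percentage of CPU time spent in
# iowait between consecutive samples.
iowait_pct() {
    awk '/^cpu / {
        total = 0
        # Fields 2..9: user nice system idle iowait irq softirq steal
        for (i = 2; i <= 9; i++) total += $i
        if (seen && total > prev_total)
            printf "%d\n", ($6 - prev_io) * 100 / (total - prev_total)
        prev_io = $6; prev_total = total; seen = 1
    }'
}

# Example: sample over 5 seconds.
#   { head -n1 /proc/stat; sleep 5; head -n1 /proc/stat; } | iowait_pct
```

A consistently high value here points at storage being the bottleneck, but it does not say whether the cause is a failing disk or something else.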
 
I'm also thinking it's a hard drive failure, even though smartctl tests are passing and other disk checks are all fine. Still haven't found the root cause of my problem.
I'm trying trim (never used it before). It reclaimed 16.4 GB on the host, and I'm currently trimming my main (largest) CT using pct fstrim 101.

I've never run the command before, so it's all a bit above me. I'll see if that brings anything back to life. It's taking like 10 minutes to log in over SSH!
 
OK, so I seem to have fixed my problem. On the host I ran trim on the container, and after what seemed like an hour it came back with some more info. Again, 16.4 GB seems to have been trimmed. I restarted again, and this time the machine is back to normal. Transfer speeds went from <100 kb/s to 11 MiB/s, and IO delay is now < 1% (it was >95% before).

I think we're good! Hope you sort yours out.
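For reference, the trim steps described above could be sketched roughly like this. The wrapper name trim_host_and_ct is made up, and 101 is just the container ID from this thread. fstrim is the standard util-linux tool, and pct fstrim is the container trim subcommand listed in the pct man page. Whether trimming helps at all depends on the storage underneath (thin-provisioned or SSD-backed volumes), so treat this as something to try, not a guaranteed fix.

```shell
# Hypothetical wrapper: trim the host's filesystems, then one container's volumes.
trim_host_and_ct() {
    vmid="${1:?usage: trim_host_and_ct <vmid>}"
    fstrim -av            # -a: all mounted filesystems that support discard, -v: report what was trimmed
    pct fstrim "$vmid"    # run fstrim on the container's root filesystem and mount points
}

# Example (as root on the Proxmox host):
#   trim_host_and_ct 101
```

As noted in the post, this can take a long time on a host that is already struggling with IO.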
 
Hey, just wanted to put it out there for anyone else having similar issues: increasing the RAM on containers that are over 75% utilization fixed the problem for me. I had an Ubuntu container running Unifi's controller software. Increasing its RAM from 512 MB to 4 GB has "fixed" (still monitoring) the issue, with a 7-day uptime and no GUI/console lockups.
How do I do that? I can't get to the GUI at all.
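Without the GUI, a container's memory limit can be changed from a root shell on the host with pct, assuming the console or SSH still responds (even slowly). A minimal sketch follows: the helper names mem_used_pct and raise_ct_memory are made up, 101 is a placeholder container ID, and 4096 is simply the value mentioned in the quoted post. As far as I know the new limit applies to a running container, but restart it if the change does not seem to take effect.

```shell
# Hypothetical helper: percentage of memory in use, computed from
# `free -m` output on stdin (used / total on the "Mem:" line).
mem_used_pct() {
    awk '/^Mem:/ { printf "%d\n", $3 * 100 / $2 }'
}

# Hypothetical helper: set a container's memory limit (value in MiB).
raise_ct_memory() {
    vmid="${1:?usage: raise_ct_memory <vmid> <MiB>}"
    mib="${2:?usage: raise_ct_memory <vmid> <MiB>}"
    pct set "$vmid" --memory "$mib"
}

# Example (as root on the Proxmox host):
#   pct list                              # find the container IDs
#   pct exec 101 -- free -m | mem_used_pct
#   raise_ct_memory 101 4096
```

The 75% threshold from the quoted post is that poster's own rule of thumb, so pick whatever margin suits the workload.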