crashed PVE-hosts

TErxleben

For more than a year now, I've had a massive problem with several PVE hosts: roughly once a week, one of them crashes completely during backups. All of them are up to date.
The web GUI reports, for example:
Code:
file '/usr/share/javascript/proxmox-widget-toolkit/proxmoxlib.js' exists but open for reading failed - Input/output error
Logging in via SSH is also no longer possible: Connection refused.
nmap shows the expected ports as open.
Code:
root@backup:~# nmap pve1
Starting Nmap 7.93 ( https://nmap.org ) at 2025-03-19 19:38 CET
Nmap scan report for pve1 (192.168.5.201)
Host is up (0.00016s latency).
rDNS record for 192.168.5.201: pve1.fritz.box
Not shown: 997 closed tcp ports (reset)
PORT     STATE SERVICE
22/tcp   open  ssh
111/tcp  open  rpcbind
3128/tcp open  squid-http
MAC Address: 00:4E:01:A4:74:31 (Dell)
Has anyone else experienced a similar problem and perhaps even found a solution?
 
Either you ran out of space and corrupted your filesystem, or your disks are failing as @alexskysilk mentioned.

I would look into doing a rescue boot from a new boot source and examining both software and hardware states.
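
From a rescue environment, a first pass over both suspects (a full filesystem, a failing disk) might look like the sketch below. `check_host_storage` is a made-up helper name and `/dev/sda` is only a placeholder for the actual boot disk; `smartctl` comes from the smartmontools package and may need to be installed first.

```shell
# Sketch only: check_host_storage is a hypothetical helper, /dev/sda a placeholder.
check_host_storage() {
    dev="${1:-/dev/sda}"

    echo "== block usage =="
    df -h || true
    echo "== inode usage =="
    df -i || true
    echo "== SMART health and attributes for $dev =="
    smartctl -H -a "$dev" || true
}
```

Call it as `check_host_storage /dev/sdX` with the real device. In a rescue boot, mount the host's root filesystem first; otherwise `df` only reports on the rescue system itself.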

It seems like this node is hosted in the cloud; if you have console access, that may be another way to get into it.


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
This file is part of the subscription notification. Have you tried to remove the notification via one of the "helpful" scripts?
/usr/share/javascript/proxmox-widget-toolkit/proxmoxlib.js

Retrace your steps, recover the files to their original state (there are posts on the forum on how to reinstall affected packages).


 
I don't use any tricky scripts.
Only the no-subscription repo is enabled, via the web GUI.

P.S.: Do you have a link on how to recover possibly affected files?
 
Then, I guess, it is just an unlucky coincidence that the only I/O error is against that particular file.

You are the only person who has access to the system and the entirety of the facts. So far we have:
- no network access
- single file I/O error
- your testimony that there are no SMART errors or space issues
- no information on how you are accessing this data

You need to:
a) examine the last 500-1000 lines of journalctl output
b) examine dmesg
c) provide additional information about your system, backed by screenshots if that's the only way
d) examine systemd status

I am certain that there is more data on that system that would allow the community to make educated suggestions rather than guesses based on the most common conditions reported in this forum.
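
Steps a), b) and d) above can be sketched as one small collection helper, so the output can be attached to a post. This is only a sketch: `collect_diag`, the default directory and the file names are illustrative, and the previous-boot query only returns data if the journal is stored persistently.

```shell
# Sketch only: collect_diag and the output file names are illustrative.
collect_diag() {
    out="${1:-/root/pve-diag}"
    mkdir -p "$out" || return 1

    # a) the most recent 1000 journal lines
    journalctl -n 1000 --no-pager > "$out/journal.txt" 2>&1 || true
    # errors logged during the previous boot (needs a persistent journal)
    journalctl -b -1 -p err --no-pager > "$out/journal-prev-boot.txt" 2>&1 || true
    # b) kernel ring buffer with human-readable timestamps
    dmesg -T > "$out/dmesg.txt" 2>&1 || true
    # d) systemd units that are in a failed state
    systemctl --failed --no-pager > "$out/systemd-failed.txt" 2>&1 || true
    return 0
}
```

Run `collect_diag /some/dir` right after the next crash and reboot; the previous-boot file is the one most likely to show what happened shortly before the host froze.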

Best


 
Ask the system where the (potentially) damaged file comes from:
Code:
~# dpkg -S /usr/share/javascript/proxmox-widget-toolkit/proxmoxlib.js 
proxmox-widget-toolkit: /usr/share/javascript/proxmox-widget-toolkit/proxmoxlib.js
Re-install the package; this will (should) "repair" all files contained in that package:
Code:
~# apt --reinstall install proxmox-widget-toolkit
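
Before or after the reinstall, the package's files can also be compared against the checksums dpkg has recorded. The following is a minimal sketch; `verify_pkg` is a made-up wrapper name, and it relies only on `dpkg --verify`, which prints nothing when every file matches.

```shell
# Sketch only: verify_pkg is a hypothetical wrapper around `dpkg --verify`.
verify_pkg() {
    pkg="${1:-proxmox-widget-toolkit}"

    if ! command -v dpkg >/dev/null 2>&1; then
        echo "$pkg: dpkg not found, cannot verify" >&2
        return 1
    fi

    # dpkg --verify lists only files that fail a check; no output means no findings
    report=$(dpkg --verify "$pkg" 2>&1)
    if [ -z "$report" ]; then
        echo "$pkg: no differences reported"
    else
        echo "$pkg: dpkg reported:"
        echo "$report"
    fi
}
```

If the file cannot be read at all because of an I/O error, expect the check to report a problem too, which would point at the disk or filesystem rather than at a modified file.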
 
First of all, thanks for your help.
- The affected systems still respond to ping/nmap.
- The systems became inaccessible erratically during vzdump runs of different VMs.
- No SMART errors appear.
- The backup destination (internal backup disk, PBS, shared NFS directory) doesn't matter. Every now and then, one host or another crashes.
- I read the available logs until my eyes bled. I guess I'll have to go back and do it again.

Best regards
 
@UdoB
Unfortunately, the server that prompted my post is currently unavailable. I'll follow your advice and report back tomorrow.
 
@bgeek17
It's a mystery to me. The machine runs absolutely reliably in daily use. That's why I've ruled out RAM and the like. Only when the disks are under increased load (like during backups) does the machine freeze. I'll play around with bandwidth limiting now.
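
For reference, a minimal sketch of what such a limit could look like on the command line. The VMID 100, the storage name `local` and the 50 MiB/s cap are placeholders, not values from this thread, and `vzdump_cmd` is a made-up helper that only prints the command; `--bwlimit` expects KiB/s.

```shell
# Sketch only: vzdump_cmd is a hypothetical helper; it prints the command instead of running it.
vzdump_cmd() {
    vmid="$1"
    storage="$2"
    limit_mib="$3"
    # vzdump's --bwlimit is given in KiB/s, so convert from MiB/s
    bwlimit_kib=$((limit_mib * 1024))
    echo "vzdump $vmid --storage $storage --mode snapshot --bwlimit $bwlimit_kib"
}

vzdump_cmd 100 local 50
# prints: vzdump 100 --storage local --mode snapshot --bwlimit 51200
```

A host-wide default can be set with a `bwlimit:` line in /etc/vzdump.conf, which also applies to scheduled backup jobs that don't set their own limit.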