Variety of disk errors: can't delete VM ("unable to open file", "Input/output error"), backup fails with "Broken pipe"

jaxjexjox

1.)
I currently can't delete a VM that was created, set up, shut down and then left off. The VM itself is not important at all.
TASK ERROR: unable to open file '/etc/pve/nodes/proxmox/qemu-server/108.conf.tmp.2207519' - Input/output error
root@proxmox:/etc/pve/nodes/proxmox/qemu-server# ls -lah | grep 108
-rw-r----- 1 root www-data 511 Apr 26 03:34 108.conf
The .tmp file from the error doesn't exist, only 108.conf does. I've rebooted, so I'd expect any file "locks" to have been cleared?
The VM was only used for a few hours before being shut down.

NOTE: this VM has never been included in backups, so this is a different issue from the backup failure below.
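
Both this error and the backup error in 2.) are about creating a .conf.tmp.* file under /etc/pve, so a quick write test on that mount might show whether the whole mount is refusing writes rather than just one file. This is only a minimal sketch for a POSIX shell; can_write_dir is a hypothetical helper name, not a Proxmox tool.

```shell
#!/bin/sh
# Hypothetical helper: try to create and remove a small probe file in a
# directory, and report whether that worked.
can_write_dir() {
    dir="$1"
    probe="$dir/.write-probe.$$"   # $$ = current shell PID, keeps the name unique
    if touch "$probe" 2>/dev/null && rm -f "$probe"; then
        echo "writable: $dir"
    else
        echo "NOT writable: $dir"
        return 1
    fi
}

# Usage on the host (path taken from the errors above):
#   can_write_dir /etc/pve/nodes/proxmox/qemu-server
```

If the probe fails there but works in a normal directory such as /tmp, that would point at the /etc/pve mount itself rather than at VM 108 or VM 100.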






2.)
Furthermore, last night's backup of another VM failed:
INFO: 59% (41.3 GiB of 70.0 GiB) in 10m 40s, read: 30.7 MiB/s, write: 30.6 MiB/s
ERROR: vma_queue_write: write error - Broken pipe
INFO: aborting backup job
INFO: resuming VM again
unable to open file '/etc/pve/nodes/proxmox/qemu-server/100.conf.tmp.1395630' - Input/output error
ERROR: Backup of VM 100 failed - vma_queue_write: write error - Broken pipe

INFO: Failed at 2023-05-14 03:11:16
However, the VM is up and running normally at the moment.






3.)
I couldn't start an LXC container (or possibly restart it, I don't recall exactly):
TASK ERROR: CT is locked (snapshot-delete)

I couldn't delete the snapshot either (same error).
I had to Google it and run the following, which allowed me to delete the snapshot:
"pct unlock 103"
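
Before unlocking, it may be worth seeing which lock is actually set. As far as I can tell the lock is stored as a "lock:" line in the guest's config file, so a sketch like the following could read it. The helper name guest_lock and the config path in the usage comment are my assumptions, not something from the Proxmox docs.

```shell
#!/bin/sh
# Hypothetical helper: print the value of the "lock:" line from the top
# section of a guest config file, or nothing if no lock is set.
# Stops at the first [snapshot] section so only the live config is read.
guest_lock() {
    awk '/^\[/ { exit } $1 == "lock:" { print $2; exit }' "$1"
}

# Usage (assumed path for CT 103):
#   guest_lock /etc/pve/lxc/103.conf
```

An empty result would mean no lock is recorded in the current config.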







4.)

Finally, I've had issues logging in to the web UI with what is 100% the correct password. I've ended up having to run this fix regularly via SSH:
pvecm updatecerts && systemctl restart pve-cluster && systemctl restart corosync
I only mention this because all 4 of these issues have occurred within roughly the last 2.5 weeks, ever since I added a couple of VMs. It's like I tripped something up?
Otherwise the machine is running fine.



At first I thought it was a disk space issue, but nothing looks full:
root@proxmox:/etc/pve/nodes/proxmox/qemu-server# df
Filesystem                   1K-blocks     Used Available Use% Mounted on
udev                          16276508        0  16276508   0% /dev
tmpfs                          3262000     1188   3260812   1% /run
rpool/ROOT/pve-1             121833600 79653120  42180480  66% /
tmpfs                         16309996    53040  16256956   1% /dev/shm
tmpfs                             5120        0      5120   0% /run/lock
rpool                         42180608      128  42180480   1% /rpool
rpool/data                    42180608      128  42180480   1% /rpool/data
rpool/ROOT                    42180608      128  42180480   1% /rpool/ROOT
rpool/data/subvol-102-disk-0   8388608   303616   8084992   4% /rpool/data/subvol-102-disk-0
rpool/data/subvol-103-disk-0  41943040 13886080  28056960  34% /rpool/data/subvol-103-disk-0
/dev/fuse                       131072       20    131052   1% /etc/pve
tmpfs                          3261996        0   3261996   0% /run/user/0
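
None of these mounts is close to full (root is at 66%, /etc/pve at 1%). To rule this out quickly next time, a small filter over df output could flag anything above a threshold. This is just a sketch: full_mounts is a hypothetical helper name, and it assumes mount points without spaces.

```shell
#!/bin/sh
# Hypothetical helper: read `df -P` style output on stdin and print the
# mount point and Use% of every filesystem at or above a threshold
# (default 90). Assumes mount points contain no spaces.
full_mounts() {
    threshold="${1:-90}"
    awk -v t="$threshold" 'NR > 1 {
        pct = $5
        sub(/%$/, "", pct)              # strip the trailing percent sign
        if (pct + 0 >= t) print $6 " " $5
    }'
}

# Usage on the host:
#   df -P | full_mounts 80
```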




I've clearly broken something slightly, though I don't recall changing any major settings in the UI at all. I just added VMs.
Any tips on where I've gone wrong here? I suspect that sooner or later she's going to fall over hard and die.
 
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x00
Temperature: 44 Celsius
Available Spare: 100%
Available Spare Threshold: 50%
Percentage Used: 19%
Data Units Read: 82,289,155 [42.1 TB]
Data Units Written: 82,143,204 [42.0 TB]
Host Read Commands: 688,168,733
Host Write Commands: 1,517,247,873
Controller Busy Time: 5,864
Power Cycles: 47
Power On Hours: 9,953
Unsafe Shutdowns: 13
Media and Data Integrity Errors: 0
Error Information Log Entries: 0
Warning Comp. Temperature Time: 28
Critical Comp. Temperature Time: 0
Temperature Sensor 1: 44 Celsius
Temperature Sensor 2: 48 Celsius
Thermal Temp. 1 Transition Count: 9974
Thermal Temp. 2 Transition Count: 1045
Thermal Temp. 1 Total Time: 157392
Thermal Temp. 2 Total Time: 3599

Error Information (NVMe Log 0x01, 16 of 256 entries)
No Errors Logged