Write-error on swap-device on brand new hardware

Smoochii

New Member
Jun 2, 2024
I just bought a brand new mini PC and installed Proxmox on it. I then restored all of my VMs from my backup drive and everything was working. The computer apparently crashed in the middle of the day, and when I rebooted it I got this error message a bunch of times: `Write-error on swap-device`. I power cycled it again and it booted normally. Later I shut it down properly to move it to a new rack, and on booting it I got this error message again.

I've read this can be related to a bad drive, but this thing is brand new. If I just reinstall Proxmox, could that fix the issue? Is there anything else I can do? Thanks!

 
Ya, I guess that could be. It's in a really small enclosure but the room has really good air flow (it's where the rest of my server stuff is). It's just strange to me that a reboot fixes it. I guess if I keep seeing it I'll just reformat the disk and start over.

How can I "try to read other parts of the disk" if it gets in that state again?
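
One rough way to do that is to read the whole device and throw the data away, then see whether the read finishes cleanly. This is only a sketch: `read_check` is a hypothetical helper name, and `/dev/nvme0n1` in the usage line is an assumed device path that needs adjusting (reading a raw device also needs root).

```shell
# Read every block of a device (or file) and discard the data.
# dd exits non-zero if the kernel reports a read error.
read_check() {
    dd if="$1" of=/dev/null bs=1048576 2>/dev/null
}

# Usage (assumed device path, run as root):
#   read_check /dev/nvme0n1 && echo "read OK" || echo "read errors"
```

If it reports errors, `dmesg` should show the matching I/O errors from the kernel.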
 
Ugh, now I'm getting this in the web GUI: `file '/usr/share/javascript/proxmox-widget-toolkit/proxmoxlib.js' exists but open for reading failed - Input/output error`. I think something might just be corrupt.
 
Run a long SMART test (using whatever tools you like from any of the many guides on the internet) and if it fails (indicating that the device agrees that it is dying) return it for RMA.

EDIT: If the long SMART test succeeds, then reseat/replace cables and other connectors. Try a different slot, test the RAM, and maybe try a different computer or a different drive.

EDIT2: `nvme device-self-test --help`
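
As a rough sketch of how that could look with nvme-cli: self-test code 1 requests the short test and 2 the extended (long) one, but check the `--help` output on your version. `start_selftest` and the `NVME` override variable are hypothetical names introduced here so the command can be dry-run with `NVME=echo`; the device path is an assumption.

```shell
# Start an NVMe device self-test.
#   $1 = device (e.g. /dev/nvme0n1), $2 = short | long (default: short)
# Set NVME=echo to print the command instead of running it.
start_selftest() {
    dev="$1"
    kind="${2:-short}"
    case "$kind" in
        short)         code=1 ;;
        long|extended) code=2 ;;
        *) echo "unknown test type: $kind" >&2; return 2 ;;
    esac
    "${NVME:-nvme}" device-self-test "$dev" -s "$code"
}

# Usage (assumed device path, run as root):
#   start_selftest /dev/nvme0n1 long
```

The test runs in the background on the drive; `nvme self-test-log /dev/nvme0n1` shows the results once it is done.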
 
I tried this, but it doesn't look like SMART supports NVMe drives. I just ended up wiping the drive and starting over. If it keeps happening I'll replace the drive or computer.
 
What's the drive model? You can find it via `smartctl -i /dev/nvme...`, `nvme list`, or `lsblk -do+MODEL,SERIAL`.
Its SMART data would be interesting too. You can use `smartctl -A /dev/nvme...` or `nvme smart-log /dev/nvme...`.
Try running `update-smart-drivedb` first.
 
The model is CT1000P3PSSD8. Also, I installed Proxmox on the second drive that I bought to see if it still crashes. I'll keep an eye on it overnight and check tomorrow.

Here is the output of smartctl:

=== START OF SMART DATA SECTION ===
SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x00
Temperature: 37 Celsius
Available Spare: 100%
Available Spare Threshold: 5%
Percentage Used: 0%
Data Units Read: 67,064 [34.3 GB]
Data Units Written: 163,246 [83.5 GB]
Host Read Commands: 794,402
Host Write Commands: 1,097,745
Controller Busy Time: 4
Power Cycles: 22
Power On Hours: 50
Unsafe Shutdowns: 4
Media and Data Integrity Errors: 0
Error Information Log Entries: 0
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Temperature Sensor 1: 37 Celsius
 
It's a QLC drive and, from what I've read, not a very good one. It might work okay with the default LVM-Thin, though. Besides the Unsafe Shutdowns, the values look okay to me. I'd also check for a firmware update and, as mentioned, monitor the drive temperature under load and do a self-test.
To monitor other temperatures you can use something like `watch -c -d -n1 sensors`. Run `apt install lm-sensors` first.
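
To keep an eye on just one value from the SMART report (drive temperature, unsafe shutdowns) instead of reading the whole thing each time, a small filter over `smartctl -A` output can help. This is a minimal sketch: `smart_field` is a hypothetical helper name, it assumes the `Name: value` layout of the output posted above, and the device path in the usage lines is an assumption.

```shell
# Print the value of one field from `smartctl -A` output read on stdin.
#   $1 = exact field name, e.g. "Unsafe Shutdowns"
smart_field() {
    awk -F': *' -v key="$1" '$1 == key { print $2 }'
}

# Usage (assumed device path, run as root):
#   smartctl -A /dev/nvme0n1 | smart_field "Temperature"
#   smartctl -A /dev/nvme0n1 | smart_field "Unsafe Shutdowns"
```

Wrapping the pipeline in `watch` while the drive is under load gives a simple temperature monitor.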
 
I've never updated the firmware for an SSD before. How do I do that? Also, I tried running a SMART test and it didn't seem to do anything. I ran `nvme device-self-test /dev/nvme0n1 -s 1` and all the results are 0xf. What does that mean?
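
On the 0xf question: going by the result codes in the NVMe specification's Device Self-test log (not something confirmed elsewhere in this thread), 0xf marks a log entry that is not in use, so all-0xf most likely means no test result has been recorded yet rather than that a test failed. A small decoder sketch, where `decode_selftest_result` is a hypothetical helper name and the grouping of codes is simplified:

```shell
# Map an NVMe self-test result code to a short description.
# Simplified from the NVMe spec: 0x0 = passed, 0x5-0x7 = failed,
# 0xf = unused log entry, everything else = aborted or reserved.
decode_selftest_result() {
    case "$1" in
        0|0x0)             echo "completed without error" ;;
        5|6|7|0x5|0x6|0x7) echo "failed" ;;
        15|0xf|0xF)        echo "entry not used" ;;
        *)                 echo "aborted or reserved" ;;
    esac
}

# Usage:
#   decode_selftest_result 0xf
```

The short test can take a couple of minutes, so it is worth reading the self-test log again a little later.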
 
Not every vendor provides firmware updates, though, and for those that do you have to download their tool. It seems like there are none for those models.