Write-error on swap-device on brand new hardware

Smoochii · 2025-07-17T15:42:28+0200

I just bought a brand new mini PC and installed Proxmox on it. I then restored all of my VMs from my backup drive and everything was working. The computer apparently crashed in the middle of the day, and when I rebooted it I got this error message repeatedly: `Write-error on swap-device`. I power cycled it again and it booted normally. I then shut it down properly and took it out of the rack to move it to a new one, and on the next boot I got the error again.

I've read this can be related to a bad drive, but this thing is brand new. If I just reinstall Proxmox, could that fix the issue? Is there anything else I can do? Thanks!

LnxBil · 2025-07-17T16:28:11+0200

If this error occurs, you can also try to read other parts of the disk to check whether they fail as well.
Is the SSD maybe too hot?

Smoochii · 2025-07-17T16:30:25+0200

Ya, I guess that could be. It's in a really small enclosure but the room has really good air flow (it's where the rest of my server stuff is). It's just strange to me that a reboot fixes it. I guess if I keep seeing it I'll just reformat the disk and start over.

How can I "try to read other parts of the disk" if it gets in that state again?
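One common way to do this, as a rough sketch: read the whole device with `dd` and throw the data away. This only reads, so it should not change anything on the disk, and a failing region typically makes `dd` stop with an Input/output error. The helper name `read_check` and the device name `/dev/nvme0n1` are assumptions for illustration; adjust them to your system.

```shell
#!/bin/sh
# Read-only check: read a device (or file) from start to end and discard the data.
# dd exits non-zero as soon as a block cannot be read.
read_check() {
    target=$1
    if dd if="$target" of=/dev/null bs=1M 2>/dev/null; then
        echo "read OK: $target"
    else
        echo "read FAILED: $target"
        return 1
    fi
}

# Example (hypothetical device name, run as root):
# read_check /dev/nvme0n1
```

If the read fails, `dmesg` usually shows the matching kernel I/O errors, which helps tell a single bad region from a drive that drops off the bus entirely.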

Smoochii · 2025-07-17T16:42:17+0200

Ugh, now I'm getting this in the web GUI, I think something might just be corrupt. `file '/usr/share/javascript/proxmox-widget-toolkit/proxmoxlib.js' exists but open for reading failed - Input/output error`

leesteken · 2025-07-17T17:44:05+0200

Run a long SMART test (using whatever tools you like from any of the many guides on the internet) and if it fails (indicating that the device agrees that it is dying) return it for RMA.

EDIT: If the long SMART test succeeds, then reseat or replace cables and other connectors. Try a different slot, test the RAM, and maybe try a different computer or a different drive.

EDIT2: `nvme device-self-test --help`

Smoochii · 2025-07-17T21:44:09+0200

I tried this but it doesn't look like SMART supports nvme drives. I just ended up wiping the drive and starting over. If it keeps happening I'll replace the drive or computer.

Impact · 2025-07-18T05:06:41+0200

What's the drive model? You can find it via `smartctl -i /dev/nvme...`, `nvme list`, or `lsblk -do+MODEL,SERIAL`.
The SMART data of it would be interesting too. You can use `smartctl -A /dev/nvme...` or `nvme smart-log /dev/nvme...`.
Try to run `update-smart-drivedb` first.

Smoochii · 2025-07-18T05:50:04+0200

The model is CT1000P3PSSD8. I also installed Proxmox on the second drive that I bought to see if it still crashes. I'll keep an eye on it overnight and check tomorrow.

Here is the output of smartctl:

=== START OF SMART DATA SECTION ===
SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x00
Temperature: 37 Celsius
Available Spare: 100%
Available Spare Threshold: 5%
Percentage Used: 0%
Data Units Read: 67,064 [34.3 GB]
Data Units Written: 163,246 [83.5 GB]
Host Read Commands: 794,402
Host Write Commands: 1,097,745
Controller Busy Time: 4
Power Cycles: 22
Power On Hours: 50
Unsafe Shutdowns: 4
Media and Data Integrity Errors: 0
Error Information Log Entries: 0
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Temperature Sensor 1: 37 Celsius
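
To pick the interesting counters out of output like the above, a small filter along these lines might help. It is only a sketch: `smart_field` and `flag_nonzero` are made-up helper names, and the list of fields to flag is my own choice, not an official health criterion.

```shell
#!/bin/sh
# Print the value of one "Name: value" field from smartctl-style text on stdin.
smart_field() {
    awk -F': *' -v key="$1" '$1 == key { print $2; exit }'
}

# Print every watched field whose value is present and not zero.
flag_nonzero() {
    report=$(cat)
    for key in "Critical Warning" "Media and Data Integrity Errors" \
               "Error Information Log Entries" "Unsafe Shutdowns"; do
        value=$(printf '%s\n' "$report" | smart_field "$key")
        case "$value" in
            ""|0|0x00) ;;              # missing or zero: nothing to report
            *) echo "$key: $value" ;;
        esac
    done
}

# Example (hypothetical device name, run as root):
# smartctl -A /dev/nvme0n1 | flag_nonzero
```

With the values posted here, the only line it would print is the Unsafe Shutdowns count.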

Impact · 2025-07-18T06:00:44+0200

It's a QLC drive and, from what I've read, not a very good one. It might work okay with the default LVM-Thin though. Apart from the Unsafe Shutdowns, the values look okay to me. I'd also check for a firmware update and, as mentioned, monitor the drive temperature under load and do a self-test.
To monitor other temperatures you can use something like `watch -c -d -n1 sensors`. Run `apt install lm-sensors` first.

Smoochii · 2025-07-18T06:06:41+0200

I've never updated the firmware for an SSD before, how do I do that? Also, I tried running SMART test and it didn't seem to do anything. I ran `nvme device-self-test /dev/nvme0n1 -s 1` and all the results are 0xf, what does that mean?

Smoochii · 2025-07-18T06:08:40+0200

I bought a Samsung 990 Pro to replace it and also bought another machine just in case it's just a lemon.

Impact · 2025-07-18T06:11:28+0200

The 990 Pro had wear issues and might require a firmware update too.
`nvme self-test-log -v /dev/nvme...` might explain the value. I'd run the test like this so you can watch the progress:

Bash:

nvme device-self-test /dev/nvme... -ws 1h
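
On the 0xf results: as far as I can tell from the NVMe base specification, 0xf in the self-test log means the entry is unused, so no test result has been recorded there yet. Here is a small lookup of the result codes as I understand them. `describe_selftest_result` is a made-up helper and the wording is paraphrased, so check it against the spec or the `-v` output before relying on it.

```shell
#!/bin/sh
# Map an NVMe device self-test result code (low nibble of the status byte)
# to a short description. Wording is paraphrased from the NVMe base spec.
describe_selftest_result() {
    case "$1" in
        0x0) echo "completed without error" ;;
        0x1) echo "aborted by a Device Self-test command" ;;
        0x2) echo "aborted by a controller level reset" ;;
        0x3) echo "aborted due to namespace removal" ;;
        0x4) echo "aborted due to a Format NVM command" ;;
        0x5) echo "fatal or unknown test error, test did not complete" ;;
        0x6) echo "completed with a failed segment (segment unknown)" ;;
        0x7) echo "completed with one or more failed segments" ;;
        0x8) echo "aborted for an unknown reason" ;;
        0x9) echo "aborted due to a sanitize operation" ;;
        0xf) echo "entry not used (no test result recorded)" ;;
        *)   echo "reserved or unrecognized code: $1" ;;
    esac
}

# Example:
# describe_selftest_result 0xf
```

So a log full of 0xf most likely means no self-test has finished and been logged yet, not that the drive failed one.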

Smoochii · 2025-07-18T06:14:41+0200

root@pve:~# fwupdmgr get-updates
WARNING: UEFI capsule updates not available or enabled in firmware setup
See https://github.com/fwupd/fwupd/wiki/PluginFlag:capsules-unsupported for more information.
Devices with no available firmware updates:
• CT1000P3PSSD8
• CT2000P310SSD8
• WDC WD40NMZW-11GX6S1
No updatable devices

Impact · 2025-07-18T06:20:21+0200

Not every vendor provides firmware through that, so sometimes you have to download the vendor's own tool. It looks like there are no updates for those models though.
