Yes, I know I'm not working within expected parameters. I took three identical e-waste SFF PCs, installed Proxmox 8.1 on Transcend 128GB JetFlash 920 USB drives for boot, and put reasonably similar 1TB SATA SSDs in as the only internal storage, for Ceph. Ceph is working well, and I'm able to migrate an LXC from PVE2 to PVE4 in about two seconds. However, one of the nodes (PVE3) keeps going read-only. For instance, when I try almost any command that would write to the disk, I get:
Code:
-bash: /usr/bin/*command*: Input/output error
Nano says the disk is read-only. Locally, the errors on the screen looked like this:
Code:
[269117.049596] systemd-journald[312]: Failed to rotate /var/log/journal/very-long-number/system.journal: Read-only file system
The other two systems seem to be fine, but this one goes into this state within hours of a reboot. I cannot reboot it properly: `shutdown -r now` via SSH returns
Code:
Call to Reboot failed: Access denied
and, while the web GUI acts like it is going to reboot, it does not.
I have tried to reduce the stress on the flash drives by disabling swap (should be irrelevant with 16GB of RAM and nothing running yet), reducing logging (a long-term issue for drive health but not a problem for today, right?), disabling TRIM (the USB flash drive doesn't support TRIM, but TRIM came up in another read-only issue I read about), and checking drive health (another missing feature of USB flash drives).
Any thoughts as to why one of three identical systems would be having this issue? Could one of the flash drives be defective? Could a 10-year-old SFF PC have a failed USB port? Please save me from buying good new hardware and keep this cluster alive.
What else could be useful?
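For reference, the write-reduction steps listed above (swap, logging, TRIM) might look roughly like this on a Proxmox VE 8.1 node, which is built on Debian 12. This is a hedged sketch, not necessarily what was run on these machines: the function name `reduce_usb_writes`, the drop-in file `50-usb-boot.conf`, and the 64M journal cap are hypothetical, and the script defaults to a dry run that only prints what it would change.

```shell
#!/usr/bin/env bash
# Sketch of the write-reduction steps for a Proxmox VE 8 / Debian 12 node
# booting from a USB stick. Hypothetical names: reduce_usb_writes and the
# drop-in file 50-usb-boot.conf; the 64M journal cap is an arbitrary choice.
# DRY_RUN defaults to 1, so running this as-is only prints what it would do.

run() {
  # Print the command in dry-run mode; execute it otherwise.
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

journald_dropin() {
  # journald drop-in that keeps the journal in RAM (/run/log/journal)
  # instead of writing it to /var/log/journal on the stick.
  printf '%s\n' '[Journal]' 'Storage=volatile' 'RuntimeMaxUse=64M'
}

reduce_usb_writes() {
  local dropin=/etc/systemd/journald.conf.d/50-usb-boot.conf

  # 1. Disable swap for the running system (the swap line in /etc/fstab
  #    must also be commented out, or swap comes back after a reboot).
  run swapoff -a

  # 2. Volatile journal.
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "would write $dropin:"
    journald_dropin
  else
    mkdir -p "$(dirname "$dropin")"
    journald_dropin > "$dropin"
  fi
  run systemctl restart systemd-journald

  # 3. Stop the periodic fstrim job (the stick does not support TRIM).
  run systemctl disable --now fstrim.timer
}

reduce_usb_writes
```

Run it as-is to review the plan, then as root with `DRY_RUN=0` in the environment to apply it. One trade-off: with `Storage=volatile` the journal lives in RAM, so kernel messages from a read-only event are lost at the next reboot unless they are copied off first.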
Code:
root@pve3:/etc/ssh# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
sda 8:0 0 931.5G 0 disk
└─ceph--f04948df--44a3--441f--99b6--6f4dbdd29ab4-osd--block--20ffda48--2cc5--45d5--897c--f506939f011e 252:0 0 931.5G 0 lvm
sdc 8:32 0 115.2G 0 disk
├─sdc1 8:33 0 1007K 0 part
├─sdc2 8:34 0 1G 0 part
└─sdc3 8:35 0 114.2G 0 part
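If the root filesystem is ext4 mounted with `errors=remount-ro` (the usual option for an ext4 Proxmox root, assumed here rather than confirmed on this node), the kernel remounts it read-only after I/O errors, which would fit both the `Input/output error` and the journald message. A minimal sketch for confirming the remount and seeing which device logged errors, assuming the boot stick is `sdc` as in the `lsblk` output; the helper names `mount_opts` and `is_readonly` are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: is / read-only right now, and what did the kernel log about it?
# Assumption: the boot stick is sdc (from the lsblk output); the helper
# names mount_opts and is_readonly are hypothetical.

mount_opts() {
  # Print the options of the last entry for mount point $1 in a mounts
  # table ($2, default /proc/mounts). Fields: device mountpoint fstype options ...
  local mnt=$1 table=${2:-/proc/mounts}
  awk -v m="$mnt" '$2 == m { opts = $4 } END { print opts }' "$table"
}

is_readonly() {
  # Exit status 0 if "ro" is a whole option (not the "ro" inside errors=remount-ro).
  local opts
  opts=$(mount_opts "$@")
  [[ ",$opts," == *",ro,"* ]]
}

if is_readonly /; then
  echo "/ is currently mounted read-only"
else
  echo "/ is currently mounted read-write"
fi

# Recent kernel errors/warnings mentioning the stick, USB, ext4 or I/O errors.
# dmesg usually needs root; its own error output is discarded here.
dmesg --level=err,warn 2>/dev/null | grep -iE 'sdc|usb|ext4|i/o error' | tail -n 20
```

If the messages point at `sdc` with USB resets or I/O errors, moving the stick to another port, or swapping sticks between PVE3 and a healthy node, would help separate a bad drive from a bad port.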