USB Flash boot drive keeps going read only

Reliant8275 · Sep 4, 2023
Yes, I know I'm not working within expected parameters. I took three identical e-waste SFF PCs, installed Proxmox 8.1 on Transcend 128GB JetFlash 920 USB drives as boot disks, and put reasonably similar 1TB SATA SSDs in the only internal bay of each for Ceph. Ceph is working well, and I can migrate an LXC from PVE2 to PVE4 in about two seconds. However, one of the nodes (PVE3) keeps going read-only. For instance, almost any command that would write to the disk returns
Code:
-bash: /usr/bin/*command*: Input/output error
. Nano reports the disk as read-only. Locally, the on-screen errors looked like
Code:
[269117.049596] systemd-journald[312]: Failed to rotate /var/log/journal/very-long-number/system.journal: Read-only file system
. The other two systems seem fine, but this one enters this state within hours of a reboot. I cannot reboot it properly, because shutdown -r now returns
Code:
Call to Reboot failed: Access denied
via SSH, and while the web GUI acts like it is going to reboot, it does not.
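For anyone else chasing this: a quick, generic Linux check (nothing Proxmox-specific) for which filesystems have actually flipped read-only is to scan `/proc/mounts`, where field 4 holds the mount options:

```shell
# Print mount point and fs type for every mount whose options include
# the "ro" flag. Field 2 = mount point, 3 = fs type, 4 = options.
awk '$4 ~ /(^|,)ro(,|$)/ {print $2, $3}' /proc/mounts
```

On a healthy node this should list only things that are read-only by design; seeing `/` or `/var` here confirms the remount.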
I have tried to reduce the stress on the flash drives by disabling swap (should be irrelevant with 16GB of RAM and nothing running yet), reducing logging (a long-term issue for drive health, but not today's problem, right?), disabling TRIM (the USB flash drive doesn't support TRIM anyway, but that was another read-only issue I read about), and checking drive health (another feature USB flash drives lack).
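On the logging front, a sketch of how to take the reduction further by keeping the journal entirely in RAM (assumes stock systemd paths; the 64M cap is an arbitrary example value):

```shell
# Keep systemd's journal in RAM only, so rotation never writes to the
# flash drive. Logs are lost on reboot -- acceptable for a lab cluster.
mkdir -p /etc/systemd/journald.conf.d
cat > /etc/systemd/journald.conf.d/volatile.conf <<'EOF'
[Journal]
Storage=volatile
RuntimeMaxUse=64M
EOF
systemctl restart systemd-journald
```

Note this silences exactly the `Failed to rotate .../system.journal` symptom above, so keep that in mind when checking whether the underlying problem is gone.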

Any thoughts as to why one of three identical systems would have this issue? Could one of the flash drives be defective? Could a 10-year-old SFF PC have a failed USB port? Please save me from buying good new hardware and keep this cluster alive.

What else could be useful?
Code:
root@pve3:/etc/ssh# lsblk
NAME                                                                                         MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
sda                                                                                            8:0    0 931.5G  0 disk 
└─ceph--f04948df--44a3--441f--99b6--6f4dbdd29ab4-osd--block--20ffda48--2cc5--45d5--897c--f506939f011e
                                                                                             252:0    0 931.5G  0 lvm  
sdc                                                                                            8:32   0 115.2G  0 disk 
├─sdc1                                                                                         8:33   0  1007K  0 part 
├─sdc2                                                                                         8:34   0     1G  0 part 
└─sdc3                                                                                         8:35   0 114.2G  0 part
 
PVE isn't meant to be installed on pen drives or SD cards. It writes too much and will kill those very fast, since they usually use the cheapest NAND flash available and lack all the features that reduce wear: wear leveling, garbage collection, DRAM caching, power-loss protection (PLP), and so on.
And yes, it is not unusual for a flash device to switch read-only at the hardware level once the NAND cells are too worn out.
Other common causes are a corrupted filesystem or a 100% full root filesystem, where Linux switches to read-only to prevent further data loss.
If your only option is USB, then at least use a USB SSD, a USB HDD, or a USB-to-M.2 enclosure. And USB isn't that reliable in the first place.
 
While I understand your point for the long term, this is a brand-new drive. It is advertised as high-endurance MLC flash for purposes just like this, and the other two are operating correctly. I'm looking for other options before I send the drive back.
 
It is advertised as high endurance MLC flash for purposes just like this.
There are some industrial pen drives that might be up to the task, but then you pay four times the price for a quarter of the capacity of those JetFlashes. A $20 SATA SSD/HDD plus a $10 USB-to-SATA cable will probably do a far better job. It will work for some time with pen drives... at least you have a cluster with Ceph for HA, so it's not that bad if those nodes keep failing. Just keep in mind that you are running your nodes below the minimum hardware requirements.
 
Update: I ran a badblocks destructive test on the flash boot drive and found no errors. I ran memtest 6 times and found no errors. I am restoring the backup image of the flash drive now.
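For the record, the destructive pass was along these lines; a sketch only, where /dev/sdX is a placeholder for the flash drive and everything on it is erased:

```shell
# DESTRUCTIVE: badblocks -w overwrites the entire device with test
# patterns and verifies each pass. Triple-check the device node first.
lsblk /dev/sdX                  # confirm it is the 115.2G flash drive
badblocks -wsv -b 4096 /dev/sdX
```

A clean result here only rules out cells that are already unreadable; it says nothing about wear-triggered read-only lockouts or a flaky USB link.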
 
Code:
sdc      8:32   0 115.2G  0 disk
├─sdc1   8:33   0  1007K  0 part
├─sdc2   8:34   0     1G  0 part
└─sdc3   8:35   0 114.2G  0 part
Assuming the above is the actual lsblk output for /dev/sdc (the flash boot drive): something doesn't look right. I don't see the usual named LVM volumes (pve-root, pve-data, and so on) that the PVE installer creates.

Try and compare it to an lsblk from the other nodes.

Maybe I'm missing something here.
 
lsblk from pve1 (running for a year or so)
Code:
root@pve1:~# lsblk
NAME                                      MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
sda                                         8:0    1     0B  0 disk 
nvme0n1                                   259:0    0 465.8G  0 disk 
├─nvme0n1p1                               259:1    0  1007K  0 part 
├─nvme0n1p2                               259:2    0     1G  0 part /boot/efi
└─nvme0n1p3                               259:3    0 464.8G  0 part 
  ├─pve-swap                              252:0    0     8G  0 lvm  
  ├─pve-root                              252:1    0    96G  0 lvm  /
  ├─pve-data_tmeta                        252:2    0   3.4G  0 lvm  
  │ └─pve-data-tpool                      252:4    0 337.9G  0 lvm  
  │   ├─pve-data                          252:5    0 337.9G  1 lvm  
  │   ├─pve-vm--100--disk--0              252:6    0    80G  0 lvm  
  │   ├─pve-vm--102--disk--0              252:7    0   120G  0 lvm  
  │   ├─pve-vm--102--state--Working       252:8    0  16.5G  0 lvm  
  │   ├─pve-vm--102--state--reboots--well 252:9    0  16.5G  0 lvm  
  │   ├─pve-vm--101--disk--0              252:10   0     2G  0 lvm  
  │   ├─pve-vm--102--state--pre--QOS      252:11   0  16.5G  0 lvm  
  │   ├─pve-vm--103--disk--0              252:12   0     8G  0 lvm  
  │   ├─pve-vm--107--disk--0              252:13   0     2G  0 lvm  
  │   ├─pve-vm--108--disk--0              252:14   0     8G  0 lvm  
  │   ├─pve-vm--109--disk--0              252:15   0    20G  0 lvm  
  │   ├─pve-vm--104--disk--0              252:16   0    32G  0 lvm  
  │   ├─pve-vm--105--disk--0              252:17   0     2G  0 lvm  
  │   ├─pve-vm--110--disk--0              252:18   0     4G  0 lvm  
  │   └─pve-vm--111--disk--0              252:19   0     2G  0 lvm  
  └─pve-data_tdata                        252:3    0 337.9G  0 lvm  
    └─pve-data-tpool                      252:4    0 337.9G  0 lvm  
      ├─pve-data                          252:5    0 337.9G  1 lvm  
      ├─pve-vm--100--disk--0              252:6    0    80G  0 lvm  
      ├─pve-vm--102--disk--0              252:7    0   120G  0 lvm  
      ├─pve-vm--102--state--Working       252:8    0  16.5G  0 lvm  
      ├─pve-vm--102--state--reboots--well 252:9    0  16.5G  0 lvm  
      ├─pve-vm--101--disk--0              252:10   0     2G  0 lvm  
      ├─pve-vm--102--state--pre--QOS      252:11   0  16.5G  0 lvm  
      ├─pve-vm--103--disk--0              252:12   0     8G  0 lvm  
      ├─pve-vm--107--disk--0              252:13   0     2G  0 lvm  
      ├─pve-vm--108--disk--0              252:14   0     8G  0 lvm  
      ├─pve-vm--109--disk--0              252:15   0    20G  0 lvm  
      ├─pve-vm--104--disk--0              252:16   0    32G  0 lvm  
      ├─pve-vm--105--disk--0              252:17   0     2G  0 lvm  
      ├─pve-vm--110--disk--0              252:18   0     4G  0 lvm  
      └─pve-vm--111--disk--0              252:19   0     2G  0 lvm  
nvme1n1                                   259:4    0   3.7T  0 disk 
└─nvme1n1p1                               259:5    0   3.7T  0 part /mnt/nvme
                                                                    /mnt/pve/teamgroup

PVE2, one of the three "identical" systems:
Code:
root@pve2:~# lsblk
NAME                                                                                         MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
sda                                                                                            8:0    0 931.5G  0 disk 
└─ceph--10655fa6--84d2--4c1f--8d92--5b9bfd5e7721-osd--block--6bd15324--9102--4758--aef5--091ff3a28c87
                                                                                             252:0    0 931.5G  0 lvm  
sdb                                                                                            8:16   0 115.2G  0 disk 
├─sdb1                                                                                         8:17   0  1007K  0 part 
├─sdb2                                                                                         8:18   0     1G  0 part 
└─sdb3                                                                                         8:19   0 114.2G  0 part 
  ├─pve-root                                                                                 252:2    0  38.6G  0 lvm  /
  ├─pve-data_tmeta                                                                           252:3    0     1G  0 lvm  
  │ └─pve-data-tpool                                                                         252:5    0  51.4G  0 lvm  
  │   └─pve-data                                                                             252:6    0  51.4G  1 lvm  
  └─pve-data_tdata                                                                           252:4    0  51.4G  0 lvm  
    └─pve-data-tpool                                                                         252:5    0  51.4G  0 lvm  
      └─pve-data                                                                             252:6    0  51.4G  1 lvm

PVE3, the problem system (now after dumping to a .img and restoring):
Code:
root@pve3:~# lsblk
NAME                                                                                         MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
sda                                                                                            8:0    0 931.5G  0 disk 
└─ceph--f04948df--44a3--441f--99b6--6f4dbdd29ab4-osd--block--20ffda48--2cc5--45d5--897c--f506939f011e
                                                                                             252:0    0 931.5G  0 lvm  
sdb                                                                                            8:16   0 115.2G  0 disk 
├─sdb1                                                                                         8:17   0  1007K  0 part 
├─sdb2                                                                                         8:18   0     1G  0 part /boot/efi
└─sdb3                                                                                         8:19   0 114.2G  0 part 
  ├─pve-root                                                                                 252:1    0  38.6G  0 lvm  /
  ├─pve-data_tmeta                                                                           252:2    0     1G  0 lvm  
  │ └─pve-data                                                                               252:4    0  51.4G  0 lvm  
  └─pve-data_tdata                                                                           252:3    0  51.4G  0 lvm  
    └─pve-data                                                                               252:4    0  51.4G  0 lvm

PVE4, another "identical" system:
Code:
root@pve4:~# lsblk
NAME                                                                                         MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
sda                                                                                            8:0    0 953.9G  0 disk 
└─ceph--89ebb9fe--bf62--43eb--8888--23e46edc76ea-osd--block--79fa5bbd--97c3--43cc--86d5--c64571e3403a
                                                                                             252:0    0 953.9G  0 lvm  
sdb                                                                                            8:16   0 115.2G  0 disk 
├─sdb1                                                                                         8:17   0  1007K  0 part 
├─sdb2                                                                                         8:18   0     1G  0 part /boot/efi
└─sdb3                                                                                         8:19   0 114.2G  0 part 
  ├─pve-root                                                                                 252:1    0  38.6G  0 lvm  /
  ├─pve-data_tmeta                                                                           252:2    0     1G  0 lvm  
  │ └─pve-data                                                                               252:4    0  51.4G  0 lvm  
  └─pve-data_tdata                                                                           252:3    0  51.4G  0 lvm  
    └─pve-data                                                                               252:4    0  51.4G  0 lvm
 
Now PVE3 looks correctly populated in lsblk. Look at your original output; maybe you just redacted it, I don't know.
Is it now functioning correctly?

Anyway, those flash drives aren't going to hold out too long.
 
The lsblk from the earlier post was when the system was in read-only mode. A reboot fixes it, for a little while.
 
You should check your kernel log (dmesg) for errors. My guess is that your USB connection isn't stable or your USB stick is unreliable. It could also be related to power-saving settings on the USB ports.
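A sketch of that triage; the sample log lines below are made-up stand-ins for the kinds of messages USB resets, I/O errors, and read-only remounts typically produce:

```shell
# Pattern that catches the usual USB/storage failure messages.
# In practice run:  journalctl -k | grep -iE "$pattern"
pattern='usb [0-9.-]+: (reset|device descriptor)|i/o error|read-only'

# Demo against fabricated sample lines of the sort this failure produces:
printf '%s\n' \
  '[269117.04] systemd-journald[312]: Failed to rotate: Read-only file system' \
  '[269200.10] usb 2-1: reset high-speed USB device number 3 using ehci-pci' \
  '[269201.33] blk_update_request: I/O error, dev sdb, sector 8' \
| grep -iE "$pattern"
```

If resets show up, disabling USB autosuspend (for example via `usbcore.autosuspend=-1` on the kernel command line) is a cheap experiment before blaming the stick.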
 
