$> tune2fs -l /dev/sda1 | grep -ie "Block count" -ie "Reserved block count" -ie "Block size"
The limit (in GB):
("Block count" - "Reserved block count") * "Block size" / 1024^3
$> tune2fs -l /dev/sda1 | grep -ie "Block count" -ie "Reserved block count" -ie "Block size" | awk '{print $(NF-0)}' | tr "\n" " " | awk '{print ($1 - $2) * $3 / 1024^3 " GB" }'
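As a worked example with made-up numbers (not taken from any server in this thread): a block count of 26214400, a reserved block count of 1310720 (the default 5%) and a block size of 4096 would give

$> echo $(( (26214400 - 1310720) * 4096 / 1024**3 )) GB
95 GB

i.e. on that 100 GiB filesystem unprivileged users can only write about 95 GiB before they start getting "no space left on device" errors.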
Thanks for your reply. It's not a lack of space; there's enough space available on all these servers.
Hi!
How much free space do you have on the storage?
The EXT4 filesystem has 5% reserved space by default; maybe you're hitting that limit? (When you hit it, you can't write any more.)
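If the reserved blocks do turn out to be the problem, the percentage can be lowered with tune2fs. A minimal sketch, assuming /dev/sda1 is the affected filesystem (the reservation only blocks unprivileged users; root can still write into it):

$> tune2fs -l /dev/sda1 | grep -i "Reserved block count"
$> tune2fs -m 1 /dev/sda1   # lower the reservation from the default 5% to 1%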
It happened on another VM, not sure if related. The output of this command above was 107. I grabbed kernel logs from the guest:

Should it happen again, you can check how many open file descriptors there are with
ls -1 /proc/$(cat /var/run/qemu-server/123.pid)/fd/ | wc -l
replacing 123 with the actual ID of your VM.
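To see whether that count is anywhere near the limit of the process, the limits can be read from procfs as well; a sketch, again with 123 standing in for the actual VM ID:

$> grep "Max open files" /proc/$(cat /var/run/qemu-server/123.pid)/limits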
This might actually be unrelated: the timestamp in the VM is 20:03 EST, and this server had a running backup that failed due to a hardware problem with the backup server at around the same time. This behaviour isn't really ideal (no iothread, VirtIO SCSI controller).
INFO: 69% (7.7 GiB of 11.0 GiB) in 1m 33s, read: 101.3 MiB/s, write: 101.3 MiB/s
INFO: 69% (7.7 GiB of 11.0 GiB) in 17m 15s, read: 0 B/s, write: 0 B/s
ERROR: backup write data failed: command error: write_data upload error: pipelined request failed: timed out
INFO: aborting backup job
INFO: resuming VM again
ERROR: Backup of VM 412 failed - backup write data failed: command error: write_data upload error: pipelined request failed: timed out
INFO: Failed at 2024-02-02 01:14:54
INFO: Starting Backup of VM 413 (qemu)
INFO: Backup started at 2024-02-02 01:14:54
INFO: status = running
The system logs show failed IO and that the filesystem is remounted read-only because of that.
When the connection to the backup target is lost or too slow, it's unfortunately expected. See my reply here: https://forum.proxmox.com/threads/i...m-in-the-vm-after-fsfreeze.141080/post-631577
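For reference, the guest-side evidence for this usually shows up in the kernel log; a rough sketch of how to look for it (the exact message wording differs between kernel versions):

$> journalctl -k -b | grep -iE "I/O error|EXT4-fs error|read-only"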
Does ext4 in the guest still crash after disabling discard? We also have an Ubuntu VM with 64G storage; ext4 crashes (the fs goes read-only) once every two weeks, while other VMs are working normally (some of them also have discard on, though). Also not sure if it's the same cause as this.

Thank you very much for your reply. Unfortunately, in most cases the guest console got spammed with systemd messages stating the disk was read-only. I was able to grab the error right after in one or two instances; I don't have a screenshot, but it was something like this:
validate_block_bitmap comm_fstrim bad block bitmap checksum
I had assumed it might be fstrim on the host/guest, so I disabled fstrim.timer on both, but it still happened. I was also able to trigger it on two guests by running fstrim -v / on the host, but I couldn't reproduce it after that.
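For reference, disabling and verifying the timer on a systemd-based system looks roughly like this:

$> systemctl disable --now fstrim.timer
$> systemctl status fstrim.timer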
No backups or snapshots at all.
I will try the jq command if/when this happens again. My most recent change was to disable discard on the VM disks on PVE; I'll report back if this reoccurs.
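For reference, the discard flag lives in the disk line of the VM config and can be checked and changed with qm; a sketch using a hypothetical VM ID, storage and disk name (the whole drive string has to be given again when setting it):

$> qm config 100 | grep scsi0
$> qm set 100 --scsi0 local-lvm:vm-100-disk-0,discard=ignore   # discard=ignore disables it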
Disabling/enabling discard didn't seem to help much. It has gotten a lot less frequent recently though, not sure what has changed.