VM disks going readonly

SagnikS

I have been observing VM disks going read-only at random (every 5-7 days, with no regular pattern). This appears to happen only to VMs with very large disks, 4TB or more. I have a fairly large deployment with over 50 hypervisors.

The VMs use a local RAID 1/RAID 10 disk; I have seen it happen on both HDD and SSD storage. I have tried changing the SCSI controller and the device type (SATA/SCSI/VirtIO Block), to no avail. I've tried every caching setting (none, default, writeback, directsync) and the problem still occurs. Changing the async IO setting doesn't seem to help either.

This does not appear to be a host hardware problem; I have had it occur on Xeon Silvers, old i7s, etc. The host filesystem doesn't go read-only, the host keeps running fine, and there are no relevant logs on the host or in the guest. In most cases, just rebooting the server triggers an fsck and the issue resolves automatically. In some cases, it boots into BusyBox and a manual fsck is required.

The problem seems to have started after kernel 6.5, but I am not certain about this. Any help or tips regarding this would be much appreciated.

Unfortunately, I can't test whether all VMs are affected, since in most cases these are single-VM PVE installations. However, in two cases I've had the VM with the largest disk (6TB) go read-only while other VMs on the same storage and hypervisor kept working as if nothing had happened. Simply rebooting and fscking the affected VM worked.
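For reference, something like this should confirm the state from inside an affected guest (assuming an ext4 root on /dev/sda1; adjust for your layout):
Code:
# Is / currently mounted read-only?
findmnt -no OPTIONS /
# For ext4: the configured behaviour on errors (often "remount-ro")
tune2fs -l /dev/sda1 | grep -i 'errors behavior'
# Kernel messages around the remount
dmesg | grep -iE 'ext4|remount|read-only'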
 
Hi,
did you check the logs in the guest directly after the issue happened or only after rebooting? Maybe the logs can't be persisted to disk anymore, because it's read-only? Does the issue coincide with certain operations like backup or snapshot?

You can install jq and next time it happens you can run the following (replacing 123 with the ID of the affected VM):
Code:
echo '{"execute": "qmp_capabilities"}{"execute": "query-block", "arguments": {}}' | socat - /var/run/qemu-server/123.qmp | jq
This will tell us what the disks look like from QEMU's perspective.
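If you only want a quick look at the read-only flag, you could filter the output further, roughly like this (untested sketch, again with 123 standing in for the VM ID):
Code:
echo '{"execute": "qmp_capabilities"}{"execute": "query-block", "arguments": {}}' \
  | socat - /var/run/qemu-server/123.qmp \
  | jq 'select((.return? | type) == "array") | .return[] | {device, ro: .inserted.ro}'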
 
did you check the logs in the guest directly after the issue happened or only after rebooting?
Thank you very much for your reply. Unfortunately, in most cases the guest console got spammed with systemd messages stating that the disk was read-only. I was able to grab the error right afterwards in one or two instances; I don't have a screenshot, but it was something like this:
Code:
validate_block_bitmap comm_fstrim bad block bitmap checksum

I had assumed it might be fstrim on the host/guest, so I disabled fstrim.timer on both, but it still happened. I was also able to trigger it on two guests by running fstrim -v / on the host, but I couldn't reproduce it after that.
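For reference, disabling the timer on both host and guest was just the usual:
Code:
# Stop and disable the periodic fstrim job
systemctl disable --now fstrim.timer
# Confirm it is no longer scheduled
systemctl status fstrim.timer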

Does the issue coincide with certain operations like backup or snapshot?

No backups or snapshots at all.

I will try the jq command if/when this happens again. My most recent change was to disable discard on the VM disks in PVE; I'll report back if it reoccurs.
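For the record, whether discard is still set can be checked straight from the VM config, e.g. (with 123 as a placeholder VM ID):
Code:
# Drive lines of the VM config; "discard=on" means guest TRIM/discard
# requests are passed down to the underlying storage
qm config 123 | grep -E '^(ide|sata|scsi|virtio)[0-9]+:'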
 
Which file system within the VM do you use and what kind of data do you store there? Maybe you have an inode problem?
 
Then inodes are definitely worth looking at. My question still stands: what do you store on it, and what does your df -i output look like?
 
From one such server:
Code:
df -i /
Filesystem        Inodes IUsed     IFree IUse% Mounted on
/dev/sda1      671023104 86758 670936346    1% /
 
Okay, then the inodes are definitely not the problem, at least not on this system.

Then I have another idea: it could be that the VM is running into the open file limit and is therefore having problems. Please post the output of the following command:
Code:
lsof -n 2>/dev/null | awk '{print $1,$2}' | sort | uniq -c | sort -nr | head -25 | while read nr name pid ; do printf "%10d / %-10d %-15s (PID %5s)\n" $nr $(cat /proc/$pid/limits | grep 'open files' | awk '{print $4}') $name $pid; done
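Note that lsof tends to overcount (it also lists memory-mapped files and can list the same descriptor more than once), so if the numbers look implausibly high compared to the limit, you can cross-check directly in /proc, replacing <PID> with the PID of your kvm process:
Code:
# Actual number of open file descriptors of the process
ls /proc/<PID>/fd | wc -l
# Its current soft and hard limits for open files
grep 'Max open files' /proc/<PID>/limits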
 
Thanks for your reply, here's the output from the PVE host:

Code:
     41368 / 1024       kvm             (PID 3721702)
      2496 / 1024       pmxcfs          (PID  1221)
       660 / 1024       rrdcached       (PID  1211)
       216 / 1024       tuned           (PID  1156)
       148 / 1024       pveproxy        (PID 1392627)
       147 / 1024       pveproxy        (PID 1393583)
       147 / 1024       pveproxy        (PID 1386992)
       147 / 1024       pvedaemon       (PID 863145)
       147 / 1024       pvedaemon       (PID 821390)
       147 / 1024       pvedaemon       (PID 1033916)
       141 / 1024       pve-ha-lr       (PID  1713)
       141 / 1024       pve-ha-cr       (PID  1703)
       137 / 1024       pveproxy        (PID  1704)
       137 / 1024       pvedaemon       (PID  1690)
       134 / 1024       pveschedu       (PID  1933)
       134 / 524288     master          (PID  1620)
       114 / 1024       polkitd         (PID  1228)
       105 / 1024       zed             (PID  1062)
       105 / 1024       pve-lxc-s       (PID  1044)
       103 / 1048576    systemd         (PID     1)
       102 / 1024       pvestatd        (PID  1635)
       101 / 1024       pve-firew       (PID  1634)
        84 / 1024       systemd-t       (PID  1034)
        78 / 1024       spiceprox       (PID 454965)
        68 / 1024       spiceprox       (PID  1711)
 
Okay, then you might have a problem with the file limit. You can use the following command to temporarily increase the limits of the running process (e.g. you could wait until the problem occurs, then issue it and see whether it helps):
Code:
prlimit --pid 3721702 --nofile=4096:1048576

or set them permanently in the following files. A reboot is necessary for the permanent settings to take effect.

/etc/security/limits.conf
Code:
root soft nofile 4096
root hard nofile 1048576

/etc/systemd/system.conf
Code:
DefaultLimitNOFILE=4096:1048576

I set these limits by default on each of my hypervisors to avoid such problems.
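To verify that the new limits are actually in effect (after the prlimit call or after a reboot), something like this works:
Code:
# Limits of the running kvm process (replace the PID)
grep 'Max open files' /proc/3721702/limits
# systemd's default for newly started units
systemctl show -p DefaultLimitNOFILE
# Limits of the current shell (soft/hard)
ulimit -Sn; ulimit -Hn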
 
Interesting, thank you. I know this problem occurs with other applications when things get busy, since the defaults are low, but I never thought it would be an issue on Proxmox. Maybe Proxmox should ship with higher defaults in the future, such as 1048576?
 
Just noticed another stuck VM:

Code:
{
  "QMP": {
    "version": {
      "qemu": {
        "micro": 2,
        "minor": 1,
        "major": 8
      },
      "package": "pve-qemu-kvm_8.1.2-6"
    },
    "capabilities": []
  }
}
{
  "return": {}
}
{
  "return": [
    {
      "io-status": "ok",
      "device": "drive-ide2",
      "locked": false,
      "removable": true,
      "qdev": "ide2",
      "tray_open": false,
      "type": "unknown"
    },
    {
      "io-status": "ok",
      "device": "drive-scsi0",
      "locked": false,
      "removable": false,
      "inserted": {
        "iops_rd": 0,
        "detect_zeroes": "unmap",
        "image": {
          "virtual-size": 107374182400,
          "filename": "/var/lib/vz/images/406/vm-406-disk-0.raw",
          "format": "raw",
          "actual-size": 95555608576,
          "dirty-flag": false
        },
        "iops_wr": 0,
        "ro": false,
        "node-name": "#block153",
        "backing_file_depth": 0,
        "drv": "raw",
        "iops": 0,
        "bps_wr": 0,
        "write_threshold": 0,
        "dirty-bitmaps": [
          {
            "name": "pbs-incremental-dirty-bitmap",
            "recording": true,
            "persistent": false,
            "busy": false,
            "granularity": 4194304,
            "count": 14722007040
          }
        ],
        "encrypted": false,
        "bps": 0,
        "bps_rd": 0,
        "cache": {
          "no-flush": false,
          "direct": false,
          "writeback": true
        },
        "file": "/var/lib/vz/images/406/vm-406-disk-0.raw"
      },
      "qdev": "scsi0",
      "type": "unknown"
    }
  ]
}
 
Code:
    {
      "io-status": "ok",
      "device": "drive-scsi0",
      "locked": false,
        "ro": false,
Nothing wrong here; the disk is not marked as read-only by QEMU (which of course doesn't say anything about the state inside the guest).

Code:
"dirty-bitmaps": [
          {
            "name": "pbs-incremental-dirty-bitmap",
            "recording": true,
            "persistent": false,
            "busy": false,
            "granularity": 4194304,
            "count": 14722007040
          }
        ],
But you did make a backup, so if you are using iothread, the issue might actually be the same as here: https://forum.proxmox.com/threads/vms-hung-after-backup.137286/post-627915
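You can quickly check whether iothread is set and which SCSI controller type is configured directly from the VM config, e.g.:
Code:
# 406 is the VM from the output above; look for "iothread=1" on the disk
# line and check the "scsihw" setting (e.g. virtio-scsi-single)
qm config 406 | grep -E 'scsihw|scsi0'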
 
This is actually not from the large single-VM hosts where the issue was rampant; it's from another hypervisor where I hadn't noticed issues before. iothread isn't enabled either, and the controller is VirtIO SCSI.
 
Okay, then it's certainly not the issue I linked to. Were there any useful logs on the host or in the guest this time? Maybe you can still log in and get the system journal?
 
I'm not seeing anything interesting on the host. I should have grabbed logs from the guest, but in an attempt to minimize downtime I didn't think of that. However, I did briefly go through the error messages on the VM console, and they were along the lines of an sd hung task, I believe. Perhaps this is indeed the open file limit.
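Next time I'll try to pull the kernel messages from the previous boot before running fsck, something like:
Code:
# Kernel log of the previous boot, filtered for the usual suspects
# (may be empty if the journal couldn't be written while the disk was read-only)
journalctl -k -b -1 | grep -iE 'hung task|blocked for more than|read-only|ext4'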
 
