VM disks going readonly

SagnikS

I have been observing VM disks going read-only at random (every 5-7 days, with no regular pattern). This appears to happen only to VMs with very large disks, 4TB or more. I have a fairly large deployment with over 50 hypervisors.

The VMs use a local RAID 1/RAID 10 disk; I have seen it happen on both HDD and SSD storage. I have tried changing the SCSI controller and the device type (SATA/SCSI/VirtIO Block), to no avail. I've tried every caching setting (none, default, writeback, directsync) and the problem still occurs. Changing the async IO setting doesn't seem to help either.

This does not appear to be a host hardware problem; I have had it occur on Xeon Silvers, old i7s, etc. The host filesystem doesn't go read-only, the host keeps running fine, and there are no relevant logs on the host or in the guest. In most cases, just rebooting the server triggers an fsck and the issue resolves automatically. In some cases, it boots into BusyBox and a manual fsck is required.

The problem seems to have started after kernel 6.5, but I am not certain about this. Any help or tips regarding this would be much appreciated.

Unfortunately, I can't test whether all VMs are affected, since in most cases these are single-VM PVE installations. However, in two cases I've had the VM with the largest disk (6TB) go read-only while other VMs on the same storage and hypervisor kept working as if nothing had happened. Simply rebooting and fscking the affected VM worked.
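For reference, something like this should confirm the state from inside an affected guest (assuming an ext4 root on /dev/sda1; adjust for your layout):
Code:
# Is / currently mounted read-only?
findmnt -no OPTIONS /
# For ext4: the configured behaviour on errors (often "remount-ro")
tune2fs -l /dev/sda1 | grep -i 'errors behavior'
# Kernel messages around the remount
dmesg | grep -iE 'ext4|remount|read-only'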
 
Hi,
did you check the logs in the guest directly after the issue happened or only after rebooting? Maybe the logs can't be persisted to disk anymore, because it's read-only? Does the issue coincide with certain operations like backup or snapshot?

You can install jq and next time it happens you can run the following (replacing 123 with the ID of the affected VM):
Code:
echo '{"execute": "qmp_capabilities"}{"execute": "query-block", "arguments": {}}' | socat - /var/run/qemu-server/123.qmp | jq
This will tell us what the disks look like from QEMU's perspective.
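If you only want a quick look at the read-only flag, you could filter the output further, roughly like this (untested sketch, again with 123 standing in for the VM ID):
Code:
echo '{"execute": "qmp_capabilities"}{"execute": "query-block", "arguments": {}}' \
  | socat - /var/run/qemu-server/123.qmp \
  | jq 'select((.return? | type) == "array") | .return[] | {device, ro: .inserted.ro}'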
 
did you check the logs in the guest directly after the issue happened or only after rebooting?
Thank you very much for your reply. Unfortunately, in most cases the guest console got spammed with systemd messages stating that the disk was read-only. I was able to grab the error right afterwards in one or two instances; I don't have a screenshot, but it was something like this:
Code:
validate_block_bitmap comm_fstrim bad block bitmap checksum

I had assumed it might be fstrim on the host/guest, so I disabled fstrim.timer on both, but it still happened. I was also able to trigger it on two guests by running fstrim -v / on the host, but I couldn't reproduce it after that.
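For reference, disabling the timer on both host and guest was just the usual:
Code:
# Stop and disable the periodic fstrim job
systemctl disable --now fstrim.timer
# Confirm it is no longer scheduled
systemctl status fstrim.timer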

Does the issue coincide with certain operations like backup or snapshot?

No backups or snapshots at all.

I will try the jq command if/when this happens again. My most recent change was to disable discard on the VM disks in PVE; I'll report back if it reoccurs.
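For the record, whether discard is still set can be checked straight from the VM config, e.g. (with 123 as a placeholder VM ID):
Code:
# Drive lines of the VM config; "discard=on" means guest TRIM/discard
# requests are passed down to the underlying storage
qm config 123 | grep -E '^(ide|sata|scsi|virtio)[0-9]+:'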
 
Which file system within the VM do you use and what kind of data do you store there? Maybe you have an inode problem?
 
Then inodes are definitely worth looking at. My question still stands: what do you store on it, and what does your df -i output look like?
 
From one such server:
Code:
df -i /
Filesystem        Inodes IUsed     IFree IUse% Mounted on
/dev/sda1      671023104 86758 670936346    1% /
 
Okay, then the inodes are definitely not the problem, at least not on this system.

Then I have another idea: it could be that the VM is running into the open file limit and is therefore having problems. Please post the output of the following command:
Code:
lsof -n 2>/dev/null | awk '{print $1,$2}' | sort | uniq -c | sort -nr | head -25 | while read nr name pid ; do printf "%10d / %-10d %-15s (PID %5s)\n" $nr $(cat /proc/$pid/limits | grep 'open files' | awk '{print $4}') $name $pid; done
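Note that lsof tends to overcount (it also lists memory-mapped files and can list the same descriptor more than once), so if the numbers look implausibly high compared to the limit, you can cross-check directly in /proc, replacing <PID> with the PID of your kvm process:
Code:
# Actual number of open file descriptors of the process
ls /proc/<PID>/fd | wc -l
# Its current soft and hard limits for open files
grep 'Max open files' /proc/<PID>/limits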
 
Thanks for your reply, here's the output from the PVE host:

Code:
     41368 / 1024       kvm             (PID 3721702)
      2496 / 1024       pmxcfs          (PID  1221)
       660 / 1024       rrdcached       (PID  1211)
       216 / 1024       tuned           (PID  1156)
       148 / 1024       pveproxy        (PID 1392627)
       147 / 1024       pveproxy        (PID 1393583)
       147 / 1024       pveproxy        (PID 1386992)
       147 / 1024       pvedaemon       (PID 863145)
       147 / 1024       pvedaemon       (PID 821390)
       147 / 1024       pvedaemon       (PID 1033916)
       141 / 1024       pve-ha-lr       (PID  1713)
       141 / 1024       pve-ha-cr       (PID  1703)
       137 / 1024       pveproxy        (PID  1704)
       137 / 1024       pvedaemon       (PID  1690)
       134 / 1024       pveschedu       (PID  1933)
       134 / 524288     master          (PID  1620)
       114 / 1024       polkitd         (PID  1228)
       105 / 1024       zed             (PID  1062)
       105 / 1024       pve-lxc-s       (PID  1044)
       103 / 1048576    systemd         (PID     1)
       102 / 1024       pvestatd        (PID  1635)
       101 / 1024       pve-firew       (PID  1634)
        84 / 1024       systemd-t       (PID  1034)
        78 / 1024       spiceprox       (PID 454965)
        68 / 1024       spiceprox       (PID  1711)
 
Okay, then you might have a problem with the file limit. You can use the following command to temporarily increase the limits of the running process (e.g. you could wait until the problem occurs, then issue it and see whether it helps):
Code:
prlimit --pid 3721702 --nofile=4096:1048576

or set them permanently in the following files. A reboot is necessary for the permanent settings to take effect.

/etc/security/limits.conf
Code:
root soft nofile 4096
root hard nofile 1048576

/etc/systemd/system.conf
Code:
DefaultLimitNOFILE=4096:1048576

I set these limits by default on each of my hypervisors to avoid such problems.
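To verify that the new limits are actually in effect (after the prlimit call or after a reboot), something like this works:
Code:
# Limits of the running kvm process (replace the PID)
grep 'Max open files' /proc/3721702/limits
# systemd's default for newly started units
systemctl show -p DefaultLimitNOFILE
# Limits of the current shell (soft/hard)
ulimit -Sn; ulimit -Hn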
 
Interesting, thank you. I know this problem occurs with other applications when things get busy, since the defaults are low, but I never thought it would be an issue on Proxmox. Maybe Proxmox should ship with higher defaults in the future, such as 1048576?
 
Just noticed another stuck VM:

Code:
{
  "QMP": {
    "version": {
      "qemu": {
        "micro": 2,
        "minor": 1,
        "major": 8
      },
      "package": "pve-qemu-kvm_8.1.2-6"
    },
    "capabilities": []
  }
}
{
  "return": {}
}
{
  "return": [
    {
      "io-status": "ok",
      "device": "drive-ide2",
      "locked": false,
      "removable": true,
      "qdev": "ide2",
      "tray_open": false,
      "type": "unknown"
    },
    {
      "io-status": "ok",
      "device": "drive-scsi0",
      "locked": false,
      "removable": false,
      "inserted": {
        "iops_rd": 0,
        "detect_zeroes": "unmap",
        "image": {
          "virtual-size": 107374182400,
          "filename": "/var/lib/vz/images/406/vm-406-disk-0.raw",
          "format": "raw",
          "actual-size": 95555608576,
          "dirty-flag": false
        },
        "iops_wr": 0,
        "ro": false,
        "node-name": "#block153",
        "backing_file_depth": 0,
        "drv": "raw",
        "iops": 0,
        "bps_wr": 0,
        "write_threshold": 0,
        "dirty-bitmaps": [
          {
            "name": "pbs-incremental-dirty-bitmap",
            "recording": true,
            "persistent": false,
            "busy": false,
            "granularity": 4194304,
            "count": 14722007040
          }
        ],
        "encrypted": false,
        "bps": 0,
        "bps_rd": 0,
        "cache": {
          "no-flush": false,
          "direct": false,
          "writeback": true
        },
        "file": "/var/lib/vz/images/406/vm-406-disk-0.raw"
      },
      "qdev": "scsi0",
      "type": "unknown"
    }
  ]
}
 
Code:
    {
      "io-status": "ok",
      "device": "drive-scsi0",
      "locked": false,
        "ro": false,
Nothing wrong here; the disk is not marked as read-only by QEMU (which of course doesn't say anything about the state inside the guest).

Code:
"dirty-bitmaps": [
          {
            "name": "pbs-incremental-dirty-bitmap",
            "recording": true,
            "persistent": false,
            "busy": false,
            "granularity": 4194304,
            "count": 14722007040
          }
        ],
But you did make a backup, so if you are using iothread, the issue might actually be the same as here: https://forum.proxmox.com/threads/vms-hung-after-backup.137286/post-627915
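You can quickly check whether iothread is set and which SCSI controller type is configured directly from the VM config, e.g.:
Code:
# 406 is the VM from the output above; look for "iothread=1" on the disk
# line and check the "scsihw" setting (e.g. virtio-scsi-single)
qm config 406 | grep -E 'scsihw|scsi0'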
 
This is actually not from the large single-VM hosts where the issue was rampant; it's from another hypervisor where I hadn't noticed issues before. iothread isn't enabled either, and the controller is VirtIO SCSI.
 
Okay, then it's certainly not the issue I linked to. Were there any useful logs on the host or in the guest this time? Maybe you can still log in and get the system journal?
 
I'm not seeing anything interesting on the host. I should have grabbed logs from the guest, but in an attempt to minimize downtime I didn't think of that. However, I did briefly go through the error messages on the VM console, and they were along the lines of an sd hung task, I believe. Perhaps this is indeed the open file limit.
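Next time I'll try to pull the kernel messages from the previous boot before running fsck, something like:
Code:
# Kernel log of the previous boot, filtered for the usual suspects
# (may be empty if the journal couldn't be written while the disk was read-only)
journalctl -k -b -1 | grep -iE 'hung task|blocked for more than|read-only|ext4'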
 
