What if root filesystem became readonly

ZKallo

Active Member
Nov 8, 2017
We have four nodes. On one of them the root filesystem became read-only, and several virtual machines are still running on it. We have enough resources to move them to another node, but what can we actually do, given that many operations fail? (For example, we cannot migrate or back up the VMs; we have not tried to stop or shut them down.)
 
You can try to re-mount as rw, perhaps at least temporarily.
If you are using shared storage for the VMs, shut down the offending host (hard shutdown via IPMI or the physical button), and HA will move the VMs to another node (if you have HA configured for them).
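
Before and after a remount attempt (typically `mount -o remount,rw /`), it can help to confirm what state the root filesystem is actually in. The following is a minimal sketch, not a Proxmox tool: it assumes the standard `/proc/mounts` layout (device, mount point, type, options, ...), and the function names are made up for illustration.

```python
def root_mount_options(mounts_text):
    """Return the option list of the last entry mounted at '/', or None.

    The last matching entry is used because later mounts on the same
    mount point shadow earlier ones.
    """
    options = None
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, ...
        if len(fields) >= 4 and fields[1] == "/":
            options = fields[3].split(",")
    return options


def root_is_read_only(mounts_text):
    """True if the root filesystem is currently mounted with the 'ro' option."""
    options = root_mount_options(mounts_text)
    return options is not None and "ro" in options
```

On the affected node you would pass it the contents of `/proc/mounts`, for example `root_is_read_only(open("/proc/mounts").read())`.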

Good luck


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
Thanks for your answer,
- I launched fsck in read-only mode on root filesystem and there were lots of errors.
- We haven't implemented HA.
 
> - I launched fsck in read-only mode on root filesystem and there were lots of errors.
You don't need to fix the filesystem, yet. You just need it RW for the duration of the live migration attempt. You have not said what type of storage is backing the data of these VMs, so I don't know if it's even possible to move the VMs at all.
> - We haven't implemented HA.
I guess, you could turn off the offending node and offline migrate the VMs. Make sure you get the config backups in advance:
https://forum.proxmox.com/threads/move-vm-from-a-dead-node-to-a-second-node.139095/

Please understand that any suggestion here is not a guaranteed way to recover. Only you have access to the full view and understanding of your environment. If these are important services, you should have subscription/support/backups in place.

Good luck
 
You'll need to replace the disk; it's dying. The root entry in /etc/fstab most likely has "errors=remount-ro", which is why the filesystem was remounted read-only once errors were detected.

Hope you have recent backups to restore from.

Replace it with an enterprise-class SSD if this is a business/production system.
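
To see which error policy the root entry in /etc/fstab actually carries, something like the following could be used. It is only a sketch: it assumes the usual six-column fstab layout, and the function name is hypothetical. For ext4 the `errors=` option takes `continue`, `remount-ro`, or `panic`.

```python
def fstab_error_policy(fstab_text, mount_point="/"):
    """Return the value of the errors= mount option for mount_point, or None."""
    for raw in fstab_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # fstab fields: device, mount point, fs type, options, dump, pass
        if len(fields) < 4 or fields[1] != mount_point:
            continue
        for option in fields[3].split(","):
            if option.startswith("errors="):
                return option.split("=", 1)[1]
        return None  # entry found, but no errors= option set
    return None  # no entry for this mount point
```

Since reading /etc/fstab does not need write access, this still works while the filesystem is read-only: `fstab_error_policy(open("/etc/fstab").read())`.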
 
Lots of advice, but no one has asked the obvious.

What do you see in dmesg that explains the fault? Obviously that is only available before you reboot the node, since your logs aren't being written to the read-only filesystem.
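
Because the kernel log lives in memory until the reboot, it is worth capturing it somewhere off the node and filtering it for storage faults. A rough sketch of such a filter is below; the keyword list is my own assumption about wording that commonly accompanies disk and ext4 errors, not an exhaustive or authoritative set, and the function name is hypothetical.

```python
import re

# Assumed keywords; extend this for your own filesystem and controller.
FAULT_PATTERN = re.compile(
    r"I/O error|EXT4-fs error|Remounting filesystem read-only|Medium Error",
    re.IGNORECASE,
)


def suspicious_lines(log_text):
    """Return the lines of a kernel log dump that match a storage-fault keyword."""
    return [line for line in log_text.splitlines() if FAULT_PATTERN.search(line)]
```

Feed it dmesg output that was captured over SSH from another machine, so that nothing has to be written to the read-only disk.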
 
Thanks for your answer. It gave me hope. I'll try these suggestions soon.
 
The disk is the 32 GB SSD that the machine shipped with from the factory. There is no backup of it. We will look at what is wrong with it once we have moved all of the VMs off that node.
 
What worked: stopping the VMs, renaming the conf files and copying them to a network share, copying them to the new node, and renaming them back.
What didn't work: remounting pve-root as rw (it said: "write protected").
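
For anyone repeating this, it may help to write down in advance which config file goes where. The sketch below only builds the path pairs and copies nothing. It assumes the standard cluster layout `/etc/pve/nodes/<node>/qemu-server/<vmid>.conf`; the node names and VM IDs are placeholders, and it only makes sense if the VM disks sit on storage the new node can reach.

```python
from pathlib import PurePosixPath

PVE_NODES = PurePosixPath("/etc/pve/nodes")  # assumed standard layout


def conf_path(node, vmid):
    """Path of a QEMU VM config file under the given node's directory."""
    return PVE_NODES / node / "qemu-server" / f"{vmid}.conf"


def relocation_plan(vmids, old_node, new_node):
    """Return (source, destination) path pairs, one per VM ID."""
    return [(conf_path(old_node, v), conf_path(new_node, v)) for v in vmids]


# Placeholder names, for illustration only.
for src, dst in relocation_plan([100, 101], "old-node", "new-node"):
    print(f"{src} -> {dst}")
```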
 
> What worked: stopping vms, renaming conf files and copying to a network share, copying to the new node, renaming back.
Glad to hear you are up and running.
> What didn't work: remounting pve-root as rw (it said: "write protected")
That would have been a very short-term workaround to get to the same state as you did via rename. As @leesteken said, you need a new disk of a proper kind.