Permission denied - when creating a snapshot

Feb 17, 2020
I'm trying to create a snapshot of my VM ID=101 and I'm getting a Permission denied message:

Code:
 WARNING: You have not turned on protection against thin pools running out of space.
  WARNING: Set activation/thin_pool_autoextend_threshold below 100 to trigger automatic extension of thin pools before they get full.
  Logical volume "vm-101-state-before_upgrade" created.
  WARNING: Sum of all thin volume sizes (818.44 GiB) exceeds the size of thin pool pve/data and the amount of free space in volume group (<16.00 GiB).
TASK ERROR: unable to open file '/etc/pve/nodes/proxmox-2/qemu-server/101.conf.tmp.3410546' - Permission denied

This node is a part of a cluster and the other node is currently down (hardware fault), can it be the cause of the problem?
 
This node is a part of a cluster and the other node is currently down (hardware fault), can it be the cause of the problem?
First, a thin pool is not cluster-aware and cannot be used in a cluster, so the warnings are real.
Now to the error ... yes, it is possible if you have only a two-node cluster (which is not a good cluster per se, because it cannot have quorum without 2 out of 2 nodes being online).
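The math behind it: corosync needs a strict majority, i.e. quorum = floor(expected_votes / 2) + 1. With 2 expected votes that is 2, so a lone node can never be quorate on its own and /etc/pve (pmxcfs) turns read-only.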
 
@LnxBil I have a script set up to solve the quorum problem (https://gitlab.rickelobe.com/virtualiztion/proxmox-2n-ha-monitor) which has been working great. I cannot (at this moment) afford third hardware and I don't want to fake the quorum with a Raspberry Pi as the third machine like some people do. I still like the 2-node cluster though, because I can migrate VMs back and forth in case of a hardware issue on one of the nodes, like I'm currently experiencing: one node is being serviced with no downtime for my setup.

Back to the snapshot issue. Is it possible to solve the permission denied problem?

Just for explanation, the permission issue seems to be at the filesystem level, even though I'm running the following commands as root:

Code:
root@proxmox-2:~# cd /etc/pve/nodes/proxmox-2/qemu-server
root@proxmox-2:/etc/pve/nodes/proxmox-2/qemu-server# touch new_file
touch: cannot touch 'new_file': Permission denied
 
I don't want to fake the quorum with a Raspberry Pi as the third machine like some people do.
Fake? That's not faking, that's a quorum device.

Just for explanation, the permission issue seems to be at the filesystem level, even though I'm running the following commands as root:

Code:
root@proxmox-2:~# cd /etc/pve/nodes/proxmox-2/qemu-server
root@proxmox-2:/etc/pve/nodes/proxmox-2/qemu-server# touch new_file
touch: cannot touch 'new_file': Permission denied
That points to two possible causes: no quorum, or a corrupt sqlite database / full disk.
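To tell the two apart, a few quick checks (standard paths and unit names on a default PVE install):

Code:
# quorum: look for "Quorate: Yes/No" and "Activity blocked"
pvecm status

# disk full / the sqlite database backing the cluster filesystem
df -h /
ls -lh /var/lib/pve-cluster/config.db

# the pmxcfs service itself
systemctl status pve-cluster
journalctl -u pve-cluster --since today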
 
@LnxBil I have a script set up to solve the quorum problem (https://gitlab.rickelobe.com/virtualiztion/proxmox-2n-ha-monitor) which has been working great. I cannot (at this moment) afford third hardware and I don't want to fake the quorum with a Raspberry Pi as the third machine like some people do.

This script seems misguided to me. Nothing prevents both nodes from setting the expected votes to 1 when there's a communication failure.
Warning: If a network issue is preventing quorum and this program is running on both nodes, you may encounter a situation where both nodes start the HA VMs.
This leads to a split-brain situation and messes up the cluster! The vote provided by a QDevice, in contrast, is real and can only be given to one of the nodes at a time, which avoids the split-brain situation.

I still like the 2-node cluster though, because I can migrate VMs back and forth in case of a hardware issue on one of the nodes, like I'm currently experiencing: one node is being serviced with no downtime for my setup.

Back to the snapshot issue. Is it possible to solve the permission denied problem?

Just for explanation, the permission issue seems to be at the filesystem level, even though I'm running the following commands as root:

Code:
root@proxmox-2:~# cd /etc/pve/nodes/proxmox-2/qemu-server
root@proxmox-2:/etc/pve/nodes/proxmox-2/qemu-server# touch new_file
touch: cannot touch 'new_file': Permission denied
 
@fiona @LnxBil One possible solution would be setting the expected number of votes to one. In that case I can write to the filesystem. Are there any drawbacks to this?

Before:
- no quorum
- cannot create snapshot

Action:
- pvecm expected 1

After:
- quorum achieved
- can create snapshot

Code:
# pvecm status

Quorum information
------------------
Date:             Mon Aug  1 09:37:07 2022
Quorum provider:  corosync_votequorum
Nodes:            1
Node ID:          0x00000002
Ring ID:          2.35d
Quorate:          No

Votequorum information
----------------------
Expected votes:   2
Highest expected: 2
Total votes:      1
Quorum:           2 Activity blocked

# pvecm expected 1

# pvecm status

Quorum information
------------------
Date:             Mon Aug  1 09:37:58 2022
Quorum provider:  corosync_votequorum
Nodes:            1
Node ID:          0x00000002
Ring ID:          2.35d
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   1
Highest expected: 1
Total votes:      1
Quorum:           1
Flags:            Quorate

# cd /etc/pve/nodes/proxmox-2/qemu-server
# touch new_file
# rm new_file
 
@fiona @LnxBil One possible solution would be setting the expected number of votes to one. In that case I can write to the filesystem. Are there any drawbacks to this?

The issue is when this happens on both nodes independently, because the cluster state will get out of sync. Each node will think it has the last say on the matter and there's really no good way to recover from that. If you can ensure that only one node at a time will set expected to 1 (EDIT: i.e. a node is only allowed to do it if both nodes were in sync before), fine. But that also means the script you mentioned can only ever run on one node!

A QDevice is designed exactly for clusters with two nodes (or more generally an even number) and it can only give its vote to one node (or more generally one half of the cluster) at a time, preventing the situation above.
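In numbers: 2 node votes + 1 QDevice vote = 3 expected votes, so quorum = floor(3/2) + 1 = 2. The QDevice gives its vote to exactly one side, so the surviving node plus the QDevice reach 2 votes and stay quorate, while an isolated node alone is stuck at 1 and blocked.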
 
@fiona Got it! The other node is down due to a hardware fault, so I can be sure that this happens only on the second node right now.

Regarding the QDevice, does it make sense to install a QDevice in a VM on each node independently? This way there would be 4 nodes: 2 PVE and 2 QDevices, and if one piece of hardware fails, two devices go down and we still have 2 alive.
 
@fiona Got it! The other node is down due to a hardware fault, so I can be sure that this happens only on the second node right now.

Regarding the QDevice, does it make sense to install a QDevice in a VM on each node independently? This way there would be 4 nodes: 2 PVE and 2 QDevices, and if one piece of hardware fails, two devices go down and we still have 2 alive.
No, the QDevice needs to be physical. If it's an HA-managed VM, there will be confusion about which node the VM belongs to if there's a communication failure. And you should only ever use one QDevice (I'm not even sure multiple are possible), or you'll have the same half-and-half problem ;)
 
@fiona Got it! The other node is down due to a hardware fault, so I can be sure that this happens only on the second node right now.
Just to be sure: change the expected votes to 1, do your changes, and change it back to 2 BEFORE the other node starts up again. If you don't set it back, you will end up in the split-brain scenario @fiona described.
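A minimal sketch of that sequence (only while the other node is guaranteed to stay offline):

Code:
# on the surviving node, with the other node confirmed down
pvecm expected 1    # grants quorum again, /etc/pve becomes writable
# ... create the snapshot / do the changes ...
pvecm expected 2    # restore the original expected votes BEFORE the other node returns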

Concerning the QDevice, you can just run it in a VM on another cluster or machine that is not part of your 2-node PVE cluster; that is no problem and is often the case in HA PVE setups with multiple sites. This is very common in HA setups with other cluster technologies as well (e.g. Oracle RAC).
 
@LnxBil Good point, thanks!

I did have a look at the documentation that describes installing a QDevice. Is it really that easy? I also found other guides that are way more complicated than installing an empty PVE node. If the below is all I need, then this could run in a Docker container on my NAS without any additional hardware.

[Screenshot of the QDevice setup section from the Proxmox documentation]
 
Yes, a QDevice is as easy as it sounds, but it has to be off-site (with respect to the cluster itself). It is just a third vote, the tie-breaking one of course :-D
You can easily install it on your NAS or a Pi or whatever you want.
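If I remember the docs correctly, the whole setup boils down to this (192.0.2.10 is just a placeholder for wherever the QDevice runs):

Code:
# on the external machine (NAS, Pi, VM outside the cluster):
apt install corosync-qnetd

# on every cluster node:
apt install corosync-qdevice

# on one cluster node, pointing at the external machine:
pvecm qdevice setup 192.0.2.10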
 
