[solved] /etc/pve/nodes/node/qemu-server/ readonly after vm migrate from other node

m.ardito

I have a problem similar to this OP
http://forum.proxmox.com/threads/14515-Proxmox-VE-3-0-Clustering-setup-broke-permissions

but I got there by a different path:

I have a cluster of 2 nodes; yesterday I had to (live) migrate all VMs from one node to the other (why? see this thread).

I left the "empty" node stopped for maintenance.
During the night, on the remaining node, ALL backups except the only OpenVZ CT failed for the same reason:
Code:
VMID    NAME    STATUS    TIME    SIZE    FILENAME
102    VM 102    FAILED    00:00:00    command 'qm set 102 --lock backup' failed: exit code 2
108    openvzmwserve.proxmox    OK    00:01:22    564MB    /mnt/pve/iso_qnap/dump/vzdump-openvz-108-2013_07_12-01_00_02.tar.lzo
202    VM 202    FAILED    00:00:01    command 'qm set 202 --lock backup' failed: exit code 2
203    VM 203    FAILED    00:00:00    command 'qm set 203 --lock backup' failed: exit code 2
205    VM 205    FAILED    00:00:01    command 'qm set 205 --lock backup' failed: exit code 2
206    VM 206    FAILED    00:00:00    command 'qm set 206 --lock backup' failed: exit code 2
207    VM 207    FAILED    00:00:00    command 'qm set 207 --lock backup' failed: exit code 2
209    VM 209    FAILED    00:00:01    command 'qm set 209 --lock backup' failed: exit code 2
211    VM 211    FAILED    00:00:00    command 'qm set 211 --lock backup' failed: exit code 2
300    VM 300    FAILED    00:00:01    command 'qm set 300 --lock backup' failed: exit code 2
301    VM 301    FAILED    00:00:00    command 'qm set 301 --lock backup' failed: exit code 2
302    VM 302    FAILED    00:00:00    command 'qm set 302 --lock backup' failed: exit code 2
310    VM 310    FAILED    00:00:01    command 'qm set 310 --lock backup' failed: exit code 2
400    VM 400    FAILED    00:00:00    command 'qm set 400 --lock backup' failed: exit code 2
900    VM 900    FAILED    00:00:01    command 'qm set 900 --lock backup' failed: exit code 2
TOTAL    00:01:28    564MB    


Detailed backup logs:

vzdump 108 202 203 204 205 206 207 209 211 300 301 302 310 400 101 900 102 --quiet 1 --mailto mail_address --mode snapshot --compress lzo --storage iso_qnap

102: Jul 12 01:00:02 INFO: Starting Backup of VM 102 (qemu)
102: Jul 12 01:00:02 INFO: status = running
102: Jul 12 01:00:02 INFO: unable to open file '/etc/pve/nodes/pve2/qemu-server/102.conf.tmp.764986' - Permission denied
102: Jul 12 01:00:02 ERROR: Backup of VM 102 failed - command 'qm set 102 --lock backup' failed: exit code 2

108: Jul 12 01:00:02 INFO: Starting Backup of VM 108 (openvz)
108: Jul 12 01:00:02 INFO: CTID 108 exist mounted running
108: Jul 12 01:00:02 INFO: status = running
108: Jul 12 01:00:02 INFO: backup mode: snapshot
108: Jul 12 01:00:02 INFO: ionice priority: 7
108: Jul 12 01:00:02 INFO: creating lvm snapshot of /dev/mapper/pve-data ('/dev/pve/vzsnap-pve2-0')
108: Jul 12 01:00:03 INFO:   Logical volume "vzsnap-pve2-0" created
108: Jul 12 01:00:03 INFO: creating archive '/mnt/pve/iso_qnap/dump/vzdump-openvz-108-2013_07_12-01_00_02.tar.lzo'
108: Jul 12 01:01:15 INFO: Total bytes written: 1015971840 (969MiB, 16MiB/s)
108: Jul 12 01:01:22 INFO: archive file size: 564MB
108: Jul 12 01:01:22 INFO: delete old backup '/mnt/pve/iso_qnap/dump/vzdump-openvz-108-2013_07_10-01_00_02.tar.lzo'
108: Jul 12 01:01:24 INFO: Finished Backup of VM 108 (00:01:22)

202: Jul 12 01:01:24 INFO: Starting Backup of VM 202 (qemu)
202: Jul 12 01:01:24 INFO: status = running
202: Jul 12 01:01:25 INFO: unable to open file '/etc/pve/nodes/pve2/qemu-server/202.conf.tmp.765347' - Permission denied
202: Jul 12 01:01:25 ERROR: Backup of VM 202 failed - command 'qm set 202 --lock backup' failed: exit code 2

203: Jul 12 01:01:25 INFO: Starting Backup of VM 203 (qemu)
203: Jul 12 01:01:25 INFO: status = running
203: Jul 12 01:01:25 INFO: unable to open file '/etc/pve/nodes/pve2/qemu-server/203.conf.tmp.765351' - Permission denied
203: Jul 12 01:01:25 ERROR: Backup of VM 203 failed - command 'qm set 203 --lock backup' failed: exit code 2

205: Jul 12 01:01:25 INFO: Starting Backup of VM 205 (qemu)
205: Jul 12 01:01:25 INFO: status = running
205: Jul 12 01:01:26 INFO: unable to open file '/etc/pve/nodes/pve2/qemu-server/205.conf.tmp.765355' - Permission denied
205: Jul 12 01:01:26 ERROR: Backup of VM 205 failed - command 'qm set 205 --lock backup' failed: exit code 2

206: Jul 12 01:01:26 INFO: Starting Backup of VM 206 (qemu)
206: Jul 12 01:01:26 INFO: status = running
206: Jul 12 01:01:26 INFO: unable to open file '/etc/pve/nodes/pve2/qemu-server/206.conf.tmp.765359' - Permission denied
206: Jul 12 01:01:26 ERROR: Backup of VM 206 failed - command 'qm set 206 --lock backup' failed: exit code 2

207: Jul 12 01:01:26 INFO: Starting Backup of VM 207 (qemu)
207: Jul 12 01:01:26 INFO: status = running
207: Jul 12 01:01:26 INFO: unable to open file '/etc/pve/nodes/pve2/qemu-server/207.conf.tmp.765363' - Permission denied
207: Jul 12 01:01:26 ERROR: Backup of VM 207 failed - command 'qm set 207 --lock backup' failed: exit code 2

209: Jul 12 01:01:26 INFO: Starting Backup of VM 209 (qemu)
209: Jul 12 01:01:26 INFO: status = running
209: Jul 12 01:01:27 INFO: unable to open file '/etc/pve/nodes/pve2/qemu-server/209.conf.tmp.765367' - Permission denied
209: Jul 12 01:01:27 ERROR: Backup of VM 209 failed - command 'qm set 209 --lock backup' failed: exit code 2

211: Jul 12 01:01:27 INFO: Starting Backup of VM 211 (qemu)
211: Jul 12 01:01:27 INFO: status = running
211: Jul 12 01:01:27 INFO: unable to open file '/etc/pve/nodes/pve2/qemu-server/211.conf.tmp.765371' - Permission denied
211: Jul 12 01:01:27 ERROR: Backup of VM 211 failed - command 'qm set 211 --lock backup' failed: exit code 2

300: Jul 12 01:01:27 INFO: Starting Backup of VM 300 (qemu)
300: Jul 12 01:01:27 INFO: status = running
300: Jul 12 01:01:27 INFO: unable to open file '/etc/pve/nodes/pve2/qemu-server/300.conf.tmp.765375' - Permission denied
300: Jul 12 01:01:28 ERROR: Backup of VM 300 failed - command 'qm set 300 --lock backup' failed: exit code 2

301: Jul 12 01:01:28 INFO: Starting Backup of VM 301 (qemu)
301: Jul 12 01:01:28 INFO: status = running
301: Jul 12 01:01:28 INFO: unable to open file '/etc/pve/nodes/pve2/qemu-server/301.conf.tmp.765379' - Permission denied
301: Jul 12 01:01:28 ERROR: Backup of VM 301 failed - command 'qm set 301 --lock backup' failed: exit code 2

302: Jul 12 01:01:28 INFO: Starting Backup of VM 302 (qemu)
302: Jul 12 01:01:28 INFO: status = running
302: Jul 12 01:01:28 INFO: unable to open file '/etc/pve/nodes/pve2/qemu-server/302.conf.tmp.765405' - Permission denied
302: Jul 12 01:01:28 ERROR: Backup of VM 302 failed - command 'qm set 302 --lock backup' failed: exit code 2

310: Jul 12 01:01:28 INFO: Starting Backup of VM 310 (qemu)
310: Jul 12 01:01:28 INFO: status = running
310: Jul 12 01:01:29 INFO: unable to open file '/etc/pve/nodes/pve2/qemu-server/310.conf.tmp.765423' - Permission denied
310: Jul 12 01:01:29 ERROR: Backup of VM 310 failed - command 'qm set 310 --lock backup' failed: exit code 2

400: Jul 12 01:01:29 INFO: Starting Backup of VM 400 (qemu)
400: Jul 12 01:01:29 INFO: status = running
400: Jul 12 01:01:29 INFO: unable to open file '/etc/pve/nodes/pve2/qemu-server/400.conf.tmp.765427' - Permission denied
400: Jul 12 01:01:29 ERROR: Backup of VM 400 failed - command 'qm set 400 --lock backup' failed: exit code 2

900: Jul 12 01:01:29 INFO: Starting Backup of VM 900 (qemu)
900: Jul 12 01:01:29 INFO: status = running
900: Jul 12 01:01:29 INFO: unable to open file '/etc/pve/nodes/pve2/qemu-server/900.conf.tmp.765431' - Permission denied
900: Jul 12 01:01:30 ERROR: Backup of VM 900 failed - command 'qm set 900 --lock backup' failed: exit code 2

Checking on the filesystem:

Code:
 ls -lah /etc/pve/nodes/pve2/qemu-server/
total 9.5K
dr-xr-x--- 2 root www-data   0 May 18  2012 .
dr-xr-x--- 2 root www-data   0 May 18  2012 ..
-r--r----- 1 root www-data 393 Jun 19  2012 100.conf
-r--r----- 1 root www-data 183 Jul 11 16:00 102.conf
-r--r----- 1 root www-data 202 Apr 17 11:28 200.conf
-r--r----- 1 root www-data 260 Jul 11 01:08 202.conf
-r--r----- 1 root www-data 262 Jul 11 01:14 203.conf
-r--r----- 1 root www-data 259 Jul 11 01:36 205.conf
-r--r----- 1 root www-data 312 Jul 11 01:53 206.conf
-r--r----- 1 root www-data 299 Jul 11 02:07 207.conf
-r--r----- 1 root www-data 342 Jul 11 02:25 209.conf
-r--r----- 1 root www-data 247 Jul 11 02:37 211.conf
-r--r----- 1 root www-data 180 Apr 12 16:55 216.conf
-r--r----- 1 root www-data 330 Jul 11 02:42 300.conf
-r--r----- 1 root www-data 281 Jul 11 03:02 301.conf
-r--r----- 1 root www-data 214 Jul 11 03:13 302.conf
-r--r----- 1 root www-data 333 Mar 27 02:41 306.conf
-r--r----- 1 root www-data 332 Mar 22 10:33 309.conf
-r--r----- 1 root www-data 288 Jul 11 03:41 310.conf
-r--r----- 1 root www-data 263 Jul 11 15:58 400.conf
-r--r----- 1 root www-data 224 Jul 11 15:57 900.conf

Of course my cluster (not HA-managed) sees only one node:

Code:
# pvecm nodes
Node  Sts   Inc   Joined               Name
   1   M     80   2013-07-05 16:15:25  pve2
   2   X     88                        pve1

and
Code:
# pvecm status
Version: 6.2.0
Config Version: 2
Cluster Name: pvecluster
Cluster Id: 48308
Cluster Member: Yes
Cluster Generation: 92
Membership state: Cluster-Member
Nodes: 1
Expected votes: 2
Total votes: 1
Node votes: 1
Quorum: 2 Activity blocked
Active subsystems: 5
Flags:
Ports Bound: 0
Node name: pve2
Node ID: 1
Multicast addresses: 239.192.188.113
Node addresses: xxx.xxx.xxx.xxx


Why is this happening?

I guess the answer is here:
Code:
Nodes: 1
Expected votes: 2
Total votes: 1
Node votes: 1
Quorum: 2 Activity blocked

But I have no HA, and it's (for me) perfectly fine that 1 node is down...

What should I do now, and what happens when the other node comes up again?

I will then have to do the reverse: move all VMs back to the first node and stop the second...
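For anyone landing here with the same symptom, this is a minimal sketch of how one can confirm that the read-only /etc/pve comes from the lost quorum (the test file name is arbitrary, nothing else is assumed):

Code:
# quorum state as seen by the cluster stack
pvecm status | grep -Ei 'expected votes|total votes|quorum'

# pmxcfs mounts /etc/pve and switches it to read-only when quorum is lost,
# so any write attempt fails with 'Permission denied'
touch /etc/pve/quorum-test || echo "/etc/pve is read-only (no quorum)"
rm -f /etc/pve/quorum-test 2>/dev/null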

Code:
pveversion -v
pve-manager: 2.1-1 (pve-manager/2.1/f9b0f63a)
running kernel: 2.6.32-11-pve
proxmox-ve-2.6.32: 2.0-66
pve-kernel-2.6.32-11-pve: 2.6.32-66
lvm2: 2.02.95-1pve2
clvm: 2.02.95-1pve2
corosync-pve: 1.4.3-1
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.8-3
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.7-2
pve-cluster: 1.0-26
qemu-server: 2.0-39
pve-firmware: 1.0-15
libpve-common-perl: 1.0-27
libpve-access-control: 1.0-21
libpve-storage-perl: 2.0-18
vncterm: 1.0-2
vzctl: 3.0.30-2pve5
vzprocps: 2.0.11-2
vzquota: 3.0.12-3
pve-qemu-kvm: 1.0-9
ksm-control-daemon: 1.1-1

Marco
 
Re: /etc/pve/nodes/node/qemu-server/ readonly after vm migrate from other node

Thanks,

I always skipped those pages as I thought they were only about HA setups; anyway:

cat /etc/pve/cluster.conf
Code:
<?xml version="1.0"?>
<cluster name="pvecluster" config_version="2">

  <cman keyfile="/var/lib/pve-cluster/corosync.authkey">
  </cman>

  <clusternodes>
  <clusternode name="pve2" votes="1" nodeid="1"/>
  <clusternode name="pve1" votes="1" nodeid="2"/></clusternodes>

</cluster>

Should I edit it as cluster.conf.new so it becomes:

Code:
<?xml version="1.0"?>
<cluster name="pvecluster" config_version="3">

  <cman keyfile="/var/lib/pve-cluster/corosync.authkey" two_node="1" expected_votes="1">
  </cman>

  <clusternodes>
  <clusternode name="pve2" votes="1" nodeid="1"/>
  <clusternode name="pve1" votes="1" nodeid="2"/></clusternodes>

</cluster>

And then? Should I follow the same steps the wiki describes for HA (the HA tab)?

But do you mean "when the other node is back"?

Because at the moment I can't; the GUI shows

Code:
cluster not ready - no quorum? (500)

and the spinning wheel never stops...

Otherwise?

Marco
 
Re: /etc/pve/nodes/node/qemu-server/ readonly after vm migrate from other node

Yes, after you make the changes in cluster.conf.new (and validate them), just activate via the GUI.
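For the record, a minimal sketch of that workflow from the shell; the xmllint check is only an optional well-formedness test added here, the real validation and activation happen in the GUI (Datacenter -> HA tab), as described above:

Code:
# work on a copy; /etc/pve must be writable, i.e. the cluster must have quorum
cp /etc/pve/cluster.conf /etc/pve/cluster.conf.new

# edit the copy: bump config_version and add the two_node/expected_votes attributes to <cman>
nano /etc/pve/cluster.conf.new

# optional: quick check that the XML is still well formed
xmllint --noout /etc/pve/cluster.conf.new

# then activate the new config from the GUI (Datacenter -> HA)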
 
Re: /etc/pve/nodes/node/qemu-server/ readonly after vm migrate from other node

Yes, otherwise you cannot write to /etc/pve/...
 
Re: /etc/pve/nodes/node/qemu-server/ readonly after vm migrate from other node

Interesting, thanks! But what I would love to achieve with a 2-node cluster without HA is this:
- set it up for 2 nodes as you suggested, and let's say I have shared storage with VM 101 running on the first node and VM 102 on the second node
- if node 2 dies, have a button in the web interface for node 1 that says "promote me as the master of the universe" ;P Maybe a warning message could be enough ("do you really know what you are doing?" ;P)
- move VM 102, which I'm sure is off since I'm there in the "datacenter" and I can see the smoke coming from node 2, to node 1
- start VM 102 on node 1
- repair node 2 (i.e. replace the power supply, change the motherboard or whatever)
- turn node 2 back ON so that it joins the cluster again, finds the "master of the universe" and fetches its config as the good one, so it will NOT start VM 102 since it no longer belongs to it
- from the web interface, "unpromote" node 1 and have the cluster work normally again

Currently I instead have to go into bash and move config files manually (see the sketch below).
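To be explicit about what "move config files manually" means, here is a sketch using the node and VM numbers from the example above; it is only safe when the VM is definitely not running anywhere, the dead node really stays down, and /etc/pve is writable:

Code:
# reassign VM 102 from the dead node 2 to node 1 by moving its config file
# (directory names are illustrative; use the real node names of the cluster)
mv /etc/pve/nodes/node2/qemu-server/102.conf /etc/pve/nodes/node1/qemu-server/102.conf

# the VM now belongs to node 1 and can be started there
qm start 102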
 
Re: /etc/pve/nodes/node/qemu-server/ readonly after vm migrate from other node

Just want to tell you that such settings are dangerous, because the whole cluster can get into an inconsistent state. There is a reason why quorum is required.
 
Re: /etc/pve/nodes/node/qemu-server/ readonly after vm migrate from other node

Please can you remove that two_node="1" expected_votes="1" suggestion - I think that is very dangerous!

It's also documented like this in 'man cman'; that's the reason why someone wrote it in our wiki.
 
Re: /etc/pve/nodes/node/qemu-server/ readonly after vm migrate from other node

Please can you remove that two_node="1" expected_votes="1" suggestion - I think that is very dangerous!

Dietmar, I thank you for your hint, and I'm sure that, as a developer, you have a strong point and deeper knowledge, but...
can you explain in some more detail what you fear could happen, so that I can better understand the risks and edit that wiki section?

As I said before: the cluster has no HA setup. I had to take the node down for maintenance, so before that I migrated everything to the other node. All of my backups were then failing on the remaining node because of the read-only locking. So I followed Tom's hint and (for now) executed

#pvecm expected 1

And nothing else; all backups then went fine.
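In other words, the sequence on the surviving node was roughly this (a sketch of what I ran; the grep is only there to shorten the output):

Code:
# before: 1 of 2 expected votes -> no quorum, /etc/pve is read-only
pvecm status | grep -Ei 'expected votes|total votes|quorum'

# tell the cluster that 1 vote is enough for quorum (runtime change only)
pvecm expected 1

# /etc/pve is writable again, so 'qm set <vmid> --lock backup' and the backups work
pvecm status | grep -i quorum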

Now I need to power that node up again: what could happen in each of these cases?
1) if I leave that "expected" value as it is: 1
2) if I edit /etc/pve/cluster.conf adding two_node="1" expected_votes="1"
3) if I revert the CLI command with # pvecm expected 2
4) other?
...

I think all users that have a 2-node setup, even without HA, should know what to do when they need to bring one node down for maintenance (but also for an unplanned outage such as a faulty node), not just what NOT to do because "it's dangerous", without any explanation (it sounds a bit like FUD).

Is a 2-node cluster not supported under PVE? If not, users should be warned not to use it. If yes, they should know how it should be treated, and how to safely manage a perfectly normal situation like having to switch off a node for maintenance.

Of course this also applies to 3 and more nodes, and in particular to HA setups, because they're expected to "shoot each other in the head" in case of a failure and restart inaccessible VMs on other cluster nodes automatically, with all the risk involved; but in this case everything was planned, VMs were moved in advance, and nothing else changed in the cluster...

If it is dangerous, it would of course be interesting and useful to know why, but it would be even more useful to know how to manage such situations properly, even if the suggestion from the manual of a tool shipped with PVE itself (in this case a third-party tool) is dangerous for some other reason in the PVE environment.

I know PVE is a complex beast and changes really fast, but to help I will be happy to document this kind of knowledge in the PVE wiki, once I'm sure I have understood it. :)

Marco
 
Re: /etc/pve/nodes/node/qemu-server/ readonly after vm migrate from other node

Dietmar, I thank you for your hint, and I'm sure that, as a developer, you have a strong point and deeper knowledge, but...
can you explain in some more detail what you fear could happen, so that I can better understand the risks and edit that wiki section?

You can run into a split brain, because you allow each node to update values without having quorum.

As I said before: the cluster has no HA setup. I had to take the node down for maintenance, so before that I migrated everything to the other node. All of my backups were then failing on the remaining node because of the read-only locking. So I followed Tom's hint and (for now) executed

#pvecm expected 1

And nothing else; all backups then went fine.

Now I need to power that node up again: what could happen in each of these cases?
1) if I leave that "expected" value as it is: 1

This is automatically updated if a new node joins.
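So, once the second node is powered on and has rejoined, something like this should be enough to verify that the votes are back to normal (a sketch; setting expected back to 2 is only needed if it does not recover by itself):

Code:
# both nodes should be listed as members again
pvecm nodes

# expected/total votes and quorum should be back to their normal values
pvecm status | grep -Ei 'expected votes|total votes|quorum'

# if needed, set the expected votes back explicitly
pvecm expected 2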
 
Re: /etc/pve/nodes/node/qemu-server/ readonly after vm migrate from other node

I have made the following configuration for unicast (transport="udpu") in cluster.conf.new:

Code:
root@node3:~# cat /etc/pve/cluster.conf.new
<?xml version="1.0"?>
<cluster name="master3" config_version="3">


<cman keyfile="/var/lib/pve-cluster/corosync.authkey" transport="udpu">
</cman>


<clusternodes>
<clusternode name="node3" votes="1" nodeid="1"/>
<clusternode name="node1" votes="1" nodeid="2"/></clusternodes>


</cluster>

Code:
root@node3:~# service cman status
fenced is stopped

root@node3:~# service cman restart
Stopping cluster:
Stopping dlm_controld... [ OK ]
Stopping fenced... [ OK ]
Stopping cman... [ OK ]
Waiting for corosync to shutdown:[ OK ]
Unloading kernel modules... [ OK ]
Unmounting configfs... [ OK ]
Starting cluster:
Checking if cluster has been disabled at boot... [ OK ]
Checking Network Manager... [ OK ]
Global setup... [ OK ]
Loading kernel modules... [ OK ]
Mounting configfs... [ OK ]
Starting cman... [ OK ]
Waiting for quorum... Timed-out waiting for cluster
[FAILED]

Code:
root@node3:~# pveversion -v
proxmox-ve-2.6.32: 3.1-109 (running kernel: 2.6.32-23-pve)
pve-manager: 3.1-3 (running version: 3.1-3/dc0e9b0e)
pve-kernel-2.6.32-23-pve: 2.6.32-109
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.5-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.0-1
pve-cluster: 3.0-7
qemu-server: 3.1-1
pve-firmware: 1.0-23
libpve-common-perl: 3.0-6
libpve-access-control: 3.0-6
libpve-storage-perl: 3.0-10
pve-libspice-server1: 0.12.4-1
vncterm: 1.1-4
vzctl: 4.0-1pve3
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 1.4-17
ksm-control-daemon: 1.1-1
glusterfs-client: 3.4.0-2

Please help me solve this cluster problem.
I can't understand how to proceed with this.

Thanks in advance.
Nasim
 
