[solved] /etc/pve/nodes/node/qemu-server/ readonly after vm migrate from other node

m.ardito · Jul 12, 2013

I have a problem similar to this OP
http://forum.proxmox.com/threads/14515-Proxmox-VE-3-0-Clustering-setup-broke-permissions

but coming from different path:

cluster of 2 nodes, yesterday I had to (live) migrate all vms from one node to the other (why? see this thread)

I left the "empty" node stopped for maintenance
during the night on the other node, ALL backups, except the only openvz CT, failed for the same reason:

Code:

VMID    NAME    STATUS    TIME    SIZE    FILENAME
102    VM 102    FAILED    00:00:00    command 'qm set 102 --lock backup' failed: exit code 2
108    openvzmwserve.proxmox    OK    00:01:22    564MB    /mnt/pve/iso_qnap/dump/vzdump-openvz-108-2013_07_12-01_00_02.tar.lzo
202    VM 202    FAILED    00:00:01    command 'qm set 202 --lock backup' failed: exit code 2
203    VM 203    FAILED    00:00:00    command 'qm set 203 --lock backup' failed: exit code 2
205    VM 205    FAILED    00:00:01    command 'qm set 205 --lock backup' failed: exit code 2
206    VM 206    FAILED    00:00:00    command 'qm set 206 --lock backup' failed: exit code 2
207    VM 207    FAILED    00:00:00    command 'qm set 207 --lock backup' failed: exit code 2
209    VM 209    FAILED    00:00:01    command 'qm set 209 --lock backup' failed: exit code 2
211    VM 211    FAILED    00:00:00    command 'qm set 211 --lock backup' failed: exit code 2
300    VM 300    FAILED    00:00:01    command 'qm set 300 --lock backup' failed: exit code 2
301    VM 301    FAILED    00:00:00    command 'qm set 301 --lock backup' failed: exit code 2
302    VM 302    FAILED    00:00:00    command 'qm set 302 --lock backup' failed: exit code 2
310    VM 310    FAILED    00:00:01    command 'qm set 310 --lock backup' failed: exit code 2
400    VM 400    FAILED    00:00:00    command 'qm set 400 --lock backup' failed: exit code 2
900    VM 900    FAILED    00:00:01    command 'qm set 900 --lock backup' failed: exit code 2
TOTAL    00:01:28    564MB    


Detailed backup logs:

vzdump 108 202 203 204 205 206 207 209 211 300 301 302 310 400 101 900 102 --quiet 1 --mailto mail_address --mode snapshot --compress lzo --storage iso_qnap

102: Jul 12 01:00:02 INFO: Starting Backup of VM 102 (qemu)
102: Jul 12 01:00:02 INFO: status = running
102: Jul 12 01:00:02 INFO: unable to open file '/etc/pve/nodes/pve2/qemu-server/102.conf.tmp.764986' - Permission denied
102: Jul 12 01:00:02 ERROR: Backup of VM 102 failed - command 'qm set 102 --lock backup' failed: exit code 2

108: Jul 12 01:00:02 INFO: Starting Backup of VM 108 (openvz)
108: Jul 12 01:00:02 INFO: CTID 108 exist mounted running
108: Jul 12 01:00:02 INFO: status = running
108: Jul 12 01:00:02 INFO: backup mode: snapshot
108: Jul 12 01:00:02 INFO: ionice priority: 7
108: Jul 12 01:00:02 INFO: creating lvm snapshot of /dev/mapper/pve-data ('/dev/pve/vzsnap-pve2-0')
108: Jul 12 01:00:03 INFO:   Logical volume "vzsnap-pve2-0" created
108: Jul 12 01:00:03 INFO: creating archive '/mnt/pve/iso_qnap/dump/vzdump-openvz-108-2013_07_12-01_00_02.tar.lzo'
108: Jul 12 01:01:15 INFO: Total bytes written: 1015971840 (969MiB, 16MiB/s)
108: Jul 12 01:01:22 INFO: archive file size: 564MB
108: Jul 12 01:01:22 INFO: delete old backup '/mnt/pve/iso_qnap/dump/vzdump-openvz-108-2013_07_10-01_00_02.tar.lzo'
108: Jul 12 01:01:24 INFO: Finished Backup of VM 108 (00:01:22)

202: Jul 12 01:01:24 INFO: Starting Backup of VM 202 (qemu)
202: Jul 12 01:01:24 INFO: status = running
202: Jul 12 01:01:25 INFO: unable to open file '/etc/pve/nodes/pve2/qemu-server/202.conf.tmp.765347' - Permission denied
202: Jul 12 01:01:25 ERROR: Backup of VM 202 failed - command 'qm set 202 --lock backup' failed: exit code 2

203: Jul 12 01:01:25 INFO: Starting Backup of VM 203 (qemu)
203: Jul 12 01:01:25 INFO: status = running
203: Jul 12 01:01:25 INFO: unable to open file '/etc/pve/nodes/pve2/qemu-server/203.conf.tmp.765351' - Permission denied
203: Jul 12 01:01:25 ERROR: Backup of VM 203 failed - command 'qm set 203 --lock backup' failed: exit code 2

205: Jul 12 01:01:25 INFO: Starting Backup of VM 205 (qemu)
205: Jul 12 01:01:25 INFO: status = running
205: Jul 12 01:01:26 INFO: unable to open file '/etc/pve/nodes/pve2/qemu-server/205.conf.tmp.765355' - Permission denied
205: Jul 12 01:01:26 ERROR: Backup of VM 205 failed - command 'qm set 205 --lock backup' failed: exit code 2

206: Jul 12 01:01:26 INFO: Starting Backup of VM 206 (qemu)
206: Jul 12 01:01:26 INFO: status = running
206: Jul 12 01:01:26 INFO: unable to open file '/etc/pve/nodes/pve2/qemu-server/206.conf.tmp.765359' - Permission denied
206: Jul 12 01:01:26 ERROR: Backup of VM 206 failed - command 'qm set 206 --lock backup' failed: exit code 2

207: Jul 12 01:01:26 INFO: Starting Backup of VM 207 (qemu)
207: Jul 12 01:01:26 INFO: status = running
207: Jul 12 01:01:26 INFO: unable to open file '/etc/pve/nodes/pve2/qemu-server/207.conf.tmp.765363' - Permission denied
207: Jul 12 01:01:26 ERROR: Backup of VM 207 failed - command 'qm set 207 --lock backup' failed: exit code 2

209: Jul 12 01:01:26 INFO: Starting Backup of VM 209 (qemu)
209: Jul 12 01:01:26 INFO: status = running
209: Jul 12 01:01:27 INFO: unable to open file '/etc/pve/nodes/pve2/qemu-server/209.conf.tmp.765367' - Permission denied
209: Jul 12 01:01:27 ERROR: Backup of VM 209 failed - command 'qm set 209 --lock backup' failed: exit code 2

211: Jul 12 01:01:27 INFO: Starting Backup of VM 211 (qemu)
211: Jul 12 01:01:27 INFO: status = running
211: Jul 12 01:01:27 INFO: unable to open file '/etc/pve/nodes/pve2/qemu-server/211.conf.tmp.765371' - Permission denied
211: Jul 12 01:01:27 ERROR: Backup of VM 211 failed - command 'qm set 211 --lock backup' failed: exit code 2

300: Jul 12 01:01:27 INFO: Starting Backup of VM 300 (qemu)
300: Jul 12 01:01:27 INFO: status = running
300: Jul 12 01:01:27 INFO: unable to open file '/etc/pve/nodes/pve2/qemu-server/300.conf.tmp.765375' - Permission denied
300: Jul 12 01:01:28 ERROR: Backup of VM 300 failed - command 'qm set 300 --lock backup' failed: exit code 2

301: Jul 12 01:01:28 INFO: Starting Backup of VM 301 (qemu)
301: Jul 12 01:01:28 INFO: status = running
301: Jul 12 01:01:28 INFO: unable to open file '/etc/pve/nodes/pve2/qemu-server/301.conf.tmp.765379' - Permission denied
301: Jul 12 01:01:28 ERROR: Backup of VM 301 failed - command 'qm set 301 --lock backup' failed: exit code 2

302: Jul 12 01:01:28 INFO: Starting Backup of VM 302 (qemu)
302: Jul 12 01:01:28 INFO: status = running
302: Jul 12 01:01:28 INFO: unable to open file '/etc/pve/nodes/pve2/qemu-server/302.conf.tmp.765405' - Permission denied
302: Jul 12 01:01:28 ERROR: Backup of VM 302 failed - command 'qm set 302 --lock backup' failed: exit code 2

310: Jul 12 01:01:28 INFO: Starting Backup of VM 310 (qemu)
310: Jul 12 01:01:28 INFO: status = running
310: Jul 12 01:01:29 INFO: unable to open file '/etc/pve/nodes/pve2/qemu-server/310.conf.tmp.765423' - Permission denied
310: Jul 12 01:01:29 ERROR: Backup of VM 310 failed - command 'qm set 310 --lock backup' failed: exit code 2

400: Jul 12 01:01:29 INFO: Starting Backup of VM 400 (qemu)
400: Jul 12 01:01:29 INFO: status = running
400: Jul 12 01:01:29 INFO: unable to open file '/etc/pve/nodes/pve2/qemu-server/400.conf.tmp.765427' - Permission denied
400: Jul 12 01:01:29 ERROR: Backup of VM 400 failed - command 'qm set 400 --lock backup' failed: exit code 2

900: Jul 12 01:01:29 INFO: Starting Backup of VM 900 (qemu)
900: Jul 12 01:01:29 INFO: status = running
900: Jul 12 01:01:29 INFO: unable to open file '/etc/pve/nodes/pve2/qemu-server/900.conf.tmp.765431' - Permission denied
900: Jul 12 01:01:30 ERROR: Backup of VM 900 failed - command 'qm set 900 --lock backup' failed: exit code 2

checking on the filesystem:

Code:

 ls -lah /etc/pve/nodes/pve2/qemu-server/
total 9.5K
dr-xr-x--- 2 root www-data   0 May 18  2012 .
dr-xr-x--- 2 root www-data   0 May 18  2012 ..
-r--r----- 1 root www-data 393 Jun 19  2012 100.conf
-r--r----- 1 root www-data 183 Jul 11 16:00 102.conf
-r--r----- 1 root www-data 202 Apr 17 11:28 200.conf
-r--r----- 1 root www-data 260 Jul 11 01:08 202.conf
-r--r----- 1 root www-data 262 Jul 11 01:14 203.conf
-r--r----- 1 root www-data 259 Jul 11 01:36 205.conf
-r--r----- 1 root www-data 312 Jul 11 01:53 206.conf
-r--r----- 1 root www-data 299 Jul 11 02:07 207.conf
-r--r----- 1 root www-data 342 Jul 11 02:25 209.conf
-r--r----- 1 root www-data 247 Jul 11 02:37 211.conf
-r--r----- 1 root www-data 180 Apr 12 16:55 216.conf
-r--r----- 1 root www-data 330 Jul 11 02:42 300.conf
-r--r----- 1 root www-data 281 Jul 11 03:02 301.conf
-r--r----- 1 root www-data 214 Jul 11 03:13 302.conf
-r--r----- 1 root www-data 333 Mar 27 02:41 306.conf
-r--r----- 1 root www-data 332 Mar 22 10:33 309.conf
-r--r----- 1 root www-data 288 Jul 11 03:41 310.conf
-r--r----- 1 root www-data 263 Jul 11 15:58 400.conf
-r--r----- 1 root www-data 224 Jul 11 15:57 900.conf

of course my cluster (not ha managed) sees only one node

# pvecm nodes

Code:

Node  Sts   Inc   Joined               Name
   1   M     80   2013-07-05 16:15:25  pve2
   2   X     88                        pve1

and

Code:

# pvecm status
Version: 6.2.0
Config Version: 2
Cluster Name: pvecluster
Cluster Id: 48308
Cluster Member: Yes
Cluster Generation: 92
Membership state: Cluster-Member
Nodes: 1
Expected votes: 2
Total votes: 1
Node votes: 1
Quorum: 2 Activity blocked
Active subsystems: 5
Flags:
Ports Bound: 0
Node name: pve2
Node ID: 1
Multicast addresses: 239.192.188.113
Node addresses: xxx.xxx.xxx.xxx

? why is this happening?

i guess answer is here

Code:

Nodes: 1
Expected votes: 2
Total votes: 1
Node votes: 1
Quorum: 2 Activity blocked

but I have no HA, and it's (for me) perfectly fine that 1 node is down...

what to do now, and when the other node will come up again?

I will have to do the reverse, moving all vms to the first node, and stopping the second...

Code:

pveversion -v
pve-manager: 2.1-1 (pve-manager/2.1/f9b0f63a)
running kernel: 2.6.32-11-pve
proxmox-ve-2.6.32: 2.0-66
pve-kernel-2.6.32-11-pve: 2.6.32-66
lvm2: 2.02.95-1pve2
clvm: 2.02.95-1pve2
corosync-pve: 1.4.3-1
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.8-3
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.7-2
pve-cluster: 1.0-26
qemu-server: 2.0-39
pve-firmware: 1.0-15
libpve-common-perl: 1.0-27
libpve-access-control: 1.0-21
libpve-storage-perl: 2.0-18
vncterm: 1.0-2
vzctl: 3.0.30-2pve5
vzprocps: 2.0.11-2
vzquota: 3.0.12-3
pve-qemu-kvm: 1.0-9
ksm-control-daemon: 1.1-1

Marco

tom · Jul 12, 2013

Re: /etc/pve/nodes/node/qemu-server/ readonly after vm migrate from other node

just tell your cluster that you run a two_node setup (two_node="1" expected_votes="1")

see http://pve.proxmox.com/wiki/Two-Node_High_Availability_Cluster#Configuring_Fencing

quroum is always essential (also for non HA, see pmxcfs)

m.ardito · Jul 12, 2013

Re: /etc/pve/nodes/node/qemu-server/ readonly after vm migrate from other node

Thanks,

I always skipped those page as I thought they were only about HA setups, anyway:

cat /etc/pve/cluster.conf

Code:

<?xml version="1.0"?>
<cluster name="pvecluster" config_version="2">

  <cman keyfile="/var/lib/pve-cluster/corosync.authkey">
  </cman>

  <clusternodes>
  <clusternode name="pve2" votes="1" nodeid="1"/>
  <clusternode name="pve1" votes="1" nodeid="2"/></clusternodes>

</cluster>

shoud I edit it as cluster.conf.new to become:

Code:

<?xml version="1.0"?>
<cluster name="pvecluster" config_version="[B][COLOR=#ff0000]3[/COLOR][/B]">

  <cman keyfile="/var/lib/pve-cluster/corosync.authkey" [COLOR=#ff0000][B]two_node="1" expected_votes="1"[/B][/COLOR]>
  </cman>

  <clusternodes>
  <clusternode name="pve2" votes="1" nodeid="1"/>
  <clusternode name="pve1" votes="1" nodeid="2"/></clusternodes>

</cluster>

and then ? shoud I follow ther same steps the wiki displays for HA (HA tab?)

but you mean, "when the other node is back" ?

because at the moment I can't, gui shows

Code:

cluster not ready - no quorum? (500)

and the spinning wheel never stops...

otherwise?

Marco

tom · Jul 12, 2013

Re: /etc/pve/nodes/node/qemu-server/ readonly after vm migrate from other node

yes, after your did the changes in cluster.conf.new (and validated) just activate via GUI.

m.ardito · Jul 12, 2013

Re: /etc/pve/nodes/node/qemu-server/ readonly after vm migrate from other node

yes, that, when the other node comes up, because before I can't access HA tab.

but before that, may I also use

#pvecm expected <expected>

like

#pvecm expected 1

I found this your answer in this thread, http://forum.proxmox.com/threads/10099-cluster-not-ready-no-quorum-(500)
?

Marco

tom · Jul 12, 2013

Re: /etc/pve/nodes/node/qemu-server/ readonly after vm migrate from other node

yes, otherwise you cannot write to /etc/pve/...

m.ardito · Jul 12, 2013

Re: /etc/pve/nodes/node/qemu-server/ readonly after vm migrate from other node

I just added a new wiki section about this case, which is not HA. Hope it is all correct there...
http://pve.proxmox.com/wiki/Proxmox_VE_2.0_Cluster#Two_nodes_cluster_and_quorum_issues

Marco

mmenaz · Jul 12, 2013

Re: /etc/pve/nodes/node/qemu-server/ readonly after vm migrate from other node

Interesting, thanks! But what I would love to achieve with 2 node cluster without HA is this:
- set for 2 nodes as you suggested, and let's say I've shared storage with vm 101 running on first node and vm 102 on second node
- if node 2 dies, have a button on web interface related to node 1 that says "promote me as the master of the universe" ;P Maybe a warning message could be enough ("do you really know what you are doing?" ;P))
- move the 102 vm, that I'm sure is off since I'm there in the "datacenter" and I see the smoke coming from node 2, to node 1
- start vm 102 on node 1
- repair node 2 (i.e. replace power supply, change MB or whatever)
- turn ON node 2 that will join the cluster again, find the "master of the universe" and then fetch it's config as the good one, so will NOT start vm102 since does no longer belongs to him
- me from web interface "unpromote" node 1 and have the cluster work normally

Currently instead I've to go in bash and move config files manually

dietmar · Jul 13, 2013

Re: /etc/pve/nodes/node/qemu-server/ readonly after vm migrate from other node

Just want to tell you that such settings are dangerous, beacuse the whole cluster can get int a inconsistent state. There is a reason why quorum is required.

dietmar · Jul 13, 2013

Re: /etc/pve/nodes/node/qemu-server/ readonly after vm migrate from other node

m.ardito said:
I just added a new wiki section about this case, which is not HA. Hope it is all correct there...
http://pve.proxmox.com/wiki/Proxmox_VE_2.0_Cluster#Two_nodes_cluster_and_quorum_issues

Marco

Please can you remove that "two_node="1" expected_votes="1" suggestion - I think hat is very dangerous!

tom · Jul 13, 2013

Re: /etc/pve/nodes/node/qemu-server/ readonly after vm migrate from other node

dietmar said:
Please can you remove that "two_node="1" expected_votes="1" suggestion - I think that is very dangerous!

its also documented like this in 'man cman', that the reason why some write this our wiki.

m.ardito · Jul 15, 2013

Re: /etc/pve/nodes/node/qemu-server/ readonly after vm migrate from other node

dietmar said:
Please can you remove that "two_node="1" expected_votes="1" suggestion - I think hat is very dangerous!

Dietmar, i thank you for your hint, and I'm sure you as a developer have a strong point, and deeper knowledge, but...
can you explain in some more detail what you fear could possibly happen, so that I can beter undestand the risks and edit that wiki section?

As I said before: the cluster as no HA setup. I had to take the node down for maintenance and so, before that, I migrated everything to the other node. All of my backups were failing, on the remaining node, because of readonly locking. So I followed Tom hint and (for now) executed

#pvecm expected 1

And nothing else, all backups went fine.

Now, I need to power up that node: what could possibly happen in either of these cases:
1) if I leave that "expected" as it is: 1
2) if I edit /etc/pve/cluster.conf adding "two_node="1" expected_votes="1"
3) if I revert the CLI command as with #pvecm expected 2
4) other?
...

I think all users that have a 2 node setup, even not HA, should know what to do when in the need to bring one node down for maintenance (but also for a not planned one as a faulty node), not just what NOT to do because "it's dangerous", without any explaination (it sounds a bit FUD).

A 2 node cluster is not supported under pve? If not, users shoud be warned to not use it. If yes, they shoud know how should it be treated, and how to safely manage a perfectly normal situation like having to switch off a node for maintenance.

Of course this apply also to 3 and more nodes, and in particular with HA setups, because they're expected to "shoot each other in the head" in case of a failure, and restart inaccessible vm on ohter cluster nodes automatically, with all the risk involved, but in this case everything was planned, vms moved in advance, nothing else changed in the cluster...

If it is dangerous it will be of course interesting and useful to know why, but it will be much more useful to know how to manage such situations properly, even if the suggestion present in the manual of the tools provided by pve itself, in this case a third-party tool, are dangerous for some other reason in the pve environment.

I know pve is a compelx beasts, and changes really fast, but to help I will be happy to document this kind of knowledge in the pve wiki, once I'm sure to have understood.

Marco

dietmar · Jul 15, 2013

Re: /etc/pve/nodes/node/qemu-server/ readonly after vm migrate from other node

m.ardito said:
Dietmar, i thank you for your hint, and I'm sure you as a developer have a strong point, and deeper knowledge, but...
can you explain in some more detail what you fear could possibly happen, so that I can beter undestand the risks and edit that wiki section?

You can run into a split brain, because you allow each node to update values without having quorum.

m.ardito said:
As I said before: the cluster as no HA setup. I had to take the node down for maintenance and so, before that, I migrated everything to the other node. All of my backups were failing, on the remaining node, because of readonly locking. So I followed Tom hint and (for now) executed

#pvecm expected 1

And nothing else, all backups went fine.

Now, I need to power up that node: what could possibly happen in either of these cases:
1) if I leave that "expected" as it is: 1

This is automatically updated if a new node joins.

m.ardito · Jul 15, 2013

Re: /etc/pve/nodes/node/qemu-server/ readonly after vm migrate from other node

Thanks, I will dig myself into more details (eg: which "values" could a node update in that situation?)

I added some caution words in that wiki section, and an external link to a wiki article explaining the matter in som e extent.
http://pve.proxmox.com/wiki/Proxmox_VE_2.0_Cluster#The_solution

Marco

nasim · Nov 27, 2013

Re: /etc/pve/nodes/node/qemu-server/ readonly after vm migrate from other node

I have does the following configuration for unicasting under cluster.conf.new

root@node3:~# cat /etc/pve/cluster.conf.new
<?xml version="1.0"?>
<cluster name="master3" config_version="3">

<cman keyfile="/var/lib/pve-cluster/corosync.authkey" transport="udpu">
</cman>

<clusternodes>
<clusternode name="node3" votes="1" nodeid="1"/>
<clusternode name="node1" votes="1" nodeid="2"/></clusternodes>

</cluster>

root@node3:~# service cman status
fenced is stopped

root@node3:~# service cman restart
Stopping cluster:
Stopping dlm_controld... [ OK ]
Stopping fenced... [ OK ]
Stopping cman... [ OK ]
Waiting for corosync to shutdown:[ OK ]
Unloading kernel modules... [ OK ]
Unmounting configfs... [ OK ]
Starting cluster:
Checking if cluster has been disabled at boot... [ OK ]
Checking Network Manager... [ OK ]
Global setup... [ OK ]
Loading kernel modules... [ OK ]
Mounting configfs... [ OK ]
Starting cman... [ OK ]
Waiting for quorum... Timed-out waiting for cluster
[FAILED]

root@node3:~# pveversion -v
proxmox-ve-2.6.32: 3.1-109 (running kernel: 2.6.32-23-pve)
pve-manager: 3.1-3 (running version: 3.1-3/dc0e9b0e)
pve-kernel-2.6.32-23-pve: 2.6.32-109
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.5-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.0-1
pve-cluster: 3.0-7
qemu-server: 3.1-1
pve-firmware: 1.0-23
libpve-common-perl: 3.0-6
libpve-access-control: 3.0-6
libpve-storage-perl: 3.0-10
pve-libspice-server1: 0.12.4-1
vncterm: 1.1-4
vzctl: 4.0-1pve3
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 1.4-17
ksm-control-daemon: 1.1-1
glusterfs-client: 3.4.0-2

Please help me solve this cluster problem.
I can't understand how to proceed for this.

please help.
Thanks in advance.
Nasim

Search

Search

[solved] /etc/pve/nodes/node/qemu-server/ readonly after vm migrate from other node

m.ardito

Famous Member

tom

Proxmox Staff Member

m.ardito

Famous Member

tom

Proxmox Staff Member

m.ardito

Famous Member

tom

Proxmox Staff Member

m.ardito

Famous Member

mmenaz

Renowned Member

dietmar

Proxmox Staff Member

dietmar

Proxmox Staff Member

tom

Proxmox Staff Member

m.ardito

Famous Member

dietmar

Proxmox Staff Member

m.ardito

Famous Member

nasim

Guest