[Proxmox 2.0 rc1] unable to aquire pmxcfs lock - trying again

Hi,

I am trying Proxmox 2.0 and it seems very promising, but I have a big problem: the cluster does not work.

I installed two bare-metal Proxmox servers from the PVE ISO.

I set up an IP address during the installation process (a different IP on each server).
Once both servers were up, I moved that IP address to another interface (to free vmbr0 from cluster traffic).
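
For reference, the relevant part of /etc/network/interfaces on the 1st server now looks roughly like this (a sketch reconstructed from the ifconfig output further down; the exact stanzas and options may differ):

Code:
# cluster/management IP moved off the bridge onto a dedicated NIC
auto eth6
iface eth6 inet static
        address 172.20.75.3
        netmask 255.255.255.0

# vmbr0 keeps its own address and carries guest traffic only
auto vmbr0
iface vmbr0 inet static
        address 192.168.1.3
        netmask 255.255.255.0
        bridge_ports eth0
        bridge_stp off
        bridge_fd 0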

The cluster sets itself up, but after a reboot I can't get it to work properly:
- I can't log in to the second server (its web interface is not showing up)
- I can't reach the 2nd server through the web interface on the 1st server, even though it shows up in green (500: Can't connect to ip:8006, Connection refused)
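
For what it's worth, a quick way to see whether the web server is listening at all (on PVE 2.0 the interface on port 8006 is served by apache2):

Code:
# no output here means apache2 is simply not running on that node
root@FR-PM-PROX04-PRD:~# netstat -tlnp | grep 8006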

Some useful information:

SERVER 1

Code:
root@FR-PM-PROX03-PRD:~# hostname
FR-PM-PROX03-PRD

Code:
root@FR-PM-PROX03-PRD:~# cat /etc/hosts
127.0.0.1 localhost.localdomain localhost
172.20.75.3 FR-PM-PROX03-PRD.COMPANY.ORG FR-PM-PROX03-PRD pvelocalhost

# The following lines are desirable for IPv6 capable hosts

::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts

Code:
root@FR-PM-PROX03-PRD:~# pvecm nodes
Node  Sts   Inc   Joined               Name
   1   M     12   2012-03-23 14:32:28  FR-PM-PROX03-PRD
   2   M     20   2012-03-23 14:32:33  FR-PM-PROX04-PRD

Code:
root@FR-PM-PROX03-PRD:~# pvecm status
Version: 6.2.0
Config Version: 2
Cluster Name: PROXMOX
Cluster Id: 10266
Cluster Member: Yes
Cluster Generation: 20
Membership state: Cluster-Member
Nodes: 2
Expected votes: 2
Total votes: 2
Node votes: 1
Quorum: 2  
Active subsystems: 5
Flags: 
Ports Bound: 0  
Node name: FR-PM-PROX03-PRD
Node ID: 1
Multicast addresses: 239.192.40.66 
Node addresses: 172.20.75.3

Code:
root@FR-PM-PROX03-PRD:~# pmxcfs -f
[main] notice: unable to aquire pmxcfs lock - trying again
[main] crit: unable to aquire pmxcfs lock: Resource temporarily unavailable
[main] notice: exit proxmox configuration filesystem (-1)

Code:
root@FR-PM-PROX03-PRD:~# ifconfig
eth0      Link encap:Ethernet  HWaddr d0:67:e5:ef:26:be  
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
...

eth6      Link encap:Ethernet  HWaddr 90:e2:ba:07:18:a4  
          inet addr:172.20.75.3  Bcast:172.20.75.255  Mask:255.255.255.0
          inet6 addr: fe80::92e2:baff:fe07:18a4/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
...

eth7      Link encap:Ethernet  HWaddr 90:e2:ba:07:18:a5  
          inet addr:172.20.70.108  Bcast:172.20.70.255  Mask:255.255.255.0
          inet6 addr: fe80::92e2:baff:fe07:18a5/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
...

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
...

venet0    Link encap:UNSPEC  HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00  
          inet6 addr: fe80::1/128 Scope:Link
          UP BROADCAST POINTOPOINT RUNNING NOARP  MTU:1500  Metric:1
...

vmbr0     Link encap:Ethernet  HWaddr d0:67:e5:ef:26:be  
          inet addr:192.168.1.3  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::d267:e5ff:feef:26be/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
...

Code:
root@FR-PM-PROX03-PRD:~# /etc/init.d/apache2 restart
Restarting web server: apache2 ... waiting .

Code:
root@FR-PM-PROX03-PRD:~# mount
/dev/mapper/pve-root on / type ext3 (rw,errors=remount-ro)
tmpfs on /lib/init/rw type tmpfs (rw,nosuid,mode=0755)
proc on /proc type proc (rw,noexec,nosuid,nodev)
sysfs on /sys type sysfs (rw,noexec,nosuid,nodev)
udev on /dev type tmpfs (rw,mode=0755)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=620)
/dev/mapper/pve-data on /var/lib/vz type ext3 (rw)
/dev/sda1 on /boot type ext3 (rw)
fusectl on /sys/fs/fuse/connections type fusectl (rw)
/dev/fuse on /etc/pve type fuse (rw,nosuid,nodev,default_permissions,allow_other)
none on /sys/kernel/config type configfs (rw)
beancounter on /proc/vz/beancounter type cgroup (rw,name=beancounter)
container on /proc/vz/container type cgroup (rw,name=container)
fairsched on /proc/vz/fairsched type cgroup (rw,name=fairsched)



SERVER 2

Code:
root@FR-PM-PROX04-PRD:~# hostname
FR-PM-PROX04-PRD

Code:
root@FR-PM-PROX04-PRD:~# cat /etc/hosts
127.0.0.1 localhost.localdomain localhost
172.20.75.4 FR-PM-PROX04-PRD.COMPANY.ORG FR-PM-PROX04-PRD pvelocalhost

# The following lines are desirable for IPv6 capable hosts

::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts

Code:
root@FR-PM-PROX04-PRD:~# pvecm nodes
Node  Sts   Inc   Joined               Name
   1   M     20   2012-03-23 14:31:54  FR-PM-PROX03-PRD
   2   M     16   2012-03-23 14:31:54  FR-PM-PROX04-PRD

Code:
root@FR-PM-PROX04-PRD:~# pvecm status
Version: 6.2.0
Config Version: 2
Cluster Name: PROXMOX
Cluster Id: 10266
Cluster Member: Yes
Cluster Generation: 20
Membership state: Cluster-Member
Nodes: 2
Expected votes: 2
Total votes: 2
Node votes: 1
Quorum: 2  
Active subsystems: 5
Flags: 
Ports Bound: 0  
Node name: FR-PM-PROX04-PRD
Node ID: 2
Multicast addresses: 239.192.40.66 
Node addresses: 172.20.75.4

Code:
root@FR-PM-PROX04-PRD:~# pmxcfs -f
[main] notice: unable to aquire pmxcfs lock - trying again
[main] crit: unable to aquire pmxcfs lock: Resource temporarily unavailable
[main] notice: exit proxmox configuration filesystem (-1)

Code:
root@FR-PM-PROX04-PRD:~# ifconfig
eth0      Link encap:Ethernet  HWaddr d0:67:e5:ef:25:05  
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
...

eth6      Link encap:Ethernet  HWaddr 90:e2:ba:07:19:44  
          inet addr:172.20.75.4  Bcast:172.20.75.255  Mask:255.255.255.0
          inet6 addr: fe80::92e2:baff:fe07:1944/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
...

eth7      Link encap:Ethernet  HWaddr 90:e2:ba:07:19:45  
          inet addr:172.20.70.112  Bcast:172.20.70.255  Mask:255.255.255.0
          inet6 addr: fe80::92e2:baff:fe07:1945/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
...

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
...

venet0    Link encap:UNSPEC  HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00  
          inet6 addr: fe80::1/128 Scope:Link
          UP BROADCAST POINTOPOINT RUNNING NOARP  MTU:1500  Metric:1
...

vmbr0     Link encap:Ethernet  HWaddr d0:67:e5:ef:25:05  
          inet addr:192.168.1.4  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::d267:e5ff:feef:2505/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
...

Code:
root@FR-PM-PROX04-PRD:~# /etc/init.d/apache2 restart
Syntax error on line 13 of /etc/apache2/sites-enabled/pve-redirect.conf:
SSLCertificateFile: file '/etc/pve/local/pve-ssl.pem' does not exist or is empty
Action 'configtest' failed.
The Apache error log may have more information.
 failed!
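
The error suggests the certificate under /etc/pve is missing or empty; a direct check (path taken from the error message above):

Code:
# is the certificate actually there and non-empty?
root@FR-PM-PROX04-PRD:~# ls -l /etc/pve/local/pve-ssl.pem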

Code:
root@FR-PM-PROX04-PRD:~# mount
/dev/mapper/pve-root on / type ext3 (rw,errors=remount-ro)
tmpfs on /lib/init/rw type tmpfs (rw,nosuid,mode=0755)
proc on /proc type proc (rw,noexec,nosuid,nodev)
sysfs on /sys type sysfs (rw,noexec,nosuid,nodev)
udev on /dev type tmpfs (rw,mode=0755)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=620)
/dev/mapper/pve-data on /var/lib/vz type ext3 (rw)
/dev/sda1 on /boot type ext3 (rw)
fusectl on /sys/fs/fuse/connections type fusectl (rw)
/dev/fuse on /etc/pve type fuse (rw,nosuid,nodev,default_permissions,allow_other)
none on /sys/kernel/config type configfs (rw)
beancounter on /proc/vz/beancounter type cgroup (rw,name=beancounter)
container on /proc/vz/container type cgroup (rw,name=container)
fairsched on /proc/vz/fairsched type cgroup (rw,name=fairsched)


Since I did this from scratch, following the wiki to set up the cluster, what's going wrong? Is it because I changed the interface?

Thank you for any hints.
 
Did you already try to restart that node?

What is the output of

# /etc/init.d/pve-cluster stop
# /etc/init.d/pve-cluster start
 
I already tried rebooting both nodes.

Code:
root@FR-PM-PROX03-PRD:~# /etc/init.d/pve-cluster stop
Stopping pve cluster filesystem: pve-cluster.
root@FR-PM-PROX03-PRD:~# /etc/init.d/pve-cluster start
Starting pve cluster filesystem : pve-cluster.

Code:
root@FR-PM-PROX04-PRD:~# /etc/init.d/pve-cluster stop
Stopping pve cluster filesystem: pve-cluster.
root@FR-PM-PROX04-PRD:~# /etc/init.d/pve-cluster start
Starting pve cluster filesystem : pve-cluster.

It has no visible effect.

I did it first on the 1st server, then on the 2nd.
 
OK, my problem is partly solved:

After restarting the cluster, it seems /etc/pve was properly mounted on the 2nd server, so the /etc/pve/local/pve-ssl* files were readable, Apache started, and the whole web interface is working.
These SSL files are missing on the 1st server, but Apache doesn't seem to mind, even if I restart it.
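
To see what is actually present, I list the local node directory (on PVE, /etc/pve/local is a symlink to this node's own subdirectory under /etc/pve/nodes/, as far as I understand):

Code:
root@FR-PM-PROX03-PRD:~# readlink /etc/pve/local
root@FR-PM-PROX03-PRD:~# ls -l /etc/pve/local/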

But I still get the error from pmxcfs -f shown above.

Anyway, I would like to know what's going on, as this cluster should go into production soon.
 
These SSL files are missing on the 1st server, but Apache doesn't seem to mind, even if I restart it.

Are you sure (I do not understand how the interface can work without those files)?

But I still get the error from pmxcfs -f shown above.

Why do you run that command? (It is started by /etc/init.d/pve-cluster, so the filesystem is already running.)
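
You can check the already-running instance instead of starting a second one (the second instance is exactly what fails with the lock error), with something like:

Code:
# the daemon started by the init script holds the lock
root@FR-PM-PROX03-PRD:~# ps -C pmxcfs -o pid,args
# and provides the filesystem mounted on /etc/pve
root@FR-PM-PROX03-PRD:~# grep /etc/pve /proc/mounts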
 
Dietmar,

Unfortunately I saw your reply too late.

I am pretty sure that on Friday these files were not on the server, but now they are.

Is /etc/pve a kind of shared filesystem between the two nodes, which may become available on both nodes after some sync time?
 
Is /etc/pve a kind of shared filesystem between the two nodes, which may become available on both nodes after some sync time?

Yes, that is a distributed file system. If cluster communication works, all nodes see the same data.
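
A simple way to see it in action (the test file name is just an example):

Code:
# on node 1: create a file in the cluster filesystem
root@FR-PM-PROX03-PRD:~# echo test > /etc/pve/test.txt
# on node 2: the same file shows up almost immediately
root@FR-PM-PROX04-PRD:~# cat /etc/pve/test.txt
# clean up afterwards
root@FR-PM-PROX04-PRD:~# rm /etc/pve/test.txt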
 
