Cluster Proxmox VE - services failed

Neb

Apr 27, 2017
Hi,

I have 3 physical servers in a Proxmox cluster. As a test, I abruptly shut down 2 of the nodes to see what would happen after they rebooted. But now only three services are running; for instance, pveproxy.service is not listening on port 8006:

Code:
17:01:32 ~ # netstat -tupln                                                               root@px-node-1
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:47764           0.0.0.0:*               LISTEN      1139/rpc.statd
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      1311/sshd      
tcp        0      0 127.0.0.1:25            0.0.0.0:*               LISTEN      1487/master    
tcp        0      0 10.52.1.11:6789         0.0.0.0:*               LISTEN      1306/ceph-mon  
tcp        0      0 0.0.0.0:111             0.0.0.0:*               LISTEN      1127/rpcbind  
tcp        0      0 10.52.1.11:6800         0.0.0.0:*               LISTEN      1755/ceph-osd  
tcp        0      0 10.52.1.11:6801         0.0.0.0:*               LISTEN      1755/ceph-osd  
tcp        0      0 10.52.1.11:6802         0.0.0.0:*               LISTEN      1755/ceph-osd  
tcp        0      0 10.52.1.11:6803         0.0.0.0:*               LISTEN      1755/ceph-osd  
tcp6       0      0 :::22                   :::*                    LISTEN      1311/sshd      
tcp6       0      0 ::1:25                  :::*                    LISTEN      1487/master    
tcp6       0      0 :::32862                :::*                    LISTEN      1139/rpc.statd
tcp6       0      0 :::111                  :::*                    LISTEN      1127/rpcbind  
udp        0      0 0.0.0.0:56808           0.0.0.0:*                           1139/rpc.statd
udp        0      0 0.0.0.0:57482           0.0.0.0:*                           948/systemd-timesyn
udp        0      0 0.0.0.0:111             0.0.0.0:*                           1127/rpcbind  
udp        0      0 0.0.0.0:864             0.0.0.0:*                           1127/rpcbind  
udp        0      0 127.0.0.1:891           0.0.0.0:*                           1139/rpc.statd
udp        0      0 10.51.0.11:5404         0.0.0.0:*                           1537/corosync  
udp        0      0 239.192.162.87:5405     0.0.0.0:*                           1537/corosync  
udp        0      0 10.51.0.11:5405         0.0.0.0:*                           1537/corosync  
udp6       0      0 :::39754                :::*                                1139/rpc.statd
udp6       0      0 :::111                  :::*                                1127/rpcbind  
udp6       0      0 :::864                  :::*                                1127/rpcbind
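To confirm the symptom without scanning the whole table, the sketch below uses a hypothetical helper, `port_is_listening` (not part of Proxmox). It reads `netstat -tln` or `netstat -tupln` output on stdin and succeeds only if some TCP socket is in LISTEN state on the given port. It assumes netstat's column layout (local address in column 4, state in column 6). `ss -tln` orders its columns differently and would need adjusting.

```shell
# Hypothetical helper: exit 0 if a TCP socket is LISTENing on port $1.
# Expects `netstat -tln` / `netstat -tupln` output on stdin
# (column 1 = proto, column 4 = local address, column 6 = state).
port_is_listening() {
    awk -v port="$1" '
        ($1 == "tcp" || $1 == "tcp6") && $6 == "LISTEN" {
            # The port is whatever follows the last ":" (covers 0.0.0.0:8006 and :::8006).
            n = split($4, parts, ":")
            if (parts[n] == port) found = 1
        }
        END { exit(found ? 0 : 1) }
    '
}
```

Usage on the affected node: `netstat -tln | port_is_listening 8006 || echo 'nothing listens on 8006'`. On the output above it would fail, because no TCP line has a local address ending in `:8006`.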

Logs:

Code:
17:02:20 ~ # systemctl status -l pvedaemon                                                root@px-node-1
● pvedaemon.service - LSB: PVE Daemon
   Loaded: loaded (/etc/init.d/pvedaemon)
   Active: active (exited) since Mon 2017-05-22 16:58:49 CEST; 4min 17s ago

May 22 16:58:49 px-node-1 systemd[1]: Started LSB: PVE Daemon.
------------------------------------------------------------
17:03:06 ~ # systemctl status -l pveproxy                                                 root@px-node-1
● pveproxy.service - LSB: PVE API Proxy Server
   Loaded: loaded (/etc/init.d/pveproxy)
   Active: active (exited) since Mon 2017-05-22 16:58:49 CEST; 5min ago

May 22 16:58:49 px-node-1 systemd[1]: Started LSB: PVE API Proxy Server.
------------------------------------------------------------
17:03:51 ~ # systemctl status -l pve-cluster                                              root@px-node-1
● pve-cluster.service - The Proxmox VE cluster filesystem
   Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled)
   Active: active (running) since Mon 2017-05-22 17:01:27 CEST; 3min 0s ago
  Process: 2126 ExecStartPost=/usr/bin/pvecm updatecerts --silent (code=exited, status=0/SUCCESS)
  Process: 2122 ExecStart=/usr/bin/pmxcfs $DAEMON_OPTS (code=exited, status=0/SUCCESS)
 Main PID: 2124 (pmxcfs)
   CGroup: /system.slice/pve-cluster.service
           └─2124 /usr/bin/pmxcfs

May 22 17:01:26 px-node-1 pmxcfs[2124]: [status] notice: starting data syncronisation
May 22 17:01:26 px-node-1 pmxcfs[2124]: [status] notice: received sync request (epoch 1/2124/00000001)
May 22 17:01:26 px-node-1 pmxcfs[2124]: [dcdb] notice: received all states
May 22 17:01:26 px-node-1 pmxcfs[2124]: [dcdb] notice: leader is 2/1493
May 22 17:01:26 px-node-1 pmxcfs[2124]: [dcdb] notice: synced members: 2/1493, 3/1515
May 22 17:01:26 px-node-1 pmxcfs[2124]: [dcdb] notice: waiting for updates from leader
May 22 17:01:26 px-node-1 pmxcfs[2124]: [status] notice: received all states
May 22 17:01:26 px-node-1 pmxcfs[2124]: [status] notice: all data is up to date
May 22 17:01:26 px-node-1 pmxcfs[2124]: [dcdb] notice: update complete - trying to commit (got 2 inode updates)
May 22 17:01:26 px-node-1 pmxcfs[2124]: [dcdb] notice: all data is up to date
May 22 17:01:27 px-node-1 systemd[1]: Started The Proxmox VE cluster filesystem.

17:04:52 ~ # pvecm status                                                                 root@px-node-1
Quorum information
------------------
Date:             Mon May 22 17:04:55 2017
Quorum provider:  corosync_votequorum
Nodes:            3
Node ID:          0x00000001
Ring ID:          1/808
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           2 
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 10.51.0.11 (local)
0x00000002          1 10.51.0.12
0x00000003          1 10.51.0.13

17:04:55 ~ # tail -n 20 /var/log/syslog                                                   root@px-node-1
May 22 17:01:26 px-node-1 pmxcfs[2124]: [dcdb] notice: received all states
May 22 17:01:26 px-node-1 pmxcfs[2124]: [dcdb] notice: leader is 2/1493
May 22 17:01:26 px-node-1 pmxcfs[2124]: [dcdb] notice: synced members: 2/1493, 3/1515
May 22 17:01:26 px-node-1 pmxcfs[2124]: [dcdb] notice: waiting for updates from leader
May 22 17:01:26 px-node-1 pmxcfs[2124]: [status] notice: received all states
May 22 17:01:26 px-node-1 pmxcfs[2124]: [status] notice: all data is up to date
May 22 17:01:26 px-node-1 pmxcfs[2124]: [dcdb] notice: update complete - trying to commit (got 2 inode updates)
May 22 17:01:26 px-node-1 pmxcfs[2124]: [dcdb] notice: all data is up to date
May 22 17:01:27 px-node-1 systemd[1]: Started The Proxmox VE cluster filesystem.
May 22 17:01:27 px-node-1 systemd[1]: Starting PVE activate Ceph OSD disks...
May 22 17:01:27 px-node-1 kernel: [  179.935215] XFS (sdc1): Filesystem has duplicate UUID 0de83491-2068-444f-a4c1-9aec356d9e68 - can't mount
May 22 17:01:27 px-node-1 ceph-disk[2132]: mount: wrong fs type, bad option, bad superblock on /dev/sdc1,
May 22 17:01:27 px-node-1 ceph-disk[2132]: missing codepage or helper program, or other error
May 22 17:01:27 px-node-1 ceph-disk[2132]: In some cases useful info is found in syslog - try
May 22 17:01:27 px-node-1 ceph-disk[2132]: dmesg | tail or so.
May 22 17:01:27 px-node-1 ceph-disk[2132]: ceph-disk: Mounting filesystem failed: Command '['/bin/mount', '-t', u'xfs', '-o', 'noatime,inode64', '--', '/dev/disk/by-parttypeuuid/4fbd7e29-9d25-41b8-afd0-062c0ceff05d.e5898c81-b53b-4142-8e62-bdbe2c82b617', '/var/lib/ceph/tmp/mnt.FCxl2X']' returned non-zero exit status 32
May 22 17:01:27 px-node-1 ceph-disk[2132]: ceph-disk: Error: One or more partitions failed to activate
May 22 17:01:27 px-node-1 systemd[1]: ceph.service: main process exited, code=exited, status=1/FAILURE
May 22 17:01:27 px-node-1 systemd[1]: Failed to start PVE activate Ceph OSD disks.
May 22 17:01:27 px-node-1 systemd[1]: Unit ceph.service entered failed state.
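This is separate from the PVE services. The kernel line `XFS (sdc1): Filesystem has duplicate UUID ... - can't mount` means XFS refused to mount `sdc1` because a filesystem carrying the same UUID appears to be mounted already (a cloned disk is a typical cause). That is why `ceph-disk` could not activate that OSD. The minimal sketch below spots such clashes. It assumes `blkid -s UUID` output on stdin, and `dup_fs_uuids` is a hypothetical name.

```shell
# Hypothetical helper: read `blkid -s UUID` lines such as
#   /dev/sdc1: UUID="0de83491-..." TYPE="xfs"
# and print every UUID that appears on more than one device.
dup_fs_uuids() {
    awk '
        {
            dev = $1
            sub(/:$/, "", dev)                     # "/dev/sdc1:" -> "/dev/sdc1"
            # Leading space so PARTUUID="..." is not mistaken for UUID="...".
            if (match($0, / UUID="[^"]*"/)) {
                uuid = substr($0, RSTART + 7, RLENGTH - 8)
                count[uuid]++
                devs[uuid] = devs[uuid] " " dev
            }
        }
        END {
            for (u in count)
                if (count[u] > 1) print u ":" devs[u]
        }
    '
}
```

`blkid -s UUID | dup_fs_uuids` prints each shared UUID followed by the devices that carry it. No output means no duplicates were found.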

I don't know what is wrong. How can I repair the cluster?

In addition:
Code:
17:09:17 /var/log # pveversion -v                                                         root@px-node-1
zsh: command not found: pveversion
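`pveversion` is provided by the `pve-manager` package, so `command not found` already hints that the package is gone. If it was removed without being purged, dpkg keeps its configuration files. That would be consistent with systemd still loading the LSB scripts `/etc/init.d/pvedaemon` and `/etc/init.d/pveproxy`, which show as `active (exited)` with no daemon process, while the programs themselves are missing. The sketch below uses a hypothetical helper, `pkg_state`, to interpret the status string printed by `dpkg-query -W -f='${Status}'`.

```shell
# Hypothetical helper: translate the string printed by
#   dpkg-query -W -f='${Status}' <package>
# into a short verdict. An empty string means dpkg has no record of the package.
pkg_state() {
    case "$1" in
        *" installed")           echo installed ;;
        *" config-files")        echo removed-config-files-remain ;;
        *" not-installed" | "")  echo not-installed ;;
        *)                       echo "other ($1)" ;;
    esac
}
```

Usage: `pkg_state "$(dpkg-query -W -f='${Status}' pve-manager 2>/dev/null)"`. A result of `removed-config-files-remain` corresponds to the `rc` state shown by `dpkg -l`.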

Any ideas, please?

EDIT: I found this:

Code:
17:12:28 /var/log # systemctl status pvenetcommit.service -l                              root@px-node-1
● pvenetcommit.service
   Loaded: not-found (Reason: No such file or directory)
   Active: inactive (dead)

May 22 16:59:08 px-node-1 systemd[1]: Cannot add dependency job for unit pvenetcommit.service, ignoring: Unit pvenetcommit.service failed to load: No such file or directory.
May 22 16:59:08 px-node-1 systemd[1]: Cannot add dependency job for unit pvenetcommit.service, ignoring: Unit pvenetcommit.service failed to load: No such file or directory.
. . . . .
May 22 17:08:20 px-node-1 systemd[1]: Cannot add dependency job for unit pvenetcommit.service, ignoring: Unit pvenetcommit.service failed to load: No such file or directory.
May 22 17:08:20 px-node-1 systemd[1]: Cannot add dependency job for unit pvenetcommit.service, ignoring: Unit pvenetcommit.service failed to load: No such file or directory.
May 22 17:08:20 px-node-1 systemd[1]: Cannot add dependency job for unit pvenetcommit.service, ignoring: Unit pvenetcommit.service failed to load: No such file or directory.
May 22 17:08:20 px-node-1 systemd[1]: Cannot add dependency job for unit pvenetcommit.service, ignoring: Unit pvenetcommit.service failed to load: No such file or directory.
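The repeated `pvenetcommit.service` messages say the unit file itself is missing. That service is presumably shipped by `pve-manager`; treat this as an assumption, since the logs do not confirm it. To summarise this kind of log spam instead of reading it line by line, the hypothetical `missing_units` helper below extracts the unit named in each `Cannot add dependency job for unit ...` line. It prints every distinct name once, with its count.

```shell
# Hypothetical helper: list the units named in systemd's
# "Cannot add dependency job for unit <name>, ignoring: ..." messages,
# one line per distinct unit as "<name> <times seen>".
missing_units() {
    sed -n 's/.*Cannot add dependency job for unit \([^,]*\), ignoring.*/\1/p' |
        sort | uniq -c | awk '{ print $2, $1 }'
}
```

Usage: `grep 'Cannot add dependency job' /var/log/syslog | missing_units`.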

Thank you
 
OK... I found out what it was. I had installed the 'vlan' package, and that uninstalled the 'pve-manager' package. Idiot.
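The root cause turned out to be apt removing `pve-manager` when `vlan` was installed. A simulated install would have shown this up front, because `apt-get -s install <pkg>` prints a `Remv <package> [...]` line for every package apt would remove. The hypothetical `guard_removals` helper below reads that simulation output. It fails, listing the offenders, if any package you name would be removed or purged.

```shell
# Hypothetical helper: read `apt-get -s ...` (simulation) output on stdin and
# exit 1, listing the offenders, if any named package would be removed or purged.
guard_removals() {
    awk -v protected="$*" '
        BEGIN {
            n = split(protected, p, " ")
            for (i = 1; i <= n; i++) keep[p[i]] = 1
        }
        # Simulation lines look like: "Remv pve-manager [version]"
        ($1 == "Remv" || $1 == "Purg") && ($2 in keep) {
            print "would remove: " $2
            bad = 1
        }
        END { exit(bad ? 1 : 0) }
    '
}
```

Usage: `apt-get -s install vlan | guard_removals pve-manager proxmox-ve && apt-get install vlan`. The package list here is only an example; list whatever the node depends on.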
 
