Cluster Proxmox VE - services failed

Neb

Renowned Member
Apr 27, 2017
Hi,

I have 3 physical servers in a Proxmox cluster. As a test, I abruptly shut down 2 nodes to see what would happen after reboot. Now only three services are running. For instance, pveproxy.service is not listening on port 8006:

Code:
17:01:32 ~ # netstat -tupln                                                               root@px-node-1
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:47764           0.0.0.0:*               LISTEN      1139/rpc.statd
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      1311/sshd      
tcp        0      0 127.0.0.1:25            0.0.0.0:*               LISTEN      1487/master    
tcp        0      0 10.52.1.11:6789         0.0.0.0:*               LISTEN      1306/ceph-mon  
tcp        0      0 0.0.0.0:111             0.0.0.0:*               LISTEN      1127/rpcbind  
tcp        0      0 10.52.1.11:6800         0.0.0.0:*               LISTEN      1755/ceph-osd  
tcp        0      0 10.52.1.11:6801         0.0.0.0:*               LISTEN      1755/ceph-osd  
tcp        0      0 10.52.1.11:6802         0.0.0.0:*               LISTEN      1755/ceph-osd  
tcp        0      0 10.52.1.11:6803         0.0.0.0:*               LISTEN      1755/ceph-osd  
tcp6       0      0 :::22                   :::*                    LISTEN      1311/sshd      
tcp6       0      0 ::1:25                  :::*                    LISTEN      1487/master    
tcp6       0      0 :::32862                :::*                    LISTEN      1139/rpc.statd
tcp6       0      0 :::111                  :::*                    LISTEN      1127/rpcbind  
udp        0      0 0.0.0.0:56808           0.0.0.0:*                           1139/rpc.statd
udp        0      0 0.0.0.0:57482           0.0.0.0:*                           948/systemd-timesyn
udp        0      0 0.0.0.0:111             0.0.0.0:*                           1127/rpcbind  
udp        0      0 0.0.0.0:864             0.0.0.0:*                           1127/rpcbind  
udp        0      0 127.0.0.1:891           0.0.0.0:*                           1139/rpc.statd
udp        0      0 10.51.0.11:5404         0.0.0.0:*                           1537/corosync  
udp        0      0 239.192.162.87:5405     0.0.0.0:*                           1537/corosync  
udp        0      0 10.51.0.11:5405         0.0.0.0:*                           1537/corosync  
udp6       0      0 :::39754                :::*                                1139/rpc.statd
udp6       0      0 :::111                  :::*                                1127/rpcbind  
udp6       0      0 :::864                  :::*                                1127/rpcbind

Logs:

Code:
17:02:20 ~ # systemctl status -l pvedaemon                                                root@px-node-1
● pvedaemon.service - LSB: PVE Daemon
   Loaded: loaded (/etc/init.d/pvedaemon)
   Active: active (exited) since Mon 2017-05-22 16:58:49 CEST; 4min 17s ago

May 22 16:58:49 px-node-1 systemd[1]: Started LSB: PVE Daemon.
------------------------------------------------------------
17:03:06 ~ # systemctl status -l pveproxy                                                 root@px-node-1
● pveproxy.service - LSB: PVE API Proxy Server
   Loaded: loaded (/etc/init.d/pveproxy)
   Active: active (exited) since Mon 2017-05-22 16:58:49 CEST; 5min ago

May 22 16:58:49 px-node-1 systemd[1]: Started LSB: PVE API Proxy Server.
------------------------------------------------------------
17:03:51 ~ # systemctl status -l pve-cluster                                              root@px-node-1
● pve-cluster.service - The Proxmox VE cluster filesystem
   Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled)
   Active: active (running) since Mon 2017-05-22 17:01:27 CEST; 3min 0s ago
  Process: 2126 ExecStartPost=/usr/bin/pvecm updatecerts --silent (code=exited, status=0/SUCCESS)
  Process: 2122 ExecStart=/usr/bin/pmxcfs $DAEMON_OPTS (code=exited, status=0/SUCCESS)
 Main PID: 2124 (pmxcfs)
   CGroup: /system.slice/pve-cluster.service
           └─2124 /usr/bin/pmxcfs

May 22 17:01:26 px-node-1 pmxcfs[2124]: [status] notice: starting data syncronisation
May 22 17:01:26 px-node-1 pmxcfs[2124]: [status] notice: received sync request (epoch 1/2124/00000001)
May 22 17:01:26 px-node-1 pmxcfs[2124]: [dcdb] notice: received all states
May 22 17:01:26 px-node-1 pmxcfs[2124]: [dcdb] notice: leader is 2/1493
May 22 17:01:26 px-node-1 pmxcfs[2124]: [dcdb] notice: synced members: 2/1493, 3/1515
May 22 17:01:26 px-node-1 pmxcfs[2124]: [dcdb] notice: waiting for updates from leader
May 22 17:01:26 px-node-1 pmxcfs[2124]: [status] notice: received all states
May 22 17:01:26 px-node-1 pmxcfs[2124]: [status] notice: all data is up to date
May 22 17:01:26 px-node-1 pmxcfs[2124]: [dcdb] notice: update complete - trying to commit (got 2 inode updates)
May 22 17:01:26 px-node-1 pmxcfs[2124]: [dcdb] notice: all data is up to date
May 22 17:01:27 px-node-1 systemd[1]: Started The Proxmox VE cluster filesystem.

17:04:52 ~ # pvecm status                                                                 root@px-node-1
Quorum information
------------------
Date:             Mon May 22 17:04:55 2017
Quorum provider:  corosync_votequorum
Nodes:            3
Node ID:          0x00000001
Ring ID:          1/808
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           2 
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 10.51.0.11 (local)
0x00000002          1 10.51.0.12
0x00000003          1 10.51.0.13

17:04:55 ~ # tail -n 20 /var/log/syslog                                                   root@px-node-1
May 22 17:01:26 px-node-1 pmxcfs[2124]: [dcdb] notice: received all states
May 22 17:01:26 px-node-1 pmxcfs[2124]: [dcdb] notice: leader is 2/1493
May 22 17:01:26 px-node-1 pmxcfs[2124]: [dcdb] notice: synced members: 2/1493, 3/1515
May 22 17:01:26 px-node-1 pmxcfs[2124]: [dcdb] notice: waiting for updates from leader
May 22 17:01:26 px-node-1 pmxcfs[2124]: [status] notice: received all states
May 22 17:01:26 px-node-1 pmxcfs[2124]: [status] notice: all data is up to date
May 22 17:01:26 px-node-1 pmxcfs[2124]: [dcdb] notice: update complete - trying to commit (got 2 inode updates)
May 22 17:01:26 px-node-1 pmxcfs[2124]: [dcdb] notice: all data is up to date
May 22 17:01:27 px-node-1 systemd[1]: Started The Proxmox VE cluster filesystem.
May 22 17:01:27 px-node-1 systemd[1]: Starting PVE activate Ceph OSD disks...
May 22 17:01:27 px-node-1 kernel: [  179.935215] XFS (sdc1): Filesystem has duplicate UUID 0de83491-2068-444f-a4c1-9aec356d9e68 - can't mount
May 22 17:01:27 px-node-1 ceph-disk[2132]: mount: wrong fs type, bad option, bad superblock on /dev/sdc1,
May 22 17:01:27 px-node-1 ceph-disk[2132]: missing codepage or helper program, or other error
May 22 17:01:27 px-node-1 ceph-disk[2132]: In some cases useful info is found in syslog - try
May 22 17:01:27 px-node-1 ceph-disk[2132]: dmesg | tail or so.
May 22 17:01:27 px-node-1 ceph-disk[2132]: ceph-disk: Mounting filesystem failed: Command '['/bin/mount', '-t', u'xfs', '-o', 'noatime,inode64', '--', '/dev/disk/by-parttypeuuid/4fbd7e29-9d25-41b8-afd0-062c0ceff05d.e5898c81-b53b-4142-8e62-bdbe2c82b617', '/var/lib/ceph/tmp/mnt.FCxl2X']' returned non-zero exit status 32
May 22 17:01:27 px-node-1 ceph-disk[2132]: ceph-disk: Error: One or more partitions failed to activate
May 22 17:01:27 px-node-1 systemd[1]: ceph.service: main process exited, code=exited, status=1/FAILURE
May 22 17:01:27 px-node-1 systemd[1]: Failed to start PVE activate Ceph OSD disks.
May 22 17:01:27 px-node-1 systemd[1]: Unit ceph.service entered failed state.

I don't know what is wrong. How can I repair the cluster?

In addition:
Code:
17:09:17 /var/log # pveversion -v                                                         root@px-node-1
zsh: command not found: pveversion

Any ideas, please?

EDIT: I discovered this:

Code:
17:12:28 /var/log # systemctl status pvenetcommit.service -l                              root@px-node-1
● pvenetcommit.service
   Loaded: not-found (Reason: No such file or directory)
   Active: inactive (dead)

May 22 16:59:08 px-node-1 systemd[1]: Cannot add dependency job for unit pvenetcommit.service, ignoring: Unit pvenetcommit.service failed to load: No such file or directory.
May 22 16:59:08 px-node-1 systemd[1]: Cannot add dependency job for unit pvenetcommit.service, ignoring: Unit pvenetcommit.service failed to load: No such file or directory.
. . . . .
May 22 17:08:20 px-node-1 systemd[1]: Cannot add dependency job for unit pvenetcommit.service, ignoring: Unit pvenetcommit.service failed to load: No such file or directory.
May 22 17:08:20 px-node-1 systemd[1]: Cannot add dependency job for unit pvenetcommit.service, ignoring: Unit pvenetcommit.service failed to load: No such file or directory.
May 22 17:08:20 px-node-1 systemd[1]: Cannot add dependency job for unit pvenetcommit.service, ignoring: Unit pvenetcommit.service failed to load: No such file or directory.
May 22 17:08:20 px-node-1 systemd[1]: Cannot add dependency job for unit pvenetcommit.service, ignoring: Unit pvenetcommit.service failed to load: No such file or directory.

Thank you
 
OK... I found what this was about. I installed the 'vlan' package, and this uninstalled the 'pve-manager' package. Idiot.
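
For anyone who lands here with the same symptoms (pveversion missing, pveproxy not listening on 8006), here is a minimal sketch for spotting this kind of side effect before it happens. It relies on the simulation mode of apt-get, which prints one "Remv" line per package it would remove. The helper name list_removals is my own and is not part of any Proxmox tool, and the commands in the comments are suggestions I have not run on this cluster.

```shell
#!/bin/sh
# Hypothetical helper: print the names of packages that an apt-get
# simulation would remove. It reads `apt-get -s ...` output on stdin;
# in that output, removal lines start with the word "Remv".
list_removals() {
    awk '$1 == "Remv" { print $2 }'
}

# Suggested usage on a node (simulation only, nothing is changed):
#   apt-get -s install vlan | list_removals
#
# To check whether the package is currently installed:
#   dpkg -s pve-manager
```

If pve-manager shows up in that list, the install would remove the Proxmox management layer. In my case, reinstalling pve-manager with apt-get is probably the way back, but I would check the simulated output of that command first as well.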