Hi,
I have 3 physical servers in a Proxmox cluster. As a test, I brutally shut down 2 of the nodes to see what would happen after a reboot. But now only three services are running; pveproxy.service is not listening on port 8006, for instance:
Code:
17:01:32 ~ # netstat -tupln root@px-node-1
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:47764 0.0.0.0:* LISTEN 1139/rpc.statd
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 1311/sshd
tcp 0 0 127.0.0.1:25 0.0.0.0:* LISTEN 1487/master
tcp 0 0 10.52.1.11:6789 0.0.0.0:* LISTEN 1306/ceph-mon
tcp 0 0 0.0.0.0:111 0.0.0.0:* LISTEN 1127/rpcbind
tcp 0 0 10.52.1.11:6800 0.0.0.0:* LISTEN 1755/ceph-osd
tcp 0 0 10.52.1.11:6801 0.0.0.0:* LISTEN 1755/ceph-osd
tcp 0 0 10.52.1.11:6802 0.0.0.0:* LISTEN 1755/ceph-osd
tcp 0 0 10.52.1.11:6803 0.0.0.0:* LISTEN 1755/ceph-osd
tcp6 0 0 :::22 :::* LISTEN 1311/sshd
tcp6 0 0 ::1:25 :::* LISTEN 1487/master
tcp6 0 0 :::32862 :::* LISTEN 1139/rpc.statd
tcp6 0 0 :::111 :::* LISTEN 1127/rpcbind
udp 0 0 0.0.0.0:56808 0.0.0.0:* 1139/rpc.statd
udp 0 0 0.0.0.0:57482 0.0.0.0:* 948/systemd-timesyn
udp 0 0 0.0.0.0:111 0.0.0.0:* 1127/rpcbind
udp 0 0 0.0.0.0:864 0.0.0.0:* 1127/rpcbind
udp 0 0 127.0.0.1:891 0.0.0.0:* 1139/rpc.statd
udp 0 0 10.51.0.11:5404 0.0.0.0:* 1537/corosync
udp 0 0 239.192.162.87:5405 0.0.0.0:* 1537/corosync
udp 0 0 10.51.0.11:5405 0.0.0.0:* 1537/corosync
udp6 0 0 :::39754 :::* 1139/rpc.statd
udp6 0 0 :::111 :::* 1127/rpcbind
udp6 0 0 :::864 :::* 1127/rpcbind
Logs:
Code:
17:02:20 ~ # systemctl status -l pvedaemon root@px-node-1
● pvedaemon.service - LSB: PVE Daemon
Loaded: loaded (/etc/init.d/pvedaemon)
Active: active (exited) since Mon 2017-05-22 16:58:49 CEST; 4min 17s ago
May 22 16:58:49 px-node-1 systemd[1]: Started LSB: PVE Daemon.
------------------------------------------------------------
17:03:06 ~ # systemctl status -l pveproxy root@px-node-1
● pveproxy.service - LSB: PVE API Proxy Server
Loaded: loaded (/etc/init.d/pveproxy)
Active: active (exited) since Mon 2017-05-22 16:58:49 CEST; 5min ago
May 22 16:58:49 px-node-1 systemd[1]: Started LSB: PVE API Proxy Server.
------------------------------------------------------------
17:03:51 ~ # systemctl status -l pve-cluster root@px-node-1
● pve-cluster.service - The Proxmox VE cluster filesystem
Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled)
Active: active (running) since Mon 2017-05-22 17:01:27 CEST; 3min 0s ago
Process: 2126 ExecStartPost=/usr/bin/pvecm updatecerts --silent (code=exited, status=0/SUCCESS)
Process: 2122 ExecStart=/usr/bin/pmxcfs $DAEMON_OPTS (code=exited, status=0/SUCCESS)
Main PID: 2124 (pmxcfs)
CGroup: /system.slice/pve-cluster.service
└─2124 /usr/bin/pmxcfs
May 22 17:01:26 px-node-1 pmxcfs[2124]: [status] notice: starting data syncronisation
May 22 17:01:26 px-node-1 pmxcfs[2124]: [status] notice: received sync request (epoch 1/2124/00000001)
May 22 17:01:26 px-node-1 pmxcfs[2124]: [dcdb] notice: received all states
May 22 17:01:26 px-node-1 pmxcfs[2124]: [dcdb] notice: leader is 2/1493
May 22 17:01:26 px-node-1 pmxcfs[2124]: [dcdb] notice: synced members: 2/1493, 3/1515
May 22 17:01:26 px-node-1 pmxcfs[2124]: [dcdb] notice: waiting for updates from leader
May 22 17:01:26 px-node-1 pmxcfs[2124]: [status] notice: received all states
May 22 17:01:26 px-node-1 pmxcfs[2124]: [status] notice: all data is up to date
May 22 17:01:26 px-node-1 pmxcfs[2124]: [dcdb] notice: update complete - trying to commit (got 2 inode updates)
May 22 17:01:26 px-node-1 pmxcfs[2124]: [dcdb] notice: all data is up to date
May 22 17:01:27 px-node-1 systemd[1]: Started The Proxmox VE cluster filesystem.
17:04:52 ~ # pvecm status root@px-node-1
Quorum information
------------------
Date: Mon May 22 17:04:55 2017
Quorum provider: corosync_votequorum
Nodes: 3
Node ID: 0x00000001
Ring ID: 1/808
Quorate: Yes
Votequorum information
----------------------
Expected votes: 3
Highest expected: 3
Total votes: 3
Quorum: 2
Flags: Quorate
Membership information
----------------------
Nodeid Votes Name
0x00000001 1 10.51.0.11 (local)
0x00000002 1 10.51.0.12
0x00000003 1 10.51.0.13
17:04:55 ~ # tail -n 20 /var/log/syslog root@px-node-1
May 22 17:01:26 px-node-1 pmxcfs[2124]: [dcdb] notice: received all states
May 22 17:01:26 px-node-1 pmxcfs[2124]: [dcdb] notice: leader is 2/1493
May 22 17:01:26 px-node-1 pmxcfs[2124]: [dcdb] notice: synced members: 2/1493, 3/1515
May 22 17:01:26 px-node-1 pmxcfs[2124]: [dcdb] notice: waiting for updates from leader
May 22 17:01:26 px-node-1 pmxcfs[2124]: [status] notice: received all states
May 22 17:01:26 px-node-1 pmxcfs[2124]: [status] notice: all data is up to date
May 22 17:01:26 px-node-1 pmxcfs[2124]: [dcdb] notice: update complete - trying to commit (got 2 inode updates)
May 22 17:01:26 px-node-1 pmxcfs[2124]: [dcdb] notice: all data is up to date
May 22 17:01:27 px-node-1 systemd[1]: Started The Proxmox VE cluster filesystem.
May 22 17:01:27 px-node-1 systemd[1]: Starting PVE activate Ceph OSD disks...
May 22 17:01:27 px-node-1 kernel: [ 179.935215] XFS (sdc1): Filesystem has duplicate UUID 0de83491-2068-444f-a4c1-9aec356d9e68 - can't mount
May 22 17:01:27 px-node-1 ceph-disk[2132]: mount: wrong fs type, bad option, bad superblock on /dev/sdc1,
May 22 17:01:27 px-node-1 ceph-disk[2132]: missing codepage or helper program, or other error
May 22 17:01:27 px-node-1 ceph-disk[2132]: In some cases useful info is found in syslog - try
May 22 17:01:27 px-node-1 ceph-disk[2132]: dmesg | tail or so.
May 22 17:01:27 px-node-1 ceph-disk[2132]: ceph-disk: Mounting filesystem failed: Command '['/bin/mount', '-t', u'xfs', '-o', 'noatime,inode64', '--', '/dev/disk/by-parttypeuuid/4fbd7e29-9d25-41b8-afd0-062c0ceff05d.e5898c81-b53b-4142-8e62-bdbe2c82b617', '/var/lib/ceph/tmp/mnt.FCxl2X']' returned non-zero exit status 32
May 22 17:01:27 px-node-1 ceph-disk[2132]: ceph-disk: Error: One or more partitions failed to activate
May 22 17:01:27 px-node-1 systemd[1]: ceph.service: main process exited, code=exited, status=1/FAILURE
May 22 17:01:27 px-node-1 systemd[1]: Failed to start PVE activate Ceph OSD disks.
May 22 17:01:27 px-node-1 systemd[1]: Unit ceph.service entered failed state.
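The XFS "duplicate UUID" error above suggests two block devices carry the same filesystem UUID, so the OSD partition refuses to mount. As a way to confirm that, here is a sketch that finds UUIDs appearing on more than one device in `blkid`-style output; the sample input stands in for real `blkid` output (`/dev/sdb1` is a hypothetical second device, the UUID is the one from the kernel log):

```shell
# Sketch, assuming blkid-style 'DEV: UUID="..."' lines: print any UUID
# that appears on more than one device. Feed it `blkid` output on a
# real node; the printf below is illustrative sample data only.
find_dup_uuids() {
  sed -n 's/.*UUID="\([^"]*\)".*/\1/p' | sort | uniq -d
}

printf '%s\n' \
  '/dev/sdb1: UUID="0de83491-2068-444f-a4c1-9aec356d9e68" TYPE="xfs"' \
  '/dev/sdc1: UUID="0de83491-2068-444f-a4c1-9aec356d9e68" TYPE="xfs"' \
  | find_dup_uuids
# prints the duplicated UUID once
```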
I don't know what is wrong. How can I repair the cluster?
In addition:
Code:
17:09:17 /var/log # pveversion -v root@px-node-1
zsh: command not found: pveversion
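A missing command like this could be a PATH problem in my zsh session rather than a missing binary. A minimal sketch to tell the two apart (the `/usr/bin/pveversion` path and the `pve-manager` package name are assumptions based on standard Proxmox packaging):

```shell
# Sketch: distinguish "not on $PATH" from "not installed at all".
# The absolute path and package name below are assumptions.
check_cmd() {
  # $1 = command name, $2 = expected absolute path
  if command -v "$1" >/dev/null 2>&1; then
    echo "on PATH"
  elif [ -x "$2" ]; then
    echo "installed, but not on PATH"
  else
    echo "binary absent"
  fi
}

check_cmd pveversion /usr/bin/pveversion
# if it prints "binary absent", check the package: dpkg -l pve-manager
```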
Any ideas, please?
EDIT: I discovered this:
Code:
17:12:28 /var/log # systemctl status pvenetcommit.service -l root@px-node-1
● pvenetcommit.service
Loaded: not-found (Reason: No such file or directory)
Active: inactive (dead)
May 22 16:59:08 px-node-1 systemd[1]: Cannot add dependency job for unit pvenetcommit.service, ignoring: Unit pvenetcommit.service failed to load: No such file or directory.
May 22 16:59:08 px-node-1 systemd[1]: Cannot add dependency job for unit pvenetcommit.service, ignoring: Unit pvenetcommit.service failed to load: No such file or directory.
. . . . .
May 22 17:08:20 px-node-1 systemd[1]: Cannot add dependency job for unit pvenetcommit.service, ignoring: Unit pvenetcommit.service failed to load: No such file or directory.
May 22 17:08:20 px-node-1 systemd[1]: Cannot add dependency job for unit pvenetcommit.service, ignoring: Unit pvenetcommit.service failed to load: No such file or directory.
May 22 17:08:20 px-node-1 systemd[1]: Cannot add dependency job for unit pvenetcommit.service, ignoring: Unit pvenetcommit.service failed to load: No such file or directory.
May 22 17:08:20 px-node-1 systemd[1]: Cannot add dependency job for unit pvenetcommit.service, ignoring: Unit pvenetcommit.service failed to load: No such file or directory.
Thank you