Hello guys, how are you?
My name is Marcos and I provide support for a small family business that runs Proxmox. We have 4 nodes and Ceph to provide high availability.
The other day we had a power failure, and all the nodes went down after the UPS units had held the load for more than 10 hours.
I wasn't at the company and no one notified me, so the nodes probably shut down suddenly.
I have the following error:
Code:
pvecm status
Cannot initialize CMAP service
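From what I understand (someone please correct me if I'm wrong), pvecm talks to the CMAP service that corosync provides, so this error probably just means corosync itself isn't running on this node.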
After that I ran: systemctl status corosync.service
Code:
● corosync.service - Corosync Cluster Engine
Loaded: loaded (/lib/systemd/system/corosync.service; enabled; vendor preset: enabled)
Active: failed (Result: timeout) since Fri 2018-11-30 20:35:41 -02; 2 days ago
Docs: man:corosync
man:corosync.conf
man:corosync_overview
Process: 11428 ExecStart=/usr/sbin/corosync -f $COROSYNC_OPTIONS (code=killed, signal=TERM)
Main PID: 11428 (code=killed, signal=TERM)
Nov 30 20:34:11 kimenz1 systemd[1]: Starting Corosync Cluster Engine...
Nov 30 20:34:11 kimenz1 corosync[11428]: [MAIN ] Corosync Cluster Engine ('2.4.2-dirty'): started and ready to provide service.
Nov 30 20:34:11 kimenz1 corosync[11428]: [MAIN ] Corosync built-in features: dbus rdma monitoring watchdog augeas systemd upstart xmlconf qdevices qnetd snmp pie relro bindnow
Nov 30 20:34:11 kimenz1 corosync[11428]: notice [MAIN ] Corosync Cluster Engine ('2.4.2-dirty'): started and ready to provide service.
Nov 30 20:34:11 kimenz1 corosync[11428]: info [MAIN ] Corosync built-in features: dbus rdma monitoring watchdog augeas systemd upstart xmlconf qdevices qnetd snmp pie relro bindnow
Nov 30 20:35:41 kimenz1 systemd[1]: corosync.service: Start operation timed out. Terminating.
Nov 30 20:35:41 kimenz1 systemd[1]: Failed to start Corosync Cluster Engine.
Nov 30 20:35:41 kimenz1 systemd[1]: corosync.service: Unit entered failed state.
Nov 30 20:35:41 kimenz1 systemd[1]: corosync.service: Failed with result 'timeout'.
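What I find strange is that corosync logs "started and ready to provide service", but systemd still kills it 90 seconds later with a start timeout. So I guess the daemon launches but never reaches whatever point systemd considers "started" (maybe it's waiting on the network or on cluster membership? Just guessing here).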
OK, so we have a Corosync failure, right? I tried to start Corosync manually, without success:
systemctl start corosync.service
Code:
Job for corosync.service failed because a timeout was exceeded.
See "systemctl status corosync.service" and "journalctl -xe" for details.
OK, that wasn't enough. As we say here in Brazil, we are Brazilians and we never quit, so I tried:
journalctl -xe
Code:
Dec 03 09:01:18 kimenz1 corosync[892]: info [MAIN ] Corosync built-in features: dbus rdma monitoring watchdog augeas systemd upstart xmlconf qdevices qnetd snmp pie relro bindnow
Dec 03 09:01:19 kimenz1 pvestatd[2221]: status update time (22.161 seconds)
Dec 03 09:02:00 kimenz1 systemd[1]: Starting Proxmox VE replication runner...
-- Subject: Unit pvesr.service has begun start-up
-- Defined-By: systemd
-- Support:
--
-- Unit pvesr.service has begun starting up.
Dec 03 09:02:01 kimenz1 systemd[1]: Started Proxmox VE replication runner.
-- Subject: Unit pvesr.service has finished start-up
-- Defined-By: systemd
-- Support:
--
-- Unit pvesr.service has finished starting up.
--
-- The start-up result is done.
Dec 03 09:02:04 kimenz1 pvestatd[2221]: status update time (15.132 seconds)
Dec 03 09:02:32 kimenz1 pvestatd[2221]: status update time (28.169 seconds)
Dec 03 09:02:48 kimenz1 systemd[1]: corosync.service: Start operation timed out. Terminating.
Dec 03 09:02:48 kimenz1 systemd[1]: Failed to start Corosync Cluster Engine.
-- Subject: Unit corosync.service has failed
-- Defined-By: systemd
-- Support:
--
-- Unit corosync.service has failed.
--
-- The result is failed.
Dec 03 09:02:48 kimenz1 systemd[1]: corosync.service: Unit entered failed state.
Dec 03 09:02:48 kimenz1 systemd[1]: corosync.service: Failed with result 'timeout'.
Dec 03 09:03:00 kimenz1 systemd[1]: Starting Proxmox VE replication runner...
-- Subject: Unit pvesr.service has begun start-up
-- Defined-By: systemd
-- Support:
--
-- Unit pvesr.service has begun starting up.
Dec 03 09:03:01 kimenz1 systemd[1]: Started Proxmox VE replication runner.
-- Subject: Unit pvesr.service has finished start-up
-- Defined-By: systemd
-- Support:
--
-- Unit pvesr.service has finished starting up.
--
-- The start-up result is done.
Dec 03 09:03:14 kimenz1 pvestatd[2221]: status update time (12.180 seconds)
Dec 03 09:03:29 kimenz1 pvestatd[2221]: status update time (15.220 seconds)
Dec 03 09:03:57 kimenz1 pvestatd[2221]: status update time (18.118 seconds)
Dec 03 09:04:00 kimenz1 systemd[1]: Starting Proxmox VE replication runner...
-- Subject: Unit pvesr.service has begun start-up
-- Defined-By: systemd
-- Support:
--
-- Unit pvesr.service has begun starting up.
Dec 03 09:04:01 kimenz1 systemd[1]: Started Proxmox VE replication runner.
-- Subject: Unit pvesr.service has finished start-up
-- Defined-By: systemd
-- Support:
--
-- Unit pvesr.service has finished starting up.
--
-- The start-up result is done.
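The -xe output is full of pvesr and pvestatd noise, so I also want to pull out just the corosync messages (I believe this is the right journalctl filter, but I'm not 100% sure about the flags):
Code:
# show only corosync messages from the current boot
journalctl -b -u corosync.service --no-pager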
Now I don't know where else to go, so I came here.
So please, if anyone can help me, here is the /etc/pve/corosync.conf file:
Code:
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: kimenz4
    nodeid: 4
    quorum_votes: 1
    ring0_addr: kimenz4
  }
  node {
    name: kimenz1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: kimenz1
  }
  node {
    name: kimenz3
    nodeid: 2
    quorum_votes: 1
    ring0_addr: kimenz3
  }
  node {
    name: kimenz5
    nodeid: 3
    quorum_votes: 1
    ring0_addr: kimenz5
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: kimenz
  config_version: 6
  ip_version: ipv4
  secauth: on
  version: 2
  interface {
    bindnetaddr: 192.168.1.161
    ringnumber: 0
  }
}
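There are two things I want to rule out myself, but I'd appreciate confirmation that they make sense. First, since ring0_addr uses hostnames, maybe name resolution broke after the restart. I believe getent shows what corosync would resolve:
Code:
# check that every node name still resolves on this host
getent hosts kimenz1 kimenz3 kimenz4 kimenz5
Second, I read that corosync 2.x uses multicast by default, and our switches were also power-cycled during the outage. The Proxmox wiki mentions omping for testing multicast between nodes; it has to run on all nodes at the same time, and the parameters below are just the example from the wiki:
Code:
# run this on all 4 nodes simultaneously; tests multicast for ~10 minutes
omping -c 600 -i 1 -q kimenz1 kimenz3 kimenz4 kimenz5
If multicast is broken (for example, if IGMP snooping on the switches came back with different settings after the outage), I guess that could explain corosync timing out on startup. But that's just my theory.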