Good evening,
I've actually handled considerably more complex setups before, but at home I'm seeing a very curious symptom.
A quick overview of the topology:
hpve1: cluster master (cluster 'HomeCluster', LACP bond, 192.168.1.201)
hpve2: cluster node (LACP bond, 192.168.1.202)
When I restart the corosync service on both systems, everything looks fine for 2-3 minutes, but then hpve2 always drops out of the cluster.
I found the following in the syslog. How can I fix the problem?
hpve1:
Code:
Jul 20 01:04:01 hpve1 systemd[1]: Starting Proxmox VE replication runner...
Jul 20 01:04:15 hpve1 systemd[1]: Started Proxmox VE replication runner.
Jul 20 01:04:17 hpve1 pvestatd[4780]: status update time (7.024 seconds)
Jul 20 01:04:27 hpve1 corosync[22657]: error [TOTEM ] FAILED TO RECEIVE
Jul 20 01:04:27 hpve1 corosync[22657]: [TOTEM ] FAILED TO RECEIVE
Jul 20 01:04:28 hpve1 corosync[22657]: notice [TOTEM ] A new membership (192.168.1.201:3444) was formed. Members left: 2
Jul 20 01:04:28 hpve1 corosync[22657]: notice [TOTEM ] Failed to receive the leave message. failed: 2
Jul 20 01:04:28 hpve1 corosync[22657]: notice [QUORUM] This node is within the non-primary component and will NOT provide any services.
Jul 20 01:04:28 hpve1 corosync[22657]: notice [QUORUM] Members[1]: 1
Jul 20 01:04:28 hpve1 corosync[22657]: notice [MAIN ] Completed service synchronization, ready to provide service.
Jul 20 01:04:28 hpve1 corosync[22657]: [TOTEM ] A new membership (192.168.1.201:3444) was formed. Members left: 2
Jul 20 01:04:28 hpve1 corosync[22657]: [TOTEM ] Failed to receive the leave message. failed: 2
Jul 20 01:04:28 hpve1 corosync[22657]: [QUORUM] This node is within the non-primary component and will NOT provide any services.
Jul 20 01:04:28 hpve1 corosync[22657]: [QUORUM] Members[1]: 1
Jul 20 01:04:28 hpve1 corosync[22657]: [MAIN ] Completed service synchronization, ready to provide service.
Jul 20 01:04:28 hpve1 pmxcfs[12597]: [status] notice: node lost quorum
Jul 20 01:04:28 hpve1 pmxcfs[12597]: [dcdb] notice: members: 1/12597
Jul 20 01:04:28 hpve1 pmxcfs[12597]: [status] notice: members: 1/12597
Jul 20 01:04:28 hpve1 pmxcfs[12597]: [dcdb] crit: received write while not quorate - trigger resync
Jul 20 01:04:28 hpve1 pmxcfs[12597]: [dcdb] crit: leaving CPG group
Jul 20 01:04:28 hpve1 pve-ha-lrm[4805]: unable to write lrm status file - unable to open file '/etc/pve/nodes/hpve1/lrm_status.tmp.4805' - Permission denied
Jul 20 01:04:29 hpve1 pmxcfs[12597]: [dcdb] notice: start cluster connection
Jul 20 01:04:29 hpve1 pmxcfs[12597]: [dcdb] notice: members: 1/12597
Jul 20 01:04:29 hpve1 pmxcfs[12597]: [dcdb] notice: all data is up to date
Jul 20 01:05:00 hpve1 systemd[1]: Starting Proxmox VE replication runner...
Jul 20 01:05:04 hpve1 systemd[1]: Started Proxmox VE replication runner.
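The hpve1 log shows the node ending up alone in a "non-primary component" and losing quorum as soon as hpve2 leaves. The Python sketch below reproduces the arithmetic behind that, assuming corosync's votequorum defaults (one vote per node, two expected votes, no `two_node` option): a partition needs `expected_votes // 2 + 1` votes, so in a two-node cluster neither node can stay quorate on its own.

```python
# Minimal sketch of votequorum's default majority rule.
# Assumptions: one vote per node, no two_node / wait_for_all options.


def quorum(expected_votes: int) -> int:
    """Votes a partition needs in order to be quorate."""
    return expected_votes // 2 + 1


def is_quorate(partition_votes: int, expected_votes: int) -> bool:
    """True if a partition with `partition_votes` votes has quorum."""
    return partition_votes >= quorum(expected_votes)


if __name__ == "__main__":
    expected = 2  # hpve1 + hpve2, one vote each (assumption)
    print("votes needed:", quorum(expected))
    print("one node alone quorate:", is_quorate(1, expected))
    print("both nodes quorate:", is_quorate(2, expected))
```

This also fits the `Permission denied` from pve-ha-lrm: without quorum, pmxcfs makes /etc/pve read-only.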
hpve2:
Code:
Jul 20 01:04:27 hpve2 corosync[3908]: [TOTEM ] Retransmit List: 3a5 3a6 3a7
Jul 20 01:04:27 hpve2 corosync[3908]: [TOTEM ] Retransmit List: 3a5 3a6 3a7
Jul 20 01:04:27 hpve2 corosync[3908]: [TOTEM ] Retransmit List: 3a5 3a6 3a7
Jul 20 01:04:27 hpve2 corosync[3908]: [TOTEM ] A new membership (192.168.1.202:3444) was formed. Members left: 1
Jul 20 01:04:27 hpve2 corosync[3908]: [TOTEM ] Failed to receive the leave message. failed: 1
Jul 20 01:04:27 hpve2 corosync[3908]: [QUORUM] This node is within the non-primary component and will NOT provide any services.
Jul 20 01:04:27 hpve2 corosync[3908]: [QUORUM] Members[1]: 2
Jul 20 01:04:27 hpve2 corosync[3908]: [MAIN ] Completed service synchronization, ready to provide service.
Jul 20 01:04:28 hpve2 pvestatd[3146]: could not activate storage 'LocalSpace', zfs error: cannot import 'rpool': no such pool available
Jul 20 01:04:28 hpve2 corosync[3908]: notice [TOTEM ] A new membership (192.168.1.202:3448) was formed. Members
Jul 20 01:04:28 hpve2 corosync[3908]: [TOTEM ] A new membership (192.168.1.202:3448) was formed. Members
Jul 20 01:04:28 hpve2 corosync[3908]: notice [QUORUM] Members[1]: 2
Jul 20 01:04:28 hpve2 corosync[3908]: notice [MAIN ] Completed service synchronization, ready to provide service.
Jul 20 01:04:28 hpve2 corosync[3908]: [QUORUM] Members[1]: 2
Jul 20 01:04:28 hpve2 corosync[3908]: [MAIN ] Completed service synchronization, ready to provide service.
Jul 20 01:04:30 hpve2 corosync[3908]: notice [TOTEM ] A new membership (192.168.1.202:3452) was formed. Members
Jul 20 01:04:30 hpve2 corosync[3908]: [TOTEM ] A new membership (192.168.1.202:3452) was formed. Members
Jul 20 01:04:30 hpve2 corosync[3908]: notice [QUORUM] Members[1]: 2
Jul 20 01:04:30 hpve2 corosync[3908]: notice [MAIN ] Completed service synchronization, ready to provide service.
Jul 20 01:04:30 hpve2 corosync[3908]: [QUORUM] Members[1]: 2
Jul 20 01:04:30 hpve2 corosync[3908]: [MAIN ] Completed service synchronization, ready to provide service.
Jul 20 01:04:31 hpve2 corosync[3908]: notice [TOTEM ] A new membership (192.168.1.202:3456) was formed. Members
Jul 20 01:04:31 hpve2 corosync[3908]: [TOTEM ] A new membership (192.168.1.202:3456) was formed. Members
Jul 20 01:04:31 hpve2 corosync[3908]: notice [QUORUM] Members[1]: 2
Jul 20 01:04:31 hpve2 corosync[3908]: notice [MAIN ] Completed service synchronization, ready to provide service.
Jul 20 01:04:31 hpve2 corosync[3908]: [QUORUM] Members[1]: 2
Jul 20 01:04:31 hpve2 corosync[3908]: [MAIN ] Completed service synchronization, ready to provide service.
Jul 20 01:04:37 hpve2 pvestatd[3146]: could not activate storage 'LocalSpace', zfs error: cannot import 'rpool': no such pool available
Jul 20 01:04:47 hpve2 pvestatd[3146]: could not activate storage 'LocalSpace', zfs error: cannot import 'rpool': no such pool available
Jul 20 01:04:57 hpve2 pvestatd[3146]: could not activate storage 'LocalSpace', zfs error: cannot import 'rpool': no such pool available
Jul 20 01:05:00 hpve2 systemd[1]: Starting Proxmox VE replication runner...
Jul 20 01:05:02 hpve2 systemd[1]: Started Proxmox VE replication runner.
Jul 20 01:05:07 hpve2 pvestatd[3146]: could not activate storage 'LocalSpace', zfs error: cannot import 'rpool': no such pool available
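To line up the two excerpts, a small Python sketch can merge both logs and keep only the events of interest. The marker strings are copied from the lines above; the regex assumes classic syslog timestamps, and because syslog omits the year, a hypothetical placeholder (`ASSUMED_YEAR`) is prepended before parsing. The merged order is only meaningful if both nodes' clocks are in sync.

```python
import re
from datetime import datetime

# Hypothetical placeholder: classic syslog lines carry no year.
ASSUMED_YEAR = 2000

# e.g. "Jul 20 01:04:27 hpve2 corosync[3908]: [TOTEM ] Retransmit List: 3a5 3a6 3a7"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3} +\d+ \d\d:\d\d:\d\d) (?P<host>\S+) "
    r"(?P<proc>[^\[:]+)(?:\[\d+\])?: (?P<msg>.*)$"
)

# Message substrings taken from the logs above, mapped to short labels.
MARKERS = {
    "FAILED TO RECEIVE": "token lost",
    "Retransmit List": "retransmits",
    "A new membership": "membership change",
    "non-primary component": "quorum lost",
    "cannot import": "storage error",
}


def events(lines):
    """Yield (timestamp, host, label) for each line containing a known marker."""
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not a syslog-style line
        for needle, label in MARKERS.items():
            if needle in m["msg"]:
                ts = datetime.strptime(f"{ASSUMED_YEAR} {m['ts']}", "%Y %b %d %H:%M:%S")
                yield ts, m["host"], label
                break  # one label per line


def timeline(*logs):
    """Merge several logs; sorted() is stable, so same-second lines keep log order."""
    merged = [event for log in logs for event in events(log)]
    return sorted(merged, key=lambda event: event[0])


if __name__ == "__main__":
    hpve1 = [
        "Jul 20 01:04:27 hpve1 corosync[22657]: [TOTEM ] FAILED TO RECEIVE",
        "Jul 20 01:04:28 hpve1 corosync[22657]: [QUORUM] This node is within the non-primary component and will NOT provide any services.",
    ]
    hpve2 = [
        "Jul 20 01:04:27 hpve2 corosync[3908]: [TOTEM ] Retransmit List: 3a5 3a6 3a7",
        "Jul 20 01:04:27 hpve2 corosync[3908]: [TOTEM ] A new membership (192.168.1.202:3444) was formed. Members left: 1",
        "Jul 20 01:04:28 hpve2 pvestatd[3146]: could not activate storage 'LocalSpace', zfs error: cannot import 'rpool': no such pool available",
    ]
    for ts, host, label in timeline(hpve1, hpve2):
        print(ts.strftime("%H:%M:%S"), host, label)
```

In the excerpts above, hpve2 forms its single-node membership at 01:04:27, and the first `cannot import 'rpool'` message in the hpve2 excerpt appears at 01:04:28.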
Is it really because the node can't find that one ZFS pool?
I suspect the cause lies on the master, but where do I start?
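Regarding the ZFS question: the `could not activate storage 'LocalSpace'` lines are logged by pvestatd, not by corosync. Because /etc/pve/storage.cfg is shared by all cluster nodes, a storage without a `nodes` property is treated as available on every node, so hpve2 tries to activate a pool it may not have. The Python sketch below parses a simplified storage.cfg and lists which storages apply to a given node; the sample config, including `pool rpool`, is hypothetical and not taken from this cluster.

```python
import re

# Section header such as "zfspool: LocalSpace" (type, colon, storage ID).
HEADER_RE = re.compile(r"^(?P<type>[\w-]+):\s*(?P<sid>\S+)\s*$")


def parse_storage_cfg(text):
    """Simplified parser: {storage_id: {"type": ..., property: value, ...}}."""
    storages, current = {}, None
    for raw in text.splitlines():
        if not raw.strip() or raw.lstrip().startswith("#"):
            continue  # skip blank lines and comments
        header = HEADER_RE.match(raw)
        if header:
            current = {"type": header["type"]}
            storages[header["sid"]] = current
        elif current is not None:
            # Indented property line: "key value" (some keys have no value).
            key, _, value = raw.strip().partition(" ")
            current[key] = value.strip()
    return storages


def storages_for_node(storages, node):
    """IDs of storages with no `nodes` property or with `node` listed in it."""
    result = []
    for sid, props in storages.items():
        nodes = props.get("nodes")
        if nodes is None or node in [n.strip() for n in nodes.split(",")]:
            result.append(sid)
    return result


if __name__ == "__main__":
    # Hypothetical storage.cfg; the real LocalSpace entry may differ.
    sample = (
        "dir: local\n"
        "\tpath /var/lib/vz\n"
        "\tcontent iso,vztmpl,backup\n"
        "\n"
        "zfspool: LocalSpace\n"
        "\tpool rpool\n"
        "\tcontent images,rootdir\n"
    )
    cfg = parse_storage_cfg(sample)
    print(storages_for_node(cfg, "hpve2"))  # LocalSpace is included: no `nodes` set
```

If LocalSpace really exists only on hpve1, restricting it to that node (for example with `pvesm set LocalSpace --nodes hpve1`) should stop pvestatd from probing it on hpve2.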
Best regards,
René