Hello,
I have had a 3-node cluster running perfectly fine for months, and recently added a 4th node to it (same hardware, DL160 G9, and same configuration).
For the last few nights, at around 11:40-11:50 pm, the server reboots itself. Looking in the logs, I am struggling to see anything at that exact time apart from corosync retransmit notifications; however, these occur throughout the day.
pveversion -v
proxmox-ve: 5.0-20 (running kernel: 4.10.17-2-pve)
pve-manager: 5.0-30 (running version: 5.0-30/5ab26bc)
pve-kernel-4.10.17-2-pve: 4.10.17-20
libpve-http-server-perl: 2.0-6
lvm2: 2.02.168-pve3
corosync: 2.4.2-pve3
libqb0: 1.0.1-1
pve-cluster: 5.0-12
qemu-server: 5.0-15
pve-firmware: 2.0-2
libpve-common-perl: 5.0-16
libpve-guest-common-perl: 2.0-11
libpve-access-control: 5.0-6
libpve-storage-perl: 5.0-14
pve-libspice-server1: 0.12.8-3
vncterm: 1.5-2
pve-docs: 5.0-9
pve-qemu-kvm: 2.9.0-4
pve-container: 2.0-15
pve-firewall: 3.0-2
pve-ha-manager: 2.0-2
ksm-control-daemon: 1.2-2
glusterfs-client: 3.8.8-1
lxc-pve: 2.0.8-3
lxcfs: 2.0.7-pve4
criu: 2.11.1-1~bpo90
novnc-pve: 0.6-4
smartmontools: 6.5+svn4324-1
zfsutils-linux: 0.6.5.11-pve17~bpo90
ceph: 12.2.0-1~bpo90+1
syslog at point of reboot
Sep 1 23:36:00 cn04 systemd[1]: Starting Proxmox VE replication runner...
Sep 1 23:36:01 cn04 systemd[1]: Started Proxmox VE replication runner.
Sep 1 23:36:11 cn04 pmxcfs[1968]: [status] notice: received log
Sep 1 23:36:16 cn04 pmxcfs[1968]: [status] notice: received log
Sep 1 23:37:00 cn04 systemd[1]: Starting Proxmox VE replication runner...
Sep 1 23:37:01 cn04 systemd[1]: Started Proxmox VE replication runner.
Sep 1 23:37:11 cn04 pmxcfs[1968]: [status] notice: received log
Sep 1 23:37:16 cn04 pmxcfs[1968]: [status] notice: received log
Sep 1 23:38:00 cn04 systemd[1]: Starting Proxmox VE replication runner...
Sep 1 23:38:01 cn04 systemd[1]: Started Proxmox VE replication runner.
Sep 1 23:38:11 cn04 pmxcfs[1968]: [status] notice: received log
Sep 1 23:38:16 cn04 pmxcfs[1968]: [status] notice: received log
Sep 1 23:39:00 cn04 systemd[1]: Starting Proxmox VE replication runner...
Sep 1 23:39:01 cn04 systemd[1]: Started Proxmox VE replication runner.
Sep 1 23:39:11 cn04 pmxcfs[1968]: [status] notice: received log
Sep 1 23:39:16 cn04 pmxcfs[1968]: [status] notice: received log
Sep 1 23:40:00 cn04 systemd[1]: Starting Proxmox VE replication runner...
Sep 1 23:40:01 cn04 systemd[1]: Started Proxmox VE replication runner.
Sep 1 23:40:11 cn04 pmxcfs[1968]: [status] notice: received log
Sep 1 23:40:16 cn04 pmxcfs[1968]: [status] notice: received log
Sep 1 23:41:00 cn04 systemd[1]: Starting Proxmox VE replication runner...
Sep 1 23:41:01 cn04 systemd[1]: Started Proxmox VE replication runner.
Sep 1 23:41:11 cn04 pmxcfs[1968]: [status] notice: received log
Sep 1 23:41:16 cn04 pmxcfs[1968]: [status] notice: received log
Sep 1 23:42:00 cn04 systemd[1]: Starting Proxmox VE replication runner...
Sep 1 23:42:01 cn04 systemd[1]: Started Proxmox VE replication runner.
Sep 1 23:42:09 cn04 pmxcfs[1968]: [dcdb] notice: data verification successful
Sep 1 23:42:11 cn04 pmxcfs[1968]: [status] notice: received log
Sep 1 23:42:16 cn04 pmxcfs[1968]: [status] notice: received log
Sep 1 23:43:00 cn04 systemd[1]: Starting Proxmox VE replication runner...
Sep 1 23:43:01 cn04 systemd[1]: Started Proxmox VE replication runner.
Sep 1 23:43:11 cn04 pmxcfs[1968]: [status] notice: received log
Sep 1 23:43:16 cn04 pmxcfs[1968]: [status] notice: received log
Sep 1 23:43:49 cn04 corosync[1985]: notice [TOTEM ] Retransmit List: 2a97c5
Sep 1 23:43:49 cn04 corosync[1985]: [TOTEM ] Retransmit List: 2a97c5
Sep 1 23:44:00 cn04 systemd[1]: Starting Proxmox VE replication runner...
Sep 1 23:44:03 cn04 systemd[1]: Started Proxmox VE replication runner.
Sep 1 23:44:11 cn04 pmxcfs[1968]: [status] notice: received log
Sep 1 23:44:16 cn04 pmxcfs[1968]: [status] notice: received log
Sep 1 23:47:00 cn04 systemd-modules-load[593]: Inserted module 'iscsi_tcp'
Sep 1 23:47:00 cn04 kernel: [ 0.000000] Linux version 4.10.17-2-pve (root@nora) (gcc version 6.3.0 20170516 (Debian 6.3.0-18) ) #1 SMP PVE 4.10.17-20 (Mon, 14 Aug 2017 11:23:37 +0200) ()
Sep 1 23:47:00 cn04 systemd-modules-load[593]: Inserted module 'ib_iser'
Sep 1 23:47:00 cn04 kernel: [ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-4.10.17-2-pve root=/dev/mapper/pve-root ro quiet
Sep 1 23:47:00 cn04 kernel: [ 0.000000] KERNEL supported cpus:
Sep 1 23:47:00 cn04 kernel: [ 0.000000] Intel GenuineIntel
Sep 1 23:47:00 cn04 kernel: [ 0.000000] AMD AuthenticAMD
Sep 1 23:47:00 cn04 systemd-modules-load[593]: Inserted module 'vhost_net'
Sep 1 23:47:00 cn04 kernel: [ 0.000000] Centaur CentaurHauls
Sep 1 23:47:00 cn04 kernel: [ 0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
Sep 1 23:47:00 cn04 systemd-udevd[655]: Process '/bin/mount -t fusectl fusectl /sys/fs/fuse/connections' failed with exit code 32.
Sep 1 23:47:00 cn04 kernel: [ 0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
Sep 1 23:47:00 cn04 kernel: [ 0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
Sep 1 23:47:00 cn04 kernel: [ 0.000000] x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256
Sep 1 23:47:00 cn04 kernel: [ 0.000000] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'standard' format.
Sep 1 23:47:00 cn04 kernel: [ 0.000000] e820: BIOS-provided physical RAM map:
Sep 1 23:47:00 cn04 keyboard-setup.sh[586]: cannot open file /tmp/tmpkbd.qZry3Q
Sep 1 23:47:00 cn04 kernel: [ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x0000000000092fff] usable
Sep 1 23:47:00 cn04 kernel: [ 0.000000] BIOS-e820: [mem 0x0000000000093000-0x0000000000093fff] reserved
Sep 1 23:47:00 cn04 kernel: [ 0.000000] BIOS-e820: [mem 0x0000000000094000-0x000000000009ffff] usable
Sep 1 23:47:00 cn04 kernel: [ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x0000000062fe0fff] usable
Sep 1 23:47:00 cn04 kernel: [ 0.000000] BIOS-e820: [mem 0x0000000062fe1000-0x000000006b5e0fff] reserved
Sep 1 23:47:00 cn04 kernel: [ 0.000000] BIOS-e820: [mem 0x000000006b5e1000-0x000000006b5e1fff] usable
Sep 1 23:47:00 cn04 systemd[1]: Starting Flush Journal to Persistent Storage...
Sep 1 23:47:00 cn04 kernel: [ 0.000000] BIOS-e820: [mem 0x000000006b5e2000-0x000000006b662fff] reserved
Sep 1 23:47:00 cn04 kernel: [ 0.000000] BIOS-e820: [mem 0x000000006b663000-0x00000000784fefff] usable
Sep 1 23:47:00 cn04 kernel: [ 0.000000] BIOS-e820: [mem 0x00000000784ff000-0x00000000788fefff] reserved
Sep 1 23:47:00 cn04 kernel: [ 0.000000] BIOS-e820: [mem 0x00000000788ff000-0x00000000790fefff] type 20
Sep 1 23:47:00 cn04 kernel: [ 0.000000] BIOS-e820: [mem 0x00000000790ff000-0x00000000791fefff] reserved
Sep 1 23:47:00 cn04 kernel: [ 0.000000] BIOS-e820: [mem 0x00000000791ff000-0x000000007b5fefff] ACPI NVS
Sep 1 23:47:00 cn04 systemd[1]: Started Flush Journal to Persistent Storage.
Sep 1 23:47:00 cn04 kernel: [ 0.000000] BIOS-e820: [mem 0x000000007b5ff000-0x000000007b7fefff] ACPI data
Sep 1 23:47:00 cn04 kernel: [ 0.000000] BIOS-e820: [mem 0x000000007b7ff000-0x000000007b7fffff] usable
Sep 1 23:47:00 cn04 kernel: [ 0.000000] BIOS-e820: [mem 0x0000000080000000-0x000000008fffffff] reserved
Sep 1 23:47:00 cn04 kernel: [ 0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000407fffffff] usable
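To narrow down the exact window in which the node goes silent, a rough sketch like the following could scan the syslog for unusually long gaps between consecutive entries. The function name `find_gaps`, the 120-second threshold, and the year 2017 are assumptions for illustration only (syslog timestamps carry no year); it is not a Proxmox tool.

```python
from datetime import datetime, timedelta


def find_gaps(lines, threshold=timedelta(seconds=120), year=2017):
    """Return (line_before_gap, line_after_gap, gap) for every silent period.

    Assumes classic syslog lines starting with "Mon D HH:MM:SS".
    The year is an assumption, since syslog does not record it.
    """
    gaps = []
    prev_time, prev_line = None, None
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            continue  # not a timestamped syslog line
        try:
            stamp = datetime.strptime(
                f"{year} {parts[0]} {parts[1]} {parts[2]}",
                "%Y %b %d %H:%M:%S",
            )
        except ValueError:
            continue  # skip lines whose prefix is not a timestamp
        if prev_time is not None and stamp - prev_time > threshold:
            gaps.append((prev_line, line, stamp - prev_time))
        prev_time, prev_line = stamp, line
    return gaps


if __name__ == "__main__":
    # A few lines taken from the syslog excerpt above.
    sample = [
        "Sep 1 23:44:11 cn04 pmxcfs[1968]: [status] notice: received log",
        "Sep 1 23:44:16 cn04 pmxcfs[1968]: [status] notice: received log",
        "Sep 1 23:47:00 cn04 systemd-modules-load[593]: Inserted module 'iscsi_tcp'",
    ]
    for before, after, gap in find_gaps(sample):
        print(f"silent for {gap}:")
        print(f"  last entry : {before}")
        print(f"  next entry : {after}")
```

Run against the excerpt above, this flags the window between the last pmxcfs entry at 23:44:16 and the first boot message at 23:47:00, which is where nothing was logged before the node came back up.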