Hi,
On a 3-node cluster (with HA VMs), for a few days now I have been getting Totem retransmits every day at 06:00 (lasting a few seconds).
Code:
...
Aug 08 06:00:05 dc-prox-13 corosync[2378]: notice [TOTEM ] Retransmit List: 26c62fb
Aug 08 06:00:05 dc-prox-13 corosync[2378]: [TOTEM ] Retransmit List: 26c62fb
Aug 08 06:00:05 dc-prox-13 corosync[2378]: [TOTEM ] Retransmit List: 26c62fb
Aug 08 06:00:05 dc-prox-13 corosync[2378]: notice [TOTEM ] Retransmit List: 26c62fb
Aug 08 06:00:05 dc-prox-13 corosync[2378]: notice [TOTEM ] Retransmit List: 26c62fb
Aug 08 06:00:05 dc-prox-13 corosync[2378]: [TOTEM ] Retransmit List: 26c62fb
Aug 08 06:00:05 dc-prox-13 corosync[2378]: notice [TOTEM ] Retransmit List: 26c62fb
Aug 08 06:00:05 dc-prox-13 corosync[2378]: [TOTEM ] Retransmit List: 26c62fb
Aug 08 06:00:05 dc-prox-13 corosync[2378]: notice [TOTEM ] Retransmit List: 26c62fb 26c62fc
Aug 08 06:00:05 dc-prox-13 corosync[2378]: [TOTEM ] Retransmit List: 26c62fb 26c62fc
Aug 08 06:00:05 dc-prox-13 corosync[2378]: notice [TOTEM ] Retransmit List: 26c62fb 26c62fc
...
Code:
...
Aug 08 06:00:05 dc-prox-06 corosync[2278]: notice [TOTEM ] Retransmit List: 26c62fb
Aug 08 06:00:05 dc-prox-06 corosync[2278]: notice [TOTEM ] Retransmit List: 26c62fb
Aug 08 06:00:05 dc-prox-06 corosync[2278]: [TOTEM ] Retransmit List: 26c62fb
Aug 08 06:00:05 dc-prox-06 corosync[2278]: [TOTEM ] Retransmit List: 26c62fb
Aug 08 06:00:05 dc-prox-06 corosync[2278]: notice [TOTEM ] Retransmit List: 26c62fb
Aug 08 06:00:05 dc-prox-06 corosync[2278]: [TOTEM ] Retransmit List: 26c62fb
Aug 08 06:00:05 dc-prox-06 corosync[2278]: notice [TOTEM ] Retransmit List: 26c62fb
Aug 08 06:00:05 dc-prox-06 corosync[2278]: [TOTEM ] Retransmit List: 26c62fb
Aug 08 06:00:05 dc-prox-06 corosync[2278]: notice [TOTEM ] Retransmit List: 26c62fb 26c62fc
Aug 08 06:00:05 dc-prox-06 corosync[2278]: [TOTEM ] Retransmit List: 26c62fb 26c62fc
Aug 08 06:00:05 dc-prox-06 corosync[2278]: notice [TOTEM ] Retransmit List: 26c62fb 26c62fc
...
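To see whether corosync itself notices anything beyond the retransmits, my plan is to check the ring status and quorum on each node right when it happens. A minimal check, using the standard tools:

Code:
# status of the totem ring(s) as seen by this node
corosync-cfgtool -s
# cluster membership and quorum as Proxmox sees it
pvecm status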
These logs appear on 2 of the 3 nodes: dc-prox-06 and dc-prox-13 run 10 VMs each, while the last one, dc-prox-07, hosts no VMs. Here is dc-prox-07's log around that time:
Code:
...
Aug 08 05:32:40 dc-prox-07 rrdcached[2014]: started new journal /var/lib/rrdcached/journal/rrd.journal.1533699160.906147
Aug 08 05:32:40 dc-prox-07 rrdcached[2014]: removing old journal /var/lib/rrdcached/journal/rrd.journal.1533691960.906145
Aug 08 05:58:22 dc-prox-07 pmxcfs[2065]: [dcdb] notice: data verification successful
Aug 08 06:17:01 dc-prox-07 CRON[25764]: pam_unix(cron:session): session opened for user root by (uid=0)
Aug 08 06:17:01 dc-prox-07 CRON[25765]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Aug 08 06:17:01 dc-prox-07 CRON[25764]: pam_unix(cron:session): session closed for user root
Aug 08 06:25:01 dc-prox-07 CRON[27769]: pam_unix(cron:session): session opened for user root by (uid=0)
Aug 08 06:25:01 dc-prox-07 CRON[27770]: (root) CMD (test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily ))
...
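Since the retransmits always hit at exactly 06:00, I want to look for anything scheduled at that time on all three nodes. Something along these lines (paths are the standard Debian ones):

Code:
# cron jobs firing at minute 0, hour 6 in the system crontabs
grep -rn '^0[[:blank:]]\+6[[:blank:]]' /etc/crontab /etc/cron.d/
# systemd timers that could line up with 06:00
systemctl list-timers --all
# Proxmox backup schedule, if any jobs are defined there
cat /etc/pve/vzdump.cron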
There is no heavy load (CPU or network) at 06:00, and everything works as expected during the rest of the day, even under high CPU/network load.
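For the record, the "no load" claim is based on the GUI graphs; assuming sysstat is installed and collecting, I could double-check the window around 06:00 like this:

Code:
# CPU and per-interface network stats between 05:55 and 06:05
# (requires the sysstat package with data collection enabled)
sar -u -s 05:55:00 -e 06:05:00
sar -n DEV -s 05:55:00 -e 06:05:00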
I know I'm not running the latest Proxmox version, but I have to plan a BIOS update first.
All nodes are running the same packages:
Code:
root@dc-prox-06:/var/log# pveversion -v
proxmox-ve: 5.1-32 (running kernel: 4.13.13-2-pve)
pve-manager: 5.1-41 (running version: 5.1-41/0b958203)
pve-kernel-4.13.4-1-pve: 4.13.4-26
pve-kernel-4.13.13-2-pve: 4.13.13-32
pve-kernel-4.13.13-1-pve: 4.13.13-31
libpve-http-server-perl: 2.0-8
lvm2: 2.02.168-pve6
corosync: 2.4.2-pve3
libqb0: 1.0.1-1
pve-cluster: 5.0-19
qemu-server: 5.0-18
pve-firmware: 2.0-3
libpve-common-perl: 5.0-25
libpve-guest-common-perl: 2.0-14
libpve-access-control: 5.0-7
libpve-storage-perl: 5.0-17
pve-libspice-server1: 0.12.8-3
vncterm: 1.5-3
pve-docs: 5.1-12
pve-qemu-kvm: 2.9.1-5
pve-container: 2.0-18
pve-firewall: 3.0-5
pve-ha-manager: 2.0-4
ksm-control-daemon: 1.2-2
glusterfs-client: 3.8.8-1
lxc-pve: 2.1.1-2
lxcfs: 2.0.8-1
criu: 2.11.1-1~bpo90
novnc-pve: 0.6-4
smartmontools: 6.5+svn4324-1
zfsutils-linux: 0.7.3-pve1~bpo9
ceph: 12.2.2-pve1
Code:
root@dc-prox-07:~# pveversion -v
proxmox-ve: 5.1-32 (running kernel: 4.13.13-2-pve)
pve-manager: 5.1-41 (running version: 5.1-41/0b958203)
pve-kernel-4.13.4-1-pve: 4.13.4-26
pve-kernel-4.13.13-2-pve: 4.13.13-32
pve-kernel-4.10.17-4-pve: 4.10.17-24
pve-kernel-4.10.17-2-pve: 4.10.17-20
pve-kernel-4.10.15-1-pve: 4.10.15-15
pve-kernel-4.13.13-1-pve: 4.13.13-31
pve-kernel-4.10.17-3-pve: 4.10.17-23
libpve-http-server-perl: 2.0-8
lvm2: 2.02.168-pve6
corosync: 2.4.2-pve3
libqb0: 1.0.1-1
pve-cluster: 5.0-19
qemu-server: 5.0-18
pve-firmware: 2.0-3
libpve-common-perl: 5.0-25
libpve-guest-common-perl: 2.0-14
libpve-access-control: 5.0-7
libpve-storage-perl: 5.0-17
pve-libspice-server1: 0.12.8-3
vncterm: 1.5-3
pve-docs: 5.1-12
pve-qemu-kvm: 2.9.1-5
pve-container: 2.0-18
pve-firewall: 3.0-5
pve-ha-manager: 2.0-4
ksm-control-daemon: 1.2-2
glusterfs-client: 3.8.8-1
lxc-pve: 2.1.1-2
lxcfs: 2.0.8-1
criu: 2.11.1-1~bpo90
novnc-pve: 0.6-4
smartmontools: 6.5+svn4324-1
zfsutils-linux: 0.7.3-pve1~bpo9
ceph: 12.2.2-pve1
Code:
root@dc-prox-13:~# pveversion -v
proxmox-ve: 5.1-32 (running kernel: 4.13.13-2-pve)
pve-manager: 5.1-41 (running version: 5.1-41/0b958203)
pve-kernel-4.13.4-1-pve: 4.13.4-26
pve-kernel-4.13.13-2-pve: 4.13.13-32
pve-kernel-4.10.17-4-pve: 4.10.17-24
pve-kernel-4.10.17-2-pve: 4.10.17-20
pve-kernel-4.10.15-1-pve: 4.10.15-15
pve-kernel-4.13.13-1-pve: 4.13.13-31
pve-kernel-4.10.17-3-pve: 4.10.17-23
pve-kernel-4.10.17-1-pve: 4.10.17-18
libpve-http-server-perl: 2.0-8
lvm2: 2.02.168-pve6
corosync: 2.4.2-pve3
libqb0: 1.0.1-1
pve-cluster: 5.0-19
qemu-server: 5.0-18
pve-firmware: 2.0-3
libpve-common-perl: 5.0-25
libpve-guest-common-perl: 2.0-14
libpve-access-control: 5.0-7
libpve-storage-perl: 5.0-17
pve-libspice-server1: 0.12.8-3
vncterm: 1.5-3
pve-docs: 5.1-12
pve-qemu-kvm: 2.9.1-5
pve-container: 2.0-18
pve-firewall: 3.0-5
pve-ha-manager: 2.0-4
ksm-control-daemon: 1.2-2
glusterfs-client: 3.8.8-1
lxc-pve: 2.1.1-2
lxcfs: 2.0.8-1
criu: 2.11.1-1~bpo90
novnc-pve: 0.6-4
smartmontools: 6.5+svn4324-1
zfsutils-linux: 0.7.3-pve1~bpo9
ceph: 12.2.2-pve1
Corosync uses a dedicated VLAN, but yes, it is on the same interface (a 2x1G LACP bond) that we use for bridging VMs (vmbr). As I said, though, there is no traffic and no load at 06:00.
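If it turns out to be a network hiccup on the bond, I could also run a multicast test on the corosync VLAN between the nodes; the hostnames below are placeholders for our three ring addresses:

Code:
# run simultaneously on all three nodes (~10 min at 1 packet/s);
# replace the names with the actual corosync ring0 addresses
omping -c 600 -i 1 -q dc-prox-06-ring0 dc-prox-07-ring0 dc-prox-13-ring0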
Do you think the issue is located on dc-prox-07?
What can I do?
Thanks a lot!