[SOLVED] pmxcfs: [main] notice: ignore duplicate, following date issue

lazynooblet

I replaced a CMOS battery a few days ago and fat-fingered the year in the BIOS before booting the node back up. The node came up with the year set to 2024; I noticed it pretty much straight away, but some damage was done.

Topology: 3-node cluster with a dedicated NIC and switch for Corosync. Chrony is set up on each node.
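
A quick way to double-check that time is now correct and in sync on each node (assuming chrony, as set up here; the exact chronyc field names may vary slightly between versions):

date                                           # wall-clock time should match across the nodes
timedatectl status                             # look for "System clock synchronized: yes"
chronyc tracking | grep -E 'System time|Leap'  # offset should be milliseconds, Leap status: Normal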

Issues I've resolved since then (rough commands below):
  1. Task list showing items at the top of the list that were in the past. Resolved by manually removing entries from /var/log/pve/tasks.
  2. RRDC update errors spamming syslog. Resolved by deleting files in /var/lib/rrdcached.
  3. GUI repeatedly asking for user sign-in. Resolved by removing auth-related files in /etc/pve and running: pvecm updatecerts -f
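
For anyone hitting the same thing, roughly what those fixes amounted to (exact paths and file names from memory, so treat this as a sketch rather than a recipe):

# 1. stale task-history entries: remove the offending files/lines under here by hand
ls /var/log/pve/tasks
# 2. stale RRD data: clear the rrdcached database and restart the daemon
systemctl stop rrdcached
rm -r /var/lib/rrdcached/db/*
systemctl start rrdcached
# 3. after removing the broken auth state from /etc/pve, regenerate the certificates
pvecm updatecerts -f
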
The only lingering issue is a syslog entry reading "ignore duplicate" on every sign-in to the GUI or refresh of the GUI page.

Anyone know how to resolve this?

Jul 17 16:01:31 grogu pvedaemon[3578461]: <root@pam> successful auth for user 'root@pam'
Jul 17 16:01:31 grogu pmxcfs[1797]: [main] notice: ignore duplicate
Jul 17 16:02:25 grogu pvedaemon[3578460]: <root@pam> successful auth for user 'root@pam'
Jul 17 16:02:25 grogu pmxcfs[1797]: [main] notice: ignore duplicate
Jul 17 16:02:27 grogu pvedaemon[3578460]: <root@pam> successful auth for user 'root@pam'
Jul 17 16:02:27 grogu pmxcfs[1797]: [main] notice: ignore duplicate
Jul 17 16:02:28 grogu pvedaemon[3578462]: <root@pam> successful auth for user 'root@pam'
Jul 17 16:02:28 grogu pmxcfs[1797]: [main] notice: ignore duplicate

# pveversion -v
proxmox-ve: 7.2-1 (running kernel: 5.15.35-3-pve)
pve-manager: 7.2-5 (running version: 7.2-5/12f1e639)
pve-kernel-5.15: 7.2-5
pve-kernel-helper: 7.2-5
pve-kernel-5.13: 7.1-9
pve-kernel-5.11: 7.0-10
pve-kernel-5.15.35-3-pve: 5.15.35-6
pve-kernel-5.13.19-6-pve: 5.13.19-15
pve-kernel-5.13.19-2-pve: 5.13.19-4
pve-kernel-5.11.22-7-pve: 5.11.22-12
pve-kernel-5.11.22-4-pve: 5.11.22-9
ceph-fuse: 15.2.14-pve1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve1
libproxmox-acme-perl: 1.4.2
libproxmox-backup-qemu0: 1.3.1-1
libpve-access-control: 7.2-2
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.2-2
libpve-guest-common-perl: 4.1-2
libpve-http-server-perl: 4.1-2
libpve-storage-perl: 7.2-5
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.12-1
lxcfs: 4.0.12-pve1
novnc-pve: 1.3.0-3
proxmox-backup-client: 2.2.3-1
proxmox-backup-file-restore: 2.2.3-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.5.1
pve-cluster: 7.2-1
pve-container: 4.2-1
pve-docs: 7.2-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.4-2
pve-ha-manager: 3.3-4
pve-i18n: 2.7-2
pve-qemu-kvm: 6.2.0-10
pve-xtermjs: 4.16.0-1
qemu-server: 7.2-3
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.7.1~bpo11+1
vncterm: 1.7-1
zfsutils-linux: 2.1.4-pve1

# tree /etc/pve
/etc/pve
|-- authkey.pub
|-- corosync.conf
|-- datacenter.cfg
|-- firewall
|-- ha
|   |-- crm_commands
|   |-- groups.cfg
|   |-- manager_status
|   `-- resources.cfg
|-- jobs.cfg
|-- local -> nodes/grogu
|-- lxc -> nodes/grogu/lxc
|-- nodes
|   |-- droid
|   |   |-- config
|   |   |-- lrm_status
|   |   |-- lxc
|   |   |   `-- 191.conf
|   |   |-- openvz
|   |   |-- priv
|   |   |-- pve-ssl.key
|   |   |-- pve-ssl.pem
|   |   |-- pveproxy-ssl.key
|   |   |-- pveproxy-ssl.pem
|   |   `-- qemu-server
|   |       |-- 203.conf
|   |       |-- 205.conf
|   |       |-- 220.conf
|   |       `-- 230.conf
|   |-- empire
|   |   |-- config
|   |   |-- host.fw
|   |   |-- lrm_status
|   |   |-- lrm_status.tmp.10034
|   |   |-- lrm_status.tmp.9147
|   |   |-- lxc
|   |   |-- openvz
|   |   |-- priv
|   |   |-- pve-ssl.key
|   |   |-- pve-ssl.pem
|   |   |-- pveproxy-ssl.key
|   |   |-- pveproxy-ssl.pem
|   |   `-- qemu-server
|   |       |-- 192.conf
|   |       |-- 196.conf
|   |       |-- 201.conf
|   |       |-- 202.conf
|   |       |-- 204.conf
|   |       |-- 206.conf
|   |       |-- 207.conf
|   |       |-- 208.conf
|   |       |-- 210.conf
|   |       |-- 211.conf
|   |       |-- 212.conf
|   |       |-- 70101.conf
|   |       |-- 70102.conf
|   |       |-- 70103.conf
|   |       |-- 70105.conf
|   |       |-- 70110.conf
|   |       |-- 901.conf
|   |       |-- 910.conf
|   |       `-- 911.conf
|   `-- grogu
|       |-- config
|       |-- lrm_status
|       |-- lrm_status.tmp.2681
|       |-- lxc
|       |-- openvz
|       |-- priv
|       |-- pve-ssl.key
|       |-- pve-ssl.pem
|       |-- pveproxy-ssl.key
|       |-- pveproxy-ssl.pem
|       `-- qemu-server
|           |-- 198.conf
|           |-- 209.conf
|           |-- 254.conf
|           `-- 701.conf
|-- openvz -> nodes/grogu/openvz
|-- priv
|   |-- acme
|   |   |-- default
|   |   `-- plugins.cfg
|   |-- authkey.key
|   |-- authorized_keys
|   |-- known_hosts
|   |-- lock
|   |   |-- ha_agent_droid_lock
|   |   |-- ha_agent_grogu_lock
|   |   `-- ha_manager_lock
|   |-- pve-root-ca.key
|   |-- pve-root-ca.srl
|   |-- shadow.cfg
|   |-- storage
|   |   |-- pbs-echo-local.pw
|   |   `-- pbs-jango-nas2.pw
|   `-- tfa.cfg
|-- pve-root-ca.pem
|-- pve-www.key
|-- qemu-server -> nodes/grogu/qemu-server
|-- replication.cfg
|-- sdn
|-- storage.cfg
|-- user.cfg
|-- virtual-guest
`-- vzdump.cron

Attached pmxcfs-debug.txt: /var/log/syslog with pmxcfs debug enabled during a GUI sign-in
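
(For reference, pmxcfs debug logging can be toggled through the special .debug file in /etc/pve, if I recall the mechanism correctly:)

echo "1" > /etc/pve/.debug   # enable verbose pmxcfs logging to syslog
echo "0" > /etc/pve/.debug   # turn it back off afterwards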
 

lazynooblet said:
"The only lingering issue is a syslog entry reading 'ignore duplicate' on every sign-in to the GUI or refresh of the GUI page. Anyone know how to resolve this?"
It seems to me that there are messages from the future inside the cluster log buffer, which breaks some internal assumptions.
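
(If you want to see what's in that buffer, the cluster-wide log can be queried over the API, for example with pvesh; look for entries with timestamps in the future. Parameter name from memory:)

pvesh get /cluster/log --max 50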

The trouble is that those entries got delivered to all nodes, but as that buffer is only kept in RAM, you may get away with doing a full stop on all nodes and then a start on all again. Note that this requires special care if you have HA enabled.

If you have HA enabled, you must first freeze it to avoid the watchdog triggering while pmxcfs is not up. If you have never had any HA service enabled since the last cold boot of the whole cluster you can ignore this, but even then it won't hurt (if you want to be safe):

On all nodes, first stop the local resource manager: systemctl stop pve-ha-lrm. Wait until that has finished and all nodes have logged that they closed the watchdog gracefully.
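
Something like this on each node can be used to check that (the exact log wording may differ between versions):

systemctl is-active pve-ha-lrm    # should report "inactive" once it has stopped
journalctl -u pve-ha-lrm -n 20    # look for the "watchdog closed (disabled)" message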

Only then stop the cluster resource manager on all nodes: systemctl stop pve-ha-crm

Now stop the pve-cluster service on all nodes. Note that running VMs/CTs will stay running, but you won't be able to start new ones or stop them, and login via the API/GUI is off the table as long as pve-cluster is stopped.

systemctl stop pve-cluster

Wait until that has gone through on all nodes, and then start it up again:

systemctl start pve-cluster
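
Once started, verify on each node that the cluster filesystem is back before re-enabling anything HA-related:

systemctl is-active pve-cluster   # should report "active" on every node
pvecm status                      # quorum should be re-established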

pmxcfs should now have dropped all (future) timestamps from its internally cached state, so things should work out fine again.
If you stopped HA first, do the reverse to start it up again: first all CRMs, then all LRMs.
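
In short, the whole sequence, with each step completed on all nodes before moving to the next:

systemctl stop pve-ha-lrm      # 1. wait for the graceful watchdog close on every node
systemctl stop pve-ha-crm      # 2.
systemctl stop pve-cluster     # 3. API/GUI logins are unavailable from here on
systemctl start pve-cluster    # 4.
systemctl start pve-ha-crm     # 5.
systemctl start pve-ha-lrm     # 6.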
 
I am humbled that you chose to respond to my request. Thank you. Such a well-written reply, too.

I have followed your steps and all nodes are no longer showing the error on login.