a pve server rebooted - any insight?

jmckee


Happened during the night. Hopefully this log is useful - it is a bit opaque to me. The server rebooted and seems normal again today... I think. Any insight is much appreciated, especially if hardware is failing. I like to get right on top of that :)

Feb 05 23:01:43 proxmox1 pmxcfs[1474]: [status] notice: received log
Feb 05 23:16:43 proxmox1 pmxcfs[1474]: [status] notice: received log
Feb 05 23:17:01 proxmox1 CRON[2768566]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Feb 05 23:17:01 proxmox1 CRON[2768567]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Feb 05 23:17:01 proxmox1 CRON[2768566]: pam_unix(cron:session): session closed for user root
Feb 05 23:39:19 proxmox1 pmxcfs[1474]: [dcdb] notice: data verification successful
Feb 06 00:00:02 proxmox1 pvescheduler[2778104]: <root@pam> starting task UPID:proxmox1:002A6400:317D2960:65C1E701:vzdump:103:root@pam:
Feb 06 00:00:05 proxmox1 pvescheduler[2778112]: INFO: starting new backup job: vzdump 103 --storage backups --mailto redacted --mode snapshot --mailnotification failure --quiet 1 --compress zstd --node proxmox1
Feb 06 00:00:05 proxmox1 pvescheduler[2778112]: INFO: Starting Backup of VM 103 (qemu)
Feb 06 00:00:08 proxmox1 systemd[1]: Starting Rotate log files...
Feb 06 00:00:08 proxmox1 systemd[1]: Starting Daily man-db regeneration...
Feb 06 00:00:09 proxmox1 systemd[1]: Reloading PVE API Proxy Server.
Feb 06 00:00:31 proxmox1 pvestatd[1545]: status update time (14.376 seconds)
Feb 06 00:01:10 proxmox1 pvestatd[1545]: status update time (9.197 seconds)
Feb 06 00:01:12 proxmox1 systemd[1]: man-db.service: Succeeded.
Feb 06 00:01:12 proxmox1 systemd[1]: Finished Daily man-db regeneration.
Feb 06 00:01:14 proxmox1 pvestatd[1545]: got timeout
Feb 06 00:01:23 proxmox1 pvestatd[1545]: got timeout
Feb 06 00:01:28 proxmox1 pvestatd[1545]: status update time (7.485 seconds)
Feb 06 00:01:39 proxmox1 pveproxy[2778164]: send HUP to 1609
Feb 06 00:01:39 proxmox1 pveproxy[1609]: received signal HUP
Feb 06 00:01:39 proxmox1 systemd[1]: Reloaded PVE API Proxy Server.
Feb 06 00:01:39 proxmox1 pveproxy[1609]: server closing
Feb 06 00:01:39 proxmox1 pveproxy[1609]: server shutdown (restart)
Feb 06 00:01:39 proxmox1 systemd[1]: Reloading PVE SPICE Proxy Server.
Feb 06 00:01:40 proxmox1 spiceproxy[2778620]: send HUP to 1616
Feb 06 00:01:40 proxmox1 systemd[1]: Reloaded PVE SPICE Proxy Server.
Feb 06 00:01:40 proxmox1 systemd[1]: Stopping Proxmox VE firewall logger...
Feb 06 00:01:40 proxmox1 pvefw-logger[2459839]: received terminate request (signal)
Feb 06 00:01:40 proxmox1 pvefw-logger[2459839]: stopping pvefw logger
Feb 06 00:01:40 proxmox1 spiceproxy[1616]: received signal HUP
Feb 06 00:01:40 proxmox1 spiceproxy[1616]: server closing
Feb 06 00:01:40 proxmox1 spiceproxy[1616]: server shutdown (restart)
Feb 06 00:01:41 proxmox1 systemd[1]: pvefw-logger.service: Succeeded.
Feb 06 00:01:41 proxmox1 systemd[1]: Stopped Proxmox VE firewall logger.
Feb 06 00:01:41 proxmox1 systemd[1]: pvefw-logger.service: Consumed 6.351s CPU time.
Feb 06 00:01:41 proxmox1 systemd[1]: Starting Proxmox VE firewall logger...
Feb 06 00:01:41 proxmox1 systemd[1]: Started Proxmox VE firewall logger.
Feb 06 00:01:41 proxmox1 pvefw-logger[2778630]: starting pvefw logger
Feb 06 00:01:41 proxmox1 systemd[1]: logrotate.service: Succeeded.
Feb 06 00:01:41 proxmox1 systemd[1]: Finished Rotate log files.
Feb 06 00:01:41 proxmox1 spiceproxy[1616]: restarting server
Feb 06 00:01:42 proxmox1 spiceproxy[1616]: starting 1 worker(s)
Feb 06 00:01:42 proxmox1 spiceproxy[1616]: worker 2778640 started
Feb 06 00:01:42 proxmox1 pveproxy[1609]: restarting server
Feb 06 00:01:42 proxmox1 pveproxy[1609]: starting 3 worker(s)
Feb 06 00:01:42 proxmox1 pveproxy[1609]: worker 2778642 started
Feb 06 00:01:42 proxmox1 pveproxy[1609]: worker 2778643 started
Feb 06 00:01:42 proxmox1 pveproxy[1609]: worker 2778644 started
Feb 06 00:01:47 proxmox1 pveproxy[2692965]: worker exit
Feb 06 00:01:47 proxmox1 pveproxy[2752908]: worker exit
Feb 06 00:01:47 proxmox1 pveproxy[2741582]: worker exit
Feb 06 00:01:47 proxmox1 spiceproxy[2142143]: worker exit
Feb 06 00:01:51 proxmox1 spiceproxy[1616]: worker 2142143 finished
Feb 06 00:01:52 proxmox1 pveproxy[1609]: worker 2692965 finished
Feb 06 00:01:52 proxmox1 pveproxy[1609]: worker 2752908 finished
Feb 06 00:01:52 proxmox1 pveproxy[1609]: worker 2741582 finished
Feb 06 00:02:04 proxmox1 pvestatd[1545]: got timeout
Feb 06 00:02:07 proxmox1 pvestatd[1545]: status update time (5.557 seconds)
Feb 06 00:02:19 proxmox1 pvestatd[1545]: status update time (8.485 seconds)
Feb 06 00:02:42 proxmox1 pvestatd[1545]: status update time (10.772 seconds)
Feb 06 00:03:15 proxmox1 pvestatd[1545]: status update time (12.686 seconds)
Feb 06 00:03:40 proxmox1 pvestatd[1545]: status update time (5.004 seconds)
Feb 06 00:03:54 proxmox1 pvestatd[1545]: status update time (9.367 seconds)
Feb 06 00:04:18 proxmox1 pvestatd[1545]: got timeout
Feb 06 00:04:19 proxmox1 pvestatd[1545]: unable to activate storage 'newbacks' - directory '/mnt/pve/newbacks' does not exist or is unreachable
Feb 06 00:04:22 proxmox1 pvestatd[1545]: status update time (6.131 seconds)
Feb 06 00:04:54 proxmox1 pvestatd[1545]: status update time (29.250 seconds)
Feb 06 00:05:14 proxmox1 pvestatd[1545]: status update time (20.245 seconds)
Feb 06 00:05:17 proxmox1 pve-ha-crm[1590]: loop take too long (35 seconds)
Feb 06 00:05:22 proxmox1 pvestatd[1545]: status update time (7.508 seconds)
Feb 06 00:05:41 proxmox1 pvestatd[1545]: status update time (17.055 seconds)
Feb 06 00:05:47 proxmox1 watchdog-mux[688]: client watchdog expired - disable watchdog updates
Feb 06 00:05:49 proxmox1 watchdog-mux[688]: exit watchdog-mux with active connections
Feb 06 00:05:49 proxmox1 pve-ha-lrm[1622]: loop take too long (82 seconds)
Feb 06 00:05:50 proxmox1 pvestatd[1545]: status update time (9.364 seconds)
Feb 06 00:05:53 proxmox1 systemd[1]: watchdog-mux.service: Succeeded.
Feb 06 00:05:53 proxmox1 kernel: watchdog: watchdog0: watchdog did not stop!
-- Reboot --
 
Yes, I do have a separate cluster network on its own NIC. I put some diagnostic output below - it seems to look good? My non-cluster traffic is on a different interface, which is a bond of two separate NICs. Is it possible that my system is trying to back up across my cluster network? Does that make sense? I wonder if my switch is failing.

root@proxmox1:~# pvecm status
Cluster information
-------------------
Name: SWVC
Config Version: 7
Transport: knet
Secure auth: on

Quorum information
------------------
Date: Wed Feb 7 10:40:38 2024
Quorum provider: corosync_votequorum
Nodes: 3
Node ID: 0x00000001
Ring ID: 1.2b2
Quorate: Yes

Votequorum information
----------------------
Expected votes: 3
Highest expected: 3
Total votes: 3
Quorum: 2
Flags: Quorate

Membership information
----------------------
Nodeid Votes Name
0x00000001 1 10.1.0.1 (local)
0x00000002 1 10.1.0.2
0x00000003 1 10.1.0.3

root@proxmox1:~# ping 10.1.0.2
PING 10.1.0.2 (10.1.0.2) 56(84) bytes of data.
64 bytes from 10.1.0.2: icmp_seq=1 ttl=64 time=0.221 ms
64 bytes from 10.1.0.2: icmp_seq=2 ttl=64 time=0.330 ms
64 bytes from 10.1.0.2: icmp_seq=3 ttl=64 time=0.202 ms
64 bytes from 10.1.0.2: icmp_seq=4 ttl=64 time=0.149 ms
64 bytes from 10.1.0.2: icmp_seq=5 ttl=64 time=0.180 ms
^C
--- 10.1.0.2 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4101ms
rtt min/avg/max/mdev = 0.149/0.216/0.330/0.061 ms
root@proxmox1:~# ping 10.1.0.3
PING 10.1.0.3 (10.1.0.3) 56(84) bytes of data.
64 bytes from 10.1.0.3: icmp_seq=1 ttl=64 time=0.260 ms
64 bytes from 10.1.0.3: icmp_seq=2 ttl=64 time=0.236 ms
64 bytes from 10.1.0.3: icmp_seq=3 ttl=64 time=0.280 ms
64 bytes from 10.1.0.3: icmp_seq=4 ttl=64 time=0.163 ms
64 bytes from 10.1.0.3: icmp_seq=5 ttl=64 time=0.165 ms
^C
--- 10.1.0.3 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4084ms
rtt min/avg/max/mdev = 0.163/0.220/0.280/0.048 ms
 
Thanks for the outputs. However, they only show that the cluster is currently healthy; they don't tell much about the time when the backups are running. Please share your corosync config (cat /etc/pve/corosync.conf) as well as the network configuration for the hosts (cat /etc/network/interfaces). You could also monitor your network traffic during the backup runs, just to verify the traffic is routed as expected. Also, excluding the switch as a point of failure would definitely be of interest. If possible, you might configure a failover network for corosync, so that maintenance on the other network is possible without downtime, see https://pve.proxmox.com/pve-docs/pve-admin-guide.html#pvecm_redundancy
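
One minimal way to do that monitoring - a sketch only, the interface names are placeholders to be replaced with your cluster NIC and the bond - is to sample the kernel byte counters across the backup window; it only reads /sys, so no extra packages are needed:

#!/bin/bash
# Sample RX/TX byte counters of selected interfaces every 10 seconds.
# IFACES is a placeholder list - substitute your cluster NIC and bond.
IFACES="eno1 bond0"
while true; do
    ts=$(date '+%F %T')
    for i in $IFACES; do
        rx=$(cat /sys/class/net/$i/statistics/rx_bytes)
        tx=$(cat /sys/class/net/$i/statistics/tx_bytes)
        echo "$ts $i rx=$rx tx=$tx"
    done
    sleep 10
done

Leaving this running (e.g. in a tmux or screen session) while the backup runs and comparing the counter deltas shows which link the vzdump traffic actually takes.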
 
Hi. Thanks for your reply. The info is below. I have a feeling my config is OK, but let me know if not. Yes, I will observe closely during backup. Looking at my hosts file, I think it should not be a problem: I don't have the cluster network addresses named at all; they are only referred to by IP, and only in corosync.conf.
I think maybe it is my switch. I could use more redundancy.

root@proxmox1:~# cat /etc/pve/corosync.conf
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: proxmox1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.1.0.1
  }
  node {
    name: proxmox2
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.1.0.2
  }
  node {
    name: proxmox3
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 10.1.0.3
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: SWVC
  config_version: 7
  interface {
    bindnetaddr: 10.1.0.1
    ringnumber: 0
  }
  ip_version: ipv4
  secauth: on
  version: 2
}
root@proxmox1:~# cat /etc/network/interfaces
auto lo
iface lo inet loopback

#onboard
auto eno1
iface eno1 inet static
    address 10.1.0.1
    netmask 255.255.255.0

#onboard
auto enp4s0
iface enp4s0 inet static
    address 10.2.0.1
    netmask 255.255.255.0

#mellanox
auto enp2s0
iface enp2s0 inet static
    address 10.0.1.1
    netmask 255.255.255.0

#pci card
iface enp1s0f0 inet manual

#pci card
iface enp1s0f1 inet manual

auto bond0
iface bond0 inet manual
    slaves enp1s0f0 enp1s0f1
    bond_miimon 100
    bond_mode 802.3ad

auto vmbr0
iface vmbr0 inet static
    address 192.168.1.21
    netmask 255.255.255.0
    gateway 192.168.1.1
    bridge_ports bond0
    bridge_stp off
    bridge_fd 0

auto bond0.4088
iface bond0.4088 inet manual
    vlan-raw-device bond0

auto vmbr4088
iface vmbr4088 inet manual
    bridge_ports bond0.4088
    bridge_stp off
    bridge_fd 0

auto bond0.10
iface bond0.10 inet manual
    vlan-raw-device bond0

auto vmbr10
iface vmbr10 inet manual
    bridge_ports bond0.10
    bridge_stp off
    bridge_fd 0
 
Yes, this looks fine so far, so it is definitely worth excluding the switch as the bad actor here. Maybe you can find some relevant logs on the switch itself.
 
Happened again, despite setting up a second, redundant cluster network. I also replaced the single ethernet connection between the switch and the backup server (proxmox3) with a bonded connection.
Is it possible to tell anything from the logs of the other 2 nodes? Although I see complaints about unreachable storage, it is always mounted whenever I check.
The proxmox1 and proxmox2 nodes host containers and VMs; proxmox3 functions as the quorum and backup server.

New corosync.conf:

logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: proxmox1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.1.0.1
    ring1_addr: 10.2.0.1
  }
  node {
    name: proxmox2
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.1.0.2
    ring1_addr: 10.2.0.2
  }
  node {
    name: proxmox3
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 10.1.0.3
    ring1_addr: 10.2.0.3
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: SWVC
  config_version: 8
  ip_version: ipv4
  secauth: off
  version: 2
  interface {
    bindnetaddr: 10.1.0.0
    ringnumber: 0
  }
  interface {
    bindnetaddr: 10.2.0.0
    ringnumber: 1
  }
}
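
As a quick sanity check that corosync actually has both knet links up on every node, something like the following could be run on each host (a sketch; the journalctl time window is just an example matching the Feb 12 logs):

# Show the status of link 0 and link 1 towards the other nodes - both should be up/connected.
corosync-cfgtool -s
# Look for KNET link up/down events around the backup window.
journalctl -u corosync --since "2024-02-12 00:00" --until "2024-02-12 00:10" | grep -i knet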

relevant logs:

proxmox 1 log ---> same as previously


Feb 12 00:00:03 proxmox2 pmxcfs[1514]: [status] notice: received log
Feb 12 00:00:44 proxmox2 systemd[1]: Starting Rotate log files...
Feb 12 00:00:44 proxmox2 systemd[1]: Starting Daily man-db regeneration...
Feb 12 00:00:44 proxmox2 systemd[1]: Reloading PVE API Proxy Server.
Feb 12 00:00:46 proxmox2 systemd[1]: man-db.service: Succeeded.
Feb 12 00:00:46 proxmox2 systemd[1]: Finished Daily man-db regeneration.
Feb 12 00:00:49 proxmox2 audit[3456233]: AVC apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxc-902_</var/lib/lxc>" name="/run/systemd/unit-root/proc/" pid=3456233 comm="(ogrotate)" fstype="proc" srcname="proc" flags="rw, nosuid, nodev, noexec"
Feb 12 00:00:49 proxmox2 kernel: audit: type=1400 audit(1707724849.622:243): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxc-902_</var/lib/lxc>" name="/run/systemd/unit-root/proc/" pid=3456233 comm="(ogrotate)" fstype="proc" srcname="proc" flags="rw, nosuid, nodev, noexec"
Feb 12 00:00:50 proxmox2 pveproxy[3456223]: send HUP to 1667
Feb 12 00:00:50 proxmox2 pveproxy[1667]: received signal HUP
Feb 12 00:00:50 proxmox2 systemd[1]: Reloaded PVE API Proxy Server.
Feb 12 00:00:50 proxmox2 pveproxy[1667]: server closing
Feb 12 00:00:50 proxmox2 systemd[1]: Reloading PVE SPICE Proxy Server.
Feb 12 00:00:50 proxmox2 pveproxy[1667]: server shutdown (restart)
Feb 12 00:00:51 proxmox2 spiceproxy[3456258]: send HUP to 1674
Feb 12 00:00:51 proxmox2 systemd[1]: Reloaded PVE SPICE Proxy Server.
Feb 12 00:00:51 proxmox2 systemd[1]: Stopping Proxmox VE firewall logger...
Feb 12 00:00:51 proxmox2 pvefw-logger[2869334]: received terminate request (signal)
Feb 12 00:00:51 proxmox2 pvefw-logger[2869334]: stopping pvefw logger
Feb 12 00:00:51 proxmox2 spiceproxy[1674]: received signal HUP
Feb 12 00:00:51 proxmox2 spiceproxy[1674]: server closing
Feb 12 00:00:51 proxmox2 systemd[1]: pvefw-logger.service: Succeeded.
Feb 12 00:00:51 proxmox2 systemd[1]: Stopped Proxmox VE firewall logger.
Feb 12 00:00:51 proxmox2 systemd[1]: pvefw-logger.service: Consumed 6.826s CPU time.
Feb 12 00:00:51 proxmox2 systemd[1]: Starting Proxmox VE firewall logger...
Feb 12 00:00:51 proxmox2 spiceproxy[1674]: server shutdown (restart)
Feb 12 00:00:51 proxmox2 pvefw-logger[3456288]: starting pvefw logger
Feb 12 00:00:51 proxmox2 systemd[1]: Started Proxmox VE firewall logger.
Feb 12 00:00:51 proxmox2 systemd[1]: logrotate.service: Succeeded.
Feb 12 00:00:51 proxmox2 systemd[1]: Finished Rotate log files.
Feb 12 00:00:51 proxmox2 pveproxy[1667]: restarting server
Feb 12 00:00:52 proxmox2 pveproxy[1667]: starting 3 worker(s)
Feb 12 00:00:52 proxmox2 pveproxy[1667]: worker 3456306 started
Feb 12 00:00:52 proxmox2 pveproxy[1667]: worker 3456308 started
Feb 12 00:00:52 proxmox2 pveproxy[1667]: worker 3456310 started
Feb 12 00:00:52 proxmox2 spiceproxy[1674]: restarting server
Feb 12 00:00:52 proxmox2 spiceproxy[1674]: starting 1 worker(s)
Feb 12 00:00:52 proxmox2 spiceproxy[1674]: worker 3456317 started
Feb 12 00:00:56 proxmox2 pveproxy[2285387]: worker exit
Feb 12 00:00:56 proxmox2 pveproxy[2285386]: worker exit
Feb 12 00:00:57 proxmox2 pveproxy[2285384]: worker exit
Feb 12 00:00:57 proxmox2 spiceproxy[2285383]: worker exit
Feb 12 00:01:01 proxmox2 pveproxy[1667]: worker 2285384 finished
Feb 12 00:01:01 proxmox2 pveproxy[1667]: worker 2285386 finished
Feb 12 00:01:01 proxmox2 pveproxy[1667]: worker 2285387 finished
Feb 12 00:01:02 proxmox2 spiceproxy[1674]: worker 2285383 finished
Feb 12 00:01:43 proxmox2 pvestatd[1610]: got timeout
Feb 12 00:02:43 proxmox2 pvestatd[1610]: got timeout
Feb 12 00:02:45 proxmox2 pvestatd[1610]: got timeout
Feb 12 00:02:45 proxmox2 pvestatd[1610]: unable to activate storage 'backups' - directory '/mnt/pve/backups' does not exist or is unreachable
Feb 12 00:02:54 proxmox2 pvestatd[1610]: got timeout
Feb 12 00:02:54 proxmox2 pvestatd[1610]: unable to activate storage 'backups' - directory '/mnt/pve/backups' does not exist or is unreachable
Feb 12 00:02:56 proxmox2 pvestatd[1610]: got timeout
Feb 12 00:03:03 proxmox2 pvestatd[1610]: got timeout
Feb 12 00:03:03 proxmox2 pvestatd[1610]: unable to activate storage 'backups' - directory '/mnt/pve/backups' does not exist or is unreachable
Feb 12 00:03:13 proxmox2 pvestatd[1610]: got timeout
Feb 12 00:03:13 proxmox2 pvestatd[1610]: unable to activate storage 'backups' - directory '/mnt/pve/backups' does not exist or is unreachable
Feb 12 00:03:15 proxmox2 pvestatd[1610]: got timeout
Feb 12 00:03:15 proxmox2 pvestatd[1610]: unable to activate storage 'newbacks' - directory '/mnt/pve/newbacks' does not exist or is unreachable
Feb 12 00:06:13 proxmox2 kernel: mlx4_en: enp2s0: Link Down
Feb 12 00:06:15 proxmox2 corosync[1559]: [KNET ] link: host: 1 link: 0 is down
Feb 12 00:06:15 proxmox2 corosync[1559]: [KNET ] link: host: 1 link: 1 is down
Feb 12 00:06:15 proxmox2 corosync[1559]: [KNET ] host: host: 1 (passive) best link: 0 (pri: 1)
Feb 12 00:06:15 proxmox2 corosync[1559]: [KNET ] host: host: 1 has no active links
Feb 12 00:06:15 proxmox2 corosync[1559]: [KNET ] host: host: 1 (passive) best link: 0 (pri: 1)
Feb 12 00:06:15 proxmox2 corosync[1559]: [KNET ] host: host: 1 has no active links
Feb 12 00:06:16 proxmox2 corosync[1559]: [TOTEM ] Token has not been received in 2737 ms
Feb 12 00:06:17 proxmox2 corosync[1559]: [TOTEM ] A processor failed, forming new configuration: token timed out (3650ms), waiting 4380ms for consensus.
Feb 12 00:06:21 proxmox2 corosync[1559]: [QUORUM] Sync members[2]: 2 3
Feb 12 00:06:21 proxmox2 corosync[1559]: [QUORUM] Sync left[1]: 1
Feb 12 00:06:21 proxmox2 corosync[1559]: [TOTEM ] A new membership (2.2ee) was formed. Members left: 1
Feb 12 00:06:21 proxmox2 corosync[1559]: [TOTEM ] Failed to receive the leave message. failed: 1



Feb 12 00:00:03 proxmox3 pmxcfs[1675]: [status] notice: received log
Feb 12 00:00:10 proxmox3 systemd[1]: Starting Rotate log files...
Feb 12 00:00:10 proxmox3 systemd[1]: Starting Daily man-db regeneration...
Feb 12 00:00:11 proxmox3 systemd[1]: Reloading PVE API Proxy Server.
Feb 12 00:00:16 proxmox3 systemd[1]: man-db.service: Succeeded.
Feb 12 00:00:16 proxmox3 systemd[1]: Finished Daily man-db regeneration.
Feb 12 00:00:18 proxmox3 pveproxy[723459]: send HUP to 1770
Feb 12 00:00:18 proxmox3 pveproxy[1770]: received signal HUP
Feb 12 00:00:18 proxmox3 pveproxy[1770]: server closing
Feb 12 00:00:18 proxmox3 pveproxy[1770]: server shutdown (restart)
Feb 12 00:00:18 proxmox3 systemd[1]: Reloaded PVE API Proxy Server.
Feb 12 00:00:18 proxmox3 systemd[1]: Reloading PVE SPICE Proxy Server.
Feb 12 00:00:19 proxmox3 spiceproxy[723531]: send HUP to 1777
Feb 12 00:00:19 proxmox3 spiceproxy[1777]: received signal HUP
Feb 12 00:00:19 proxmox3 spiceproxy[1777]: server closing
Feb 12 00:00:19 proxmox3 spiceproxy[1777]: server shutdown (restart)
Feb 12 00:00:19 proxmox3 systemd[1]: Reloaded PVE SPICE Proxy Server.
Feb 12 00:00:19 proxmox3 systemd[1]: Stopping Proxmox VE firewall logger...
Feb 12 00:00:19 proxmox3 pvefw-logger[338662]: received terminate request (signal)
Feb 12 00:00:19 proxmox3 pvefw-logger[338662]: stopping pvefw logger
Feb 12 00:00:19 proxmox3 systemd[1]: pvefw-logger.service: Succeeded.
Feb 12 00:00:19 proxmox3 systemd[1]: Stopped Proxmox VE firewall logger.
Feb 12 00:00:19 proxmox3 systemd[1]: pvefw-logger.service: Consumed 7.119s CPU time.
Feb 12 00:00:19 proxmox3 systemd[1]: Starting Proxmox VE firewall logger...
Feb 12 00:00:19 proxmox3 systemd[1]: Started Proxmox VE firewall logger.
Feb 12 00:00:19 proxmox3 pvefw-logger[723559]: starting pvefw logger
Feb 12 00:00:20 proxmox3 systemd[1]: logrotate.service: Succeeded.
Feb 12 00:00:20 proxmox3 systemd[1]: Finished Rotate log files.
Feb 12 00:00:20 proxmox3 spiceproxy[1777]: restarting server
Feb 12 00:00:20 proxmox3 spiceproxy[1777]: starting 1 worker(s)
Feb 12 00:00:20 proxmox3 spiceproxy[1777]: worker 723568 started
Feb 12 00:00:20 proxmox3 pveproxy[1770]: restarting server
Feb 12 00:00:20 proxmox3 pveproxy[1770]: starting 3 worker(s)
Feb 12 00:00:20 proxmox3 pveproxy[1770]: worker 723570 started
Feb 12 00:00:20 proxmox3 pveproxy[1770]: worker 723571 started
Feb 12 00:00:20 proxmox3 pveproxy[1770]: worker 723572 started
Feb 12 00:00:25 proxmox3 spiceproxy[338671]: worker exit
Feb 12 00:00:25 proxmox3 spiceproxy[1777]: worker 338671 finished
Feb 12 00:00:25 proxmox3 pveproxy[338682]: worker exit
Feb 12 00:00:25 proxmox3 pveproxy[338684]: worker exit
Feb 12 00:00:25 proxmox3 pveproxy[338683]: worker exit
Feb 12 00:00:25 proxmox3 pveproxy[1770]: worker 338682 finished
Feb 12 00:00:25 proxmox3 pveproxy[1770]: worker 338683 finished
Feb 12 00:00:25 proxmox3 pveproxy[1770]: worker 338684 finished
Feb 12 00:01:01 proxmox3 pvestatd[1712]: got timeout
Feb 12 00:02:42 proxmox3 pvestatd[1712]: got timeout
Feb 12 00:02:44 proxmox3 pvestatd[1712]: got timeout
Feb 12 00:03:01 proxmox3 pvestatd[1712]: got timeout
Feb 12 00:03:01 proxmox3 pvestatd[1712]: unable to activate storage 'backups' - directory '/mnt/pve/backups' does not exist or is unreachable
Feb 12 00:03:03 proxmox3 pvestatd[1712]: got timeout
Feb 12 00:06:15 proxmox3 corosync[29596]: [KNET ] link: host: 1 link: 0 is down
Feb 12 00:06:15 proxmox3 corosync[29596]: [KNET ] link: host: 1 link: 1 is down
Feb 12 00:06:15 proxmox3 corosync[29596]: [KNET ] host: host: 1 (passive) best link: 0 (pri: 1)
Feb 12 00:06:15 proxmox3 corosync[29596]: [KNET ] host: host: 1 has no active links
Feb 12 00:06:15 proxmox3 corosync[29596]: [KNET ] host: host: 1 (passive) best link: 0 (pri: 1)
Feb 12 00:06:15 proxmox3 corosync[29596]: [KNET ] host: host: 1 has no active links
Feb 12 00:06:16 proxmox3 corosync[29596]: [TOTEM ] Token has not been received in 2737 ms
Feb 12 00:06:17 proxmox3 corosync[29596]: [TOTEM ] A processor failed, forming new configuration: token timed out (3650ms), waiting 4380ms for consensus.
Feb 12 00:06:21 proxmox3 corosync[29596]: [QUORUM] Sync members[2]: 2 3
Feb 12 00:06:21 proxmox3 corosync[29596]: [QUORUM] Sync left[1]: 1
Feb 12 00:06:21 proxmox3 corosync[29596]: [TOTEM ] A new membership (2.2ee) was formed. Members left: 1
Feb 12 00:06:21 proxmox3 corosync[29596]: [TOTEM ] Failed to receive the leave message. failed: 1
Feb 12 00:06:21 proxmox3 pmxcfs[1675]: [dcdb] notice: members: 2/1514, 3/1675
Feb 12 00:06:21 proxmox3 pmxcfs[1675]: [dcdb] notice: starting data syncronisation
Feb 12 00:06:21 proxmox3 pmxcfs[1675]: [status] notice: members: 2/1514, 3/1675
Feb 12 00:06:21 proxmox3 pmxcfs[1675]: [status] notice: starting data syncronisation
Feb 12 00:06:21 proxmox3 corosync[29596]: [QUORUM] Members[2]: 2 3
Feb 12 00:06:21 proxmox3 corosync[29596]: [MAIN ] Completed service synchronization, ready to provide service.
Feb 12 00:06:21 proxmox3 pmxcfs[1675]: [dcdb] notice: received sync request (epoch 2/1514/0000001E)
Feb 12 00:06:21 proxmox3 pmxcfs[1675]: [status] notice: received sync request (epoch 2/1514/0000001C)
Feb 12 00:06:21 proxmox3 pmxcfs[1675]: [dcdb] notice: received all states
Feb 12 00:06:21 proxmox3 pmxcfs[1675]: [dcdb] notice: leader is 2/1514
Feb 12 00:06:21 proxmox3 pmxcfs[1675]: [dcdb] notice: synced members: 2/1514, 3/1675
Feb 12 00:06:21 proxmox3 pmxcfs[1675]: [dcdb] notice: all data is up to date
Feb 12 00:06:21 proxmox3 pmxcfs[1675]: [dcdb] notice: dfsm_deliver_queue: queue length 4
 
Still having problems with this. I replaced the case fans and added a case fan. I also added ventilation to the server closet. I replaced the motherboard and a crispy-looking RAID card. I am starting to think this is not a hardware issue. For one thing, it always happens with the same VM. It is a large VM, but not the largest one that I back up. Is it something to do with "pve-daily-update"? Is it something to do with the "pvestatd" timeout?
I see a similar problem from years ago (https://forum.proxmox.com/threads/n...pvestatd-warning-storage-is-not-online.12717/)
Why does this timeout?
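
One way to narrow this down - a sketch, assuming 'backups' and 'newbacks' are network-backed directory storages under /mnt/pve - is to check, while the backup of that VM is running, whether the mount itself is what stalls:

# Lists all storages with their status column; a hanging mount makes this slow or shows the storage as inactive.
pvesm status
# Bounded filesystem stat against the mountpoint (example path taken from the logs below).
time timeout 5 stat -f /mnt/pve/newbacks

If the stat takes several seconds or hits the timeout, pvestatd is simply blocking on the storage, which would match the 'unable to activate storage' lines.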


Jun 29 01:00:02 proxmox1 pmxcfs[1259]: [status] notice: received log
Jun 29 01:12:14 proxmox1 pmxcfs[1259]: [dcdb] notice: data verification successful
Jun 29 01:17:01 proxmox1 CRON[548313]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Jun 29 01:17:01 proxmox1 CRON[548314]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Jun 29 01:17:01 proxmox1 CRON[548313]: pam_unix(cron:session): session closed for user root
Jun 29 02:00:03 proxmox1 pmxcfs[1259]: [status] notice: received log
Jun 29 02:12:14 proxmox1 pmxcfs[1259]: [dcdb] notice: data verification successful
Jun 29 02:17:01 proxmox1 CRON[563523]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Jun 29 02:17:01 proxmox1 CRON[563524]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Jun 29 02:17:01 proxmox1 CRON[563523]: pam_unix(cron:session): session closed for user root
Jun 29 02:18:33 proxmox1 systemd[1]: Starting pve-daily-update.service - Daily PVE download activities...
Jun 29 02:18:36 proxmox1 pveupdate[563970]: <root@pam> starting task UPID:proxmox1:00089B07:00CBDC37:667FD16C:aptupdate::root@pam:
Jun 29 02:18:37 proxmox1 pveupdate[563975]: update new package list: /var/lib/pve-manager/pkgupdates
Jun 29 02:18:40 proxmox1 pveupdate[563970]: <root@pam> end task UPID:proxmox1:00089B07:00CBDC37:667FD16C:aptupdate::root@pam: OK
Jun 29 02:18:41 proxmox1 systemd[1]: pve-daily-update.service: Deactivated successfully.
Jun 29 02:18:41 proxmox1 systemd[1]: Finished pve-daily-update.service - Daily PVE download activities.
Jun 29 02:18:41 proxmox1 systemd[1]: pve-daily-update.service: Consumed 4.738s CPU time.
Jun 29 02:20:15 proxmox1 pmxcfs[1259]: [status] notice: received log
Jun 29 02:20:20 proxmox1 pmxcfs[1259]: [status] notice: received log
Jun 29 03:00:06 proxmox1 pmxcfs[1259]: [status] notice: received log
Jun 29 03:03:42 proxmox1 pvestatd[1478]: got timeout
Jun 29 03:03:44 proxmox1 pvestatd[1478]: got timeout
Jun 29 03:03:53 proxmox1 pvestatd[1478]: got timeout
Jun 29 03:03:53 proxmox1 pvestatd[1478]: unable to activate storage 'backups' - directory '/mnt/pve/backups' does not exist or is unreachable
Jun 29 03:03:56 proxmox1 pvestatd[1478]: status update time (5.628 seconds)
Jun 29 03:10:01 proxmox1 CRON[577337]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Jun 29 03:10:01 proxmox1 CRON[577338]: (root) CMD (test -e /run/systemd/system || SERVICE_MODE=1 /sbin/e2scrub_all -A -r)
Jun 29 03:10:01 proxmox1 CRON[577337]: pam_unix(cron:session): session closed for user root
Jun 29 03:11:39 proxmox1 systemd[1]: Starting man-db.service - Daily man-db regeneration...
Jun 29 03:11:40 proxmox1 systemd[1]: man-db.service: Deactivated successfully.
Jun 29 03:11:40 proxmox1 systemd[1]: Finished man-db.service - Daily man-db regeneration.
Jun 29 03:12:14 proxmox1 pmxcfs[1259]: [dcdb] notice: data verification successful
Jun 29 03:17:01 proxmox1 CRON[579093]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Jun 29 03:17:01 proxmox1 CRON[579094]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Jun 29 03:17:01 proxmox1 CRON[579093]: pam_unix(cron:session): session closed for user root
Jun 29 04:00:03 proxmox1 pvescheduler[590003]: <root@pam> starting task UPID:proxmox1:000900B4:00D5263F:667FE933:vzdump:104:root@pam:
Jun 29 04:00:04 proxmox1 pvescheduler[590004]: INFO: starting new backup job: vzdump 104 --storage newbacks --notes-template '{{guestname}}' --mode snapshot --mailnotification always --compress zstd --quiet 1 --fleecing 0
Jun 29 04:00:04 proxmox1 pvescheduler[590004]: INFO: Starting Backup of VM 104 (qemu)
Jun 29 04:00:33 proxmox1 pvestatd[1478]: got timeout
Jun 29 04:00:33 proxmox1 pvestatd[1478]: unable to activate storage 'local' - directory '/var/lib/vz' does not exist or is unreachable
Jun 29 04:00:47 proxmox1 pvestatd[1478]: status update time (36.717 seconds)
Jun 29 04:01:08 proxmox1 pvestatd[1478]: status update time (21.631 seconds)
Jun 29 04:01:20 proxmox1 pvestatd[1478]: status update time (11.188 seconds)
Jun 29 04:01:33 proxmox1 pve-firewall[1465]: firewall update time (5.748 seconds)
Jun 29 04:01:42 proxmox1 pvestatd[1478]: status update time (12.160 seconds)
Jun 29 04:01:58 proxmox1 pvestatd[1478]: status update time (6.079 seconds)
Jun 29 04:02:17 proxmox1 pvestatd[1478]: status update time (15.305 seconds)
Jun 29 04:02:27 proxmox1 pvescheduler[590206]: jobs: 'file-jobs_cfg'-locked command timed out - aborting
Jun 29 04:03:12 proxmox1 pvestatd[1478]: status update time (15.504 seconds)
Jun 29 04:03:25 proxmox1 pvestatd[1478]: status update time (12.397 seconds)
Jun 29 04:03:40 proxmox1 pvestatd[1478]: status update time (15.189 seconds)
Jun 29 04:03:40 proxmox1 pmxcfs[1259]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/proxmox1/drbd2: -1
Jun 29 04:03:47 proxmox1 pmxcfs[1259]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/proxmox1/drbd1: -1
Jun 29 04:03:56 proxmox1 pmxcfs[1259]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/proxmox1/newbacks: -1
Jun 29 04:03:56 proxmox1 pmxcfs[1259]: [status] notice: RRD update error /var/lib/rrdcached/db/pve2-storage/proxmox1/newbacks: /var/lib/rrdcached/db/pve2-storage/proxmox1/newbacks: illegal attempt to update using time 1719659020 when last update time is 1719659020 (minimum one second step)
Jun 29 04:03:56 proxmox1 pmxcfs[1259]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/proxmox1/backups: -1
Jun 29 04:03:56 proxmox1 pmxcfs[1259]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/proxmox1/local: -1
-- Reboot --

My other node looks like this:

Jun 29 02:31:09 proxmox2 pvescheduler[727087]: INFO: Starting Backup of VM 501 (qemu)
Jun 29 02:31:09 proxmox2 postfix/cleanup[737151]: B0E5A206CC: message-id=<20240629093109.B0E5A206CC@redacted>
Jun 29 02:31:09 proxmox2 postfix/qmgr[1293]: B0E5A206CC: from=<redacted>, size=31647, nrcpt=1 (queue active)
Jun 29 02:31:10 proxmox2 postfix/smtp[737156]: B0E5A206CC: replace: header From: Proxmox VE <redacted>: From:no-reply@sedrovet.com
Jun 29 02:31:10 proxmox2 postfix/smtp[737156]: B0E5A206CC: to=<redacted>, relay=smtp.zoho.com[204.141.42.56]:587, delay=0.86, delays=0.05/0.14/0.2/0.47, dsn=2.0.0, status=sent (250 Message received)
Jun 29 02:31:10 proxmox2 postfix/qmgr[1293]: B0E5A206CC: removed
Jun 29 02:31:24 proxmox2 pvestatd[1388]: status update time (6.480 seconds)
Jun 29 02:32:37 proxmox2 pvestatd[1388]: status update time (9.479 seconds)
Jun 29 02:32:54 proxmox2 pvestatd[1388]: status update time (6.482 seconds)
Jun 29 02:33:00 proxmox2 pve-firewall[1386]: firewall update time (6.352 seconds)
Jun 29 02:33:09 proxmox2 pvestatd[1388]: status update time (11.675 seconds)
Jun 29 02:33:35 proxmox2 pvestatd[1388]: status update time (6.307 seconds)
Jun 29 02:33:44 proxmox2 pvestatd[1388]: status update time (5.073 seconds)
Jun 29 02:34:12 proxmox2 pvestatd[1388]: status update time (12.131 seconds)
Jun 29 02:36:11 proxmox2 pvestatd[1388]: status update time (8.784 seconds)
Jun 29 02:37:16 proxmox2 pvescheduler[727087]: INFO: Finished Backup of VM 501 (00:06:07)
 
Hi,
did you validate that the network is fine during backups? Is the NFS server reachable/pingable from the affected host for the whole duration of the backup?
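
A minimal sketch of such a check - the server address and mountpoint below are placeholders, substitute your actual NFS server and backup storage path - would be to log reachability and mount responsiveness across the backup window and look for gaps or FAIL entries afterwards:

#!/bin/bash
# Log NFS reachability and mount responsiveness every 5 seconds.
NFS_SERVER="192.0.2.10"          # placeholder - your NFS server IP
MOUNTPOINT="/mnt/pve/newbacks"   # placeholder - your backup storage mountpoint
while true; do
    ts=$(date '+%F %T')
    if ping -c1 -W1 "$NFS_SERVER" >/dev/null 2>&1; then p=ok; else p=FAIL; fi
    t0=$(date +%s%N)
    if timeout 5 stat -f "$MOUNTPOINT" >/dev/null 2>&1; then s=ok; else s=FAIL; fi
    t1=$(date +%s%N)
    echo "$ts ping=$p stat=$s stat_ms=$(( (t1 - t0) / 1000000 ))" >> /root/nfs-watch.log
    sleep 5
done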
 
