I have a Proxmox cluster with 3 nodes, all on version 7.1-10.
Every day at least one node goes to status ? and all of its VMs go to ? as well.
So far the only way I can get them back online is to hard power-cycle the server and start it back up.
I'm really not sure which logs I need to look at.
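For context, this is the rough checklist I'm planning to run the next time a node flips to ? — just a sketch, assuming the grey status means pvestatd / pve-cluster stopped reporting (I haven't captured any of this output yet):
Code:
# Cluster / quorum view from the affected node
pvecm status

# Are the Proxmox services still alive? (pvestatd drives the status icons, as far as I understand)
systemctl status pvestatd pve-cluster corosync pvedaemon pveproxy

# Recent log output from those services
journalctl -u pvestatd -u pve-cluster -u corosync --since "1 hour ago"

# Storage status - a hung NFS/CIFS mount can make pvestatd block and everything show ?
pvesm status

# Anything stuck in uninterruptible sleep (D state)?
ps auxwww | awk '$8 ~ /D/'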
/var/log/messages
Code:
Mar 2 09:58:05 pve3 kernel: [ 4325.946512] 8021q: adding VLAN 0 to HW filter on device enp1s0f0
Mar 2 09:58:05 pve3 kernel: [ 4326.057002] cfg80211: Loading compiled-in X.509 certificates for regulatory database
Mar 2 09:58:05 pve3 kernel: [ 4326.058830] cfg80211: Loaded X.509 cert 'sforshee: 00b28ddf47aef9cea7'
Mar 2 09:58:05 pve3 kernel: [ 4326.060186] platform regulatory.0: Direct firmware load for regulatory.db failed with error -2
Mar 2 09:58:05 pve3 kernel: [ 4326.061519] cfg80211: failed to load regulatory.db
Mar 2 09:58:05 pve3 kernel: [ 4326.456402] device tap136i0 entered promiscuous mode
Mar 2 09:58:05 pve3 kernel: [ 4326.491910] fwbr136i0: port 1(fwln136i0) entered blocking state
Mar 2 09:58:05 pve3 kernel: [ 4326.492786] fwbr136i0: port 1(fwln136i0) entered disabled state
Mar 2 09:58:05 pve3 kernel: [ 4326.493668] device fwln136i0 entered promiscuous mode
Mar 2 09:58:05 pve3 kernel: [ 4326.494533] fwbr136i0: port 1(fwln136i0) entered blocking state
Mar 2 09:58:05 pve3 kernel: [ 4326.495337] fwbr136i0: port 1(fwln136i0) entered forwarding state
Mar 2 09:58:05 pve3 kernel: [ 4326.500353] vmbr0: port 16(fwpr136p0) entered blocking state
Mar 2 09:58:05 pve3 kernel: [ 4326.501189] vmbr0: port 16(fwpr136p0) entered disabled state
Mar 2 09:58:05 pve3 kernel: [ 4326.502071] device fwpr136p0 entered promiscuous mode
Mar 2 09:58:05 pve3 kernel: [ 4326.502931] vmbr0: port 16(fwpr136p0) entered blocking state
Mar 2 09:58:05 pve3 kernel: [ 4326.503748] vmbr0: port 16(fwpr136p0) entered forwarding state
Mar 2 09:58:05 pve3 kernel: [ 4326.508658] fwbr136i0: port 2(tap136i0) entered blocking state
Mar 2 09:58:05 pve3 kernel: [ 4326.509496] fwbr136i0: port 2(tap136i0) entered disabled state
Mar 2 09:58:05 pve3 kernel: [ 4326.510386] fwbr136i0: port 2(tap136i0) entered blocking state
Mar 2 09:58:05 pve3 kernel: [ 4326.511190] fwbr136i0: port 2(tap136i0) entered forwarding state
Mar 2 09:58:08 pve3 kernel: [ 4329.297522] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Mar 2 09:58:08 pve3 kernel: [ 4329.298922] vmbr0: port 9(veth140i0) entered blocking state
Mar 2 09:58:08 pve3 kernel: [ 4329.300215] vmbr0: port 9(veth140i0) entered forwarding state
Mar 2 09:58:08 pve3 kernel: [ 4329.520838] kauditd_printk_skb: 4 callbacks suppressed
Mar 2 09:58:08 pve3 kernel: [ 4329.520843] audit: type=1400 audit(1646233088.678:34): apparmor="STATUS" operation="profile_load" label="lxc-140_</var/lib/lxc>//&:lxc-140_<-var-lib-lxc>:unconfined" name="nvidia_modprobe" pid=14964 comm="apparmor_parser"
Mar 2 09:58:08 pve3 kernel: [ 4329.523528] audit: type=1400 audit(1646233088.678:35): apparmor="STATUS" operation="profile_load" label="lxc-140_</var/lib/lxc>//&:lxc-140_<-var-lib-lxc>:unconfined" name="nvidia_modprobe//kmod" pid=14964 comm="apparmor_parser"
Mar 2 09:58:08 pve3 kernel: [ 4329.642392] audit: type=1400 audit(1646233088.798:36): apparmor="STATUS" operation="profile_load" label="lxc-140_</var/lib/lxc>//&:lxc-140_<-var-lib-lxc>:unconfined" name="/usr/sbin/tcpdump" pid=14963 comm="apparmor_parser"
Mar 2 09:58:08 pve3 kernel: [ 4329.650126] audit: type=1400 audit(1646233088.806:37): apparmor="STATUS" operation="profile_load" label="lxc-140_</var/lib/lxc>//&:lxc-140_<-var-lib-lxc>:unconfined" name="/usr/lib/NetworkManager/nm-dhcp-client.action" pid=14967 comm="apparmor_parser"
Mar 2 09:58:08 pve3 kernel: [ 4329.652303] audit: type=1400 audit(1646233088.806:38): apparmor="STATUS" operation="profile_load" label="lxc-140_</var/lib/lxc>//&:lxc-140_<-var-lib-lxc>:unconfined" name="/usr/lib/NetworkManager/nm-dhcp-helper" pid=14967 comm="apparmor_parser"
Mar 2 09:58:08 pve3 kernel: [ 4329.654403] audit: type=1400 audit(1646233088.806:39): apparmor="STATUS" operation="profile_load" label="lxc-140_</var/lib/lxc>//&:lxc-140_<-var-lib-lxc>:unconfined" name="/usr/lib/connman/scripts/dhclient-script" pid=14967 comm="apparmor_parser"
Mar 2 09:58:08 pve3 kernel: [ 4329.656507] audit: type=1400 audit(1646233088.806:40): apparmor="STATUS" operation="profile_load" label="lxc-140_</var/lib/lxc>//&:lxc-140_<-var-lib-lxc>:unconfined" name="/{,usr/}sbin/dhclient" pid=14967 comm="apparmor_parser"
Mar 2 09:58:08 pve3 kernel: [ 4329.677381] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Mar 2 09:58:08 pve3 kernel: [ 4329.679074] vmbr0: port 11(veth143i0) entered blocking state
Mar 2 09:58:08 pve3 kernel: [ 4329.680637] vmbr0: port 11(veth143i0) entered forwarding state
Mar 2 09:58:08 pve3 kernel: [ 4329.692565] audit: type=1400 audit(1646233088.850:41): apparmor="STATUS" operation="profile_load" label="lxc-140_</var/lib/lxc>//&:lxc-140_<-var-lib-lxc>:unconfined" name="lsb_release" pid=14972 comm="apparmor_parser"
Mar 2 09:58:08 pve3 kernel: [ 4329.698052] audit: type=1400 audit(1646233088.854:42): apparmor="STATUS" operation="profile_load" label="lxc-140_</var/lib/lxc>//&:lxc-140_<-var-lib-lxc>:unconfined" name="/usr/bin/man" pid=14971 comm="apparmor_parser"
Mar 2 09:58:08 pve3 kernel: [ 4329.700005] audit: type=1400 audit(1646233088.854:43): apparmor="STATUS" operation="profile_load" label="lxc-140_</var/lib/lxc>//&:lxc-140_<-var-lib-lxc>:unconfined" name="man_filter" pid=14971 comm="apparmor_parser"
Mar 2 10:50:58 pve3 kernel: [ 7499.372518] perf: interrupt took too long (2505 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
Mar 2 13:16:32 pve3 kernel: [16233.638310] perf: interrupt took too long (3151 > 3131), lowering kernel.perf_event_max_sample_rate to 63250
/var/log/syslog
Code:
Mar 2 15:49:42 pve3 pve-ha-lrm[14050]: Task 'UPID:pve3:000036E3:0006980C:621F85F8:qmstart:120:root@pam:' still active, waiting
Mar 2 15:49:43 pve3 pve-ha-lrm[14486]: Task 'UPID:pve3:00003899:0006997F:621F85FC:qmstart:134:root@pam:' still active, waiting
Mar 2 15:49:47 pve3 pve-ha-lrm[14050]: Task 'UPID:pve3:000036E3:0006980C:621F85F8:qmstart:120:root@pam:' still active, waiting
Mar 2 15:49:47 pve3 pve-ha-lrm[14748]: Task 'UPID:pve3:0000399E:00069A0C:621F85FD:qmstart:150:root@pam:' still active, waiting
Mar 2 15:49:48 pve3 pve-ha-lrm[14486]: Task 'UPID:pve3:00003899:0006997F:621F85FC:qmstart:134:root@pam:' still active, waiting
Mar 2 15:49:52 pve3 pve-ha-lrm[14748]: Task 'UPID:pve3:0000399E:00069A0C:621F85FD:qmstart:150:root@pam:' still active, waiting
Mar 2 15:49:52 pve3 pve-ha-lrm[14050]: Task 'UPID:pve3:000036E3:0006980C:621F85F8:qmstart:120:root@pam:' still active, waiting
Mar 2 15:49:53 pve3 pve-ha-lrm[14486]: Task 'UPID:pve3:00003899:0006997F:621F85FC:qmstart:134:root@pam:' still active, waiting
Mar 2 15:49:57 pve3 pve-ha-lrm[14050]: Task 'UPID:pve3:000036E3:0006980C:621F85F8:qmstart:120:root@pam:' still active, waiting
Mar 2 15:49:57 pve3 pve-ha-lrm[14748]: Task 'UPID:pve3:0000399E:00069A0C:621F85FD:qmstart:150:root@pam:' still active, waiting
Mar 2 15:49:58 pve3 pve-ha-lrm[14486]: Task 'UPID:pve3:00003899:0006997F:621F85FC:qmstart:134:root@pam:' still active, waiting
Mar 2 15:50:02 pve3 pve-ha-lrm[14050]: Task 'UPID:pve3:000036E3:0006980C:621F85F8:qmstart:120:root@pam:' still active, waiting
Mar 2 15:50:02 pve3 pve-ha-lrm[14748]: Task 'UPID:pve3:0000399E:00069A0C:621F85FD:qmstart:150:root@pam:' still active, waiting
Mar 2 15:50:03 pve3 pve-ha-lrm[14486]: Task 'UPID:pve3:00003899:0006997F:621F85FC:qmstart:134:root@pam:' still active, waiting
Mar 2 15:50:07 pve3 pve-ha-lrm[14748]: Task 'UPID:pve3:0000399E:00069A0C:621F85FD:qmstart:150:root@pam:' still active, waiting
Mar 2 15:50:07 pve3 pve-ha-lrm[14050]: Task 'UPID:pve3:000036E3:0006980C:621F85F8:qmstart:120:root@pam:' still active, waiting
Mar 2 15:50:08 pve3 pve-ha-lrm[14486]: Task 'UPID:pve3:00003899:0006997F:621F85FC:qmstart:134:root@pam:' still active, waiting
Mar 2 15:50:12 pve3 pve-ha-lrm[14748]: Task 'UPID:pve3:0000399E:00069A0C:621F85FD:qmstart:150:root@pam:' still active, waiting
Mar 2 15:50:12 pve3 pve-ha-lrm[14050]: Task 'UPID:pve3:000036E3:0006980C:621F85F8:qmstart:120:root@pam:' still active, waiting
Mar 2 15:50:13 pve3 pve-ha-lrm[14486]: Task 'UPID:pve3:00003899:0006997F:621F85FC:qmstart:134:root@pam:' still active, waiting
Mar 2 15:50:17 pve3 pve-ha-lrm[14050]: Task 'UPID:pve3:000036E3:0006980C:621F85F8:qmstart:120:root@pam:' still active, waiting
Mar 2 15:50:17 pve3 pve-ha-lrm[14748]: Task 'UPID:pve3:0000399E:00069A0C:621F85FD:qmstart:150:root@pam:' still active, waiting
Mar 2 15:50:18 pve3 pve-ha-lrm[14486]: Task 'UPID:pve3:00003899:0006997F:621F85FC:qmstart:134:root@pam:' still active, waiting
Mar 2 15:50:22 pve3 pve-ha-lrm[14748]: Task 'UPID:pve3:0000399E:00069A0C:621F85FD:qmstart:150:root@pam:' still active, waiting
Mar 2 15:50:22 pve3 pve-ha-lrm[14050]: Task 'UPID:pve3:000036E3:0006980C:621F85F8:qmstart:120:root@pam:' still active, waiting
Mar 2 15:50:23 pve3 pve-ha-lrm[14486]: Task 'UPID:pve3:00003899:0006997F:621F85FC:qmstart:134:root@pam:' still active, waiting
Mar 2 15:50:27 pve3 pve-ha-lrm[14050]: Task 'UPID:pve3:000036E3:0006980C:621F85F8:qmstart:120:root@pam:' still active, waiting
Mar 2 15:50:27 pve3 pve-ha-lrm[14748]: Task 'UPID:pve3:0000399E:00069A0C:621F85FD:qmstart:150:root@pam:' still active, waiting
Mar 2 15:50:28 pve3 pve-ha-lrm[14486]: Task 'UPID:pve3:00003899:0006997F:621F85FC:qmstart:134:root@pam:' still active, waiting
Mar 2 15:50:32 pve3 pve-ha-lrm[14050]: Task 'UPID:pve3:000036E3:0006980C:621F85F8:qmstart:120:root@pam:' still active, waiting
Mar 2 15:50:32 pve3 pve-ha-lrm[14748]: Task 'UPID:pve3:0000399E:00069A0C:621F85FD:qmstart:150:root@pam:' still active, waiting
Mar 2 15:50:33 pve3 pve-ha-lrm[14486]: Task 'UPID:pve3:00003899:0006997F:621F85FC:qmstart:134:root@pam:' still active, waiting
Mar 2 15:50:37 pve3 pve-ha-lrm[14748]: Task 'UPID:pve3:0000399E:00069A0C:621F85FD:qmstart:150:root@pam:' still active, waiting
Mar 2 15:50:37 pve3 pve-ha-lrm[14050]: Task 'UPID:pve3:000036E3:0006980C:621F85F8:qmstart:120:root@pam:' still active, waiting
Mar 2 15:50:38 pve3 pve-ha-lrm[14486]: Task 'UPID:pve3:00003899:0006997F:621F85FC:qmstart:134:root@pam:' still active, waiting
Mar 2 15:50:42 pve3 pve-ha-lrm[14748]: Task 'UPID:pve3:0000399E:00069A0C:621F85FD:qmstart:150:root@pam:' still active, waiting
Mar 2 15:50:42 pve3 pve-ha-lrm[14050]: Task 'UPID:pve3:000036E3:0006980C:621F85F8:qmstart:120:root@pam:' still active, waiting
Mar 2 15:50:43 pve3 pve-ha-lrm[14486]: Task 'UPID:pve3:00003899:0006997F:621F85FC:qmstart:134:root@pam:' still active, waiting
Mar 2 15:50:47 pve3 pve-ha-lrm[14748]: Task 'UPID:pve3:0000399E:00069A0C:621F85FD:qmstart:150:root@pam:' still active, waiting
Mar 2 15:50:47 pve3 pve-ha-lrm[14050]: Task 'UPID:pve3:000036E3:0006980C:621F85F8:qmstart:120:root@pam:' still active, waiting
Mar 2 15:50:48 pve3 pve-ha-lrm[14486]: Task 'UPID:pve3:00003899:0006997F:621F85FC:qmstart:134:root@pam:' still active, waiting
Mar 2 15:50:52 pve3 pve-ha-lrm[14748]: Task 'UPID:pve3:0000399E:00069A0C:621F85FD:qmstart:150:root@pam:' still active, waiting
Mar 2 15:50:52 pve3 pve-ha-lrm[14050]: Task 'UPID:pve3:000036E3:0006980C:621F85F8:qmstart:120:root@pam:' still active, waiting
Mar 2 15:50:52 pve3 systemd[1]: user@0.service: State 'final-sigterm' timed out. Killing.
Mar 2 15:50:52 pve3 systemd[1]: user@0.service: Killing process 55583 (systemd) with signal SIGKILL.
Mar 2 15:50:53 pve3 pve-ha-lrm[14486]: Task 'UPID:pve3:00003899:0006997F:621F85FC:qmstart:134:root@pam:' still active, waiting
Mar 2 15:50:57 pve3 pve-ha-lrm[14050]: Task 'UPID:pve3:000036E3:0006980C:621F85F8:qmstart:120:root@pam:' still active, waiting
Mar 2 15:50:57 pve3 pve-ha-lrm[14748]: Task 'UPID:pve3:0000399E:00069A0C:621F85FD:qmstart:150:root@pam:' still active, waiting
Mar 2 15:50:58 pve3 pve-ha-lrm[14486]: Task 'UPID:pve3:00003899:0006997F:621F85FC:qmstart:134:root@pam:' still active, waiting
Any suggestions on what I should look at?
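If it helps, the next time it happens I can also dump the full journal around the window where the qmstart tasks hang — something like the below (the times just cover the syslog excerpt above, and the output paths are only placeholders):
Code:
# Everything logged around the hang window on pve3
journalctl --since "2022-03-02 15:40" --until "2022-03-02 16:00" > /root/pve3-hang.log

# Service-specific view for the current boot
journalctl -b -u pve-ha-lrm -u pve-ha-crm -u pvestatd > /root/pve3-services.log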