Hi all,
This is an ongoing issue that I have been dealing with. My pvestatd.service randomly crashes at or very near midnight on one of my nodes. Sometimes only the daemon crashes; other times the node becomes unresponsive until I hard-reboot it.
The main difference between the two nodes is that the crashing node is an AMD Ryzen, which should be running the latest AGESA BIOS. Both nodes are running Proxmox 5.2-6, but this crash has been happening since at least 5.2-0. I realize that a lot of things could be causing this crash, so any help narrowing it down is much appreciated.
Here are the logs right around the time of crashing:
Node 1 (the crashing node):
Code:
Aug 02 23:16:54 Orion rrdcached[1784]: flushing old values
Aug 02 23:16:54 Orion rrdcached[1784]: rotating journals
Aug 02 23:16:54 Orion rrdcached[1784]: started new journal /var/lib/rrdcached/journal/rrd.journal.1533273414.037578
Aug 02 23:16:54 Orion rrdcached[1784]: removing old journal /var/lib/rrdcached/journal/rrd.journal.1533266214.037600
Aug 02 23:16:54 Orion pmxcfs[1817]: [dcdb] notice: data verification successful
Aug 02 23:17:01 Orion CRON[9493]: pam_unix(cron:session): session opened for user root by (uid=0)
Aug 02 23:17:01 Orion CRON[9494]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Aug 02 23:17:01 Orion CRON[9493]: pam_unix(cron:session): session closed for user root
Aug 02 23:48:49 Orion systemd[1]: Starting Daily apt download activities...
Aug 02 23:48:49 Orion systemd[1]: Started Daily apt download activities.
Aug 02 23:48:49 Orion systemd[1]: apt-daily.timer: Adding 2h 40min 51.115682s random time.
Aug 02 23:48:49 Orion systemd[1]: apt-daily.timer: Adding 8h 4min 39.312242s random time.
Aug 03 00:16:54 Orion rrdcached[1784]: flushing old values
Aug 03 00:16:54 Orion rrdcached[1784]: rotating journals
Aug 03 00:16:54 Orion rrdcached[1784]: started new journal /var/lib/rrdcached/journal/rrd.journal.1533277014.037604
Aug 03 00:16:54 Orion rrdcached[1784]: removing old journal /var/lib/rrdcached/journal/rrd.journal.1533269814.037569
Aug 03 00:16:54 Orion pmxcfs[1817]: [dcdb] notice: data verification successful
Aug 03 00:17:01 Orion CRON[13785]: pam_unix(cron:session): session opened for user root by (uid=0)
Aug 03 00:17:01 Orion CRON[13786]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Aug 03 00:17:01 Orion CRON[13785]: pam_unix(cron:session): session closed for user root
Aug 03 00:39:50 Orion systemd[1]: pvestatd.service: Main process exited, code=killed, status=11/SEGV
Aug 03 00:39:50 Orion kernel: pvestatd[1966]: segfault at c ip 00005627fb7b9cde sp 00007ffe9c5046b0 error 4 in perl[5627fb676000+1e6000]
Aug 03 00:39:51 Orion systemd[1]: pvestatd.service: Unit entered failed state.
Aug 03 00:39:51 Orion systemd[1]: pvestatd.service: Failed with result 'signal'.
Aug 03 01:16:54 Orion rrdcached[1784]: flushing old values
Aug 03 01:16:54 Orion rrdcached[1784]: rotating journals
Aug 03 01:16:54 Orion rrdcached[1784]: started new journal /var/lib/rrdcached/journal/rrd.journal.1533280614.037596
Aug 03 01:16:54 Orion rrdcached[1784]: removing old journal /var/lib/rrdcached/journal/rrd.journal.1533273414.037578
Aug 03 01:16:54 Orion pmxcfs[1817]: [dcdb] notice: data verification successful
Aug 03 01:17:01 Orion CRON[16801]: pam_unix(cron:session): session opened for user root by (uid=0)
Aug 03 01:17:01 Orion CRON[16802]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Aug 03 01:17:01 Orion CRON[16801]: pam_unix(cron:session): session closed for user root
Aug 03 02:16:54 Orion rrdcached[1784]: flushing old values
Aug 03 02:16:54 Orion rrdcached[1784]: rotating journals
Aug 03 02:16:54 Orion rrdcached[1784]: started new journal /var/lib/rrdcached/journal/rrd.journal.1533284214.037602
Aug 03 02:16:54 Orion rrdcached[1784]: removing old journal /var/lib/rrdcached/journal/rrd.journal.1533277014.037604
Aug 03 02:16:54 Orion pmxcfs[1817]: [dcdb] notice: data verification successful
Node 2 (the stable node):
Code:
Aug 02 23:16:54 Wash pmxcfs[6092]: [dcdb] notice: data verification successful
Aug 02 23:17:01 Wash CRON[20020]: pam_unix(cron:session): session opened for user root by (uid=0)
Aug 02 23:17:01 Wash CRON[20021]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Aug 02 23:17:01 Wash CRON[20020]: pam_unix(cron:session): session closed for user root
Aug 02 23:49:38 Wash rrdcached[1667]: flushing old values
Aug 02 23:49:38 Wash rrdcached[1667]: rotating journals
Aug 02 23:49:38 Wash rrdcached[1667]: started new journal /var/lib/rrdcached/journal/rrd.journal.1533275378.649650
Aug 02 23:49:38 Wash rrdcached[1667]: removing old journal /var/lib/rrdcached/journal/rrd.journal.1533268178.649595
Aug 03 00:16:54 Wash pmxcfs[6092]: [dcdb] notice: data verification successful
Aug 03 00:17:01 Wash CRON[5371]: pam_unix(cron:session): session opened for user root by (uid=0)
Aug 03 00:17:01 Wash CRON[5372]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Aug 03 00:17:01 Wash CRON[5371]: pam_unix(cron:session): session closed for user root
Aug 03 00:49:38 Wash rrdcached[1667]: flushing old values
Aug 03 00:49:38 Wash rrdcached[1667]: rotating journals
Aug 03 00:49:38 Wash rrdcached[1667]: started new journal /var/lib/rrdcached/journal/rrd.journal.1533278978.649663
Aug 03 00:49:38 Wash rrdcached[1667]: removing old journal /var/lib/rrdcached/journal/rrd.journal.1533271778.649650
Aug 03 01:16:54 Wash pmxcfs[6092]: [dcdb] notice: data verification successful
Aug 03 01:17:01 Wash CRON[23367]: pam_unix(cron:session): session opened for user root by (uid=0)
Aug 03 01:17:01 Wash CRON[23368]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Aug 03 01:17:01 Wash CRON[23367]: pam_unix(cron:session): session closed for user root
Aug 03 01:49:38 Wash rrdcached[1667]: flushing old values
Aug 03 01:49:38 Wash rrdcached[1667]: rotating journals
Aug 03 01:49:38 Wash rrdcached[1667]: started new journal /var/lib/rrdcached/journal/rrd.journal.1533282578.649650
Aug 03 01:49:38 Wash rrdcached[1667]: removing old journal /var/lib/rrdcached/journal/rrd.journal.1533275378.649650
Aug 03 02:06:21 Wash audit[5324]: AVC apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxc-container-default-cgns" name="/" pid=5324 comm="(certbot)" flags="rw, rslave"
Aug 03 02:06:21 Wash kernel: audit: type=1400 audit(1533283581.298:234): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxc-container-default-cgns" name="/" pid=5324 comm="(certbot)" flags="rw, rslave"
Aug 03 02:16:54 Wash pmxcfs[6092]: [dcdb] notice: data verification successful
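In case it helps anyone reading the kernel segfault line from the Node 1 log, here is a rough Python sketch that pulls it apart. It assumes the usual x86 Linux format for that line (hex fields, with the mapped object shown as name[base+size]) and the standard x86 page-fault error bits. The function name decode_segfault is made up for illustration and is not part of any Proxmox tooling.

```python
import re

# Kernel segfault line copied from the Node 1 log.
LINE = ("Aug 03 00:39:50 Orion kernel: pvestatd[1966]: segfault at c "
        "ip 00005627fb7b9cde sp 00007ffe9c5046b0 error 4 "
        "in perl[5627fb676000+1e6000]")

# All numeric fields in this kernel message are printed in hex.
PATTERN = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) "
    r"sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<obj>[^\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Hypothetical helper: split a kernel segfault line into its parts."""
    m = PATTERN.search(line)
    if m is None:
        return None
    err = int(m.group("err"), 16)
    return {
        "object": m.group("obj"),
        "fault_address": int(m.group("addr"), 16),
        # Offset of the faulting instruction from where the object was mapped.
        "offset_in_object": int(m.group("ip"), 16) - int(m.group("base"), 16),
        # x86 page-fault error code bits.
        "page_present": bool(err & 1),   # bit 0: 0 = page not present
        "write_access": bool(err & 2),   # bit 1: 0 = read, 1 = write
        "user_mode": bool(err & 4),      # bit 2: 1 = fault came from user mode
    }


info = decode_segfault(LINE)
print(info["object"], hex(info["fault_address"]), hex(info["offset_in_object"]))
# prints: perl 0xc 0x143cde
```

If I am reading this right, the fault is a user-mode read of address 0xc on a page that is not mapped, which looks like a near-null pointer dereference inside the perl binary itself, at offset 0x143cde from its load address. That does not say whether the cause is software or hardware, but it might be useful when comparing against later crashes.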
One day I took a photo of the console on a hard crash. Here is the wall:
Thank you to anyone who takes the time to look this over. I couldn't get by without the support of this amazing community!
-Matt