My PVE host (running in a homelab on desktop hardware) keeps crashing. At first it crashed roughly every two months, but lately it has crashed on two of the last three Sundays at exactly the same time.
I pulled the following from the syslog in the web interface:
Crash 1:
Apr 09 06:47:01 pve CRON[1164866]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Apr 09 06:47:01 pve CRON[1164867]: (root) CMD (test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.weekly ))
Apr 09 06:47:01 pve CRON[1164866]: pam_unix(cron:session): session closed for user root
Crash 2:
Apr 23 06:47:01 pve CRON[2051669]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Apr 23 06:47:01 pve CRON[2051670]: (root) CMD (test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.weekly ))
Apr 23 06:47:01 pve CRON[2051669]: pam_unix(cron:session): session closed for user root
Apr 23 06:48:57 pve kernel: ata5.00: exception Emask 0x0 SAct 0x1ffc001 SErr 0x40000 action 0x6 frozen
Apr 23 06:48:57 pve kernel: ata5: SError: { CommWake }
Apr 23 06:48:57 pve kernel: ata5.00: failed command: WRITE FPDMA QUEUED
Apr 23 06:48:57 pve kernel: ata5.00: cmd 61/10:00:c0:e2:ed/00:00:0e:00:00/40 tag 0 ncq dma 8192 out
res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr 23 06:48:57 pve kernel: ata5.00: status: { DRDY }
Apr 23 06:48:57 pve kernel: ata5.00: failed command: WRITE FPDMA QUEUED
Apr 23 06:48:57 pve kernel: ata5.00: cmd 61/08:70:f8:f0:9c/00:00:0d:00:00/40 tag 14 ncq dma 4096 out
res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr 23 06:48:57 pve kernel: ata5.00: status: { DRDY }
Apr 23 06:48:57 pve kernel: ata5.00: failed command: WRITE FPDMA QUEUED
Apr 23 06:48:57 pve kernel: ata5.00: cmd 61/08:78:b8:b3:9d/00:00:0d:00:00/40 tag 15 ncq dma 4096 out
res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr 23 06:48:57 pve kernel: ata5.00: status: { DRDY }
Apr 23 06:48:57 pve kernel: ata5.00: failed command: WRITE FPDMA QUEUED
Apr 23 06:48:57 pve kernel: ata5.00: cmd 61/08:80:c0:87:c3/00:00:0d:00:00/40 tag 16 ncq dma 4096 out
res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr 23 06:48:57 pve kernel: ata5.00: status: { DRDY }
Apr 23 06:48:57 pve kernel: ata5.00: failed command: WRITE FPDMA QUEUED
Apr 23 06:48:57 pve kernel: ata5.00: cmd 61/08:88:e8:b7:c3/00:00:0d:00:00/40 tag 17 ncq dma 4096 out
res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr 23 06:48:57 pve kernel: ata5.00: status: { DRDY }
Apr 23 06:48:57 pve kernel: ata5.00: failed command: WRITE FPDMA QUEUED
Apr 23 06:48:57 pve kernel: ata5.00: cmd 61/08:90:68:2d:ec/00:00:0e:00:00/40 tag 18 ncq dma 4096 out
res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr 23 06:48:57 pve kernel: ata5.00: status: { DRDY }
Apr 23 06:48:57 pve kernel: ata5.00: failed command: WRITE FPDMA QUEUED
Apr 23 06:48:57 pve kernel: ata5.00: cmd 61/08:98:18:3a:ec/00:00:0e:00:00/40 tag 19 ncq dma 4096 out
res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr 23 06:48:57 pve kernel: ata5.00: status: { DRDY }
Apr 23 06:48:57 pve kernel: ata5.00: failed command: WRITE FPDMA QUEUED
Apr 23 06:48:57 pve kernel: ata5.00: cmd 61/08:a0:70:e2:ee/00:00:0e:00:00/40 tag 20 ncq dma 4096 out
res 40/00:01:06:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr 23 06:48:57 pve kernel: ata5.00: status: { DRDY }
Apr 23 06:48:57 pve kernel: ata5.00: failed command: WRITE FPDMA QUEUED
Apr 23 06:48:57 pve kernel: ata5.00: cmd 61/08:a8:70:2f:f2/00:00:0e:00:00/40 tag 21 ncq dma 4096 out
res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr 23 06:48:57 pve kernel: ata5.00: status: { DRDY }
Apr 23 06:48:57 pve kernel: ata5.00: failed command: WRITE FPDMA QUEUED
Apr 23 06:48:57 pve kernel: ata5.00: cmd 61/08:b0:78:da:f0/00:00:0e:00:00/40 tag 22 ncq dma 4096 out
res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr 23 06:48:57 pve kernel: ata5.00: status: { DRDY }
Apr 23 06:48:57 pve kernel: ata5.00: failed command: WRITE FPDMA QUEUED
Apr 23 06:48:57 pve kernel: ata5.00: cmd 61/10:b8:b0:08:f2/00:00:0e:00:00/40 tag 23 ncq dma 8192 out
res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr 23 06:48:57 pve kernel: ata5.00: status: { DRDY }
Apr 23 06:48:57 pve kernel: ata5.00: failed command: WRITE FPDMA QUEUED
Apr 23 06:48:57 pve kernel: ata5.00: cmd 61/08:c0:30:84:f2/00:00:0e:00:00/40 tag 24 ncq dma 4096 out
res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr 23 06:48:57 pve kernel: ata5.00: status: { DRDY }
Apr 23 06:48:57 pve kernel: ata5: hard resetting link
Apr 23 06:48:57 pve kernel: ata5: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Apr 23 06:48:57 pve kernel: ata5.00: configured for UDMA/133
Apr 23 06:48:57 pve kernel: ahci 0000:00:17.0: port does not support device sleep
Apr 23 06:48:58 pve kernel: ata5: EH complete
Apr 23 06:48:58 pve postfix/qmgr[1200]: 2039E34146B: from=<root@pve.local>, size=10154, nrcpt=1 (queue active)
Apr 23 06:48:58 pve pve-firewall[1253]: firewall update time (12.299 seconds)
Apr 23 06:48:58 pve pvestatd[1255]: status update time (31.185 seconds)
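To check whether the same error pattern shows up around other crashes, the kernel log of an affected boot can be summarized per ATA port. This is a minimal Python sketch, assuming the log was first saved to a text file (for example with `journalctl -k -b -1 > kern.log`, provided the previous boot's journal was persisted); the script name summarize_ata.py is hypothetical. For the Crash 2 excerpt above it should report 12 timed-out WRITE FPDMA QUEUED commands on ata5 and one hard link reset.

```python
import re
import sys
from collections import Counter

# libata prints e.g. "kernel: ata5.00: failed command: WRITE FPDMA QUEUED"
FAILED_RE = re.compile(r"kernel: (ata\d+)\.\d+: failed command: (.+?)\s*$")
# ...and "kernel: ata5: hard resetting link" when error handling resets the port
RESET_RE = re.compile(r"kernel: (ata\d+): hard resetting link")


def summarize(log_text):
    """Count failed commands per (port, command), hard link resets per port,
    and result lines that end in "(timeout)"."""
    failed, resets, timeouts = Counter(), Counter(), 0
    for line in log_text.splitlines():
        if m := FAILED_RE.search(line):
            failed[(m.group(1), m.group(2))] += 1
        elif m := RESET_RE.search(line):
            resets[m.group(1)] += 1
        elif line.rstrip().endswith("(timeout)"):
            timeouts += 1
    return failed, resets, timeouts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python3 summarize_ata.py kern.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        failed, resets, timeouts = summarize(fh.read())
    for (port, cmd), n in sorted(failed.items()):
        print(f"{port}: {n} x {cmd}")
    for port, n in sorted(resets.items()):
        print(f"{port}: {n} hard link reset(s)")
    print(f"{timeouts} command result(s) ended in timeout")
```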
I am looking for some help identifying the problem.
I could not find out which job runs every Sunday at 6:47 am. A friend's PVE instance has the same entry at the same time.
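On Debian-based systems such as PVE, the stock /etc/crontab schedules the weekly jobs at 06:47 every Sunday (`47 6 * * 7`), which is why the same CRON line appears on both hosts. The `test -x /usr/sbin/anacron ||` part means cron itself only runs the scripts when anacron is not installed. The scripts actually executed are whatever sits in /etc/cron.weekly, and `run-parts --test /etc/cron.weekly` prints them without running them. The Python sketch below mirrors run-parts' default selection rules (executable regular files whose names contain only ASCII letters, digits, underscores and hyphens, in C-locale order); the function name weekly_scripts is hypothetical.

```python
import os
import re

# With neither --lsbsysinit nor --regex, run-parts only accepts names made of
# ASCII letters, digits, underscores and hyphens ("foo.sh" is skipped).
VALID_NAME = re.compile(r"[A-Za-z0-9_-]+")


def weekly_scripts(directory="/etc/cron.weekly"):
    """Approximate `run-parts --test DIRECTORY`: executable regular files
    with valid names, in lexical (C locale) order."""
    scripts = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if (VALID_NAME.fullmatch(name)
                and os.path.isfile(path)
                and os.access(path, os.X_OK)):
            scripts.append(name)
    return scripts
```

On the PVE host, `print(weekly_scripts())` lists the candidates; each one can then be read to see whether it does heavy disk I/O around 06:47.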
PVE is installed on an M.2 SSD, which is most likely the drive throwing all these errors.
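The errors are reported on ata5, a SATA port on the AHCI controller (0000:00:17.0 in the log). An NVMe M.2 SSD never shows up under an ataN port, so if the M.2 drive is NVMe, these errors come from a different, SATA-attached disk; if it is an M.2 SATA SSD, it may well be ata5. One way to check is to resolve the /sys/block symlinks, whose targets include the ata port in their path. A minimal sketch, assuming the usual Linux sysfs layout; block_devices_on_port is a hypothetical helper name.

```python
import os


def block_devices_on_port(port, sys_block="/sys/block"):
    """Return block devices whose resolved sysfs path runs through `port`,
    e.g. /sys/devices/pci0000:00/0000:00:17.0/ata5/host4/.../block/sda."""
    hits = []
    for dev in sorted(os.listdir(sys_block)):
        real = os.path.realpath(os.path.join(sys_block, dev))
        # Surround with slashes so "ata5" does not also match "ata50".
        if f"/{port}/" in real:
            hits.append(dev)
    return hits
```

On the host, `print(block_devices_on_port("ata5"))` should name the disk behind the failing port (an empty list would point away from a SATA device).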
Thanks for any help!