Good day.
I've been using Proxmox on a Tyan Tempest motherboard with an Intel i5000 memory controller since version 1.0 with no problems whatsoever. I recently updated to 1.2, also without incident.
Ever since, though, I've been getting random crashes that seem to be related to the Intel memory controller (this bad boy here http://tomoyo.sourceforge.jp/cgi-bin/lxr/source/drivers/edac/i5000_edac.c), as described in:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/276444
They suggest blacklisting the module, which I'm going to try now, but if it's a non-fatal error, why does it crash? Also, the proposed solution relates to memory throttling for thermal reasons, which may not be the case here.
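To confirm whether the driver is actually loaded before and after blacklisting, something like the following could be used. This is a minimal sketch in Python, assuming the module is named `i5000_edac` (inferred from the source file name linked above) and that `/proc/modules` lists one loaded module per line with the module name as the first field. `is_module_loaded` is a hypothetical helper, not part of any existing tool.

```python
def is_module_loaded(name, proc_modules_text):
    """Return True if `name` is the first field of any line in the given text.

    `proc_modules_text` is expected to be the contents of /proc/modules,
    where each line starts with the module name followed by other fields.
    """
    for line in proc_modules_text.splitlines():
        fields = line.split()
        # Skip blank lines; compare only the first whitespace-separated field.
        if fields and fields[0] == name:
            return True
    return False
```

On the host it could be called as `is_module_loaded("i5000_edac", open("/proc/modules").read())`. It only reports whether the module is loaded; it does not say anything about whether the underlying hardware errors are still occurring.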
This has only happened with 2.6.24-5-pve. Any ideas?
Full trace:
Jun 4 12:02:45 cronus kernel: EDAC i5000 MC0: NON-FATAL ERRORS Found!!! 1st NON-FATAL Err Reg= 0x20000
Jun 4 12:02:45 cronus kernel: EDAC i5000: NORTHBOUND CRC Error, bits= 0x20000
Jun 4 12:02:46 cronus kernel: EDAC i5000 MC0: NON-FATAL ERRORS Found!!! 1st NON-FATAL Err Reg= 0x20000
Jun 4 12:02:46 cronus kernel: EDAC i5000: NORTHBOUND CRC Error, bits= 0x20000
Jun 4 12:03:32 cronus kernel: EDAC i5000 MC0: NON-FATAL ERRORS Found!!! 1st NON-FATAL Err Reg= 0x20000
Jun 4 12:03:32 cronus kernel: EDAC i5000: NORTHBOUND CRC Error, bits= 0x20000
Jun 4 12:03:45 cronus kernel: EDAC i5000 MC0: NON-FATAL ERRORS Found!!! 1st NON-FATAL Err Reg= 0x20000
Jun 4 12:03:45 cronus kernel: EDAC i5000: NORTHBOUND CRC Error, bits= 0x20000
Jun 4 12:06:15 cronus kernel: EDAC i5000 MC0: NON-FATAL ERRORS Found!!! 1st NON-FATAL Err Reg= 0x20000
Jun 4 12:06:15 cronus kernel: EDAC i5000: NORTHBOUND CRC Error, bits= 0x20000
Jun 4 12:06:23 cronus kernel: EDAC i5000 MC0: NON-FATAL ERRORS Found!!! 1st NON-FATAL Err Reg= 0x20000
Jun 4 12:06:23 cronus kernel: EDAC i5000: NORTHBOUND CRC Error, bits= 0x20000
Jun 4 12:06:25 cronus kernel: EDAC i5000 MC0: NON-FATAL ERRORS Found!!! 1st NON-FATAL Err Reg= 0x20000
Jun 4 12:06:25 cronus kernel: EDAC i5000: NORTHBOUND CRC Error, bits= 0x20000
Jun 4 12:06:37 cronus kernel: EDAC i5000 MC0: NON-FATAL ERRORS Found!!! 1st NON-FATAL Err Reg= 0x20000
Jun 4 12:06:37 cronus kernel: EDAC i5000: NORTHBOUND CRC Error, bits= 0x20000
Jun 4 12:06:40 cronus kernel: EDAC i5000 MC0: FATAL ERRORS Found!!! 1st FATAL Err Reg= 0x1
Jun 4 12:06:40 cronus kernel: EDAC i5000 MC0: Alert on non-redundant retry or fast reset timeout
Jun 4 12:06:40 cronus kernel: EDAC MC0: UE row 1, channel-a= 0 channel-b= 1 labels "-": (Branch=0 DRAM-Bank=1 RDWR=Write RAS=7607 CAS=0 FATAL Err=0x1)
Jun 4 12:06:40 cronus kernel: EDAC i5000 MC0: NON-FATAL ERRORS Found!!! 1st NON-FATAL Err Reg= 0x20000
Jun 4 12:06:40 cronus kernel: EDAC i5000: NORTHBOUND CRC Error, bits= 0x20000
Jun 4 12:06:55 cronus kernel: EDAC i5000 MC0: NON-FATAL ERRORS Found!!! 1st NON-FATAL Err Reg= 0x20000
Jun 4 12:06:55 cronus kernel: EDAC i5000: NORTHBOUND CRC Error, bits= 0x20000
Jun 4 12:08:19 cronus kernel: EDAC i5000 MC0: NON-FATAL ERRORS Found!!! 1st NON-FATAL Err Reg= 0x20000
Jun 4 12:08:19 cronus kernel: EDAC i5000: NORTHBOUND CRC Error, bits= 0x20000
Jun 4 12:08:24 cronus kernel: EDAC i5000 MC0: NON-FATAL ERRORS Found!!! 1st NON-FATAL Err Reg= 0x20000
Jun 4 12:08:24 cronus kernel: EDAC i5000: NORTHBOUND CRC Error, bits= 0x20000
Jun 4 12:08:33 cronus kernel: EDAC i5000 MC0: NON-FATAL ERRORS Found!!! 1st NON-FATAL Err Reg= 0x20000
Jun 4 12:08:33 cronus kernel: EDAC i5000: NORTHBOUND CRC Error, bits= 0x20000
and then, crash!
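To make the trace easier to reason about, the summary lines can be tallied by severity. This is a rough sketch in Python, assuming the log lines have exactly the format shown above; `tally_edac` and the `SUMMARY` pattern are names made up for illustration.

```python
import re
from collections import Counter

# Matches the per-event summary lines from the trace, e.g.
#   "EDAC i5000 MC0: NON-FATAL ERRORS Found!!! ..."
#   "EDAC i5000 MC0: FATAL ERRORS Found!!! ..."
SUMMARY = re.compile(r"EDAC i5000 MC\d+: (NON-FATAL|FATAL) ERRORS Found")


def tally_edac(lines):
    """Count FATAL and NON-FATAL summary lines in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = SUMMARY.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Feeding it the lines of the kernel log (for example, the trace above saved to a text file) gives a count per severity, which makes it easier to see that the trace contains a FATAL entry (at 12:06:40) among the NON-FATAL ones.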
Any help will be appreciated.