I have a test host that has been constantly generating EDAC errors since an upgrade to Scientific Linux 6.2. It’s possible that these are real errors, but it’s also possible that it’s a problem with the motherboard hardware and/or BIOS. It’s an older ASUS DSBF-DE that’s already running the latest BIOS release, from 2008. My experience with EDAC has been rocky, and every reproducible error I’ve seen from it has turned out to be either a buggy motherboard or a kernel bug. On the other hand, MCE has been highly reliable for me in catching memory problems, and I’ve only once encountered a BIOS problem causing false MCE errors (which I got the motherboard vendor to fix). On the test system that is throwing EDAC errors, the mcelog is empty.
An example of the EDAC messages in dmesg:
EDAC MC0: CE row 0, channel 1, label "": (Branch=0 DRAM-Bank=6 RDWR=Read RAS=16075 CAS=2974, CE Err=0x2000 (Correctable Non-Mirrored Demand Data ECC))
EDAC MC0: CE row 0, channel 1, label "": (Branch=0 DRAM-Bank=0 RDWR=Read RAS=7233 CAS=6, CE Err=0x2000 (Correctable Non-Mirrored Demand Data ECC))
EDAC MC0: CE row 0, channel 1, label "": (Branch=0 DRAM-Bank=0 RDWR=Read RAS=7233 CAS=6, CE Err=0x2000 (Correctable Non-Mirrored Demand Data ECC))
EDAC MC0: CE row 0, channel 1, label "": (Branch=0 DRAM-Bank=5 RDWR=Read RAS=12666 CAS=2968, CE Err=0x2000 (Correctable Non-Mirrored Demand Data ECC))
EDAC MC0: CE row 0, channel 1, label "": (Branch=0 DRAM-Bank=0 RDWR=Read RAS=7233 CAS=6, CE Err=0x2000 (Correctable Non-Mirrored Demand Data ECC))
EDAC MC0: CE row 0, channel 1, label "": (Branch=0 DRAM-Bank=1 RDWR=Read RAS=3174 CAS=4, CE Err=0x2000 (Correctable Non-Mirrored Demand Data ECC))
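To see whether the errors cluster on one row, channel, or bank, the messages can be tallied. This is only a rough sketch that assumes the i5000 message layout shown above; the function name `summarize_edac_ce` is made up for illustration, and other EDAC drivers format their lines differently.

```shell
# Tally correctable-error (CE) lines by controller, row, channel and DRAM bank.
# Reads kernel log text on stdin. Assumes the i5000_edac message layout
# shown above; summarize_edac_ce is a hypothetical helper name.
summarize_edac_ce() {
    awk '
        /EDAC MC[0-9]+: CE row/ {
            mc = row = chan = bank = "?"
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^MC[0-9]+:$/)   { mc = $i;       sub(/:$/, "", mc) }
                if ($i == "row")          { row = $(i+1);  sub(/,$/, "", row) }
                if ($i == "channel")      { chan = $(i+1); sub(/,$/, "", chan) }
                if ($i ~ /^DRAM-Bank=/)   { bank = $i;     sub(/^DRAM-Bank=/, "", bank) }
            }
            count[mc " row=" row " channel=" chan " bank=" bank]++
        }
        END { for (k in count) print count[k], k }
    ' | LC_ALL=C sort -k1,1nr -k2
}
```

On the live system it would be run as `dmesg | summarize_edac_ce`. For the six lines above it reports three events on bank 0 and one each on banks 1, 5 and 6, all on row 0, channel 1.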
I just want to completely disable EDAC on this system by preventing its kernel modules from loading. The procedure to turn it off should be the same for all RHEL 6.x derived distributions and more or less the same for all Linux 2.6/3 based systems.
Some of the EDAC code is platform specific, so there will be a ‘core’ module and some platform-specific bits. In this case, i5000_edac is the module specific to my platform.
# lsmod | grep -i edac
i5000_edac 8867 0
edac_core 46773 3 i5000_edac
And now we just need to blacklist the loading of those modules.
[root@archdbn1 ~]# cat > /etc/modprobe.d/edac.conf <<END
> blacklist i5000_edac
> blacklist edac_core
> END
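On a different platform the driver name will not be i5000_edac, so the blacklist can also be generated from whatever lsmod reports. A minimal sketch, assuming every EDAC module has "edac" in its name; the helper names `edac_modules` and `write_edac_blacklist` are my own inventions, not standard tools.

```shell
# Print the names of loaded modules whose name contains "edac".
# Reads lsmod-style text on stdin (hypothetical helper).
edac_modules() {
    awk 'tolower($1) ~ /edac/ { print $1 }'
}

# write_edac_blacklist CONF_PATH
# Reads module names on stdin and writes one "blacklist" line per
# module to CONF_PATH, replacing any existing file (hypothetical helper).
write_edac_blacklist() {
    awk '{ print "blacklist", $1 }' > "$1"
}
```

Run as root, `lsmod | edac_modules | write_edac_blacklist /etc/modprobe.d/edac.conf` should produce the same file as the heredoc above. Keep in mind that `blacklist` only stops a module from being autoloaded through its aliases; it can still be loaded explicitly with modprobe.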
Reboot the system, then verify that the EDAC kernel modules are no longer loaded and that there are no EDAC messages in the system dmesg.
# lsmod | grep -i edac
# dmesg | grep -i edac
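Both commands should print nothing. If the check needs to run from a script or configuration management, the same test can be turned into an exit status. This is a sketch only; `check_edac_disabled` is a hypothetical name, and it simply looks for the string "edac" in whatever text it is given.

```shell
# Exit 0 if the text on stdin contains no mention of EDAC, 1 otherwise.
# check_edac_disabled is a hypothetical helper, not a standard tool.
check_edac_disabled() {
    if grep -qi 'edac'; then
        echo "EDAC still present" >&2
        return 1
    fi
    echo "no EDAC modules or messages found"
}
```

After the reboot it would be fed both sources at once: `{ lsmod; dmesg; } | check_edac_disabled`.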