I have a test host that is constantly generating EDAC errors after an upgrade to Scientific Linux 6.2. It’s possible that these are real errors but it’s also possible that it’s a problem with the motherboard hardware and/or BIOS. It’s an older ASUS DSBF-DE that’s already running the latest BIOS release from 2008. My experience with EDAC has been rocky and every reproducible error I’ve seen from it has been either a buggy motherboard or a kernel bug. On the other hand, MCE has been highly reliable for me in catching memory problems and I’ve only once encountered a BIOS problem causing false MCE errors (which I got the motherboard vendor to fix). In the case of the test system throwing EDAC errors, the mcelog is empty.
An example of the EDAC messages in dmesg:
EDAC MC0: CE row 0, channel 1, label "": (Branch=0 DRAM-Bank=6 RDWR=Read RAS=16075 CAS=2974, CE Err=0x2000 (Correctable Non-Mirrored Demand Data ECC))
EDAC MC0: CE row 0, channel 1, label "": (Branch=0 DRAM-Bank=0 RDWR=Read RAS=7233 CAS=6, CE Err=0x2000 (Correctable Non-Mirrored Demand Data ECC))
EDAC MC0: CE row 0, channel 1, label "": (Branch=0 DRAM-Bank=0 RDWR=Read RAS=7233 CAS=6, CE Err=0x2000 (Correctable Non-Mirrored Demand Data ECC))
EDAC MC0: CE row 0, channel 1, label "": (Branch=0 DRAM-Bank=5 RDWR=Read RAS=12666 CAS=2968, CE Err=0x2000 (Correctable Non-Mirrored Demand Data ECC))
EDAC MC0: CE row 0, channel 1, label "": (Branch=0 DRAM-Bank=0 RDWR=Read RAS=7233 CAS=6, CE Err=0x2000 (Correctable Non-Mirrored Demand Data ECC))
EDAC MC0: CE row 0, channel 1, label "": (Branch=0 DRAM-Bank=1 RDWR=Read RAS=3174 CAS=4, CE Err=0x2000 (Correctable Non-Mirrored Demand Data ECC))
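Since it is unclear whether these are real errors, a tally per channel and DRAM bank can show whether they cluster in one place. The following is only a minimal sketch, assuming each message sits on its own line in the i5000 format shown above; the function name tally_edac_ce is made up for illustration and other EDAC drivers may format their messages differently.

```shell
#!/bin/sh
# Tally EDAC correctable-error (CE) messages by channel and DRAM bank.
# Reads kernel log text on stdin. Assumes the i5000-style message format
# shown above; lines that do not look like EDAC CE messages are ignored.
tally_edac_ce() {
    awk '
        /EDAC MC[0-9]+: CE / {
            chan = "?"; bank = "?"
            # "channel " is 8 characters long, "DRAM-Bank=" is 10.
            if (match($0, /channel [0-9]+/))
                chan = substr($0, RSTART + 8, RLENGTH - 8)
            if (match($0, /DRAM-Bank=[0-9]+/))
                bank = substr($0, RSTART + 10, RLENGTH - 10)
            count[chan " " bank]++
        }
        END {
            for (key in count) {
                split(key, f, " ")
                printf "channel=%s bank=%s count=%d\n", f[1], f[2], count[key]
            }
        }
    ' | sort
}

# Usage on a live system:
#   dmesg | tally_edac_ce
```

A tally that keeps pointing at the same channel and bank would be consistent with a single bad DIMM, but it does not rule out a driver or BIOS problem.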
I just want to completely disable EDAC on this system by disabling the kernel modules. The procedure to turn it off should be the same for all RHEL6.x derived distributions and more or less the same for all Linux 2.6/3 based systems.
Some of the EDAC code is platform specific so there will be a ‘core’ module and some platform specific bits. In this case, i5000_edac is the module specific to my platform.
# lsmod | grep -i edac
i5000_edac 8867 0
edac_core 46773 3 i5000_edac
And now we just need to blacklist the loading of those modules.
[root@archdbn1 ~]# cat > /etc/modprobe.d/edac.conf <<END
> blacklist i5000_edac
> blacklist edac_core
> END
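On a different platform the driver module will have a different name, so the blacklist lines can be derived from whatever lsmod reports instead of being typed by hand. This is a rough sketch, assuming every EDAC module has "edac" in its name (true for the two modules above, not guaranteed elsewhere); the function name edac_blacklist_lines is hypothetical.

```shell
#!/bin/sh
# Print a "blacklist" line for every module whose name contains "edac".
# Reads lsmod-style output on stdin (module name in the first column).
# Assumption: EDAC modules can be recognised by name alone.
edac_blacklist_lines() {
    awk '$1 ~ /edac/ { print "blacklist " $1 }' | sort -u
}

# Usage (as root), writing to the same file as above:
#   lsmod | edac_blacklist_lines > /etc/modprobe.d/edac.conf
```

Review the generated file before rebooting, since a name match could in principle catch a module you want to keep.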
Reboot the system and then verify that the EDAC kernel modules are not loading and that there are no EDAC messages in the system dmesg.
# lsmod | grep -i edac
# dmesg | grep -i edac
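The two checks above can be wrapped into one pass/fail script, which is handy if the same change goes onto several hosts. This is only a sketch built on the same lsmod and dmesg greps; the function names no_edac_in and check_edac_disabled are invented for this example.

```shell
#!/bin/sh
# Succeed (exit status 0) only if no line on stdin mentions EDAC,
# matching case-insensitively like the greps above.
no_edac_in() {
    ! grep -qi 'edac'
}

# Run both checks on the live system and report the result.
check_edac_disabled() {
    status=0
    if lsmod | no_edac_in; then
        echo "OK: no EDAC modules loaded"
    else
        echo "FAIL: EDAC modules are still loaded"
        status=1
    fi
    if dmesg | no_edac_in; then
        echo "OK: no EDAC messages in dmesg"
    else
        echo "FAIL: EDAC messages found in dmesg"
        status=1
    fi
    return $status
}

# Usage after the reboot:
#   check_edac_disabled
```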
2013-08-29 at 07:49
Hi Joshua,
I am trying to implement fault handling using EDAC for a Freescale processor (MPC85XX) on a 2.6.34 kernel (powerpc arch). But I couldn’t create sysfs entries under /sys/devices/system/edac/ for mc and pci. Though the ‘mc’ device is being registered under edac, the csrow elements mc* are not being created.
Keeping the vMC integration for EDAC on, I have got the following warning while building the kernel.
WARNING: “vMC_alloc_sel_record” [drivers/char/ipmi/vmc.ko] has no CRC!
Does this warning have any impact on the behavior of the code? The following is the dmesg log for EDAC:
EDAC MC: Ver: 2.1.0 Aug 28 2013
EDAC DEBUG: in /home/usr/src/linux/drivers/edac/edac_mc_sysfs.c, line at 1069: edac_sysfs_setup_mc_kset()
EDAC DEBUG: in /home/usr/src/linux/drivers/edac/edac_mc_sysfs.c, line at 1086: edac_sysfs_setup_mc_kset() Registered ‘…/edac/mc’ kobject
Freescale(R) MPC85xx EDAC driver, (C) 2006 Montavista Software
The following config options are enabled for EDAC
CONFIG_EDAC=y
CONFIG_EDAC_VMC=y
CONFIG_EDAC_DEBUG=y
CONFIG_EDAC_MM_EDAC=m
CONFIG_EDAC_MPC85XX=m
CONFIG_EDAC_DUMP_IRQ_REGS=y
On inserting the module edac_core.ko, an error comes up:
> modprobe edac_core
FATAL: Error inserting edac_core (/lib/modules/2.6.34.12-kairos-edac-vmc-changes-t1/kernel/drivers/edac/edac_core.ko): Unknown symbol in module, or unknown parameter (see dmesg)
> dmesg | tail -f &
edac_core: no symbol version for vMC_alloc_sel_record
edac_core: Unknown symbol vMC_alloc_sel_record
vmc: no symbol version for vMC_alloc_sel_record
vmc: Unknown symbol vMC_alloc_sel_record
Could you please help me on how to register devices under edac?
2015-01-07 at 06:59
Hi Joshua,
I installed Oracle redhat 5.10 on an HP DL580 G7. After installing, I got the same errors and blacklisted the modules on my system.
I have a question about EDAC: if I blacklist EDAC, what happens on my system when I get a real problem?
thanks a lot,
2015-01-18 at 14:09
If the EDAC support in your kernel is broken, either way, you won’t be able to detect an EDAC fault.