Masking bad RAM with Grub2
I recently ran into a situation where, while installing some packages in Debian, the display started showing graphical errors and the root file system was reported as read-only (it was configured through its mount options to switch to read-only on errors). The first complete memtest86+ pass showed no errors at all, but later runs indicated errors at at least three addresses:
- 00010c1d370
- 00010c1dab0
- 00004c1da90
Interestingly, grub2 supports masking sections of RAM out of the box, a feature I recently spotted by chance in /etc/default/grub. The example and documentation of the GRUB_BADRAM parameter there looked like it was just a list of addresses to ignore, so I started with "0x10c1d370,0x10c1dab0,0x04c1da90" for it... only to find a frozen Grub after reboot. After a bit of investigation I learned that every second entry is a mask on its predecessor, and found a good howto on how to construct these.
The bad RAM mask gave me a few hours without noticeable errors... and then they came back, from another unmasked section I suppose. That made me order new RAM of a different brand.
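To make the "every second entry is a mask" rule concrete, here is a minimal Python sketch of how address/mask pairs could be built for the three addresses above. It assumes the usual BadRAM reading of a pair (a location is filtered when all of its bits that are enabled in the mask match the base address), a 4 KiB page size, and 64-bit wide values. The helper names page_pair, is_filtered and badram_value are made up for this illustration and are not part of Grub.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages
ADDR_BITS = 64    # assumption: addresses and masks are handled as 64-bit values


def page_pair(addr):
    """Return (base, mask) covering exactly the one page that contains addr."""
    full = (1 << ADDR_BITS) - 1
    mask = full & ~(PAGE_SIZE - 1)  # every bit above the page offset must match
    return addr & mask, mask


def is_filtered(addr, base, mask):
    """True if addr matches base on every bit that is enabled in mask."""
    return (addr & mask) == (base & mask)


def badram_value(addrs):
    """Build a comma-separated address,mask list, one pair per distinct page."""
    pairs = []
    for addr in addrs:
        pair = page_pair(addr)
        if pair not in pairs:  # two bad addresses may share a page
            pairs.append(pair)
    return ",".join(f"0x{base:08x},0x{mask:x}" for base, mask in pairs)


# The three addresses reported by memtest86+ in this post.
bad_addresses = [0x10c1d370, 0x10c1dab0, 0x04c1da90]
print(badram_value(bad_addresses))
```

Under these assumptions the first two addresses fall into the same page, so only two pairs come out. The sketch also hints at why a plain address list is risky: read as a pair, 0x10c1d370,0x10c1dab0 uses the second address as a mask with only a few bits enabled, so is_filtered(0x1fc1d370, 0x10c1d370, 0x10c1dab0) is true as well, and many unrelated locations match. Whether that is what froze Grub here is a guess on my part.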