Why a memory test script?

As it turns out, lots of the dedicated memory test programs that you can run on an Intel computer are not all that good. The problem is that they try to test the memory in your computer by beating on it with the CPU alone. In real life, the CPU isn't the only thing that beats on your memory: DMA-based IDE drives, DMA-based SCSI transfers, almost all modern PCI controller cards and so on use direct memory access to move data in and out of memory. These transfers take place entirely outside the CPU and in parallel with CPU operations. So under a real system load, part of your memory bandwidth is consumed by the CPU and part of it by DMA operations. The CPU on its own typically can't place the same load on your memory that the combination of CPU and DMA does, no matter what program you run. To get around that, the script below intentionally does lots of CPU-based memory access while also starting up lots of DMA-based disk access (assuming your Linux filesystem is on a DMA-capable device). This works your memory harder than any CPU-based tester by itself.

What do you need?

You will need the shell script below, a Linux kernel source tarball (a .tar.gz file that unpacks into a top-level directory named linux, since the script renames that directory), and a filesystem with at least 1.5 times your RAM size in free space. NOTE: the script below needs bash2 for the wait command! Don't bother trying to run it on bash1 or non-bash shells.
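The script uses wait to make each phase finish before the next one starts: with PARALLEL set, every unpack or diff is launched as a background subshell with ( ... ) &, and wait then blocks until all of them have exited. A minimal sketch of that pattern, with a hypothetical fake_job function and a one-second sleep standing in for the real tar and diff commands:

```bash
# Hypothetical stand-in for one unpack or diff job: sleep, then report.
fake_job() {
  sleep "$2"
  echo "job $1 done"
}

# Start N jobs as background subshells, then block until all have exited.
# This mirrors the (...) & / wait structure memtest.sh uses with PARALLEL set.
run_all() {
  local n=$1 j=0
  while [ "$j" -lt "$n" ]; do
    (fake_job "$j" 1) &
    j=$((j + 1))
  done
  wait    # with no arguments, waits for every background child of this shell
  echo "all $n jobs finished"
}

run_all 3
```

Without the wait, the final echo would run while the jobs were still sleeping; in memtest.sh, the diffs would start before all the copies had finished unpacking.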

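To run the script, set environment variables for any of its settings that aren't the way you want them (I didn't feel like writing an option parser; it was much faster and easier to just do the options this way) and let the script do its work. The variables it looks at are TEST_DIR, SOURCE_FILE, NR_PASSES, MEG_PER_COPY, NR_SIMULTANEOUS, PARALLEL and JUST_INFO. Give SOURCE_FILE as an absolute path, because the script changes into TEST_DIR before it unpacks the archive.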
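Both requirements above are easy to check by hand before a long run. The following is a hypothetical helper, not part of memtest.sh, sketched under a few assumptions: it reuses the script's TEST_DIR and SOURCE_FILE variables with the same defaults, reads total RAM from /proc/meminfo (so it is Linux-specific), takes free space from df -Pk, and checks that the archive's first entry sits under a top-level linux directory, which the script relies on when it renames that directory.

```bash
# Hypothetical pre-flight helpers (not part of memtest.sh). They use the
# same variable names and defaults that memtest.sh uses.
TEST_DIR=${TEST_DIR:-/tmp}
SOURCE_FILE=${SOURCE_FILE:-$TEST_DIR/linux.tar.gz}

# 1.5 times the given size in KiB, using integer arithmetic.
need_kb_for() {
  echo $(( $1 * 3 / 2 ))
}

# First path component of the first member of a .tar.gz archive.
# memtest.sh expects this to be "linux".
top_dir() {
  tar -tzf "$1" | awk -F/ 'NR == 1 { print $1 }'
}

# Succeeds if TEST_DIR has at least 1.5 times physical RAM free.
# Linux-only: reads MemTotal (in KiB) from /proc/meminfo.
check_space() {
  local ram_kb free_kb need_kb
  ram_kb=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)
  free_kb=$(df -Pk "$TEST_DIR" | awk 'NR == 2 { print $4 }')
  if [ -z "$ram_kb" ] || [ -z "$free_kb" ]; then
    echo "could not read RAM size or free space" >&2
    return 1
  fi
  need_kb=$(need_kb_for "$ram_kb")
  echo "need $need_kb KiB free in $TEST_DIR, have $free_kb KiB"
  [ "$free_kb" -ge "$need_kb" ]
}

check_space || echo "warning: not enough free space in $TEST_DIR"
if [ -f "$SOURCE_FILE" ] && [ "$(top_dir "$SOURCE_FILE")" != "linux" ]; then
  echo "warning: $SOURCE_FILE does not unpack into linux/"
fi

# Example invocations of the real script (settings go in the environment):
#   JUST_INFO=1 bash memtest.sh     # print the computed settings and exit
#   TEST_DIR=/scratch NR_PASSES=5 PARALLEL=yes bash memtest.sh
```

The /scratch directory in the last comment is only an illustration. Note that the script checks only whether PARALLEL is non-empty, so setting PARALLEL=no turns parallel mode on as well; leave it unset for serial mode.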

Here's the shell script:
#!/bin/bash
#
# memtest.sh
#
# Shell script to help isolate memory failures under linux
#
# Author: Doug Ledford  + contributors
#
# (C) Copyright 2000-2002 Doug Ledford; Red Hat, Inc.
# This shell script is released under the terms of the GNU General
# Public License Version 2, June 1991.  If you do not have a copy
# of the GNU General Public License Version 2, then one may be
# retrieved from http://people.redhat.com/dledford/GPL.html
#
# Note, this needs bash2 for the wait command support.

# This is where we will run the tests
if [ -z "$TEST_DIR" ]; then
  TEST_DIR=/tmp
fi

# The location of the linux kernel source file we will be using
if [ -z "$SOURCE_FILE" ]; then
  SOURCE_FILE=$TEST_DIR/linux.tar.gz
fi

if [ ! -f "$SOURCE_FILE" ]; then
  echo "Missing source file $SOURCE_FILE"
  exit 1
fi

# How many passes of this test to run; higher numbers are better
if [ -z "$NR_PASSES" ]; then
  NR_PASSES=20
fi

# Guess how many megs one unpacked copy takes (about 4x the compressed size)
if [ -z "$MEG_PER_COPY" ]; then
  MEG_PER_COPY=$(ls -l "$SOURCE_FILE" | awk '{print int($5/1024/1024) * 4}')
fi

# How many trees do we have to unpack in order to make our trees be larger
# than physical RAM?  If we don't unpack more data than memory can hold
# before we start to run the diff program on the trees then we won't
# actually flush the data to disk and force the system to reread the data
# from disk.  Instead, the system will do everything in RAM.  That doesn't
# work (as far as the memory test is concerned).  It's the simultaneous
# unpacking of data in memory and the read/writes to hard disk via DMA that
# breaks the memory subsystem in most cases.  Doing everything in RAM without
# causing disk I/O will pass bad memory far more often than when you add
# in the disk I/O.
if [ -z "$NR_SIMULTANEOUS" ]; then
  NR_SIMULTANEOUS=$(free | awk -v meg_per_copy=$MEG_PER_COPY 'NR == 2 {print int($2*1.5/1024/meg_per_copy + (($2/1024)%meg_per_copy >= (meg_per_copy/2)) + (($2/1024/32) < 1))}')
fi

# Should we unpack/diff the $NR_SIMULTANEOUS trees in series or in parallel?
if [ ! -z "$PARALLEL" ]; then
  PARALLEL="yes"
else
  PARALLEL="no"
fi

if [ ! -z "$JUST_INFO" ]; then
  echo "TEST_DIR:		$TEST_DIR"
  echo "SOURCE_FILE:		$SOURCE_FILE"
  echo "NR_PASSES:		$NR_PASSES"
  echo "MEG_PER_COPY:		$MEG_PER_COPY"
  echo "NR_SIMULTANEOUS:	$NR_SIMULTANEOUS"
  echo "PARALLEL:		$PARALLEL"
  echo
  exit
fi

# Bail out rather than run the rm -fr below in the wrong directory
cd "$TEST_DIR" || exit 1

# Remove any possible left over directories from a cancelled previous run
rm -fr linux linux.orig linux.pass.*

# Unpack the one copy of the source tree that we will be comparing against.
# (This assumes the archive unpacks into a top-level directory named linux.)
tar -xzf "$SOURCE_FILE"
mv linux linux.orig

i=0
while [ "$i" -lt "$NR_PASSES" ]; do
  # Unpack NR_SIMULTANEOUS fresh copies of the tree (in the background if PARALLEL)
  j=0
  while [ "$j" -lt "$NR_SIMULTANEOUS" ]; do
    if [ "$PARALLEL" = "yes" ]; then
      (mkdir $j; tar -xzf "$SOURCE_FILE" -C $j; mv $j/linux linux.pass.$j; rmdir $j) &
    else
      tar -xzf "$SOURCE_FILE"
      mv linux linux.pass.$j
    fi
    j=$((j + 1))
  done
  wait
  # Compare every copy against linux.orig; any output here indicates a failure
  j=0
  while [ "$j" -lt "$NR_SIMULTANEOUS" ]; do
    if [ "$PARALLEL" = "yes" ]; then
      (diff -U 3 -rN linux.orig linux.pass.$j; rm -fr linux.pass.$j) &
    else
      diff -U 3 -rN linux.orig linux.pass.$j
      rm -fr linux.pass.$j
    fi
    j=$((j + 1))
  done
  wait
  i=$((i + 1))
done

# Clean up after ourselves
rm -fr linux linux.orig linux.pass.*

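The MEG_PER_COPY and NR_SIMULTANEOUS defaults pack their arithmetic into one-line ls and awk commands. Pulled apart into two hypothetical helper functions (the names and the explicit arguments are for illustration only; the script itself takes the archive size in bytes from ls -l and total memory in KiB from the second line of free), the calculation looks like this:

```bash
# Unpacked-size guess used by memtest.sh: 4 x the archive size in whole MiB.
meg_per_copy_for() {
  echo $(( $1 / 1024 / 1024 * 4 ))
}

# nr_simultaneous TOTAL_KB MEG_PER_COPY
# Same arithmetic as the NR_SIMULTANEOUS default in memtest.sh.
nr_simultaneous() {
  awk -v total_kb="$1" -v meg_per_copy="$2" 'BEGIN {
    ram_mb    = total_kb / 1024
    # enough copies to cover 1.5 x RAM ...
    copies    = ram_mb * 1.5 / meg_per_copy
    # ... plus one if RAM modulo the copy size is at least half a copy ...
    round_up  = (ram_mb % meg_per_copy) >= (meg_per_copy / 2)
    # ... plus one more on machines with less than 32 MB of RAM
    small_ram = (ram_mb / 32) < 1
    print int(copies + round_up + small_ram)
  }'
}

meg_per_copy_for 31457280     # 30 MiB archive: prints 120
nr_simultaneous 1048576 120   # 1 GiB of RAM: prints 13
```

For a 30 MiB archive on a machine with 1 GiB of RAM, that works out to 13 copies of about 120 MB each, roughly 1.5 times RAM. Because the script unpacks every copy before it starts diffing and also keeps linux.orig around, peak disk usage is about NR_SIMULTANEOUS + 1 unpacked trees.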

How do you know if your memory passed?

Very simple. If you run that script from the command line and it completes without ever printing a single message to your screen, then you passed. If you get messages from diff about differences between files, or any other anomalies of that kind (such as errors from tar or gzip while unpacking), then you failed.
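Because the pass/fail signal is simply whether anything was printed, it can be convenient to capture all output, including stderr where tar and gzip report their errors, to a file and check whether that file is empty. The script's own exit status is not a reliable signal, since its last command is the cleanup rm. A minimal sketch with a hypothetical judge_log helper:

```bash
# Hypothetical helper: judge a captured memtest.sh run by whether it
# printed anything at all. A silent run passes; any output means failure.
judge_log() {
  if [ ! -f "$1" ]; then
    echo "no log file $1" >&2
    return 2
  fi
  if [ -s "$1" ]; then    # -s: file exists and is not empty
    echo "FAILED: memtest.sh produced output, see $1"
    return 1
  fi
  echo "PASSED"
}

# Typical use, assuming memtest.sh is in the current directory:
#   bash memtest.sh > memtest.log 2>&1
#   judge_log memtest.log
```

The 2>&1 matters: redirecting stdout alone would miss anything tar or gzip writes to stderr.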