Stress-testing Xen on Rawhide


Capturing coredumps

The first thing to do is to set up the machine so that any time a process segfaults, a coredump is produced and saved. Following tips on this page I added the following lines to /etc/rc.local:

# Create coredumps in /var/log/core
mkdir -p /var/log/core
chmod 0777 /var/log/core
echo "/var/log/core/core.%e.%t.%p" > /proc/sys/kernel/core_pattern
echo 0 > /proc/sys/kernel/core_uses_pid

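These lines cause coredump files to be collected in a single directory (/var/log/core) with a standardised naming scheme, e.g. core.xen-vncfb.1179397829.2635. In the pattern, %e expands to the executable name, %t to the time of the dump in seconds since the epoch, and %p to the process ID.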
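To illustrate the naming scheme, a dump's file name can be split back into its parts. This is only a sketch: parse_core_name is a name made up for this example, and it assumes the core.%e.%t.%p pattern set above. The fields are peeled off from the right because an executable name may itself contain dots.

```shell
# parse_core_name FILE: print the executable name, timestamp and PID of a
# coredump named core.EXE.TIME.PID (hypothetical helper, not part of Xen).
parse_core_name() {
    name=${1##*/}          # strip any leading directory
    name=${name#core.}     # strip the fixed "core." prefix
    pid=${name##*.}        # last field: process ID (%p)
    rest=${name%.*}
    stamp=${rest##*.}      # second-to-last field: seconds since epoch (%t)
    exe=${rest%.*}         # everything before that: executable name (%e)
    echo "exe=$exe time=$stamp pid=$pid"
}

parse_core_name /var/log/core/core.xen-vncfb.1179397829.2635
```

With GNU date, the timestamp field can then be turned into a readable time using date -d @1179397829.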

However, they do not yet ensure that coredumps will actually be produced. For that we need to edit two more files. In /etc/security/limits.conf, add these lines:

*               hard    core            unlimited
*               soft    core            unlimited

and in /etc/profile you may need to comment out the following line, so that it reads:

#ulimit -S -c 0 > /dev/null 2>&1

(I have no idea why that line is there.) If you log out and log in again, you should see:

$ ulimit -Hc
unlimited
$ ulimit -Sc
unlimited

and the same should happen if you 'su' to root or log in as root.

Unfortunately, this does not enable coredumps for services started by init scripts, because they are not run from a login session and so never pick up these limits. After booting the machine you should therefore restart any services for which you want coredumps. In particular:

/etc/init.d/xend restart

Starting and stopping domains from scripts

One way to stress a Xen machine is to start and stop domains. To avoid having to do this manually, we'd like to do it from a script.

First of all you need to create a number of domains. For this testing they should be Unix-like and they should have a console and be set up to listen for logins on that console (because we rely on them creating a login prompt on the console in order to detect that they have fully booted). I simply created a Fedora Core 6 paravirt guest and cloned its disks to create 4 guests in all. My guests were called (imaginatively) fc6, fc6-2, fc6-3 and fc6-4.

It should be possible to boot the guest and wait for it to come up fully by doing:

# /usr/sbin/xm start fc6
# ./ fc6

If everything is working well, the second command should not return until the guest has finished booting.

Similarly to shut it down and wait until it has shut down fully:

# /usr/sbin/xm shutdown fc6
# ./ fc6

You are now ready to begin stress testing. Start four terminals and in each one do:

# ./ guestname

where guestname is fc6, fc6-2, etc. or whatever you called your guests.
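Each terminal presumably runs the same cycle: start the guest, wait for it to boot, shut it down, wait for it to stop, and repeat. A rough sketch of such a loop follows. It is not the script used above: stress_guest, XM, WAIT_BOOT and WAIT_SHUTDOWN are names invented for this example, and the two WAIT_* variables must be set to whatever commands you use to wait for boot and for shutdown.

```shell
# Sketch of a start/stop loop for one guest. XM, WAIT_BOOT and
# WAIT_SHUTDOWN are assumptions of this example: set the two WAIT_*
# variables to the commands you use to wait for boot and for shutdown.
XM=${XM:-/usr/sbin/xm}

# stress_guest GUEST [COUNT]: cycle GUEST COUNT times (0 or unset = forever).
stress_guest() {
    guest=$1
    count=${2:-0}
    : "${WAIT_BOOT:?set WAIT_BOOT first}" "${WAIT_SHUTDOWN:?set WAIT_SHUTDOWN first}"
    i=0
    while [ "$count" -eq 0 ] || [ "$i" -lt "$count" ]; do
        $XM start "$guest"      || return 1   # boot the guest
        $WAIT_BOOT "$guest"     || return 1   # block until it is up
        $XM shutdown "$guest"   || return 1   # ask it to shut down
        $WAIT_SHUTDOWN "$guest" || return 1   # block until it is gone
        i=$((i + 1))
    done
}
```

The loop stops at the first command that fails, which leaves the guest in whatever state it was in at the time of the failure and makes it easier to match against any coredump.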

Looking for segfaults

On x86-64 the kernel logs a message whenever a process segfaults, so the easiest way to watch for segfaults is:

$ watch -n 10 'dmesg | grep seg'

You can also keep an eye on the /var/log/core directory.
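To pick out dumps that appeared during a particular test run, one option is to compare the directory against a timestamp file. A minimal sketch, in which the function name new_cores and the stamp-file convention are assumptions of this example:

```shell
# new_cores DIR STAMP: list core.* files in DIR modified more recently
# than the file STAMP (hypothetical helper; touch STAMP before a test run).
new_cores() {
    find "$1" -type f -name 'core.*' -newer "$2"
}
```

For example, touch /tmp/run-start before starting the guests, and later run new_cores /var/log/core /tmp/run-start to see what has been dumped since.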

If you find a coredump, please report the problem on Bugzilla.

Testing under load

For real load testing, I also keep a copy of the latest Linux kernel around, and I use a simple script to continuously rebuild it:

$ cd /usr/src/linux-
$ while true; do make -j 4; make clean; done

Replace the 4 in -j 4 with the number of CPU cores that you have in the system (excluding "hyperthreads", which you should probably turn off anyway).
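The job count can also be read from the system rather than typed in. A small sketch: getconf _NPROCESSORS_ONLN reports online logical processors, so the figure will include hyperthreads unless they are disabled. JOBS and rebuild_forever are names invented for this example.

```shell
# Number of online logical processors (counts hyperthreads if enabled).
JOBS=$(getconf _NPROCESSORS_ONLN)

# rebuild_forever: build and clean in a loop; run it from the kernel
# source directory and stop it with Ctrl-C.
rebuild_forever() {
    while true; do
        make -j "$JOBS"
        make clean
    done
}
```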

rjones AT redhat DOT com

$Id: index.html,v 1.1 2007/05/17 10:54:29 rjones Exp $