RPM DB Recovery on RHEL 3 (RPM v 4.2.1)

This document provides an overview of how to deal with RPM database corruption on systems using the RHEL-3 version of RPM. With the switch from LinuxThreads to NPTL there has been an abnormal spike in database corruption in comparison to RHEL-2.1, where it was typically confined to cases of the /var partition becoming full.

Stale lock file cleanup

If an RPM command hangs, segfaults, or otherwise behaves abnormally during use, then the first task is to check for stale lock files. This can be accomplished with the -CA option to the rpmdb_stat command:

 # cd /var/lib/rpm
 # /usr/lib/rpm/rpmdb_stat -CA

Look at the output under the sections headed 'Locks grouped by lockers' and 'Locks grouped by object'. If no RPM command is executing, then there should be no entries. The RPM DB format allows many processes to be concurrently reading *AND* writing to the DB, so there is no safe way for an RPM command to identify & remove a stale lock. Stale locks are typically a result of a process being killed in an abnormal manner (power loss, kernel crash, 'kill' from an impatient admin). The locks are maintained in a handful of files named with two initial underscores.

Since there is generally no 100% reliable way to determine if an arbitrary application has a lock on the RPM DB, the best time at which to clear the stale locks is while in single user mode. Thus, during the first stage of the boot process, while the system is still single user, the RHEL /etc/rc.d/rc.sysinit script performs a cleanup of existing lock files:

 # grep rpm /etc/rc.d/rc.sysinit
 rm -f /var/lib/rpm/__db*

Thus, to clean up stale lock files, the best course of action is to simply reboot the machine. If for some reason it is not feasible to reboot the machine, then the files can be deleted manually. Before doing this, one must ensure no application has any of the RPM database files open. This can be checked with the lsof command. For example, as root, ensure the following command shows no lines of output:

 # lsof | grep /var/lib/rpm 

If this shows no output then it is safe to delete the lock files

 # rm -f /var/lib/rpm/__db*
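
The manual check and delete above can be combined so that the lock files are only removed when lsof reports nothing. The following is a minimal sketch, not something shipped with RPM: the function name clear_rpm_locks and its optional directory argument are assumptions made for illustration. It refuses to act if lsof is not installed, since in that case there is no way to confirm the database is idle.

```shell
# Sketch only: clear_rpm_locks is a hypothetical helper, not part of RPM.
# Usage: clear_rpm_locks [db-directory]   (defaults to /var/lib/rpm)
clear_rpm_locks() {
    dbdir="${1:-/var/lib/rpm}"

    # Without lsof we cannot tell whether the DB is idle, so do nothing.
    if ! command -v lsof >/dev/null 2>&1; then
        echo "lsof not found; cannot confirm the database is idle" >&2
        return 2
    fi

    # Same check as the manual procedure: any lsof line mentioning the
    # DB directory means some process still has a database file open.
    if lsof 2>/dev/null | grep -F -q -- "$dbdir"; then
        echo "$dbdir is still in use; leaving lock files alone" >&2
        return 1
    fi

    # Nothing has the DB open, so the __db* lock files are safe to remove.
    rm -f "$dbdir"/__db*
}
```

Run as root, `clear_rpm_locks` with no argument operates on /var/lib/rpm. A non-zero return means nothing was deleted.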

DB corruption recovery process

If problems are still experienced after cleaning up stale lock files, then it is likely some level of database corruption is present. The file that usually requires rebuilding is the master package metadata file /var/lib/rpm/Packages, and following that the indexes will also need to be re-generated. Before doing any potentially destructive action *ALWAYS* take a backup of the RPM DB (you are doing that regularly anyway, right? ;-)

 # cd /var/lib
 # tar zcvf /var/preserve/rpmdb-$(date +%Y%m%d).tar.gz rpm

Now verify the integrity of the Packages file

 # cd /var/lib/rpm
 # rm -f __db*      # to avoid stale locks
 # /usr/lib/rpm/rpmdb_verify Packages

Iff this reports any errors then a dump and load of the DB is required. After the load, verify the integrity of the newly loaded Packages file:

 # mv Packages Packages.orig
 # /usr/lib/rpm/rpmdb_dump Packages.orig |
      /usr/lib/rpm/rpmdb_load Packages
 # /usr/lib/rpm/rpmdb_verify Packages
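
The verify, dump, load and re-verify sequence can be wrapped up as a single function. This is a rough sketch rather than a supported tool: the name repair_packages and its two optional arguments (DB directory, tools directory) are invented here, and it assumes that rpmdb_verify exits with a non-zero status when it reports errors. It also assumes the backup described earlier has already been taken.

```shell
# Sketch only: repair_packages is a hypothetical wrapper around the manual steps.
# Usage: repair_packages [db-directory] [tools-directory]
# The body runs in a subshell, so the caller's working directory is untouched.
repair_packages() (
    dbdir="${1:-/var/lib/rpm}"
    tools="${2:-/usr/lib/rpm}"

    cd "$dbdir" || exit 1
    rm -f __db*                      # to avoid stale locks

    # Assumption: rpmdb_verify exits non-zero when it finds errors.
    if "$tools/rpmdb_verify" Packages; then
        echo "Packages verified cleanly; no dump and load needed"
        exit 0
    fi

    # Keep the damaged file as Packages.orig and reload into a fresh Packages.
    mv Packages Packages.orig || exit 1
    "$tools/rpmdb_dump" Packages.orig | "$tools/rpmdb_load" Packages || exit 1

    # The final status tells the caller whether the reloaded file verifies.
    "$tools/rpmdb_verify" Packages
)
```

The second argument is there so the same function can be pointed at tools living somewhere other than /usr/lib/rpm, for example `repair_packages /var/lib/rpm /root/usr/lib/rpm`.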

Iff the Packages file now passes the verify step, then as an additional sanity check query all headers in the DB with the command below, and watch for any messages sent to standard error (it helps to discard standard out when looking for this):

 # rpm -qa 1> /dev/null
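
When scripting this check it is easier to test for stderr output than to watch for it by eye. A small sketch follows; the helper name stderr_is_empty is made up for this example. It runs whatever command it is given, discards standard out, and succeeds only if nothing at all was written to standard error. Note that it deliberately ignores the command's own exit status, since the check described above is only about stderr.

```shell
# Sketch only: stderr_is_empty is a hypothetical helper.
# Usage: stderr_is_empty command [args...]
stderr_is_empty() {
    # Redirection order matters: stderr is first pointed at the capture,
    # then stdout is sent to /dev/null, so only stderr ends up in $errs.
    errs=$("$@" 2>&1 >/dev/null)

    if [ -z "$errs" ]; then
        return 0
    fi

    # Pass the captured messages on so they are not lost.
    printf '%s\n' "$errs" >&2
    return 1
}
```

For example, `stderr_is_empty rpm -qa && echo "header query clean"` only prints the message when the query was silent on standard error.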

Iff this query passes without generating any messages to standard error, then it is time to rebuild the indexes by invoking

 # rpm -v --rebuilddb

At this point you should have a functioning RPM database again. If any of the recovery steps failed, then a bug report should be filed at http://bugzilla.redhat.com/. When creating the report, provide the tar.gz backup of the RPM DB as an attachment, along with any daily package list log files named /var/log/rpm*.

Recovery without /usr/lib/rpm/rpmdb_* tools present

The above recovery steps assume access to various tools under /usr/lib/rpm/. However, prior to version 4.2.3 of RPM, these tools were only provided by the sub-package rpm-devel, which may well not be installed. In addition, since the database is in a damaged state, it will not be possible to actually install this additional RPM. There are two ways around this apparent chicken-and-egg scenario.

  1. Use the rpm2cpio command to extract the contents of rpm-devel to a temporary scratch directory
    # cd /root
    # rpm2cpio rpm-devel-4.2.1-4.4.i386.rpm | cpio -idmv
    
    This will place the appropriate tools into /root/usr/lib/rpm
  2. Copy the database to a second machine, repair it there & then copy it back to the original machine & restore it.
     # scp /var/preserve/rpmdb-XXX.tar.gz root@other-machine:/root
     # ssh root@other-machine
     $ cd /root
     $ tar zxvf rpmdb-XXX.tar.gz
    
    Now proceed as per the original instructions, except apply them to the directory /root/rpm, instead of /var/lib/rpm. When recovery is complete, copy the database back
     $ cd /root
     $ tar zcvf rpmdb-recovered-XXX.tar.gz rpm
     $ scp rpmdb-recovered-XXX.tar.gz root@original-machine:/root
     $ ssh root@original-machine
     # cd /var/lib
     # rm -f rpm/__*
     # tar zxvf /root/rpmdb-recovered-XXX.tar.gz
    

Further differences for RHEL 2.1

On RHEL 2.1, the commands in /usr/lib/rpm/rpmdb_XXX are not present. Instead use the /usr/bin/db_XXX commands from the db3-utils package.
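
A script that has to run on both releases can pick whichever set of tools is present. The sketch below is illustrative only: the function find_db_tool and the two override variables RPM_TOOLS_DIR and DB_TOOLS_DIR are names invented for this example, and it assumes the db_XXX commands accept the same arguments as their rpmdb_XXX counterparts.

```shell
# Sketch only: find_db_tool is a hypothetical helper.
# Usage: find_db_tool verify|dump|load|stat
# Prints the path of rpmdb_<name> if present, otherwise db_<name>.
find_db_tool() {
    name="$1"
    rpm_tools="${RPM_TOOLS_DIR:-/usr/lib/rpm}"   # RHEL 3 location
    db_tools="${DB_TOOLS_DIR:-/usr/bin}"         # RHEL 2.1 location (db3-utils)

    if [ -x "$rpm_tools/rpmdb_$name" ]; then
        printf '%s\n' "$rpm_tools/rpmdb_$name"
    elif [ -x "$db_tools/db_$name" ]; then
        printf '%s\n' "$db_tools/db_$name"
    else
        echo "no $name tool found in $rpm_tools or $db_tools" >&2
        return 1
    fi
}
```

A call such as `verify=$(find_db_tool verify) && "$verify" Packages` then works unchanged on either release.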


Last updated on Wednesday, July 29, 2005.

Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 2.5 License.