There have been sporadic sightings of ext3 causing little blips of 100,000
context switches per second when under load.

At the start of do_get_write_access() we have this logic:

	repeat:
		lock_buffer(jh->bh);
		...
		unlock_buffer(jh->bh);
		...
		if (jh->j_list == BJ_Shadow) {
			sleep_on_buffer(jh->bh);
			goto repeat;
		}

The problem is that the unlock_buffer() will wake up anyone who is sleeping
in the sleep_on_buffer().

So if task A is asleep in sleep_on_buffer() and task B now runs
do_get_write_access(), task B will wake task A by accident.  Task B will then
sleep on the buffer and task A will loop, will run unlock_buffer() and then
wake task B.

This state will continue until I/O completes against the buffer and kjournal
changes jh->j_list.

Unless task A and task B happen to both have realtime scheduling policy - if
they do then kjournald will never run.  The state is never cleared and your
box locks up.


The fix is to not do the `goto repeat;' until the buffer has been taken of
the shadow list.  So we don't go and wake up the other waiter(s) until they
can actually proceed to use the buffer.

This bug was introduced introduced into 2.4.20-pre5.


 jbd/transaction.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletion(-)

--- linux-2.4.21-pre3-rmap-ext3merge/fs/jbd/transaction.c.=K0003=.orig	2003-01-23 16:29:21.000000000 +0000
+++ linux-2.4.21-pre3-rmap-ext3merge/fs/jbd/transaction.c	2003-01-23 16:29:39.000000000 +0000
@@ -669,7 +669,8 @@
 			spin_unlock(&journal_datalist_lock);
 			unlock_journal(journal);
 			/* commit wakes up all shadow buffers after IO */
-			sleep_on(&jh2bh(jh)->b_wait);
+			wait_event(jh2bh(jh)->b_wait,
+						jh->b_jlist != BJ_Shadow);
 			lock_journal(journal);
 			goto repeat;
 		}