From agk@redhat.com Tue May 6 17:25:07 2008
Date: Tue, 6 May 2008 22:24:36 +0100
From: Alasdair G Kergon
To: Mikulas Patocka
Cc: Milan Broz
Subject: Re: snapshot merging

Second patch: dm-snapshot-fix-race-during-exception-creation.patch

Again, factored out the hash table check into its own function.

Moved the goto so the snapshot lock is not held during the yield, in case
something causes the snapshot to be invalidated, which requires that same
lock. I'm not interested in whether that race exists in the current code or
not: if it doesn't, future changes might introduce it, so we might as well
play safe here.

Alasdair.

From: Mikulas Patocka

Fix a race condition that returns incorrect data when a write causes an
exception to be allocated whilst a read is still in flight.

The race condition happens as follows:

 * A read to a non-reallocated sector in the snapshot is submitted, so the
   read is routed to the original device.
 * A write to the original device is submitted. The write causes an
   exception that reallocates the block. The write proceeds.
 * The original read is dequeued and reads the wrong data.

This race can be triggered with the CFQ scheduler and one thread writing
while multiple threads read simultaneously.

N.B. This patch relies upon dm-kcopyd-per-device.patch to avoid a deadlock.

Signed-off-by: Mikulas Patocka
Signed-off-by: Alasdair G Kergon
Cc: stable@kernel.org
---
 drivers/md/dm-snap.c | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

Index: linux-2.6.25.3/drivers/md/dm-snap.c
===================================================================
--- linux-2.6.25.3.orig/drivers/md/dm-snap.c	2008-05-14 19:21:06.000000000 +0200
+++ linux-2.6.25.3/drivers/md/dm-snap.c	2008-05-14 19:22:25.000000000 +0200
@@ -133,6 +133,27 @@ static void stop_tracking_chunk(struct d
 	mempool_free(c, s->tracked_chunk_pool);
 }
 
+static int __chunk_is_tracked(struct dm_snapshot *s, chunk_t chunk)
+{
+	struct dm_snap_tracked_chunk *c;
+	struct hlist_node *hn;
+	int found = 0;
+
+	spin_lock_irq(&s->tracked_chunk_lock);
+
+	hlist_for_each_entry(c, hn,
+	    &s->tracked_chunk_hash[DM_TRACKED_CHUNK_HASH(chunk)], node) {
+		if (c->chunk == chunk) {
+			found = 1;
+			break;
+		}
+	}
+
+	spin_unlock_irq(&s->tracked_chunk_lock);
+
+	return found;
+}
+
 /*
  * One of these per registered origin, held in the snapshot_origins hash
  */
@@ -826,6 +847,7 @@ static void pending_complete(struct dm_s
 	*e = pe->e;
 	down_write(&s->lock);
+check_again:
 	if (!s->valid) {
 		free_exception(e);
 		error = 1;
 	}
@@ -833,6 +855,15 @@ static void pending_complete(struct dm_s
 	}
 
 	/*
+	 * Check for conflicting reads. This race condition is very improbable,
+	 * so yield() is sufficient and there is no need for a wait queue.
+	 */
+	if (__chunk_is_tracked(s, pe->e.old_chunk)) {
+		yield();
+		goto check_again;
+	}
+
+	/*
 	 * Add a proper exception, and remove the
 	 * in-flight exception from the list.
 	 */
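For readers following along outside the kernel tree, below is a minimal
user-space sketch of the same pattern, written against POSIX threads rather
than the kernel API. It is not the kernel code: struct tracked_chunk,
track_chunk(), untrack_chunk(), chunk_is_tracked() and
wait_for_conflicting_reads() are illustrative stand-ins for dm-snap's
tracked-chunk machinery, a pthread mutex stands in for spin_lock_irq(), and
sched_yield() for yield(). The idea is the same: readers register the chunk
they are reading in a small hash table before the I/O is issued and
deregister it on completion, and the exception-completion path polls until
no conflicting read remains in flight.

#include <pthread.h>
#include <sched.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

#define TRACKED_HASH_SIZE 16
#define TRACKED_HASH(c)   ((c) & (TRACKED_HASH_SIZE - 1))

struct tracked_chunk {
	struct tracked_chunk *next;
	uint64_t chunk;
};

static struct tracked_chunk *tracked_hash[TRACKED_HASH_SIZE];
static pthread_mutex_t tracked_lock = PTHREAD_MUTEX_INITIALIZER;

/* Reader registers the chunk before issuing I/O to the origin. */
static void track_chunk(struct tracked_chunk *c, uint64_t chunk)
{
	pthread_mutex_lock(&tracked_lock);
	c->chunk = chunk;
	c->next = tracked_hash[TRACKED_HASH(chunk)];
	tracked_hash[TRACKED_HASH(chunk)] = c;
	pthread_mutex_unlock(&tracked_lock);
}

/* Reader deregisters the chunk once its I/O has completed. */
static void untrack_chunk(struct tracked_chunk *c)
{
	struct tracked_chunk **p;

	pthread_mutex_lock(&tracked_lock);
	for (p = &tracked_hash[TRACKED_HASH(c->chunk)]; *p; p = &(*p)->next)
		if (*p == c) {
			*p = c->next;
			break;
		}
	pthread_mutex_unlock(&tracked_lock);
}

/* Analogue of __chunk_is_tracked(): is any read of 'chunk' in flight? */
static int chunk_is_tracked(uint64_t chunk)
{
	struct tracked_chunk *c;
	int found = 0;

	pthread_mutex_lock(&tracked_lock);
	for (c = tracked_hash[TRACKED_HASH(chunk)]; c; c = c->next)
		if (c->chunk == chunk) {
			found = 1;
			break;
		}
	pthread_mutex_unlock(&tracked_lock);

	return found;
}

/*
 * Analogue of the pending_complete() change: before the exception is
 * made visible, spin until conflicting reads of the old chunk drain.
 */
static void wait_for_conflicting_reads(uint64_t old_chunk)
{
	while (chunk_is_tracked(old_chunk))
		sched_yield();
}

/* Demo: a reader holds chunk 5 briefly; the "writer" must wait for it. */
static void *reader(void *arg)
{
	struct tracked_chunk c;

	(void)arg;
	track_chunk(&c, 5);
	usleep(100000);		/* simulate a read in flight */
	untrack_chunk(&c);
	return NULL;
}

int main(void)
{
	pthread_t t;

	pthread_create(&t, NULL, reader, NULL);
	usleep(10000);		/* crude: give the reader time to register */
	wait_for_conflicting_reads(5);
	printf("no conflicting reads remain; exception may complete\n");
	pthread_join(t, NULL);
	return 0;
}

The same trade-off as in the patch applies: polling with a yield is only
acceptable because a conflicting read is rare and short-lived. If the
conflict were common, a wait queue (a condition variable in user space)
would be the right tool instead.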