diff options
author | Patrick Steinhardt <psteinhardt@gitlab.com> | 2021-10-21 13:22:29 +0300 |
---|---|---|
committer | Patrick Steinhardt <psteinhardt@gitlab.com> | 2021-10-21 14:46:38 +0300 |
commit | 273f8bf62b866284c89cdaf0d59e552392fb2b34 (patch) | |
tree | 53de113a028c70d342c11dc3b2a91c43db6af52d /internal/streamcache | |
parent | dfbf06e817c6a14fecd0e3d3343857fed820735d (diff) |
streamcache: Fix deadlocking caused by early removal condition
In our CI pipelines, we frequently observe a deadlock in the streamcache
where tests abort after 10 minutes of inactivity. The deadlock happens
while we're waiting for the cache cleanup to remove on-disk files, where
we never get a signal that files have been removed.
The root cause of this is that removal of these files is too fast such
that the removal condition is signalled on before we start to wait on
the condition. While we do take the condition's lock before spawning the
Goroutine, we don't take the lock before calling `Broadcast()` on the
condition. As a result, we broadcast the signal without locking the
condition first and may thus do it before the listener is set up.
Fix this race by locking the condition before broadcasting. While
executing this single testcase with `-count=10000` was locking up quite
fast previous to this fix, I couldn't reproduce it at all anymore with
the fix applied.
Diffstat (limited to 'internal/streamcache')
-rw-r--r-- | internal/streamcache/cache.go | 2 |
1 files changed, 2 insertions, 0 deletions
diff --git a/internal/streamcache/cache.go b/internal/streamcache/cache.go index 4c1b7c237..22b2969db 100644 --- a/internal/streamcache/cache.go +++ b/internal/streamcache/cache.go @@ -195,6 +195,8 @@ func (c *cache) clean() { } if c.removalCond != nil { + c.removalCond.L.Lock() + defer c.removalCond.L.Unlock() c.removalCond.Broadcast() } }() |