github.com/mono/mono.git
author: Martin Baulig <martin@novell.com>, 2006-05-17 14:51:31 +0400
committer: Martin Baulig <martin@novell.com>, 2006-05-17 14:51:31 +0400
commit: 1663a14d56b8b4ad9b43b6a00d3efcb0adf4b149
tree: 9310a891d3dd6fec6147f50c7acca7119d801d4d /libgc/doc
parent: 20f9256680027576ec788c8906c746c14a253a83
2006-05-17  Martin Baulig  <martin@ximian.com>

    Fix a weird race condition which prevented XSP from working inside
    the debugger - see doc/debugger-issues.txt for details.

    * include/gc.h: Moved the "libgc-mono-debugger.h" #include down
    after the gc_pthread_redirects.h one.

    * include/libgc-mono-debugger.h
    (GCThreadFunctions): Added `thread_created' and `thread_exited'.
    (GC_mono_debugger_add_all_threads): New function prototype.

    * pthread_stop_world.c
    (gc_thread_vtable): Allow the vtable and any function in it to be
    NULL; use NULL as the default vtable.
    (GC_mono_debugger_add_all_threads): New public function.

    * pthread_support.c
    (GC_new_thread): Use calloc() instead of GC_INTERNAL_MALLOC() to
    allocate the `GC_thread' structure.
    (GC_delete_thread): Call `gc_thread_vtable->thread_exited()'.
    (GC_thr_init): Call `gc_thread_vtable->thread_created()'.
    (GC_start_routine_head): Likewise; use calloc() instead of
    GC_INTERNAL_MALLOC() to allocate the `start_info'.

    svn path=/trunk/mono/; revision=60766

Diffstat (limited to 'libgc/doc')
-rw-r--r-- libgc/doc/debugger-issues.txt | 85
1 files changed, 85 insertions, 0 deletions
diff --git a/libgc/doc/debugger-issues.txt b/libgc/doc/debugger-issues.txt
new file mode 100644
index 00000000000..a739393ee35
--- /dev/null
+++ b/libgc/doc/debugger-issues.txt
@@ -0,0 +1,85 @@
+I spent the last couple of days debugging a very weird race condition.
+
+The problem only occurred when running XSP (SVN revision 60518) inside
+the debugger and only when using special parameters.
+
+I'm using Mono from SVN revision 60564 (that's from last Thursday),
+XSP from SVN revision 60518 and manually installed xsp.exe.mdb and
+Mono.WebServer.dll.mdb into $prefix/lib/xsp/1.0/.
+
+With this setup, I'm running XSP with
+
+ mdb -args /work/asgard/INSTALL/lib/xsp/1.0/xsp.exe --root /work/asgard/INSTALL/lib/xsp/test/
+
+Note that adding options like --nonstop or changing the --root may
+make the problem go away or make it crash somewhere else.
+
+Then I insert a breakpoint on line 476 (that's the line before the
+Console.ReadLine()) and continue.
+
+Using `set env GC_DONT_GC 1' inside mdb makes the problem go away and
+running a stand-alone mono with -O=shared (and all the other
+optimization flags the debugger is using) works fine.
+
+So my first guess was that this is a GC issue.
+
+After implementing hardware breakpoints in the debugger, I was finally
+able to track this down. If I understand things correctly, the
+problem goes like this:
+
+Some code inside XSP calls mono_thread_pool_add() - inside that
+method, we GC-allocate an `ASyncCall *ac' structure, store the `msg'
+and `state' objects in it and create a `MonoAsyncResult *ares'.
+
+Then we call mono_thread_create() passing it async_invoke_thread() and
+the `ares'.
+
+mono_thread_create() stores them as `func' and `start_arg' in the
+g_new()-allocated `start_info' and calls CreateThread() which calls
+pthread_create().
+
+pthread_create() is in fact a wrapper in libgc - it calls the "real"
+pthread_create() and then blocks on a semaphore until the thread is
+actually started.
+
+Now - somehow - and I still don't fully understand why - the parent
+"loses" all references to the `ac' and `ares' after calling the real
+pthread_create().
+
+If I understand this correctly, mono_thread_pool_add() only stores
+them in registers and not on the stack, so the `start_info' contains
+the only references to them. The `start_info', however, is just
+passed to the clone() system call and not accessed anymore after that.
+
+This means that all references to the `ac' and the `ares' may
+disappear from the parent's stack between the clone() and sem_wait()
+system calls. Under normal circumstances, this is no problem since
+the child's stack is created with a reference to the `start_info'.
+
+I said under normal circumstances, because this is where race
+condition #1 comes into the picture:
+
+The GC's pthread_create() passes a wrapper called GC_start_func()
+around the original `start_func' to the real pthread_create(). This
+wrapper calls GC_new_thread() and stores some information about the
+newly created thread in its internal structures - this information is
+also used to determine the child's stack.
+
+After that, it posts the semaphore on which the parent thread is
+blocked, releases the allocation lock and everything is fine.
+
+However - GC_new_thread() uses GC_INTERNAL_MALLOC() to allocate the
+`GC_thread' structure - and GC_INTERNAL_MALLOC() may in fact trigger a
+collection!
+
+Doing a collection at this time means the GC doesn't know about the
+child's stack yet - and if the parent doesn't keep a reference to the
+`ares' anymore, the `ares' gets collected.
+
+Fixing this was really easy: all I had to do was make GC_new_thread()
+use calloc() instead of GC_INTERNAL_MALLOC().
+
+The second issue is a debugger-only problem: we need to tell the
+debugger about newly created threads while still holding the
+allocation lock to ensure that no collection may happen in the
+meantime.
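
The handshake and the two fixes described in the note above can be pictured with a small sketch. This is not libgc's code: `wrapped_pthread_create`, `start_wrapper`, `thread_rec`, `thread_callbacks` (with a simplified, assumed callback signature) and `demo_run` are hypothetical stand-ins, and a plain mutex plays the role of the allocation lock. It only illustrates the ordering: the child's record is allocated with calloc(), the record is published and the debugger callback is invoked while the lock is held, and only then is the parent released.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdlib.h>

/* All names below are hypothetical stand-ins, not libgc's real code. */

/* Optional debugger callbacks; the table and each entry may be NULL. */
struct thread_callbacks {
    void (*thread_created)(pthread_t tid);   /* assumed signature */
    void (*thread_exited)(pthread_t tid);    /* unused in this sketch */
};
static struct thread_callbacks *callbacks = NULL;

/* Stand-ins for the collector's allocation lock and thread table. */
static pthread_mutex_t alloc_lock = PTHREAD_MUTEX_INITIALIZER;
struct thread_rec { pthread_t tid; struct thread_rec *next; };
static struct thread_rec *threads = NULL;

struct start_info {
    void *(*func)(void *);
    void *arg;
    sem_t registered;          /* posted once the child is in the table */
};

static void *start_wrapper(void *p)
{
    struct start_info *si = p;
    void *(*func)(void *) = si->func;   /* copy before the parent frees si */
    void *arg = si->arg;

    /* Plain calloc() does not go through the collector, so allocating
       the record cannot start a collection before we are registered. */
    struct thread_rec *rec = calloc(1, sizeof *rec);
    assert(rec != NULL);       /* sketch only: real code must handle this */

    pthread_mutex_lock(&alloc_lock);
    rec->tid = pthread_self();
    rec->next = threads;
    threads = rec;             /* records are never removed in this sketch */
    /* Notify the debugger while the lock is still held. */
    if (callbacks != NULL && callbacks->thread_created != NULL)
        callbacks->thread_created(rec->tid);
    pthread_mutex_unlock(&alloc_lock);

    sem_post(&si->registered); /* the parent may continue now */
    return func(arg);
}

int wrapped_pthread_create(pthread_t *tid, void *(*func)(void *), void *arg)
{
    struct start_info *si = calloc(1, sizeof *si);
    if (si == NULL)
        return ENOMEM;
    si->func = func;
    si->arg = arg;
    sem_init(&si->registered, 0, 0);

    int rc = pthread_create(tid, NULL, start_wrapper, si);
    if (rc == 0) {
        /* Block until the child has registered itself; retry on EINTR. */
        while (sem_wait(&si->registered) != 0 && errno == EINTR)
            ;
    }
    sem_destroy(&si->registered);
    free(si);
    return rc;
}

/* Demo helpers: count notifications and run one child thread. */
static int created_count = 0;
static void count_created(pthread_t tid) { (void)tid; created_count++; }
static void *child_body(void *arg) { return arg; }

/* Returns the number of notifications seen when the wrapper returns. */
int demo_run(void)
{
    static struct thread_callbacks cb = { count_created, NULL };
    pthread_t tid;
    int value = 0;
    void *result = NULL;

    callbacks = &cb;
    if (wrapped_pthread_create(&tid, child_body, &value) != 0)
        return -1;
    int seen = created_count;  /* child was already announced */
    pthread_join(tid, &result);
    return (result == &value) ? seen : -1;
}
```

Calling `demo_run()` from a `main()` (built with `-pthread`) shows that by the time the wrapper returns, the callback has already run for the new thread. Leaving `callbacks` as NULL skips the notification entirely, which mirrors the "NULL as the default vtable" behaviour mentioned in the ChangeLog entry.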