author | Martin Baulig <martin@novell.com> | 2006-05-17 14:51:31 +0400 |
---|---|---|
committer | Martin Baulig <martin@novell.com> | 2006-05-17 14:51:31 +0400 |
commit | 1663a14d56b8b4ad9b43b6a00d3efcb0adf4b149 (patch) | |
tree | 9310a891d3dd6fec6147f50c7acca7119d801d4d /libgc/doc | |
parent | 20f9256680027576ec788c8906c746c14a253a83 (diff) |
2006-05-17 Martin Baulig <martin@ximian.com>
Fix a weird race condition which prevented XSP from working inside
the debugger - see doc/debugger-issues.txt for details.
* include/gc.h: Moved the "libgc-mono-debugger.h" #include down
after the gc_pthread_redirects.h one.
* include/libgc-mono-debugger.h
(GCThreadFunctions): Added `thread_created' and `thread_exited'.
(GC_mono_debugger_add_all_threads): New function prototype.
* pthread_stop_world.c (gc_thread_vtable): Allow the vtable and
any function in it to be NULL; use NULL as the default vtable.
(GC_mono_debugger_add_all_threads): New public function.
* pthread_support.c (GC_new_thread): Use calloc() instead of
GC_INTERNAL_MALLOC() to allocate the `GC_thread' structure.
(GC_delete_thread): Call `gc_thread_vtable->thread_exited()'.
(GC_thr_init): Call `gc_thread_vtable->thread_created()'.
(GC_start_routine_head): Likewise; use calloc() instead of
GC_INTERNAL_MALLOC() to allocate the `start_info'.
svn path=/trunk/mono/; revision=60766
Diffstat (limited to 'libgc/doc')
-rw-r--r-- | libgc/doc/debugger-issues.txt | 85 |
1 files changed, 85 insertions, 0 deletions
diff --git a/libgc/doc/debugger-issues.txt b/libgc/doc/debugger-issues.txt
new file mode 100644
index 00000000000..a739393ee35
--- /dev/null
+++ b/libgc/doc/debugger-issues.txt
@@ -0,0 +1,85 @@
+I spent the last couple of days debugging a very weird race condition.
+
+The problem only occurred when running XSP (SVN revision 60518) inside
+the debugger and only when using special parameters.
+
+I'm using Mono from SVN revision 60564 (that's from last Thursday),
+XSP from SVN revision 60518 and manually installed xsp.exe.mdb and
+Mono.WebServer.dll.mdb into $prefix/lib/xsp/1.0/.
+
+With this setup, I'm running XSP with
+
+  mdb -args /work/asgard/INSTALL/lib/xsp/1.0/xsp.exe --root /work/asgard/INSTALL/lib/xsp/test/
+
+Note that adding options like --nonstop or changing the --root may
+make the problem go away or make it crash somewhere else.
+
+Then I insert a breakpoint on line 476 (that's the line before the
+Console.ReadLine()) and continue.
+
+Using `set env GC_DONT_GC 1' inside mdb makes the problem go away and
+running a stand-alone mono with -O=shared (and all the other
+optimization flags the debugger is using) works fine.
+
+So my first guess was that this is a GC issue.
+
+After implementing hardware breakpoints in the debugger, I was finally
+able to track this down. If I understand things correctly, the
+problem goes like this:
+
+Some code inside XSP calls mono_thread_pool_add() - inside that
+method, we GC-allocate an `ASyncCall *ac' structure, store the `msg'
+and `state' objects in it and create a `MonoAsyncResult *ares'.
+
+Then we call mono_thread_create() passing it async_invoke_thread() and
+the `ares'.
+
+mono_thread_create() stores them as `func' and `start_arg' in the
+g_new()-allocated `start_info' and calls CreateThread() which calls
+pthread_create().
+
+pthread_create() is in fact a wrapper in libgc - it calls the "real"
+pthread_create() and then blocks on a semaphore until the thread is
+actually started.
+
+Now - somehow - and I still don't fully understand why - the parent
+"loses" all references to the `ac' and `ares' after calling the real
+pthread_create().
+
+If I understand this correctly, mono_thread_pool_add() only stores
+them in registers and not on the stack, so the `start_info' contains
+the only references to them. The `start_info', however, is just
+passed to the clone() system call and not accessed anymore after that.
+
+This means that all references to the `ac' and the `ares' may
+disappear from the parent's stack between the clone() and sem_wait()
+system calls. Under normal circumstances, this is no problem since
+the child's stack is created with a reference to the `start_info'.
+
+I said under normal circumstances, because this is where race
+condition #1 comes into the picture:
+
+The GC's pthread_create() passes a wrapper called GC_start_func()
+around the original `start_func' to the real pthread_create(). This
+wrapper calls GC_new_thread() and stores some information about the
+newly created thread in its internal structures - this information is
+also used to determine the child's stack.
+
+After that, it posts the semaphore on which the parent thread is
+blocking, we release the allocation lock and everything is fine.
+
+However - GC_new_thread() uses GC_INTERNAL_MALLOC() to allocate the
+`GC_thread' structure - and GC_INTERNAL_MALLOC() may in fact trigger a
+collection!
+
+Doing a collection at this time means we don't know about the child's
+stack yet - and if the parent doesn't keep a reference to the `ares'
+anymore, it's gone ....
+
+Fixing this was really easy, all I had to do is make GC_new_thread()
+use calloc() instead of GC_INTERNAL_MALLOC().
+
+The second issue is a debugger-only problem: we need to tell the
+debugger about newly created threads while still holding the
+allocation lock to ensure that no collection may happen in the
+meantime.
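
The handshake described in debugger-issues.txt can be sketched in plain C, as a reading aid rather than a copy of the libgc sources. Everything named here is hypothetical and only stands in for the real thing: `thread_rec' for `GC_thread', `start_wrapper' for the GC's start wrapper, `alloc_lock' for the allocation lock, `thread_created_hook' for the new `thread_created' vtable entry (its signature is an assumption) and `wrapped_pthread_create' for the libgc pthread_create() wrapper. The sketch assumes POSIX unnamed semaphores as found on Linux. It shows the two points the note makes: the child registers itself with calloc(), which cannot start a collection, and the optional hook runs while the lock is still held.

```c
/* Hypothetical sketch; none of these names are libgc's actual API. */
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdlib.h>

/* Stand-in for the collector's per-thread record (`GC_thread'). */
struct thread_rec {
    pthread_t id;
    void *stack_hint;            /* some address inside the child's stack */
};

/* Stand-in for the wrapper's `start_info'. */
struct start_info {
    void *(*start_func)(void *);
    void *start_arg;
    struct thread_rec *rec;      /* filled in by the child */
    sem_t registered;            /* posted once the child is registered */
};

/* Stand-in for the allocation lock. */
static pthread_mutex_t alloc_lock = PTHREAD_MUTEX_INITIALIZER;

/* Optional debugger hook; NULL means that nobody wants to be told. */
static void (*thread_created_hook)(pthread_t id, void *stack_hint) = NULL;

/* Runs in the child before the user's start function. */
static void *start_wrapper(void *p)
{
    struct start_info *si = p;
    void *(*func)(void *) = si->start_func;
    void *arg = si->start_arg;
    int marker;                  /* lives on the child's stack */
    struct thread_rec *rec;

    pthread_mutex_lock(&alloc_lock);
    /* calloc() cannot start a collection, unlike a collector-owned allocator. */
    rec = calloc(1, sizeof *rec);
    if (rec != NULL) {
        rec->id = pthread_self();
        rec->stack_hint = &marker;
        /* Tell the debugger while the lock is still held. */
        if (thread_created_hook != NULL)
            thread_created_hook(rec->id, rec->stack_hint);
    }
    si->rec = rec;
    sem_post(&si->registered);   /* the parent may free `si' from here on */
    pthread_mutex_unlock(&alloc_lock);

    return func(arg);
}

/* Creates the thread and blocks until the child has registered itself. */
int wrapped_pthread_create(pthread_t *thread, void *(*func)(void *),
                           void *arg, struct thread_rec **rec_out)
{
    struct start_info *si;
    int rc;

    assert(thread != NULL && func != NULL && rec_out != NULL);
    si = calloc(1, sizeof *si);
    if (si == NULL)
        return ENOMEM;
    si->start_func = func;
    si->start_arg = arg;
    if (sem_init(&si->registered, 0, 0) != 0) {
        rc = errno;
        free(si);
        return rc;
    }

    rc = pthread_create(thread, NULL, start_wrapper, si);
    if (rc == 0) {
        /* Retry if a signal interrupts the wait. */
        while (sem_wait(&si->registered) != 0 && errno == EINTR)
            continue;
        *rec_out = si->rec;      /* NULL if the child's calloc() failed */
    }
    sem_destroy(&si->registered);
    free(si);
    return rc;
}

/* Example start function: bumps the counter it is handed. */
static void *demo_child(void *arg)
{
    int *counter = arg;
    *counter += 1;
    return arg;
}
```

A caller would pass `demo_child' and a pointer to an int to wrapped_pthread_create(), join the thread, and free() the returned record. In this sketch the window between the real pthread_create() and sem_wait() still exists, but nothing inside it allocates from a collecting heap, which is the property the calloc() change in the commit is after.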