author | Martin Baulig <martin@novell.com> | 2006-05-24 18:56:07 +0400 |
---|---|---|
committer | Martin Baulig <martin@novell.com> | 2006-05-24 18:56:07 +0400 |
commit | 98cdcf90eb6787609533320cadb2090eddbedffe (patch) | |
tree | b815461a9b6bee265a08077f9eb2a9b207caf39d /libgc/doc | |
parent | ee927fdb90a8b93c5c60b61401dfd8b8207e8e9e (diff) |
2006-05-24 Martin Baulig <martin@ximian.com>
* doc/debugger-support.txt: Removed; this issue turned out to be
something completely different and the patch mentioned in this
file is already reverted.
svn path=/trunk/mono/; revision=61062
Diffstat (limited to 'libgc/doc')
-rw-r--r-- | libgc/doc/debugger-issues.txt | 85 |
1 files changed, 0 insertions, 85 deletions
diff --git a/libgc/doc/debugger-issues.txt b/libgc/doc/debugger-issues.txt
deleted file mode 100644
index a739393ee35..00000000000
--- a/libgc/doc/debugger-issues.txt
+++ /dev/null
@@ -1,85 +0,0 @@
-I spent the last couple of days debugging a very weird race condition.
-
-The problem only occured when running XSP (SVN revision 60518) inside
-the debugger and only when using special parameters.
-
-I'm using Mono from SVN revision 60564 (that's from last Thursday),
-XSP from SVN revision 60518 and manually installed xsp.exe.mdb and
-Mono.WebServer.dll.mdb into $prefix/lib/xsp/1.0/.
-
-With this setup, I'm running XSP with
-
-  mdb -args /work/asgard/INSTALL/lib/xsp/1.0/xsp.exe --root /work/asgard/INSTALL/lib/xsp/test/
-
-Note that adding options like --nonstop or changing the --root may
-make the problem go away or make it crash somewhere else.
-
-Then I insert a breakpoint on line 476 (that's the line before the
-Console.ReadLine()) and continue.
-
-Using `set env GC_DONT_GC 1' inside mdb makes the problem go away and
-running a stand-alone mono with -O=shared (and all the other
-optimization flags the debugger is using) works fine.
-
-So my first guess was that this is a GC issue.
-
-After implementing hardware breakpoints in the debugger, I was finally
-able to track this down. If I understand things correctly, the
-problem goes like this:
-
-Some code inside XSP calls mono_thread_pool_add() - inside that
-method, we GC-allocate an `ASyncCall *ac' structure, store the `msg'
-and `state' objects in it and create a `MonoAsyncResult *ares'.
-
-Then we call mono_thread_create() passing it async_invoke_thread() and
-the `ares'.
-
-mono_thread_create() stores them as `func' and `start_arg' in the
-g_new()-allocated `start_info' and calls CreateThread() which calls
-pthread_create().
-
-pthread_create() is in fact a wrapper in libgc - it calls the "real"
-pthread_create() and then blocks on a semaphore until the thread is
-actually started.
-
-Now - somehow - and I still don't fully understand why - the parent
-"loses" all references to the `ac' and `ares' after calling the real
-pthread_create().
-
-If I understand this correctly, mono_thread_pool_add() only stores
-them in registers and not on the stack, so the `start_info' contains
-the only references to them. The `start_info', however, is just
-passed to the clone() system call and not accessed anymore after that.
-
-This means that all references to the `ac' and the `ares' may
-disappear from the parent's stack between the clone() and sem_wait()
-system calls. Under normal circumstances, this is no problem since
-the child's stack is created with a reference to the `start_info'.
-
-I said under normal circumstances, because this is where race
-condition #1 comes into the picture:
-
-The GC's pthread_create() passes a wrapper called GC_start_func()
-around the original `start_func' to the real pthread_create(). This
-wrapper calls GC_new_thread() and stores some information about the
-newly created thread in its internal structures - this information is
-also used to determine the child's stack.
-
-After that, it posts the semaphore on which the parent thread is
-blocking, we release the allocation lock and everything is fine.
-
-However - GC_new_thread() uses GC_INTERNAL_MALLOC() to allocate the
-`GC_thread' structure - and GC_INTERNAL_MALLOC() may in fact trigger a
-collection !
-
-Doing a collection at this time means we don't know about the child's
-stack yet - and if the parent doesn't keep a reference to the `ares'
-anymore, it's gone ....
-
-Fixing this was really easy, all I had to do is make GC_new_thread()
-use calloc() instead of GC_INTERNAL_MALLOC().
-
-The second issue is a debugger-only problem: we need to tell the
-debugger about newly created threads while still holding the
-allocation lock to ensure that no collection may happen in the
-meantime.
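
For readers following the removed note, the thread-start handshake it describes can be sketched in plain C. This is a minimal illustration under stated assumptions, not libgc's code: `thread_record`, `start_info`, `alloc_lock`, `start_wrapper`, `create_registered_thread` and `demo_worker` are hypothetical names standing in for the `GC_thread' structure, the allocation lock and GC_start_func(). The child's bookkeeping record is allocated with calloc(), which is the change the note proposed, so that registering the thread cannot itself trigger a collection. Per the commit message, that analysis turned out to be wrong and the patch was reverted, so this only illustrates the scenario as the note described it.

```c
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdlib.h>

/* Hypothetical stand-in for the collector's per-thread record. */
struct thread_record {
    pthread_t id;
    void *stack_marker;              /* some address inside the child's stack */
};

/* Hypothetical hand-off block; it lives on the parent's stack. */
struct start_info {
    void *(*start_func)(void *);
    void *start_arg;
    sem_t registered;                /* posted once the child is registered */
    struct thread_record *record;    /* filled in by the child */
};

/* Hypothetical stand-in for the allocation lock. */
static pthread_mutex_t alloc_lock = PTHREAD_MUTEX_INITIALIZER;

/* Runs in the child: register first, then wake the parent, then run user code. */
static void *start_wrapper(void *data)
{
    struct start_info *si = data;
    void *(*func)(void *) = si->start_func;   /* copy out before posting */
    void *arg = si->start_arg;
    int marker;                               /* only its address is used */

    pthread_mutex_lock(&alloc_lock);
    /* calloc() rather than a collector-managed allocator, as the note proposed. */
    struct thread_record *rec = calloc(1, sizeof *rec);
    if (rec != NULL) {
        rec->id = pthread_self();
        rec->stack_marker = &marker;
    }
    si->record = rec;
    pthread_mutex_unlock(&alloc_lock);

    sem_post(&si->registered);                /* si must not be touched after this */
    return func(arg);
}

/* Runs in the parent: create the thread and block until it has registered. */
int create_registered_thread(pthread_t *thread, void *(*func)(void *),
                             void *arg, struct thread_record **record_out)
{
    struct start_info si = { func, arg };

    if (sem_init(&si.registered, 0, 0) != 0)
        return errno;

    int rc = pthread_create(thread, NULL, start_wrapper, &si);
    if (rc == 0) {
        /* Retry if a signal interrupts the wait. */
        while (sem_wait(&si.registered) != 0 && errno == EINTR)
            ;
        if (record_out != NULL)
            *record_out = si.record;
    }
    sem_destroy(&si.registered);
    return rc;
}

/* Example worker: writes 42 through its argument and returns the argument. */
void *demo_worker(void *arg)
{
    *(int *)arg = 42;
    return arg;
}
```

Because the parent blocks in sem_wait() until the child posts, the `start_info' block can safely live on the parent's stack, and the child copies `start_func' and `start_arg' out of it before posting. The caller owns the returned record and frees it after joining the thread.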