Switch to precise GC design document.

svn path=/trunk/mono/; revision=30950
author: Paolo Molaro <lupus@oddwiz.org> 2004-07-09 19:18:26 +0400
committer: Paolo Molaro <lupus@oddwiz.org> 2004-07-09 19:18:26 +0400
commit: e1de07f1529fd5223cd1fae40fb849da6632cd42 (patch)
tree: 60dad1ce1cb819a6f1f98263c2aa5f0e05088532 /docs
parent: 1620eac119f182a8decb4b3a95ebc5d51c3d90c1 (diff)
1 files changed, 65 insertions, 0 deletions
diff --git a/docs/precise-gc b/docs/precise-gc
new file mode 100644
index 00000000000..f0dd52e5792
--- /dev/null
+++ b/docs/precise-gc
@@ -0,0 +1,65 @@
+While the Boehm GC is quite good, we need to move to a 
+precise, generational GC for better performance and smaller
+memory usage (no false-positives memory retentions with big
+allocations).
+
+This is a large task, but it can be done in steps.
+
+1) use the GCJ support to mark reference fields in objects, so
+scanning the heap is faster. This is mostly done already, needs
+checking that it is always used correctly (big objects, arrays).
+
+2) don't include in the static roots the .bss and .data segments
+to save in scanning time and limit false-positives. This is mostly 
+done already.
+
+3) keep track precisely of stack locations and registers in native
+code generation. This basically requires the regalloc rewrite code 
+first, if we don't want to duplicate much of it. This is the hardest 
+task of all, since proving it's correctness is very hard. Some tricks,
+like having a build that injects GC.Collect() after every few simple 
+operations may help. We also need to decide if we want to handle safe 
+points at calls and back jumps only or at every instruction. The latter
+case is harder to implement and requires we keep around much more data
+(it potentially makes for faster stop-the-world phases).
+The first case requires us to be able to advance a thread until it 
+reaches the next safe point: this can be done with the same techniques 
+used by a debugger. We already need something like this to handle
+safely aborts happening in the middle of a prolog in managed code, 
+for example, so this could be an additional sub-task that can be done
+separately from the GC work.
+Note that we can adapt the libgc code to use the info we collect
+when scanning the stack in managed methods and still use the conservative
+approach for the unmanaged stack, until we have our own collector,
+which requires we define a proper icall interface to switch from managed 
+to unmanaged code (hwo to we handle object references in the icall 
+implementations, for example).
+
+4) we could make use of the generational capabilities of the 
+Boehm GC, but not with the current method involving signals which
+may create incompatibilities and is not supported on all platforms.
+We need to start using write barriers: they will be required anyway
+for the generational GC we'll use. When a field holding a reference
+is changed in an object (or an item in an array), we mark the card
+or page where the field is stored as dirty. Later, when a collection 
+is run, only objects in pages marked as dirty are scanned for
+references instead of the whole heap. This could take a few days to
+implement and probably much more time to debug if all the cases were 
+not catched:-)
+
+5) actually write the new generational and precise collector. There are
+several examples out there as open source projects, though the CLR
+needs some specific semantics so the code needs to be written from 
+scratch anyway. Compared to item 3 this is relatively easer and it can
+be tested outside of mono, too, until mono is ready to use it.
+The important features needed:
+*) precise, so there is no false positive memory retention
+*) generational to reduce collection times
+*) pointer-hopping allocation to reduce alloc time
+*) possibly per-thread lock-free allocation
+*) handle weakrefs and finalizers with the CLR semantics
+
+The different tasks can be done in parallel. 1, 2 and 4 can be done in time
+for the mono 1.2 release. Parts of 3 and 5 could be done as well.
+The complete switch is supposed to happen with the mono 2.0 release.
+
author	Paolo Molaro <lupus@oddwiz.org>	2004-07-09 19:18:26 +0400
committer	Paolo Molaro <lupus@oddwiz.org>	2004-07-09 19:18:26 +0400
commit	e1de07f1529fd5223cd1fae40fb849da6632cd42 (patch)
tree	60dad1ce1cb819a6f1f98263c2aa5f0e05088532 /docs
parent	1620eac119f182a8decb4b3a95ebc5d51c3d90c1 (diff)