diff --git a/docs/precise-gc b/docs/precise-gc
new file mode 100644
index 00000000000..8f84ea3203b

While the Boehm GC is quite good, we need to move to a
precise, generational GC for better performance and smaller
memory usage (no false-positive memory retention with big
allocations).

This is a large task, but it can be done in steps.

1) Use the GCJ support to mark reference fields in objects, so
scanning the heap is faster. This is mostly done already; it needs
checking that it is always used correctly (big objects, arrays).
There are also some data structures used in the runtime that are
currently untyped, allocated in the GC heap, and used to keep
references to GC objects. We need to make them typed, so that GC
references are tracked precisely, or move them to non-GC memory,
making more use of the GC handle support code (MonoGHashTable,
MonoDomain, etc.).

2) Don't include the .bss and .data segments in the static roots,
to save scanning time and limit false positives. This is mostly
done already.

3) Keep track precisely of stack locations and registers in native
code generation. This basically requires the register allocator
rewrite first, if we don't want to duplicate much of its code. This
is the hardest task of all, since proving its correctness is very
hard. Some tricks, like having a build that injects GC.Collect()
after every few simple operations, may help. We also need to decide
whether we want to handle safe points only at calls and back jumps,
or at every instruction. The latter case is harder to implement and
requires keeping around much more data (though it potentially makes
for faster stop-the-world phases). The former case requires being
able to advance a thread until it reaches the next safe point: this
can be done with the same techniques used by a debugger.
We already need something like this to safely handle
aborts happening in the middle of a prolog in managed code,
for example, so this could be an additional sub-task that can be done
separately from the GC work.
Note that we can adapt the libgc code to use the info we collect
when scanning the stack in managed methods and still use the conservative
approach for the unmanaged stack, until we have our own collector.
That requires us to define a proper icall interface for switching from
managed to unmanaged code (how do we handle object references in the
icall implementations, for example?).

4) We could make use of the generational capabilities of the
Boehm GC, but not with the current method involving signals, which
may create incompatibilities and is not supported on all platforms.
We need to start using write barriers: they will be required anyway
for the generational GC we'll use. When a field holding a reference
is changed in an object (or an item in an array), we mark the card
or page where the field is stored as dirty. Later, when a collection
is run, only objects in pages marked as dirty are scanned for
references instead of the whole heap. This could take a few days to
implement and probably much more time to debug if all the cases were
not caught :-)

5) Actually write the new generational and precise collector. There are
several examples out there as open source projects, though the CLR
needs some specific semantics, so the code needs to be written from
scratch anyway. Compared to item 3 this is relatively easier, and it can
be tested outside of mono, too, until mono is ready to use it.
The important features needed:
*) precise, so there is no false-positive memory retention
*) generational, to reduce collection times
*) pointer-hopping allocation, to reduce alloc time
*) possibly per-thread lock-free allocation
*) handling of weakrefs and finalizers with the CLR semantics

Note: some GC engines use a single mmap area, because it makes
handling generations and the implementation much easier, but this also
limits the expansion of the heap, so people may need to use a
command-line option to set the max heap size, etc. It would be better
to have a design that allows mmapping chunks of a few megabytes at a
time.

The different tasks can be done in parallel. 1, 2 and 4 can be done in
time for the mono 1.2 release. Parts of 3 and 5 could be done as well.
The complete switch is supposed to happen with the mono 2.0 release.