author | Zoltan Varga <vargaz@gmail.com> | 2009-02-12 22:22:39 +0300
---|---|---
committer | Zoltan Varga <vargaz@gmail.com> | 2009-02-12 22:22:39 +0300
commit | 18b99ebc85b2f466d0852fa641da3264e60d3349 |
tree | 1e4d1e64f38b4c9e92216aea55ad564b0d3a445c /docs |
parent | 4445ef40a6bf1997053a976d3919180e865b90ac |
2009-02-12 Zoltan Varga <vargaz@gmail.com>
* memory-management.txt thread-safety.txt aot-compiler.txt jit-regalloc
exception-handling.txt: Remove documents which are now on the wiki.
svn path=/trunk/mono/; revision=126745
Diffstat (limited to 'docs')
-rw-r--r-- | docs/ChangeLog | 5
-rw-r--r-- | docs/aot-compiler.txt | 393
-rw-r--r-- | docs/exception-handling.txt | 330
-rw-r--r-- | docs/jit-regalloc | 283
-rw-r--r-- | docs/memory-management.txt | 32
-rw-r--r-- | docs/thread-safety.txt | 118
6 files changed, 5 insertions, 1156 deletions
diff --git a/docs/ChangeLog b/docs/ChangeLog
index 79952cb6787..4742f7c4d63 100644
--- a/docs/ChangeLog
+++ b/docs/ChangeLog
@@ -1,3 +1,8 @@
+2009-02-12 Zoltan Varga <vargaz@gmail.com>
+
+	* memory-management.txt thread-safety.txt aot-compiler.txt jit-regalloc
+	exception-handling.txt: Remove documents which are now on the wiki.
+
 2009-02-11 Rodrigo Kumpera <rkumpera@novell.com>
 
 	* thread-safety.txt: Improve the docs about image lock.
diff --git a/docs/aot-compiler.txt b/docs/aot-compiler.txt
deleted file mode 100644
index 3d77c0a11ca..00000000000
--- a/docs/aot-compiler.txt
+++ /dev/null
@@ -1,393 +0,0 @@
-Mono Ahead Of Time Compiler
-===========================
-
-	The Ahead of Time compilation feature in Mono allows Mono to
-	precompile assemblies to minimize JIT time, reduce memory
-	usage at runtime and increase the code sharing across multiple
-	running Mono application.
-
-	To precompile an assembly use the following command:
-
-		mono --aot -O=all assembly.exe
-
-	The `--aot' flag instructs Mono to ahead-of-time compile your
-	assembly, while the -O=all flag instructs Mono to use all the
-	available optimizations.
-
-* Caching metadata
--------------------
-
-	Besides code, the AOT file also contains cached metadata information which allows
-	the runtime to avoid certain computations at runtime, like the computation of
-	generic vtables. This reduces both startup time, and memory usage. It is possible
-	to create an AOT image which contains only this cached information and no code by
-	using the 'metadata-only' option during compilation:
-
-		mono --aot=metadata-only assembly.exe
-
-	This works even on platforms where AOT is not normally supported.
-
-* Position Independent Code
-----------------------------
-
-	On x86 and x86-64 the code generated by Ahead-of-Time compiled
-	images is position-independent code.
-	This allows the same
-	precompiled image to be reused across multiple applications
-	without having different copies: this is the same way in which
-	ELF shared libraries work: the code produced can be relocated
-	to any address.
-
-	The implementation of Position Independent Code had a
-	performance impact on Ahead-of-Time compiled images but
-	compiler bootstraps are still faster than JIT-compiled images,
-	specially with all the new optimizations provided by the Mono
-	engine.
-
-* How to support Position Independent Code in new Mono Ports
--------------------------------------------------------------
-
-	Generated native code needs to reference various runtime
-	structures/functions whose address is only known at run
-	time. JITted code can simple embed the address into the native
-	code, but AOT code needs to do an indirection. This
-	indirection is done through a table called the Global Offset
-	Table (GOT), which is similar to the GOT table in the Elf
-	spec. When the runtime saves the AOT image, it saves some
-	information for each method describing the GOT table entries
-	used by that method. When loading a method from an AOT image,
-	the runtime will fill out the GOT entries needed by the
-	method.
-
-	* Computing the address of the GOT
-
-	Methods which need to access the GOT first need to compute its
-	address. On the x86 it is done by code like this:
-
-		call <IP + 5>
-		pop ebx
-		add <OFFSET TO GOT>, ebx
-		<save got addr to a register>
-
-	The variable representing the got is stored in
-	cfg->got_var. It is allways allocated to a global register to
-	prevent some problems with branches + basic blocks.
-
-	* Referencing GOT entries
-
-	Any time the native code needs to access some other runtime
-	structure/function (i.e. any time the backend calls
-	mono_add_patch_info ()), the code pointed by the patch needs
-	to load the value from the got.
-	For example, instead of:
-
-		call <ABSOLUTE ADDR>
-	it needs to do:
-		call *<OFFSET>(<GOT REG>)
-
-	Here, the <OFFSET> can be 0, it will be fixed up by the AOT compiler.
-
-	For more examples on the changes required, see
-
-		svn diff -r 37739:38213 mini-x86.c
-
-	* The Program Linkage Table
-
-	As in ELF, calls made from AOT code do not go through the GOT. Instead, a direct call is
-	made to an entry in the Program Linkage Table (PLT). This is based on the fact that on
-	most architectures, call instructions use a displacement instead of an absolute address, so
-	they are already position independent. An PLT entry is usually a jump instruction, which
-	initially points to some trampoline code which transfers control to the AOT loader, which
-	will compile the called method, and patch the PLT entry so that further calls are made
-	directly to the called method.
-	If the called method is in the same assembly, and does not need initialization (i.e. it
-	doesn't have GOT slots etc), then the call is made directly, bypassing the PLT.
-
-* Implementation
------------------
-
-** The Precompiled File Format
--------------------------------
-
-	We use the native object format of the platform. That way it
-	is possible to reuse existing tools like objdump and the
-	dynamic loader. All we need is a working assembler, i.e. we
-	write out a text file which is then passed to gas (the gnu
-	assembler) to generate the object file.
-
-	The precompiled image is stored in a file next to the original
-	assembly that is precompiled with the native extension for a shared
-	library (on Linux its ".so" to the generated file).
-
-	For example: basic.exe -> basic.exe.so; corlib.dll -> corlib.dll.so
-
-	To avoid symbol lookup overhead and to save space, some things like the
-	compiled code of the individual methods are not identified by specific symbols
-	like method_code_1234.
Instead, they are stored in one big array and the - offsets inside this array are stored in another array, requiring just two - symbols. The offsets array is usually named 'FOO_offsets', where FOO is the - array the offsets refer to, like 'methods', and 'method_offsets'. - - Generating code using an assembler and linker has some disadvantages: - - it requires GNU binutils or an equivalent package to be installed on the - machine running the aot compilation. - - it is slow. - - There is some support in the aot compiler for directly emitting elf files, but - its not complete (yet). - - The following things are saved in the object file and can be - looked up using the equivalent to dlsym: - - mono_assembly_guid - - A copy of the assembly GUID. - - mono_aot_version - - The format of the AOT file format. - - mono_aot_opt_flags - - The optimizations flags used to build this - precompiled image. - - method_infos - - Contains additional information needed by the runtime for using the - precompiled method, like the GOT entries it uses. - - method_info_offsets - - Maps method indexes to offsets in the method_infos array. - - mono_icall_table - - A table that lists all the internal calls - references by the precompiled image. - - mono_image_table - - A list of assemblies referenced by this AOT - module. - - methods - - The precompiled code itself. - - method_offsets - - Maps method indexes to offsets in the methods array. - - ex_info - - Contains information about methods which is rarely used during normal execution, - like exception and debug info. - - ex_info_offsets - - Maps method indexes to offsets in the ex_info array. - - class_info - - Contains precomputed metadata used to speed up various runtime functions. - - class_info_offsets - - Maps class indexes to offsets in the class_info array. - - class_name_table - - A hash table mapping class names to class indexes. Used to speed up - mono_class_from_name (). 
- - plt - - The Program Linkage Table - - plt_info - - Contains information needed to find the method belonging to a given PLT entry. - -** Source file structure ------------------------------ - - The AOT infrastructure is split into two files, aot-compiler.c and - aot-runtime.c. aot-compiler.c contains the AOT compiler which is invoked by - --aot, while aot-runtime.c contains the runtime support needed for loading - code and other things from the aot files. - -** Compilation process ----------------------------- - - AOT compilation consists of the following stages: - - collecting the methods to be compiled. - - compiling them using the JIT. - - emitting the JITted code and other information into an assembly file (.s). - - assembling the file using the system assembler. - - linking the resulting object file into a shared library using the system - linker. - -** Handling compiled code ----------------------------- - - Each method is identified by a method index. For normal methods, this is - equivalent to its index in the METHOD metadata table. For runtime generated - methods (wrappers), it is an arbitrary number. - Compiled code is created by invoking the JIT, requesting it to created AOT - code instead of normal code. This is done by the compile_method () function. - The output of the JIT is compiled code and a set of patches (relocations). Each - relocation specifies an offset inside the compiled code, and a runtime object - whose address is accessed at that offset. - Patches are described by a MonoJumpInfo structure. From the perspective - of the AOT compiler, there are two kinds of patches: - - calls, which require an entry in the PLT table. - - everything else, which require an entry in the GOT table. - How patches is handled is described in the next section. - After all the method are compiled, they are emitted into the output file into - a byte array called 'methods', The emission - is done by the emit_method_code () and emit_and_reloc_code () functions. 
Each - piece of compiled code is identified by the local symbol .Lm_<method index>. - While compiled code is emitted, all the locations which have an associated patch - are rewritten using a platform specific process so the final generated code will - refer to the plt and got entries belonging to the patches. - The compiled code array -can be accessed using the 'methods' global symbol. - -** Handling patches ----------------------------- - - Before a piece of AOTed code can be used, the GOT entries used by it must be - filled out with the addresses of runtime objects. Those objects are identified - by MonoJumpInfo structures. These stuctures are saved in a serialized form in - the AOT file, so the AOT loader can deconstruct them. The serialization is done - by the encode_patch () function, while the deserialization is done by the - decode_patch_info () function. - Every method has an associated method info blob inside the 'method_info' byte - array in the AOT file. This contains all the information required to load the - method at runtime: - - the first got entry used by the method. - - the number of got entries used by the method. - - the serialized patch info for the got entries. - Some patches, like vtables, icalls are very common, so instead of emitting their - info every time they are used by a method, we emit the info only once into a - byte array named 'got_info', and only emit an index into this array for every - access. - -** The Procedure Linkage Table (PLT) ------------------------------------- - - Our PLT is similar to the elf PLT, it is used to handle calls between methods. - If method A needs to call method B, then an entry is allocated in the PLT for - method B, and A calls that entry instead of B directly. This is useful because - in some cases the runtime needs to do some processing the first time B is - called. 
- There are two cases: - - if B is in another assembly, then it needs to be looked up, then JITted or the - corresponding AOT code needs to be found. - - if B is in the same assembly, but has got slots, then the got slots need to be - initialized. - If none of these cases is true, then the PLT is not used, and the call is made - directly to the native code of the target method. - A PLT entry is usually implemented by a jump though a jump table, where the - jump table entries are initially filled up with the address of a trampoline so - the runtime can get control, and after the native code of the called method is - created/found, the jump table entry is changed to point to the native code. - All PLT entries also embed a integer offset after the jump which indexes into - the 'plt_info' table, which stores the information required to find the called - method. The PLT is emitted by the emit_plt () function. - -** Exception/Debug info ----------------------------- - - Each compiled method has some additional info generated by the JIT, usable - for debugging (IL offset-native offset maps) and exception handling - (saved registers, native offsets of try/catch clauses). Since this info is - rarely needed, it is saved into a separate byte array called 'ex_info'. - -** Cached metadata ---------------------------- - - When the runtime loads a class, it needs to compute a variety of information - which is not readily available in the metadata, like the instance size, - vtable, whenever the class has a finalizer/type initializer etc. Computing this - information requires a lot of time, causes the loading of lots of metadata, - and it usually involves the creation of many runtime data structures - (MonoMethod/MonoMethodSignature etc), which are long living, and usually persist - for the lifetime of the app. To avoid this, we compute the required information - at aot compilation time, and save it into the aot image, into an array called - 'class_info'. 
The runtime can query this information using the - mono_aot_get_cached_class_info () function, and if the information is available, - it can avoid computing it. - -** Full AOT mode -------------------------- - - Some platforms like the iphone prohibit JITted code, using technical and/or - legal means. This is a significant problem for the mono runtime, since it - generates a lot of code dynamically, using either the JIT or more low-level - code generation macros. To solve this, the AOT compiler is able to function in - full-aot or aot-only mode, where it generates and saves all the neccesary code - in the aot image, so at runtime, no code needs to be generated. - There are two kinds of code which needs to be considered: - - wrapper methods, that is methods whose IL is generated dynamically by the - runtime. They are handled by generating them in the add_wrappers () function, - then emitting them the same way as the 'normal' methods. The only problem is - that these methods do not have a methoddef token, so we need a separate table - in the aot image ('wrapper_info') to find their method index. - - trampolines and other small hand generated pieces of code. They are handled - in an ad-hoc way in the emit_trampolines () function. - -* Performance considerations ----------------------------- - - Using AOT code is a trade-off which might lead to higher or - slower performance, depending on a lot of circumstances. Some - of these are: - - - AOT code needs to be loaded from disk before being used, so - cold startup of an application using AOT code MIGHT be - slower than using JITed code. Warm startup (when the code is - already in the machines cache) should be faster. Also, - JITing code takes time, and the JIT compiler also need to - load additional metadata for the method from the disk, so - startup can be faster even in the cold startup case. 
-
-	- AOT code is usually compiled with all optimizations turned
-	  on, while JITted code is usually compiled with default
-	  optimizations, so the generated code in the AOT case should
-	  be faster.
-
-	- JITted code can directly access runtime data structures and
-	  helper functions, while AOT code needs to go through an
-	  indirection (the GOT) to access them, so it will be slower
-	  and somewhat bigger as well.
-
-	- When JITting code, the JIT compiler needs to load a lot of
-	  metadata about methods and types into memory.
-
-	- JITted code has better locality, meaning that if A method
-	  calls B, then the native code for A and B is usually quite
-	  close in memory, leading to better cache behaviour thus
-	  improved performance. In contrast, the native code of
-	  methods inside the AOT file is in a somewhat random order.
-
-* Future Work
---------------
-
-	- Currently, when an AOT module is loaded, all of its
-	  dependent assemblies are also loaded eagerly, and these
-	  assemblies need to be exactly the same as the ones loaded
-	  when the AOT module was created ('hard binding'). Non-hard
-	  binding should be allowed.
-
-	- On x86, the generated code uses call 0, pop REG, add
-	  GOTOFFSET, REG to materialize the GOT address. Newer
-	  versions of gcc use a separate function to do this, maybe we
-	  need to do the same.
-
-	- Currently, we get vtable addresses from the GOT. Another
-	  solution would be to store the data from the vtables in the
-	  .bss section, so accessing them would involve less
-	  indirection.
-
-
-
diff --git a/docs/exception-handling.txt b/docs/exception-handling.txt
deleted file mode 100644
index 1fae4e4e1b7..00000000000
--- a/docs/exception-handling.txt
+++ /dev/null
@@ -1,330 +0,0 @@
-
-	Exception Handling In the Mono Runtime
-	--------------------------------------
-
-* Introduction
----------------
-
-	There are many types of exceptions which the runtime needs to
-	handle.
These are: - - - exceptions thrown from managed code using the 'throw' or 'rethrow' CIL - instructions. - - - exceptions thrown by some IL instructions like InvalidCastException thrown - by the 'castclass' CIL instruction. - - - exceptions thrown by runtime code - - - synchronous signals received while in managed code - - - synchronous signals received while in native code - - - asynchronous signals - - Since exception handling is very arch dependent, parts of the - exception handling code reside in the arch specific - exceptions-<ARCH>.c files. The architecture independent parts - are in mini-exceptions.c. The different exception types listed - above are generated in different parts of the runtime, but - ultimately, they all end up in the mono_handle_exception () - function in mini-exceptions.c. - -* Exceptions throw programmatically from managed code ------------------------------------------------------ - - These exceptions are thrown from managed code using 'throw' or - 'rethrow' CIL instructions. The JIT compiler will translate - them to a call to a helper function called - 'mono_arch_throw/rethrow_exception'. - - These helper functions do not exist at compile time, they are - created dynamically at run time by the code in the - exceptions-<ARCH>.c files. - - They perform various stack manipulation magic, then call a - helper function usually named throw_exception (), which does - further processing in C code, then calls - mono_handle_exception() to do the rest. - -* Exceptions thrown implicitly from managed code ------------------------------------------------- - - These exceptions are thrown by some IL instructions when - something goes wrong. When the JIT needs to throw such an - exception, it emits a forward conditional branch and remembers - its position, along with the exception which needs to be - emitted. This is usually done in macros named - EMIT_COND_SYSTEM_EXCEPTION in the mini-<ARCH>.c files. 
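The record-then-patch scheme described here (emit a forward conditional branch whose target is not yet known, remember it together with the exception to raise, and fix it up once the throw code is appended after the method body, as the next paragraph explains for mono_arch_emit_exceptions ()) can be sketched in C. This is a minimal sketch over a hypothetical toy instruction buffer: CodeBuf, Insn, emit_cond_exception () and emit_exception_stubs () are illustrative names, not Mono's, and real backends patch machine-code displacements rather than array indexes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy instruction set, not Mono's real encoding. */
enum { OP_NOP, OP_BRANCH_IF, OP_THROW, OP_RET };

typedef struct {
    int op;
    int arg; /* branch target index, or exception id for OP_THROW */
} Insn;

typedef struct {
    size_t branch_pos; /* index of a forward branch whose target is unknown */
    int exc_id;        /* exception that branch must raise */
} PendingThrow;

#define MAX_CODE 64
#define MAX_PENDING 16

typedef struct {
    Insn code[MAX_CODE];
    size_t len;
    PendingThrow pending[MAX_PENDING];
    size_t npending;
} CodeBuf;

static void emit(CodeBuf *b, int op, int arg)
{
    assert(b->len < MAX_CODE);
    b->code[b->len].op = op;
    b->code[b->len].arg = arg;
    b->len++;
}

/* Plays the role of EMIT_COND_SYSTEM_EXCEPTION: emit a conditional
 * forward branch with a placeholder target and remember it. */
static void emit_cond_exception(CodeBuf *b, int exc_id)
{
    assert(b->npending < MAX_PENDING);
    b->pending[b->npending].branch_pos = b->len;
    b->pending[b->npending].exc_id = exc_id;
    b->npending++;
    emit(b, OP_BRANCH_IF, -1); /* -1: target not known yet */
}

/* Plays the role of the arch-specific exception emission pass:
 * append one throw stub per recorded branch after the method body,
 * then patch each branch so it jumps to its stub. */
static void emit_exception_stubs(CodeBuf *b)
{
    for (size_t i = 0; i < b->npending; i++) {
        size_t stub = b->len;
        emit(b, OP_THROW, b->pending[i].exc_id);
        b->code[b->pending[i].branch_pos].arg = (int)stub;
    }
    b->npending = 0;
}

/* Builds a small method with two implicit checks in its body. */
static const CodeBuf *build_demo_method(void)
{
    static CodeBuf b;
    b.len = 0;
    b.npending = 0;
    emit(&b, OP_NOP, 0);
    emit_cond_exception(&b, 7);   /* e.g. an overflow check */
    emit(&b, OP_NOP, 0);
    emit_cond_exception(&b, 9);   /* e.g. a cast check */
    emit(&b, OP_RET, 0);
    emit_exception_stubs(&b);     /* stubs land after OP_RET */
    return &b;
}
```

In the demo the body occupies slots 0 to 4 and the two throw stubs land in slots 5 and 6, so the hot path never falls through into throwing code.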
-
-	After the machine code for the method is emitted, the JIT
-	calls the arch dependent mono_arch_emit_exceptions () function
-	which will add the exception throwing code to the end of the
-	method, and patches up the previous forward branches so they
-	will point to this code.
-
-	This has the advantage that the rarely-executed exception
-	throwing code is kept separate from the method body, leading
-	to better icache performance.
-
-	The exception throwing code braches to the dynamically
-	generated mono_arch_throw_corlib_exception helper function,
-	which will create the proper exception object, does some stack
-	manipulation, then calls throw_exception ().
-
-* Exceptions thrown by runtime code
-------------------------------------
-
-	These exceptions are usually thrown by the implementations of
-	InternalCalls (icalls). First an appropriate exception object
-	is created with the help of various helper functions in
-	metadata/exception.c, which has a separate helper function for
-	allocating each kind of exception object used by the runtime
-	code. Then the mono_raise_exception () function is called to
-	actually throw the exception. That function never returns.
-
-	An example:
-
-		if (something_is_wrong)
-			mono_raise_exception (mono_get_exception_index_out_of_range ());
-
-	mono_raise_exception () simply passes the exception to the JIT
-	side through an API, where it will be received by helper
-	created by mono_arch_throw_exception (). From now on, it is
-	treated as an exception thrown from managed code.
-
-* Synchronous signals
-----------------------
-
-	For performance reasons, the runtime does not do same checks
-	required by the CLI spec. Instead, it relies on the CPU to do
-	them. The two main checks which are omitted are null-pointer
-	checks, and arithmetic checks. When a null pointer is
-	dereferenced by JITted code, the CPU will notify the kernel
-	through an interrupt, and the kernel will send a SIGSEGV
-	signal to the process.
The runtime installs a signal handler - for SIGSEGV, which is sigsegv_signal_handler () in mini.c. The - signal handler creates the appropriate exception object and - calls mono_handle_exception () with it. Arithmetic exceptions - like division by zero are handled similarly. - -* Synchronous signals in native code ------------------------------------- - - Receiving a signal such as SIGSEGV while in native code means - something very bad has happened. Because of this, the runtime - will abort after trying to print a managed plus a native stack - trace. The logic is in the mono_handle_native_sigsegv () - function. - - Note that there are two kinds of native code which can be the - source of the signal: - - - code inside the runtime - - code inside a native library loaded by an application, ie. libgtk+ - -* Stack overflow checking -------------------------- - - Stack overflow exceptions need special handling. When a thread - overflows its stack, the kernel sends it a normal SIGSEGV - signal, but the signal handler tries to execute on the same as - the thread leading to a further SIGSEGV which will terminate - the thread. A solution is to use an alternative signal stack - supported by UNIX operating systems through the sigaltstack - (2) system call. When a thread starts up, the runtime will - install an altstack using the mono_setup_altstack () function - in mini-exceptions.c. When a SIGSEGV is received, the signal - handler checks whenever the fault address is near the bottom - of the threads normal stack. If it is, a - StackOverflowException is created instead of a - NullPointerException. This exception is handled like any other - exception, with some minor differences. - - There are two reasons why sigaltstack is disabled by default: - - * The main problem with sigaltstack() is that the stack - employed by it is not visible to the GC and it is possible - that the GC will miss it. 
- - * Working sigaltstack support is very much os/kernel/libc - dependent, so it is disabled by default. - - -* Asynchronous signals ----------------------- - - Async signals are used by the runtime to notify a thread that - it needs to change its state somehow. Currently, it is used - for implementing thread abort/suspend/resume. - - Handling async signals correctly is a very hard problem, - since the receiving thread can be in basically any state upon - receipt of the signal. It can execute managed code, native - code, it can hold various managed/native locks, or it can be - in a process of acquiring them, it can be starting up, - shutting down etc. Most of the C APIs used by the runtime are - not asynch-signal safe, meaning it is not safe to call them - from an async signal handler. In particular, the pthread - locking functions are not async-safe, so if a signal handler - interrupted code which was in the process of acquiring a lock, - and the signal handler tries to acquire a lock, the thread - will deadlock. Unfortunately, the current signal handling - code does acquire locks, so sometimes it does deadlock. - - When receiving an async signal, the signal handler first tries - to determine whenever the thread was executing managed code - when it was interrupted. If it did, then it is safe to - interrupt it, so a ThreadAbortException is constructed and - thrown. If the thread was executing native code, then it is - generally not safe to interrupt it. In this case, the runtime - sets a flag then returns from the signal handler. That flag is - checked every time the runtime returns from native code to - managed code, and the exception is thrown then. Also, a - platform specific mechanism is used to cause the thread to - interrupt any blocking operation it might be doing. - - The async signal handler is in sigusr1_signal_handler () in - mini.c, while the logic which determines whenever an exception - is safe to be thrown is in mono_thread_request_interruption - (). 
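The defer-then-check pattern described above for async signals (interrupt at once when the thread was in managed code, otherwise set a flag that is checked on the way back from native code) can be sketched with POSIX signals. This is a minimal sketch under stated assumptions: the flags executing_managed and pending_abort, the functions on_interrupt (), leave_native_code () and the demo helpers are hypothetical, SIGUSR1 merely stands in for the runtime's interruption signal, and a real runtime keeps this state per thread rather than in globals. The handler only writes volatile sig_atomic_t objects, which the C standard permits inside a signal handler.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Hypothetical thread state; a real runtime would keep it per thread. */
static volatile sig_atomic_t executing_managed = 0;
static volatile sig_atomic_t pending_abort = 0;
static volatile sig_atomic_t aborted_immediately = 0;

/* Async handler: no locks, no allocation, just flag writes. */
static void on_interrupt(int signo)
{
    (void)signo;
    if (executing_managed)
        aborted_immediately = 1; /* the runtime would throw the abort here */
    else
        pending_abort = 1;       /* native code: defer the interruption */
}

static int install_interrupt_handler(void)
{
    struct sigaction sa;
    sa.sa_handler = on_interrupt;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Check done on every native -> managed transition.
 * Returns 1 when the deferred exception should be thrown now. */
static int leave_native_code(void)
{
    executing_managed = 1;
    if (pending_abort) {
        pending_abort = 0;
        return 1;
    }
    return 0;
}

/* Signal arrives while "in native code": deferred, then seen once. */
static int demo_interrupt_in_native(void)
{
    if (install_interrupt_handler() != 0)
        return 0;
    executing_managed = 0;
    pending_abort = 0;
    raise(SIGUSR1); /* handler has run by the time raise() returns */
    if (pending_abort != 1)
        return 0;
    return leave_native_code() == 1 && pending_abort == 0
        && leave_native_code() == 0;
}

/* Signal arrives while "in managed code": handled immediately. */
static int demo_interrupt_in_managed(void)
{
    if (install_interrupt_handler() != 0)
        return 0;
    executing_managed = 1;
    pending_abort = 0;
    aborted_immediately = 0;
    raise(SIGUSR1);
    executing_managed = 0;
    return aborted_immediately == 1 && pending_abort == 0;
}
```

raise () is used only to make the demo deterministic: POSIX specifies that it does not return until the handler has run. Checking the flag at the transition costs one load per native call, which is why the text can fold it into the existing interruption check.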
- -* Stack unwinding during exception handling -------------------------------------------- - - The execution state of a thread during exception handling is - stored in an arch-specific structure called MonoContext. This - structure contains the values of all the CPU registers - relevant during exception handling, which usually means: - - - IP (instruction pointer) - - SP (stack pointer) - - FP (frame pointer) - - callee saved registers - - Callee saved registers are the registers which are required by - any procedure to be saved/restored before/after using - them. They are usually defined by each platforms ABI - (Application Binary Interface). For example, on x86, they are - EBX, ESI and EDI. - - The code which calls mono_handle_exception () is required to - construct the initial MonoContext. How this is done depends on - the caller. For exceptions thrown from managed code, the - mono_arch_throw_exception helper function saves the values of - the required registers and passes them to throw_exception (), - which will save them in the MonoContext structure. For - exceptions thrown from signal handlers, the MonoContext - stucture is initialized from the signal info received from the - kernel. - - During exception handling, the runtime needs to 'unwind' the - stack, i.e. given the state of the thread at a stack frame, - construct the state at its callers. Since this is platform - specific, it is done by a platform specific function called - mono_arch_find_jit_info (). - - Two kinds of stack frames need handling: - - - Managed frames are easier. The JIT will store some - information about each managed method, like which - callee-saved registers it uses. Based on this information, - mono_arch_find_jit_info () can find the values of the - registers on the thread stack, and restore them. - - - Native frames are problematic, since we have no information - about how to unwind through them. Some compilers generate - unwind information for code, some don't. 
Also, there is no - general purpose library to obtain and decode this unwind - information. So the runtime uses a different solution. When - managed code needs to call into native code, it does through - a managed->native wrapper function, which is generated by - the JIT. This function is responsible for saving the machine - state into a per-thread structure called MonoLMF (Last - Managed Frame). These LMF structures are stored on the - threads stack, and are linked together using one of their - fields. When the unwinder encounters a native frame, it - simply pops one entry of the LMF 'stack', and uses it to - restore the frame state to the moment before control passed - to native code. In effect, all successive native frames are - skipped together. - -Problems/future work --------------------- - -1. Async signal safety ----------------------- - - The current async signal handling code is not async safe, so - it can and does deadlock in practice. It needs to be rewritten - to avoid taking locks at least until it can determine that it - was interrupting managed code. - - Another problem is the managed stack frame unwinding code. It - blindly assumes that if the IP points into a managed frame, - then all the callee saved registers + the stack pointer are - saved on the stack. This is not true if the thread was - interrupted while executing the method prolog/epilog. - -2. Raising exceptions from native code --------------------------------------- - - Currently, exceptions are raised by calling - mono_raise_exception () in the middle of runtime code. This - has two problems: - - - No cleanup is done, ie. if the caller of the function which - throws an exception has taken locks, or allocated memory, - that is not cleaned up. For this reason, it is only safe to - call mono_raise_exception () 'very close' to managed code, - ie. in the icall functions themselves. 
- - - To allow mono_raise_exception () to unwind through native - code, we need to save the LMF structures which can add a lot - of overhead even in the common case when no exception is - thrown. So this is not zero-cost exception handling. - - An alternative might be to use a JNI style - set-pending-exception API. Runtime code could call - mono_set_pending_exception (), then return to its caller with - an error indication allowing the caller to clean up. When - execution returns to managed code, then managed->native - wrapper could check whenever there is a pending exception and - throw it if neccesary. Since we already check for pending - thread interruption, this would have no overhead, allowing us - to drop the LMF saving/restoring code, or significant parts of - it. - -4. libunwind ------------- - - There is an OSS project called libunwind which is a standalone - stack unwinding library. It is currently in development, but - it is used by default by gcc on ia64 for its stack - unwinding. The mono runtime also uses it on ia64. It has - several advantages in relation to our current unwinding code: - - - it has a platform independent API, i.e. the same unwinding - code can be used on multiple platforms. - - - it can generate unwind tables which are correct at every - instruction, i.e. can be used for unwinding from async - signals. - - - given sufficient unwind info generated by a C compiler, it - can unwind through C code. - - - most of its API is async-safe - - - it implements the gcc C++ exception handling API, so in - theory it can be used to implement mixed-language exception - handling (i.e. C++ exception caught in mono, mono exception - caught in C++). - - - it is MIT licensed - - The biggest problem with libuwind is its platform support. ia64 support is - complete/well tested, while support for other platforms is missing/incomplete. 
-
-	http://www.hpl.hp.com/research/linux/libunwind/
-
diff --git a/docs/jit-regalloc b/docs/jit-regalloc
deleted file mode 100644
index 47a277046c8..00000000000
--- a/docs/jit-regalloc
+++ /dev/null
@@ -1,283 +0,0 @@
-Register Allocation
-===================
-
-The current JIT implementation uses a tree matcher to generate code. We use a
-simple algorithm to allocate registers in trees, and limit the number of used
-temporary register to 4 when evaluating trees. So we can use 2 registers for
-global register allocation.
-
-Register Allocation for Trees
-=============================
-
-We always evaluate trees from left to right. When there are no more registers
-available we need to spill values to memory. Here is the simplified algorithm.
-
-gboolean
-tree_allocate_regs (tree, exclude_reg)
-{
-	if (!tree_allocate_regs (tree->left, -1))
-		return FALSE;
-
-	if (!tree_allocate_regs (tree->right, -1)) {
-
-		tree->left->spilled == TRUE;
-
-		free_used_regs (tree->left);
-
-		if (!tree_allocate_regs (tree->right, tree->left->reg))
-			return FALSE;
-	}
-
-	free_used_regs (tree->left);
-	free_used_regs (tree->right);
-
-	/* try to allocate a register (reg != exclude_reg) */
-	if ((tree->reg = next_free_reg (exclude_reg)) != -1)
-		return TRUE;
-
-	return FALSE;
-}
-
-The emit routing actually spills the registers:
-
-tree_emit (tree)
-{
-
-	tree_emit (tree->left);
-
-	if (tree->left->spilled)
-		save_reg (tree->left->reg);
-
-	tree_emit (tree->right);
-
-	if (tree->left->spilled)
-		restore_reg (tree->left->reg);
-
-
-	emit_code (tree);
-}
-
-
-Global Register Allocation
-==========================
-
-TODO.
-
-Local Register Allocation
-=========================
-
-This section describes the cross-platform local register allocator which is
-in the file mini-codegen.c.
-
-The input to the allocator is a basic block which contains linear IL, ie.
-instructions of the form:
-
-	DEST <- SRC1 OP SRC2
-
-where DEST, SRC1, and SRC2 are virtual registers (vregs).
-The job of the
-allocator is to assign hard or physical registers (hregs) to each virtual
-register so the vreg references in the instructions can be replaced with their
-assigned hreg, allowing machine code to be generated later.
-
-The allocator needs information about the number and types of arguments of
-instructions. It takes this information from the machine description files. It
-also needs arch specific information, like the number and type of the hard
-registers. It gets this information from arch-specific macros.
-
-Currently, the vregs and hregs are partitioned into two classes: integer and
-floating point.
-
-The allocator consists of two phases: In the first phase, a forward pass is
-made over the instructions, collecting liveness information for vregs. In the
-second phase, a backward pass is made over the instructions, assigning
-registers. This backward mode of operation makes the allocator
-somewhat difficult to understand, but leads to better code in most cases.
-
-Allocator state
-===============
-
-The state of the allocator is stored in two arrays: iassign and isymbolic.
-iassign maps vregs to hregs, while isymbolic is the opposite.
-For a vreg, iassign [vreg] can contain the following values:
-
-	* -1                        vreg has no assigned hreg
-
-	* hreg index (>= 0)         vreg is assigned to the given hreg. This means
-	                            later instructions (which we have already
-	                            processed due to the backward direction) expect
-	                            the value of vreg to be found in hreg.
-
-	* spill slot index (< -1)   vreg is spilled to the given spill slot. This
-	                            means later instructions expect the value of
-	                            vreg to be found on the stack in the given
-	                            spill slot.
-
-Also, the allocator keeps track of which hregs are free and which are used.
-This information is stored in a bitmask called ifree_mask.
-
-There is a similar set of data structures for floating point registers.
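The allocator state described above can be modelled with a short C sketch. It is a hypothetical illustration, not the code in mini-codegen.c. The RegState struct, the regstate_* helpers, the NUM_VREGS/NUM_HREGS sizes and the spill-slot encoding -(slot + 2) are all assumptions. The encoding was picked so that every slot index maps to a value below -1, as the table requires.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sizes; a real backend derives these from arch-specific macros. */
#define NUM_VREGS 64
#define NUM_HREGS 8

typedef struct {
	int iassign [NUM_VREGS];   /* vreg -> hreg (>= 0), -1 if unassigned, < -1 if spilled */
	int isymbolic [NUM_HREGS]; /* hreg -> vreg it currently holds, -1 if none */
	uint32_t ifree_mask;       /* bit h set means hreg h is free */
} RegState;

/* Assumed encoding: spill slot s is stored as -(s + 2), so slot 0 becomes -2. */
static int
spill_encode (int slot)
{
	return -(slot + 2);
}

static int
spill_decode (int value)
{
	return -value - 2;
}

static void
regstate_init (RegState *rs)
{
	for (int v = 0; v < NUM_VREGS; v++)
		rs->iassign [v] = -1;
	for (int h = 0; h < NUM_HREGS; h++)
		rs->isymbolic [h] = -1;
	rs->ifree_mask = (1u << NUM_HREGS) - 1;  /* every hreg starts out free */
}

/* Record that vreg lives in hreg, keeping both maps and the free mask in sync. */
static void
regstate_assign (RegState *rs, int vreg, int hreg)
{
	assert (rs->ifree_mask & (1u << hreg));
	rs->iassign [vreg] = hreg;
	rs->isymbolic [hreg] = vreg;
	rs->ifree_mask &= ~(1u << hreg);
}

/* Move vreg to a stack slot and release its hreg for another vreg. */
static void
regstate_spill (RegState *rs, int vreg, int slot)
{
	int hreg = rs->iassign [vreg];

	if (hreg >= 0) {
		rs->isymbolic [hreg] = -1;
		rs->ifree_mask |= 1u << hreg;
	}
	rs->iassign [vreg] = spill_encode (slot);
}

static int
regstate_is_spilled (const RegState *rs, int vreg)
{
	return rs->iassign [vreg] < -1;
}
```

Updating iassign, isymbolic and ifree_mask together in one helper keeps the two maps consistent. The backward pass can then answer both "where does this vreg live?" and "which vreg occupies this hreg?" with a single array lookup.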
-
-Spilling
-========
-
-When the allocator needs a free hreg, but all of them are assigned, it needs to
-free up one of them. It does this by spilling the contents of the vreg which
-is currently assigned to the selected hreg. Since later instructions expect
-the vreg to be found in the selected hreg, the allocator emits a spill-load
-instruction to load the value from the spill slot into the hreg after the
-currently processed instruction. When the vreg which is spilled is a
-destination in an instruction, the allocator will emit a spill-store to store
-the value into the spill slot.
-
-Fixed registers
-===============
-
-Some architectures, notably x86/amd64, require that the arguments/results of
-some instructions be assigned to specific hregs. Examples are the shift
-opcodes on x86, where the second argument must be in ECX. The allocator
-has support for this. It tries to allocate the vreg to the required hreg. If
-that's not possible, then it will emit compensation code which moves values to
-the correct registers before/after the instruction.
-
-Fixed registers are mainly used on x86, but they are useful on more regular
-architectures as well, for example to model that after a call instruction, the
-return value of the call is in a specific register.
-
-A special case of fixed registers is two address architectures, like the x86,
-where the instructions place their results into their first argument. This is
-modelled in the allocator by allocating SRC1 and DEST to the same hreg.
-
-Global registers
-================
-
-Some variables might already be allocated to hardware registers during the
-global allocation phase. In this case, SRC1, SRC2 and DEST might already be
-a hardware register. The allocator needs to do nothing in this case, except
-when the architecture uses fixed registers, in which case it needs to emit
-compensation code.
-
-Register pairs
-==============
-
-64 bit arithmetic on 32 bit machines requires instructions whose arguments are
-not registers, but register pairs. The allocator has support for this, both
-for freely allocatable register pairs, and for register pairs which are
-constrained to specific hregs (EDX:EAX on x86).
-
-Floating point stack
-====================
-
-The x86 architecture uses a floating point register stack instead of a set of
-fp registers. The allocator supports this by keeping track of the height of the
-fp stack, and spilling/loading values from the stack as necessary.
-
-Calls
-=====
-
-Calls need special handling for two reasons: first, they will clobber all
-caller-save registers, meaning their contents will need to be spilled. Second,
-some architectures pass arguments in registers. The registers used for
-passing arguments are usually the same as the ones used for local allocation,
-so the allocator needs to handle them specially. This is done as follows:
-the MonoInst for the call instruction contains a map mapping vregs which
-contain the argument values to hregs where the argument needs to be placed,
-like this (on amd64):
-
-R33 -> RDI
-R34 -> RSI
-...
-
-When the allocator processes the call instruction, it allocates the vregs
-in the map to their associated hregs. So the call instruction is processed as
-if it had a variable number of arguments with fixed register assignments.
-
-An example:
-
-	R33 <- 1
-	R34 <- 2
-	call
-
-When the call instruction is processed, R33 is assigned to RDI, and R34 is
-assigned to RSI.
-Later, when the two assignment instructions are processed,
-R33 and R34 are already assigned to a hreg, so they are replaced with the
-associated hreg, leading to the following final code:
-
-	RDI <- 1
-	RSI <- 2
-	call
-
-Machine description files
-=========================
-
-A typical entry in the machine description files looks like this:
-
-	shl: dest:i src1:i src2:s clob:1 len:2
-
-The allocator is only interested in the dest, src1, src2 and clob fields.
-It understands the following values for the dest, src1, src2 fields:
-
-	i - integer register
-	f - fp register
-	b - base register (same as i, but the instruction does not modify the reg)
-	m - fp register, even if an fp stack is used (no fp stack tracking)
-
-It understands the following values for the clob field:
-
-	1 - sreg1 needs to be the same as dreg
-	c - instruction clobbers the caller-save registers
-
-Besides these values, an architecture can define additional values (like the 's'
-in the example). The allocator depends on a set of arch-specific macros to
-convert these values to information it needs during allocation.
-
-Arch specific macros
-====================
-
-These macros usually receive a value from the machine description file (like
-the 's' in the example). The examples below are for x86.
-
-/*
- * A bitmask selecting the caller-save registers (these are used for local
- * allocation).
- */
-#define MONO_ARCH_CALLEE_REGS X86_CALLEE_REGS
-
-/*
- * A bitmask selecting the callee-saved registers (these are usually used for
- * global allocation).
- */
-#define MONO_ARCH_CALLEE_SAVED_REGS X86_CALLER_REGS
-
-/* Same for the floating point registers */
-#define MONO_ARCH_CALLEE_FREGS 0
-#define MONO_ARCH_CALLEE_SAVED_FREGS 0
-
-/* Whether the target uses a floating point stack */
-#define MONO_ARCH_USE_FPSTACK TRUE
-
-/* The size of the floating point stack */
-#define MONO_ARCH_FPSTACK_SIZE 6
-
-/*
- * Given a descriptor value from the machine description file, return the fixed
- * hard reg corresponding to that value.
- */
-#define MONO_ARCH_INST_FIXED_REG(desc) ((desc == 's') ? X86_ECX : ((desc == 'a') ? X86_EAX : ((desc == 'd') ? X86_EDX : ((desc == 'y') ? X86_EAX : ((desc == 'l') ? X86_EAX : -1)))))
-
-/*
- * A bitmask selecting the hregs which cannot be used for allocating sreg2 for
- * a given instruction.
- */
-#define MONO_ARCH_INST_SREG2_MASK(ins) (((ins [MONO_INST_CLOB] == 'a') || (ins [MONO_INST_CLOB] == 'd')) ? (1 << X86_EDX) : 0)
-
-/*
- * Given a descriptor value, return whether it denotes a register pair.
- */
-#define MONO_ARCH_INST_IS_REGPAIR(desc) (desc == 'l' || desc == 'L')
-
-/*
- * Given a descriptor value, and the first register of a regpair, return the
- * hreg which must be used for the second register of the regpair, or -1 if
- * it can be allocated freely.
- */
-#define MONO_ARCH_INST_REGPAIR_REG2(desc,hreg1) (desc == 'l' ? X86_EDX : -1)
diff --git a/docs/memory-management.txt b/docs/memory-management.txt
deleted file mode 100644
index a78ab5e3bf0..00000000000
--- a/docs/memory-management.txt
+++ /dev/null
@@ -1,32 +0,0 @@
-Metadata memory management
---------------------------
-
-Most metadata structures have a lifetime which is equal to the MonoImage where they are
-loaded from. These structures should be allocated from the memory pool of the
-corresponding MonoImage. The memory pool is protected by the loader lock.
-Examples of metadata structures in this category:
-- MonoClass
-- MonoMethod
-- MonoType
-Memory owned by these structures should be allocated from the image mempool as well.
-Examples include: klass->methods, klass->fields, method->signature etc.
-
-Generics complicate things. A generic class could have many instantiations where the
-generic arguments are from different assemblies. Where should we allocate memory for
-instantiations? We can allocate from the mempool of the image which contains the
-generic type definition, but that would mean that the instantiations would remain in
-memory even after the assemblies containing their type arguments are unloaded, leading
-to a memory leak. Therefore, we do the following:
-- data structures representing the generic definitions are allocated from the image
-  mempool as usual. These include:
-  - generic class definition (MonoGenericClass->container_class)
-  - generic method definitions
-  - type parameters (MonoGenericParam)
-- data structures representing inflated classes/images are allocated from the heap. These
-  structures are kept in a cache, indexed by type arguments of the instantiations. When
-  an assembly is unloaded, this cache is searched and all instantiations referencing
-  types from the assembly are freed. This is done by mono_metadata_clean_for_image ()
-  in metadata.c. The structures handled this way include:
-  - MonoGenericClass
-  - MonoGenericInst
-  - inflated MonoMethods
diff --git a/docs/thread-safety.txt b/docs/thread-safety.txt
deleted file mode 100644
index c1e5d7f8720..00000000000
--- a/docs/thread-safety.txt
+++ /dev/null
@@ -1,118 +0,0 @@
-
-1. Thread safety of metadata structures
-----------------------------------------
-
-1.1 Synchronization of read-only data
---------------------------------------
-
-Read-only data is data which is not modified after creation, like the
-actual binary metadata in the metadata tables.
-
-There are three kinds of threads with regard to read-only data:
-- readers
-- the creator of the data
-- the destroyer of the data
-
-Most threads are readers.
-
-- synchronization between readers is not necessary
-- synchronization between the writers is done using locks.
-- synchronization between the readers and the creator is done by not exposing
-  the data to readers before it is fully constructed.
-- synchronization between the readers and the destroyer: TBD.
-
-1.2 Deadlock prevention plan
------------------------------
-
-Hold locks for the shortest time possible. Avoid calling functions inside
-locks which might obtain global locks (i.e. locks known outside this module).
-
-1.3 Locks
-----------
-
-1.3.1 Simple locks
--------------------
-
-	There are a lot of global data structures which can be protected by a 'simple' lock. Simple means:
-	- the lock protects only this data structure or it only protects the data structures in a given C module.
-	  An example would be the appdomains list in domain.c.
-	- the lock can span many modules, but it still protects access to a single resource or set of resources.
-	  An example would be the image lock, which protects all data structures that belong to a given MonoImage.
-	- the lock is only held for a short amount of time, and no other lock is acquired inside this simple lock. Thus there is
-	  no possibility of deadlock.
-
-	Simple locks include, at least, the following:
-	- the per-image lock acquired by using mono_image_(un)lock functions.
-
-1.3.2 The class loader lock
-----------------------------
-
-This lock is held by the class loading routines in class.c and loader.c. It
-protects the various caches inside MonoImage which are used by these modules.
-
-1.3.3 The domain lock
-----------------------
-
-Each appdomain has a lock which protects the per-domain data structures.
-
-1.3.4 The locking hierarchy
-----------------------------
-
-It is useful to model locks by a locking hierarchy, which is a relation between locks, which is reflexive, transitive,
-and antisymmetric, in other words, a partial order. If a thread wants to acquire a lock B, while already holding A, it can only
-do it if A < B. If all threads work this way, then no deadlocks can occur.
-
-Our locking hierarchy so far looks like this:
-	<DOMAIN LOCK>
-	    \
-	    <CLASS LOADER LOCK>
-	        \                 \
-	        <SIMPLE LOCK 1>   <SIMPLE LOCK 2>
-
-1.4 Notes
-----------
-
-Some common scenarios:
-- if a function needs to access a data structure, then it should lock it itself, and not count on its caller locking it.
-  So for example, the image->class_cache hash table would be locked by mono_class_get().
-
-- there are lots of places where a runtime data structure is created and stored in a cache. In these places, care must be
-  taken to avoid multiple threads creating the same runtime structure, for example, two threads might call mono_class_get ()
-  with the same class name. There are two choices here:
-
-	<enter mutex>
-	<check that item is created>
-	if (created) {
-		<leave mutex>
-		return item
-	}
-	<create item>
-	<store it in cache>
-	<leave mutex>
-
-  This is the easiest solution, but it requires holding the lock for the whole time, which might create a scalability problem, and could also lead to deadlock.
-
-	<enter mutex>
-	<check that item is created>
-	<leave mutex>
-	if (created) {
-		return item
-	}
-	<create item>
-	<enter mutex>
-	<check that item is created>
-	if (created) {
-		/* Another thread already created and stored the same item */
-		<free our item>
-		<leave mutex>
-		return orig item
-	}
-	else {
-		<store item in cache>
-		<leave mutex>
-		return item
-	}
-
-  This solution does not present scalability problems, but the created item might be hard to destroy (like a MonoClass).
-
-- lazy initialization of hashtables etc. is not thread safe
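The second, double-checked variant above can be sketched in C as follows. This is a hypothetical illustration rather than mono code. Cache, Item, cache_get () and item_create () are invented names, and a POSIX pthread_mutex_t stands in for the runtime's own lock wrappers. A fixed-size array also replaces the hash table a real cache would use.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CACHE_CAPACITY 64

/* Stand-in for an expensive runtime structure such as a MonoClass. */
typedef struct {
	char name [64];
} Item;

typedef struct {
	pthread_mutex_t lock;
	Item *items [CACHE_CAPACITY];
	int count;
} Cache;

/* Caller must hold c->lock. */
static Item *
cache_lookup_locked (Cache *c, const char *name)
{
	for (int i = 0; i < c->count; i++)
		if (strcmp (c->items [i]->name, name) == 0)
			return c->items [i];
	return NULL;
}

/* Potentially slow construction; runs without holding the cache lock. */
static Item *
item_create (const char *name)
{
	Item *item = calloc (1, sizeof (Item));

	if (item)
		snprintf (item->name, sizeof (item->name), "%s", name);
	return item;
}

static Item *
cache_get (Cache *c, const char *name)
{
	Item *item, *fresh;

	assert (c != NULL && name != NULL);

	/* First check: short critical section, no construction under the lock. */
	pthread_mutex_lock (&c->lock);
	item = cache_lookup_locked (c, name);
	pthread_mutex_unlock (&c->lock);
	if (item)
		return item;

	fresh = item_create (name);
	if (!fresh)
		return NULL;

	/* Second check: another thread may have stored the same item meanwhile. */
	pthread_mutex_lock (&c->lock);
	item = cache_lookup_locked (c, name);
	if (!item && c->count < CACHE_CAPACITY) {
		c->items [c->count++] = fresh;
		item = fresh;
		fresh = NULL;  /* ownership moved to the cache */
	}
	pthread_mutex_unlock (&c->lock);

	/* Lost the race, or the cache is full (item is then NULL): drop our copy. */
	free (fresh);
	return item;
}
```

Because item_create () runs with no lock held, two threads may build the same item concurrently. The second lookup ensures only one copy is published, and the losing thread frees its own. That discard step is why the text warns this variant is awkward for structures that are hard to destroy, such as MonoClass.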