author | Alex Rønne Petersen <alexrp@xamarin.com> | 2015-04-07 15:46:41 +0300 |
---|---|---|
committer | Alex Rønne Petersen <alexrp@xamarin.com> | 2015-04-07 15:46:41 +0300 |
commit | 467ca088b6accb38f258197d03b5e872e0f663a6 (patch) | |
tree | ce4591c2c4aedf63864713fb6b534ea20927eca0 /docs | |
parent | 7fd61974bab711569069c447d2314f294b0284b6 (diff) |
Remove outdated documentation from the docs subdirectory.
Diffstat (limited to 'docs')
-rw-r--r-- | docs/Makefile.am | 9 | ||||
-rw-r--r-- | docs/assembly-bundle | 57 | ||||
-rw-r--r-- | docs/exceptions | 37 | ||||
-rw-r--r-- | docs/local-regalloc.txt | 208 | ||||
-rw-r--r-- | docs/mono_handle_d | 98 | ||||
-rw-r--r-- | docs/new-regalloc | 68 | ||||
-rw-r--r-- | docs/opcode-decomp.txt | 113 | ||||
-rw-r--r-- | docs/reactive-extension-bundle.txt | 49 | ||||
-rw-r--r-- | docs/release-notes-1.0.html | 16 | ||||
-rw-r--r-- | docs/stack-alignment | 33 | ||||
-rw-r--r-- | docs/tree-mover.txt | 261 |
11 files changed, 0 insertions, 949 deletions
diff --git a/docs/Makefile.am b/docs/Makefile.am index 811955257bb..5eb42044614 100644 --- a/docs/Makefile.am +++ b/docs/Makefile.am @@ -14,7 +14,6 @@ ASSEMBLED_DOCS = \ EXTRA_DIST = \ abc-removal.txt \ api-style.css \ - assembly-bundle \ check-exports \ check-coverage \ convert.cs \ @@ -23,7 +22,6 @@ EXTRA_DIST = \ docs.make \ documented \ embedded-api \ - exceptions \ exdoc \ file-share-modes \ gc-issues \ @@ -35,19 +33,14 @@ EXTRA_DIST = \ jit-imt \ jit-thoughts \ jit-trampolines \ - local-regalloc.txt \ - magic.diff \ mini-doc.txt \ mono-api-metadata.html \ mono-file-formats.config\ mono-file-formats.source\ - mono_handle_d \ mono-tools.config \ mono-tools.source \ monoapi.source \ - new-regalloc \ object-layout \ - opcode-decomp.txt \ precise-gc \ produce-lists \ public \ @@ -56,12 +49,10 @@ EXTRA_DIST = \ release-notes-1.0.html \ remoting \ ssapre.txt \ - stack-alignment \ stack-overflow.txt \ threading \ toc.xml \ TODO \ - tree-mover.txt \ unmanaged-calls dist-hook: diff --git a/docs/assembly-bundle b/docs/assembly-bundle deleted file mode 100644 index 3e64e147cb1..00000000000 --- a/docs/assembly-bundle +++ /dev/null @@ -1,57 +0,0 @@ - - HOWTO bundle assemblies inside the mono runtime. - Paolo Molaro (lupus@ximian.com) - -* Intent - - Bundling assemblies inside the mono runtime may be useful for a number - of reasons: - - * creating a standalone complete runtime that can be more easily - distributed - - * having an application run against a known set of assemblies - that has been tested - - Of course, there are drawbacks, too: if there has been fixes - to the assemblies, replacing them means recompiling the - runtime as well and if there are other mono apps, unless they - use the same mono binary, there will be less opportunities for - the operating system to optimize memory usage. So use this - feature only when really needed. 
- -* Creating the Bundle - - To bundle a set of assemblies, you need to create a file that - lists the assembly names and the relative files. Empty lines - and lines starting with # are ignored: - - == cut cut == - # Sample bundle template - mscorlib: /path/to/mscorlib/assembly.dll - myapp: /path/to/myapp.exe - == cut cut == - - Next you need to build the mono runtime using a special configure option: - - ./configure --with-bundle=/path/to/bundle/template - - The path to the template should be an absolute path. - - The script metadata/make-bundle.pl will take the specifie - assemblies and embed them inside the runtime where the loading - routines can find them before searching for them on disk. - -* Open Issues - - There are still two issues to solve: - - * config files: sometimes they are needed but they are - not yet bundled inside the library () - - * building with the included libgc makes it not - possible to build a mono binary statically linked to - libmono: this needs to be fixed to make bundles - really useful. - - diff --git a/docs/exceptions b/docs/exceptions index d5ecaeead43..bd22de9f25f 100644 --- a/docs/exceptions +++ b/docs/exceptions @@ -71,40 +71,3 @@ unwinding code. catch handler: catch hanlders are always called from the stack unwinding code. The exception object is passed in a local variable (cfg->exvar). - -gcc support for Exceptions -========================== - -gcc supports exceptions in files compiled with the -fexception option. gcc -generates DWARF exceptions tables in that case, so it is possible to unwind the -stack. The method to read those exception tables is contained in libgcc.a, and -in newer versions of glibc (glibc 2.2.5 for example), and it is called -__frame_state_for(). Another usable glibc function is backtrace_symbols() which -returns the function name corresponding to a code address. - -We dynamically check if those features are available using g_module_symbol(), -and we use them only when available. 
If not available we use the LMF as -fallback. - -Using gcc exception information prevents us from saving the LMF at each native -call, so this is a way to speed up native calls. This is especially valuable -for internal calls, because we can make sure that all internal calls are -compiled with -fexceptions (we compile the whole mono runtime with that -option). - -All native function are able to call function without exception tables, and so -we are unable to restore all caller saved registers if an exception is raised -in such function. Well, its possible if the previous function already saves all -registers. So we only omit the the LMF if a function has an exception table -able to restore all caller saved registers. - -One problem is that gcc almost never saves all caller saved registers, because -it is just unnecessary in normal situations. But there is a trick forcing gcc -to save all register, we just need to call __builtin_unwind_init() at the -beginning of a function. That way gcc generates code to save all caller saved -register on the stack. - - - - -
\ No newline at end of file diff --git a/docs/local-regalloc.txt b/docs/local-regalloc.txt deleted file mode 100644 index a6e523557fe..00000000000 --- a/docs/local-regalloc.txt +++ /dev/null @@ -1,208 +0,0 @@ - -* Proposal for the local register allocator - - The local register allocator deals with allocating registers - for temporaries inside a single basic block, while the global - register allocator is concerned with method-wide allocation of - variables. - The global register allocator uses callee-saved register for it's - purpouse so that there is no need to save and restore these registers - at call sites. - - There are a number of issues the local allocator needs to deal with: - *) some instructions expect operands in specific registers (for example - the shl instruction on x86, or the call instruction with thiscall - convention, or the equivalent call instructions on other architectures, - such as the need to put output registers in %oX on sparc) - *) some instructions deliver results only in specific registers (for example - the div instruction on x86, or the call instructionson on almost all - the architectures). - *) it needs to know what registers may be clobbered by an instruction - (such as in a method call) - *) it should avoid excessive reloads or stores to improve performance - - While which specific instructions have limitations is architecture-dependent, - the problem shold be solved in an arch-independent way to reduce code duplication. - The register allocator will be 'driven' by the arch-dependent code, but it's - implementation should be arch-independent. - - To improve the current local register allocator, we need to - keep more state in it than the current setup that only keeps busy/free info. 
- - Possible state information is: - - free: the resgister is free to use and it doesn't contain useful info - freeable: the register contains data loaded from a local (there is - also info about _which_ local it contains) as a result from previous - instructions (like, there was a store from the register to the local) - moveable: it contains live data that is needed in a following instruction, but - the contents may be moved to a different register - busy: the register contains live data and it is placed there because - the following instructions need it exactly in that register - allocated: the register is used by the global allocator - - The local register allocator will have the following interfaces: - - int get_register (); - Searches for a register in the free state. If it doesn't find it, - searches for a freeable register. Sets the status to moveable. - Looking for a 'free' register before a freeable one should allow for - removing a few redundant loads (though I'm still unsure if such - things should be delegated entirely to the peephole pass). - - int get_register_force (int reg); - Returns 'reg' if it is free or freeable. If it is moveable, it moves it - to another free or freeable register. - Sets the status of 'reg' to busy. - - void set_register_freeable (int reg); - Sets the status of 'reg' to freeable. - - void set_register_free (int reg); - Sets the status of 'reg' to free. - - void will_clobber (int reg); - Spills the register to the stack. Sets the status to freeable. - After the clobbering has occurred, set the status to free. - - void register_unspill (int reg); - Un-spills register reg and sets the status to moveable. - - FIXME: how is the 'local' information represented? Maybe a MonoInst* pointer. - - Note: the register allocator will insert instructions in the basic block - during it's operation. 
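Editorial sketch (not part of the removed file): the state list and interface described above can be modelled in a few lines of C. This is a minimal illustration under stated assumptions, not the Mono implementation. The `RegAlloc` struct, the explicit `ra` parameter, the `find_in_state` helper, the `moves`/`spills` counters, the `-1` return values, and `replay_shl_example` are all hypothetical names introduced here. The proposal does not say what `get_register_force` should do for a busy or allocated register, so the sketch simply refuses. It also assumes `will_clobber` only needs to spill a register that holds live data.

```c
#include <assert.h>

/* The five register states listed in the proposal. */
typedef enum { REG_FREE, REG_FREEABLE, REG_MOVEABLE, REG_BUSY, REG_ALLOCATED } RegState;

/* Hypothetical three-register x86 subset, as used in the proposal's examples. */
enum { EAX, ECX, EDX, NUM_REGS };

/* Hypothetical container: the proposal keeps this state implicit. */
typedef struct {
    RegState state[NUM_REGS];
    int moves;   /* register-to-register moves inserted so far */
    int spills;  /* spills to the stack inserted so far */
} RegAlloc;

void regalloc_init (RegAlloc *ra)
{
    for (int i = 0; i < NUM_REGS; i++)
        ra->state[i] = REG_FREE;
    ra->moves = 0;
    ra->spills = 0;
}

/* Helper (not in the proposal): first register in state 'wanted', skipping 'except'. */
int find_in_state (const RegAlloc *ra, RegState wanted, int except)
{
    for (int i = 0; i < NUM_REGS; i++)
        if (i != except && ra->state[i] == wanted)
            return i;
    return -1;
}

/* Prefer a free register, fall back to a freeable one, and mark it moveable. */
int get_register (RegAlloc *ra)
{
    int reg = find_in_state (ra, REG_FREE, -1);
    if (reg < 0)
        reg = find_in_state (ra, REG_FREEABLE, -1);
    if (reg >= 0)
        ra->state[reg] = REG_MOVEABLE;
    return reg;  /* -1: nothing available (spilling is not modelled here) */
}

/* Claim one specific register, relocating a moveable value first if needed. */
int get_register_force (RegAlloc *ra, int reg)
{
    if (ra->state[reg] == REG_MOVEABLE) {
        int dest = find_in_state (ra, REG_FREE, reg);
        if (dest < 0)
            dest = find_in_state (ra, REG_FREEABLE, reg);
        if (dest < 0)
            return -1;               /* no room for the displaced value */
        ra->state[dest] = REG_MOVEABLE;
        ra->moves++;                 /* stands for "mov %reg, %dest" */
    } else if (ra->state[reg] != REG_FREE && ra->state[reg] != REG_FREEABLE) {
        return -1;                   /* assumption: busy/allocated cannot be forced */
    }
    ra->state[reg] = REG_BUSY;
    return reg;
}

void set_register_freeable (RegAlloc *ra, int reg) { ra->state[reg] = REG_FREEABLE; }
void set_register_free (RegAlloc *ra, int reg)     { ra->state[reg] = REG_FREE; }

/* Spill live contents (assumption: only if live), then mark the register freeable. */
void will_clobber (RegAlloc *ra, int reg)
{
    if (ra->state[reg] == REG_MOVEABLE || ra->state[reg] == REG_BUSY)
        ra->spills++;
    ra->state[reg] = REG_FREEABLE;
}

/* Reload a spilled value; it may live in any register, so it is moveable. */
void register_unspill (RegAlloc *ra, int reg) { ra->state[reg] = REG_MOVEABLE; }

/* Replays: store (local1, shl (local1, call (some_arg))) */
RegAlloc replay_shl_example (void)
{
    RegAlloc ra;
    regalloc_init (&ra);
    int val = get_register (&ra);        /* eax = load local1 */
    assert (val == EAX);
    for (int r = 0; r < NUM_REGS; r++)   /* the call clobbers eax, ecx, edx */
        will_clobber (&ra, r);           /* only eax is live, so one spill */
    for (int r = 0; r < NUM_REGS; r++)
        set_register_free (&ra, r);      /* the clobbering has occurred */
    get_register_force (&ra, EAX);       /* the call result arrives in eax */
    get_register_force (&ra, ECX);       /* shl needs its count in ecx */
    set_register_free (&ra, EAX);        /* after "mov %eax, %ecx" */
    register_unspill (&ra, EAX);         /* reload the spilled local1 value */
    set_register_free (&ra, ECX);        /* after "shl %cl, %eax" */
    return ra;
}
```

Calling `replay_shl_example ()` walks through the shl-after-call sequence and leaves %eax moveable with %ecx and %edx free, at the cost of a single spill. Passing the state explicitly rather than through globals is only a convenience for the sketch; the proposal's signatures take no allocator argument.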
- -* Examples - - Given the tree (on x86 the right argument to shl needs to be in ecx): - - store (local1, shl (local1, call (some_arg))) - - At the start of the basic block, the registers are set to the free state. - The sequence of instructions may be: - instruction register status -> [%eax %ecx %edx] - start free free free - eax = load local1 mov free free - /* call clobbers eax, ecx, edx */ - spill eax free free free - call mov free free - /* now eax contains the right operand of the shl */ - mov %eax -> %ecx free busy free - un-spill mov busy free - shl %cl, %eax mov free free - - The resulting x86 code is: - mov $fffc(%ebp), %eax - mov %eax, $fff0(%ebp) - push some_arg - call func - mov %eax, %ecx - mov $fff0(%ebp), %eax - shl %cl, %eax - - Note that since shl could operate directly on memory, we could have: - - push some_arg - call func - mov %eax, %ecx - shl %cl, $fffc(%ebp) - - The above example with loading the operand in a register is just to complicate - the example and show that the algorithm should be able to handle it. - - Let's take another example with the this-call call convention (the first argument - is passed in %ecx). - In this case, will_clobber() will be called only on %eax and %edx, while %ecx - will be allocated with get_register_force (). - Note: when a register is allocated with get_register_force(), it should be set - to a different state as soon as possible. - - store (local1, shl (local1, this-call (local1))) - - instruction register status -> [%eax %ecx %edx] - start free free free - eax = load local1 mov free free - /* force load in %ecx */ - ecx = load local1 mov busy free - spill eax free busy free - call mov free free - /* now eax contains the right operand of the shl */ - mov %eax -> %ecx free busy free - un-spill mov busy free - shl %cl, %eax mov free free - - What happens when a register that we need to allocate with get_register_force () - contains an operand for the next instruction? 
- - instruction register status -> [%eax %ecx %edx] - eax = load local0 mov free free - ecx = load local1 mov mov free - get_register_force (ecx) here. - We have two options: - mov %ecx, %edx - or: - spill %ecx - The first option is way better (and allows the peephole pass to - just load the value in %edx directly, instead of loading first to %ecx). - This doesn't work, though, if the instruction clobbers the %edx register - (like in a this-call). So, we first need to clobber the registers - (so the state of %ecx changes to freebale and there is no issue - with get_register_force ()). - What if an instruction both clobbers a register and requires it as - an operand? Lets' take the x86 idiv instruction as an example: it - requires the dividend in edx:eax and returns the result in eax, - with the modulus in edx. - - store (local1, div (local1, local2)) - - instruction register status -> [%eax %ecx %edx] - eax = load local0 mov free free - will_clobber eax, edx free mov free - force mov %ecx, %eax busy free free - set %edx busy free busy - idiv mov free free - - Note: edx is set to free after idiv, because the modulus is not needed - (if it was a rem, eax would have been freed). - If we load the divisor before will_clobber(), we'll have to spill - eax and reload it later. If we load it just after the idiv, there is no issue. - In any case, the algorithm should give the correct results and allow the operation. - - Working recursively on the isntructions there shouldn't be huge issues - with this algorithm (though, of course, it's not optimal and it may - introduce excessive spills or register moves). The advantage over the current - local reg allocator is that: - 1) the number of spills/moves would be smaller anyway - 2) a separate peephole pass could be able to eliminate reg moves - 3) we'll be able to remove the 'forced' spills we currently do with - the return value of method calls - -* Issues - - How to best integrate such a reg allocator with the burg stuff. 
- - Think about a call os sparc with two arguments: they got into %o0 and %o1 - and each of them sets the register as busy. But what if the values to put there - are themselves the result of a call? %o0 is no problem, but for all the - next argument n the above algorithm would spill all the 0...n-1 registers... - -* Papers - - More complex solutions to the local register allocator problem: - http://dimacs.rutgers.edu/TechnicalReports/abstracts/1997/97-33.html - - Combining register allocation and instruction scheduling: - http://citeseer.nj.nec.com/motwani95combining.html - - More on LRA euristics: - http://citeseer.nj.nec.com/liberatore97hardness.html - - Linear-time optimal code scheduling for delayedload architectures - http://www.cs.wisc.edu/~fischer/cs701.f01/inst.sched.ps.gz - - Precise Register Allocation for Irregular Architectures - http://citeseer.nj.nec.com/kong98precise.html - - Allocate registers first to subtrees that need more of them. - http://www.upb.de/cs/ag-kastens/compii/folien/comment401-409.2.pdf diff --git a/docs/mono_handle_d b/docs/mono_handle_d deleted file mode 100644 index a8f97b141c3..00000000000 --- a/docs/mono_handle_d +++ /dev/null @@ -1,98 +0,0 @@ -=pod - -=head1 Internal design document for the mono_handle_d - -This document is designed to hold the design of the mono_handle_d and -not as an api reference. - -=head2 Primary goal and purpose - -The mono_handle_d is a process which takes care of the (de)allocation -of scratch shared memory and handles (of files, threads, mutexes, -sockets etc. see L<WapiHandleType>) and refcounts of the -filehandles. It is designed to be run by a user and to be fast, thus -minimal error checking on input is done and will most likely crash if -given a faulty package. No effort has been, or should be, made to have -the daemon talking to machine of different endianness/size of int. 
- -=head2 How to start the daemon - -To start the daemon you either run the mono_handle_d executable or try -to attach to the shared memory segment via L<_wapi_shm_attach> which -will start a daemon if one does not exist. - -=head1 Internal details - -The daemon works by opening a socket and listening to clients. These -clients send packages over the socket complying to L<struct -WapiHandleRequest>. - -=head2 Possible requests - -=over - -=item WapiHandleRequest_New - -Find a handle in the shared memory segment that is free and allocate -it to the specified type. To destroy use -L</WapiHandleRequest_Close>. A L<WapiHandleResponse> with -.type=WapiHandleResponseType_New will be sent back with .u.new.handle -set to the handle that was allocated. .u.new.type is the type that was -requested. - -=item WapiHandleRequestType_Open - -Increase the ref count of an already created handle. A -L<WapiHandleResponse> with .type=WapiHandleResponseType_Open will be sent -back with .u.new.handle set to the handle, .u.new.type is set to the -type of handle this is. - -=item WapiHandleRequestType_Close - -Decrease the ref count of an already created handle. A -L<WapiHandleResponse> with .type=WapiHandleResponseType_Close will be -sent back with .u.close.destroy set to TRUE if ref count for this -client reached 0. - -=item WapiHandleRequestType_Scratch - -Allocate a shared memory area of size .u.scratch.length in bytes. A -L<WapiHandleResponse> with .type=WapiHandleResponseType_Scratch will be -sent back with .u.scratch.idx set to the index into the shared -memory's scratch area where to memory begins. (works just like -malloc(3)) - -=item WapiHandleRequestType_Scratch - -Deallocate a shared memory area, this must have been allocated before -deallocating. A L<WapiHandleResponse> with -.type=WapiHandleResponseType_ScratchFree will be sent back (works just -like free(3)) - -=back - -=head1 Why a daemon - -From an email: - -Dennis: I just have one question about the daemon... 
Why does it -exist? Isn't it better performancewise to just protect the shared area -with a mutex when allocation a new handle/shared mem segment or -changing refcnt? It will however be a less resilient to clients that -crash (the deamon cleans up ref'd handles if socket closes) - -Dick: It's precisely because with a mutex the shared memory segment -can be left in a locked state. Also, it's not so easy to clean up -shared memory without it (you can't just mark it deleted when creating -it, because you can't attach any more readers to the same segment -after that). I did some minimal performance testing, and I don't -think the daemon is particularly slow. - - -=head1 Authors - -Documentaion: Dennis Haney - -Implementation: Dick Porter - -=cut diff --git a/docs/new-regalloc b/docs/new-regalloc deleted file mode 100644 index b687c2b50c6..00000000000 --- a/docs/new-regalloc +++ /dev/null @@ -1,68 +0,0 @@ -We need to switch to a new register allocator. -The current one is split in a global and a local register allocator. -The global one can assign only callee-saves registers and happens -on the tree-based internal representation: it assigns local variables -to hardware registers. -The local one happens on the linear representation on a per basic -block basis and assigns hard registers to virtual registers (which -hold temporary values during expression executions) and it deals also -with the platform-specific issues (fixed registers, call conventions). - -Moving to a different register will help solve some of the performance -issues introduced by the above split, make the register more easily -portable and solve some of the issues generated by dealing with trees. - -The general design ideas are below. - -The new allocator should have a global view of all the method, so it can be -able to assign variables also to some of the volatile registers if possible, -even across basic blocks (this would improve performance). 
- -The allocator would be driven by per-arch declarative data, so porting -should be easier: an architecture needs to specify register classes, -call convention and instructions requirements (similar to the gcc code). - -The allocator should operate on the linear representation, this way it's -easier and faster to track usages more correctly. We need to assign virtual -registers on a per-method basis instead of per basic block. We can assign -virtual registers to variables, too. Note that since we fix the stack offset -of local vars only after this step (which happens after the burg rules are run), -some of the burg rules that try to optimize the code won't apply anymore: -the peephole code may need to be enhanced to do the optimizations instead. - -We need to handle floating point registers in the global allocator, too. - -The new allocator also needs to keep track precisely of which registers -contain references or managed pointers to allow us to move to a precise GC. - -It may be worth to use a single increasing set of integers for the virtual -registers, with the class of the register stored separately (unless the -current local allocator which keeps interger and fp registers separate). - -Since this is a large task, we need to do it in steps as much as possible. -The first is to run the register allocator _after_ the burg rules: this -requires a rewrite of the liveness code, too, to use linear indexes instead -of basic-block/tree number combinations. This can be done by: -*) allocating virtual regs to all the locals that can be register allocated -*) running the burg rules (some may require adjustments): the local virtual -registers are assigned starting from global-virt-regs+1, instead of the current -hardware-regs+1, so we can tell apart global and local virt regs. 
-*) running the liveness/whatever code is needed to allocate the global registers -*) allocate the rest of the local variables to stack slots -*) continue with the current local allocator - -This work could take 2-3 weeks. - -The next step is to define the kind of declarative data an architecture needs -and assigning virtual regs to all the registers and making the allocator -assign from the volatile registers, too. -Note that some of the code that is currently emitted in the arch-specific -code, will need to be emitted as instructions that the reg allocator -can inspect: think of a method that returns the first argument which is -received in a register: the current code copies it to either a local slot or -to a global reg in the prolog an copies it back to the return register -int he basic block, but since neither the regallocator nor the peephole code -knows about the prolog code, the first store cannot be optimized away. -The gcc code has some example of how to specify register classes in a -declarative way. - diff --git a/docs/opcode-decomp.txt b/docs/opcode-decomp.txt deleted file mode 100644 index 48968d17ab9..00000000000 --- a/docs/opcode-decomp.txt +++ /dev/null @@ -1,113 +0,0 @@ - -* How to handle complex IL opcodes in an arch-independent way - - Many IL opcodes are very simple: add, ldind etc. - Such opcodes can be implemented with a single cpu instruction - in most architectures (on some, a group of IL instructions - can be converted to a single cpu op). - There are many IL opcodes, though, that are more complex, but - can be expressed as a series of trees or a single tree of - simple operations. Such simple operations are architecture-independent. 
- It makes sense to decompose such complex IL instructions in their - simpler equivalent so that we gain in several ways: - *) porting effort is easier, because only the simple instructions - need to be implemented in arch-specific code - *) we could apply BURG rules to the trees and do pattern matching - on them to optimize the expressions according to the host cpu - - The issue is: where do we do such conversion from coarse opcodes to - simple expressions? - -* Doing the conversion in method_to_ir () - - Some of these conversions can certainly be done in method_to_ir (), - but it's not always easy to decide which are better done there and - which in a different pass. - For example, let's take ldlen: in the mono implementation, ldlen - can be simply implemented with a load from a fixed position in the - array object: - - len = [reg + maxlen_offset] - - However, ldlen carries also semantics information: the result is the - length of the array, and since in the CLR arrays are of fixed size, - this information can be useful to later do bounds check removal. - If we convert this opcode in method_to_ir () we lost some useful - information for further optimizations. - - In some other ways, decomposing an opcode in method_to_ir() may - allow for better optimizations later on (need to come up with an - example here ...). - -* Doing the conversion in inssel.brg - - Some conversion may be done inside the burg rules: this has the - disadvantage that the instruction selector is not run again on - the resulting expression tree and we could miss some optimization - (this is what effectively happens with the coarse opcodes in the old - jit). This may also interfere with an efficient local register allocator. 
- It may be possible to add an extension in monoburg that allows a rule - such as: - - recheck: LDLEN (reg) { - create an expression tree representing LDLEN - and return it - } - - When the monoburg label process gets back a recheck, it will run - the labeling again on the resulting expression tree. - If this is possible at all (and in an efficient way) is a - question for dietmar:-) - It should be noted, though, that this may not always work, since - some complex IL opcodes may require a series of expression trees - and handling such cases in monoburg could become quite hairy. - For example, think of opcode that need to do multiple actions on the - same object: this basically means a DUP... - On the other end, if a complex opcode needs a DUP, monoburg doesn't - actually need to create trees if it emits the instructions in - the correct sequence and maintains the right values in the registers - (usually the values that need a DUP are not changed...). How - this integrates with the current register allocator is not clear, since - that assigns registers based on the rule, but the instructions emitted - by the rules may be different (this already happens with the current JIT - where a MULT is replaced with lea etc...). - -* Doing it in a separate pass. - - Doing the conversion in a separate pass over the instructions - is another alternative. This can be done right after method_to_ir () - or after the SSA pass (since the IR after the SSA pass should look - almost like the IR we get back from method_to_ir ()). 
- - This has the following advantages: - *) monoburg will handle only the simple opcodes (makes porting easier) - *) the instruction selection will be run on all the additional trees - *) it's easier to support coarse opcodes that produce multiple expression - trees (and apply the monoburg selector on all of them) - *) the SSA optimizer will see the original opcodes and will be able to use - the semantic info associated with them - - The disadvantage is that this is a separate pass on the code and - it takes time (how much has not been measured yet, though). - - With this approach, we may also be able to have C implementations - of some of the opcodes: this pass would insert a function call to - the C implementation (for example in the cases when first porting - to a new arch and implemenating some stuff may be too hard in asm). - -* Extended basic blocks - - IL code needs a lot of checks, bounds checks, overflow checks, - type checks and so on. This potentially increases by a lot - the number of basic blocks in a control flow graph. However, - all such blocks end up with a throw opcode that gives control to the - exception handling mechanism. - After method_to_ir () a MonoBasicBlock can be considered a sort - of extended basic block where the additional exits don't point - to basic blocks in the same procedure (at least when the method - doesn't have exception tables). - We need to make sure the passes following method_to_ir () can cope - with such kinds of extended basic blocks (especially the passes - that we need to apply to all the methods: as a start, we could - skip SSA optimizations for methods with exception clauses...) - diff --git a/docs/reactive-extension-bundle.txt b/docs/reactive-extension-bundle.txt deleted file mode 100644 index 175818a63be..00000000000 --- a/docs/reactive-extension-bundle.txt +++ /dev/null @@ -1,49 +0,0 @@ -With this change, we bundle Reactive Extensions from Microsoft. 
- -Steps to do: - -- Until we add submodule, check out Rx sources from http://rx.codeplex.com: - - $ cd external - $ git clone git://github.com/atsushieno/rx.git - $ cd rx - $ git checkout rx-oss-v1.0 - $ cd ../.. - - Note that the original repo at rx.codeplex.com will *fail* on Linux! - codeplex.codeplex.com/workitem/26133 - Also note that rx.codeplex.com is huge and takes very long time to checkout. - -- expand rx-mono-changes-3.tar.bz2 - - $ tar jxvf rx-mono-changes-3.tar.bz2 - -- Apply changes to mcs/class/Makefile: - - $ cd mcs/class - $ patch -i add-rx-libs.patch -p3 - $ cd ../.. - -Then it should be done. - -Note that this does not include Mono.Reactive.Testing into the build yet - -this library depends on nunit.framework.dll but it wouldn't be built before -this assembly is built. This needs to be resolved. - -** Current Status - -- We don't have Microsoft.Reactive.Testing.dll. Instead, I created an - alternative Mono.Reactive.Testing.dll which *mostly* uses MS sources for - that assembly but uses NUnit.Framework instead. - - To make it happen, I added a small script that automatically replaces - MSTest dependency parts with that for NUnit (replacer.sh under rx tree). - - (We'll also have to rename namespaces and have more source changes, but - so far it is to get things runnable.) - -- To check the build sanity, I imported unit tests (as explained above) - and it is supposed to run by "make run-test" in Mono.Reactive.Testing - directory (the tests were all in one place in MS tests, so I made it - in Mono.Reactive.Testing directory instead). - diff --git a/docs/release-notes-1.0.html b/docs/release-notes-1.0.html deleted file mode 100644 index 9c305433c51..00000000000 --- a/docs/release-notes-1.0.html +++ /dev/null @@ -1,16 +0,0 @@ -<h1>Mono 1.0 Release Notes</h1> - -<h2>What does Mono Include</h2> - -<h2>Missing functionality</h2> - - <p>COM support. - - <p>EnterpriseServices are non-existant. 
- - <p>Windows.Forms is only available as a preview, it is not - completed nor stable. - -<h3>Assembly: System.Drawing</h3> - - <p>System.Drawing.Printing is not supported.
\ No newline at end of file diff --git a/docs/stack-alignment b/docs/stack-alignment deleted file mode 100644 index da995fb288f..00000000000 --- a/docs/stack-alignment +++ /dev/null @@ -1,33 +0,0 @@ -Size and alignment requirements of stack values -=============================================== - -P ... System.IntPtr -I1 ... System.Int8 -I2 ... System.Int16 -I4 ... System.Int32 -I8 ... System.Int64 -F ... System.Single -D ... System.Double -LD ... native long double - ------------------------------------------------------------ -ARCH | P | I1 | I2 | I4 | I8 | F | D | LD | ------------------------------------------------------------ -X86 | 4/4 | 4/4 | 4/4 | 4/4 | 8/4 | 4/4 | 8/4 |12/4 | ------------------------------------------------------------ -X86/W32 | 4/4 | 4/4 | 4/4 | 4/4 | 8/4 | 4/4 | 8/4 |12/4 | ------------------------------------------------------------ -ARM | 4/4 | 4/4 | 4/4 | 4/4 | 8/4 | 4/4 | 8/4 | 8/4 | ------------------------------------------------------------ -M68K | 4/4 | 4/4 | 4/4 | 4/4 | 8/4 | 4/4 | 8/4 |12/4 | ------------------------------------------------------------ -ALPHA | 8/8 | 8/8 | 8/8 | 8/8 | 8/8 | 8/8 | 8/8 | 8/8 | ------------------------------------------------------------ -SPARC | 4/4 | 4/4 | 4/4 | 4/4 | 8/8 | 4/4 | 8/8 |16/8 | ------------------------------------------------------------ -SPARC64 | 8/8 | 8/8 | 8/8 | 8/8 | 8/8 | 8/8 | 8/8 |16/16| ------------------------------------------------------------ -MIPS | 4/4 | 4/4 | 4/4 | 4/4 | ?/? | 4/4 | 8/8 | 8/8 | ------------------------------------------------------------ - | | | | | | | | | ------------------------------------------------------------ diff --git a/docs/tree-mover.txt b/docs/tree-mover.txt deleted file mode 100644 index 3ee836a5b2f..00000000000 --- a/docs/tree-mover.txt +++ /dev/null @@ -1,261 +0,0 @@ - -Purpose - -Especially when inlining is active, it can happen that temporary -variables add pressure to the register allocator, producing bad -code. 
-
-The idea is that some of these temporaries can be totally eliminated
-by moving the MonoInst tree that defines them directly to the use
-point in the code (hence the name "tree mover").
-
-Please note that this is *not* an optimization: it is mostly a
-workaround to issues we have in the regalloc.
-Actually, with the new linear IR this will not be possible at all
-(there will be no more trees in the code!).
-Anyway, this workaround turns out to be useful in the current state
-of things...
-
------------------------------------------------------------------------
-
-Base logic
-
-If a local is defined by a value which is a proper expression (a tree
-of MonoInst, not just another local or a constant), and this definition
-is used only once, the tree can be moved directly to the use location,
-and the definition eliminated.
-Of course, no variable used in the tree may be redefined in
-the code path between the definition and the use, and the tree must be
-free of side effects.
-We do not handle the cases when the tree is just a local or a constant
-because they are handled by copyprop and consprop, respectively.
-
-To make things simpler, we restrict the tree move to the case when:
-- the definition and the use are in the same BB, and
-- the use is followed by another definition in the same BB (it is not
-  possible that the 1st value is used again), or alternatively there
-  is no BB in the whole CFG that contains a use of this local before a
-  definition (so, again, there is no code path that can lead to a
-  subsequent use).
-
-To handle this, we maintain an ACT array (Available Copy Tree, similar
-to the ACP), where we store the "state" of every local.
-Ideally, every local can be in one of the following states:
-[E] Undefined (by a tree, it could be in the ACP but we don't care).
-[D] Defined (by a tree), and waiting for a use.
-[U] Used, with a tree definition available in the same BB, but still
-    without a definition following the use (always in the same BB).
-Of course state [E] (empty) is the initial one.
-
-Besides, there are two sorts of "meta states", or flags:
-[W] Still waiting for a use or definition in this BB (we have seen no
-    occurrence of the local yet).
-[X] Used without being previously defined in the same BB (note that if
-    there is a definition that precedes the use in the same BB, even if
-    the definition is not a tree or is not available because of side
-    effects or because the tree value has changed, the local is not in
-    state [X]).
-Also note that state [X] is a sort of "global" condition, which if set
-in one BB will stay valid for the whole CFG, even if the local will
-otherwise change state. The idea of flagging a local as [X] is that if
-there is a definition/use pair that reaches the end of a BB, it could
-be that there is a CFG path that then leads to the BB flagging it as
-[X] (which contains a use), so the tree cannot be moved.
-So state [X] will only ever be set, and never examined, in all the
-state transitions we will describe.
-In practice, we use flag [W] to set state [X]: if, when traversing a
-BB, we find a use for a local in state [W], then that local is flagged
-[X].
-
-
-For each BB, we initialize all states to [E] and [W], and then we
-traverse the code one inst at a time, and update the variable states
-in the ACT in the following ways:
-
-[Definition]
-  - Flag [W] is cleared.
-  - All "affected trees" are killed (go from state [D] to [E]).
-    The "affected trees" are the trees which contain (use) the defined
-    local, and the rationale is that the tree value changed, so the
-    tree is no longer available.
-  - If the local was in state [U], *that* tree move is marked "safe"
-    (because *this* definition makes us sure that the previous tree
-    cannot be used again in any way).
-    The idea is that "safe" moves can happen even if the local is
-    flagged [X], because the second definition "covers" the use.
-    The tree move is then saved in the "todo" list (and the affecting
-    nodes are cleared).
-  - If the local was defined by a tree, it goes to state [D], the tree
-    is recorded, and all the locals used in it are marked as "affecting
-    this tree" (of course these markers are lists, because each local
-    could affect more than one tree).
-
-[IndirectDefinition]
-  - All potentially affected trees (in state [D]) are killed.
-
-[Use]
-  - If the local is still [W], it is flagged [X] (the [W] goes away).
-  - If the local is in state [D], it goes to state [U].
-    The tree move must not yet be recorded in the "todo" list; it still
-    stays in the ACT slot belonging to this local.
-    Anyway, the "affecting" nodes are updated, because now a definition
-    of a local used in this tree will affect only "indirect" (or also
-    "propagated") moves, but not *this* move (see below).
-  - If the local is in state [U], then the tree cannot be moved (it is
-    used two times): the move is canceled, and the state goes to [E].
-  - If the local is in state [E], the use is ignored.
-
-[IndirectUse]
-  - All potentially affected trees (in state [D] or [U]) are killed.
-
-[SideEffect]
-  - Tree is marked as "unmovable".
-
-Then, at the end of the BB, for each ACT slot:
-  - If state is [U], the tree move is recorded in the "todo" list, but
-    flagged "unsafe".
-  - Anyway, state goes to [E], the [W] flag is set, and all "affecting"
-    lists are cleared (we get ready to traverse the next BB).
-Finally, when all BBs have been scanned, we traverse the "todo" list,
-moving all "safe" entries, and moving "unsafe" ones only if their ACT
-slot is not flagged [X].
-
-So far, so good.
-But there are two issues that make things harder :-(
-
-The first is the concept of "indirect tree move".
-It can happen that a tree is scheduled for moving, and its destination
-is a use that is located in a second tree, which could also be moved.
-The main issue is that a definition of a variable of the 1st tree on
-the path between the definition and the use of the 2nd one must prevent
-the move.
-But which move?
The 1st or the 2nd?
-Well, either of the two!
-The point is, the 2nd move must be prevented *only* if the 1st one
-happens: if it is aborted (for an [X] flag or any other reason), the
-2nd move is OK, and vice versa...
-We must handle this in the following way:
-- The ACT must still remember if a slot is scheduled for moving in
-  this BB, and if it is, all the locals used in the tree.
-  We say that the slot is in state [M].
-  Note that [M] is (like [X] and [W]) a sort of "meta state": a local
-  is flagged [M] when it goes to state [U], and the flag is cleared
-  when the tree move is cancelled.
-- A tree that uses a local whose slot is in state [M] is also using all
-  the locals used by the tree in state [M], but the use is "indirect".
-  These use nodes are also included in the "affecting" lists.
-- The definition of a variable used in an "indirect" way has the
-  effect of "linking" the two involved tree moves, saying that only one
-  of the two can happen in practice, but not both.
-- When the 2nd tree is scheduled for moving, the 1st one is *still* in
-  state [M], because a third move could "carry it forward", and all the
-  *three* moves should be mutually exclusive (to be safe!).
-
-The second tricky complication is the "tree forwarding" that can happen
-when copyprop is involved.
-It is conceptually similar to the "indirect tree move".
-Only, the 2nd tree is not really a tree, it is just the local defined
-in the 1st tree move.
-It can happen that copyprop will propagate the definition.
-We cannot make treeprop do the same job of copyprop, because copyprop
-has fewer constraints, and is therefore more powerful in its scope.
-The main issue is that treeprop cannot propagate a tree to *two* uses,
-while copyprop is perfectly capable of propagating one definition to
-two (or more) different places.
-So we must let copyprop do its job otherwise we'll miss optimizations,
-but we must also make it play safe with treeprop.
-Let's clarify with an example:
-  a = v1 + v2; //a is defined by a tree, state [D], uses v1 and v2
-  b = a;       //a is used, state [U] with move scheduled, and
-               //b is defined by a, ACP[b] is a, and b is in state [DC]
-  c = b + v3;  // b is used, goes to state [U]
-The real trouble is that copyprop happens *immediately*, while treeprop
-is deferred to the end of the CFG traversal.
-So, in the 3rd statement, the "b" is immediately turned into an "a" by
-copyprop, regardless of what treeprop will do.
-Anyway, if we are careful, this is not so bad.
-First of all, we must "accept" the fact that in the 3rd statement the
-"b" is in fact an "a", as treeprop must happen *after* copyprop.
-The real problem is that "a" is used twice: in the 2nd and 3rd lines.
-In our usual setup, the 2nd line would set it to [U], and the 3rd line
-would kill the move (and set "a" to [E]).
-I have tried to play tricks, and reason as if copyprop didn't happen,
-but everything becomes really messy.
-Instead, we should note that the 2nd line is very likely to be dead.
-At least in this BB, copyprop will turn all "b"s into "a"s as long as
-it can, and when it cannot, it will be because either "a" or "b" has
-been redefined, which would be after the tree move anyway.
-So, the reasoning gets different: let's pretend that "b" will be dead.
-This will make the "a" use in the 2nd statement useless, so there we
-can "reset" "a" to [D], but also take note that if "b" ends up
-not being dead, the tree move associated to this [D] must be aborted.
-We can detect this in the following way:
-- Either "b" is used before being defined in this BB, or
-- It will be flagged "unsafe".
-Both things are very easy to check.
-The only quirk is that the "affecting" lists must not be cleared when
-a slot goes to state [U], because a "propagation" could put it back
-to state [D] (where those lists are needed, because it can be killed
-by a definition to a used slot).
-
------------------------------------------------------------------------
-
-Implementation notes
-
-All the implementation runs inside the existing mono_local_cprop
-function, and a separate memory pool is used to hold the temporary
-data.
-
-A struct, MonoTreeMover, contains the pointers to the pool, the ACT,
-the list of scheduled moves and auxiliary things.
-This struct is allocated if the tree move pass is requested, and is
-then passed along to all the involved functions, which are therefore
-aware of the tree mover state.
-
-The ACT is an array of slots, obviously one per local.
-Each slot is of type MonoTreeMoverActSlot, and contains the used and
-affected locals, a pointer to the pending tree move and the "waiting"
-and "unsafe" flags.
-
-The "affecting" lists are built from "dependency nodes", of type
-MonoTreeMoverDependencyNode.
-Each of the nodes contains the used and affected local, and is in
-two lists: the locals used by a slot, and the locals affected by a
-slot (obviously a different one).
-So, each node means: "variable x is used in tree t, so a definition
-of x affects tree t".
-The "affecting" lists are doubly linked, to allow for O(1) deletion.
-The "used" lists are simply linked, but when they are maintained there
-is always a pointer to the last element to allow for O(1) list moving.
-When a used list is dismissed (which happens often, any time a node is
-killed), its nodes are unlinked from their respective affecting lists
-and are then put in a "free" list in the MonoTreeMover to be reused.
-
-Each tree move is represented by a struct (MonoTreeMoverTreeMove),
-which contains:
-- the definition and use points,
-- the "affected" moves (recall the concept of "indirect tree move"),
-- the "must be dead" slots (recall "tree forwarding"), and
-- a few utility flags.
-The tree move stays in the relevant ACT slot until it is ready to be
-scheduled for moving, at which point it is put in a list in the
-MonoTreeMover.
-The tree move structs are reused when they are killed, so there is
-also a "free" list for them in the MonoTreeMover.
-
-The tree mover code has been added to all the relevant functions that
-participate in consprop and copyprop, particularly:
-- mono_cprop_copy_values takes care of variable uses (transitions from
-  states [D] to [U] and [U] to [E] because of killing),
-- mono_cprop_invalidate_values takes care of side effects (indirect
-  accesses, calls...),
-- mono_local_cprop_bb sets up and cleans the traversals for each BB,
-  and for each MonoInst it takes care of variable definitions.
-Each of them has gained a MonoTreeMover parameter, which is not
-NULL if the tree mover is running.
-After mono_local_cprop_bb has run for all BBs, the MonoTreeMover has
-the list of all the pending moves, which must be walked to actually
-perform the moves (when possible, because "unsafe" flags, "affected"
-moves and "must be dead" slots can still have their effects, which
-must be handled now because they are fully known only at the end of
-the CFG traversal). |
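
The [E]/[D]/[U] state machine with the [W] and [X] flags, as described in the "Base logic" part of the removed tree-mover.txt, can be illustrated with a small Python model. This is only a sketch under simplifying assumptions and is not Mono's code: the function name `plan_tree_moves`, the `(target, reads, is_tree)` instruction tuples and all variable names are hypothetical, and side effects, indirect definitions and uses, "indirect tree moves" and "tree forwarding" are left out entirely.

```python
from collections import defaultdict

def plan_tree_moves(blocks):
    """Return the (local, bb, def_idx, use_idx) moves that would be performed.

    blocks: list of basic blocks, each a list of (target, reads, is_tree).
    target is the defined local (or None), reads are the locals read by the
    instruction, is_tree says whether the defining value is a proper tree.
    """
    flagged_x = set()  # [X]: used before any definition in some BB (global)
    todo = []          # (local, bb, def_idx, use_idx, safe)

    for bb, insts in enumerate(blocks):
        state = defaultdict(lambda: "E")  # every local starts in [E]
        seen = set()                      # locals that are no longer [W]
        pending = {}                      # local -> [def_idx, use_idx]
        affecting = defaultdict(set)      # operand -> locals whose tree reads it

        def forget(local):
            # Drop 'local' from every "affecting" list.
            for trees in affecting.values():
                trees.discard(local)

        for idx, (target, reads, is_tree) in enumerate(insts):
            for r in reads:                    # [Use]
                if r not in seen:              # still [W]: flag [X]
                    seen.add(r)
                    flagged_x.add(r)
                if state[r] == "D":            # first use: [D] -> [U]
                    state[r] = "U"
                    pending[r][1] = idx
                    forget(r)
                elif state[r] == "U":          # second use: cancel the move
                    state[r] = "E"
                    del pending[r]
            if target is None:
                continue
            seen.add(target)                   # [Definition]: clear [W]
            for t in list(affecting[target]):  # kill affected trees
                if state[t] == "D":
                    state[t] = "E"
                    del pending[t]
                    forget(t)
            if state[target] == "U":           # redefinition covers the use
                d, u = pending.pop(target)
                todo.append((target, bb, d, u, True))
            state[target] = "E"
            pending.pop(target, None)
            forget(target)
            if is_tree:                        # record the new tree: [D]
                state[target] = "D"
                pending[target] = [idx, None]
                for r in reads:
                    affecting[r].add(target)

        for local, (d, u) in pending.items():  # end of BB: [U] is "unsafe"
            if state[local] == "U":
                todo.append((local, bb, d, u, False))

    # Safe moves always happen; unsafe ones only if the local is not [X].
    return [(l, b, d, u) for (l, b, d, u, safe) in todo
            if safe or l not in flagged_x]
```

For example, the block `a = v1 + v2; c = a + v3; a = 0` yields one safe move of the tree defining `a` into the second statement, while dropping the final `a = 0` and adding a later block that reads `a` turns the move into an unsafe one that is discarded because `a` is flagged [X].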