Welcome to mirror list, hosted at ThFree Co, Russian Federation.

github.com/mono/mono.git - Unnamed repository; edit this file 'description' to name the repository.
summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
Diffstat (limited to 'docs/local-regalloc.txt')
-rw-r--r--docs/local-regalloc.txt208
1 files changed, 208 insertions, 0 deletions
diff --git a/docs/local-regalloc.txt b/docs/local-regalloc.txt
new file mode 100644
index 00000000000..a6e523557fe
--- /dev/null
+++ b/docs/local-regalloc.txt
@@ -0,0 +1,208 @@
+
+* Proposal for the local register allocator
+
+ The local register allocator deals with allocating registers
+ for temporaries inside a single basic block, while the global
+ register allocator is concerned with method-wide allocation of
+ variables.
+ The global register allocator uses callee-saved register for it's
+ purpouse so that there is no need to save and restore these registers
+ at call sites.
+
+ There are a number of issues the local allocator needs to deal with:
+ *) some instructions expect operands in specific registers (for example
+ the shl instruction on x86, or the call instruction with thiscall
+ convention, or the equivalent call instructions on other architectures,
+ such as the need to put output registers in %oX on sparc)
+ *) some instructions deliver results only in specific registers (for example
+ the div instruction on x86, or the call instructionson on almost all
+ the architectures).
+ *) it needs to know what registers may be clobbered by an instruction
+ (such as in a method call)
+ *) it should avoid excessive reloads or stores to improve performance
+
+ While which specific instructions have limitations is architecture-dependent,
+ the problem shold be solved in an arch-independent way to reduce code duplication.
+ The register allocator will be 'driven' by the arch-dependent code, but it's
+ implementation should be arch-independent.
+
+ To improve the current local register allocator, we need to
+ keep more state in it than the current setup that only keeps busy/free info.
+
+ Possible state information is:
+
+ free: the resgister is free to use and it doesn't contain useful info
+ freeable: the register contains data loaded from a local (there is
+ also info about _which_ local it contains) as a result from previous
+ instructions (like, there was a store from the register to the local)
+ moveable: it contains live data that is needed in a following instruction, but
+ the contents may be moved to a different register
+ busy: the register contains live data and it is placed there because
+ the following instructions need it exactly in that register
+ allocated: the register is used by the global allocator
+
+ The local register allocator will have the following interfaces:
+
+ int get_register ();
+ Searches for a register in the free state. If it doesn't find it,
+ searches for a freeable register. Sets the status to moveable.
+ Looking for a 'free' register before a freeable one should allow for
+ removing a few redundant loads (though I'm still unsure if such
+ things should be delegated entirely to the peephole pass).
+
+ int get_register_force (int reg);
+ Returns 'reg' if it is free or freeable. If it is moveable, it moves it
+ to another free or freeable register.
+ Sets the status of 'reg' to busy.
+
+ void set_register_freeable (int reg);
+ Sets the status of 'reg' to freeable.
+
+ void set_register_free (int reg);
+ Sets the status of 'reg' to free.
+
+ void will_clobber (int reg);
+ Spills the register to the stack. Sets the status to freeable.
+ After the clobbering has occurred, set the status to free.
+
+ void register_unspill (int reg);
+ Un-spills register reg and sets the status to moveable.
+
+ FIXME: how is the 'local' information represented? Maybe a MonoInst* pointer.
+
+ Note: the register allocator will insert instructions in the basic block
+ during it's operation.
+
+* Examples
+
+ Given the tree (on x86 the right argument to shl needs to be in ecx):
+
+ store (local1, shl (local1, call (some_arg)))
+
+ At the start of the basic block, the registers are set to the free state.
+ The sequence of instructions may be:
+ instruction register status -> [%eax %ecx %edx]
+ start free free free
+ eax = load local1 mov free free
+ /* call clobbers eax, ecx, edx */
+ spill eax free free free
+ call mov free free
+ /* now eax contains the right operand of the shl */
+ mov %eax -> %ecx free busy free
+ un-spill mov busy free
+ shl %cl, %eax mov free free
+
+ The resulting x86 code is:
+ mov $fffc(%ebp), %eax
+ mov %eax, $fff0(%ebp)
+ push some_arg
+ call func
+ mov %eax, %ecx
+ mov $fff0(%ebp), %eax
+ shl %cl, %eax
+
+ Note that since shl could operate directly on memory, we could have:
+
+ push some_arg
+ call func
+ mov %eax, %ecx
+ shl %cl, $fffc(%ebp)
+
+ The above example with loading the operand in a register is just to complicate
+ the example and show that the algorithm should be able to handle it.
+
+ Let's take another example with the this-call call convention (the first argument
+ is passed in %ecx).
+ In this case, will_clobber() will be called only on %eax and %edx, while %ecx
+ will be allocated with get_register_force ().
+ Note: when a register is allocated with get_register_force(), it should be set
+ to a different state as soon as possible.
+
+ store (local1, shl (local1, this-call (local1)))
+
+ instruction register status -> [%eax %ecx %edx]
+ start free free free
+ eax = load local1 mov free free
+ /* force load in %ecx */
+ ecx = load local1 mov busy free
+ spill eax free busy free
+ call mov free free
+ /* now eax contains the right operand of the shl */
+ mov %eax -> %ecx free busy free
+ un-spill mov busy free
+ shl %cl, %eax mov free free
+
+ What happens when a register that we need to allocate with get_register_force ()
+ contains an operand for the next instruction?
+
+ instruction register status -> [%eax %ecx %edx]
+ eax = load local0 mov free free
+ ecx = load local1 mov mov free
+ get_register_force (ecx) here.
+ We have two options:
+ mov %ecx, %edx
+ or:
+ spill %ecx
+ The first option is way better (and allows the peephole pass to
+ just load the value in %edx directly, instead of loading first to %ecx).
+ This doesn't work, though, if the instruction clobbers the %edx register
+ (like in a this-call). So, we first need to clobber the registers
+ (so the state of %ecx changes to freebale and there is no issue
+ with get_register_force ()).
+ What if an instruction both clobbers a register and requires it as
+ an operand? Lets' take the x86 idiv instruction as an example: it
+ requires the dividend in edx:eax and returns the result in eax,
+ with the modulus in edx.
+
+ store (local1, div (local1, local2))
+
+ instruction register status -> [%eax %ecx %edx]
+ eax = load local0 mov free free
+ will_clobber eax, edx free mov free
+ force mov %ecx, %eax busy free free
+ set %edx busy free busy
+ idiv mov free free
+
+ Note: edx is set to free after idiv, because the modulus is not needed
+ (if it was a rem, eax would have been freed).
+ If we load the divisor before will_clobber(), we'll have to spill
+ eax and reload it later. If we load it just after the idiv, there is no issue.
+ In any case, the algorithm should give the correct results and allow the operation.
+
+ Working recursively on the isntructions there shouldn't be huge issues
+ with this algorithm (though, of course, it's not optimal and it may
+ introduce excessive spills or register moves). The advantage over the current
+ local reg allocator is that:
+ 1) the number of spills/moves would be smaller anyway
+ 2) a separate peephole pass could be able to eliminate reg moves
+ 3) we'll be able to remove the 'forced' spills we currently do with
+ the return value of method calls
+
+* Issues
+
+ How to best integrate such a reg allocator with the burg stuff.
+
+ Think about a call os sparc with two arguments: they got into %o0 and %o1
+ and each of them sets the register as busy. But what if the values to put there
+ are themselves the result of a call? %o0 is no problem, but for all the
+ next argument n the above algorithm would spill all the 0...n-1 registers...
+
+* Papers
+
+ More complex solutions to the local register allocator problem:
+ http://dimacs.rutgers.edu/TechnicalReports/abstracts/1997/97-33.html
+
+ Combining register allocation and instruction scheduling:
+ http://citeseer.nj.nec.com/motwani95combining.html
+
+ More on LRA euristics:
+ http://citeseer.nj.nec.com/liberatore97hardness.html
+
+ Linear-time optimal code scheduling for delayedload architectures
+ http://www.cs.wisc.edu/~fischer/cs701.f01/inst.sched.ps.gz
+
+ Precise Register Allocation for Irregular Architectures
+ http://citeseer.nj.nec.com/kong98precise.html
+
+ Allocate registers first to subtrees that need more of them.
+ http://www.upb.de/cs/ag-kastens/compii/folien/comment401-409.2.pdf