Diffstat (limited to 'docs/opcode-decomp.txt')
-rw-r--r-- | docs/opcode-decomp.txt | 113 |
1 files changed, 113 insertions, 0 deletions
diff --git a/docs/opcode-decomp.txt b/docs/opcode-decomp.txt
new file mode 100644
index 00000000000..48968d17ab9
--- /dev/null
+++ b/docs/opcode-decomp.txt
@@ -0,0 +1,113 @@
+
+* How to handle complex IL opcodes in an arch-independent way
+
+    Many IL opcodes are very simple: add, ldind etc.
+    Such opcodes can be implemented with a single cpu instruction
+    on most architectures (on some, a group of IL instructions
+    can be converted to a single cpu op).
+    There are many IL opcodes, though, that are more complex, but
+    can be expressed as a series of trees or a single tree of
+    simple operations. Such simple operations are architecture-independent.
+    It makes sense to decompose such complex IL instructions into their
+    simpler equivalents so that we gain in several ways:
+    *) porting effort is easier, because only the simple instructions
+       need to be implemented in arch-specific code
+    *) we could apply BURG rules to the trees and do pattern matching
+       on them to optimize the expressions according to the host cpu
+
+    The issue is: where do we do such conversion from coarse opcodes to
+    simple expressions?
+
+* Doing the conversion in method_to_ir ()
+
+    Some of these conversions can certainly be done in method_to_ir (),
+    but it's not always easy to decide which are better done there and
+    which in a different pass.
+    For example, let's take ldlen: in the mono implementation, ldlen
+    can be simply implemented with a load from a fixed position in the
+    array object:
+
+        len = [reg + maxlen_offset]
+
+    However, ldlen also carries semantic information: the result is the
+    length of the array, and since in the CLR arrays are of fixed size,
+    this information can be useful to later do bounds check removal.
+    If we convert this opcode in method_to_ir () we lose some useful
+    information for further optimizations.
+
+    In other cases, decomposing an opcode in method_to_ir () may
+    allow for better optimizations later on (need to come up with an
+    example here ...).
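The ldlen trade-off above can be made concrete with a minimal sketch in C. Everything here is hypothetical and invented for illustration: ToyInst, ToyArray and the toy_* functions are not Mono's MonoInst or MonoArray, and the array layout is assumed. The sketch only shows that once LDLEN is rewritten into a plain load, a later pass can no longer tell that the value is an array length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical toy IR for illustration; not Mono's MonoInst. */
typedef enum { OP_REG, OP_ICONST, OP_ADD, OP_LOAD, OP_LDLEN } ToyOp;

typedef struct ToyInst {
    ToyOp op;
    struct ToyInst *left, *right;
    long value;                 /* register number or constant */
} ToyInst;

/* Assumed array header layout; the real layout may differ. */
typedef struct { void *vtable; void *sync; size_t max_length; } ToyArray;

static ToyInst *toy_new(ToyOp op, ToyInst *l, ToyInst *r, long value)
{
    ToyInst *ins = malloc(sizeof *ins);
    assert(ins != NULL);
    ins->op = op;
    ins->left = l;
    ins->right = r;
    ins->value = value;
    return ins;
}

/* Rewrite LDLEN (reg) in place into LOAD (ADD (reg, maxlen_offset)). */
static ToyInst *toy_decompose_ldlen(ToyInst *ins)
{
    if (ins->op != OP_LDLEN)
        return ins;
    ToyInst *off = toy_new(OP_ICONST, NULL, NULL,
                           (long)offsetof(ToyArray, max_length));
    ins->left = toy_new(OP_ADD, ins->left, off, 0);
    ins->op = OP_LOAD;
    return ins;
}

/* The question a bounds check remover would ask about a value. */
static int toy_is_array_length(const ToyInst *ins)
{
    return ins->op == OP_LDLEN;
}
```

Calling toy_is_array_length () on an LDLEN node returns true before toy_decompose_ldlen () runs and false afterwards, which is the loss of information the text warns about.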
+
+* Doing the conversion in inssel.brg
+
+    Some conversions may be done inside the burg rules: this has the
+    disadvantage that the instruction selector is not run again on
+    the resulting expression tree and we could miss some optimizations
+    (this is what effectively happens with the coarse opcodes in the old
+    jit). This may also interfere with an efficient local register allocator.
+    It may be possible to add an extension in monoburg that allows a rule
+    such as:
+
+        recheck: LDLEN (reg) {
+            create an expression tree representing LDLEN
+            and return it
+        }
+
+    When the monoburg label process gets back a recheck, it will run
+    the labeling again on the resulting expression tree.
+    Whether this is possible at all (and in an efficient way) is a
+    question for dietmar:-)
+    It should be noted, though, that this may not always work, since
+    some complex IL opcodes may require a series of expression trees
+    and handling such cases in monoburg could become quite hairy.
+    For example, think of opcodes that need to do multiple actions on the
+    same object: this basically means a DUP...
+    On the other hand, if a complex opcode needs a DUP, monoburg doesn't
+    actually need to create trees if it emits the instructions in
+    the correct sequence and maintains the right values in the registers
+    (usually the values that need a DUP are not changed...). How
+    this integrates with the current register allocator is not clear, since
+    that assigns registers based on the rule, but the instructions emitted
+    by the rules may be different (this already happens with the current JIT
+    where a MULT is replaced with lea etc...).
+
+* Doing it in a separate pass.
+
+    Doing the conversion in a separate pass over the instructions
+    is another alternative. This can be done right after method_to_ir ()
+    or after the SSA pass (since the IR after the SSA pass should look
+    almost like the IR we get back from method_to_ir ()).
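A separate decomposition pass of this kind might look roughly like the sketch below. It is a simplification under stated assumptions: flat opcode arrays stand in for expression trees, all Toy* and toy_* names are hypothetical, and the expansion chosen for the coarse element load (bounds check, address computation, load) is an illustrative guess rather than what any JIT actually emits.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcode set for illustration only. */
typedef enum {
    TOY_ADD, TOY_LOAD, TOY_COMPARE, TOY_THROW_IF,   /* simple opcodes */
    TOY_LDLEN, TOY_LDELEM                           /* coarse opcodes */
} ToyOpcode;

#define TOY_MAX_EXPANSION 5

/* Expand one opcode into out[]; returns how many opcodes were written. */
static size_t toy_expand(ToyOpcode op, ToyOpcode *out)
{
    switch (op) {
    case TOY_LDLEN:
        out[0] = TOY_LOAD;          /* len = [reg + maxlen_offset] */
        return 1;
    case TOY_LDELEM:
        out[0] = TOY_LOAD;          /* load the array length */
        out[1] = TOY_COMPARE;       /* compare index against length */
        out[2] = TOY_THROW_IF;      /* throw if out of range */
        out[3] = TOY_ADD;           /* compute the element address */
        out[4] = TOY_LOAD;          /* load the element */
        return 5;
    default:
        out[0] = op;                /* simple opcodes pass through */
        return 1;
    }
}

/* Walk the input once and emit only simple opcodes.
 * Returns 0 on success, -1 if the output buffer is too small. */
static int toy_decompose_pass(const ToyOpcode *in, size_t n_in,
                              ToyOpcode *out, size_t out_cap, size_t *n_out)
{
    size_t written = 0;
    for (size_t i = 0; i < n_in; i++) {
        ToyOpcode tmp[TOY_MAX_EXPANSION];
        size_t n = toy_expand(in[i], tmp);
        assert(n <= TOY_MAX_EXPANSION);
        if (written + n > out_cap)
            return -1;
        for (size_t j = 0; j < n; j++)
            out[written++] = tmp[j];
    }
    *n_out = written;
    return 0;
}
```

Because the pass runs on its own, whatever ran before it saw TOY_LDLEN and TOY_LDELEM intact, and whatever runs after it (the instruction selector, in the document's terms) sees only simple opcodes, including every piece of a multi-part expansion.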
+
+    This has the following advantages:
+    *) monoburg will handle only the simple opcodes (makes porting easier)
+    *) the instruction selection will be run on all the additional trees
+    *) it's easier to support coarse opcodes that produce multiple expression
+       trees (and apply the monoburg selector on all of them)
+    *) the SSA optimizer will see the original opcodes and will be able to use
+       the semantic info associated with them
+
+    The disadvantage is that this is a separate pass on the code and
+    it takes time (how much has not been measured yet, though).
+
+    With this approach, we may also be able to have C implementations
+    of some of the opcodes: this pass would insert a function call to
+    the C implementation (for example in the cases when first porting
+    to a new arch and implementing some stuff may be too hard in asm).
+
+* Extended basic blocks
+
+    IL code needs a lot of checks: bounds checks, overflow checks,
+    type checks and so on. This potentially increases by a lot
+    the number of basic blocks in a control flow graph. However,
+    all such blocks end with a throw opcode that gives control to the
+    exception handling mechanism.
+    After method_to_ir () a MonoBasicBlock can be considered a sort
+    of extended basic block where the additional exits don't point
+    to basic blocks in the same procedure (at least when the method
+    doesn't have exception tables).
+    We need to make sure the passes following method_to_ir () can cope
+    with this kind of extended basic block (especially the passes
+    that we need to apply to all the methods: as a start, we could
+    skip SSA optimizations for methods with exception clauses...)
+
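The extended basic block idea can be sketched with a small data model in C. The ToyBlock and ToyMethod types and the toy_* functions are hypothetical stand-ins, not Mono's MonoBasicBlock or its method structures. The sketch assumes each block simply records how many throwing exits it contains, and it encodes the conservative starting policy mentioned above: skip SSA when the method has exception clauses.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only. */
typedef struct ToyBlock {
    size_t n_throw_exits;       /* exits that only raise an exception */
    struct ToyBlock *next;      /* next block in the method, or NULL */
} ToyBlock;

typedef struct {
    ToyBlock *first_block;
    size_t n_exception_clauses; /* handlers declared by the method */
} ToyMethod;

/* A block is "extended" when it has exits besides its normal end. */
static int toy_block_is_extended(const ToyBlock *bb)
{
    assert(bb != NULL);
    return bb->n_throw_exits > 0;
}

/* Throwing exits that are folded into blocks instead of splitting them. */
static size_t toy_count_throw_exits(const ToyMethod *m)
{
    size_t total = 0;
    for (const ToyBlock *bb = m->first_block; bb != NULL; bb = bb->next)
        total += bb->n_throw_exits;
    return total;
}

/* Starting policy: only run SSA when no throwing exit can land
 * inside the same method, i.e. when there are no exception clauses. */
static int toy_can_run_ssa(const ToyMethod *m)
{
    return m->n_exception_clauses == 0;
}
```

With no exception clauses, every throwing exit leaves the method, so a pass can treat each block as straight-line code for its own purposes; with clauses present, the extra exits become real edges inside the method and the pass would need to model them.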