github.com/mono/mono.git
Diffstat (limited to 'docs/opcode-decomp.txt')
-rw-r--r-- docs/opcode-decomp.txt 113
1 files changed, 113 insertions, 0 deletions
diff --git a/docs/opcode-decomp.txt b/docs/opcode-decomp.txt
new file mode 100644
index 00000000000..48968d17ab9
--- /dev/null
+++ b/docs/opcode-decomp.txt
@@ -0,0 +1,113 @@
+
+* How to handle complex IL opcodes in an arch-independent way
+
+ Many IL opcodes are very simple: add, ldind etc.
+ Such opcodes can be implemented with a single cpu instruction
+ in most architectures (on some, a group of IL instructions
+ can be converted to a single cpu op).
+ There are many IL opcodes, though, that are more complex, but
+ can be expressed as a series of trees or a single tree of
+ simple operations. Such simple operations are architecture-independent.
+ It makes sense to decompose such complex IL instructions into their
+ simpler equivalents so that we gain in several ways:
+ *) porting is easier, because only the simple instructions
+ need to be implemented in arch-specific code
+ *) we could apply BURG rules to the trees and do pattern matching
+ on them to optimize the expressions according to the host cpu
+
+ The issue is: where do we do such conversion from coarse opcodes to
+ simple expressions?
+
+* Doing the conversion in method_to_ir ()
+
+ Some of these conversions can certainly be done in method_to_ir (),
+ but it's not always easy to decide which are better done there and
+ which in a different pass.
+ For example, let's take ldlen: in the mono implementation, ldlen
+ can be simply implemented with a load from a fixed position in the
+ array object:
+
+ len = [reg + maxlen_offset]
+
+ However, ldlen also carries semantic information: the result is the
+ length of the array, and since in the CLR arrays are of fixed size,
+ this information can be useful later for bounds check removal.
+ If we convert this opcode in method_to_ir () we lose some useful
+ information for further optimizations.
+
+ In other cases, decomposing an opcode in method_to_ir () may
+ allow for better optimizations later on (need to come up with an
+ example here ...).
+
+* Doing the conversion in inssel.brg
+
+ Some conversions may be done inside the burg rules: this has the
+ disadvantage that the instruction selector is not run again on
+ the resulting expression tree and we could miss some optimizations
+ (this is effectively what happens with the coarse opcodes in the old
+ jit). This may also interfere with an efficient local register allocator.
+ It may be possible to add an extension in monoburg that allows a rule
+ such as:
+
+ recheck: LDLEN (reg) {
+ create an expression tree representing LDLEN
+ and return it
+ }
+
+ When the monoburg labeling process gets back a recheck, it will run
+ the labeling again on the resulting expression tree.
+ Whether this is possible at all (and in an efficient way) is a
+ question for dietmar:-)
+ It should be noted, though, that this may not always work, since
+ some complex IL opcodes may require a series of expression trees
+ and handling such cases in monoburg could become quite hairy.
+ For example, think of opcodes that need to do multiple actions on the
+ same object: this basically means a DUP...
+ On the other hand, if a complex opcode needs a DUP, monoburg doesn't
+ actually need to create trees if it emits the instructions in
+ the correct sequence and maintains the right values in the registers
+ (usually the values that need a DUP are not changed...). How
+ this integrates with the current register allocator is not clear, since
+ that assigns registers based on the rule, but the instructions emitted
+ by the rules may be different (this already happens with the current JIT
+ where a MULT is replaced with lea etc...).
+
+* Doing the conversion in a separate pass
+
+ Doing the conversion in a separate pass over the instructions
+ is another alternative. This can be done right after method_to_ir ()
+ or after the SSA pass (since the IR after the SSA pass should look
+ almost like the IR we get back from method_to_ir ()).
+
+ This has the following advantages:
+ *) monoburg will handle only the simple opcodes (makes porting easier)
+ *) the instruction selection will be run on all the additional trees
+ *) it's easier to support coarse opcodes that produce multiple expression
+ trees (and apply the monoburg selector on all of them)
+ *) the SSA optimizer will see the original opcodes and will be able to use
+ the semantic info associated with them
+
+ The disadvantage is that this is a separate pass on the code and
+ it takes time (how much has not been measured yet, though).
+
+ With this approach, we may also be able to have C implementations
+ of some of the opcodes: this pass would insert a function call to
+ the C implementation (for example when first porting to a new arch,
+ where implementing some stuff in asm may be too hard).
+
+* Extended basic blocks
+
+ IL code needs a lot of checks: bounds checks, overflow checks,
+ type checks and so on. This can greatly increase
+ the number of basic blocks in a control flow graph. However,
+ all such blocks end with a throw opcode that gives control to the
+ exception handling mechanism.
+ After method_to_ir () a MonoBasicBlock can be considered a sort
+ of extended basic block where the additional exits don't point
+ to basic blocks in the same procedure (at least when the method
+ doesn't have exception tables).
+ We need to make sure the passes following method_to_ir () can cope
+ with such kinds of extended basic blocks (especially the passes
+ that we need to apply to all the methods: as a start, we could
+ skip SSA optimizations for methods with exception clauses...)
+
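
Note on the "separate pass" option described in the patch above: the
following is a minimal sketch of what such a pass could look like, using
ldlen as the coarse opcode. It is not Mono's code. The Ins struct, the
Opcode enum, MAX_LENGTH_OFFSET and both functions are hypothetical names
invented for illustration, and the offset value is an arbitrary assumption.
The idea it shows is only that the pass rewrites a coarse opcode into a
simple load from a fixed offset, and that it runs after any optimizer that
wants to see the original ldlen.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mini-IR: these are NOT Mono's real opcodes or structs. */
typedef enum {
    OP_ADD,          /* simple opcode, left untouched by the pass */
    OP_LDLEN,        /* coarse opcode: "length of the array in sreg" */
    OP_LOAD_MEMBASE  /* simple opcode: dreg = [sreg + offset] */
} Opcode;

typedef struct {
    Opcode op;
    int dreg;    /* destination virtual register */
    int sreg;    /* source virtual register */
    int offset;  /* only meaningful for OP_LOAD_MEMBASE */
} Ins;

/* Assumed offset of the length field in the array object (made up). */
#define MAX_LENGTH_OFFSET 12

/* Rewrite one instruction in place; return 1 if it was decomposed. */
static int decompose_ins(Ins *ins)
{
    switch (ins->op) {
    case OP_LDLEN:
        /* len = [reg + maxlen_offset] */
        ins->op = OP_LOAD_MEMBASE;
        ins->offset = MAX_LENGTH_OFFSET;
        return 1;
    default:
        /* Simple opcodes go to the instruction selector unchanged. */
        return 0;
    }
}

/* Separate pass over a linear instruction list.
 * Returns the number of instructions that were decomposed. */
static int decompose_pass(Ins *code, size_t count)
{
    int changed = 0;
    assert(code != NULL || count == 0);
    for (size_t i = 0; i < count; i++)
        changed += decompose_ins(&code[i]);
    return changed;
}
```

A caller would run decompose_pass () on the instruction list once the
passes that rely on ldlen's semantics (for example bounds check removal)
have finished, and only then hand the list to the instruction selector.
Opcodes that expand to several trees would need an insertion step instead
of an in-place rewrite; that is left out here to keep the sketch short.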