1 files changed, 113 insertions, 0 deletions
diff --git a/docs/opcode-decomp.txt b/docs/opcode-decomp.txt
new file mode 100644
index 00000000000..48968d17ab9
--- /dev/null
+++ b/docs/opcode-decomp.txt
@@ -0,0 +1,113 @@
+
+* How to handle complex IL opcodes in an arch-independent way
+
+	Many IL opcodes are very simple: add, ldind etc.
+	Such opcodes can be implemented with a single cpu instruction
+	in most architectures (on some, a group of IL instructions
+	can be converted to a single cpu op).
+	There are many IL opcodes, though, that are more complex, but
+	can be expressed as a series of trees or a single tree of
+	simple operations. Such simple operations are architecture-independent.
+	It makes sense to decompose such complex IL instructions in their
+	simpler equivalent so that we gain in several ways:
+	*) porting effort is easier, because only the simple instructions 
+		need to be implemented in arch-specific code
+	*) we could apply BURG rules to the trees and do pattern matching
+		on them to optimize the expressions according to the host cpu
+	
+	The issue is: where do we do such conversion from coarse opcodes to 
+	simple expressions?
+
+* Doing the conversion in method_to_ir ()
+
+	Some of these conversions can certainly be done in method_to_ir (),
+	but it's not always easy to decide which are better done there and 
+	which in a different pass.
+	For example, let's take ldlen: in the mono implementation, ldlen
+	can be simply implemented with a load from a fixed position in the 
+	array object:
+
+		len = [reg + maxlen_offset]
+	
+	However, ldlen carries also semantics information: the result is the
+	length of the array, and since in the CLR arrays are of fixed size,
+	this information can be useful to later do bounds check removal.
+	If we convert this opcode in method_to_ir () we lost some useful
+	information for further optimizations.
+
+	In some other ways, decomposing an opcode in method_to_ir() may
+	allow for better optimizations later on (need to come up with an 
+	example here ...).
+
+* Doing the conversion in inssel.brg
+
+	Some conversion may be done inside the burg rules: this has the 
+	disadvantage that the instruction selector is not run again on
+	the resulting expression tree and we could miss some optimization
+	(this is what effectively happens with the coarse opcodes in the old 
+	jit). This may also interfere with an efficient local register allocator.
+	It may be possible to add an extension in monoburg that allows a rule 
+	such as:
+
+		recheck: LDLEN (reg) {
+			create an expression tree representing LDLEN
+			and return it
+		}
+	
+	When the monoburg label process gets back a recheck, it will run
+	the labeling again on the resulting expression tree.
+	If this is possible at all (and in an efficient way) is a 
+	question for dietmar:-)
+	It should be noted, though, that this may not always work, since
+	some complex IL opcodes may require a series of expression trees
+	and handling such cases in monoburg could become quite hairy.
+	For example, think of opcode that need to do multiple actions on the 
+	same object: this basically means a DUP...
+	On the other end, if a complex opcode needs a DUP, monoburg doesn't
+	actually need to create trees if it emits the instructions in
+	the correct sequence and maintains the right values in the registers
+	(usually the values that need a DUP are not changed...). How
+	this integrates with the current register allocator is not clear, since
+	that assigns registers based on the rule, but the instructions emitted 
+	by the rules may be different (this already happens with the current JIT
+	where a MULT is replaced with lea etc...).
+
+* Doing it in a separate pass.
+
+	Doing the conversion in a separate pass over the instructions
+	is another alternative. This can be done right after method_to_ir ()
+	or after the SSA pass (since the IR after the SSA pass should look
+	almost like the IR we get back from method_to_ir ()).
+
+	This has the following advantages:
+	*) monoburg will handle only the simple opcodes (makes porting easier)
+	*) the instruction selection will be run on all the additional trees
+	*) it's easier to support coarse opcodes that produce multiple expression 
+		trees (and apply the monoburg selector on all of them)
+	*) the SSA optimizer will see the original opcodes and will be able to use
+		the semantic info associated with them
+	
+	The disadvantage is that this is a separate pass on the code and
+	it takes time (how much has not been measured yet, though).
+
+	With this approach, we may also be able to have C implementations
+	of some of the opcodes: this pass would insert a function call to 
+	the C implementation (for example in the cases when first porting
+	to a new arch and implemenating some stuff may be too hard in asm).
+
+* Extended basic blocks
+
+	IL code needs a lot of checks, bounds checks, overflow checks,
+	type checks and so on. This potentially increases by a lot
+	the number of basic blocks in a control flow graph. However,
+	all such blocks end up with a throw opcode that gives control to the
+	exception handling mechanism.
+	After method_to_ir () a MonoBasicBlock can be considered a sort
+	of extended basic block where the additional exits don't point
+	to basic blocks in the same procedure (at least when the method
+	doesn't have exception tables).
+	We need to make sure the passes following method_to_ir () can cope
+	with such kinds of extended basic blocks (especially the passes
+	that we need to apply to all the methods: as a start, we could
+	skip SSA optimizations for methods with exception clauses...)
+