Added documentation for abc removal code.

svn path=/trunk/mono/; revision=27511
author: Massimiliano Mantione <massi@mono-cvs.ximian.com> 2004-05-17 13:35:07 +0400
committer: Massimiliano Mantione <massi@mono-cvs.ximian.com> 2004-05-17 13:35:07 +0400
commit: f338a0c396237b392d90eae3172c1f3d7091472a (patch)
tree: fd2a99b0eac4ba75365f6fb608d731718a5b855a /docs/abc-removal.txt
parent: a188e60dacdbb0bc84b5edd7cdaefd43b3b79023 (diff)
1 files changed, 525 insertions, 0 deletions
diff --git a/docs/abc-removal.txt b/docs/abc-removal.txt
new file mode 100755
index 00000000000..82bad22330e
--- /dev/null
+++ b/docs/abc-removal.txt
@@ -0,0 +1,525 @@
+
+Here "abc" stays for "array bounds check", or "array bound checks", or
+some combination of the two...
+
+------------------------------------------------------------------------------
+USAGE
+
+Simply use the "abcrem" optimization invoking mono.
+
+To see if bound checks are actually removed, use "mono -v" and grep for
+"ARRAY-ACCESS" in the output, there should be a message for each check
+that has been removed.
+
+To trace the algorithm execution, use "-v -v -v", and be prepared to be
+totally submersed by debugging messages...
+
+------------------------------------------------------------------------------
+EFFECTIVENESS
+
+The abc removal code can now always remove bound checks from "clean"
+array scans in loops, and generally anyway there are clear conditions that
+already state that the index is "safe".
+
+To be clearer, and give an idea of what the algorithm can and cannot do
+without describing it in detail... keep in mind that only "redunant" checks
+cannot be removed. By "redundant", I mean "already explicitly checked" in
+the method code.
+
+Unfortunately, analyzing complex expressions is not so easy (see below for
+details), so the capabilities of this "abc remover" are limited.
+
+These are simple guidelines:
+- Only simple comparisons between variables are understood.
+- Only increment-decrement operations are handled.
+- "switch" statements are not yet handled.
+
+This means that code like this will be handled well:
+
+for (int i = 0; i < a.Length; i++) {
+	a [i] = .....
+}
+
+The "i" variable could be declared out of the "for", the "for" could be a
+"while", and maybe even implemented with "goto", the array could be scanned
+in reverse order, and everything would still work.
+I could have a temporary variable storing the array length, and check on
+it inside the loop, and the abc removal would still occurr, like this:
+
+int i = 0;
+int l = a.Length;
+while ( i < l ) {
+	a [i] ......
+}
+
+or this:
+
+int l = a.Length;
+for (int i = l; i > 0; i--) {
+	a [i] = .....
+}
+
+
+But just something like this
+
+for (int i = 0; i < (a.Length -1); i++) .....
+
+or like this
+
+for (int i = 0; i < a.Length; i += 2) .....
+
+would not work, the check would stay there, sorry :-(
+
+Just to make you understand how things are tricky... this would work!
+
+int limit = a.Length - 1;
+for (int i = 0; i < limit; i++) {
+	a [i] = .....
+}
+
+
+A detailed explanation of the reason why things are done like this is given
+below...
+
+All in all, the compiling phase is generally non so fast, and the abc removed
+are so few, that there is hardly a performance gain in using the "abcrem"
+flag in the compiling options for short programs like the ones I tried.
+
+Anyway, various bound checks *are* removed, and this is good :-)
+
+------------------------------------------------------------------------------
+THE ALGORITHM
+
+Unfortunately, this time google has not been my friend, and I did not
+find an algorithm I could reasonably implement in the time frame I had,
+so I had to invent one. The two cleasses of algorithms I found were either
+simply an analisys of variable ranges (which is not very different from
+what I'm doing), or they were based on type systems richer than the .NET
+one (particularly, they required integer types to have an explicit range,
+which could be related to array ranges). By the way, it seems that the
+gcj people do not have an array bound check removal optimization (they
+just have a compiler flag that unconditionally removes all the checks,
+but this is not what we want).
+
+This array bound check removal (abc removal) algorithm is based on
+symbolic execution concepts. I handle the removal like an "unreachable
+code elimination" (and in fact the optimization could be extended to remove
+also other unreachable sections of code, due to branches that "always go the
+same way").
+
+In symbolic execution, variables do not have actual (numeric) values, but
+instead symbolic expressions (like "a", or "x+y").
+Also, branch conditions are handled like symbolic conditions (like "i<k"),
+which state relations between variable values.
+
+The SSA representation inside mini is somewhat close to a symbolic
+representation of the execution of the compiled method.
+Particularly, the representation of variable values is exactly a symbolic
+one. It is enough to find all CEE_STIND_* instructions which store to a
+local variable, and their second argument is exactly the variable value.
+Actually, "cfg->vars [<variable-index>]->def" should contain exactly
+those store instructions, and the "get_variable_value_from_ssa_store"
+function extracts the variable value from there.
+
+On the other hand, the conditions under which each basic block is
+executed are not made fully explicit.
+
+However, it is not difficult to make them so.
+Each BB that has more than one exit BB, in practice must end either with
+a conditional branch instruction or with a switch instruction.
+In the first case, the BB has exactly two exit BBs, and their execution
+conditions are easy to get from the condition of the branch (see the
+"get_relation_from_branch_instruction" function, and expecially the end of
+"analyze_block" in abcremoval.c.
+If there is a switch, the jump condition of every exit BB is the equality
+of the switch argument with the particular index associated with its case
+(but the current implementation does not handle switch statements yet).
+
+These individual conditions are in practice associated to each arc that
+connects BBs in the CFG (with the simple case that unconditional branches
+have a "TRUE" condition, because they always happen).
+
+So, for each BB, its *proper* entry condition is the union of all the
+conditions associated to arcs that enter the BB. The "union" is like a
+logical "or", in the sense that either of the condition could be true,
+they are not necessarily all true. This means that if I can enter a BB
+in two ways, and in one case I know that "x>0", and in the other that
+"x==0", actually in the BB I know that "x>=0", which is a weaker
+condition (the union of the two).
+
+Also, the *complete* entry condition for a BB is the "intersection" of all
+the entry conditions of its dominators. This is true because each dominator
+is the only way to reach the BB, so the entry condition of each dominator
+must be true if the control flow reached the BB. This translates to the
+logical "and" of all the "proper" conditions of the BBs met walking up in the
+dominator tree. So, if one says "x>0", and another "x==1", then I know
+that "x==1", which is a stronger condition (the intersection of the two).
+Note that, if the two conditions were "x>0" and "x==0", then the block would
+be unreachable (the intersection is empty), because some branch is impossible.
+
+Another observation is that, inside each BB, every variable is subject to the
+complete entry condition of that very same BB, and not the one in which it is
+defined (with the "complete entry condition" being the thing I defined before,
+sorry if these terms "proper" and "complete" are strange, I found nothing
+better).
+This happens because the branch conditions are related to the control flow.
+I can define "i=a", and if I am in a BB where "a>0", then "i>0", but not
+otherwise.
+
+So, intuitively, if the available conditions say "i>=0", and i is used as an
+index in an array access, then the lower bound check can be omitted.
+If the condition also says "(i>=0)&&(i<array.length)", the abc removal can
+occur.
+
+So, a complete solution to the problem of abc removal would be the following:
+for each array access, build a system of equations containing:
+[1] all the symbolic variable definitions
+[2] the complete entry condition of the BB in which the array access occurs
+[3] the two "goal functions" ("index >=0" and "index < array.length")
+If the system is valid for *each possible* variable value, then the goal
+functions are always true, and the abc can be removed.
+
+All this discussion is useful to give a precise specification to the problem
+we are trying to solve.
+The trouble is that, in the general case, the resulting system of equations
+is like a predicate in first order logic, which is semi-decidable, and its
+general solution is anyway too complex to be attempted in a JIT compiler
+(which should not contain a full fledged theorem prover).
+
+Therefore, we must cut some corner.
+
+
+By the way, there is also another big problem, which is caused by "recursive"
+symbolic definitions. These definition can (and generally do) happen every
+time there is a loop. For instance, in the following piece of code
+
+for ( int i = 0; i < array.length; i++ ) {
+	Console.WriteLine( "array [i] = " + array [i] );
+}
+
+one of the definitions of i is a PHI that can be either 0 or "i + 1".
+
+Now, we know that mathematically "i = i + 1" does not make sense, and in
+fact symbolic values are not "equations", they are "symbolic definitions".
+
+The symbolic value of i is a generic "n", where "n" is the number of
+iterations of the loop, but this is terrible to handle (and in more complex
+examples the symbolic definition of i simply cannot be written, because i is
+calculated in an iterative way).
+
+However, the definition "i = i + 1" tells us something about i: it tells us
+that i "grows". So (from the PHI definition) we know that i is either 0, or
+"grows". This is enough to tell that "i>=0", which is what we want!
+It is important to note that recursive definitions can only occurr inside
+PHI definitions, because actually a variable cannot be defined *only* in terms
+of itself!
+
+
+At this point, I can explain which corners I want to cut to make the
+problem solvable. It will not remove all the abc that could theoretically
+be removed, but at least it will work.
+
+The easiest way to cut corners is to only handle expressions which are
+"reasonably simple", and ignore the rest.
+Keep in mind that ignoring an expression is not harmful in itself.
+The algorithm will be simply "less powerful", because it will ignore
+conditions that could have caused to the removal of an abc, but will
+not remove checks "by mistake" (so the resulting code will be in any case
+correct).
+
+In a first attempt, we can consider only conditions that have the simple
+possible form, which is "(compare simpleExpression simpleExpression)",
+where "simpleExpression" is either "(ldind.* local[*])" or
+"(ldlen (ldind.ref local[*]))" (or a constant, of course).
+
+We can also "summarize" variable definitions, keeping only what is relevant
+to know: if they are greater or smaller than zero or other variables.
+
+One particular note on PHI functions: they work (obviously) like the logical
+"or" of their definitions, and therefore are equivalent to the "logical or"
+of the summarization of their definitions.
+
+About recursive definitions (which, believe me, are the worst thing in all
+this mess), we handle only "monotonic" ones. That is, we try to understand
+if the recursive definition (which, as we said above, must happen because
+of a loop) always "grows" or "gets smaller". In all other cases, we decide
+we cannot handle it.
+
+So, the algorithm to optimize one method could be something like:
+
+[1] Build the SSA representation.
+[2] Analyze each BB. Particularly, do the following:
+    [2a] summarize its exit conditions
+    [2b] summarize all the variable definitions occurring in the BB
+    [2c] check if it contains array accesses (only as optimization, so that
+         step 3 will be performed only when necessary)
+[3] Perform the removal. For each BB that contains array accesses, do:
+    [3a] build an evaluation area where for each variable we keep track
+         of its relations with zero and with each other variable.
+    [3b] populate the evaluation area with the initial data (the variable
+         definitions and the complete entry condition of the BB)
+    [3c] propagate the relations between variables (so that from "a<b" and
+         "b<c" we know that "a<c") in the evaluation area
+    [3d] for each array access in the BB, use the evaluation area to "match
+         conditions", to see if the goal functions are satisfied
+
+
+------------------------------------------------------------------------------
+LOW LEVEL DESIGN DECISIONS
+
+One of the problems I had to solve was the representation of the relations
+between symbolic values.
+The "canonical" solution would be to have a complex, potentially recursive
+data structure, somewhat similar to MonoInst, and use it all the time (maybe
+MonoInst could be used "as is").
+The problem is that, at this point, the software would have to "play" with
+these structures, for instance to deduce "b < a-1" from "a > b+1", because we
+are interested in what value b has, and not a. And maybe the software should
+be able to do this without modifying the original condition, because it was
+used elsewhere (or could be reused).
+Then, the software should also be able to apply logical relations to
+conditions (like "(a>0)||(c<1)"), and manipulate them, too.
+
+In my opinion, an optimizer written in this way risks to be too complex, and
+its performance not acceptable in a JIT compiler.
+Therefore, I decided to represent values in a "summarized" way.
+For instance, "x = a + 1" becomes "x > a" (and, by the way, "x = x + 1"
+would look like "x > x", which means that "x grows").
+So, each variable value is simply a "relation" of the variable with other
+variables or constants. Anything more complex is ignored.
+Relations are represented with flags (EQ, LT and GT), so that it is easy
+to combine them logically (it is enough to perform the bitwise operations
+on the flags). The condition (EQ|LT|GT) means we know nothing special, any
+value is possible. The empty condition would mean an unreachable statement
+(because no value is acceptable).
+
+There is still the problem of identifying variables, and where to store all
+these relations. After some thinking, I decided that a "brute force" approach
+is the easier, and probably also the fastest. In practice, I keep in memory
+an array of "variable relations", where for each variable I record its
+relation with the constants 0 and 1 and an array of its relations with all
+other variables. The evaluation area, therefore, looks like a square matrix
+as large as the number of variables. I *know* that the matrix is rather sparse
+(in the sense that only few of its cells are significant), but I decided that
+this data structure was the simplest and more efficient anyway. After all,
+each cell takes just one byte, and any other data structure must be some
+kind of associative array (like a g_hash_table). Such a data structue is
+difficult to use with a MonoMemPool, and in the end is slower than a plain
+array, and has the storage overhead of all the pointers... in the end, maybe
+the full matrix is not a waste at all.
+<note>
+After a few benchmarks, I clearly see that the matrix works well if the
+variables are not too much, otherwise it gets "too sparse", and much time
+is wasted in the propagation phase. Maybe the data structure could be changed
+after a more careful evaluation of the problem...
+</note>
+
+With these assumptions, all the memory used to represent values, conditions
+and the evaluation area can be allocated in advance (from a MonoMemPool), and
+used for all the analysis of one method. Moreover, the performance of logical
+operations is high (they are simple bitwise operations on bytes).
+<note>
+For the propagation phase, there was a function I could not easily code
+with bitwise operations. As that phase is a real pain for the performance
+(the complexity is quadratic, and it must be executed iteratively until
+nothing changes), I decided to code it in perl, and generate a precomputed
+table then used by the C code (just 64 bytes). I think this is faster than
+all those branches... Really, I could think again to phases 3c and 3d (see
+above), maybe there is a better way to do them.
+</note>
+
+Of course not all expressions can be handled by the summarization, so the
+optimizer is not as effective as a full blown theorem prover...
+
+Another thing related to design... the implementation is as less intrusive
+as possible.
+I "patched" the main files to add an option for abc removal, and handle it
+just like any other option, and then wrote all the relevant code in two
+separated files (one header for data structures, and a C file for the code).
+If the patch turns out to be useful also for other pourposes (like a more
+generic "unreachable code elimination", that detects branches that will never
+be taken using the analysis of variable values), it will not be difficult
+to integrate the branch summarizations into the BasicBlock data structures,
+and so on.
+
+One last thing... in some places, I knowingly coded in an inefficient way.
+For instance, "remove_abc_from_block" does something only if the flag
+"has_array_access_instructions" is true, but I call it anyway, letting it
+do nothing (calling it conditionally would be better). There are several
+things like this in the code, but I was in a hurry, and preferred to make
+the code more readable (it is already difficult to understand as it is).
+
+------------------------------------------------------------------------------
+WHEN IS ABC REMOVAL POSSIBLE? WHEN IS IT USEFUL?
+
+Warning! Random ideas ahead!
+
+In general, managed code should be safe, so abc removal is possible only if
+there already is some branch in the method that does the check.
+So, one check on the index is "mandatory", the abc removal can only avoid a
+second one (which is useless *because* of the first one).
+If in the code there is not a conditional branch that "avoids" to access
+an array out of its bounds, in general the abc removal cannot occur.
+
+It could happen, however, that one method is called from another one, and
+the check on the index is performed in the caller.
+
+For instance:
+
+void
+caller( int[] a, int n ) {
+	if ( n <= a.Length ) {
+		for ( int i = 0; i < n; i++ ) {
+			callee( a, i );
+		}
+	}
+}
+
+void callee( int[] a, int i ) {
+	Console.WriteLine( "a [i] = " + a [i] );
+}
+
+
+In this case, it should be possible to have two versions of the callee, one
+without checks, to be called from methods that do the check themselves, and
+another to be called otherwise.
+This kind of optimization could be profile driven: if one method is executed
+several times, wastes CPU time for bound checks, and is mostly called from
+another one, it could make sense to analyze the situation in detail.
+
+However, one simple way to perform this optimization would be to inline
+the callee (so that the abc removal algorithm can see both methods at once).
+
+The biggest benefits of abc removal can be seen for array accesses that
+occur "often", like the ones inside inner loops. This is why I tried to
+handle "recursive" variable definitions, because without them you cannot
+optimize inside loops, which is also where you need it most.
+
+Another possible optimization (alwais related to loops) is the following:
+
+void method( int[] a, int n ) {
+	for ( int i = 0; i < n; i++ ) {
+		Console.WriteLine( "a [i] = " + a [i] );
+	}
+}
+
+In this case, the check on n is missing, so the abc could not be removed.
+However, this would mean that i will be checked twice at each loop iteration,
+one against n, and the other against a.Length. The code could be transformed
+like this:
+
+void method( int[] a, int n ) {
+	if ( n < a.Length ) {
+		for ( int i = 0; i < n; i++ ) {
+			Console.WriteLine( "a [i] = " + a [i] ); // Remove abc
+		}
+	} else {
+		for ( int i = 0; i < n; i++ ) {
+			Console.WriteLine( "a [i] = " + a [i] ); // Leave abc
+		}
+	}
+}
+
+This could result in performance improvements (again, probably this
+optimization should be profile driven).
+
+------------------------------------------------------------------------------
+OPEN ISSUES
+
+There are several issues that should still be addressed...
+
+One is related to aliasing. For now, I decided to operate only on local
+variables, and ignore aliasing problems (and alias analisys has not yet
+been implemented anyway).
+Actually, I identified the local arrays with their length, because I
+totally ignore the contents of arrays (and objects in general).
+
+Also, in several places in the code I only handle some cases, and ignore
+other more complex ones, which could anyway work (there are comments where
+this happens). Anyway, this is not harmful (the code is not as effective
+as it could be).
+
+Another possible improvement is the explicit handling of all constants.
+For now, code like
+
+void
+method () {
+	int[] a = new int [16];
+	for ( int i = 0; i < 16; i++ ) {
+		a [i] = i;
+	}
+}
+
+is not handled, because the two constants "16" are lost when variable
+relations are summarized (actually they are not lost, they are stored in the
+summarized values, but they are not yet used correctly).
+
+The worst thing, anyway, is that for now I fail completely in distinguishing
+between signed and unsigned variables/operations/conditions, and between
+different integer sizes (in bytes). I did this on pourpose, just for lack of
+time, but this can turn out to be terribly wrong.
+The problem is caused by the fact that I handle arithmetical operations and
+conditions as "ideal" operations, but actually they can overflow and/or
+underflow, giving "surprising" results.
+
+For instance, look at the following two methods:
+
+public static void testSignedOverflow()
+{
+  Console.WriteLine( "Testing signed overflow..." );
+  int[] ia = new int[70000];
+  int i = 1;
+  while ( i <ia.Length )
+  {
+    try
+    {
+      Console.WriteLine( " i = " + i );
+      ia[i] = i;
+    }
+    catch ( IndexOutOfRangeException e )
+    {
+      Console.WriteLine( "Yes, we had an overflow (i = " + i + ")" );
+    }
+    i *= 60000;
+  }
+  Console.WriteLine( "Testing signed overflow done." );
+}
+
+public static void testUnsignedOverflow()
+{
+  Console.WriteLine( "Testing unsigned overflow..." );
+  int[] ia = new int[70000];
+  uint i = 1;
+  while ( i <ia.Length )
+  {
+    try
+    {
+      Console.WriteLine( " i = " + i );
+      ia[i] = (int) i;
+    }
+    catch ( IndexOutOfRangeException e )
+    {
+      Console.WriteLine( "Yes, we had an overflow (i = " + i + ")" );
+    }
+    i *= 60000;
+  }
+  Console.WriteLine( "Testing unsigned overflow done." );
+}
+
+In the "testSignedOverflow" method, the exception will be thrown, while
+in the "testUnsignedOverflow" everything will go on smoothly.
+The problem is that the test "i <ia.Length" is a signed comparison in the
+first case, and will not exit from the loop if the index is negative.
+Therefore, in the first method, the abc should not be removed.
+
+In practice, the abc removal code should try to predict if any given
+expression could over/under-flow, and act accordingly.
+For now, I made so that the only accepted expressions are plain increments
+and decrements of variables (which cannot over/under-flow without being
+trapped by the existing conditions). Anyway, this issue should be analyzed
+better.
+By the way, in summarizations there is a field that keeps track of the
+relation with "1", which I planned to use exactly to summarize products and
+quotients, but it is never used (as I said, only increment and decrement
+operations are properly summarized).
+
author	Massimiliano Mantione <massi@mono-cvs.ximian.com>	2004-05-17 13:35:07 +0400
committer	Massimiliano Mantione <massi@mono-cvs.ximian.com>	2004-05-17 13:35:07 +0400
commit	f338a0c396237b392d90eae3172c1f3d7091472a (patch)
tree	fd2a99b0eac4ba75365f6fb608d731718a5b855a /docs/abc-removal.txt
parent	a188e60dacdbb0bc84b5edd7cdaefd43b3b79023 (diff)