author    Massimiliano Mantione <massi@mono-cvs.ximian.com>    2004-12-03 14:42:12 +0300
committer Massimiliano Mantione <massi@mono-cvs.ximian.com>    2004-12-03 14:42:12 +0300
commit    e7349ddfca1429e3c75a63b23eb611d74916ce0e (patch)
tree      054d77246755e5ee05ceb23f6f924539b33641c8 /docs
parent    7ba24e25281829b46dec513dd094bb49027a8d3d (diff)
2004-12-03 Massimiliano Mantione <massi@ximian.com>
	* mini.c: Added removal of critical edges (prerequisite for SSAPRE),
	and some utility functions (always for SSAPRE), integrated SSAPRE.
	* mini.h: Likewise.
	* driver.c: Added ssapre option.
	* ssa.c: Small fix on OP_ARG handling.
	* ssapre.c, ssapre.h: Added files containing SSAPRE implementation.
	* Makefile.am: Likewise.
	* ../docs/ssapre.txt: Added SSAPRE documentation.

svn path=/trunk/mono/; revision=37005
Diffstat (limited to 'docs')
-rw-r--r--  docs/ssapre.txt  248
1 files changed, 248 insertions, 0 deletions
diff --git a/docs/ssapre.txt b/docs/ssapre.txt
new file mode 100644
index 00000000000..49b9a9a0eac
--- /dev/null
+++ b/docs/ssapre.txt
@@ -0,0 +1,248 @@
+
+SSAPRE stands for "SSA based Partial Redundancy Elimination".
+
+The algorithm is explained in this paper:
+
+Partial Redundancy Elimination in SSA Form (1999)
+Robert Kennedy, Sun Chan, Shin-Ming Liu, Raymond Lo, Peng Tu, Fred Chow
+ACM Transactions on Programming Languages and Systems
+
+http://citeseer.ist.psu.edu/kennedy99partial.html
+
+In this document I give a gentle introduction to the concept of "partial"
+redundancies, and I explain the basic design decisions I took in implementing
+SSAPRE, but the paper is essential for understanding the code.
+
+Partial Redundancy Elimination (or PRE) is an optimization that (guess what?)
+tries to remove redundant computations.
+It achieves this by saving the result of "non redundant" evaluations of
+expressions into specially created temporary variables, so that "redundant"
+evaluations can be replaced by a load from the appropriate variable.
+
+Of course, on register-starved architectures (like x86) a temporary could cost
+more than the evaluation itself... PRE guarantees that the live ranges of the
+introduced variables are the smallest possible, but the added pressure on the
+register allocator can still be an issue.
+
+The nice thing about PRE is that it removes not only "full" redundancies, but
+also "partial" ones.
+A full redundancy is easy to spot, and straightforward to handle, as in the
+following example (in every example here, the "expression" is "a + b"):
+
+int FullRedundancy1 (int a, int b) {
+  int v1 = a + b;
+  int v2 = a + b;
+  return v1 + v2;
+}
+
+PRE would transform it like this:
+
+int FullRedundancy1 (int a, int b) {
+  int t = a + b;
+  int v1 = t;
+  int v2 = t;
+  return v1 + v2;
+}
+
+Of course, either a copy propagation pass or a register allocator smart enough
+to remove unneeded variables would be necessary afterwards.
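+
+For instance, after copy propagation (and the removal of the now useless
+variables) the example above would reduce to something like:
+
+int FullRedundancy1 (int a, int b) {
+  int t = a + b;
+  return t + t;
+}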
+
+Another example of full redundancy is the following:
+
+int FullRedundancy2 (int a, int b) {
+  int v1;
+
+  if (a >= 0) {
+    v1 = a + b;  // BB1
+  } else {
+    a = -a;      // BB2
+    v1 = a + b;
+  }
+
+  int v2 = a + b;  // BB3
+  return v1 + v2;
+}
+
+Here the two expressions in BB1 and BB2 are *not* the same thing (a is
+modified in BB2), but the expression in BB3 is redundant with respect to both
+of them, so the code can be transformed like this:
+
+int FullRedundancy2 (int a, int b) {
+  int v1;
+  int t;
+
+  if (a >= 0) {
+    t = a + b;  // BB1
+    v1 = t;
+  } else {
+    a = -a;     // BB2
+    t = a + b;
+    v1 = t;
+  }
+
+  int v2 = t;  // BB3
+  return v1 + v2;
+}
+
+Note that there are still two occurrences of the expression, while it is easy
+to see that one (at the beginning of BB3) would suffice.
+This, however, is not a redundancy for PRE, because there is no path in the
+CFG along which the expression is evaluated twice.
+Maybe this other kind of redundancy (which affects code size, not the
+computations that are actually performed) would be eliminated by code
+hoisting, but I should verify that; in any case, it is not a PRE related thing.
+
+An example of partial redundancy, on the other hand, is the following:
+
+int PartialRedundancy (int a, int b) {
+  int v1;
+
+  if (a >= 0) {
+    v1 = a + b;  // BB1
+  } else {
+    v1 = 0;      // BB2
+  }
+
+  int v2 = a + b;  // BB3
+  return v1 + v2;
+}
+
+The redundancy is partial because the expression is computed more than once
+along some paths in the CFG, but not along all of them.
+In fact, on the path BB1 - BB3 the expression is computed twice, but on the
+path BB2 - BB3 it is computed only once.
+In this case, PRE must insert new occurrences of the expression in order to
+obtain a full redundancy, and then use temporary variables as before.
+Adding a computation in BB2 would do the job, as shown below.
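+
+The transformed code would then look like this:
+
+int PartialRedundancy (int a, int b) {
+  int v1;
+  int t;
+
+  if (a >= 0) {
+    t = a + b;  // BB1
+    v1 = t;
+  } else {
+    v1 = 0;     // BB2
+    t = a + b;  // computation inserted by PRE
+  }
+
+  int v2 = t;  // BB3
+  return v1 + v2;
+}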
+
+One nice thing about PRE is that loop invariants can be seen as partial
+redundancies.
+The idea is that you can get into the loop from two locations: from before the
+loop (at the 1st iteration), and from inside the loop itself (at any other
+iteration). If there is a computation inside the loop that is in fact a loop
+invariant, PRE will spot this, and will treat the BB just before the loop as
+the place where a new computation must be inserted to get a full redundancy.
+At this point, the computation inside the loop can be replaced by a use of
+the temporary stored before the loop, effectively performing "loop invariant
+code motion", as in the sketch below.
+
+Now, this is what PRE does to the code.
+
+But how does it work?
+
+In "classic" solutions, PRE is formulated as a data flow analysis problem.
+The Muchnick provides a detailed description of the algorithm in this way (it
+is by far the most complex problem of this kind in the whole book).
+The point is that this algorithm does not exploit the SSA form.
+In fact, it has to perform all that amount of data flow analysis exactly
+because it does not take advantage of the work already done if you have put
+the program into SSA form.
+
+The SSAPRE algorithm, on the other hand, is designed with SSA in mind.
+It fully exploits the properties of SSA variables, explicitly reuses some
+data structures that must already have been computed when building the SSA
+form, and takes great care to emit its output code already in that form
+(which means that the temporaries it introduces are already "versioned",
+with all the phi variables correctly placed).
+
+The main concept used in this algorithm is the "Factored Redundancy Graph" (or
+FRG for short). Basically, for each given expression, this graph shows all its
+occurrences in the program, and the redundant ones are linked to their
+"representative occurrence" (the 1st occurrence met in a traversal of the CFG).
+The central observation is that the FRG is "factored" because each expression
+occurrence has exactly one representative occurrence, in the same way as the
+SSA form is a "factored" use-definition graph (each use is related to exactly
+one definition).
+And in fact building the FRG is much like building an SSA representation, with
+PHI nodes and all.
+By the way, I use "PHI" for "PHI expression occurrences", and "phi" for the
+usual phi definitions in SSA, because the paper uses an uppercase phi greek
+letter for PHI occurrences (while the lowercase one is standard in SSA).
+
+The really interesting point is that the FRG for a given expression has
+exactly the same "shape" as the use-definition graph of the temporary that
+must be introduced to remove the redundancy, and this is why SSAPRE can
+easily emit its output code in correct SSA form.
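+
+As an illustration (the "h" numbering of the redundancy classes and the "t"
+versions of the temporary are mine, just to show the parallelism):
+
+int PartialRedundancy (int a, int b) {
+  int v1;
+
+  if (a >= 0) {
+    v1 = a + b;  // occurrence [h1], saved into temporary version t1
+  } else {
+    v1 = 0;      // [h2]: occurrence inserted here, saved into t2
+  }
+  // PHI for "a + b" at the head of BB3: [h3] = PHI ([h1], [h2]),
+  // with the same shape as the phi of the temporary: t3 = phi (t1, t2)
+  int v2 = a + b;  // occurrence [h3]: redundant, becomes a use of t3
+  return v1 + v2;
+}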
+
+One particular characteristic of the SSAPRE algorithm is that it is "sparse",
+in the sense that it operates on each expression individually, looking only
+at the specific nodes it needs. This is in contrast with the classical way of
+solving data flow analysis problems, which is to operate globally, using data
+structures that work "in parallel" on all the entities they describe (in
+practice bit sets, with for instance one bit per variable, or one bit per
+expression occurrence, you get the idea). This is handy (it exploits the
+parallelism of bit operations), but in general setting up those data
+structures is time consuming, and if the number of things represented by
+the bits is not fixed in advance the code can get quite messy (the bit sets
+must become growable).
+Moreover, applying special handling to individual expressions becomes very
+tricky.
+
+SSAPRE, on the other hand, looks at the whole program (method in our case) only
+once, when it scans the code to collect (and classify) all expression
+occurrences.
+From here on, it operates on one expression at a time, looking only at its
+specific data structures (which, for each expression, are much smaller than
+the whole program, so the operations are fast).
+
+This approach has another advantage: the data structures used to operate on
+one expression can be recycled when operating on other expressions, lowering
+the memory usage of the compiler and (also important) avoiding time wasted
+in memory allocations.
+This reflects directly on the design of those data structures.
+We can better see these advantages by following which data structures are
+used during the application of SSAPRE to a method.
+
+The steps that are performed are the following:
+
+ * Initialization: scan the whole program, collect all the occurrences, and
+ build a worklist of expressions to be processed (each worklist entry
+ describes all the occurrences of the given expression).
+Here the data structures are the following (see the sketch after this list):
+- One struct (the working area), containing the worklist and other
+  "global" data. The worklist itself contains an entry for each expression,
+  which in turn has an entry for each of its occurrences.
+- One "info" struct for each BB, containing interesting dominance and CFG
+  related properties of the BB.
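+
+A minimal sketch of how these structures could look (all the names here are
+invented for illustration, they are not the real ones in ssapre.c; MonoInst
+and MonoBasicBlock are the usual mini types):
+
+typedef struct {
+  int dfn;  /* depth first number of the BB, as an example... */
+  /* ...other dominance and CFG related properties, plus the PHI and
+   * PHI argument slots described at the end of this section */
+} BBInfo;
+
+typedef struct _ExpressionOccurrence {
+  MonoInst *inst;                      /* the occurrence itself, in the code */
+  MonoBasicBlock *bb;                  /* the BB containing it */
+  struct _ExpressionOccurrence *next;  /* next occurrence of this expression */
+} ExpressionOccurrence;
+
+typedef struct _WorklistEntry {
+  ExpressionOccurrence *occurrences;   /* all occurrences of this expression */
+  struct _WorklistEntry *next;         /* next expression to process */
+} WorklistEntry;
+
+typedef struct {
+  WorklistEntry *worklist;  /* one entry per expression */
+  BBInfo *bb_infos;         /* one info per BB */
+  /* ...other "global" data... */
+} WorkingArea;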
+
+Then, for each entry in the worklist, these operations are performed:
+
+ * PHI placement: find where the PHI nodes of the FRG must be placed.
+ * Renaming: assign appropriate "redundancy classes" to all occurrences (it
+ is like assigning variable versions when building an SSA form).
+ * Analyze: compute various flags in PHI nodes (which are the only places
+ that define where additional computations may be added).
+ This is conceptually composed of two data flow analysis passes which, in
+ practice, only scan the PHI nodes of the FRG (not the whole code), so they
+ are not that heavy.
+ * Finalize: make it so that the FRG is exactly like the use-def graph of the
+ temporary that will be introduced (it generally must be reshaped a bit
+ according to the flags computed in the previous step).
+ This is also made of two steps, but more for implementation reasons than
+ for conceptual ones.
+ * Code motion: actually update the code using the FRG.
+Here, what's needed is the following:
+- PHI occurrences (and some flags about them)
+- PHI argument occurrences (and some flags about them)
+- The renaming stack
+In practice, one can observe that each BB can have at most one PHI (we work
+on one expression at a time), and also at most one PHI argument (which we
+consider as occurring at the end of the BB). Therefore, we do not have
+separate structures for these, but store them directly in the BB infos
+(which are kept for the whole SSAPRE invocation).
+The renaming stack is managed directly as a list of occurrences, with
+special handling for PHI nodes (which, being represented directly by their
+BB, are not "occurrences").
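+
+Putting it all together, the overall structure of the pass could be sketched
+like this (again, the function names are invented, not the real ones):
+
+void ssapre (MonoCompile *cfg) {
+  WorkingArea area;
+  WorklistEntry *expression;
+
+  /* Initialization: scan the code once, collect and classify occurrences */
+  collect_occurrences (cfg, &area);
+
+  for (expression = area.worklist; expression != NULL; expression = expression->next) {
+    place_phis (&area, expression);          /* PHI placement */
+    rename_occurrences (&area, expression);  /* assign redundancy classes */
+    analyze (&area, expression);             /* flags in PHI nodes (two passes) */
+    finalize (&area, expression);            /* reshape the FRG as a use-def graph */
+    code_motion (&area, expression);         /* actually rewrite the code */
+    /* the per expression structures can now be recycled */
+  }
+}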
+
+
+So far, the only two things missing (with respect to SSAPRE as described in
+the paper) are the elimination of unneeded PHIs and the handling of
+"composite" expressions.
+Otherwise, the implementation is complete.
+
+Other interesting issues are:
+- SSAPRE assumes that:
+ - each SSA variable is related to one "original" (not SSA) variable, and
+ - no more than one version of each original variable is live at the same
+ time in the CFG.
+ It would be better to relax these assumptions.
+- SSAPRE operates on "syntactic"