2006-03-31 Massimiliano Mantione <massi@ximian.com>

* docs/tree-mover.txt: Added tree mover documentation. svn path=/trunk/mono/; revision=58830
author: Massimiliano Mantione <massi@mono-cvs.ximian.com> 2006-03-31 10:59:23 +0400
committer: Massimiliano Mantione <massi@mono-cvs.ximian.com> 2006-03-31 10:59:23 +0400
commit: b37add239c289a64f1d54cb0f367d2e081c0984f (patch)
tree: d7659abcba0bfebe93f8839026d42f35d4825467 /docs/tree-mover.txt
parent: 693aaf3bcc01559d2f2f33fa8c67ed8d96634c90 (diff)
1 files changed, 261 insertions, 0 deletions
diff --git a/docs/tree-mover.txt b/docs/tree-mover.txt
new file mode 100644
index 00000000000..3ee836a5b2f
--- /dev/null
+++ b/docs/tree-mover.txt
@@ -0,0 +1,261 @@
+
+Purpose
+
+Especially when inlining is active, it can happen that temporary
+variables add pressure to the register allocator, producing bad
+code.
+
+The idea is that some of these temporaries can be totally eliminated
+my moving the MonoInst tree that defines them directly to the use
+point in the code (so the name "tree mover").
+
+Please note that this is *not* an optimization: it is mostly a
+workaround to issues we have in the regalloc.
+Actually, with the new linear IR this will not be possible at all
+(there will be no more trees in the code!).
+Anyway, this workaround turns out to be useful in the current state
+of things...
+
+-----------------------------------------------------------------------
+
+Base logic
+
+If a local is defined by a value which is a proper expression (a tree
+of MonoInst, not just another local or a constant), and this definition
+is used only once, the tree can be moved directly to the use location,
+and the definition eliminated.
+Of course, none of the variables used in the tree must be defined in
+the code path between the definition and the use, and the tree must be
+free of side effects.
+We do not handle the cases when the tree is just a local or a constant
+because they are handled by copyprop and consprop, respectively.
+
+To make things simpler, we restrict the tree move to the case when:
+- the definition and the use are in the same BB, and
+- the use is followed by another definition in the same BB (it is not
+  possible that the 1st value is used again), or alternatively there
+  is no BB in the whole CFG that contains a use of this local before a
+  definition (so, again, there is no code path that can lead to a
+  subsequent use).
+
+To handle this, we maintain an ACT array (Available Copy Tree, similar
+to the ACP), where we store the "state" of every local.
+Ideally, every local can be in the following state:
+[E] Undefined (by a tree, it could be in the ACP but we don't care).
+[D] Defined (by a tree), and waiting for a use.
+[U] Used, with a tree definition available in the same BB, but still
+    without a definition following the use (always in the same BB).
+Of course state [E] (empty) is the initial one.
+
+Besides, there are two sort of "meta states", or flags:
+[W] Still waiting for a use or definition in this BB (we have seen no
+    occurrence of the local yet).
+[X] Used without being previously defined in the same BB (note that if
+    there is a definition that precedes the use in the same BB, even if
+    the definition is not a tree or is not available because of side
+    effects or because the tree value has changed the local is not in
+    state [X]).
+Also note that state [X] is a sort of "global" condition, which if set
+in one BB will stay valid for the whole CFG, even if the local will
+otherwise change state. The idea of flagging a local as [X] is that if
+there is a definition/use pair that reaches the end of a BB, it could
+be that there is a CFG path that then leads to the BB flagging it as
+[X] (which contains a use), so the tree cannot be moved.
+So state [X] will always be set, and never examined in all the state
+transitions we will describe.
+In practice, we use flag [W] to set state [X]: if, when traversing a
+BB, we find a use for a local in state [W], then that local is flagged
+[X].
+
+
+For each BB, we initialize all states to [E] and [W], and then we
+traverse the code one inst at a time, and update the variable states
+in the ACT in the following ways:
+
+[Definition]
+  - Flag [W] is cleared.
+  - All "affected trees" are killed (go from state [D] to [E]).
+    The "affected trees" are the trees which contain (use) the defined
+    local, and the rationale is that the tree value changed, so the
+    tree is no longer available.
+  - If the local was in state [U], *that* tree move is marked "safe"
+    (because *this* definition makes us sure that the previous tree
+    cannot be used again in any way).
+    The idea is that "safe" moves can happen even if the local is
+    flagged [X], because the second definition "covers" the use.
+    The tree move is then saved in the "todo" list (and the affecting
+    nodes are cleared).
+  - If the local was defined by a tree, it goes to state [D], the tree
+    is recorded, and all the locals used in it are marked as "affecting
+    this tree" (of course these markers are lists, because each local
+    could affect more than one tree).
+
+[IndirectDefinition]
+  - All potentially affected trees (in state [D]) are killed.
+
+[Use]
+  - If the local is still [W], it is flagged [X] (the [W] goes away).
+  - If the local is in state [D], it goes to state [U].
+    The tree move must not yet be recorded in the "todo" list, it still
+    stays in the ACT slot belonging to this local.
+    Anyway, the "affecting" nodes are updated, because now a definition
+    of a local used in this tree will affect only "indirect" (or also
+    "propagated") moves, but not *this* move (see below).
+  - If the local is in state [U], then the tree cannot be moved (it is
+    used two times): the move is canceled, and the state goes [E].
+  - If the local is in state [E], the use is ignored.
+
+[IndirectUse]
+  - All potentially affected trees (in state [D] or [U]) are killed.
+
+[SideEffect]
+  - Tree is marked as "unmovable".
+
+Then, at the end of the BB, for each ACT slot:
+  - If state is [U], the tree move is recorded in the "todo" list, but
+    flagged "unsafe".
+  - Anyway, state goes to [E], the [W] flag is set, and all "affecting"
+    lists are cleared (we get ready to traverse the next BB).
+Finally, when all BBs has been scanned, we traverse the "todo" list,
+moving all "safe" entries, and moving "unsafe" ones only if their ACT
+slot is not flagged [X].
+
+So far, so good.
+But there are two issues that make things harder :-(
+
+The first is the concept of "indirect tree move".
+It can happen that a tree is scheduled for moving, and its destination
+is a use that is located in a second tree, which could also be moved.
+The main issue is that a definition of a variable of the 1st tree on
+the path between the definition and the use of the 2nd one must prevent
+the move.
+But which move? The 1st or the 2nd?
+Well, any of the two!
+The point is, the 2nd move must be prevented *only* if the 1st one
+happens: if it is aborted (for an [X] flag or any other reason), the
+2nd move is OK, and vice versa...
+We must handle this in the following way:
+- The ACT must still remember if a slot is scheduled for moving in
+  this BB, and if it is, all the locals used in the tree.
+  We say that the slot is in state [M].
+  Note that [M] is (like [X] and [W]) a sort of "meta state": a local
+  is flagged [M] when it goes to state [U], and the flag is cleared
+  when the tree move is cancelled
+- A tree that uses a local whose slot is in state [M] is also using all
+  the locals used by the tree in state [M], but the use is "indirect".
+  These use nodes are also included in the "affecting" lists.
+- The definition of a variable used in an "indirect" way has the
+  effect of "linking" the two involved tree moves, saying that only one
+  of the two can happen in practice, but not both.
+- When the 2nd tree is scheduled for moving, the 1st one is *still* in
+  state [M], because a third move could "carry it forward", and all the
+  *three* moves should be mutually exclusive (to be safe!).
+
+The second tricky complication is the "tree forwarding" that can happen
+when copyprop is involved.
+It is conceptually similar to the "indirect tree move".
+Only, the 2nd tree is not really a tree, it is just the local defined
+in the 1st tree move.
+It can happen that copyprop will propagate the definition.
+We cannot make treeprop do the same job of copyprop, because copyprop
+has less constraints, and is therefore more powerful in its scope.
+The main issue is that treeprop cannot propagate a tree to *two* uses,
+while copyprop is perfectly capable of propagating one definition to
+two (or more) different places.
+So we must let copyprop do its job otherwise we'll miss optimizations,
+but we must also make it play safe with treeprop.
+Let's clarify with an example:
+  a = v1 + v2; //a is defined by a tree, state [D], uses v2 and v2
+  b = a; //a is used, state [U] with move scheduled, and
+         //b is defined by a, ACP[b] is a, and b is in state [DC]
+  c = b + v3; // b is used, goes to state [U]
+The real trouble is that copyprop happens *immediately*, while treeprop
+is deferred to the end of the CFG traversal.
+So, in the 3rd statement, the "b" is immediately turned into an "a" by
+copyprop, regardless of what treeprop will do.
+Anyway, if we are careful, this is not so bad.
+First of all, we must "accept" the fact that in the 3rd statement the
+"b" is in fact an "a", as treeprop must happen *after* copyprop.
+The real problem is that "a" is used twice: in the 2nd and 3rd lines.
+In our usual setup, the 2nd line would set it to [U], and the 3rd line
+would kill the move (and set "a" to [E]).
+I have tried to play tricks, and reason as of copyprop didn't happen,
+but everything becomes really messy.
+Instead, we should note that the 2nd line is very likely to be dead.
+At least in this BB, copyprop will turn all "b"s into "a"s as long as
+it can, and when it cannot, it will be because either "a" or "b" have
+been redefined, which would be after the tree move anyway.
+So, the reasoning gets different: let's pretend that "b" will be dead.
+This will make the "a" use in the 2nd statement useless, so there we
+can "reset" "a" to [D], but also take note that if "b" will end up
+not being dead, the tree move associated to this [D] must be aborted.
+We can detect this in the following way:
+- Either "b" is used before being defined in this BB, or
+- It will be flagged "unsafe".
+Both things are very easy to check.
+The only quirk is that the "affecting" lists must not be cleared when
+a slot goes to state [U], because a "propagation" could put it back
+to state [D] (where those lists are needed, because it can be killed
+by a definition to a used slot).
+
+-----------------------------------------------------------------------
+
+Implementation notes
+
+All the implementation runs inside the existing mono_local_cprop
+function, and a separate memory pool is used to hold the temporary
+data.
+
+A struct, MonoTreeMover, contains the pointers to the pool, the ACT,
+the list of scheduled moves and auxiliary things.
+This struct is allocated if the tree move pass is requested, and is
+then passed along to all the involved functions, which are therefore
+aware of the tree mover state.
+
+The ACT is an array of slots, obviously one per local.
+Each slot is of type MonoTreeMoverActSlot, and contains the used and
+affected locals, a pointer to the pending tree move and the "waiting"
+and "unsafe" flags.
+
+The "affecting" lists a built from "dependency nodes", of type
+MonoTreeMoverDependencyNode.
+Each of the nodes contains the used and affected local, and is in
+two lists: the locals used by a slot, and the locals affected by a
+slot (obviously a different one).
+So, each node means: "variable x is used in tree t, so a definition
+of x affects tree t".
+The "affecting" lists are doubly linked, to allow for O(1) deletion.
+The "used" lists are simply linked, but when they are mantained there
+is always a pointer to the last element to allow for O(1) list moving.
+When a used list is dismissed (which happens often, any time a node is
+killed), its nodes are unlinked from their respective affecting lists
+and are then put in a "free" list in the MonoTreeMover to be reused.
+
+Each tree move is represented by a struct (MonoTreeMoverTreeMove),
+which contains:
+- the definition and use points,
+- the "affected" moves (recall the concept of "indirect tree move"),
+- the "must be dead" slots (recall "tree forwarding"). and
+- a few utility flags.
+The tree moves stays in the relevant ACT slot until it is ready to be
+scheduled for moving, at which point it is put in a list in the
+MonoTreeMover.
+The tree moves structs are reused when they are killed, so there is
+also a "free" list for them in the MonoTreeMover.
+
+The tree mover code has been added to all the relevant functions that
+participate in consprop and copyprop, particularly:
+- mono_cprop_copy_values takes care of variable uses (transitions from
+  states [D] to [U] and [U] to [E] because of killing),
+- mono_cprop_invalidate_values takes care of side effects (indirect
+  accesses, calls...),
+- mono_local_cprop_bb sets up and cleans the traversals for each BB,
+  and for each MonoInst it takes care of variable definitions.
+To each of them has been added a MonoTreeMover parameter, which is not
+NULL if the tree mover is running.
+After mono_local_cprop_bb has run for all BBs, the MonoTreeMover has
+the list of all the pending moves, which must be walked to actually
+perform the moves (when possible, because "unsafe" flags, "affected"
+moves and "must be dead" slots can still have their effects, which
+must be handled now because they are fully known only at the end of
+the CFG traversal).
author	Massimiliano Mantione <massi@mono-cvs.ximian.com>	2006-03-31 10:59:23 +0400
committer	Massimiliano Mantione <massi@mono-cvs.ximian.com>	2006-03-31 10:59:23 +0400
commit	b37add239c289a64f1d54cb0f367d2e081c0984f (patch)
tree	d7659abcba0bfebe93f8839026d42f35d4825467 /docs/tree-mover.txt
parent	693aaf3bcc01559d2f2f33fa8c67ed8d96634c90 (diff)