docs/abc-removal.txt


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302


Here "abc" stays for "array bounds check", or "array bound checks", or
some combination of the two...

------------------------------------------------------------------------------
USAGE

Simply use the "abcrem" optimization invoking mono.

To see if bound checks are actually removed, use "mono -v" and grep for
"ARRAY-ACCESS" in the output, there should be a message for each check
that has been removed.

To trace the algorithm execution, use "-v -v -v", and be prepared to be
totally submersed by debugging messages...

------------------------------------------------------------------------------
EFFECTIVENESS

The abc removal code can now always remove bound checks from "clean"
array scans in loops, and generally anyway there are clear conditions that
already state that the index is "safe".

To be clearer, and give an idea of what the algorithm can and cannot do
without describing it in detail... keep in mind that only "redunant" checks
cannot be removed. By "redundant", I mean "already explicitly checked" in
the method code.

Unfortunately, analyzing complex expressions is not so easy (see below for
details), so the capabilities of this "abc remover" are limited.

These are simple guidelines:
- Only expressions of the following kinds are handled:
  - constant
  - variable [+/- constant]
- Only comparisons between "handled" expressions are understood.
- "switch" statements are not yet handled.

This means that code like this will be handled well:

for (int i = 0; i < a.Length; i++) {
	a [i] = .....
}

The "i" variable could be declared out of the "for", the "for" could be a
"while", and maybe even implemented with "goto", the array could be scanned
in reverse order, and everything would still work.
I could have a temporary variable storing the array length, and check on
it inside the loop, and the abc removal would still occurr, like this:

int i = 0;
int l = a.Length;
while ( i < l ) {
	a [i] ......
}

or this:

int l = a.Length;
for (int i = l; i > 0; i--) {
	a [i] = .....
}

The following two examples would work:

for (int i = 0; i < (a.Length -1); i++) .....
for (int i = 0; i < a.Length; i += 2) .....

But just something like this

int delta = -1;
for (int i = 0; i < (a.Length + delta); i++) .....

or like this

int delta = +2;
for (int i = 0; i < a.Length; i += delta) .....

would not work, the check would stay there, sorry :-(
(unless a combination of cfold, consprop and copyprop is used, too, which
would make the constant value of "delta" explicit).

Just to make you understand how things are tricky... this would work!

int limit = a.Length - 1;
for (int i = 0; i < limit; i++) {
	a [i] = .....
}


A detailed explanation of the reason why things are done like this is given
below...

------------------------------------------------------------------------------
THE ALGORITHM

This array bound check removal (abc removal) algorithm is based on
symbolic execution concepts. I handle the removal like an "unreachable
code elimination" (and in fact the optimization could be extended to remove
also other unreachable sections of code, due to branches that "always go the
same way").

In symbolic execution, variables do not have actual (numeric) values, but
instead symbolic expressions (like "a", or "x+y").
Also, branch conditions are handled like symbolic conditions (like "i<k"),
which state relations between variable values.

The SSA representation inside mini is somewhat close to a symbolic
representation of the execution of the compiled method.
Particularly, the representation of variable values is exactly a symbolic
one. It is enough to find all CEE_STIND_* instructions which store to a
local variable, and their second argument is exactly the variable value.
Actually, "cfg->vars [<variable-index>]->def" should contain exactly
those store instructions, and the "get_variable_value_from_ssa_store"
function extracts the variable value from there.

On the other hand, the conditions under which each basic block is
executed are not made fully explicit.

However, it is not difficult to make them so.
Each BB that has more than one exit BB, in practice must end either with
a conditional branch instruction or with a switch instruction.
In the first case, the BB has exactly two exit BBs, and their execution
conditions are easy to get from the condition of the branch (see the
"get_relation_from_branch_instruction" function, and expecially the end of
"analyze_block" in abcremoval.c.
If there is a switch, the jump condition of every exit BB is the equality
of the switch argument with the particular index associated with its case
(but the current implementation does not handle switch statements yet).

These individual conditions are in practice associated to each arc that
connects BBs in the CFG (with the simple case that unconditional branches
have a "TRUE" condition, because they always happen).

So, for each BB, its *proper* entry condition is the union of all the
conditions associated to arcs that enter the BB. The "union" is like a
logical "or", in the sense that either of the condition could be true,
they are not necessarily all true. This means that if I can enter a BB
in two ways, and in one case I know that "x>0", and in the other that
"x==0", actually in the BB I know that "x>=0", which is a weaker
condition (the union of the two).

Also, the *complete* entry condition for a BB is the "intersection" of all
the entry conditions of its dominators. This is true because each dominator
is the only way to reach the BB, so the entry condition of each dominator
must be true if the control flow reached the BB. This translates to the
logical "and" of all the "proper" conditions of the BBs met walking up in the
dominator tree. So, if one says "x>0", and another "x==1", then I know
that "x==1", which is a stronger condition (the intersection of the two).
Note that, if the two conditions were "x>0" and "x==0", then the block would
be unreachable (the intersection is empty), because some branch is impossible.

Another observation is that, inside each BB, every variable is subject to the
complete entry condition of that very same BB, and not the one in which it is
defined (with the "complete entry condition" being the thing I defined before,
sorry if these terms "proper" and "complete" are strange, I found nothing
better).
This happens because the branch conditions are related to the control flow.
I can define "i=a", and if I am in a BB where "a>0", then "i>0", but not
otherwise.

So, intuitively, if the available conditions say "i>=0", and i is used as an
index in an array access, then the lower bound check can be omitted.
If the condition also says "(i>=0)&&(i<array.length)", the abc removal can
occur.

So, a complete solution to the problem of abc removal would be the following:
for each array access, build a system of equations containing:
[1] all the symbolic variable definitions
[2] the complete entry condition of the BB in which the array access occurs
[3] the two "goal functions" ("index >=0" and "index < array.length")
If the system is valid for *each possible* variable value, then the goal
functions are always true, and the abc can be removed.

All this discussion is useful to give a precise specification to the problem
we are trying to solve.
The trouble is that, in the general case, the resulting system of equations
is like a predicate in first order logic, which is semi-decidable, and its
general solution is anyway too complex to be attempted in a JIT compiler
(which should not contain a full fledged theorem prover).

Therefore, we must cut some corner.


By the way, there is also another big problem, which is caused by "recursive"
symbolic definitions. These definition can (and generally do) happen every
time there is a loop. For instance, in the following piece of code

for ( int i = 0; i < array.length; i++ ) {
	Console.WriteLine( "array [i] = " + array [i] );
}

one of the definitions of i is a PHI that can be either 0 or "i + 1".

Now, we know that mathematically "i = i + 1" does not make sense, and in
fact symbolic values are not "equations", they are "symbolic definitions".

The actual symbolic value of i is a generic "n", where "n" is the number of
iterations of the loop, but this is terrible to handle (and in more complex
examples the symbolic value of i simply cannot be written, because i is
calculated in an iterative way).

However, the definition "i = i + 1" tells us something about i: it tells us
that i "grows". So (from the PHI definition) we know that i is either 0, or
"grows". This is enough to tell that "i>=0", which is what we want!
It is important to note that recursive definitions can only occurr inside
PHI definitions, because actually a variable cannot be defined *only* in terms
of itself!


At this point, I can explain which corners I want to cut to make the
problem solvable. It will not remove all the abc that could theoretically
be removed, but at least it will work.

The easiest way to cut corners is to only handle expressions which are
"reasonably simple", and ignore the rest.
Keep in mind that ignoring an expression is not harmful in itself.
The algorithm will be simply "less powerful", because it will ignore
conditions that could have caused to the removal of an abc, but will
not remove checks "by mistake" (so the resulting code will be in any case
correct).

The expressions we handle are the following (all of integer type):
- constant
- variable
- variable + constant
- constant + variable
- variable - constant

And, of course, PHI definitions.
Any other expression causes the introduction of an "any" value in the
evaluation, which makes all values that depend from it unknown as well.

We will call these kind of definitions "summarizable" definitions.

In a first attempt, we can consider only branch conditions that have the
simplest possible form (the comparison of two summarizable expressions).

We can also simplify the effect of variable definitions, keeping only
what is relevant to know: their value range with respect to zero and with
respect to the length of the array we are currently handling.

One particular note on PHI functions: they work (obviously) like the logical
"or" of their definitions, and therefore are equivalent to the "logical or"
of the summarization of their definitions.

About recursive definitions (which, believe me, are the worst thing in all
this mess), we handle only "monotonic" ones. That is, we try to understand
if the recursive definition (which, as we said above, must happen because
of a loop) always "grows" or "gets smaller". In all other cases, we decide
we cannot handle it.


One critical thing, once we have defined all these data structures, is how
the evaluation is actually performed.

In a first attempt I coded a "brute force" approach, which for each BB
tried to examine all possible conditions between all variables, filling
a sort of "evaluation matrix". The problem was that the complexity of this
evaluation was quadratic (or worse) on the number of variables, and that
many wariables were examined even if they were not involved in any array
access.

Following the ABCD paper (http://citeseer.ist.psu.edu/bodik00abcd.html),
I rewrote the algorithm in a more "sparse" way.
Now, the main data structure is a graph of relations between variables, and
each attempt to remove a check performs a traversal of the graph, looking
for a path from the index to the array length that satisfies the properties
"index >= 0" and "index < length". If such a path is found, the check is
removed. It is true that in theory *each* traversal has a complexity which
is exponential on the number of variables, but in practice the graph is not
very connected, so the traversal terminates quickly.


Then, the algorithm to optimize one method looks like this:

[1] Preparation:
    [1a] Build the SSA representation.
    [1b] Prepare the evaluation graph (empty)
    [1b] Summarize each varible definition, and put the resulting relations
         in the evaluation graph
[2] Analyze each BB, starting from the entry point and following the
    dominator tree:
    [2a] Summarize its entry condition, and put the resulting relations
         in the evaluation graph (this is the reason why the BBs are examined
	 following the dominator tree, so that conditions are added to the
	 graph in a "cumulative" way)
    [2b] Scan the BB instructions, and for each array access perform step [3]
    [2c] Process children BBs following the dominator tree (step [2])
    [2d] Remove from the evaluation area the conditions added at step [2a]
         (so that backtracking along the tree the area is properly cleared)
[3] Attempt the removal:
    [3a] Summarize the index expression, to see if we can handle it; there
         are three cases: the index is either a constant, or a variable (with
	 an optional delta) or cannot be handled (is a "any")
    [3b] If the index can be handled, traverse the evaluation area searching
         a path from the index variable to the array length (if the index is
	 a constant, just examine the array length to see if it has some
	 relation with this constant)
    [3c] Use the results of step [3b] to decide if the check is redundant