The loop hijack worker routine was not honoring its contract: the runtime is not allowed to trash any registers in our worker (except r12 on ARM). The two main oversights were the scratch FP registers and the flags register.
I have also added a per-module map from loop index to target address (hence all the new shash.h includes). This primarily helps gcstress throughput, because the loop indirection cell address calculation turns out to be surprisingly lengthy. I considered the other obvious approach of "back-patching" the loop indirection cell in the gcstress case (normal loop hijacking does this, but under gcstress we do not). However, I ended up preferring the map because it could also help GC suspension latency in normal operation.
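A minimal sketch of the idea behind the per-module map, assuming a plain std::unordered_map guarded by a mutex in place of the runtime's SHash from shash.h, whose API is not reproduced here. LoopTargetCache, Lookup, and the slow-path callback are hypothetical names; the callback stands in for the lengthy indirection cell address calculation, which then runs at most once per loop index.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <unordered_map>

// Hypothetical sketch: a per-module cache from loop index to the loop's
// target address. The change itself pulls in shash.h; std::unordered_map
// is substituted here for illustration only.
class LoopTargetCache {
public:
    // Stand-in for the lengthy indirection cell address calculation.
    using SlowPath = std::uintptr_t (*)(std::uint32_t loopIndex);

    explicit LoopTargetCache(SlowPath slow) : m_slow(slow) {
        assert(m_slow != nullptr);
    }

    // Returns the target address for loopIndex, running the slow path
    // only the first time that index is seen.
    std::uintptr_t Lookup(std::uint32_t loopIndex) {
        std::lock_guard<std::mutex> hold(m_lock);
        auto it = m_map.find(loopIndex);
        if (it != m_map.end())
            return it->second;              // cached: skip the slow path

        std::uintptr_t target = m_slow(loopIndex);
        ++m_slowCalls;
        m_map.emplace(loopIndex, target);
        return target;
    }

    // Number of times the slow path has run (handy for checking the cache).
    std::size_t SlowPathCalls() const {
        std::lock_guard<std::mutex> hold(m_lock);
        return m_slowCalls;
    }

private:
    SlowPath m_slow;
    mutable std::mutex m_lock;
    std::unordered_map<std::uint32_t, std::uintptr_t> m_map;
    std::size_t m_slowCalls = 0;
};
```

A real runtime would probably want a lock-free read path on this hot lookup; the mutex only keeps the sketch simple and obviously correct.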
[tfs-changeset: 1573401]
This change enables compiling the runtime, excluding the PAL
layer, on Linux.
Most of the changes just make it build with clang, which is
stricter about conformance to the C++11 standard.
In addition, I have removed our implementation of the
new / delete operators and replaced all calls to new in the
runtime with new (nothrow).
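For illustration, a small sketch of what a call site looks like after switching to new (nothrow): the non-throwing form returns nullptr on allocation failure instead of throwing std::bad_alloc, so each caller has to check the result and release any partial allocation itself. ModuleInfo, CreateModuleInfo, and DestroyModuleInfo are hypothetical names, not runtime types.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical runtime structure, used only to show the allocation pattern.
struct ModuleInfo {
    std::size_t loopCount;
    void**      loopTargets;
};

// Allocates with new (std::nothrow); returns nullptr on out-of-memory
// rather than throwing std::bad_alloc.
ModuleInfo* CreateModuleInfo(std::size_t loopCount) {
    ModuleInfo* info = new (std::nothrow) ModuleInfo{loopCount, nullptr};
    if (info == nullptr)
        return nullptr;

    if (loopCount != 0) {
        // The trailing () value-initializes every slot to nullptr.
        info->loopTargets = new (std::nothrow) void*[loopCount]();
        if (info->loopTargets == nullptr) {
            delete info;                 // undo the partial allocation
            return nullptr;
        }
    }
    assert(loopCount == 0 || info->loopTargets != nullptr);
    return info;
}

void DestroyModuleInfo(ModuleInfo* info) {
    if (info == nullptr)
        return;
    delete[] info->loopTargets;          // delete[] on nullptr is a no-op
    delete info;
}
```

The cost of this style is that every allocation site needs an explicit failure path, which is why the check and cleanup in CreateModuleInfo are spelled out rather than left to an exception handler.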