github.com/mono/corert.git
author     dotnet bot <dotnet-bot@microsoft.com>  2016-04-14 23:59:59 +0300
committer  Michal Strehovský <MichalStrehovsky@users.noreply.github.com>  2016-04-14 23:59:59 +0300
commit     9ed921ffded2f76e55e9a24fd8b8c9d3d2570c43
tree       62a773a7c336793f6b17f4be98c1ebbbd99d82f5 /src/Native/Runtime/i386
parent     9804f42dce998498b693915ff1b2051478dadc97
Improve Reverse PInvoke performance in ProjectN (#1138)
Recently, while experimenting with interop performance micro-benchmarks, I observed that the Reverse PInvoke code path in ProjectN is 8-10x slower than in Desktop CLR. Morgan also noticed the slowdown, along with extra memory allocation on this code path, during the Unity app investigation. This change aims to bring this code path to parity with Desktop CLR.

This change improves Reverse PInvoke performance by:
1. Removing the dictionary in McgModuleManager that maps thunk addresses to delegates. Instead, a weak GCHandle to the delegate is stored in the thunk data section.
2. Optimizing open static delegates by storing the static function pointer directly in the thunk data; the reverse PInvoke code path then does a CallI directly on that function pointer. MCG was modified to generate a special function to handle open static delegates.
3. When storing the function pointer, storing the jump stub's code target address. A runtime helper was added to get the jump stub target.
4. Reordering some instructions in the RhpReversePInvoke and InteropNative_CommonStub functions so that the hot path gets better instruction cache utilization.

Results:
- x86: 33% slower than Desktop CLR (75 vs. 92 instructions), 5.5x faster than the latest ProjectN
- AMD64: 9% faster than Desktop CLR (54 vs. 67 instructions), 7.5x faster than the latest ProjectN

[tfs-changeset: 1596255]
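Items 1 to 3 can be illustrated with a small C sketch. It assumes a hypothetical per-thunk data slot (`ThunkData`) holding either a delegate reference or, for open static delegates, a raw function pointer, so dispatch needs no dictionary lookup keyed by thunk address. A plain pointer stands in for the weak GCHandle, and `Delegate`, `DispatchReversePInvoke`, and `ResolveJumpStubTarget` are illustrative names, not CoreRT's actual types or helpers. The jump stub part reads item 3 as "follow the stub to its real target" and assumes the stub begins with an x86 `jmp rel32` (opcode `0xE9`), whose target is the address of the next instruction plus the signed 32-bit displacement.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy delegate: an invoke function plus one captured value (illustrative). */
typedef struct Delegate {
    int (*invoke)(const struct Delegate *self, int arg);
    int captured;
} Delegate;

typedef int (*StaticTarget)(int arg);

/* Hypothetical per-thunk data slot. A plain pointer stands in for the weak
   GCHandle; a real runtime would resolve the handle through the GC. */
typedef struct {
    const Delegate *weakDelegate;  /* general path: delegate reached via its handle */
    StaticTarget    staticTarget;  /* fast path: non-NULL only for open static delegates */
} ThunkData;

/* Dispatch reads everything from the thunk's own data slot;
   there is no dictionary lookup keyed by thunk address. */
static int DispatchReversePInvoke(const ThunkData *data, int arg) {
    if (data->staticTarget != NULL)
        return data->staticTarget(arg);   /* analogous to a direct CallI */
    return data->weakDelegate->invoke(data->weakDelegate, arg);
}

/* Assumes the stub starts with x86 `jmp rel32` (0xE9 + little-endian rel32).
   Target = address of the next instruction (stub + 5) + rel32.
   Returns the stub address unchanged if it is not such a jump. */
static uintptr_t ResolveJumpStubTarget(const uint8_t *stub) {
    if (stub[0] != 0xE9)
        return (uintptr_t)stub;
    uint32_t raw = (uint32_t)stub[1] | ((uint32_t)stub[2] << 8) |
                   ((uint32_t)stub[3] << 16) | ((uint32_t)stub[4] << 24);
    int32_t rel = (int32_t)raw;           /* signed displacement */
    return (uintptr_t)stub + 5 + (uintptr_t)(intptr_t)rel;
}
```

Storing the resolved target rather than the stub address means the hot path's indirect call lands directly on the final code instead of bouncing through an extra `jmp`.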
Diffstat (limited to 'src/Native/Runtime/i386')
-rw-r--r--  src/Native/Runtime/i386/PInvoke.asm  30
1 files changed, 16 insertions, 14 deletions
diff --git a/src/Native/Runtime/i386/PInvoke.asm b/src/Native/Runtime/i386/PInvoke.asm
index 219387ad1..2e0c0ca9d 100644
--- a/src/Native/Runtime/i386/PInvoke.asm
+++ b/src/Native/Runtime/i386/PInvoke.asm
@@ -135,21 +135,8 @@ ThreadAttached:
;; 2) Performing a managed delegate invoke on a reverse pinvoke delegate.
;;
cmp dword ptr [edx + OFFSETOF__Thread__m_pTransitionFrame], 0
- jne ValidTransition
+ je CheckBadTransition
- ;; Allow 'bad transitions' in when the TSF_DoNotTriggerGc mode is set. This allows us to have
- ;; [NativeCallable] methods that are called via the "restricted GC callouts" as well as from native,
- ;; which is necessary because the methods are CCW vtable methods on interfaces passed to native.
- test dword ptr [edx + OFFSETOF__Thread__m_ThreadStateFlags], TSF_DoNotTriggerGc
- jz BadTransition
-
- ;; zero-out our 'previous transition frame' save slot
- mov dword ptr [eax], 0
-
- ;; nothing more to do
- jmp AllDone
-
-ValidTransition:
; Save previous TransitionFrame prior to making the mode transition so that it is always valid
; whenever we might attempt to hijack this thread.
mov ecx, [edx + OFFSETOF__Thread__m_pTransitionFrame]
@@ -164,6 +151,21 @@ AllDone:
pop edx ; restore arg reg
pop ecx ; restore arg reg
ret
+
+CheckBadTransition:
+ ;; Allow 'bad transitions' in when the TSF_DoNotTriggerGc mode is set. This allows us to have
+ ;; [NativeCallable] methods that are called via the "restricted GC callouts" as well as from native,
+ ;; which is necessary because the methods are CCW vtable methods on interfaces passed to native.
+ test dword ptr [edx + OFFSETOF__Thread__m_ThreadStateFlags], TSF_DoNotTriggerGc
+ jz BadTransition
+
+ ;; zero-out our 'previous transition frame' save slot
+ mov dword ptr [eax], 0
+
+ ;; nothing more to do
+ jmp AllDone
+
+
AttachThread:
;;
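The diff inverts the branch after the `m_pTransitionFrame` check. Previously, the common case (a non-zero transition frame) took `jne ValidTransition` to jump over the rarely executed bad-transition block; now that case falls straight through, and only the rare case branches to `CheckBadTransition`, which is moved out of line after `ret`. The C sketch below mirrors that control flow under stated assumptions: `Thread`, its two fields, the `TSF_DoNotTriggerGc` value, and the `EnterResult` enum are illustrative stand-ins, and the mode switch that follows the frame save in the real assembly is omitted.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-ins; the real Thread layout and flag value differ. */
#define TSF_DoNotTriggerGc 0x1u

typedef struct {
    void    *pTransitionFrame;   /* mirrors m_pTransitionFrame */
    uint32_t threadStateFlags;   /* mirrors m_ThreadStateFlags */
} Thread;

typedef enum { ENTER_NORMAL, ENTER_TOLERATED, ENTER_BAD } EnterResult;

/* Rare path, kept out of line like the CheckBadTransition block. */
static EnterResult CheckBadTransition(const Thread *t, void **savedFrame) {
    if ((t->threadStateFlags & TSF_DoNotTriggerGc) == 0)
        return ENTER_BAD;                     /* jz BadTransition */
    *savedFrame = NULL;                       /* zero-out the save slot */
    return ENTER_TOLERATED;                   /* jmp AllDone */
}

static EnterResult EnterReversePInvoke(const Thread *t, void **savedFrame) {
    if (t->pTransitionFrame == NULL)          /* je CheckBadTransition */
        return CheckBadTransition(t, savedFrame);
    /* Hot path falls through: save the previous transition frame.
       The mode transition that follows in the assembly is omitted. */
    *savedFrame = t->pTransitionFrame;
    return ENTER_NORMAL;
}
```

The behavior is unchanged; only the layout differs, so the frequently executed path is a straight line with no taken branch.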