Welcome to the mirror list, hosted at ThFree Co, Russian Federation.

github.com/dotnet/runtime.git - commit log (newest first)
2021-07-13  Import `cgt.un(op, 0)` as `NE(op, 0)` (#54539)  [SingleAccretion]
* Some minor code modernization
  Use "genActualType" directly as it is now a templated method. Don't create casts to TYP_ULONG: they are identical to casts to TYP_LONG; TYP_ULONG is only relevant for checked casts. Add a TODO on addressing the duplicated logic that upcasts to native int from int32 on 64 bit. Use modern comments. Zero diffs.
* Normalize GT_UN(op, 0) early in importer
  Normalizing this idiom helps downstream optimizations.
* Solve most of the regressions
  In morph, when narrowing the AND operand, only insert casts if necessary; prefer to use "optNarrowTree". Otherwise we end up with redundant register shuffling.
2021-07-13  Do not eliminate casts from FP types (#53667)  [SingleAccretion]
An optimization in morph was deleting casts on the RHS of a narrow store if the cast-to-type was wider than the type being stored. This is only valid for casts from integral types, as the backend does not handle "STOREIND(byte*, double)", nor is there an instruction to go from an XMM register to a narrow memory location on x86/x64. The issue is not reproducible right now because fgMorphCast wraps the casts in question, but it is an invalid IR transformation nonetheless, and similar code in fgMorphSmpOpOptional guards against non-integral sources.
2021-07-13  Fix bad folding (#54722)  [SingleAccretion]
* Always sign-extend from int32 in gtFoldExprConst
  GT_CNS_INT of TYP_INT on 64-bit hosts has an implicit contract: the value returned by IconValue() must "fit" into a 32-bit signed integer, i.e. "static_cast<int>(IconValue()) == IconValue()". gtFoldExprConst was failing to uphold this contract when the target was 32 bit and the host was 64 bit. Fix this by always truncating before calling SetIconValue().
* Add a simple test that reproduces the bad codegen
2021-07-13  Enreg structs x86 windows (#55535)  [Sergey Andreenko]
* Mark more cases as DoNotEnreg before CSE. There are CSE metrics that take into account how many potential enreg locals we have.
* Enable for x86.
2021-07-13  Move the "do not zero-extend setcc" optimization to lower (#53778)  [SingleAccretion]
* Strongly type StoreInd lowering
* Improve clarity of code through the use of helpers
* Move the "do not zero-extend setcc" optimization to lowering
  It is XARCH-specific, and moving it eliminates questionable code that was trying to compensate for CSE changing the store.
* Delete the now-unnecessary copying of the relop type
2021-07-13  Fix more alloc-dealloc mismatches and use-after-scopes (#55420)  [Jeremy Koritzinsky]
* Fix another dynamically-sized allocation to use new/delete instead of the mismatched new[]/delete.
* Fix use-after-scope
* Fix another alloc-dealloc mismatch
* Update src/coreclr/vm/threadstatics.cpp
* Use standard size_t instead of the custom SIZE_T typedef.
* Fix formatting.

Co-authored-by: Jan Kotas <jkotas@microsoft.com>
2021-07-12  Use init rather than ms extension syntax (#55475)  [Adeel Mujahid]
2021-07-10  Improve loop cloning, with debugging improvements (#55299)  [Bruce Forstall]
When loop cloning was creating cloning conditions, it was creating unnecessary bounds checks in some multi-dimensional array index cases. When creating a set of cloning conditions, first a null check is done, then an array length check is done, etc. Thus, the array length expression itself won't fault because we've already done a null check, and a subsequent array index expression won't fault (or need a bounds check) because we've already checked the array length (i.e., we've done a manual bounds check). So, stop creating the unnecessary bounds checks, and mark the appropriate instructions as non-faulting by clearing the GTF_EXCEPT bit.

Note that I did not turn on the code to clear GTF_EXCEPT for array length checks because it leads to negative downstream effects in CSE. Namely, there end up being array length expressions that are identical except for the exception bit. When CSE sees this, it gives up on creating a CSE, which leads to regressions in some cases where we don't CSE the array length expression.

Also, for multi-dimension jagged arrays, when optimizing the fast path, we were not removing as many bounds checks as we could. In particular, we weren't removing outer bounds checks, only inner ones. Add code to handle all the bounds checks.

There are some runtime improvements (measured via BenchmarkDotNet on the JIT microbenchmarks), but also some regressions due, as far as I can tell, to the Intel jcc erratum performance impact. In particular, benchmark ludcmp shows up to a 9% regression due to a `jae` instruction in the hot loop now crossing a 32-byte boundary, because code changes earlier in the function affect instruction alignment. The hot loop itself is exactly the same (modulo register allocation differences). As there is nothing that can be done (without mitigating the jcc erratum), it's "bad luck".

In addition to those functional changes, there are a number of debugging-related improvements:

1. Loop cloning: (a) Improved dumping of cloning conditions and other things, (b) removed an unnecessary member from `LcOptInfo`, (c) converted the `LoopCloneContext` raw arrays to `jitstd::vector` for easier debugging, as clrjit.natvis can be taught to understand them.
2. CSE improvements: (a) Added `getCSEAvailBit` and `getCSEAvailCrossCallBit` functions to avoid multiple hard-codings of these expressions, (b) stopped printing all the details of the CSE dataflow to JitDump; just print the result, (c) added the `optPrintCSEDataFlowSet` function to print the CSE dataflow set in symbolic form, not just the raw bits, (d) added the `FMT_CSE` string to use for formatting CSE candidates, (e) added `optOptimizeCSEs` to the phase structure for JitDump output, (f) removed the unused `optCSECandidateTotal` (a remnant of value-numbered + lexical CSE).
3. Alignment: (a) Moved printing of alignment boundaries from `emitIssue1Instr` to `emitEndCodeGen`, to avoid the possibility of reading an instruction beyond the basic block; also improved the Intel jcc erratum criteria calculations, (b) changed `align` instructions of zero size to have a zero PerfScore throughput number (since they don't generate code), (c) added `COMPlus_JitDasmWithAlignmentBoundaries` to force disasm output to display alignment boundaries.
4. Codegen / emitter: (a) Added the `emitLabelString` function for constructing a string to display for a bound emitter label, and created `emitPrintLabel` to directly print the label, (b) added the `genInsDisplayName` function to create a string for use when outputting an instruction. For xarch, this prepends the "v" for SIMD instructions, as necessary. This is preferable to calling the raw `genInsName` function, (c) for each insGroup, created a debug-only list of basic blocks that contributed code to that insGroup, and display this set of blocks in the JitDump disasm output, with block IDs. This is useful for looking at an IG and finding, in a .dot flow graph visualization, the blocks that contributed to it, (d) removed the unused `instDisp`.
5. Clrjit.natvis: (a) Added support for `jitstd::vector`, `JitExpandArray<T>`, `JitExpandArrayStack<T>`, and `LcOptInfo`.
6. Misc: (a) When compacting an empty loop preheader block with a subsequent block, clear the preheader flag.

## benchmarks.run.windows.x64.checked.mch:

```
Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 25504
Total bytes of diff: 25092
Total bytes of delta: -412 (-1.62% of base)
Total relative delta: -0.31

diff is an improvement.
relative diff is an improvement.
```

<details>
<summary>Detail diffs</summary>

```
Top file improvements (bytes):
 -92 : 14861.dasm (-2.57% of base)
 -88 : 2430.dasm (-0.77% of base)
 -68 : 12182.dasm (-3.82% of base)
 -48 : 24678.dasm (-1.61% of base)
 -31 : 21598.dasm (-5.13% of base)
 -26 : 21601.dasm (-4.57% of base)
 -21 : 25069.dasm (-7.14% of base)
 -16 : 14859.dasm (-1.38% of base)
 -11 : 14862.dasm (-1.35% of base)
 -6 : 21600.dasm (-1.83% of base)
 -5 : 25065.dasm (-0.58% of base)

11 total files with Code Size differences (11 improved, 0 regressed), 1 unchanged.

Top method improvements (bytes):
 -92 (-2.57% of base) : 14861.dasm - LUDecomp:ludcmp(System.Double[][],int,System.Int32[],byref):int
 -88 (-0.77% of base) : 2430.dasm - System.DefaultBinder:BindToMethod(int,System.Reflection.MethodBase[],byref,System.Reflection.ParameterModifier[],System.Globalization.CultureInfo,System.String[],byref):System.Reflection.MethodBase:this
 -68 (-3.82% of base) : 12182.dasm - AssignJagged:second_assignments(System.Int32[][],System.Int16[][])
 -48 (-1.61% of base) : 24678.dasm - Benchstone.BenchI.MulMatrix:Inner(System.Int32[][],System.Int32[][],System.Int32[][])
 -31 (-5.13% of base) : 21598.dasm - Benchstone.BenchI.Array2:Bench(int):bool
 -26 (-4.57% of base) : 21601.dasm - Benchstone.BenchI.Array2:VerifyCopy(System.Int32[][][],System.Int32[][][]):bool
 -21 (-7.14% of base) : 25069.dasm - Benchstone.BenchF.InProd:InnerProduct(byref,System.Double[][],System.Double[][],int,int)
 -16 (-1.38% of base) : 14859.dasm - LUDecomp:build_problem(System.Double[][],int,System.Double[])
 -11 (-1.35% of base) : 14862.dasm - LUDecomp:lubksb(System.Double[][],int,System.Int32[],System.Double[])
 -6 (-1.83% of base) : 21600.dasm - Benchstone.BenchI.Array2:Initialize(System.Int32[][][])
 -5 (-0.58% of base) : 25065.dasm - Benchstone.BenchF.InProd:Test():bool:this

Top method improvements (percentages):
 -21 (-7.14% of base) : 25069.dasm - Benchstone.BenchF.InProd:InnerProduct(byref,System.Double[][],System.Double[][],int,int)
 -31 (-5.13% of base) : 21598.dasm - Benchstone.BenchI.Array2:Bench(int):bool
 -26 (-4.57% of base) : 21601.dasm - Benchstone.BenchI.Array2:VerifyCopy(System.Int32[][][],System.Int32[][][]):bool
 -68 (-3.82% of base) : 12182.dasm - AssignJagged:second_assignments(System.Int32[][],System.Int16[][])
 -92 (-2.57% of base) : 14861.dasm - LUDecomp:ludcmp(System.Double[][],int,System.Int32[],byref):int
 -6 (-1.83% of base) : 21600.dasm - Benchstone.BenchI.Array2:Initialize(System.Int32[][][])
 -48 (-1.61% of base) : 24678.dasm - Benchstone.BenchI.MulMatrix:Inner(System.Int32[][],System.Int32[][],System.Int32[][])
 -16 (-1.38% of base) : 14859.dasm - LUDecomp:build_problem(System.Double[][],int,System.Double[])
 -11 (-1.35% of base) : 14862.dasm - LUDecomp:lubksb(System.Double[][],int,System.Int32[],System.Double[])
 -88 (-0.77% of base) : 2430.dasm - System.DefaultBinder:BindToMethod(int,System.Reflection.MethodBase[],byref,System.Reflection.ParameterModifier[],System.Globalization.CultureInfo,System.String[],byref):System.Reflection.MethodBase:this
 -5 (-0.58% of base) : 25065.dasm - Benchstone.BenchF.InProd:Test():bool:this

11 total methods with Code Size differences (11 improved, 0 regressed), 1 unchanged.
```

</details>

--------------------------------------------------------------------------------

```
Summary of Perf Score diffs:
(Lower is better)

Total PerfScoreUnits of base: 38374.96
Total PerfScoreUnits of diff: 37914.07000000001
Total PerfScoreUnits of delta: -460.89 (-1.20% of base)
Total relative delta: -0.12

diff is an improvement.
relative diff is an improvement.
```

<details>
<summary>Detail diffs</summary>

```
Top file improvements (PerfScoreUnits):
 -220.67 : 24678.dasm (-1.74% of base)
 -99.27 : 14861.dasm (-2.09% of base)
 -66.30 : 21598.dasm (-1.41% of base)
 -18.73 : 2430.dasm (-0.28% of base)
 -18.40 : 21601.dasm (-1.37% of base)
 -9.73 : 25065.dasm (-0.56% of base)
 -9.05 : 14859.dasm (-0.77% of base)
 -5.51 : 21600.dasm (-0.77% of base)
 -4.15 : 12182.dasm (-0.17% of base)
 -3.92 : 14860.dasm (-0.32% of base)
 -3.46 : 25069.dasm (-2.31% of base)
 -1.70 : 14862.dasm (-0.20% of base)

12 total files with Perf Score differences (12 improved, 0 regressed), 0 unchanged.

Top method improvements (PerfScoreUnits):
 -220.67 (-1.74% of base) : 24678.dasm - Benchstone.BenchI.MulMatrix:Inner(System.Int32[][],System.Int32[][],System.Int32[][])
 -99.27 (-2.09% of base) : 14861.dasm - LUDecomp:ludcmp(System.Double[][],int,System.Int32[],byref):int
 -66.30 (-1.41% of base) : 21598.dasm - Benchstone.BenchI.Array2:Bench(int):bool
 -18.73 (-0.28% of base) : 2430.dasm - System.DefaultBinder:BindToMethod(int,System.Reflection.MethodBase[],byref,System.Reflection.ParameterModifier[],System.Globalization.CultureInfo,System.String[],byref):System.Reflection.MethodBase:this
 -18.40 (-1.37% of base) : 21601.dasm - Benchstone.BenchI.Array2:VerifyCopy(System.Int32[][][],System.Int32[][][]):bool
 -9.73 (-0.56% of base) : 25065.dasm - Benchstone.BenchF.InProd:Test():bool:this
 -9.05 (-0.77% of base) : 14859.dasm - LUDecomp:build_problem(System.Double[][],int,System.Double[])
 -5.51 (-0.77% of base) : 21600.dasm - Benchstone.BenchI.Array2:Initialize(System.Int32[][][])
 -4.15 (-0.17% of base) : 12182.dasm - AssignJagged:second_assignments(System.Int32[][],System.Int16[][])
 -3.92 (-0.32% of base) : 14860.dasm - LUDecomp:DoLUIteration(System.Double[][],System.Double[],System.Double[][][],System.Double[][],int):long
 -3.46 (-2.31% of base) : 25069.dasm - Benchstone.BenchF.InProd:InnerProduct(byref,System.Double[][],System.Double[][],int,int)
 -1.70 (-0.20% of base) : 14862.dasm - LUDecomp:lubksb(System.Double[][],int,System.Int32[],System.Double[])

Top method improvements (percentages):
 -3.46 (-2.31% of base) : 25069.dasm - Benchstone.BenchF.InProd:InnerProduct(byref,System.Double[][],System.Double[][],int,int)
 -99.27 (-2.09% of base) : 14861.dasm - LUDecomp:ludcmp(System.Double[][],int,System.Int32[],byref):int
 -220.67 (-1.74% of base) : 24678.dasm - Benchstone.BenchI.MulMatrix:Inner(System.Int32[][],System.Int32[][],System.Int32[][])
 -66.30 (-1.41% of base) : 21598.dasm - Benchstone.BenchI.Array2:Bench(int):bool
 -18.40 (-1.37% of base) : 21601.dasm - Benchstone.BenchI.Array2:VerifyCopy(System.Int32[][][],System.Int32[][][]):bool
 -9.05 (-0.77% of base) : 14859.dasm - LUDecomp:build_problem(System.Double[][],int,System.Double[])
 -5.51 (-0.77% of base) : 21600.dasm - Benchstone.BenchI.Array2:Initialize(System.Int32[][][])
 -9.73 (-0.56% of base) : 25065.dasm - Benchstone.BenchF.InProd:Test():bool:this
 -3.92 (-0.32% of base) : 14860.dasm - LUDecomp:DoLUIteration(System.Double[][],System.Double[],System.Double[][][],System.Double[][],int):long
 -18.73 (-0.28% of base) : 2430.dasm - System.DefaultBinder:BindToMethod(int,System.Reflection.MethodBase[],byref,System.Reflection.ParameterModifier[],System.Globalization.CultureInfo,System.String[],byref):System.Reflection.MethodBase:this
 -1.70 (-0.20% of base) : 14862.dasm - LUDecomp:lubksb(System.Double[][],int,System.Int32[],System.Double[])
 -4.15 (-0.17% of base) : 12182.dasm - AssignJagged:second_assignments(System.Int32[][],System.Int16[][])

12 total methods with Perf Score differences (12 improved, 0 regressed), 0 unchanged.
```

</details>

--------------------------------------------------------------------------------

## coreclr_tests.pmi.windows.x64.checked.mch:

```
Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 25430
Total bytes of diff: 24994
Total bytes of delta: -436 (-1.71% of base)
Total relative delta: -0.42

diff is an improvement.
relative diff is an improvement.
```

<details>
<summary>Detail diffs</summary>

```
Top file improvements (bytes):
 -92 : 194668.dasm (-2.57% of base)
 -68 : 194589.dasm (-3.82% of base)
 -48 : 248565.dasm (-1.61% of base)
 -32 : 249053.dasm (-3.58% of base)
 -31 : 251012.dasm (-5.13% of base)
 -26 : 251011.dasm (-4.57% of base)
 -19 : 248561.dasm (-6.76% of base)
 -16 : 194667.dasm (-1.38% of base)
 -15 : 252241.dasm (-0.72% of base)
 -12 : 252242.dasm (-0.81% of base)
 -11 : 194669.dasm (-1.35% of base)
 -9 : 246308.dasm (-1.06% of base)
 -9 : 246307.dasm (-1.06% of base)
 -9 : 246245.dasm (-1.06% of base)
 -9 : 246246.dasm (-1.06% of base)
 -6 : 228622.dasm (-0.77% of base)
 -6 : 251010.dasm (-1.83% of base)
 -5 : 248557.dasm (-0.61% of base)
 -4 : 249054.dasm (-0.50% of base)
 -4 : 249052.dasm (-0.47% of base)

22 total files with Code Size differences (22 improved, 0 regressed), 1 unchanged.

Top method improvements (bytes):
 -92 (-2.57% of base) : 194668.dasm - LUDecomp:ludcmp(System.Double[][],int,System.Int32[],byref):int
 -68 (-3.82% of base) : 194589.dasm - AssignJagged:second_assignments(System.Int32[][],System.Int16[][])
 -48 (-1.61% of base) : 248565.dasm - Benchstone.BenchI.MulMatrix:Inner(System.Int32[][],System.Int32[][],System.Int32[][])
 -32 (-3.58% of base) : 249053.dasm - SimpleArray_01.Test:BadMatrixMul2()
 -31 (-5.13% of base) : 251012.dasm - Benchstone.BenchI.Array2:Bench(int):bool
 -26 (-4.57% of base) : 251011.dasm - Benchstone.BenchI.Array2:VerifyCopy(System.Int32[][][],System.Int32[][][]):bool
 -19 (-6.76% of base) : 248561.dasm - Benchstone.BenchF.InProd:InnerProduct(byref,System.Double[][],System.Double[][],int,int)
 -16 (-1.38% of base) : 194667.dasm - LUDecomp:build_problem(System.Double[][],int,System.Double[])
 -15 (-0.72% of base) : 252241.dasm - Complex_Array_Test:Main(System.String[]):int
 -12 (-0.81% of base) : 252242.dasm - Simple_Array_Test:Main(System.String[]):int
 -11 (-1.35% of base) : 194669.dasm - LUDecomp:lubksb(System.Double[][],int,System.Int32[],System.Double[])
 -9 (-1.06% of base) : 246308.dasm - DefaultNamespace.MulDimJagAry:SetThreeDimJagVarAry(System.Object[][][],int,int):this
 -9 (-1.06% of base) : 246307.dasm - DefaultNamespace.MulDimJagAry:SetThreeDimJagAry(System.Object[][][],int,int):this
 -9 (-1.06% of base) : 246245.dasm - DefaultNamespace.MulDimJagAry:SetThreeDimJagAry(System.Object[][][],int,int):this
 -9 (-1.06% of base) : 246246.dasm - DefaultNamespace.MulDimJagAry:SetThreeDimJagVarAry(System.Object[][][],int,int):this
 -6 (-0.77% of base) : 228622.dasm - SciMark2.LU:solve(System.Double[][],System.Int32[],System.Double[])
 -6 (-1.83% of base) : 251010.dasm - Benchstone.BenchI.Array2:Initialize(System.Int32[][][])
 -5 (-0.61% of base) : 248557.dasm - Benchstone.BenchF.InProd:Bench():bool
 -4 (-0.50% of base) : 249054.dasm - SimpleArray_01.Test:BadMatrixMul3()
 -4 (-0.47% of base) : 249052.dasm - SimpleArray_01.Test:BadMatrixMul1()

Top method improvements (percentages):
 -19 (-6.76% of base) : 248561.dasm - Benchstone.BenchF.InProd:InnerProduct(byref,System.Double[][],System.Double[][],int,int)
 -31 (-5.13% of base) : 251012.dasm - Benchstone.BenchI.Array2:Bench(int):bool
 -26 (-4.57% of base) : 251011.dasm - Benchstone.BenchI.Array2:VerifyCopy(System.Int32[][][],System.Int32[][][]):bool
 -68 (-3.82% of base) : 194589.dasm - AssignJagged:second_assignments(System.Int32[][],System.Int16[][])
 -32 (-3.58% of base) : 249053.dasm - SimpleArray_01.Test:BadMatrixMul2()
 -92 (-2.57% of base) : 194668.dasm - LUDecomp:ludcmp(System.Double[][],int,System.Int32[],byref):int
 -6 (-1.83% of base) : 251010.dasm - Benchstone.BenchI.Array2:Initialize(System.Int32[][][])
 -48 (-1.61% of base) : 248565.dasm - Benchstone.BenchI.MulMatrix:Inner(System.Int32[][],System.Int32[][],System.Int32[][])
 -16 (-1.38% of base) : 194667.dasm - LUDecomp:build_problem(System.Double[][],int,System.Double[])
 -11 (-1.35% of base) : 194669.dasm - LUDecomp:lubksb(System.Double[][],int,System.Int32[],System.Double[])
 -3 (-1.11% of base) : 249057.dasm - SimpleArray_01.Test:Test2()
 -9 (-1.06% of base) : 246308.dasm - DefaultNamespace.MulDimJagAry:SetThreeDimJagVarAry(System.Object[][][],int,int):this
 -9 (-1.06% of base) : 246307.dasm - DefaultNamespace.MulDimJagAry:SetThreeDimJagAry(System.Object[][][],int,int):this
 -9 (-1.06% of base) : 246245.dasm - DefaultNamespace.MulDimJagAry:SetThreeDimJagAry(System.Object[][][],int,int):this
 -9 (-1.06% of base) : 246246.dasm - DefaultNamespace.MulDimJagAry:SetThreeDimJagVarAry(System.Object[][][],int,int):this
 -12 (-0.81% of base) : 252242.dasm - Simple_Array_Test:Main(System.String[]):int
 -6 (-0.77% of base) : 228622.dasm - SciMark2.LU:solve(System.Double[][],System.Int32[],System.Double[])
 -15 (-0.72% of base) : 252241.dasm - Complex_Array_Test:Main(System.String[]):int
 -5 (-0.61% of base) : 248557.dasm - Benchstone.BenchF.InProd:Bench():bool
 -4 (-0.50% of base) : 249054.dasm - SimpleArray_01.Test:BadMatrixMul3()

22 total methods with Code Size differences (22 improved, 0 regressed), 1 unchanged.
```

</details>

--------------------------------------------------------------------------------

```
Summary of Perf Score diffs:
(Lower is better)

Total PerfScoreUnits of base: 161610.68999999997
Total PerfScoreUnits of diff: 160290.10999999996
Total PerfScoreUnits of delta: -1320.58 (-0.82% of base)
Total relative delta: -0.20

diff is an improvement.
relative diff is an improvement.
```

<details>
<summary>Detail diffs</summary>

```
Top file improvements (PerfScoreUnits):
 -639.25 : 252241.dasm (-0.97% of base)
 -220.67 : 248565.dasm (-1.74% of base)
 -132.59 : 252242.dasm (-0.26% of base)
 -99.27 : 194668.dasm (-2.09% of base)
 -66.30 : 251012.dasm (-1.41% of base)
 -62.20 : 249053.dasm (-2.74% of base)
 -18.40 : 251011.dasm (-1.37% of base)
 -9.33 : 248557.dasm (-0.54% of base)
 -9.05 : 194667.dasm (-0.77% of base)
 -8.32 : 249054.dasm (-0.42% of base)
 -5.85 : 246308.dasm (-0.52% of base)
 -5.85 : 246307.dasm (-0.52% of base)
 -5.85 : 246245.dasm (-0.52% of base)
 -5.85 : 246246.dasm (-0.52% of base)
 -5.51 : 251010.dasm (-0.77% of base)
 -4.36 : 249052.dasm (-0.22% of base)
 -4.16 : 253363.dasm (-0.21% of base)
 -4.15 : 194589.dasm (-0.17% of base)
 -3.92 : 194666.dasm (-0.32% of base)
 -3.41 : 248561.dasm (-2.29% of base)

23 total files with Perf Score differences (23 improved, 0 regressed), 0 unchanged.

Top method improvements (PerfScoreUnits):
 -639.25 (-0.97% of base) : 252241.dasm - Complex_Array_Test:Main(System.String[]):int
 -220.67 (-1.74% of base) : 248565.dasm - Benchstone.BenchI.MulMatrix:Inner(System.Int32[][],System.Int32[][],System.Int32[][])
 -132.59 (-0.26% of base) : 252242.dasm - Simple_Array_Test:Main(System.String[]):int
 -99.27 (-2.09% of base) : 194668.dasm - LUDecomp:ludcmp(System.Double[][],int,System.Int32[],byref):int
 -66.30 (-1.41% of base) : 251012.dasm - Benchstone.BenchI.Array2:Bench(int):bool
 -62.20 (-2.74% of base) : 249053.dasm - SimpleArray_01.Test:BadMatrixMul2()
 -18.40 (-1.37% of base) : 251011.dasm - Benchstone.BenchI.Array2:VerifyCopy(System.Int32[][][],System.Int32[][][]):bool
 -9.33 (-0.54% of base) : 248557.dasm - Benchstone.BenchF.InProd:Bench():bool
 -9.05 (-0.77% of base) : 194667.dasm - LUDecomp:build_problem(System.Double[][],int,System.Double[])
 -8.32 (-0.42% of base) : 249054.dasm - SimpleArray_01.Test:BadMatrixMul3()
 -5.85 (-0.52% of base) : 246308.dasm - DefaultNamespace.MulDimJagAry:SetThreeDimJagVarAry(System.Object[][][],int,int):this
 -5.85 (-0.52% of base) : 246307.dasm - DefaultNamespace.MulDimJagAry:SetThreeDimJagAry(System.Object[][][],int,int):this
 -5.85 (-0.52% of base) : 246245.dasm - DefaultNamespace.MulDimJagAry:SetThreeDimJagAry(System.Object[][][],int,int):this
 -5.85 (-0.52% of base) : 246246.dasm - DefaultNamespace.MulDimJagAry:SetThreeDimJagVarAry(System.Object[][][],int,int):this
 -5.51 (-0.77% of base) : 251010.dasm - Benchstone.BenchI.Array2:Initialize(System.Int32[][][])
 -4.36 (-0.22% of base) : 249052.dasm - SimpleArray_01.Test:BadMatrixMul1()
 -4.16 (-0.21% of base) : 253363.dasm - MatrixMul.Test:MatrixMul()
 -4.15 (-0.17% of base) : 194589.dasm - AssignJagged:second_assignments(System.Int32[][],System.Int16[][])
 -3.92 (-0.32% of base) : 194666.dasm - LUDecomp:DoLUIteration(System.Double[][],System.Double[],System.Double[][][],System.Double[][],int):long
 -3.41 (-2.29% of base) : 248561.dasm - Benchstone.BenchF.InProd:InnerProduct(byref,System.Double[][],System.Double[][],int,int)

Top method improvements (percentages):
 -62.20 (-2.74% of base) : 249053.dasm - SimpleArray_01.Test:BadMatrixMul2()
 -3.41 (-2.29% of base) : 248561.dasm - Benchstone.BenchF.InProd:InnerProduct(byref,System.Double[][],System.Double[][],int,int)
 -99.27 (-2.09% of base) : 194668.dasm - LUDecomp:ludcmp(System.Double[][],int,System.Int32[],byref):int
 -220.67 (-1.74% of base) : 248565.dasm - Benchstone.BenchI.MulMatrix:Inner(System.Int32[][],System.Int32[][],System.Int32[][])
 -2.70 (-1.71% of base) : 249057.dasm - SimpleArray_01.Test:Test2()
 -66.30 (-1.41% of base) : 251012.dasm - Benchstone.BenchI.Array2:Bench(int):bool
 -18.40 (-1.37% of base) : 251011.dasm - Benchstone.BenchI.Array2:VerifyCopy(System.Int32[][][],System.Int32[][][]):bool
 -639.25 (-0.97% of base) : 252241.dasm - Complex_Array_Test:Main(System.String[]):int
 -9.05 (-0.77% of base) : 194667.dasm - LUDecomp:build_problem(System.Double[][],int,System.Double[])
 -5.51 (-0.77% of base) : 251010.dasm - Benchstone.BenchI.Array2:Initialize(System.Int32[][][])
 -9.33 (-0.54% of base) : 248557.dasm - Benchstone.BenchF.InProd:Bench():bool
 -5.85 (-0.52% of base) : 246308.dasm - DefaultNamespace.MulDimJagAry:SetThreeDimJagVarAry(System.Object[][][],int,int):this
 -5.85 (-0.52% of base) : 246307.dasm - DefaultNamespace.MulDimJagAry:SetThreeDimJagAry(System.Object[][][],int,int):this
 -5.85 (-0.52% of base) : 246245.dasm - DefaultNamespace.MulDimJagAry:SetThreeDimJagAry(System.Object[][][],int,int):this
 -5.85 (-0.52% of base) : 246246.dasm - DefaultNamespace.MulDimJagAry:SetThreeDimJagVarAry(System.Object[][][],int,int):this
 -8.32 (-0.42% of base) : 249054.dasm - SimpleArray_01.Test:BadMatrixMul3()
 -3.92 (-0.32% of base) : 194666.dasm - LUDecomp:DoLUIteration(System.Double[][],System.Double[],System.Double[][][],System.Double[][],int):long
 -132.59 (-0.26% of base) : 252242.dasm - Simple_Array_Test:Main(System.String[]):int
 -1.89 (-0.22% of base) : 228622.dasm - SciMark2.LU:solve(System.Double[][],System.Int32[],System.Double[])
 -4.36 (-0.22% of base) : 249052.dasm - SimpleArray_01.Test:BadMatrixMul1()

23 total methods with Perf Score differences (23 improved, 0 regressed), 0 unchanged.
```

</details>

--------------------------------------------------------------------------------

## libraries.crossgen2.windows.x64.checked.mch:

```
Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 10828
Total bytes of diff: 10809
Total bytes of delta: -19 (-0.18% of base)
Total relative delta: -0.00

diff is an improvement.
relative diff is an improvement.
```

<details>
<summary>Detail diffs</summary>

```
Top file improvements (bytes):
 -19 : 72504.dasm (-0.18% of base)

1 total files with Code Size differences (1 improved, 0 regressed), 0 unchanged.

Top method improvements (bytes):
 -19 (-0.18% of base) : 72504.dasm - System.DefaultBinder:BindToMethod(int,System.Reflection.MethodBase[],byref,System.Reflection.ParameterModifier[],System.Globalization.CultureInfo,System.String[],byref):System.Reflection.MethodBase:this

Top method improvements (percentages):
 -19 (-0.18% of base) : 72504.dasm - System.DefaultBinder:BindToMethod(int,System.Reflection.MethodBase[],byref,System.Reflection.ParameterModifier[],System.Globalization.CultureInfo,System.String[],byref):System.Reflection.MethodBase:this

1 total methods with Code Size differences (1 improved, 0 regressed), 0 unchanged.
```

</details>

--------------------------------------------------------------------------------

```
Summary of Perf Score diffs:
(Lower is better)

Total PerfScoreUnits of base: 6597.12
Total PerfScoreUnits of diff: 6586.31
Total PerfScoreUnits of delta: -10.81 (-0.16% of base)
Total relative delta: -0.00

diff is an improvement.
relative diff is an improvement.
```

<details>
<summary>Detail diffs</summary>

```
Top file improvements (PerfScoreUnits):
 -10.81 : 72504.dasm (-0.16% of base)

1 total files with Perf Score differences (1 improved, 0 regressed), 0 unchanged.

Top method improvements (PerfScoreUnits):
 -10.81 (-0.16% of base) : 72504.dasm - System.DefaultBinder:BindToMethod(int,System.Reflection.MethodBase[],byref,System.Reflection.ParameterModifier[],System.Globalization.CultureInfo,System.String[],byref):System.Reflection.MethodBase:this

Top method improvements (percentages):
 -10.81 (-0.16% of base) : 72504.dasm - System.DefaultBinder:BindToMethod(int,System.Reflection.MethodBase[],byref,System.Reflection.ParameterModifier[],System.Globalization.CultureInfo,System.String[],byref):System.Reflection.MethodBase:this

1 total methods with Perf Score differences (1 improved, 0 regressed), 0 unchanged.
```

</details>

--------------------------------------------------------------------------------

* Increase loop cloning max allowed condition blocks
  Allows the inner loop of 3-nested loops (e.g., the Array2 benchmark) to be cloned.
* Clear GTF_INX_RNGCHK bit on index nodes created by loop cloning, to avoid unnecessary bounds checks
  Revert max cloning condition blocks to 3; allowing more doesn't seem to improve performance (probably too many conditions before a not-sufficiently-executed loop, at least for the Array2 benchmark).
* Remove outer index bounds checks
* Convert loop cloning data structures to `vector` for better debugging
* Improve CSE dump output
  1. "#if 0" the guts of the CSE dataflow; that's not useful to most people.
  2. Add readable CSE number output to the CSE dataflow set output.
  3. Add FMT_CSE to commonize CSE number output.
  4. Add PHASE_OPTIMIZE_VALNUM_CSES to the pre-phase output "allow list" and stop doing its own blocks/trees output.
  5. Remove unused optCSECandidateTotal.
  6. Add functions `getCSEAvailBit` and `getCSEAvailCrossCallBit` to avoid hand-coding these bit calculations in multiple places, for the CSE dataflow set bits.
* Mark cloned array indexes as non-faulting
  When generating loop cloning conditions, mark array index expressions as non-faulting, as we have already null- and range-checked the array before generating an index expression. I also added similar code to mark array length expressions as non-faulting, for the same reason; however, that leads to CQ losses because of downstream CSE effects.
* Don't count zero-sized align instructions in PerfScore
* Add COMPlus_JitDasmWithAlignmentBoundaries
  This outputs the alignment boundaries without requiring outputting the actual addresses, which makes it easier to diff changes.
* Improve bounds check output
* Improve emitter label printing
  Create a function for printing bound emitter labels. Also, add debug code to associate a BasicBlock with an insGroup, and output the block number and ID with the emitter label in JitDump, so it's easier to find where a group of generated instructions came from.
* Formatting
* Clear BBF_LOOP_PREHEADER bit when compacting an empty pre-header block
* Keep track of all basic blocks that contribute code to an insGroup
* Update display of Intel jcc erratum branches in dump
  For instructions or instruction sequences which match the Intel jcc erratum criteria, note that in the alignment boundary dump. Also, a few fixes:
  1. Move the alignment boundary dumping from `emitIssue1Instr` to `emitEndCodeGen` to avoid the possibility of reading the next instruction in a group when there is no next instruction.
  2. Create `IsJccInstruction` and `IsJmpInstruction` functions for use by the jcc criteria detection, and fix that detection to fix a few omissions/errors.
  3. Change the jcc criteria detection to be hard-coded to 32-byte boundaries instead of assuming `compJitAlignLoopBoundary` is 32.
  An example:
  ```
  cmp      r11d, dword ptr [rax+8]
  ; ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (cmp: 0 ; jcc erratum) 32B boundary ...............................
  jae      G_M42486_IG103
  ```
  In this case, the `cmp` doesn't cross the boundary; it is adjacent to it (the zero indicates the number of bytes of the instruction which cross the boundary), followed by the `jae` which starts after the boundary. Indicating the jcc erratum criteria can help point out potential performance issues due to unlucky alignment of these instructions in asm diffs.
* Display full instruction name in alignment and other messages
  XArch sometimes prepends a "v" to the instruction names from the instruction table. Add a function `genInsDisplayName` to create the full instruction name that should be displayed, and use that in most places an instruction name will be displayed, such as in the alignment messages and normal disassembly. Use this instead of the raw `genInsName`. This could be extended to handle arm32 appending an "s", but I didn't want to touch arm32 with this change.
* Fix build
* Code review feedback
  1. Rename GTF_INX_NONFAULTING to GTF_INX_NOFAULT to increase clarity compared to the existing GTF_IND_NONFAULTING.
  2. Minor cleanup in getInsDisplayName.
* Formatting
2021-07-10  Handle a missing case in zero-extend peephole (#55129)  [Jakob Botsch Nielsen]
2021-07-10  Spill single-def variable at definition to avoid further spilling (#54345)  [Kunal Pathak]
* Print single-def
* Rename lvEhWriteThruCandidate->lvSingleDefRegCandidate, introduce isSingleDef
* Introduce singleDefSpillAfter
If a single-def variable is decided to get spilled in its lifetime, then spill it at the firstRefPosition RefTypeDef so the value of the variable is always valid on the stack. Going forward, no more spills will be needed for such a variable, nor any more resolutions (reg to stack) for such single-def variables.
* jit format
* some fixes
* wip
* Add check of isSingleDef in validateInterval()
* Make isSingleDef during buildIntervals
* minor fix in lclvars.cpp
* some fixes after self CR
* Updated some comments
* Remove lvSpillAtSingleDef from some asserts
* Use singleDefSpill information in getWeight()
* Remove lvSpillAtSingleDef from some more checks
* Mark lvSpillAtSingleDef whenever refPosition->singleDefSpill==true
* Add TODO for SingleDefVarCandidate
* Some notes on setting singleDefSpill
* jit format
* review feedback
* review comments
2021-07-09  Fix helloworld on x86 Linux (#55095)  [t-mustafin]
Make managed->managed calls use ecx and edx to pass the first two arguments. Restore stack alignment after returning from a call.
2021-07-09  Enregister structs on win x64. (#55045)  [Sergey Andreenko]
* Enreg structs x64 windows.
* try to get zero diffs on other platforms.
* fix comment
2021-07-09  Loop alignment: Handle blocks added in loop as part of split edges of LSRA (#55047)  [Kunal Pathak]
* Handle blocks added in loop as part of split edges of LSRA
If there are new blocks added by LSRA that modify the flow of blocks that are in a loop, then make sure that we do not align such loops if they intersect with the last aligned loop.
* Retain LOOP_ALIGN flag of loops whose start are same
* jit format
* review feedback
2021-07-09  Don't use GT_ARR_ELEM as a location/value (#54780)  [SingleAccretion]
* Don't use GT_ARR_ELEM as a location
It represents an address. No diffs.
* Clarify the purpose of GenTreeArrElem
2021-07-09  Print indices of assertions instead of raw bitsets (#54928)  [SingleAccretion]
* Add JITDUMPEXEC macro
For use in contexts where some printing method should only be executed when "verbose" is true.
* Add helpers for printing assertion indexes
* Print assertion indices instead of raw bitsets
To aid in understanding what assertions are being propagated and merged when reading the dumps.
* Don't print VNs for the same assertion twice
* Also correctly print VNs in CopyProp
* Align "in"s with "out"s for final assertions
* Don't print the assertion dataflow in usual dumps
It can still be enabled if needed.
2021-07-08  JIT: Set default value for JitExtDefaultPolicyProfTrust 0 (#55229)  [Egor Bogatov]
2021-07-08  Do not setup both an inlined call frame and perform a reverse pinvoke in the same function (#55092)  [David Wrighton]
- Pushing/popping the frame chain requires that the thread be in cooperative mode
- Before the reverse pinvoke logic engages, the thread is in preemptive mode
- The current implementation of InlinedCallFrame and reverse p/invoke sets up the InlinedCallFrame before the reverse p/invoke transition, which is unsafe
- A reverse pinvoke function directly calling back into native code is fairly rare, and so optimizing the situation is of marginal utility.
- The fix is to use the pinvoke helpers logic to setup the pinvoke instead of relying on the InlinedCallFrame logic. This avoids the problem of incorrect ordering in the prolog/epilog by moving inlined call frame handling to the point of use.
Fix bug #45326
2021-07-08  fix formatting errors (#55320)  [Kunal Pathak]
2021-07-08  JIT: Drop redundant static initializations from (Equality)Comparer<T>.Default (#50446)  [Egor Bogatov]
Co-authored-by: Andy Ayers <andya@microsoft.com>
2021-07-08  Fix perf regressions with a recent change. (#55300)  [Sergey Andreenko]
* Fix the perf regressions.
* improve comments
* Update src/coreclr/jit/lower.cpp
Co-authored-by: Kunal Pathak <Kunal.Pathak@microsoft.com>
2021-07-08  Use correct assignedInterval for SPILL_COST calculation (#55247)  [Kunal Pathak]
2021-07-07  Disable folding of implementation-defined casts (#53782)  [SingleAccretion]
* Enhance the FloatOvfToInt2 test to exercise VN paths
It is not very effective right now because the "bad" combinations of types are morphed into helpers too early, but it will be helpful when we enable folding outside of the importer for those cases too.
* Enhance the FloatOvfToInt2 test to cover importer
It now fails on Windows x86, where many of the previous similar failures were observed.
* Disable the test on Mono
* Re-enable tests disabled against #13651
* Re-enable tests disabled against #51346
* Re-enable tests disabled against #47374
* Disable folding in gtFoldExprConst
* Disable folding in VN
* Temporarily promote the test to Pri0
* Reword the comment
* Move the tests back to Pri1
Where they originally were.
2021-07-05  JIT: don't report profile for methods with insufficient samples during prejitting (#55096)  [Egor Bogatov]
* Don't report profile for methods with insufficient samples for PREJIT
2021-07-05  Fix morph negation transforms (#55145)  [SingleAccretion]
* Add a test
* Fix the SUB(NEG(a), NEG(b)) => SUB(b, a) case
Also relax the condition for SUB(a, NEG(b)) => ADD(a, b).
* Fix the ADD(NEG(a), b) => SUB(b, a) case
Also relax the condition for ADD(a, NEG(b)) => SUB(a, b).
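The rewrites in this commit are plain two's-complement identities; as a sanity check, here is a sketch (not JIT code — the `neg`/`add`/`sub` helpers are hypothetical models of 32-bit IR arithmetic):

```python
# Verify the morph rewrites hold under 32-bit two's-complement wraparound.
MASK = 0xFFFFFFFF

def neg(x): return (-x) & MASK          # NEG(a)
def add(x, y): return (x + y) & MASK    # ADD(a, b)
def sub(x, y): return (x - y) & MASK    # SUB(a, b)

for a in (0, 1, 0x7FFFFFFF, 0x80000000, 0xDEADBEEF):
    for b in (0, 2, 0x12345678, 0xFFFFFFFF):
        # SUB(NEG(a), NEG(b)) => SUB(b, a)
        assert sub(neg(a), neg(b)) == sub(b, a)
        # ADD(NEG(a), b) => SUB(b, a)
        assert add(neg(a), b) == sub(b, a)
        # ADD(a, NEG(b)) => SUB(a, b)
        assert add(a, neg(b)) == sub(a, b)
```

Because wraparound arithmetic forms a ring, the identities hold for every operand pair, including the minimum-value cases where signed negation itself overflows.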
2021-07-05  Allow implicit widenings when tailcalling (#54864)  [Jakob Botsch Nielsen]
The managed calling convention dictates that the callee is responsible for widening up to 4 bytes. We can use this to allow some more tailcalls.
Fix #51957
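A small sketch of why the rule above enables these tailcalls (illustrative only; `sign_extend8` and the two functions are hypothetical, not repo code): since the callee already widens its small return value to 32 bits in the return register, a caller with a wider return type can forward that register value unchanged.

```python
# Model of the managed-ABI rule: a callee returning int8 leaves the value
# sign-extended to 32 bits in the return register.
def sign_extend8(x):
    x &= 0xFF
    return x - 0x100 if x & 0x80 else x

def callee_returns_sbyte(raw):
    # The callee widens before returning, per the calling convention.
    return sign_extend8(raw)

def caller_returns_int(raw):
    # Tailcall: forward the callee's already-widened register value as-is.
    return callee_returns_sbyte(raw)

assert caller_returns_int(0x80) == -128   # already a valid int32
assert caller_returns_int(0x7F) == 127
```

The caller needs no extra widening instruction after the call, which is exactly what makes turning the call into a jump safe.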
2021-07-02  Delete `compQuirkForPPP`. (#55050)  [Sergey Andreenko]
2021-07-01  Poison address-exposed user variables in debug (#54685)  [Jakob Botsch Nielsen]
* Poison address exposed user variables in debug code
Fix #13072
* Run jit-format
* Use named scratch register and kill it in LSRA
* Enable it unconditionally for testing purposes
* Remove unnecessary modified reg on ARM
* Fix OSR and get rid of test code
* Remove a declaration
* Undo modified comment and use modulo instead of and
* Add a test
* Rephrase comment
* Disable poisoning test on mono
* Remove outdated line
Co-authored-by: Kunal Pathak <Kunal.Pathak@microsoft.com>
2021-07-01  Mark vars as do not enreg earlier in minopts. (#54998)  [Sergey Andreenko]
* Improve morphblock logic.
* change lclVars.
* Extract compEnregLocals
* same for args/locals
2021-07-01  [JIT] Improve inliner: new heuristics, rely on PGO data (#52708)  [Egor Bogatov]
Co-authored-by: Andy Ayers <andya@microsoft.com>
2021-06-30  Print spillweight of RefPosition (#54933)  [Kunal Pathak]
2021-06-29  Remove some unneeded code from division morphing (#53464)  [SingleAccretion]
* Remove GTF_UNSIGNED check from the condition
It is not necessary: GTF_UNSIGNED does not have anything to do with the operands being unsigned. Some positive diffs in runtime tests for win-x86 and one regression in System.Net.WebSockets.ManagedWebSocket.ApplyMask. The regression is because we generate two "div"s for a long UMOD on x86 with a constant divisor, always, even for powers of two. Something to improve for sure. Naturally, no diffs for win-x64, linux-x64 or linux-arm.
* Don't fold casts from constants in UMOD morphing
It used to be that "ldc.i4.1 conv.i8" sequences survived importation, and since UMOD morphing is sensitive to constant divisors, morph tried to fold them. This is no longer the case, so stop doing that. Of course, morph can be called from anywhere at any point, but if some code is creating casts from constants, the proper place to fix is that code. No diffs for win-x86 or win-x64 or linux-arm.
* Some code modernization
Use modern helpers and move comments around.
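The power-of-two remark above rests on a standard identity (a sketch, not the JIT's code — `umod` is a hypothetical model of unsigned 64-bit remainder): for unsigned operands, `x % 2**k == x & (2**k - 1)`, so a UMOD by a constant power of two never needs a `div` at all.

```python
# Unsigned 64-bit UMOD by a power of two reduces to a bit-mask.
MASK64 = (1 << 64) - 1

def umod(x, d):
    # Model of unsigned 64-bit remainder.
    return (x & MASK64) % d

for x in (0, 1, 0xDEADBEEFCAFEBABE, MASK64):
    for k in (1, 3, 16, 32):
        assert umod(x, 1 << k) == (x & ((1 << k) - 1))
```

This is why emitting two `div` instructions for a long UMOD by a power-of-two constant on x86 is pure waste, as the commit notes.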
2021-06-29  Fix unreached during dump. (#54861)  [Sergey Andreenko]
2021-06-29  Fix lowering usage of an unset LSRA field. (#54731)  [Sergey Andreenko]
* Add repro.
* fix the issue.
* delete a dead condition
* add a todo.
* Fix the failures.
2021-06-26  Extract fgMorph(Init/Copy)Block into their own classes. (#53983)  [Sergey Andreenko]
* Rewrite fgMorph(Copy/Init)Block.
* fix a last squash error.
* fix nits
2021-06-25  Fix CQ regression & correctness bug in morphing of long muls (#53566)  [SingleAccretion]
* Add a test covering GTF_MUL_64RSLT transform
* Disable the test on Mono
* Add genActualTypeIsInt helper
* Add some gtFlags helpers
* Prepare decomposition for new long muls
* Update gtIsValid64RsltMul
To understand the new format for long muls.
* Rework morphing of long muls
Previously, morph was looking for the exact pattern of MUL(CAST(long <- int), CAST(long <- int)) when assessing candidacy of GT_MUL for being marked with "GTF_MUL_64RSLT" and emitted as "long mul". This worked fine, until the importer was changed to fold all casts with constant operands. This broke the pattern matching and thus all MULs in the form of (long)value * 10 started being emitted as helper calls. This change updates morph to understand the new folded casts and in general updates the "format" of long mul from "CAST * CAST" to "CAST * (CAST | CONST)". In the process, new helper functions have been introduced, to avoid bloating fgMorphSmpOp with the new sizeable logic. Recognition of overflowing cases has been upgraded, and a correctness bug, where "checked((long)uint.MaxValue * (long)uint.MaxValue)" was wrongly treated as non-overflowing, fixed. Additionally, the logic to emit intermediate NOPs has been changed to instead always skip morphing the casts themselves, even when remorphing.
* Add the script to generate the longmul test
The test itself has been regenerated using it and there were no diffs, as expected.
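The correctness bug called out above is easy to reproduce with plain integer arithmetic (a sketch, not code from the PR): the unsigned 32x32->64 product of two `uint.MaxValue` operands fits in 64 bits, but as a *signed* long it overflows, so the checked multiply must throw.

```python
U32_MAX = (1 << 32) - 1      # uint.MaxValue
I64_MAX = (1 << 63) - 1      # long.MaxValue

product = U32_MAX * U32_MAX  # exact mathematical product: 2**64 - 2**33 + 1

# A 32x32->64 "long mul" can represent the result as an unsigned 64-bit value...
assert product <= (1 << 64) - 1

# ...but it exceeds long.MaxValue, so treating
# checked((long)uint.MaxValue * (long)uint.MaxValue) as non-overflowing is wrong.
assert product > I64_MAX
```

This is precisely the boundary case a pattern-match on `CAST * (CAST | CONST)` has to classify as overflowing for checked arithmetic.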
2021-06-25  Eliminate chained casts to small types (#52561)  [SingleAccretion]
We can take advantage of the implicit zero/sign-extension for small integer types and eliminate some casts.
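A sketch of the idea (the `cast_u8`/`cast_u16` helpers are hypothetical models of the IR casts, not repo code): once a value has been cast to a small unsigned type, it is already zero-extended in the register, so a following cast to the same or a wider small type changes nothing and can be dropped.

```python
def cast_u8(x):   # CAST(ubyte <- int): keeps (zero-extends) the low 8 bits
    return x & 0xFF

def cast_u16(x):  # CAST(ushort <- int): keeps (zero-extends) the low 16 bits
    return x & 0xFFFF

for x in (0, 0x7F, 0x80, 0x1234, -1, 0xFFFFFFFF):
    # CAST(ushort <- CAST(ubyte <- x)) == CAST(ubyte <- x): the outer cast is a no-op.
    assert cast_u16(cast_u8(x)) == cast_u8(x)
    # Repeating the same small cast is likewise redundant.
    assert cast_u8(cast_u8(x)) == cast_u8(x)
```

The sign-extension case is symmetric: a value already sign-extended from 8 bits is unchanged by a subsequent cast to a wider signed small type.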
2021-06-25  Optimize `CAST(int <- long)` on 32 bit targets (#53040)  [SingleAccretion]
* Optimize CAST(int <- long) on 32 bit targets
* Revert "Optimize CAST(int <- long) on 32 bit targets"
Revert the implementation in lowering
* Optimize CAST(int <- long) on 32 bit targets
Move the code from lowering to long decomposition.
* Fixed the "Arguments" note for DecomposeNode
* Added the function header
For OptimizeCastFromDecomposedLong.
* Remove the TODO comment
While correct, it has questionable value.
* Add a more detailed dump output
* Do not try to optimize checked casts
It is easy to get it wrong. Let the frontend handle this.
* Do not depend on tree order
The previous version of the code assumed that there could be no nodes between the cast and its operand. That is not a correct assumption to make in LIR.
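The optimization this commit moves into long decomposition boils down to one fact, sketched below under the assumption that (as on 32-bit targets) a TYP_LONG is represented as a lo/hi pair of 32-bit halves (`decompose` and `cast_int_from_long` are hypothetical models, not repo code):

```python
MASK32 = 0xFFFFFFFF
MASK64 = (1 << 64) - 1

def decompose(v):
    # On a 32-bit target a TYP_LONG value lives as two 32-bit halves.
    v &= MASK64
    return v & MASK32, v >> 32          # (lo, hi)

def cast_int_from_long(v):
    # Truncating (unchecked) CAST(int <- long) keeps only the low 32 bits.
    return (v & MASK64) & MASK32

for v in (0, 1, 0x1_0000_0000, 0xDEADBEEF_CAFEBABE, MASK64):
    lo, hi = decompose(v)
    # The cast result is exactly the lo half; the hi half is dead code.
    assert cast_int_from_long(v) == lo
```

So after decomposition the cast can be replaced by the lo-half node outright, letting the hi-half computation be removed as dead. Checked casts are excluded, as the commit notes, because they must still inspect the hi half to detect overflow.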
2021-06-25  Disallow promotion of HVA structs when their fields of TYP_SIMD8 were promoted as plain structs (#54694)  [Egor Chesakov]
2021-06-25  Move defMAC construction up a scope to make it live long enough. (#54702)  [Jeremy Koritzinsky]
Fixes #54649
2021-06-24  Add args descriptions for VNF_MapStore and VNF_MapSelect (#54108)  [SingleAccretion]
2021-06-24  Fix instruction hex display (#54675)  [Jakob Botsch Nielsen]
2021-06-23  Keep obj node for ArrayIndex. (#54584)  [Sergey Andreenko]
2021-06-19  Fix AMD64 epilog ABI (#54357)  [Bruce Forstall]
* Fix AMD64 epilog ABI
The Windows AMD64 epilog ABI states that an `lea rsp,[rbp+constant]` instruction may only be used if a frame pointer has been reported to the OS in the prolog unwind info; otherwise an `add rsp, constant` instruction must be used. There were a number of cases where the JIT used the `lea` form simply because a frame pointer had been established and was available, even though it had not been reported to the OS (and, thus, the frame was effectively an `rsp` frame). Fix this by using the same condition in the epilog for determining which form to use, `lea` or `add`, that was used in the prolog to determine whether or not to report the frame pointer in the unwind info. Fixes #54320
* Formatting
* Fix OSR
2021-06-18  Fix Linux x86 build (#50836)  [Gleb Balykov]
2021-06-17  Improve live variable JitDump output (#54256)  [Bruce Forstall]
The variable live range output is unnecessarily verbose. Simplify it; clean it up; make it smaller; use standard dumpers.
Example, before:
```
////////////////////////////////////////
////////////////////////////////////////
Variable Live Range History Dump for Block 2
IL Var Num 0: [rcx [ (G_M13669_IG02,ins#0,ofs#0), (G_M13669_IG03,ins#1,ofs#2) ]; rbp[16] (1 slot) [ (G_M13669_IG03,ins#1,ofs#2), NON_CLOSED_RANGE ]; ]
IL Var Num 1: [rsi [ (G_M13669_IG03,ins#1,ofs#2), NON_CLOSED_RANGE ]; ]
////////////////////////////////////////
////////////////////////////////////////
End Generating code for Block 2
```
After:
```
Variable Live Range History Dump for BB02
V00 this: rcx [(G_M13669_IG02,ins#0,ofs#0), (G_M13669_IG03,ins#1,ofs#2)]; rbp[16] (1 slot) [(G_M13669_IG03,ins#1,ofs#2), ...]
V01 loc0: rsi [(G_M13669_IG03,ins#1,ofs#2), ...]
```
And the end-of-dump output, before:
```
////////////////////////////////////////
////////////////////////////////////////
PRINTING VARIABLE LIVE RANGES:
IL Var Num 0: [rsi [18 , B5 )rsi [100 , 13A )rsi [14D , 186 )rsi [196 , 1C5 )rsi [1E3 , 271 )rsi [280 , 285 )]
IL Var Num 1: [rdi [18 , B9 )rdi [100 , 137 )rdi [14D , 184 )rdi [196 , 1C2 )rdi [1E3 , 271 )rdi [280 , 288 )]
IL Var Num 2: [rbx [18 , CA )rbx [100 , 10D )rbx [14D , 15A )rbx [196 , 1C7 )rbx [1E3 , 271 )rbx [280 , 28B )]
IL Var Num 3: [rbp [3A , F0 )rbp [100 , 141 )rbp [14D , 18C )rbp [196 , 1D6 )rbp [1E3 , 275 )]
IL Var Num 4: [r14 [3E , EC )r14 [100 , 13D )r14 [14D , 188 )r14 [196 , 1D2 )r14 [1E3 , 271 )]
IL Var Num 5: [rcx [22A , 263 )]
////////////////////////////////////////
////////////////////////////////////////
```
After:
```
VARIABLE LIVE RANGES:
V00 arg0: rsi [18, B5); rsi [100, 13A); rsi [14D, 186); rsi [196, 1C5); rsi [1E3, 271); rsi [280, 285)
V01 arg1: rdi [18, B9); rdi [100, 137); rdi [14D, 184); rdi [196, 1C2); rdi [1E3, 271); rdi [280, 288)
V02 arg2: rbx [18, CA); rbx [100, 10D); rbx [14D, 15A); rbx [196, 1C7); rbx [1E3, 271); rbx [280, 28B)
V03 loc0: rbp [3A, F0); rbp [100, 141); rbp [14D, 18C); rbp [196, 1D6); rbp [1E3, 275)
V04 loc1: r14 [3E, EC); r14 [100, 13D); r14 [14D, 188); r14 [196, 1D2); r14 [1E3, 271)
V05 loc2: rcx [22A, 263)
```
2021-06-16  lvDoNotEnregister is not set when we are calling this function. (#54199)  [Sergey Andreenko]
2021-06-15  Split 16-byte SIMD store around GC struct fields into two 8-byte SIMD stores (x86) / two 8-byte mov-s (x64) (#53116)  [Egor Chesakov]
Fixes #51638 by using:
1) Constructing `ASG(OBJ(addr), 0)` for structs that have GC fields and keeping the current IR (i.e. `ASG(BLK(addr), 0)`) for other types. Such bookkeeping would allow the JIT to maintain information about the class layout.
2a) Emitting a sequence of `mov [m64],r64` instead of `movdqu [m128],xmm` when zeroing structs with GC fields that are not guaranteed to be on the stack on win-x64 or linux-x64.
2b) Emitting a sequence of `movq [m64],xmm` when zeroing such structs on win-x86.
2021-06-14  Do not mark op2 as delayRegFree if op1==op2 (#53964)  [Kunal Pathak]
* Do not mark op2 as delayRegFree if op1==op2
* Revert NodesAreEquivalentLeaves change
* Pass rmwNode to `BuildDelayFreeUses()` which does the right thing
* Make similar change in arm64
* remove TODO comment
* review feedback
2021-06-14  Ensure Vector.Sum uses SSE3, rather than SSSE3, for floating-point (#54123)  [Tanner Gooding]
* Adding a JIT/SIMD test validating Vector.Sum
* Ensure Vector.Sum uses SSE3, rather than SSSE3, for floating-point
* Ensure we do ISA checks before popping values from the stack
* Applying formatting patch
2021-06-13  Port IsRedundantMov to xarch (#54075)  [Tanner Gooding]
* Port IsRedundantMov to xarch
* Applying formatting patch
* Responding to PR feedback