Age | Commit message | Author |
|
For x86 delegate invoke tailcall via helper calls,
Lower expects the `this` pointer in a LCL_VAR. Morph
(`fgMorphTailCallViaJitHelper`) was letting a LCL_FLD go through
instead of inserting a copy of the field to a temp local.
Simply change the condition in morph to match the condition in Lower.
Fixes #53568
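The shape of the fix can be sketched in standalone C++ (illustrative names only, not the JIT's actual code): morph should spill anything that is not a plain local into a temp, using the same condition Lower checks.

```cpp
#include <cassert>

// Stand-ins for the relevant tree kinds.
enum class Oper { LclVar, LclFld, Other };

struct Node { Oper oper; };

// Lower only accepts the `this` pointer in a LCL_VAR.
bool lowerAcceptsThisPtr(const Node& n) { return n.oper == Oper::LclVar; }

// Hypothetical morph-side spill: if the `this` argument is anything else
// (e.g. a LCL_FLD), copy it into a fresh temp local so the result is always
// a LCL_VAR — the same condition Lower uses.
Node morphThisPtr(Node n)
{
    if (!lowerAcceptsThisPtr(n))
        return Node{Oper::LclVar}; // copy to a temp, then use the temp
    return n;
}
```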
|
|
* Vector.Sum(Vector<T>) API implementation for horizontal add.
* Fixed incorrect reference to Arm64 AddAcross intrinsic function.
* Added implementation for hardware accelerated Vector<T>.Sum for long, ulong, float, double on ARM64.
* Fixed formatting issue.
* Correctness.
* Fixed compiler error for ARM64.
* Formatting issue.
* More explicit switch statement. Fixed wrong simd size for NI_Vector64_ToScalar.
* Fixed auto formatting issue.
* Use AddPairwiseScalar for double, long and ulong on ARM64 for VectorT128_Sum.
* Forgot ToScalar call after AddPairwiseScalar.
* Fixed wrong return type.
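The pairwise-add reduction can be illustrated in plain C++; this is a scalar sketch of the shape (a stand-in for the ARM64 `AddPairwise`/`AddAcross` instructions, not the actual intrinsic codegen), for power-of-two lane counts.

```cpp
#include <array>
#include <cstddef>

// Scalar stand-in for ARM64 AddPairwise: adds adjacent lanes, halving the
// number of active lanes each step. Repeating log2(N) steps leaves the
// horizontal sum that Vector<T>.Sum needs in lane 0.
template <size_t N>
float pairwiseSum(std::array<float, N> v)
{
    for (size_t active = N; active > 1; active /= 2)
    {
        for (size_t i = 0; i < active / 2; i++)
        {
            v[i] = v[2 * i] + v[2 * i + 1]; // one AddPairwise step
        }
    }
    return v[0]; // the "ToScalar" call after the final pairwise add
}
```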
|
|
* Consider reloadWeight while evaluating spill cost
* fix linux arm issue
* jit format
|
|
Address deficiencies in current devirtualization infrastructure
- Remove the responsibility of creating a CORINFO_RESOLVED_TOKEN structure from the JIT and make it a responsibility of the VM side of the jit interface.
- This enables the component (crossgen2) which has deeper understanding of the requirements here to correctly handle scenarios that would otherwise require expressing crossgen2 specific details across the jit interface.
- Add a new set of fixups (`READYTORUN_FIXUP_Check_VirtualFunctionOverride` and `READYTORUN_FIXUP_Verify_VirtualFunctionOverride`); these are used to validate that the behavior of the runtime and the crossgen2 compiler is equivalent for a virtual resolution event
- `READYTORUN_FIXUP_Check_VirtualFunctionOverride` will ensure that the virtual resolution decision is the same at crossgen2 time and runtime, and if the decision differs, any generated code affected by the decision will not be used.
- `READYTORUN_FIXUP_Verify_VirtualFunctionOverride` will perform the same checks as `READYTORUN_FIXUP_Check_VirtualFunctionOverride`, but if it fails the check, the process will be terminated with a fail-fast. It is intended for use under the `--verify-type-and-field-layout` stress mode.
- Currently only the `READYTORUN_FIXUP_Verify_VirtualFunctionOverride` is actually generated, and it is only generated when using the `--verify-type-and-field-layout` switch to crossgen2. Future work will identify if there are scenarios where we need to generate the `READYTORUN_FIXUP_Check_VirtualFunctionOverride` flag. One area of possible concern is around covariant returns, another is around handling of type equivalence.
- In order to express the fixup signature for the VirtualFunctionOverride fixups, a new flag has been added to `ReadyToRunMethodSigFlags`. `READYTORUN_METHOD_SIG_UpdateContext` will allow the method signature to internally specify the assembly which is associated with the method token, instead of relying on the ambient context.
- R2RDump and the ReadyToRun format documentation have been updated with the details of the new fixups/flags.
- Update the rules for handling unboxing stubs
- See #51918 for details. This adds a new test, as well as proper handling for unboxing stubs to match the JIT behavior
- Also revert #52605, which avoided the problem by simply disabling devirtualization in the presence of structs
- Adjust the rules for when it is legal to devirtualize and maintain version resiliency
- The VersionsWithCode and VersionsWithType rules are unnecessarily restrictive.
- Instead, validate that the metadata is safely checkable, and rely on the canInline logic to ensure that no IL that can't be handled is inlined.
- This also involved adding a check that the chain of types from the implementation type to the declaration method table type is within the version bubble.
- And changing the `VersionsWithType` check on the implementation type, to a new `VersionsWithTypeReference` check which can be used to validate that the type can be referred to, in combination with using `VersionsWithType` on the type definition.
- By adjusting the way that the declMethod is referred to, it becomes possible to use the declMethod without checking the full method is `VersionsWithCode`, and it just needs a relationship to version matching code.
- In `resolveVirtualMethod` generate the `CORINFO_RESOLVED_TOKEN` structures for the jit
- In particular we are now able to resolve to methods where the decl method is the resolution result but is not within the version bubble itself. This can happen if we can prove that the decl method is the only method which can possibly implement a virtual.
- Add support for devirtualization reasons to crossgen2
- Port all devirtualization abort conditions to crossgen2 from runtime that were not already present
- Fix devirtualization from a canonical virtual method when the actual implementation is more exact
- Fix variant interface override scenario where there is an interface that requires implementation of the variant interface as well as the variant interface itself.
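The difference between the Check and Verify fixups can be sketched as follows; all names here are illustrative stand-ins, not the runtime's actual types: Check silently rejects the affected code when the compile-time and runtime decisions disagree, while Verify fail-fasts.

```cpp
#include <stdexcept>

// Hypothetical sketch of the two fixup behaviors described above.
enum class FixupKind
{
    CheckVirtualFunctionOverride,
    VerifyVirtualFunctionOverride
};

// Returns true if the generated code may be used. "Check" quietly falls back
// when crossgen2-time and runtime virtual resolution disagree; "Verify"
// treats any disagreement as fatal (fail-fast), as used under the
// --verify-type-and-field-layout stress mode.
bool applyFixup(FixupKind kind, bool crossgenDecision, bool runtimeDecision)
{
    if (crossgenDecision == runtimeDecision)
        return true; // decisions match: code is safe to use
    if (kind == FixupKind::VerifyVirtualFunctionOverride)
        throw std::runtime_error("fail-fast: virtual override mismatch");
    return false; // Check: don't use the affected code
}
```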
|
|
* Add a repro.
* Small refactor of `VNApplySelectorsAssignTypeCoerce`.
* Fix the bug.
* fix comment
* Update src/coreclr/jit/valuenum.cpp
Co-authored-by: Andy Ayers <andya@microsoft.com>
* Update src/coreclr/jit/valuenum.cpp
Co-authored-by: Andy Ayers <andya@microsoft.com>
Co-authored-by: Andy Ayers <andya@microsoft.com>
|
|
* Fix gcc armel build
Mostly signed/unsigned comparisons, etc.
* Fix gcc arm64 build
|
|
This reverts commit 7df92fd478aef22d4d98693e64d93730bc513e29.
|
|
Co-authored-by: Andy Ayers <andya@microsoft.com>
|
|
1. Add LclFld output: same as LclVar plus offset
2. Remove extra trailing GenTree brace
|
|
Revise the reporting of the special stack slots for OSR to be more uniform.
* Always record the original method FP-relative offset.
* Always apply the same adjustment for original method slots in the OSR frame
* Handle caller-SP relative adjustment in `lvaToCallerSPRelativeOffset`
In particular, this fixes #43534 where we were reporting the wrong caller SP
for the profiler exit hook.
|
|
* LsraStats
* jit format
* use blocks iterator
|
|
* Do byref liveness updates for same register GPR moves on x86/x64
* Change where emitLastEmittedIns is tracked
* Ensure emitLastEmittedIns isn't tracked across instruction groups
|
|
* Add dummy support for s390x in vm, jit, debug, and unwinder
* This suffices to make clr.iltools and clr.paltests buildable
|
|
* Update instruction table with accurate EFlags information
* Revert "Add issues.targets entry for the GitHub_13822 test to make CI green (#53789)"
This reverts commit bd9ba598a0a3417510d318472d3c0f6641cdba93.
* minor fixup
* Fix some more instructions
* review comments
|
|
Add more iterators compatible with range-based `for` syntax for various data structures. These iterators all assume (and some check) that the underlying data structures determining the iteration are not changed during the iteration. For example, don't use these to iterate over the predecessor edges if you are changing the order or contents of the predecessor edge list.
- BasicBlock: iterate over all blocks in the function, a subset starting not at the first block, or a specified range of blocks. Removed uses of the `foreach_block` macro. E.g.:
```
for (BasicBlock* const block : Blocks()) // all blocks in function
for (BasicBlock* const block : BasicBlockSimpleList(fgFirstBB->bbNext)) // all blocks starting at fgFirstBB->bbNext
for (BasicBlock* const testBlock : BasicBlockRangeList(firstNonLoopBlock, lastNonLoopBlock)) // all blocks in range (inclusive)
```
- block predecessors: iterate over all predecessor edges, or all predecessor blocks, e.g.:
```
for (flowList* const edge : block->PredEdges())
for (BasicBlock* const predBlock : block->PredBlocks())
```
- block successors: iterate over all block successors using the `NumSucc()/GetSucc()`, or `NumSucc(Compiler*)/GetSucc(Compiler*)` pairs, e.g.:
```
for (BasicBlock* const succ : Succs())
for (BasicBlock* const succ : Succs(compiler))
```
Note that there already exists the "AllSuccessorsIter" which iterates over block successors including possible EH successors, e.g.:
```
for (BasicBlock* succ : block->GetAllSuccs(m_pCompiler))
```
- switch targets (namely, the successors of `BBJ_SWITCH` blocks), e.g.:
```
for (BasicBlock* const bTarget : block->SwitchTargets())
```
- loop blocks: iterate over all the blocks in a loop, e.g.:
```
for (BasicBlock* const blk : optLoopTable[loopInd].LoopBlocks())
```
- Statements: added an iterator shortcut for the non-phi statements, e.g.:
```
for (Statement* const stmt : block->NonPhiStatements())
```
Note that there already exists an iterator over all statements, e.g.:
```
for (Statement* const stmt : block->Statements())
```
- EH clauses, e.g.:
```
for (EHblkDsc* const HBtab : EHClauses(this))
```
- GenTree in linear order (but not LIR, which already has an iterator), namely, using the `gtNext` links, e.g.:
```
for (GenTree* const call : stmt->TreeList())
```
This is a no-diff change.
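The shape these iterators all share is small; here is a hedged standalone sketch of a next-pointer list with the `begin()`/`end()` pair range-based `for` requires (names are illustrative, not the JIT's actual classes).

```cpp
// Minimal intrusive singly linked node, standing in for BasicBlock's bbNext.
struct Block
{
    int    value;
    Block* next;
};

// Iterator in the shape range-based `for` requires: operator*, operator++,
// operator!=.
class BlockIterator
{
    Block* m_block;
public:
    explicit BlockIterator(Block* block) : m_block(block) {}
    Block* operator*() const { return m_block; }
    BlockIterator& operator++() { m_block = m_block->next; return *this; }
    bool operator!=(const BlockIterator& o) const { return m_block != o.m_block; }
};

// Range object providing begin()/end(); assumes the list is not modified
// during iteration, as noted above.
class BlockList
{
    Block* m_first;
public:
    explicit BlockList(Block* first) : m_first(first) {}
    BlockIterator begin() const { return BlockIterator(m_first); }
    BlockIterator end() const { return BlockIterator(nullptr); }
};

int sumAll(Block* first)
{
    int sum = 0;
    for (Block* const b : BlockList(first)) // same shape as `for (BasicBlock* const block : Blocks())`
        sum += b->value;
    return sum;
}
```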
|
|
* Eliminate intermediate casts to double on ARM64
Enables the same optimization that has been there for
a while for AArch32 for AArch64 too.
* Fix a typo
|
|
* Optimize constant localloc on x64
Avoid popping off the outgoing arg space just to push it back later,
for constant size localloc. This typically removes two `sub rsp`
instructions per case, although we don't do the `push 0` "optimization",
so sometimes there are more instructions removed.
* Fix clang build error
|
|
* Correctly track how x86 instructions read/write flags
* For GT_EQ/GT_NE, reuse flag
* Explicit flags for jcc, setcc, cmovcc
* Add reset flags
* remove duplicate enum
* Handle cases where shift-amount is 0
* Add helper method for resetting OF/CF flags
* Rename methods
* one more rename
* review feedback
Co-authored-by: Tanner Gooding <tagoo@outlook.com>
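The idea of tracking flag effects per instruction can be sketched in plain C++ (an illustrative table, not the JIT's actual instruction data): knowing that the last instruction already defined the flags a consumer reads lets codegen skip a redundant compare.

```cpp
#include <cstdint>

// Which EFLAGS bits an instruction defines or consumes (simplified subset).
enum FlagBits : uint8_t
{
    FLAG_ZF = 1 << 0,
    FLAG_CF = 1 << 1,
    FLAG_OF = 1 << 2
};

struct InsFlagInfo
{
    uint8_t writes; // flags the instruction defines
    uint8_t reads;  // flags the instruction consumes
};

enum Ins { INS_sub, INS_mov, INS_jcc };

// Illustrative per-instruction flag effects; a real table covers every
// instruction and condition code.
InsFlagInfo insFlagEffects(Ins ins)
{
    switch (ins)
    {
        case INS_sub: return {FLAG_ZF | FLAG_CF | FLAG_OF, 0}; // arithmetic writes flags
        case INS_mov: return {0, 0};                           // mov leaves flags untouched
        case INS_jcc: return {0, FLAG_ZF | FLAG_CF | FLAG_OF}; // branch reads flags
    }
    return {0, 0};
}

// A compare before a flag consumer is redundant if the last flag-writing
// instruction already defined every flag the consumer reads.
bool canReuseFlags(Ins lastIns, Ins consumer)
{
    uint8_t needed = insFlagEffects(consumer).reads;
    return (insFlagEffects(lastIns).writes & needed) == needed;
}
```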
|
|
* Remove some unused defines and functions
* Delete the _CROSS_COMPILER_ define
It is also unused.
* Also fix a typo while I am here
* Delete #define DUMPER
* Delete #include's under #ifdef ICECAP
* Delete MAX/MIN_SHORT_AS_INT defines
|
|
* Print LSRA block sequence progress
* review comment
|
|
* add a repro
* passed spmi.
* update comment.
* update the test
* improve the check.
* fix a stress failure
* fix x64 unix diff
|
|
When jump threading, we had been insisting that the redundant branch be
the immediate dominator (idom) of the branch being optimized. But now that we
are doing exact reachability checks, we can consider threading based on
redundant branches higher in the (original) dominator tree, provided that the
in-between branches have also been optimized.
This situation arises when there are repeated redundant branches in if-then
or if-then-else chains. Say there are 3 such branches. The algorithm works
top-down in the original dominator tree. To begin with, the second branch
in the chain is jump-threaded since it is redundant with the first. As part
of this the second branch's block is modified to fall through. Now the
third branch's idom is the block that held the second branch, but this block
no longer ends with a branch. So the nearest redundant dominating branch is the
first branch in the chain. And this arrangement blocks jump-threading the third
branch, as the redundant branch block is no longer the idom block.
Jump threading will work correctly so long as there are unique paths from the
redundant branch outcomes to the branch being optimized. So we have appropriate
safety checks and can consider threading based on higher dominating branches.
Resolves #53501.
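A minimal C++ analogue of the if-then chains described above (illustrative source, not JIT code):

```cpp
// Three redundant branches on the same condition. After the second `if` is
// jump-threaded (its block made to fall through), the third branch's
// immediate dominator no longer ends in a branch; the nearest redundant
// dominating branch is the first one, higher in the dominator tree — the
// case this change now handles.
int redundantChain(int x)
{
    int r = 0;
    if (x > 0) r += 1; // first branch: establishes the fact "x > 0"
    if (x > 0) r += 2; // redundant with the first; threaded first
    if (x > 0) r += 4; // threaded against the first branch, not its idom
    return r;
}
```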
|
|
If we introduce a temp for the switch operand, the switch node may have extra flags
set that it doesn't need. Reset these based on the operand.
Closes #53548.
|
|
|
|
* Add support for AvxVnni instructions under Experimental.
* Add support for AvxVnni instructions
* Add preview feature attribute
* Handle operands in lsra
* Undo changes for Experimental
* Update JITEEVersionIdentifier and fix remaining issues
* Resolve Mono CI failure
* Disable tests
* Disable Vector128 tests
* Modify disable tests
Co-authored-by: Tanner Gooding <tagoo@outlook.com>
|
|
Add a config setting to randomly choose one of the observed classes for
guarded devirtualization, rather than the most likely class.
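The selection logic can be sketched as follows; names are hypothetical, not the JIT's actual profile types:

```cpp
#include <vector>
#include <cstddef>

// Illustrative observed-class histogram entry for guarded devirtualization.
struct LikelyClass
{
    int classHandle;
    int count; // observation count from the class profile
};

// Normal mode guesses the most frequently observed class; the stress config
// described above instead picks a (pseudo-)random observed class.
int pickGuardClass(const std::vector<LikelyClass>& observed, bool randomMode, unsigned seed)
{
    if (randomMode)
        return observed[seed % observed.size()].classHandle; // any observed class

    size_t best = 0;
    for (size_t i = 1; i < observed.size(); i++)
        if (observed[i].count > observed[best].count)
            best = i;
    return observed[best].classHandle;
}
```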
|
|
The method has been refactored and the code that was
causing problems for the optimizer no longer exists.
|
|
|
|
|
|
|
|
* add a repro test.
* LclVars whose addresses are taken should be marked as doNotEnreg.
Check that we don't have independently promoted LCL_VARs that are referenced after lowering.
Check that all LclVars that have ADDR() on top of them are marked as doNotEnreg.
In the past, when we did not enregister structs, we were allocating them on the stack even without doNotEnreg set.
|
|
* Delete references to `GT_PHI_ARG/GT_PHI` after rat.
* Delete references after Rat.
* add check that we don't see them.
|
|
(#51440)
* Added a missing license header
* Added a test verifying that checked arithmetic is correct
* Added a test verifying that checked casts are correct
* Disabled the test for checked casts on Mono
* Refactored VNEvalShouldFold
* Refactored gtFoldExprConst to use helpers and follow the common code style
* Fixed the comment stating TYP_BYREF has no zero value
* Moved checking of overflow for arithmetic operations from gtFoldExprConst into a separate namespace
* Implemented folding of overflow arithmetic in value numbering
* Fixed some typos in valuenum.cpp/h
* Added identity-based evaluation for overflow arithmetic
* Made the definition of var_types a standalone header so that it can be safely #include'd in utils.h
* Refactored gtFoldExpr some more, moved the overflow checking logic to CheckedOps, implemented overflow checking for floating point -> integer casts
* Implemented folding of checked casts in value numbering
* Demote the tests to Tier1
They throw and catch quite a few exceptions.
* Fixed a comment
UINT32 -> UINT64
* Made arithmetic CheckedOps functions templated
Reduces code duplication and obviates the need for
some conditions and casts.
They use the implementation from the Long* variants of
the old functions, except for "SubOverflows", where some
instantiations, unreachable at runtime, were using "size_t" as the
type argument and causing warnings. The relevant part of "AddOverflows"
has been inlined into "SubOverflows".
* Move the locals under "if" to avoid shadowing
* Use ClrSafeInt instead of custom code
* Fix a copy and paste mistake
Co-authored-by: Anton Lapounov <anton.lapounov@microsoft.com>
* Update src/coreclr/jit/utils.cpp
* Apply suggestions from code review
Co-authored-by: Anton Lapounov <anton.lapounov@microsoft.com>
* Assert type != TYP_BYREF in VNEvalShouldFold
The method is not prepared to handle them.
Also add a note about that to the header.
Also delete TODO-Review about it.
Right now the only caller of VNEvalShouldFold guards against
TYP_BYREF folding, so this assert is a safety measure against
future callers not taking byrefs into account.
* Drop the MAX_ prefix from MIN
Co-authored-by: Anton Lapounov <anton.lapounov@microsoft.com>
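A templated overflow check in the spirit of the CheckedOps refactoring described above can be sketched as follows (a hedged illustration, not the JIT's actual code); one template works for both signed and unsigned instantiations without duplication.

```cpp
#include <cstdint>
#include <limits>

// Returns true if x + y would overflow type T. The lower-bound check only
// applies to signed instantiations, guarded with `if constexpr` so unsigned
// instantiations compile cleanly.
template <typename T>
bool AddOverflows(T x, T y)
{
    if (y > 0 && x > std::numeric_limits<T>::max() - y)
        return true; // would exceed the maximum

    if constexpr (std::numeric_limits<T>::is_signed)
    {
        if (y < 0 && x < std::numeric_limits<T>::min() - y)
            return true; // would fall below the minimum
    }
    return false;
}
```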
|
|
Several changes to help better diagnose PGO and devirtualization issues:
* Report the source of the PGO data to the jit
* Report the reason for a devirtualization failure to the jit
* Add checking mode that compares result of devirtualization to class profile
* Add reporting mode to assess overall rates of devirtualization failure
when the jit has exact type information.
Also fix a loophole where in some case we'd still devirtualize if not
optimizing.
Note crossgen2 does not yet set devirtualization failure reasons.
|
|
|
|
* fix a condition.
* Move `PromoteLongVars` to `DecomposeLongs`.
* respond to review feedback
|
|
* Fix an issue with Vector128.WithElement around unused nodes for pre SSE4.1
* Fixing the expected exception for a structreturn test
* Ensure we check if the baseline SIMD ISAs are supported in morph
* Ensure TYP_SIMD12 LclVar can be cloned in lowering
* Fixing up the non SSE41 path for WithElement
* Applying formatting patch
* Ensure ReplaceWithLclVar lowers the created LclVar and assignment
* Don't check the JitLog for compiled methods when the baseline ISAs aren't supported
* Address PR feedback
* Responding to more PR feedback
* Applying formatting patch
* Fixing more PR review feedback
|
|
|
|
* Stringify RMWStatus for the dump
* Use a simpler switch-based implementation
* Made the string into a description
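The switch-based stringification has a simple shape; here is a hedged sketch with an illustrative status enum (the values are stand-ins, not the JIT's actual RMWStatus).

```cpp
#include <cstring>

// Illustrative status enum for read-modify-write candidacy in the dump.
enum class RMWStatus
{
    Unknown,
    Candidate,
    NotCandidate
};

// Switch-based stringification returning a human-readable description for
// dump output, with a fallback for unexpected values.
const char* rmwStatusDescription(RMWStatus status)
{
    switch (status)
    {
        case RMWStatus::Unknown:      return "unknown";
        case RMWStatus::Candidate:    return "is RMW candidate";
        case RMWStatus::NotCandidate: return "is not RMW candidate";
        default:                      return "invalid";
    }
}
```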
|
|
* Added a test
Verifying that checked '.un' casts from floats to small types are not treated as casts from unsigned types.
* Do not mark casts from FP types with GTF_UNSIGNED
It used to be that in the importer, whether the cast was to
be marked as GTF_UNSIGNED was decided exclusively based on
the incoming opcode. However, the flag only makes sense
for casts from integral sources, and it turns out morph
had a bug where it failed to clear this flag which resulted in
bad codegen.
The bug went as follows: "gtMorphCast" turns casts from an FP
type to a small integer into a chain of casts:
CAST(small integer <- FP) => CAST(small integer <- CAST(TYP_INT <- FP)).
On 32 bit platforms, the code failed to clear the GTF_UNSIGNED flag
from the original tree, which meant that the outer cast thought it
had TYP_UINT as the source. This matters for checked casts:
conv.ovf.i2.un(-2.0d), which is a legitimate conversion, was interpreted
wrongly as an overflowing one as the resulting codegen only checked the
upper bound via an unsigned compare.
The fix is two-fold: clear GTF_UNSIGNED for GT_CAST nodes with FP sources
on creation and unify the 64 bit and 32 bit paths in "gtMorphCast",
which, after the removal of GTF_UNSIGNED handling, are identical.
This is a zero-diff change across all SPMI collections for Windows x64, Linux x64,
Linux ARM64.
This **is not** a zero-diff change for Windows x86.
Instances of bad codegen have been corrected in some tests.
* Assert instead of normalizing
Instead of normalizing GTF_UNSIGNED for FP sources in "gtNewCastNode",
assert that it is not set in GenTreeCast's constructor and fix the
importer to respect that constraint.
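The semantics at issue can be sketched with a standalone checked conversion (an illustration of the correct behavior, not the JIT's codegen): the FP source is inherently signed, so -2.0 is in range for int16 and the check must cover both bounds.

```cpp
#include <cstdint>
#include <stdexcept>

// Checked double -> int16 conversion in the spirit of conv.ovf.i2. The range
// check treats the FP source as signed; the bug described above effectively
// checked only the unsigned upper bound, wrongly rejecting negative inputs
// such as -2.0 (and the negated check also rejects NaN here).
int16_t checkedCastToInt16(double d)
{
    if (!(d > -32769.0 && d < 32768.0)) // signed int16 range
        throw std::overflow_error("conv.ovf.i2 overflow");
    return static_cast<int16_t>(d); // truncates toward zero
}
```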
|
|
* Specified the parameter name for ReinterpretHexAsDecimal
* Refactored "genUnsignedType", now "varTypeToUnsigned"
Renamed "genUnsignedType" to "varTypeToUnsigned" to conform to the existing
naming convention, moved its definition from "compiler.hpp" to "vartype.h",
made it a templated function like all the other "varType*" functions.
Deleted the equivalent but unused "varTypeSignedToUnsigned".
* Deleted "genSignedType" and renamed "varTypeUnsignedToSigned"
"genSignedType" had confusing semantics where it only returned the actual
signed type for TYP_UINT and TYP_ULONG. Deleted the function and made
the callsites explicitly request that behavior.
Also renamed "varTypeUnsignedToSigned" to "varTypeToSigned" for parity
with "varTypeToUnsigned" and made it a templated function.
* Made "genActualType" a templated function
* Made "genTypeStSz" a templated function
Also renamed the parameters for it and "genTypeSize" to be
consistent with "genActualType".
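The templated convention described above can be sketched as follows; the enum values are stand-ins for the JIT's var_types, and the names are illustrative.

```cpp
// Stand-in for the JIT's var_types enum (subset).
enum var_types
{
    TYP_INT,
    TYP_UINT,
    TYP_LONG,
    TYP_ULONG
};

// Core mapping from a type to its unsigned counterpart; types without an
// unsigned pair (or already unsigned) map to themselves.
inline var_types varTypeToUnsignedImpl(var_types t)
{
    switch (t)
    {
        case TYP_INT:  return TYP_UINT;
        case TYP_LONG: return TYP_ULONG;
        default:       return t;
    }
}

// Templated wrapper in the style of the other varType* functions: accepts
// anything convertible to var_types (e.g. a node's type field).
template <typename T>
var_types varTypeToUnsigned(T t)
{
    return varTypeToUnsignedImpl(static_cast<var_types>(t));
}
```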
|
|
In particular we need to set `GTF_DONT_CSE` so that CSE doesn't
introduce commas under `GT_JTRUE` nodes.
Fixes #52785.
|
|
|
|
|
|
|
|
If a value class method returns a struct, and its unboxed entry point
requires a type context argument, make sure to pass the context argument
properly.
Also, fetch the type context (method table) from the box, rather than
creating it from the class handle we have on hand; this tends to produce
smaller code as we're often fetching the method table for other reasons
anyways.
Closes #52975.
|
|
|
|
* Cut target.h into platform specific pieces.
* C_ASSERT is not used there.
* delete unused "REGNUM_MASK".
* delete redefines of "REGMASK_BITS".
* add headers to JIT_HEADERS
|
|
Co-authored-by: Jan Kotas <jkotas@microsoft.com>
|
|
* refactor lsra
* Add missing return condition
* Minor cleanup
- summary docs
- Removed DEFAULT_ORDER
- Added order seq ID
* jit format
* fix linux build
* review feedback
* Remove TODO
|