Age | Commit message (Collapse) | Author |
|
We know we're going to overwrite it anyway.
It'd be a bit of work to coordinate not generating it at all, but setting this
flag avoids generating ~10k of the 13k string.
Differential Revision: https://reviews.llvm.org/D125180
|
|
This patch restores the symmetry between how operator new and operator delete
are handled by also inlining the content of operator delete when possible.
Patch by Fred Tingaud.
Reviewed By: martong
Differential Revision: https://reviews.llvm.org/D124845
|
|
purpose [nfc]
|
|
Like regular assignment, compound assignment operators can be assumed to
write to their left-hand side operand. So we strengthen the requirements
there. (Previously only the default read access had been required.)
Just like operator->, operator->* can also be assumed to dereference the
left-hand side argument, so we require read access to the pointee. This
will generate new warnings if the left-hand side has a pt_guarded_by
attribute. This overload is rarely used, but it was trivial to add, so
why not. (Supporting the builtin operator requires changes to the TIL.)
Reviewed By: aaron.ballman
Differential Revision: https://reviews.llvm.org/D124966
|
|
This is unneccesary work.
With this change, we skip generating and lexing ~10k of predefines twice.
A dumb benchmark of building a preamble for an empty file in a loop shows:
- before: 1.90ms/run
- after: 1.36ms/run
So this should be worth 0.5ms for each AST build and code completion.
There can be a functional difference, but it's very minor.
If the preamble contains e.g. `#ifndef __llvm__ ... #endif` then before we would
not take it. After this change we will take the branch (single-file mode takes
all branches with unknown conditions) and so gather different directives.
However I think this is negligible:
- this is already true of non-builtin macros (from included headers).
We've had no complaints.
- this affects the baseline and modified in the same way, so only makes a
difference transiently when code guarded by such an #ifdef is being edited
Differential Revision: https://reviews.llvm.org/D125179
|
|
This includes a fix for the libc++ issue I ran across with friend
declarations not properly being identified as overloads.
This reverts commit 45c07db31cc76802a1a2e41bed1ce9c1b8198181.
|
|
The ifdef is not required in the header, common::int128_t is always
defined. The function declaration must be available in lowering
regardless of the host int128_t support.
Differential Revision: https://reviews.llvm.org/D125211
|
|
This fixes the first of several cases where the state computed in phase 1 and 2 of the algorithm differs from the state computed during phase 3. Note that such differences can cause miscompiles by creating disagreements about contents of the VL and VTYPE registers at block boundaries.
In this particular case, we recognize that for the first vsetvli in a block, that if the AVL is a phi of GPR results from previous vsetvlis and the VTYPE field matches, we can avoid emitting a vsetvli as the register contents don't change. Unfortunately, the abstract state does change and that update was lost.
As noted in the test change, this can actually improve results by preserving information until later state transitions in the block. However, this minor codegen improvement is not the motivation for the patch. The motivation is to avoid cases a case where we break a key internal correctness invariant.
Differential Revision: https://reviews.llvm.org/D125133
|
|
With the demangler parenthesizing 'a >> b' inside template parameters,
because C++11 parsing of >> there, we don't really need to add spaces
between adjacent template arg closing '>' chars. In 2022, that just
looks odd.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D123134
|
|
As suggested from 02f8519502447de, this uses the
isAnyConstantBuildVector method in lieu of separate
isBuildVectorOfConstantSDNodes calls. It should
otherwise be an NFC.
|
|
Fold %x umin_seq %y to %x umin %y if %x cannot be zero. They only
differ in semantics for %x==0.
More generally %x *_seq %y folds to %x * %y if %x cannot be the
saturation fold (though currently we only have umin_seq).
|
|
D117829 added the generic "__builtin_reduce_mul" which we can use to replace the x86 specific integer mul reduction builtins - internally these were mapping to the same intrinsic already so there are no test changes required.
Differential Revision: https://reviews.llvm.org/D125222
|
|
|
|
When extracting the first lane of a predicate created using the
llvm.get.active.lane.mask intrinsic, it should give the same codegen as
when the predicate is created using the llvm.aarch64.sve.whilelo
intrinsic, since get.active.lane.mask is lowered to whilelo. This patch
ensures the codegen is the same by recognizing
llvm.get.active.lane.mask as a flag-setting operation in this case.
Differential Revision: https://reviews.llvm.org/D125215
|
|
Previously the EXPECT_AVAILABLE macros would rebuild the code at each marked
point, by expanding the cases textually.
There were often lots, and it's nice to have lots!
This reduces total unittest time by ~10% on my machine.
I did have to sacrifice a little apply() coverage in AddUsingTests (was calling
expandCases directly, which was otherwise unused), but we have
EXPECT_AVAILABLE tests covering that, I don't think there's real risk here.
Differential Revision: https://reviews.llvm.org/D125109
|
|
This reverts commit 7211d5ce07830ebfa2cfc30818cd7155375f7e47.
This version fixes a crash that caused buildbot failures with the first
version.
|
|
|
|
This is a clever cross-cutting sanity test for clang's arg parsing I suppose.
But clangd creates thousands of invocations, ~all with identical trivial
arguments, and problems with these would be caught by clang's tests.
This overhead accounts for 10% of total unittest time!
Differential Revision: https://reviews.llvm.org/D125169
|
|
These aren't needed. With them the generated predefines buffer is 13KB.
For every TestTU, we must:
- generate the buffer (3 times: parsing preamble, scanning preamble, main file)
- parse the buffer (again 3 times)
- serialize all the macros it defines in the PCH
- compress the buffer itself to write it into the PCH
- decompress it from the PCH
Avoiding this reduces unit test time by ~25%.
Differential Revision: https://reviews.llvm.org/D125172
|
|
Differential Revision: https://reviews.llvm.org/D125001
|
|
The output buffer has a 'back' member, which returns NUL when you try
it with an empty buffer. But there are no use cases that need that
additional functionality. This makes the 'back' member behave more
like STL containers' back members. (It still returns a value, not a
reference.)
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D123201
|
|
Similar to the existing bitwise reduction builtins, this lowers to a llvm.vector.reduce.mul intrinsic call.
For other reductions, we've tried to share builtins for float/integer vectors, but the fmul reduction intrinsic also take a starting value argument and can either do unordered or serialized, but not reduction-trees as specified for the builtins. However we address fmul support this shouldn't affect the integer case.
Differential Revision: https://reviews.llvm.org/D117829
|
|
would suffice.
There's many instances in clang tidy checks where owning strings are used when we already have a stable string from the options, so using a StringRef makes much more sense.
Reviewed By: aaron.ballman
Differential Revision: https://reviews.llvm.org/D124341
|
|
See [[ https://github.com/llvm/llvm-project/issues/55040 | issue 55040 ]] where static members of classes declared in the anonymous namespace are incorrectly returned as member fields from lldb::SBType::GetFieldAtIndex(). It appears that attrs.member_byte_offset contains a sentinel value for members that don't have a DW_AT_data_member_location.
Reviewed By: labath
Differential Revision: https://reviews.llvm.org/D124409
|
|
Converts to SVBool are already considered as a nop, if they
are converting an operand from a ptrue or a cmp, because
they zero the extra predicate lanes by construction.
This patch adds 2 similar cases:
- The wide cmp, which were not directly recognized by the test
for other forms of cmp
- Splats of 1, which will be generated as ptrue, and as such
will also zero the extra predicate lines.
Reviewed By: paulwalker-arm, peterwaller-arm
Differential Revision: https://reviews.llvm.org/D124908
|
|
libm doesn't have overloads for the small types, so promote them to a
bigger type and use the f32 function.
Differential Revision: https://reviews.llvm.org/D125093
|
|
IIUC, the purpose of CopyUniqueClassMethodTypes is to link together
class definitions in two compile units so that we only have a single
definition of a class. It does this by adding entries to the die_to_type
and die_to_decl_ctx maps.
However, the direction of the linking seems to be reversed. It is taking
entries from the class that has not yet been parsed, and copying them to
the class which has been parsed already -- i.e., it is a very
complicated no-op.
Changing the linking order allows us to revert the changes in D13224
(while keeping the associated test case passing), and is sufficient to
fix PR54761, which was caused by an undesired interaction with that
patch.
Differential Revision: https://reviews.llvm.org/D124370
|
|
This isn't a configuration that we unfortunately can add to
the CI practically at the moment, but I do run the tests
sporadically offline in this configuration.
Differential Revision: https://reviews.llvm.org/D124993
|
|
That's a partial fix for https://github.com/llvm/llvm-project/issues/55292.
Before, known builtins behaved differently from other identifiers:
```
void f () { return F (__builtin_LINE() + __builtin_FOO ()); }
```
After:
```
void f () { return F (__builtin_LINE () + __builtin_FOO ()); }
```
Reviewed By: owenpan
Differential Revision: https://reviews.llvm.org/D125085
|
|
The initial support for the Ampere1 mistakenly signalled support for
the MTE feature. However, the core does not include the optional MTE
functionality.
Update the target parser to not include MTE for Ampere1.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D125191
|
|
This patch implements a for a target specific optimization that replaces
the cmp and csel from cttz with an and mask.
Differential Revision: https://reviews.llvm.org/D123782
|
|
This reverts commit b7d807dbcff0d9df466e0312b4fef57178d207be -- it
breaks TestMultipleDebuggers.py.
|
|
|
|
closing parenthesis is followed by a newline.
Fixes https://github.com/llvm/llvm-project/issues/54522.
This fixes regression introduced in https://github.com/llvm/llvm-project/commit/5e5efd8a91f2e340e79a73bedbc6ab66ad4a4281.
Before the culprit commit, macros in WhitespaceSensitiveMacros were correctly formatted even if their closing parenthesis weren't followed by semicolon (or, to be precise, when they were followed by a newline).
That commit changed the type of the macro token type from TT_UntouchableMacroFunc to TT_FunctionLikeOrFreestandingMacro.
Correct formatting (with `WhitespaceSensitiveMacros = ['FOO']`):
```
FOO(1+2)
FOO(1+2);
```
Regressed formatting:
```
FOO(1 + 2)
FOO(1+2);
```
Reviewed By: HazardyKnusperkeks, owenpan, ksyx
Differential Revision: https://reviews.llvm.org/D123676
|
|
This prevents an infinite loop from D123801, where code trying to reduce
the total number of bitcasts, but also handling constants, could create
the opposite transform. Prevent the transform in these case to let the
bitcast of a constant transform naturally.
Fixes #55345
|
|
Reviewed By: Patryk27
Differential Revision: https://reviews.llvm.org/D124913
|
|
|
|
When processing an entry-stmt in name resolution, attrs_ was
reset before SetBindNameOn was called, causing the symbol to lose
the binding label information.
Differential Revision: https://reviews.llvm.org/D125097
|
|
The per-callsite size threshold used today to drive preinline decision is based on hotness/coldness cutoff. The default setup is for callsites with a sample count above the hotness cutoff (99%), a 1500 size threshold is used. Any callsite below 99.99% coldness cutoff uses a zero threshold. This has a couple issues:
1. While both cutoffs and size thoresholds are configurable, different applications may need different setups, making a universal setup impractical.
2. The callsites between hotness cutoff and coldness cutoff are not considered as inline candidates, which could be a missing opportunity.
3. Hot callsites always use the same threshold. In reality we may want a bigger threshold for hotter callsites.
In this change we are introducing a linear threshold regardless of hot/cold cutoffs. Given a sample space, a threshold is computed for a callsite based on the position of that callsite sample in the whole space. With that we no longer need to define what's hot or cold. Callsites with different hotness will get a different threshold. This should overcome the above three issues.
I have seen good results with a universal default setup for two of our internal services.
For one service, 0.2% to 0.5% perf improvement over a baseline with a previous default setup, on-par code size.
For the second service, 0.5% to 0.8% perf improvement over a baseline with a previous default setup, 0.2% code size increase; on-par performance and code size with a baseline that is with a carefully tuned cutoff to cover enough hot functions.
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D125023
|
|
Adds missing logic in the lowering from NvGPU to NVVM to support fp32
(in an accumulator operand) and tf32 (in multiplicand operand) types.
Fixes logic in one of the helper functions for converting the result
of a mma.sync operation with multiple 8x256bit output tiles, which is
the case for f32 outputs.
Differential Revision: https://reviews.llvm.org/D124533
|
|
As Fortran 2018 5.2.2 states, a program shall consist of exactly one
main program. Add this semantic check.
Reviewed By: klausler
Differential Revision: https://reviews.llvm.org/D125186
|
|
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D125163
|
|
Fixing a small bug where it would assert if CU does not modify .debug_addr section.
Differential Revision: https://reviews.llvm.org/D125181
|
|
|
|
|
|
This is needed to prepare for adding FLAGS option.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D125055
|
|
|
|
|
|
These are all microcoded/multi-pipe nightmares on Ryzen, but we shouldn't just be using the WriteMicrocoded class which is for REALLY bad microcoded nightmares - instead use the same approximate latencies as znver2 (Agner and uops.info both suggest similar values) - and make sure we use the FPU defs for both
Fixes #53242
|
|
have one use
Fixes #51381
|