Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

release/20.x: [TBAA] Don't emit pointer-tbaa for void pointers. (#122116) #125206

Merged
merged 1 commit into from
Feb 18, 2025

Conversation

llvmbot
Copy link
Member

@llvmbot llvmbot commented Jan 31, 2025

Backport 77d3f8a

Requested by: @fhahn

@llvmbot llvmbot added this to the LLVM 20.X Release milestone Jan 31, 2025
@llvmbot
Copy link
Member Author

llvmbot commented Jan 31, 2025

@rjmccall What do you think about merging this PR to the release branch?

@llvmbot
Copy link
Member Author

llvmbot commented Jan 31, 2025

@llvm/pr-subscribers-clang-codegen
@llvm/pr-subscribers-clang

@llvm/pr-subscribers-backend-amdgpu

Author: None (llvmbot)

Changes

Backport 77d3f8a

Requested by: @fhahn


Full diff: https://github.com/llvm/llvm-project/pull/125206.diff

5 Files Affected:

  • (modified) clang/docs/UsersManual.rst (+79-9)
  • (modified) clang/lib/CodeGen/CodeGenTBAA.cpp (+8)
  • (modified) clang/test/CodeGen/tbaa-pointers.c (+3-10)
  • (modified) clang/test/CodeGenOpenCL/amdgpu-enqueue-kernel.cl (+12-13)
  • (modified) clang/unittests/CodeGen/TBAAMetadataTest.cpp (+3-9)
diff --git a/clang/docs/UsersManual.rst b/clang/docs/UsersManual.rst
index a56c9425ebb757b..943a9218ccbc252 100644
--- a/clang/docs/UsersManual.rst
+++ b/clang/docs/UsersManual.rst
@@ -2489,6 +2489,82 @@ are listed below.
 
     $ clang -fuse-ld=lld -Oz -Wl,--icf=safe -fcodegen-data-use code.cc
 
+.. _strict_aliasing:
+
+Strict Aliasing
+---------------
+
+The C and C++ standards require accesses to objects in memory to use l-values of
+an appropriate type for the object. This is called *strict aliasing* or
+*type-based alias analysis*. Strict aliasing enhances a variety of powerful
+memory optimizations, including reordering, combining, and eliminating memory
+accesses. These optimizations can lead to unexpected behavior in code that
+violates the strict aliasing rules. For example:
+
+.. code-block:: c++
+
+    void advance(size_t *index, double *data) {
+      double value = data[*index];
+      /* Clang may assume that this store does not change the contents of `data`. */
+      *index += 1;
+      /* Clang may assume that this store does not change the contents of `index`. */
+      data[*index] = value;
+      /* Either of these facts may create significant optimization opportunities
+       if Clang is able to inline this function. */
+  }
+
+Strict aliasing can be explicitly enabled with ``-fstrict-aliasing`` and
+disabled with ``-fno-strict-aliasing``. ``clang-cl`` defaults to
+``-fno-strict-aliasing``; see . Otherwise, Clang defaults to ``-fstrict-aliasing``.
+
+C and C++ specify slightly different rules for strict aliasing. To improve
+language interoperability, Clang allows two types to alias if either language
+would permit it. This includes applying the C++ similar types rule to C,
+allowing ``int **`` to alias ``int const * const *``. Clang also relaxes the
+standard aliasing rules in the following ways:
+
+* All integer types of the same size are permitted to alias each other,
+  including signed and unsigned types.
+* ``void*`` is permitted to alias any pointer type, ``void**`` is permitted to
+  alias any pointer to pointer type, and so on.
+
+Code which violates strict aliasing has undefined behavior. A program that
+works in one version of Clang may not work in another because of changes to the
+optimizer. Clang provides a :doc:`TypeSanitizer` to help detect
+violations of the strict aliasing rules, but it is currently still experimental.
+Code that is known to violate strict aliasing should generally be built with
+``-fno-strict-aliasing`` if the violation cannot be fixed.
+
+Clang supports several ways to fix a violation of strict aliasing:
+
+* L-values of the character types ``char`` and ``unsigned char`` (as well as
+  other types, depending on the standard) are permitted to access objects of
+  any type.
+
+* Library functions such as ``memcpy`` and ``memset`` are specified as treating
+  memory as characters and therefore are not limited by strict aliasing. If a
+  value of one type must be reinterpreted as another (e.g. to read the bits of a
+  floating-point number), use ``memcpy`` to copy the representation to an object
+  of the destination type. This has no overhead over a direct l-value access
+  because Clang should reliably optimize calls to these functions to use simple
+  loads and stores when they are used with small constant sizes.
+
+* The attribute ``may_alias`` can be added to a ``typedef`` to give l-values of
+  that type the same aliasing power as the character types.
+
+Clang makes a best effort to avoid obvious miscompilations from strict aliasing
+by only considering type information when it cannot prove that two accesses must
+refer to the same memory. However, it is not recommended that programmers
+intentionally rely on this instead of using one of the solutions above because
+it is too easy for the compiler's analysis to be blocked in surprising ways.
+
+In Clang 20, Clang strengthened its implementation of strict aliasing for
+accesses of pointer type. Previously, all accesses of pointer type were
+permitted to alias each other, but Clang now distinguishes different pointers
+by their pointee type, except as limited by the relaxations around qualifiers
+and ``void*`` described above. The previous behavior of treating all pointers as
+aliasing can be restored using ``-fno-pointer-tbaa``.
+
 Profile Guided Optimization
 ---------------------------
 
@@ -5272,12 +5348,6 @@ The Visual C++ Toolset has a slightly more elaborate mechanism for detection.
 Restrictions and Limitations compared to Clang
 ----------------------------------------------
 
-Strict Aliasing
-^^^^^^^^^^^^^^^
-
-Strict aliasing (TBAA) is always off by default in clang-cl. Whereas in clang,
-strict aliasing is turned on by default for all optimization levels.
-
-To enable LLVM optimizations based on strict aliasing rules (e.g., optimizations
-based on type of expressions in C/C++), user will need to explicitly pass
-`-fstrict-aliasing` to clang-cl.
+Strict aliasing (TBAA) is always off by default in clang-cl whereas in clang,
+strict aliasing is turned on by default for all optimization levels. For more
+details, see :ref:`Strict aliasing <strict_aliasing>`.
diff --git a/clang/lib/CodeGen/CodeGenTBAA.cpp b/clang/lib/CodeGen/CodeGenTBAA.cpp
index 75e66bae79afdce..3f1a24791ddd816 100644
--- a/clang/lib/CodeGen/CodeGenTBAA.cpp
+++ b/clang/lib/CodeGen/CodeGenTBAA.cpp
@@ -226,6 +226,14 @@ llvm::MDNode *CodeGenTBAA::getTypeInfoHelper(const Type *Ty) {
       PtrDepth++;
       Ty = Ty->getPointeeType()->getBaseElementTypeUnsafe();
     } while (Ty->isPointerType());
+
+    // While there are no special rules in the standards regarding void pointers
+    // and strict aliasing, emitting distinct tags for void pointers break some
+    // common idioms and there is no good alternative to re-write the code
+    // without strict-aliasing violations.
+    if (Ty->isVoidType())
+      return AnyPtr;
+
     assert(!isa<VariableArrayType>(Ty));
     // When the underlying type is a builtin type, we compute the pointee type
     // string recursively, which is implicitly more forgiving than the standards
diff --git a/clang/test/CodeGen/tbaa-pointers.c b/clang/test/CodeGen/tbaa-pointers.c
index 4aae2552f107a3d..48adac503357f00 100644
--- a/clang/test/CodeGen/tbaa-pointers.c
+++ b/clang/test/CodeGen/tbaa-pointers.c
@@ -208,12 +208,9 @@ int void_ptrs(void **ptr) {
 // COMMON-LABEL: define i32 @void_ptrs(
 // COMMON-SAME: ptr noundef [[PTRA:%.+]])
 // COMMON:        [[PTR_ADDR:%.+]]  = alloca ptr, align 8
-// DISABLE-NEXT:  store ptr [[PTRA]], ptr [[PTR_ADDR]], align 8, !tbaa [[ANYPTR]]
-// DISABLE-NEXT:  [[L0:%.+]] = load ptr, ptr  [[PTR_ADDR]], align 8, !tbaa  [[ANYPTR]]
-// DISABLE-NEXT:  [[L1:%.+]] = load ptr, ptr [[L0]], align 8, !tbaa  [[ANYPTR]]
-// DEFAULT-NEXT:  store ptr [[PTRA]], ptr [[PTR_ADDR]], align 8, !tbaa [[P2VOID:!.+]]
-// DEFAULT-NEXT:  [[L0:%.+]] = load ptr, ptr  [[PTR_ADDR]], align 8, !tbaa  [[P2VOID]]
-// DEFAULT-NEXT:  [[L1:%.+]] = load ptr, ptr [[L0]], align 8, !tbaa  [[P1VOID:!.+]]
+// COMMON-NEXT:   store ptr [[PTRA]], ptr [[PTR_ADDR]], align 8, !tbaa [[ANYPTR]]
+// COMMON-NEXT:   [[L0:%.+]] = load ptr, ptr  [[PTR_ADDR]], align 8, !tbaa  [[ANYPTR]]
+// COMMON-NEXT:   [[L1:%.+]] = load ptr, ptr [[L0]], align 8, !tbaa  [[ANYPTR]]
 // COMMON-NEXT:   [[BOOL:%.+]] = icmp ne ptr [[L1]], null
 // COMMON-NEXT:   [[BOOL_EXT:%.+]] = zext i1 [[BOOL]] to i64
 // COMMON-NEXT:   [[COND:%.+]] = select i1 [[BOOL]], i32 0, i32 1
@@ -254,7 +251,3 @@ int void_ptrs(void **ptr) {
 // COMMON:  [[INT_TAG]] = !{[[INT_TY:!.+]], [[INT_TY]], i64 0}
 // COMMON:  [[INT_TY]] = !{!"int", [[CHAR]], i64 0}
 // DEFAULT: [[ANYPTR]] = !{[[ANY_POINTER]],  [[ANY_POINTER]], i64 0}
-// DEFAULT: [[P2VOID]] = !{[[P2VOID_TY:!.+]], [[P2VOID_TY]], i64 0}
-// DEFAULT: [[P2VOID_TY]] = !{!"p2 void", [[ANY_POINTER]], i64 0}
-// DEFAULT: [[P1VOID]] = !{[[P1VOID_TY:!.+]], [[P1VOID_TY]], i64 0}
-// DEFAULT: [[P1VOID_TY]] = !{!"p1 void", [[ANY_POINTER]], i64 0}
diff --git a/clang/test/CodeGenOpenCL/amdgpu-enqueue-kernel.cl b/clang/test/CodeGenOpenCL/amdgpu-enqueue-kernel.cl
index 5599f4dd50f0422..ace34dd0ca6dce2 100644
--- a/clang/test/CodeGenOpenCL/amdgpu-enqueue-kernel.cl
+++ b/clang/test/CodeGenOpenCL/amdgpu-enqueue-kernel.cl
@@ -651,7 +651,7 @@ kernel void test_target_features_kernel(global int *i) {
 //
 // GFX900: Function Attrs: convergent nounwind
 // GFX900-LABEL: define {{[^@]+}}@__test_block_invoke_3_kernel
-// GFX900-SAME: (<{ i32, i32, ptr, ptr addrspace(1), ptr addrspace(1), i64, i8 }> [[TMP0:%.*]], ptr addrspace(3) [[TMP1:%.*]]) #[[ATTR6]] !kernel_arg_addr_space [[META28:![0-9]+]] !kernel_arg_access_qual [[META29:![0-9]+]] !kernel_arg_type [[META30:![0-9]+]] !kernel_arg_base_type [[META30]] !kernel_arg_type_qual [[META31:![0-9]+]] {
+// GFX900-SAME: (<{ i32, i32, ptr, ptr addrspace(1), ptr addrspace(1), i64, i8 }> [[TMP0:%.*]], ptr addrspace(3) [[TMP1:%.*]]) #[[ATTR6]] !kernel_arg_addr_space [[META27:![0-9]+]] !kernel_arg_access_qual [[META28:![0-9]+]] !kernel_arg_type [[META29:![0-9]+]] !kernel_arg_base_type [[META29]] !kernel_arg_type_qual [[META30:![0-9]+]] {
 // GFX900-NEXT:  entry:
 // GFX900-NEXT:    [[TMP2:%.*]] = alloca <{ i32, i32, ptr, ptr addrspace(1), ptr addrspace(1), i64, i8 }>, align 8, addrspace(5)
 // GFX900-NEXT:    store <{ i32, i32, ptr, ptr addrspace(1), ptr addrspace(1), i64, i8 }> [[TMP0]], ptr addrspace(5) [[TMP2]], align 8
@@ -688,7 +688,7 @@ kernel void test_target_features_kernel(global int *i) {
 //
 // GFX900: Function Attrs: convergent norecurse nounwind
 // GFX900-LABEL: define {{[^@]+}}@test_target_features_kernel
-// GFX900-SAME: (ptr addrspace(1) noundef align 4 [[I:%.*]]) #[[ATTR2]] !kernel_arg_addr_space [[META32:![0-9]+]] !kernel_arg_access_qual [[META23]] !kernel_arg_type [[META33:![0-9]+]] !kernel_arg_base_type [[META33]] !kernel_arg_type_qual [[META25]] {
+// GFX900-SAME: (ptr addrspace(1) noundef align 4 [[I:%.*]]) #[[ATTR2]] !kernel_arg_addr_space [[META31:![0-9]+]] !kernel_arg_access_qual [[META23]] !kernel_arg_type [[META32:![0-9]+]] !kernel_arg_base_type [[META32]] !kernel_arg_type_qual [[META25]] {
 // GFX900-NEXT:  entry:
 // GFX900-NEXT:    [[I_ADDR:%.*]] = alloca ptr addrspace(1), align 8, addrspace(5)
 // GFX900-NEXT:    [[DEFAULT_QUEUE:%.*]] = alloca ptr addrspace(1), align 8, addrspace(5)
@@ -700,7 +700,7 @@ kernel void test_target_features_kernel(global int *i) {
 // GFX900-NEXT:    [[FLAGS_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[FLAGS]] to ptr
 // GFX900-NEXT:    [[NDRANGE_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[NDRANGE]] to ptr
 // GFX900-NEXT:    [[TMP_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[TMP]] to ptr
-// GFX900-NEXT:    store ptr addrspace(1) [[I]], ptr [[I_ADDR_ASCAST]], align 8, !tbaa [[TBAA34:![0-9]+]]
+// GFX900-NEXT:    store ptr addrspace(1) [[I]], ptr [[I_ADDR_ASCAST]], align 8, !tbaa [[TBAA33:![0-9]+]]
 // GFX900-NEXT:    call void @llvm.lifetime.start.p5(i64 8, ptr addrspace(5) [[DEFAULT_QUEUE]]) #[[ATTR8]]
 // GFX900-NEXT:    call void @llvm.lifetime.start.p5(i64 4, ptr addrspace(5) [[FLAGS]]) #[[ATTR8]]
 // GFX900-NEXT:    store i32 0, ptr [[FLAGS_ASCAST]], align 4, !tbaa [[TBAA17]]
@@ -803,16 +803,15 @@ kernel void test_target_features_kernel(global int *i) {
 // GFX900: [[META23]] = !{!"none"}
 // GFX900: [[META24]] = !{!"__block_literal"}
 // GFX900: [[META25]] = !{!""}
-// GFX900: [[TBAA26]] = !{[[META27:![0-9]+]], [[META27]], i64 0}
-// GFX900: [[META27]] = !{!"p1 void", [[META9]], i64 0}
-// GFX900: [[META28]] = !{i32 0, i32 3}
-// GFX900: [[META29]] = !{!"none", !"none"}
-// GFX900: [[META30]] = !{!"__block_literal", !"void*"}
-// GFX900: [[META31]] = !{!"", !""}
-// GFX900: [[META32]] = !{i32 1}
-// GFX900: [[META33]] = !{!"int*"}
-// GFX900: [[TBAA34]] = !{[[META35:![0-9]+]], [[META35]], i64 0}
-// GFX900: [[META35]] = !{!"p1 int", [[META9]], i64 0}
+// GFX900: [[TBAA26]] = !{[[META9]], [[META9]], i64 0}
+// GFX900: [[META27]] = !{i32 0, i32 3}
+// GFX900: [[META28]] = !{!"none", !"none"}
+// GFX900: [[META29]] = !{!"__block_literal", !"void*"}
+// GFX900: [[META30]] = !{!"", !""}
+// GFX900: [[META31]] = !{i32 1}
+// GFX900: [[META32]] = !{!"int*"}
+// GFX900: [[TBAA33]] = !{[[META34:![0-9]+]], [[META34]], i64 0}
+// GFX900: [[META34]] = !{!"p1 int", [[META9]], i64 0}
 //.
 //// NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
 // CHECK: {{.*}}
diff --git a/clang/unittests/CodeGen/TBAAMetadataTest.cpp b/clang/unittests/CodeGen/TBAAMetadataTest.cpp
index cad8783ea73fba0..f05c9787c63eaf0 100644
--- a/clang/unittests/CodeGen/TBAAMetadataTest.cpp
+++ b/clang/unittests/CodeGen/TBAAMetadataTest.cpp
@@ -117,15 +117,9 @@ TEST(TBAAMetadataTest, BasicTypes) {
   ASSERT_TRUE(I);
 
   I = matchNext(I,
-      MInstruction(Instruction::Store,
-        MValType(PointerType::getUnqual(Compiler.Context)),
-        MMTuple(
-          MMTuple(
-            MMString("p1 void"),
-            AnyPtr,
-            MConstInt(0)),
-          MSameAs(0),
-          MConstInt(0))));
+                MInstruction(Instruction::Store,
+                             MValType(PointerType::getUnqual(Compiler.Context)),
+                             MMTuple(AnyPtr, MSameAs(0), MConstInt(0))));
   ASSERT_TRUE(I);
 
   I = matchNext(I,

@tstellar
Copy link
Collaborator

@AaronBallman Any thoughts about this one?

Copy link
Collaborator

@AaronBallman AaronBallman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm on the fence. On the one hand, it's a small amount of code to change and it fixes miscompiles, so it's a good candidate. On the other hand, there's not been much bake time and it's changing behavior in Clang that's worked that way for about 15 years as best I can tell, so I'm not certain we know if there's negative fallout.

@rjmccall @fhahn do you think this should have more bake time?

@nikic
Copy link
Contributor

nikic commented Feb 18, 2025

@AaronBallman Isn't the pointer TBAA support new in Clang 20?

@AaronBallman
Copy link
Collaborator

@AaronBallman Isn't the pointer TBAA support new in Clang 20?

It changed code in https://github.com/llvm/llvm-project/blame/27fe2c95ee067ee013b947040538224187b3adb7/clang/lib/CodeGen/CodeGenTBAA.cpp#L117 which is ~15 years old.

@fhahn
Copy link
Contributor

fhahn commented Feb 18, 2025

Without the pointer-tbaa changes (which is new on by default in Clang 20), we would always generate any pointer. Without this fix, we will generate different tags for different void pointer depths.

With this fix, we will generate any pointer again for void pointers. I think this should be very safe to take.

@AaronBallman
Copy link
Collaborator

Without the pointer-tbaa changes (which is new on by default in Clang 20), we would always generate any pointer. Without this fix, we will generate different tags for different void pointer depths.

With this fix, we will generate any pointer again for void pointers. I think this should be very safe to take.

Okay, thank you for the explanation! I'm okay with this (hopefully early rc testing will shake out any serious issues if there are any).

Copy link
Collaborator

@AaronBallman AaronBallman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@rjmccall
Copy link
Contributor

Right, I agree with taking this for the release branch. It is essentially a bug fix (relaxing a rule that was undesirably strict) for the pointer TBAA feature, which is already in the branch.

While there are no special rules in the standards regarding void
pointers and strict aliasing, emitting distinct tags for void pointers
break some common idioms and there is no good alternative to re-write
the code without strict-aliasing violations. An example is to count the
entries in an array of pointers:

    int count_elements(void * values) {
      void **seq = values;
      int count;
      for (count = 0; seq && seq[count]; count++);
      return count;
    }

https://clang.godbolt.org/z/8dTv51v8W

An example in the wild is from
llvm#119099

This patch avoids emitting distinct tags for void pointers, to avoid
those idioms causing mis-compiles for now.

Fixes llvm#119099.
Fixes llvm#122537.

PR: llvm#122116
(cherry picked from commit 77d3f8a)
@tstellar tstellar merged commit 7643bd6 into llvm:release/20.x Feb 18, 2025
9 of 12 checks passed
Copy link

@fhahn (or anyone else). If you would like to add a note about this fix in the release notes (completely optional). Please reply to this comment with a one or two sentence description of the fix. When you are done, please add the release:note label to this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:AMDGPU clang:codegen clang Clang issues not falling into any other category
Projects
Development

Successfully merging this pull request may close these issues.

6 participants