S390x: add preliminary support for SystemZ #4810

liushuyu · 2025-01-01T05:30:00Z

This pull request adds preliminary support for IBM Z systems (running in z/Architecture mode).

Gayathri-Berli · 2025-01-03T14:46:13Z

Thank you for bringing it to our attention. We will check internally and confirm.

uweigand · 2025-01-07T08:43:01Z

Many thanks for working on the SystemZ port! I'll have a look at the details.

kalev · 2025-01-09T14:19:25Z

I tried to build this on s390x on Fedora rawhide, using gdc as a bootstrap compiler and ran into the following error:

core/thread/osthread.d(1648): Error: cannot implicitly convert expression `regs[9]` of type `ulong` to `void*`
core/thread/osthread.d(1648): Error: cannot implicitly convert expression `regs[9]` of type `ulong` to `void*`
core/thread/osthread.d(1648): Error: cannot implicitly convert expression `regs[9]` of type `ulong` to `void*`
core/thread/osthread.d(1648): Error: cannot implicitly convert expression `regs[9]` of type `ulong` to `void*`

Any ideas what's going on?

liushuyu · 2025-01-09T16:32:48Z

> I tried to build this on s390x on Fedora rawhide, using gdc as a bootstrap compiler and ran into the following error:
>
> core/thread/osthread.d(1648): Error: cannot implicitly convert expression `regs[9]` of type `ulong` to `void*`
> core/thread/osthread.d(1648): Error: cannot implicitly convert expression `regs[9]` of type `ulong` to `void*`
> core/thread/osthread.d(1648): Error: cannot implicitly convert expression `regs[9]` of type `ulong` to `void*`
> core/thread/osthread.d(1648): Error: cannot implicitly convert expression `regs[9]` of type `ulong` to `void*`
>
> Any ideas what's going on?

This is now fixed.

kalev · 2025-01-09T18:13:20Z

Thanks! With this, I was able to build an initial version of ldc (using gdc), but when I try to build ldc with itself, it now fails with the following:

/usr/bin/ld: lib64/libldc.a(asmstmt.cpp.o)(.text+0x1d02): misaligned symbol `_ZN3Loc12messageStyleE' (0x1efc009) for relocation R_390_PC32DBL
/usr/bin/ld: final link failed
collect2: error: ld returned 1 exit status

uweigand

Thanks for implementing SystemZ support! I'm not familiar with D at all, but I tried to review this from a perspective of compliance with the system ABI. This looks good to me for the most part, see inline comments for some issues / questions.

uweigand · 2025-01-10T13:43:16Z

gen/abi/systemz.cpp

+
+struct StructSimpleFlattenRewrite : BaseBitcastABIRewrite {
+  LLType *type(Type *ty) override {
+    const size_t type_size = size(ty);


In the C ABI, a struct containing just a single float or double member is passed like a plain float or double, i.e. possibly in a floating-point register. I don't see this being handled anywhere here.

liushuyu · 2025-01-18

This is fixed now.

uweigand · 2025-01-10T13:52:08Z

gen/abi/systemz.cpp

+    }
+    if (t->ty == TY::Tint128 || t->ty == TY::Tcomplex80) {
+      return true;
+    }


Vector types of size up to 16 should be passed in vector registers, but only when compiling for an architecture that supports vector registers in the first place (i.e. z13 and above). Older machines use another ABI where vector types are always passed via reference. Do you intend to support both ABIs, or do you plan to simply require z13 or later (either in general, or whenever vector types are used)? That may be a reasonable choice at this point, but I guess you should make sure that the machine type / features are set up accordingly for the LLVM back-end.

liushuyu · 2025-01-18

> Vector types of size up to 16 should be passed in vector registers, but only when compiling for an architecture that supports vector registers in the first place (i.e. z13 and above). Older machines use another ABI where vector types are always passed via reference. Do you intend to support both ABIs, or do you plan to simply require z13 or later (either in general, or whenever vector types are used)? That may be a reasonable choice at this point, but I guess you should make sure that the machine type / features are set up accordingly for the LLVM back-end.

I don't think there is an easy way in D to construct TY::Tint128 (you can use core.int128.Cent, but that type is { i64, i64 } in the D ABI). Even if we lower it to int128 and pass it by reference, LLVM will still correctly pass it using vector registers (see https://godbolt.org/z/a8xEEfezz).
For TY::Tcomplex80, this lowers to fp128 and will automatically be handled by LLVM according to the -mcpu values passed to LLVM.

uweigand · 2025-01-10T13:53:24Z

gen/abi/systemz.cpp

+    }
+    // "A struct or union of any other size, a complex type, an __int128, a long
+    // double, a _Decimal128, or a vector whose size exceeds 16 bytes"
+    if (size(t) > 16 || t->iscomplex() || t->isimaginary()) {


I'm a bit confused by the size <=8 check above vs. the size > 16 check here. What about structs with sizes in between the two? They should be passed by reference - I'm not sure if this is what this code does.

liushuyu · 2025-01-12

Any struct between 8 and 16 bytes in size is passed by reference (as determined by DtoIsInMemoryOnly).
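The size rule under discussion can be sketched as a small standalone predicate. This is a minimal illustration, assuming the s390x ELF ABI rule that the code comment in the hunk quotes: only aggregates of exactly 1, 2, 4, or 8 bytes travel in a general-purpose register, and "a struct or union of any other size" goes by reference. `isPassedByReference` is a hypothetical helper, not LDC's `DtoIsInMemoryOnly`, and it deliberately ignores the single-float/double and single-vector struct cases raised elsewhere in this review.

```cpp
#include <cstddef>

// Hypothetical helper for illustration only (not LDC's DtoIsInMemoryOnly).
// Models just the size rule for plain structs and unions.
constexpr bool isPassedByReference(std::size_t sizeInBytes) {
  switch (sizeInBytes) {
  case 1:
  case 2:
  case 4:
  case 8:
    return false; // passed like an integer of the same size
  default:
    return true;  // any other size: caller passes a pointer to a copy
  }
}

static_assert(!isPassedByReference(8), "8-byte struct is passed in a register");
static_assert(isPassedByReference(12), "sizes between 8 and 16 go by reference");
static_assert(isPassedByReference(3), "odd sizes below 8 also go by reference");
```

Under this model the gap between the `<= 8` check and the `> 16` check is not special: every size other than 1, 2, 4, and 8 falls through to the by-reference case.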

uweigand · 2025-01-10T13:56:20Z

runtime/druntime/src/core/internal/vararg/s390x.d

+        {
+            // Arg is passed in one register
+            alias T1 = U[0];
+            static if (is(T1 == double) || is(T1 == float))


As above, single-element float/double structs might need to be handled here as well.

liushuyu · 2025-01-20

This is now somewhat addressed in the compiler: single-element float/double structs are rewritten to float/double in va_args before being passed to this code.

uweigand · 2025-01-10T13:58:38Z

runtime/druntime/src/core/internal/vararg/s390x.d

+{
+    TypeInfo arg1, arg2;
+    if (!ti.argTypes(arg1, arg2))
+    {


I guess I'm not familiar with how "runtime" va_arg works in D - is this when the type itself is not known at compile-time? I notice this function doesn't appear to handle floating-point types at all anywhere. Can those never occur here?

uweigand · 2025-01-10T13:59:04Z

runtime/druntime/src/core/thread/osthread.d

-                ("sd $fp, %0") : "=m" (regs[9]); 
-                ("sd $ra, %0") : "=m" (sp);
+                ("sd $fp, %0") : "=m" (regs[9]);
+                ("sd $sp, %0") : "=m" (sp);


This seems unrelated to SystemZ?

liushuyu · 2025-01-12

I will remove this hunk when the pull request is cleaned up.

JohanEngelen · 2025-01-12

It does appear to be correct. You can also submit it as a separate PR through the GitHub web interface. Thanks.

uweigand · 2025-01-10T14:00:03Z

runtime/druntime/src/core/threadasm.S

+    .cfi_def_cfa_offset 384
+    /* store the (optional) backchain data */
+    stg     %r1, 0(%r15)
+    aghi    %r1, -64


It looks like you're accessing the stack below r15 here. This is unsafe on SystemZ - we do not have any "red zone", i.e. all memory below r15 may be overwritten at any time by a signal handler.

liushuyu · 2025-01-20

Should be fixed now.

kinke · 2025-01-16T11:09:37Z

Wrt. SystemZ CI, we apparently could

1. apply for a free VM from IBM: https://community.ibm.com/zsystems/form/l1cc-oss-vm-request/
2. install a GitHub Actions runner on it: https://github.com/anup-kodlekere/gaplib
3. target it in our GHA YAML (using GDC as host compiler for now), to see how bad the test failures are

uweigand · 2025-01-17T13:08:48Z

> Wrt. SystemZ CI, we apparently could
>
> 1. apply for a free VM from IBM: https://community.ibm.com/zsystems/form/l1cc-oss-vm-request/
> 2. install a GitHub Actions runner on it: https://github.com/anup-kodlekere/gaplib
> 3. target it in our GHA YAML (using GDC as host compiler for now), to see how bad the test failures are

Yes, that would definitely be an option - that is the intended purpose of those VMs we make available.

As an alternative, some projects go the route of running SystemZ tests under qemu (see e.g. https://github.com/bytecodealliance/wasmtime/blob/main/.github/workflows/main.yml). For performance reasons, in this case it might be preferable to run the compiler natively (as cross-compiler) and only run the resulting test cases in qemu.

JohanEngelen · 2025-01-19T16:54:20Z

>> Wrt. SystemZ CI, we apparently could
>>
>> 1. apply for a free VM from IBM: https://community.ibm.com/zsystems/form/l1cc-oss-vm-request/
>> 2. install a GitHub Actions runner on it: https://github.com/anup-kodlekere/gaplib
>> 3. target it in our GHA YAML (using GDC as host compiler for now), to see how bad the test failures are
>
> Yes, that would definitely be an option - that is the intended purpose of those VMs we make available.

Is that a task you are willing to pick up? Thanks!

uweigand · 2025-01-20T12:52:16Z

gen/abi/systemz.cpp

+using namespace dmd;
+
+struct SimpleHardfloatRewrite : ABIRewrite {
+  Type *getFirstFieldType(Type *ty) {


In the C ABI, this is applied recursively: a struct with a single element that is a struct with a single element of floating-point type is also passed like a float, etc.
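The recursive rule described here can be sketched on a toy type model. This is only an illustration under stated assumptions: `TypeDesc`, `Kind`, and `flattenToFloat` are hypothetical names, not part of LDC or dmd, and the model ignores details such as empty members or unions. It shows the shape of the recursion: keep peeling single-field structs, and apply the float rule only if a float or double is found at the bottom.

```cpp
#include <cstddef>

// Toy type model for illustration only; LDC works on dmd's Type objects.
enum class Kind { Float, Double, Int64, Struct };

struct TypeDesc {
  Kind kind;
  std::size_t fieldCount; // only meaningful for Kind::Struct
  const TypeDesc *first;  // first field of a struct, otherwise nullptr
};

// Peel single-field structs until a non-struct type is reached. Returns the
// float/double type the argument is passed as, or nullptr if the rule does
// not apply.
constexpr const TypeDesc *flattenToFloat(const TypeDesc *t) {
  while (t != nullptr && t->kind == Kind::Struct) {
    if (t->fieldCount != 1)
      return nullptr; // zero or several fields: not a simple wrapper
    t = t->first;
  }
  if (t != nullptr && (t->kind == Kind::Float || t->kind == Kind::Double))
    return t;
  return nullptr;
}

constexpr TypeDesc kDouble{Kind::Double, 0, nullptr};
constexpr TypeDesc kInt64{Kind::Int64, 0, nullptr};
constexpr TypeDesc kInner{Kind::Struct, 1, &kDouble};     // struct { double d; }
constexpr TypeDesc kOuter{Kind::Struct, 1, &kInner};      // struct { Inner i; }
constexpr TypeDesc kPair{Kind::Struct, 2, &kDouble};      // struct { double a, b; }
constexpr TypeDesc kWrappedInt{Kind::Struct, 1, &kInt64}; // struct { long l; }

static_assert(flattenToFloat(&kOuter) == &kDouble, "nested wrapper is passed like a double");
static_assert(flattenToFloat(&kPair) == nullptr, "two fields: rule does not apply");
static_assert(flattenToFloat(&kWrappedInt) == nullptr, "wrapped integer is not a float");
```

A loop is used instead of literal recursion because each step has at most one field to descend into; the result is the same.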

uweigand · 2025-01-20T12:53:06Z

gen/abi/systemz.cpp

+
+struct StructSimpleFlattenRewrite : BaseBitcastABIRewrite {
+  LLType *type(Type *ty) override {
+    const size_t type_size = size(ty);


Thanks! I forgot to mention one similar case, sorry: a struct containing just a single element of vector type (or recursively another such struct), is passed like a plain vector type (i.e. in vector registers).

uweigand · 2025-01-20T12:58:03Z

gen/abi/systemz.cpp

+    }
+    if (t->ty == TY::Tint128 || t->ty == TY::Tcomplex80) {
+      return true;
+    }


>> Vector types of size up to 16 should be passed in vector registers, but only when compiling for an architecture that supports vector registers in the first place (i.e. z13 and above). Older machines use another ABI where vector types are always passed via reference. Do you intend to support both ABIs, or do you plan to simply require z13 or later (either in general, or whenever vector types are used)? That may be a reasonable choice at this point, but I guess you should make sure that the machine type / features are set up accordingly for the LLVM back-end.
>
> I don't think there is an easy way in D to construct TY::Tint128 (you can use core.int128.Cent, but that type is { i64, i64 } in the D ABI). Even if we lower it to int128 and pass it by reference, LLVM will still correctly pass it using vector registers (see https://godbolt.org/z/a8xEEfezz).

int128 is always passed via reference, also in your godbolt example (it uses vector registers temporarily to set up the value, but the actual argument is passed at 160(%r15), with %r2 pointing to that address).

> For TY::Tcomplex80, this lowers to fp128 and will automatically be handled by LLVM according to the -mcpu values passed to LLVM.

fp128 is always passed in a pair of floating-point registers, no matter what -mcpu.

What I was referring to applies solely to actual vector types (of size up to 16). Those are passed via reference on pre-z13 machines, and in vector registers on z13 and later.
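The distinction drawn in this thread between vector types and `__int128` can be written down as a tiny classifier. This is a sketch of the rule as stated in the comments above, not of LDC's or LLVM's implementation. `ArgClass`, `classifyVectorArg`, `classifyInt128Arg`, and the `hasVectorFacility` flag (standing in for "z13 or later") are hypothetical names introduced for illustration.

```cpp
#include <cstddef>

enum class ArgClass { VectorRegister, ByReference };

// Vector types: in a vector register only when the target has vector
// registers (z13 and later) and the vector is at most 16 bytes; otherwise
// the argument is passed by reference.
constexpr ArgClass classifyVectorArg(std::size_t sizeInBytes,
                                     bool hasVectorFacility) {
  return (hasVectorFacility && sizeInBytes <= 16) ? ArgClass::VectorRegister
                                                  : ArgClass::ByReference;
}

// __int128: by reference regardless of the CPU level. The flag is accepted
// but unused, to make that independence explicit.
constexpr ArgClass classifyInt128Arg(bool /*hasVectorFacility*/) {
  return ArgClass::ByReference;
}

static_assert(classifyVectorArg(16, true) == ArgClass::VectorRegister,
              "z13 and later: vector register");
static_assert(classifyVectorArg(16, false) == ArgClass::ByReference,
              "pre-z13: by reference");
static_assert(classifyInt128Arg(true) == ArgClass::ByReference,
              "int128 is never passed in a vector register");
```

Because the vector case depends on the CPU level while the `__int128` case does not, only code that actually uses vector types is affected by the choice between the two ABIs.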

uweigand · 2025-01-20T13:03:21Z

runtime/druntime/src/core/threadasm.S

+    /* Save callee-saved floating point registers
+       s390x ABI has a very unique way for storing fp registers:
+       even-pairs first and odd-pairs last */
+    std     %f8,  0(%r15)


This clobbers the backchain data. (Which would make backchain-based unwinding, e.g. for profiling, break if a sample hits while in this routine.) Normally, what you'd do is to allocate a new register-save area below the 64 bytes of temporary stack space, and fill in only the backchain in that area.

uweigand · 2025-01-20T13:03:54Z

runtime/druntime/src/core/threadasm.S

+
+    /* Save stack pointer, the stack pointer is adjusted so that
+       GC won't see the float point registers */
+    stg     %r15, 0(%r2)


Not sure what exactly that comment means, but I don't see any adjustment to the stack pointer here?

uweigand · 2025-01-20T13:11:22Z

>>> Wrt. SystemZ CI, we apparently could
>>>
>>> 1. apply for a free VM from IBM: https://community.ibm.com/zsystems/form/l1cc-oss-vm-request/
>>> 2. install a GitHub Actions runner on it: https://github.com/anup-kodlekere/gaplib
>>> 3. target it in our GHA YAML (using GDC as host compiler for now), to see how bad the test failures are
>>
>> Yes, that would definitely be an option - that is the intended purpose of those VMs we make available.
>
> Is that a task you are willing to pick up? Thanks!

I could certainly set up the machine and the Actions runner. For integration with your CI I'd probably need help from someone who understands its current setup ...

kinke · 2025-01-20T13:32:58Z

> I could certainly set up the machine and the Actions runner.

Thanks, that'd be great!

> For integration with your CI I'd probably need help from someone who understands its current setup ...

No worries, once we have a registered runner, we can take over. According to https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners/adding-self-hosted-runners, the registration requires a token from us, which is valid for one hour. So a bit of timely coordination will be required.

kalev mentioned this pull request Jan 3, 2025

porting to the s390x/SystemZ architecture #4171

Open

liushuyu force-pushed the s390x branch from ad333b4 to c5cb043 Compare January 7, 2025 16:53

liushuyu force-pushed the s390x branch 2 times, most recently from 8ccb0d2 to 496476a Compare January 9, 2025 16:28

uweigand reviewed Jan 10, 2025

liushuyu force-pushed the s390x branch 2 times, most recently from ecea763 to c0cd3a2 Compare January 19, 2025 07:03

liushuyu added 10 commits January 19, 2025 23:46

druntime/rt/sections_elf_shared.d: add support for SystemZ (S390x)

3c2f782

druntime/core/thread/fiber: add fiber implementation for s390x

eed913e

gen/abi: add initial ABI implementations for s390x

4266a8e

gen/ir: add support for s390x special va_arg type

009cd20

osthread.d: add callWithStackShell fallback

9eb951f

gen/abi: add more s390x ABI rewrites

fb8420e

druntime/thread: add osthread support for s390x

bea3afb

gen/ctfloat: make CTFloat big-endian aware

b6bcfb5

gen/abi: flatten single float struct to float on s390x

c36ff97

dmd: more s390x va_arg implementations

91108c5

liushuyu force-pushed the s390x branch from c0cd3a2 to 91108c5 Compare January 20, 2025 06:46

uweigand reviewed Jan 20, 2025
