Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

S390x: add preliminary support for SystemZ #4810

Draft
wants to merge 10 commits into
base: master
Choose a base branch
from

Conversation

liushuyu
Copy link
Contributor

@liushuyu liushuyu commented Jan 1, 2025

This pull request adds preliminary support for IBM Z systems (running in z/Architecture mode).

@Gayathri-Berli
Copy link

Thank you for bringing it to our attention. We will check internally and confirm.

@uweigand
Copy link

uweigand commented Jan 7, 2025

Many thanks for working on the SystemZ port! I'll have a look at the details.

@kalev
Copy link
Contributor

kalev commented Jan 9, 2025

I tried to build this on s390x on Fedora rawhide, using gdc as a bootstrap compiler and ran into the following error:

core/thread/osthread.d(1648): Error: cannot implicitly convert expression `regs[9]` of type `ulong` to `void*`
core/thread/osthread.d(1648): Error: cannot implicitly convert expression `regs[9]` of type `ulong` to `void*`
core/thread/osthread.d(1648): Error: cannot implicitly convert expression `regs[9]` of type `ulong` to `void*`
core/thread/osthread.d(1648): Error: cannot implicitly convert expression `regs[9]` of type `ulong` to `void*`

Any ideas what's going on?

@liushuyu liushuyu force-pushed the s390x branch 2 times, most recently from 8ccb0d2 to 496476a Compare January 9, 2025 16:28
@liushuyu
Copy link
Contributor Author

liushuyu commented Jan 9, 2025

I tried to build this on s390x on Fedora rawhide, using gdc as a bootstrap compiler and ran into the following error:

core/thread/osthread.d(1648): Error: cannot implicitly convert expression `regs[9]` of type `ulong` to `void*`
core/thread/osthread.d(1648): Error: cannot implicitly convert expression `regs[9]` of type `ulong` to `void*`
core/thread/osthread.d(1648): Error: cannot implicitly convert expression `regs[9]` of type `ulong` to `void*`
core/thread/osthread.d(1648): Error: cannot implicitly convert expression `regs[9]` of type `ulong` to `void*`

Any ideas what's going on?

This is now fixed.

@kalev
Copy link
Contributor

kalev commented Jan 9, 2025

Thanks! With this, I was able to build an initial version of ldc (using gdc), but when I try to build ldc with itself, it now fails with the following:

/usr/bin/ld: lib64/libldc.a(asmstmt.cpp.o)(.text+0x1d02): misaligned symbol `_ZN3Loc12messageStyleE' (0x1efc009) for relocation R_390_PC32DBL
/usr/bin/ld: final link failed
collect2: error: ld returned 1 exit status

Copy link

@uweigand uweigand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for implementing SystemZ support! I'm not familiar with D at all, but I tried to review this from a perspective of compliance with the system ABI. This looks good to me for the most part, see inline comments for some issues / questions.


struct StructSimpleFlattenRewrite : BaseBitcastABIRewrite {
LLType *type(Type *ty) override {
const size_t type_size = size(ty);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the C ABI, a struct containing just a single float or double member is passed like a plain float or double, i.e. possibly in a floating-point register. I don't see this being handled anywhere here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fixed now.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I forgot to mention one similar case, sorry: a struct containing just a single element of vector type (or recursively another such struct), is passed like a plain vector type (i.e. in vector registers).

}
if (t->ty == TY::Tint128 || t->ty == TY::Tcomplex80) {
return true;
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Vector types of size up to 16 should be passed in vector registers, but only when compiling for an architecture that supports vector registers in the first place (i.e. z13 and above). Older machines use another ABI where vector types are always passed via reference. Do you intend to support both ABIs, or do you plan to simply require z13 or later (either in general, or whenever vector types are used)? That may be a reasonable choice at this point, but I guess you should make sure that the machine type / features are set up accordingly for the LLVM back-end.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Vector types of size up to 16 should be passed in vector registers, but only when compiling for an architecture that supports vector registers in the first place (i.e. z13 and above). Older machines use another ABI where vector types are always passed via reference. Do you intend to support both ABIs, or do you plan to simply require z13 or later (either in general, or whenever vector types are used)? That may be a reasonable choice at this point, but I guess you should make sure that the machine type / features are set up accordingly for the LLVM back-end.

I don't think in D, and there is an easy way to construct TY::Tint128 (you can use core.int128.Cent, but that type is { i64, i64 } in D ABI). Even if we lower it to int128 and pass by reference, LLVM will still correctly pass it using vector registers (see https://godbolt.org/z/a8xEEfezz).
For TY::Tcomplex80, this lowers to fp128 and will automatically be handled by LLVM according to the -mcpu values passed to LLVM.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Vector types of size up to 16 should be passed in vector registers, but only when compiling for an architecture that supports vector registers in the first place (i.e. z13 and above). Older machines use another ABI where vector types are always passed via reference. Do you intend to support both ABIs, or do you plan to simply require z13 or later (either in general, or whenever vector types are used)? That may be a reasonable choice at this point, but I guess you should make sure that the machine type / features are set up accordingly for the LLVM back-end.

I don't think in D, and there is an easy way to construct TY::Tint128 (you can use core.int128.Cent, but that type is { i64, i64 } in D ABI). Even if we lower it to int128 and pass by reference, LLVM will still correctly pass it using vector registers (see https://godbolt.org/z/a8xEEfezz).

int128 is always passed via reference, also in your godbolt example (it uses vector registers temporarily to set up the value, but the actual argument is passed at 160(%r15), with %r2 pointing to that address.

For TY::Tcomplex80, this lowers to fp128 and will automatically be handled by LLVM according to the -mcpu values passed to LLVM.

fp128 is always passed in a pair of floating-point registers, no matter what -mcpu.

What I was refering to applies solely to actual vector types (of size up to 16). Those are passed via reference on pre-z13 machines, and in vector registers on z13 and later.

}
// "A struct or union of any other size, a complex type, an __int128, a long
// double, a _Decimal128, or a vector whose size exceeds 16 bytes"
if (size(t) > 16 || t->iscomplex() || t->isimaginary()) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused by the size <=8 check above vs. the size > 16 check here. What about structs with sizes in between the two? They should be passed by reference - I'm not sure if this is what this code does.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any struct objects that are between 8 to 16 bytes would be passed by reference (will be determined by DtoIsInMemoryOnly).

{
// Arg is passed in one register
alias T1 = U[0];
static if (is(T1 == double) || is(T1 == float))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, single-element float/double structs might need to be handled here as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now somewhat addressed in the compiler, where single-element float/double structs are rewritten to float/double in va_args before passing to this code.

{
TypeInfo arg1, arg2;
if (!ti.argTypes(arg1, arg2))
{

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I'm not familiar with how "runtime" va_arg works in D - is this when the type itself it not known at compile-time? I notice this function doesn't appear to handle floating-point types at all anywhere. Can those never occur here?

("sd $fp, %0") : "=m" (regs[9]);
("sd $ra, %0") : "=m" (sp);
("sd $fp, %0") : "=m" (regs[9]);
("sd $sp, %0") : "=m" (sp);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems unrelated to SystemZ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will remove this hunk when the pull request is cleaned up.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does appear to be correct. You can also submit as separate PR through the github web interface. Thnx.

.cfi_def_cfa_offset 384
/* store the (optional) backchain data */
stg %r1, 0(%r15)
aghi %r1, -64

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looks like you're accessing the stack below r15 here. This is unsafe on SystemZ - we do not have any "red zone", i.e. all memory below r15 may be overwritten at any time by a signal handler.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be fixed now

@kinke
Copy link
Member

kinke commented Jan 16, 2025

Wrt. SystemZ CI, we apparently could

  1. apply for a free VM from IBM: https://community.ibm.com/zsystems/form/l1cc-oss-vm-request/
  2. install a GitHub Actions runner on it: https://github.com/anup-kodlekere/gaplib
  3. target it in our GHA YAML (using GDC as host compiler for now), to see how bad the test failures are

@uweigand
Copy link

Wrt. SystemZ CI, we apparently could

1. apply for a free VM from IBM: https://community.ibm.com/zsystems/form/l1cc-oss-vm-request/

2. install a GitHub Actions runner on it: https://github.com/anup-kodlekere/gaplib

3. target it in our GHA YAML (using GDC as host compiler for now), to see how bad the test failures are

Yes, that would definitely be an option - that is the intended purpose of those VMs we make available.

As an alternative, some projects go the route of running SystemZ tests under qemu (see e.g. https://github.com/bytecodealliance/wasmtime/blob/main/.github/workflows/main.yml). For performance reasons, in this case it might be preferable to run the compiler natively (as cross-compiler) and only run the resulting test cases in qemu.

@liushuyu liushuyu force-pushed the s390x branch 2 times, most recently from ecea763 to c0cd3a2 Compare January 19, 2025 07:03
@JohanEngelen
Copy link
Member

Wrt. SystemZ CI, we apparently could

1. apply for a free VM from IBM: https://community.ibm.com/zsystems/form/l1cc-oss-vm-request/

2. install a GitHub Actions runner on it: https://github.com/anup-kodlekere/gaplib

3. target it in our GHA YAML (using GDC as host compiler for now), to see how bad the test failures are

Yes, that would definitely be an option - that is the intended purpose of those VMs we make available.

Is that a task you are willing to pick up? Thanks!

using namespace dmd;

struct SimpleHardfloatRewrite : ABIRewrite {
Type *getFirstFieldType(Type *ty) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the C ABI, this is applied recursively: a struct with a single element that is a struct with a single element of floating-point type is also passed like a float, etc.


struct StructSimpleFlattenRewrite : BaseBitcastABIRewrite {
LLType *type(Type *ty) override {
const size_t type_size = size(ty);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I forgot to mention one similar case, sorry: a struct containing just a single element of vector type (or recursively another such struct), is passed like a plain vector type (i.e. in vector registers).

}
if (t->ty == TY::Tint128 || t->ty == TY::Tcomplex80) {
return true;
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Vector types of size up to 16 should be passed in vector registers, but only when compiling for an architecture that supports vector registers in the first place (i.e. z13 and above). Older machines use another ABI where vector types are always passed via reference. Do you intend to support both ABIs, or do you plan to simply require z13 or later (either in general, or whenever vector types are used)? That may be a reasonable choice at this point, but I guess you should make sure that the machine type / features are set up accordingly for the LLVM back-end.

I don't think in D, and there is an easy way to construct TY::Tint128 (you can use core.int128.Cent, but that type is { i64, i64 } in D ABI). Even if we lower it to int128 and pass by reference, LLVM will still correctly pass it using vector registers (see https://godbolt.org/z/a8xEEfezz).

int128 is always passed via reference, also in your godbolt example (it uses vector registers temporarily to set up the value, but the actual argument is passed at 160(%r15), with %r2 pointing to that address.

For TY::Tcomplex80, this lowers to fp128 and will automatically be handled by LLVM according to the -mcpu values passed to LLVM.

fp128 is always passed in a pair of floating-point registers, no matter what -mcpu.

What I was refering to applies solely to actual vector types (of size up to 16). Those are passed via reference on pre-z13 machines, and in vector registers on z13 and later.

/* Save callee-saved floating point registers
s390x ABI has a very unique way for storing fp registers:
even-pairs first and odd-pairs last */
std %f8, 0(%r15)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This clobbers the backchain data. (Which would make backchain-based unwinding, e.g. for profiling, break if a sample hits while in this routine.) Normally, what you'd do is to allocate a new register-save area below the 64 bytes of temporary stack space, and fill in only the backchain in that area.


/* Save stack pointer, the stack pointer is adjusted so that
GC won't see the float point registers */
stg %r15, 0(%r2)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what exactly that comment means, but I don't see any adjustment to the stack pointer here?

@uweigand
Copy link

Wrt. SystemZ CI, we apparently could

1. apply for a free VM from IBM: https://community.ibm.com/zsystems/form/l1cc-oss-vm-request/

2. install a GitHub Actions runner on it: https://github.com/anup-kodlekere/gaplib

3. target it in our GHA YAML (using GDC as host compiler for now), to see how bad the test failures are

Yes, that would definitely be an option - that is the intended purpose of those VMs we make available.

Is that a task you are willing to pick up? Thanks!

I could certainly set up the machine and the Actions runner. For integration with your CI I'd probably need help from someone who understands its current setup ...

@kinke
Copy link
Member

kinke commented Jan 20, 2025

I could certainly set up the machine and the Actions runner.

Thanks, that'd be great!

For integration with your CI I'd probably need help from someone who understands its current setup ...

No worries, once we have a registered runner, we can take over. According to https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners/adding-self-hosted-runners, the registration requires a token from us, which is valid for one hour. So a bit of timely coordination will be required.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants