Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

build Fedora+OpenSUSE arm64 container images #180

Merged
merged 4 commits into from
Nov 27, 2024

Conversation

abachmann
Copy link
Contributor

@abachmann abachmann commented Sep 13, 2024

Extends the GA workflow to build and push Fedora and OpenSUSE arm64 container images. The build-image script has been modified to use Docker Buildx for cross-building.

Preparation for: #179
Fixes: #155

@abachmann abachmann force-pushed the build-arm64-images branch 2 times, most recently from 9343f64 to dad4ff9 Compare September 13, 2024 19:47
@abachmann
Copy link
Contributor Author

@phlogistonjohn do you have time to look into this topic? It would be nice to see if the pipeline is running with my changes.

@phlogistonjohn
Copy link
Collaborator

I am currently traveling and dont have a lot of time too look, but I will allow the CI jobs to run!

Copy link
Collaborator

@obnoxxx obnoxxx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for your contribution!

This looks like a good addition.

Siome CI checks have failed, please fix.
Especially the check-commits test. (no commit message body provided).

@abachmann abachmann force-pushed the build-arm64-images branch 2 times, most recently from 7795ca5 to bd8e932 Compare September 24, 2024 19:58
@abachmann
Copy link
Contributor Author

abachmann commented Sep 24, 2024

@obnoxxx Thanks for the review. I have added a message body to all commits.

I found a potential reason for the pipeline failure: bd8e932

If it's okay to remove the arch parameter, I can squash the following commits:

@abachmann
Copy link
Contributor Author

abachmann commented Sep 25, 2024

Things are getting a bit more interesting now.
49.85 - Status code: 404 for https://mirrors.centos.org/metalink?repo=centos-resilientstorage-9-stream&arch=aarch64&protocol=https,http (IP: 8.43.85.73)

The centos-resilientstorage-9-stream repository is only available for the following architectures:

  • ppc64le
  • s390x
  • x86_64

It’s also possible to set the arch parameter to source. This will increase the build time, but we could add a check: if the target architecture is not supported, then fall back to source. This way, the amd64 builds won’t be affected.

I checked the list of available repositories and their architecture support. A lot of them support aarch64, but not the resilientstorage repositories. So, if we start building the packages from source, we might run into issues.

Before I proceed, I’d like to get your thoughts on this. Maybe you're already aware of the issue and have some solutions in mind.

Copy link
Collaborator

@obnoxxx obnoxxx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding commit message bodies, @abachmann !

If you want to squash the two commits that change the workflows, go ahead. I don't have a strong opinion on that part as long as the change that modifies the script remains a separate commit.

I didn't quite understand yet what makes it necessary to remove the explicit invocation of the arch from the workflows. I do prefer we keep testing arm builds in the ci.

some ci checks are still failing. please address.

@obnoxxx
Copy link
Collaborator

obnoxxx commented Oct 14, 2024

@abachmann, any updates here?

We'd really like to take the changes, but please address the requests.

@abachmann
Copy link
Contributor Author

abachmann commented Oct 14, 2024

@obnoxxx, sorry for the late response. I was busy with other things.

I didn't quite understand yet what makes it necessary to remove the explicit invocation of the arch from the workflows. I do prefer we keep testing arm builds in the ci.

Yeah, I initially thought the explicit invocation of the arch was causing the pipeline issue, but it turned out that wasn’t the case. So, I reverted the commit.

As I wrote in my last comment:

Things are getting a bit more interesting now.
49.85 - Status code: 404 for https://mirrors.centos.org/metalink?repo=centos-resilientstorage-9-stream&arch=aarch64&protocol=https,http (IP: 8.43.85.73)

The centos-resilientstorage-9-stream repository is only available for the following architectures:

  • ppc64le
  • s390x
  • x86_64

Addressing the pipeline issue isn't so easy. I'll try to find some time over the weekend to fix it.

@obnoxxx
Copy link
Collaborator

obnoxxx commented Oct 15, 2024

@abachmann wrote:

@obnoxxx, sorry for the late response. I was busy with other things.

sure. Thanks for following up! And sorry for nagging so badly!

I didn't quite understand yet what makes it necessary to remove the explicit invocation of the arch from the workflows. I do prefer we keep testing arm builds in the ci.

Yeah, I initially thought the explicit invocation of the arch was causing the pipeline issue, but it turned out that wasn’t the case. So, I reverted the commit.

Thanks for explaining. This makes more sense now.

As I wrote in my last comment:

Things are getting a bit more interesting now.
49.85 - Status code: 404 for https://mirrors.centos.org/metalink?repo=centos-resilientstorage-9-stream&arch=aarch64&protocol=https,http (IP: 8.43.85.73)

right. The centos repo mirrors do sometimes have issues with availability.

The centos-resilientstorage-9-stream repository is only available for the following architectures:

  • ppc64le
  • s390x
  • x86_64

Interesting.

Addressing the pipeline issue isn't so easy. I'll try to find some time over the weekend to fix it.

Thanks! Looking forward to seeing a solution.

@abachmann
Copy link
Contributor Author

abachmann commented Oct 22, 2024

@obnoxxx

sure. Thanks for following up! And sorry for nagging so badly!

This topic was completely out of mind, so nagging was actually a good idea ;)

Over the weekend, I had some time to look more deeply into the issue where the resilientstorage repository was not found during the image build. As I mentioned before, no aarch64 packages are provided in the official repositories. So, I tried to build the packages from source. To do this, I modified the install-packeges.sh file as follows:

-     dnf_cmd+=(--enablerepo=crb --enablerepo=resilientstorage)
+     dnf_cmd+=(--enablerepo=crb --enablerepo=resilientstorage-source)

This resolved the 404 error, but I ran into another issue afterward: the ctdb package, which is provided by resilientstorage repository, was not installed. So, building the packages from source didn’t work as expected. Unfortunately, I wasn’t able to find another repository that provides the necessary aarch64 package.

If you know a repository which provides the missing package, please let me know. Otherwise, I would suggest to skip the centos arch64 images for for now, because all other OSes seem to work. I'll double-check it this week.

@phlogistonjohn
Copy link
Collaborator

I've had very little time to follow up with you on this but I did want to pop in and say that the idea to start with arm64 on the fedora based images alone would be fine - in fact, highly recommended.

FWIW, while I'm thinking about this topic, I would envision that the github actions build matrix have a task to build the arm64 images as an explicit include, for only fedora with default package source, similar to how we extend the build matrix for the devbuilds image:

include:
- package_source: devbuilds
os: centos
arch: amd64

One reason is that is is clean (IMO) and another is that I'm 90% sure the nightly builds don't have arm64 versions either. We'd need to work with @anoopcs9 and make changes in the https://github.com/samba-in-kubernetes/samba-build repo before we could consume those nightly samba builds in the images.

@nerijus
Copy link

nerijus commented Oct 22, 2024

I wasn’t able to find another repository that provides the necessary arch64 package.

Both AlmaLinux and Rocky Linux have ctdb aarch64 packages - https://pkgs.org/search/?q=ctdb

@abachmann abachmann force-pushed the build-arm64-images branch 2 times, most recently from dee4e69 to f8638fb Compare October 22, 2024 18:53
@abachmann abachmann requested a review from obnoxxx October 22, 2024 18:58
Copy link
Collaborator

@obnoxxx obnoxxx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @phlogistonjohn in that what is still missing here is making the arch one dimension of the workflow matrix
here:

include:
- package_source: devbuilds
os: centos
arch: amd64

Additionally, we can limit to Fedora for now and instead of waiting for arm packages from the distos, we should probably add arm builds to our own here: https://github.com/samba-in-kubernetes/samba-build

Finally, do we need to use docker for the container build, or can we use podman instead?

@abachmann abachmann force-pushed the build-arm64-images branch 2 times, most recently from 8030547 to edee756 Compare October 23, 2024 17:35
- name: Fetch server default-fedora-arm64
uses: ishworkh/[email protected]
with:
image: "samba-server:default-fedora-arm64"
container_engine: ${{ env.CONTAINER_CMD }}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to recommend skipping the tests for now. Let's just get the builds on emulated arm64 in this initial PR.

hack/build-image Outdated
@@ -125,6 +125,7 @@ QUAL_FQIN = 'fqin'


_DISCOVERED_CONTAINER_ENGINES = []
_BUILDX_BUILDER_NAME = "samba-in-kubernetes"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain what this does? I'm not too familiar with buildx.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When installing buildx, a default builder is created that can only build images for the host platform, which in my case is amd64.

The docker buildx ls command can be used to list all available builders:

NAME/NODE               DRIVER/ENDPOINT             STATUS  BUILDKIT             PLATFORMS
default *               docker                                                   
  default               default                     running v0.11.7+d3e6c1360f6e linux/amd64, linux/amd64/v2, linux/amd64/v3, linux/386

To build images for platforms other than the host, a new builder must be created that supports other architectures. Each builder requires a unique name, so I used the GitHub group name as the builder name, and this variable holds its value.

After creating the new builder, samba-in-kubernetes, the output of docker buildx ls looks like this:

NAME/NODE               DRIVER/ENDPOINT             STATUS  BUILDKIT             PLATFORMS
samba-in-kubernetes     docker-container                                         
  samba-in-kubernetes0  unix:///var/run/docker.sock running v0.16.0              linux/arm64*, linux/amd64*, linux/arm/v7*

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I understand the buildx workflow a bit better now, thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the global variable and used target.flat_name() as the builder name instead.

hack/build-image Outdated Show resolved Hide resolved
@abachmann
Copy link
Contributor Author

Finally, do we need to use docker for the container build, or can we use podman instead?

@obnoxxx, I believe building the images with podman should work as well, but I have not tested it.

Finally, the pipeline built the following Fedora arm64 images successfully:

  • server
  • ad-server
  • client

@phlogistonjohn, what about the toolbox image? This image is only used for testing, so we can skip it, right? If we need to build it, then we have an issue because the Dockerfile is using the latest tag, which points to the Fedora amd64 image.

@abachmann abachmann changed the title build/push arm64 container images build Fedora arm64 container images Oct 23, 2024
hack/build-image Outdated Show resolved Hide resolved
hack/build-image Outdated
@@ -125,6 +125,7 @@ QUAL_FQIN = 'fqin'


_DISCOVERED_CONTAINER_ENGINES = []
_BUILDX_BUILDER_NAME = "samba-in-kubernetes"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I understand the buildx workflow a bit better now, thanks!

hack/build-image Outdated
# Therefore, we need to create a new builder to support other
# architectures. Errors are suppressed to prevent issues when
# the builder is already available - this can be improved later.
run(cli, [eng, "buildx", "create", f"--name={_BUILDX_BUILDER_NAME}"], check=False)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be a bit clearer if we moved this line and the corresponding comment above the assignment to args = (currently on line 213). That way a reader immediately sees the command setting up the new builder

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to remove/clean up this builder? I won't require it for the PR but for people using the script outside the CI environment (and we exist) I wouldn't not like to leave stuff behind. Similarly, is there an issue if this command is executed >1 time on the same host?

Copy link
Contributor Author

@abachmann abachmann Oct 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we need to clean up the builder, which is one reason I added the comment 'this can be improved later'. Executing this command more than once should not result in an error due to check=False. However, this approach is not ideal since it also suppresses other potentially important errors.

I’d like to address these topics in this PR. I’ll try to find some time this evening.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added two new commits to clean up the container_build method a little bit and to address the open topics.

When starting a new container build, the builder, which might exist due to a failure, is removed. This way, we can ensure that we always start with a fresh builder and have no leftovers from a previous build. Afterwards, a new clean builder is created. After the build process is complete, the created builder will be removed.

@phlogistonjohn
Copy link
Collaborator

@phlogistonjohn, what about the toolbox image? This image is only used for testing, so we can skip it, right? If we need to build it, then we have an issue because the Dockerfile is using the latest tag, which points to the Fedora amd64 image.

I'm OK with skipping it for now. I'm in favor of moving forward in baby steps for this work :-)

@obnoxxx
Copy link
Collaborator

obnoxxx commented Oct 24, 2024

I created a request to add arm builds to samba-build: samba-in-kubernetes/samba-build#63

Once we have that, using those RPMs for Arm containers should not be a problem

@obnoxxx
Copy link
Collaborator

obnoxxx commented Oct 30, 2024

I created a request to add arm builds to samba-build: samba-in-kubernetes/samba-build#63

Once we have that, using those RPMs for Arm containers should not be a problem

@abachmann , FYI:

RPM builds for ARM have now been implemented in the samba-build repo and er have first nightly builds from master and release 4.21.

https://artifacts.ci.centos.org/samba/pkgs/master/fedora/samba-nightly-master.repo
https://artifacts.ci.centos.org/samba/pkgs/4.21/fedora/samba-nightly-4.21.repo

https://artifacts.ci.centos.org/samba/pkgs/master/centos/samba-nightly-master.repo
https://artifacts.ci.centos.org/samba/pkgs/4.21/centos/samba-nightly-4.21.repo

This samba-container project already has mechanisms to use those custom yum repos.

@phlogistonjohn
Copy link
Collaborator

Because of the work @anoopcs9 did on those builds I learned that centos ci has real arm builders - I was not aware of this before. My suggestion is to drive this PR to completion but once it is done we should rethink how our "nightly" images are built. It may make way more sense to switch to the centos ci infrastructure. We can discuss the topic in more detail later but I don't think building up everything using github actions is ideal path forward when we have access to actual arm builders.

@abachmann
Copy link
Contributor Author

@obnoxxx, thanks for the info. I’ll try to find some time over the weekend to take a closer look.

@abachmann
Copy link
Contributor Author

@phlogistonjohn, I addressed the review concerns in a separate commit to ease the review process. If you're okay with these changes, I'd like to combine the commit hack: use buildx to cross-build images with docker and hack: rework container_build function into a single commit, as I don't see any benefit to keeping them separate. I also want to ensure we're all on the same page and fine with the current changes before I start working on the CentOS topic.

@abachmann
Copy link
Contributor Author

@phlogistonjohn, @obnoxxx, I have a question: what about OpenSUSE images? I see that images are built for this OS, but they aren’t pushed. I was able to build OpenSUSE images locally for arm64 without any issues. What’s the plan for OpenSUSE?

@phlogistonjohn
Copy link
Collaborator

I'm in favor of starting small. Let's handle the fedora images and then move on to opensuse later. If they're easy enough to handle, I'm in favor of doing it.

Go ahead and squash commits as you see fit.

@abachmann
Copy link
Contributor Author

abachmann commented Nov 21, 2024

I extended the GH workflow to build Fedora nightly images as well. For testing, I added a separate commit for the opensuse images to check if they are built in the pipeline. Should I create a new PR for the opensuse changes?

@phlogistonjohn
Copy link
Collaborator

@Mergifyio rebase

Copy link

mergify bot commented Nov 26, 2024

rebase

✅ Branch has been successfully rebased

@phlogistonjohn
Copy link
Collaborator

I have approved this PR and am prepared to merge it. I do want to let you know that @anoopcs9 and I are looking into switching the process we use to build the images that are pushed to quay.io to use centos CI because this platform has native arm64 machines. I don't want to invest too much more effort in doing cross-arch builds in github CI. I do want to continue doing the PR tests in github CI because of the nice feedback it provides.

See samba-in-kubernetes/samba-centosci#73 for this initial effort.

@abachmann abachmann requested a review from obnoxxx November 26, 2024 15:37
@phlogistonjohn
Copy link
Collaborator

mergify rules require a 2nd approval (since this is coming from a contributor and not a maintainer) - @obnoxxx @anoopcs9 @gd - thanks!
I'd like not to have to override mergify :-)

@abachmann abachmann changed the title build Fedora arm64 container images build Fedora+OpenSUSE arm64 container images Nov 27, 2024
Copy link
Collaborator

@obnoxxx obnoxxx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM now.
Thanks a lot for your good contribution and patience with our review requests, @abachmann !

@phlogistonjohn
Copy link
Collaborator

@Mergifyio rebase

Add a separate function to create common container engine arguments
to reduce complexity in container_build function.

Signed-off-by: Alexander Bachmann <[email protected]>
When Docker is used and the target architecture differs from the host
architecture, buildx is used for cross-building.

Depending on the container engine and the target architecture,
multiple commands need to be executed to prepare the environment,
build the image, and clean up artifacts afterward.

To streamline this process, a tasks array (an array of lambdas) has
been introduced, its elements are executed sequencally at the end
of the container_build function.

Signed-off-by: Alexander Bachmann <[email protected]>
Extend the build matrix to include fedora arm64 images for samba-server,
samba-ad-server and samba-client.

Signed-off-by: Alexander Bachmann <[email protected]>
Extend the build matrix to include opensuse arm64 images for samba-server,
samba-ad-server and samba-client.

Signed-off-by: Alexander Bachmann <[email protected]>
Copy link

mergify bot commented Nov 27, 2024

rebase

✅ Branch has been successfully rebased

@phlogistonjohn
Copy link
Collaborator

phlogistonjohn commented Nov 27, 2024

Totally bizarre - I swear the comment I wrote was on a ceph RP I had open in a completely different tab.

@mergify mergify bot merged commit a7a3866 into samba-in-kubernetes:master Nov 27, 2024
41 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add AARCH64 Support
4 participants