
Refactoring of the container image build process for simplification, local testing and broader contributions #132

Open
gbartolini opened this issue Jan 2, 2025 · 5 comments · May be fixed by #135
Labels
enhancement New feature or request

Comments

@gbartolini
Contributor

gbartolini commented Jan 2, 2025

Following @ardentperf's proposal in #126 to reorganize the container images, I have begun analyzing the current workflows to assess their feasibility. The primary objective is to align with the long-term vision of enabling CloudNativePG to operate with minimalistic images containing only PostgreSQL.

The current workflow reflects "scar tissue" development—residual patterns from an earlier phase. Initially, we adapted workflows developed at EDB for the closed-source operator and reused them with minimal adjustments when we open-sourced the project. This included adopting Docker Hub’s official PostgreSQL images. Over time, we made incremental changes as needed but never paused to reassess the overall structure comprehensively. This proposal aims to use this opportunity to introduce fundamental changes to how we build container images. Below is my initial proposal, intended to spark constructive discussions.

Distributions

  • Continue using Debian stable while maintaining oldstable.

Base Image

Transition from the official Docker Hub PostgreSQL image to images built on Debian Slim for Bookworm (12, stable) and Bullseye (11, oldstable). (Refer to Debian Releases.)
The primary reasons for this transition are:

  1. Official PostgreSQL images are designed to function outside Kubernetes as system containers, running as root (undesirable for us).
  2. They include an entry point we do not require.

Instead, we need the following:

  • Debian Slim as the base
  • PostgreSQL APT repositories for package management
  • PostgreSQL itself
  • Cleanup to minimize image size
  • Non-root user (user ID 26, consistent with the existing CNPG default from RH packages)
  • No entry point

We should include multi-lang packages as well, given the expected reduction in size.

Flavours

  1. Minimal: Contains only PostgreSQL. Maintained by the core maintainers.
  2. Standard: Builds on the minimal flavour, adding Barman Cloud (until required) and extensions like pgaudit, pg_failover_slots, and pgvector. We should only use APT packages and stop building Barman from sources. Maintained by the core maintainers.
  3. Full: Includes the minimal and standard components plus additional tools proposed in @ardentperf's proposal: minimal vs standard containers #126. Maintained by specific component owners (@ardentperf and ideally additional volunteers) under clear guidelines. This could also contain PostGIS.

Frequency

The original goal was a continuous delivery approach, rebuilding images only when underlying packages or the base image underwent significant changes. However, this intent was disrupted when we transitioned Barman Cloud from package-based installation to building from source using requirements.txt as a checksum. This led to unnecessary daily image regenerations.

Monthly or bi-weekly builds are more than enough, with on-demand builds as necessary (for example, in the case of a new release of PostgreSQL).
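To recover the original "rebuild only on meaningful change" intent, a scheduled job could compare a fingerprint of the image inputs against the previous build. Below is a minimal sketch, assuming the inputs are the base image digest plus a `{package: version}` manifest (which could be collected inside a built image with `dpkg-query -W -f='${Package}=${Version}\n'`). The function names are hypothetical, not part of any existing workflow.

```python
import hashlib
from typing import Dict, Optional


def build_fingerprint(base_image_digest: str, packages: Dict[str, str]) -> str:
    """Hash the base image digest plus a sorted {package: version} manifest.

    Sorting makes the result independent of the order in which packages
    were listed, so only real version changes alter the fingerprint.
    """
    lines = [f"base={base_image_digest}"]
    lines += [f"{name}={version}" for name, version in sorted(packages.items())]
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()


def needs_rebuild(previous_fingerprint: Optional[str],
                  base_image_digest: str,
                  packages: Dict[str, str]) -> bool:
    """Rebuild when there is no previous build or any input has changed."""
    if previous_fingerprint is None:
        return True
    return build_fingerprint(base_image_digest, packages) != previous_fingerprint
```

A monthly or bi-weekly schedule could then skip the push whenever `needs_rebuild` returns `False`, while on-demand builds (such as a new PostgreSQL release) bypass the check.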

Container Image Sequence Number

Previously, we increased a sequence number whenever a container image of a specific PostgreSQL version contained changes (e.g., postgresql:16.6-28-bookworm, where 28 represents the 28th build for PostgreSQL 16.6 on Bookworm).

This approach stemmed from limitations in the earlier operator (pre-CloudNativePG), which lacked support for image digests like <image>:<tag>@sha256:<digestValue>. As this limitation no longer exists, the sequence number is now redundant.
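When a reference carries both a tag and a digest, the digest is what identifies the content; the tag is kept only for readability. That is why a sequence number adds nothing once digests are usable. A small sketch of building and validating such a pinned reference follows; the helper name is hypothetical.

```python
import re

# A sha256 digest is the algorithm prefix followed by 64 lowercase hex characters.
_SHA256_DIGEST = re.compile(r"sha256:[0-9a-f]{64}")


def pinned_reference(image: str, tag: str, digest: str) -> str:
    """Return <image>:<tag>@<digest>, rejecting anything that is not a sha256 digest."""
    if not _SHA256_DIGEST.fullmatch(digest):
        raise ValueError(f"not a sha256 digest: {digest!r}")
    return f"{image}:{tag}@{digest}"
```

For example, `pinned_reference("ghcr.io/cloudnative-pg/postgresql", "16.6-28-bookworm", "sha256:" + "ab" * 32)` yields a reference whose tag could be renamed freely without changing which image is pulled (the digest here is a placeholder).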

Proposal: Replace Sequence Number with Timestamps

We can transition to using timestamps to denote builds, aligning better with modern image-management practices and avoiding unnecessary complexity. Precision could be minutes (or seconds) in the UTC timezone.
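The exact timestamp format is not specified here. One option, sketched below as an assumption, is a compact `YYYYMMDDHHMM` string in UTC: container tags cannot contain `:`, so an ISO 8601 time with colons cannot be used verbatim, and fixed-width zero-padded digits sort the same way lexicographically and chronologically.

```python
from datetime import datetime, timezone
from typing import Optional


def build_timestamp(now: Optional[datetime] = None) -> str:
    """Format a build time as YYYYMMDDHHMM in UTC (minute precision).

    Naive datetimes are rejected so the local timezone can never leak in.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    if now.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    return now.astimezone(timezone.utc).strftime("%Y%m%d%H%M")
```

Seconds precision would only need `%S` appended to the format string.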

Cognitive Load and Untestability of the Current Build Process

The current build process is overly complex and untestable locally. It relies on GitHub Actions to generate a convoluted matrix of combinations (which are actually predetermined) and to manage the sequence number. This complexity is unnecessary and can be replaced with a streamlined approach that can be run and tested locally.

We should aim for a single multi-stage Dockerfile that accepts PostgreSQL versions and Debian distributions as parameters. Additionally, we should explore open-source tools to assist with image building and related areas, such as generating SBOMs (Software Bill of Materials) and enriching OCI metadata, which have become more robust in recent years.
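Because the combinations are predetermined, the matrix can be a plain, locally runnable enumeration rather than something GitHub Actions computes. The sketch below assumes a multi-stage Dockerfile with hypothetical `PG_MAJOR` and `DISTRO` build arguments and one stage per flavour; the version lists are illustrative only. A `docker buildx bake` file could express the same matrix declaratively.

```python
from itertools import product
from typing import Dict, List

# Illustrative inputs only; in practice these would live in a versioned
# configuration file rather than being computed by the CI system.
POSTGRES_MAJORS = ["13", "14", "15", "16", "17"]
DISTROS = ["bookworm", "bullseye"]
FLAVOURS = ["minimal", "standard"]


def build_targets(majors: List[str], distros: List[str],
                  flavours: List[str]) -> List[Dict[str, str]]:
    """Enumerate every (major, distro, flavour) combination."""
    return [
        {"PG_MAJOR": major, "DISTRO": distro, "FLAVOUR": flavour,
         "name": f"{major}-{flavour}-{distro}"}
        for major, distro, flavour in product(majors, distros, flavours)
    ]


def docker_build_command(target: Dict[str, str], image: str = "postgresql") -> List[str]:
    """Translate one target into a docker build invocation.

    PG_MAJOR and DISTRO are hypothetical ARG names in the Dockerfile; the
    flavour selects a stage of the multi-stage build via --target.
    """
    return [
        "docker", "build",
        "--build-arg", f"PG_MAJOR={target['PG_MAJOR']}",
        "--build-arg", f"DISTRO={target['DISTRO']}",
        "--target", target["FLAVOUR"],
        "--tag", f"{image}:{target['name']}",
        ".",
    ]
```

Printing `" ".join(docker_build_command(t))` for each target gives a list of commands that can be run or inspected on a laptop before anything reaches CI.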

OCI Metadata

Currently, our images only include basic LABELs. We should adopt OCI annotations with the org.opencontainers.image prefix to better align with industry standards.
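A sketch of the kind of metadata this could mean, using keys from the OCI image annotations specification. Which keys to populate, and the helper names, are assumptions; the same keys can also be applied as manifest annotations by tools that support them.

```python
from typing import Dict, List


def oci_labels(pg_version: str, flavour: str, distro: str,
               created: str, revision: str, source: str) -> Dict[str, str]:
    """Build a set of org.opencontainers.image.* keys for one image.

    `created` must be an RFC 3339 date-time, as the OCI spec requires.
    """
    return {
        "org.opencontainers.image.title": f"PostgreSQL {pg_version} ({flavour})",
        "org.opencontainers.image.version": pg_version,
        "org.opencontainers.image.created": created,
        "org.opencontainers.image.revision": revision,
        "org.opencontainers.image.source": source,
        "org.opencontainers.image.base.name": f"docker.io/library/debian:{distro}-slim",
    }


def label_args(labels: Dict[str, str]) -> List[str]:
    """Render labels as repeated `--label key=value` arguments for docker build."""
    args: List[str] = []
    for key, value in sorted(labels.items()):
        args += ["--label", f"{key}={value}"]
    return args
```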

SBOM

This is an excellent opportunity to introduce SBOMs into the build process, ensuring transparency and compliance with modern security practices.

Image Naming Schema

Our current naming schema is:
postgresql:<POSTGRES_VERSION>-<SEQUENCE>-<DEBIAN_VERSION_NAME>.

We manage the following aliases:

  • latest: Points to the highest sequence number for the latest patch of the highest PostgreSQL major version (currently 17) on bullseye.
  • MAJOR_VERSION (e.g., 17): Points to the highest sequence number for the latest patch of a specific PostgreSQL major version on bullseye.
  • POSTGRES_VERSION (e.g., 17.2): Points to the highest sequence number for a specific PostgreSQL minor version on bullseye.
  • MAJOR_VERSION-DEBIAN_VERSION_NAME (e.g., 17-bullseye): Points to the latest patch for a specific PostgreSQL major version on a given Debian version.
  • POSTGRES_VERSION-DEBIAN_VERSION_NAME (e.g., 17.2-bullseye): Points to the highest sequence number for a specific PostgreSQL minor version on a given Debian version.

In practice, only the last two aliases are meaningful. The first three stem from "scar tissue" approaches dating back to when we had a single Debian version and used these containers to promote PostgreSQL in Kubernetes. These approaches are no longer necessary.

Proposed Schema

We propose a new naming schema:
postgresql:<POSTGRES_VERSION>-<FLAVOUR>-<DEBIAN_VERSION_NAME>-<TIMESTAMP_TO_MINUTE_IN_UTC>.

The <FLAVOUR> field could be one of:

  • minimal
  • standard
  • full

Under this schema, we eliminate aliases that omit critical details like flavour and Debian version, resulting in examples such as:

  • 17-minimal-bookworm: The latest minimal image on Bookworm with the most recent PostgreSQL 17 patch.
  • 17.2-minimal-bullseye: The latest minimal build of PostgreSQL 17.2 on Bullseye.
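For a single build, the schema yields one immutable tag and two moving aliases, all of which carry the flavour and the Debian codename. A sketch under the assumption that the timestamp is already formatted; the helper is hypothetical, and the check that `pg_version` is the newest minor of its major (required before moving the major alias) is left out.

```python
from typing import List

VALID_FLAVOURS = ("minimal", "standard", "full")


def image_tags(pg_version: str, flavour: str, distro: str, timestamp: str) -> List[str]:
    """Return the immutable tag plus the two moving aliases for one build.

    Every tag carries both the flavour and the Debian codename, so no alias
    can silently move a user to another distribution or flavour.
    """
    if flavour not in VALID_FLAVOURS:
        raise ValueError(f"unknown flavour: {flavour!r}")
    major = pg_version.split(".", 1)[0]
    return [
        f"{pg_version}-{flavour}-{distro}-{timestamp}",  # immutable, one per build
        f"{pg_version}-{flavour}-{distro}",  # moves with each rebuild of this minor
        f"{major}-{flavour}-{distro}",  # moves with each new minor of this major
    ]
```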

Image Catalogs

Currently, image catalogs for PostgreSQL containers are built and stored in the Git repository. Given the direction to separate operands from extensions, it makes sense to suspend changes to the catalogs for now. Eventually, these catalogs should be moved to a separate repository to streamline development and maintenance.

Gradual Deprecation of Existing Images

To ensure a seamless transition for end users, we should implement a gradual deprecation strategy for the current operand images. This will minimise disruption while encouraging the adoption of the new image schema.

The most noticeable change will be the removal of the latest alias, along with all other aliases that do not explicitly include:

  1. The name of the Debian distribution in the first stage.
  2. The flavour of the image.

Smoke Tests

We should incorporate smoke tests for each image built, ensuring they are tested with the latest stable version of CloudNativePG.
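A cheap pre-check could run before the CloudNativePG-based smoke test (not replace it): confirm that each freshly built image reports the expected PostgreSQL version. This is a sketch; `postgres_bin` defaulting to `postgres` assumes the binary is on `PATH`, whereas Debian packages install it under `/usr/lib/postgresql/<major>/bin`.

```python
import re
import subprocess
from typing import Tuple

# `postgres --version` prints e.g. "postgres (PostgreSQL) 17.2 (Debian 17.2-1.pgdg120+1)".
_VERSION_LINE = re.compile(r"postgres \(PostgreSQL\) (\d+)\.(\d+)")


def parse_postgres_version(output: str) -> Tuple[int, int]:
    """Extract (major, minor) from `postgres --version` output."""
    match = _VERSION_LINE.match(output.strip())
    if match is None:
        raise ValueError(f"unexpected version output: {output!r}")
    return int(match.group(1)), int(match.group(2))


def image_reports_version(image: str, expected: Tuple[int, int],
                          postgres_bin: str = "postgres") -> bool:
    """Run the binary inside the image and compare its reported version."""
    result = subprocess.run(
        ["docker", "run", "--rm", image, postgres_bin, "--version"],
        capture_output=True, text=True, check=True,
    )
    return parse_postgres_version(result.stdout) == expected
```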

Summary

This document is likely not exhaustive, but it aims to provide a solid foundation for further discussions and planning.

Suggested Next Steps

  1. Repository Structure: Decide whether to:

    • Create a new branch within this repository, or
    • Establish a new repository entirely (likely the better option, though naming it might prove challenging).
  2. Testing and Artifact Management: Begin testing the proposed changes and push resulting artifacts to the postgresql_testing repository for review and validation.

These steps should guide us toward a more efficient and flexible container image build process. Let’s continue iterating on this as a community!

@gbartolini
Contributor Author

gbartolini commented Jan 3, 2025

Ad Interim Proposal

To streamline the current workflow while maintaining backward compatibility, we propose reusing the existing Git repository and introducing a temporary system flavour. This flavour will serve as an interim solution and is intended for eventual deprecation.

The system flavour will be built from the official PostgreSQL image using the simplified approach outlined above. The naming convention for these images will follow this structure:

postgresql:<POSTGRES_VERSION>-system-<DEBIAN_VERSION_NAME>-<TIMESTAMP_TO_MINUTE_IN_UTC>

Key Features:

  • Backward Compatibility: Maintains support for existing container images
  • Unchanged Dockerfile Content: No significant modifications will be made to the current Dockerfile content (including Barman Cloud, which will continue to be built from sources).
  • Aliasing Support: Existing aliases, including latest, will be retained for continuity.

Proposed Timeline

Q1/2025

  • Transition to the simplified build process:
    • Introduce the ad interim system flavour with a clear path for deprecation.
    • Begin building new flavours: minimal, standard, and, when ready, full.

CloudNativePG 1.26 Release

  • User Recommendations: Encourage users who rely on Barman Cloud to transition to the new plugin API based on the standard container image (without Barman Cloud, which will be part of the plugin sidecar image).
  • Support Measures: Identify and implement strategies to facilitate the migration to the plugin API.

CloudNativePG 1.28 Release

  • Core Changes: Remove the in-core Barman Cloud support in favour of the plugin, and start using the standard image as the default image.
  • Deprecation Notice: Initiate the deprecation process for system flavour images.

3 Months Post-CloudNativePG 1.28

  • Discontinuation: Cease building system flavour images entirely and remove those aliases that don't include flavour and Debian version.

@gbartolini
Contributor Author

Regarding locales, we should seek advice from @ardentperf (refer to issue #123). The research on this topic conducted by him and Jeff Davis is both thorough and impressive. I strongly recommend leaning on his expertise and following his recommendations.

@gbartolini
Contributor Author

I am going to create sub tickets to organize the work.

@sxd
Member

sxd commented Jan 3, 2025

This will be a really big improvement to the container images!! What do you think about this, @NiccoloFei?

fcanovai added a commit that referenced this issue Jan 9, 2025
Build images without barman-cloud, to be used with backup plugins.

Closes #132
@fcanovai fcanovai linked a pull request Jan 9, 2025 that will close this issue
gbartolini added a commit that referenced this issue Jan 10, 2025
Signed-off-by: Gabriele Bartolini <[email protected]>
@ardentperf

ardentperf commented Jan 13, 2025

We should include multi-lang packages as well, given the expected reduction in size.

In the docs, I think that it would be good to include a note alongside this with the recommendation going forward to avoid libc linguistic collations in favor of either ICU or builtin/C. I can propose some doc update wording as a PR after things are merged. Continuing to provide libc linguistic collations (via the multi-lang packages) for compatibility with older applications, or while people transition to builtin and ICU, seems reasonable to me.

Proposed Schema

We propose a new naming schema: postgresql:<POSTGRES_VERSION>-<FLAVOUR>-<DEBIAN_VERSION_NAME>-<TIMESTAMP_TO_MINUTE_IN_UTC>.

...

Under this schema, we eliminate aliases that omit critical details like flavour and Debian version, resulting in examples such as:

...

The most noticeable change will be the removal of the latest alias, along with all other aliases that do not explicitly include:

Eliminating aliases that omit the Debian major version is a great improvement, because it will prevent corruption caused by people unknowingly getting an image from a different Debian major version when the maintainers change where the tag points. I think dropping those tags for production use is a good decision.

I do think we should add a little documentation about the reasoning (avoiding database corruption) because it does add cognitive load for developers. I'm happy to help draft something up for this after things are merged, along with proposed wording around multilang packs that I mentioned above.

I agree that removing the latest alias may be noticeable. If there's pushback on the idea of removing this, we might consider having it only for the postgresql-testing package/target (but not the production-ready postgresql package/target)?

Flavours

  1. Minimal: Contains only PostgreSQL. Maintained by the core maintainers.
  2. Standard: Builds on the minimal flavour, adding Barman Cloud (until required) and extensions like pgaudit, pg_failover_slots, and pgvector. We should only use APT packages and stop building Barman from sources. Maintained by the core maintainers.
  3. Full: Includes the minimal and standard components plus additional tools proposed in @ardentperf's proposal: minimal vs standard containers #126. Maintained by specific component owners (@ardentperf and ideally additional volunteers) under clear guidelines. This could contain also PostGIS.

and from #126 :

It is important though to clarify the supportability. It must be clear who is responsible to support users when a certain extension or combination of extensions is not working, or for example, updates, and so on. This could easily become an activity that takes a lot of time and IMHO it is not fair for a community to be expected to do that (my 2 cents).

I think this is the most important part of the whole proposal - specifically having crisp, clear and strategic definitions or tenets for flavours. Also agree that supportability is a very important dimension here.

Related: in #126 @gbartolini also said "Our long-term vision is for each extension to be distributed as a self-contained container image. CloudNativePG would then be able to rely on Kubernetes' VolumeMount functionality (currently in alpha) to mount immutable extension images as volumes and configure PostgreSQL to find them."

Proposed wording:

Minimal: A production-ready container meeting the bare minimum requirements to run PostgreSQL on CNPG. Requires CNPG-I for backup. This image intentionally has the minimum possible functionality that can be supported out-of-the-box for production CNPG use. It is also intended as a base container for user customization. Maintained by the core maintainers.

  1. I would propose that we treat pg-failover-slots as a hard requirement for CNPG, and include it in the minimal image.
  2. I'm hopeful LLVM can be removed from the Debian postgres package for pg18, similar to how it's been removed from the RPM postgres package. If that happens, then I propose that LLVM/JIT is not included in the minimal image.
  3. Eventually, automated full-stack CNPG test coverage including the minimal and standard images.
  4. Eventually, if extensions move to self-contained container images with VolumeMount functionality, the extensions in the standard image might be transitioned to self-contained images and this image might become the primary image for production use. But I think that's a little ways out and the direction might still evolve further.
  5. What's the status of CNPG-I for testing? I'd lean toward not including in-tree Barman here at all if CNPG-I is at a state where we can start testing this image with it now.

Proposed wording:

Standard: Builds on the minimal flavour, including in-tree Barman Cloud for backup and the standard extensions that are supported by CloudNativePG. This is compatible with both CNPG-I and legacy in-tree Barman backup. User experience is prioritized over minimal container size. Maintained by the core maintainers.

  1. Includes in-tree Barman, pgaudit, pgvector and maybe PostGIS. A one-stop shop for all extensions officially supported by CNPG maintainers. This keeps in-tree Barman available a little longer than in the minimal and custom images, for users who take longer to transition.
  2. Specifically on postgis: I realize that it increases size, but in my opinion this is a place where it's worth considering user experience over size. Every additional official image increases cognitive load for users (choosing one and looking at options). It also increases the matrix of images that need official test coverage and support. If CNPG wants to officially support postgis then I'd suggest just including it in the standard image. Note that major version upgrades of postgis can get complicated; it might complicate major version upgrades with CNPG to officially support this extension. I don't know how much usage the postgis CNPG container has - if it's not getting any serious usage then maybe we want to move it to the "custom/full" image below and give it volunteer-support status for now.
  3. If LLVM/JIT is removed from the base debian package, this image should explicitly include it.
  4. I don't think versioning of extensions has been covered yet. This is a complicated topic. I like the idea of tightly coupling extension versions to postgres minor versions, similar to AWS. Extension version changes are not always seamless and in some cases it requires running an upgrade SQL command (otherwise DB metadata is out-of-sync with library files). Users need to know when extension version changes will come, and they need to plan for this. By far, the simplest mechanism is to just pin extension versions on each postgres minor. It means that users might need to wait a few months for an extension update, but it tends to be worthwhile from an overall project perspective. It makes the user experience a lot easier. If there is a zero-day vulnerability or highly-critical bug, then an exception can be made to update the latest currently released minor version - along with making lots of noise to make sure users know the change is happening if any action is required on their part (like an alter extension update SQL execution). Users can still use a tag like 17-bookworm to always get the latest versions, but users on tags like 17.1-bookworm would get the stability they expect (no surprise updates underneath them). There also might be a few key OS packages that should be pinned as well, like ICU or LLVM? I'm not sure. The topic of change management across a full dependency tree gets complex. We do also have the daily tags in the new tagging scheme, so maybe those can be used for stability/control? Regardless, from a user perspective, it's very useful if they can find a set of "release notes" for a given version to see exactly what changes are included - throughout the entire container image. I'm not sure the best way to do that here.
  5. Agree about only using APT packages, extensions should be first packaged in the debian ecosystem if they are desired here.
  6. Will there be filesystem conflicts if the barman CNPG-I plugin is used with a container that also has in-tree barman cloud installed in the image? I briefly reviewed https://github.com/cloudnative-pg/plugin-barman-cloud but didn't see an answer in the README. Just want to make sure the transition will be smooth.
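The per-minor pinning idea in point 4 maps naturally onto a small pin table from which "release notes" and required SQL can be derived. A sketch with hypothetical data: the keys are SQL extension names (pgvector installs as `vector`) and the versions are placeholders, not real pins. Since extensions are per-database, the generated statements would need to run in every database where the extension is installed.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical pin table: PostgreSQL minor release -> {SQL extension name: version}.
# Versions are placeholders for illustration only.
PINS: Dict[str, Dict[str, str]] = {
    "17.1": {"pgaudit": "1.0", "vector": "1.0"},
    "17.2": {"pgaudit": "1.0", "vector": "1.1"},
}

Change = Tuple[Optional[str], Optional[str]]


def extension_changes(pins: Dict[str, Dict[str, str]],
                      from_minor: str, to_minor: str) -> Dict[str, Change]:
    """List extensions whose pinned version differs between two minors.

    None on either side means the extension was added or removed.
    """
    old, new = pins[from_minor], pins[to_minor]
    changes: Dict[str, Change] = {}
    for name in sorted(old.keys() | new.keys()):
        before, after = old.get(name), new.get(name)
        if before != after:
            changes[name] = (before, after)
    return changes


def upgrade_statements(changes: Dict[str, Change]) -> List[str]:
    """SQL needed for extensions that were upgraded in place (not added/removed)."""
    return [
        f"ALTER EXTENSION {name} UPDATE;"
        for name, (before, after) in changes.items()
        if before is not None and after is not None
    ]
```

With the placeholder table, moving from 17.1 to 17.2 reports only `vector` as changed and produces `ALTER EXTENSION vector UPDATE;`.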

Proposed wording:

Custom: A large full image with widely expanded postgres functionality out-of-the-box. Requires CNPG-I for backup. This also serves as an example of creating a customized image downstream from the minimal image. Volunteer-driven with best-effort support. Not maintained by the core maintainers.

  1. APT packages only; packages must be available in the Debian ecosystem. Anything in PGDG can be automatically included (assuming no issues), other repositories can be submitted for discussion but we will want to talk about whether contributors can directly support the extension in PGDG repos before accepting outside repos.
  2. An initial goal is to add all extensions that are commonly supported by other major postgres providers.
  3. Follows same versioning guidelines outlined above.
  4. Not guaranteed to have the same level of automated full-stack CNPG test coverage that the officially supported images have.
  5. Same as minimal - I'd lean toward not including in-tree Barman here at all if CNPG-I is at a state where we can start testing this image with it now.
  6. I was thinking it could also maybe serve as a "reference implementation" of sorts for customizing on top of the minimal image and provide step-by-step instructions for people to do their own customizations choosing only the extensions they want.

That last one is the most interesting bit to me and I'm curious what others think. How about a separate simple repo that just creates this custom (aka full) image downstream from minimal? Then if a user wants to create their own custom image, they can basically just fork this repo and remove the lines for the APT packages they don't want. I think with the new docker-bake build system, this could be a really simple repo. Also, I switched the name to custom so that when users fork the repo to make their own images, they will inherit the name "custom" in their own tags.

gbartolini pushed a commit that referenced this issue Jan 13, 2025
Build images without barman-cloud, to be used with backup plugins.

Closes #132

Signed-off-by: Francesco Canovai <[email protected]>
gbartolini added a commit that referenced this issue Jan 13, 2025
Signed-off-by: Gabriele Bartolini <[email protected]>

3 participants