
# Helm chart redesign for Quarkus-based runtimes #626

Open · wants to merge 5 commits into main from quarkus-helm
## Conversation

@adutra (Contributor) commented Jan 8, 2025

This is a major redesign of the Helm chart to make it work with the Quarkus-based runtime, while also improving many aspects of the existing chart, such as persistence secret management and the bootstrap jobs, to name a few.

## Summary of changes

### Persistence

Thanks to the improvement brought by #613, it is no longer necessary for an init container to create a conf.jar. From now on, the user provides a secret containing their persistence.xml, and that file is mounted directly on each Polaris pod.
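
For illustration, a minimal sketch of such a secret, assuming the chart is installed in the `polaris` namespace and that the secret name and key match whatever the user references in the chart values:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: polaris-persistence   # hypothetical name, referenced from the chart values
  namespace: polaris
type: Opaque
stringData:
  # The key must match the file name that gets mounted on the Polaris pods.
  persistence.xml: |
    <!-- EclipseLink persistence unit: JDBC URL, user, password, etc. -->
```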

### Enhanced services

The Polaris services are now separated into three categories:

  1. Main service: the service that serves API requests; it is usually exposed behind an Ingress or LoadBalancer.
  2. Management service: the service for metrics and health checks; this is usually a headless ClusterIP service.
  3. Extra services: if any of the services needs to be exposed with different settings to different consumers, this section can be used.

Each service section has the same structure and allows configuring one or more ports.
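
For illustration, a sketch of what these sections could look like, assuming list-style port definitions with `name`/`port` keys as used in the templates discussed below; the `extraServices` key name and the port values are assumptions:

```yaml
service:
  ports:
    - name: polaris-http      # API traffic, typically behind an Ingress or LoadBalancer
      port: 8181
managementService:
  ports:
    - name: polaris-mgmt      # metrics and health, typically headless ClusterIP
      port: 8182
extraServices: []             # optional, for exposing the same ports with different settings
```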

### Observability

New sections were added for logging, tracing and metrics, along with the ability to create a ServiceMonitor.
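
A sketch of what this could look like in the values file; the key names below are assumptions, only the existence of logging, tracing, metrics and ServiceMonitor support is stated above:

```yaml
logging:
  level: INFO                 # illustrative
tracing:
  enabled: false              # illustrative
metrics:
  enabled: true               # illustrative
  serviceMonitor:
    enabled: true             # would create a Prometheus ServiceMonitor for the management service
```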

### Bootstrap Job

The bootstrap section now configures the bootstrap job. Thanks to the Polaris Admin Tool introduced in #605, the job now uses that tool to bootstrap realms.

Personally, I am not convinced that this bootstrap job has huge value compared to just running the admin tool directly, but I didn't want to remove it.
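
For reference, a sketch of the bootstrap values; `bootstrap.enabled` and the `realm,clientId,clientSecret` credentials format are mentioned elsewhere in this PR, while the concrete values shown are placeholders:

```yaml
bootstrap:
  # Set to false to skip the job, e.g. when the realms were already bootstrapped.
  enabled: true
  # Each entry is of the form: realm,clientId,clientSecret (placeholders below).
  credentials:
    - "my-realm,root,s3cr3t"
```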

### Advanced Config

A new advancedConfig section can be used to customize Polaris deployments in any possible way, even when the Helm chart does not expose the desired setting.
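
For example (a sketch assuming `advancedConfig` takes a map of property names to values; `quarkus.log.level` is just an arbitrary Quarkus setting used for illustration):

```yaml
advancedConfig:
  # Any Quarkus/Polaris property not otherwise exposed by the chart.
  quarkus.log.level: DEBUG
```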

@adutra force-pushed the quarkus-helm branch 4 times, most recently from 09657d1 to f66d623 on January 8, 2025 at 17:21
@adutra force-pushed the quarkus-helm branch 13 times, most recently from 0ac047b to 434a847 on January 13, 2025 at 21:36
@adutra force-pushed the quarkus-helm branch 3 times, most recently from 68cbdfc to acee0c9 on January 15, 2025 at 09:50
@adutra changed the title from "[WIP] Helm chart redesign for Quarkus-based runtimes" to "Helm chart redesign for Quarkus-based runtimes" on Jan 24, 2025
### Uninstalling the chart

```bash
helm uninstall --namespace polaris polaris
```

## Values
adutra (author):

Note: this is automatically generated by helm-docs.

@adutra marked this pull request as ready for review on January 24, 2025 at 16:51
@adutra commented Jan 24, 2025

This is now ready for review.

@MonkeyCanCode I would love to get your review on this please :-)

I know we could add more unit tests, but I wanted to get your general feeling first. We can complete test coverage little by little imho.

@MonkeyCanCode (Contributor) commented:

Great. Nice work. I will review those later this weekend.

@adutra force-pushed the quarkus-helm branch 4 times, most recently from 67bf183 to 69c1265 on January 24, 2025 at 18:38
# -- The root credentials to create during the bootstrap. If you don't provide credentials for the
# root principal of each realm to bootstrap, random credentials will be generated.
# Each entry in the array must be of the form: realm,clientId,clientSecret
credentials: []
adutra (author):

This is admittedly not great. I opened #878 to improve this and use secrets instead. But for now we need to stick with credentials in clear text.

{{- if has $portNumber (values $ports) -}}
{{- fail (printf "service.ports[%d]: port number already taken: %v" $i $portNumber) -}}
{{- end -}}
{{- $_ := set $ports $port.name $portNumber }}
Reviewer (Contributor):

nit: lines 261-264 and 276-278 may want two more spaces each to fix the indentation (still valid YAML).

args: ['{{ include "polaris.fullname" . }}:{{ index .Values.service.ports "polaris-metrics" }}/q/health']
args:
- --spider
- '{{ include "polaris.fullnameWithSuffix" (list . "mgmt") }}:{{ get (first .Values.managementService.ports) "port" }}/q/health/ready'
restartPolicy: Never
Reviewer (Contributor):

Seems like the pod now takes some time to start up. From my local testing, this will always fail. Should we consider adding an init container that sleeps for X seconds before running the connection test? Here is a sample change:

  initContainers:
    - name: sleep
      image: busybox
      command: ['sleep', '30']

adutra (author):

Hmm, it wasn't failing for me; the test pod always comes up after the Polaris pod reaches the ready state.

I don't mind changing it, but waiting 30 seconds seems like a lot. How about we poll the health endpoint X times until it succeeds?

Reviewer (Contributor):

Yes, that would be good. On my setup it was somehow taking 6-7 seconds, so adding some retry logic would be good.
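
A minimal sketch of such a retry in the test-connection pod, assuming a busybox image and reusing the management readiness URL from the snippet above; the retry count and sleep interval are arbitrary:

```yaml
containers:
  - name: test-connection
    image: busybox
    command: ['sh', '-c']
    args:
      - |
        # Poll the readiness endpoint up to 30 times, 2 seconds apart,
        # instead of failing on the first unsuccessful probe.
        for i in $(seq 1 30); do
          wget --spider '{{ include "polaris.fullnameWithSuffix" (list . "mgmt") }}:{{ get (first .Values.managementService.ports) "port" }}/q/health/ready' && exit 0
          sleep 2
        done
        exit 1
restartPolicy: Never
```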

name: {{ tpl .Values.authentication.tokenBroker.secret.name . }}
items:
{{- if eq .Values.authentication.tokenBroker.type "rsa-key-pair" }}
- key: {{ tpl .Values.authentication.tokenBroker.secret.publicKey . }}
Reviewer (Contributor):

nit: lines 173-175 and 179-180 may want two more spaces each to fix the indentation (still valid YAML).

@@ -99,7 +99,7 @@ jobs:
if: steps.list-changed.outputs.changed == 'true'
run: |
eval $(minikube -p minikube docker-env)
./gradlew :polaris-quarkus-server:assemble \
./gradlew :polaris-quarkus-server:assemble :polaris-quarkus-admin:assemble \
-Dquarkus.container-image.build=true \
-PeclipseLinkDeps=com.h2database:h2:2.3.232
Reviewer (Contributor):

While testing the Helm chart, I noticed that the bootstrap pod shows the bootstrap as completed. However, when trying to use the Polaris CLI against the server, it says the instance is not yet bootstrapped. Not sure if this is a known issue?

adutra (author):

Are you using in-memory persistence? Because with in-memory it won't work indeed. The bootstrap job can only be used with EclipseLink.

adutra (author):

Also, H2 will not work either because it's in-memory.

adutra (author):

@MonkeyCanCode I think we should switch our tests to use Postgres, wdyt? H2 won't give us a "real" experience and bootstrapping will be broken.

I just tested with postgres and the bootstrap job worked and I was able to connect to Polaris after.

The only annoying thing is that we will need support for purge as well. I will try to add that too.

Reviewer (Contributor):

Yes, I was using this branch for testing, and the bootstrap is probably hitting in-memory persistence for H2, I think? I also agree we should use the kind of real database people may actually be using (e.g. Postgres) instead of H2 (it kind of leaves people thinking this project is not close to ready to use).

The latest main branch seems to be pretty stable (I only did some basic testing... planning to do more later this weekend). So maybe it's good to introduce those changes now?

adutra (author):

> the bootstrap is probably hitting in-memory persistence for H2, I think?

Yup it was using H2 in-memory.

> So maybe it's good to introduce those changes now?

I just pushed a commit that switches our tests to postgres. It's working well in CI.

When doing local testing, though, if you install the chart more than once, you need to either kill the Postgres pod or pass --set bootstrap.enabled=false, to avoid an error when bootstrapping a realm that was already bootstrapped previously.

env:
{{- if .Values.storage.secret.name }}
Reviewer (Contributor):

Maybe use a helper function for secret to env?
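
A rough sketch of what such a helper could look like; the name `polaris.secretToEnv` and its argument list are hypothetical, not something that exists in the chart:

```yaml
{{/*
Hypothetical helper: renders an env entry sourced from a secret key.
Expects a list of (env var name, secret name, secret key).
*/}}
{{- define "polaris.secretToEnv" -}}
{{- $envName := index . 0 -}}
{{- $secretName := index . 1 -}}
{{- $secretKey := index . 2 -}}
- name: {{ $envName }}
  valueFrom:
    secretKeyRef:
      name: {{ $secretName }}
      key: {{ $secretKey }}
{{- end -}}
```

The deployment template could then call it with something like `{{ include "polaris.secretToEnv" (list "AWS_ACCESS_KEY_ID" .Values.storage.secret.name .Values.storage.secret.awsAccessKeyId) }}`, where `awsAccessKeyId` is a hypothetical values key.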

If necessary, load the Docker images into Minikube:

```bash
eval $(minikube -p minikube docker-env)
```
Reviewer (Contributor):

nit: I use kind a lot more than minikube (both are great... kind is very lightweight and super fast). I am okay with switching to minikube, but we may want to update https://github.com/apache/polaris/blob/main/run.sh as well to match.

adutra (author):

Ah that's true. Well, let's mention both! I will add a note about Kind.

@adutra force-pushed the quarkus-helm branch 4 times, most recently from 6e88142 to de2db6f on January 31, 2025 at 10:59
{{- end }}
{{- end }}
containers:
- name: "polaris-bootstrap"
Reviewer (Contributor):

I just want to go on record as saying I think a bootstrap container is a bad idea. Bootstrapping is the kind of activity that ought to be invoked manually by a person in most scenarios - or at the most by a terraform process that sets up a region for the very first time. I worry that a k8s job could be misconfigured and accidentally purge a deployment during a release. It feels to me like one of those "don't leave a loaded gun in the drawer" scenarios.

adutra (author):

I fully agree – but are you OK with dropping a feature that has existed until now?

adutra (author):

I'm going to remove support for purge – that is new, and indeed sounds too risky. Not a good idea after all.

@MonkeyCanCode (Contributor) commented Feb 2, 2025:

Yes, I think that is fine too. It may be better to drop the support now, before the first release, as opposed to later. I added bootstrap to the Helm chart earlier because there was no dedicated tool available for it and we had to package the service config as part of the service binary (think of having different binaries for different environments; bootstrapping those manually would be really tedious). Now that we have a dedicated tool for it, I also think it is a good idea to drop this.

adutra (author):

Okay job removed!

@adutra commented Feb 1, 2025

@MonkeyCanCode I rebased and squashed. I also added more unit tests for services. Let me know what you think!

@@ -79,7 +79,7 @@ jobs:
if: steps.list-changed.outputs.changed == 'true'
run: |
helm plugin install https://github.com/helm-unittest/helm-unittest.git || true
helm unittest helm/polaris
helm unittest helm/polaris 2> >(grep -v 'found symbolic link' >&2)
Reviewer (Contributor):

With https://github.com/apache/polaris/pull/912/files, we will only need the original command, without filtering out the warning messages caused by the symbolic link.

adutra (author):

Yes! I'm just waiting for #912 to be merged.


```bash
$ helm install polaris helm/polaris --namespace polaris --create-namespace
helm unittest helm/polaris 2> >(grep -v 'found symbolic link' >&2)
```
Reviewer (Contributor):

Same as above: we should merge https://github.com/apache/polaris/pull/912/files first and then remove the filter part.

<property name="jakarta.persistence.jdbc.url" value="jdbc:h2:mem:polaris-{realm}"/>
<property name="jakarta.persistence.jdbc.user" value="sa"/>
<property name="jakarta.persistence.jdbc.password" value=""/>
<property name="jakarta.persistence.jdbc.url"
Reviewer (Contributor):

EclipseLink uses a default of 32 connections for connection pooling. For local testing this is more than enough, but if people want to use it for some load testing, it will suffer when there are many concurrent client connections. Do you think we may want to add an example or the default value here? That way, people who may not be familiar with EclipseLink can just change the max value, etc.

adutra (author):

Good point. I added the following:

          <property name="eclipselink.connection-pool.default.initial" value="1" />
          <property name="eclipselink.connection-pool.default.min" value="1" />
          <property name="eclipselink.connection-pool.default.max" value="1" />

{{- $global := . -}}
{{- range $k, $v := $map }}
{{ include "polaris.appendConfigOption" (list $k $v $global) }}
Reviewer (Contributor):

nit: extra space between "list" and "$k".

tag: "latest"
# -- The path to the directory where the application.properties file, and other configuration
# files, if any, should be mounted.
configDir: /deployments/config
Reviewer (Contributor):

Should we add a similar comment here for the file listing issue (#907)?

adutra (author):

Good point, done
