Systematization of CD in our repos #213
Comments
For repo2docker we have been using the "create a tag and travis does the rest" workflow and it is great! The config for it is here https://github.com/jupyter/repo2docker/blob/11fc69f8d62a39f494109d2926cb763c022821c6/.travis.yml#L77-L86 and it works well. It is the same recipe I used for scikit-optimize previously. (Side comment: pypi.org has been doing cool stuff with 2FA and tokens and so on, basically better auth mechanisms, so maybe the recipe needs revamping soon. You can tell this is a snippet that I created many years ago and have been copy&pasting ever since.)

It is really useful to have something like versioneer that figures out the version from the tag. A lot of people have a love-hate relationship with versioneer, but I would recommend to "just use it" to get started, then at a later point consider switching.

For BinderHub we don't have this. I'd say that is OK as it is mostly a "helm package"; it feels like Chartpress does everything we need there.

Release cadence: for BinderHub we release every time we merge something into master. I really like the continuous release practice and it seems to fit well with the ideas of CI, CD, small releases, and continuous improvement from devops. For repo2docker we try to cut a release every 3 months. It is easy to do, keeps people who like releases happy, and creates "fix points". Doing it regularly without "waiting for X" means it is easy to decide not to wait for X, because it will be in the next release, which is only a few months away. |
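The linked repo2docker config boils down to a Travis deploy stage that only fires on tagged commits. A minimal sketch of that pattern (account name and encrypted value are placeholders, not the actual repo2docker config):

```yaml
# .travis.yml (sketch): publish to PyPI only when a tag is pushed
deploy:
  provider: pypi
  user: some-deploy-account          # placeholder account name
  password:
    secure: "<travis-encrypted PyPI password or token>"
  distributions: sdist bdist_wheel   # build both source and wheel
  on:
    tags: true                       # only release for tagged commits
    repo: jupyter/repo2docker
```

With `on: tags: true`, pushing a tag is the entire release ceremony; every other build skips the deploy stage.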
The one thing that I think we miss with this "make a release for every commit" approach is that it makes it harder for us to communicate to the outside world what is being updated. Perhaps there is a way to make "checkpoints" that let us create a changelog but without making any kind of special commitment to stability, backwards-support, etc.

btw, I wrote a little CLI to quickly generate markdown that gives summaries / links of github activity in a repository. Perhaps it would be useful for us to use this in generating changelogs quickly? https://github.com/choldgraf/github-activity

For example, here was the output of running it for 2019-09-01...today:

Other closed PRs
Closed issues
Opened PRs
Opened issues
|
For updating version strings I've used bump2version on a couple of projects. You run it with the level of the release (patch, minor, or major) and it updates the version strings for you. |
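For context, bump2version is driven by a small config file; a minimal sketch (the filename under `[bumpversion:file:...]` and the starting version are assumptions for illustration):

```ini
# .bumpversion.cfg (sketch)
[bumpversion]
current_version = 0.1.0
commit = True        # create a commit for the bump
tag = True           # create a git tag for the bump

# every file listed here gets its version string rewritten
[bumpversion:file:myproject/_version.py]
```

Running `bump2version minor` would then rewrite `0.1.0` to `0.2.0` everywhere listed, commit, and tag in one step.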
I think a good way to combine "continuous releases" with "keep the world updated" is to have a blog post every N months that tells people what happened in the last N months. We could call it "the quarterly X checkpoint", alongside blog posts when there is a big new feature.

I see more and more companies moving towards decoupling technical releases from publicity pushes, to reduce the stress and coordination required. (If you pay close attention you will see new features appear, and only a while later will there be an announcement about them. However, most people don't pay close attention, and the "general public" certainly doesn't, so essentially no one notices.) We already did this when we created the federation: we had it running for a few weeks to debug stuff, and then announced it when we knew it was solid.

The markdown generator looks very cool for helping to create release notes!! |
@betatim I like the idea of regular blog posts! |
I've just discovered https://github.com/toolmantim/release-drafter (via ansible/molecule#2367) for automatically drafting release notes. |
@betatim I would prefer to pass on Versioneer. You can tell where I stand on the love–not-love debate 😉 It adds a lot of overhead to a repo for minimal benefit IMHO. |
There are a couple of requests for new releases:
oauthenticator should be straightforward to automate using a Travis PyPI deployment; shall we aim to get that in place for the next release? Two questions:
jupyter-server-proxy is a bit more complicated since it requires both a PyPI and an npm release, though both are supported by Travis |
I have a slight preference for storing the secrets in one place. https://pypi.org/project/oauthenticator/ suggests that only @minrk has credentials for updating it. Maybe we can create a "jupyterhubteam" account that is a maintainer, like we have a "mybinderteam" for https://pypi.org/project/binderhub/? For jupyter-server-proxy I have no idea how to even get started on making a release :( |
A dedicated service account makes sense. PyPI recently added support for token authentication, which can be scoped to a single package: https://pypi.org/help/#apitoken I'm happy to look into jupyter-server-proxy when everyone's agreed on this issue |
The API token approach looks nice! I hadn't realised this had shipped yet (still dazzled by the 2FA support). We could also add the mybinderteam account to the jupyter-server-proxy package as it is used a lot in Binder. I think I should have access to that account and can share an API token scoped to that package with you. |
Let's make a jupyterhub-team bot account with token access for upload on PyPI. That sounds excellent. |
New Travis feature: importable configs: https://blog.travis-ci.com/2019-11-11-build-config-imports
If this works, it means the token only has to be stored and updated in one place for all JupyterHub repos. Might that be a reason to use a PyPI token with full access instead of one scoped to a single project? |
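Travis's build config imports would let each repo pull a shared deploy block from a central repo; a hypothetical sketch (the repo name and file path here are assumptions, not an existing shared config):

```yaml
# .travis.yml (sketch): import a shared deploy config from another repo
import:
  - source: jupyterhub/team-compass:ci/pypi-deploy.yml@master
```

Each importing repo would then only carry this stanza plus whatever repo-specific jobs it needs.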
I want to make this actionable!
When using these tokens, we don't need a service account identity alongside them, only the token. The username will be set to `__token__`.
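For reference, token-based upload just swaps the username for the literal string `__token__`; a sketch of what the credentials look like to twine (the token value is a placeholder):

```ini
; ~/.pypirc (sketch) -- read by twine when uploading
[pypi]
username = __token__
password = pypi-AgEIcHlwaS5vcmc...   ; project-scoped API token (placeholder)
```

In CI the same two values would live in the deploy config or as secret environment variables rather than in a file.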
Is it correct that the idea is to have a PyPI service account? If so, then I assume we let TravisCI use this account's credentials to do deployments. I think there are some choices to be made. Choice one - scope of PyPI credentials
Choice two - scope of configuration
My suggestion
What do you all think? |
@minrk can you give me access to kubespawner on PyPI? I have 2FA enabled on PyPI as well as GitHub, btw. |
I think a separate repo for shared Travis and other CI/CD configuration is clearer to others than reusing team-compass. If there's an organisation wide pypi token stored as a secret in a shared Travis config (which also provides an audit log of changes) then adding CD to a repo should be two steps:
If each project has its own token there are two additional steps.

Both will work, though my preference is for the first as it's less manual work that needs to be done by an admin. |
Which repos do we want to add this to?

I like the idea of having a central place to control the token. I am a bit hesitant towards creating a central recipe that has to work for "all the repos" and is based on a feature that is new (and only works for Travis, not also CircleCI).

Besides credentials, copy&pasting build config is a bit tedious when you do it, but over the life of a repo (many years) I have hardly ever needed to adjust the "push to pypi" part of CI setups. This means copy&paste is OK, and it has the advantage of letting each repo slightly change the things it needs to change, versus having to find a way to configure the central recipe.

There are also repos (like repo2docker and JupyterHub) that have a working CI/CD setup that I think we should keep as-is (e.g. r2d has tried to move to Azure builds but we haven't even had the resources to complete that move, so I'd avoid changing other stuff just "because we can"). This means we need a way to run several setups in parallel for a while. Finding a way that reduces the effort for new repos and works well together with existing ones is what I think we should aim for. |
I think having a central configuration for only the deploy section is good enough, it doesn't need to be for the whole CI config. My thinking is to reduce the number of infrequent manual admin steps that have to be done i.e. dealing with credentials. |
Since tokens can be scoped to packages, I think it's a good idea to have tokens allocated per package, scoped only to that package, not re-using tokens across repos. I think the degree to which we have uniformity should probably be at the level of a "suggested template" that we can host here in team-compass. I don't think inheritance is worth the challenges involved in actually making something work everywhere, but documentation for "it's a good idea to start here" is probably the most useful level of sharing. |
In my mind then, the action plan is:
Then, we repeat for:
|
Action point

In the Team Compass resources section, we add another subsection about repository building blocks, where we describe various common patterns, for example automating PyPI package uploads on git tag pushes using TravisCI. |
Using a lockfile or similar could be useful with regards to releases, to pin down the exact state. I'd consider either using a lockfile in the repo, or producing a build artifact that we store in association with the release. |
Does anyone have experience with https://pypi.org/project/setuptools-scm/ ? I've just tried it. Instead of putting the version in setup.py or another file, it derives the version from git tags.

One disadvantage is you can't install an archive directly from GitHub; as indicated by the error message, you need a proper git checkout (or an sdist) for it to detect the version. But having an automatic version that includes the commit hash is nice. |
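A minimal sketch of wiring up setuptools-scm via pyproject.toml (the version pins here are assumptions; package metadata still lives in setup.py/setup.cfg):

```toml
# pyproject.toml (sketch): let setuptools_scm derive the version from git tags
[build-system]
requires = ["setuptools>=45", "setuptools_scm[toml]>=3.4", "wheel"]
build-backend = "setuptools.build_meta"

# an empty section is enough to enable it; the version is computed
# from the most recent tag plus commits since it
[tool.setuptools_scm]
```

With this in place there is no version string in the repo at all; tagging a commit is what sets the released version.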
@manics I have not used it. I'm not against using it, as it's under PyPA and so the code will have some maintainership. One area to investigate before adopting would be its usage with Sphinx and any extensions, to see if additional configuration is needed to make it work there and on RTD. @takluyver Happy New Year. Have you used it? |
Yeah, setuptools-scm and versioneer and the like are nice for automation, but do tend to eliminate support for installing from git archives (which necessarily lack both the scm data and generated version.py) and can cause issues with installing from forks, which often don't have up-to-date tags. |
Installing straight from GitHub via |
It 'works' to install from a git archive in that the code is installed, but it doesn't install with the right version number. So the package itself works, but versioneer fails to do its thing. This is fine if nothing is going to check the version of the package, but can cause surprising issues if you install another package that depends on a specific version of the first, either with runtime version checks or an install-time dependency:

```
$ pip install https://github.com/jupyterhub/traefik-proxy/archive/master.zip
$ pip list | grep traefik
jupyterhub-traefik-proxy 0+unknown
```

Installing from forks with git has a similar issue in that if the tags are out of date, installing from a git url reports the wrong version:

```
$ pip install git+https://github.com/jupyterhub/traefik-proxy
...
Successfully installed jupyterhub-traefik-proxy-0.1.4+4.ga96d0eb

$ pip install git+https://github.com/minrk/traefik-proxy
...
Successfully installed jupyterhub-traefik-proxy-0.1.2+64.ga96d0eb
```

To summarize, it's no problem if your installed-from-a-branch package has no dependencies, but it can be for packages that depend on it. |
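The version suffixes in output like `0.1.4+4.ga96d0eb` come from git tag metadata (latest tag, number of commits since it, abbreviated hash), which is what versioneer and setuptools-scm wrap. A minimal offline sketch of the underlying mechanism (repo name and messages are made up):

```shell
# Create a throwaway repo with one tag and one commit after it
cd "$(mktemp -d)"
git init -q versiondemo && cd versiondemo
git config user.name "demo" && git config user.email "demo@example.com"
git commit -q --allow-empty -m "initial commit"
git tag -a 0.1.0 -m "release 0.1.0"
git commit -q --allow-empty -m "work after the release"

# Latest tag, commits since it, abbreviated hash:
git describe --tags    # e.g. 0.1.0-1-g1a2b3c4
```

On a fork whose tags are stale, `git describe` counts commits from the old tag, which is exactly how a misleading `0.1.2+64.g…` style version arises.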
Responding to questions that I missed along the way:
I'm not sure this belongs in a central place, since it will vary by repo.

Most of our repos are lightweight and small with frequent, mostly bugfix/new-feature releases, where betas, etc. don't add anything but time to the release process. A few, such as jupyterhub and zero-to-jupyterhub, are large production projects that can easily contain changes that break stuff in very particular, unanticipated user configurations not covered by our test suite, where the release really benefits from testing by the early-adopter user community. These are the cases where betas and release candidates really help. For most of our Authenticators, Spawners, etc., it doesn't benefit either side to add this to the process.

If there is a guideline, I would probably say it would be to ask this question: do we need to solicit feedback from the wider user community "testing in the wild" before we know if we are ready to make this release? And can we expect to get this testing and feedback during this time? If the answer is "yes", then a prerelease (and an announcement asking folks to test, etc.) is warranted. If the answer is "no" (i.e. we are confident in the changes and/or we wouldn't get enough user testing and feedback on the changes to be worthwhile), then publishing the release immediately is the thing to do.

Factors that contribute to "yes, we need a beta"
Factors that contribute to "no, let's publish without a beta"
The result is that for most repos, the answer is usually no, but certain releases could use a beta round to solicit testing and feedback, especially for specific new features or changes. It's also the other way around with the big repos - we usually do a beta for jupyterhub and zero-to-jupyterhub, but sometimes there's a small bugfix we want to push out and there's no need for a beta round in those cases. In all cases, it's really up to the judgement of the team for each release of each repo and if there is a process, it's a per-repo per-release decision, not a project-wide one. |
@minrk thanks for an excellent elaboration on when to have a beta and not etc, it makes a lot of sense to me! |
I had not run into the situation of a fork resulting in the wrong version getting reported :-/ How often do people run into these problems? I'd argue that most people install from PyPI, the next largest group installs from a tag/branch of the upstream repo, and then come the people who install from their own fork. Does that seem like a reasonable ordering? I don't know how big each group is, though :-/ |
@betatim It'll occur if you:
It's annoying and may leak into other packaging tools (e.g. conda may need the tool added as a dependency), and that's balanced against the convenience of having versions just work based on pushing a tag, with no need to faff with running a separate command.

Would it help to have a demo session (e.g. a Zoom call) or some simple example repos showing how each tool works? One of the difficulties in evaluating versioneer in jupyterhub/traefik-proxy#89 is that many people were unfamiliar with it, and it's difficult to tell how much of the versioneer code is required and how much was additions/customisation built on top of versioneer that would be needed for any other versioning tool too. Tools mentioned so far are:
|
What I meant with "what happens"/"who does this happen to" was more a question in the direction of: who experiences this problem, while doing what? If we want to weigh this up against the convenience for package maintainers, it matters whether this frequently/rarely affects newcomers/seasoned devs, or just the odd person doing weird things.

Thinking back over my own history, I can't remember ever being caught out by this. So either I never have the use-case that would put me in the position of suffering from this, or the use-case is exceedingly rare so I've forgotten already, or it happens frequently but I never notice, or yet another thing.

I do remember having to re-release packages because there was a typo in the version, or one of the places in the repo was not updated, or it was not bumped properly back to "dev", or people not making releases because too many steps were involved :-/ As a result I have a strong bias towards versioneer, because it makes the failure modes I've experienced several times go away. Until today I didn't realise there was a downside (outdated tags for forks) :) |
Sorry, I get you now! I've run into both sides of the problem:
|
We have shifted towards a simpler version-bumping workflow. An example of this can be seen in https://github.com/jupyterhub/jupyterhub/blob/3.1.1/RELEASE.md.

I'll close this issue as it's quite outdated, and we have jupyterhub/jupyterhub-python-repo-template where we can establish common practices etc. |
I believe we have a lot to gain by making it easier and quicker to release with the help of Continuous Deployment (CD). If we make the complexity of making a release become

git tag -a x.y.z && git push --tags

then I'd be very happy :D

It is extra relevant if we end up with a chain of things that need to be released for something to be fixed. From my own experience, it can become too scary to do if it isn't quite simple.
Technical types of deployments
Publishing a pip package
To CD a pip package, do...
Publishing a Helm chart
We publish Helm charts in zero-to-jupyterhub-k8s and binderhub. This is done like this ...
???
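For reference, the Helm chart publishing in those repos is driven by Chartpress, which reads a chartpress.yaml in the repo; a rough sketch of its shape (the values here are illustrative, not the actual binderhub config):

```yaml
# chartpress.yaml (sketch): which chart to build and where to publish it
charts:
  - name: binderhub
    imagePrefix: jupyterhub/k8s-        # prefix for built images
    repo:
      git: jupyterhub/helm-chart        # repo holding the published charts
      published: https://jupyterhub.github.io/helm-chart
```

CI then runs chartpress on merges to master, which rebuilds images, bumps the chart version, and pushes the packaged chart to the helm-chart repo.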
Release guidelines
Should we have common release guidelines among repositories? If it is easy to make a release, we could encourage a practice of making beta releases for example.
I guess what I'm suggesting in this issue mainly is to support making releases using git only.