docs build - process notes and TODOs #138
Comments
Any reason not to use RTD for PR builds? That part with surge feels like reinventing the wheel. |
RTD was the first thing I considered. But it doesn't seem like they have any kind of push mode. It appears to be git based where you connect it to github and it builds docs based on commits. Since I'm not committing rendered docs, and since I want to publish multiple simultaneous sets of docsites (that are specific to PRs), RTD seems to be a poor fit. I'm happy to hear otherwise but I could not find a way to make that work. I also considered GitHub Pages, but there were several reasons that made that tricky as well. You can see more if you review the past DAWGs meetings minutes where this was discussed. Surge was a surprisingly easy and straightforward option that came up in my search, and it's been working great for this purpose. Of course anyone else who implements docs builds in their collections can choose to forego publishing, or choose a different publishing option. |
Have you ever used RTD yourself? From your statements, it sounds like you haven't. It is able to build docs on pushes to branches on GitHub as well as to build separate docsites for each PR. Exactly like you want. I fail to see how you can see it as not solving the problem 🤷♂️ For example, here's a site that gets updated on every commit to devel: https://ansible-pylibssh.rtfd.io. But each PR gets its own build. |
Are you saying that you want to run https://github.com/ansible-collections/community.hashi_vault/blob/main/docs/preview/build.sh on RTD but you don't know how? It's easy to work around (although, I'm truly confused why antsibull-doc isn't implemented as a proper Sphinx extension, but it's doable even without that). |
I definitely have not! I spent time researching it and was not able to figure out how to make it work for this use case. Which is not to say that it isn't possible, but their docs do not suggest that it is, and I was not able to find the information. I pursued other avenues, and found one that worked. 🤷♂️
That looks great! Definitely what I was trying to do. From your links I still have no idea how it was done; as I said I was able to get surge working and published in 3 minutes, far less time than it took me to even read the RTD docs, so I went with it. Anyway, while I'm not at all opposed to switching to RTD, is there any reason you're so insistent upon it specifically? |
The reason is that it's a de-facto standard in the wider Python community (outside of the Ansible bubble) and there's a good chance that more contributors are familiar with it. Besides, it supports Sphinx as a first-class citizen. This means that there would be no need to maintain a GHA+repo-specific custom setup (hence no need for people to gain this knowledge). |
@briantist I'm still waiting for your RTD account + some answers. I'm on PTO now so expect to get a PR from me after Sep 29. |
@webknjaz the If you feel that should be changed or done in a different place, I encourage you to propose and/or implement those changes within that tool, as I don't intend on modifying that output much at all. I treat them mostly as generated files (I modified What you're proposing sounds great and I look forward to it but it might be best worked on with other members of the community; I know Felix has been interested in working on docs build and we were going to work on getting some of my changes into the collections he manages. Maybe it'd be best to skip that and for you to work with him on how best to do it via RTD from the get-go? Also strongly encouraging you to join and attend the Documentation Working Group, as your input would have been valuable while all of this was being worked on. For the short term I'm not intending to switch to RTD because I have something "good enough" and need to focus efforts elsewhere for a little while, but I'll be watching closely to see how that plays out and I hope to use it in the future! Thanks! |
Did you mean
I still want to send a demo PR here because you cannot demo anything on that repo. It doesn't have to be merged but it'd be a good platform for discussions, a more visual one.
I wish I had time for this. Maybe I'll join sometimes but often I'm just unable to come 🤷♂️ |
Yes :)
Sure, that's ok as long as it doesn't require me signing up for RTD and sending the credentials, it's just something I'm not able to put much time to right now. I suggested working with Felix on one of the collections he maintains as there are a lot of choices,
Of course, I can definitely relate, time is a precious commodity for us all, and I do appreciate you spending some of yours on reviews and suggestions! |
I don't have an RTD account. That site I'm building locally and uploading on my own server. |
My mistake! Apologies for the assumption. |
FWIW you can log in to RTD via GitHub with one click and that's about what you'll need to do 🤷♂️ |
@briantist so I had a few hours and crafted a small demo. The deployment from the branch in my fork is here: https://community-hashi-vault-ansible-collection-webknjaz-fork.readthedocs.io/en/maintenance-docs-sphinx-ext/ (the URL is this long because I used a long demo project name on RTD + the branch name is long too). Build log: https://readthedocs.org/projects/community-hashi-vault-ansible-collection-webknjaz-fork/builds/15100515/. Patch: Of course, there's room for improvement but I wanted to show the main idea for now. |
Hi @webknjaz, we chatted a bit on IRC already, but thank you again for working on that! As you mentioned there, it would be great to see this work integrated into something like antsibull; there is a lot here and at the moment, I don't want to stray too far from what I can produce with the existing init command. It would be great to use RTD for the hosting instead of Surge, but for now the latter is working fine. To also reiterate some of the points raised in chat, the ultimate purpose of this build process is not really for publishing permanent docs, it's to publish changes in PRs so that one can see the rendered result of the changes, have some confidence that rST was written correctly (refs, cross-site links, formatting, etc.), and to help highlight when/where code changes resulted in documentation changes (this could happen in docstrings and docfragments for example). To accomplish that:
You can see the github workflows for the details of the steps. I also strongly encourage you to submit a throwaway PR with changes to the docsite so that you can see the build process in action for yourself. I mention all this because in a previous comment when asked why you so strongly want to change this to use RTD, one of the things you mentioned is:
But I think perhaps I did not communicate enough that the point of this whole endeavor is to post temporary docs in GitHub PRs, not to publish the permanent home of the collection's docs; this is why the place where they are hosted does not matter much, and why I chose a least-friction option. In the end, I would prefer RTD if it were just as easy, or if the complexity were hidden away/taken care of like the current method is with antsibull. Surge is still working, but it seems like the interface to it is quite old so it may very well disappear one day. If that happens, I will still have docs attached to PR builds as artifacts, still have a way for contributors to generate docs locally with little fuss (this is documented in the docsite), and still have diff output in the PR comment, but the lack of a published website would be very sad. So, I'm really not discouraging this work (quite the opposite) but I won't be able to use it just yet. I'm really looking forward to it, especially as it relates to other collections being able to use it (RTD will be more familiar and have better trust). Thanks again for building on and improving this, I'd love to see it make its way into antsibull where it can be supported, and if I can help with that I certainly will where time permits. |
In #199, I've run into an issue with the split docsite-only/vs. full build:
A "quick" resolution I am considering is:
As a longer term fix, I am looking at using the
This ought to get rid of the need for the second job that requires manual approval on every push, which I also have to watch to make sure it doesn't run before the limited build (otherwise the limited build will clobber the full build). So this would be better all around. One thing I'm not sure of with regard to dropping permissions is whether I can actually drop the permission to read secrets. I don't see that as one of the possible scopes listed by GitHub, so I will need to do some experimenting. If I can get this working, it would be a huge step up for this. EDIT: it looks like the
docs build has moved to a dedicated repository! Further discussion, ideas, and documentation will happen there, in the issues, discussions, and wiki in that repo. |
SUMMARY
This issue is meant to keep track of things related to the docs build process, which builds and publishes collection documentation on PRs and pushes, in temporary sites outside of the published collection docs on docs.ansible.com.
Current State
On a PR (open, updated, resynced)
git diff of the builds is taken to show in markdown
git diff output in an expandable section (unless it's too big, which crashes GHA), and it shows a list of modified files, each of which links to the published docsite.
In addition to the above, another build is triggered which does exactly the same thing, but it will build the entire set of docs using the entire PR contents. That means it will include changes to plugin and module documentation.
The key for this one is that it gets built in a separate GitHub "environment" with rules that require it to be approved on every commit. This happens because the build is still privileged and we cannot trust the PR contents, so every commit must be reviewed by a person before approving this build. But once it runs, we get the complete picture of the built docs.
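As a rough sketch of how such an approval-gated job can be declared (this is not the actual workflow in this repo): the environment name docs-full-build and the job name are made up for illustration, and the required-reviewers rule itself is assumed to be configured in the repository's environment settings rather than in the workflow file.

```yaml
# Sketch only. "docs-full-build" and "full-docs-build" are hypothetical names.
# The "required reviewers" protection rule is set in the repository settings
# for the environment, not in this file.
on:
  pull_request_target:

jobs:
  full-docs-build:
    runs-on: ubuntu-latest
    # The job pauses here until a reviewer approves it. A new commit on the
    # PR triggers a new run, which needs a new approval.
    environment: docs-full-build
    steps:
      - name: Check out the full PR contents (only after a human has reviewed them)
        uses: actions/checkout@v4
        with:
          ref: ${{ github.event.pull_request.head.sha }}
```

Because the job references an environment, any secrets scoped to that environment only become available once the run has been approved.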
On PR close or merge
On Push (to main only)
Important things
The pull_request_target event
REQUIRED READING: https://securitylab.github.com/research/github-actions-preventing-pwn-requests/
When a GitHub Workflow is run for the pull_request event, from a fork of the repository (the norm in this and most open source projects), the rights are limited, compared to PRs opened from a branch on the same repo (which requires write access to the repo). In particular, on a PR from a fork, the workflow has no access to secrets and no write access to the repository.
These are important security considerations, because anyone can open a PR, and use it to execute untrusted code, leading to unauthorized commits on the repo, secret exposure, etc.
For the purposes of this doc build process, this is important because:
Enter pull_request_target
Workflows run under this event do have access to secrets, and do have write access to the repository. The difference is that both the workflow definition itself, and its defined "ref" refer to the target of the PR. That means that the workflow file is always loaded from the target branch (for example main) so that it cannot be manipulated in the PR, and it also means that using the checkout action with its defaults is going to clone the target (ex main) and not the PR content.
The point of all this is: you can't trust the PR content when running in a context that has access to secrets and elevated privileges.
It is possible to check out the PR contents, but you must be careful not to execute anything in it, and there are many indirect ways that using the content of a PR could lead to exploitation, even without direct "execution".
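To make the difference between the two checkouts concrete, here is a minimal sketch, assuming actions/checkout v4 (for its sparse-checkout input). The job name and the pr-content path are hypothetical, and restricting the PR checkout to docs/docsite is just one way to limit what is brought in; it is not necessarily how this repo's workflow does it.

```yaml
# Sketch only: job name and "pr-content" path are hypothetical.
on:
  pull_request_target:

jobs:
  docs:
    runs-on: ubuntu-latest
    steps:
      # Default checkout: under pull_request_target this clones the target
      # branch (for example main), not the PR.
      - name: Check out trusted target branch
        uses: actions/checkout@v4

      # PR contents go into a separate directory and are treated purely as
      # data. Nothing from this directory should ever be executed.
      - name: Check out untrusted PR content
        uses: actions/checkout@v4
        with:
          ref: ${{ github.event.pull_request.head.sha }}
          path: pr-content
          # In the default cone mode, files directly inside parent directories
          # (the repo root and docs/) are also checked out, so later steps
          # should only read from pr-content/docs/docsite.
          sparse-checkout: |
            docs/docsite
          # Do not leave the token in the untrusted checkout's git config.
          persist-credentials: false
```

Keeping the untrusted checkout in its own path makes it harder to accidentally run a script or load a config file that a PR has modified.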
Bringing it back to docs build
We use pull_request_target so we can get the surge secret and so we can post PR comments.
But in order to render the docs in the PR, we need some content from the PR.
In our case, we bring in only docs/docsite/, which contains YAML and RST. While a PR could stick other files there, we won't be executing anything in that subdirectory.
Problems and difficulties with pull_request_target
Problem 1: Changing the workflow
Because the workflow is always loaded from the target, it's very difficult to make changes to the workflow via the PR process and test them from within the same PR. The version of the workflow will always be loaded from main.
I've dealt with this in two ways:
Commit/merge it now, use other PRs, update incrementally
main)
fix type in docs workflow (#123)), and push the commit to main.
This isn't great; pushing commits directly to main, some of which are the type that squashing was made for, and squashing those after the fact means a force push (also not good).
Put workflow changes on a direct branch, open other PRs against the branch
main (that is, you'll be asking to merge the PR into the branch that the workflow PR is based on). This PR can be in a fork.
upstream if your origin is your fork)
git commit -m empty --allow-empty and that's enough to re-trigger.
This method is a little more complex, but it avoids the direct pushes.
Both methods make it near impossible for someone without write access to the repo to meaningfully contribute to the docs build process.
Problem 2: which docs?
The docs build process includes plugin and module documentation, which is very nice. However because of the security implications of pull_request_target, we're only actually using the docs/docsite/ folder from the PR. So changes made to collection content other than the docsite will not be taken into account in the docs build.
The docstrings for executable content live in .py files, and even though some of them are read via AST and not executed, doc fragments are executed, and as a result, I'm not sure that we can ever include them in PRs submitted on forks.
One idea I had was to possibly introduce a workflow_dispatch event-based workflow (manual invocation) that can be run by a maintainer. This workflow would run against a PR, and render the entire docsite. The idea would be that a maintainer who has reviewed the PR can decide to run the docs build against the entire PR contents, and the resulting docs build site would contain all changes. A way of sort of gating that process. It would have to be re-done on every new commit to a PR as well.
Another idea I had was to figure out if the PR was submitted on a branch in the same repo, which can implicitly be trusted (the submitter had write access already). This doesn't help much on the external contributor front, but in many/most collections, the maintainers are very active contributors (it would certainly help me!).
Both of the above will make for some tricky logic and conditionals in the workflows, making them harder to read and grok.
So this has been sorta-solved by the separate build that requires approval (scroll up).
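For reference, the two ideas above could look roughly like the following. This is only a sketch of the conditionals involved, not something this repo uses: the job names and the pr_number input are hypothetical, and the actual docs build steps are left out.

```yaml
# Sketch only: job names and the "pr_number" input are hypothetical.
on:
  pull_request_target:
  workflow_dispatch:
    inputs:
      pr_number:
        description: Number of the (already reviewed) PR to build the full docsite for
        required: true

jobs:
  # Idea 2: build everything automatically when the PR branch lives in this
  # repository, since the submitter already has write access.
  full-build-same-repo:
    if: >-
      github.event_name == 'pull_request_target' &&
      github.event.pull_request.head.repo.full_name == github.repository
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          ref: ${{ github.event.pull_request.head.sha }}

  # Idea 1: a maintainer triggers the full build by hand after reviewing the PR.
  full-build-manual:
    if: github.event_name == 'workflow_dispatch'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          ref: refs/pull/${{ github.event.inputs.pr_number }}/head
```

Even in this reduced form the conditionals are already a bit awkward to follow, which is the readability cost mentioned above.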
Problem 3: workflows running in forks
Right now, my workflow compares where it's running from to ensure it's running in ansible-collections/community.hashi_vault, and skips many/most steps if it's not. This was really an indirect way of saying "a fork doesn't have the surge secret so don't bother running all this stuff, it'll fail anyway".
It would be better to check for the existence of a secret instead, so that forks could optionally add such a thing.
That leads to the possibility of using secrets to define things like the surge site name stub, etc.
It has some weird implications for PR comments though.
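A sketch of what the secret-existence check might look like, for comparison with the repository-name check. The secret name SURGE_TOKEN and the job and step names are assumptions for illustration. The secret is mapped to a job-level environment variable first because the secrets context cannot be referenced directly in an if: conditional.

```yaml
# Sketch only: SURGE_TOKEN, "publish" and the step names are hypothetical.
jobs:
  publish:
    runs-on: ubuntu-latest
    env:
      # Empty string when the secret is not defined (for example, in a fork).
      SURGE_TOKEN: ${{ secrets.SURGE_TOKEN }}
    steps:
      # Current approach, roughly: only run in the upstream repository.
      - name: Repo-name check (current behavior)
        if: github.repository == 'ansible-collections/community.hashi_vault'
        run: echo "running in the upstream repo"

      # Alternative: run wherever the secret has been configured.
      - name: Publish docsite
        if: env.SURGE_TOKEN != ''
        run: echo "publish step would run here"

      - name: Skip publishing
        if: env.SURGE_TOKEN == ''
        run: echo "No surge token configured; skipping publish"
```

With this shape, a fork that adds its own token would get published previews without any change to the workflow file.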
ISSUE TYPE
COMPONENT NAME
docs build
ANSIBLE VERSION
N/A