Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs build - process notes and TODOs #138

Closed
briantist opened this issue Sep 5, 2021 · 17 comments
Closed

docs build - process notes and TODOs #138

briantist opened this issue Sep 5, 2021 · 17 comments
Assignees
Labels
documentation Improvements or additions to documentation

Comments

@briantist
Copy link
Collaborator

briantist commented Sep 5, 2021

SUMMARY

This issue is meant to keep track of things related to the docs build process, which builds and publishes collection documentation on PRs and pushes, in temporary sites outside of the published collection docs on docs.ansible.com.

Current State

On a PR (open, updated, resynced)

  • The docs from the target are built (rendered)
  • The docs (docsite changes only) from the PR are built
  • Hashes of both are recorded to determine if the PR makes any differences
  • A git diff of the builds is taken to show in markdown
  • If there are changes, then:
    • The rendered docs are attached to the build as an artifact (expires in 7 days)
    • The rendered docs are published via https://surge.sh to a temporary site that's specific to the PR. This lets anyone browse the fully rendered docs by clicking a link.
    • A bot comment on the PR provides links to the rendered sites of both the target branch docs and the PR's docs for comparison, and shows the git diff output in an expandable section (unless it's too big, which crashes GHA), and it shows a list of modified files, each of which links to the published docsite.
  • If there are no changes:
    • if there's an existing surge site (because there were previously changes in the PR), it is unpublished
    • if there's an existing PR comment (because of previous docs changes), it is removed.

In addition to the above, another build is triggered which does exactly the same thing, but it will build the entire set of docs using the entire PR contents. That means it will include changes to plugin and module documentation.

The key for this one is that it gets built in a separate GitHub "environment" with rules that require it to be approved on every commit. This happens because the build is still privileged and we cannot trust the PR contents, so every commit must be reviewed by a person before approving this build. But once it runs, we get the complete picture of the built docs.

On PR close or merge

  • Any existing surge site is unpublished
  • If there's an existing PR comment, it is updated
    • if the PR was closed, the comment is updated to say the site is unpublished
    • if the PR was merged, the comment is updated thanking for the contribution, and linking to the main surge docsite build (as the changes have now been incorporated there)

On Push (to main only)

  • the docs are built, and the "main" surge site is updated.

Important things

The pull_request_target event

REQUIRED READING: https://securitylab.github.com/research/github-actions-preventing-pwn-requests/

When a GitHub Workflow is run for the pull_request event, from a fork of the repository (the norm in this and most open source projects), the rights are limited, compared to PRs opened from a branch on the same repo (which requires write access to the repo). In particular, on a PR from a fork:

  • the token used within workflow does not have write access to the repository, so it cannot (for example) commit, create/remove/update comments, open issues, etc.
  • the workflow cannot read secrets

These are important security considerations, because anyone can open a PR, and use it to execute untrusted code, leading to unauthorized commits on the repo, secret exposure, etc.

For the purposes of this doc build process, this is important because:

  • we need to store a secret for the token required to publish to surge
  • we need to be able to created/edit/delete comments in PRs

Enter pull_request_target

Workflows run under this event do have access to secrets, and do have write access to the repository. The difference is that both the workflow definition itself, and its defined "ref" refer to the target of the PR. That means that the workflow file is always loaded from the target branch (for example main) so that it cannot be manipulated in the PR, and it also means that using the checkout action with its defaults is going to clone the target (ex main`) and not the PR content.

The point of all this is: you can't trust the PR content when running in a context that has access to secrets and elevated privileges.

It is possible to checkout the PR contents, but you must be careful not to execute anything in it, and there's lots of indirect ways that using the content of a PR could lead to exploitation, even without direct "execution".

Bringing it back to docs build

We use pull_request_target so we can get the surge secret and so we can post PR comments.

But in order to render the docs in the PR, we need some content from the PR.

In our case, we bring in only docs/docsite/, which contains YAML and RST. While a PR could stick other files there, we won't be executing anything in that subdirectory.

Problems and difficulties with pull_request_target

Problem 1: Changing the workflow

Because the workflow is always loaded from the target, it's very difficult to make changes to the workflow via the PR process and test them from within the same PR. The version of the workflow will always be loaded from main.

I've dealt with this in two ways:

Commit/merge it now, use other PRs, update incrementally
  • For this, I'd put up a PR with a change to keep track of the purpose and what's being changed, and then merge it right away.
  • A new PR would be put up that tests the functionality (it will be using the version of the workflow now in main)
  • If anything needs to be changed in the workflow, I make new commits, put the original workflow PR # in the commit message (for example fix type in docs workflow (#123)), and push the commit to main.
  • By mentioning the PR, GitHub will automatically show it linked in the original PR, even though it was not part of the PR; that at least keeps some semblance of cohesiveness.

This isn't great; pushing commits directly to main, some of which are the type that squashing was made for, and squashing those after the fact means a force push (also not good).

Put workflow changes on a direct branch, open other PRs against the branch
  • In this method, the PR making changes to the workflow must be in a branch in the actual repository, not in a fork
  • A new PR is still needed to test the changes, but this time, the PR should be made against the above branch, not against main (that is, you'll be asking to merge the PR into the branch that the workflow PR is based on). This PR can be in a fork.
  • Updates to the workflow are pushed to the workflow PR branch as usual (just remember to push upstream if your origin is your fork)
  • One difficulty here is that if you re-run a CI run on the test PR, it will not use the updated state of the workflow branch, it will use the one it ran against before (the exact commit). So if you update your workflow branch, your test PRs need some other change to trigger their CI to run against the new workflow. I've been using empty commits for that a la git commit -m empty --allow-empty and that's enough to re-trigger.
  • A test PR that doesn't contain actual changes you want will be closed anyway. A test PR with changes you might want to keep is going to be squashed, or a manual rebase is super easy (removing the empty commits altogether).

This method is a little more complex, but it avoids the direct pushes.

Both methods make it near impossible for someone without write access to the repo to meaningfully contribute to the docs build process.

Problem 2: which docs?

The docs build process includes plugin and module documentation, which is very nice. However because of the security implications of pll_request_target, we're only actually using the docs/docsite/ folder from the PR. So changes made to collection content other than the docsite, will not be taken into account in the docs build.

The docstrings for executable content live in .py files, and even though some of them are read via AST and not executed, doc fragments are executed, and as a result, I'm not sure that we can ever include them in PRs submitted on forks.

One idea I had was to possibly introduce a workflow_dispacth event-based workflow (manual invocation) that can be run by a maintainer. This workflow would run against a PR, and render the entire docsite. The idea would be that a maintainer who has reviewed the PR can decide to run the docs build against the entire PR contents, and the resulting docs build site would contain all changes. A way of sorting gating that process. It would have to be re-done on every new commit to a PR as well.

Another idea I had was to figure out if the PR was submitted on a branch in the same repo, which can implicitly be trusted (the submitter had write access already). This doesn't help much on the external contributor front, but in many/most collections, the maintainers are very active contributors (it would certainly help me!).

Both of the above will make for some tricky logic and conditionals in the workflows, making them harder to read and grok.

So this has been sorta-solved by the separate build that requires approval (scroll up).

Problem 3: workflows running in forks

Right now, my workflow compares where it's running from to ensure it's running in ansible-collections/community.hashi_vault, and skips many/most steps if it's not. This was really an indirect way of saying "a fork doesn't have the surge secret so don't bother running all this stuff, it'll fail anyway".

It would be better to check for the existence of a secret instead, so that forks could optionally add such a thing.

That leads to the possibility of using secrets to define things like the surge site name stub, etc.

It has some weird implications for PR comments though.

ISSUE TYPE
  • Documentation Report
COMPONENT NAME

docs build

ANSIBLE VERSION

N/A

@webknjaz
Copy link
Member

Any reason not to use RTD for PR builds? That part with surge feels like reinventing the wheel.

@briantist
Copy link
Collaborator Author

Any reason not to use RTD for PR builds? That part with surge feels like reinventing the wheel.

RTD was the first thing I considered. But it doesn't seem like they have any kind of push mode. It appears to be git based where you connect it to github and it builds docs based on commits. Since I'm not committing rendered docs, and since I want to publish multiple simultaneous sets of docsites (that are specific to PRs), RTD seems to be a poor fit. I'm happy to hear otherwise but I could not find a way to make that work.

I also considered GitHub Pages, but there were several reasons that made that tricky as well. You can see more if you review the past DAWGs meetings minutes where this was discussed.

Surge was a surprisingly easy and straightforward option that came up in my search, and it's been working great for this purpose.

Of course anyone else who implements docs builds in their collections can choose to forego publishing, or choose a different publishing option.

@webknjaz
Copy link
Member

Have you ever used RTD yourself? From your statements, it sounds like you haven't. It is able to build docs on pushes to branches on GitHub as well as to build separate docsites for each PR. Exactly like you want. I fail to see how you can see it as not solving the problem 🤷‍♂️

For example, here's a site that gets updated on every commit to devel: https://ansible-pylibssh.rtfd.io. But each PR gets its own build.
If you look at ansible/pylibssh#205, there's a status check reported from RTD, there's a link to a PR-specific website at https://ansible-pylibssh--205.org.readthedocs.build/en/205/ and while the build is in progress that link points to the build page at https://readthedocs.org/projects/ansible-pylibssh/builds/13759664/.

@webknjaz
Copy link
Member

Are you saying that you want to run https://github.com/ansible-collections/community.hashi_vault/blob/main/docs/preview/build.sh on RTD but you don't know how? It's easy to work around (although, I'm truly confused why antsibull-doc isn't implemented as a proper Sphinx extension, but it's doable even without that).

@briantist
Copy link
Collaborator Author

Have you ever used RTD yourself? From your statements, it sounds like you haven't.

I definitely have not! I spent time researching it and was not able to figure out how to make it work for this use case. Which is not to say that it isn't possible, but their docs do not suggest that it is, and I was not able to find the information. I pursued other avenues, and found one that worked. 🤷‍♂️

It is able to build docs on pushes to branches on GitHub as well as to build separate docsites for each PR. Exactly like you want. I fail to see how you can see it as not solving the problem 🤷‍♂️

For example, here's a site that gets updated on every commit to devel: https://ansible-pylibssh.rtfd.io. But each PR gets its own build.
If you look at ansible/pylibssh#205, there's a status check reported from RTD, there's a link to a PR-specific website at https://ansible-pylibssh--205.org.readthedocs.build/en/205/ and while the build is in progress that link points to the build page at https://readthedocs.org/projects/ansible-pylibssh/builds/13759664/.

That looks great! Definitely what I was trying to do. From your links I still have no idea how it was done; as I said I was able to get surge working and published in 3 minutes, far less time than it took me to even read the RTD docs, so I went with it.


Anyway, while I'm not at all opposed to switching to RTD, is there any reason you're so insistent upon it specifically?

@webknjaz
Copy link
Member

The reason is that it's a de-facto standard in the wider Python community (outside of the Ansible bubble) and there's a good chance that more contributors are familiar with it. Besides, it supports Sphinx as a first-class citizen. This means that there would be no need to maintain a GHA+repo-specific custom setup (hence no need for people to gain this knowledge).
The problem you're facing is probably that you want to run arbitrary code before invoking Sphinx and RTD only runs dep installs before running sphinx. The solution is to move the antsibull-doc invocation into the Sphinx setup. It belongs there anyway. I asked you earlier about build.sh — I want to know if it needs to be executed only on RTD or in any env invoking sphinx?
Also, the PR builds have been introduced about 2 years ago, first as a private beta (that I participated) and then it became a default.
@briantist I need your RTD login (you can sign in via GH) so I'll be able to add you as a maintainer to a test project. Also, I need somebody with better repo access to click on a button that automatically sets up the webhooks.
I'll send a PR with the RTD config and edits to conf.py to show you how it works but we need somebody to enable the webhooks (otherwise, I can trigger manual builds only after the merge to the main branch).

@webknjaz
Copy link
Member

@briantist I'm still waiting for your RTD account + some answers. I'm on PTO now so expect to get a PR from me after Sep 29.

@briantist
Copy link
Collaborator Author

@webknjaz the setup.sh and the entire Sphinx config all come from the work @felixfontein has done with antsibull-docs sphinx-init in ansible-community/antsibull-build#297 .

If you feel that should be changed or done in a different place, I encourage you to propose and/or implement those changes within that tool, as I don't intend on modifying that output all at much. I treat them mostly as generated files (I modified build.sh slightly to not use hardcoded paths).

What you're proposing sounds great and I look forward to it but it might be best worked on with other members of the community; I know Felix has been interested in working on docs build and we were going to work on getting some of my changes into the collections he manages. Maybe it'd be best to skip that and for you to work with him on how best to do it via RTD from the get go?

Also strongly encouraging you to join and attend the Documentation Working Group, as your input would have been valuable while all of this was being worked on.

For the short term I'm not intending to switch to RTD because I have something "good enough" and need to focus efforts elsewhere for a little while, but I'll be watching closely to see how that plays out and I hope to use it in the future! Thanks!

@webknjaz
Copy link
Member

@webknjaz the setup.sh and the entire Sphinx config all come from the work @felixfontein has done with antsibull-docs sphinx-init in ansible-community/antsibull#297 .

Did you mean build.sh?

If you feel that should be changed or done in a different place, I encourage you to propose and/or implement those changes within that tool, as I don't intend on modifying that output all at much. I treat them mostly as generated files (I modified build.sh slightly to not use hardcoded paths).

What you're proposing sounds great and I look forward to it but it might be best worked on with other members of the community; I know Felix has been interested in working on docs build and we were going to work on getting some of my changes into the collections he manages. Maybe it'd be best to skip that and for you to work with him on how best to do it via RTD from the get go?

I still want to send a demo PR here because you cannot demo anything on that repo. It doesn't have to be merged but it'd be a good platform for discussions, a more visual one.
And then, it could be moved to antsibull then.

Also strongly encouraging you to join and attend the Documentation Working Group, as your input would have been valuable while all of this was being worked on.

I wish I had time for this. Maybe I'll join sometimes but often I'm just unable to come 🤷‍♂️

@briantist
Copy link
Collaborator Author

briantist commented Sep 20, 2021

Did you mean build.sh?

Yes :)

I still want to send a demo PR here because you cannot demo anything on that repo. It doesn't have to be merged but it'd be a good platform for discussions, a more visual one.
And then, it could be moved to antsibull then.

Sure, that's ok as long as it doesn't require me signing up for RTD and sending the credentials, it's just something I'm not able to put much time to right now. I suggested working with Felix on one of the collections he maintains as there are a lot of choices, and I know he already has at least one RTD account, and a sample collection build RTD site ( https://github.com/felixfontein/felix-ansible-docsite ), so may be more fruitful than it would be working with me for the time being (if he's available).

Also strongly encouraging you to join and attend the Documentation Working Group, as your input would have been valuable while all of this was being worked on.

I wish I had time for this. Maybe I'll join sometimes but often I'm just unable to come 🤷‍♂️

Of course, I can definitely relate, time is a precious commodity for us all, and I do appreciate you spending some of yours on reviews and suggestions!

@felixfontein
Copy link
Contributor

I know he already has at least one RTD account, and a sample collection build RTD site ( https://github.com/felixfontein/felix-ansible-docsite ),

I don't have an RTD account. That site I'm building locally and uploading on my own server.

@briantist
Copy link
Collaborator Author

I know he already has at least one RTD account, and a sample collection build RTD site ( https://github.com/felixfontein/felix-ansible-docsite ),

I don't have an RTD account. That site I'm building locally and uploading on my own server.

My mistake! Apologies for the assumption.

@webknjaz
Copy link
Member

FWIW you can log in to RTD via GitHub with one click and that's about what you'll need to do 🤷‍♂️

@webknjaz
Copy link
Member

@briantist so I had a few hours and crafted a small demo.

The deployment from the branch in my fork is here: https://community-hashi-vault-ansible-collection-webknjaz-fork.readthedocs.io/en/maintenance-docs-sphinx-ext/ (the URL is this long because I used a long demo project name on RTD + the branch is long too).

Build log: https://readthedocs.org/projects/community-hashi-vault-ansible-collection-webknjaz-fork/builds/15100515/.

Patch: main...webknjaz:maintenance/docs-sphinx-ext?expand=1#diff-cde814ef2f.

Of course, there's area for improvements but I wanted to show the main idea for now.

cc @felixfontein

@briantist
Copy link
Collaborator Author

Hi @webknjaz , we chatted a bit on IRC already, but thank you again for working on that! As you mentioned there, it would be great to see this work integrated into something like antsibull; there is a lot here and at the moment, I don't want to stray too far from what I can produce with the existing init command. It would be great to use RTD for the hosting instead of Surge, but for now the latter is working fine.

To also reiterate some of the points raised in chat, the ultimate purpose of this build process is not really for publishing permanent docs, it's to publish changes in PRs so that can one can see the rendered result of the changes, have some confidence that rST was written correctly (refs, cross-site links, formatting, etc.), and to help highlight when/where code changes resulted in documentation changes (this could happen in docstrings and docfragments for example).

To accomplish that:

  • I need to produce rendered HTML on disk; I think that's possible with what you've shown here (I just don't know how to do it).
  • Rendered HTML is produced in the GitHub workflow for both the target branch and the PR, and then compared.
  • When there are no changes, the docs build does not post a comment nor publish a docsite.
  • The PR comment lists each changed (rendered) file, links it to the published docsite, and shows an expandable diff output of the HTML changes. All of this is about making changes easier to see, proofread, and review.
  • If on a subsequent push there are no changes against the target anymore, the docsite is torn down and comment deleted.

You can see the github workflows for the details of the steps.

I also strongly encourage you to submit a throwaway PR with changes to the docsite so that you can see the build process in action for yourself.

I mention all this because in a previous comment when asked why you so strongly want to change this to use RTD, one of the things you mentioned is:

This means that there would be no need to maintain a GHA+repo-specific custom setup

But I think I perhaps I did not communicate enough that the point of this whole endeavor is post temporary docs in GitHub PRs, not to publish the permanent home of the collection's docs; this is why the place where they are hosted does not matter much, and why I chose a least-friction option.

In the end, I would prefer RTD if it were just as easy, or if the complexity were hidden away/taken care of like the current method is with antsibull. Surge is still working, but it seems like the interface to it is quite old so it may very well disappear one day. If that happens, I will still have docs attached to PR builds as artifacts, still have a way for contributors to generate docs locally with little fuss (this is documented in the docsite), and still have diff output in the PR comment, but the lack of a published website would be very sad.

So, I'm really not discouraging this work (quite the opposite) but I won't be able to use it just yet. I'm really looking forward to it, especially as it relates to other collections being able to use it (RTD will be more familiar and have better trust).

Thanks again for building on and improving this, I'd love to see it make its way into antsibull where it can be supported, and if I can help with that I certainly will where time permits.

@briantist
Copy link
Collaborator Author

briantist commented Dec 25, 2021

In #199, I've run into an issue with the split docsite-only/vs. full build:

  • a new docsite page references two new plugins that are added in the same PR
  • the limited docsite build then emits warnings about the references, since they can't be resolved (the plugins are not included in this buid), which fails the workflow
  • the full docs build works fine

A "quick" resolution I am considering is:

  • the limited docs build failure should not count toward the PR
  • if I can use exit codes to determine whether "failure" was caused only by warnings vs errors, I could continue past the step on warnings and let the limited docs build workflow finish (capturing warnings to show in the PR comment would be nice)

As a longer term fix, I am looking at using the permissions: key in the workflow (better link) to possibly get rid of the split build altogether. The way I envision this working is as follows:

  • the docs build itself would run in a separate job (same workflow), where the elevated permissions granted by the pull_request_target event would be purposefully dropped.
  • That would resolve the concern about executing arbitrary PR code.
  • The PR author cannot change the workflow that is executed to get around that.
  • This job would upload the rendered docsite as an artifact, and would also preserve the diff output that's used in the PR comment (will have to see if this can be done as an output value or will also need to be put into an artifact).
  • A second job (same workflow), depending on the first job, will use its elevated permissions to:
    • publish the site using the surge secret
    • post the PR comment

This ought to get rid of the need for the second job that requires manual approval on every push, which I have to also pay attention to to make sure it doesn't run before limited build (otherwise it will clobber the full build). So this would be better all around.

One thing I'm not sure of with regard to dropping permission, is whether I can actually drop the permission to read secrets. I don't see that as one of the possible scopes listed by GitHub, so I will need to do some experimenting. If I can get this working, it would be a huge step up for this.

EDIT: it looks like the contents scope controls access to secrets (and a lot more). According to that, it looks like write permission is needed to even read secrets, so that should be perfect: we would only need read on contents to build the actual docs.

  • GET /repos/:owner/:repo/actions/secrets (:write)
  • GET /repos/:owner/:repo/actions/secrets/:name (:write)

@briantist
Copy link
Collaborator Author

docs build has moved to a dedicated repository!
https://github.com/ansible-community/github-docs-build/

Further discussion, ideas, and documentation will happen there, in the issues, discussions, and wiki in that repo.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation
Projects
None yet
Development

No branches or pull requests

3 participants