
Static hosting Thoughts #1

Open
dgreisen opened this issue Feb 7, 2018 · 1 comment
Labels
question Further information is requested

Comments


dgreisen commented Feb 7, 2018

Our shop uses Python, JavaScript, and C#, so the tooling will need to be written in one of those languages (in that order of preference) so that we can maintain it.

Example HTML repository: https://github.com/DCCouncil/dc-law-html/ with submodule https://github.com/DCCouncil/dc-law-docs-laws; we may eventually want to pull css/ and js/ from a third repository.

Currently, dc-law-html consists of 800 MB of HTML in ~30k files. Most files change every time the repository is updated, because recency information is updated on ~20k of them. We are considering ways to reduce the number of files that change on each update to several hundred.
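Once superfluous changes are eliminated, a build step can detect which files genuinely changed by comparing content digests of two builds. The following is a minimal sketch, assuming both builds are available as local directories; the function names are hypothetical and SHA-256 is an arbitrary choice of digest.

```python
import hashlib
from pathlib import Path


def file_digests(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root, POSIX-style) to its SHA-256 hex digest."""
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def changed_files(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Classify paths as added, removed, or modified between two digest maps."""
    return {
        "added": sorted(p for p in new if p not in old),
        "removed": sorted(p for p in old if p not in new),
        "modified": sorted(p for p in new if p in old and old[p] != new[p]),
    }
```

Only the "added" and "modified" lists would then need uploading, which is where the reduction from ~20k to several hundred files would pay off.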

There are a large number of redirects in https://github.com/DCCouncil/dc-law-html/blob/master/redirects.json and a large number of bulk Elasticsearch index updates in https://github.com/DCCouncil/dc-law-html/blob/master/index.bulk. Paths also need programmatic rewrites that replace ~ with : (e.g., https://github.com/DCCouncil/dc-law-html/blob/master/dc/council/code/sections/28~1-204.html becomes https://code.dccouncil.us/dc/council/code/sections/28:1-204.html).
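The ~ to : rewrite is a simple character substitution in each direction. A minimal sketch, assuming that every ~ in a stored file path corresponds to a : in the public URL and that neither character otherwise appears in paths; the function names are hypothetical.

```python
def file_to_url_path(file_path: str) -> str:
    """Map a repository file path to its public URL path (~ becomes :)."""
    return "/" + file_path.lstrip("/").replace("~", ":")


def url_to_file_path(url_path: str) -> str:
    """Inverse mapping: a public URL path back to the repository file path (: becomes ~)."""
    return url_path.lstrip("/").replace(":", "~")
```

For example, `file_to_url_path("dc/council/code/sections/28~1-204.html")` gives `/dc/council/code/sections/28:1-204.html`, the path served under https://code.dccouncil.us.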

One possibility is to copy all files into a new S3 directory and update the origin at build time. Example:

AWS CloudFront has two origins, one for each repository; each origin points to the directory holding the current commit of the corresponding repository. CloudFront terminates SSL. The S3 bucket has the following directory structure:

openlawlibrary/dc-law-html
  /71e2c192ed7213b6ededb6f2f268d0b20ccabba5
    /...
  /e038e33f96442738782f099e60bfa4e056585aec
    /...
  /...
openlawlibrary/dc-law-docs-laws
  /df0ddb837ad84c775ec1ab9988f45a0c7e23efe0
    /...
  /219704d51bb7f036b2d25fd38cad9ad62f081795
    /...
  /...
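The layout above can be captured in one key-building helper. This is a sketch, assuming `openlawlibrary` is the bucket name and that commits are full 40-character SHA-1 hashes as in the listing; `s3_key` is a hypothetical name.

```python
import posixpath
import re

BUCKET = "openlawlibrary"  # assumption: bucket name read off the layout above
_FULL_SHA1 = re.compile(r"[0-9a-f]{40}")


def s3_key(repo: str, commit: str, rel_path: str) -> str:
    """Build the object key <repo>/<commit>/<path> for a file from `repo` at `commit`."""
    if not _FULL_SHA1.fullmatch(commit):
        raise ValueError(f"expected a full 40-character commit hash, got {commit!r}")
    return posixpath.join(repo, commit, rel_path.lstrip("/"))
```

Requiring the full hash keeps every build in its own immutable directory, so switching versions never overwrites files that are still being served.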

Every time a build occurs, the origins for the repositories are updated to point to the current hashes.
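Updating the origins amounts to regenerating the distribution's `OriginPath` values from the current hashes. The sketch below builds a CloudFormation template as a plain Python dict (troposphere, mentioned in the comments, would produce the same JSON). It assumes a hypothetical bucket domain and hypothetical origin IDs, and it routes `/dc/council/laws/docs/*` to the docs origin via a cache behavior; it is illustrative, not a complete production distribution.

```python
import json

BUCKET_DOMAIN = "openlawlibrary.s3.amazonaws.com"  # hypothetical bucket domain


def _origin(repo: str, commit: str) -> dict:
    """One S3 origin whose OriginPath is the commit directory for `repo`."""
    return {
        "Id": repo,
        "DomainName": BUCKET_DOMAIN,
        "OriginPath": f"/{repo}/{commit}",
        "S3OriginConfig": {"OriginAccessIdentity": ""},
    }


def distribution_template(html_hash: str, docs_hash: str) -> dict:
    """CloudFormation template whose two origins point at the given commit directories."""
    behavior = {"ViewerProtocolPolicy": "redirect-to-https",
                "ForwardedValues": {"QueryString": False}}
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            "LawDistribution": {
                "Type": "AWS::CloudFront::Distribution",
                "Properties": {
                    "DistributionConfig": {
                        "Enabled": True,
                        "Origins": [
                            _origin("dc-law-html", html_hash),
                            _origin("dc-law-docs-laws", docs_hash),
                        ],
                        "DefaultCacheBehavior": {"TargetOriginId": "dc-law-html", **behavior},
                        "CacheBehaviors": [
                            {"PathPattern": "/dc/council/laws/docs/*",
                             "TargetOriginId": "dc-law-docs-laws", **behavior}
                        ],
                    }
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(distribution_template(
        "71e2c192ed7213b6ededb6f2f268d0b20ccabba5",
        "df0ddb837ad84c775ec1ab9988f45a0c7e23efe0"), indent=2))
```

Because CloudFront prepends `OriginPath` to the full request path, a request for `/dc/council/laws/docs/x` reaches the docs origin as `/dc-law-docs-laws/{hash}/dc/council/laws/docs/x`; this is the prefix problem raised in the notes below.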

AWS CloudFront calls a Lambda function every time it has to hit the origin (not on every request). The Lambda function does the following:

  1. rewrite all :s to ~s
  2. rewrite URLs:
    * /dc/council/laws/docs/{remainder} to /dc-law-docs-laws/{current hash}/{remainder}
      • (remove the /dc/council/laws/docs/ prefix, add the /dc-law-docs-laws/{current hash}/ prefix)
      • get the current hash from the environment variable DC-LAW-DOCS-LAWS
    * /{remainder} to /dc-law-html/{current hash}/{remainder}
      • (add the /dc-law-html/{current hash}/ prefix)
      • get the current hash from the environment variable DC-LAW-HTML
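The two routing steps can be sketched as a Python origin-request handler that reads and returns the CloudFront request object. One caveat: Lambda@Edge functions cannot use environment variables, so this sketch takes the hashes from a hypothetical module-level `CURRENT_HASHES` dict that a deploy step would fill in (the values shown are placeholders copied from the bucket listing). `rewrite_uri` and `CURRENT_HASHES` are illustrative names.

```python
# Hypothetical: written into the function package at deploy time,
# because Lambda@Edge does not support environment variables.
CURRENT_HASHES = {
    "dc-law-html": "71e2c192ed7213b6ededb6f2f268d0b20ccabba5",
    "dc-law-docs-laws": "df0ddb837ad84c775ec1ab9988f45a0c7e23efe0",
}

DOCS_PREFIX = "/dc/council/laws/docs/"


def rewrite_uri(uri: str, hashes: dict[str, str]) -> str:
    """Step 1: ':' -> '~'. Step 2: map the path onto the current commit directory."""
    uri = uri.replace(":", "~")
    if uri.startswith(DOCS_PREFIX):
        remainder = uri[len(DOCS_PREFIX):]  # drop the /dc/council/laws/docs/ prefix
        return f"/dc-law-docs-laws/{hashes['dc-law-docs-laws']}/{remainder}"
    remainder = uri.lstrip("/")
    return f"/dc-law-html/{hashes['dc-law-html']}/{remainder}"


def handler(event, context):
    """Origin-request trigger: rewrite the URI, then let CloudFront forward the request."""
    request = event["Records"][0]["cf"]["request"]
    request["uri"] = rewrite_uri(request["uri"], CURRENT_HASHES)
    return request
```

Keeping `rewrite_uri` free of AWS types means the routing rules can be unit-tested locally and extended later for historical versions and diffs.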

When a new commit is pushed to dc-law-html HEAD:

  1. upload dc-law-html HEAD to openlawlibrary/dc-law-html/{hash}, rewriting URLs as we upload
  2. add S3 redirect objects for all paths listed in redirects.json
  3. update the CloudFormation or Terraform template to point to the new hash, and run CloudFormation/Terraform
  4. update Elasticsearch
  5. invalidate the CloudFront cache
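Steps 1, 2 and 5 map onto standard boto3 calls (`upload_file`, `put_object` with `WebsiteRedirectLocation`, and CloudFront `create_invalidation`). The sketch below separates pure planning helpers from the AWS calls. It assumes redirects.json maps an old path to a new target, which is a guess about that file's format; it omits steps 3 and 4 and the URL rewriting during upload, whose exact rules are not specified here. S3 only honors redirect metadata when the bucket is served through its static website endpoint.

```python
import mimetypes
import time
from pathlib import Path


def plan_uploads(build_dir: Path, repo: str, commit: str) -> list[tuple[Path, str]]:
    """Step 1: pair every built file with its key <repo>/<commit>/<relative path>."""
    return [
        (path, f"{repo}/{commit}/{path.relative_to(build_dir).as_posix()}")
        for path in sorted(build_dir.rglob("*"))
        if path.is_file()
    ]


def plan_redirects(redirects: dict[str, str], repo: str, commit: str) -> list[dict]:
    """Step 2: one empty object per redirect (assumes old path -> "/new/path" or URL)."""
    return [
        {"Key": f"{repo}/{commit}/{old.lstrip('/')}", "Body": b"",
         "WebsiteRedirectLocation": new}
        for old, new in sorted(redirects.items())
    ]


def invalidation_batch(paths: list[str]) -> dict:
    """Step 5: the InvalidationBatch structure passed to CloudFront's CreateInvalidation."""
    return {
        "Paths": {"Quantity": len(paths), "Items": paths},
        "CallerReference": str(time.time()),  # must be unique per request
    }


def deploy(build_dir: Path, redirects: dict[str, str], repo: str, commit: str,
           bucket: str, distribution_id: str) -> None:
    """Run steps 1, 2 and 5 against AWS (steps 3 and 4 not shown)."""
    import boto3  # deferred so the planning helpers work without boto3 installed

    s3 = boto3.client("s3")
    for path, key in plan_uploads(build_dir, repo, commit):
        # Set Content-Type explicitly so HTML is served as text/html, not downloaded.
        content_type = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
        s3.upload_file(str(path), bucket, key, ExtraArgs={"ContentType": content_type})
    for obj in plan_redirects(redirects, repo, commit):
        s3.put_object(Bucket=bucket, **obj)
    boto3.client("cloudfront").create_invalidation(
        DistributionId=distribution_id,
        InvalidationBatch=invalidation_batch(["/*"]),
    )
```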

We should be able to configure an environment to track HEAD, track a tag, or pin to any arbitrary commit.
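Resolving what an environment tracks to a concrete hash can be done from `git ls-remote` output, which lists one `<hash><TAB><ref>` pair per line, with an extra peeled `^{}` entry for annotated tags. A sketch under those assumptions; `resolve` and the `track` convention ("HEAD", a tag name, or a full hash) are hypothetical.

```python
import subprocess


def parse_ls_remote(output: str) -> dict[str, str]:
    """Parse `git ls-remote` output into {ref: hash}."""
    refs = {}
    for line in output.splitlines():
        if line.strip():
            sha, ref = line.split("\t", 1)
            refs[ref] = sha
    return refs


def resolve(refs: dict[str, str], track: str) -> str:
    """Resolve an environment's setting: "HEAD", a tag name, or a full commit hash."""
    if track == "HEAD":
        return refs["HEAD"]
    tag = f"refs/tags/{track}"
    if tag in refs:
        # Annotated tags point at a tag object; the peeled "^{}" entry is the commit.
        return refs.get(tag + "^{}", refs[tag])
    if len(track) == 40 and all(c in "0123456789abcdef" for c in track):
        return track  # an arbitrary commit, pinned as-is
    raise ValueError(f"cannot resolve {track!r}")


def remote_refs(url: str) -> dict[str, str]:
    """Fetch refs from a remote; requires git on PATH and network access."""
    result = subprocess.run(["git", "ls-remote", url],
                            check=True, capture_output=True, text=True)
    return parse_ls_remote(result.stdout)
```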
Notes:

  • Once we do this, it should be possible to pull in open-law-publish-client directly through rewrites as well. This would let us distinguish development from preview without doing a whole new HTML build.
  • Once we change our HTML a bit to eliminate superfluous changes between versions, we can store files by their hash and store version-path-filehash relations in a key/value store. Then we could transition to a Lambda@Edge function that just looks up the key and rewrites the path to the hash value.
  • When we start serving historical versions and diffs, we can just extend the routing Lambda function.
  • Origins can only add prefixes, not remove them, so we will have to add the original prefix at build time for deeply nested submodules.
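The content-addressed idea in the second note can be sketched with in-memory dicts standing in for the key/value store and the object bucket. The `{version}/{path}` key format and the `/objects/{hash}` object path are hypothetical choices for illustration.

```python
import hashlib


def build_manifest(version: str, files: dict[str, bytes]) -> tuple[dict[str, str], dict[str, bytes]]:
    """For one version, return (kv entries "version/path" -> hash, objects hash -> content)."""
    entries, objects = {}, {}
    for path, content in files.items():
        digest = hashlib.sha256(content).hexdigest()
        entries[f"{version}/{path}"] = digest
        objects[digest] = content  # identical files in different versions share one object
    return entries, objects


def route(kv: dict[str, str], version: str, path: str) -> str:
    """What the edge function would do: look up the key and rewrite to the object path."""
    key = f"{version}/{path}"
    return f"/objects/{kv[key]}"
```

With this layout, a new version uploads only the objects whose hashes are new, and the routing function's only per-request work is one key lookup.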
dgreisen added the question (Further information is requested) label on Feb 7, 2018

dgreisen commented Feb 7, 2018

The above is relatively fleshed out. The comments contain rougher thoughts and research.

  • It seems we could significantly simplify the MVP by building a library to run inside a free CI tool such as AppVeyor or Travis, at the cost of making it less turnkey; probably a good tradeoff.
  • Terraform is really easy to learn and has a nicer API than CloudFormation, but it adds another third-party tool to the stack.
  • https://github.com/cloudtools/troposphere looks like a fairly robust tool for rapidly and composably building CloudFormation JSON structures in Python.
