Currently, dc-law-html consists of 800 MB of HTML in ~30k files. Most files change every time the repository is updated (due to recency information updating in ~20k of those files). We are considering ways to reduce the number of files that change to several hundred each time the repository is updated.
One possibility is to copy all files into a new s3 directory and update the origin at build time. Example:
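AWS CloudFront has two origins, one corresponding to each repository. Each origin points to the directory that holds the current commit of the corresponding repo. CloudFront terminates SSL. The AWS bucket stores each commit of a repository in its own directory (e.g. openlawlibrary/dc-law-html/{hash}).
Request paths are routed to the current commit of each repository:
* /dc/council/laws/docs/{remainder} to /dc-law-docs-laws/{current hash}/{remainder} (add the /dc-law-docs-laws/{current hash}/ prefix); get the current hash prefix from the environment variable DC-LAW-DOCS-LAWS
* /{remainder} to /dc-law-html/{current hash}/{remainder} (add the /dc-law-html/{current hash}/ prefix); get the current hash prefix from the environment variable DC-LAW-HTML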
When a new commit is pushed to dc-law-html HEAD:
* Upload dc-law-html HEAD to openlawlibrary/dc-law-html/{hash}, rewriting URLs as we upload.
* Add S3 redirect objects for all paths listed in redirects.json.
* Update the CloudFormation or Terraform template to point to the new hash, and run CloudFormation/Terraform.
* Update Elasticsearch.
* Invalidate the cache.
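The upload and redirect steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: the bucket name is taken from the upload path above, `redirects` is assumed to be a list of (from, to) pairs (the real layout of redirects.json may differ), and the helper names are hypothetical. The functions only build S3 keys and arguments; nothing is uploaded.

```python
from pathlib import PurePosixPath

# Assumption: "openlawlibrary" in the upload path above is the bucket name.
BUCKET = "openlawlibrary"


def object_key(repo, commit_hash, rel_path):
    """S3 key for one file of one commit, e.g. dc-law-html/{hash}/index.html."""
    return str(PurePosixPath(repo) / commit_hash / rel_path.lstrip("/"))


def redirect_puts(repo, commit_hash, redirects):
    """Build keyword arguments for one S3 redirect object per redirect.

    `redirects` is assumed to be an iterable of (from_path, to_path) pairs.
    """
    return [
        {
            "Bucket": BUCKET,
            "Key": object_key(repo, commit_hash, src),
            "Body": b"",  # redirect objects carry no content
            "WebsiteRedirectLocation": dst,
        }
        for src, dst in redirects
    ]
```

Each dictionary could then be passed to boto3's `put_object(**kwargs)`. Note that S3 only honors `WebsiteRedirectLocation` when the bucket is served through its website endpoint, so this would need checking against how the CloudFront origin is configured.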
We should be able to set an environment to track HEAD, a tag or to update to any arbitrary commit.
Notes:
* Once we do this, it should be possible to move to a system where we pull open-law-publish-client in directly through rewrites as well. This will let us differentiate development from preview without doing a whole new build of the HTML.
* Once we change our HTML around a bit to eliminate superfluous changes between versions, we can store files by their hash and store version-path-filehash relations in a key/value store. Then we could transition to a Lambda@Edge function that just looks up the key and rewrites the path to the hash value.
* When we start serving historical versions and diffs, we can just extend the routing lambda function.
* Origins can only add prefixes, not remove them, so we will have to add the original prefix at build time for deeply nested submodules.
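The hash-based storage idea in the notes above could look something like the sketch below. It is a minimal illustration, not a design: plain dictionaries stand in for the key/value store and the bucket, SHA-256 is an assumed choice of hash, and all function names are hypothetical.

```python
import hashlib


def file_hash(content):
    """Content hash used as the storage name of a file."""
    return hashlib.sha256(content).hexdigest()


def add_version(index, blobs, version, files):
    """Record one version; `files` maps request path -> file bytes."""
    for path, content in files.items():
        digest = file_hash(content)
        blobs.setdefault(digest, content)  # identical content is stored once
        index[(version, path)] = digest    # the version-path-filehash relation


def resolve(index, version, path):
    """What the routing lambda would do: look up the key, return the hashed path."""
    digest = index.get((version, path))
    return None if digest is None else f"/{digest}"
```

With this layout, a new version only adds blobs for files whose content actually changed; unchanged files just get a new index entry pointing at the existing hash.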
Above is relatively fleshed out. Comments include more incoherent thoughts and research.
It seems we could significantly simplify the MVP by creating a library to use inside a free CI tool like AppVeyor or Travis. That makes it less turn-key, which is probably a good tradeoff.
Terraform is really easy to learn and has a nicer API than CloudFormation, but it is another third-party tool we would have to add to the stack.
Our shop uses Python, JavaScript and C#, so the tooling will need to be in one of those languages (in that order of preference) so we can maintain it.
Example HTML repository: https://github.com/DCCouncil/dc-law-html/ with submodule https://github.com/DCCouncil/dc-law-docs-laws; we may eventually want to pull `css/` and `js/` from a third repository.
There are a bunch of redirects in https://github.com/DCCouncil/dc-law-html/blob/master/redirects.json. There are a bunch of bulk Elasticsearch index updates in https://github.com/DCCouncil/dc-law-html/blob/master/index.bulk. There are a bunch of programmatic rewrites that need to happen, replacing `~` with `:` (see, e.g., https://github.com/DCCouncil/dc-law-html/blob/master/dc/council/code/sections/28~1-204.html, which becomes https://code.dccouncil.us/dc/council/code/sections/28:1-204.html).
Every time a build occurs, the origins for the repositories are updated to point to the current hashes.
AWS CloudFront calls a lambda function every time it has to hit the origin (not every time it gets a request). The lambda function does the following:
* rewrites `:`s to `~`s
* rewrites `/dc/council/laws/docs/{remainder}` to `/dc-law-docs-laws/{current hash}/{remainder}` (remove the `/dc/council/laws/docs/` prefix, add the `/dc-law-docs-laws/{current hash}/` prefix); get the current hash prefix from the environment variable `DC-LAW-DOCS-LAWS`
* rewrites `/{remainder}` to `/dc-law-html/{current hash}/{remainder}` (add the `/dc-law-html/{current hash}/` prefix); get the current hash prefix from the environment variable `DC-LAW-HTML`
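The routing rules above could be sketched in Python roughly as follows. This is a minimal sketch, not a tested deployment: `rewrite_uri` and `handler` are hypothetical names, the event shape is the standard CloudFront origin-request record, and the hashes are read from the environment variables named above.

```python
import os

DOCS_PREFIX = "/dc/council/laws/docs/"


def rewrite_uri(uri, html_hash, docs_hash):
    """Map a public request path to its location in the bucket."""
    uri = uri.replace(":", "~")  # files are stored with ~ where URLs use :
    if uri.startswith(DOCS_PREFIX):
        # Remove the public prefix, add the docs repository prefix.
        remainder = uri[len(DOCS_PREFIX):]
        return f"/dc-law-docs-laws/{docs_hash}/{remainder}"
    # Everything else is served from the dc-law-html repository.
    return f"/dc-law-html/{html_hash}/{uri.lstrip('/')}"


def handler(event, context):
    """Origin-request handler: rewrite the path, then pass the request on."""
    request = event["Records"][0]["cf"]["request"]
    request["uri"] = rewrite_uri(
        request["uri"],
        html_hash=os.environ["DC-LAW-HTML"],
        docs_hash=os.environ["DC-LAW-DOCS-LAWS"],
    )
    return request
```

For example, `/dc/council/code/sections/28:1-204.html` would be rewritten to `/dc-law-html/{current hash}/dc/council/code/sections/28~1-204.html`. One caveat to verify: as far as we know, Lambda@Edge functions do not support user-defined environment variables, so the hashes may have to be written into the function at deploy time instead.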