Page Capture S3

A NodeJS Lambda that can capture a single HTML page and all of the associated files that are served from the same domain. The lambda can push these files to a designated S3 bucket. This is useful if you have a page being dynamically generated and want to periodically render it to static files that can be served to the web directly out of S3.

For example, capturing the home page at www.bu.edu will also pull in and relink associated images, CSS, JS, etc. that are served from the www.bu.edu/ domain. External links will be left as is.
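The same-domain rule described above could be expressed roughly as follows. This is only an illustrative sketch: `isSameDomain` is a hypothetical helper, not a function taken from handler.js, and comparing hostnames is an assumption about what "same domain" means here.

```javascript
// Hypothetical helper: decide whether an asset should be captured with the
// page (same hostname) or left untouched as an external link.
function isSameDomain(assetUrl, pageUrl) {
  // Resolve relative asset paths against the page URL before comparing.
  const asset = new URL(assetUrl, pageUrl);
  const page = new URL(pageUrl);
  return asset.hostname === page.hostname;
}
```

Under this sketch, `/images/logo.png` referenced from `https://www.bu.edu/` counts as same-domain, while an asset on a different hostname does not.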

The lambda is triggered by a secure API gateway interface using an API key.

The function includes error checking, and will cancel the download of any asset that does not return a status code 200 (OK). If the root page capture fails, the entire capture is cancelled and no changes will be made to the current contents of the S3 bucket.
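The error-checking behaviour described above can be summarised in a few lines. This is a minimal sketch, assuming each fetched resource is represented as `{ url, status, isRoot }`; that shape and the name `selectUploads` are hypothetical and not taken from handler.js.

```javascript
// Hypothetical result shape: { url, status, isRoot }.
// Returns the list of resources that should be uploaded to S3.
function selectUploads(results) {
  const root = results.find((r) => r.isRoot);
  // If the root page is missing or did not return 200, cancel everything
  // so the current bucket contents stay untouched.
  if (!root || root.status !== 200) {
    return [];
  }
  // Otherwise drop only the individual assets that failed.
  return results.filter((r) => r.status === 200);
}
```

An empty list means nothing is uploaded, which corresponds to the cancelled capture case.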

How to configure

The capture URL, S3 bucket name, and S3 bucket path are configurable by setting values in a config.yml file. Also, the captured assets can be stored in a separate subdirectory, if one is specified in the config.

CAPTURE_URL sets the URL to the page to be captured.
S3_BUCKET_NAME sets the destination S3 bucket for the captured static files.
S3_PATH sets a path within the bucket for the capture directory. If blank, the root of the bucket will be used.
SUBDIR_PREFIX sets the name of the sub-directory used to store the assets (use the directory name only, no trailing slash). If blank, assets will be stored at the root.

When installing the Lambda, copy the config.example.yml to a config.yml file and customize the values. Once installed, they are also available as environment variables in the running Lambda and can be further adjusted from there.
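To illustrate how these settings might combine into a destination key, here is a hedged sketch. The setting names come from the list above, but `buildObjectKey` is a hypothetical helper, and the way the real handler joins the values is an assumption.

```javascript
// Hypothetical helper: build an S3 object key from the config values.
// env is expected to carry S3_PATH and SUBDIR_PREFIX (either may be blank).
function buildObjectKey(env, relativePath, isAsset) {
  const parts = [env.S3_PATH];
  // Assumption: only assets go under the sub-directory, not the page itself.
  if (isAsset) {
    parts.push(env.SUBDIR_PREFIX);
  }
  parts.push(relativePath);
  return parts
    .filter(Boolean) // skip blank or unset settings
    .map((p) => p.replace(/^\/+|\/+$/g, '')) // trim leading/trailing slashes
    .filter(Boolean)
    .join('/');
}
```

Inside the running Lambda this could be called as `buildObjectKey(process.env, 'index.html', false)`; with both settings blank the key is just the relative path, which matches the "root of the bucket" behaviour described above.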

How to install

Page Capture S3 uses the serverless framework to manage the API gateway and Lambda infrastructure components. It assumes an existing S3 bucket for the static files.

First install or update the serverless CLI tool, which can generally be done using npm like this:

npm install -g serverless

There must also be existing AWS CLI credentials for the account where the Lambda will be installed (generally set up with aws configure).

Once the serverless cli is installed, the Lambda infrastructure can be provisioned on AWS using the deploy command:

serverless deploy

This will package the Lambda code, compile the serverless.yml infrastructure directives to a CloudFormation template, and install a CloudFormation stack in AWS. The deploy command will also return the following values:

the stack name in CloudFormation
an API key named capturePingKey: this is necessary to trigger the private API gateway
an endpoint URL for the API gateway: together with the API key, this can be used to trigger the capture Lambda

How to trigger

The Lambda can be triggered by an HTTPS request to the gateway endpoint that includes the key in a header named x-api-key.

For example:

curl --header "x-api-key: <key>" https://<gateway url>

Here is a PHP example using the wp_remote_get() function inside WordPress:

$api_key = '<key>';
$api_url = '<gateway url>';

// Send a GET request to the gateway with the API key in the x-api-key header.
$response = wp_remote_get( $api_url, array(
	'timeout' => 30,
	'headers' => array( 'x-api-key' => $api_key ),
) );
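The same request can also be sketched in Node.js using only the built-in https module. The function names below are hypothetical, and the 30 second timeout simply mirrors the PHP example; only the x-api-key header name comes from this README.

```javascript
const https = require('https');

// Build the request options for the gateway endpoint.
function buildTriggerOptions(gatewayUrl, apiKey) {
  const url = new URL(gatewayUrl);
  return {
    method: 'GET',
    hostname: url.hostname,
    path: url.pathname + url.search,
    headers: { 'x-api-key': apiKey },
    timeout: 30000, // milliseconds
  };
}

// Send the request and resolve with the HTTP status code of the response.
function triggerCapture(gatewayUrl, apiKey) {
  return new Promise((resolve, reject) => {
    const req = https.request(buildTriggerOptions(gatewayUrl, apiKey), (res) => {
      res.resume(); // discard the response body
      res.on('end', () => resolve(res.statusCode));
    });
    // Abort the request if the socket stays idle past the timeout.
    req.on('timeout', () => req.destroy(new Error('request timed out')));
    req.on('error', reject);
    req.end();
  });
}
```

Usage would look like `triggerCapture('https://<gateway url>', '<key>').then(console.log)`, with the placeholders replaced by the values returned by the deploy command.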

How to monitor

CloudWatch logs are provisioned along with the Lambda. A console link to the CloudWatch events is available in the Resources tab of the CloudFormation stack. Recent logs are also available through the serverless cli like this:

serverless logs --function capture

This is also available through a yarn run command:

yarn log

Local testing

The capture Lambda can be simulated locally by the serverless cli with this command:

serverless invoke local --function capture

Or using the yarn run shortcut:

yarn local

The local testing simulation isn't perfect; specifically the upload callback doesn't seem to correctly fire, and the files in /tmp/page-capture are not correctly deleted.
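Until that is resolved, leftover files can be cleared by hand after a local run. The snippet below is one possible sketch: `cleanCaptureDir` is a hypothetical helper that is not part of this project, and it assumes a local Node.js version that provides `fs.rmSync` (14.14 or later).

```javascript
const fs = require('fs');

// Hypothetical helper: remove leftover capture files after a local run.
function cleanCaptureDir(dir = '/tmp/page-capture') {
  // force: true makes this a no-op when the directory does not exist.
  fs.rmSync(dir, { recursive: true, force: true });
}
```

Calling `cleanCaptureDir()` with no argument targets the /tmp/page-capture directory mentioned above.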

How to remove

The Lambda and all of its associated resources can be removed by deleting the CloudFormation stack. The existing S3 bucket specified for the static assets will not be affected.

The CloudFormation stack can also be removed using the serverless cli:

serverless remove
