
Rails app to organize downloaded web archiving data and trigger preassembly/accessioning when appropriate


sul-dlss/was-registrar-app



WAS Registrar App


The WAS Registrar App (WRA) is a Rails application that:

  • Allows a web archivist to update configuration and schedule web archive collections to be fetched.
  • Allows a web archivist to monitor fetch workflow outcomes.
  • Initiates web archive fetch workflows according to schedule.
  • Maintains state for web archive collections.

WAS Registrar App is the successor to the Web Registrar.

Requirements

  • Ruby 3.4.1
  • Docker / Docker Compose (optional)
  • npm for building assets
  • Java 8 (or a compatible version) for the WASAPI Downloader

Setup

To use the Postgres container (instead of a local Postgres):

docker compose up -d db

Set up the database:

RAILS_ENV=test rake db:create db:migrate

To use the Redis container (instead of a local Redis):

docker compose up -d redis

Install WASAPI Downloader

Note: The WASAPI Downloader is not typically needed for development; it is necessary for running fetches.

curl -L https://github.com/sul-dlss/wasapi-downloader/releases/download/v1.1.1/wasapi-downloader.zip > wasapi-downloader.zip
unzip wasapi-downloader.zip

If you install the downloader in a different location, update its path in the settings accordingly.

Tests

bin/rails test:prepare
bundle exec rubocop
bundle exec rspec

Run with Docker

docker compose up -d db
docker compose run web rake db:setup
docker compose up -d

The app will now be available on http://localhost:3000.

As you make changes (e.g., to gems), you will need to rebuild the web container:

docker compose stop web
docker compose build web
docker compose up -d

Run locally

First install foreman (foreman should not be added to the Gemfile; see this wiki article):

gem install foreman

Then run:

bin/dev

This starts CSS/JS bundling and the development server.

Alternatively, you can use docker compose:

docker compose up -d

To run the web container in interactive mode, with live output shown in your terminal, stop it first and then run it with its service ports:

docker compose stop web
docker compose run --service-ports web

Background processing

Background processing is performed by Sidekiq.

Sidekiq can be monitored from /queues.

For more information on configuring and deploying Sidekiq, see this doc.

To run a Sidekiq worker locally:

bundle exec sidekiq

Deploying

To deploy to stage: bundle exec cap stage deploy

To deploy to production: bundle exec cap prod deploy

Auditing

To audit the WARCs that have been accessioned in SDR against the WARCs available from a WASAPI provider, use an audit rake task:

  • For a collection that is configured in WRA: bin/rake audit_collection['<collection druid>']
  • For a collection that is not configured in WRA: bin/rake audit['<collection_druid>','<wasapi_collection_id>','<wasapi_account>','<embargo_months>']

For example:

RAILS_ENV=production bin/rake audit_collection['druid:hw105qf0103']
RAILS_ENV=production bin/rake audit['druid:gq319xk9269','14373','shl','1']

The task returns a list of WARC filenames that are available from the WASAPI provider but have not been accessioned. It respects embargoes and excludes WARCs from the current month.
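As a rough illustration of the filtering described above, the Ruby sketch below compares a provider's WARC listing with the set of accessioned filenames. The `WarcFile` struct and `unaccessioned_warcs` method are hypothetical, and the embargo rule (skip WARCs crawled after today minus `embargo_months`) is an assumption; the real rake task may compute embargoes differently.

```ruby
require "date"
require "set"

# Hypothetical record for one WARC listed by a WASAPI provider.
WarcFile = Struct.new(:filename, :crawl_date)

# Returns the sorted filenames that are available from the provider but have
# not been accessioned. Assumed rules (not taken from the app's code):
#   * WARCs crawled in the current month are skipped.
#   * WARCs crawled after (today - embargo_months) are still embargoed.
def unaccessioned_warcs(available, accessioned_filenames, embargo_months:, today: Date.today)
  accessioned = accessioned_filenames.to_set
  current_month_start = Date.new(today.year, today.month, 1)
  embargo_cutoff = today << embargo_months # Date#<< steps back N months

  available
    .reject { |warc| accessioned.include?(warc.filename) }
    .reject { |warc| warc.crawl_date >= current_month_start }
    .reject { |warc| warc.crawl_date > embargo_cutoff }
    .map(&:filename)
    .sort
end

today = Date.new(2025, 2, 17)
available = [
  WarcFile.new("a.warc.gz", Date.new(2024, 11, 3)),  # already accessioned
  WarcFile.new("b.warc.gz", Date.new(2024, 12, 20)), # missing and releasable
  WarcFile.new("c.warc.gz", Date.new(2025, 1, 25)),  # inside a 1-month embargo
  WarcFile.new("d.warc.gz", Date.new(2025, 2, 2))    # current month
]
p unaccessioned_warcs(available, ["a.warc.gz"], embargo_months: 1, today: today)
# => ["b.warc.gz"]
```

Passing `today:` explicitly keeps the month and embargo boundaries deterministic, which makes the logic easy to check against a known listing.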

Remediating

To fetch and initiate a one-time registration for missing WARCs (based on the auditing procedure described above), use a remediate rake task:

  • For a collection that is configured in WRA: bin/rake remediate_collection['<collection druid>']
  • For a collection that is not configured in WRA: bin/rake remediate['<collection_druid>','<wasapi_collection_id>','<wasapi_account>','<embargo_months>']

For example:

RAILS_ENV=production bin/rake remediate_collection['druid:hw105qf0103']
RAILS_ENV=production bin/rake remediate['druid:gq319xk9269','14373','shl','1']
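The audit and remediate tasks take their arguments in Rake's `task[arg1,arg2]` form. In bash the single quotes in the examples above are removed before Rake sees them, but unquoted square brackets are glob characters, and shells such as zsh fail with "no matches found" when nothing matches. The `rake_command` helper below is a hypothetical sketch, not part of the app, that builds a copy-pasteable command using Ruby's standard `Shellwords`.

```ruby
require "shellwords"

# Hypothetical helper: builds a shell-safe rake command line for a task that
# takes positional arguments, e.g. audit[druid,collection_id,account,months].
# Rake splits the bracketed arguments on commas, so this sketch rejects
# arguments containing commas rather than trying to escape them.
def rake_command(task, *args, env: "production")
  if args.any? { |arg| arg.to_s.include?(",") }
    raise ArgumentError, "rake arguments must not contain commas"
  end

  task_with_args = args.empty? ? task : "#{task}[#{args.join(',')}]"
  # Escaping the brackets keeps the shell from treating them as a glob pattern.
  ["RAILS_ENV=#{env}", "bin/rake", Shellwords.escape(task_with_args)].join(" ")
end

puts rake_command("audit", "druid:gq319xk9269", "14373", "shl", "1")
puts rake_command("remediate_collection", "druid:hw105qf0103")
```

Escaping the whole task string, instead of quoting each argument, means Rake receives exactly `audit[druid:gq319xk9269,14373,shl,1]`, which is what bash delivers for the unescaped examples.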

Reset Process (for QA/Stage)

Requirements

Steps

  1. Stop the Redis queues: https://was-registrar-app-stage.stanford.edu/queues/
  2. Reset the database, including seeding.
  3. Verify that the default collection has been created and that no jobs are reported.
  4. Run the web_archiving_accessioning_spec integration test (bundle exec rspec spec/features/web_archiving_accessioning_spec.rb) and verify that a One-time WARC is created.
  5. Verify that https://library.stanford.edu/sites/all/themes/sulair2016/logo.svg is indexed: https://swap-stage.stanford.edu/was/*/https://library.stanford.edu/sites/all/themes/sulair2016/logo.svg
