Add functionality to CLI
Why these changes are being introduced:
* Given the application's simplicity, the majority of the application's functionality will reside in the CLI.

How this addresses that need:
* Update cli.py with the application's functionality
* Update dependencies
* Update README.md

Side effects of this change:
* None

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/IN-1093
ehanson8 committed Nov 20, 2024
1 parent 5ce4192 commit 8b1ebd9
Showing 7 changed files with 558 additions and 68 deletions.
8 changes: 5 additions & 3 deletions Pipfile
@@ -4,19 +4,21 @@ verify_ssl = true
name = "pypi"

[packages]
archivessnake = "*"
boto3 = "*"
click = "*"
pyairtable = "*"
sentry-sdk = "*"
boto3 = "*"

[dev-packages]
black = "*"
boto3-stubs = "*"
coveralls = "*"
moto = "*"
mypy = "*"
pre-commit = "*"
pytest = "*"
ruff = "*"
boto3-stubs = "*"
moto = "*"

[requires]
python_version = "3.12"
446 changes: 420 additions & 26 deletions Pipfile.lock

Large diffs are not rendered by default.

46 changes: 15 additions & 31 deletions README.md
@@ -1,28 +1,14 @@
# python-cli-template

A template repository for creating Python CLI applications.

## App Setup (delete this section and above after initial application setup)

1. Rename "asati" to the desired app name across the repo. (May be helpful to do a project-wide find-and-replace).
2. Update Python version if needed.
3. Install all dependencies with `make install` to create initial Pipfile.lock with latest dependency versions.
4. Add initial app description to README and update initial required ENV variable documentation as needed.
5. Update license if needed (check app-specific dependencies for licensing terms).
6. Check Github repository settings:
   - Confirm repo branch protection settings are correct (see [dev docs](https://mitlibraries.github.io/guides/basics/github.html) for details)
   - Confirm that all of the following are enabled in the repo's code security and analysis settings:
     - Dependabot alerts
     - Dependabot security updates
     - Secret scanning
7. Create a Sentry project for the app if needed (we want this for most apps):
   - Send initial exceptions to Sentry project for dev, stage, and prod environments to create them.
   - Create an alert for the prod environment only, with notifications sent to the appropriate team(s).
   - If *not* using Sentry, delete Sentry configuration from config.py and test_config.py, and remove sentry_sdk from project dependencies.
# archivesspace-airtable-integration
An application for updating Airtable with ArchivesSpace data.


# asati
The application extracts data from newly-created accession records in ArchivesSpace on a daily basis. It uses an SSM parameter (`/apps/asati/last-accession-uri-id`) to track the last accession that was added to Airtable via the ID in the URI (e.g. "/repositories/2/accessions/**123**"). The application accesses the SSM parameter at the start of each run and calls the ArchivesSpace API's `accessions` endpoint with the `?all_ids=true` parameter to check if a higher ID number has been created. ArchivesSpace increments the IDs and does not reuse IDs, making this a safe method for checking if new accessions have been created.

If no new accessions were created, the run ends. If a new accession has been created, the accession record is retrieved and the necessary data is posted as a new row in Airtable. This is not expected to be a high-volume application: at most a few accessions are expected to be created on a given day, and most days will not have any new accessions.
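
The check described above can be sketched as a small pure function. This is an illustration only, not the code in `asati/cli.py`: the name `select_new_accession_ids` is hypothetical, and it assumes the `all_ids=true` response is a list of integers and the SSM parameter value is a numeric string.

```python
def select_new_accession_ids(all_ids: list[int], last_processed: str) -> list[int]:
    """Return the IDs greater than the last processed ID, in ascending order."""
    threshold = int(last_processed)  # SSM parameter values are stored as strings
    return sorted(i for i in all_ids if i > threshold)


print(select_new_accession_ids([5678, 1234, 42], "1234"))  # [5678]
print(select_new_accession_ids([1234], "1234"))  # []
```

An empty result means the run can end early. Filtering this way also handles an empty ID list, whereas calling `max()` on an empty list raises `ValueError`.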

Sentry will be used for exception monitoring and the application is expected to immediately fail if there is an error. Given the simple structure and logging at potential failure points, troubleshooting should be quick if the application does fail. Furthermore, the application can pick up where it left off on the next daily run, given that the SSM parameter is updated after each row is added to Airtable.
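
The resume behavior can be illustrated with a minimal sketch in which an in-memory dict stands in for the SSM parameter. The helper `process_with_checkpoint` and the simulated failure are hypothetical and exist only to show why saving the checkpoint after each row lets the next run continue from the last success.

```python
from collections.abc import Callable


def process_with_checkpoint(
    ids: list[int],
    handle: Callable[[int], None],
    save_checkpoint: Callable[[int], None],
) -> None:
    """Process IDs in ascending order, saving a checkpoint after each success."""
    for item_id in sorted(ids):
        handle(item_id)  # may raise; the checkpoint is then left unchanged
        save_checkpoint(item_id)


store = {"last_id": 100}  # stand-in for the SSM parameter
rows: list[int] = []  # stand-in for rows created in Airtable
all_ids = [101, 102, 103, 104]


def failing_on_103(item_id: int) -> None:
    if item_id == 103:
        raise RuntimeError("simulated failure")
    rows.append(item_id)


# First run: fails immediately on 103, so the checkpoint stays at 102.
try:
    process_with_checkpoint(
        [i for i in all_ids if i > store["last_id"]],
        failing_on_103,
        lambda i: store.update(last_id=i),
    )
except RuntimeError:
    pass

# Next run: only IDs above the saved checkpoint are processed.
process_with_checkpoint(
    [i for i in all_ids if i > store["last_id"]],
    rows.append,
    lambda i: store.update(last_id=i),
)
print(store["last_id"], rows)  # 104 [101, 102, 103, 104]
```

One consequence of this ordering is that a failure between creating a row and saving the checkpoint would cause that row to be created again on the next run.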

Description of the app

## Development

@@ -40,16 +26,14 @@ Description of the app
```shell
SENTRY_DSN=### If set to a valid Sentry DSN, enables Sentry exception monitoring. This is not needed for local development.
WORKSPACE=### Set to `dev` for local development, this will be set to `stage` and `prod` in those environments by Terraform.
LAST_ACCESSION_PARAMETER=### The name of the SSM parameter that stores the ID from the URI of the last processed accession.
ARCHIVESSPACE_URL=### The URL of the ArchivesSpace instance to use. Given this app is read-only, the production instance will be used.
ARCHIVESSPACE_USER=### The username to use for authenticating to ArchivesSpace.
ARCHIVESSPACE_PASSWORD=### The password to use for authenticating to ArchivesSpace.
AIRTABLE_TOKEN=### The token to use for authenticating to Airtable.
AIRTABLE_BASE_ID=### The Airtable base containing the table to be updated.
AIRTABLE_TABLE_NAME=### The Airtable table to be updated.
```
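
Because `asati/cli.py` reads these values with `os.environ[...]`, a missing variable surfaces as a `KeyError` at the point of use. A hedged sketch of an up-front check follows; `REQUIRED_ENV_VARS` and `missing_env_vars` are hypothetical names that are not part of this commit, and `SENTRY_DSN` is treated as optional here because it is not needed for local development.

```python
import os
from collections.abc import Mapping

# Assumption: every variable documented above except SENTRY_DSN is required.
REQUIRED_ENV_VARS = (
    "WORKSPACE",
    "LAST_ACCESSION_PARAMETER",
    "ARCHIVESSPACE_URL",
    "ARCHIVESSPACE_USER",
    "ARCHIVESSPACE_PASSWORD",
    "AIRTABLE_TOKEN",
    "AIRTABLE_BASE_ID",
    "AIRTABLE_TABLE_NAME",
)


def missing_env_vars(environ: Mapping[str, str] = os.environ) -> list[str]:
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_ENV_VARS if not environ.get(name)]


# Example with an explicit mapping instead of the real environment.
example_env = {name: "value" for name in REQUIRED_ENV_VARS if name != "AIRTABLE_TOKEN"}
print(missing_env_vars(example_env))  # ['AIRTABLE_TOKEN']
```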

### Optional

_Delete this section if it isn't applicable to the PR._

```shell
<OPTIONAL_ENV>=### Description for optional environment variable
```




66 changes: 65 additions & 1 deletion asati/cli.py
@@ -1,10 +1,14 @@
import logging
import os
from datetime import timedelta
from time import perf_counter

import click
import pyairtable
from asnake.client import ASnakeClient # type: ignore[import-untyped]

from asati.config import configure_logger, configure_sentry
from asati.utils import SSMClient, parse_accession_number, parse_extent_data

logger = logging.getLogger(__name__)

@@ -20,7 +24,67 @@ def main(*, verbose: bool) -> None:
    logger.info(configure_sentry())
    logger.info("Running process")

    # Do things here!
    ssm_client = SSMClient()
    last_accession_parameter = os.environ["LAST_ACCESSION_PARAMETER"]
    last_accession_uri_id = ssm_client.get_parameter(last_accession_parameter)
    logger.info(
        f"SSM parameter '{last_accession_parameter}' retrieved: '{last_accession_uri_id}'"
    )

    asnake_client = ASnakeClient(
        baseurl=os.environ["ARCHIVESSPACE_URL"],
        username=os.environ["ARCHIVESSPACE_USER"],
        password=os.environ["ARCHIVESSPACE_PASSWORD"],
    )
    accession_uri_ids = asnake_client.get(
        "/repositories/2/accessions?all_ids=true"
    ).json()

    if max(accession_uri_ids) <= int(last_accession_uri_id):
        logger.info("No new accessions to add to Airtable.")
    else:
        airtable_api = pyairtable.Api(os.environ["AIRTABLE_TOKEN"])
        airtable_table = airtable_api.table(
            os.environ["AIRTABLE_BASE_ID"], os.environ["AIRTABLE_TABLE_NAME"]
        )
        logger.info(
            f"Airtable client configured for base '{airtable_table.base}'"
            f" and table '{airtable_table.name}'"
        )
        accessions_to_add = sorted(
            [
                accession_uri_id
                for accession_uri_id in accession_uri_ids
                if accession_uri_id > int(last_accession_uri_id)
            ]
        )
        logger.info(f"Adding the following accessions to Airtable: {accessions_to_add}")
        for accession_uri_id in accessions_to_add:
            accession_uri = f"/repositories/2/accessions/{accession_uri_id}"
            accession_record = asnake_client.get(accession_uri).json()
            logger.debug(f"Retrieved record: {accession_uri}")
            accession_data = {
                "Accession Title": accession_record["title"],
                "Accession Number": parse_accession_number(accession_record),
                "Current Status": "Unassigned",
                **parse_extent_data(accession_record),
            }
            logger.debug(f"Data extracted from ArchivesSpace: {accession_data}")

            response = airtable_table.create(accession_data)
            logger.info(
                "Airtable row created for Accession Number: "
                f"'{response["fields"]["Accession Number"]}'"
            )

            logger.info(f"Last accession ID processed: {accession_uri_id}")
            updated_last_accession_uri_id = ssm_client.update_parameter(
                last_accession_parameter, str(accession_uri_id)
            )
            logger.info(
                f"Updated SSM parameter '{last_accession_parameter}' "
                f"to '{updated_last_accession_uri_id}'"
            )

    elapsed_time = perf_counter() - start_time
    logger.info(
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -45,7 +45,7 @@ ignore = [
    "PLR0915",
    "S320",
    "S321",
    "G004",
    "G004"
]

# allow autofix behavior for specified rules
15 changes: 15 additions & 0 deletions tests/conftest.py
@@ -15,6 +15,21 @@ def _test_env(monkeypatch):
    monkeypatch.setenv("AWS_SESSION_TOKEN", "testing")


@pytest.fixture
def aspace_record():
    return {
        "title": "MIT Libraries records and Bibliotech newsletters 2024 July transfer",
        "id_0": "2025",
        "id_1": "014",
        "extents": [
            {
                "number": "1",
                "extent_type": "box(es)",
            }
        ],
    }


@pytest.fixture
def runner():
    return CliRunner()
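
The helpers `parse_accession_number` and `parse_extent_data` are imported from `asati.utils`, which is not part of this diff. The sketch below is a guess at their behavior, inferred only from the `aspace_record` fixture and the values asserted in `tests/test_cli.py` (`'2025-014'`, `1.0`, `'box(es)'`). The `sketch_` names are hypothetical, and both the hyphen join across `id_0` to `id_3` and the use of only the first extent are assumptions.

```python
from typing import Any


def sketch_parse_accession_number(record: dict[str, Any]) -> str:
    """Join the populated id_0..id_3 fields with hyphens (assumed format)."""
    keys = ("id_0", "id_1", "id_2", "id_3")
    return "-".join(record[key] for key in keys if record.get(key))


def sketch_parse_extent_data(record: dict[str, Any]) -> dict[str, Any]:
    """Map the first extent, if any, to the field names seen in the test."""
    extents = record.get("extents") or []
    if not extents:
        return {}  # assumption: no extent fields when the record has none
    first = extents[0]
    return {
        "Extent Number": float(first["number"]),
        "Extent Type": first["extent_type"],
    }


record = {
    "title": "MIT Libraries records and Bibliotech newsletters 2024 July transfer",
    "id_0": "2025",
    "id_1": "014",
    "extents": [{"number": "1", "extent_type": "box(es)"}],
}
print(sketch_parse_accession_number(record))  # 2025-014
print(sketch_parse_extent_data(record))  # {'Extent Number': 1.0, 'Extent Type': 'box(es)'}
```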
43 changes: 37 additions & 6 deletions tests/test_cli.py
@@ -1,17 +1,48 @@
import logging
from unittest.mock import patch

from asati.cli import main


def test_cli_no_options(caplog, runner):
    result = runner.invoke(main)
@patch("asati.cli.ASnakeClient")
@patch("asati.cli.pyairtable")
def test_cli_new_accessions_success(
    mocked_airtable, mocked_asnake, aspace_record, caplog, runner
):
    mocked_asnake.return_value.get.return_value.json.side_effect = [
        [1234, 5678],
        aspace_record,
    ]
    mocked_airtable.Api.return_value.table.return_value.create.return_value = {
        "fields": {"Accession Number": "2025-014"}
    }
    caplog.set_level(logging.DEBUG)
    result = runner.invoke(main, ["--verbose"])
    assert result.exit_code == 0
    assert "Logger 'root' configured with level=INFO" in caplog.text
    assert "Logger 'root' configured with level=DEBUG" in caplog.text
    assert "Running process" in caplog.text
    assert (
        "Data extracted from ArchivesSpace: {'Accession Title': 'MIT Libraries records "
        "and Bibliotech newsletters 2024 July transfer', 'Accession Number': '2025-014',"
        " 'Current Status': 'Unassigned', 'Extent Number': 1.0, 'Extent Type': 'box(es)'}"
        in caplog.text
    )
    assert "Airtable row created for Accession Number: '2025-014'" in caplog.text
    assert (
        "Updated SSM parameter '/apps/asati/last-accession-uri-id' to '5678'"
        in caplog.text
    )
    assert "Total time to complete process" in caplog.text


def test_cli_all_options(caplog, runner):
    result = runner.invoke(main, ["--verbose"])
@patch("asati.cli.ASnakeClient")
def test_cli_no_accessions_success(mocked_asnake, caplog, runner):
    mocked_asnake.return_value.get.return_value.json.return_value = [1234]
    result = runner.invoke(main)
    assert result.exit_code == 0
    assert "Logger 'root' configured with level=DEBUG" in caplog.text
    assert "Logger 'root' configured with level=INFO" in caplog.text
    assert "Running process" in caplog.text
    assert "No new accessions to add to Airtable." in caplog.text
    assert "Total time to complete process" in caplog.text
    assert "Airtable row created for Accession Number:" not in caplog.text
    assert "Updated SSM parameter" not in caplog.text
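
The first test relies on `side_effect` being given a list: each successive call to the mocked `.json()` returns the next item, so the first call yields the ID list and the second yields the record. A standalone sketch of that behavior follows, using only `unittest.mock`; the `client` object and the `{"title": "Example"}` record are placeholders, not the project's fixtures.

```python
from unittest.mock import MagicMock

client = MagicMock()
# Each call to client.get(...).json() consumes the next item in the list.
client.get.return_value.json.side_effect = [
    [1234, 5678],  # first call: the list of accession IDs
    {"title": "Example"},  # second call: a single accession record
]

first = client.get("/repositories/2/accessions?all_ids=true").json()
second = client.get("/repositories/2/accessions/5678").json()
print(first)  # [1234, 5678]
print(second)  # {'title': 'Example'}

# Once the list is exhausted, a further call raises StopIteration.
try:
    client.get("/repositories/2/accessions/9999").json()
    exhausted = False
except StopIteration:
    exhausted = True
print(exhausted)  # True
```

A test that expects more than one new accession would therefore need one additional list entry per record retrieved.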
