Creates Alma-compliant XML patron profiles from information extracted from the Data Warehouse and transmits those files to an S3 bucket for ingest into Alma.
This application runs daily to create and deliver XML files for Alma to retrieve and load. It establishes a connection to the Data Warehouse and retrieves data for student and staff patrons via SQL queries. Because different data are required for students and staff, separate queries are run against the student and staff tables to retrieve the necessary data for each.
Though the zip files receive a suffix from Alma after processing to prevent them from being re-processed, the application removes any existing zip files from the S3 bucket at the start of a run. This prevents potential errors if Alma were to process more than one zip file of either staff or student data.
Given that student employees may appear as both staff and students, the application processes the staff names first and checks student names against the staff names to ensure that student employees are only loaded into Alma once.
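The de-duplication step can be pictured as a simple set-membership check. The sketch below is illustrative only; the function name, record structure, and `mit_id` field are assumptions, not the app's actual code.

```python
def remove_staff_from_students(
    staff_records: list[dict], student_records: list[dict]
) -> list[dict]:
    """Illustrative sketch: drop students who already appear in the staff output.

    The "mit_id" field name is hypothetical; the actual identifier used by the
    app may differ.
    """
    staff_ids = {record["mit_id"] for record in staff_records}
    return [r for r in student_records if r["mit_id"] not in staff_ids]
```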
A zip file is created for each patron type, containing an XML file with the Data Warehouse output formatted according to a template.
The staff and student zip files are then posted to the specified S3 bucket.
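Taken together, the cleanup, packaging, and upload steps might look roughly like the following sketch. It assumes the documented `S3_BUCKET_NAME` and `S3_PREFIX` settings and uses only standard `zipfile` and `boto3` calls; the function names and the `/tmp` staging path are illustrative, not the app's actual implementation.

```python
import zipfile

import boto3

s3_client = boto3.client("s3")


def delete_existing_zip_files(bucket: str, prefix: str) -> None:
    # Remove leftover zip files from a previous run before uploading new ones.
    response = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix)
    for s3_object in response.get("Contents", []):
        if s3_object["Key"].endswith(".zip"):
            s3_client.delete_object(Bucket=bucket, Key=s3_object["Key"])


def create_and_upload_zip(
    bucket: str, prefix: str, patron_type: str, xml_content: str
) -> None:
    # One zip per patron type ("staff" or "student"), each containing a
    # single XML file.
    zip_path = f"/tmp/{patron_type}.zip"
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zip_file:
        zip_file.writestr(f"{patron_type}.xml", xml_content)
    s3_client.upload_file(zip_path, bucket, f"{prefix}/{patron_type}.zip")
```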
Finally, the application sends an email with the logs from the run to the lib-alma-notifications list.
Alma places the following requirements on patron load files:

- Maximum size for a zip file is 4 GB
- Maximum limit of 50 XML files in one zip file
- Maximum of 20 zip files for each import/synchronization
This app uploads two zip files, each under 50 MB and each containing one XML file, so it is unlikely that the files produced will conflict with these requirements.
- To install with dev dependencies: `make install`
- To update dependencies: `make update`
- To run unit tests: `make test`
- To lint the repo: `make lint`
- To run the app: `pipenv run patronload --help`
The Data Warehouse runs on an older version of Oracle that necessitates the thick mode of python-oracledb, which requires the Oracle Instant Client library (this app was developed with version 21.9.0.0.0).
Note: as of this writing, Apple M1 Macs cannot run the Oracle Instant Client, so Docker is the only option for development on those machines.
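For reference, enabling thick mode in python-oracledb looks roughly like the sketch below. It assumes the environment variables documented later in this README; the app's actual connection code may differ.

```python
import os

import oracledb

# Thick mode requires pointing the driver at the Oracle Instant Client
# libraries before any connection is created.
oracledb.init_oracle_client(lib_dir=os.environ["ORACLE_LIB_DIR"])

connection = oracledb.connect(
    user=os.environ["DATA_WAREHOUSE_USER"],
    password=os.environ["DATA_WAREHOUSE_PASSWORD"],
    dsn=oracledb.makedsn(
        os.environ["DATA_WAREHOUSE_HOST"],
        os.environ["DATA_WAREHOUSE_PORT"],
        sid=os.environ["DATA_WAREHOUSE_SID"],
    ),
)
```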
From the project folder:

- Run `make dependencies` with appropriate AWS credentials.
- Run `make dist-dev` to build the container.
- Run `docker run alma-patronload-dev:latest`.
To run locally without Docker:

- Download the Oracle Instant Client (`basiclite` is sufficient) and set the `ORACLE_LIB_DIR` env variable.
- Run `pipenv run patronload`.
The password for the Data Warehouse is updated each year. To verify that the updated password works, the app must be run as an ECS task in the stage environment because CloudConnector is not enabled in dev1. The app can run a database connection test when called with the `--database_connection_test` (or `-t`) flag.
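Conceptually, the connection test just opens a connection and logs the reported database version, along the lines of this sketch; the function name and parameter handling are assumptions, not the app's actual implementation.

```python
import logging

import oracledb

logger = logging.getLogger(__name__)


def test_database_connection(connection_params: dict) -> None:
    # Open a connection and log the server version; python-oracledb exposes
    # the database version on the connection object.
    connection = oracledb.connect(**connection_params)
    logger.info(
        "Successfully connected to Oracle Database version: %s",
        connection.version,
    )
```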
- Set AWS credentials for the `PatronloadManagers` role in stage.
- Run `make database-connection-test-stage`.
- View the logs from the ECS task run on CloudWatch:
  - On CloudWatch, select the `alma-integrations-patronload-ecs-stage` log group.
  - Select the most recent log stream.
  - Verify that the following log is included: `Successfully connected to Oracle Database version: <VERSION NUMBER>`
To properly test with a connection to the Data Warehouse, the app must be run as an ECS task in the stage environment.
- Set AWS credentials for the `PatronloadManagers` role in stage.
- Set `ECR_NAME_STAGE` and `ECR_URL_STAGE` in the `.env` file (see `ECR_NAME_DEV` and `ECR_URL_DEV` in the `Makefile` for guidance).
- Run `make dependencies`.
- Build the image locally: `make dist-stage`.
- Publish the image to AWS ECR for stage: `make publish-stage`.
- From Terraform Cloud, select the `workloads-patronload-stage` workspace and copy the `aws_cli_run_task` command.
- Run the command in your terminal and observe the results in AWS.
```shell
DATA_WAREHOUSE_USER=# The user for the Data Warehouse database.
DATA_WAREHOUSE_PASSWORD=# The password for the Data Warehouse database.
DATA_WAREHOUSE_HOST=# The host for the Data Warehouse database.
DATA_WAREHOUSE_PORT=# The port for the Data Warehouse database.
DATA_WAREHOUSE_SID=# The system identifier for the Data Warehouse database instance.
SES_RECIPIENT_EMAIL=# The email address to send to, typically a Moira list.
SES_SEND_FROM_EMAIL=# The email address to send from.
S3_BUCKET_NAME=# The S3 bucket in which files are deposited.
S3_PREFIX=# The file path prefix for files deposited to the S3 bucket.
WORKSPACE=# Set to `dev` for local development, this will be set to `stage` and `prod` in those environments by Terraform.
LOG_LEVEL=# The log level for the `alma-patronload` application. Defaults to `INFO` if not set.
ORACLE_LIB_DIR=# The directory containing the Oracle Instant Client library.
SENTRY_DSN=# If set to a valid Sentry DSN, enables Sentry exception monitoring. This is not needed for local development.
```
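As an illustration of how these variables might be consumed, the sketch below collects the required settings and fails fast when any are missing. The helper name and grouping are assumptions, not the app's actual configuration code.

```python
import os

# Required variables from the list above; LOG_LEVEL and SENTRY_DSN are
# optional and therefore omitted here.
REQUIRED_ENV_VARS = (
    "DATA_WAREHOUSE_USER",
    "DATA_WAREHOUSE_PASSWORD",
    "DATA_WAREHOUSE_HOST",
    "DATA_WAREHOUSE_PORT",
    "DATA_WAREHOUSE_SID",
    "SES_RECIPIENT_EMAIL",
    "SES_SEND_FROM_EMAIL",
    "S3_BUCKET_NAME",
    "S3_PREFIX",
    "WORKSPACE",
)


def load_config() -> dict[str, str]:
    missing = [name for name in REQUIRED_ENV_VARS if not os.getenv(name)]
    if missing:
        raise OSError(f"Missing required environment variables: {', '.join(missing)}")
    return {name: os.environ[name] for name in REQUIRED_ENV_VARS}
```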
- Infrastructure: mitlib-tf-workloads-patronload
- Depends-on Application: CloudConnector
```mermaid
mindmap
  root((alma-patronload))
    (infrastructure)
      ["`mitlib-tf-
      workloads-patronload`"]
    (depends-on)
      [CloudConnector]
    (output)
      ["`Writes data as
      zip file(s) to S3`"]
    (reporting)
      [Emails stakeholders]
```
- Team: DataEng
- Last Maintenance: 2025-01
- External Documentation: TODO...