Skip to content

Commit

Permalink
Support configuration for multiple projects (#309)
Browse files Browse the repository at this point in the history
* Update config design diagram

* Add example config yaml

* Add pydantic model for project configuration

* Add tests for pydantic config model

* Add config loading helper

* fix: Remove `classmethod` decorators and tell ruff that pydantic's validator is also a classmethod

Setting the validators as `classmethod` will fail for some validations

* Refactor config tests: split up invalid fields tests and create common base data object

* Add some more asserts to config test and use `load_config()` helper

* Add PyYAML as core dependency

`config.py` needs it

* Add copyright headers

* Move `load_config` to top of file

* feat: orthanc-anon project config (#312)

## Project configs
## Common
- add env variable for project configs location
- mount project configs location
- make tag operations a list in config (limited to length 1 for now)
- make config functions return `PixlConfig | Any` to get type hints
- replace deprecated pydantic features
- add utility function to load project config by slug only
### CLI
- install cli as editable to find env files correctly
- add function to check env for all env.sample keys
### orthanc-anon project config
- orthanc anon now cofigurable (tag ops, modalities, destionation)
- add query for project slug by non-hashed values
- move anonymisation logic to dcmd package
### EHR project config
- mark only processing tests with run_containers fixture
- use project config to determine destination

---------

Co-authored-by: Milan Malfait <[email protected]>

* Add project config template

* Add documentation for Parquet files and export process (#280)

* Add documentation for Parquet files and export process

* Formatting

* Move `TODO` to issue #306

* Remove PR references

* Formatting

* Move specific details to `pixl_core` docs and add links

* Update directory structure on the FTPS server

* Formatting

* Rename docs/data -> docs/file_types

* Link to `file_types` documentation

* Add directory structures to docstrings

* Update upload.py

Co-authored-by: Stef Piatek <[email protected]>

* Fix docs link

Co-authored-by: Jeremy Stein <[email protected]>

* Clarify that the radiology reports go through Cogstack

Co-authored-by: Jeremy Stein <[email protected]>

* Add note about test files

Co-authored-by: Jeremy Stein <[email protected]>

---------

Co-authored-by: Stef Piatek <[email protected]>
Co-authored-by: Jeremy Stein <[email protected]>

* Correct instructions for editable installs (#314)

* docs: better editable install instructions

* docs: editable install of pytest-pixl

* Add copyright header to config template

* Add secret fetching from Azure keyvault

* Ignore local secrets

* Add Azure dependenices

* Refactor: move FTPS setup to separet module

* Create AzureKeyVault class to handle project secrets fetching

* Refactor `_connect_to_ftp` to take FTP settings as parameters

* Add abstract `Uploader` and `FTPSUploader` classes to define uploading interface

* Update upload tests to use `Uploader` interface

* Revert back to `pydantic.validator`

`field_validator` is only available in `pydantic>-2.0`

* Add `MockFTPSUploader` for tests

Opted to mock out the `FTPSUploader` for tests instead of the `AzureKeyVault` to set the configuration fields directly.

* Update tests with `MockFTPSUploader` class

* Partially added docs to describe project configuration

* Small code reordering

* Add `upload` method to `ParquetExports` class using the uploader strategy

* Update EHR API to use the new uploader strategy

* Load project config only when necessary

* Move `parquet_export` fixture out of conftest

Fixture is only used in `test_upload` anyway, and it avoids an issue where importing the `ParquetEport` class
requires the `PROJECT_CONFIGS_DIR` envvar to be set, but is only set after the imports in the conftest.

* Update test instruction in EHR API

* Use uploader strategy in orthanc-anonn for DICOM uploads

* Update required environment variables

- Remove the `FTP_` variables (except `FTP_PORT`), as these are now retrievd from the Azure Keyvault
- Add Azure Keyvault variables to docker-compose

* Format docker-compose

* Load `.secrets.env` in system tests

Contains the Azure Keyvault credentials

* Add instructions for setting up project config

* Update core docs

* Set PROJECT_CONFIGS_DIR envvar for cli tests

* Use same config for `MockFTPSUploader` as for the `ftps_server` fixture

* Fix core tests

* Format docker-compose

* Update cli docs

Running tests no longer requires the extra script.

* Fix: keyvault name envvar

* Load local `.env` if it exists before setting `PROJECT_CONFIGS_DIR`

* Improve logging

* Move project configs in top-level `projects/` directory

* Move exports dir in top-level projects

* Allow setting an alias for Azure KV secret fetching

* Remove debugging message

* Set Azure Keyvault credentials as secrets in CI

* Add Azure keyvault setup instructions

Copied from the Hasher API docs

* Add sample `.secrets.env`

* Report which secret is missing

* Set default value for `PROJECT_CONFIGS_DIR` in sample `.env`

* Update destination options in README

* Update Azure KV setup instructions

* Raise wanring when destination not supported instead of error

* Don't mention unsupported alternatives

* No need for line continuation

* Better function naming

* Get FTP port from Azure Keyvault as well

* Remove more references to `FTP_` environment variables

* Fix formatting

* Forgot another renaming instance

* Remove more instances of `FTP_*` env variables

* No more FTP dummy service

* Refer to pytest-pixl plugin in test docs

* Get FTP port from Keyvault as well

* Allow 'none' destination for uploading

* This string expansion isn't working

* Exports are now under `projects/exports`

* Docker mounts need absolute paths

* Seems that we still need the `FTP_*` environment variables

I suspect because they are still used by the pytest plugin ftpserver fixture.

* Update exports dir in gitignore

* Mostly for debugging but probably a useful check to avoid surprises

* Think I finally found the bug

* Update docs

* Make sure exports dir exists in unit tests

* Add check for public parquet exports

* Rename system test checks for exports

* refactor[config]: remove tag-operations from filenames

* Update README.md typo

* Update template_config.yaml

* Update cli/README.md

Co-authored-by: Stef Piatek <[email protected]>

* Switch back to `pydantic.field_validator`

`pydantic.validator` is deprecated

* Record pydantic version

* Revert "Format docker-compose"

This reverts commit 54d3cee.

* Cache secret fetching from AZ keyvault

* Set secret ENV variables for the whole system-test job

This should omit the need to set them explicitly in `.secrets.env`

* Fix Keyvault secret names in diagram

* Add more abstract methods into the base Uploader class

* add slugify reference

* fix[dcmd]: type hints

* refactor[core]: define uploader subpackage

* fix[cli]: convert to_posix to str()

* fix[imports]

* fix docker compose

* fix[core]: add back ftps uploader __init__

* Add static `create` method to `Uploader` base class to handle upload strategies

* Make `uploader._base` non-private

* Make `uploader.ftps` private

* Update client code to use new `Uploader.create` method

* Move `FTPSUploader` to `core.uploader.base` and make private

Avoids circular imports

* Fix test

* remove mention of non-existing options in enum and in diagram

* clarify lastpass secrets note

* Limit scope of secret envvars

* Revert "Move `FTPSUploader` to `core.uploader.base` and make private"

This reverts commit 6ce589e.

* Use package-level `get_uploader` factory instead of static method

* Update client code to use `get_uploader`

* Remove duplicate template

---------

Co-authored-by: Peter Tsrunchev <[email protected]>
Co-authored-by: Stef Piatek <[email protected]>
Co-authored-by: Jeremy Stein <[email protected]>
  • Loading branch information
4 people authored Mar 5, 2024
1 parent a7f7292 commit 7e5a28b
Show file tree
Hide file tree
Showing 53 changed files with 1,862 additions and 452 deletions.
10 changes: 2 additions & 8 deletions .env.sample
Original file line number Diff line number Diff line change
Expand Up @@ -82,10 +82,6 @@ AZ_DICOM_ENDPOINT_CLIENT_SECRET=
AZ_DICOM_ENDPOINT_TENANT_ID=

# EHR extraction API
PIXL_EHR_API_AZ_SUBSCRIPTION_ID=
PIXL_EHR_API_AZ_CLIENT_ID=
PIXL_EHR_API_AZ_CLIENT_SECRET=
PIXL_EHR_API_AZ_TENANT_ID=
PIXL_EHR_API_AZ_STORAGE_ACCOUNT_NAME=
PIXL_EHR_API_AZ_STORAGE_CONTAINER_NAME=
PIXL_EHR_COGSTACK_REDACT_URL=
Expand All @@ -100,7 +96,5 @@ PIXL_DICOM_TRANSFER_TIMEOUT=240
PIXL_QUERY_TIMEOUT=10
PIXL_MAX_MESSAGES_IN_FLIGHT=100

# FTP server
FTP_HOST=
FTP_USER_NAME=
FTP_USER_PASSWORD=
# Project configs directory
PROJECT_CONFIGS_DIR=projects/configs
14 changes: 11 additions & 3 deletions .github/workflows/main.yml
Original file line number Diff line number Diff line change
Expand Up @@ -131,9 +131,9 @@ jobs:

- name: Install Python dependencies
run: |
pip install pixl_core/
pip install pytest-pixl/
pip install cli/[test]
pip install -e pixl_core/
pip install -e pytest-pixl/
pip install -e cli/[test]
- name: Run tests
working-directory: cli/tests
Expand Down Expand Up @@ -225,6 +225,9 @@ jobs:
pip install -e pytest-pixl/
pip install -e cli/[test]
- name: Create .secrets.env
run: touch .secrets.env

- name: Build test services
working-directory: test
run: |
Expand All @@ -236,6 +239,11 @@ jobs:
- name: Run tests
working-directory: test
env:
EXPORT_AZ_CLIENT_ID: ${{ secrets.EXPORT_AZ_CLIENT_ID }}
EXPORT_AZ_CLIENT_PASSWORD: ${{ secrets.EXPORT_AZ_CLIENT_PASSWORD }}
EXPORT_AZ_TENANT_ID: ${{ secrets.EXPORT_AZ_TENANT_ID }}
EXPORT_AZ_KEY_VAULT_NAME: ${{ secrets.EXPORT_AZ_KEY_VAULT_NAME }}
run: |
./run-system-test.sh
echo FINISHED SYSTEM TEST SCRIPT
Expand Down
3 changes: 2 additions & 1 deletion .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,7 @@
.env
local.env
pixl_config.yml
.secrets.env

# Byte-compiled / optimized / DLL files
__pycache__/
Expand Down Expand Up @@ -146,6 +147,6 @@ dmypy.json
.vscode

# project specific files
/exports/
/projects/exports/
**/test_producer.csv
/test/dummy-services/ftp-server/mounts/data/*
6 changes: 6 additions & 0 deletions .secrets.env.sample
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
# Azure key vault
EXPORT_AZ_CLIENT_ID=
EXPORT_AZ_CLIENT_PASSWORD=
EXPORT_AZ_TENANT_ID=
EXPORT_AZ_KEY_VAULT_NAME=

142 changes: 142 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -188,6 +188,148 @@ For the deidentification of the EHR extracts, we rely on an instance running the
[CogStack API](https://cogstack.org/) with a `/redact` endpoint. The URL of this instance should be
set in `.env` as `COGSTACK_REDACT_URL`.

### 3. Configure a new project

To configure a new project, follow these steps:

1. Create a new `git` branch from `main`

```sh
git checkout main
git pull
git switch -c <branch-name>
```

1. Copy the `template_config.yaml` file to a new file in the `projects/config` directory and fill
in the details.
1. The filename of the project config should be `<project-slug>`.yaml

>[!NOTE]
> The project slug should match the [slugify](https://github.com/un33k/python-slugify)-ed project name in the `extract_summary.json` log file!

2. [Open a PR in PIXL](https://github.com/UCLH-Foundry/PIXL/compare) to merge the new project config into `main`

#### The config YAML file

The configuration file defines:

- Project name: the `<project-slug>` name of the Project
- The DICOM dataset modalities to retain (e.g. `["DX", "CR"]` for X-Ray studies)
- The anonymisation operations to be applied to the DICOM tags, by providing a file path to one or multiple YAML files
- The endpoints used to upload the anonymised DICOM data and the public and radiology
[parquet files](./docs/file_types/parquet_files.md). We currently support the following endpoints:
- `"none"`: no upload
- `"ftps"`: a secure FTP server (for both _DICOM_ and _parquet_ files)
<!-- - `"azure"`: a secure Azure Dicom web service (for both _DICOM_ and _parquet_ files) -->
<!-- Requires the `AZURE_*` environment variables to be set in `.env` -->
<!-- - `"dicomweb"`: a DICOMweb server (for _DICOM_ files only) -->
<!-- Requires the `DICOMWEB_*` environment variables to be set in `.env` -->

#### Project secrets

Any credentials required for uploading the project's results should be stored in an **Azure Key Vault**
(set up instructions below).
PIXL will query this key vault for the required secrets at runtime. This requires the following
environment variables to be set in `.secrets.env` so that PIXL can connect to the key vault:

- `EXPORT_AZ_CLIENT_ID`: the service principal's client ID, mapped to `AZURE_CLIENT ID` in `docker-compose`
- `EXPORT_AZ_CLIENT_PASSWORD`: the password, mapped to `AZURE_CLIENT_SECRET` in `docker-compose`
- `EXPORT_AZ_TENANT_ID`: ID of the service principal's tenant. Also called its 'directory' ID. Mapped to `AZURE_TENANT_ID` in `docker-compose`
- `EXPORT_AZ_KEY_VAULT_NAME` the name of the key vault, used to connect to the correct key vault

Create the `.secrets.env` file in the _PIXL_ directory by copying the sample:

```bash
cp .secrets.env.sample .secrets.env
```

and fill in the missing values (for dev purposes find the `pixl_dev_secrets.env` note on LastPass).

<details><summary>Azure Keyvault setup</summary>

## Azure Keyvault setup

_This is done for the \_UCLH_DIF_ `dev` tenancy, will need to be done once in the _UCLHAZ_ `prod`
tenancy when ready to deploy to production.\_

This Key Vault and secret must persist any infrastructure changes so should be separate from disposable
infrastructure services. ServicePrincipal is required to connect to the Key Vault.

The application uses the ServicePrincipal and password to authenticate with Azure via environment
variables. See [here](https://learn.microsoft.com/en-us/python/api/azure-identity/azure.identity.environmentcredential?view=azure-python)
for more info.

The Key Vault and ServicePrincipal have already been created for the `dev` environment and details
are stored in the `pixl-dev-secrets.env` note in the shared PIXL folder on _LastPass_.

The process for doing so using the `az` CLI tool is described below.
This process must be repeated for `staging` & `prod` environments.

### Step 1

Create the Azure Key Vault in an appropriate resource group:

```bash
az keyvault create --resource-group <resource-group-name> --name <key-vault-name> --location "UKSouth"
```

### Step 2

Create Service Principal & grant access as per

```bash
az ad sp create-for-rbac -n pixl-secrets --skip-assignment
```

This will produce the following output

```json
{
"appId": "<generated-app-ID>",
"displayName": "<app-name>",
"name": "http://<app-name>",
"password": "<generated-password>",
"tenant": "<tenant-ID>"
}
```

### Step 3

Assign correct permissions to the newly created ServicePrincipal

```bash
az keyvault set-policy --name <key-vault-name> --spn <generated-app-ID> --secret-permissions backup delete get list set
```

### Step 4

Create a secret and store in the Key Vault

Use Python to create a secret:

```python
import secrets
secrets.token_urlsafe(32)
```

copy the secret and paste as <secret-value> below

```bash
az keyvault secret set --vault-name "<key-vault-name>" --name "<secret-name>" --value "<secret-value>"
```

### Step 5

Save credentials in `.secrets.env` and a LastPass `PIXL Keyvault <project-slug> secrets` note.

```
EXPORT_AZ_CLIENT_ID=<generated-app-ID>
EXPORT_AZ_CLIENT_PASSWORD=<generated-password>
EXPORT_AZ_TENANT_ID=<tenant-ID>
EXPORT_AZ_KEY_VAULT_NAME=<key-vault-name>
```
</details>

## Run

### Start
Expand Down
5 changes: 3 additions & 2 deletions cli/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -133,9 +133,10 @@ pip install -e ../pixl_core/ -e .[test]

### Running tests

Tests can be run with `pytest` from the `tests` directory.
The CLI tests require a running instance of the `rabbitmq` service, for which we provide a
`docker-compose` [file](./tests/docker-compose.yml). The service is automatically started by the
`run_containers` _pytest_ fixture. So to run the tests, run

```bash
cd tests
pytest
```
9 changes: 2 additions & 7 deletions cli/pyproject.toml
Original file line number Diff line number Diff line change
@@ -1,22 +1,17 @@
[project]
name = "pixl_cli"
version = "0.0.4"
authors = [
{ name="PIXL authors" },
]
authors = [{ name = "PIXL authors" }]
description = "PIXL command line interface"
readme = "README.md"
requires-python = "<3.12"
classifiers = [
"Programming Language :: Python :: 3"
]
classifiers = ["Programming Language :: Python :: 3"]
dependencies = [
"core",
"click==8.1.3",
"coloredlogs==15.0.1",
"pandas==1.5.1",
"pyarrow==14.0.1",
"PyYAML==6.0.1"
]

[project.optional-dependencies]
Expand Down
2 changes: 1 addition & 1 deletion cli/src/pixl_cli/_io.py
Original file line number Diff line number Diff line change
Expand Up @@ -27,7 +27,7 @@

# The export root dir from the point of view of the docker host (which is where the CLI runs)
# For the view from inside, see pixl_ehr/main.py: EHR_EXPORT_ROOT_DIR
HOST_EXPORT_ROOT_DIR = Path(__file__).parents[3] / "exports"
HOST_EXPORT_ROOT_DIR = Path(__file__).parents[3] / "projects" / "exports"


def messages_from_state_file(filepath: Path) -> list[Message]:
Expand Down
30 changes: 30 additions & 0 deletions cli/src/pixl_cli/main.py
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,7 @@
import requests
from core.patient_queue.producer import PixlProducer
from core.patient_queue.subscriber import PixlBlockingConsumer
from decouple import RepositoryEnv, UndefinedValueError, config

from pixl_cli._config import SERVICE_SETTINGS, api_config_for_queue
from pixl_cli._database import filter_exported_or_add_to_db
Expand All @@ -48,6 +49,35 @@ def cli(*, debug: bool) -> None:
set_log_level("WARNING" if not debug else "DEBUG")


@cli.command()
@click.option(
"--error",
is_flag=True,
show_default=True,
default=False,
help="Exit with error on missing env vars",
)
@click.option(
"--sample_env_file",
show_default=True,
default=None,
type=click.Path(exists=True),
help="Path to the sample env file",
)
def check_env(*, error: bool, sample_env_file: click.Path) -> None:
"""Check that all variables from .env.sample are set either in .env or in environ"""
if not sample_env_file:
sample_env_file = Path(__file__).parents[3] / ".env.sample"
sample_config = RepositoryEnv(sample_env_file)
for key in sample_config.data:
try:
config(key)
except UndefinedValueError: # noqa: PERF203
logger.warning("Environment variable %s is not set", key)
if error:
raise


@cli.command()
@click.option(
"--queues",
Expand Down
13 changes: 7 additions & 6 deletions cli/tests/conftest.py
Original file line number Diff line number Diff line change
Expand Up @@ -15,15 +15,14 @@
from __future__ import annotations

import os
from typing import TYPE_CHECKING
from pathlib import Path

import pytest
from core.db.models import Base, Extract, Image
from sqlalchemy import Engine, create_engine
from sqlalchemy.orm import Session, sessionmaker

if TYPE_CHECKING:
import pathlib
os.environ["PROJECT_CONFIGS_DIR"] = str(Path(__file__).parents[2] / "projects/configs")

# Set the necessary environment variables
os.environ["PIXL_EHR_API_HOST"] = "localhost"
Expand All @@ -47,9 +46,11 @@


@pytest.fixture(autouse=True)
def export_dir(tmp_path_factory: pytest.TempPathFactory) -> pathlib.Path:
"""Tmp dir to for tests to extract to."""
return tmp_path_factory.mktemp("export_base") / "exports"
def export_dir(tmp_path_factory: pytest.TempPathFactory) -> Path:
"""Tmp dir for tests to extract to."""
export_dir = tmp_path_factory.mktemp("export_base") / "projects" / "exports"
export_dir.mkdir(parents=True)
return export_dir


@pytest.fixture(scope="module")
Expand Down
43 changes: 43 additions & 0 deletions cli/tests/test_check_env.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,43 @@
# Copyright (c) University College London Hospitals NHS Foundation Trust
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Tests for check_env function of CLI."""
from pathlib import Path

from click.testing import CliRunner
from pixl_cli.main import check_env

SAMPLE_ENV_FILE = Path(__file__).parents[2] / ".env.sample"


def test_check_env():
"""
Test that the check_env command runs without error.
- check_env works
- current test env file matches the sample env file
"""
runner = CliRunner()
result = runner.invoke(check_env)
assert result.exit_code == 0


def test_check_env_fails(tmp_path):
"""
Test that check_env fails when the current test env file does not match the sample env file.
""" # noqa: D200 either this or it's 102 chars
tmp_sample_env_file = tmp_path / ".env.sample"
tmp_sample_env_file.write_text("NONEXISTENT_VARIABLE=")

runner = CliRunner()
result = runner.invoke(check_env, str(tmp_sample_env_file))
assert result.exit_code != 0
Loading

0 comments on commit 7e5a28b

Please sign in to comment.