Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Get rid of pixl_config.yml #311

Merged
merged 24 commits into from
Feb 26, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
24 commits
Select commit Hold shift + click to select a range
e59dd37
Rename `config_from_log_file` -> `project_info` + add docstring
milanmlft Feb 20, 2024
aebd6ba
Better variable name for resources path
milanmlft Feb 20, 2024
ffefd38
Move `APIConfig` class to `cli._config`
milanmlft Feb 20, 2024
1a77a58
Get queue API config values form `.env` instead of `pixl_config.yaml`
milanmlft Feb 20, 2024
4f0e563
Better variable name
milanmlft Feb 20, 2024
08d5f5e
Get service settings from env variables instead of `pixl_config.yml`
milanmlft Feb 20, 2024
6c83de0
Update sample env file
milanmlft Feb 20, 2024
610c33b
Replace more refrences to `cli_config`
milanmlft Feb 20, 2024
f35803d
Remove `pixl_config.yml` files
milanmlft Feb 20, 2024
f90f4a9
Build config objects in place
milanmlft Feb 21, 2024
2be0fcd
Add missing env variable
milanmlft Feb 21, 2024
b43134c
docs: Remove references to `pixl_config.yml`, describe config with en…
milanmlft Feb 21, 2024
0982814
Fix `API_CONFIGS` dict keys
milanmlft Feb 21, 2024
8469fc9
Make API_CONFIG dict in one go
milanmlft Feb 21, 2024
d6c5850
Remove default values for env variables and update tests
milanmlft Feb 21, 2024
c66b6cf
Rate should be a float, also update docstrings
milanmlft Feb 21, 2024
9a41205
Add missing `.env` vars to sample and remove another default
milanmlft Feb 21, 2024
c114fa8
Add default for `PIXL_QUERY_TIMEOUT` back in
milanmlft Feb 23, 2024
125fdfd
Make sure `.env` is loaded prior to setting API configs
milanmlft Feb 23, 2024
b252f31
Merge branch 'main' into milanmlft/rm-pixl_config
milanmlft Feb 26, 2024
f32d463
Use `decouple` to load local `.env` file
milanmlft Feb 26, 2024
f0089de
Load local .env file only if it exists
milanmlft Feb 26, 2024
8e3e29f
Merge branch 'main' into milanmlft/rm-pixl_config
milanmlft Feb 26, 2024
9cfbcdc
Revert default query timeout for tests back to 10
milanmlft Feb 26, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
14 changes: 13 additions & 1 deletion .env.sample
Original file line number Diff line number Diff line change
@@ -1,11 +1,14 @@
ENV=dev
DEBUG=True
PIXL_DICOM_TRANSFER_TIMEOUT=120
PIXL_QUERY_TIMEOUT=120

# PIXL PostgreSQL instance
PIXL_DB_HOST=postgres
PIXL_DB_NAME=pixl
PIXL_DB_USER=pixl
PIXL_DB_PASSWORD=
SKIP_ALEMBIC=false

# EMAP UDS
EMAP_UDS_HOST=
Expand All @@ -26,6 +29,15 @@ PIXL_EHR_API_PORT=
RABBITMQ_PORT=
RABBITMQ_ADMIN_PORT=
PIXL_IMAGING_API_PORT=
FTP_PORT=

# PIXL EHR API
PIXL_EHR_API_HOST=localhost
PIXL_EHR_API_RATE=1

# PIXL Imaging API
PIXL_IMAGING_API_HOST=localhost
PIXL_IMAGING_API_RATE=1

# Hasher API
HASHER_API_AZ_CLIENT_ID=
Expand Down Expand Up @@ -77,6 +89,7 @@ PIXL_EHR_API_AZ_STORAGE_CONTAINER_NAME=
PIXL_EHR_COGSTACK_REDACT_URL=

# RABBIT MQ queue. UI available at localhost:$RABBITMQ_ADMIN_PORT
RABBITMQ_HOST=localhost
RABBITMQ_USERNAME=
RABBITMQ_PASSWORD=

Expand All @@ -88,4 +101,3 @@ PIXL_QUERY_TIMEOUT=10
FTP_HOST=
FTP_USER_NAME=
FTP_USER_PASSWORD=
FTP_PORT=
4 changes: 2 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -146,10 +146,10 @@ This is one of `dev|test|staging|prod` and referred to as `<environment>` in the

### 2. Initialise environment configuration

Create a local `.env` and `pixl_config.yml` file in the _PIXL_ directory:
Create a local `.env` file in the _PIXL_ directory:

```bash
cp .env.sample .env && cp pixl_config.yml.sample pixl_config.yml
cp .env.sample .env
```

Add the missing configuration values to the new files:
Expand Down
45 changes: 36 additions & 9 deletions cli/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -9,11 +9,6 @@ stopped cleanly.

`PIXL CLI` requires Python version 3.10.

The CLI requires a `pixl_config.yml` file in the current working directory. A
[sample file](../pixl_config.yml.sample) is provided in the root of the repository. If you want to
run locally during development, we recommend running `pixl` from the [`./tests/`](./tests/)
directory, which contains a mock `pixl_config.yml` file.

Running the tests requires [docker](https://docs.docker.com/get-docker/) to be installed.

## Installation
Expand All @@ -40,6 +35,39 @@ See the commands and subcommands with
```bash
pixl --help
```
### Configuration

The `rabbitmq` and `postgress` services are configured by setting the following environment variables
(default values shown):

```sh
RABBITMQ_HOST=localhost
RABBITMQ_PORT=7008
RABBITMQ_USERNAME=rabbitmq_username
RABBITMQ_PASSWORD=rabbitmq_password

POSTGRES_HOST=localhost
POSTGRES_PORT=7001
PIXL_DB_USER=pixl_db_username
PIXL_DB_PASSWORD=pixl_db_password
PIXL_DB_NAME=pixl
```

The `rabbitmq` queues for the `ehr` and `imaging` APIs are configured by setting:

```sh
PIXL_EHR_API_HOST=localhost
PIXL_EHR_API_PORT=7006
PIXL_EHR_API_RATE=1

PIXL_IMAGING_API_HOST=localhost
PIXL_IMAGING_API_PORT=7007
PIXL_IMAGING_API_RATE=1
```

where the `*_RATE` variables set the default querying rate for the message queues.

### Running the pipeline

Populate queue for Imaging and EHR extraction

Expand Down Expand Up @@ -113,10 +141,9 @@ pip install -e ../pixl_core/ -e .[test]

### Running tests

The CLI tests require a running instance of the `rabbitmq` service, for which we provide a
`docker-compose` [file](./tests/docker-compose.yml). Spinning up the service and running `pytest`
can be done by running
Tests can be run with `pytest` from the `tests` directory.

```bash
./tests/run-tests.sh
cd tests
pytest
```
71 changes: 60 additions & 11 deletions cli/src/pixl_cli/_config.py
Original file line number Diff line number Diff line change
Expand Up @@ -12,21 +12,70 @@
# See the License for the specific language governing permissions and
# limitations under the License.

"""Configuration of CLI from config file."""
"""Configuration of CLI from environment variables."""

from pathlib import Path

import yaml
from decouple import Config, RepositoryEmpty, RepositoryEnv

env_file = Path.cwd() / ".env"
config = Config(RepositoryEnv(env_file)) if env_file.exists() else Config(RepositoryEmpty())

SERVICE_SETTINGS = {
stefpiatek marked this conversation as resolved.
Show resolved Hide resolved
"rabbitmq": {
"host": config("RABBITMQ_HOST"),
"port": int(config("RABBITMQ_PORT")),
"username": config("RABBITMQ_USERNAME"),
"password": config("RABBITMQ_PASSWORD"),
},
"postgres": {
"host": config("POSTGRES_HOST"),
"port": int(config("POSTGRES_PORT")),
"username": config("PIXL_DB_USER"),
"password": config("PIXL_DB_PASSWORD"),
"database": config("PIXL_DB_NAME"),
},
} # type: dict

milanmlft marked this conversation as resolved.
Show resolved Hide resolved

class APIConfig:
"""API Configuration"""

def __init__(self, host: str, port: int, default_rate: float = 1) -> None:
"""Initialise the APIConfig class"""
self.host = host
self.port = port
self.default_rate = default_rate

@property
def base_url(self) -> str:
"""Return the base url for the API"""
return f"http://{self.host}:{self.port}"


API_CONFIGS = {
"ehr_api": APIConfig(
host=config("PIXL_EHR_API_HOST"),
port=int(config("PIXL_EHR_API_PORT")),
default_rate=float(config("PIXL_EHR_API_RATE", default=1)),
),
"imaging_api": APIConfig(
host=config("PIXL_IMAGING_API_HOST"),
port=int(config("PIXL_IMAGING_API_PORT")),
default_rate=float(config("PIXL_IMAGING_API_RATE", default=1)),
),
}

def _load_config(filename: str = "pixl_config.yml") -> dict:
"""CLI configuration generated from a .yaml file"""
if not Path(filename).exists():
msg = f"Failed to find {filename}. It must be present in the current working directory"
raise FileNotFoundError(msg)

with Path(filename).open() as config_file:
config_dict = yaml.safe_load(config_file)
return dict(config_dict)
def api_config_for_queue(queue_name: str) -> APIConfig:
milanmlft marked this conversation as resolved.
Show resolved Hide resolved
"""Configuration for an API associated with a queue"""
api_name = f"{queue_name}_api"

if api_name not in API_CONFIGS:
msg = (
f"Cannot update the rate for {queue_name}. {api_name} was"
f" not specified in the configuration"
)
raise ValueError(msg)

cli_config = _load_config()
return API_CONFIGS[api_name]
4 changes: 2 additions & 2 deletions cli/src/pixl_cli/_database.py
Original file line number Diff line number Diff line change
Expand Up @@ -19,9 +19,9 @@
from sqlalchemy import URL, create_engine
from sqlalchemy.orm import Session, sessionmaker

from pixl_cli._config import cli_config
from pixl_cli._config import SERVICE_SETTINGS

connection_config = cli_config["postgres"]
connection_config = SERVICE_SETTINGS["postgres"]

url = URL.create(
drivername="postgresql+psycopg2",
Expand Down
14 changes: 9 additions & 5 deletions cli/src/pixl_cli/_io.py
Original file line number Diff line number Diff line change
Expand Up @@ -47,19 +47,23 @@ def messages_from_state_file(filepath: Path) -> list[Message]:
return [deserialise(line) for line in filepath.open().readlines() if string_is_non_empty(line)]


def config_from_log_file(parquet_path: Path) -> tuple[str, datetime]:
log_file = parquet_path / "extract_summary.json"
def project_info(resources_path: Path) -> tuple[str, datetime]:
"""
Get the project name and extract timestamp from the extract summary log file.
:param resources_path: path to the input resources
"""
log_file = resources_path / "extract_summary.json"
logs = json.load(log_file.open())
project_name = logs["settings"]["cdm_source_name"]
omop_es_timestamp = datetime.fromisoformat(logs["datetime"])
return project_name, omop_es_timestamp


def copy_parquet_return_logfile_fields(parquet_path: Path) -> tuple[str, datetime]:
def copy_parquet_return_logfile_fields(resources_path: Path) -> tuple[str, datetime]:
"""Copy public parquet file to extracts directory, and return fields from logfile"""
project_name, omop_es_timestamp = config_from_log_file(parquet_path)
project_name, omop_es_timestamp = project_info(resources_path)
extract = ParquetExport(project_name, omop_es_timestamp, HOST_EXPORT_ROOT_DIR)
project_name_slug = extract.copy_to_exports(parquet_path)
project_name_slug = extract.copy_to_exports(resources_path)
return project_name_slug, omop_es_timestamp


Expand Down
68 changes: 8 additions & 60 deletions cli/src/pixl_cli/main.py
Original file line number Diff line number Diff line change
Expand Up @@ -25,13 +25,13 @@
from core.patient_queue.producer import PixlProducer
from core.patient_queue.subscriber import PixlBlockingConsumer

from pixl_cli._config import cli_config
from pixl_cli._config import SERVICE_SETTINGS, api_config_for_queue
from pixl_cli._database import filter_exported_or_add_to_db
from pixl_cli._io import (
config_from_log_file,
copy_parquet_return_logfile_fields,
messages_from_parquet,
messages_from_state_file,
project_info,
)
from pixl_cli._logging import logger, set_log_level
from pixl_cli._utils import clear_file, remove_file_if_it_exists
Expand Down Expand Up @@ -98,7 +98,7 @@ def populate(parquet_dir: Path, *, restart: bool, queues: str) -> None:
sorted_messages = filter_exported_or_add_to_db(
sorted_messages, messages[0].project_name
)
with PixlProducer(queue_name=queue, **cli_config["rabbitmq"]) as producer:
with PixlProducer(queue_name=queue, **SERVICE_SETTINGS["rabbitmq"]) as producer:
producer.publish(sorted_messages)


Expand All @@ -113,7 +113,7 @@ def extract_radiology_reports(parquet_dir: Path) -> None:
PARQUET_DIR: Directory containing the extract_summary.json
log file defining which extract to export radiology reports for.
"""
project_name, omop_es_datetime = config_from_log_file(parquet_dir)
project_name, omop_es_datetime = project_info(parquet_dir)

# Call the EHR API
api_config = api_config_for_queue("ehr")
Expand Down Expand Up @@ -143,10 +143,9 @@ def extract_radiology_reports(parquet_dir: Path) -> None:
"--rate",
type=float,
default=None,
help="Rate at which to process items from a queue (in items per second)."
"If None then will use the default rate defined in the config file",
help="Rate at which to process items from a queue (in items per second).",
)
def start(queues: str, rate: Optional[int]) -> None:
def start(queues: str, rate: Optional[float]) -> None:
"""Start consumers for a set of queues"""
if rate == 0:
msg = "Cannot start extract with a rate of 0. Must be >0"
Expand Down Expand Up @@ -186,10 +185,7 @@ def _update_extract_rate(queue_name: str, rate: Optional[float]) -> None:

if rate is None:
if api_config.default_rate is None:
msg = (
"Cannot update the rate for %s. No default rate was specified.",
queue_name,
)
msg = f"Cannot update the rate for {queue_name}. No valid rate was specified."
raise ValueError(msg)
rate = float(api_config.default_rate)
logger.info(f"Using the default extract rate of {rate}/second")
Expand Down Expand Up @@ -299,7 +295,7 @@ def consume_all_messages_and_save_csv_file(queue_name: str, timeout_in_seconds:
f"{timeout_in_seconds} seconds"
)

with PixlBlockingConsumer(queue_name=queue_name, **cli_config["rabbitmq"]) as consumer:
with PixlBlockingConsumer(queue_name=queue_name, **SERVICE_SETTINGS["rabbitmq"]) as consumer:
state_filepath = state_filepath_for_queue(queue_name)
if consumer.message_count > 0:
logger.info("Found messages in the queue. Clearing the state file")
Expand All @@ -325,51 +321,3 @@ def inform_user_that_queue_will_be_populated_from(path: Path) -> None: # noqa:
f"state files should be ignored, or delete this file to ignore. Press "
f"Ctrl-C to exit and any key to continue"
)


class APIConfig:
"""
Class to represent the configuration for an API

Attributes
----------
host : str
Hostname for the API
port : int
Port for the API
default_rate : int
Default rate for the API

Methods
-------
base_url()
Return the base url for the API

"""

def __init__(self, kwargs: dict) -> None:
"""Initialise the APIConfig class"""
self.host: Optional[str] = None
self.port: Optional[int] = None
self.default_rate: Optional[int] = None

self.__dict__.update(kwargs)

@property
def base_url(self) -> str:
"""Return the base url for the API"""
return f"http://{self.host}:{self.port}"


def api_config_for_queue(queue_name: str) -> APIConfig:
"""Configuration for an API associated with a queue"""
config_key = f"{queue_name}_api"

if config_key not in cli_config:
msg = (
f"Cannot update the rate for {queue_name}. {config_key} was"
f" not specified in the configuration"
)
raise ValueError(msg)

return APIConfig(cli_config[config_key])
Loading
Loading