Timx 412 establish feature flagging pathways #313

Merged: jonavellecuerdo merged 3 commits into main from TIMX-412-establish-feature-flagging-pathways on Dec 9, 2024

Conversation

jonavellecuerdo

@jonavellecuerdo jonavellecuerdo commented Dec 3, 2024

Purpose and background context

We will be using a feature flag approach while modifying TIMDEX Pipeline Lambdas to generate the required commands for the Transform step to output files to a parquet dataset and for the Load step to read records from a parquet dataset and index them into TIMDEX.

We are doing so with the goal of full backwards compatibility until that refactor is complete.

How can a reviewer manually see the effects of these changes?

The instructions for review involve running the Lambda function locally using Docker.

Prerequisite: Build a Docker image by running make dist-dev.


Generate commands when ETL_VERSION=1

a. Launch a Docker container using the following command, setting ETL_VERSION=1 and AWSAdministratorAccess credentials for Dev1.

docker run \
-e TIMDEX_ALMA_EXPORT_BUCKET_ID=alma-bucket-name \
-e TIMDEX_S3_EXTRACT_BUCKET_ID=timdex-extract-dev-222053980223 \
-e WORKSPACE=dev \
-e ETL_VERSION=1 \
-e AWS_ACCESS_KEY_ID=<aws_access_key_id> \
-e AWS_SECRET_ACCESS_KEY=<aws_secret_access_key> \
-e AWS_SESSION_TOKEN=<aws_session_token> \
-p 9000:8080 timdex-pipeline-lambdas-dev:latest

b. Generate Transform commands. In a second terminal, run the following command to POST to the container.

curl -XPOST "http://localhost:9000/2015-03-31/functions/function/invocations" -d '{
  "run-date": "2024-11-22",
  "run-type": "full",
  "source": "libguides",
  "verbose": false,
  "harvester-type": "oai",
  "next-step": "transform",
  "extract": {
    "extract-command": [
      "--host=https://libguides.mit.edu/oai.php",
      "--output-file=s3://timdex-extract-dev-222053980223/libguides/libguides-2024-11-22-full-extracted-records-to-index.xml",
      "harvest",
      "--metadata-format=oai_dc",
      "--exclude-deleted",
      "--set-spec=guides"
    ]
  }
}'

You should get the following output:

{"run-date": "2024-11-22", "run-type": "full", "source": "libguides", "verbose": false, "next-step": "load", "transform": {"files-to-transform": [{"transform-command": ["--input-file=s3://timdex-extract-dev-222053980223/libguides/libguides-2024-11-22-full-extracted-records-to-index.xml", "--output-file=s3://timdex-extract-dev-222053980223/libguides/libguides-2024-11-22-full-transformed-records-to-index.json", "--source=libguides"]}]}}

c. Generate Load commands. In a second terminal, run the following command to POST to the container.

curl -XPOST "http://localhost:9000/2015-03-31/functions/function/invocations" -d '{
  "run-date": "2024-11-22",
  "run-type": "full",
  "source": "libguides",
  "verbose": false,
  "next-step": "load",
  "transform": {
    "files-to-transform": [
      {
        "transform-command": [
          "--input-file=s3://timdex-extract-dev-222053980223/libguides/libguides-2024-11-22-full-extracted-records-to-index.xml",
          "--output-file=s3://timdex-extract-dev-222053980223/libguides/libguides-2024-11-22-full-transformed-records-to-index.json",
          "--source=libguides"
        ]
      }
    ]
  }
}'

You should get output like the following (the index name appears to include a timestamp, so yours will likely differ):

{"run-date": "2024-11-22", "run-type": "full", "source": "libguides", "verbose": false, "load": {"create-index-command": ["create", "--index", "libguides-2024-12-03t20-41-46"], "files-to-index": [{"load-command": ["bulk-index", "--index", "libguides-2024-12-03t20-41-46", "s3://timdex-extract-dev-222053980223/libguides/libguides-2024-11-22-full-transformed-records-to-index.json"]}], "promote-index-command": ["promote", "--index", "libguides-2024-12-03t20-41-46"]}}

d. Exit the container.


Generate commands when ETL_VERSION=2

a. Launch a Docker container using the following command, setting ETL_VERSION=2 and AWSAdministratorAccess credentials for Dev1.

docker run \
-e TIMDEX_ALMA_EXPORT_BUCKET_ID=alma-bucket-name \
-e TIMDEX_S3_EXTRACT_BUCKET_ID=timdex-extract-dev-222053980223 \
-e WORKSPACE=dev \
-e ETL_VERSION=2 \
-e AWS_ACCESS_KEY_ID=<aws_access_key_id> \
-e AWS_SECRET_ACCESS_KEY=<aws_secret_access_key> \
-e AWS_SESSION_TOKEN=<aws_session_token> \
-p 9000:8080 timdex-pipeline-lambdas-dev:latest

b. Generate Transform commands. In a second terminal, run the following command to POST to the container.

curl -XPOST "http://localhost:9000/2015-03-31/functions/function/invocations" -d '{
  "run-date": "2024-11-22",
  "run-type": "full",
  "source": "libguides",
  "verbose": false,
  "harvester-type": "oai",
  "next-step": "transform",
  "extract": {
    "extract-command": [
      "--host=https://libguides.mit.edu/oai.php",
      "--output-file=s3://timdex-extract-dev-222053980223/libguides/libguides-2024-11-22-full-extracted-records-to-index.xml",
      "harvest",
      "--metadata-format=oai_dc",
      "--exclude-deleted",
      "--set-spec=guides"
    ]
  }
}'

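You should get the following output. The NotImplementedError is expected, since the v2 code path is only a placeholder at this stage (your requestId will differ):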

{"errorMessage": "", "errorType": "NotImplementedError", "requestId": "ceff53c0-991c-4fd5-b3c4-5675cf86e13e", "stackTrace": ["  File \"/var/task/lambdas/format_input.py\", line 73, in lambda_handler\n    result[\"transform\"] = commands.generate_transform_commands(\n", "  File \"/var/task/lambdas/commands.py\", line 86, in generate_transform_commands\n    return _etl_v2_generate_transform_commands_method()\n", "  File \"/var/task/lambdas/commands.py\", line 119, in _etl_v2_generate_transform_commands_method\n    raise NotImplementedError\n"]}

c. Generate Load commands. In a second terminal, run the following command to POST to the container.

curl -XPOST "http://localhost:9000/2015-03-31/functions/function/invocations" -d '{
  "run-date": "2024-11-22",
  "run-type": "full",
  "source": "libguides",
  "verbose": false,
  "next-step": "load",
  "transform": {
    "files-to-transform": [
      {
        "transform-command": [
          "--input-file=s3://timdex-extract-dev-222053980223/libguides/libguides-2024-11-22-full-extracted-records-to-index.xml",
          "--output-file=s3://timdex-extract-dev-222053980223/libguides/libguides-2024-11-22-full-transformed-records-to-index.json",
          "--source=libguides"
        ]
      }
    ]
  }
}'

You should again get a NotImplementedError, this time raised from the v2 Load path (your requestId will differ):

{"errorMessage": "", "errorType": "NotImplementedError", "requestId": "13a29e15-64f3-4596-b8a4-f0ea4ba41ddf", "stackTrace": ["  File \"/var/task/lambdas/format_input.py\", line 93, in lambda_handler\n    result[\"load\"] = commands.generate_load_commands(\n", "  File \"/var/task/lambdas/commands.py\", line 137, in generate_load_commands\n    return _etl_v2_generate_load_commands_method()\n", "  File \"/var/task/lambdas/commands.py\", line 213, in _etl_v2_generate_load_commands_method\n    raise NotImplementedError\n"]}

d. Exit the container.
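
For reviewers who prefer a script to curl, the POST steps above can be sketched in Python. This is a minimal sketch, not part of the PR: `invoke` and `summarize` are hypothetical helper names, and it assumes the container answers with a JSON body on the same local URL used in the curl commands.

```python
import json
import urllib.request

# Local invocation URL, taken from the curl commands above.
INVOKE_URL = "http://localhost:9000/2015-03-31/functions/function/invocations"

# Same Load payload as step (c) above.
LOAD_PAYLOAD = {
    "run-date": "2024-11-22",
    "run-type": "full",
    "source": "libguides",
    "verbose": False,
    "next-step": "load",
    "transform": {
        "files-to-transform": [
            {
                "transform-command": [
                    "--input-file=s3://timdex-extract-dev-222053980223/libguides/libguides-2024-11-22-full-extracted-records-to-index.xml",
                    "--output-file=s3://timdex-extract-dev-222053980223/libguides/libguides-2024-11-22-full-transformed-records-to-index.json",
                    "--source=libguides",
                ]
            }
        ]
    },
}


def invoke(payload: dict, url: str = INVOKE_URL) -> dict:
    """Hypothetical helper: POST a JSON payload to the local Lambda container."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


def summarize(result: dict) -> str:
    """Hypothetical helper: describe which kind of response came back."""
    if "errorType" in result:
        # With ETL_VERSION=2 this is expected to be NotImplementedError.
        return f"error: {result['errorType']}"
    for step in ("transform", "load"):
        if step in result:
            return f"{step} commands generated"
    return "no commands generated"
```

With a container running, `print(summarize(invoke(LOAD_PAYLOAD)))` should report generated load commands under ETL_VERSION=1 and a NotImplementedError under ETL_VERSION=2, matching the outputs listed above.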

Includes new or updated dependencies?

YES

Changes expectations for external applications?

NO: TIMDEX Pipeline Lambdas remains fully backwards compatible when the env var ETL_VERSION is either absent or set to 1.

What are the relevant tickets?

  • https://mitlibraries.atlassian.net/browse/TIMX-412

Developer

  • All new ENV is documented in README
  • All new ENV has been added to staging and production environments
  • All related Jira tickets are linked in commit message(s)
  • Stakeholder approval has been confirmed (or is not needed)

Code Reviewer(s)

  • The commit message is clear and follows our guidelines (not just this PR message)
  • There are appropriate tests covering any new functionality
  • The provided documentation is sufficient for understanding any new functionality introduced
  • Any manual tests have been performed and verified
  • New dependencies are appropriate or there were no changes

@jonavellecuerdo jonavellecuerdo self-assigned this Dec 3, 2024
@jonavellecuerdo jonavellecuerdo force-pushed the TIMX-412-establish-feature-flagging-pathways branch from bd3b8f4 to 3cd12ff Compare December 3, 2024 19:00
@coveralls

coveralls commented Dec 3, 2024

Pull Request Test Coverage Report for Build 12240426182

Details

  • 24 of 26 (92.31%) changed or added relevant lines in 2 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-0.6%) to 98.258%

Changes Missing Coverage | Covered Lines | Changed/Added Lines | %
lambdas/commands.py | 17 | 19 | 89.47%

Totals
Change from base Build 9008200399: -0.6%
Covered Lines: 282
Relevant Lines: 287

💛 - Coveralls

@jonavellecuerdo jonavellecuerdo marked this pull request as ready for review December 3, 2024 20:47

@ghukill ghukill left a comment

Overall, looking good! Feels thorough, and will support dropping in "v2" logic when we're ready.

My primary question is about when to pull the ETL_VERSION. It seems like if we do that in helper functions like generate_transform_commands() and generate_load_commands(), we could avoid some of the function signature changes.
Not blocking per se, as it's all temporary code, just curious about the thinking here.

@@ -51,6 +52,15 @@ def configure_logger(
)


# NOTE: FEATURE FLAG: function will be removed after v2 work is complete

The more we can follow this comment convention of # NOTE: FEATURE FLAG: ... the easier I think it'll be to find feature flag branching logic in code when we're ready to remove! Maybe/probably better ways, but hopeful this will work, thanks for implementing.

lambdas/format_input.py (outdated, resolved)
lambdas/format_input.py (outdated, resolved)
tests/test_commands.py (outdated, resolved)

@ehanson8 ehanson8 left a comment

Looks good to me: well-marked code that should be easy to remove, and I agree on the minimal but sufficient testing approach given that this is temporary. Deferring to @ghukill for approval, so feel free to merge after he approves.

@ghukill ghukill left a comment

Looks great! Nice work.

Why these changes are being introduced:
* We will be using a feature flag approach while modifying
TIMDEX Pipeline Lambdas to generate the required commands
for the Transform step to output files to a parquet dataset
and for the Load step to read records from a parquet dataset
and index them into TIMDEX.

We are doing so with the goal of full backwards compatibility until
that refactor is complete.

How this addresses that need:
* Utilizes a new optional env var 'ETL_VERSION' to provide an ETL version
for code to branch from.
* Format Input Lambda function handler is updated to provide an ETL version
to the functions that generate commands for the Transform and Load step.
* Add branching logic and submethods to the functions that generate commands
for the Transform and Load step.
* All feature flag logic branching is noted by comments, suitable for
removal when the development work is complete.
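
The branching described above might look roughly like the following. This is a minimal sketch under stated assumptions, not the PR's actual code: `generate_transform_commands` and `_etl_v2_generate_transform_commands_method` appear in the stack traces in this PR, but `get_etl_version`, the `_etl_v1_...` name, the function signatures, and the simplified v1 body are all hypothetical.

```python
import os


def get_etl_version() -> int:
    """Hypothetical config helper: read ETL_VERSION, defaulting to 1 when unset."""
    etl_version = os.getenv("ETL_VERSION", "1")
    if etl_version not in ("1", "2"):
        raise ValueError(f"ETL_VERSION '{etl_version}' not supported")
    return int(etl_version)


# NOTE: FEATURE FLAG: branching logic will be removed after v2 work is complete
def generate_transform_commands(extract_output_files: list[str], etl_version: int) -> dict:
    """Dispatch to the v1 or v2 command builder (signature is an assumption)."""
    if etl_version == 1:
        return _etl_v1_generate_transform_commands_method(extract_output_files)
    return _etl_v2_generate_transform_commands_method()


# NOTE: FEATURE FLAG: simplified stand-in for the existing v1 behavior
def _etl_v1_generate_transform_commands_method(extract_output_files: list[str]) -> dict:
    return {
        "files-to-transform": [
            {"transform-command": [f"--input-file={extract_file}"]}
            for extract_file in extract_output_files
        ]
    }


# NOTE: FEATURE FLAG: placeholder until the parquet dataset work lands
def _etl_v2_generate_transform_commands_method() -> dict:
    raise NotImplementedError
```

Passing the version in as an argument keeps the environment lookup in one place, at the cost of the signature changes raised in review; reading the env var inside the helper functions would be the alternative.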

Side effects of this change:
* TIMDEX Pipeline Lambdas remains fully backwards compatible, either via
the absence of env var 'ETL_VERSION' or if the value equals '1'.

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/TIMX-412

Why these changes are being introduced:
* We do not need to heavily test the feature flagging logic,
knowing it will be removed after development work, but a bit
of testing can ensure that it works as expected while it is
temporarily in a deployed state.

How this addresses that need:
* Adds new test file tests/test_temporary_feature_flagging.py that
is noted to be removed after development work is complete
* Test branching logic in Format Input handler using 'ETL_VERSION' env var
* Test config function for retrieving 'ETL_VERSION' env var
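
A test for the config function could be sketched as follows. This is an illustration only: `get_etl_version` is a stand-in defined inline (its name and default are assumptions), and the actual tests in tests/test_temporary_feature_flagging.py may be structured differently.

```python
import os
import unittest
from unittest import mock


def get_etl_version() -> int:
    """Stand-in for the real config function (name and default are assumptions)."""
    return int(os.getenv("ETL_VERSION", "1"))


# NOTE: FEATURE FLAG: test case to be removed after v2 work is complete
class TestTemporaryFeatureFlagging(unittest.TestCase):
    def test_defaults_to_v1_when_env_var_absent(self):
        # clear=True empties os.environ for the duration of the block
        with mock.patch.dict(os.environ, {}, clear=True):
            self.assertEqual(get_etl_version(), 1)

    def test_reads_v2_from_env_var(self):
        with mock.patch.dict(os.environ, {"ETL_VERSION": "2"}):
            self.assertEqual(get_etl_version(), 2)
```

`mock.patch.dict` restores the original environment when each block exits, so the tests do not leak ETL_VERSION into one another.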

Side effects of this change:
* None

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/TIMX-412
@jonavellecuerdo jonavellecuerdo force-pushed the TIMX-412-establish-feature-flagging-pathways branch from 9923300 to b859e80 Compare December 9, 2024 16:56
@jonavellecuerdo jonavellecuerdo merged commit 8e09243 into main Dec 9, 2024
3 checks passed
@jonavellecuerdo jonavellecuerdo deleted the TIMX-412-establish-feature-flagging-pathways branch December 9, 2024 16:58