Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TIMX 404 - Establish feature flag branching for parquet work #211

Merged
merged 3 commits into from
Nov 22, 2024

Conversation

ghukill
Copy link
Contributor

@ghukill ghukill commented Nov 22, 2024

Purpose and background context

We will be using a feature flag approach while modifying Transmogrifier to output transformed files to a parquet dataset instead of JSON and TXT files.

We are doing so with the goal of full backwards compatability until that refactor is complete.

How this addresses that need:

  • Utilizes a new optional env var ETL_VERSION to provide an ETL version for code to branch from.
  • The Transformer.__next__() method calls dedicated sub-methods depending on the version.
    • Functional stubs have been added for v2 behavior, but are mostly designed for envisioning the feature flag branching, and will have their own pass for completeness and testing.
  • CLI command is updated to call new Transformer methods based on the ETL version.
  • All feature flag logic branching is noted by comments, with the convention # NOTE: FEATURE FLAG: <details...>, suitable for removal when the development work is complete.

A small suite of tests have been added for feature flagging logic, but they are intended to exist only as long as that logic branching is in place. They are not exhaustive, mostly designed to ensure v1 or v2 code pathways are invoked based on the environment.

How can a reviewer manually see the effects of these changes?

1- Run make install and set AWS Dev credentials in console

2- Create some folders in scratch output folder

mkdir -p output/parquet/v1
mkdir -p output/parquet/v2

3- Perform a "v1" transform and observe normal JSON and TXT files output, where ETL_VERSION env var is not set (or equals 1):

Set env var in .env (would also work if omitted):

ETL_VERSION=1

Run transform:

pipenv run transform --verbose \
-s libguides \
-i s3://timdex-extract-dev-222053980223/libguides/libguides-2023-07-21-full-extracted-records-to-index.xml \
-o output/parquet/v1/libguides.json

3- Perform a "v2" transform and observe a parquet dataset is created and written to:

Set env var in .env:

ETL_VERSION=2

Run Transmog:

pipenv run transform --verbose \
-s libguides \
-i s3://timdex-extract-dev-222053980223/libguides/libguides-2023-07-21-full-extracted-records-to-index.xml \
-o output/parquet/v2/dataset

Includes new or updated dependencies?

YES

Changes expectations for external applications?

NO: Transmogrifier remains fully backwards compatible, either via the absence of env var ETL_VERSION or if the value equals 1

What are the relevant tickets?

Developer

  • All new ENV is documented in README
  • All new ENV has been added to staging and production environments
  • All related Jira tickets are linked in commit message(s)
  • Stakeholder approval has been confirmed (or is not needed)

Code Reviewer(s)

  • The commit message is clear and follows our guidelines (not just this PR message)
  • There are appropriate tests covering any new functionality
  • The provided documentation is sufficient for understanding any new functionality introduced
  • Any manual tests have been performed and verified
  • New dependencies are appropriate or there were no changes

Why these changes are being introduced:

We will be using a feature flag approach while modifying Transmogrifier
to output transformed files to a parquet dataset instead of JSON and TXT
files.

We are doing so with the goal of full backwards compatability until
that refactor is complete.

How this addresses that need:
* Utilizes a new optioanl env var 'ETL_VERSION' to provide an ETL version
for code to branch from.
* The Transformer.__next__() method calls dedicated sub-methods depending
on the version.
* Functional stubs have been added for v2 behavior, but are mostly designed
for envisioning the feature flag branching, and will have their own pass
for completeness and testing.
* CLI command is updated to call new Transformer methods based on the ETL
version.
* All feature flag logic branching is noted by comments, suitable for
removal when the development work is complete.

Side effects of this change:
* Transmogrifier remains fully backwards compatible, either via the
absence of env var 'ETL_VERSION' or if the value equals '1'.

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/TIMX-404
Why these changes are being introduced:

We do not need to heavily test the feature flagging logic,
knowing it will be removed after development work.  But a bit
of testing can ensure that it works as expected while termporarily
in a deployed state.

How this addresses that need:
* Adds new test file tests/test_temporary_feature_flagging.py that
is noted to be removed after development work is complete.

Side effects of this change:
* None

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/TIMX-404
@ghukill ghukill requested a review from ehanson8 November 22, 2024 15:08


@dataclass
class ETLRecord:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In retrospect, probably could have / should have marked this as "WIP" as well. My hunch is that this approach could work well, but for the sake of this PR it's still a bit of a stub.

Copy link
Contributor

@ehanson8 ehanson8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, optional recommendation to alphabetize dependencies but that is not urgent at all

Pipfile Outdated
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might consider alphabetizing the dependencies

Comment on lines -26 to -27
"ANN101",
"ANN102",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ruff was updated to v0.8, and this warning came up:

warning: The following rules have been removed and ignoring them has no effect:
    - ANN101
    - ANN102

These were removed, and no issue.

@ghukill ghukill merged commit d3a6018 into main Nov 22, 2024
3 checks passed
@ghukill
Copy link
Contributor Author

ghukill commented Nov 22, 2024

@ehanson8 ehanson8 deleted the TIMX-404-establish-feature-flagging-pathways branch November 25, 2024 16:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants