TIMX 404 - Establish feature flag branching for parquet work #211
Why these changes are being introduced:
We will be using a feature flag approach while modifying Transmogrifier to output transformed files to a parquet dataset instead of JSON and TXT files. We are doing so with the goal of full backwards compatibility until that refactor is complete.

How this addresses that need:
* Utilizes a new optional env var 'ETL_VERSION' to provide an ETL version for code to branch from.
* The Transformer.__next__() method calls dedicated sub-methods depending on the version.
* Functional stubs have been added for v2 behavior, but they are mostly designed for envisioning the feature flag branching and will get their own pass for completeness and testing.
* The CLI command is updated to call new Transformer methods based on the ETL version.
* All feature flag logic branching is noted by comments, suitable for removal when the development work is complete.

Side effects of this change:
* Transmogrifier remains fully backwards compatible, either via the absence of env var 'ETL_VERSION' or if the value equals '1'.

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/TIMX-404

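The env-var-driven branching described above might look roughly like the following minimal sketch. It assumes the version is read once when a `Transformer` is created and that the version-specific sub-methods are called `_etl_v1_next()` and `_etl_v2_next()`; those names, the dict-shaped records, and the simplified `Transformer` class are hypothetical illustrations, not Transmogrifier's actual code.

```python
import os
from typing import Iterator


def get_etl_version() -> int:
    """Read the optional ETL_VERSION env var; absent means the legacy v1 path."""
    version = int(os.getenv("ETL_VERSION", "1"))
    if version not in (1, 2):
        raise ValueError(f"Unsupported ETL_VERSION: {version}")
    return version


class Transformer:
    """Hypothetical, heavily simplified stand-in for Transmogrifier's Transformer."""

    def __init__(self, source_records: Iterator[dict]) -> None:
        self.source_records = source_records
        # NOTE: FEATURE FLAG: version is read once per Transformer instance
        self.etl_version = get_etl_version()

    def __iter__(self) -> "Transformer":
        return self

    def __next__(self) -> dict:
        # NOTE: FEATURE FLAG: dispatch to a version-specific sub-method
        if self.etl_version == 2:
            return self._etl_v2_next()
        return self._etl_v1_next()

    def _etl_v1_next(self) -> dict:
        # v1 (legacy) pathway; JSON/TXT writing is out of scope for this sketch.
        # When the source is exhausted, StopIteration propagates and ends iteration.
        record = next(self.source_records)
        return {**record, "etl_version": 1}

    def _etl_v2_next(self) -> dict:
        # v2 (stub) pathway; parquet writing is out of scope for this sketch
        record = next(self.source_records)
        return {**record, "etl_version": 2}
```

Reading the version once in `__init__` means a single `Transformer` cannot switch pathways partway through a run, while an unset `ETL_VERSION` (or `1`) keeps the legacy behavior.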
Why these changes are being introduced:
We do not need to heavily test the feature flagging logic, knowing it will be removed after development work. But a bit of testing can ensure that it works as expected while temporarily in a deployed state.

How this addresses that need:
* Adds new test file tests/test_temporary_feature_flagging.py that is noted to be removed after development work is complete.

Side effects of this change:
* None

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/TIMX-404
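Tests of this kind could look like the sketch below. It uses only the standard library (`unittest` and `unittest.mock.patch.dict`) so it runs on its own and is discoverable by `python -m unittest` or pytest; the real tests/test_temporary_feature_flagging.py may well use pytest fixtures instead. The `Transformer` class and its `_etl_v1_next()` / `_etl_v2_next()` methods are hypothetical stand-ins that only model the branching.

```python
import os
import unittest
from unittest import mock


def get_etl_version() -> int:
    # Minimal copy of a hypothetical helper: an absent env var means v1
    return int(os.getenv("ETL_VERSION", "1"))


class Transformer:
    """Hypothetical stand-in; only the v1/v2 branching matters here."""

    def __next__(self) -> str:
        # NOTE: FEATURE FLAG: remove together with the v1/v2 branching
        if get_etl_version() == 2:
            return self._etl_v2_next()
        return self._etl_v1_next()

    def _etl_v1_next(self) -> str:
        return "v1"

    def _etl_v2_next(self) -> str:
        return "v2"


class TestTemporaryFeatureFlagging(unittest.TestCase):
    def test_no_env_var_uses_v1(self):
        # clear=True temporarily empties os.environ, simulating an unset var
        with mock.patch.dict(os.environ, {}, clear=True):
            self.assertEqual(next(Transformer()), "v1")

    def test_env_var_1_uses_v1(self):
        with mock.patch.dict(os.environ, {"ETL_VERSION": "1"}):
            self.assertEqual(next(Transformer()), "v1")

    def test_env_var_2_uses_v2(self):
        with mock.patch.dict(os.environ, {"ETL_VERSION": "2"}):
            self.assertEqual(next(Transformer()), "v2")

    def test_v2_sub_method_is_called(self):
        # Replace the v2 sub-method with a mock to confirm it is the one invoked
        with mock.patch.dict(os.environ, {"ETL_VERSION": "2"}):
            with mock.patch.object(
                Transformer, "_etl_v2_next", return_value="stub"
            ) as v2_next:
                self.assertEqual(next(Transformer()), "stub")
                v2_next.assert_called_once()
```

Patching the environment per test keeps each case isolated, so the suite does not depend on whatever `ETL_VERSION` happens to be set in the shell.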
@dataclass
class ETLRecord:
In retrospect, probably could have / should have marked this as "WIP" as well. My hunch is that this approach could work well, but for the sake of this PR it's still a bit of a stub.
Looks good to me! Optional recommendation to alphabetize the dependencies, but that is not urgent at all.
Pipfile
Might consider alphabetizing the dependencies.
"ANN101",
"ANN102",
`ruff` was updated to v0.8, and this warning came up:
warning: The following rules have been removed and ignoring them has no effect:
- ANN101
- ANN102
These rules were removed from the ignore list, with no issues.
Successful run in Dev1, where the
Purpose and background context
We will be using a feature flag approach while modifying Transmogrifier to output transformed files to a parquet dataset instead of JSON and TXT files.
We are doing so with the goal of full backwards compatibility until that refactor is complete.
How this addresses that need:
* Utilizes a new optional env var `ETL_VERSION` to provide an ETL version for code to branch from.
* The `Transformer.__next__()` method calls dedicated sub-methods depending on the version.
* All feature flag logic branching is noted by comments `# NOTE: FEATURE FLAG: <details...>`, suitable for removal when the development work is complete.
* A small suite of tests has been added for the feature flagging logic, but it is intended to exist only as long as that logic branching is in place. The tests are not exhaustive; they are mostly designed to ensure that the v1 or v2 code pathways are invoked based on the environment.
How can a reviewer manually see the effects of these changes?
1- Run `make install` and set AWS Dev credentials in the console.
2- Create some folders in the scratch `output` folder.
3- Perform a "v1" transform and observe the normal JSON and TXT files output, where the `ETL_VERSION` env var is not set (or equals `1`):
Set env var in `.env` (would also work if omitted):
Run transform:
4- Perform a "v2" transform and observe that a parquet dataset is created and written to:
Set env var in `.env`:
Run Transmog:
Includes new or updated dependencies?
YES
Changes expectations for external applications?
NO: Transmogrifier remains fully backwards compatible, either via the absence of env var `ETL_VERSION` or if the value equals `1`.
What are the relevant tickets?
Developer
Code Reviewer(s)