Commit
Deduplicate list fields in TIMDEX record (#206)
Why these changes are being introduced:
* Improve data quality of TIMDEX records by reducing duplication of data in list fields.

How this addresses that need:
* Create an attrs converter function to dedupe list of items
* Create hash method for TIMDEX objects
* Set hash methods in custom classes
* Set 'converter=dedupe' for every list field in TimdexRecord
* Add unit tests verifying functionality of hash and dedupe methods

Side effects of this change:
* Deduplication is highly likely to result in diffs when comparing transformed records before and after this change. However (and more importantly), reducing duplicates improves the data quality of TIMDEX records.

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/TIMX-332
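
The approach described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: it uses the standard library's dataclasses with `__post_init__` as a stand-in for attrs (where the same function would be passed as `field(converter=dedupe)`), and the names `object_hash`, `Contributor`, and `Record` are hypothetical. Only `dedupe` is named in the commit message, and its body here is an assumption.

```python
from dataclasses import dataclass, fields
from typing import Any, Optional


def dedupe(items: Optional[list]) -> Optional[list]:
    """Drop repeated items, keeping first-seen order; pass None through."""
    if items is None:
        return None
    # dict keys are unique and insertion-ordered, so items must be hashable.
    return list(dict.fromkeys(items))


def _hashable(value: Any) -> Any:
    """Convert lists (unhashable) into tuples, recursively."""
    if isinstance(value, list):
        return tuple(_hashable(v) for v in value)
    return value


def object_hash(obj: Any) -> int:
    """Hypothetical shared hash: class name plus every field value."""
    values = tuple(_hashable(getattr(obj, f.name)) for f in fields(obj))
    return hash((type(obj).__name__, values))


@dataclass
class Contributor:  # illustrative stand-in for a TIMDEX object class
    value: str
    kind: Optional[list] = None

    # Without an explicit __hash__, a mutable dataclass with eq=True is unhashable.
    __hash__ = object_hash


@dataclass
class Record:  # illustrative stand-in for TimdexRecord
    contributors: Optional[list] = None

    def __post_init__(self) -> None:
        # Stand-in for an attrs converter applied to each list field.
        self.contributors = dedupe(self.contributors)


record = Record(
    contributors=[
        Contributor("Smith, J.", ["author"]),
        Contributor("Smith, J.", ["author"]),  # duplicate, dropped on init
        Contributor("Lee, K."),
    ]
)
print(len(record.contributors))
```

Deduplicating through a dict rather than a set keeps the original order of the list, which limits the diffs between records transformed before and after the change to the removed duplicates themselves.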
1 parent 1910426, commit 395e612
Showing 2 changed files with 394 additions and 41 deletions.