Commit
Deduplicate list fields in TIMDEX record
Why these changes are being introduced:
* Improve data quality of TIMDEX records by reducing duplication of data in list fields.

How this addresses that need:
* Create an attrs converter function to dedupe list of items
* Create ListFields abstract class with hash method
* Set hash methods in custom classes to ListFields.__hash__
* Set 'converter=dedupe' for every list field in TimdexRecord
* Add unit tests verifying deduplication of list fields

Side effects of this change:
* Deduplication is highly likely to result in diffs when comparing transformed records before and after this change. However (and more importantly), reducing duplicates improves the data quality of TIMDEX records.

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/TIMX-332
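
The approach described above can be illustrated with a minimal sketch. This is not the code from the commit: it substitutes the standard-library `dataclasses` module for attrs so that it runs without dependencies, and it applies `dedupe` in `__post_init__` where the commit describes passing `converter=dedupe` to each list field. The `Subject` and `Record` classes and their fields are hypothetical stand-ins for the custom classes and for TimdexRecord.

```python
from __future__ import annotations

from dataclasses import dataclass, fields
from typing import Any, Optional


def dedupe(items: Any) -> Any:
    """Drop duplicate items from a list, keeping first-seen order."""
    if not isinstance(items, list):
        return items  # pass through non-lists, e.g. None for an unset field
    return list(dict.fromkeys(items))  # dict keys are unique and ordered


class ListFields:
    """Base class whose hash is derived from the instance's field values."""

    def __hash__(self) -> int:
        # Lists are unhashable, so convert list-valued fields to tuples first.
        values = tuple(
            tuple(value) if isinstance(value, list) else value
            for value in (getattr(self, f.name) for f in fields(self))
        )
        return hash(values)


@dataclass
class Subject(ListFields):  # hypothetical list-item class
    value: list[str]
    kind: Optional[str] = None

    # A generated __eq__ would otherwise leave the class unhashable,
    # so the hash method is set explicitly, as the commit describes.
    __hash__ = ListFields.__hash__


@dataclass
class Record:  # hypothetical stand-in for TimdexRecord
    subjects: Optional[list[Subject]] = None

    def __post_init__(self) -> None:
        # Assumption: stands in for attrs' converter=dedupe on the field.
        self.subjects = dedupe(self.subjects)


record = Record(
    subjects=[
        Subject(["Maps"], "lcsh"),
        Subject(["Maps"], "lcsh"),  # duplicate of the first item
        Subject(["Atlases"], "lcsh"),
    ]
)
print(len(record.subjects))
```

Using `dict.fromkeys` keeps the first occurrence of each item and preserves order, which a plain `set` would not. It requires every item to be hashable, which is why the item classes need a value-based `__hash__`.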
1 parent 1910426, commit 2871769
Showing 2 changed files with 340 additions and 41 deletions.