Commit
Why these changes are being introduced:

Transmogrifier is being updated to write its output to a TIMDEX parquet dataset instead of standalone JSON and TXT files. The dataset writing itself is performed by the timdex-dataset-api library. These changes also make preliminary updates to the README and CLI arguments, moving toward parquet dataset writing as Transmogrifier's default behavior. The "v1" behavior of writing JSON and TXT files is still supported and will remain so until the parquet dataset work is finalized.

How this addresses that need:
* Installs timdex-dataset-api as an application dependency
  * Temporarily pinned to v0.2.0; the pin will be removed later
* Uses the timdex_dataset_api.TIMDEXDataset class to write transformed and source files to the dataset
* Changes the -o CLI option from an alias of --output-file to an alias of --output-location

Side effects of this change:
* None: with ETL_VERSION=1 (the default), behavior remains the same

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/TIMX-405
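The ETL_VERSION switch and the renamed -o option described in the commit message can be sketched as follows. This is a minimal, hypothetical illustration, not Transmogrifier's code: the function names `build_parser` and `resolve_output_mode`, the use of `argparse` (the real CLI may use another framework), and the value `"2"` selecting dataset output are all assumptions. The actual dataset write goes through `timdex_dataset_api.TIMDEXDataset`, whose API is not shown here.

```python
import argparse
import os
from typing import Mapping, Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser: -o is now an alias of --output-location."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "-o",
        "--output-location",  # argparse derives dest "output_location"
        required=True,
        help="Where output is written (files for v1, a parquet dataset for v2)",
    )
    return parser


def resolve_output_mode(environ: Optional[Mapping[str, str]] = None) -> str:
    """Map ETL_VERSION to an output mode (hypothetical helper).

    "1" (the default when unset) keeps the v1 JSON + TXT file output;
    "2" is an ASSUMED value selecting parquet dataset output.
    """
    env = os.environ if environ is None else environ
    version = env.get("ETL_VERSION", "1")
    if version == "1":
        return "files"
    if version == "2":
        return "dataset"
    raise ValueError(f"Unsupported ETL_VERSION: {version!r}")
```

For example, `build_parser().parse_args(["-o", "output/"]).output_location` evaluates to `"output/"`, and `resolve_output_mode({})` returns `"files"`, matching the unchanged default. Keeping the version check in one helper means the v1 path can be deleted in a single place once the parquet work is finalized.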
Showing 10 changed files with 636 additions and 443 deletions.
tests/fixtures/dataset/libguides-2024-06-03-full-extracted-records-to-index.xml
14 changes: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
<?xml version="1.0" encoding="UTF-8"?>
<records>
<!-- valid record to index -->
<record xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><header><identifier>oai:libguides.com:guides/175846</identifier><datestamp>2024-02-27T18:27:05Z</datestamp><setSpec>guides</setSpec></header><metadata><oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd"><dc:title>Materials Science &amp; Engineering</dc:title><dc:creator>Ye Li</dc:creator><dc:subject>Engineering</dc:subject><dc:subject>Science</dc:subject><dc:description>Useful databases and other research tips for materials science.</dc:description><dc:publisher>MIT Libraries</dc:publisher><dc:date>2008-06-19 17:55:27</dc:date><dc:identifier>https://libguides.mit.edu/materials</dc:identifier></oai_dc:dc></metadata></record>

<!-- deleted record -->
<record xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><header status="deleted"><identifier>oai:libguides.com:guides/175849</identifier><datestamp>2024-05-21T18:36:58Z</datestamp><setSpec>guides</setSpec></header><metadata><oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd"><dc:title>Country Data &amp; Analysis</dc:title><dc:creator>Nicholas Albaugh</dc:creator><dc:subject>Business &amp; management</dc:subject><dc:description>This is the subject guide for Country Data &amp; Analysis</dc:description><dc:publisher>MIT Libraries</dc:publisher><dc:date>2008-06-26 00:51:04</dc:date><dc:identifier>https://libguides.mit.edu/country</dc:identifier></oai_dc:dc></metadata></record>

<!-- skipped record -->
<record xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><header><identifier>oai:libguides.com:guides/175853</identifier><datestamp>2024-03-26T20:15:38Z</datestamp><setSpec>guides</setSpec></header><metadata><oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd"><dc:title>News, Newspapers, and Current Events</dc:title><dc:creator>Tina Chan</dc:creator><dc:subject>Interdisciplinary</dc:subject><dc:description>This is the subject guide for News</dc:description><dc:publisher>MIT Libraries</dc:publisher><dc:date>2008-06-26 21:29:54</dc:date><dc:identifier>https://libguides.mit.edu/news</dc:identifier></oai_dc:dc></metadata></record>

<!-- unhandled exception record -->
<record xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><header><identifier>oai:libguides.com:guides/175855</identifier><datestamp>2021-07-19T09:31:31Z</datestamp><setSpec>guides</setSpec></header><metadata><oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd"><dc:title>Biography</dc:title><dc:creator>Tina Chan</dc:creator><dc:subject>Interdisciplinary</dc:subject><dc:publisher>MIT Libraries</dc:publisher><dc:date>2008-06-26 22:05:13</dc:date><dc:identifier>https://libguides.mit.edu/biography</dc:identifier></oai_dc:dc></metadata></record>
</records>
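The fixture contains four OAI-PMH records, one for each outcome named in its comments. The minimal stdlib sketch below lists each record's identifier and flags deletions via the standard OAI-PMH `status="deleted"` header attribute. The function `summarize_records` is hypothetical, not part of Transmogrifier. How Transmogrifier decides which records are skipped or raise exceptions is not shown in this diff, so the sketch does not attempt to reproduce it.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple, Union

# Default namespace declared on each <record> element in the fixture.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def summarize_records(xml_text: Union[str, bytes]) -> List[Tuple[str, bool]]:
    """Return (identifier, is_deleted) for every OAI-PMH record (hypothetical helper).

    A record counts as deleted when its <header> carries status="deleted",
    the standard OAI-PMH marker. XML comments are dropped by the default parser.
    """
    if isinstance(xml_text, str):
        # Encode so the bytes match the UTF-8 XML declaration.
        xml_text = xml_text.encode("utf-8")
    root = ET.fromstring(xml_text)
    summary = []
    for record in root.iter(f"{OAI_NS}record"):
        header = record.find(f"{OAI_NS}header")
        if header is None:
            continue  # malformed record: no header to inspect
        identifier = header.findtext(f"{OAI_NS}identifier", default="")
        summary.append((identifier, header.get("status") == "deleted"))
    return summary
```

Applied to the fixture above (with its ampersands escaped as `&amp;`, as well-formed XML requires), it would return the identifiers ending in 175846, 175849, 175853, and 175855 in document order, with only 175849 marked as deleted.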