Refactor DatasetRecord to use attrs
Why these changes are being introduced:
* Reworking the dataset partitions to use the [year, month, day]
of the 'run_date' means that parquet files from different 'source' runs
on the same 'run_date' are written to the same partition directory.
It is therefore crucial that the timdex_dataset_api.write method
retrieves the correct partition columns from the batches of DatasetRecord
objects. The DatasetRecord class has been refactored to meet
the following criteria:

1. When writing to the dataset, and therefore serializing DatasetRecord objects,
   year, month, day should be derived from the run_date and should not be modifiable
2. If possible, avoid parsing the 'run_date' string three separate times,
   once for each partition column

How this addresses that need:
* Refactor DatasetRecord to use attrs
* Define a custom strict_date_parse converter for the 'run_date' field
* Simplify the serialization method to rely on the converter for
  'run_date' error handling
* Remove DatasetRecord.validate
* Include attrs as a dependency

Side effects of this change:
* None

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/TIMX-432
jonavellecuerdo committed Dec 11, 2024
1 parent 7207d8e commit 5e532d3
Showing 5 changed files with 303 additions and 307 deletions.
8 changes: 4 additions & 4 deletions Pipfile
@@ -4,6 +4,7 @@ verify_ssl = true
 name = "pypi"
 
 [packages]
+attrs = "*"
 boto3 = "*"
 duckdb = "*"
 pandas = "*"
@@ -14,15 +15,14 @@ black = "*"
 boto3-stubs = {version = "*", extras = ["s3"]}
 coveralls = "*"
 ipython = "*"
+moto = "*"
 mypy = "*"
+pandas-stubs = "*"
 pre-commit = "*"
+pytest-mock = "*"
 pyarrow-stubs = "*"
 pytest = "*"
 ruff = "*"
 setuptools = "*"
-pandas-stubs = "*"
-moto = "*"
-pytest-mock = "*"
 
 [requires]
 python_version = "3.12"