@pabloarosado did a great job creating a new regions dataset that is structured like a typical dataset. This dataset will soon be used by grapher, and we'll finally have a single source of truth for all regions... except for `countries_regions.csv`. That file still lives in ETL and is used by numerous helper functions and datasets. It's starting to cause headaches because it's not 100% consistent with the regions dataset.
We should attempt to remove it from ETL if we don't encounter any major obstacles.
(I wasn't sure whether we already have an issue for this)
Potential issues
1. We need to declare `data://garden/regions/2023-01-01/regions` as a dependency of every step.
2. Adding an `alias` to `regions.yml` will trigger an update of every dataset that depends on it. That's quite wasteful.
A solution to both would be to make the regions dataset an implicit dependency of all steps and ignore its checksum. Any update to `regions.yml` would then have to be followed by a manual trigger of ETL. (We could give `regions.yml` an explicit version, e.g. `1.2.3`, and increment it whenever we manually update the file. That version would then be part of the checksum, just like the pandas version is.)
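A minimal sketch of the proposed scheme, assuming the DAG is a plain dict mapping each step URI to its list of dependency URIs, and that a step's checksum combines its own files with its dependencies' checksums. The helper names (`add_implicit_regions`, `step_checksum`), the dict-based DAG and the `regions_version` parameter are hypothetical illustrations, not the actual ETL API. Regions is added to every step for execution order only; its content is skipped when hashing, and the manually bumped version is mixed in instead.

```python
import hashlib
from typing import Dict, List

# Regions step URI, as named in this issue.
REGIONS_STEP = "data://garden/regions/2023-01-01/regions"


def add_implicit_regions(dag: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return a copy of the DAG where every step (except regions) depends on regions.

    Hypothetical helper: this only affects execution order, so regions is built first
    without every step having to declare it explicitly.
    """
    out: Dict[str, List[str]] = {}
    for step, deps in dag.items():
        deps = list(deps)  # copy so the input DAG is not mutated
        if step != REGIONS_STEP and REGIONS_STEP not in deps:
            deps.append(REGIONS_STEP)
        out[step] = deps
    return out


def step_checksum(
    step: str,
    dag: Dict[str, List[str]],
    files: Dict[str, bytes],
    regions_version: str,
) -> str:
    """Hash a step's own files plus the checksums of its dependencies.

    The regions step's content is ignored (implicit dependency); instead the
    hypothetical `regions_version` string is mixed in, like a library version.
    """
    h = hashlib.md5()
    h.update(files[step])
    for dep in sorted(dag.get(step, [])):
        if dep == REGIONS_STEP:
            continue  # ignore the regions checksum, e.g. a newly added alias
        h.update(step_checksum(dep, dag, files, regions_version).encode())
    h.update(f"regions_version={regions_version}".encode())
    return h.hexdigest()
```

With this, editing `regions.yml` alone leaves every dependent checksum unchanged, while bumping the version (e.g. from `1.2.3` to `1.2.4`) changes all of them and forces a rebuild.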