GTC-2631 Update datapump for GADM 4.1 #170

Merged
merged 7 commits into from
Jan 17, 2025

Conversation

manukala6
Member

@manukala6 manukala6 commented Jan 8, 2025

Pull request checklist

Please check if your PR fulfills the following requirements:

  • Make sure you are requesting to pull a topic/feature/bugfix branch (right side). Don't request your master!
  • Make sure you are making a pull request against the develop branch (left side). Also you should start your branch off our develop.
  • Check that the commit message style (for one commit or all of them) matches our requested structure.
  • Check that your code additions fail neither linting checks nor unit tests.

Pull request type

Please check the type of change your PR introduces:

  • Bugfix
  • Feature
  • Code style update (formatting, renaming)
  • Refactoring (no functional changes, no api changes)
  • Build related changes
  • Documentation content changes
  • Other (please describe):

What is the current behavior?

Issue Number: GTC-2631

What is the new behavior?

  • Uses new GADM features and GDAL bootstrap

Does this introduce a breaking change?

  • Yes
  • No

Other information

@@ -834,6 +833,15 @@ def _run_job_flow(self, name, instances, steps, applications, configurations):
        if GLOBALS.emr_service_role:
            request["ServiceRole"] = GLOBALS.emr_service_role

        # If using version 2.4.1 or earlier, use older GDAL version
        if self.geotrellis_version < "2.4.1":
Member

Not sure what the type of self.geotrellis_version is, but you're comparing it to a string here. Is that going to work?

Member

I think it is a string
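
On the string comparison question above: if `self.geotrellis_version` is a plain string, `<` compares it character by character. That gives the intended answer for values such as "2.4.0", but it would misorder a version with a multi-digit component such as "2.10.0". A minimal sketch of a numeric comparison, assuming versions are dot-separated integers (the helper name `version_tuple` is hypothetical and not part of the datapump code):

```python
from __future__ import annotations


def version_tuple(version: str) -> tuple[int, ...]:
    """Parse a dot-separated version such as "2.4.1" into (2, 4, 1)."""
    return tuple(int(part) for part in version.split("."))


# Lexicographic string comparison: "1" sorts before "4", so this is True
# even though 2.10.0 is a later release than 2.4.1.
string_result = "2.10.0" < "2.4.1"

# Comparing the parsed integer components gives the expected ordering.
tuple_result = version_tuple("2.10.0") < version_tuple("2.4.1")

print(string_result)  # True
print(tuple_result)   # False
```

If the project already depends on the `packaging` library, `packaging.version.Version` would be another option that also handles pre-release suffixes.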

@@ -52,6 +52,8 @@ def get_1x1_asset(self, dataset: str, version: str) -> str:
            )
        elif dataset == "gadm" and version == "v3.6":
            return "s3://gfw-files/2018_update/tsv/gadm36_adm2_1_1.csv"
        elif dataset == "gadm" and version == "v4.1":
Member

Is it here because you needed to make manual changes to the 1x1 output from the API?

Member Author

Yes, it needed some minor adjustments like column names

@@ -821,8 +821,7 @@ def _run_job_flow(self, name, instances, steps, applications, configurations):
         {
             "Name": "Install GDAL",
             "ScriptBootstrapAction": {
-                "Path": f"s3://{GLOBALS.s3_bucket_pipeline}/geotrellis/bootstrap/gdal.sh",
-                "Args": ["3.1.2"],
+                "Path": f"s3://{GLOBALS.s3_bucket_pipeline}/geotrellis/bootstrap/gdal-3.8.3.sh"
Member

Small nitpick, but you could reduce the lines of code by just having a block before this like:

bootstrap_path = f"s3://{GLOBALS.s3_bucket_pipeline}/geotrellis/bootstrap/gdal-3.8.3.sh"
if self.geotrellis_version < "2.4.1":
    bootstrap_path = f"s3://{GLOBALS.s3_bucket_pipeline}/geotrellis/bootstrap/gdal.sh"

And then this line would just be changed to:

"Path": bootstrap_path
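
The same idea could also be pulled into a small helper so the bootstrap action is built in one place. This is only a sketch, not the code that was merged: the function name `gdal_bootstrap_action`, the `bucket` parameter, and the choice to keep passing `"3.1.2"` to the older `gdal.sh` script are all assumptions.

```python
def _version_tuple(version: str) -> tuple:
    """Parse "2.4.1" into (2, 4, 1) so components compare numerically."""
    return tuple(int(part) for part in version.split("."))


def gdal_bootstrap_action(bucket: str, geotrellis_version: str) -> dict:
    """Build the "Install GDAL" bootstrap action for an EMR job flow request."""
    base = f"s3://{bucket}/geotrellis/bootstrap"
    if _version_tuple(geotrellis_version) < (2, 4, 1):
        # Assumption: releases before 2.4.1 keep the original script,
        # which took the GDAL version as an argument.
        script = {"Path": f"{base}/gdal.sh", "Args": ["3.1.2"]}
    else:
        script = {"Path": f"{base}/gdal-3.8.3.sh"}
    return {"Name": "Install GDAL", "ScriptBootstrapAction": script}


# "example-bucket" is a placeholder, not a real pipeline bucket.
action = gdal_bootstrap_action("example-bucket", "2.4.1")
print(action["ScriptBootstrapAction"]["Path"])
```

Keeping the branch inside one function means the version check and the script path cannot drift apart in separate parts of `_run_job_flow`.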

@@ -52,6 +52,8 @@ def get_1x1_asset(self, dataset: str, version: str) -> str:
            )
        elif dataset == "gadm" and version == "v3.6":
            return "s3://gfw-files/2018_update/tsv/gadm36_adm2_1_1.csv"
        elif dataset == "gadm" and version == "v4.1":
            return "s3://gfw-pipelines/geotrellis/features/gadm41_adm2_1x1.tsv"
Contributor

I might have missed the discussion about this, but do we know any specific reason why gadm41_adm2_1x1.tsv is 2.5 times bigger than the previous gadm36_adm2_1_1.csv (4.7 GB vs 1.9 GB)?

Member Author

Yes, I discovered that gadm36_adm2_1_1.csv actually used 10x10 degree tiles, whereas gadm41_adm2_1x1.tsv is tiled by 1x1 degrees.
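
To give a rough sense of the difference in grid resolution, the sketch below counts how many grid cells a bounding box overlaps under 10x10 and 1x1 degree tiling. It only illustrates cell counts: the `cells_overlapped` helper and the example bounding box are hypothetical, and it does not reproduce how either file was generated or explain their exact sizes.

```python
import math


def cells_overlapped(min_lon, min_lat, max_lon, max_lat, cell_deg):
    """Count grid cells of size cell_deg (degrees) that a bounding box overlaps."""
    cols = math.ceil(max_lon / cell_deg) - math.floor(min_lon / cell_deg)
    rows = math.ceil(max_lat / cell_deg) - math.floor(min_lat / cell_deg)
    return cols * rows


# Hypothetical bounding box: (min_lon, min_lat, max_lon, max_lat).
bbox = (-74.0, -34.0, -34.0, 6.0)

coarse = cells_overlapped(*bbox, cell_deg=10)
fine = cells_overlapped(*bbox, cell_deg=1)

print(coarse)  # 25
print(fine)    # 1600
```

A feature that fits inside a handful of 10x10 cells can be cut into many more pieces on a 1x1 grid, which is consistent with the finer-tiled file having more rows.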

Contributor

@danscales danscales left a comment

Seems good. Just wondering why gadm41_adm2_1x1.tsv is so much larger (2.5 times larger than the 3.6 version). Are there that many more geographical areas or ADM2 units in GADM 4.1 than in GADM 3.6?

)

# If using version 2.4.1 or earlier, use older GDAL version
Contributor

Comment is slightly wrong. Should be "If using version 2.4.0 or earlier, use older GDAL version"

(Change 2.4.1 to 2.4.0)

@manukala6 manukala6 merged commit 6fa990e into develop Jan 17, 2025
3 checks passed
@manukala6 manukala6 deleted the gadm41update branch January 17, 2025 18:03
4 participants