GTC-2631 Update datapump for GADM 4.1 #170
src/datapump/jobs/geotrellis.py
Outdated
@@ -834,6 +833,15 @@ def _run_job_flow(self, name, instances, steps, applications, configurations):
if GLOBALS.emr_service_role:
    request["ServiceRole"] = GLOBALS.emr_service_role

# If using version 2.4.1 or earlier, use older GDAL version
if self.geotrellis_version < "2.4.1":
Not sure what the type of self.geotrellis_version is, but you're comparing it to a string here. Is that going to work?
I think it is a string
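If the version really is a string, the comparison runs character by character, which can give a surprising answer once any component reaches two digits. A minimal sketch of a numeric comparison, assuming purely numeric dotted versions; `parse_version` and `uses_older_gdal` are hypothetical helpers, not part of datapump:

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted version such as "2.4.1" into a tuple of ints: (2, 4, 1)."""
    return tuple(int(part) for part in version.strip().lstrip("v").split("."))


def uses_older_gdal(geotrellis_version: str) -> bool:
    """True for versions strictly below 2.4.1, i.e. 2.4.0 or earlier."""
    return parse_version(geotrellis_version) < (2, 4, 1)


# String comparison is lexicographic, so "10" sorts before "4".
assert "2.10.0" < "2.4.1"
# Tuple comparison is numeric per component, so 2.10.0 counts as newer.
assert not uses_older_gdal("2.10.0")
assert uses_older_gdal("2.4.0")
assert not uses_older_gdal("2.4.1")
```

For the versions in use today the string comparison and the tuple comparison agree; the difference only shows up with a component of 10 or more.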
@@ -52,6 +52,8 @@ def get_1x1_asset(self, dataset: str, version: str) -> str:
)
elif dataset == "gadm" and version == "v3.6":
    return "s3://gfw-files/2018_update/tsv/gadm36_adm2_1_1.csv"
elif dataset == "gadm" and version == "v4.1":
Is it here because you needed to make manual changes to the 1x1 output from the API?
Yes, it needed some minor adjustments like column names
src/datapump/jobs/geotrellis.py
Outdated
@@ -821,8 +821,7 @@ def _run_job_flow(self, name, instances, steps, applications, configurations):
 {
     "Name": "Install GDAL",
     "ScriptBootstrapAction": {
-        "Path": f"s3://{GLOBALS.s3_bucket_pipeline}/geotrellis/bootstrap/gdal.sh",
-        "Args": ["3.1.2"],
+        "Path": f"s3://{GLOBALS.s3_bucket_pipeline}/geotrellis/bootstrap/gdal-3.8.3.sh"
Small nitpick, but you could reduce the lines of code by adding a block before this, like:
bootstrap_path = f"s3://{GLOBALS.s3_bucket_pipeline}/geotrellis/bootstrap/gdal-3.8.3.sh"
if self.geotrellis_version < "2.4.1":
    bootstrap_path = f"s3://{GLOBALS.s3_bucket_pipeline}/geotrellis/bootstrap/gdal.sh"
And then this line would just be changed to:
"Path": bootstrap_path
@@ -52,6 +52,8 @@ def get_1x1_asset(self, dataset: str, version: str) -> str:
)
elif dataset == "gadm" and version == "v3.6":
    return "s3://gfw-files/2018_update/tsv/gadm36_adm2_1_1.csv"
elif dataset == "gadm" and version == "v4.1":
    return "s3://gfw-pipelines/geotrellis/features/gadm41_adm2_1x1.tsv"
I might have missed the discussion about this, but do we know of any specific reason why gadm41_adm2_1x1.tsv is 2.5 times bigger than the previous gadm36_adm2_1_1.csv (4.7 GB vs 1.9 GB)?
Yes, I discovered that gadm36_adm2_1_1.csv actually used 10x10 degree tiles. gadm41_adm2_1x1.tsv is tiled by 1x1 degrees.
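To illustrate why finer tiling can inflate the file, here is a rough sketch that counts how many grid cells a bounding box overlaps at each tile size. It assumes each feature is written once per tile it intersects, which is an assumption about the file layout rather than something stated in this thread; `tiles_covering` and the sample bounding box are hypothetical.

```python
import math


def tiles_covering(min_lon: float, min_lat: float,
                   max_lon: float, max_lat: float,
                   tile_deg: float) -> int:
    """Count grid cells of size tile_deg (in degrees) that a bounding box overlaps."""
    cols = math.ceil(max_lon / tile_deg) - math.floor(min_lon / tile_deg)
    rows = math.ceil(max_lat / tile_deg) - math.floor(min_lat / tile_deg)
    return cols * rows


# Hypothetical bounding box for one administrative unit.
bbox = (-3.5, 40.2, 1.7, 43.9)

coarse = tiles_covering(*bbox, tile_deg=10.0)  # 10x10 degree grid
fine = tiles_covering(*bbox, tile_deg=1.0)     # 1x1 degree grid
print(coarse, fine)
```

Under that assumption, the same unit yields many more rows on the 1x1 grid, each repeating its attribute columns, so a larger file does not by itself imply more ADM2 units.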
Seems good. Just wondering why gadm41_adm2_1x1.tsv is so much larger (2.5 times larger than the 3.6 version). Were there that many more geographical areas or ADM2 units in GADM 4.1 vs GADM 3.6?
)

# If using version 2.4.1 or earlier, use older GDAL version
Comment is slightly wrong. Should be "If using version 2.4.0 or earlier, use older GDAL version"
(Change 2.4.1 to 2.4.0)
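Combining the two suggestions in this thread (pick the bootstrap script once, and word the comment as "2.4.0 or earlier"), the selection could look roughly like the sketch below. `gdal_bootstrap_action` is a hypothetical helper, the version is passed as a tuple of ints for simplicity, and it is an assumption that the older `gdal.sh` still takes `"3.1.2"` as its argument while `gdal-3.8.3.sh` takes none.

```python
from typing import Any, Dict, Tuple


def gdal_bootstrap_action(bucket: str, geotrellis_version: Tuple[int, ...]) -> Dict[str, Any]:
    """Build the "Install GDAL" bootstrap action for an EMR job flow request."""
    base = f"s3://{bucket}/geotrellis/bootstrap"

    # If using version 2.4.0 or earlier, use the older GDAL version.
    if geotrellis_version < (2, 4, 1):
        # Assumption: the older script receives the GDAL version as an argument.
        script: Dict[str, Any] = {"Path": f"{base}/gdal.sh", "Args": ["3.1.2"]}
    else:
        # Assumption: the newer script has the GDAL version baked in.
        script = {"Path": f"{base}/gdal-3.8.3.sh"}

    return {"Name": "Install GDAL", "ScriptBootstrapAction": script}


# Example with a placeholder bucket name.
print(gdal_bootstrap_action("example-bucket", (2, 4, 0)))
print(gdal_bootstrap_action("example-bucket", (2, 4, 1)))
```

Keeping the branch in one helper means the version boundary and its comment live in a single place.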
What is the current behavior?
Issue Number: GTC-2631