GTC-2631 Update datapump for GADM 4.1 #170
@@ -52,6 +52,8 @@ def get_1x1_asset(self, dataset: str, version: str) -> str:
     )
 elif dataset == "gadm" and version == "v3.6":
     return "s3://gfw-files/2018_update/tsv/gadm36_adm2_1_1.csv"
+elif dataset == "gadm" and version == "v4.1":
+    return "s3://gfw-pipelines/geotrellis/features/gadm41_adm2_1x1.tsv"
Review comment: I might have missed discussion of this, but do we know of any specific reason why gadm41_adm2_1x1.tsv is 2.5 times bigger than the previous gadm36_adm2_1_1.csv (4.7 GB vs 1.9 GB)?
Reply: Yes, I discovered that

 return self.get_asset(dataset, version, "1x1 grid")["asset_uri"]

@@ -821,8 +821,7 @@ def _run_job_flow(self, name, instances, steps, applications, configurations):
 {
     "Name": "Install GDAL",
     "ScriptBootstrapAction": {
-        "Path": f"s3://{GLOBALS.s3_bucket_pipeline}/geotrellis/bootstrap/gdal.sh",
-        "Args": ["3.1.2"],
+        "Path": f"s3://{GLOBALS.s3_bucket_pipeline}/geotrellis/bootstrap/gdal-3.8.3.sh"
Review comment: Small nitpick, but you could reduce the lines of code by having a block before this like:
And then this line would just be changed to:
     },
 },
 ],

@@ -834,6 +833,15 @@ def _run_job_flow(self, name, instances, steps, applications, configurations):
 if GLOBALS.emr_service_role:
     request["ServiceRole"] = GLOBALS.emr_service_role

+# If using version 2.4.1 or earlier, use older GDAL version
+if self.geotrellis_version < "2.4.1":
Review comment: I'm not sure what the type of self.geotrellis_version is, but you're comparing it to a string here. Is that going to work?
Reply: I think it is a string.
+    request["BootstrapActions"] = {
+        "Name": "Install GDAL",
+        "ScriptBootstrapAction": {
+            "Path": f"s3://{GLOBALS.s3_bucket_pipeline}/geotrellis/bootstrap/gdal.sh",
+        },
+    },
+
 LOGGER.info(f"Sending EMR request:\n{pformat(request)}")

 response = client.run_job_flow(**request)

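On the string comparison question in `_run_job_flow`: comparing two strings with `<` runs without error, but Python compares them character by character, so a version such as "2.10.0" would sort before "2.4.1". The sketch below shows one way to compare numerically instead. It assumes `geotrellis_version` is a plain dotted numeric string such as "2.4.1" (optionally prefixed with "v"), which the diff does not confirm. `parse_version` and `gdal_bootstrap_script` are hypothetical helper names, not part of this PR. The cutoff follows the code comment ("2.4.1 or earlier"), while the `<` in the diff excludes 2.4.1 itself.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted version such as "2.4.1" or "v2.4.1" into (2, 4, 1).

    Assumes purely numeric components; a suffix like "-SNAPSHOT"
    would raise ValueError here.
    """
    return tuple(int(part) for part in version.lstrip("v").split("."))


def gdal_bootstrap_script(geotrellis_version: str) -> str:
    """Pick the GDAL bootstrap script name for a GeoTrellis version.

    Hypothetical helper. Uses <= to match the code comment
    "2.4.1 or earlier"; the script names are taken from the diff.
    """
    if parse_version(geotrellis_version) <= (2, 4, 1):
        return "gdal.sh"
    return "gdal-3.8.3.sh"


# Plain string comparison is lexicographic: "1" sorts before "4".
print("2.10.0" < "2.4.1")                                # True
# Tuple comparison is numeric per component: 10 is not less than 4.
print(parse_version("2.10.0") < parse_version("2.4.1"))  # False
print(gdal_bootstrap_script("2.10.0"))                   # gdal-3.8.3.sh
```

With single-digit components the string comparison and the tuple comparison agree, so the check in the diff only misbehaves once a component reaches two digits.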
Review comment: Is it here because you needed to make manual changes to the 1x1 output from the API?
Reply: Yes, it needed some minor adjustments, like column names.