Merge pull request #134 from moj-analytical-services/query_bucket_improvement

add exception if bucket env variable not set
Thomas-Hirsch authored Jul 31, 2024
2 parents 0e5ef67 + 351b5bb commit 5d5a040
Showing 5 changed files with 24 additions and 5 deletions.
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -5,6 +5,10 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

+## v5.6.0 - 2024-07-30
+
+- Add error and environment variable for query dump bucket, to warn if bucket and region are mismatched.

## v5.5.20 - 2024-07-23

- updated dependencies for security
6 changes: 4 additions & 2 deletions README.md
@@ -43,12 +43,14 @@ df = pydb.read_sql_query("select * from __temp__.my_table where year = 2022")

## Notes

-- Amazon Athena using a flavour of SQL called presto docs can be found [here](https://prestodb.io/docs/current/)
+- Amazon Athena using a flavour of SQL called trino. Docs can be found [here](https://trino.io/docs/current/language.html)
- To query a date column in Athena you need to specify that your value is a date e.g. `SELECT * FROM db.table WHERE date_col > date '2018-12-31'`
- To query a datetime or timestamp column in Athena you need to specify that your value is a timestamp e.g. `SELECT * FROM db.table WHERE datetime_col > timestamp '2018-12-31 23:59:59'`
- Note dates and datetimes formatting used above. See more specifics around date and datetimes [here](https://prestodb.io/docs/current/functions/datetime.html)
- To specify a string in the sql query always use '' not "". Using ""'s means that you are referencing a database, table or col, etc.
- If you are working in an environment where you cannot change the default AWS region environment
-variables you can set `AWS_ATHENA_QUERY_REGION` which will override these
+variables you can set `AWS_ATHENA_QUERY_REGION` which will override these.
+- You can override the bucket where query results are outputted to with the `ATHENA_QUERY_DUMP_BUCKET` environment variable.
+This is mandatory if you set the region to something other than `eu-west-1`.

See changelog for release changes.
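
Reviewer note (not part of the commit): the README notes above say that date and timestamp values must be written as typed literals. A minimal sketch of building such literals from Python values follows. The helper names `athena_date_literal` and `athena_timestamp_literal` are hypothetical and are not part of pydbtools; `db.table` and `date_col` are the placeholder names the README itself uses.

```python
from datetime import date, datetime


def athena_date_literal(value: date) -> str:
    """Render a date as a typed SQL literal, e.g. date '2018-12-31'."""
    return f"date '{value.isoformat()}'"


def athena_timestamp_literal(value: datetime) -> str:
    """Render a datetime as a typed SQL literal with second precision."""
    formatted = value.strftime("%Y-%m-%d %H:%M:%S")
    return f"timestamp '{formatted}'"


# String values use single quotes; double quotes would reference an identifier.
query = (
    "SELECT * FROM db.table "
    f"WHERE date_col > {athena_date_literal(date(2018, 12, 31))}"
)
```

The resulting `query` string could then be passed to `pydb.read_sql_query`, as in the README example quoted in the hunk header.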
2 changes: 1 addition & 1 deletion pydbtools/__init__.py
@@ -31,4 +31,4 @@
)
from .utils import s3_path_join # noqa: F401

-__version__ = "5.5.20"
+__version__ = "5.6.0"
15 changes: 14 additions & 1 deletion pydbtools/utils.py
@@ -12,7 +12,6 @@
from botocore.credentials import InstanceMetadataFetcher, InstanceMetadataProvider

# Set pydbtool params - if you were so inclined to change them
-bucket = "mojap-athena-query-dump"
temp_database_name_prefix = "mojap_de_temp_"
aws_default_region = os.getenv(
    "AWS_ATHENA_QUERY_REGION",
@@ -21,6 +20,20 @@
        os.getenv("AWS_REGION", "eu-west-1"),
    ),
)

+if aws_default_region == "eu-west-1":
+    bucket = os.getenv("ATHENA_QUERY_DUMP_BUCKET", "mojap-athena-query-dump")
+else:
+    try:
+        bucket = os.environ["ATHENA_QUERY_DUMP_BUCKET"]
+    except KeyError:
+        raise KeyError(
+            f"""The AWS region is set to {aws_default_region}
+            but environment variable ATHENA_QUERY_DUMP_BUCKET was not set.
+            Either set AWS_ATHENA_QUERY_REGION to eu-west-1
+            or specify the query dump bucket"""
+        )

aws_role_regex_rules = [
    (
        r"@[a-z.-]+.gov.uk$",  # gov email
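
Reviewer note (not part of the commit): the bucket selection added to `pydbtools/utils.py` runs at import time against `os.environ`, which makes it awkward to exercise in isolation. The sketch below restates the same decision as a pure function over a mapping, so the three cases can be checked directly. The function name `resolve_query_dump_bucket` and the two constants are hypothetical. The region lookup is simplified to the two variables visible in the diff, because the intermediate fallback is elided there.

```python
from typing import Mapping

DEFAULT_REGION = "eu-west-1"
DEFAULT_BUCKET = "mojap-athena-query-dump"


def resolve_query_dump_bucket(env: Mapping[str, str]) -> str:
    """Hypothetical helper mirroring the bucket selection shown in the diff."""
    # Simplified region lookup: only the variables visible in the diff are used.
    region = env.get(
        "AWS_ATHENA_QUERY_REGION",
        env.get("AWS_REGION", DEFAULT_REGION),
    )
    if region == DEFAULT_REGION:
        # In the default region the bucket is optional and falls back to the default.
        return env.get("ATHENA_QUERY_DUMP_BUCKET", DEFAULT_BUCKET)
    try:
        # In any other region the bucket must be set explicitly.
        return env["ATHENA_QUERY_DUMP_BUCKET"]
    except KeyError:
        raise KeyError(
            f"The AWS region is set to {region} but environment variable "
            "ATHENA_QUERY_DUMP_BUCKET was not set."
        ) from None
```

Passing `os.environ` as `env` would reproduce the import-time behaviour under these assumptions; passing a plain dict keeps tests free of global state.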
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -1,7 +1,7 @@
[tool]
[tool.poetry]
name = "pydbtools"
-version = "5.5.20"
+version = "5.6.0"
description = "A python package to query data via amazon athena and bring it into a pandas df using aws-wrangler."
license = "MIT"
authors = ["Karik Isichei <[email protected]>"]