Merge pull request #130 from moj-analytical-services/update-s3-output
Make s3_output explicit
gwionap authored May 13, 2024
2 parents c393cee + 2f6e3c0 commit cd82af2
Showing 6 changed files with 28 additions and 9 deletions.
6 changes: 6 additions & 0 deletions CHANGELOG.md
@@ -5,6 +5,12 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

+## v5.5.18 - 2024-05-13
+
+- Made `s3_output` explicit in `_create_temp_database`
+- Added an additional default region environment variable, `AWS_ATHENA_QUERY_REGION`,
+  which overrides the AWS defaults when these cannot be set or changed
+
 ## v5.5.17 - 2024-04-17

 - Added aws role rule
2 changes: 2 additions & 0 deletions README.md
@@ -48,5 +48,7 @@ df = pydb.read_sql_query("select * from __temp__.my_table where year = 2022")
 - To query a datetime or timestamp column in Athena you need to specify that your value is a timestamp e.g. `SELECT * FROM db.table WHERE datetime_col > timestamp '2018-12-31 23:59:59'`
 - Note dates and datetimes formatting used above. See more specifics around date and datetimes [here](https://prestodb.io/docs/current/functions/datetime.html)
 - To specify a string in the sql query always use '' not "". Using ""'s means that you are referencing a database, table or col, etc.
+- If you are working in an environment where you cannot change the default AWS region environment
+  variables, you can set `AWS_ATHENA_QUERY_REGION`, which takes precedence over them.

 See changelog for release changes.
2 changes: 1 addition & 1 deletion pydbtools/__init__.py
@@ -31,4 +31,4 @@
 )
 from .utils import s3_path_join  # noqa: F401

-__version__ = "5.5.17"
+__version__ = "5.5.18"
19 changes: 13 additions & 6 deletions pydbtools/_wrangler.py
@@ -226,17 +226,24 @@ def _create_temp_database(
     region_name: str = None,
 ):
     region_name = _set_region_name(region_name)

+    user_id, s3_output = get_user_id_and_table_dir(
+        boto3_session=boto3_session,
+        force_ec2=force_ec2,
+        region_name=region_name,
+    )

     if temp_db_name is None or temp_db_name.lower().strip() == "__temp__":
-        user_id, _ = get_user_id_and_table_dir(
-            boto3_session=boto3_session,
-            force_ec2=force_ec2,
-            region_name=region_name,
-        )
         temp_db_name = get_database_name_from_userid(user_id)

     create_db_query = f"CREATE DATABASE IF NOT EXISTS {temp_db_name}"

-    q_e_id = ath.start_query_execution(create_db_query, boto3_session=boto3_session)
+    q_e_id = ath.start_query_execution(
+        create_db_query,
+        s3_output=s3_output,
+        boto3_session=boto3_session,
+    )

     return ath.wait_query(q_e_id, boto3_session=boto3_session)
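
The effect of hoisting the `get_user_id_and_table_dir` call out of the `if` can be illustrated with a small offline sketch. Everything here is an assumption for illustration: `get_user_id_and_table_dir_stub`, `plan_create_temp_database`, the bucket path and the database naming scheme are hypothetical stand-ins, not part of pydbtools, and no Athena call is made.

```python
from typing import Dict, Optional, Tuple


def get_user_id_and_table_dir_stub() -> Tuple[str, str]:
    # Hypothetical stand-in for get_user_id_and_table_dir; the values are made up.
    return "alice", "s3://example-bucket/alice/"


def plan_create_temp_database(temp_db_name: Optional[str] = None) -> Dict[str, str]:
    # Resolve the user and output location up front, as the new hunk does,
    # so s3_output exists whichever branch is taken below.
    user_id, s3_output = get_user_id_and_table_dir_stub()

    if temp_db_name is None or temp_db_name.lower().strip() == "__temp__":
        # Hypothetical naming scheme; the real one lives in
        # get_database_name_from_userid.
        temp_db_name = f"example_temp_{user_id}"

    # Return the query arguments instead of calling Athena, to stay offline.
    return {
        "sql": f"CREATE DATABASE IF NOT EXISTS {temp_db_name}",
        "s3_output": s3_output,
    }
```

With a caller-supplied name such as `plan_create_temp_database("my_db")`, the output location is still resolved, which would not have been possible while the lookup sat inside the `if` branch.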


6 changes: 5 additions & 1 deletion pydbtools/utils.py
@@ -15,7 +15,11 @@
 bucket = "mojap-athena-query-dump"
 temp_database_name_prefix = "mojap_de_temp_"
 aws_default_region = os.getenv(
-    "AWS_DEFAULT_REGION", os.getenv("AWS_REGION", "eu-west-1")
+    "AWS_ATHENA_QUERY_REGION",
+    os.getenv(
+        "AWS_DEFAULT_REGION",
+        os.getenv("AWS_REGION", "eu-west-1"),
+    ),
 )
 aws_role_regex_rules = [
     (
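
The nested `os.getenv` calls in this hunk amount to a first-match-wins lookup over three variables. A minimal sketch of that precedence follows; `resolve_region`, `REGION_VARIABLES` and `FALLBACK_REGION` are hypothetical names introduced for illustration and do not exist in pydbtools.

```python
import os
from typing import Mapping, Optional

# Order mirrors the nested os.getenv calls in the hunk: the first variable
# that is set wins.
REGION_VARIABLES = ("AWS_ATHENA_QUERY_REGION", "AWS_DEFAULT_REGION", "AWS_REGION")
FALLBACK_REGION = "eu-west-1"


def resolve_region(env: Optional[Mapping[str, str]] = None) -> str:
    """Hypothetical helper: return the region the nested lookup would pick."""
    env = os.environ if env is None else env
    for name in REGION_VARIABLES:
        value = env.get(name)
        # Like os.getenv with a default, only an unset variable falls through;
        # a variable set to an empty string still counts as set.
        if value is not None:
            return value
    return FALLBACK_REGION
```

For example, `resolve_region({"AWS_ATHENA_QUERY_REGION": "eu-west-2", "AWS_REGION": "us-east-1"})` picks `eu-west-2`. Because the assignment in `utils.py` appears to run at module level, the variable would presumably need to be set before `pydbtools` is imported for it to take effect.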
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -1,7 +1,7 @@
 [tool]
 [tool.poetry]
 name = "pydbtools"
-version = "5.5.17"
+version = "5.5.18"
 description = "A python package to query data via amazon athena and bring it into a pandas df using aws-wrangler."
 license = "MIT"
 authors = ["Karik Isichei <[email protected]>"]