diff --git a/CHANGELOG.rst b/CHANGELOG.md
similarity index 62%
rename from CHANGELOG.rst
rename to CHANGELOG.md
index 7efdb3a..e33cd5d 100644
--- a/CHANGELOG.rst
+++ b/CHANGELOG.md
@@ -1,21 +1,17 @@
-Development
-***********
+### Development
 
 - Add test coverage report
 - Make consistent use of original_table_id/test_table_id
 - Add tests for is_sql
 
-0.2.0 (2023-06-12)
-******************
+### 0.2.0 (2023-06-12)
 
 - Remove redundant project_id in BQConfigRunner and SQLRunner
 
-0.1.0 (2023-06-12)
-******************
+### 0.1.0 (2023-06-12)
 
 - Lower pandas requirement to 1.5.0
 
-0.0.1 (2023-06-09)
-******************
+### 0.0.1 (2023-06-09)
 
 - Initial release
\ No newline at end of file
diff --git a/CONTRIBUTORS.md b/CONTRIBUTORS.md
new file mode 100644
index 0000000..a633ea9
--- /dev/null
+++ b/CONTRIBUTORS.md
@@ -0,0 +1,15 @@
+This is a list of the people who directly contributed to bquest in one way or another. We <3 your contributions!
+
+* [Alexander Grimm](https://github.com/almajo)
+* [Benjamin Gutzmann](https://github.com/gutzbenj)
+* [Lukas Janssen](https://github.com/LockiHH)
+* [Norbert Maager](https://github.com/norbertmaager)
+* [Thorsten Madlener](https://github.com/mdlnr)
+* [Jan Specker](https://github.com/speckerjan)
+* [Felix Theodor](https://github.com/FelixTheodor)
+* [Malte Weinberg](https://github.com/WeinbergMalte)
+* [Nils Weisbach](https://github.com/ncwhh)
+* [Julian Werner](https://github.com/scieneers-jw)
+* [Jia-Jen Yang](https://github.com/jiajentw)
+
+Special thanks goes to [Mike Czech](https://github.com/mikeczech) who initiated the development of bquest!
\ No newline at end of file
diff --git a/CONTRIBUTORS.rst b/CONTRIBUTORS.rst
deleted file mode 100644
index ea76096..0000000
--- a/CONTRIBUTORS.rst
+++ /dev/null
@@ -1,15 +0,0 @@
-This is a list of the people who directly contributed to bquest in one way or another. We <3 your contributions!
-
-* `Alexander Grimm <https://github.com/almajo>`_
-* `Benjamin Gutzmann <https://github.com/gutzbenj>`_
-* `Lukas Janssen <https://github.com/LockiHH>`_
-* `Norbert Maager <https://github.com/norbertmaager>`_
-* `Thorsten Madlener <https://github.com/mdlnr>`_
-* `Jan Specker <https://github.com/speckerjan>`_
-* `Felix Theodor <https://github.com/FelixTheodor>`_
-* `Malte Weinberg <https://github.com/WeinbergMalte>`_
-* `Nils Weisbach <https://github.com/ncwhh>`_
-* `Julian Werner <https://github.com/scieneers-jw>`_
-* `Jia-Jen Yang <https://github.com/jiajentw>`_
-
-Special thanks goes to `Mike Czech <https://github.com/mikeczech>`_ who initiated the development of bquest!
\ No newline at end of file
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..0391c63
--- /dev/null
+++ b/README.md
@@ -0,0 +1,137 @@
+![BQuest Logo](https://raw.githubusercontent.com/ottogroup/bquest/main/docs/assets/logo.svg)
+
+# BQuest
+
+Effortlessly validate and test your Google BigQuery queries with the power of pandas DataFrames in Python.
+
+We would like to thank [Mike Czech](https://github.com/mikeczech) who is the original inventor of bquest!
+
+**Warning**
+
+This library is a work in progress!
+
+Breaking changes should be expected until a 1.0 release, so version pinning is recommended.
+
+[![CI: Overall outcome](https://github.com/ottogroup/bquest/workflows/Tests/badge.svg)](https://github.com/ottogroup/bquest/actions?workflow=Tests)
+[![CD: gh-pages documentation](https://github.com/ottogroup/bquest/actions/workflows/pages/pages-build-deployment/badge.svg?branch=gh-pages)](https://github.com/ottogroup/bquest/actions/workflows/pages/pages-build-deployment)
+[![PyPI version](https://img.shields.io/pypi/v/bquest.svg)](https://pypi.org/project/bquest/)
+[![Project status (alpha, beta, stable)](https://img.shields.io/pypi/status/bquest.svg)](https://pypi.python.org/pypi/bquest/)
+[![PyPI downloads](https://static.pepy.tech/personalized-badge/bquest?period=month&units=international_system&left_color=grey&right_color=blue&left_text=PyPI%20downloads/month)](https://pepy.tech/project/bquest)
+[![Project license](https://img.shields.io/github/license/ottogroup/bquest)](https://github.com/ottogroup/bquest/blob/main/LICENSE)
+[![Python version compatibility](https://img.shields.io/pypi/pyversions/bquest.svg)](https://pypi.python.org/pypi/bquest/)
+[![Documentation: Black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
+
+## Overview
+
+* Use BQuest in combination with your favorite testing framework (e.g. pytest).
+* Create temporary test tables from [JSON](https://cloud.google.com/bigquery/docs/loading-data) or
+[pandas.DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html).
+* Run BQ configurations and plain SQL queries on your test tables and check the result.
+
+## Installation
+
+Via PyPi (standard):
+
+```bash
+    pip install bquest
+```
+
+Via Github (most recent):
+
+```bash
+    pip install git+https://github.com/ottogroup/bquest
+```
+
+BQuest also requires a dedicated BigQuery dataset for storing test tables, e.g.
+
+```yaml
+    resource "google_bigquery_dataset" "bquest" {
+        dataset_id = "bquest"
+        friendly_name = "bquest"
+        description = "Source tables for bquest tests"
+        location = "EU"
+        default_table_expiration_ms = 3600000
+    }
+```
+
+We recommend setting an [expiration time](https://www.terraform.io/docs/providers/google/r/bigquery_dataset.html#default_table_expiration_ms) for tables in the bquest dataset to assure removal of those test tables upon
+test execution.
+
+## Example
+
+Given a ``pandas.DataFrame``
+
+| foo | weight | prediction_date |
+|--------|--------|-------------------|
+| bar | 23 | 20190301 |
+| my | 42 | 20190301 |
+
+and its table definition
+
+```python
+    from bquest.tables import BQTableDefinitionBuilder
+
+    table_def_builder = BQTableDefinitionBuilder(GOOGLE_PROJECT_ID, dataset="bquest", location="EU")
+    table_definition = table_def_builder.from_df("abc.feed_latest", df)
+```
+
+you can use the config file ``*./abc/config.py*``
+
+```json-object
+    {
+        "query": """
+            SELECT
+                foo,
+                PARSE_DATE('%Y%m%d', prediction_date)
+            FROM
+                `{source_table}`
+            WHERE
+                weight > {THRESHOLD}
+        """,
+        "start_date": "prediction_date",
+        "end_date": "prediction_date",
+        "source_tables": {"source_table": "abc.feed_latest"},
+        "feature_table_name": "abc.myid",
+    }
+```
+
+and the runner
+
+```python
+    from bquest.runner import BQConfigFileRunner, BQConfigRunner
+
+    runner = BQConfigFileRunner(
+        BQConfigRunner(bq_client, bq_executor_func),
+        "config/bq_config",
+    )
+
+    result_df = runner.run_config(
+        "20190301",
+        "20190308",
+        [table_definition],
+        "abc/config.py",
+        templating_vars={"THRESHOLD": "30"},
+    )
+```
+
+to assert the result table
+
+```python
+    assert result_df.shape == (1, 2)
+    assert result_df.iloc[0]["foo"] == "my"
+```
+
+## Testing
+
+For the actual testing bquest relies on an accessible BigQuery project which can be configured
+with the [gcloud](https://cloud.google.com/sdk/docs/install?hl=de) client. The corresponding
+``GOOGLE_PROJECT_ID`` is extracted from this project
+and used with [pandas-gbq](https://github.com/googleapis/python-bigquery-pandas) to write temporary tables to the bquest dataset that has to be pre-
+configured before testing on that project.
+
+For Github CI we have configured an identity provider in our testing project which allows
+only core members of this repository to access the testing projects' resources.
+
+## Important Links
+
+- Full documentation: https://ottogroup.github.io/bquest/
diff --git a/README.rst b/README.rst
deleted file mode 100644
index e21e818..0000000
--- a/README.rst
+++ /dev/null
@@ -1,177 +0,0 @@
-.. image:: https://raw.githubusercontent.com/ottogroup/bquest/main/docs/assets/logo.svg
-    :alt: BQuest Logo
-
-BQuest
-######
-
-Effortlessly validate and test your Google BigQuery queries with the power of pandas DataFrames in Python.
-
-We would like to thank `Mike Czech <https://github.com/mikeczech>`_ who is the original inventor of bquest!
-
-**Warning**
-
-This library is a work in progress!
-
-Breaking changes should be expected until a 1.0 release, so version pinning is recommended.
-
-.. image:: https://github.com/ottogroup/bquest/workflows/Tests/badge.svg
-    :target: https://github.com/ottogroup/bquest/actions?workflow=Tests
-    :alt: CI: Overall outcome
-.. image:: https://github.com/ottogroup/bquest/actions/workflows/pages/pages-build-deployment/badge.svg?branch=gh-pages
-    :target: https://github.com/ottogroup/bquest/actions/workflows/pages/pages-build-deployment
-    :alt: CD: gh-pages documentation
-.. image:: https://img.shields.io/pypi/v/bquest.svg
-    :target: https://pypi.org/project/bquest/
-    :alt: PyPI version
-.. image:: https://img.shields.io/pypi/status/bquest.svg
-    :target: https://pypi.python.org/pypi/bquest/
-    :alt: Project status (alpha, beta, stable)
-.. image:: https://static.pepy.tech/personalized-badge/bquest?period=month&units=international_system&left_color=grey&right_color=blue&left_text=PyPI%20downloads/month
-    :target: https://pepy.tech/project/bquest
-    :alt: PyPI downloads
-.. image:: https://img.shields.io/github/license/ottogroup/bquest
-    :target: https://github.com/ottogroup/bquest/blob/main/LICENSE
-    :alt: Project license
-.. image:: https://img.shields.io/pypi/pyversions/bquest.svg
-    :target: https://pypi.python.org/pypi/bquest/
-    :alt: Python version compatibility
-.. image:: https://img.shields.io/badge/code%20style-black-000000.svg
-    :target: https://github.com/psf/black
-    :alt: Documentation: Black
-
-Overview
-********
-
-* Use BQuest in combination with your favorite testing framework (e.g. pytest).
-* Create temporary test tables from JSON_ or `pandas DataFrame`_.
-* Run BQ configurations and plain SQL queries on your test tables and check the result.
-
-.. _JSON: https://cloud.google.com/bigquery/docs/loading-data
-.. _pandas DataFrame: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html
-
-Installation
-************
-
-Via PyPi (standard):
-
-.. code-block:: bash
-
-    pip install bquest
-
-
-Via Github (most recent):
-
-.. code-block:: bash
-
-    pip install git+https://github.com/ottogroup/bquest
-
-
-BQuest also requires a dedicated BigQuery dataset for storing test tables, e.g.
-
-.. code-block:: yaml
-
-    resource "google_bigquery_dataset" "bquest" {
-        dataset_id = "bquest"
-        friendly_name = "bquest"
-        description = "Source tables for bquest tests"
-        location = "EU"
-        default_table_expiration_ms = 3600000
-    }
-
-We recommend setting an `expiration time`_ for tables in the bquest dataset to assure removal of those test tables upon
-test execution.
-
-.. _`expiration time`: https://www.terraform.io/docs/providers/google/r/bigquery_dataset.html#default_table_expiration_ms
-
-Example
-*******
-
-Given a pandas DataFrame
-
-.. list-table::
-    :widths: 30 30 30
-    :header-rows: 1
-
-    * - foo
-      - weight
-      - prediction_date
-    * - bar
-      - 23
-      - 20190301
-    * - my
-      - 42
-      - 20190301
-
-and its table definition
-
-.. code-block:: python
-
-    from bquest.tables import BQTableDefinitionBuilder
-
-    table_def_builder = BQTableDefinitionBuilder(GOOGLE_PROJECT_ID, dataset="bquest", location="EU")
-    table_definition = table_def_builder.from_df("abc.feed_latest", df)
-
-you can use the config file *./abc/config.py*
-
-.. code-block:: json-object
-
-    {
-        "query": """
-            SELECT
-                foo,
-                PARSE_DATE('%Y%m%d', prediction_date)
-            FROM
-                `{source_table}`
-            WHERE
-                weight > {THRESHOLD}
-        """,
-        "start_date": "prediction_date",
-        "end_date": "prediction_date",
-        "source_tables": {"source_table": "abc.feed_latest"},
-        "feature_table_name": "abc.myid",
-    }
-
-and the runner
-
-.. code-block:: python
-
-    from bquest.runner import BQConfigFileRunner, BQConfigRunner
-
-    runner = BQConfigFileRunner(
-        BQConfigRunner(bq_client, bq_executor_func),
-        "config/bq_config",
-    )
-
-    result_df = runner.run_config(
-        "20190301",
-        "20190308",
-        [table_definition],
-        "abc/config.py",
-        templating_vars={"THRESHOLD": "30"},
-    )
-
-to assert the result table
-
-.. code-block:: python
-
-    assert result_df.shape == (1, 2)
-    assert result_df.iloc[0]["foo"] == "my"
-
-Testing
-*******
-
-For the actual testing bquest relies on an accessible BigQuery project which can be configured
-with the gcloud_ client. The corresponding ``GOOGLE_PROJECT_ID`` is extracted from this project
-and used with pandas-gbq_ to write temporary tables to the bquest dataset that has to be pre-
-configured before testing on that project.
-
-For Github CI we have configured an identity provider in our testing project which allows
-only core members of this repository to access the testing projects' resources.
-
-.. _gcloud: https://cloud.google.com/sdk/docs/install?hl=de
-.. _pandas-gbq: https://github.com/googleapis/python-bigquery-pandas
-
-Important Links
-***************
-
-- Full documentation: https://ottogroup.github.io/bquest/
diff --git a/docs/getting-started.md b/docs/getting-started.md
index e69de29..9492e52 100644
--- a/docs/getting-started.md
+++ b/docs/getting-started.md
@@ -0,0 +1,24 @@
+# Getting started
+
+In the project where you want to run bquest tests install bquest via PyPi:
+
+```bash
+    pip install bquest
+```
+
+We advise you to create a bquest dataset in your Google Cloud projects BigQuery instance
+so that your regular datasets are not spammed with bquest tables:
+
+```yaml
+    resource "google_bigquery_dataset" "bquest" {
+        dataset_id = "bquest"
+        friendly_name = "bquest"
+        description = "Source tables for bquest tests"
+        location = "EU"
+        default_table_expiration_ms = 3600000
+    }
+```
+
+***
+WARNING: SET A DEFAULT EXPIRATION TIME SO THAT BQUEST GENERATED TABLES ARE AUTOMATICALLY REMOVED AFTER SOME TIME
+***
\ No newline at end of file
diff --git a/docs/index.md b/docs/index.md
deleted file mode 100644
index b3178df..0000000
--- a/docs/index.md
+++ /dev/null
@@ -1,3 +0,0 @@
-# Welcome to BQuest
-
-This documentation is under construction.
\ No newline at end of file
diff --git a/mkdocs.yml b/mkdocs.yml
index 454270a..387ec40 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -22,9 +22,11 @@ extra:
     provider: mike
 
 nav:
-  - Introduction: index.md
+  - Introduction: ../README.md
   - Getting Started: getting-started.md
   - Reference:
     - Dataframe: reference/dataframe.md
     - Runner: reference/runner.md
-    - Tables: reference/tables.md
\ No newline at end of file
+    - Tables: reference/tables.md
+  - Contributors: ../CONTRIBUTORS.md
+  - Changelog: ../CHANGELOG.md
\ No newline at end of file