FEAT-#4605: Implementation of Small Query Compiler to support small and empty DataFrames #5113

Open: wants to merge 28 commits into master

5e552ff  Partial implementation of small query compiler (billiam-wang, Sep 15, 2022)
1c87589  Working binary, datetime, and string functions for small query compiler (billiam-wang, Sep 18, 2022)
05a0ee1  Added resample and rolling implementations (billiam-wang, Sep 21, 2022)
5722d82  Added a few new functions (billiam-wang, Sep 26, 2022)
9c107aa  Added most mapped functions (billiam-wang, Sep 27, 2022)
a5b951c  Passing most tests in test_general.py (billiam-wang, Sep 28, 2022)
016de6c  Added changes to fix bugs, still failing tests (billiam-wang, Oct 7, 2022)
8781a54  Passing additional tests for series and groupby (billiam-wang, Oct 10, 2022)
b6ea7ee  Passing most groupby tests (billiam-wang, Oct 13, 2022)
170f919  Passing almost all tests except in join sort and groupby (billiam-wang, Oct 18, 2022)
3b2a02c  Passing all groupby tests except test_dict_agg_rename_mi_columns (billiam-wang, Oct 24, 2022)
d162386  Passed all except 1 tests (billiam-wang, Oct 25, 2022)
ec02113  Passing all tests and added config environment variable for SmallQuer… (billiam-wang, Oct 27, 2022)
85491ea  Added environment variable to control SmallQueryCompiler (billiam-wang, Oct 31, 2022)
075a93e  Removed print statements and cleaned up (billiam-wang, Oct 31, 2022)
f018818  More code cleanup (billiam-wang, Oct 31, 2022)
2f3cf0e  Reset default InitializeWithSmallQueryCompilers envvar to False (billiam-wang, Oct 31, 2022)
bc122f6  Resolved flake and black failures (billiam-wang, Oct 31, 2022)
46e517f  Removed print statements and fixed import statements (billiam-wang, Oct 31, 2022)
a080898  Fixed more import statements (billiam-wang, Oct 31, 2022)
8e5f80a  Cleaned up helper functions and fixed test_binary issues (billiam-wang, Oct 31, 2022)
1be9793  Fixed flake8 and black again (billiam-wang, Oct 31, 2022)
a2c5a6f  Added docstrings for SmallQueryCompiler (billiam-wang, Oct 31, 2022)
135148d  Made flake8 and black changes (billiam-wang, Oct 31, 2022)
c0df87a  Passing all lint checks (billiam-wang, Nov 1, 2022)
dccf80c  Added documentation for try_cast_to_pandas (billiam-wang, Nov 1, 2022)
40446d3  Added SmallQueryCompiler CI tests and fixed remaining bugs (billiam-wang, Nov 2, 2022)
43ff32b  Changed CI test version to 3.9 (billiam-wang, Nov 7, 2022)
73 changes: 73 additions & 0 deletions .github/workflows/ci.yml
@@ -15,6 +15,22 @@ on:
      - setup.py
      - versioneer.py
jobs:
  changes:
    runs-on: ubuntu-latest
    permissions:
      pull-requests: read
    outputs:
      test-small-query-compiler: ${{ steps.filter.outputs.test-small-query-compiler }}
    steps:
      - uses: dorny/paths-filter@v2
        id: filter
        with:
          filters: |
            test-small-query-compiler:
              - 'modin/experimental/core/storage_formats/pandas/small_query_compiler.py'
              - 'modin/core/storage_formats/pandas/query_compiler.py'
              - 'modin/core/storage_formats/base/query_compiler.py'
Collaborator comment on lines +29 to +32:


This looks clever, but it feels temporary to me. Query compilers both use things from lower levels and are used by higher levels, so this integration should eventually be tested whenever anything used throughout the pipeline changes. This should probably stay for now, but be removed in the shiny future where the "auto-switch" happens.


  lint-black:
    name: lint (black)
    runs-on: ubuntu-latest
@@ -969,3 +985,60 @@ jobs:
          conda info
          conda list
      - run: python -m pytest modin/experimental/spreadsheet/test/test_general.py

  test-small-query-compiler:
    needs: [changes, lint-flake8, lint-black, test-api, test-headers]
    if: ${{ needs.changes.outputs.test-small-query-compiler == 'true' }}
    runs-on: ubuntu-latest
    defaults:
      run:
        shell: bash -l {0}
    strategy:
      matrix:
        python-version: ["3.9"]
    env:
      MODIN_SMALL_QUERY_COMPILER: "True"
    name: test-small-query-compiler (python ${{matrix.python-version}})
    steps:
      - uses: actions/checkout@v2
        with:
          fetch-depth: 2
      - name: Cache conda
        uses: actions/cache@v3
        with:
          path: |
            ~/conda_pkgs_dir
            ~/.cache/pip
          key:
            ${{ runner.os }}-conda-${{ hashFiles('environment-dev.yml') }}
      - uses: conda-incubator/setup-miniconda@v2
        with:
          miniforge-variant: Mambaforge
          miniforge-version: latest
          use-mamba: true
          activate-environment: modin
          environment-file: environment-dev.yml
          python-version: ${{matrix.python-version}}
          channel-priority: strict
          use-only-tar-bz2: true # IMPORTANT: This needs to be set for caching to work properly!
      - name: Conda environment
        run: |
          conda info
          conda list
      - run: pytest -n 2 modin/pandas/test/dataframe/test_binary.py
      - run: pytest -n 2 modin/pandas/test/dataframe/test_default.py
      - run: pytest -n 2 modin/pandas/test/dataframe/test_indexing.py
      - run: pytest -n 2 modin/pandas/test/dataframe/test_iter.py
      - run: pytest -n 2 modin/pandas/test/dataframe/test_join_sort.py
      - run: pytest -n 2 modin/pandas/test/dataframe/test_map_metadata.py
      - run: pytest -n 2 modin/pandas/test/dataframe/test_pickle.py
      - run: pytest -n 2 modin/pandas/test/dataframe/test_reduce.py
      - run: pytest -n 2 modin/pandas/test/dataframe/test_udf.py
      - run: pytest -n 2 modin/pandas/test/dataframe/test_window.py
      - run: pytest -n 2 modin/pandas/test/test_api.py
      - run: pytest -n 2 modin/pandas/test/test_concat.py
      - run: pytest -n 2 modin/pandas/test/test_general.py
      - run: pytest -n 2 modin/pandas/test/test_groupby.py
      - run: pytest -n 2 modin/pandas/test/test_rolling.py
      - run: pytest -n 2 modin/pandas/test/test_series.py
      - uses: codecov/codecov-action@v2
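The `changes` job gates `test-small-query-compiler`: the test job runs only when `dorny/paths-filter` reports that one of the three listed files changed. A minimal Python sketch of that gating logic follows. It approximates the action's glob matching with the standard-library `fnmatch` (the action itself uses its own glob engine), and the helper name `should_run_small_qc_tests` is hypothetical.

```python
from fnmatch import fnmatch

# Patterns copied from the `test-small-query-compiler` filter in ci.yml.
SMALL_QC_FILTER = [
    "modin/experimental/core/storage_formats/pandas/small_query_compiler.py",
    "modin/core/storage_formats/pandas/query_compiler.py",
    "modin/core/storage_formats/base/query_compiler.py",
]


def should_run_small_qc_tests(changed_files, patterns=SMALL_QC_FILTER):
    """Hypothetical helper: True if any changed file matches any filter pattern.

    Mirrors `if: needs.changes.outputs.test-small-query-compiler == 'true'`,
    with fnmatch standing in for the action's glob matching.
    """
    return any(fnmatch(path, pattern) for path in changed_files for pattern in patterns)


# A change to the base query compiler triggers the job.
print(should_run_small_qc_tests(["modin/core/storage_formats/base/query_compiler.py"]))  # True
# A change elsewhere in the pipeline does not.
print(should_run_small_qc_tests(["modin/pandas/dataframe.py"]))  # False
```

The second call illustrates the reviewer's concern: a change to a higher-level module that drives the query compiler, such as `modin/pandas/dataframe.py`, would not trigger this job.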
7 changes: 7 additions & 0 deletions modin/config/envvars.py
@@ -592,6 +592,13 @@ class ReadSqlEngine(EnvironmentVariable, type=str):
    choices = ("Pandas", "Connectorx")


class InitializeWithSmallQueryCompilers(EnvironmentVariable, type=bool):
    """Set to True to use the SmallQueryCompiler implementation."""

    varname = "MODIN_SMALL_QUERY_COMPILER"
    default = False


def _check_vars() -> None:
    """
    Check validity of environment variables.
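`InitializeWithSmallQueryCompilers` has `default = False`, and CI switches it on with `MODIN_SMALL_QUERY_COMPILER: "True"`, so the raw environment string has to be decoded to a boolean before it is checked. The sketch below is a standalone illustration, not Modin's config machinery: the helpers `decode_as_str`, `decode_as_bool`, and `small_qc_enabled` are hypothetical, and the accepted truthy spellings are an assumption. It shows why a string-typed flag is fragile: any non-empty string, including `"False"`, is truthy.

```python
import os


def decode_as_str(raw):
    """Hypothetical str decoding: keep the raw text, whitespace stripped."""
    return raw.strip()


def decode_as_bool(raw):
    """Hypothetical bool decoding; the accepted spellings are an assumption."""
    return raw.strip().lower() in {"true", "yes", "1"}


def small_qc_enabled(decode, env=None):
    """Read MODIN_SMALL_QUERY_COMPILER from `env` (defaults to os.environ)."""
    env = os.environ if env is None else env
    raw = env.get("MODIN_SMALL_QUERY_COMPILER")
    if raw is None:
        return False  # mirrors the declared `default = False`
    return decode(raw)


disabled = {"MODIN_SMALL_QUERY_COMPILER": "False"}
print(bool(small_qc_enabled(decode_as_str, disabled)))   # True: "False" is a non-empty string
print(bool(small_qc_enabled(decode_as_bool, disabled)))  # False
```

Declaring the variable with a boolean type keeps the CI value `"True"` and the Python default `False` in the same domain, so a plain truthiness check behaves as intended.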