Comparing changes

base repository: matthewwardrop/formulaic, base: v0.6.6
head repository: matthewwardrop/formulaic, compare: main

Showing with 15,256 additions and 2,040 deletions.
  1. +23 −16 .github/workflows/publish.yml
  2. +26 −8 .github/workflows/publish_docs.yml
  3. +26 −25 .github/workflows/tests.yml
  4. +16 −0 .pre-commit-config.yaml
  5. +20 −3 README.md
  6. +3 −2 benchmarks/benchmark.py
  7. +45 −45 benchmarks/benchmarks.csv
  8. BIN benchmarks/benchmarks.png
  9. +27 −9 benchmarks/plot.py
  10. +1 −0 docsite/.gitignore
  11. +1 −1 docsite/docs/assets/styles.css
  12. +259 −116 docsite/docs/changelog.md
  13. +0 −11 docsite/docs/concepts/index.md
  14. +61 −0 docsite/docs/dev/extensions.md
  15. +7 −2 docsite/docs/dev/index.md
  16. +82 −0 docsite/docs/dev/integration.md
  17. +4 −7 docsite/docs/formulas.md
  18. +22 −14 docsite/docs/guides/contrasts.ipynb
  19. +122 −60 docsite/docs/guides/formulae.ipynb
  20. +22 −14 docsite/docs/guides/grammar.md
  21. +37 −17 docsite/docs/guides/integration.ipynb
  22. +421 −0 docsite/docs/guides/missing_data.ipynb
  23. +68 −33 docsite/docs/guides/model_specs.ipynb
  24. +9 −7 docsite/docs/guides/quickstart.ipynb
  25. +162 −20 docsite/docs/guides/splines.ipynb
  26. +16 −8 docsite/docs/guides/transforms.ipynb
  27. +0 −13 docsite/docs/index.md
  28. +0 −4 docsite/docs/installation.md
  29. +23 −17 docsite/mkdocs.yml
  30. +8 −0 docsite/overrides/partials/main.html
  31. +0 −6 docsite/requirements.in
  32. +282 −0 docsite/requirements.lock
  33. +0 −152 docsite/requirements.txt
  34. +4 −2 formulaic/__init__.py
  35. +681 −173 formulaic/formula.py
  36. +2 −2 formulaic/materializers/__init__.py
  37. +0 −45 formulaic/materializers/arrow.py
  38. +146 −89 formulaic/materializers/base.py
  39. +207 −0 formulaic/materializers/narwhals.py
  40. +27 −39 formulaic/materializers/pandas.py
  41. +2 −2 formulaic/materializers/types/evaluated_factor.py
  42. +58 −26 formulaic/materializers/types/factor_values.py
  43. +27 −2 formulaic/materializers/types/scoped_term.py
  44. +20 −2 formulaic/model_matrix.py
  45. +424 −141 formulaic/model_spec.py
  46. +4 −1 formulaic/parser/__init__.py
  47. +37 −0 formulaic/parser/algos/sanitize_tokens.py
  48. +23 −19 formulaic/parser/algos/tokenize.py
  49. +70 −32 formulaic/parser/algos/tokens_to_ast.py
  50. +280 −22 formulaic/parser/parser.py
  51. +0 −3 formulaic/parser/types/__init__.py
  52. +33 −14 formulaic/parser/types/ast_node.py
  53. +8 −3 formulaic/parser/types/factor.py
  54. +161 −14 formulaic/parser/types/formula_parser.py
  55. +13 −4 formulaic/parser/types/operator.py
  56. +19 −38 formulaic/parser/types/operator_resolver.py
  57. +2 −3 formulaic/parser/types/ordered_set.py
  58. +23 −4 formulaic/parser/types/term.py
  59. +45 −6 formulaic/parser/types/token.py
  60. +43 −21 formulaic/parser/utils.py
  61. +18 −4 formulaic/sugar.py
  62. +13 −1 formulaic/transforms/__init__.py
  63. +44 −20 formulaic/transforms/basis_spline.py
  64. +119 −46 formulaic/transforms/contrasts.py
  65. +605 −0 formulaic/transforms/cubic_spline.py
  66. +76 −0 formulaic/transforms/hashed.py
  67. +50 −0 formulaic/transforms/lag.py
  68. +9 −6 formulaic/transforms/patsy_compat.py
  69. +3 −1 formulaic/transforms/scale.py
  70. +9 −5 formulaic/utils/calculus.py
  71. +5 −4 formulaic/utils/cast.py
  72. +112 −0 formulaic/utils/code.py
  73. +39 −38 formulaic/utils/constraints.py
  74. +3 −2 formulaic/utils/context.py
  75. +31 −0 formulaic/utils/deprecations.py
  76. +3 −2 formulaic/utils/iterators.py
  77. +9 −14 formulaic/utils/layered_mapping.py
  78. +32 −12 formulaic/utils/null_handling.py
  79. +11 −15 formulaic/utils/sentinels.py
  80. +3 −2 formulaic/utils/sparse.py
  81. +20 −115 formulaic/utils/stateful_transforms.py
  82. +118 −58 formulaic/{parser/types → utils}/structured.py
  83. +11 −8 formulaic/utils/variables.py
  84. +88 −72 pyproject.toml
  85. +21 −31 tests/materializers/test_arrow.py
  86. +28 −8 tests/materializers/test_base.py
  87. +105 −38 tests/materializers/test_pandas.py
  88. +147 −0 tests/materializers/test_polars.py
  89. +1 −1 tests/materializers/types/test_evaluated_factor.py
  90. +12 −0 tests/materializers/types/test_factor_values.py
  91. +17 −1 tests/materializers/types/test_scoped_term.py
  92. +19 −2 tests/parser/algos/test_tokenize.py
  93. +3 −3 tests/parser/algos/test_tokens_to_ast.py
  94. +176 −32 tests/parser/test_parser.py
  95. +38 −2 tests/parser/test_utils.py
  96. +1 −0 tests/parser/types/test_factor.py
  97. +9 −1 tests/parser/types/test_formula_parser.py
  98. +6 −13 tests/parser/types/test_operator_resolver.py
  99. +1 −2 tests/parser/types/test_ordered_set.py
  100. +30 −4 tests/parser/types/test_structured.py
  101. +4 −0 tests/parser/types/test_term.py
  102. +6 −0 tests/parser/types/test_token.py
  103. +253 −36 tests/test_formula.py
  104. +15 −4 tests/test_model_matrix.py
  105. +178 −81 tests/test_model_spec.py
  106. +5 −2 tests/test_sugar.py
  107. +22 −0 tests/transforms/data/cubic_spline_r_test_x.json
  108. +7,600 −0 tests/transforms/data/cublic_spine_r_test_data.json
  109. +294 −0 tests/transforms/data/cublic_spine_r_test_data.txt
  110. +38 −1 tests/transforms/test_basis_spline.py
  111. +111 −21 tests/transforms/test_contrasts.py
  112. +363 −0 tests/transforms/test_cubic_spline.py
  113. +60 −0 tests/transforms/test_hashed.py
  114. +38 −0 tests/transforms/test_lag.py
  115. +2 −1 tests/transforms/test_patsy_compat.py
  116. +9 −0 tests/transforms/test_scale.py
  117. +15 −1 tests/utils/test_calculus.py
  118. +20 −3 tests/utils/test_constraints.py
  119. +0 −1 tests/utils/test_iterators.py
  120. +0 −2 tests/utils/test_layered_mapping.py
  121. +1 −3 tests/utils/test_null_handling.py
  122. +4 −2 tests/utils/test_sentinels.py
  123. +21 −2 tests/utils/test_stateful_transforms.py
39 changes: 23 additions & 16 deletions .github/workflows/publish.yml
```diff
@@ -5,23 +5,30 @@ on:
     tags:
       - "v*"

+permissions:
+  contents: write
+
 jobs:
   deploy:
     runs-on: ubuntu-latest
     steps:
-    - uses: actions/checkout@v1
-    - name: Set up Python
-      uses: actions/setup-python@v1
-      with:
-        python-version: '3.9'
-    - name: Install dependencies
-      run: |
-        python -m pip install --upgrade pip
-        pip install hatch
-    - name: Build and publish
-      env:
-        HATCH_INDEX_USER: __token__
-        HATCH_INDEX_AUTH: ${{ secrets.PYPI_PUBLISH_TOKEN }}
-      run: |
-        hatch build
-        hatch publish
+      - uses: actions/checkout@v1
+      - name: Set up Python
+        uses: actions/setup-python@v1
+        with:
+          python-version: "3.9"
+      - name: Install dependencies
+        run: |
+          python -m pip install --upgrade pip
+          pip install hatch
+      - name: Build and publish
+        env:
+          HATCH_INDEX_USER: __token__
+          HATCH_INDEX_AUTH: ${{ secrets.PYPI_PUBLISH_TOKEN }}
+        run: |
+          hatch build
+          hatch publish
+      - name: Build and deploy docs
+        run: |
+          VERSION=$(echo "${{ github.ref_name }}" | sed -E s/\.[0-9]+$/\.x/)
+          hatch run docs:deploy "$VERSION" latest
```
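The `Build and deploy docs` step derives a documentation version label from the release tag by replacing the final numeric component with `x`, so that every patch release of a minor series shares one docs version. Below is a minimal Python sketch of that substitution. It assumes tags of the form `v<major>.<minor>.<patch>`, and `docs_version` is a hypothetical helper, not part of the workflow. Note that the `sed` expression in the workflow is unquoted, so the shell strips the backslashes and `sed` actually receives `s/.[0-9]+$/.x/`. For plain numeric tags like the ones below, the result is the same.

```python
import re


def docs_version(tag: str) -> str:
    """Map a release tag such as 'v1.1.3' to its docs series label 'v1.1.x'."""
    # Equivalent to the quoted sed expression s/\.[0-9]+$/.x/:
    # replace the trailing ".<digits>" component with ".x".
    return re.sub(r"\.[0-9]+$", ".x", tag)


for tag in ["v1.1.3", "v1.0.10", "v0.6.6"]:
    print(tag, "->", docs_version(tag))
```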
34 changes: 26 additions & 8 deletions .github/workflows/publish_docs.yml
```diff
@@ -5,17 +5,35 @@ on:
     branches:
       - main

+permissions:
+  contents: write
+
 jobs:
   build:
     name: Deploy docs
     runs-on: ubuntu-latest
     steps:
       - name: Checkout main
-        uses: actions/checkout@v1
-
-      - name: Deploy docs
-        uses: mhausenblas/mkdocs-deploy-gh-pages@master
-        env:
-          CONFIG_FILE: docsite/mkdocs.yml
-          REQUIREMENTS: docsite/requirements.txt
-          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+        uses: actions/checkout@v4
+      - name: Configure Git Credentials
+        run: |
+          git config user.name github-actions[bot]
+          git config user.email 41898282+github-actions[bot]@users.noreply.github.com
+      - uses: actions/setup-python@v5
+        with:
+          python-version: 3.x
+      - run: echo "cache_id=$(date --utc '+%V')" >> $GITHUB_ENV
+      - uses: actions/cache@v4
+        with:
+          key: mkdocs-material-${{ env.cache_id }}
+          path: .cache
+          restore-keys: |
+            mkdocs-material-
+      - name: Install hatch
+        run: |
+          python -m pip install --upgrade pip
+          pip install hatch
+      - name: Build and Deploy docs.
+        run: |
+          git pull
+          hatch run docs:deploy dev
```
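The docs workflow keys its `actions/cache` entry on `date --utc '+%V'`, which is the ISO 8601 week number. As a result, the mkdocs-material cache is reused within a week and rebuilt when the week rolls over, and `restore-keys` falls back to the most recent entry that starts with `mkdocs-material-`. Here is a small Python sketch of the same key computation, assuming UTC dates. `cache_key` is a hypothetical name used only for illustration.

```python
from datetime import date, datetime, timezone
from typing import Optional


def cache_key(day: Optional[date] = None) -> str:
    """Build a weekly cache key like 'mkdocs-material-<ISO week>' (hypothetical helper)."""
    if day is None:
        # Use the same clock as `date --utc`.
        day = datetime.now(timezone.utc).date()
    # isocalendar()[1] is the ISO 8601 week number; %V prints it zero-padded to two digits.
    week = day.isocalendar()[1]
    return f"mkdocs-material-{week:02d}"


print(cache_key(date(2024, 1, 1)))    # Monday of ISO week 1 of 2024
print(cache_key(date(2024, 12, 30)))  # already ISO week 1 of 2025
```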
51 changes: 26 additions & 25 deletions .github/workflows/tests.yml
```diff
@@ -3,7 +3,8 @@ name: Run Tests
 on:
   push:
     branches:
-      - "**"
+      - "main"
   pull_request:
+
 jobs:
   build:
@@ -12,31 +13,31 @@ jobs:
       max-parallel: 4
       fail-fast: false
       matrix:
-        python-version: ["3.7", "3.8", "3.9", "3.10", "3.11"]
+        python-version: ["3.9", "3.10", "3.11", "3.12", "3.13"]
     env:
       OS: ${{ matrix.os }}
       PYTHON: ${{ matrix.python-version }}
     steps:
-    - uses: actions/checkout@v1
-    - name: Set up Python ${{ matrix.python-version }}
-      uses: actions/setup-python@v4
-      with:
-        python-version: ${{ matrix.python-version }}
-    - name: Install dependencies
-      run: |
-        python -m pip install --upgrade pip
-        pip install hatch hatch-vcs
-    - name: Run tests
-      run: hatch run tests
-    - if: matrix.python-version == '3.7'
-      name: Check works with min requirements
-      run: hatch run test_min:pytest tests
-    - if: matrix.python-version == '3.9'
-      name: Lint
-      run: hatch run lint:check
-    - name: Upload coverage to Codecov
-      uses: codecov/codecov-action@v1.0.10
-      with:
-        file: ./coverage.xml
-        flags: unittests
-        env_vars: OS,PYTHON
+      - uses: actions/checkout@v1
+      - name: Set up Python ${{ matrix.python-version }}
+        uses: actions/setup-python@v4
+        with:
+          python-version: ${{ matrix.python-version }}
+      - name: Install dependencies
+        run: |
+          python -m pip install --upgrade pip
+          pip install hatch hatch-vcs
+      - name: Run tests
+        run: hatch run tests
+      - if: matrix.python-version == '3.9'
+        name: Check works with min requirements
+        run: hatch run test_min:pytest tests
+      - if: matrix.python-version == '3.9'
+        name: Lint
+        run: hatch run lint:check
+      - name: Upload coverage to Codecov
+        uses: codecov/codecov-action@v1.0.10
+        with:
+          file: ./coverage.xml
+          flags: unittests
+          env_vars: OS,PYTHON
```
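Each `@@ -a,b +c,d @@` header gives the starting line and line count of a hunk in the old (`-`) and new (`+`) file. For example, the first `tests.yml` hunk grows from 7 to 8 lines, which is why the second hunk starts at line 12 in the old file but at line 13 in the new one. The following is a minimal Python sketch of parsing such headers, written for illustration only. `Hunk`, `HUNK_RE` and `parse_hunk_header` are hypothetical names, not a library API.

```python
import re
from typing import NamedTuple


class Hunk(NamedTuple):
    old_start: int
    old_count: int
    new_start: int
    new_count: int


# Counts are optional in unified diffs and default to 1 when omitted.
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunk_header(line: str) -> Hunk:
    match = HUNK_RE.match(line)
    if match is None:
        raise ValueError(f"not a hunk header: {line!r}")
    old_start, old_count, new_start, new_count = match.groups()
    return Hunk(
        int(old_start),
        int(old_count) if old_count is not None else 1,
        int(new_start),
        int(new_count) if new_count is not None else 1,
    )


first = parse_hunk_header("@@ -3,7 +3,8 @@ name: Run Tests")
second = parse_hunk_header("@@ -12,31 +13,31 @@ jobs:")
# The net lines added by the first hunk shift every later hunk in the new file.
offset = first.new_count - first.old_count
print(offset, second.new_start - second.old_start)  # both 1
```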
16 changes: 16 additions & 0 deletions .pre-commit-config.yaml
```diff
@@ -0,0 +1,16 @@
+repos:
+  - repo: https://github.com/pre-commit/pre-commit-hooks
+    rev: v2.3.0
+    hooks:
+      - id: check-yaml
+      - id: check-toml
+      - id: end-of-file-fixer
+      - id: trailing-whitespace
+  - repo: https://github.com/astral-sh/ruff-pre-commit
+    rev: v0.5.6
+    hooks:
+      - id: ruff # Run the linter.
+        types_or: [ python, pyi, jupyter ]
+        args: [ --fix ]
+      - id: ruff-format # Run the formatter.
+        types_or: [ python, pyi, jupyter ]
```
23 changes: 20 additions & 3 deletions README.md
```diff
@@ -10,8 +10,6 @@

 Formulaic is a high-performance implementation of Wilkinson formulas for Python.

-**Note:** This project, while largely complete, is still a work in progress, and the API is subject to change between major versions (0.<major>.<minor>).
-
 - **Documentation**: https://matthewwardrop.github.io/formulaic
 - **Source Code**: https://github.com/matthewwardrop/formulaic
 - **Issue tracker**: https://github.com/matthewwardrop/formulaic/issues
@@ -31,6 +29,7 @@ It provides:
     - `numpy.ndarray`
     - `scipy.sparse.CSCMatrix`
 - support for symbolic differentiation of formulas (and hence model matrices).
+- and much more.

 ## Example code

@@ -39,7 +38,7 @@ import pandas
 from formulaic import Formula

 df = pandas.DataFrame({
-    'y': [0,1,2],
+    'y': [0, 1, 2],
     'x': ['A', 'B', 'C'],
     'z': [0.3, 0.1, 0.2],
 })
@@ -107,6 +106,13 @@ y, X = Formula('y ~ x + z').get_model_matrix(df)
 </tbody>
 </table>

+Note that the above can be short-handed to:
+
+```
+from formulaic import model_matrix
+model_matrix('y ~ x + z', df)
+```
+
 ## Benchmarks

 Formulaic typically outperforms R for both dense and sparse model matrices, and vastly outperforms `patsy` (the existing implementation for Python) for dense matrices (`patsy` does not support sparse model matrix output).
@@ -121,3 +127,14 @@ For more details, see [here](benchmarks/README.md).
 - [StatsModels.jl `@formula`](https://juliastats.org/StatsModels.jl/stable/formula/): The implementation of Wilkinson formulas for Julia.
 - [R Formulas](https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/formula): The implementation of Wilkinson formulas for R, which is thoroughly introduced [here](https://cran.r-project.org/web/packages/Formula/vignettes/Formula.pdf). [R itself is an implementation of [S](https://en.wikipedia.org/wiki/S_%28programming_language%29), in which formulas were first made popular].
 - The work that started it all: Wilkinson, G. N., and C. E. Rogers. Symbolic description of factorial models for analysis of variance. J. Royal Statistics Society 22, pp. 392–399, 1973.
+
+## Used by
+
+Below are some of the projects that use Formulaic:
+
+- [Glum](https://github.com/Quantco/glum): High performance Python GLM's with all the features.
+- [Lifelines](https://github.com/camDavidsonPilon/lifelines): Survival analysis in Python.
+- [Linearmodels](https://github.com/bashtage/linearmodels): Additional linear models including instrumental variable and panel data models that are missing from statsmodels.
+- [Pyfixest](https://github.com/s3alfisc/pyfixest): Fast High-Dimensional Fixed Effects Regression in Python following fixest-syntax.
+- [Tabmat](https://github.com/Quantco/tabmat): Efficient matrix representations for working with tabular data.
+- Add your project here!
```
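In the README example, `x` is categorical with levels `A`, `B` and `C`. The right-hand side of `y ~ x + z` therefore expands to an intercept, one indicator column for each non-reference level of `x`, and `z`. The sketch below builds that reduced-rank treatment (dummy) coding by hand with the standard library, so you can see the shape of the resulting `X`. It is an illustration only, not formulaic's implementation. The column labels imitate formulaic's `x[T.B]` naming as an assumption, and `treatment_encode` is a hypothetical helper.

```python
def treatment_encode(rows, categorical, numeric):
    """Build an intercept + treatment-coded design matrix (illustrative sketch)."""
    levels = sorted({row[categorical] for row in rows})
    # The first (reference) level is absorbed by the intercept and gets no column.
    others = levels[1:]
    columns = ["Intercept"] + [f"{categorical}[T.{lvl}]" for lvl in others] + [numeric]
    matrix = [
        [1.0]
        + [1.0 if row[categorical] == lvl else 0.0 for lvl in others]
        + [float(row[numeric])]
        for row in rows
    ]
    return columns, matrix


# Same data as the README's pandas example, held as plain columns.
df = {"y": [0, 1, 2], "x": ["A", "B", "C"], "z": [0.3, 0.1, 0.2]}
rows = [dict(zip(df, values)) for values in zip(*df.values())]

columns, matrix = treatment_encode(rows, "x", "z")
print(columns)
for line in matrix:
    print(line)
```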
5 changes: 3 additions & 2 deletions benchmarks/benchmark.py
```diff
@@ -4,13 +4,14 @@
 import time
 from collections import namedtuple

-import formulaic
 import numpy
 import pandas
 import patsy
-from formulaic import Formula
 from uncertainties import ufloat
+
+import formulaic
+from formulaic import Formula

 ALL_TOOLINGS = ["patsy", "formulaic", "formulaic_sparse", "R", "R_sparse"]

 formulas = {
```
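The benchmark results report a `mean` and a `stderr` for each formula and tooling pair. The truncated script does not show how these values are computed. Assuming `stderr` is the standard error of the mean over repeated runs, a standard-library sketch of collecting such timings might look like the following. `Timing`, `summarize` and `time_call` are hypothetical names, and the script's actual use of `ufloat` is not reproduced here.

```python
import math
import statistics
import time
from collections import namedtuple
from typing import Callable, Sequence

Timing = namedtuple("Timing", ["mean", "stderr"])


def summarize(durations: Sequence[float]) -> Timing:
    """Mean and standard error of the mean for a list of run times."""
    mean = statistics.mean(durations)
    # Sample standard deviation divided by sqrt(n); requires at least two runs.
    stderr = statistics.stdev(durations) / math.sqrt(len(durations))
    return Timing(mean, stderr)


def time_call(func: Callable[[], object], repeat: int = 5) -> Timing:
    """Run `func` `repeat` times and summarize the wall-clock durations."""
    durations = []
    for _ in range(repeat):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return summarize(durations)


print(summarize([1.0, 2.0, 3.0]))
print(time_call(lambda: sum(range(10_000))))
```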
90 changes: 45 additions & 45 deletions benchmarks/benchmarks.csv
```diff
@@ -1,46 +1,46 @@
 ,formula,tooling,mean,stderr
-0,a,patsy,0.06235763004847935,0.005395749279517771
-1,a,formulaic,0.01614434378487723,0.0032546966398573733
-2,a,formulaic_sparse,0.3261915956224714,0.016261649503701136
-3,a,R,0.28679302760532926,0.04140188245157612
-4,a,R_sparse,0.3757050037384033,0.10653194222854354
-5,A,patsy,5.07630558013916,0.21714776785515827
-6,A,formulaic,0.20960685185023717,0.00647732136737535
-7,A,formulaic_sparse,0.49657981736319406,0.014160553634556148
-8,A,R,0.270564215523856,0.048446057523137825
-9,A,R_sparse,0.6199769633156913,0.046671131192277546
-10,a+A,patsy,5.372520089149475,0.24907024334140557
-11,a+A,formulaic,0.21435914720807756,0.005022992135378462
-12,a+A,formulaic_sparse,0.5922877447945731,0.01141291396766895
-13,a+A,R,0.3385444368634905,0.050949061524224834
-14,a+A,R_sparse,0.8427209513528007,0.05420930370296844
-15,a:A,patsy,5.41690331697464,0.20045329513286106
-16,a:A,formulaic,0.2447828565325056,0.009831368693974841
-17,a:A,formulaic_sparse,0.5952485970088414,0.015557371582484582
-18,a:A,R,0.325153112411499,0.0525960780057241
-19,a:A,R_sparse,0.6293824059622628,0.05176575091323222
-20,A+B,patsy,10.592723488807678,0.36380136013031006
-21,A+B,formulaic,0.39785667828151156,0.004189211004718843
-22,A+B,formulaic_sparse,0.7370290756225586,0.00559989056050753
-23,A+B,R,0.45774364471435547,0.04577860217478202
-24,A+B,R_sparse,1.128925051007952,0.0730276871543806
-25,a:A:B,patsy,13.139377474784851,0.735141396522522
-26,a:A:B,formulaic,0.5296461582183838,0.02897198866398977
-27,a:A:B,formulaic_sparse,0.9496099608285087,0.017482256914701913
-28,a:A:B,R,0.5121760368347168,0.05913549901764887
-29,a:A:B,R_sparse,2.4410063539232527,0.15718149749347077
-30,A:B:C:D,patsy,33.971909284591675,0.0
-31,A:B:C:D,formulaic,1.4003467900412423,0.013122149254960603
-32,A:B:C:D,formulaic_sparse,2.6644029957907542,0.0594315471815126
-33,A:B:C:D,R,1.5739161627633231,0.043335739540618624
-34,A:B:C:D,R_sparse,11.206892251968384,0.07203364372253418
-35,a*b*A*B,patsy,14.135663151741028,0.023609280586242676
-36,a*b*A*B,formulaic,0.7015061037881034,0.015836408587630867
-37,a*b*A*B,formulaic_sparse,1.2936896937234061,0.008783658171925213
-38,a*b*A*B,R,0.7440026828220913,0.0779464030983464
-39,a*b*A*B,R_sparse,8.046716928482056,0.09924621730008089
-40,a*b*c*A*B*C,patsy,52.30743145942688,0.0
-41,a*b*c*A*B*C,formulaic,3.124175344194685,0.015513429204320773
-42,a*b*c*A*B*C,formulaic_sparse,4.722880220413208,0.05794530543951235
-43,a*b*c*A*B*C,R,3.261254208428519,0.03376348572979368
-44,a*b*c*A*B*C,R_sparse,96.12985253334045,0.0
+0,a,patsy,0.05834197998046875,0.00803367432263398
+1,a,formulaic,0.02302394594464983,0.005276772701941135
+2,a,formulaic_sparse,0.21061321667262486,0.010992980906779363
+3,a,R,0.20319366455078125,0.04054977688850774
+4,a,R_sparse,0.25407181467328754,0.06840892083032614
+5,A,patsy,4.188197422027588,0.06544489467164907
+6,A,formulaic,0.1523878574371338,0.0036558939106885466
+7,A,formulaic_sparse,0.3251234803880964,0.011790060964279047
+8,A,R,0.17913893290928432,0.024771923314745647
+9,A,R_sparse,0.2776027406964983,0.030147102033801613
+10,a+A,patsy,4.570634412765503,0.15648980716104596
+11,a+A,formulaic,0.17730648177010672,0.011166617158905944
+12,a+A,formulaic_sparse,0.4082690307072231,0.019076310496142206
+13,a+A,R,0.38174584933689665,0.07049519867629231
+14,a+A,R_sparse,0.5332009451729911,0.21147057077093623
+15,a:A,patsy,4.846947574615479,0.17439756289836647
+16,a:A,formulaic,0.18150435175214494,0.0029803910808661733
+17,a:A,formulaic_sparse,0.40479908670697895,0.02179493572840004
+18,a:A,R,0.209270749773298,0.02890730110968228
+19,a:A,R_sparse,0.3095934050423758,0.03366518302136102
+20,A+B,patsy,8.886903127034506,0.12687320827416307
+21,A+B,formulaic,0.37893104553222656,0.07142932738422411
+22,A+B,formulaic_sparse,0.6603872776031494,0.12048599251620122
+23,A+B,R,0.3503831795283726,0.08556777381884671
+24,A+B,R_sparse,0.6867697579520089,0.1680831720230895
+25,a:A:B,patsy,10.59350836277008,0.006071925163269043
+26,a:A:B,formulaic,0.38779779842921663,0.005652063736152758
+27,a:A:B,formulaic_sparse,0.6174772126334054,0.006656982815848345
+28,a:A:B,R,0.41255525180271696,0.005872361057027324
+29,a:A:B,R_sparse,1.3681020736694336,0.13250905747410396
+30,A:B:C:D,patsy,27.812817335128784,0.0
+31,A:B:C:D,formulaic,1.7389381953648158,0.1128489022833264
+32,A:B:C:D,formulaic_sparse,1.821084805897304,0.04040580196899275
+33,A:B:C:D,R,1.1703059673309326,0.01475477228204255
+34,A:B:C:D,R_sparse,6.603186547756195,0.10862060432577084
+35,a*b*A*B,patsy,14.305930256843567,1.4570282697677612
+36,a*b*A*B,formulaic,0.849949870790754,0.12062261319473745
+37,a*b*A*B,formulaic_sparse,1.194093908582415,0.27659465274967987
+38,a*b*A*B,R,0.633225509098598,0.09868026112633763
+39,a*b*A*B,R_sparse,7.428930600484212,1.4652407668448042
+40,a*b*c*A*B*C,patsy,48.66431951522827,0.0
+41,a*b*c*A*B*C,formulaic,4.352833080291748,0.31885221781014655
+42,a*b*c*A*B*C,formulaic_sparse,4.8097954273223875,0.7055727752553242
+43,a*b*c*A*B*C,R,2.6774498394557407,0.06813521510330559
+44,a*b*c*A*B*C,R_sparse,72.07087659835815,0.0
```
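To put a number on the README's claim that Formulaic vastly outperforms `patsy` for dense output, you can read the ratio of mean timings directly from the updated CSV. Below is a short sketch that uses a few rows copied verbatim from the new results. `mean_times` is a hypothetical helper name.

```python
import csv
import io

# A subset of the updated benchmarks.csv rows, copied from the new results.
DATA = """\
,formula,tooling,mean,stderr
20,A+B,patsy,8.886903127034506,0.12687320827416307
21,A+B,formulaic,0.37893104553222656,0.07142932738422411
23,A+B,R,0.3503831795283726,0.08556777381884671
40,a*b*c*A*B*C,patsy,48.66431951522827,0.0
41,a*b*c*A*B*C,formulaic,4.352833080291748,0.31885221781014655
43,a*b*c*A*B*C,R,2.6774498394557407,0.06813521510330559
"""


def mean_times(text):
    """Map (formula, tooling) -> mean timing."""
    reader = csv.DictReader(io.StringIO(text))
    return {(row["formula"], row["tooling"]): float(row["mean"]) for row in reader}


means = mean_times(DATA)
for formula in ("A+B", "a*b*c*A*B*C"):
    speedup = means[(formula, "patsy")] / means[(formula, "formulaic")]
    print(f"{formula}: formulaic is {speedup:.1f}x faster than patsy")
```

The same dictionary can be used to compare any other pair of toolings, for example `formulaic` against `R`.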
Binary file modified benchmarks/benchmarks.png