
Feature Request: fixest::i() operator to set reference levels and interact Categorical Variables #244

Open
s3alfisc opened this issue Jan 12, 2025 · 1 comment

Comments

@s3alfisc

Hi @matthewwardrop,

A very common operation in econometric analyses is to interact a categorical variable with another variable (for example, in difference-in-differences regressions).

To handle such interactions flexibly, the fixest R package has introduced a novel operator, i().

It makes it easy to set reference levels for individual categorical variables and for their interactions. On top of that, it provides syntactic sugar for binning levels of categorical variables.

Would you consider adding the i() operator to formulaic's transforms? The best starting point to learn about it is the fixest documentation, but I have also attached some examples and comparisons to formulaic below.

```python
import pandas as pd
import numpy as np
from formulaic import model_matrix
from formulaic.transforms import stateful_transform
from formulaic.transforms.contrasts import C, TreatmentContrasts


rng = np.random.default_rng(91)
f1 = rng.choice(["a", "b", "c"], 10)
f2 = rng.choice([1, 2, 3], 10)
y = rng.normal(0, 1, 10)

df = pd.DataFrame({"factor1": f1, "factor2": f2, "y": y})
```

Easily set the reference level for one categorical variable

```r
library(fixest)
library(reticulate)
df = py$df

fit = feols(y~ i(factor1), data = df)
X = fixest:::model.matrix.fixest(fit)
X |> colnames()
# [1] "(Intercept)" "factor1::b"  "factor1::c" 

fit = feols(y~ i(factor1, ref = "b"), data = df)
X = fixest:::model.matrix.fixest(fit)
X |> colnames()
# [1] "(Intercept)" "factor1::a"  "factor1::c" 
```

This could easily be achieved with a stateful transform in formulaic:

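```python
@stateful_transform
def i(factor_var, ref=None, _state=None, _metadata=None, _spec=None):
    # Mark `factor_var` as categorical with treatment (dummy) coding; `ref` is the
    # omitted reference level (TreatmentContrasts() defaults to the first level).
    # The result is deliberately not cached in `_state`: a cached value would be
    # returned unchanged when the model spec is re-applied to new data.
    contrasts = TreatmentContrasts(base=ref) if ref is not None else TreatmentContrasts()
    return C(factor_var, contrasts=contrasts)

model_matrix("i(factor1, ref = 'a')", data=df).head()

#    Intercept  i(factor1, ref='a')[T.b]  i(factor1, ref='a')[T.c]
# 0        1.0                         0                         1
# 1        1.0                         0                         0
# 2        1.0                         1                         0
# 3        1.0                         0                         1
# 4        1.0                         0                         0
```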
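For comparison with the fixest column names above, here is a minimal pandas sketch of the treatment (dummy) coding that i(factor1, ref = "b") produces: one 0/1 column per level except the reference level. The helper `treatment_dummies` and its arguments are hypothetical, written only for illustration. It assumes levels are ordered by sorting (the `pd.Categorical` default), which agrees with fixest dropping "a" when no `ref` is given.

```python
import pandas as pd


def treatment_dummies(x, name, ref=None):
    """Hypothetical helper: treatment (dummy) coding with a chosen reference level.

    Returns one 0/1 column per level of `x` except `ref`, named "name::level"
    as in fixest. Levels are sorted (pd.Categorical's default order), and
    `ref` defaults to the first level.
    """
    cat = pd.Categorical(x)
    levels = list(cat.categories)
    if ref is None:
        ref = levels[0]
    if ref not in levels:
        raise ValueError(f"reference level {ref!r} is not one of {levels}")
    return pd.DataFrame(
        {f"{name}::{lvl}": (cat == lvl).astype(int) for lvl in levels if lvl != ref}
    )


x = ["c", "a", "b", "c", "a"]
print(list(treatment_dummies(x, "factor1").columns))           # ['factor1::b', 'factor1::c']
print(list(treatment_dummies(x, "factor1", ref="b").columns))  # ['factor1::a', 'factor1::c']
```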

Interacting Variables

```r
library(fixest)
library(reticulate)
df = py$df

fit = feols(y~ i(factor1, factor2), data = df)
X = fixest:::model.matrix.fixest(fit)
X |> head()
#      (Intercept) factor1::a:factor2 factor1::b:factor2 factor1::c:factor2
# [1,]           1                  0                  0                  1
# [2,]           1                  3                  0                  0
# [3,]           1                  0                  3                  0
# [4,]           1                  0                  0                  1
# [5,]           1                  3                  0                  0
# [6,]           1                  0                  0                  3
```

```python
y, X = model_matrix("y ~ C(factor1):factor2", data = df)
X.columns 
# Index(['Intercept', 'C(factor1)[a]:factor2', 'C(factor1)[b]:factor2',
#        'C(factor1)[c]:factor2'],
#       dtype='object')
```

Two variables with a reference level

```r
fit = feols(y~ i(factor1, factor2, ref = "a"), data = df)
X = fixest:::model.matrix.fixest(fit)
X |> colnames()
# [1] "(Intercept)"        "factor1::b:factor2" "factor1::c:factor2"
```

```python
y, X = model_matrix("y ~ C(factor1, contr.treatment('a')):factor2", data = df)
X.columns
# Index(['Intercept',
#        'C(factor1, contr.treatment('a'))[a]:factor2',
#        'C(factor1, contr.treatment('a'))[b]:factor2',
#        'C(factor1, contr.treatment('a'))[c]:factor2'],
#       dtype='object')

# The reference-level column 'C(factor1, contr.treatment('a'))[a]:factor2'
# is still present, so it has to be dropped by hand.
```
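As a rough sketch of what i(factor1, factor2, ref = "a") computes, using only pandas and NumPy: each non-reference level of the categorical gets a column equal to factor2 on rows with that level and 0 elsewhere, so no column has to be dropped afterwards. The function `interact_with_ref` is a hypothetical helper for illustration, not part of formulaic or fixest, and the toy data reuses the values visible in the first six rows of the fixest output above.

```python
import numpy as np
import pandas as pd


def interact_with_ref(factor, var, factor_name, var_name, ref=None):
    """Hypothetical helper mimicking fixest's i(factor, var, ref = ...).

    For each level of `factor` other than `ref`, returns a column equal to
    `var` on rows with that level and 0 elsewhere, named "factor::level:var".
    With ref=None every level is kept, like i(factor1, factor2).
    """
    cat = pd.Categorical(factor)
    values = np.asarray(var, dtype=float)
    if ref is not None and ref not in cat.categories:
        raise ValueError(f"reference level {ref!r} is not one of {list(cat.categories)}")
    return pd.DataFrame(
        {
            f"{factor_name}::{lvl}:{var_name}": (cat == lvl) * values
            for lvl in cat.categories
            if lvl != ref
        }
    )


f1 = ["c", "a", "b", "c", "a", "c"]
f2 = [1, 3, 3, 1, 3, 3]
X = interact_with_ref(f1, f2, "factor1", "factor2", ref="a")
print(list(X.columns))  # ['factor1::b:factor2', 'factor1::c:factor2']
```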

Binning

```r
# binning: group levels a and b of factor1 into a single level named 'bin'
fit = feols(y~ i(factor1, factor2, bin = list(bin= c("a","b"))), data = df)
X = fixest:::model.matrix.fixest(fit)
X |> colnames()
# [1] "(Intercept)"          "factor1::bin:factor2" "factor1::c:factor2"  
```
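Binning can be sketched the same way, under the assumption that it amounts to relabeling levels before building the interaction columns; here levels a and b are collapsed into a single level called "bin". `bin_levels` is a hypothetical helper, and its mapping format (new label to a list of original levels) only mirrors the shape of fixest's bin = list(bin = c("a", "b")) argument.

```python
import numpy as np
import pandas as pd


def bin_levels(factor, bins):
    """Hypothetical helper: collapse categorical levels into named bins.

    `bins` maps a new label to the list of original levels it replaces,
    e.g. {"bin": ["a", "b"]}; levels not listed are left unchanged.
    """
    lookup = {old: new for new, olds in bins.items() for old in olds}
    return pd.Series(factor).map(lambda v: lookup.get(v, v))


f1 = ["c", "a", "b", "c", "a", "c"]
f2 = np.array([1, 3, 3, 1, 3, 3], dtype=float)

binned = pd.Categorical(bin_levels(f1, {"bin": ["a", "b"]}))
# Interact the binned factor with f2: one column per remaining level.
X = pd.DataFrame(
    {f"factor1::{lvl}:factor2": (binned == lvl) * f2 for lvl in binned.categories}
)
print(list(X.columns))  # ['factor1::bin:factor2', 'factor1::c:factor2']
```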
@matthewwardrop
Owner

Thanks for the suggestion @s3alfisc . I'll take a look and get back to you soon :).


2 participants