Feature Request: `fixest::i()` operator to set reference levels and interact Categorical Variables #244

s3alfisc · 2025-01-12T11:36:36Z

A very common operation in econometric analyses is to interact a categorical variable with another variable (for example, in difference-in-differences regressions).

To handle such interactions flexibly, the fixest R package introduced a dedicated operator, `i()`.

It makes it easy to set reference levels for individual categorical variables and for their interactions. On top of that, it provides syntactic sugar for binning levels of categorical variables.

Would you consider adding the `i()` operator to formulaic's transforms? The best starting point to learn about it is the fixest documentation, but I have also attached some examples and comparisons to formulaic below.

```python
import pandas as pd
import numpy as np
from formulaic import model_matrix
from formulaic.transforms import stateful_transform
from formulaic.transforms.contrasts import C, TreatmentContrasts


rng = np.random.default_rng(91)
f1 = rng.choice(["a", "b", "c"], 10)
f2 = rng.choice([1, 2, 3], 10)
y = rng.normal(0, 1, 10)

df = pd.DataFrame({"factor1": f1, "factor2": f2, "y": y})
```

Easily set the reference level for one categorical variable

```r
library(fixest)
library(reticulate)
df = py$df

fit = feols(y ~ i(factor1), data = df)
X = fixest:::model.matrix.fixest(fit)
X |> colnames()
# [1] "(Intercept)" "factor1::b"  "factor1::c"

fit = feols(y ~ i(factor1, ref = "b"), data = df)
X = fixest:::model.matrix.fixest(fit)
X |> colnames()
# [1] "(Intercept)" "factor1::a"  "factor1::c"
```

This could easily be achieved with a stateful transform in formulaic:

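```python
@stateful_transform
def i(factor_var, ref=None, _state=None, _metadata=None, _spec=None):
    # Treatment (dummy) coding; `ref` selects the omitted reference level.
    # Without `ref`, TreatmentContrasts() falls back to the first level
    # (passing base=None would be treated as a level name and fail).
    contrasts = TreatmentContrasts() if ref is None else TreatmentContrasts(base=ref)
    # Don't cache the encoded values in `_state`: they would be reused
    # verbatim when the model spec is later applied to new data.
    return C(factor_var, contrasts=contrasts)
```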

```python
model_matrix("i(factor1, ref = 'a')", data = df).head()

# Intercept	i(factor1, ref='a')[T.b]	i(factor1, ref='a')[T.c]
# 0	1.0	0	1
# 1	1.0	0	0
# 2	1.0	1	0
# 3	1.0	0	1
# 4	1.0	0	0
```
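To pin down the target behavior independently of formulaic, here is a minimal plain-pandas sketch of the columns that `i(factor1, ref = ...)` contributes in the fixest output above. `i_dummies` is a hypothetical helper, not part of fixest or formulaic, and the toy frame is illustrative input only. When no `ref` is given it drops the first sorted level, which matches the fixest output above where level `a` is dropped next to the intercept.

```python
import pandas as pd


# Hypothetical helper (not part of fixest or formulaic): builds the 0/1
# columns that i(var, ref = ...) contributes, named like fixest's "var::level".
def i_dummies(factor: pd.Series, ref=None) -> pd.DataFrame:
    levels = sorted(factor.unique())
    # Without `ref`, drop the first sorted level, matching the fixest output
    # above where level "a" disappears next to the intercept.
    if ref is None:
        ref = levels[0]
    if ref not in levels:
        raise ValueError(f"reference level {ref!r} not found in {levels}")
    return pd.DataFrame(
        {f"{factor.name}::{lvl}": (factor == lvl).astype(int)
         for lvl in levels if lvl != ref},
        index=factor.index,
    )


toy = pd.DataFrame({"factor1": ["a", "b", "c", "a", "c"]})
print(i_dummies(toy["factor1"], ref="b"))
```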

Interacting Variables

```r
library(fixest)
library(reticulate)
df = py$df

fit = feols(y ~ i(factor1, factor2), data = df)
X = fixest:::model.matrix.fixest(fit)
X |> head()
#      (Intercept) factor1::a:factor2 factor1::b:factor2 factor1::c:factor2
# [1,]           1                  0                  0                  1
# [2,]           1                  3                  0                  0
# [3,]           1                  0                  3                  0
# [4,]           1                  0                  0                  1
# [5,]           1                  3                  0                  0
# [6,]           1                  0                  0                  3
```

```python
y, X = model_matrix("y ~ C(factor1):factor2", data = df)
X.columns
#Index(['Intercept', 'C(factor1)[a]:factor2', 'C(factor1)[b]:factor2',
#       'C(factor1)[c]:factor2'],
#      dtype='object')
```
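The `i(factor1, factor2)` columns are simple to state: one column per level of `factor1`, equal to `factor2` in rows with that level and 0 elsewhere. A minimal pandas sketch, using a hypothetical helper `i_interact` (not part of fixest or formulaic) and a toy frame whose six rows are read off the fixest output above:

```python
import pandas as pd


# Hypothetical helper (not part of fixest or formulaic): one column per level
# of `factor`, equal to `var` in rows with that level and 0 elsewhere.
def i_interact(factor: pd.Series, var: pd.Series) -> pd.DataFrame:
    levels = sorted(factor.unique())
    return pd.DataFrame(
        {f"{factor.name}::{lvl}:{var.name}": (factor == lvl).astype(int) * var
         for lvl in levels},
        index=factor.index,
    )


# Toy rows read off the first six rows of the fixest output above.
toy = pd.DataFrame({"factor1": ["c", "a", "b", "c", "a", "c"],
                    "factor2": [1, 3, 3, 1, 3, 3]})
print(i_interact(toy["factor1"], toy["factor2"]))
```

The printed values reproduce the three interaction columns of the fixest matrix above, row for row.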

Two variables, with a reference level

```r
fit = feols(y ~ i(factor1, factor2, ref = "a"), data = df)
X = fixest:::model.matrix.fixest(fit)
X |> colnames()
# [1] "(Intercept)"        "factor1::b:factor2" "factor1::c:factor2"
```

```python
y, X = model_matrix("y ~ C(factor1, contr.treatment('a')):factor2", data = df)
X.columns
#Index(['Intercept',
#       'C(factor1, contr.treatment('a'))[a]:factor2',
#       'C(factor1, contr.treatment('a'))[b]:factor2',
#       'C(factor1, contr.treatment('a'))[c]:factor2'],
#      dtype='object')

# formulaic still returns a column for the reference level 'a', so
# 'C(factor1, contr.treatment('a'))[a]:factor2' has to be dropped by hand
```
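Until something like `i()` exists, that manual step can be scripted. A minimal sketch, assuming formulaic keeps tagging levels as `[level]` inside column names, as in the output above; `drop_reference_level` is a hypothetical helper, and the frame below holds only the column names copied from that output (no data):

```python
import pandas as pd


# Hypothetical helper: drop every column whose name contains the level tag
# "[<ref>]", e.g. "C(factor1, contr.treatment('a'))[a]:factor2". Relies on
# formulaic's current column-naming convention (an assumption).
def drop_reference_level(X: pd.DataFrame, ref: str) -> pd.DataFrame:
    tag = f"[{ref}]"
    return X.drop(columns=[col for col in X.columns if tag in col])


# Column names only, copied from the formulaic output above.
cols = ["Intercept",
        "C(factor1, contr.treatment('a'))[a]:factor2",
        "C(factor1, contr.treatment('a'))[b]:factor2",
        "C(factor1, contr.treatment('a'))[c]:factor2"]
X = pd.DataFrame(columns=cols)
# On the real model matrix this would be: X = drop_reference_level(X, "a")
print(list(drop_reference_level(X, "a").columns))
```

Matching on the `[a]` tag is fragile: it would also drop columns of any other categorical that happens to have a level named `a`.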

Binning

```r
# Binning: group levels a and b of factor1 into a single level called 'bin'
fit = feols(y ~ i(factor1, factor2, bin = list(bin = c("a", "b"))), data = df)
X = fixest:::model.matrix.fixest(fit)
X |> colnames()
# [1] "(Intercept)"          "factor1::bin:factor2" "factor1::c:factor2"
```
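Binning amounts to relabelling levels before encoding. A minimal pandas sketch of `i(factor1, factor2, bin = list(bin = c("a", "b")))`, where `i_interact_binned` is a hypothetical helper (not part of fixest or formulaic) and the toy rows are the six rows read off the fixest `i(factor1, factor2)` output above. Levels are sorted alphabetically here, which may differ from fixest's own column order.

```python
import pandas as pd


# Hypothetical helper (not part of fixest or formulaic): relabels the listed
# levels of `factor` to one new level, then builds the i(factor, var) columns.
def i_interact_binned(factor: pd.Series, var: pd.Series, bins: dict) -> pd.DataFrame:
    binned = factor.copy()
    for new_level, old_levels in bins.items():
        # Keep values outside `old_levels`; replace the rest with `new_level`.
        binned = binned.where(~binned.isin(old_levels), new_level)
    levels = sorted(binned.unique())
    return pd.DataFrame(
        {f"{factor.name}::{lvl}:{var.name}": (binned == lvl).astype(int) * var
         for lvl in levels},
        index=factor.index,
    )


toy = pd.DataFrame({"factor1": ["c", "a", "b", "c", "a", "c"],
                    "factor2": [1, 3, 3, 1, 3, 3]})
# Same grouping as bin = list(bin = c("a", "b")) in the R call above.
print(i_interact_binned(toy["factor1"], toy["factor2"], {"bin": ["a", "b"]}))
```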


matthewwardrop · 2025-01-13T06:44:37Z

Thanks for the suggestion @s3alfisc. I'll take a look and get back to you soon :).
