CImpact - Causal Inference for Measuring Performance and Causal Trends

CImpact is a modular causal impact analysis library for Python that supports multiple time series models, including TensorFlow, Prophet, and Pyro. It provides a flexible framework for estimating the causal effect of an intervention on time series data.

Introduction


CImpact is a versatile Python library designed to empower analysts and data scientists to evaluate the causal impact of interventions on time series data. By integrating a suite of statistical and probabilistic models, CImpact offers robust and flexible tools for causal inference, ensuring adaptability across diverse use cases and modeling preferences.

With support for multiple cutting-edge frameworks, including TensorFlow Probability, Pyro, and Prophet, CImpact enables users to:

  • Quantify Intervention Effects: Measure the influence of interventions with confidence intervals and probabilistic predictions.
  • Leverage Advanced Models: Utilize models that capture trends, seasonality, and covariates, providing deeper insights into time series dynamics.
  • Customize Approaches: Select between Hamiltonian Monte Carlo (HMC), Variational Inference (VI), or Prophet-based methods to match computational and analytical needs.
  • Seamlessly Handle Covariates: Account for external variables that impact the time series through regression components or regressors.

Whether your data exhibits complex seasonality, local trends, or requires the incorporation of contextual variables, CImpact equips you with a powerful toolkit to make informed decisions supported by rigorous statistical analysis.
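
To make the idea concrete, here is a minimal sketch of the general counterfactual logic behind causal impact analysis, using only the Python standard library. It is not CImpact's implementation. It fits a least-squares line on a single covariate over the pre-intervention period, projects that relationship into the post-intervention period, and treats the gap between observed and projected values as the estimated effect. The function names and the toy data are assumptions made for illustration.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x (one covariate)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def estimate_effect(x, y, n_pre):
    """Fit on the first n_pre points, predict the rest, return pointwise effects."""
    intercept, slope = fit_line(x[:n_pre], y[:n_pre])
    counterfactual = [intercept + slope * xi for xi in x[n_pre:]]
    return [obs - pred for obs, pred in zip(y[n_pre:], counterfactual)]

# Toy data (hypothetical): y follows 2 * x + 1 until an intervention adds 5.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2 * xi + 1 for xi in x[:5]] + [2 * xi + 1 + 5 for xi in x[5:]]

effects = estimate_effect(x, y, n_pre=5)
print("Pointwise effects:", [round(e, 2) for e in effects])
print("Average effect:", round(mean(effects), 2))
```

The models listed in this README replace the straight line with richer structures (trend, seasonality, regressors) and report uncertainty around the counterfactual, but the observed-minus-predicted comparison is the same.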

Why CImpact?


CImpact extends the functionality of the tfcausalimpact library by adding support for multiple modeling approaches. This modular design lets users choose the model that best fits their needs and compare performance and results across models. We highly recommend reading this detailed blog post, which explains causal inference in depth.

Features


  • Support for Advanced Models
    Leverage state-of-the-art statistical models for causal impact analysis, including:

    • TensorFlow Probability: Bayesian Structural Time Series (BSTS) models with support for trend, seasonality, and regression components.
    • Prophet: Time series forecasting with robust handling of seasonality, missing data, and external regressors.
    • Pyro: Bayesian regression using Variational Inference (VI) or Hamiltonian Monte Carlo (HMC) for probabilistic modeling and uncertainty quantification.
  • Adapter-Based Modular Design
    Easily extend the library by integrating custom models. The adapter-based architecture allows seamless addition of new frameworks.

  • Highly Configurable
    Fine-tune model parameters, specify covariates, and select fitting methods (e.g., HMC, VI) to tailor analyses to specific needs.

  • Rigorous Evaluation
    Includes tools for pre- and post-intervention analysis, model performance assessment, and confidence interval computation for causal inference.

  • Powerful Visualization
    Generate insightful visualizations, including forecasts, confidence intervals, and intervention effects, to better interpret and communicate results.
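
As an illustration of the adapter-based design mentioned above, the following sketch shows how a common interface lets different modeling backends sit behind one entry point. All names here (`ModelAdapter`, `MeanAdapter`, `ADAPTERS`, `build_adapter`, `fit`, `predict`) are hypothetical and are not taken from CImpact's source code; consult the `src/` directory for the actual classes.

```python
from abc import ABC, abstractmethod
from statistics import mean

class ModelAdapter(ABC):
    """Hypothetical common interface that every modeling backend implements."""

    @abstractmethod
    def fit(self, pre_values):
        """Learn from the pre-intervention observations."""

    @abstractmethod
    def predict(self, n_steps):
        """Return n_steps counterfactual predictions for the post period."""

class MeanAdapter(ModelAdapter):
    """Toy backend: predicts the pre-period mean for every future step."""

    def fit(self, pre_values):
        self._level = mean(pre_values)
        return self

    def predict(self, n_steps):
        return [self._level] * n_steps

# Hypothetical registry mapping a model-type string to an adapter class.
ADAPTERS = {"mean": MeanAdapter}

def build_adapter(model_type):
    """Look up and instantiate the adapter registered under model_type."""
    try:
        return ADAPTERS[model_type]()
    except KeyError:
        raise ValueError(f"Unknown model_type: {model_type!r}") from None

adapter = build_adapter("mean").fit([10.0, 12.0, 11.0])
print(adapter.predict(3))
```

With this pattern, adding a new framework amounts to writing one more subclass and registering it, which leaves the calling code unchanged.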

Use Cases & Examples


CImpact is versatile and can be applied to various domains to measure the causal impact of interventions. Here are some examples:

  • Marketing Campaigns: Assess the impact of a marketing campaign on sales over time using time series data.
  • Healthcare: Evaluate the effect of a new drug or treatment on patient outcomes over a period.
  • Economic Policy: Measure the impact of a new economic policy or regulatory change on key economic indicators.

Explore the examples/ directory in this repository for further use case examples and code templates.

Code Structure


CImpact/
├── .github/                      # GitHub configuration files for workflows and actions
├── assets/                       # Stores media assets, such as the project logo, used in the README or documentation
├── examples/                     # Example scripts showcasing usage of the library and sample data for testing
├── scripts/                      # Utility scripts for code cleaning, formatting, and other maintenance tasks
├── src/                          # Core library source code, including main modules and adapters for different models
├── tests/                        # Test cases for ensuring code functionality and correctness across modules
├── .coveragerc                   # Configuration file for coverage reporting, specifying which files to include/exclude
├── .gitignore                    # Specifies files and directories for Git to ignore
├── .pylintrc                     # Configuration for Python linter (Pylint) to enforce code style and quality standards
├── CONTRIBUTING.md               # Guidelines for contributing to the project
├── LICENSE.txt                   # License information for the project, detailing usage rights and limitations
├── Makefile                      # Commands for building, testing, and packaging the project in a standard way
├── README.md                     # Project introduction, usage instructions, and documentation (this file)
├── __init__.py                   # Marks the directory as a Python package
├── pyproject.toml                # Python packaging configuration file for managing dependencies and metadata
├── requirements.txt              # List of Python dependencies required to run the project

Installation


CImpact can be installed using one of the following methods:

1. Stable Release (Coming Soon)

The stable release of CImpact will soon be available on PyPI. Once published, you can install it with:

pip install cimpact

Stay tuned for updates on the stable release!

2. Latest Release (Manual Installation)

To access the latest features or contribute to development, you can manually install CImpact by building it from source. Follow the steps below:

Step 1: Clone the Repository

Clone the CImpact repository to your local machine:

git clone https://github.com/Sanofi-Public/CImpact.git
cd CImpact

Step 2: Install Dependencies

Install the required dependencies listed in the requirements.txt file:

pip install -r requirements.txt

Step 3: Build the Wheel File

Build the library into a Python Wheel file:

python -m build

The generated .whl file will be located in the dist/ directory.

Step 4: Install the Wheel File

Use pip to install the wheel file:

pip install dist/cimpact-<version>.whl

Replace <version> with the version number of the generated .whl file. This installs the cimpact library in your environment. You can then use it as described in the following section.
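
A quick way to confirm that the installation succeeded is to query the installed distribution's metadata. The sketch below uses only the standard library. The distribution name `cimpact` is an assumption based on the wheel file name shown above, and the helper `installed_version` is a hypothetical name.

```python
from importlib.metadata import PackageNotFoundError, version

def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None

# "cimpact" is assumed from the wheel file name used in Step 4.
found = installed_version("cimpact")
print(f"cimpact version: {found}" if found else "cimpact is not installed")
```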

Getting Started


Example Usage

TensorFlow model

import pandas as pd
from cimpact import CausalImpactAnalysis

# Load your data
data = pd.read_csv('https://raw.githubusercontent.com/Sanofi-Public/CImpact/master/examples/google_data.csv')

# Define the configuration for the model
model_config = {
    'model_type': 'tensorflow',  # Options: 'tensorflow', 'prophet', 'pyro'
    'model_args': {
        'standardize': True,
        'learning_rate': 0.01,
        'num_variational_steps': 1000,
        'fit_method': 'vi'
    }
}

# Define the pre and post-intervention periods
pre_period = ['2020-01-01', '2020-03-13']
post_period = ['2020-03-14', '2020-03-31']

# Define the index column and the target column
index_col = 'date'
target_col = 'y'

# Define color variables (optional arguments)
observed_color = "#000000"         # Black for observed
predicted_color = "#7A00E6"        # Sanofi purple for predicted
ci_color = "#D9B3FF66"             # Light lavender with transparency for CI
intervention_color = "#444444"     # Dark gray for intervention
figsize = (10,7)

# Run the analysis
analysis = CausalImpactAnalysis(
    data, pre_period, post_period, model_config, index_col, target_col,
    observed_color, predicted_color, ci_color, intervention_color,
)
result = analysis.run_analysis()
print(result)

Outcome

[Figure: result visualization for the TensorFlow model]

Posterior inference {CIMpact}

                          Average              Cumulative
Actual                    145                  2,614
Prediction (s.d.)         180 (10)             3,237 (10)
95% CI                    [144, 218]           [2,880, 3,594]

Absolute effect (s.d.)    -35 (15)             -623 (15)
95% CI                    [-61, -11]           [-980, -266]

Relative effect (s.d.)    -19.08% (7.58%)      -19.08% (7.58%)
95% CI                    [-32.42%, -6.66%]    [-32.42%, -6.66%]

Posterior tail-area probability p: 0.15842
Posterior probability of a causal effect: 84.16%
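
The figures in this summary are related by simple arithmetic, which the sketch below reproduces from the rounded values printed above. It assumes the conventions common to causal impact reports: absolute effect is actual minus prediction, relative effect is the absolute effect divided by the prediction, and the probability of a causal effect is one minus the tail-area probability. The ratio of rounded point estimates will not exactly match the reported relative effect of -19.08%, presumably because the library works from unrounded values or posterior draws rather than from the rounded table entries.

```python
# Values copied from the "Average" and "Cumulative" columns of the summary.
actual_avg, predicted_avg = 145, 180
actual_cum, predicted_cum = 2614, 3237
p_value = 0.15842

abs_effect_avg = actual_avg - predicted_avg      # observed minus counterfactual
abs_effect_cum = actual_cum - predicted_cum
rel_effect_cum = abs_effect_cum / predicted_cum  # ratio of rounded point estimates
prob_causal = 1 - p_value

print(f"Average absolute effect:    {abs_effect_avg}")
print(f"Cumulative absolute effect: {abs_effect_cum}")
print(f"Relative effect (approx.):  {rel_effect_cum:.2%}")
print(f"Probability of an effect:   {prob_causal:.2%}")
```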

Note

Please refer to examples/how-to-use.md for detailed model configuration instructions and additional usage examples of the library.
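
Before running an analysis, it can help to sanity-check the two periods. The sketch below is not part of CImpact; `check_periods` is a hypothetical helper. It assumes each period is a `[start, end]` list of ISO date strings, as in the example above, that both endpoints are inclusive, and that the data is daily.

```python
from datetime import date

def check_periods(pre_period, post_period):
    """Validate two [start, end] lists of ISO dates; return their lengths in days."""
    pre_start, pre_end = (date.fromisoformat(d) for d in pre_period)
    post_start, post_end = (date.fromisoformat(d) for d in post_period)
    if pre_start > pre_end or post_start > post_end:
        raise ValueError("Each period must start on or before its end date.")
    if post_start <= pre_end:
        raise ValueError("The post-period must begin after the pre-period ends.")
    # Both endpoints are treated as inclusive (an assumption), hence the +1.
    return (pre_end - pre_start).days + 1, (post_end - post_start).days + 1

pre_days, post_days = check_periods(
    ["2020-01-01", "2020-03-13"], ["2020-03-14", "2020-03-31"]
)
print(f"Pre-period: {pre_days} days, post-period: {post_days} days")
```

A pre-period that is much longer than the post-period, as here, gives the model more history from which to learn the counterfactual.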

Evaluation Methods


CImpact offers comprehensive tools to evaluate model performance and quantify the causal impact of interventions:

  • Summary Statistics
    Obtain detailed point estimates, confidence intervals, and probabilistic measures of the intervention's impact.

  • Impact Visualization
    Generate intuitive plots that display:

    • Observed data versus counterfactual predictions.
    • Estimated impact over time, including uncertainty intervals.
  • Model Diagnostics
    Conduct residual analysis and access diagnostic metrics to evaluate model fit and robustness.
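
The summary statistics described above can be illustrated with a small, library-independent sketch. Given posterior draws of an effect, it computes a point estimate, an equal-tailed 95% interval, and a tail-area probability. The helper name `summarize_effect` and the synthetic draws are assumptions, and CImpact's own calculations may differ in detail.

```python
import random
from statistics import mean, quantiles

def summarize_effect(draws, alpha=0.05):
    """Point estimate, equal-tailed interval, and tail-area probability."""
    # quantiles(n=1000) returns 999 cut points spaced 0.1% apart.
    cuts = quantiles(draws, n=1000)
    lower = cuts[round(alpha / 2 * 1000) - 1]
    upper = cuts[round((1 - alpha / 2) * 1000) - 1]
    estimate = mean(draws)
    # Share of draws on the opposite side of zero from the point estimate.
    if estimate < 0:
        tail = sum(d >= 0 for d in draws) / len(draws)
    else:
        tail = sum(d <= 0 for d in draws) / len(draws)
    return {"mean": estimate, "ci": (lower, upper), "p": tail}

# Synthetic posterior draws of an effect (hypothetical, centered on -35).
rng = random.Random(0)
draws = [rng.gauss(-35, 15) for _ in range(5000)]

summary = summarize_effect(draws)
print(summary)
```

A small tail-area probability means that few posterior draws fall on the other side of zero, which is what supports the claim that the intervention had an effect.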

Performance Comparison


CImpact supports a variety of models, each with unique strengths:

  • TensorFlow
    Delivers robust performance with flexibility for advanced inference techniques, such as Variational Inference (VI) and Hamiltonian Monte Carlo (HMC).

  • Prophet
    Offers a user-friendly experience with built-in support for seasonality and holiday effects. While effective for many use cases, it may exhibit slower performance on larger datasets.

  • Pyro
    Excels in Bayesian inference, enabling powerful probabilistic modeling. However, its computational demands can be higher compared to other models.
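
To compare these backends on the same data, one option is to prepare one configuration per model type up front. The sketch below builds dictionaries in the shape used in the Getting Started example. `make_model_config` is a hypothetical helper, not a CImpact function. The TensorFlow arguments are copied from that example, while the empty `model_args` for Prophet and Pyro are placeholders; the arguments each backend accepts are described in examples/how-to-use.md.

```python
# Model types listed as options in the Getting Started example.
VALID_MODEL_TYPES = ("tensorflow", "prophet", "pyro")

def make_model_config(model_type, **model_args):
    """Build a config dict in the shape used by the Getting Started example."""
    if model_type not in VALID_MODEL_TYPES:
        raise ValueError(
            f"model_type must be one of {VALID_MODEL_TYPES}, got {model_type!r}"
        )
    return {"model_type": model_type, "model_args": dict(model_args)}

# TensorFlow arguments are copied from the example; the empty model_args
# for the other two backends are placeholders (an assumption).
configs = [
    make_model_config(
        "tensorflow",
        standardize=True,
        learning_rate=0.01,
        num_variational_steps=1000,
        fit_method="vi",
    ),
    make_model_config("prophet"),
    make_model_config("pyro"),
]

for config in configs:
    print(config["model_type"], config["model_args"])
```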

Model Comparison & Best Practices


Each model in CImpact has unique advantages, and selecting the right model can significantly impact your results. Here are some recommendations for selecting a model based on your use case:

  • Prophet: Best for time series data with clear seasonal patterns and holiday effects, and when interpretability and ease of use are priorities. However, it might struggle with very large datasets or complex causal relationships.

  • TensorFlow (Bayesian Structural Time Series): Ideal for users who need advanced Bayesian modeling and have computational resources for methods like HMC and VI. It works well for more complex time series data with multiple covariates and non-linearities.

  • Pyro: Choose this model if you need a fully probabilistic approach to causal inference. It suits users who need flexibility with custom priors and Bayesian inference and who are comfortable with Pyro's steeper learning curve.

Make sure to benchmark each model using your data before committing to one, and consider running a few trials with different models to compare their performance.
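
One simple way to benchmark, sketched below, is to hold out the tail of the pre-intervention period, forecast it with each candidate, and compare error and run time. Because no intervention has happened yet in that window, a good model should track it closely. The two forecasters here are trivial stand-ins; in practice each callable would wrap a CImpact run with a different `model_type`. All function names and the toy series are assumptions.

```python
import time
from statistics import mean

def mape(actual, predicted):
    """Mean absolute percentage error (assumes no zero values in actual)."""
    return mean(abs((a - p) / a) for a, p in zip(actual, predicted))

def naive_mean(train, n_steps):
    """Stand-in model: repeat the training mean."""
    return [mean(train)] * n_steps

def naive_last(train, n_steps):
    """Stand-in model: repeat the last observed value."""
    return [train[-1]] * n_steps

def benchmark(models, pre_values, holdout):
    """Fit each model on the early pre-period and score it on the held-out tail."""
    train, test = pre_values[:-holdout], pre_values[-holdout:]
    results = {}
    for name, forecast in models.items():
        start = time.perf_counter()
        predicted = forecast(train, len(test))
        elapsed = time.perf_counter() - start
        results[name] = {"mape": mape(test, predicted), "seconds": elapsed}
    return results

# Hypothetical pre-intervention series with a gentle upward trend.
pre_values = [100 + 2 * t for t in range(30)]
scores = benchmark({"mean": naive_mean, "last": naive_last}, pre_values, holdout=5)
best = min(scores, key=lambda name: scores[name]["mape"])
print(scores)
print("Lowest holdout MAPE:", best)
```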

Future Plans


We welcome contributions to enhance and refine the library. While we are particularly interested in contributions in the following areas, we are open to other suggestions as well. If you have any ideas, please create an issue to discuss potential contributions.

  • New Models: Add new, qualified models to broaden analytical options. We are currently exploring zero-shot forecasting models such as Google TimesFM and Amazon Chronos.
  • Enhanced Visualization: Develop advanced plotting functions for deeper insights and a better understanding of results.
  • Tutorials: Publish detailed tutorials to help users make effective use of the library.

Contributing


Contributions are welcome! Please see our Contributing Guidelines for details on how to participate.

Contributors


We would like to acknowledge the following individuals for their contributions to the development of this open-source library:

  • Amin Kamaleddin
  • Diplumar Patel
  • Charles Girard
  • Nitesh Soni

License


This work is available for academic research and non-commercial use only. See the LICENSE.txt file for details. CImpact is licensed under the MIT License. Feel free to use, modify, and distribute it with attribution.

Acknowledgements


We are grateful to the Google Research team for publishing the research paper "Inferring causal impact using Bayesian structural time-series models" (see footnote 1) and for sharing the original R package with the open-source community. We also extend our gratitude to the authors of tfcausalimpact for their foundational work, which inspired this library.


Footnotes

  1. Brodersen, K. H., Gallusser, F., Koehler, J., Remy, N., & Scott, S. L. (2015). Inferring causal impact using Bayesian structural time-series models. The Annals of Applied Statistics, 9(1), 247-274.
