
RobustiPy


Welcome to the home of RobustiPy, a Python library for the creation of a more robust and stable model space. RobustiPy does a large number of things, including but not limited to: high-dimensional visualisation, Bayesian Model Averaging, bootstrapped resampling, in- and out-of-sample model evaluation, model selection via information criteria, explainable AI (via SHAP), and joint inference tests (as per Simonsohn et al. 2019).

Full documentation is available on Read the Docs. The first release is indexed onto Zenodo here.

RobustiPy performs Multiversal/Specification Curve Analysis, which attempts to compute most or all reasonable specifications of a statistical model, where a specification is a single attempt to create an estimand of interest, whether through a particular choice of covariates, hyperparameters, data-cleaning decisions, and so forth.

More formally, let's assume we have a general model of the form:

$$ \hat{y} = \hat{f}(x, \textbf{z}) + \epsilon . $$

We are essentially attempting to model one or more dependent variables ($y$) using some function $f()$, some predictor(s) $x$, some covariates $z$, and random error $\epsilon$. Different choices for each of these elements produce different estimates of the coefficient of interest. Let's assume $y$, $x$ and $z$ are imperfect latent variables, or collections of latent variables. Researchers can come up with many reasonable operationalisations of $y$, $x$ and $z$, but most usually run the analysis with only one or a small number of combinations of them. Ideally -- in an age of vast computational resources -- we should take all such reasonable operationalisations and store them in sets:

$$Y = \{y_{1}, y_{2}, \dots, y_{n}\}$$ $$X = \{x_{1}, x_{2}, \dots, x_{n}\}$$ $$Z = \{z_{1}, z_{2}, \dots, z_{n}\}$$

RobustiPy then constructs:

$$\Pi = \left\{ \overline{S_i} \mid S_i \in \mathcal{P}(Y) \text{ and } S_i \neq \emptyset \right\} \times X \times \mathcal{P}(Z)$$

In words, it creates a set containing the arithmetic mean of the elements of each non-empty member of the powerset $\mathcal{P}(Y)$ (all possible combinations of any length), and takes the Cartesian product of that set with $X$ and with the powerset of $Z$, creating the full set of possible model specifications $\Pi$. RobustiPy then takes these specifications, fits them against observable (tabular) data, and produces coefficients and relevant metrics for each version of the predictor $x$ in the set $X$.
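The construction of $\Pi$ can be sketched in a few lines of standard-library Python. This is an illustrative sketch only, not RobustiPy's internal implementation, and the variable names are purely hypothetical:

```python
from itertools import chain, combinations, product

def powerset(s):
    """All subsets of s, including the empty set."""
    return [set(c) for c in chain.from_iterable(
        combinations(s, r) for r in range(len(s) + 1))]

# Hypothetical operationalisations of the latent constructs
Y = ["y1", "y2"]          # candidate dependent-variable measures
X = ["x1", "x2"]          # candidate predictors of interest
Z = ["z1", "z2", "z3"]    # candidate control variables

# Non-empty subsets of Y (each later averaged into a single outcome),
# crossed with each predictor and with every subset of Z
y_subsets = [s for s in powerset(Y) if s]
specs = list(product(y_subsets, X, powerset(Z)))

print(len(specs))  # (2**2 - 1) * 2 * 2**3 = 48 specifications
```

Note how quickly the space grows: each additional candidate control doubles the number of specifications to fit.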

A paper which more fully describes RobustiPy and all of its examples can be found here.

Installation

Installing RobustiPy is simple. To get our most stable current release, simply do:

pip install robustipy

If you want the latest features and releases, clone the repository directly from GitHub:

git clone https://github.com/RobustiPy/robustipy.git
cd robustipy
pip install .

Usage

In a Python script (or Jupyter Notebook), import the OLSRobust class by running:

from robustipy.models import OLSRobust
model_robust = OLSRobust(y=y, x=x, data=data)
model_robust.fit(
    controls=c,    # a list of control variables
    draws=1000,    # number of bootstrap resamples
    kfold=10,      # number of folds for OOS evaluation
    seed=192735    # an optional but randomly chosen seed for reproducibility
)
model_results = model_robust.get_results()

Where y is a list of (string) variable names used to create your dependent variable, x is your independent (string) variable name of interest (which can be a list of len>1), and c is a list of control (string) variable names. If you don't fully specify everything that RobustiPy needs, it will interactively prompt you for the missing inputs (currently the number of CPUs to use, the seed or "random state", the number of draws, and the number of folds).
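Following directly from the definition of $\Pi$ above, the number of models fitted is $(2^{|Y|} - 1) \cdot |X| \cdot 2^{|Z|}$, and each model is additionally resampled `draws` times and cross-validated over `kfold` folds. A quick back-of-the-envelope check (this helper function is illustrative, not part of the RobustiPy API):

```python
def n_specifications(n_y, n_x, n_z):
    """Size of the specification space Pi: non-empty subsets of Y,
    crossed with each predictor in X and the full powerset of Z."""
    return (2**n_y - 1) * n_x * 2**n_z

# One composite outcome from two measures, one predictor, ten candidate controls
print(n_specifications(2, 1, 10))   # 3 * 1 * 1024 = 3072

# Twenty candidate controls: over three million specifications
print(n_specifications(2, 1, 20))   # 3 * 1 * 1048576 = 3145728
```

This is worth keeping in mind when choosing the size of the control set and the number of bootstrap draws, since total computation scales as their product.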

Examples

There are ten empirical example notebooks here which replicate high-profile research and teaching examples, and five relatively straightforward simulated example scripts here. Below is the output of a results.plot() function call made on the canonical union dataset. Note: results.summary() also prints out a large number of helpful statistics about your models, and the results object more broadly stores all results for downstream analysis (as done in the examples which replicate Mankiw et al. 1992 and the infamously retracted Gino et al. 2020 in the ./empirical_examples subdirectory).

Union dataset example

Citing

To cite RobustiPy, please use this reference:

@misc{ibarra2025introducingrobustipyefficientgeneration,
      title={Introducing RobustiPy: An efficient next generation multiversal library with model selection, averaging, resampling, and explainable artificial intelligence}, 
      author={Daniel Valdenegro Ibarra and Jiani Yan and Duiyi Dai and Charles Rahal},
      year={2025},
      eprint={2506.19958},
      archivePrefix={arXiv},
      primaryClass={stat.ME},
      url={https://arxiv.org/abs/2506.19958}, 
}

Website

We have a website made with jekyll-theme-minimal that you can find here. It also contains all recent news and updates, and information on a Hackathon we ran in 2024!

Contributing and Code of Conduct

Please see our guide for contributors as well as our code of conduct. If you would like to become a formal project maintainer, please contact the team to discuss!

License

This work is free. You can redistribute it and/or modify it under the terms of the GNU GPL 3.0 license. The datasets which are pulled in as part of the ./empirical_examples are (with one reservation) all publicly available, and come with their own licenses which must be respected accordingly.

Acknowledgements

We are grateful to the extensive comments made by various academic communities over the course of our thinking about this work, not least the members of the ESRC Centre for Care and the Leverhulme Centre for Demographic Science.

