We have identified the following primary categories of statistical software. The current intention of this categorisation is twofold:
- To guide decisions of which categories (or sub-components) may or may not be in scope; and
- To relate these categories to corresponding standards.
These categories ought not be considered in any way mutually exclusive, and it is very likely that any individual piece of software will be described by multiple categories. Decisions of whether a particular piece of software is described by any particular category will also generally involve some degree of ambiguity. These categories have been identified through empirical analyses of both software and conference presentations from the following sources:
- All software packages published as articles in both the Journal of Statistical Software and the Journal of Open Source Software.
- All software presented at Joint Statistical Meetings (JSMs) 2018 and 2019, or Symposia on Data Science and Statistics (SDSS) 2018, 2019, and 2020.
- All conference sessions and associated abstracts from JSM 2018 and 2019, and SDSS 2018, 2019, and 2020.
- All CRAN Task Views, and perusal of R packages mentioned therein.
Several descriptions and graphical representations of these raw data are included in the main GitHub repository containing this document. The following categorical descriptions are based primarily on examples which serve to illustrate the kinds of ambiguities and difficulties likely to arise in establishing and delineating the respective categories, and accordingly to guide the construction of standards corresponding to each category.
Explicit standards are currently considered in a separate document. It is envisioned that an author-provided categorisation will guide the selection of appropriate standards which can or should be applied to a given piece of software. The categorisation itself will likely occur via some kind of checklist, with the possibility of checking multiple potential categories. Each of the categories described immediately below includes its own checklist intended to guide the mapping of categories to the explicit standards considered in the separate document. In all cases, it is likely that authors will also be asked to add, within any chosen category, additional statements on the unique abilities of the software.
This main category encompasses all software which implements statistical methods and algorithms. See the “Raw Data” section below for details of the myriad potential sub-categories of statistical methods and algorithms. There are a number of sub-categories which some may consider effectively independent, or otherwise beyond the general scope of “Methods and Algorithms”, yet which we consider under this single category because of a perceived inability to provide sufficient categorical distinction. These include:
- Network software, either for representing, processing, visualising, or analysing networks. Ambiguous examples include tcherry (with accompanying JOSS paper); grapherator (with accompanying JOSS paper), which is effectively a distribution generator for data represented in a particular format; and three JSM presentations, one on network-based clustering of high-dimensional data, one on community structure in dynamic networks, and one on Gaussian graphical models.
- Software for analysing categorical or qualitative data, which cannot be unambiguously distinguished because many methods for dimensionality reduction, and particularly clustering methods, effectively transform data to categorical forms for subsequent (post-)processing.
- Spatial software, because spatial statistics are often analogous to non-spatial statistics, yet merely differ by being bound to two orthogonal dimensions (arbitrary numbers of additional dimensions notwithstanding).
We have also prepared an interactive network diagram with nodes representing statistical terms scaled by approximate frequencies of occurrence within JOSS papers, and edges between each pair of nodes scaled according to numbers of JOSS papers which encompass those two terms or concepts. This diagram immediately illustrates the entangled nature of categorical definitions within contemporary statistical software, and provides a strong argument against attempts to distinguish sub-categories.
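For concreteness, a co-occurrence network of this kind might be assembled along the following lines. This is a minimal sketch only: the term lists shown are invented for illustration, and the actual interactive diagram was not necessarily constructed this way.

```r
# Minimal sketch: a term co-occurrence network with node size proportional to
# term frequency and edge width proportional to the number of papers
# containing both terms. The `paper_terms` data are invented for illustration.
library(igraph)

paper_terms <- list(
  c("clustering", "networks", "visualisation"),
  c("clustering", "bayesian"),
  c("networks", "bayesian", "workflow"),
  c("workflow", "machine learning")
)

terms <- sort(unique(unlist(paper_terms)))
incidence <- sapply(paper_terms, function(p) as.integer(terms %in% p))
rownames(incidence) <- terms

# Off-diagonal entries count papers containing both terms; the diagonal
# holds per-term frequencies.
cooc <- incidence %*% t(incidence)

g <- graph_from_adjacency_matrix(cooc, mode = "undirected",
                                 weighted = TRUE, diag = FALSE)
plot(g, vertex.size = 10 * diag(cooc), edge.width = E(g)$weight)
```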
We nevertheless need to somehow map Method and Algorithm software onto corresponding standards, with the following checklist intended to exemplify the kinds of decisions which will likely need to be made, many in regard to one or more “reference implementations” which software authors may be requested to specify:
- Efficiency: Is the software more efficient (faster, simpler, other interpretations of “efficient”) than reference implementations?
- Reproducibility or Reliability: Does the software reproduce sufficiently similar results more frequently than reference implementations (or otherwise satisfy similar interpretations of reproducibility)?
- Accuracy or Precision: Is the software demonstrably more accurate or precise than reference implementations?
- Simplicity of Use: Is the software simpler to use than reference implementations?
- Algorithmic Characteristics: Does the algorithmic implementation offer characteristics (such as greater simplicity or sensitivity) superior to reference implementations? If so, which?
- Convergence: Does the software provide faster or otherwise better convergence properties than reference implementations?
- Method Validity: Does the software overcome demonstrable flaws in previous (reference) implementations? If so, how?
- Method Applicability: Does the software enable a statistical method to be applied to a domain in which such application was not previously possible?
- Automation: Does the software automate aspects of statistical analyses which previously (in a reference implementation) required manual intervention?
- Input Data: Does the software “open up” a method to input data previously unable to be treated by a particular algorithm or method?
- Output Data: Does the software provide output in forms previously unavailable by reference implementations?
- Reference Standards: Are there any reference standards, such as the US National Institute of Standards and Technology’s collection of reference data sets against which the software may be compared? If so, which?
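As an illustration of how the “Efficiency” and “Accuracy or Precision” items might be assessed in practice, the following sketch compares a hypothetical candidate implementation against a reference implementation; both function names are invented for this example.

```r
# Hypothetical stand-ins: `ref_var()` plays the role of a reference
# implementation (here simply stats::var), and `new_var()` a candidate
# implementation submitted for review.
ref_var <- function(x) stats::var(x)
new_var <- function(x) {           # one-pass, numerically naive variant
  n <- length(x)
  (sum(x^2) - sum(x)^2 / n) / (n - 1)
}

set.seed(1)
x <- rnorm(1e6)

# Accuracy: agreement with the reference implementation
all.equal(ref_var(x), new_var(x))

# Efficiency: elapsed-time comparison (packages such as bench or
# microbenchmark give more reliable timings; base R suffices for a sketch)
system.time(for (i in 1:50) ref_var(x))
system.time(for (i in 1:50) new_var(x))
```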
Note that software which is described by any of the following categories may also directly implement statistical methods or algorithms. Any components of any software which do so can potentially be assessed against this checklist, and it may accordingly be useful to have a simple “meta” checkbox for all software:
- Does this software directly implement statistical methods or algorithms?
Those components which do so would then be assessed in more detail with regard to a detailed checklist like the above.
This category encompasses software which is more aimed at supporting common statistical workflows than direct analysis. The primary development effort for software in this category is presumed not to be the implementation of particular statistical methods or algorithms, rather the algorithmic support of general statistical workflows. Whereas software in the preceding category may ultimately yield one or more specific models or statistical values, workflow software generally provides more than one of the following:
- Classes (whether explicit or not) for representing or processing input and output data;
- Generic interfaces to multiple statistical methods or algorithms;
- Homogeneous reporting of the results of a variety of methods or algorithms; and
- Methods to synthesise, visualise, or otherwise collectively report on analytic results.
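A minimal sketch of the second and third of these aspects, a generic interface with homogeneous reporting, might look like the following. The function `fit_model()` and the class name are purely hypothetical, and real workflow packages such as mlr3 are far more general.

```r
# Hypothetical unified interface: one fitting function dispatching to several
# underlying methods, returning a common class with homogeneous reporting.
fit_model <- function(formula, data, method = c("lm", "glm")) {
  method <- match.arg(method)
  fit <- switch(method,
    lm  = stats::lm(formula, data = data),
    glm = stats::glm(formula, data = data, family = stats::gaussian())
  )
  observed <- stats::model.response(stats::model.frame(formula, data))
  structure(list(method = method, fit = fit, observed = observed),
            class = "unified_fit")
}

# Identical reporting regardless of the underlying method
print.unified_fit <- function(x, ...) {
  rmse <- sqrt(mean((x$observed - stats::predict(x$fit))^2))
  cat("method:", x$method, " RMSE:", round(rmse, 3), "\n")
  invisible(x)
}

fit_model(mpg ~ wt + hp, data = mtcars, method = "glm")
```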
Methods and Algorithms software may only provide a specific interface to a specific method or algorithm, although it may also be more general and offer several of the above “workflow” aspects, and so ambiguity may often arise between these two categories. We note in particular that the “workflow” node in the interactive network diagram mentioned above is very strongly connected to the “machine learning” node, generally reflecting software which attempts to unify varied interfaces to varied platforms for machine learning.
Among the numerous examples of software in this category are:
- The mlr3 package (with accompanying JOSS paper), which provides “A modern object-oriented machine learning framework in R.”
- The fmcmc package (with accompanying JOSS paper), which provides a unified framework and workflow for Markov chain Monte Carlo analyses.
- The bayestestR package (with accompanying JOSS paper) for “describing effects and their uncertainty, existence and significance within the Bayesian framework.” While this package includes its own algorithmic implementations, it is primarily intended to aid general Bayesian workflows through a unified interface.
Workflows are also commonly required and developed for specific areas of application, as exemplified by the tabular package (with accompanying JOSS article) for “Analysis, Seriation, and Visualisation of Archaeological Count Data”.
Relevant standards for workflow software may be guided by consideration of items such as those in the following checklist, the first four of which in this case largely reflect the four aspects listed above which workflow software may generally provide:
- Unified Data: Does the software unify previously disparate forms of data in order to enable a consistent workflow?
- Unified Interface: Does the software provide a unified interface to several methods which previously had to be accessed using distinct packages and/or distinct modes of interface?
- Unified Results: Does the software synthesise results which were previously only accessible in individual form?
- Unified Workflow: Does the software enable a coherent workflow which was not previously possible via any single package?
- Educational Workflow: Does the software primarily serve an educational purpose, and do so through enabling a structured workflow that aids understanding of statistical or analytic processes?
Many packages aim to simplify and facilitate the reporting of complex statistical results. Such reporting commonly involves visualisation, and there is direct overlap between this and the Visualisation category. Examples of this category include one package rejected by rOpenSci as out-of-scope, gtsummary, which provides “Presentation-ready data summary and analytic result tables.” Other examples include:
- The smartEDA package (with accompanying JOSS paper) “for automated exploratory data analysis”. The package “automatically selects the variables and performs the related descriptive statistics. Moreover, it also analyzes the information value, the weight of evidence, custom tables, summary statistics, and performs graphical techniques for both numeric and categorical variables.” This package is potentially as much a workflow package as it is a statistical reporting package, and illustrates the ambiguity between these two categories.
- The modeLLtest package (with accompanying JOSS paper) is “An R Package for Unbiased Model Comparison using Cross Validation.” Its main functionality allows different statistical models to be compared, likely implying that this represents a kind of meta package.
- The insight package (with accompanying JOSS paper) provides “a unified interface to access information from model objects in R,” with a strong focus on unified and consistent reporting of statistical results.
- The arviz software for Python (with accompanying JOSS paper) provides “a unified library for exploratory analysis of Bayesian models in Python.”
- The iRF package (with accompanying JOSS paper) enables “extracting interactions from random forests”, yet also focusses primarily on enabling interpretation of random forests through reporting on interaction terms.
In addition to potential overlap with the Visualisation category, potential standards for Statistical Reporting and Meta-Software are likely to overlap to some degree with the preceding standards for Workflow Software. Checklist items unique to statistical reporting software might include the following:
- Automation: Does the software automate aspects of statistical reporting, or of analysis at some sufficiently “meta” level (such as variable or model selection), which previously (in a reference implementation) required manual intervention?
- General Reporting: Does the software report on, or otherwise provide insight into, statistics or important aspects of data or analytic processes which were previously not (directly) accessible using reference implementations?
- Comparison: Does the software provide or enable standardised comparison of inputs, processes, models, or outputs which could previously (in reference implementations) only be accessed or compared in some comparably unstandardised form?
- Interpretation: Does the software facilitate interpretation of otherwise abstruse processes or statistical results?
- Exploration: Does the software enable or otherwise guide exploratory stages of a statistical workflow?
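To illustrate the kind of homogeneous reporting and comparison referred to in the “General Reporting” and “Comparison” items, the following sketch uses the broom package (one of several packages offering such functionality; the packages listed above each provide their own interfaces).

```r
# Heterogeneous model objects reported in a single, common tabular form.
library(broom)

fits <- list(
  linear   = lm(mpg ~ wt + hp, data = mtcars),
  logistic = glm(am ~ wt + hp, data = mtcars, family = binomial())
)

# One row per coefficient, identical columns for both models
do.call(rbind, lapply(names(fits), function(nm) {
  cbind(model = nm, tidy(fits[[nm]]))
}))

# One-row-per-model summaries; columns differ by model class, so keep as a list
lapply(fits, glance)
```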
Software for the purposes of education may be considered in scope. A prominent example of this category is the LearnBayes package, which provides functions for learning Bayesian inference, and includes many of its own implementations. This category may be of particular interest or relevance because of a potentially direct connection with the Journal of Open Source Education (JOSE), which has a peer review system (almost) identical to that of JOSS. Educational statistical software reviewed by rOpenSci could thus potentially be fast-tracked through JOSE reviews just as current (non-statistical) submissions have the opportunity to be fast-tracked through the JOSS review process. Many examples of educational statistical software are listed on the CRAN Task View: Teaching Statistics. This page also clearly indicates the likely strong overlap between education and visualisation software. With specific regard to the educational components of software, the following checklist items may be relevant.
- Demand: Does the software meet a clear demand otherwise absent from educational material? If so, how?
- Audience: What is the intended audience or user base? (For example, is the software intended for direct use by students of statistics, or does it provide a tool for educational professionals to use in their own practice?)
- Algorithms: What are the unique algorithmic processes implemented by the software? In what ways are they easier, simpler, faster, or otherwise better than reference implementations (where such exist)?
- Interactivity: Is the primary function of the software interactive? If so, is the interactivity primarily graphical (for example, web-based), text-based, or other?
While many may consider software primarily aimed at visualisation to be out of scope, there are nevertheless cases which may indeed be within scope, notably including the ggfortify package, which allows results of statistical tests to be “automatically” visualised using the ggplot2 package. The list of “fortified” functions on the package’s webpage clearly indicates the predominantly statistical scope of this software, which is in effect a package for statistical reporting, yet in visual rather than tabular form. Other examples of visualisation software include:
- The modelStudio package (with accompanying JOSS paper), which is also very much a workflow package.
- The shinyEFA package (with accompanying JOSS paper), which provides a “User-Friendly Shiny Application for Exploratory Factor Analysis.”
- The autoplotly package (with accompanying JOSS paper), which provides “Automatic Generation of Interactive Visualisations for Statistical Results”, primarily by porting the output of the authors’ above-mentioned ggfortify package to plotly.js.
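A minimal example of the ggfortify pattern described above, in which statistical results are reported in visual rather than tabular form, might be:

```r
# ggfortify registers autoplot() methods for many "fortified" model classes,
# so standard statistical outputs can be visualised directly with ggplot2.
library(ggfortify)

fit <- lm(Petal.Width ~ Petal.Length, data = iris)
autoplot(fit)                                    # regression diagnostic plots

pca <- prcomp(iris[, 1:4], scale. = TRUE)
autoplot(pca, data = iris, colour = "Species")   # PCA scatter coloured by group
```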
Many software packages also include functions to visualise output, and so this category can likely not be considered out of scope in any absolute sense: there must exist a grey zone regarding how much of a package’s code must be devoted to visualisation routines for it to be considered a “visualisation” package. Decisions of scope are likely better made with regard to whether the primary purpose of the visualisation is indeed statistical, or perhaps by insisting that visualisation software must also represent at least one of the other primary categories considered here. A checklist for visualisation software may consist of items such as the following.
- Purpose: Does the software primarily serve a visualisation purpose? If so, is that purpose primary or secondary?
- Statistical Importance: Do the visualisations provide important abilities in presenting or interpreting statistical data, models, results, or processes not (readily) provided through other software?
- Algorithms and Methods: Are the visualisations generated by internal implementations of statistical algorithms or methods?
- Education: Are the visualisation capabilities of the software (primarily) intended to serve an educational purpose?
- Wrapper: Is the package a wrapper around visualisation software not previously (directly) accessible for the statistical analyses enabled by the package?
Wrapper packages provide an interface to previously-written software, often in a different computer language to the original implementation. While this category is reasonably unambiguous, there may be instances in which a “wrapper” additionally offers extensions beyond the original implementation, or in which only a portion of a package’s functionality may be “wrapped.” Issues to consider for wrapper packages include the extent of functionality represented by wrapped code, and the computer language being wrapped. Rather than internally bundling or wrapping software, a package may also serve as a wrapper through providing access to some external interface, such as a web server. Examples of potential wrapper packages include the following:
- The greta package (with accompanying JOSS article) “for writing statistical models and fitting them by MCMC and optimisation” provides a wrapper around Google’s TensorFlow library. It is also clearly a workflow package, aiming to provide a single, unified workflow for generic machine learning processes and analyses.
- The nse package (with accompanying JOSS paper), which offers “multiple ways to calculate numerical standard errors (NSE) of univariate (or multivariate in some cases) time series,” through providing a unified interface to several other R packages to provide more than 30 NSE estimators. This is an example of a wrapper package which does not wrap either internal code or external interfaces; rather, it effectively “wraps” the algorithms of a collection of R packages.
The following checklist items may be relevant in considering wrapper software.
- Internal or External: Does the software internally wrap or bundle previously developed routines, or does it provide a wrapper around some external service? If the latter, what kind of service (web-based, or some other form of remote access)?
- Language: For internally-bundled routines, in which computer language are the routines written? And how are they bundled? (For R packages: in ./src? In ./inst? Elsewhere?)
- Testing: Please provide references, links, or other material relating to or describing how the wrapped software has been tested, and describe how such prior tests have been integrated within the current package structure.
- Unique Advances: What unique advances does the software offer beyond those offered by the (internally or externally) wrapped software?
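As a sketch of the internal-wrapping pattern addressed by the first two items above, the following wraps a routine written in another language (C++ via Rcpp). The function names are hypothetical, and in a real package the C++ source would live in ./src/ rather than being compiled inline.

```r
library(Rcpp)

# Wrapped routine written in C++ (compiled inline here for illustration only)
cppFunction("
double weighted_mean_cpp(NumericVector x, NumericVector w) {
    double num = 0.0, den = 0.0;
    for (int i = 0; i < x.size(); ++i) {
        num += x[i] * w[i];
        den += w[i];
    }
    return num / den;
}")

# The R-level wrapper adds input checking and sensible defaults
weighted_mean_wrapped <- function(x, w = rep(1, length(x))) {
  stopifnot(length(x) == length(w), all(w >= 0))
  weighted_mean_cpp(as.numeric(x), as.numeric(w))
}

weighted_mean_wrapped(c(1, 2, 3), w = c(1, 1, 2))  # 2.25
```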
Many packages are designed to provide one or more specific statistical indices or scores from some assumed type of input data. Even though the methodology used to derive indices or scores may draw on many of the methods or algorithms considered in the first category above, and detailed below, such software may be considered a category of its own through its singular aim of providing particular indices or scores, in contrast with generic “Methods and Algorithms” software, which offers some degree of abstraction in terms of either input or output data, or both. Examples include:
- The spatialwarnings package, which provides “early-warning signal of ecosystem degradation,” where these signals and associated indices are highly domain-specific.
- The heatwaveR package, which calculates and displays marine heatwaves using specific indices established in previously-published literature.
- The hhi package, which calculates and visualizes “Herfindahl-Hirschman Index Scores,” which are measures of numeric concentration.
- The DscoreApp package, which provides an index (the “D-Score”) to quantify the results of Implicit Association Tests.
- The thurstonianIRT package (with accompanying JOSS paper) for scoring forced-choice questionnaires using “Item Response Theory”.
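As a simple illustration of this category, the textbook Herfindahl-Hirschman Index can be computed in a few lines; the function below is written for illustration only and is not taken from the hhi package itself.

```r
# Herfindahl-Hirschman Index: the sum of squared percentage shares, ranging
# from near 0 (many small, equal shares) to 10000 (a single share holds all).
hhi_index <- function(shares) {
  shares <- shares / sum(shares) * 100   # normalise to percentage shares
  sum(shares^2)
}

hhi_index(c(40, 30, 20, 10))   # 3000
hhi_index(rep(1, 100))         # 100
```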
The following checklist items may be relevant in considering software developed to calculate particular indices or scores.
- Uniqueness: Are there any implementations to calculate the indices or scores in other computing languages? If so, what are they?
- Intended Audience: Who is likely to gain through an ability to calculate such indices or scores?
- Accuracy or Precision: Is it possible to confirm the accuracy or precision of the algorithmic implementation?
- Performance: Does the software offer superior performance over equivalent ways of calculating analogous scores or indices? If so, how may such performance gains be assessed?
- Utility: How laborious would the calculation of the given indices or scores be with alternative software? And what would be the risks of doing so?
- Reproducibility and Other Potential Advantages: Is the ability of the software to calculate the given indices or scores likely to enhance the ability of others to reproduce either a general workflow, or previous results, or both?
There will likely be a host of additional categories of software developed for particular applied domains. One distinguishing feature of such software appears to be the use of custom-developed classes or equivalent representations for input (and often output) data.
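A minimal sketch of this custom-class pattern, with all names hypothetical, might look like the following:

```r
# Hypothetical domain-specific input class: a constructor with validation,
# plus a print method, of the kind commonly found in applied-domain packages.
new_survey_data <- function(responses, weights, strata) {
  stopifnot(is.numeric(responses),
            length(weights) == length(responses),
            length(strata) == length(responses))
  structure(list(responses = responses,
                 weights   = weights,
                 strata    = factor(strata)),
            class = "survey_data")
}

print.survey_data <- function(x, ...) {
  cat("<survey_data>", length(x$responses), "responses across",
      nlevels(x$strata), "strata\n")
  invisible(x)
}

new_survey_data(responses = c(3, 5, 4, 2),
                weights   = c(1, 2, 1, 1),
                strata    = c("a", "a", "b", "b"))
```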
Taken from programs for the following conferences:
- Joint Statistical Meetings 2018, 2019, or online text versions with links to abstracts here for 2018, 2019.
- Symposium on Data Science and Statistics (SDSS) 2018, 2019, 2020.
Categories which arise from analyses of JOSS abstracts yet do not emerge from these conference programs include:
- workflow
- exploratory data analysis
- summary statistics
- statistical reporting
Each of the items listed below is followed by a list of sub-categories. As for the primary categories above, these ought to be considered neither mutually exclusive nor unambiguous, and indeed many pieces of software and conference presentations explicitly seek to combine and traverse multiple categories.
- Analyses of (multivariate) extremes; anomaly detection
- See CRAN Task View: Extreme Value Analysis
- distributional properties
- reproducibility
- sensitivity to thresholds
- Examples: molic (with accompanying JOSS paper)
- Conference presentations: 1, 2, 3
- (Automated) Variable and model selection
- Bayesian modelling and statistics
- See CRAN Task View: Bayesian Inference
- sensitivity to prior distributions
- model behaviour under null prior
- Examples 1
- Causal inference, graphical models, and statistical decision making
- Clustering
- Dynamical systems
- Feature selection
- benchmarking against other algorithms
- Functional data analysis
- See CRAN Task View: Functional Data Analysis
- representation of functions
- statistical properties
- Genetic Algorithms
- High-dimensional data and (non-linear) dimensionality reduction
- Examples 1
- Hypothesis testing
- Examples 1
- Maximum likelihood
- Measurement error, missing data, reliability, model uncertainty
- Monte Carlo, including Markov Chain processes
- sensitivity and reproducibility
- (Multiple) Imputation and synthetic data
- sensitivity and reproducibility
- Non- and semi-parametric methods
- comparison with parametric methods
- Non-probability samples and probability samples; sampling techniques
- distributional properties
- divergence of non-probability samples from underlying distribution
- (Multivariate) Time series, non-stationarity, changepoint analysis
- See CRAN Task View: Time Series Analysis
- comparison with stationary methods
- sensitivity of changepoints
- model error
- Examples 1, 2, 3, 4
- Power calculations
- Probability
- Regression / random trees and forests
- Examples 1
- Risk prediction and analysis
- sensitivity
- error
- (Single- and multi-dimensional, static and dynamic) smoothing
- tolerance; feature loss
- (Stochastic) Optimization
- (Supervised, Unsupervised, Automated, Interactive) Machine and Deep Learning, Statistical Learning
- Survival analysis
- Warping
- Binary and ordinal data
- Complex-valued data
- Discrete data
- Longitudinal and correlated data analyses
- Structured and dynamically dependent data
- Time-to-event data
- Anonymization and (data) privacy; blockchain
- Examples 1
- Backcasting, nowcasting, forecasting
- (Dynamic) Networks and networks statistics
- Financial data and econometrics
- Information statistics
- “omics” data
- Randomized clinical trials
- Seasonal data
- Spatial and spatio-temporal modelling
- Streaming data
- Data visualisation
- Statistical education
- See CRAN Task View: Teaching Statistics
- Examples 1, 2
- Meta analysis
An additional category, applicable both to software and likely ubiquitous within conference programs, concerns the calculation of specific statistical metrics or indices.
There are no Python packages mentioned or presented in JSM abstracts.
- LearnBayes: Functions for learning Bayesian inference
- rethinking: Statistical rethinking course and book package
- rms: Regression modelling strategies
- revisit: a Tool for Statistical Reproducibility and for Teaching
- liftr: Persistent reproducible reporting by containerization of R Markdown documents
- gmediation: Mediation Analysis for Multiple and Multi-Stage Mediators
- scdensity: Shape-Constrained Kernel Density Estimation
- SurvBoost: Gradient Boosting for Survival Data
- EAinference: Estimator Augmentation and Simulation-Based Inference
- gLRTH: Genome-Wide Association and Linkage Analysis under Heterogeneity
- rpms: Recursive Partitioning for Modeling Survey Data
- ggdag: Analyze and Create Elegant Directed Acyclic Graphs
- confoundr: Diagnostics for Confounding of Time-Varying and Other Joint Exposures
- adapr: Implementation of an Accountable Data Analysis Process
- conf: Visualisation and Analysis of Statistical Measures of Confidence
- medExtractR: Extraction of Medication Information from Clinical Text
- isni: Index of Local Sensitivity to Nonignorability
- PhysicalActivity: Process Accelerometer Data for Physical Activity Measurement
- accelmissing: Missing Value Imputation for Accelerometer Data
- lmboot: Bootstrap in Linear Models
- TeachingDemos: Demonstrations for Teaching and Learning
- ghclass: Tools for managing classroom organizations
- ggvoronoi: Voronoi Diagrams and Heatmaps with ‘ggplot2’
- igraphmatch: Tools to find the correspondences between vertices in different graphs
- Intkrige: A Numerical Implementation of Interval-Valued Kriging
- Bioc2mlr: Utility functions to transform Bioconductor’s S4 omic classes into mlr’s task and CPOs
- MHTdiscrete: Multiple Hypotheses Testing for Discrete Data
Packages mentioned but not as focus:
- twang: Toolkit for Weighting and Analysis of Nonequivalent Groups
- Zelig: Everyone’s Statistical Software
- rbounds: Perform Rosenbaum bounds sensitivity tests for matched and unmatched data
- refund: Regression with Functional Data
- mgcv: Mixed GAM Computation Vehicle with Automatic Smoothness Estimation
As well as the neuroconductor and cloudyr platforms.
Python packages presented or mentioned in SDSS programs: