
Library for doing both symbolic and numeric calculations for linear Structural Causal Models (SCM)


rrtucci/scumpy


SCuMpy

[Image: Pond SCM]

Welcome to SCuMpy! The name is intended to (1) evoke the name of the amazing Python library NumPy, and (2) be a portmanteau of SCM (Structural Causal Model) and Python.

A linear SCM is a DAG whose arrows are labelled by path coefficients (a.k.a. gains). If you want to learn more about linear SCM, check out my free, open source book Bayesuvius, which explains them using the same notational conventions as SCuMpy. Look in the chapter entitled “Linear Deterministic Bnets with External Noise”.

In SCuMpy, we use the following simple notation. Let $\underline{a}, \underline{b}$ be any two random variables. (I underline random variables instead of following the usual convention of capitalizing them.) Then

$$\text{Mean Value of } \underline{a}= \langle\underline{a}\rangle$$

$$\text{Covariance}(\underline{a}, \underline{b})= \langle\underline{a}, \underline{b}\rangle$$

$$\text{Standard Deviation of }\underline{a} = \sigma_{\underline{a}} = \sqrt{ \langle\underline{a}, \underline{a}\rangle}$$

$$\text{Correlation}(\underline{a}, \underline{b}) = \rho_{ \underline{a}, \underline{b} }= \frac{\langle\underline{a}, \underline{b}\rangle} {\sigma_{\underline{a}}\sigma_{\underline{b}}}$$

$$\frac{\partial\underline{b}}{ \partial\underline{a}}= \frac{\langle\underline{a}, \underline{b}\rangle} {\langle\underline{a}, \underline{a}\rangle}$$
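As a numeric illustration of this notation, here is a minimal NumPy sketch that estimates each quantity from samples. The helper names (`mean`, `cov`, `sigma`, `rho`, `partial_derivative`) and the synthetic data are assumptions made for this example; they are not part of SCuMpy.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic samples: b depends linearly on a with slope 2, plus noise.
a = rng.normal(size=10_000)
b = 2.0 * a + rng.normal(size=10_000)


def mean(x):
    """Sample estimate of <x>."""
    return x.mean()


def cov(x, y):
    """Sample estimate of <x, y>."""
    return np.mean((x - x.mean()) * (y - y.mean()))


def sigma(x):
    """Standard deviation, sqrt(<x, x>)."""
    return np.sqrt(cov(x, x))


def rho(x, y):
    """Correlation, <x, y> / (sigma_x * sigma_y)."""
    return cov(x, y) / (sigma(x) * sigma(y))


def partial_derivative(y, x):
    """The quantity written as dy/dx above: <x, y> / <x, x>."""
    return cov(x, y) / cov(x, x)


print("mean of a      :", mean(a))
print("correlation    :", rho(a, b))
print("partial b wrt a:", partial_derivative(b, a))
```

With this data, `partial_derivative(b, a)` should come out close to the slope 2 used to generate `b`, which is why the ratio of covariances is written as a derivative.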

Linear SCM are described by a system of linear equations of the form

$$\underline{x}_{i}={\sum}_{j}{\alpha}_{i|j}\underline{x}_j+\underline {\epsilon}_i$$

where the $\underline{x}_i$ are the internal nodes, the $\alpha_{i|j}$ are the path coefficients (a.k.a. gains), and the $\underline{\epsilon}_i$ are the external nodes that inject noise into the system. The $\underline{\epsilon}_i$ are root nodes that have zero covariance with each other.
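To make the structural equations concrete, the following sketch generates samples for a small, made-up three-node DAG in two ways: node by node in topological order, and in matrix form via $\underline{x} = (1 - \alpha)^{-1}\underline{\epsilon}$. The DAG, the gain values, and the variable names are assumptions chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
n_samples = 5

# Hypothetical DAG: x0 -> x1, x0 -> x2, x1 -> x2.
# alpha[i, j] is the gain of the arrow x_j -> x_i (zero if there is no arrow).
alpha = np.zeros((3, 3))
alpha[1, 0] = 0.5
alpha[2, 0] = -1.0
alpha[2, 1] = 2.0

# External noise: one row per node, one column per sample.
eps = rng.normal(size=(3, n_samples))

# Node by node, in topological order: x_i = sum_j alpha[i, j] * x_j + eps_i.
x = np.zeros((3, n_samples))
for i in range(3):
    x[i] = alpha[i] @ x + eps[i]

# Matrix form: (I - alpha) x = eps.
x_matrix = np.linalg.solve(np.eye(3) - alpha, eps)

print(np.allclose(x, x_matrix))
```

Because the nodes are listed in topological order, `alpha` is strictly lower triangular, so `I - alpha` is always invertible for a DAG.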

We can view this as either

  1. a linear system of equations with the unknowns $\underline{x}_i$. We can solve for these unknowns using basic Linear Algebra. Once we solve for the unknowns, we can calculate $\langle\underline{x}_i, \underline{x}_k\rangle$.
  2. a linear system of equations with the unknowns $\alpha_{i|j}$. We can solve for these unknowns using basic Linear Algebra.
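Both views can be sketched directly in SymPy. The example below uses a hypothetical DAG $\underline{z}\rightarrow\underline{x}$, $\underline{z}\rightarrow\underline{y}$, $\underline{x}\rightarrow\underline{y}$ and symbol names of my own choosing (`a_yx` stands for $\alpha_{y|x}$, `s_z` for the variance of the noise entering $\underline{z}$, and so on). It is written with plain SymPy calls and is not SCuMpy's API.

```python
import sympy as sp

# Gains for the hypothetical DAG z -> x, z -> y, x -> y.
a_xz, a_yz, a_yx = sp.symbols("a_xz a_yz a_yx")
# Variances of the external noise nodes (mutually uncorrelated).
s_z, s_x, s_y = sp.symbols("s_z s_x s_y", positive=True)

# View (1): solve x = A x + eps for the nodes, ordered (z, x, y).
A = sp.Matrix([
    [0,    0,    0],
    [a_xz, 0,    0],
    [a_yz, a_yx, 0],
])
M = (sp.eye(3) - A).inv()                  # nodes = M * eps
C = M * sp.diag(s_z, s_x, s_y) * M.T       # C[i, k] = <x_i, x_k>
C = C.applyfunc(sp.expand)

# View (2): treat the gains as unknowns and the covariances as given.
# Each equation is the covariance of a structural equation with one of
# the parents of its left-hand-side node.
c_zz, c_zx, c_zy, c_xx, c_xy = sp.symbols("c_zz c_zx c_zy c_xx c_xy")
equations = [
    sp.Eq(c_zx, a_xz * c_zz),
    sp.Eq(c_zy, a_yz * c_zz + a_yx * c_zx),
    sp.Eq(c_xy, a_yz * c_zx + a_yx * c_xx),
]
solution = sp.solve(equations, [a_xz, a_yz, a_yx], dict=True)[0]

# Consistency check: plugging the covariances from view (1) into the
# solution of view (2) should give back the gains themselves.
cov_values = {c_zz: C[0, 0], c_zx: C[0, 1], c_zy: C[0, 2],
              c_xx: C[1, 1], c_xy: C[1, 2]}
residuals = [sp.simplify(solution[g].subs(cov_values) - g)
             for g in (a_xz, a_yz, a_yx)]

print(C)
print(solution)
print(residuals)
```

The check at the end ties the two views together: the covariances derived in (1) are exactly the ones that, fed into (2), recover the original gains.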

SCuMpy takes as input a DAG expressed as a dot file. A dot file is a text file that describes a graph (usually a single DAG) in the dot language, the language used by the graph rendering software GraphViz. SCuMpy stores all its dot files in a folder entitled "dot_atlas". Another term for "dot atlas" is "DAG atlas".
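For readers who have not seen the dot language, here is a small made-up dot description of a DAG, together with a deliberately simple regular-expression reader that extracts its arrows. The dot text, the function `parse_edges`, and the `parents` dictionary are illustrative assumptions; this is not how SCuMpy itself reads dot files, and the reader only handles statements of the plain form `A -> B` (no chained arrows or attributes).

```python
import re

# A hypothetical dot file describing the DAG Z -> X, Z -> Y, X -> Y.
dot_text = """
digraph G {
    Z -> X;
    Z -> Y;
    X -> Y;
}
"""


def parse_edges(text):
    """Return (parent, child) pairs for every simple 'A -> B' statement."""
    return re.findall(r"(\w+)\s*->\s*(\w+)", text)


edges = parse_edges(dot_text)
nodes = sorted({name for edge in edges for name in edge})
# Map each node to the list of its parents, in order of appearance.
parents = {node: [p for p, c in edges if c == node] for node in nodes}

print(edges)
print(parents)
```

The `parents` mapping is the information needed to write one structural equation per node: each node is a weighted sum of its parents plus its own noise term.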

Currently, given a DAG dot file as input, SCuMpy can do both (1) and (2) symbolically, using the excellent Python symbolic manipulator SymPy. To test SCuMpy, we used the 20 DAGs defined in the "G&B-trols" paper:

A Crash Course in Good and Bad Controls, by Carlos Cinelli, Andrew Forney and Judea Pearl

In the SCuMpy folder called "jupyter_notebooks", you will find 20 notebooks where (1) is done symbolically, and another 20 notebooks where (2) is done symbolically, for each of the 20 DAGs in the G&B-trols paper.

In addition, we have notebooks for the back door, front door and napkin examples in Pearl's "The Book of Why".

SCuMpy can also be used to prove rigorously, from its symbolic output, whether a do-query is identifiable for a particular DAG. It can do this without using the fairly complicated Do Calculus rules, which are the most general way of establishing identifiability. See this notebook.
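As a rough illustration of the idea, consider the front door DAG with a hidden confounder $\underline{u}$: $\underline{u}\rightarrow\underline{x}$, $\underline{u}\rightarrow\underline{y}$, $\underline{x}\rightarrow\underline{m}$, $\underline{m}\rightarrow\underline{y}$. The sketch below, written in plain SymPy with symbol names of my own choosing, checks that the effect of $\underline{x}$ on $\underline{y}$, namely $\alpha_{m|x}\alpha_{y|m}$, can be written using only covariances among the observed nodes $\underline{x}, \underline{m}, \underline{y}$. It assumes that "expressible through observed covariances" is the sense of identifiability intended here, and it is not a reproduction of SCuMpy's own procedure.

```python
import sympy as sp

# Gains of the hypothetical front door DAG (u is unobserved).
a_xu, a_mx, a_ym, a_yu = sp.symbols("a_xu a_mx a_ym a_yu")
# Variances of the external noise nodes.
s_u, s_x, s_m, s_y = sp.symbols("s_u s_x s_m s_y", positive=True)

# Nodes ordered (u, x, m, y); A[i, j] is the gain of the arrow j -> i.
A = sp.Matrix([
    [0,    0,    0,    0],
    [a_xu, 0,    0,    0],
    [0,    a_mx, 0,    0],
    [a_yu, 0,    a_ym, 0],
])
M = (sp.eye(4) - A).inv()
C = M * sp.diag(s_u, s_x, s_m, s_y) * M.T   # covariance matrix of all nodes

# Covariances among the observed nodes only.
c_xx, c_xm, c_xy = C[1, 1], C[1, 2], C[1, 3]
c_mm, c_my = C[2, 2], C[2, 3]

# Gain of x -> m: simple regression of m on x.
gain_mx = c_xm / c_xx
# Gain of m -> y: coefficient of m when y is regressed on (m, x).
gain_ym = (c_xx * c_my - c_xm * c_xy) / (c_xx * c_mm - c_xm**2)

total_effect = sp.cancel(gain_mx * gain_ym)
print(total_effect)
```

Neither `gain_mx` nor `gain_ym` references a covariance involving $\underline{u}$, yet their product reduces to `a_mx*a_ym`, so the query is answerable from observed data in this example.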

SCuMpy can also do some numeric calculations. It can numerically estimate the arrow gain $\alpha_{i|j}$ for each arrow $\underline{x}_j\rightarrow\underline{x}_i$ of the DAG. The estimation algorithm requires as input a file containing a dataset whose column labels are the node names and whose rows are the node values. See this notebook.
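One common way to estimate gains from such a dataset is to regress each node on its parents by ordinary least squares. The sketch below does this with NumPy on synthetic data. The choice of least squares, the function `estimate_gains`, the in-memory `data` dictionary (standing in for a data file), and the gain values are all assumptions for illustration; the text above does not say which estimation algorithm SCuMpy uses.

```python
import numpy as np

rng = np.random.default_rng(2)
n_samples = 20_000

# Synthetic dataset for the hypothetical DAG Z -> X, Z -> Y, X -> Y.
true_gain = {("Z", "X"): 0.8, ("Z", "Y"): -0.5, ("X", "Y"): 1.5}
Z = rng.normal(size=n_samples)
X = true_gain[("Z", "X")] * Z + rng.normal(size=n_samples)
Y = (true_gain[("Z", "Y")] * Z + true_gain[("X", "Y")] * X
     + rng.normal(size=n_samples))

# Node names as column labels, node values as rows.
data = {"Z": Z, "X": X, "Y": Y}
parents = {"Z": [], "X": ["Z"], "Y": ["Z", "X"]}


def estimate_gains(data, parents):
    """Regress each node on its parents; return {(parent, child): gain}."""
    gains = {}
    for child, pa in parents.items():
        if not pa:
            continue  # root nodes have no incoming arrows
        # Center the columns so that no intercept term is needed.
        design = np.column_stack([data[p] - data[p].mean() for p in pa])
        target = data[child] - data[child].mean()
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        for p, c in zip(pa, coef):
            gains[(p, child)] = float(c)
    return gains


gains = estimate_gains(data, parents)
for arrow, value in gains.items():
    print(arrow, round(value, 3), "true:", true_gain[arrow])
```

With enough samples, the estimated gains should land close to the values used to generate the data.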

Exciting news. SCuMpy can now do linear SCM with feedback loops. See this blog post of mine for more details. This can be used to do Causal Inference with time series (a.k.a. panel data).

Installation Instructions

See this blog post of mine.
