Welcome to SCuMpy! The name is intended to (1) evoke the name of the amazing Python library NumPy, and (2) be a portmanteau of SCM (Structural Causal Model) and Python.
A linear SCM is a DAG whose arrows are labelled by path coefficients (a.k.a. gains). If you want to learn more about linear SCMs, explained using my notational conventions (which are also SCuMpy’s notational conventions), check out my free, open-source book Bayesuvius. Look in the chapter entitled “Linear Deterministic Bnets with External Noise”.
In SCuMpy, we use the following simple notation.
Let
Linear SCMs are described by a system of linear equations of the form
where the
We can view this as either

1. a linear system of equations with the unknowns $\underline{x}_i$. We can solve for these unknowns using basic linear algebra. Once we have solved for them, we can calculate $\langle\underline{x}_i, \underline{x}_k\rangle$; or
2. a linear system of equations with the unknowns $\alpha_{i|j}$. We can solve for these unknowns using basic linear algebra.
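As a concrete illustration of view (1), here is a minimal SymPy sketch for a hypothetical three-node DAG $z\rightarrow x$, $z\rightarrow y$, $x\rightarrow y$. It is not SCuMpy's own code. It assumes the system can be written in matrix form as $\underline{x} = G\underline{x} + \underline{u}$, where `G[i, j]` holds the gain $\alpha_{i|j}$ of the arrow $j \rightarrow i$, that the external noises $\underline{u}_i$ are independent with zero mean and variances `s_z`, `s_x`, `s_y`, and that $\langle\cdot,\cdot\rangle$ is the covariance. All symbol names are hypothetical.

```python
import sympy as sp

# Node order: z, x, y.  G[i, j] is the (hypothetical) gain alpha_{i|j} of arrow j -> i.
a_xz, a_yx, a_yz = sp.symbols("a_xz a_yx a_yz")
G = sp.Matrix([
    [0,    0,    0],
    [a_xz, 0,    0],
    [a_yz, a_yx, 0],
])

# Assumption: independent, zero-mean external noises with these variances.
s_z, s_x, s_y = sp.symbols("s_z s_x s_y", positive=True)
D = sp.diag(s_z, s_x, s_y)

# View (1): solve x = G x + u for the unknowns x, giving x = (I - G)^{-1} u.
A = (sp.eye(3) - G).inv()

# Covariances <x_i, x_k> are the entries of A D A^T.
cov = (A * D * A.T).applyfunc(sp.expand)

print(cov[1, 2])  # symbolic <x, y> in terms of gains and noise variances
```

Because the graph is acyclic, ordering the nodes topologically makes $I - G$ lower triangular with a unit diagonal, so it is always invertible and the solution involves no division.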
SCuMpy takes as input a DAG expressed as a dot file. A dot file is a text file that (usually) describes a single DAG in the dot language, which is the language used to describe graphs by the graph rendering software Graphviz. SCuMpy stores all its dot files in a folder called "dot_atlas". Another term for "dot atlas" is "DAG atlas".
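To make the input format concrete, here is a tiny hypothetical dot file together with a deliberately minimal Python sketch that turns its edge statements into a list of parents for each node. The regex-based helper `parents_from_dot` is hypothetical and only handles simple `a -> b` statements; it is not how SCuMpy reads dot files, and a real project would use a proper dot parser such as pydot.

```python
import re
from collections import defaultdict

# A tiny DAG in the dot language (hypothetical; not a file from dot_atlas).
DOT = """
digraph G {
    z -> x;
    z -> y;
    x -> y;
}
"""

# Matches simple edge statements "a -> b". Attributes, subgraphs and
# edge chains such as "a -> b -> c" are NOT handled by this sketch.
EDGE = re.compile(r"(\w+)\s*->\s*(\w+)")

def parents_from_dot(text):
    """Return {node: [parents]} built from the simple edge statements in text."""
    parents = defaultdict(list)
    for src, dst in EDGE.findall(text):
        parents[dst].append(src)
        parents.setdefault(src, [])  # make sure root nodes appear too
    return dict(parents)

print(parents_from_dot(DOT))
```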
At this point, given a DAG dot file as input, SCuMpy can do (1) and (2), symbolically, using the excellent Python symbolic manipulator SymPy. To test SCuMpy, we used the 20 DAGs defined in the "G&B-trols" paper:
A Crash Course in Good and Bad Controls, by Carlos Cinelli, Andrew Forney and Judea Pearl
In the SCuMpy folder called "jupyter_notebooks", you will find, for each of the 20 DAGs in the G&B-trols paper, one notebook where (1) is done symbolically and another where (2) is done symbolically (40 notebooks in all).
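For view (2), a minimal sketch (again not SCuMpy's own code) multiplies each structural equation by each parent of its left-hand node and takes $\langle\cdot,\cdot\rangle$ of both sides. Assuming each node's external noise is uncorrelated with that node's parents, this gives one linear equation in the unknown gains $\alpha_{i|j}$ per parent, which SymPy can solve. The covariance symbols `C_ik` and the `parents` dictionary are hypothetical names for the same three-node DAG $z\rightarrow x$, $z\rightarrow y$, $x\rightarrow y$.

```python
import sympy as sp

# Hypothetical DAG z -> x, z -> y, x -> y, given as parent lists.
parents = {"z": [], "x": ["z"], "y": ["x", "z"]}

def C(i, k):
    """Symbol for the (treated as known) covariance <x_i, x_k>."""
    i, k = sorted((i, k))  # <.,.> is symmetric: one symbol per unordered pair
    return sp.Symbol(f"C_{i}{k}")

gains = {}
for child, pas in parents.items():
    if not pas:
        continue  # root nodes have no incoming gains
    alphas = [sp.Symbol(f"alpha_{child}|{p}") for p in pas]
    # <x_child, x_j> = sum_p alpha_{child|p} <x_p, x_j>, for every parent j.
    # Assumes the noise of `child` is uncorrelated with its parents.
    eqs = [sp.Eq(C(child, j), sum(a * C(p, j) for a, p in zip(alphas, pas)))
           for j in pas]
    gains.update(sp.solve(eqs, alphas, dict=True)[0])

for g, value in gains.items():
    print(g, "=", value)
```

For node `y`, this reproduces the familiar two-regressor formula $\alpha_{y|x} = (C_{xy}C_{zz} - C_{xz}C_{yz})/(C_{xx}C_{zz} - C_{xz}^2)$.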
In addition, we have notebooks for the back door, front door and napkin examples in Pearl's "The Book of Why".
SCuMpy can also use its symbolic output to prove rigorously whether a do-query is identifiable for a particular DAG. It can do this without using the fairly complicated Do Calculus rules, which are the most general way of establishing identifiability. See this notebook.
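As an illustration of the idea (not the contents of that notebook), the SymPy sketch below uses the hypothetical confounded DAG $z\rightarrow x$, $z\rightarrow y$, $x\rightarrow y$ with all three nodes observed. Since $x\rightarrow y$ is the only directed path from $x$ to $y$, the effect of $do(x)$ on $y$ is the gain $\alpha_{y|x}$. The sketch computes the covariances implied by the SCM, plugs them into a back-door-adjusted regression coefficient written only in terms of observable covariances, and checks that the result simplifies exactly to $\alpha_{y|x}$. It assumes independent zero-mean external noises; all symbol names are hypothetical.

```python
import sympy as sp

# Hypothetical gains and noise variances for z -> x, z -> y, x -> y.
a_xz, a_yx, a_yz = sp.symbols("a_xz a_yx a_yz")
s_z, s_x, s_y = sp.symbols("s_z s_x s_y", positive=True)

# Covariances implied by the SCM: x = (I - G)^{-1} u, Cov = A D A^T.
G = sp.Matrix([[0, 0, 0], [a_xz, 0, 0], [a_yz, a_yx, 0]])
A = (sp.eye(3) - G).inv()
cov = A * sp.diag(s_z, s_x, s_y) * A.T
Z, X, Y = 0, 1, 2  # node order used in G

# Candidate estimand built only from observable covariances:
# the coefficient of x when regressing y on x and z (back-door adjustment for z).
estimand = (cov[X, Y] * cov[Z, Z] - cov[X, Z] * cov[Z, Y]) / (
    cov[X, X] * cov[Z, Z] - cov[X, Z] ** 2
)

# If it cancels to the arrow gain a_yx, this do-query is identifiable.
print(sp.cancel(estimand))  # → a_yx
```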
SCuMpy can also do some numeric calculations. It can calculate the arrow gains numerically.
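One simple way to obtain numeric gains from data, sketched below with NumPy, is to regress each node on its parents by least squares. This is an assumption made for illustration, not a description of SCuMpy's numeric routines: the data are simulated from made-up gains for the hypothetical DAG $z\rightarrow x$, $z\rightarrow y$, $x\rightarrow y$, with zero-mean noise so that no intercept is fitted. The helper `fit_gains` is a hypothetical name.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Made-up "true" gains for the hypothetical DAG z -> x, z -> y, x -> y.
a_xz, a_yx, a_yz = 0.8, 1.5, -0.7

# Simulate the linear SCM with independent standard-normal external noise.
z = rng.standard_normal(n)
x = a_xz * z + rng.standard_normal(n)
y = a_yx * x + a_yz * z + rng.standard_normal(n)

def fit_gains(child, parents):
    """Least-squares estimates of alpha_{child|p} for each parent p (no intercept)."""
    P = np.column_stack(parents)
    coef, *_ = np.linalg.lstsq(P, child, rcond=None)
    return coef

print(fit_gains(x, [z]))     # close to [0.8]
print(fit_gains(y, [x, z]))  # close to [1.5, -0.7]
```

The estimates approach the true gains as `n` grows. If a node's noise were correlated with its parents (for example, through a hidden confounder), plain regression would be biased.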
Exciting news: SCuMpy can now handle linear SCMs with feedback loops. See this blog post of mine for more details. This can be used to do Causal Inference with time series data, including panel data.
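A minimal way to see what a feedback loop does to the linear system is the SymPy sketch below for a hypothetical two-node cycle $x_0 \rightarrow x_1$, $x_1 \rightarrow x_0$. It assumes the loop is read as simultaneous equations at equilibrium; the blog post may use a different construction (for example, unrolling the loop in time). Solving $\underline{x} = G\underline{x} + \underline{u}$ still works, but $I - G$ is no longer triangular, and it is invertible only when $1 - \alpha_{0|1}\alpha_{1|0} \neq 0$. Symbol names are hypothetical.

```python
import sympy as sp

# Hypothetical gains: a01 = alpha_{0|1} (arrow 1 -> 0), a10 = alpha_{1|0} (arrow 0 -> 1).
a01, a10 = sp.symbols("a01 a10")
u0, u1 = sp.symbols("u0 u1")  # external noises

G = sp.Matrix([[0, a01], [a10, 0]])
u = sp.Matrix([u0, u1])

# Equilibrium solution x = (I - G)^{-1} u; needs det(I - G) = 1 - a01*a10 != 0.
print(sp.factor((sp.eye(2) - G).det()))
x = ((sp.eye(2) - G).inv() * u).applyfunc(sp.simplify)
print(x[0])
print(x[1])
```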
See this blog post of mine.