Graph Data Science

The kglab package provides a simple abstraction layer in Python for building knowledge graphs.

Its main goal is to support, in idiomatic Python, the common use cases in data science and data engineering that require graph data, and to present graph data science as an emerging practice.

Cut to the Chase

To get started right away, jump to Getting Started
For other help, see Community Resources
For an extensive, hands-on coding tour through kglab, follow the Tutorial notebooks
Check the source code at https://github.com/DerwenAI/kglab

Note

FAQ: Why build yet another graph library, when there are already so many available?

A short list of primary motivations has been identified for kglab, its design criteria, and its engineering trade-offs:

Point 1: integrate with popular graph libraries, including RDFlib, OWL-RL, pySHACL, NetworkX, iGraph, PyVis, node2vec, pslpython, pgmpy, and so on – several of which would otherwise not have much common ground.

Point 2: integrate closely with the "PyData" stack, namely pandas, NumPy, scikit-learn, matplotlib, etc., as well as PyTorch and other quintessential data science tools, and provide example code for working with them.

Point 3: integrate efficiently with Big Data tools and practices for contemporary data engineering and cloud computing infrastructure, including: Ray, Jupyter, RAPIDS, Apache Arrow, Apache Parquet, Apache Spark, etc.

Point 4: incorporate graph data science practices and semantic technologies into spaCy pipelines, e.g., through pytextrank, plus Rubrix.ml and other customized natural language pipelines.

Point 5: explore "hybrid" approaches that combine machine learning with symbolic, rule-based processing – including probabilistic graph inference and knowledge graph embedding.

Last update: 2022-03-10