Graph Data Science¶
The kglab package provides a simple abstraction layer in Python for building knowledge graphs.
The main goal is to leverage idiomatic Python for common data science and data engineering use cases that require graph data, and to present graph data science as an emerging practice.
Cut to the Chase¶
- To get started right away, jump to Getting Started
- For other help, see Community Resources
- For an extensive, hands-on coding tour through kglab, follow the Tutorial notebooks
- Check the source code at https://github.com/DerwenAI/kglab
Motivations¶
Note
FAQ: Why build yet another graph library, when there are already so many available?
A short list of primary motivations has been identified for kglab, covering its design criteria and engineering trade-offs:
Popular Graph Libraries¶
Point 1: integrate with popular graph libraries, including RDFLib, OWL-RL, pySHACL, NetworkX, igraph, PyVis, node2vec, pslpython, pgmpy, and so on – several of which would otherwise not have much common ground.
Data Science Workflows¶
Point 2: close integration plus example code for working with the "PyData" stack, namely pandas, NumPy, scikit-learn, matplotlib, etc., as well as PyTorch, and other quintessential data science tools.
Distributed Systems Infrastructure¶
Point 3: integrate efficiently with Big Data tools and practices for contemporary data engineering and cloud computing infrastructure, including: Ray, Jupyter, RAPIDS, Apache Arrow, Apache Parquet, Apache Spark, etc.
Natural Language Understanding¶
Point 4: incorporate graph data science practices and semantic technologies into spaCy pipelines, e.g., through pytextrank, plus Rubrix.ml and other customized natural language pipelines.
Hybrid AI Approaches¶
Point 5: explore "hybrid" approaches that combine machine learning with symbolic, rule-based processing – including probabilistic graph inference and knowledge graph embedding.