PyGraphistry is a Python visual graph analytics library to extract, transform, and load big graphs into Graphistry's visual graph analytics platform. It is typically used by data scientists, developers, and operational analysts on problems like visually mapping the behavior of devices and users.
The Python client makes it easy to go from your existing data to a Graphistry server. Through strong notebook support, data scientists can quickly go from data to accelerated visual explorations, and developers can quickly prototype stunning solutions with their users.
Graphistry supports unusually large graphs for interactive visualization. The client's custom WebGL rendering engine renders up to 8MM nodes and edges at a time, and most older client GPUs smoothly support somewhere between 100K and 1MM elements. The serverside GPU analytics engine supports even bigger graphs.
Click to open interactive version! (For server-backed interactive analytics, use an API key.) Source data: SNAP
- Fast & Gorgeous: Cluster, filter, and inspect large amounts of data at interactive speed. We lay out graphs with a descendant of the gorgeous ForceAtlas2 layout algorithm introduced in Gephi. Our data explorer connects to Graphistry's GPU cluster to lay out and render hundreds of thousands of nodes+edges in your browser at unparalleled speeds.
- Notebook Friendly: PyGraphistry plays well with interactive notebooks like Jupyter, Zeppelin, and Databricks: process, visualize, and drill into graphs directly within your notebooks.
- Great for Events, CSVs, and more: Not sure if your data is graph-friendly? PyGraphistry's hypergraph transform helps turn any sample data like CSVs, SQL results, and event data into a graph for pattern analysis:
    rows = pandas.read_csv('transactions.csv')[:1000]
    graphistry.hypergraph(rows)['graph'].plot()
- Batteries Included: PyGraphistry works out-of-the-box with popular data science and graph analytics libraries. It is also very easy to turn arbitrary data into insightful graphs:
- Pandas:
    edges = pd.read_csv('facebook_combined.txt', sep=' ', names=['src', 'dst'])
    graphistry.bind(source='src', destination='dst').plot(edges)

    table_rows = pd.read_csv('honeypot.csv')
    graphistry.hypergraph(table_rows, ['attackerIP', 'victimIP', 'victimPort', 'vulnName'])['graph'].plot()

    graphistry.hypergraph(
        table_rows,
        ['attackerIP', 'victimIP', 'victimPort', 'vulnName'],
        direct=True,
        opts={'EDGES': {
            'attackerIP': ['victimIP', 'victimPort', 'vulnName'],
            'victimIP': ['victimPort', 'vulnName'],
            'victimPort': ['vulnName']
        }})['graph'].plot()
- Neo4j:
    NEO4J_CREDS = {'uri': 'bolt://my.site.ngo:7687', 'auth': ('neo4j', 'mypwd')}
    graphistry.register(bolt=NEO4J_CREDS)
    graphistry.cypher("MATCH (a)-[p:PAYMENT]->(b) WHERE p.USD > 7000 AND p.USD < 10000 RETURN a, p, b").plot()

    graphistry.cypher("CALL db.schema()").plot()

    from neo4j import GraphDatabase, Driver
    graphistry.register(bolt=GraphDatabase.driver(**NEO4J_CREDS))
    graphistry.cypher("MATCH (a)-[p:PAYMENT]->(b) WHERE p.USD > 7000 AND p.USD < 10000 RETURN a, p, b").plot()
- TigerGraph:
    g = graphistry.tigergraph(protocol='https', ...)
    g2 = g.gsql("...", {'edges': '@@eList'})
    g2.plot()
    print('# edges', len(g2._edges))

    g.endpoint('my_fn', {'arg': 'val'}, {'edges': '@@eList'}).plot()
- IGraph:
    graph = igraph.read('facebook_combined.txt', format='edgelist', directed=False)
    graphistry.bind(source='src', destination='dst').plot(graph)
- NetworkX:
    graph = networkx.read_edgelist('facebook_combined.txt')
    graphistry.bind(source='src', destination='dst', node='nodeid').plot(graph)
- HyperNetX:
    hg.hypernetx_to_graphistry_nodes(H).plot()

    hg.hypernetx_to_graphistry_bipartite(H.dual()).plot()
- Splunk:
    df = splunkToPandas("index=netflow bytes > 100000 | head 100000", {})
    graphistry.bind(source='src_ip', destination='dest_ip').plot(df)
- NodeXL:
    graphistry.nodexl('/my/file.xls').plot()

    graphistry.nodexl('https://file.xls').plot()

    graphistry.nodexl('https://file.xls', 'twitter').plot()
    graphistry.nodexl('https://file.xls', verbose=True).plot()
    graphistry.nodexl('https://file.xls', engine='xlsxwriter').plot()
    graphistry.nodexl('https://file.xls')._nodes
Gallery (images omitted): Twitter Botnet | Edit Wars on Wikipedia (Source: SNAP) | 100,000 Bitcoin Transactions | Port Scan Attack | Protein Interactions (Source: BioGRID) | Programming Languages (Source: Socio-PLT project)
You need to install the PyGraphistry client somewhere and connect it to a Graphistry server. We recommend the following options:
- Private Graphistry Server: One-click launch with Graphistry, PyGraphistry, and Jupyter Notebooks preinstalled and ready to go out-of-the-box (AWS and Azure)
- pip install graphistry: If you already have Jupyter Notebook installed or are using a system like Google Colab, install the PyGraphistry pip package. Connect it to a free Graphistry Hub account or a private Graphistry server.
For new users who have AWS accounts, simply launch the self-serve Graphistry AMI. See Graphistry.com for additional quick launch modes.
A private server provides several benefits for getting started:
- PyGraphistry is preinstalled
- Jupyter Notebook is preinstalled
- Starter examples for use with different files, databases, and Nvidia RAPIDS are provided
- Preconfigured backend server: Nvidia drivers, nvidia-docker, Graphistry server, RAPIDS environment, etc.
- Running in your private server means you can safely explore private data there
The server gracefully stops/starts: Control potential server utility fees by simply stopping the server when not using it.
Install PyGraphistry into your own Python app or data science notebook environment such as Jupyter and Google Colab. Connect it to a Graphistry server such as the self-serve Graphistry AMI or a free Graphistry Hub account.
Install PyGraphistry with Python's pip package manager:
- Pandas only (recommended): pip install --user graphistry
- Neo4j: pip install --user "graphistry[bolt]"
- IGraph, NetworkX, Neo4j: pip install --user "graphistry[all]"
The latter two can be skipped if you already have the third-party Python packages at the appropriate versions installed.
Provide your API credentials to upload data to your Graphistry GPU server:
import graphistry
#graphistry.register(key='Your key') # 1.0 API
#graphistry.register(api=3, username='your name', password='your pwd') # 2.0 API, logged out after 1hr
#graphistry.register(api=3, token='your JWT token') # 2.0 API, expires after 1hr
For the 2.0 API, your username/password are the same as your Graphistry account, and your session expires after 1hr. The temporary JWT token (1hr) can be generated via the REST API using your login credentials, or by visiting your landing page.
Optionally, for convenience, you may set your API key in your system environment and thereby skip the register step in all your notebooks. In your .profile or .bash_profile, add the following and reload your environment:
export GRAPHISTRY_API_KEY="Your key"
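To confirm from Python that the variable is actually visible to your notebook process, a small check like the one below can help. It is a sketch under assumptions: `resolve_api_key` is a hypothetical helper, not a PyGraphistry function, and the library's own lookup order may differ.

```python
import os


def resolve_api_key(explicit_key=None, env=os.environ):
    """Prefer an explicitly passed key, then GRAPHISTRY_API_KEY, else None.

    Hypothetical helper for illustration; not part of PyGraphistry.
    """
    if explicit_key:
        return explicit_key
    # Treat an empty string the same as an unset variable.
    return env.get("GRAPHISTRY_API_KEY") or None


# Pass a plain dict to test without touching the real environment.
print(resolve_api_key(env={"GRAPHISTRY_API_KEY": "Your key"}))
```

A result of None in your notebook usually means the shell profile was not reloaded before the notebook server started.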
Specify which Graphistry to reach:
graphistry.register(protocol='https', server='hub.graphistry.com')
Preconfigure private Graphistry servers to fill in this data for you.
Let's visualize relationships between the characters in Les Misérables. For this example, we'll choose Pandas to wrangle data and IGraph to run a community detection algorithm. You can view the Jupyter notebook containing this example.
Our dataset is a CSV file that looks like this:
source | target | value |
---|---|---|
Cravatte | Myriel | 1 |
Valjean | Mme.Magloire | 3 |
Valjean | Mlle.Baptistine | 3 |
Source and target are character names, and the value column counts the number of times they meet. Parsing is a one-liner with Pandas:
import pandas
links = pandas.read_csv('./lesmiserables.csv')
If you already have graph-like data, use this step. Otherwise, try the Hypergraph Transform.
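For intuition about what a hypergraph transform does to a plain table, here is a conceptual sketch in pure Python. It is not PyGraphistry's implementation: the function `rows_to_hypergraph`, the node ID format, and the sample rows are all made up for illustration. The idea is that each row becomes an event node, each distinct column value becomes an entity node, and an edge links every event to the entities it mentions.

```python
def rows_to_hypergraph(rows, columns):
    """Turn table rows into (nodes, edges).

    One event node per row, one entity node per distinct (column, value)
    pair, and one edge from each event to each entity it mentions.
    Illustrative only; IDs and field names are assumptions.
    """
    nodes = {}
    edges = []
    for i, row in enumerate(rows):
        event_id = "event::%d" % i
        nodes[event_id] = {"type": "event"}
        for col in columns:
            value = row.get(col)
            if value is None:
                continue  # skip missing cells
            entity_id = "%s::%s" % (col, value)
            nodes[entity_id] = {"type": col}  # shared values collapse to one node
            edges.append({"src": event_id, "dst": entity_id, "edge_type": col})
    return nodes, edges


# Made-up sample rows using column names from the honeypot example.
sample_rows = [
    {"attackerIP": "1.2.3.4", "victimIP": "10.0.0.1", "victimPort": 445},
    {"attackerIP": "1.2.3.4", "victimIP": "10.0.0.2", "victimPort": 445},
]
nodes, edges = rows_to_hypergraph(sample_rows, ["attackerIP", "victimIP", "victimPort"])
print(len(nodes), len(edges))
```

Because both rows share an attacker IP and a port, those values become single nodes connected to both events, which is what makes repeated behavior stand out visually.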
PyGraphistry can plot graphs directly from Pandas dataframes, IGraph graphs, or NetworkX graphs. Calling plot uploads the data to our visualization servers and returns a URL to an embeddable webpage containing the visualization.
To define the graph, we bind source and destination to the columns indicating the start and end nodes of each edge:
import graphistry
graphistry.register(protocol='https', server='hub.graphistry.com', token='YOUR_JWT_TOKEN_HERE')
g = graphistry.bind(source="source", destination="target")
g.plot(links)
You should see a beautiful graph like this one:
Let's add labels to edges in order to show how many times each pair of characters met. We create a new column called label in the edge table links that contains the text of the label, and we bind edge_label to it.
links["label"] = links.value.map(lambda v: "#Meetings: %d" % v)
g = g.bind(edge_label="label")
g.plot(links)
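To preview the label text without a Graphistry server or Pandas, the same formatting can be tried on the three sample rows using only the standard library. This is a sketch: it assumes the CSV is comma-separated with a source,target,value header, and `load_links` is a hypothetical helper name.

```python
import csv
import io

# The three sample rows shown in the table above, as comma-separated text.
SAMPLE_CSV = """source,target,value
Cravatte,Myriel,1
Valjean,Mme.Magloire,3
Valjean,Mlle.Baptistine,3
"""


def load_links(text):
    """Parse edge rows and attach the same label string used in the tutorial."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["value"] = int(row["value"])  # csv yields strings; %d needs a number
        row["label"] = "#Meetings: %d" % row["value"]
        rows.append(row)
    return rows


links_rows = load_links(SAMPLE_CSV)
for row in links_rows:
    print(row["source"], "->", row["target"], row["label"])
```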
Let's size nodes based on their PageRank score and color them using their community.
IGraph already has these algorithms implemented for us. If IGraph is not already installed, fetch it with pip install python-igraph. Warning: pip install igraph will install the wrong package!
We start by converting our edge dataframe into an IGraph. The plotter can do the conversion for us using the source and destination bindings. Then we compute two new node attributes (pagerank & community).
ig = graphistry.pandas2igraph(links)
ig.vs['pagerank'] = ig.pagerank()
ig.vs['community'] = ig.community_infomap().membership
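For readers curious what the pagerank attribute measures, here is a simplified power-iteration sketch in pure Python. It is not IGraph's implementation: it treats edges as undirected and unweighted, assumes every node appears in at least one edge, and uses a fixed iteration count. The function name and the damping default of 0.85 are choices made for this illustration.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Simplified PageRank on an undirected, unweighted edge list.

    Illustrative only; IGraph's pagerank handles directed graphs,
    weights, and convergence checks that this sketch omits.
    """
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    n = len(neighbors)
    rank = {node: 1.0 / n for node in neighbors}  # start from a uniform score
    for _ in range(iterations):
        new_rank = {}
        for node in neighbors:
            # Each neighbor shares its score equally among its own neighbors.
            incoming = sum(rank[nb] / len(neighbors[nb]) for nb in neighbors[node])
            new_rank[node] = (1.0 - damping) / n + damping * incoming
        rank = new_rank
    return rank


# The three sample edges from the table above.
sample_edges = [
    ("Cravatte", "Myriel"),
    ("Valjean", "Mme.Magloire"),
    ("Valjean", "Mlle.Baptistine"),
]
scores = pagerank(sample_edges)
print(max(scores, key=scores.get))
```

On these sample edges the best-connected character receives the highest score, which is why binding point_size to pagerank makes central characters appear larger.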
We can then bind the node community and pagerank columns to visualization attributes:
g.bind(point_color='community', point_size='pagerank').plot(ig)
See the color palette documentation for specifying color values.
To control the position, we can add .bind(point_x='colA', point_y='colB').settings(url_params={'play': 0}) (see demos and additional URL parameters). In api=1, you created columns named x and y.
You may also want to bind point_title: .bind(point_title='colA').
By default, edges get colored as a gradient between their source/destination node colors. You can override this by setting .bind(edge_color='colA'), similar to how node colors function. (See color documentation.)
Similarly, you can bind the edge weight, where higher weights cause nodes to cluster closer together: .bind(edge_weight='colA'). See tutorial.
- Create a free public data Graphistry Hub account or one-click launch a private Graphistry instance in AWS
- Check out the analyst and developer introductions, or try your own CSV
- Explore the demos folder for your favorite file format, database, API, or kind of analysis
- Graphistry UI Guide
- API docs: Bindings and colors, REST API, embedding URLs and URL parameters, dynamic JS API, and more
- Python API ReadTheDocs
- Within a notebook, you can always run help(graphistry), help(graphistry.hypergraph), etc.
- Additional Graphistry API docs, including the predefined color palette values (color brewer)