Topological feature extraction from graphs
==========================================
``giotto-tda`` can extract topological features from undirected or
directed graphs represented as adjacency matrices, via the following
transformers:
- `VietorisRipsPersistence `__
and
`SparseRipsPersistence `__
initialized with ``metric="precomputed"``, for undirected graphs;
- `FlagserPersistence `__
initialized with ``directed=True``, for directed graphs, and with
``directed=False`` for undirected ones.
In this notebook, we build some basic intuition on how these methods are
able to capture structures and patterns from such graphs. We will focus
on the multi-scale nature of the information contained in the final
outputs (“persistence diagrams”), as well as on the differences between
the undirected and directed cases. Although adjacency matrices of
sparsely connected and even unweighted graphs can be passed directly to
these transformers, they are interpreted as *weighted* adjacency
matrices according to some non-standard conventions. We will highlight
these conventions below.
The mathematical technologies used under the hood are various flavours
of “persistent homology” (as is also the case for
`EuclideanCechPersistence `__
and
`CubicalPersistence `__).
If you are interested in the details, you can start from the `theory
glossary `__
and references therein.
If you are looking at a static version of this notebook and would like
to run its contents, head over to
`GitHub `__
and download the source.
See also
--------
- `Topological feature extraction using VietorisRipsPersistence and
PersistenceEntropy `__
which treats the “special case” of point clouds (see below).
- `Plotting in
giotto-tda `__,
particularly Section 1.2 (as above, treats the case of point clouds).
- `Classifying 3D
shapes `__
(a more advanced example).
- `Computing persistent homology of directed flag
complexes `__ by Daniel
Luetgehetmann, Dejan Govc, Jason Smith, and Ran Levi.
**License: AGPLv3**
Import libraries
----------------
.. code:: ipython3
import numpy as np
from numpy.random import default_rng
rng = default_rng(42) # Create a random number generator
from scipy.spatial.distance import pdist, squareform
from scipy.sparse import csr_matrix
from gtda.graphs import GraphGeodesicDistance
from gtda.homology import VietorisRipsPersistence, SparseRipsPersistence, FlagserPersistence
from igraph import Graph
from IPython.display import SVG, display
Undirected graphs – ``VietorisRipsPersistence`` and ``SparseRipsPersistence``
-----------------------------------------------------------------------------
General API
~~~~~~~~~~~
If you have a collection ``X`` of adjacency matrices of graphs, you can
instantiate transformers of class ``VietorisRipsPersistence`` or
``SparseRipsPersistence`` by setting the parameter ``metric`` as
``"precomputed"``, and then call ``fit_transform`` on \`\ ``X`` to
obtain the corresponding collection of *persistence diagrams* (see
**Understanding the computation** below for an explanation).
In the case of ``VietorisRipsPersistence``, ``X`` can be a list of
sparse or dense matrices, and a basic example of topological feature
extraction would look like this:
::
# Instantiate topological transformer
VR = VietorisRipsPersistence(metric="precomputed")
# Compute persistence diagrams corresponding to each graph in X
diagrams = VR.fit_transform(X)
Each entry in the result can be plotted as follows (where we plot the
0th entry, i.e. ``diagrams[0]``):
::
VR.plot(diagrams, sample=0)
*Note*: ``SparseRipsPersistence`` implements an approximate scheme for
computing the same topological quantities as
``VietorisRipsPersistence``. This can be useful for speeding up the
computation on large inputs, but we will not demonstrate its use in this
notebook.
Fully connected and weighted
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
We now turn to the case of fully connected and weighted (FCW) undirected
graphs. In this case, the input should be a list of 2D arrays or a
single 3D array. Here is a simple example:
.. code:: ipython3
# Create a single weighted adjacency matrix of a FCW graph
n_vertices = 10
x = rng.random((n_vertices, n_vertices))
# Fill the diagonal with zeros (not always necessary, see below)
np.fill_diagonal(x, 0)
# Create a trivial collection of weighted adjacency matrices, containing x only
X = [x]
# Instantiate topological transformer
VR = VietorisRipsPersistence(metric="precomputed")
# Compute persistence diagrams corresponding to each entry (only one here) in X
diagrams = VR.fit_transform(X)
print(f"diagrams.shape: {diagrams.shape} ({diagrams.shape[1]} topological features)")
.. parsed-literal::
diagrams.shape: (1, 15, 3) (15 topological features)
Non-fully connected weighted graphs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In ``giotto-tda``, a non-fully connected weighted graph can be
represented by an adjacency matrix in one of two possible forms: - a
dense square array with ``np.inf`` in position :math:`ij` if the edge
between vertex :math:`i` and vertex :math:`j` is absent. - a sparse
matrix in which the non-stored edge weights represent absent edges.
**Important notes** - A ``0`` in a dense array, or an explicitly stored
``0`` in a sparse matrix, does *not* denote an absent edge. It denotes
an edge with weight 0, which, in a sense, means the complete opposite!
See the section **Understanding the computation** below. - Dense Boolean
arrays are first converted to numerical ones and then interpreted as
adjacency matrices of FCW graphs. ``False`` values therefore should not
be used to represent absent edges.
Understanding the computation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To understand what these persistence diagrams are telling us about the
input weighted graphs, we briefly explain the **clique complex (or flag
complex) filtration** procedure underlying the computations in
``VietorisRipsPersistence`` when ``metric="precomputed"``, via an
example.
Let us start with a special case of a weighted graph with adjacency
matrix as follows:
- the diagonal entries (“vertex weights”) are all zero;
- all off-diagonal entries (edge weights) are non-negative;
- some edge weights are infinite (or very very large).
We can lay such a graph on the plane to visualise it, drawing only the
finite edges:
.. figure:: images/weighted_graph.png
:alt: A weighted graph
A weighted graph
The procedure can be explained as follows: we let a parameter
:math:`\varepsilon` start at 0, and as we increase it all the way to
infinity we keep considering the instantaneous subgraphs made of a) all
the vertices in the original graph, and b) those edges whose weight is
less than or equal to the current :math:`\varepsilon`. We also promote
these subgraphs to more general structures called **(simplicial)
complexes** that, alongside vertices and edges, also possess
:math:`k`\ **-simplices**, i.e. selected subsets of :math:`k + 1`
vertices (a 2-simplex is an abstract “triangle”, a 3-simplex an abstract
“tetrahedron”, etc). Our criterion is this: for each integer
:math:`k \geq 2`, all :math:`(k + 1)`-cliques in each instantaneous
subgraph are declared to be the :math:`k`-simplices of the subgraph’s
associated complex. By definition, the :math:`0`-simplices are the
vertices and the :math:`1`-simplices are the available edges.
As :math:`\varepsilon` increases from 0 (included) to infinity, we
record the following information:
1. How many new **connected components** are created because of the
appearance of vertices (in this example, all vertices “appear” in one
go at :math:`\varepsilon = 0`, by definition!), or merge because of
the appearance of new edges.
2. How many new 1-dimensional “holes”, 2-dimensional “cavities”, or more
generally :math:`d`-dimensional **voids** are created in the
instantaneous complex. A hole, cavity, or :math:`d`-dimensional void
is such only if there is no collection of “triangles”, “tetrahedra”,
or :math:`(d + 1)`-simplices which the void is the “boundary” of.
*Note*: Although the edges of a triangle *alone* “surround a hole”,
these cannot occur in our particular construction because the
“filling” triangle is also declared present in the complex when all
its edges are.
3. How many :math:`d`-dimensional voids which were present at earlier
values of :math:`\epsilon` are “filled” by :math:`(d + 1)`-simplices
which just appear.
This process of recording the full topological history of the graph
across all edge weights is called (Vietoris-Rips) **persistent
homology**.
Let us start at :math:`\varepsilon = 0`: Some edges had zero weight in
our graph, so they already appear!
.. figure:: images/clique_complex_0_small.png
:alt: :math:`\varepsilon = 0`
:math:`\varepsilon = 0`
There are 9 connected components, and nothing much else.
Up to and including :math:`\varepsilon = 2`, a few more edges are added
which make some of the connected components merge together but do not
create any hole, triangles, or higher cliques. Let us look at
:math:`\varepsilon = 3`:
.. figure:: images/clique_complex_3_small.png
:alt: :math:`\varepsilon = 3`
:math:`\varepsilon = 3`
The newly arrived edges reduce the number of connected components
further, but more interestingly they create a 1D hole!
As an example of a “higher”-simplex, at :math:`\varepsilon = 4` we get
our first triangle:
.. figure:: images/clique_complex_4_small.png
:alt: :math:`\varepsilon = 4`
:math:`\varepsilon = 4`
At :math:`\varepsilon = 5`, our 1D hole is filled:
.. figure:: images/clique_complex_5_small.png
:alt: :math:`\varepsilon = 5`
:math:`\varepsilon = 5`
At :math:`\varepsilon = 8`, two new 1D holes appear:
.. figure:: images/clique_complex_8_small.png
:alt: :math:`\varepsilon = 8`
:math:`\varepsilon = 8`
Finally, at :math:`\varepsilon = 9`, some more connected components
merge, but no new voids are either created or destroyed:
.. figure:: images/clique_complex_9_small.png
:alt: :math:`\varepsilon = 9`
:math:`\varepsilon = 9`
We can stop as we have reached the maximum value of :math:`\varepsilon`,
beyond which nothing will change: there is only one connected component
left, but there are also two 1D holes which will never get filled.
Fit-transforming via ``VietorisRipsPersistence(metric="precomputed")``
on the original graph’s adjacency matrix would return the following 3D
array of **persistence diagrams**:
.. code:: ipython3
diagrams = np.array([[[0., 1., 0],
[0., 2., 0],
[0., 2., 0],
[0., 3., 0],
[0., 4., 0],
[0., 5., 0],
[0., 6., 0],
[0., 7., 0],
[3., 5., 1],
[8., np.inf, 1],
[8., np.inf, 1]]])
The notebook `Topological feature extraction using
VietorisRipsPersistence and
PersistenceEntropy `__
explains how to interpret this output and how to make informative 2D
scatter plots out of its entries. Here, we only have one entry
corresponding to our graph:
.. code:: ipython3
from gtda.plotting import plot_diagram
plot_diagram(diagrams[0])
.. raw:: html