# Topological feature extraction using `VietorisRipsPersistence` and `PersistenceEntropy`

In this notebook, we showcase the ease of use of one of the core
components of `giotto-tda`: `VietorisRipsPersistence`, along with
vectorization methods. We first list the steps in a typical
topological feature extraction routine and then show how to encapsulate them
in a standard `scikit-learn`-like pipeline.

If you are looking at a static version of this notebook and would like to run its contents, head over to GitHub and download the source.

**License: AGPLv3**

## Generate data

Let’s begin by generating 3D point clouds of spheres and tori, along
with a label of 0 (1) for each sphere (torus). We also add noise to each
point cloud, whose effect is to displace the points sampling the
surfaces by a random amount in a random direction. **Note**: You will
need the auxiliary module `generate_datasets.py` to run this cell. You
can change the second argument of `make_point_clouds` to obtain a finer
or coarser sampling, or tune the level of noise via the third.

```
from data.generate_datasets import make_point_clouds

n_samples_per_class = 10
# Second argument controls the sampling density, third the noise level
point_clouds, labels = make_point_clouds(n_samples_per_class, 10, 0.1)
print(f"There are {point_clouds.shape[0]} point clouds in {point_clouds.shape[2]} dimensions, "
      f"each with {point_clouds.shape[1]} points.")
```

```
There are 30 point clouds in 3 dimensions, each with 100 points.
```
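For readers who do not have the auxiliary module at hand, here is a minimal standard-library sketch of what sampling a noisy sphere and a noisy torus could look like. It is not the code in `generate_datasets.py`: the helper names `sample_sphere` and `sample_torus`, the torus radii, the uniform noise model, and the choice of an `n` by `n` grid of angles are all assumptions made for illustration (the grid is merely consistent with 100 points per cloud for a second argument of 10).

```python
import math
import random


def sample_sphere(n, noise, rng):
    """Return n * n noisy points near the unit sphere (hypothetical helper)."""
    points = []
    for i in range(n):
        for j in range(n):
            theta = 2 * math.pi * i / n    # azimuthal angle
            phi = math.pi * (j + 0.5) / n  # polar angle, offset to avoid the poles
            x = math.sin(phi) * math.cos(theta)
            y = math.sin(phi) * math.sin(theta)
            z = math.cos(phi)
            # Displace each coordinate by a uniform random amount in [-noise, noise]
            points.append([c + rng.uniform(-noise, noise) for c in (x, y, z)])
    return points


def sample_torus(n, noise, rng, major=2.0, minor=1.0):
    """Return n * n noisy points near a torus (hypothetical helper, assumed radii)."""
    points = []
    for i in range(n):
        for j in range(n):
            theta = 2 * math.pi * i / n  # angle around the central axis
            phi = 2 * math.pi * j / n    # angle around the tube
            x = (major + minor * math.cos(phi)) * math.cos(theta)
            y = (major + minor * math.cos(phi)) * math.sin(theta)
            z = minor * math.sin(phi)
            points.append([c + rng.uniform(-noise, noise) for c in (x, y, z)])
    return points


rng = random.Random(0)  # fixed seed so the sketch is reproducible
sphere = sample_sphere(10, 0.1, rng)
torus = sample_torus(10, 0.1, rng)
print(len(sphere), len(sphere[0]))  # number of points, coordinates per point
```

Increasing `n` gives a finer sampling of each surface, and increasing `noise` pushes the points further away from it.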

## Calculate persistent homology

Instantiate a `VietorisRipsPersistence` transformer and calculate
so-called **persistence diagrams** for this collection of point clouds.

```
from gtda.homology import VietorisRipsPersistence
VR = VietorisRipsPersistence(homology_dimensions=[0, 1, 2]) # Parameter explained in the text
diagrams = VR.fit_transform(point_clouds)
diagrams.shape
```

```
(30, 176, 3)
```

**Important note**: `VietorisRipsPersistence`, and all other
“persistent homology” transformers in `gtda.homology`, expect input in
the form of a 3D array or, in some cases, a list of 2D arrays. For each
entry in the input (here, for each point cloud in `point_clouds`) they
compute a topological summary which is also a 2D array, and then stack
all these summaries into a single output 3D array. So, in our case,
`diagrams[i]` represents the topology of `point_clouds[i]`.
`diagrams[i]` is interpreted as follows:

- Each row is a triplet describing a single topological feature found in
  `point_clouds[i]`.
- The first and second entries in the triplet denote the values of the
  “filtration parameter” at which the feature appears and disappears,
  respectively. They are referred to as the “birth” and “death” values
  of the feature. The meaning of “filtration parameter” depends on the
  specific transformer, but in the case of `VietorisRipsPersistence` on
  point clouds it has the interpretation of a length scale.
- A topological feature can be a connected component, 1D hole/loop, 2D
  cavity, or more generally a \(d\)-dimensional “void” which exists in
  the data at scales between its birth and death values. The integer
  \(d\) is the *homology dimension* (or degree) of the feature and is
  stored as the third entry in the triplet. In this example, the shapes
  should have 2D cavities, so we explicitly tell
  `VietorisRipsPersistence` to look for these by using the
  `homology_dimensions` parameter!
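The notion of birth and death at a length scale can be made concrete in homology dimension 0, where features are connected components. The sketch below is purely illustrative and is not how `giotto-tda` computes anything: `h0_persistence` is a hypothetical helper that handles dimension 0 only, assumes that two points become connected once the scale reaches their Euclidean distance, and leaves out the one component that never dies. Every point is born at scale 0, and a component dies at the scale at which it merges into another one.

```python
import math
from itertools import combinations


def h0_persistence(points):
    """Return [birth, death, 0] triplets for the connected components of a point cloud."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the representative, shortening the path as we go
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise distances, processed from the shortest to the longest
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    triplets = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            # Two components merge at this scale, so one of them dies here
            parent[root_i] = root_j
            triplets.append([0.0, length, 0])
    return triplets


# Two tight pairs of points, far apart from each other
cloud = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (10.0, 0.0, 0.0), (11.0, 0.0, 0.0)]
for birth, death, dim in h0_persistence(cloud):
    print(birth, death, dim)
```

For this toy cloud, two components die at scale 1 (each pair merges) and one dies at scale 9 (the two pairs merge). The long-lived feature reflects the fact that the data consists of two well-separated clusters.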

If we make one scatter plot per available homology dimension, and plot
births and deaths as x- and y-coordinates of points in 2D, we end up
with a 2D representation of `diagrams[i]`, which is why it is
called a persistence *diagram*:

```
from gtda.plotting import plot_diagram
i = 0
plot_diagram(diagrams[i])
```
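To see what such a plot contains without any plotting library, the following sketch groups the rows of a diagram by homology dimension and lists the (birth, death) pairs that would be drawn for each. The `diagram` values are made up for illustration and do not come from the point clouds above, and nothing here relies on how `plot_diagram` works internally.

```python
from collections import defaultdict

# Hand-made diagram in the (birth, death, homology dimension) layout described above
diagram = [
    [0.0, 0.4, 0],
    [0.0, 0.9, 0],
    [0.5, 1.7, 1],
    [1.1, 1.1, 2],  # birth equals death: a feature with zero lifetime
    [0.8, 2.0, 2],
]

points_by_dimension = defaultdict(list)
for birth, death, dim in diagram:
    if death > birth:  # skip zero-lifetime rows, which sit exactly on the diagonal
        points_by_dimension[int(dim)].append((birth, death))

for dim in sorted(points_by_dimension):
    # Lifetime = vertical distance of a point from the diagonal birth == death
    lifetimes = [death - birth for birth, death in points_by_dimension[dim]]
    print(f"H{dim}: {points_by_dimension[dim]}, longest lifetime {max(lifetimes):.2f}")
```

Points far above the diagonal correspond to features that persist over a wide range of scales, while points close to it are short-lived and are often attributed to noise.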