The Lorenz attractor
====================
This notebook contains a full TDA pipeline to analyse the transitions of
the Lorenz system to a chaotic regime from the stable one and viceversa.
If you are looking at a static version of this notebook and would like
to run its contents, head over to
`GitHub `__
and download the source.
See also
--------
- `Topology of time
series `__,
in which the *Takens embedding* technique used here is explained in
detail and illustrated via simple examples.
- `Gravitational waves
detection `__,
where,following
`arXiv:1910.08245 `__, the Takens
embedding technique is shown to be effective for the detection of
gravitational waves signals buried in background noise.
- `Topological feature extraction using VietorisRipsPersistence and
PersistenceEntropy `__
for a quick introduction to general topological feature extraction in
``giotto-tda``.
**License: AGPLv3**
Import libraries
----------------
The first step consists in importing relevant *gtda* components and
other useful libraries or modules.
.. code:: ipython3
# Import the gtda modules
from gtda.time_series import Resampler, SlidingWindow, takens_embedding_optimal_parameters, \
TakensEmbedding, PermutationEntropy
from gtda.homology import WeakAlphaPersistence, VietorisRipsPersistence
from gtda.diagrams import Scaler, Filtering, PersistenceEntropy, BettiCurve, PairwiseDistance
from gtda.graphs import KNeighborsGraph, GraphGeodesicDistance
from gtda.pipeline import Pipeline
import numpy as np
from sklearn.metrics import pairwise_distances
import plotly.express as px
from plotly.offline import init_notebook_mode, iplot
init_notebook_mode(connected=True)
# gtda plotting functions
from gtda.plotting import plot_heatmap
# Import data from openml
import openml
.. raw:: html
.. code:: ipython3
# Plotting functions
from gtda.plotting import plot_point_cloud
Setting up the Lorenz attractor simulation
------------------------------------------
In the next block we set up all the parameters of the Lorenz system and
we define also the instants at which the regime (stable VS chaotic)
changes.
.. code:: ipython3
# Plotting the trajectories of the Lorenz system
from openml.datasets.functions import get_dataset
point_cloud = get_dataset(42182).get_data(dataset_format='array')[0]
plot_point_cloud(point_cloud)
.. raw:: html
.. code:: ipython3
# Selecting the z-axis and the label rho
X = point_cloud[:, 2]
y = point_cloud[:, 3]
.. code:: ipython3
fig = px.line(title='Trajectory of the Lorenz solution, projected along the z-axis')
fig.add_scatter(y=X, name='X')
fig.add_scatter(y=y, name='y')
fig.show()
.. raw:: html
Resampling the time series
--------------------------
It is important to find the correct time scale at which key signals take
place. Here we propose one possible resampling period: *10h*. Recall
that the unit time is *1h*. The resampler method is used to perform the
resampling.
.. code:: ipython3
period = 10
periodicSampler = Resampler(period=period)
X_sampled, y_sampled = periodicSampler.fit_transform_resample(X, y)
.. code:: ipython3
fig = px.line(title='Trajectory of the Lorenz solution, projected along the z-axis and resampled every 10h')
fig.add_scatter(y=X_sampled.flatten(), name='X_sampled')
fig.add_scatter(y=y_sampled, name='y_sampled')
fig.show()
.. raw:: html
Takens Embedding
----------------
In order to obtain meaningful topological features from a time series,
we use a *time-delay embedding* technique named after F. Takens who used
it in the 1960s in his foundational work on dynamical systems.
The idea is simple: given a time series :math:`X(t)`, one can extract a
sequence of vectors of the form
:math:`X_i := [(X(t_i)), X(t_i + 2 \tau), ..., X(t_i + M \tau)]`. The
difference between :math:`t_i` and :math:`t_{i-1}` is called *stride*.
:math:`M` and :math:`\tau` are optimized automatically in this example
according to known heuristics implemented in ``giotto-tda`` in the
``takens_embedding_optimal_parameters`` function. They can also be set
by hand if preferred.
.. code:: ipython3
max_time_delay = 3
max_embedding_dimension = 10
stride = 1
optimal_time_delay, optimal_embedding_dimension = takens_embedding_optimal_parameters(
X_sampled, max_time_delay, max_embedding_dimension, stride=stride
)
print(f"Optimal embedding time delay based on mutual information: {optimal_time_delay}")
print(f"Optimal embedding dimension based on false nearest neighbors: {optimal_embedding_dimension}")
.. parsed-literal::
Optimal embedding time delay based on mutual information: 3
Optimal embedding dimension based on false nearest neighbors: 10
Having computed reasonable values for the parameters by looking at the
whole time series, we can now perform the embedding procedure (which
transforms a single time series into a single point cloud) on local
sliding windows over the data. The result of this will be a “time series
of point clouds” with possibly interesting topologies, which we will be
able to feed directly to our homology transformers.
We first construct sliding windows using ``SlidingWindow``
transformer-resampler, and then use the ``TakensEmbedding`` transformer
to perform the embedding in parallel on each window, using the
parameters ``optimal_time_delay`` and ``optimal_embedding_dimension``
found above.
.. code:: ipython3
window_size = 41
window_stride = 5
SW = SlidingWindow(size=window_size, stride=window_stride)
X_windows, y_windows = SW.fit_transform_resample(X_sampled, y_sampled)
TE = TakensEmbedding(time_delay=optimal_time_delay, dimension=optimal_embedding_dimension, stride=stride)
X_embedded = TE.fit_transform(X_windows)
We can plot the Takens embedding of a specific window either by using
``plot_point_cloud``, or by using the ``plot`` method of
``SlidingWindow``, as shown below.
*Note*: only the first three coordinates are plotted!
.. code:: ipython3
window_number = 3
TE.plot(X_embedded, sample=window_number)
.. raw:: html
For comparison, here is the portion of time series containing the data
which originates this point cloud. Notice the quasi-periodicity,
corresponding to the loop in the point cloud.
.. code:: ipython3
embedded_begin, embedded_end = SW.slice_windows(X_windows)[window_number]
window_indices = np.arange(embedded_begin, embedded_end + optimal_time_delay * (optimal_embedding_dimension - 1))
fig = px.line(title=f"Resampled Lorenz solution over sliding window {window_number}")
fig.add_scatter(x=window_indices, y=X_sampled[window_indices], name="X_sampled")
fig.show()
.. raw:: html
Persistence diagram
-------------------
The topological information in the embedding is synthesised via the
persistence diagram. The horizontal axis corresponds to the moment in
which a homological generator is born, while the vertical axis
corresponds to the moments in which a homological generator dies. The
generators of the homology groups (at given rank) are colored
differently.
.. code:: ipython3
homology_dimensions = (0, 1, 2)
WA = WeakAlphaPersistence(homology_dimensions=homology_dimensions)
X_diagrams = WA.fit_transform(X_embedded)
We can plot the persistence diagram for the embedding of the same
sliding window as before:
.. code:: ipython3
WA.plot(X_diagrams, sample=window_number)
.. raw:: html
Scikit-learn–style pipeline
---------------------------
One of the advantages of ``giotto-tda`` is the compatibility with
``scikit-learn``. It is possible to set up and run a full pipeline such
as the one above in a few lines:
.. code:: ipython3
# Steps of the Pipeline
steps = [('sampling', periodicSampler),
('window', SW),
('embedding', TE),
('diagrams', WA)]
# Define the Pipeline
pipeline = Pipeline(steps)
# Run the pipeline
X_diagrams = pipeline.fit_transform(X)
The final result is the same as before:
.. code:: ipython3
pipeline[-1].plot(X_diagrams, sample=window_number)
.. raw:: html
Rescaling the diagram
---------------------
By default, rescaling a diagram via ``Scaler`` means normalizing points
such that the maximum “bottleneck distance” from the *empty diagram*
(across all homology dimensions) is equal to 1. Notice that this means
the birth and death scales are modified. We can do this as follows:
.. code:: ipython3
diagramScaler = Scaler()
X_scaled = diagramScaler.fit_transform(X_diagrams)
diagramScaler.plot(X_scaled, sample=window_number)
.. raw:: html
Filtering diagrams
------------------
Filtering a diagram means eliminating the homology generators whose
lifespan is considered too short to be significant. We can use
``Filtering`` as follows:
.. code:: ipython3
diagramFiltering = Filtering(epsilon=0.1, homology_dimensions=(1, 2))
X_filtered = diagramFiltering.fit_transform(X_scaled)
diagramFiltering.plot(X_filtered, sample=window_number)
.. raw:: html
We can add the steps above to our pipeline:
.. code:: ipython3
steps_new = [
('scaler', diagramScaler),
('filtering', diagramFiltering)
]
pipeline_filter = Pipeline(steps + steps_new)
X_filtered = pipeline_filter.fit_transform(X)
Persistence entropy
-------------------
The *entropy* of persistence diagrams can be calculated via
``PersistenceEntropy``:
.. code:: ipython3
PE = PersistenceEntropy()
X_persistence_entropy = PE.fit_transform(X_scaled)
.. code:: ipython3
fig = px.line(title='Persistence entropies, indexed by sliding window number')
for dim in range(X_persistence_entropy.shape[1]):
fig.add_scatter(y=X_persistence_entropy[:, dim], name=f"PE in homology dimension {dim}")
fig.show()
.. raw:: html
Betti Curves
------------
The Betti curves of a persistence diagram can be computed and plotted
using ``BettiCurve``:
.. code:: ipython3
BC = BettiCurve()
X_betti_curves = BC.fit_transform(X_scaled)
BC.plot(X_betti_curves, sample=window_number)
.. raw:: html
Distances among diagrams
------------------------
In this section we show how to compute several notions of distances
among persistence diagrams.
In each case, we will obtain distance matrices whose i-th row encodes
the distance of the i-th diagram from all the others.
We start with the so-called “landscape :math:`L^2` distance”: when
``order`` is ``None``, the output is one distance matrix per sample and
homology dimension.
.. code:: ipython3
p_L = 2
n_layers = 5
PD = PairwiseDistance(metric='landscape',
metric_params={'p': p_L, 'n_layers': n_layers, 'n_bins': 1000},
order=None)
X_distance_L = PD.fit_transform(X_diagrams)
X_distance_L.shape
.. parsed-literal::
(91, 91, 3)
This is what distances in homology dimension 0 look like:
.. code:: ipython3
plot_heatmap(X_distance_L[:, :, 0], colorscale='blues')
.. raw:: html
We now change metric and compute the “:math:`2`-Wasserstein distances”
between the diagrams. This one takes longer!
.. code:: ipython3
p_W = 2
PD = PairwiseDistance(metric='wasserstein',
metric_params={'p': p_W, 'delta': 0.1},
order=None)
X_distance_W = PD.fit_transform(X_diagrams)
And again this is what distances in homology dimension 0 look like:
.. code:: ipython3
plot_heatmap(X_distance_W[:, :, 0], colorscale='blues')
.. raw:: html
Notice that how dramatically things can change when the metrics are
modified.
New distances in the embedding space: kNN graphs and geodesic distances
-----------------------------------------------------------------------
We propose here a new way to compute distances between points in the
embedding space. Instead of considering the Euclidean distance in the
Takens space, we propose to build a :math:`k`-nearest neighbors graph
and then use the geodesic distance on such graph.
.. code:: ipython3
n_neighbors = 2
kNN = KNeighborsGraph(n_neighbors=n_neighbors)
X_kNN = kNN.fit_transform(X_embedded)
Given the graph embedding, the natural notion of distance between
vertices corresponds to the lengths of the shortest path connecting two
vertices. This is also known as *graph geodesic distance*. We compute it
and plot it for our chosen window number.
.. code:: ipython3
GGD = GraphGeodesicDistance()
GGD.fit_transform_plot(X_kNN, sample=window_number);
.. raw:: html
.. parsed-literal::
array([[[ 0., 1., 2., 2., 1., 2., 3., 4., 5., 6., 7., 8.,
9., 9.],
[ 1., 0., 1., 2., 2., 3., 4., 5., 6., 7., 8., 9.,
10., 10.],
[ 2., 1., 0., 1., 2., 3., 4., 5., 6., 7., 8., 9.,
10., 10.],
[ 2., 2., 1., 0., 1., 2., 3., 4., 5., 6., 7., 8.,
9., 9.],
[ 1., 2., 2., 1., 0., 1., 2., 3., 4., 5., 6., 7.,
8., 8.],
[ 2., 3., 3., 2., 1., 0., 1., 2., 3., 4., 5., 6.,
7., 7.],
[ 3., 4., 4., 3., 2., 1., 0., 1., 2., 3., 4., 5.,
6., 6.],
[ 4., 5., 5., 4., 3., 2., 1., 0., 1., 2., 3., 4.,
5., 5.],
[ 5., 6., 6., 5., 4., 3., 2., 1., 0., 1., 2., 3.,
4., 4.],
[ 6., 7., 7., 6., 5., 4., 3., 2., 1., 0., 1., 2.,
3., 3.],
[ 7., 8., 8., 7., 6., 5., 4., 3., 2., 1., 0., 1.,
2., 2.],
[ 8., 9., 9., 8., 7., 6., 5., 4., 3., 2., 1., 0.,
1., 1.],
[ 9., 10., 10., 9., 8., 7., 6., 5., 4., 3., 2., 1.,
0., 1.],
[ 9., 10., 10., 9., 8., 7., 6., 5., 4., 3., 2., 1.,
1., 0.]]])
For comparison, this is what the ordinary pairwise Euclidean distance
matrix looks like for the same window:
.. code:: ipython3
plot_heatmap(pairwise_distances(X_embedded[window_number]), colorscale='blues')
.. raw:: html
This is what the first few steps (before scaling and filtering) of the
pipeline described above would be if you’d like persistence diagrams to
be obtained using this new distance instead.
*Note*: ``WeakAlphaPersistence`` cannot be used now as it needs point
cloud input. We can use instead an instance of
``VietorisRipsPersistence``, but we have to take care to pass
``metric='precomputed'`` to the constructor!
.. code:: ipython3
# Steps of the Pipeline
steps = [
('sampling', periodicSampler),
('window', SW),
('embedding', TE),
('kNN_graph', kNN),
('graph_geo_distance', GGD),
('diagrams', VietorisRipsPersistence(metric='precomputed',
homology_dimensions=homology_dimensions))
]
# Define the Pipeline
pipeline = Pipeline(steps)
# Run the pipeline
X_diagrams = pipeline.fit_transform(X)
Let’s look at the persistence diagrams obtained with this new distance:
.. code:: ipython3
pipeline[-1].plot(X_diagrams, sample=window_number)
.. raw:: html