# Scaler

class gtda.diagrams.Scaler(metric='bottleneck', metric_params=None, function=<function amax>, n_jobs=None)[source]

Linear scaling of persistence diagrams.

A positive scale factor scale_ is calculated during fit by considering all available persistence diagrams partitioned according to homology dimensions. During transform, all birth-death pairs are divided by scale_.

The value of scale_ depends on two things:

• A way of computing, for each homology dimension, the amplitude in that dimension of a persistence diagram consisting of birth-death-dimension triples [b, d, q]. Together, metric and metric_params define this in the same way as in Amplitude.

• A scalar-valued function which is applied to the resulting two-dimensional array of amplitudes (one per diagram and homology dimension) to obtain scale_.
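As a rough illustration of these two ingredients, the NumPy sketch below builds the two-dimensional array of amplitudes for a toy collection and then reduces it with two different choices of function. The helper half_max_lifetime is hypothetical and is not gtda's implementation; it assumes that the amplitude in a dimension is half the largest lifetime in that dimension, which matches the 'bottleneck' behaviour described in the Notes.

```python
import numpy as np

# Toy collection: 2 diagrams, each with two triples in dimension 0
# and one triple in dimension 1, given as [b, d, q].
X = np.array([
    [[0.0, 1.0, 0], [0.0, 3.0, 0], [1.0, 2.0, 1]],
    [[0.0, 2.0, 0], [0.5, 1.0, 0], [2.0, 6.0, 1]],
])


def half_max_lifetime(X, dims):
    """Hypothetical amplitude: half the largest lifetime per diagram and
    homology dimension. Returns an array of shape (n_samples, n_dims)."""
    lifetimes = X[:, :, 1] - X[:, :, 0]
    amplitudes = np.empty((X.shape[0], len(dims)))
    for j, q in enumerate(dims):
        in_dim = X[:, :, 2] == q
        # Lifetimes are non-negative, so masking with 0 does not affect the max.
        amplitudes[:, j] = np.where(in_dim, lifetimes, 0.0).max(axis=1) / 2
    return amplitudes


dims = np.unique(X[:, :, 2])
amplitudes = half_max_lifetime(X, dims)  # one row per diagram, one column per dimension

# Any function mapping a 2D array to a positive scalar can play the role of `function`.
scale_max = np.max(amplitudes)
scale_mean = np.mean(amplitudes)
print(amplitudes, scale_max, scale_mean)
```

With np.max the scale is driven by the single most persistent point, while a choice such as np.mean spreads the influence over all diagrams and dimensions.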

Important note:

• Input collections of persistence diagrams for this transformer must satisfy certain requirements, see e.g. fit.

Parameters
• metric ('bottleneck' | 'wasserstein' | 'betti' | 'landscape' | 'silhouette' | 'heat' | 'persistence_image', optional, default: 'bottleneck') – See the corresponding parameter in Amplitude.

• metric_params (dict or None, optional, default: None) – See the corresponding parameter in Amplitude.

• function (callable, optional, default: numpy.max) – Function used to extract a positive scalar from the collection of amplitude vectors in fit. Must map 2D arrays to scalars.

• n_jobs (int or None, optional, default: None) – The number of jobs to use for the computation. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors.

effective_metric_params_

Dictionary containing all information present in metric_params as well as relevant quantities computed in fit.

Type

dict

homology_dimensions_

Homology dimensions seen in fit, sorted in ascending order.

Type

tuple

scale_

Value by which to rescale diagrams.

Type

float

Notes

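When metric is 'bottleneck' and function is numpy.max, fit_transform has the effect of making the lifetime of the most persistent point across all diagrams and homology dimensions equal to 2. This follows because the bottleneck amplitude of a diagram in a given dimension is half its largest lifetime, so scale_ is half the largest lifetime overall.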
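The arithmetic behind this note can be checked on a toy collection with plain NumPy. This is only a sketch: it does not call gtda, and it assumes that the bottleneck amplitude is half the largest lifetime, so that the maximum over all diagrams and dimensions is half the largest lifetime overall.

```python
import numpy as np

# Two toy diagrams of [b, d, q] triples; the largest lifetime is 6 - 2 = 4.
X = np.array([
    [[0.0, 1.0, 0], [1.0, 2.0, 1]],
    [[0.0, 2.0, 0], [2.0, 6.0, 1]],
])

lifetimes = X[:, :, 1] - X[:, :, 0]

# Assumed stand-in for scale_ with metric='bottleneck' and function=numpy.max.
scale = lifetimes.max() / 2

# Divide births and deaths by the scale; the dimension column is left alone.
Xs = X.copy()
Xs[:, :, :2] /= scale

rescaled_lifetimes = Xs[:, :, 1] - Xs[:, :, 0]
print(scale, rescaled_lifetimes.max())
```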

To compute scaling factors without first splitting the computation between different homology dimensions, data should first be transformed by an instance of ForgetDimension.

__init__(metric='bottleneck', metric_params=None, function=<function amax>, n_jobs=None)[source]

Initialize self. See help(type(self)) for accurate signature.

fit(X, y=None)[source]

Store all observed homology dimensions in homology_dimensions_ and compute scale_. Then, return the estimator.

Parameters
• X (ndarray of shape (n_samples, n_features, 3)) – Input data. Array of persistence diagrams, each a collection of triples [b, d, q] representing persistent topological features through their birth (b), death (d) and homology dimension (q). It is important that, for each possible homology dimension, the number of triples for which q equals that homology dimension is constant across the entries of X.

• y (None) – There is no need for a target in a transformer, yet the pipeline API requires this parameter.

Returns

self

Return type

object
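The requirement on X stated above (the same number of triples per homology dimension in every diagram) can be checked before calling fit. The helper below is a hypothetical NumPy sketch, not part of gtda, and the two arrays are made-up data.

```python
import numpy as np


def has_constant_dimension_counts(X):
    """Return True if every diagram in X contains the same number of
    triples for each homology dimension (hypothetical helper)."""
    dims = np.unique(X[:, :, 2])
    # counts[i, j] = number of triples of diagram i in homology dimension dims[j]
    counts = np.stack([(X[:, :, 2] == q).sum(axis=1) for q in dims], axis=1)
    return bool((counts == counts[0]).all())


# Both diagrams have two triples in dimension 0 and one in dimension 1.
good = np.array([
    [[0.0, 1.0, 0], [0.0, 3.0, 0], [1.0, 2.0, 1]],
    [[0.0, 2.0, 0], [0.5, 1.0, 0], [2.0, 6.0, 1]],
])

# The second diagram has one triple in dimension 0 and two in dimension 1.
bad = np.array([
    [[0.0, 1.0, 0], [0.0, 3.0, 0], [1.0, 2.0, 1]],
    [[0.0, 2.0, 0], [0.5, 1.0, 1], [2.0, 6.0, 1]],
])

print(has_constant_dimension_counts(good), has_constant_dimension_counts(bad))
```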

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters
• X (ndarray of shape (n_samples, n_features, 3)) – Input data. Array of persistence diagrams, each a collection of triples [b, d, q] representing persistent topological features through their birth (b), death (d) and homology dimension (q). It is important that, for each possible homology dimension, the number of triples for which q equals that homology dimension is constant across the entries of X.

• y (None) – There is no need for a target in a transformer, yet the pipeline API requires this parameter.

Returns

Xs – Rescaled diagrams.

Return type

ndarray of shape (n_samples, n_features, 3)

fit_transform_plot(X, y=None, sample=0, **plot_params)

Fit to data, then apply transform_plot.

Parameters
• X (ndarray of shape (n_samples, ...)) – Input data.

• y (ndarray of shape (n_samples,) or None) – Target values for supervised problems.

• sample (int) – Sample to be plotted.

• **plot_params – Optional plotting parameters.

Returns

Xt – Transformed one-sample slice from the input.

Return type

ndarray of shape (1, ...)

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params – Parameter names mapped to their values.

Return type

mapping of string to any

inverse_transform(X)[source]

Scale back the data to the original representation. Multiplies by the scale found in fit.

Parameters

X (ndarray of shape (n_samples, n_features, 3)) – Data to apply the inverse transform to, cf. transform.

Returns

Xs – Rescaled diagrams.

Return type

ndarray of shape (n_samples, n_features, 3)
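A minimal NumPy sketch of the round trip between transform and inverse_transform, as they are described on this page, is given below. It does not use gtda: the value of scale is a made-up stand-in for a fitted scale_, and only the birth and death columns are rescaled.

```python
import numpy as np

scale = 2.5  # hypothetical value standing in for a fitted scale_

# One toy diagram of [b, d, q] triples.
X = np.array([[[0.0, 1.0, 0], [1.0, 5.0, 1]]])

# transform, as described: divide births and deaths by the scale.
Xs = X.copy()
Xs[:, :, :2] /= scale

# inverse_transform, as described: multiply births and deaths by the scale.
X_back = Xs.copy()
X_back[:, :, :2] *= scale

print(np.allclose(X_back, X))
```

The homology dimension column is left unchanged in both directions, so the round trip recovers the original diagrams up to floating-point error.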

plot(Xt, sample=0, homology_dimensions=None, plotly_params=None)[source]

Plot a sample from a collection of persistence diagrams, with homology in multiple dimensions.

Parameters
• Xt (ndarray of shape (n_samples, n_points, 3)) – Collection of persistence diagrams, such as returned by transform.

• sample (int, optional, default: 0) – Index of the sample in Xt to be plotted.

• homology_dimensions (list, tuple or None, optional, default: None) – Which homology dimensions to include in the plot. None is equivalent to passing homology_dimensions_.

• plotly_params (dict or None, optional, default: None) – Custom parameters to configure the plotly figure. Allowed keys are "traces" and "layout", and the corresponding values should be dictionaries containing keyword arguments as would be fed to the update_traces and update_layout methods of plotly.graph_objects.Figure.

Returns

fig – Plotly figure.

Return type

plotly.graph_objects.Figure object
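For illustration, a plotly_params dictionary with the two allowed keys might look like the sketch below. The specific keyword arguments (marker_size, title) are example choices, not defaults of this class.

```python
# Example configuration only; the values are arbitrary.
plotly_params = {
    # Keyword arguments that would be fed to Figure.update_traces
    "traces": {"marker_size": 8},
    # Keyword arguments that would be fed to Figure.update_layout
    "layout": {"title": "Rescaled persistence diagram"},
}

print(sorted(plotly_params))
```

Such a dictionary would then be passed as the plotly_params argument of plot.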

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters

**params (dict) – Estimator parameters.

Returns

self – Estimator instance.

Return type

object

transform(X, y=None)[source]

Divide all birth and death values in X by scale_.

Parameters
• X (ndarray of shape (n_samples, n_features, 3)) – Input data. Array of persistence diagrams, each a collection of triples [b, d, q] representing persistent topological features through their birth (b), death (d) and homology dimension (q). It is important that, for each possible homology dimension, the number of triples for which q equals that homology dimension is constant across the entries of X.

• y (None) – There is no need for a target in a transformer, yet the pipeline API requires this parameter.

Returns

Xs – Rescaled diagrams.

Return type

ndarray of shape (n_samples, n_features, 3)

transform_plot(X, sample=0, **plot_params)

Take a one-sample slice from the input collection and transform it. Before returning the transformed object, plot the transformed sample.

Parameters
• X (ndarray of shape (n_samples, ...)) – Input data.

• sample (int) – Sample to be plotted.

• **plot_params – Optional plotting parameters.

Returns

Xt – Transformed one-sample slice from the input.

Return type

ndarray of shape (1, ...)