CubicalCover

class gtda.mapper.CubicalCover(kind='uniform', n_intervals=10, overlap_frac=0.1)

Cover of multi-dimensional data by overlapping hypercubes (technically, parallelepipeds), obtained by taking products of one-dimensional intervals.

In fit, OneDimensionalCover objects are fitted independently on each column of the input array, using the same parameters that were passed to the constructor. For example, if the CubicalCover object is instantiated with kind='uniform', n_intervals=10 and overlap_frac=0.1, then each column of the input array is used to construct a cover of the real line by 10 equal-length intervals with fractional overlap 0.1. Each element of the resulting multi-dimensional cover of Euclidean space is of the form \(I_{i, \ldots, k} = I^{(0)}_i \times \cdots \times I^{(d-1)}_k\), where \(d\) is the number of columns in the input array and \(I^{(l)}_j\) is the \(j\)-th interval in the cover of the real line fitted on column \(l\). In transform, the cover is applied to a new array X' to yield a cover of X'.
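The product construction can be illustrated in plain Python. This is a minimal sketch, not gtda's implementation: the helpers uniform_intervals and cubical_cover are hypothetical, the intervals are treated as closed, and the rule for the interval length L (chosen so that n_intervals intervals, each overlapping the next by overlap_frac * L, exactly span the range of the column) is an assumption that may differ from how gtda places its intervals.

```python
from itertools import product


def uniform_intervals(values, n_intervals, overlap_frac):
    """Hypothetical helper: n_intervals equal-length closed intervals with
    fractional overlap, spanning [min(values), max(values)]."""
    lo, hi = min(values), max(values)
    # Assumed rule: n*L - (n - 1)*f*L == hi - lo, so the intervals tile [lo, hi].
    length = (hi - lo) / (n_intervals - (n_intervals - 1) * overlap_frac)
    step = length * (1 - overlap_frac)  # distance between consecutive left ends
    return [(lo + j * step, lo + j * step + length) for j in range(n_intervals)]


def cubical_cover(columns, n_intervals, overlap_frac):
    """Hypothetical helper: fit one 1D cover per column with the same
    parameters, then return every product I^(0)_i x ... x I^(d-1)_k."""
    per_column = [uniform_intervals(col, n_intervals, overlap_frac)
                  for col in columns]
    return list(product(*per_column))


# Two features: column 0 ranges over [0, 10], column 1 over [0, 1].
boxes = cubical_cover([[0, 4, 10], [0, 0.5, 1]], n_intervals=3, overlap_frac=0.5)
print(len(boxes))  # 9
print(boxes[0])    # ((0.0, 5.0), (0.0, 0.5))
```

In this sketch, d columns with n_intervals intervals each yield n_intervals ** d boxes, since every box picks one interval per column.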

Parameters
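  • kind ('uniform' | 'balanced', optional, default: 'uniform') – The kind of cover to use: 'uniform' builds equal-length intervals, while 'balanced' places the interval endpoints so that each interval contains roughly the same number of the (distinct) values seen in fit.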

  • n_intervals (int, optional, default: 10) – The number of intervals in the covers of each feature dimension calculated in fit.

  • overlap_frac (float, optional, default: 0.1) – The fractional overlap between consecutive intervals in the covers of each feature dimension calculated in fit.
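The difference between the two values of kind can be sketched as follows. This is an assumption made for illustration rather than gtda's code: 'uniform' is modelled as equal-length intervals over the range of the values, and 'balanced' as equal-length intervals over the ranks 0, ..., N-1 of the sorted values, mapped back to data values by linear interpolation. All helper names are hypothetical, intervals are treated as closed, and at least two samples are assumed.

```python
def _pieces(lo, hi, n_intervals, overlap_frac):
    """Equal-length overlapping intervals tiling [lo, hi] (assumed rule)."""
    length = (hi - lo) / (n_intervals - (n_intervals - 1) * overlap_frac)
    step = length * (1 - overlap_frac)
    return [(lo + j * step, lo + j * step + length) for j in range(n_intervals)]


def uniform_intervals(values, n_intervals, overlap_frac):
    # Hypothetical 'uniform': equal-length intervals in value space.
    return _pieces(min(values), max(values), n_intervals, overlap_frac)


def balanced_intervals(values, n_intervals, overlap_frac):
    # Hypothetical 'balanced': equal-length intervals in rank space,
    # mapped back to data values.
    xs = sorted(values)
    last = len(xs) - 1  # assumes at least two samples

    def value_at(rank):
        # Linear interpolation between neighbouring sorted values.
        i = min(int(rank), last - 1)
        return xs[i] + (rank - i) * (xs[i + 1] - xs[i])

    return [(value_at(a), value_at(b))
            for a, b in _pieces(0, last, n_intervals, overlap_frac)]


def counts(values, intervals):
    # Number of samples falling in each closed interval.
    return [sum(a <= v <= b for v in values) for a, b in intervals]


skewed = list(range(9)) + [900]  # nine small values and one outlier
print(counts(skewed, uniform_intervals(skewed, 3, 0.0)))   # [9, 0, 1]
print(counts(skewed, balanced_intervals(skewed, 3, 0.0)))  # [4, 4, 4]
```

On the skewed column, the equal-length intervals leave the middle interval empty, whereas the rank-based intervals split the samples evenly.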

__init__(kind='uniform', n_intervals=10, overlap_frac=0.1)

Initialize self. See help(type(self)) for accurate signature.

fit(X, y=None)

Compute all open cover parallelepipeds according to X, as products of one-dimensional intervals covering each feature dimension separately, then return the estimator.

This method is here to implement the usual scikit-learn API and hence work in pipelines.

Parameters
  • X (ndarray of shape (n_samples, n_features)) – Input data.

  • y (None) – There is no need for a target in a transformer, yet the pipeline API requires this parameter.

Returns

self

Return type

object
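Returning self from fit follows the scikit-learn convention that allows calls such as estimator.fit(X).transform(X) and lets the estimator sit inside a pipeline. A toy pure-Python sketch of that contract follows; ToyCubicalCover and its intervals_ attribute are hypothetical, not gtda's API, and the equal-length, closed-interval rule is assumed for illustration.

```python
class ToyCubicalCover:
    """Hypothetical stand-in illustrating the fit contract; not gtda code."""

    def __init__(self, n_intervals=10, overlap_frac=0.1):
        self.n_intervals = n_intervals
        self.overlap_frac = overlap_frac

    def fit(self, X, y=None):
        # y is accepted only for pipeline compatibility and is ignored.
        n, f = self.n_intervals, self.overlap_frac
        self.intervals_ = []
        for column in zip(*X):  # iterate over the feature columns of X
            lo, hi = min(column), max(column)
            # Assumed rule: n equal-length intervals with overlap f tile [lo, hi].
            length = (hi - lo) / (n - (n - 1) * f)
            step = length * (1 - f)
            self.intervals_.append(
                [(lo + j * step, lo + j * step + length) for j in range(n)])
        return self  # returning the estimator enables chaining


X = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]  # 3 samples, 2 features
cover = ToyCubicalCover(n_intervals=2, overlap_frac=0.0)
print(cover.fit(X) is cover)  # True
print(cover.intervals_)       # one interval list per feature
```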

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters
  • X (ndarray of shape (n_samples, n_features)) – Input data.

  • y (None) – There is no need for a target in a transformer, yet the pipeline API requires this parameter.

Returns

Xt – Encoding of the cover of X as a boolean array. In general, n_cover_sets is less than or equal to n_intervals ** n_features, as empty or duplicated cover sets are removed.

Return type

ndarray of shape (n_samples, n_cover_sets)
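By default, scikit-learn's TransformerMixin implements fit_transform as fitting on X and then transforming the same X. The sketch below reproduces that pattern in pure Python; FitTransformMixin and HalvesCover (a toy cover of one feature by the two closed halves of its range) are hypothetical stand-ins, not the real classes.

```python
class FitTransformMixin:
    """Sketch of the default pattern: fit on X, then transform the same X."""

    def fit_transform(self, X, y=None, **fit_params):
        return self.fit(X, y, **fit_params).transform(X)


class HalvesCover(FitTransformMixin):
    """Hypothetical toy cover of one feature by the two closed halves of its range."""

    def fit(self, X, y=None):
        values = [row[0] for row in X]
        lo, hi = min(values), max(values)
        mid = (lo + hi) / 2
        self.intervals_ = [(lo, mid), (mid, hi)]
        return self

    def transform(self, X, y=None):
        # One boolean column per interval: shape (n_samples, n_cover_sets).
        return [[a <= row[0] <= b for a, b in self.intervals_] for row in X]


X = [[0.0], [1.0], [4.0]]
print(HalvesCover().fit_transform(X))
# [[True, False], [True, False], [False, True]]
```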

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params – Parameter names mapped to their values.

Return type

mapping of string to any
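With deep=True, parameters of nested estimators are reported under keys of the form <component>__<parameter>. The following pure-Python sketch imitates that flattening; the Toy class and the parameter name verbose are hypothetical, and the real method is inherited from scikit-learn's BaseEstimator.

```python
class Toy:
    """Hypothetical estimator storing its constructor arguments as attributes."""

    def __init__(self, **params):
        self.__dict__.update(params)

    def get_params(self, deep=True):
        out = {}
        for name, value in vars(self).items():
            out[name] = value
            # With deep=True, nested estimators contribute "<name>__<param>" keys.
            if deep and hasattr(value, "get_params"):
                for sub_name, sub_value in value.get_params(deep=True).items():
                    out[f"{name}__{sub_name}"] = sub_value
        return out


inner = Toy(n_intervals=10, overlap_frac=0.1)  # stands in for a cover
outer = Toy(cover=inner, verbose=False)        # stands in for a pipeline
print(sorted(outer.get_params()))
# ['cover', 'cover__n_intervals', 'cover__overlap_frac', 'verbose']
print(sorted(outer.get_params(deep=False)))  # ['cover', 'verbose']
```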

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters

**params (dict) – Estimator parameters.

Returns

self – Estimator instance.

Return type

object
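The routing of <component>__<parameter> keys can be sketched by splitting each key at its first double underscore, as scikit-learn's set_params does with str.partition. The Toy class and the verbose parameter below are hypothetical, and the sketch omits the validation details of the real implementation.

```python
class Toy:
    """Hypothetical estimator used only to illustrate nested parameter routing."""

    def __init__(self, **params):
        self.__dict__.update(params)

    def set_params(self, **params):
        nested = {}
        for key, value in params.items():
            # "cover__n_intervals" -> ("cover", "__", "n_intervals")
            name, delim, sub_key = key.partition("__")
            if name not in vars(self):
                raise ValueError(f"Invalid parameter {name!r}")
            if delim:
                nested.setdefault(name, {})[sub_key] = value  # defer to component
            else:
                setattr(self, name, value)
        # Forward each group of sub-parameters to its nested component.
        for name, sub_params in nested.items():
            getattr(self, name).set_params(**sub_params)
        return self


pipe = Toy(cover=Toy(n_intervals=10, overlap_frac=0.1), verbose=False)
pipe.set_params(cover__n_intervals=20, verbose=True)
print(pipe.cover.n_intervals, pipe.verbose)  # 20 True
```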

transform(X, y=None)

Compute a cover of X according to the cover of Euclidean space computed in fit, and return it as a two-dimensional boolean array in which each column marks the samples (rows) of X that belong to a common cover set.

Parameters
  • X (ndarray of shape (n_samples, n_features)) – Input data.

  • y (None) – There is no need for a target in a transformer, yet the pipeline API requires this parameter.

Returns

Xt – Encoding of the cover of X as a boolean array. In general, n_cover_sets is less than or equal to n_intervals ** n_features, as empty or duplicated cover sets are removed.

Return type

ndarray of shape (n_samples, n_cover_sets)
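The following sketch shows, under stated assumptions, how a boolean array of shape (n_samples, n_cover_sets) can be built from fitted per-feature intervals: a sample belongs to a product set exactly when each of its coordinates lies in the corresponding interval, and columns that are empty or repeat an earlier column are dropped. cover_membership is a hypothetical helper, the intervals are treated as closed, and the column order need not match gtda's.

```python
from itertools import product


def cover_membership(X, intervals_per_feature):
    """Hypothetical sketch: boolean matrix of shape (n_samples, n_cover_sets).

    intervals_per_feature[l] is a list of closed intervals (a, b) assumed to
    have been fitted on feature l."""
    columns = []
    for box in product(*intervals_per_feature):
        # A sample lies in a box iff each coordinate lies in its interval.
        col = tuple(all(a <= x <= b for x, (a, b) in zip(row, box)) for row in X)
        if any(col) and col not in columns:  # drop empty and duplicate sets
            columns.append(col)
    # Transpose the list of columns into one row per sample.
    return [list(row) for row in zip(*columns)]


intervals = [[(0, 2), (2, 4), (4, 6)],  # feature 0
             [(0, 2), (1, 4), (3, 5)]]  # feature 1
X_new = [[1, 1.5], [3, 1.5]]
Xt = cover_membership(X_new, intervals)
print(Xt)  # [[True, False], [False, True]]
```

Here the 3 × 3 = 9 candidate products reduce to 2 columns: five products contain no sample, and two duplicate an earlier column because both intervals on feature 1 contain the value 1.5.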