OneDimensionalCover

class gtda.mapper.OneDimensionalCover(kind='uniform', n_intervals=10, overlap_frac=0.1)

Cover of one-dimensional data coming from open overlapping intervals.

In fit, given a training array X representing a collection of real numbers, a cover of the real line by open intervals \(I_k = (a_k, b_k)\) (\(k = 1, \ldots, n\), \(a_k < a_{k+1}\), \(b_k < b_{k+1}\)) is constructed based on the distribution of values in X. In transform, the cover is applied to a new array X’ to yield a cover of X’.

All covers constructed in fit have \(a_1 = -\infty\) and \(b_n = + \infty\). Two kinds of cover are currently available: “uniform” and “balanced”. A uniform cover is such that \(b_1 - m = b_2 - a_2 = \cdots = M - a_n\) where \(m\) and \(M\) are the minimum and maximum values in X respectively. A balanced cover is such that approximately the same number of unique values from X is contained in each cover interval.
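As a worked derivation that is not part of the original reference, the uniform conditions above determine the common interval length. Write \(\ell\) for that length and \(f\) for overlap_frac, read as the length of the overlap between consecutive intervals divided by \(\ell\). Since \(b_1 = m + \ell\) and each overlap \(b_k - a_{k+1}\) equals \(f\ell\), every further left limit moves by \((1 - f)\ell\):

\[
a_k = m + (k - 1)(1 - f)\,\ell, \qquad b_k = a_k + \ell \qquad (k = 2, \ldots, n),
\]

and the last condition \(M - a_n = \ell\) then gives

\[
M - m = (n - 1)(1 - f)\,\ell + \ell \quad\Longrightarrow\quad \ell = \frac{M - m}{n - (n - 1) f}.
\]

For example, \(m = 0\), \(M = 10\), \(n = 3\) and \(f = 0.25\) give \(\ell = 4\) and the intervals \((-\infty, 4)\), \((3, 7)\), \((6, +\infty)\).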

Parameters
  • kind ('uniform' | 'balanced', optional, default: 'uniform') – The kind of cover to use.

  • n_intervals (int, optional, default: 10) – The number of intervals in the cover calculated in fit.

  • overlap_frac (float, optional, default: 0.1) – If the cover is uniform, this is the ratio between the length of the intersection of consecutive intervals and the length of each interval. If the cover is balanced, this is the analogous fractional overlap for a uniform cover of the closed interval \([0.5, N + 0.5]\), where \(N\) is the number of unique values in the training array (see the Notes).
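The uniform case can be sketched in plain Python. This is a minimal illustration, not the gtda implementation: the helper name uniform_limits is hypothetical, and it assumes the common length \((M - m) / (n - (n - 1) \cdot \mathrm{overlap\_frac})\), which follows from the uniform conditions stated in the class description.

```python
import math


def uniform_limits(m, M, n_intervals, overlap_frac):
    """Sketch (hypothetical helper): limits of a uniform cover of [m, M].

    Every interval has the same length ell (counting b_1 - m and M - a_n
    for the outer ones) and consecutive intervals overlap by
    overlap_frac * ell.
    """
    # Common length: solves M - m = (n - 1) * (1 - f) * ell + ell.
    ell = (M - m) / (n_intervals - (n_intervals - 1) * overlap_frac)
    step = (1 - overlap_frac) * ell  # distance between consecutive left limits
    left = [m + k * step for k in range(n_intervals)]
    right = [a + ell for a in left]
    # The outermost limits are pushed to infinity, as documented for fit.
    left[0] = -math.inf
    right[-1] = math.inf
    return left, right


left, right = uniform_limits(0.0, 10.0, n_intervals=3, overlap_frac=0.25)
print(left)   # [-inf, 3.0, 6.0]
print(right)  # [4.0, 7.0, inf]
```

Each finite interval has length 4 and consecutive intervals share a stretch of length 1, that is, a quarter of an interval.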

left_limits_

Left limits of the cover intervals computed in fit. See the Notes.

Type

ndarray of shape (n_intervals,)

right_limits_

Right limits of the cover intervals computed in fit. See the Notes.

Type

ndarray of shape (n_intervals,)

Notes

In the case of a balanced cover, left_limits_ and right_limits_ are computed as follows from a training array X. First, entries in X are ranked in ascending order, starting at 1, with equal values sharing the same rank and ranks running consecutively, so that \(N\) below also equals the number of unique values in X. Then, the closed interval \([0.5, N + 0.5]\), where \(N\) is the maximum rank observed, is covered uniformly with parameters n_intervals and overlap_frac, yielding intervals \((\alpha_k, \beta_k)\). Finally, the cover is made of the intervals \((a_k, b_k)\) where, for \(k > 1\) (resp. \(k < n\)), \(a_k\) (resp. \(b_k\)) is the value of any entry in X ranked as the floor (resp. ceiling) of \(\alpha_k\) (resp. \(\beta_k\)).
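The three steps above can be sketched in plain Python. This is a hypothetical helper (balanced_limits), not the library code; clipping ranks that fall outside 1 to \(N\) is an assumption of the sketch, since the Notes do not say how such cases are handled.

```python
import math


def balanced_limits(X, n_intervals, overlap_frac):
    """Sketch of a balanced cover following the steps in the Notes.

    Hypothetical helper, not the gtda implementation. X is a flat list
    of real numbers.
    """
    # Step 1: dense ranking. The r-th smallest unique value has rank r (1-based),
    # so the maximum rank N equals the number of unique values.
    unique_sorted = sorted(set(X))
    N = len(unique_sorted)

    # Step 2: uniform cover of [0.5, N + 0.5] with the same parameters.
    ell = N / (n_intervals - (n_intervals - 1) * overlap_frac)
    step = (1 - overlap_frac) * ell
    alphas = [0.5 + k * step for k in range(n_intervals)]
    betas = [alpha + ell for alpha in alphas]

    def value_of_rank(r):
        # Assumption: ranks outside 1..N are clipped to the nearest valid rank.
        r = min(max(r, 1), N)
        return unique_sorted[r - 1]

    # Step 3: map floor(alpha_k) for k > 1 and ceil(beta_k) for k < n to values.
    left = [-math.inf] + [value_of_rank(math.floor(a)) for a in alphas[1:]]
    right = [value_of_rank(math.ceil(b)) for b in betas[:-1]] + [math.inf]
    return left, right


left, right = balanced_limits([5, 1, 3, 3, 9, 7], n_intervals=2, overlap_frac=0.2)
print(left)   # [-inf, 3]
print(right)  # [7, inf]
```

With these limits the open interval \((-\infty, 7)\) contains the unique values 1, 3 and 5, while \((3, +\infty)\) contains 5, 7 and 9, so each interval holds three of the five unique values.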

See also

CubicalCover

__init__(kind='uniform', n_intervals=10, overlap_frac=0.1)

Initialize self. See help(type(self)) for accurate signature.

fit(X, y=None)

Compute all cover interval limits according to X and store them in left_limits_ and right_limits_. Then, return the estimator.

This method is here to implement the usual scikit-learn API and hence work in pipelines.

Parameters
  • X (ndarray of shape (n_samples,) or (n_samples, 1)) – Input data.

  • y (None) – There is no need for a target in a transformer, yet the pipeline API requires this parameter.

Returns

self

Return type

object

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters
  • X (ndarray of shape (n_samples,) or (n_samples, 1)) – Input data.

  • y (None) – There is no need for a target in a transformer, yet the pipeline API requires this parameter.

Returns

Xt – Encoding of the cover of X as a boolean array. n_cover_sets is at most n_intervals, because empty or duplicated cover sets are removed.

Return type

ndarray of shape (n_samples, n_cover_sets)
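Removing empty or duplicated cover sets can be pictured as a column filter on the boolean encoding. The sketch below works on a list of rows of shape (n_samples, n_intervals); the helper name is hypothetical, and keeping the first occurrence of a duplicated column, in interval order, is an assumption rather than documented behaviour.

```python
def drop_empty_and_duplicate_columns(Xt):
    """Hypothetical helper: remove empty or duplicated cover sets.

    Xt is a list of rows of booleans with shape (n_samples, n_intervals).
    Returns rows with shape (n_samples, n_cover_sets).
    """
    kept, seen = [], set()
    for column in zip(*Xt):  # one tuple per cover interval
        # Drop a column if no sample lies in it or if it repeats an earlier one.
        if any(column) and column not in seen:
            seen.add(column)
            kept.append(column)
    # Transpose the surviving columns back into rows.
    return [list(row) for row in zip(*kept)]


Xt = [
    [True, False, True, False],
    [False, False, False, True],
    [True, False, True, False],
]
print(drop_empty_and_duplicate_columns(Xt))
# [[True, False], [False, True], [True, False]]
```

Here the second column is empty and the third repeats the first, so four intervals yield two cover sets.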

get_fitted_intervals()

Returns the open intervals computed in fit, as a list of tuples (a, b) where a < b.
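Assuming the k-th interval pairs the k-th entries of left_limits_ and right_limits_ (both of shape (n_intervals,)), the returned list can be sketched as follows. The helper fitted_intervals and the example limits are hypothetical.

```python
import math


def fitted_intervals(left_limits, right_limits):
    """Hypothetical helper: pair the k-th left and right limits."""
    intervals = list(zip(left_limits, right_limits))
    if not all(a < b for a, b in intervals):
        raise ValueError("each interval must satisfy a < b")
    return intervals


print(fitted_intervals([-math.inf, 3.0, 6.0], [4.0, 7.0, math.inf]))
# [(-inf, 4.0), (3.0, 7.0), (6.0, inf)]
```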

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params – Parameter names mapped to their values.

Return type

mapping of string to any

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters

**params (dict) – Estimator parameters.

Returns

self – Estimator instance.

Return type

object
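The <component>__<parameter> naming can be illustrated by splitting each key at its first double underscore, which peels off one level of nesting. This is a minimal sketch, not scikit-learn's own code; the helper split_nested_params, the step name cover and the parameter verbose are hypothetical.

```python
def split_nested_params(params):
    """Hypothetical helper: group '<component>__<parameter>' keys.

    Keys without a double underscore belong to the estimator itself;
    the rest are grouped by the component name before the first '__'.
    """
    own, nested = {}, {}
    for key, value in params.items():
        component, sep, sub_key = key.partition("__")
        if sep:
            # Anything after the first '__' is passed on to the component,
            # so deeper nesting ('a__b__c') is handled one level at a time.
            nested.setdefault(component, {})[sub_key] = value
        else:
            own[key] = value
    return own, nested


own, nested = split_nested_params(
    {"cover__n_intervals": 5, "cover__kind": "balanced", "verbose": True}
)
print(own)     # {'verbose': True}
print(nested)  # {'cover': {'n_intervals': 5, 'kind': 'balanced'}}
```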

transform(X, y=None)

Compute a cover of X according to the cover of the real line computed in fit, and return it as a two-dimensional boolean array. Each column indicates the location of entries in X belonging to a common cover interval.

Parameters
  • X (ndarray of shape (n_samples,) or (n_samples, 1)) – Input data.

  • y (None) – There is no need for a target in a transformer, yet the pipeline API requires this parameter.

Returns

Xt – Encoding of the cover of X as a boolean array. n_cover_sets is at most n_intervals, because empty or duplicated cover sets are removed.

Return type

ndarray of shape (n_samples, n_cover_sets)
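Because the cover intervals are open, a sample \(x\) belongs to interval \(k\) exactly when \(a_k < x < b_k\). The membership matrix can be sketched as follows, using a hypothetical helper cover_membership and example limits; the sketch omits the removal of empty or duplicated columns. The sample 12.0 lies far to the right of the other limits yet still falls in the last interval, because \(b_n = +\infty\).

```python
import math


def cover_membership(X, left_limits, right_limits):
    """Hypothetical helper: boolean encoding of a cover of X.

    Entry [i][k] is True when left_limits[k] < X[i] < right_limits[k]
    (open intervals). Empty or duplicated columns are kept here.
    """
    return [[a < x < b for a, b in zip(left_limits, right_limits)] for x in X]


left = [-math.inf, 3.0, 6.0]
right = [4.0, 7.0, math.inf]
for row in cover_membership([1.0, 3.5, 6.5, 12.0], left, right):
    print(row)
# [True, False, False]
# [True, True, False]
# [False, True, True]
# [False, False, True]
```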