StandardFeatures

class gtda.curves.StandardFeatures(function='max', function_params=None, n_jobs=None)[source]

Standard features from multi-channel curves.

A multi-channel (integer-sampled) curve is a 2D array of shape (n_channels, n_bins), where each row holds the y-values of one channel. This transformer applies scalar- or vector-valued functions channel-wise to extract features from each multi-channel curve in a collection. The output is always a 2D array in which row i is the concatenation of the outputs of the chosen functions on the channels of the i-th (multi-)curve in the collection.
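The channel-wise behaviour described above can be illustrated with a minimal NumPy-only sketch. It does not call the library and is not its implementation; the helper name channelwise_features is hypothetical and exists only for this illustration.

```python
import numpy as np


def channelwise_features(X, func):
    """Apply func to every channel of every multi-channel curve.

    X has shape (n_samples, n_channels, n_bins). Each output row is the
    concatenation of func's outputs over the channels of one curve.
    (Hypothetical helper, for illustration only.)
    """
    rows = []
    for curve in X:  # curve has shape (n_channels, n_bins)
        # atleast_1d lets scalar and 1D outputs be concatenated uniformly
        outputs = [np.atleast_1d(func(channel)) for channel in curve]
        rows.append(np.concatenate(outputs))
    return np.stack(rows)


# Two curves, each with 2 channels sampled at 3 bins
X = np.array([[[0., 1., 2.], [3., 1., 0.]],
              [[5., 4., 4.], [0., 0., 7.]]])

Xt = channelwise_features(X, np.max)
print(Xt.shape)  # (2, 2): one scalar feature per channel
```

With a scalar-valued function such as the maximum, each curve contributes one feature per channel, so the output has shape (n_samples, n_channels).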

Parameters
  • function (string, callable, list or tuple, optional, default: "max") – Function or list/tuple of functions to apply to each channel of each multi-channel curve. Functions can map to scalars or to 1D arrays. If a string (see below) or a callable, then the same function is applied to all channels. Otherwise, function is a list/tuple of the same length as the number of entries along axis 1 in the collection passed to fit. Lists/tuples may contain allowed strings (see below), callables, and None in some positions to indicate that no feature should be extracted from the corresponding channel. Available strings are "identity", "argmin", "argmax", "min", "max", "mean", "std", "median" and "average".

  • function_params (dict, None, list or tuple, optional, default: None) –

    Additional keyword arguments for the function or functions in function. Passing None is equivalent to passing no arguments. Otherwise, if function is a single string or callable then function_params must be a dictionary. For functions encoded by allowed strings, the dictionary keys are as follows:

    • If function == "average", the only key is "weights" (np.ndarray or None, default: None).

    • Otherwise, there are no allowed keys.

    If function is a list or tuple, function_params must be a list or tuple of dictionaries (or None) as above, of the same length as function.

  • n_jobs (int or None, optional, default: None) – The number of jobs to use for the computation. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. Ignored if function is one of the allowed string options.
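To make the list/tuple form of function and function_params concrete, here is a hedged NumPy-only sketch of how per-channel functions, None entries and keyword dictionaries could pair up. It assumes that the "average" option corresponds to a weighted average like np.average; the helper apply_per_channel is hypothetical and not part of the library.

```python
import numpy as np

# Hypothetical per-channel specification for curves with 3 channels:
# max on channel 0, nothing on channel 1, weighted average on channel 2.
functions = [np.max, None, np.average]
function_params = [None, None, {"weights": np.array([1., 1., 2.])}]


def apply_per_channel(curve, functions, function_params):
    """Concatenate per-channel outputs, skipping channels mapped to None."""
    feats = []
    for channel, func, params in zip(curve, functions, function_params):
        if func is None:
            continue  # no feature is extracted from this channel
        kwargs = {} if params is None else params  # None means no arguments
        feats.append(np.atleast_1d(func(channel, **kwargs)))
    return np.concatenate(feats)


curve = np.array([[0., 4., 2.],
                  [9., 9., 9.],
                  [1., 1., 3.]])
features = apply_per_channel(curve, functions, function_params)
print(features)  # one value from channel 0, one from channel 2
```

The two lists have the same length as the number of channels, and a None entry in function_params is treated as an empty set of keyword arguments.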

n_channels_

Number of channels present in the 3D array passed to fit. Must match the number of channels in the 3D array passed to transform.

Type

int

effective_function_

Callable, or tuple whose entries are callables or None, describing the function(s) used to compute features in each available channel. It is a single callable only when function was passed as a string.

Type

callable or tuple

effective_function_params_

Dictionary or tuple of dictionaries containing all information present in function_params as well as relevant quantities computed in fit. It is a single dict only when function was passed as a string. None values are converted to empty dictionaries.

Type

dict or tuple

__init__(function='max', function_params=None, n_jobs=None)[source]

Initialize self. See help(type(self)) for accurate signature.

fit(X, y=None)[source]

Compute n_channels_ and effective_function_params_. Then, return the estimator.

This function is here to implement the usual scikit-learn API and hence work in pipelines.

Parameters
  • X (ndarray of shape (n_samples, n_channels, n_bins)) – Input data. Collection of multi-channel curves.

  • y (None) – There is no need for a target in a transformer, yet the pipeline API requires this parameter.

Returns

self

Return type

object

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters
  • X (ndarray of shape (n_samples, n_channels, n_bins)) – Input data. Collection of multi-channel curves.

  • y (None) – There is no need for a target in a transformer, yet the pipeline API requires this parameter.

Returns

Xt – Output collection of features of multi-channel curves. n_features is the sum of the number of features output by the (non-None) functions on their respective channels.

Return type

ndarray of shape (n_samples, n_features)

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params – Parameter names mapped to their values.

Return type

mapping of string to any

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters

**params (dict) – Estimator parameters.

Returns

self – Estimator instance.

Return type

object
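The <component>__<parameter> convention comes from scikit-learn's estimator API. The short sketch below shows it on a generic pipeline; StandardScaler and the step name "scale" are stand-ins chosen for illustration and have nothing to do with this transformer specifically.

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# A one-step pipeline; "scale" is an arbitrary step name for this example
pipe = Pipeline([("scale", StandardScaler())])

# Update a parameter of the nested step via <component>__<parameter>
pipe.set_params(scale__with_mean=False)

# deep=True also returns the parameters of contained estimators
params = pipe.get_params(deep=True)
print(params["scale__with_mean"])
```

The same pattern applies to any scikit-learn compatible estimator placed in a pipeline, with the step name used as the prefix.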

transform(X, y=None)[source]

Compute features of multi-channel curves.

Parameters
  • X (ndarray of shape (n_samples, n_channels, n_bins)) – Input collection of multi-channel curves.

  • y (None) – There is no need for a target in a transformer, yet the pipeline API requires this parameter.

Returns

Xt – Output collection of features of multi-channel curves. n_features is the sum of the number of features output by the (non-None) functions on their respective channels.

Return type

ndarray of shape (n_samples, n_features)
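As a rough illustration of how n_features adds up, the following NumPy-only sketch mixes a vector-valued function (an identity that returns the whole channel), a None entry, and a scalar-valued function. It mimics the described output shape and is not the library's code; the variable names are chosen for this example only.

```python
import numpy as np

n_samples, n_channels, n_bins = 3, 3, 4
X = np.arange(n_samples * n_channels * n_bins, dtype=float)
X = X.reshape(n_samples, n_channels, n_bins)

# Channel 0: identity (n_bins features), channel 1: skipped,
# channel 2: minimum (1 feature)
functions = (lambda c: c, None, np.min)

Xt = np.stack([
    np.concatenate([
        np.atleast_1d(f(channel))
        for f, channel in zip(functions, curve)
        if f is not None
    ])
    for curve in X
])

n_features = Xt.shape[1]
print(Xt.shape)  # (3, 5): n_bins features + 0 + 1 feature
```

Here n_features is 4 + 1 = 5, because the channel mapped to None contributes nothing.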