presc.copies package

presc.copies.copying module

class presc.copies.copying.ClassifierCopy(original, copy, numerical_sampling, post_sampling_labeling=True, enforce_balance=False, label_col='class', **k_sampling_parameters)[source]

Bases: object

Represents a classifier copy and its associated sampling method of choice.

Each instance wraps the original ML classifier together with the ML classifier copy and the sampling method used to carry out the copy. Its methods carry out the copy of the original classifier, evaluate the quality of the copy, and generate additional data using the original classifier with the sampling method specified on instantiation.

original

Original ML classifier to be copied.

Type

sklearn-type classifier

copy

ML classifier that will be used for the copy.

Type

sklearn-type classifier

numerical_sampling

Any of the numerical sampling functions defined in PRESC: grid_sampling, uniform_sampling, normal_sampling… The balancing sampler can only be used if the feature space does not contain any categorical variable.

Type

function

post_sampling_labeling

Whether generated data must be labeled after the sampling or not. If the chosen sampling function already does class labeling (such as balancing samplers) then it should be set to False. If the parameter enforce_balance is set to True then this parameter does not have any effect.

Type

bool

enforce_balance

Force class balancing for sampling functions that do not normally carry it out intrinsically.

Type

bool

label_col

Name of the label column.

Type

str

\*\*k_sampling_parameters

Parameters needed for the numerical_sampling function.

compute_fidelity_error(test_data)[source]

Computes the empirical fidelity error of the classifier copy.

Quantifies the resemblance of the copy to the original classifier. This value is zero when the copy makes exactly the same predictions as the original classifier (including misclassifications).

Parameters

test_data (array-like) – Dataset with the unlabeled samples to evaluate the resemblance of the copy to the original classifier.

Returns

The numerical value of the empirical fidelity error of the copy with this dataset.

Return type

float

copy_classifier(get_training_data=False, **k_mod_sampling_parameters)[source]

Copies the classifier using data generated with the original model.

Generates synthetic data using only basic information of the features (dynamic range, mean and sigma), labels it using the original model, and trains the copy model with this synthetic data. It can also return the generated synthetic data used for training.

Parameters
  • get_training_data (bool) – If True this method returns the synthetic data generated from the original classifier that was used to train the copy.

  • **k_mod_sampling_parameters – If the “nsamples” and/or “random_state” parameters of the sampling function have to be changed in order to obtain a different set of synthetic data, they can be specified here.

Returns

A PRESC Dataset with the training samples and their labels (only if get_training_data is set to True).

Return type

presc.dataset.Dataset
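
The copy workflow described for copy_classifier (sample, label with the original, train the copy) can be sketched without PRESC or scikit-learn. Everything in this illustration is hypothetical: `original_predict` is a fixed rule standing in for the original classifier, `sample_uniform` stands in for the configured sampler, and `NearestNeighborCopy` is a toy 1-nearest-neighbor model standing in for the copy. It is a minimal sketch of the idea, not the PRESC implementation.

```python
import random

# Hypothetical stand-in for the original classifier: a fixed rule on two features.
def original_predict(row):
    return 1 if row["x"] + row["y"] > 1.0 else 0

# Basic information about the features (only the dynamic range is used here).
feature_parameters = {"x": {"min": 0.0, "max": 1.0}, "y": {"min": 0.0, "max": 1.0}}

def sample_uniform(params, nsamples, random_state=None):
    """Draw each feature independently and uniformly within its range."""
    rng = random.Random(random_state)
    return [
        {name: rng.uniform(p["min"], p["max"]) for name, p in params.items()}
        for _ in range(nsamples)
    ]

class NearestNeighborCopy:
    """Toy copy model: predicts the label of the closest training sample."""

    def fit(self, rows, labels):
        self.rows, self.labels = rows, labels
        return self

    def predict(self, row):
        def sq_dist(other):
            return sum((row[k] - other[k]) ** 2 for k in row)
        best = min(range(len(self.rows)), key=lambda i: sq_dist(self.rows[i]))
        return self.labels[best]

# 1. Generate synthetic data, 2. label it with the original, 3. train the copy.
synthetic = sample_uniform(feature_parameters, nsamples=500, random_state=0)
synthetic_labels = [original_predict(r) for r in synthetic]
copy_model = NearestNeighborCopy().fit(synthetic, synthetic_labels)

# Check the copy against the original on two points far from the class boundary.
probe = [{"x": 0.1, "y": 0.1}, {"x": 0.9, "y": 0.9}]
agreement = [copy_model.predict(r) == original_predict(r) for r in probe]
```

Note that the copy never sees the original training data, only the synthetic samples and the labels the original assigns to them.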

evaluation_summary(test_data=None, synthetic_data=None)[source]

Computes several metrics to evaluate the classifier copy.

Summary of metrics that evaluate the quality of a classifier copy, not only to assess its performance as a classifier but also to quantify its resemblance to the original classifier. The metrics are the accuracy of the original and the copy models (using the original test data), and the empirical fidelity error and replacement capability of the copy (using the original test data and/or the generated synthetic data). This is a wrapper of the summary_metrics function applied to the copy and original models in this instance.

Parameters
  • test_data (presc.dataset.Dataset) – Subset of the original data reserved for testing.

  • synthetic_data (presc.dataset.Dataset) – Synthetic data generated using the original model.

Returns

The values of all metrics.

Return type

dict

generate_synthetic_data(**k_mod_sampling_parameters)[source]

Generates synthetic data using the original model.

Generates samples following the sampling strategy specified on instantiation for the numerical features and a discrete distribution for the categorical features, and then labels them using the original model. To generate the same data again, use a fixed random seed.

Parameters

**k_mod_sampling_parameters – If the “nsamples” and/or “random_state” parameters of the sampling function have to be changed in order to obtain a different set of synthetic data, they can be specified here.

Returns

Outputs a PRESC Dataset with the generated samples and their labels.

Return type

presc.dataset.Dataset

replacement_capability(test_data)[source]

Computes the replacement capability of a classifier copy.

Quantifies the ability of the copy model to substitute the original model, i.e. to maintain the same accuracy in its predictions. The value is one when the copy is as accurate as the original, even if the individual predictions differ. It approaches zero when the copy is much less accurate than the original, and it can exceed one when the copy is more accurate than the original.

Parameters

test_data (presc.dataset.Dataset) – Subset of the original data reserved to evaluate the resemblance of the copy to the original classifier, or synthetic data generated from the original model for the same purpose.

Returns

The numerical value of the replacement capability.

Return type

float

presc.copies.sampling module

presc.copies.sampling.build_equal_category_dict(feature_categories)[source]

Assigns equal probability to all categories of each feature.

Parameters

feature_categories (dict of lists) – A dictionary with an entry per feature, with the list of categories that each feature has.

Returns

A dictionary with an entry per dataset feature (dictionary keys are the column names), where each feature entry contains a nested dictionary with its categories and the identical fraction for all categories from the same feature (the nested dictionary key for this information is “categories”, which is also a dictionary with one entry per category).

Return type

dict of dicts
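
The structure of the returned dictionary can be illustrated with a few lines of plain Python. This is a minimal sketch written from the description above; the function name `equal_category_dict` is hypothetical and the code is not taken from PRESC.

```python
def equal_category_dict(feature_categories):
    """Give every category of a feature the same fraction (1 / number of categories)."""
    result = {}
    for feature, categories in feature_categories.items():
        fraction = 1.0 / len(categories)
        # The nested "categories" dictionary has one entry per category.
        result[feature] = {"categories": {c: fraction for c in categories}}
    return result

example = equal_category_dict({"color": ["red", "green"], "size": ["S", "M", "L", "XL"]})
```

Here `example["color"]["categories"]` assigns 0.5 to each of the two colors, and every size gets 0.25.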

presc.copies.sampling.categorical_sampling(feature_parameters, nsamples=500, random_state=None)[source]

Sample the classifier with a discrete distribution sampling.

Generates synthetic samples with a discrete distribution according to the probabilities described by feature_parameters. Features are assumed to be independent (not correlated).

Parameters
  • feature_parameters (dict of dicts) – A dictionary with an entry per dataset feature (dictionary keys should be the feature names), where each feature entry must contain a nested dictionary with its categories and their fraction. The key for the nested dictionary of categories should be “categories”, and the keys for the fractions should be the category name.

  • nsamples (int) – Number of samples to generate.

  • random_state (int) – Random seed used to generate the sampling data.

Returns

Dataset with a generated sampling following the discrete distribution of the feature space characterized by the feature_parameters.

Return type

pandas DataFrame
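
As a rough illustration of discrete-distribution sampling with independent features, the sketch below draws each feature separately with the standard library. The name `sample_categories` is hypothetical, and the result is a plain dictionary of lists rather than the pandas DataFrame that the PRESC function returns.

```python
import random

def sample_categories(feature_parameters, nsamples=500, random_state=None):
    """Draw each categorical feature independently according to its fractions."""
    rng = random.Random(random_state)
    columns = {}
    for feature, params in feature_parameters.items():
        categories = list(params["categories"].keys())
        fractions = list(params["categories"].values())
        # One independent draw per feature: no correlation between features.
        columns[feature] = rng.choices(categories, weights=fractions, k=nsamples)
    return columns

feature_parameters = {
    "color": {"categories": {"red": 0.8, "blue": 0.2}},
    "shape": {"categories": {"circle": 0.5, "square": 0.5}},
}
samples = sample_categories(feature_parameters, nsamples=1000, random_state=42)
repeat = sample_categories(feature_parameters, nsamples=1000, random_state=42)
```

Calling the function twice with the same seed yields identical samples, which is the role of random_state in the signature above.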

presc.copies.sampling.dynamical_range(df, verbose=False)[source]

Returns the dynamic range, mean, and sigma of the dataset features.

Parameters
  • df (pandas DataFrame) – The dataset with all the numerical features to analyze.

  • verbose (bool) – If set to True the feature parameters are printed.

Returns

A dictionary with an entry per dataset feature (dictionary keys are the column names), where each feature entry contains a nested dictionary with the values of the minimum and maximum values of the dynamic range of the dataset, as well as the mean and sigma of the distribution (nested dictionary keys are “min”, “max”, “mean” and “sigma”).

Return type

dict of dicts
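
The nested dictionary described above can be reproduced for plain Python lists with the `statistics` module. This is a sketch under assumptions: the helper name `describe_numerical_features` is hypothetical, and using the sample standard deviation for "sigma" is a guess, not something stated in this documentation.

```python
import statistics

def describe_numerical_features(columns):
    """Return min, max, mean and sigma for each numerical column."""
    summary = {}
    for name, values in columns.items():
        summary[name] = {
            "min": min(values),
            "max": max(values),
            "mean": statistics.mean(values),
            # Assumption: sample standard deviation (n - 1 in the denominator).
            "sigma": statistics.stdev(values),
        }
    return summary

columns = {"age": [20, 30, 40], "height": [1.5, 1.7, 1.9]}
params = describe_numerical_features(columns)
```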

presc.copies.sampling.find_categories(df, add_nans=False)[source]

Returns the categories of the dataset features.

Parameters
  • df (pandas DataFrame) – The dataset with all the categorical features to analyze.

  • add_nans (bool) – If True the sampler adds a “NaNs” category for the features that have any null values and assigns it the appropriate fraction.

Returns

A dictionary with an entry per dataset feature (dictionary keys are the column names), where each feature entry contains a nested dictionary with its categories and the fraction of each category present in the analyzed dataset (the nested dictionary key for this information is “categories”, which is also a dictionary with one entry per category).

Return type

dict of dicts
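
A possible way to compute the category fractions is sketched below with `collections.Counter`. The function name `category_fractions` is hypothetical, `None` stands in for a null value, and the choice to compute fractions over non-null values when add_nans is False is an assumption that this documentation does not confirm.

```python
from collections import Counter

def category_fractions(columns, add_nans=False):
    """Return the fraction of each category per feature, optionally counting nulls."""
    result = {}
    for name, values in columns.items():
        present = [v for v in values if v is not None]
        counts = Counter(present)
        n_missing = len(values) - len(present)
        if add_nans and n_missing:
            # Null values become their own "NaNs" category.
            counts["NaNs"] = n_missing
            total = len(values)
        else:
            # Assumption: nulls are ignored when add_nans is False.
            total = len(present)
        result[name] = {"categories": {c: n / total for c, n in counts.items()}}
    return result

data = {"color": ["red", "red", "blue", None]}
with_nans = category_fractions(data, add_nans=True)
without_nans = category_fractions(data)
```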

presc.copies.sampling.grid_sampling(feature_parameters, nsamples=500, random_state=None)[source]

Sample the classifier with a grid-like sampling.

Generates synthetic samples with a regular grid-like distribution within the feature space described in feature_parameters. Computes the grid spacing so that all features have the same number of different values.

Parameters
  • feature_parameters (dict of dicts) – A dictionary with an entry per dataset feature (dictionary keys should be the feature names), and where each feature entry must contain a nested dictionary with at least the entries corresponding to the minimum and maximum values of the dynamic range. Dictionary keys for these values should be “min” and “max”, respectively.

  • nsamples (int) – Maximum number of samples to generate. The exact number will depend on the parameter space.

  • random_state (int) – Parameter not used in grid_sampling.

Returns

Dataset with a regular grid-like generated sampling of the feature space characterized by the feature_parameters.

Return type

pandas DataFrame
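
The statement that nsamples is only a maximum follows from the grid construction: with the same number of values per feature, the grid size is that number raised to the number of features. The sketch below shows one way to do this in plain Python. The name `grid_points` is hypothetical, and including both endpoints of each range is an assumption.

```python
import itertools

def grid_points(feature_parameters, nsamples=500):
    """Build a regular grid with the same number of values for every feature."""
    nfeatures = len(feature_parameters)
    # Largest number of values per feature whose full grid fits within nsamples.
    per_feature = max(1, int(round(nsamples ** (1.0 / nfeatures))))
    while per_feature > 1 and per_feature ** nfeatures > nsamples:
        per_feature -= 1
    axes = []
    for params in feature_parameters.values():
        low, high = params["min"], params["max"]
        if per_feature == 1:
            axes.append([(low + high) / 2.0])
        else:
            step = (high - low) / (per_feature - 1)
            axes.append([low + i * step for i in range(per_feature)])
    names = list(feature_parameters)
    return [dict(zip(names, point)) for point in itertools.product(*axes)]

grid = grid_points({"x": {"min": 0, "max": 1}, "y": {"min": 10, "max": 20}}, nsamples=500)
```

With two features and nsamples=500, each feature gets 22 values, so the grid has 484 points rather than 500.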

presc.copies.sampling.image_random_sampling(feature_parameters={'images': {'max': 253, 'min': 0, 'x_pixels': 28, 'y_pixels': 28}}, nsamples=500, random_state=None)[source]

Sample the feature space of images using random pixels.

Generates synthetic samples using a random uniform distribution to establish the value for each image pixel. Hence, they are images of noise. It only generates one channel (that is, black and white images).

For most image datasets, which are not random and have structure, this is a very inefficient sampling method to generate synthetic image samples and explore the feature space. It is provided here for illustration purposes only.

The default generates 28x28 images with pixel values between 0 and 253.

Parameters
  • feature_parameters (dict of dicts) –

    A dictionary which specifies the characteristics of the feature space of the images. It should have one entry ‘images’ with a nested dictionary with the entries ‘x_pixels’, ‘y_pixels’, ‘min’ and ‘max’, which specify the number of pixels of the image in each dimension, and the minimum and maximum possible values of the pixels. The values in the default dictionary are:

    feature_parameters = {"images": {"x_pixels": 28, "y_pixels": 28, "min": 0, "max": 253}}

  • nsamples (int) – Number of image samples to generate.

  • random_state (int) – Random seed used to generate the sampling data.

Returns

Dataset with a list of images that have the value of their pixels generated with a random uniform sampling of the feature space as specified in the feature_parameters.

Return type

pandas DataFrame

presc.copies.sampling.image_vae_sampling(feature_parameters={'images': {'autoencoder': None, 'autoencoder_edge_factor': 5, 'autoencoder_latent_dim': 2, 'max': 254, 'min': 0}}, nsamples=500, random_state=None)[source]

Sample the feature space of images using a variational autoencoder.

Generates synthetic samples of the same manifold as the variational autoencoder training data by sampling the latent space, which represents images with a Gaussian distribution for each latent dimension.

For image datasets, which are not random and have structure, this is an efficient sampling method to generate relevant synthetic image samples and explore the feature space.

Parameters
  • feature_parameters (dict of dicts) –

    A dictionary which specifies the characteristics of the feature space of the images. It should have one entry ‘images’ with a nested dictionary with the entries ‘min’, ‘max’, ‘autoencoder’, ‘autoencoder_latent_dim’ and ‘autoencoder_edge_factor’. The ‘min’ and ‘max’ entries specify the minimum and maximum possible values of the pixels, and the remaining entries describe the variational autoencoder. The values in the default dictionary are:

    feature_parameters = {"images": {"min": 0, "max": 254, "autoencoder": None, "autoencoder_latent_dim": 2, "autoencoder_edge_factor": 5}}

    It is necessary to specify the autoencoder for it to work.

  • nsamples (int) – Number of image samples to generate.

  • random_state (int) – Random seed used to generate the sampling data.

Returns

Dataset with a list of images that have been generated by randomly sampling the latent space of the variational autoencoder, as specified in the feature_parameters.

Return type

pandas DataFrame

presc.copies.sampling.labeling(X, original_classifier, label_col='class')[source]

Labels the samples from a dataset according to a classifier.

Parameters
  • X (pandas DataFrame) – Dataset with the features but not the labels.

  • original_classifier (sklearn-type classifier) – Classifier to use for the labeling of the samples.

  • label_col (str) – Name of the label column.

Returns

Outputs a PRESC Dataset with the samples and their labels.

Return type

presc.dataset.Dataset

presc.copies.sampling.mixed_data_features(df, add_nans=False)[source]

Extracts the numerical/categorical feature parameters from a dataset.

Parameters
  • df (pandas DataFrame) – The dataset with all the features to analyze (both numerical and categorical).

  • add_nans (bool) – If True the sampler adds a “NaNs” category for the categorical features that have any null values and assigns it the appropriate fraction.

Returns

A dictionary with an entry per dataset feature (dictionary keys are the column names), where each numerical feature entry contains a nested dictionary with the values of the minimum and maximum values of the dynamic range of the dataset, as well as the mean and sigma of the distribution, and each categorical feature entry contains a nested dictionary with its categories and the fraction of each category present in the analyzed dataset (nested dictionary keys are “min”, “max”, “mean”, “sigma”, and “categories”, which is also a dictionary with one entry per category).

Return type

dict of dicts

presc.copies.sampling.mixed_data_sampling(feature_parameters, numerical_sampling, nsamples=500, random_state=None, **remaining_parameters)[source]

Sample the classifier with a mix of a numerical and categorical sampler.

Generates synthetic samples with the specified distribution for the numerical features and with a discrete distribution for the categorical features. The parameters describing the feature space needed to compute the distributions are described in the feature_parameters dictionary. Features are assumed to be independent (not correlated).

Parameters
  • feature_parameters (dict of dicts) – A dictionary with an entry per dataset feature (dictionary keys should be the feature names), where each feature entry must contain a nested dictionary with its categories and their probability. The key for the nested dictionary of categories should be “categories”, and the keys for the probabilities should be the category name.

  • numerical_sampling (function) – Any of the non-balancing numerical sampling functions defined in PRESC: grid_sampling, uniform_sampling, normal_sampling.

  • nsamples (int) – Number of samples to generate.

  • random_state (int) – Random seed used to generate the sampling data.

Returns

Dataset with a generated sampling following the specified numerical sampling distribution for the numerical features and the discrete distribution for the categorical features, following the feature space characterized by the feature_parameters.

Return type

pandas DataFrame

presc.copies.sampling.normal_sampling(feature_parameters, nsamples=500, random_state=None)[source]

Sample the classifier with a normal distribution sampling.

Generates synthetic samples with a normal distribution according to the feature space described by feature_parameters. Features are assumed to be independent (not correlated).

Parameters
  • feature_parameters (dict of dicts) – A dictionary with an entry per dataset feature (dictionary keys should be the feature names), and where each feature entry must contain a nested dictionary with at least the entries corresponding to the mean and standard deviation values of the dataset. Dictionary keys for these values should be “mean” and “sigma”, respectively.

  • nsamples (int) – Number of samples to generate.

  • random_state (int) – Random seed used to generate the sampling data.

Returns

Dataset with a generated sampling following a normal distribution of the feature space characterized by the feature_parameters.

Return type

pandas DataFrame
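
Independent normal sampling from the "mean" and "sigma" entries can be sketched with the standard library as follows. The name `sample_normal` is hypothetical, and the result is a list of dictionaries rather than a pandas DataFrame.

```python
import random

def sample_normal(feature_parameters, nsamples=500, random_state=None):
    """Draw each feature independently from a normal distribution."""
    rng = random.Random(random_state)
    return [
        {name: rng.gauss(p["mean"], p["sigma"]) for name, p in feature_parameters.items()}
        for _ in range(nsamples)
    ]

samples = sample_normal({"x": {"mean": 5.0, "sigma": 2.0}}, nsamples=2000, random_state=1)
sample_mean = sum(s["x"] for s in samples) / len(samples)
```

Unlike uniform or grid sampling, the generated values are not bounded by the "min" and "max" of the feature, so some samples may fall outside the observed dynamic range.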

presc.copies.sampling.reduce_feature_space(feature_parameters, sigmas=1)[source]

Force feature minimum/maximum values to x times the standard deviation.

This function sets the minimum and maximum values of each feature to the range obtained by subtracting from and adding to the feature’s mean the specified number of standard deviations. Only features that have both the mean and the standard deviation specified are adjusted.

Normally this reduces the feature space by leaving out the most extreme values, which makes any sampling based on the feature minimum and maximum values more efficient. The problem is more noticeable when the dictionary describing the features has been extracted automatically from an original dataset that contains outliers.

Parameters
  • feature_parameters (dict of dicts) – A dictionary with an entry per dataset feature (dictionary keys are the column names), where each feature entry contains a nested dictionary with the values of the minimum and maximum values of the dynamic range of the dataset, as well as the mean and sigma of the distribution (nested dictionary keys are “min”, “max”, “mean” and “sigma”).

  • sigmas (float) – The factor by which the standard deviation will be multiplied in order to define the symmetric interval around the mean.

Returns

A dictionary with an entry per dataset feature (dictionary keys are the column names), where each feature entry contains a nested dictionary with the values of the minimum and maximum values of the dynamic range of the dataset, as well as the mean and sigma of the distribution (nested dictionary keys are “min”, “max”, “mean” and “sigma”).

Return type

dict of dicts
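
The adjustment amounts to min = mean - sigmas * sigma and max = mean + sigmas * sigma for every feature that has both statistics. A minimal sketch follows; the name `limit_to_sigmas` is hypothetical, and returning a modified copy instead of changing the input in place is an assumption.

```python
import copy

def limit_to_sigmas(feature_parameters, sigmas=1):
    """Set min/max to mean -/+ sigmas * sigma where mean and sigma are available."""
    reduced = copy.deepcopy(feature_parameters)
    for params in reduced.values():
        # Features without both "mean" and "sigma" are left unchanged.
        if "mean" in params and "sigma" in params:
            params["min"] = params["mean"] - sigmas * params["sigma"]
            params["max"] = params["mean"] + sigmas * params["sigma"]
    return reduced

original = {
    "income": {"min": 0.0, "max": 1000.0, "mean": 50.0, "sigma": 10.0},
    "age": {"min": 18, "max": 90},
}
reduced = limit_to_sigmas(original, sigmas=2)
```

In this example the outlier-driven range 0 to 1000 for "income" shrinks to 30 to 70, while "age" keeps its range because it has no mean or sigma.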

presc.copies.sampling.sampling_balancer(feature_parameters, numerical_sampling, original_classifier, nsamples=1000, max_iter=10, nbatch=1000, label_col='class', random_state=None, verbose=False, **remaining_parameters)[source]

Generate balanced synthetic data using any sampling function.

This function will attempt to obtain a balanced dataset with non-balancing samplers by generating the same number of samples for all classes, unless it reaches the maximum number of iterations. To use it within the ClassifierCopy class, enforce_balance must be set to True.

Note that the algorithm needs to find at least one sample of a different class in order to detect that class and keep iterating through the batch generation of samples to try to get them all. Therefore, in extreme cases of imbalance it is not guaranteed to find all the classes and successfully balance the synthetic dataset. However, if that is suspected, the batch size nbatch can be set to a larger number, which increases the probability of finding at least one sample of a different class in the first round. Thereafter, once the algorithm is already iterating to find a minority class, it is more likely that other classes that occupy a very small hypervolume will show up as well.

Parameters
  • feature_parameters (dict of dicts) – A dictionary with an entry per dataset feature (dictionary keys should be the feature names), where each feature entry must contain a nested dictionary with its categories and their fraction. The key for the nested dictionary of categories should be “categories”, and the keys for the fractions should be the category name.

  • numerical_sampling (function) – Any of the non-balancing numerical sampling functions defined in PRESC: grid_sampling, uniform_sampling, normal_sampling.

  • original_classifier (sklearn-type classifier) – Original ML classifier used to generate the synthetic data.

  • nsamples (int) – Number of samples to generate.

  • max_iter (int) – The maximum number of iterations generating batches to attempt to obtain the samples per class specified in nsamplesxclass.

  • nbatch (int) – Number of tentative samples to generate in each batch.

  • label_col (str) – Name of the label column.

  • random_state (int) – Random seed used to generate the sampling data.

  • verbose (bool) – If True the sampler prints information about each batch.

Returns

Dataset with a generated sampling following the specified numerical sampling distribution for the numerical features and the discrete distribution for the categorical features, following the feature space characterized by the feature_parameters, where the function has tried to balance the samples for each class.

Return type

pandas DataFrame
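
The batch-and-iterate idea behind the balancer can be sketched as below. This is not the PRESC implementation: the names `balanced_sample`, `uniform_batch` and `rare_class_rule` are hypothetical, and splitting nsamples evenly among the classes discovered so far is an assumption about how the per-class target is derived.

```python
import random

def balanced_sample(sampler, classify, nsamples=1000, max_iter=10,
                    nbatch=1000, random_state=None):
    """Generate batches until every discovered class reaches a common quota."""
    rng = random.Random(random_state)
    per_class = {}  # label -> list of samples with that label
    quota = 0
    for _ in range(max_iter):
        for sample in sampler(nbatch, rng):
            per_class.setdefault(classify(sample), []).append(sample)
        if not per_class:
            continue
        # Assumption: the total is split evenly among the classes seen so far.
        quota = nsamples // len(per_class)
        if all(len(rows) >= quota for rows in per_class.values()):
            break
    # Trim every class to the quota; classes that never reached it stay short.
    return {label: rows[:quota] for label, rows in per_class.items()}

def uniform_batch(n, rng):
    """Hypothetical sampler: one feature drawn uniformly in [0, 1)."""
    return [rng.random() for _ in range(n)]

def rare_class_rule(x):
    """Hypothetical classifier in which class 1 covers only 10% of the space."""
    return 1 if x > 0.9 else 0

balanced = balanced_sample(uniform_batch, rare_class_rule,
                           nsamples=200, nbatch=500, random_state=0)
```

If the first batch contained only one class, the loop would stop immediately with that class alone, which mirrors the limitation described in the note above and is why a larger nbatch helps.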

presc.copies.sampling.spherical_balancer_sampling(nsamples=1000, nfeatures=30, original_classifier=None, max_iter=10, nbatch=10000, radius_min=0, radius_max=1, label_col='class', random_state=None, verbose=False)[source]

Sample the classifier with a balancer spherical distribution sampling.

Generates synthetic samples with a spherical (shell) distribution between a minimum and a maximum radius and then labels them using the original classifier. This function will attempt to obtain a balanced dataset by generating the same number of samples for all classes (nsamplesxclass), unless it reaches the maximum number of iterations. When used within the ClassifierCopy class, the balancing_sampler must be set to True.

This sampler works better when features have standardized values.

Parameters
  • nsamples (int) – Number of samples to generate.

  • nfeatures (int) – Number of features of the generated samples.

  • original_classifier (sklearn-type classifier) – Original ML classifier used to generate the synthetic data.

  • max_iter (int) – The maximum number of iterations generating batches to attempt to obtain the samples per class specified in nsamplesxclass.

  • nbatch (int) – Number of tentative samples to generate in each batch.

  • radius_min (float) – Minimum radius of the spherical shell distribution. It will be a spherical distribution if this value is set to zero.

  • radius_max (float) – Maximum radius of the spherical (shell) distribution.

  • label_col (str) – Name of the label column.

  • random_state (int) – Random seed used to generate the sampling data.

  • verbose (bool) – If True the sampler prints information about each batch.

Returns

Dataset with a generated sampling following a spherical distribution of the feature space, with features and labels.

Return type

pandas DataFrame
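
The geometric part of this sampler, drawing points in a spherical shell, can be sketched as follows. The labeling and balancing steps are left out. The name `spherical_shell_points` is hypothetical, and drawing the radius uniformly between radius_min and radius_max is an assumption; the PRESC function may distribute radii differently.

```python
import math
import random

def spherical_shell_points(nsamples, nfeatures, radius_min=0.0, radius_max=1.0,
                           random_state=None):
    """Draw points whose distance to the origin lies in [radius_min, radius_max]."""
    rng = random.Random(random_state)
    points = []
    for _ in range(nsamples):
        # A normalized Gaussian vector gives a direction uniform on the sphere.
        direction = [rng.gauss(0.0, 1.0) for _ in range(nfeatures)]
        norm = math.sqrt(sum(v * v for v in direction))
        # Assumption: the radius itself is drawn uniformly within the shell.
        radius = rng.uniform(radius_min, radius_max)
        points.append([radius * v / norm for v in direction])
    return points

points = spherical_shell_points(200, 3, radius_min=0.5, radius_max=1.0, random_state=0)
radii = [math.sqrt(sum(v * v for v in p)) for p in points]
```

Because the shell is centered at the origin with a radius of order one, it only covers the data well when the features are standardized, as noted above.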

presc.copies.sampling.uniform_sampling(feature_parameters, nsamples=500, random_state=None)[source]

Sample the classifier with a random uniform sampling.

Generates synthetic samples with a random uniform distribution within the feature space described in feature_parameters.

Parameters
  • feature_parameters (dict of dicts) – A dictionary with an entry per dataset feature (dictionary keys should be the feature names), and where each feature entry must contain a nested dictionary with at least the entries corresponding to the minimum and maximum values of the dynamic range. Dictionary keys for these values should be “min” and “max”, respectively.

  • nsamples (int) – Number of samples to generate.

  • random_state (int) – Random seed used to generate the sampling data.

Returns

Dataset with a random uniform generated sampling of the feature space characterized by the feature_parameters.

Return type

pandas DataFrame

presc.copies.evaluations module

presc.copies.evaluations.empirical_fidelity_error(y_pred_original, y_pred_copy)[source]

Computes the empirical fidelity error of a classifier copy.

Quantifies the resemblance of the copy to the original classifier. This value is zero when the copy makes exactly the same predictions as the original classifier (including misclassifications).

Parameters
  • y_pred_original (list or 1d array-like) – Predicted labels, as returned by the original classifier.

  • y_pred_copy (list or 1d array-like) – Predicted labels, as returned by the classifier copy.

Returns

The numerical value of the empirical fidelity error.

Return type

float
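
A common definition of the empirical fidelity error is the fraction of samples on which the copy and the original disagree. The sketch below assumes that definition, which is consistent with the description above but is not confirmed by this documentation; the name `fidelity_error` is hypothetical.

```python
def fidelity_error(y_pred_original, y_pred_copy):
    """Fraction of samples where the copy's prediction differs from the original's."""
    if len(y_pred_original) != len(y_pred_copy):
        raise ValueError("both prediction lists must have the same length")
    disagreements = sum(o != c for o, c in zip(y_pred_original, y_pred_copy))
    return disagreements / len(y_pred_original)

error = fidelity_error([0, 1, 1, 0], [0, 1, 0, 0])
perfect = fidelity_error([0, 1, 1, 0], [0, 1, 1, 0])
```

Note that the true labels do not appear anywhere: a copy that reproduces the original's mistakes still has zero fidelity error.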

presc.copies.evaluations.keep_top_classes(dataset, min_num_samples=2, classes_to_keep=None)[source]

Removes rows that belong to minority classes from PRESC Datasets.

Only classes that have more than the specified minimum number of samples will be kept. If a list of the classes of interest is given, this requirement is overridden.

Parameters
  • dataset (presc.dataset.Dataset) – PRESC dataset from which we want to remove the minority classes.

  • min_num_samples (int) – Minimum number of samples that the classes should have in order to keep them.

  • classes_to_keep (list) – Names of the classes to keep. If a list of classes is specified here, the parameter min_num_samples is overridden, and the specified classes are kept regardless of their number of samples.

Returns

PRESC Dataset without the samples from the minority classes.

Return type

presc.dataset.Dataset

presc.copies.evaluations.multivariable_density_comparison(datasets=[None], feature1=None, feature2=None, label_col='class', titles=None, other_kwargs={'alpha': 0.3, 'common_norm': False, 'fill': True, 'legend': False, 'n_levels': 4})[source]

Visualization to compare class density projections in detail.

Allows comparing the different topologies of a number of ML classifiers in a multivariable feature space by choosing a feature pair and “squashing” the rest of the features into a projected density distribution for each class.

It is important that the classifier datasets are obtained through a homogeneous sampling throughout the feature space to avoid introducing spurious shapes in the projected density distributions. uniform_sampling is a good option for that.

normal_sampling and any other non-uniform samplers should be avoided because the intrinsic class distributions become convolved with the sampler’s Gaussian shape, which obscures them. Note that grid_sampling is also not recommended because it samples very specific interval points and thus yields density peaks.

Parameters
  • datasets (list of pandas DataFrames) – List of the datasets with the sampled and labeled points for each classifier included in the comparison.

  • feature1 – Name of feature to display in the x-axis.

  • feature2 – Name of feature to display in the y-axis.

  • label_col (str) – Name of the label column.

  • titles (list of str) – List of names to identify each classifier and label their subplot.

  • **other_kwargs (dict) – Any other seaborn.kdeplot parameters needed to adjust the visualization. Default parameters are {"alpha": 0.3, "common_norm": False, "fill": True, "n_levels": 4, "legend": False}. Any parameter specified within the other_kwargs dictionary overrides the corresponding default, including any of these.

Returns

  • matplotlib.figure.Figure – Figure with the detailed classifier comparison.

  • matplotlib.axes.Axes or array of Axes – Contains most of the figure elements of the classifier comparison and sets the coordinate system.

presc.copies.evaluations.replacement_capability(y_true, y_pred_original, y_pred_copy)[source]

Computes the replacement capability of a classifier copy.

Quantifies the ability of the copy model to substitute the original model, i.e. to maintain the same accuracy in its predictions. The value is one when the copy is as accurate as the original, even if the individual predictions differ. It approaches zero when the copy is much less accurate than the original, and it can exceed one when the copy is more accurate than the original.

Parameters
  • y_true (list or 1d array-like) – True labels from the data.

  • y_pred_original (list or 1d array-like) – Predicted labels, as returned by the original classifier.

  • y_pred_copy (list or 1d array-like) – Predicted labels, as returned by the classifier copy.

Returns

The numerical value of the replacement capability.

Return type

float
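
The behavior described above (one for equal accuracy, near zero for a much worse copy, above one for a better copy) matches a ratio of accuracies. The sketch below assumes that the replacement capability is the accuracy of the copy divided by the accuracy of the original; the names `accuracy` and `replacement_ratio` are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def replacement_ratio(y_true, y_pred_original, y_pred_copy):
    """Assumed definition: accuracy of the copy over accuracy of the original."""
    return accuracy(y_true, y_pred_copy) / accuracy(y_true, y_pred_original)

y_true = [0, 1, 1, 0]
# Both models make one mistake, but on different samples.
same_accuracy = replacement_ratio(y_true, [0, 1, 1, 1], [1, 1, 1, 0])
# The original gets two samples wrong, the copy none.
better_copy = replacement_ratio(y_true, [0, 1, 0, 1], [0, 1, 1, 0])
```

The first case gives 1.0 even though the two models disagree on two samples, which is why this metric complements the empirical fidelity error rather than replacing it.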

presc.copies.evaluations.summary_metrics(original_model=None, copy_model=None, test_data=None, synthetic_data=None, show_results=True)[source]

Computes several metrics to evaluate the classifier copy.

Summary of metrics that evaluate the quality of a classifier copy, not only to assess its performance as a classifier but also to quantify its resemblance to the original classifier. The metrics are the accuracy of the original and the copy models (using the original test data), and the empirical fidelity error and replacement capability of the copy (using the original test data and/or the generated synthetic data).

Parameters
  • original_model (sklearn-type classifier) – Original ML classifier to be copied.

  • copy_model (presc.copies.copying.ClassifierCopy) – ML classifier copy from the original ML classifier.

  • test_data (presc.dataset.Dataset) – Subset of the original data reserved for testing.

  • synthetic_data (presc.dataset.Dataset) – Synthetic data generated using the original model.

  • show_results (bool) – If True the metrics are also printed.

Returns

The values of all metrics.

Return type

dict

presc.copies.examples module

presc.copies.examples.multiclass_gaussians(nsamples=3000, nfeatures=30, nclasses=15, center_low=2, center_high=10, scale_low=1, scale_high=1)[source]

Generates a multidimensional gaussian dataset with multiple classes.

This function generates a multidimensional normal distribution centered at the origin with standard deviation one for class zero. It then adds one additional gaussian distribution per class, each centered at a random distance between center_low and center_high and with a random standard deviation between scale_low and scale_high.

Parameters
  • nsamples (int) – Maximum number of samples to generate. Actual number of samples depends on the number of classes, because the function yields a balanced dataset with the same number of samples per class.

  • nfeatures (int) – Number of features of the generated samples.

  • nclasses (int) – Number of classes in the generated dataset.

  • center_low (float) – Minimum translation from the origin of the center of the gaussian distributions corresponding to additional classes.

  • center_high (float) – Maximum translation from the origin of the center of the gaussian distributions corresponding to additional classes.

  • scale_low (float) – Minimum value for the standard deviation of the gaussian distributions corresponding to additional classes.

  • scale_high (float) – Maximum value for the standard deviation of the gaussian distributions corresponding to additional classes.

Returns

Outputs a PRESC Dataset with the generated samples and their labels.

Return type

presc.dataset.Dataset