Subpackages

Submodules

presc.configuration module

Config management for PRESC, handled using confuse.Configuration.

class presc.configuration.LocalConfig(from_config)[source]

Bases: confuse.core.Configuration

Confuse config view that overrides but doesn’t modify another view.

This is useful for temporarily overriding options, e.g. with feature-specific settings, while still taking advantage of the confuse resolution and templating functionality.

The override is dynamic, so it will always pull in the most recent values of the underlying configuration.

from_config: a confuse Configuration (RootView) instance to override.

resolve()[source]: The core (internal) data retrieval method. Generates (value, source) pairs for each source that contains a value for this view. May raise ConfigTypeError if a type error occurs while traversing a source.

set(value)[source]: Override the value for this configuration view. The specified value is added as the highest-priority configuration data source.
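The "override without modifying" behaviour described for LocalConfig can be pictured with the standard library's collections.ChainMap. This is only an analogy under stated assumptions: the dicts below are stand-ins for configuration sources, the dotted keys are illustrative, and nothing here reflects how LocalConfig is actually implemented.

```python
from collections import ChainMap

# Stand-in for the underlying (global) configuration.
base = {"report.title": "Default title", "report.author": "Unknown"}

# Writes go to the first (empty) mapping; `base` is only ever read.
local = ChainMap({}, base)
local["report.title"] = "Feature-specific title"

overridden_title = local["report.title"]  # value from the overlay
base_title = base["report.title"]         # underlying value is untouched

# The lookup is dynamic: later changes to `base` show through the overlay.
base["report.author"] = "Me"
author_seen_by_local = local["report.author"]
```

The overlay takes priority for keys it sets, and every other key is resolved against the current state of the underlying mapping.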

class presc.configuration.PrescConfig(from_config=None)[source]

Bases: object

Wrapper around a confuse Configuration object.

This is used for managing config options in PRESC, including the global config.

from_config

A PrescConfig instance to override. If None, the config is initialized to the default settings.

Type: PrescConfig

dump()[source]: Dump the current config in YAML format.

flatten()[source]

get(template=None)[source]

reset_defaults()[source]: Reset all options to their defaults.

set(settings)[source]

Update one or more config options.

These should be specified in a dict, either mirroring the nested structure of the configuration file, or as flat key-value pairs using dots to indicate nested namespaces.

Examples

config.set({"report": {"title": "My Report", "author": "Me"}})
config.set({"report.title": "My Report", "report.author": "Me"})
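To make the equivalence of the two forms concrete, here is a minimal sketch of how flat dotted keys map onto the nested structure. The helper expand_dotted_keys is hypothetical and written only for illustration; it is not part of PRESC and may differ from what PrescConfig.set does internally.

```python
def expand_dotted_keys(flat):
    """Turn {"a.b": 1} into {"a": {"b": 1}} (hypothetical helper)."""
    nested = {}
    for dotted_key, value in flat.items():
        *parents, leaf = dotted_key.split(".")
        node = nested
        # Walk (or create) one nested dict per namespace component.
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


flat_settings = {"report.title": "My Report", "report.author": "Me"}
nested_settings = {"report": {"title": "My Report", "author": "Me"}}

expanded = expand_dotted_keys(flat_settings)
```

Under this reading, both calls in the example above describe the same update.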

property settings: Access the underlying confuse object.

update_from_file(file_path)[source]: Override current settings with those in the given YAML file.

presc.dataset module

class presc.dataset.Dataset(df, label_col, feature_cols=None)[source]

Bases: object

Convenience API for a dataset used with a classification model.

Wraps a pandas DataFrame and provides shortcuts to access feature and label columns. It also allows for other columns, e.g. computed columns, to be included or added later.

df

Type: pandas DataFrame

label_col

The name of the column containing the labels

Type: str

feature_cols

An array-like of column names corresponding to model features. If not specified, all columns aside from the label column will be assumed to be features.

Type: Array of str

property column_names: Returns feature and other column names.

property df: Returns the underlying DataFrame.

property feature_names: Returns the feature names as a list.

property features: Returns the dataset feature columns.

property labels: Returns the dataset label column.

property other_cols: Returns the dataset columns other than features and label.

property size

subset(subset_rows, by_position=False)[source]

Returns a Dataset corresponding to a subset of this one.

Parameters

subset_rows – Selector for the rows to include in the subset (that can be passed to .loc or .iloc).
by_position (bool) – If True, subset_rows is interpreted as row numbers (used with .iloc). Otherwise, subset_rows is used with .loc.
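The difference between the two selection modes matters whenever the index labels are not simply 0, 1, 2, and so on. The sketch below imitates that distinction in plain Python. The names index_labels, rows and select_rows are hypothetical stand-ins for a DataFrame index, its rows and the subsetting logic; they are not PRESC or pandas code.

```python
# Stand-in for a DataFrame whose index labels differ from row positions.
index_labels = [10, 20, 30, 40]
rows = ["a", "b", "c", "d"]


def select_rows(selector, by_position=False):
    """Pick rows by position (like .iloc) or by index label (like .loc)."""
    if by_position:
        return [rows[i] for i in selector]
    label_to_row = dict(zip(index_labels, rows))
    return [label_to_row[label] for label in selector]


by_label = select_rows([10, 30])                     # labels 10 and 30
by_position = select_rows([0, 2], by_position=True)  # first and third rows
```

Here both calls pick the same two rows, but the selectors differ because one names index labels and the other names positions.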

presc.model module

class presc.model.ClassificationModel(classifier, train_dataset=None, retrain_now=False)[source]

Bases: object

Represents a classification problem.

Instances wrap an ML model together with its associated training dataset.

Parameters

classifier (sklearn Classifier) – the classifier to wrap
train_dataset (Dataset) – optionally include the associated training dataset
retrain_now (bool) – whether the classifier should first be (re-)trained on the given dataset

property classifier: Returns the underlying classifier.

predict_labels(test_dataset)[source]

Predict labels for the given test dataset.

Parameters: test_dataset (presc.dataset.Dataset) – The test dataset to predict labels for.
Returns: A like-indexed Series.
Return type: Series

predict_probs(test_dataset)[source]

Compute predicted probabilities for the given test dataset.

This must be supported by the underlying classifier, otherwise an error will be raised.

Parameters: test_dataset (presc.dataset.Dataset) – The test dataset to compute predicted probabilities for.
Returns: A like-indexed DataFrame of probabilities for each class.
Return type: DataFrame

train(train_dataset=None)[source]

Train the underlying classification model.

Parameters: train_dataset (presc.dataset.Dataset) – A Dataset to train on. Defaults to the pre-specified training dataset, if any.

presc.utils module

exception presc.utils.PrescError[source]

Bases: ValueError, AttributeError

General exception class for errors related to PRESC computations.
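Because the class derives from both ValueError and AttributeError, handlers written for either built-in type will also catch it. The sketch below demonstrates this with a stand-in class, PrescErrorSketch, that has the same bases as documented. It is defined here only so the example runs without PRESC installed, and caught_as is a hypothetical helper.

```python
class PrescErrorSketch(ValueError, AttributeError):
    """Stand-in with the same base classes as documented for PrescError."""


def caught_as(exc_type):
    """Raise the stand-in error and report whether `exc_type` catches it."""
    try:
        raise PrescErrorSketch("invalid input for computation")
    except exc_type:
        return True
    return False


caught_by_value_error = caught_as(ValueError)
caught_by_attribute_error = caught_as(AttributeError)
```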

presc.utils.include_exclude_list(all_vals, included='*', excluded=None)[source]

Find values remaining after inclusions and exclusions are applied.

Values are first restricted to explicit inclusions, and then exclusions are applied.

The special values “*” and None are interpreted as “all” and “none” respectively for included and excluded.

Parameters

all_vals (list) – The full list of possible values
included (list) – The list of values to include. Those not listed here are dropped.
excluded (list) – The list of values to drop (after restricting to included).

Returns

The list of values out of all_vals that should be included.

Return type

list
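The inclusion and exclusion rules above can be sketched as a small standalone function. This is a re-implementation written from the description, not the library's source. The name include_exclude_sketch is hypothetical, and preserving the order of all_vals is an assumption.

```python
def include_exclude_sketch(all_vals, included="*", excluded=None):
    """Apply inclusions first, then exclusions ("*" = all, None = none)."""
    # Step 1: restrict to explicit inclusions.
    if included == "*":
        kept = list(all_vals)
    elif included is None:
        kept = []
    else:
        kept = [v for v in all_vals if v in included]

    # Step 2: drop exclusions from what remains.
    if excluded == "*":
        return []
    if excluded is None:
        return kept
    return [v for v in kept if v not in excluded]


cols = ["a", "b", "c", "d"]
everything = include_exclude_sketch(cols)
only_a = include_exclude_sketch(cols, included=["a", "b", "e"], excluded=["b"])
without_d = include_exclude_sketch(cols, excluded=["d"])
```

In this sketch, a value listed in included but absent from all_vals (such as "e") is simply ignored.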