Submodules

presc.configuration module

Config management for PRESC, handled using confuse.Configuration.

class presc.configuration.LocalConfig(from_config)[source]

Bases: confuse.core.Configuration

Confuse config view that overrides but doesn’t modify another view.

This is useful for temporarily overriding options, e.g. with feature-specific settings, while still taking advantage of confuse's resolution and templating functionality.

The override is dynamic, so it will always pull in the most recent values of the underlying configuration.

from_config: a confuse Configuration (RootView) instance to override.
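
The override behaviour described above can be pictured with a standard-library analogy. The sketch below uses collections.ChainMap rather than confuse, and the option names are made up for illustration. It shows only the general idea: writes land in the override layer, while reads fall through to the live underlying mapping.

```python
from collections import ChainMap

# Stand-in for the underlying configuration (hypothetical option names).
base = {"title": "Default title", "author": "Unknown"}

# Writes to a ChainMap go to its first mapping, so `base` is never modified.
local = ChainMap({}, base)
local["title"] = "Feature-specific title"

# A later change to the underlying mapping is visible through the override.
base["author"] = "Me"

print(local["title"], "|", base["title"], "|", local["author"])
```

This is an analogy only; LocalConfig itself builds on confuse views and sources, not on ChainMap.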

resolve()[source]

The core (internal) data retrieval method. Generates (value, source) pairs for each source that contains a value for this view. May raise ConfigTypeError if a type error occurs while traversing a source.

set(value)[source]

Override the value for this configuration view. The specified value is added as the highest-priority configuration data source.

class presc.configuration.PrescConfig(from_config=None)[source]

Bases: object

Wrapper around a confuse Configuration object.

This is used for managing config options in PRESC, including the global config.

from_config

A PrescConfig instance to override. If None, the config is initialized to the default settings.

Type

PrescConfig

dump()[source]

Dump the current config in YAML format.

flatten()[source]
get(template=None)[source]
reset_defaults()[source]

Reset all options to their defaults.

set(settings)[source]

Update one or more config options.

These should be specified in a dict, either mirroring the nested structure of the configuration file, or as flat key-value pairs using dots to indicate nested namespaces.

Examples

config.set({"report": {"title": "My Report", "author": "Me"}})
config.set({"report.title": "My Report", "report.author": "Me"})
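
The nested and flat forms name the same options, with each dot in a flat key marking one level of nesting. A minimal sketch of that correspondence follows, using a hypothetical helper named unflatten that is not part of PRESC and may differ from how PrescConfig.set handles dotted keys internally.

```python
def unflatten(flat):
    """Expand dotted keys into nested dicts (hypothetical helper, not PRESC code)."""
    nested = {}
    for dotted_key, value in flat.items():
        # Everything before the last dot names a chain of parent namespaces.
        *parents, leaf = dotted_key.split(".")
        node = nested
        for name in parents:
            node = node.setdefault(name, {})
        node[leaf] = value
    return nested


flat = {"report.title": "My Report", "report.author": "Me"}
nested = unflatten(flat)
print(nested)
```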

property settings

Access the underlying confuse object.

update_from_file(file_path)[source]

Override current settings with those in the given YAML file.

presc.dataset module

class presc.dataset.Dataset(df, label_col, feature_cols=None)[source]

Bases: object

Convenience API for a dataset used with a classification model.

Wraps a pandas DataFrame and provides shortcuts to access feature and label columns. It also allows for other columns, e.g. computed columns, to be included or added later.

df
Type

Pandas DataFrame

label_col

The name of the column containing the labels

Type

str

feature_cols

An array-like of column names corresponding to model features. If not specified, all columns aside from the label column will be assumed to be features.

Type

Array of str
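
The default for feature_cols can be sketched in plain Python. The column names below are hypothetical, and the snippet illustrates the documented rule rather than the Dataset implementation.

```python
# Hypothetical column names for illustration.
columns = ["age", "income", "label"]
label_col = "label"
feature_cols = None

# When no features are given, every column except the label is a feature.
if feature_cols is None:
    feature_cols = [c for c in columns if c != label_col]

print(feature_cols)
```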

property column_names

Returns feature and other column names.

property df

Returns the underlying DataFrame.

property feature_names

Returns the feature names as a list.

property features

Returns the dataset feature columns.

property labels

Returns the dataset label column.

property other_cols

Returns the dataset columns other than features and label.

property size
subset(subset_rows, by_position=False)[source]

Returns a Dataset corresponding to a subset of this one.

Parameters
  • subset_rows – Selector for the rows to include in the subset (that can be passed to .loc or .iloc).

  • by_position (bool) – If True, subset_rows is interpreted as row numbers (used with .iloc). Otherwise, subset_rows is used with .loc.
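
The difference between the two selection modes matters whenever the index is not a simple 0..n-1 range. The sketch below uses pandas directly on a made-up DataFrame to show the distinction that by_position controls. It does not call Dataset.subset itself.

```python
import pandas as pd

# Made-up data with a non-default index, so labels and positions differ.
df = pd.DataFrame(
    {"x": [1.0, 2.0, 3.0, 4.0], "label": ["a", "b", "a", "b"]},
    index=[10, 20, 30, 40],
)

# Label-based selection, as used when by_position is False.
by_label = df.loc[[10, 30]]

# Position-based selection, as used when by_position is True.
by_position = df.iloc[[0, 2]]

print(by_label.equals(by_position))
```

Here both selectors pick the first and third rows. Passing the positions [0, 2] to .loc would fail instead, because 0 and 2 are not labels in this index.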

presc.model module

class presc.model.ClassificationModel(classifier, train_dataset=None, retrain_now=False)[source]

Bases: object

Represents a classification problem.

Instances wrap an ML model together with its associated training dataset.

Parameters
  • classifier (sklearn Classifier) – the classifier to wrap

  • train_dataset (Dataset) – optionally include the associated training dataset

  • retrain_now (bool) – should the classifier first be (re-)trained on the given dataset?

property classifier

Returns the underlying classifier.

predict_labels(test_dataset)[source]

Predict labels for the given test dataset.

Parameters

test_dataset (presc.dataset.Dataset) –

Returns

A like-indexed Series.

Return type

Series

predict_probs(test_dataset)[source]

Compute predicted probabilities for the given test dataset.

This must be supported by the underlying classifier, otherwise an error will be raised.

Parameters

test_dataset (presc.dataset.Dataset) –

Returns

A like-indexed DataFrame of probabilities for each class.

Return type

DataFrame

train(train_dataset=None)[source]

Train the underlying classification model.

Parameters

train_dataset (presc.dataset.Dataset) – A Dataset to train on. Defaults to the pre-specified training dataset, if any.

presc.utils module

exception presc.utils.PrescError[source]

Bases: ValueError, AttributeError

General exception class for errors related to PRESC computations.
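
Because the exception derives from both ValueError and AttributeError, a handler for either built-in type will catch it. The sketch below shows this with a hypothetical stand-in class that has the same bases, so it runs without PRESC installed.

```python
class SketchError(ValueError, AttributeError):
    """Hypothetical stand-in with the same bases as PrescError."""


def caught_by(exc_type):
    """Return True if a handler for exc_type catches SketchError."""
    try:
        raise SketchError("invalid option")
    except exc_type:
        return True


print(caught_by(ValueError), caught_by(AttributeError))
```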

presc.utils.include_exclude_list(all_vals, included='*', excluded=None)[source]

Find values remaining after inclusions and exclusions are applied.

Values are first restricted to explicit inclusions, and then exclusions are applied.

The special values “*” and None are interpreted as “all” and “none” respectively for included and excluded.

Parameters
  • all_vals (list) – The full list of possible values

  • included (list) – The list of values to include. Those not listed here are dropped.

  • excluded (list) – The list of values to drop (after restricting to included).

Returns

The list of values out of all_vals that should be included.

Return type

list
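
The documented semantics can be sketched as a small standalone function. The name include_exclude_sketch is hypothetical. The function is a reading of the description above, not the PRESC source, and it assumes that the order of all_vals is preserved in the result.

```python
def include_exclude_sketch(all_vals, included="*", excluded=None):
    """Illustrative re-statement of the documented rules (not PRESC code)."""
    # Step 1: restrict to inclusions ("*" means all, None means none).
    if included == "*":
        kept = list(all_vals)
    elif included is None:
        kept = []
    else:
        kept = [v for v in all_vals if v in included]

    # Step 2: apply exclusions ("*" means all, None means none).
    if excluded == "*":
        return []
    if excluded is None:
        return kept
    return [v for v in kept if v not in excluded]


cols = ["a", "b", "c"]
print(include_exclude_sketch(cols, included=["a", "b"], excluded=["b"]))
```

Applying exclusions after inclusions means a value listed in both ends up dropped.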