deepchem.models package

Submodules

deepchem.models.models module

Contains an abstract base class that supports different ML models.

class deepchem.models.models.Model(model_instance=None, model_dir=None, verbose=True, **kwargs)[source]

Bases: sklearn.base.BaseEstimator

Abstract base class for different ML models.

evaluate(dataset, metrics, transformers=[], per_task_metrics=False)[source]

Evaluates the performance of this model on the specified dataset.

Parameters:
  • dataset (dc.data.Dataset) – Dataset object.
  • metrics (list of deepchem.metrics.Metric) – Evaluation metrics.
  • transformers (list) – List of deepchem.transformers.Transformer
  • per_task_metrics (bool) – If True, return per-task scores.
Returns:

Maps tasks to scores under metric.

Return type:

dict
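
For illustration, a minimal usage sketch, assuming hypothetical objects: a trained `model`, a `test_dataset` (dc.data.Dataset), and the `transformers` that were applied to the data:

>>> import numpy as np
>>> import deepchem as dc
>>> # Hypothetical: model, test_dataset, transformers defined elsewhere
>>> metric = dc.metrics.Metric(dc.metrics.roc_auc_score, np.mean)
>>> scores = model.evaluate(test_dataset, [metric], transformers)
>>> # scores is a dict mapping the metric name to its value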

fit(dataset, nb_epoch=10, batch_size=50, **kwargs)[source]

Fits a model on data in a Dataset object.
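
A minimal sketch of a typical call, assuming a hypothetical `model` and `train_dataset` (a dc.data.Dataset):

>>> # Hypothetical: train for 20 epochs with batches of 32 samples
>>> model.fit(train_dataset, nb_epoch=20, batch_size=32)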

fit_on_batch(X, y, w)[source]

Updates existing model with new information.

static get_model_filename(model_dir)[source]

Given model directory, obtain filename for the model itself.

get_num_tasks()[source]

Get number of tasks.

get_params(deep=True)

Get parameters for this estimator.

Parameters:deep (boolean, optional) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns:params – Parameter names mapped to their values.
Return type:mapping of string to any
static get_params_filename(model_dir)[source]

Given model directory, obtain filename for the model parameters.

get_task_type()[source]

Currently models can only be classifiers or regressors.

predict(dataset, transformers=[], batch_size=None)[source]

Uses self to make predictions on provided Dataset object.

Returns:y_pred – numpy ndarray of shape (n_samples,)
Return type:np.ndarray
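
A short sketch with the same hypothetical `model`, `test_dataset`, and `transformers` as above; passing the training-time transformers lets predictions be mapped back to the original scale:

>>> y_pred = model.predict(test_dataset, transformers)
>>> # y_pred is an ndarray of shape (n_samples,)
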
predict_on_batch(X, **kwargs)[source]

Makes predictions on a given batch of new data.

Parameters:X (np.ndarray) – Features
predict_proba(dataset, transformers=[], batch_size=None, n_classes=2)[source]

Makes class probability predictions on the provided Dataset object.

TODO: Do transformers even make sense here?

Returns:y_pred – numpy ndarray of shape (n_samples, n_classes*n_tasks)
Return type:np.ndarray
predict_proba_on_batch(X)[source]

Makes predictions of class probabilities on a given batch of new data.

Parameters:X (np.ndarray) – Features
reload()[source]

Reload trained model from disk.

save()[source]

Dispatcher function for saving.

Each subclass is responsible for overriding this method.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns:
Return type:self
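
Since Model inherits from sklearn.base.BaseEstimator, these two methods follow the usual scikit-learn convention. A minimal sketch with a hypothetical `model`:

>>> params = model.get_params()          # dict mapping parameter names to values
>>> model = model.set_params(**params)   # returns self, so calls can be chained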

deepchem.models.multitask module

Convenience class that lets singletask models fit on multitask data.

class deepchem.models.multitask.SingletaskToMultitask(tasks, model_builder, model_dir=None, verbose=True)[source]

Bases: deepchem.models.models.Model

Convenience class to let singletask models be fit on multitask data.

Warning: This current implementation is only functional for sklearn models.
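
A sketch of typical construction, assuming hypothetical task names in `tasks`; per the warning above, `model_builder` should return a fresh singletask sklearn model (here wrapped in dc.models.SklearnModel) for a given model directory:

>>> import deepchem as dc
>>> from sklearn.ensemble import RandomForestClassifier
>>> tasks = ["task0", "task1"]  # hypothetical task names
>>> def model_builder(model_dir):
...     sklearn_model = RandomForestClassifier(n_estimators=100)
...     return dc.models.SklearnModel(sklearn_model, model_dir)
>>> model = dc.models.SingletaskToMultitask(tasks, model_builder)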

evaluate(dataset, metrics, transformers=[], per_task_metrics=False)

Evaluates the performance of this model on the specified dataset.

Parameters:
  • dataset (dc.data.Dataset) – Dataset object.
  • metrics (list of deepchem.metrics.Metric) – Evaluation metrics.
  • transformers (list) – List of deepchem.transformers.Transformer
  • per_task_metrics (bool) – If True, return per-task scores.
Returns:

Maps tasks to scores under metric.

Return type:

dict

fit(dataset, **kwargs)[source]

Updates all singletask models with new information.

Warning: This current implementation is only functional for sklearn models.

fit_on_batch(X, y, w)

Updates existing model with new information.

get_model_filename(model_dir)

Given model directory, obtain filename for the model itself.

get_num_tasks()

Get number of tasks.

get_params(deep=True)

Get parameters for this estimator.

Parameters:deep (boolean, optional) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns:params – Parameter names mapped to their values.
Return type:mapping of string to any
get_params_filename(model_dir)

Given model directory, obtain filename for the model parameters.

get_task_type()

Currently models can only be classifiers or regressors.

predict(dataset, transformers=[])[source]

Prediction for multitask models.

predict_on_batch(X)[source]

Concatenates results from all singletask models.

predict_proba(dataset, transformers=[], n_classes=2)[source]

Concatenates results from all singletask models.

predict_proba_on_batch(X, n_classes=2)[source]

Concatenates results from all singletask models.

reload()[source]

Reloads all singletask models from disk.

save()[source]

Saves all singletask models.

TODO(rbharath): Saving is not yet supported for this model.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns:
Return type:self

deepchem.models.sequential module

Contains Sequential model adapted from keras/keras/models.py.

This class is adapted directly from Keras; functionality has been cut and the API has been changed to match DeepChem style.

class deepchem.models.sequential.Sequential(name=None, logdir=None)[source]

Bases: deepchem.models.models.Model

Linear stack of layers.

Parameters:layers (list) – Layers to add to the model.

Note

The first layer passed to a Sequential model should have a defined input shape: it should have received an input_shape or batch_input_shape argument, or, for some layer types (recurrent, Dense, ...), an input_dim argument.

Example

>>> import deepchem as dc
>>> model = dc.models.Sequential()
>>> # Add features
>>> model.add_features(dc.nn.Input(shape=(50,)))
>>> # Add labels
>>> model.add_labels(dc.nn.Input(shape=(1,)))
>>> model.add(dc.nn.Dense(32, 50))
>>> model.add(dc.nn.Dense(64, 32))
add(layer)[source]

Adds a layer instance on top of the layer stack.

Parameters:layer – Layer instance to add.
add_features(layer)[source]

Adds an input layer.

add_labels(layer)[source]

Adds a layer for labels

add_loss(loss, inputs=None)[source]

Adds a loss to the model.

Parameters:loss – Loss to add to the model.
evaluate(x, y, batch_size=32, verbose=1, sample_weight=None, **kwargs)[source]

Computes the loss on some input data, batch by batch.

Parameters:
  • x – Input data, as a Numpy array (or list of Numpy arrays if the model has multiple inputs).
  • y – Labels, as a Numpy array.
  • batch_size (integer) – Number of samples per gradient update.
  • verbose – Verbosity mode, 0 or 1.
  • sample_weight – Sample weights, as a Numpy array.
Returns: Scalar test loss (if the model has no metrics) or a list of scalars (if the model computes other metrics). The attribute model.metrics_names gives the display labels for the scalar outputs.

fit(dataset, nb_epoch=10, max_checkpoints_to_keep=5, log_every_N_batches=50, learning_rate=0.001, batch_size=50, checkpoint_interval=10)[source]

Trains the model for a fixed number of epochs.

TODO(rbharath): This is mostly copied from TensorflowGraphModel; the two should eventually be refactored together.

Parameters:
  • dataset (dc.data.Dataset) – Dataset object holding the training data.
  • nb_epoch (int) – Number of training epochs.
  • batch_size (int) – Number of samples per gradient update.
  • checkpoint_interval (int) – Frequency at which to write checkpoints, measured in epochs.
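
Continuing the Sequential example above as a hedged sketch, assuming a hypothetical `train_dataset` (dc.data.Dataset) whose features match the declared input shape:

>>> # Hypothetical: a loss should have been added via add_loss() before fitting
>>> model.fit(train_dataset, nb_epoch=10, batch_size=50, checkpoint_interval=10)
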
fit_on_batch(X, y, w)

Updates existing model with new information.

get_model_filename(model_dir)

Given model directory, obtain filename for the model itself.

get_num_tasks()

Get number of tasks.

get_params(deep=True)

Get parameters for this estimator.

Parameters:deep (boolean, optional) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns:params – Parameter names mapped to their values.
Return type:mapping of string to any
get_params_filename(model_dir)

Given model directory, obtain filename for the model parameters.

get_task_type()

Currently models can only be classifiers or regressors.

predict(x, batch_size=32, verbose=0)[source]

Generates output predictions for the input samples, processing the samples in a batched way.

Parameters:
  • x – Input data, as a Numpy array.
  • batch_size (integer) – Number of samples per batch.
  • verbose – Verbosity mode, 0 or 1.
Returns: A Numpy array of predictions.
predict_on_batch(x)[source]

Returns predictions for a single batch of samples.

predict_proba(x, batch_size=32, verbose=1)[source]

Generates class probability predictions for the input samples batch by batch.

Parameters:
  • x – Input data, as a Numpy array (or list of Numpy arrays if the model has multiple inputs).
  • batch_size (integer) – Number of samples per batch.
  • verbose – Verbosity mode, 0 or 1.
Returns: A Numpy array of probability predictions.
predict_proba_on_batch(X)

Makes predictions of class probabilities on a given batch of new data.

Parameters:X (np.ndarray) – Features
reload()

Reload trained model from disk.

save()

Dispatcher function for saving.

Each subclass is responsible for overriding this method.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns:
Return type:self
test_on_batch(x, y, sample_weight=None, **kwargs)[source]

Evaluates the model over a single batch of samples.

Parameters:
  • x – Input data, as a Numpy array (or list of Numpy arrays if the model has multiple inputs).
  • y – Labels, as a Numpy array.
  • sample_weight – Sample weights, as a Numpy array.
Returns: Scalar test loss (if the model has no metrics) or a list of scalars (if the model computes other metrics). The attribute model.metrics_names gives the display labels for the scalar outputs.
train_on_batch(x, y, class_weight=None, sample_weight=None, **kwargs)[source]

Single gradient update over one batch of samples.

Parameters:
  • x – Input data, as a Numpy array (or list of Numpy arrays if the model has multiple inputs).
  • y – Labels, as a Numpy array.
  • class_weight – Dictionary mapping classes to a weight value, used for scaling the loss function (during training only).
  • sample_weight – Sample weights, as a Numpy array.
Returns: Scalar training loss (if the model has no metrics) or a list of scalars (if the model computes other metrics). The attribute model.metrics_names gives the display labels for the scalar outputs.
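
For illustration, a minimal manual training loop, assuming hypothetical Numpy arrays x_train, y_train, x_val, y_val that match the model's inputs and labels:

>>> batch_size = 50
>>> for i in range(0, len(x_train), batch_size):
...     loss = model.train_on_batch(x_train[i:i + batch_size], y_train[i:i + batch_size])
>>> val_loss = model.test_on_batch(x_val, y_val)
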
uses_learning_phase

Whether the model's computation depends on the Keras learning phase (relevant for layers such as Dropout that behave differently during training and inference).

Module contents

Gathers all models in one place for convenient imports.