deepchem.models.tensorgraph.models package

Submodules

deepchem.models.tensorgraph.models.atomic_conv module

class deepchem.models.tensorgraph.models.atomic_conv.AtomicConvScore(atom_types, layer_sizes, **kwargs)[source]

Bases: deepchem.models.tensorgraph.layers.Layer

add_summary_to_tg()

Can only be called after self.create_layer to guarantee that name is not None.

clone(in_layers)

Create a copy of this layer with different inputs.

copy(replacements={}, variables_graph=None, shared=False)

Duplicate this Layer and all its inputs.

This is similar to clone(), but instead of only cloning one layer, it also recursively calls copy() on all of this layer’s inputs to clone the entire hierarchy of layers. In the process, you can optionally tell it to replace particular layers with specific existing ones. For example, you can clone a stack of layers, while connecting the topmost ones to different inputs.

For example, consider a stack of dense layers that depend on an input:

>>> input = Feature(shape=(None, 100))
>>> dense1 = Dense(100, in_layers=input)
>>> dense2 = Dense(100, in_layers=dense1)
>>> dense3 = Dense(100, in_layers=dense2)

The following will clone all three dense layers, but not the input layer. Instead, the input to the first dense layer will be a different layer specified in the replacements map.

>>> new_input = Feature(shape=(None, 100))
>>> replacements = {input: new_input}
>>> dense3_copy = dense3.copy(replacements)
Parameters:
  • replacements (map) – specifies existing layers, and the layers to replace them with (instead of cloning them). This argument serves two purposes. First, you can pass in a list of replacements to control which layers get cloned. In addition, as each layer is cloned, it is added to this map. On exit, it therefore contains a complete record of all layers that were copied, and a reference to the copy of each one.
  • variables_graph (TensorGraph) – an optional TensorGraph from which to take variables. If this is specified, the current value of each variable in each layer is recorded, and the copy has that value specified as its initial value. This allows a piece of a pre-trained model to be copied to another model.
  • shared (bool) – if True, create new layers by calling shared() on the input layers. This means the newly created layers will share variables with the original ones.
create_tensor(in_layers=None, set_tensors=True, **kwargs)[source]
layer_number_dict = {}
none_tensors()
set_summary(summary_op, summary_description=None, collections=None)

Annotates a tensor with a tf.summary operation. Collects data from self.out_tensor by default, but this can be changed by setting self.tb_input to another tensor in create_tensor.

Parameters:
  • summary_op (str) – summary operation to annotate node
  • summary_description (object, optional) – Optional summary_pb2.SummaryDescription()
  • collections (list of graph collections keys, optional) – New summary op is added to these collections. Defaults to [GraphKeys.SUMMARIES]
set_tensors(tensor)
set_variable_initial_values(values)

Set the initial values of all variables.

This takes a list, which contains the initial values to use for all of this layer’s variables (in the same order returned by TensorGraph.get_layer_variables()). When this layer is used in a TensorGraph, it will automatically initialize each variable to the value specified in the list. Note that some layers also have separate mechanisms for specifying variable initializers; this method overrides them. The purpose of this method is to let a Layer object represent a pre-trained layer, complete with trained values for its variables.

shape

Get the shape of this Layer’s output.

shared(in_layers)

Create a copy of this layer that shares variables with it.

This is similar to clone(), but where clone() creates two independent layers, this causes the layers to share variables with each other.

Parameters:
  • in_layers (list of Layers) – the input layers for the shared copy of this layer.
Returns:

Return type:

Layer

deepchem.models.tensorgraph.models.atomic_conv.InitializeWeightsBiases(prev_layer_size, size, weights=None, biases=None, name=None)[source]

Initializes weights and biases to be used in a fully-connected layer.

Parameters:
  • prev_layer_size (int) – Number of features in previous layer.
  • size (int) – Number of nodes in this layer.
  • weights (tf.Tensor, optional (Default None)) – Weight tensor.
  • biases (tf.Tensor, optional (Default None)) – Bias tensor.
  • name (str) – Name for this op, optional (Defaults to ‘fully_connected’ if None)
Returns:

  • weights (tf.Variable) – Initialized weights.
  • biases (tf.Variable) – Initialized biases.
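
For illustration, the returned variables can be wired into a fully-connected layer by hand. The following is a minimal sketch; the [None, 128] placeholder and the layer sizes are purely illustrative, not part of this API.

>>> import tensorflow as tf
>>> from deepchem.models.tensorgraph.models.atomic_conv import InitializeWeightsBiases
>>> prev_layer = tf.placeholder(tf.float32, shape=(None, 128))  # illustrative input
>>> weights, biases = InitializeWeightsBiases(prev_layer_size=128, size=32, name='fc_1')
>>> output = tf.nn.relu(tf.matmul(prev_layer, weights) + biases)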

deepchem.models.tensorgraph.models.atomic_conv.atomic_conv_model(frag1_num_atoms=70, frag2_num_atoms=634, complex_num_atoms=701, max_num_neighbors=12, batch_size=24, at=[6, 7.0, 8.0, 9.0, 11.0, 12.0, 15.0, 16.0, 17.0, 20.0, 25.0, 30.0, 35.0, 53.0, -1.0], radial=[[1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0, 9.5, 10.0, 10.5, 11.0, 11.5, 12.0], [0.0, 4.0, 8.0], [0.4]], layer_sizes=[32, 32, 16], learning_rate=0.001)[source]

deepchem.models.tensorgraph.models.gan module

Generative Adversarial Networks.

class deepchem.models.tensorgraph.models.gan.GAN(n_generators=1, n_discriminators=1, **kwargs)[source]

Bases: deepchem.models.tensorgraph.tensor_graph.TensorGraph

Implements Generative Adversarial Networks.

A Generative Adversarial Network (GAN) is a type of generative model. It consists of two parts called the “generator” and the “discriminator”. The generator takes random noise as input and transforms it into an output that (hopefully) resembles the training data. The discriminator takes a set of samples as input and tries to distinguish the real training samples from the ones created by the generator. Both of them are trained together. The discriminator tries to get better and better at telling real from false data, while the generator tries to get better and better at fooling the discriminator.

In many cases there also are additional inputs to the generator and discriminator. In that case it is known as a Conditional GAN (CGAN), since it learns a distribution that is conditional on the values of those inputs. They are referred to as “conditional inputs”.

Many variations on this idea have been proposed, and new varieties of GANs appear constantly. This class tries to make it very easy to implement straightforward GANs of the most conventional types. At the same time, it tries to be flexible enough that it can be used to implement many (but certainly not all) variations on the concept.

To define a GAN, you must create a subclass that provides implementations of the following methods:

get_noise_input_shape() get_data_input_shapes() create_generator() create_discriminator()

If you want your GAN to have any conditional inputs you must also implement:

get_conditional_input_shapes()

The following methods have default implementations that are suitable for most conventional GANs. You can override them if you want to customize their behavior:

create_generator_loss() create_discriminator_loss() get_noise_batch()

This class allows a GAN to have multiple generators and discriminators, a model known as MIX+GAN. It is described in Arora et al., “Generalization and Equilibrium in Generative Adversarial Nets (GANs)” (https://arxiv.org/abs/1703.00573). This can lead to better models, and is especially useful for reducing mode collapse, since different generators can learn different parts of the distribution. To use this technique, simply specify the number of generators and discriminators when calling the constructor. You can then tell predict_gan_generator() which generator to use for predicting samples.
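
As an illustration, the following sketch defines a toy conditional GAN that learns a one-dimensional distribution conditioned on its mean. The class name, data shapes, and layer sizes are illustrative rather than prescribed by this API; the fit_gan() and predict_gan_generator() entries below continue this sketch.

>>> import tensorflow as tf
>>> from deepchem.models.tensorgraph import layers
>>> from deepchem.models.tensorgraph.models.gan import GAN
>>> class ExampleGAN(GAN):
...   def get_noise_input_shape(self):
...     return (None, 2)
...   def get_data_input_shapes(self):
...     return [(None, 1)]
...   def get_conditional_input_shapes(self):
...     return [(None, 1)]
...   def create_generator(self, noise_input, conditional_inputs):
...     gen_in = layers.Concat([noise_input] + conditional_inputs)
...     return [layers.Dense(1, in_layers=gen_in)]
...   def create_discriminator(self, data_inputs, conditional_inputs):
...     discrim_in = layers.Concat(data_inputs + conditional_inputs)
...     dense = layers.Dense(10, in_layers=discrim_in, activation_fn=tf.nn.relu)
...     return layers.Dense(1, in_layers=dense, activation_fn=tf.sigmoid)
>>> gan = ExampleGAN(learning_rate=0.01, batch_size=100)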

add_output(layer)
build()
create_discriminator(data_inputs, conditional_inputs)[source]

Create the discriminator.

Subclasses must override this to construct the discriminator and return its output layer.

Parameters:
  • data_inputs (list) – the Input layers from which the discriminator can read the input data. The number and shapes of these inputs will match the return value from get_data_input_shapes(). The samples read from these layers may be either training data or generated data.
  • conditional_inputs (list) – the Input layers for any conditional inputs to the network. The number and shapes of these inputs will match the return value from get_conditional_input_shapes().
Returns:

  • A Layer object that outputs the probability of each sample being a training
  • sample. The shape of this layer must be [None]. That is, it must output a
  • one dimensional tensor whose length equals the batch size.

create_discriminator_loss(discrim_output_train, discrim_output_gen)[source]

Create the loss function for the discriminator.

The default implementation is appropriate for most cases. Subclasses can override this if they need to customize it.

Parameters:
  • discrim_output_train (Layer) – the output from the discriminator on a batch of training data. This is its estimate of the probability that each sample is training data.
  • discrim_output_gen (Layer) – the output from the discriminator on a batch of generated data. This is its estimate of the probability that each sample is training data.
Returns:

  • A Layer object that outputs the loss function to use for optimizing the
  • discriminator.

create_generator(noise_input, conditional_inputs)[source]

Create the generator.

Subclasses must override this to construct the generator and return its output layers.

Parameters:
  • noise_input (Input) – the Input layer from which the generator can read random noise. The shape will match the return value from get_noise_input_shape().
  • conditional_inputs (list) – the Input layers for any conditional inputs to the network. The number and shapes of these inputs will match the return value from get_conditional_input_shapes().
Returns:

  • A list of Layer objects that produce the generator’s outputs. The number and
  • shapes of these layers must match the return value from get_data_input_shapes(),
  • since generated data must have the same form as training data.

create_generator_loss(discrim_output)[source]

Create the loss function for the generator.

The default implementation is appropriate for most cases. Subclasses can override this if they need to customize it.

Parameters:discrim_output (Layer) – the output from the discriminator on a batch of generated data. This is its estimate of the probability that each sample is training data.
Returns:
  • A Layer object that outputs the loss function to use for optimizing the
  • generator.
create_submodel(layers=None, loss=None, optimizer=None)

Create an alternate objective for training one piece of a TensorGraph.

A TensorGraph consists of a set of layers, and specifies a loss function and optimizer to use for training those layers. Usually this is sufficient, but there are cases where you want to train different parts of a model separately. For example, a GAN consists of a generator and a discriminator. They are trained separately, and they use different loss functions.

A submodel defines an alternate objective to use in cases like this. It may optionally specify any of the following: a subset of layers in the model to train; a different loss function; and a different optimizer to use. This method creates a submodel, which you can then pass to fit() to use it for training.

Parameters:
  • layers (list) – the list of layers to train. If None, all layers in the model will be trained.
  • loss (Layer) – the loss function to optimize. If None, the model’s main loss function will be used.
  • optimizer (Optimizer) – the optimizer to use for training. If None, the model’s main optimizer will be used.
Returns:

  • the newly created submodel, which can be passed to any of the fitting
  • methods.
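
A hedged usage sketch: assuming model is an existing TensorGraph and dense2/dense3 are two of its layers (both hypothetical names), this trains only those layers with a separate Adam optimizer.

>>> from deepchem.models.tensorgraph.optimizers import Adam
>>> finetune = model.create_submodel(layers=[dense2, dense3],
...                                  optimizer=Adam(learning_rate=0.0001))
>>> model.fit(dataset, nb_epoch=5, submodel=finetune)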

default_generator(dataset, epochs=1, predict=False, deterministic=True, pad_batches=True)
evaluate(dataset, metrics, transformers=[], per_task_metrics=False)

Evaluates the performance of this model on specified dataset.

Parameters:
  • dataset (dc.data.Dataset) – Dataset object.
  • metrics (list of deepchem.metrics.Metric) – Evaluation metrics.
  • transformers (list) – List of deepchem.transformers.Transformer
  • per_task_metrics (bool) – If True, return per-task scores.
Returns:

Maps tasks to scores under metric.

Return type:

dict
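
For example, to score a trained classifier on a validation set with ROC AUC averaged over tasks (model, valid_dataset and transformers are assumed to exist):

>>> import numpy as np
>>> import deepchem as dc
>>> metric = dc.metrics.Metric(dc.metrics.roc_auc_score, np.mean)
>>> scores = model.evaluate(valid_dataset, [metric], transformers)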

evaluate_generator(feed_dict_generator, metrics, transformers=[], labels=None, outputs=None, weights=[], per_task_metrics=False)
fit(dataset, nb_epoch=10, max_checkpoints_to_keep=5, checkpoint_interval=1000, deterministic=False, restore=False, submodel=None, **kwargs)

Train this model on a dataset.

Parameters:
  • dataset (Dataset) – the Dataset to train on
  • nb_epoch (int) – the number of epochs to train for
  • max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.
  • checkpoint_interval (int) – the frequency at which to write checkpoints, measured in training steps. Set this to 0 to disable automatic checkpointing.
  • deterministic (bool) – if True, the samples are processed in order. If False, a different random order is used for each epoch.
  • restore (bool) – if True, restore the model from the most recent checkpoint and continue training from there. If False, retrain the model from scratch.
  • submodel (Submodel) – an alternate training objective to use. This should have been created by calling create_submodel().
fit_gan(batches, generator_steps=1.0, max_checkpoints_to_keep=5, checkpoint_interval=1000, restore=False)[source]

Train this model on data.

Parameters:
  • batches (iterable) – batches of data to train the discriminator on, each represented as a dict that maps Layers to values. It should specify values for all members of data_inputs and conditional_inputs.
  • generator_steps (float) – the number of training steps to perform for the generator for each batch. This can be used to adjust the ratio of training steps for the generator and discriminator. For example, 2.0 will perform two training steps for every batch, while 0.5 will only perform one training step for every two batches.
  • max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.
  • checkpoint_interval (int) – the frequency at which to write checkpoints, measured in batches. Set this to 0 to disable automatic checkpointing.
  • restore (bool) – if True, restore the model from the most recent checkpoint before training it.
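
A minimal training sketch for the ExampleGAN defined under the class description above. It assumes the automatically created input layers are exposed as gan.data_inputs and gan.conditional_inputs; the toy batch generator and its distribution are purely illustrative.

>>> import numpy as np
>>> def iterbatches(n_batches, batch_size=100):
...   for _ in range(n_batches):
...     means = 10 * np.random.random([batch_size, 1])
...     values = np.random.normal(means, scale=2.0)
...     yield {gan.data_inputs[0]: values, gan.conditional_inputs[0]: means}
>>> gan.fit_gan(iterbatches(5000), generator_steps=0.5, checkpoint_interval=0)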
fit_generator(feed_dict_generator, max_checkpoints_to_keep=5, checkpoint_interval=1000, restore=False, submodel=None)

Train this model on data from a generator.

Parameters:
  • feed_dict_generator (generator) – this should generate batches, each represented as a dict that maps Layers to values.
  • max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.
  • checkpoint_interval (int) – the frequency at which to write checkpoints, measured in training steps. Set this to 0 to disable automatic checkpointing.
  • restore (bool) – if True, restore the model from the most recent checkpoint and continue training from there. If False, retrain the model from scratch.
  • submodel (Submodel) – an alternate training objective to use. This should have been created by calling create_submodel().
Returns:

Return type:

the average loss over the most recent checkpoint interval

fit_on_batch(X, y, w, submodel=None)
get_checkpoints()

Get a list of all available checkpoint files.

get_conditional_input_shapes()[source]

Get the shapes of any conditional inputs.

Subclasses may override this to return a list of tuples, each giving the shape of one of the conditional inputs. The actual Input layers will be created automatically. The first dimension of each shape must be None, since it will correspond to the batch size.

The default implementation returns an empty list, meaning there are no conditional inputs.

get_data_input_shapes()[source]

Get the shapes of the inputs for training data.

Subclasses must override this to return a list of tuples, each giving the shape of one of the inputs. The actual Input layers will be created automatically. This list of shapes must also match the shapes of the generator’s outputs. The first dimension of each shape must be None, since it will correspond to the batch size.

get_global_step()
get_layer_variables(layer)

Get the list of trainable variables in a layer of the graph.

get_model_filename(model_dir)

Given model directory, obtain filename for the model itself.

get_noise_batch(batch_size)[source]

Get a batch of random noise to pass to the generator.

This should return a NumPy array whose shape matches the one returned by get_noise_input_shape(). The default implementation returns normally distributed values. Subclasses can override this to implement a different distribution.

get_noise_input_shape()[source]

Get the shape of the generator’s noise input layer.

Subclasses must override this to return a tuple giving the shape of the noise input. The actual Input layer will be created automatically. The first dimension must be None, since it will correspond to the batch size.

get_num_tasks()
get_params(deep=True)

Get parameters for this estimator.

Parameters:deep (boolean, optional) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns:params – Parameter names mapped to their values.
Return type:mapping of string to any
get_params_filename(model_dir)

Given model directory, obtain filename for the model parameters.

get_pickling_errors(obj, seen=None)
get_pre_q_input(input_layer)
get_task_type()

Currently models can only be classifiers or regressors.

load_from_dir(model_dir, restore=True)
predict(dataset, transformers=[], outputs=None)

Uses self to make predictions on provided Dataset object.

Parameters:
  • dataset (dc.data.Dataset) – Dataset to make prediction on
  • transformers (list) – List of dc.trans.Transformers.
  • outputs (object) – If outputs is None, then will assume outputs = self.outputs[0] (single output). If outputs is a Layer/Tensor, then will evaluate and return as a single ndarray. If outputs is a list of Layers/Tensors, will return a list of ndarrays.
Returns:

results

Return type:

numpy ndarray or list of numpy ndarrays

predict_gan_generator(batch_size=1, noise_input=None, conditional_inputs=[], generator_index=0)[source]

Use the GAN to generate a batch of samples.

Parameters:
  • batch_size (int) – the number of samples to generate. If either noise_input or conditional_inputs is specified, this argument is ignored since the batch size is then determined by the size of that argument.
  • noise_input (array) – the value to use for the generator’s noise input. If None (the default), get_noise_batch() is called to generate a random input, so each call will produce a new set of samples.
  • conditional_inputs (list of arrays) – the values to use for all conditional inputs. This must be specified if the GAN has any conditional inputs.
  • generator_index (int) – the index of the generator (between 0 and n_generators-1) to use for generating the samples.
Returns:

  • An array (if the generator has only one output) or list of arrays (if it has
  • multiple outputs) containing the generated samples.
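
Continuing the sketch above, this draws samples conditioned on a batch of hypothetical conditional values:

>>> means = 10 * np.random.random([1000, 1])
>>> samples = gan.predict_gan_generator(conditional_inputs=[means])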

predict_on_batch(X, transformers=[], outputs=None)

Generates predictions for input samples, processing samples in a batch.

Parameters:
  • X (ndarray) – the input data, as a Numpy array.
  • transformers (List) – List of dc.trans.Transformers
Returns:

Return type:

A Numpy array of predictions.

predict_on_generator(generator, transformers=[], outputs=None)
Parameters:
  • generator (Generator) – Generator that constructs feed dictionaries for TensorGraph.
  • transformers (list) – List of dc.trans.Transformers.
  • outputs (object) – If outputs is None, then will assume outputs = self.outputs. If outputs is a Layer/Tensor, then will evaluate and return as a single ndarray. If outputs is a list of Layers/Tensors, will return a list of ndarrays.
Returns:

y_pred – numpy ndarray of shape (n_samples, n_classes*n_tasks)

predict_proba(dataset, transformers=[], outputs=None)
Parameters:
  • dataset (dc.data.Dataset) – Dataset to make prediction on
  • transformers (list) – List of dc.trans.Transformers.
  • outputs (object) – If outputs is None, then will assume outputs = self.outputs[0] (single output). If outputs is a Layer/Tensor, then will evaluate and return as a single ndarray. If outputs is a list of Layers/Tensors, will return a list of ndarrays.
Returns:

y_pred

Return type:

numpy ndarray or list of numpy ndarrays

predict_proba_on_batch(X, transformers=[], outputs=None)

Generates predictions for input samples, processing samples in a batch.

Parameters:
  • X (ndarray) – the input data, as a Numpy array.
  • transformers (List) – List of dc.trans.Transformers
Returns:

Return type:

A Numpy array of predictions.

predict_proba_on_generator(generator, transformers=[], outputs=None)
Returns:y_pred
Return type:numpy ndarray of shape (n_samples, n_classes*n_tasks)
reload()

Reload trained model from disk.

restore(checkpoint=None)

Reload the values of all variables from a checkpoint file.

Parameters:checkpoint (str) – the path to the checkpoint file to load. If this is None, the most recent checkpoint will be chosen automatically. Call get_checkpoints() to get a list of all available checkpoints.
save()
save_checkpoint(max_checkpoints_to_keep=5)

Save a checkpoint to disk.

Usually you do not need to call this method, since fit() saves checkpoints automatically. If you have disabled automatic checkpointing during fitting, this can be called to manually write checkpoints.

Parameters:max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.
set_loss(layer)
set_optimizer(optimizer)

Set the optimizer to use for fitting.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns:
Return type:self
topsort()
class deepchem.models.tensorgraph.models.gan.GradientPenaltyLayer(discrim_output_train, gan)[source]

Bases: deepchem.models.tensorgraph.layers.Layer

Implements the gradient penalty loss term for WGANs.

add_summary_to_tg()

Can only be called after self.create_layer to guarantee that name is not None.

clone(in_layers)

Create a copy of this layer with different inputs.

copy(replacements={}, variables_graph=None, shared=False)

Duplicate this Layer and all its inputs.

This is similar to clone(), but instead of only cloning one layer, it also recursively calls copy() on all of this layer’s inputs to clone the entire hierarchy of layers. In the process, you can optionally tell it to replace particular layers with specific existing ones. For example, you can clone a stack of layers, while connecting the topmost ones to different inputs.

For example, consider a stack of dense layers that depend on an input:

>>> input = Feature(shape=(None, 100))
>>> dense1 = Dense(100, in_layers=input)
>>> dense2 = Dense(100, in_layers=dense1)
>>> dense3 = Dense(100, in_layers=dense2)

The following will clone all three dense layers, but not the input layer. Instead, the input to the first dense layer will be a different layer specified in the replacements map.

>>> new_input = Feature(shape=(None, 100))
>>> replacements = {input: new_input}
>>> dense3_copy = dense3.copy(replacements)
Parameters:
  • replacements (map) – specifies existing layers, and the layers to replace them with (instead of cloning them). This argument serves two purposes. First, you can pass in a list of replacements to control which layers get cloned. In addition, as each layer is cloned, it is added to this map. On exit, it therefore contains a complete record of all layers that were copied, and a reference to the copy of each one.
  • variables_graph (TensorGraph) – an optional TensorGraph from which to take variables. If this is specified, the current value of each variable in each layer is recorded, and the copy has that value specified as its initial value. This allows a piece of a pre-trained model to be copied to another model.
  • shared (bool) – if True, create new layers by calling shared() on the input layers. This means the newly created layers will share variables with the original ones.
create_tensor(in_layers=None, set_tensors=True, **kwargs)[source]
layer_number_dict = {}
none_tensors()
set_summary(summary_op, summary_description=None, collections=None)

Annotates a tensor with a tf.summary operation. Collects data from self.out_tensor by default, but this can be changed by setting self.tb_input to another tensor in create_tensor.

Parameters:
  • summary_op (str) – summary operation to annotate node
  • summary_description (object, optional) – Optional summary_pb2.SummaryDescription()
  • collections (list of graph collections keys, optional) – New summary op is added to these collections. Defaults to [GraphKeys.SUMMARIES]
set_tensors(tensor)
set_variable_initial_values(values)

Set the initial values of all variables.

This takes a list, which contains the initial values to use for all of this layer’s variables (in the same order returned by TensorGraph.get_layer_variables()). When this layer is used in a TensorGraph, it will automatically initialize each variable to the value specified in the list. Note that some layers also have separate mechanisms for specifying variable initializers; this method overrides them. The purpose of this method is to let a Layer object represent a pre-trained layer, complete with trained values for its variables.

shape

Get the shape of this Layer’s output.

shared(in_layers)

Create a copy of this layer that shares variables with it.

This is similar to clone(), but where clone() creates two independent layers, this causes the layers to share variables with each other.

Parameters:
  • in_layers (list of Layers) – the input layers for the shared copy of this layer.
Returns:

Return type:

Layer

class deepchem.models.tensorgraph.models.gan.WGAN(gradient_penalty=10.0, **kwargs)[source]

Bases: deepchem.models.tensorgraph.models.gan.GAN

Implements Wasserstein Generative Adversarial Networks.

This class implements Wasserstein Generative Adversarial Networks (WGANs) as described in Arjovsky et al., “Wasserstein GAN” (https://arxiv.org/abs/1701.07875). A WGAN is conceptually rather different from a conventional GAN, but in practical terms very similar. It reinterprets the discriminator (often called the “critic” in this context) as learning an approximation to the Earth Mover distance between the training and generated distributions. The generator is then trained to minimize that distance. In practice, this just means using slightly different loss functions for training the generator and discriminator.

WGANs have theoretical advantages over conventional GANs, and they often work better in practice. In addition, the discriminator’s loss function can be directly interpreted as a measure of the quality of the model. That is an advantage over conventional GANs, where the loss does not directly convey information about the quality of the model.

The theory WGANs are based on requires the discriminator’s gradient to be bounded. The original paper achieved this by clipping its weights. This class instead does it by adding a penalty term to the discriminator’s loss, as described in https://arxiv.org/abs/1704.00028. This is sometimes found to produce better results.

There are a few other practical differences between GANs and WGANs. In a conventional GAN, the discriminator’s output must be between 0 and 1 so it can be interpreted as a probability. In a WGAN, it should produce an unbounded output that can be interpreted as a distance.

When training a WGAN, you also should usually use a smaller value for generator_steps. Conventional GANs rely on keeping the generator and discriminator “in balance” with each other. If the discriminator ever gets too good, it becomes impossible for the generator to fool it and training stalls. WGANs do not have this problem, and in fact the better the discriminator is, the easier it is for the generator to improve. It therefore usually works best to perform several training steps on the discriminator for each training step on the generator.
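
For example, a WGAN subclass (here a hypothetical ExampleWGAN, defined like the ExampleGAN sketch above but with an unbounded, linear discriminator output) might be trained with five discriminator steps per generator step by reusing the same batch generator:

>>> wgan = ExampleWGAN(learning_rate=0.01, gradient_penalty=10.0)
>>> wgan.fit_gan(iterbatches(10000), generator_steps=0.2, checkpoint_interval=5000)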

add_output(layer)
build()
create_discriminator(data_inputs, conditional_inputs)

Create the discriminator.

Subclasses must override this to construct the discriminator and return its output layer.

Parameters:
  • data_inputs (list) – the Input layers from which the discriminator can read the input data. The number and shapes of these inputs will match the return value from get_data_input_shapes(). The samples read from these layers may be either training data or generated data.
  • conditional_inputs (list) – the Input layers for any conditional inputs to the network. The number and shapes of these inputs will match the return value from get_conditional_input_shapes().
Returns:

  • A Layer object that outputs the probability of each sample being a training
  • sample. The shape of this layer must be [None]. That is, it must output a
  • one dimensional tensor whose length equals the batch size.

create_discriminator_loss(discrim_output_train, discrim_output_gen)[source]
create_generator(noise_input, conditional_inputs)

Create the generator.

Subclasses must override this to construct the generator and return its output layers.

Parameters:
  • noise_input (Input) – the Input layer from which the generator can read random noise. The shape will match the return value from get_noise_input_shape().
  • conditional_inputs (list) – the Input layers for any conditional inputs to the network. The number and shapes of these inputs will match the return value from get_conditional_input_shapes().
Returns:

  • A list of Layer objects that produce the generator’s outputs. The number and
  • shapes of these layers must match the return value from get_data_input_shapes(),
  • since generated data must have the same form as training data.

create_generator_loss(discrim_output)[source]
create_submodel(layers=None, loss=None, optimizer=None)

Create an alternate objective for training one piece of a TensorGraph.

A TensorGraph consists of a set of layers, and specifies a loss function and optimizer to use for training those layers. Usually this is sufficient, but there are cases where you want to train different parts of a model separately. For example, a GAN consists of a generator and a discriminator. They are trained separately, and they use different loss functions.

A submodel defines an alternate objective to use in cases like this. It may optionally specify any of the following: a subset of layers in the model to train; a different loss function; and a different optimizer to use. This method creates a submodel, which you can then pass to fit() to use it for training.

Parameters:
  • layers (list) – the list of layers to train. If None, all layers in the model will be trained.
  • loss (Layer) – the loss function to optimize. If None, the model’s main loss function will be used.
  • optimizer (Optimizer) – the optimizer to use for training. If None, the model’s main optimizer will be used.
Returns:

  • the newly created submodel, which can be passed to any of the fitting
  • methods.

default_generator(dataset, epochs=1, predict=False, deterministic=True, pad_batches=True)
evaluate(dataset, metrics, transformers=[], per_task_metrics=False)

Evaluates the performance of this model on specified dataset.

Parameters:
  • dataset (dc.data.Dataset) – Dataset object.
  • metrics (list of deepchem.metrics.Metric) – Evaluation metrics.
  • transformers (list) – List of deepchem.transformers.Transformer
  • per_task_metrics (bool) – If True, return per-task scores.
Returns:

Maps tasks to scores under metric.

Return type:

dict

evaluate_generator(feed_dict_generator, metrics, transformers=[], labels=None, outputs=None, weights=[], per_task_metrics=False)
fit(dataset, nb_epoch=10, max_checkpoints_to_keep=5, checkpoint_interval=1000, deterministic=False, restore=False, submodel=None, **kwargs)

Train this model on a dataset.

Parameters:
  • dataset (Dataset) – the Dataset to train on
  • nb_epoch (int) – the number of epochs to train for
  • max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.
  • checkpoint_interval (int) – the frequency at which to write checkpoints, measured in training steps. Set this to 0 to disable automatic checkpointing.
  • deterministic (bool) – if True, the samples are processed in order. If False, a different random order is used for each epoch.
  • restore (bool) – if True, restore the model from the most recent checkpoint and continue training from there. If False, retrain the model from scratch.
  • submodel (Submodel) – an alternate training objective to use. This should have been created by calling create_submodel().
fit_gan(batches, generator_steps=1.0, max_checkpoints_to_keep=5, checkpoint_interval=1000, restore=False)

Train this model on data.

Parameters:
  • batches (iterable) – batches of data to train the discriminator on, each represented as a dict that maps Layers to values. It should specify values for all members of data_inputs and conditional_inputs.
  • generator_steps (float) – the number of training steps to perform for the generator for each batch. This can be used to adjust the ratio of training steps for the generator and discriminator. For example, 2.0 will perform two training steps for every batch, while 0.5 will only perform one training step for every two batches.
  • max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.
  • checkpoint_interval (int) – the frequency at which to write checkpoints, measured in batches. Set this to 0 to disable automatic checkpointing.
  • restore (bool) – if True, restore the model from the most recent checkpoint before training it.
fit_generator(feed_dict_generator, max_checkpoints_to_keep=5, checkpoint_interval=1000, restore=False, submodel=None)

Train this model on data from a generator.

Parameters:
  • feed_dict_generator (generator) – this should generate batches, each represented as a dict that maps Layers to values.
  • max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.
  • checkpoint_interval (int) – the frequency at which to write checkpoints, measured in training steps. Set this to 0 to disable automatic checkpointing.
  • restore (bool) – if True, restore the model from the most recent checkpoint and continue training from there. If False, retrain the model from scratch.
  • submodel (Submodel) – an alternate training objective to use. This should have been created by calling create_submodel().
Returns:

Return type:

the average loss over the most recent checkpoint interval

fit_on_batch(X, y, w, submodel=None)
get_checkpoints()

Get a list of all available checkpoint files.

get_conditional_input_shapes()

Get the shapes of any conditional inputs.

Subclasses may override this to return a list of tuples, each giving the shape of one of the conditional inputs. The actual Input layers will be created automatically. The first dimension of each shape must be None, since it will correspond to the batch size.

The default implementation returns an empty list, meaning there are no conditional inputs.

get_data_input_shapes()

Get the shapes of the inputs for training data.

Subclasses must override this to return a list of tuples, each giving the shape of one of the inputs. The actual Input layers will be created automatically. This list of shapes must also match the shapes of the generator’s outputs. The first dimension of each shape must be None, since it will correspond to the batch size.

get_global_step()
get_layer_variables(layer)

Get the list of trainable variables in a layer of the graph.

get_model_filename(model_dir)

Given model directory, obtain filename for the model itself.

get_noise_batch(batch_size)

Get a batch of random noise to pass to the generator.

This should return a NumPy array whose shape matches the one returned by get_noise_input_shape(). The default implementation returns normally distributed values. Subclasses can override this to implement a different distribution.

get_noise_input_shape()

Get the shape of the generator’s noise input layer.

Subclasses must override this to return a tuple giving the shape of the noise input. The actual Input layer will be created automatically. The first dimension must be None, since it will correspond to the batch size.

get_num_tasks()
get_params(deep=True)

Get parameters for this estimator.

Parameters:deep (boolean, optional) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns:params – Parameter names mapped to their values.
Return type:mapping of string to any
get_params_filename(model_dir)

Given model directory, obtain filename for the model parameters.

get_pickling_errors(obj, seen=None)
get_pre_q_input(input_layer)
get_task_type()

Currently models can only be classifiers or regressors.

load_from_dir(model_dir, restore=True)
predict(dataset, transformers=[], outputs=None)

Uses self to make predictions on provided Dataset object.

Parameters:
  • dataset (dc.data.Dataset) – Dataset to make prediction on
  • transformers (list) – List of dc.trans.Transformers.
  • outputs (object) – If outputs is None, then will assume outputs = self.outputs[0] (single output). If outputs is a Layer/Tensor, then will evaluate and return as a single ndarray. If outputs is a list of Layers/Tensors, will return a list of ndarrays.
Returns:

results

Return type:

numpy ndarray or list of numpy ndarrays

predict_gan_generator(batch_size=1, noise_input=None, conditional_inputs=[], generator_index=0)

Use the GAN to generate a batch of samples.

Parameters:
  • batch_size (int) – the number of samples to generate. If either noise_input or conditional_inputs is specified, this argument is ignored since the batch size is then determined by the size of that argument.
  • noise_input (array) – the value to use for the generator’s noise input. If None (the default), get_noise_batch() is called to generate a random input, so each call will produce a new set of samples.
  • conditional_inputs (list of arrays) – the values to use for all conditional inputs. This must be specified if the GAN has any conditional inputs.
  • generator_index (int) – the index of the generator (between 0 and n_generators-1) to use for generating the samples.
Returns:

  • An array (if the generator has only one output) or list of arrays (if it has
  • multiple outputs) containing the generated samples.

predict_on_batch(X, transformers=[], outputs=None)

Generates predictions for input samples, processing samples in a batch.

Parameters:
  • X (ndarray) – the input data, as a Numpy array.
  • transformers (List) – List of dc.trans.Transformers
Returns:

Return type:

A Numpy array of predictions.

predict_on_generator(generator, transformers=[], outputs=None)
Parameters:
  • generator (Generator) – Generator that constructs feed dictionaries for TensorGraph.
  • transformers (list) – List of dc.trans.Transformers.
  • outputs (object) – If outputs is None, then will assume outputs = self.outputs. If outputs is a Layer/Tensor, then will evaluate and return as a single ndarray. If outputs is a list of Layers/Tensors, will return a list of ndarrays.
Returns:

y_pred – numpy ndarray of shape (n_samples, n_classes*n_tasks)

predict_proba(dataset, transformers=[], outputs=None)
Parameters:
  • dataset (dc.data.Dataset) – Dataset to make prediction on
  • transformers (list) – List of dc.trans.Transformers.
  • outputs (object) – If outputs is None, then will assume outputs = self.outputs[0] (single output). If outputs is a Layer/Tensor, then will evaluate and return as a single ndarray. If outputs is a list of Layers/Tensors, will return a list of ndarrays.
Returns:

y_pred

Return type:

numpy ndarray or list of numpy ndarrays

predict_proba_on_batch(X, transformers=[], outputs=None)

Generates predictions for input samples, processing samples in a batch.

Parameters:
  • X (ndarray) – the input data, as a Numpy array.
  • transformers (List) – List of dc.trans.Transformers
Returns:

Return type:

A Numpy array of predictions.

predict_proba_on_generator(generator, transformers=[], outputs=None)
Returns:y_pred
Return type:numpy ndarray of shape (n_samples, n_classes*n_tasks)
reload()

Reload trained model from disk.

restore(checkpoint=None)

Reload the values of all variables from a checkpoint file.

Parameters:checkpoint (str) – the path to the checkpoint file to load. If this is None, the most recent checkpoint will be chosen automatically. Call get_checkpoints() to get a list of all available checkpoints.
save()
save_checkpoint(max_checkpoints_to_keep=5)

Save a checkpoint to disk.

Usually you do not need to call this method, since fit() saves checkpoints automatically. If you have disabled automatic checkpointing during fitting, this can be called to manually write checkpoints.

Parameters:max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.
set_loss(layer)
set_optimizer(optimizer)

Set the optimizer to use for fitting.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns:
Return type:self
topsort()

deepchem.models.tensorgraph.models.graph_models module

class deepchem.models.tensorgraph.models.graph_models.DAGTensorGraph(n_tasks, max_atoms=50, n_atom_feat=75, n_graph_feat=30, n_outputs=30, mode='classification', **kwargs)[source]

Bases: deepchem.models.tensorgraph.tensor_graph.TensorGraph

add_output(layer)
build()
build_graph()[source]

Building graph structures: Features => DAGLayer => DAGGather => Classification or Regression
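
A hedged usage sketch, assuming dataset holds ConvMol-featurized molecules; DAG models additionally need the DAGTransformer to order atoms before training:

>>> import deepchem as dc
>>> from deepchem.models.tensorgraph.models.graph_models import DAGTensorGraph
>>> max_atoms = max(mol.get_num_atoms() for mol in dataset.X)
>>> transformer = dc.trans.DAGTransformer(max_atoms=max_atoms)
>>> dataset = transformer.transform(dataset)
>>> model = DAGTensorGraph(n_tasks=1, max_atoms=max_atoms, mode='regression', batch_size=64)
>>> model.fit(dataset, nb_epoch=20)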

create_submodel(layers=None, loss=None, optimizer=None)

Create an alternate objective for training one piece of a TensorGraph.

A TensorGraph consists of a set of layers, and specifies a loss function and optimizer to use for training those layers. Usually this is sufficient, but there are cases where you want to train different parts of a model separately. For example, a GAN consists of a generator and a discriminator. They are trained separately, and they use different loss functions.

A submodel defines an alternate objective to use in cases like this. It may optionally specify any of the following: a subset of layers in the model to train; a different loss function; and a different optimizer to use. This method creates a submodel, which you can then pass to fit() to use it for training.

Parameters:
  • layers (list) – the list of layers to train. If None, all layers in the model will be trained.
  • loss (Layer) – the loss function to optimize. If None, the model’s main loss function will be used.
  • optimizer (Optimizer) – the optimizer to use for training. If None, the model’s main optimizer will be used.
Returns:

  • the newly created submodel, which can be passed to any of the fitting
  • methods.

default_generator(dataset, epochs=1, predict=False, deterministic=True, pad_batches=True)[source]

TensorGraph style implementation

evaluate(dataset, metrics, transformers=[], per_task_metrics=False)

Evaluates the performance of this model on specified dataset.

Parameters:
  • dataset (dc.data.Dataset) – Dataset object.
  • metrics (list of deepchem.metrics.Metric) – Evaluation metrics.
  • transformers (list) – List of deepchem.transformers.Transformer
  • per_task_metrics (bool) – If True, return per-task scores.
Returns:

Maps tasks to scores under metric.

Return type:

dict

evaluate_generator(feed_dict_generator, metrics, transformers=[], labels=None, outputs=None, weights=[], per_task_metrics=False)
fit(dataset, nb_epoch=10, max_checkpoints_to_keep=5, checkpoint_interval=1000, deterministic=False, restore=False, submodel=None, **kwargs)

Train this model on a dataset.

Parameters:
  • dataset (Dataset) – the Dataset to train on
  • nb_epoch (int) – the number of epochs to train for
  • max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.
  • checkpoint_interval (int) – the frequency at which to write checkpoints, measured in training steps. Set this to 0 to disable automatic checkpointing.
  • deterministic (bool) – if True, the samples are processed in order. If False, a different random order is used for each epoch.
  • restore (bool) – if True, restore the model from the most recent checkpoint and continue training from there. If False, retrain the model from scratch.
  • submodel (Submodel) – an alternate training objective to use. This should have been created by calling create_submodel().
fit_generator(feed_dict_generator, max_checkpoints_to_keep=5, checkpoint_interval=1000, restore=False, submodel=None)

Train this model on data from a generator.

Parameters:
  • feed_dict_generator (generator) – this should generate batches, each represented as a dict that maps Layers to values.
  • max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.
  • checkpoint_interval (int) – the frequency at which to write checkpoints, measured in training steps. Set this to 0 to disable automatic checkpointing.
  • restore (bool) – if True, restore the model from the most recent checkpoint and continue training from there. If False, retrain the model from scratch.
  • submodel (Submodel) – an alternate training objective to use. This should have been created by calling create_submodel().
Returns:

Return type:

the average loss over the most recent checkpoint interval

fit_on_batch(X, y, w, submodel=None)
get_checkpoints()

Get a list of all available checkpoint files.

get_global_step()
get_layer_variables(layer)

Get the list of trainable variables in a layer of the graph.

get_model_filename(model_dir)

Given model directory, obtain filename for the model itself.

get_num_tasks()
get_params(deep=True)

Get parameters for this estimator.

Parameters:deep (boolean, optional) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns:params – Parameter names mapped to their values.
Return type:mapping of string to any
get_params_filename(model_dir)

Given model directory, obtain filename for the model parameters.

get_pickling_errors(obj, seen=None)
get_pre_q_input(input_layer)
get_task_type()

Currently models can only be classifiers or regressors.

load_from_dir(model_dir, restore=True)
predict(dataset, transformers=[], outputs=None)

Uses self to make predictions on provided Dataset object.

Parameters:
  • dataset (dc.data.Dataset) – Dataset to make prediction on
  • transformers (list) – List of dc.trans.Transformers.
  • outputs (object) – If outputs is None, then will assume outputs = self.outputs[0] (single output). If outputs is a Layer/Tensor, then will evaluate and return as a single ndarray. If outputs is a list of Layers/Tensors, will return a list of ndarrays.
Returns:

results

Return type:

numpy ndarray or list of numpy ndarrays

predict_on_batch(X, transformers=[], outputs=None)

Generates predictions for input samples, processing samples in a batch.

Parameters:
  • X (ndarray) – the input data, as a Numpy array.
  • transformers (List) – List of dc.trans.Transformers
Returns:

Return type:

A Numpy array of predictions.

predict_on_generator(generator, transformers=[], outputs=None)[source]
predict_proba(dataset, transformers=[], outputs=None)
Parameters:
  • dataset (dc.data.Dataset) – Dataset to make prediction on
  • transformers (list) – List of dc.trans.Transformers.
  • outputs (object) – If outputs is None, then will assume outputs = self.outputs[0] (single output). If outputs is a Layer/Tensor, then will evaluate and return as a single ndarray. If outputs is a list of Layers/Tensors, will return a list of ndarrays.
Returns:

y_pred

Return type:

numpy ndarray or list of numpy ndarrays

predict_proba_on_batch(X, transformers=[], outputs=None)

Generates predictions for input samples, processing samples in a batch.

Parameters:
  • X (ndarray) – the input data, as a Numpy array.
  • transformers (List) – List of dc.trans.Transformers
Returns:

Return type:

A Numpy array of predictions.

predict_proba_on_generator(generator, transformers=[], outputs=None)
Returns:y_pred
Return type:numpy ndarray of shape (n_samples, n_classes*n_tasks)
reload()

Reload trained model from disk.

restore(checkpoint=None)

Reload the values of all variables from a checkpoint file.

Parameters:checkpoint (str) – the path to the checkpoint file to load. If this is None, the most recent checkpoint will be chosen automatically. Call get_checkpoints() to get a list of all available checkpoints.
save()
save_checkpoint(max_checkpoints_to_keep=5)

Save a checkpoint to disk.

Usually you do not need to call this method, since fit() saves checkpoints automatically. If you have disabled automatic checkpointing during fitting, this can be called to manually write checkpoints.

Parameters:max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.
set_loss(layer)
set_optimizer(optimizer)

Set the optimizer to use for fitting.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns:
Return type:self
topsort()
class deepchem.models.tensorgraph.models.graph_models.DTNNTensorGraph(n_tasks, n_embedding=30, n_hidden=100, n_distance=100, distance_min=-1, distance_max=18, output_activation=True, mode='regression', **kwargs)[source]

Bases: deepchem.models.tensorgraph.tensor_graph.TensorGraph

add_output(layer)
build()
build_graph()[source]

Building graph structures: Features => DTNNEmbedding => DTNNStep => DTNNStep => DTNNGather => Regression
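
A hedged usage sketch, assuming dataset holds Coulomb-matrix featurized molecules (e.g. from dc.feat.CoulombMatrix) with real-valued targets, and tasks is the list of task names:

>>> from deepchem.models.tensorgraph.models.graph_models import DTNNTensorGraph
>>> model = DTNNTensorGraph(n_tasks=len(tasks), n_embedding=30, n_hidden=100,
...                         mode='regression', batch_size=16)
>>> model.fit(dataset, nb_epoch=50)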

create_submodel(layers=None, loss=None, optimizer=None)

Create an alternate objective for training one piece of a TensorGraph.

A TensorGraph consists of a set of layers, and specifies a loss function and optimizer to use for training those layers. Usually this is sufficient, but there are cases where you want to train different parts of a model separately. For example, a GAN consists of a generator and a discriminator. They are trained separately, and they use different loss functions.

A submodel defines an alternate objective to use in cases like this. It may optionally specify any of the following: a subset of layers in the model to train; a different loss function; and a different optimizer to use. This method creates a submodel, which you can then pass to fit() to use it for training.

Parameters:
  • layers (list) – the list of layers to train. If None, all layers in the model will be trained.
  • loss (Layer) – the loss function to optimize. If None, the model’s main loss function will be used.
  • optimizer (Optimizer) – the optimizer to use for training. If None, the model’s main optimizer will be used.
Returns:

  • the newly created submodel, which can be passed to any of the fitting
  • methods.

default_generator(dataset, epochs=1, predict=False, deterministic=True, pad_batches=True)[source]

TensorGraph style implementation

evaluate(dataset, metrics, transformers=[], per_task_metrics=False)

Evaluates the performance of this model on specified dataset.

Parameters:
  • dataset (dc.data.Dataset) – Dataset object.
  • metrics (list of deepchem.metrics.Metric) – Evaluation metrics.
  • transformers (list) – List of deepchem.transformers.Transformer
  • per_task_metrics (bool) – If True, return per-task scores.
Returns:

Maps tasks to scores under metric.

Return type:

dict

evaluate_generator(feed_dict_generator, metrics, transformers=[], labels=None, outputs=None, weights=[], per_task_metrics=False)
fit(dataset, nb_epoch=10, max_checkpoints_to_keep=5, checkpoint_interval=1000, deterministic=False, restore=False, submodel=None, **kwargs)

Train this model on a dataset.

Parameters:
  • dataset (Dataset) – the Dataset to train on
  • nb_epoch (int) – the number of epochs to train for
  • max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.
  • checkpoint_interval (int) – the frequency at which to write checkpoints, measured in training steps. Set this to 0 to disable automatic checkpointing.
  • deterministic (bool) – if True, the samples are processed in order. If False, a different random order is used for each epoch.
  • restore (bool) – if True, restore the model from the most recent checkpoint and continue training from there. If False, retrain the model from scratch.
  • submodel (Submodel) – an alternate training objective to use. This should have been created by calling create_submodel().
fit_generator(feed_dict_generator, max_checkpoints_to_keep=5, checkpoint_interval=1000, restore=False, submodel=None)

Train this model on data from a generator.

Parameters:
  • feed_dict_generator (generator) – this should generate batches, each represented as a dict that maps Layers to values.
  • max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.
  • checkpoint_interval (int) – the frequency at which to write checkpoints, measured in training steps. Set this to 0 to disable automatic checkpointing.
  • restore (bool) – if True, restore the model from the most recent checkpoint and continue training from there. If False, retrain the model from scratch.
  • submodel (Submodel) – an alternate training objective to use. This should have been created by calling create_submodel().
Returns:

Return type:

the average loss over the most recent checkpoint interval

fit_on_batch(X, y, w, submodel=None)
get_checkpoints()

Get a list of all available checkpoint files.

get_global_step()
get_layer_variables(layer)

Get the list of trainable variables in a layer of the graph.

get_model_filename(model_dir)

Given model directory, obtain filename for the model itself.

get_num_tasks()
get_params(deep=True)

Get parameters for this estimator.

Parameters:deep (boolean, optional) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns:params – Parameter names mapped to their values.
Return type:mapping of string to any
get_params_filename(model_dir)

Given model directory, obtain filename for the model parameters.

get_pickling_errors(obj, seen=None)
get_pre_q_input(input_layer)
get_task_type()

Currently models can only be classifiers or regressors.

load_from_dir(model_dir, restore=True)
predict(dataset, transformers=[], outputs=None)[source]
predict_on_batch(X, transformers=[], outputs=None)

Generates predictions for input samples, processing samples in a batch.

Parameters:
  • X (ndarray) – the input data, as a Numpy array.
  • transformers (List) – List of dc.trans.Transformers
Returns:

Return type:

A Numpy array of predictions.

predict_on_generator(generator, transformers=[], outputs=None)
Parameters:
  • generator (Generator) – Generator that constructs feed dictionaries for TensorGraph.
  • transformers (list) – List of dc.trans.Transformers.
  • outputs (object) – If outputs is None, then will assume outputs = self.outputs. If outputs is a Layer/Tensor, then will evaluate and return as a single ndarray. If outputs is a list of Layers/Tensors, will return a list of ndarrays.
Returns:

y_pred – numpy ndarray of shape (n_samples, n_classes*n_tasks)

predict_proba(dataset, transformers=[], outputs=None)
Parameters:
  • dataset (dc.data.Dataset) – Dataset to make prediction on
  • transformers (list) – List of dc.trans.Transformers.
  • outputs (object) – If outputs is None, then will assume outputs = self.outputs[0] (single output). If outputs is a Layer/Tensor, then will evaluate and return as a single ndarray. If outputs is a list of Layers/Tensors, will return a list of ndarrays.
Returns:

y_pred

Return type:

numpy ndarray or list of numpy ndarrays

predict_proba_on_batch(X, transformers=[], outputs=None)

Generates predictions for input samples, processing samples in a batch.

Parameters:
  • X (ndarray) – the input data, as a Numpy array.
  • transformers (List) – List of dc.trans.Transformers
Returns:

Return type:

A Numpy array of predictions.

predict_proba_on_generator(generator, transformers=[], outputs=None)
Returns:numpy ndarray of shape (n_samples, n_classes*n_tasks)
Return type:y_pred
reload()

Reload trained model from disk.

restore(checkpoint=None)

Reload the values of all variables from a checkpoint file.

Parameters:checkpoint (str) – the path to the checkpoint file to load. If this is None, the most recent checkpoint will be chosen automatically. Call get_checkpoints() to get a list of all available checkpoints.
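
For example (a sketch assuming a previously fitted model whose checkpoints live in its model directory; model is a hypothetical TensorGraph instance):

>>> checkpoints = model.get_checkpoints()       # all available checkpoint files
>>> model.restore()                             # load the most recent checkpoint
>>> model.restore(checkpoint=checkpoints[0])    # or load a specific one
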
save()
save_checkpoint(max_checkpoints_to_keep=5)

Save a checkpoint to disk.

Usually you do not need to call this method, since fit() saves checkpoints automatically. If you have disabled automatic checkpointing during fitting, this can be called to manually write checkpoints.

Parameters:max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.
set_loss(layer)
set_optimizer(optimizer)

Set the optimizer to use for fitting.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns:
Return type:self
topsort()
class deepchem.models.tensorgraph.models.graph_models.GraphConvTensorGraph(n_tasks, graph_conv_layers=[64, 64], dense_layer_size=128, dropout=0.0, mode='classification', **kwargs)[source]

Bases: deepchem.models.tensorgraph.tensor_graph.TensorGraph
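
This class has no narrative docstring, so the following is only a minimal usage sketch; train_dataset is a hypothetical dc.data.Dataset, assumed to be featurized with dc.feat.ConvMolFeaturizer:

>>> from deepchem.models.tensorgraph.models.graph_models import GraphConvTensorGraph
>>> model = GraphConvTensorGraph(n_tasks=1, graph_conv_layers=[64, 64],
...                              dense_layer_size=128, mode='classification')
>>> model.fit(train_dataset, nb_epoch=10)
>>> predictions = model.predict(train_dataset)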

add_output(layer)
bayesian_predict(dataset, transformers=[], n_passes=4, untransform=False)[source]
Generates predictions and confidences on a dataset object (see https://arxiv.org/pdf/1506.02142.pdf).

Returns:
  • mu – numpy ndarray of shape (n_samples, n_tasks)
  • sigma – numpy ndarray of shape (n_samples, n_tasks)
bayesian_predict_on_batch(X, transformers=[], n_passes=4)[source]
Returns:
  • mu – numpy ndarray of shape (n_samples, n_tasks)
  • sigma – numpy ndarray of shape (n_samples, n_tasks)
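
A sketch of uncertainty estimation with this method; test_dataset is a hypothetical featurized dataset, the model is assumed to be trained, and the (mu, sigma) pair documented above is assumed to be returned as a tuple:

>>> mu, sigma = model.bayesian_predict(test_dataset, n_passes=24)
>>> # mu and sigma both have shape (n_samples, n_tasks)
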
build()
build_graph()[source]

Builds the graph structure for this model.

create_submodel(layers=None, loss=None, optimizer=None)

Create an alternate objective for training one piece of a TensorGraph.

A TensorGraph consists of a set of layers, and specifies a loss function and optimizer to use for training those layers. Usually this is sufficient, but there are cases where you want to train different parts of a model separately. For example, a GAN consists of a generator and a discriminator. They are trained separately, and they use different loss functions.

A submodel defines an alternate objective to use in cases like this. It may optionally specify any of the following: a subset of layers in the model to train; a different loss function; and a different optimizer to use. This method creates a submodel, which you can then pass to fit() to use it for training.

Parameters:
  • layers (list) – the list of layers to train. If None, all layers in the model will be trained.
  • loss (Layer) – the loss function to optimize. If None, the model’s main loss function will be used.
  • optimizer (Optimizer) – the optimizer to use for training. If None, the model’s main optimizer will be used.
Returns:

the newly created submodel, which can be passed to any of the fitting methods.
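
A sketch of training against a submodel; dense1, dense2, alternate_loss and dataset are hypothetical names standing for layers, a loss Layer, and a Dataset defined elsewhere:

>>> submodel = model.create_submodel(layers=[dense1, dense2], loss=alternate_loss)
>>> model.fit(dataset, nb_epoch=5, submodel=submodel)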

default_generator(dataset, epochs=1, predict=False, deterministic=True, pad_batches=True)[source]
evaluate(dataset, metrics, transformers=[], per_task_metrics=False)[source]
evaluate_generator(feed_dict_generator, metrics, transformers=[], labels=None, outputs=None, weights=[], per_task_metrics=False)
fit(dataset, nb_epoch=10, max_checkpoints_to_keep=5, checkpoint_interval=1000, deterministic=False, restore=False, submodel=None, **kwargs)

Train this model on a dataset.

Parameters:
  • dataset (Dataset) – the Dataset to train on
  • nb_epoch (int) – the number of epochs to train for
  • max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.
  • checkpoint_interval (int) – the frequency at which to write checkpoints, measured in training steps. Set this to 0 to disable automatic checkpointing.
  • deterministic (bool) – if True, the samples are processed in order. If False, a different random order is used for each epoch.
  • restore (bool) – if True, restore the model from the most recent checkpoint and continue training from there. If False, retrain the model from scratch.
  • submodel (Submodel) – an alternate training objective to use. This should have been created by calling create_submodel().
fit_generator(feed_dict_generator, max_checkpoints_to_keep=5, checkpoint_interval=1000, restore=False, submodel=None)

Train this model on data from a generator.

Parameters:
  • feed_dict_generator (generator) – this should generate batches, each represented as a dict that maps Layers to values.
  • max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.
  • checkpoint_interval (int) – the frequency at which to write checkpoints, measured in training steps. Set this to 0 to disable automatic checkpointing.
  • restore (bool) – if True, restore the model from the most recent checkpoint and continue training from there. If False, retrain the model from scratch.
  • submodel (Submodel) – an alternate training objective to use. This should have been created by calling create_submodel().
Returns:

Return type:

the average loss over the most recent checkpoint interval
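
Normally fit() handles batching for you; if you call fit_generator() directly, the documented default_generator() produces compatible batches, as in this sketch (train_dataset is a hypothetical featurized dataset):

>>> generator = model.default_generator(train_dataset, epochs=10, deterministic=False)
>>> avg_loss = model.fit_generator(generator, checkpoint_interval=500)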

fit_on_batch(X, y, w, submodel=None)
get_checkpoints()

Get a list of all available checkpoint files.

get_global_step()
get_layer_variables(layer)

Get the list of trainable variables in a layer of the graph.

get_model_filename(model_dir)

Given model directory, obtain filename for the model itself.

get_num_tasks()
get_params(deep=True)

Get parameters for this estimator.

Parameters:deep (boolean, optional) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns:params – Parameter names mapped to their values.
Return type:mapping of string to any
get_params_filename(model_dir)

Given model directory, obtain filename for the model parameters.

get_pickling_errors(obj, seen=None)
get_pre_q_input(input_layer)
get_task_type()

Currently models can only be classifiers or regressors.

load_from_dir(model_dir, restore=True)
predict(dataset, transformers=[], outputs=None)

Uses self to make predictions on provided Dataset object.

Parameters:
  • dataset (dc.data.Dataset) – Dataset to make prediction on
  • transformers (list) – List of dc.trans.Transformers.
  • outputs (object) – If outputs is None, then will assume outputs = self.outputs[0] (single output). If outputs is a Layer/Tensor, then will evaluate and return as a single ndarray. If outputs is a list of Layers/Tensors, will return a list of ndarrays.
Returns:

results

Return type:

numpy ndarray or list of numpy ndarrays

predict_on_batch(X, transformers=[], outputs=None)

Generates predictions for input samples, processing samples in a batch.

Parameters:
  • X (ndarray) – the input data, as a Numpy array.
  • transformers (List) – List of dc.trans.Transformers
Returns:

Return type:

A Numpy array of predictions.

predict_on_generator(generator, transformers=[], outputs=None)[source]
predict_on_smiles(smiles, transformers=[], untransform=False)[source]

Generates predictions on a numpy array of SMILES strings.

Returns: y_ – numpy ndarray of shape (n_samples, n_tasks)
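
A sketch, assuming a trained model; the SMILES strings below are arbitrary examples:

>>> import numpy as np
>>> smiles = np.array(["CCO", "c1ccccc1O"])
>>> y_pred = model.predict_on_smiles(smiles, untransform=False)
>>> # y_pred has shape (2, n_tasks)
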
predict_proba(dataset, transformers=[], outputs=None)
Parameters:
  • dataset (dc.data.Dataset) – Dataset to make prediction on
  • transformers (list) – List of dc.trans.Transformers.
  • outputs (object) – If outputs is None, then will assume outputs = self.outputs[0] (single output). If outputs is a Layer/Tensor, then will evaluate and return as a single ndarray. If outputs is a list of Layers/Tensors, will return a list of ndarrays.
Returns:

y_pred

Return type:

numpy ndarray or list of numpy ndarrays

predict_proba_on_batch(X, transformers=[], outputs=None)

Generates predictions for input samples, processing samples in a batch.

Parameters:
  • X (ndarray) – the input data, as a Numpy array.
  • transformers (List) – List of dc.trans.Transformers
Returns:

Return type:

A Numpy array of predictions.

predict_proba_on_generator(generator, transformers=[], outputs=None)[source]
reload()

Reload trained model from disk.

restore(checkpoint=None)

Reload the values of all variables from a checkpoint file.

Parameters:checkpoint (str) – the path to the checkpoint file to load. If this is None, the most recent checkpoint will be chosen automatically. Call get_checkpoints() to get a list of all available checkpoints.
save()
save_checkpoint(max_checkpoints_to_keep=5)

Save a checkpoint to disk.

Usually you do not need to call this method, since fit() saves checkpoints automatically. If you have disabled automatic checkpointing during fitting, this can be called to manually write checkpoints.

Parameters:max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.
set_loss(layer)
set_optimizer(optimizer)

Set the optimizer to use for fitting.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns:
Return type:self
topsort()
class deepchem.models.tensorgraph.models.graph_models.MPNNTensorGraph(n_tasks, n_atom_feat=70, n_pair_feat=8, n_hidden=100, T=5, M=10, mode='regression', **kwargs)[source]

Bases: deepchem.models.tensorgraph.tensor_graph.TensorGraph

Message Passing Neural Network, default structures built according to https://arxiv.org/abs/1511.06391
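
A minimal construction sketch; train_dataset is a hypothetical dataset whose atom and pair features match n_atom_feat and n_pair_feat (with dc.feat.WeaveFeaturizer these are commonly 75 and 14; adjust to your featurization):

>>> from deepchem.models.tensorgraph.models.graph_models import MPNNTensorGraph
>>> model = MPNNTensorGraph(n_tasks=1, n_atom_feat=75, n_pair_feat=14,
...                         T=3, M=5, mode='regression')
>>> model.fit(train_dataset, nb_epoch=20)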

add_output(layer)
build()
build_graph()[source]
create_submodel(layers=None, loss=None, optimizer=None)

Create an alternate objective for training one piece of a TensorGraph.

A TensorGraph consists of a set of layers, and specifies a loss function and optimizer to use for training those layers. Usually this is sufficient, but there are cases where you want to train different parts of a model separately. For example, a GAN consists of a generator and a discriminator. They are trained separately, and they use different loss functions.

A submodel defines an alternate objective to use in cases like this. It may optionally specify any of the following: a subset of layers in the model to train; a different loss function; and a different optimizer to use. This method creates a submodel, which you can then pass to fit() to use it for training.

Parameters:
  • layers (list) – the list of layers to train. If None, all layers in the model will be trained.
  • loss (Layer) – the loss function to optimize. If None, the model’s main loss function will be used.
  • optimizer (Optimizer) – the optimizer to use for training. If None, the model’s main optimizer will be used.
Returns:

the newly created submodel, which can be passed to any of the fitting methods.

default_generator(dataset, epochs=1, predict=False, deterministic=True, pad_batches=True)[source]

Same generator as the Weave models.

evaluate(dataset, metrics, transformers=[], per_task_metrics=False)

Evaluates the performance of this model on specified dataset.

Parameters:
  • dataset (dc.data.Dataset) – Dataset object.
  • metrics (deepchem.metrics.Metric) – Evaluation metric
  • transformers (list) – List of deepchem.transformers.Transformer
  • per_task_metrics (bool) – If True, return per-task scores.
Returns:

Maps tasks to scores under metric.

Return type:

dict

evaluate_generator(feed_dict_generator, metrics, transformers=[], labels=None, outputs=None, weights=[], per_task_metrics=False)
fit(dataset, nb_epoch=10, max_checkpoints_to_keep=5, checkpoint_interval=1000, deterministic=False, restore=False, submodel=None, **kwargs)

Train this model on a dataset.

Parameters:
  • dataset (Dataset) – the Dataset to train on
  • nb_epoch (int) – the number of epochs to train for
  • max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.
  • checkpoint_interval (int) – the frequency at which to write checkpoints, measured in training steps. Set this to 0 to disable automatic checkpointing.
  • deterministic (bool) – if True, the samples are processed in order. If False, a different random order is used for each epoch.
  • restore (bool) – if True, restore the model from the most recent checkpoint and continue training from there. If False, retrain the model from scratch.
  • submodel (Submodel) – an alternate training objective to use. This should have been created by calling create_submodel().
fit_generator(feed_dict_generator, max_checkpoints_to_keep=5, checkpoint_interval=1000, restore=False, submodel=None)

Train this model on data from a generator.

Parameters:
  • feed_dict_generator (generator) – this should generate batches, each represented as a dict that maps Layers to values.
  • max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.
  • checkpoint_interval (int) – the frequency at which to write checkpoints, measured in training steps. Set this to 0 to disable automatic checkpointing.
  • restore (bool) – if True, restore the model from the most recent checkpoint and continue training from there. If False, retrain the model from scratch.
  • submodel (Submodel) – an alternate training objective to use. This should have been created by calling create_submodel().
Returns:

Return type:

the average loss over the most recent checkpoint interval

fit_on_batch(X, y, w, submodel=None)
get_checkpoints()

Get a list of all available checkpoint files.

get_global_step()
get_layer_variables(layer)

Get the list of trainable variables in a layer of the graph.

get_model_filename(model_dir)

Given model directory, obtain filename for the model itself.

get_num_tasks()
get_params(deep=True)

Get parameters for this estimator.

Parameters:deep (boolean, optional) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns:params – Parameter names mapped to their values.
Return type:mapping of string to any
get_params_filename(model_dir)

Given model directory, obtain filename for the model parameters.

get_pickling_errors(obj, seen=None)
get_pre_q_input(input_layer)
get_task_type()

Currently models can only be classifiers or regressors.

load_from_dir(model_dir, restore=True)
predict(dataset, transformers=[], batch_size=None)[source]
predict_on_batch(X, transformers=[], outputs=None)

Generates predictions for input samples, processing samples in a batch.

Parameters:
  • X (ndarray) – the input data, as a Numpy array.
  • transformers (List) – List of dc.trans.Transformers
Returns:

Return type:

A Numpy array of predictions.

predict_on_generator(generator, transformers=[])[source]
predict_proba(dataset, transformers=[], batch_size=None)[source]
predict_proba_on_batch(X, transformers=[], outputs=None)

Generates predictions for input samples, processing samples in a batch.

Parameters:
  • X (ndarray) – the input data, as a Numpy array.
  • transformers (List) – List of dc.trans.Transformers
Returns:

Return type:

A Numpy array of predictions.

predict_proba_on_generator(generator, transformers=[])[source]
Returns:numpy ndarray of shape (n_samples, n_classes*n_tasks)
Return type:y_pred
reload()

Reload trained model from disk.

restore(checkpoint=None)

Reload the values of all variables from a checkpoint file.

Parameters:checkpoint (str) – the path to the checkpoint file to load. If this is None, the most recent checkpoint will be chosen automatically. Call get_checkpoints() to get a list of all available checkpoints.
save()
save_checkpoint(max_checkpoints_to_keep=5)

Save a checkpoint to disk.

Usually you do not need to call this method, since fit() saves checkpoints automatically. If you have disabled automatic checkpointing during fitting, this can be called to manually write checkpoints.

Parameters:max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.
set_loss(layer)
set_optimizer(optimizer)

Set the optimizer to use for fitting.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns:
Return type:self
topsort()
class deepchem.models.tensorgraph.models.graph_models.PetroskiSuchTensorGraph(n_tasks, max_atoms=200, dropout=0.0, mode='classification', **kwargs)[source]

Bases: deepchem.models.tensorgraph.tensor_graph.TensorGraph

Model from Robust Spatial Filtering with Graph Convolutional Neural Networks https://arxiv.org/abs/1703.00792
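
A construction sketch using the inherited TensorGraph interface; train_dataset, test_dataset and metric are hypothetical names for an appropriately featurized dataset pair and a dc.metrics.Metric:

>>> from deepchem.models.tensorgraph.models.graph_models import PetroskiSuchTensorGraph
>>> model = PetroskiSuchTensorGraph(n_tasks=1, max_atoms=200, mode='classification')
>>> model.fit(train_dataset, nb_epoch=10)
>>> scores = model.evaluate(test_dataset, [metric])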

add_output(layer)
build()
build_graph()[source]
create_submodel(layers=None, loss=None, optimizer=None)

Create an alternate objective for training one piece of a TensorGraph.

A TensorGraph consists of a set of layers, and specifies a loss function and optimizer to use for training those layers. Usually this is sufficient, but there are cases where you want to train different parts of a model separately. For example, a GAN consists of a generator and a discriminator. They are trained separately, and they use different loss functions.

A submodel defines an alternate objective to use in cases like this. It may optionally specify any of the following: a subset of layers in the model to train; a different loss function; and a different optimizer to use. This method creates a submodel, which you can then pass to fit() to use it for training.

Parameters:
  • layers (list) – the list of layers to train. If None, all layers in the model will be trained.
  • loss (Layer) – the loss function to optimize. If None, the model’s main loss function will be used.
  • optimizer (Optimizer) – the optimizer to use for training. If None, the model’s main optimizer will be used.
Returns:

the newly created submodel, which can be passed to any of the fitting methods.

default_generator(dataset, epochs=1, predict=False, deterministic=True, pad_batches=True)[source]
evaluate(dataset, metrics, transformers=[], per_task_metrics=False)[source]
evaluate_generator(feed_dict_generator, metrics, transformers=[], labels=None, outputs=None, weights=[], per_task_metrics=False)
fit(dataset, nb_epoch=10, max_checkpoints_to_keep=5, checkpoint_interval=1000, deterministic=False, restore=False, submodel=None, **kwargs)

Train this model on a dataset.

Parameters:
  • dataset (Dataset) – the Dataset to train on
  • nb_epoch (int) – the number of epochs to train for
  • max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.
  • checkpoint_interval (int) – the frequency at which to write checkpoints, measured in training steps. Set this to 0 to disable automatic checkpointing.
  • deterministic (bool) – if True, the samples are processed in order. If False, a different random order is used for each epoch.
  • restore (bool) – if True, restore the model from the most recent checkpoint and continue training from there. If False, retrain the model from scratch.
  • submodel (Submodel) – an alternate training objective to use. This should have been created by calling create_submodel().
fit_generator(feed_dict_generator, max_checkpoints_to_keep=5, checkpoint_interval=1000, restore=False, submodel=None)

Train this model on data from a generator.

Parameters:
  • feed_dict_generator (generator) – this should generate batches, each represented as a dict that maps Layers to values.
  • max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.
  • checkpoint_interval (int) – the frequency at which to write checkpoints, measured in training steps. Set this to 0 to disable automatic checkpointing.
  • restore (bool) – if True, restore the model from the most recent checkpoint and continue training from there. If False, retrain the model from scratch.
  • submodel (Submodel) – an alternate training objective to use. This should have been created by calling create_submodel().
Returns:

Return type:

the average loss over the most recent checkpoint interval

fit_on_batch(X, y, w, submodel=None)
get_checkpoints()

Get a list of all available checkpoint files.

get_global_step()
get_layer_variables(layer)

Get the list of trainable variables in a layer of the graph.

get_model_filename(model_dir)

Given model directory, obtain filename for the model itself.

get_num_tasks()
get_params(deep=True)

Get parameters for this estimator.

Parameters:deep (boolean, optional) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns:params – Parameter names mapped to their values.
Return type:mapping of string to any
get_params_filename(model_dir)

Given model directory, obtain filename for the model parameters.

get_pickling_errors(obj, seen=None)
get_pre_q_input(input_layer)
get_task_type()

Currently models can only be classifiers or regressors.

load_from_dir(model_dir, restore=True)
predict(dataset, transformers=[], outputs=None)

Uses self to make predictions on provided Dataset object.

Parameters:
  • dataset (dc.data.Dataset) – Dataset to make prediction on
  • transformers (list) – List of dc.trans.Transformers.
  • outputs (object) – If outputs is None, then will assume outputs = self.outputs[0] (single output). If outputs is a Layer/Tensor, then will evaluate and return as a single ndarray. If outputs is a list of Layers/Tensors, will return a list of ndarrays.
Returns:

results

Return type:

numpy ndarray or list of numpy ndarrays

predict_on_batch(X, transformers=[], outputs=None)

Generates predictions for input samples, processing samples in a batch.

Parameters:
  • X (ndarray) – the input data, as a Numpy array.
  • transformers (List) – List of dc.trans.Transformers
Returns:

Return type:

A Numpy array of predictions.

predict_on_generator(generator, transformers=[], outputs=None)
Parameters:
  • generator (Generator) – Generator that constructs feed dictionaries for TensorGraph.
  • transformers (list) – List of dc.trans.Transformers.
  • outputs (object) – If outputs is None, then will assume outputs = self.outputs. If outputs is a Layer/Tensor, then will evaluate and return as a single ndarray. If outputs is a list of Layers/Tensors, will return a list of ndarrays.
Returns: y_pred – numpy ndarray of shape (n_samples, n_classes*n_tasks)
predict_proba(dataset, transformers=[], outputs=None)
Parameters:
  • dataset (dc.data.Dataset) – Dataset to make prediction on
  • transformers (list) – List of dc.trans.Transformers.
  • outputs (object) – If outputs is None, then will assume outputs = self.outputs[0] (single output). If outputs is a Layer/Tensor, then will evaluate and return as a single ndarray. If outputs is a list of Layers/Tensors, will return a list of ndarrays.
Returns:

y_pred

Return type:

numpy ndarray or list of numpy ndarrays

predict_proba_on_batch(X, transformers=[], outputs=None)

Generates predictions for input samples, processing samples in a batch.

Parameters:
  • X (ndarray) – the input data, as a Numpy array.
  • transformers (List) – List of dc.trans.Transformers
Returns:

Return type:

A Numpy array of predictions.

predict_proba_on_generator(generator, transformers=[])[source]
reload()

Reload trained model from disk.

restore(checkpoint=None)

Reload the values of all variables from a checkpoint file.

Parameters:checkpoint (str) – the path to the checkpoint file to load. If this is None, the most recent checkpoint will be chosen automatically. Call get_checkpoints() to get a list of all available checkpoints.
save()
save_checkpoint(max_checkpoints_to_keep=5)

Save a checkpoint to disk.

Usually you do not need to call this method, since fit() saves checkpoints automatically. If you have disabled automatic checkpointing during fitting, this can be called to manually write checkpoints.

Parameters:max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.
set_loss(layer)
set_optimizer(optimizer)

Set the optimizer to use for fitting.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns:
Return type:self
topsort()
class deepchem.models.tensorgraph.models.graph_models.WeaveTensorGraph(n_tasks, n_atom_feat=75, n_pair_feat=14, n_hidden=50, n_graph_feat=128, mode='classification', **kwargs)[source]

Bases: deepchem.models.tensorgraph.tensor_graph.TensorGraph

add_output(layer)
build()
build_graph()[source]

Building graph structures: Features => WeaveLayer => WeaveLayer => Dense => WeaveGather => Classification or Regression
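
Given the layer stack described above, a minimal training sketch; train_dataset is a hypothetical dataset assumed to be featurized with dc.feat.WeaveFeaturizer:

>>> from deepchem.models.tensorgraph.models.graph_models import WeaveTensorGraph
>>> model = WeaveTensorGraph(n_tasks=1, n_atom_feat=75, n_pair_feat=14,
...                          n_hidden=50, n_graph_feat=128, mode='classification')
>>> model.fit(train_dataset, nb_epoch=30)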

create_submodel(layers=None, loss=None, optimizer=None)

Create an alternate objective for training one piece of a TensorGraph.

A TensorGraph consists of a set of layers, and specifies a loss function and optimizer to use for training those layers. Usually this is sufficient, but there are cases where you want to train different parts of a model separately. For example, a GAN consists of a generator and a discriminator. They are trained separately, and they use different loss functions.

A submodel defines an alternate objective to use in cases like this. It may optionally specify any of the following: a subset of layers in the model to train; a different loss function; and a different optimizer to use. This method creates a submodel, which you can then pass to fit() to use it for training.

Parameters:
  • layers (list) – the list of layers to train. If None, all layers in the model will be trained.
  • loss (Layer) – the loss function to optimize. If None, the model’s main loss function will be used.
  • optimizer (Optimizer) – the optimizer to use for training. If None, the model’s main optimizer will be used.
Returns:

the newly created submodel, which can be passed to any of the fitting methods.

default_generator(dataset, epochs=1, predict=False, deterministic=True, pad_batches=True)[source]

TensorGraph style implementation

evaluate(dataset, metrics, transformers=[], per_task_metrics=False)

Evaluates the performance of this model on specified dataset.

Parameters:
  • dataset (dc.data.Dataset) – Dataset object.
  • metrics (deepchem.metrics.Metric) – Evaluation metric
  • transformers (list) – List of deepchem.transformers.Transformer
  • per_task_metrics (bool) – If True, return per-task scores.
Returns:

Maps tasks to scores under metric.

Return type:

dict

evaluate_generator(feed_dict_generator, metrics, transformers=[], labels=None, outputs=None, weights=[], per_task_metrics=False)
fit(dataset, nb_epoch=10, max_checkpoints_to_keep=5, checkpoint_interval=1000, deterministic=False, restore=False, submodel=None, **kwargs)

Train this model on a dataset.

Parameters:
  • dataset (Dataset) – the Dataset to train on
  • nb_epoch (int) – the number of epochs to train for
  • max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.
  • checkpoint_interval (int) – the frequency at which to write checkpoints, measured in training steps. Set this to 0 to disable automatic checkpointing.
  • deterministic (bool) – if True, the samples are processed in order. If False, a different random order is used for each epoch.
  • restore (bool) – if True, restore the model from the most recent checkpoint and continue training from there. If False, retrain the model from scratch.
  • submodel (Submodel) – an alternate training objective to use. This should have been created by calling create_submodel().
fit_generator(feed_dict_generator, max_checkpoints_to_keep=5, checkpoint_interval=1000, restore=False, submodel=None)

Train this model on data from a generator.

Parameters:
  • feed_dict_generator (generator) – this should generate batches, each represented as a dict that maps Layers to values.
  • max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.
  • checkpoint_interval (int) – the frequency at which to write checkpoints, measured in training steps. Set this to 0 to disable automatic checkpointing.
  • restore (bool) – if True, restore the model from the most recent checkpoint and continue training from there. If False, retrain the model from scratch.
  • submodel (Submodel) – an alternate training objective to use. This should have been created by calling create_submodel().
Returns:

Return type:

the average loss over the most recent checkpoint interval

fit_on_batch(X, y, w, submodel=None)
get_checkpoints()

Get a list of all available checkpoint files.

get_global_step()
get_layer_variables(layer)

Get the list of trainable variables in a layer of the graph.

get_model_filename(model_dir)

Given model directory, obtain filename for the model itself.

get_num_tasks()
get_params(deep=True)

Get parameters for this estimator.

Parameters:deep (boolean, optional) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns:params – Parameter names mapped to their values.
Return type:mapping of string to any
get_params_filename(model_dir)

Given model directory, obtain filename for the model parameters.

get_pickling_errors(obj, seen=None)
get_pre_q_input(input_layer)
get_task_type()

Currently models can only be classifiers or regressors.

load_from_dir(model_dir, restore=True)
predict(dataset, transformers=[], outputs=None)

Uses self to make predictions on provided Dataset object.

Parameters:
  • dataset (dc.data.Dataset) – Dataset to make prediction on
  • transformers (list) – List of dc.trans.Transformers.
  • outputs (object) – If outputs is None, then will assume outputs = self.outputs[0] (single output). If outputs is a Layer/Tensor, then will evaluate and return as a single ndarray. If outputs is a list of Layers/Tensors, will return a list of ndarrays.
Returns:

results

Return type:

numpy ndarray or list of numpy ndarrays

predict_on_batch(X, transformers=[], outputs=None)

Generates predictions for input samples, processing samples in a batch.

Parameters:
  • X (ndarray) – the input data, as a Numpy array.
  • transformers (List) – List of dc.trans.Transformers
Returns:

Return type:

A Numpy array of predictions.

predict_on_generator(generator, transformers=[], outputs=None)[source]
predict_proba(dataset, transformers=[], outputs=None)
Parameters:
  • dataset (dc.data.Dataset) – Dataset to make prediction on
  • transformers (list) – List of dc.trans.Transformers.
  • outputs (object) – If outputs is None, then will assume outputs = self.outputs[0] (single output). If outputs is a Layer/Tensor, then will evaluate and return as a single ndarray. If outputs is a list of Layers/Tensors, will return a list of ndarrays.
Returns:

y_pred

Return type:

numpy ndarray or list of numpy ndarrays

predict_proba_on_batch(X, transformers=[], outputs=None)

Generates predictions for input samples, processing samples in a batch.

Parameters:
  • X (ndarray) – the input data, as a Numpy array.
  • transformers (List) – List of dc.trans.Transformers
Returns:

Return type:

A Numpy array of predictions.

predict_proba_on_generator(generator, transformers=[], outputs=None)
Returns:numpy ndarray of shape (n_samples, n_classes*n_tasks)
Return type:y_pred
reload()

Reload trained model from disk.

restore(checkpoint=None)

Reload the values of all variables from a checkpoint file.

Parameters:checkpoint (str) – the path to the checkpoint file to load. If this is None, the most recent checkpoint will be chosen automatically. Call get_checkpoints() to get a list of all available checkpoints.
save()
save_checkpoint(max_checkpoints_to_keep=5)

Save a checkpoint to disk.

Usually you do not need to call this method, since fit() saves checkpoints automatically. If you have disabled automatic checkpointing during fitting, this can be called to manually write checkpoints.

Parameters:max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.
set_loss(layer)
set_optimizer(optimizer)

Set the optimizer to use for fitting.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns:
Return type:self
topsort()

deepchem.models.tensorgraph.models.robust_multitask module

deepchem.models.tensorgraph.models.seqtoseq module

Sequence to sequence translation models.

class deepchem.models.tensorgraph.models.seqtoseq.AspuruGuzikAutoEncoder(num_tokens, max_output_length, embedding_dimension=196, filter_sizes=[9, 9, 10], kernel_sizes=[9, 9, 11], decoder_dimension=488, **kwargs)[source]

Bases: deepchem.models.tensorgraph.models.seqtoseq.SeqToSeq

This is an implementation of Automatic Chemical Design Using a Continuous Representation of Molecules http://pubs.acs.org/doi/full/10.1021/acscentsci.7b00572

We report a method to convert discrete representations of molecules to and from a multidimensional continuous representation. This model allows us to generate new molecules for efficient exploration and optimization through open-ended spaces of chemical compounds. A deep neural network was trained on hundreds of thousands of existing chemical structures to construct three coupled functions: an encoder, a decoder, and a predictor. The encoder converts the discrete representation of a molecule into a real-valued continuous vector, and the decoder converts these continuous vectors back to discrete molecular representations. The predictor estimates chemical properties from the latent continuous vector representation of the molecule. Continuous representations of molecules allow us to automatically generate novel chemical structures by performing simple operations in the latent space, such as decoding random vectors, perturbing known chemical structures, or interpolating between molecules. Continuous representations also allow the use of powerful gradient-based optimization to efficiently guide the search for optimized functional compounds. We demonstrate our method in the domain of drug-like molecules and also in a set of molecules with fewer than nine heavy atoms.

Notes

This is currently an imperfect reproduction of the paper. One difference is that teacher forcing in the decoder is not implemented. The paper also discusses co-learning molecular properties at the same time as training the encoder/decoder. This is not done here. The hyperparameters chosen are from the ZINC dataset.

This network also currently suffers from exploding gradients. Care has to be taken when training.

NOTE(LESWING): Will need to play around with the annealing schedule to avoid exploding gradients.
TODO(LESWING): Teacher forcing.
TODO(LESWING): Sigmoid variational loss annealing schedule.

From the paper: the output GRU layer had one additional input, corresponding to the character sampled from the softmax output of the previous time step, and was trained using teacher forcing. This increased the accuracy of generated SMILES strings, which resulted in higher fractions of valid SMILES strings for latent points outside the training data, but also made training more difficult, since the decoder showed a tendency to ignore the (variational) encoding and rely solely on the input sequence. The variational loss was annealed according to a sigmoid schedule after 29 epochs, running for a total of 120 epochs.

I also added a BatchNorm before the mean and std embedding layers. This has empirically made training more stable, and is discussed in Ladder Variational Autoencoders (https://arxiv.org/pdf/1602.02282.pdf). If teacher forcing and the sigmoid variational loss annealing schedule are used, the BatchNorm may no longer be necessary.
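
A hedged sketch of training and using this autoencoder through the methods documented below; model is an already-constructed AspuruGuzikAutoEncoder and smiles_list a hypothetical list of SMILES strings (for an autoencoder, each training sample uses the same sequence as input and output):

>>> def generate_sequences(epochs):
...     for _ in range(epochs):
...         for s in smiles_list:
...             yield (s, s)            # autoencoder: input equals output
...
>>> model.fit_sequences(generate_sequences(50))
>>> embeddings = model.predict_embeddings(smiles_list[:4])
>>> reconstructed = model.predict_from_embeddings(embeddings, beam_width=5)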

add_output(layer)
build()
create_submodel(layers=None, loss=None, optimizer=None)

Create an alternate objective for training one piece of a TensorGraph.

A TensorGraph consists of a set of layers, and specifies a loss function and optimizer to use for training those layers. Usually this is sufficient, but there are cases where you want to train different parts of a model separately. For example, a GAN consists of a generator and a discriminator. They are trained separately, and they use different loss functions.

A submodel defines an alternate objective to use in cases like this. It may optionally specify any of the following: a subset of layers in the model to train; a different loss function; and a different optimizer to use. This method creates a submodel, which you can then pass to fit() to use it for training.

Parameters:
  • layers (list) – the list of layers to train. If None, all layers in the model will be trained.
  • loss (Layer) – the loss function to optimize. If None, the model’s main loss function will be used.
  • optimizer (Optimizer) – the optimizer to use for training. If None, the model’s main optimizer will be used.
Returns:

the newly created submodel, which can be passed to any of the fitting methods.

default_generator(dataset, epochs=1, predict=False, deterministic=True, pad_batches=True)
evaluate(dataset, metrics, transformers=[], per_task_metrics=False)

Evaluates the performance of this model on specified dataset.

Parameters:
  • dataset (dc.data.Dataset) – Dataset object.
  • metrics (deepchem.metrics.Metric) – Evaluation metric
  • transformers (list) – List of deepchem.transformers.Transformer
  • per_task_metrics (bool) – If True, return per-task scores.
Returns:

Maps tasks to scores under metric.

Return type:

dict

evaluate_generator(feed_dict_generator, metrics, transformers=[], labels=None, outputs=None, weights=[], per_task_metrics=False)
fit(dataset, nb_epoch=10, max_checkpoints_to_keep=5, checkpoint_interval=1000, deterministic=False, restore=False, submodel=None, **kwargs)

Train this model on a dataset.

Parameters:
  • dataset (Dataset) – the Dataset to train on
  • nb_epoch (int) – the number of epochs to train for
  • max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.
  • checkpoint_interval (int) – the frequency at which to write checkpoints, measured in training steps. Set this to 0 to disable automatic checkpointing.
  • deterministic (bool) – if True, the samples are processed in order. If False, a different random order is used for each epoch.
  • restore (bool) – if True, restore the model from the most recent checkpoint and continue training from there. If False, retrain the model from scratch.
  • submodel (Submodel) – an alternate training objective to use. This should have been created by calling create_submodel().
fit_generator(feed_dict_generator, max_checkpoints_to_keep=5, checkpoint_interval=1000, restore=False, submodel=None)

Train this model on data from a generator.

Parameters:
  • feed_dict_generator (generator) – this should generate batches, each represented as a dict that maps Layers to values.
  • max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.
  • checkpoint_interval (int) – the frequency at which to write checkpoints, measured in training steps. Set this to 0 to disable automatic checkpointing.
  • restore (bool) – if True, restore the model from the most recent checkpoint and continue training from there. If False, retrain the model from scratch.
  • submodel (Submodel) – an alternate training objective to use. This should have been created by calling create_submodel().
Returns:

Return type:

the average loss over the most recent checkpoint interval

fit_on_batch(X, y, w, submodel=None)
fit_sequences(sequences, max_checkpoints_to_keep=5, checkpoint_interval=1000, restore=False)

Train this model on a set of sequences

Parameters:
  • sequences (iterable) – the training samples to fit to. Each sample should be represented as a tuple of the form (input_sequence, output_sequence).
  • max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.
  • checkpoint_interval (int) – the frequency at which to write checkpoints, measured in training steps.
  • restore (bool) – if True, restore the model from the most recent checkpoint and continue training from there. If False, retrain the model from scratch.
get_checkpoints()

Get a list of all available checkpoint files.

get_global_step()
get_layer_variables(layer)

Get the list of trainable variables in a layer of the graph.

get_model_filename(model_dir)

Given model directory, obtain filename for the model itself.

get_num_tasks()
get_params(deep=True)

Get parameters for this estimator.

Parameters:deep (boolean, optional) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns:params – Parameter names mapped to their values.
Return type:mapping of string to any
get_params_filename(model_dir)

Given model directory, obtain filename for the model parameters.

get_pickling_errors(obj, seen=None)
get_pre_q_input(input_layer)
get_task_type()

Currently models can only be classifiers or regressors.

load_from_dir(model_dir, restore=True)
predict(dataset, transformers=[], outputs=None)

Uses self to make predictions on provided Dataset object.

Parameters:
  • dataset (dc.data.Dataset) – Dataset to make prediction on
  • transformers (list) – List of dc.trans.Transformers.
  • outputs (object) – If outputs is None, then will assume outputs = self.outputs[0] (single output). If outputs is a Layer/Tensor, then will evaluate and return as a single ndarray. If outputs is a list of Layers/Tensors, will return a list of ndarrays.
Returns:

results

Return type:

numpy ndarray or list of numpy ndarrays

predict_embeddings(sequences)[source]

Given a set of input sequences, compute the embedding vectors.

Parameters:sequences (iterable) – the input sequences to generate an embedding vector for
predict_from_embeddings(embeddings, beam_width=5)

Given a set of embedding vectors, predict the output sequences.

The prediction is done using a beam search with length normalization.

Parameters:
  • embeddings (iterable) – the embedding vectors to generate predictions for
  • beam_width (int) – the beam width to use for searching. Set to 1 to use a simple greedy search.
predict_from_sequences(sequences, beam_width=5)[source]

Given a set of input sequences, predict the output sequences.

The prediction is done using a beam search with length normalization.

Parameters:
  • sequences (iterable) – the input sequences to generate a prediction for
  • beam_width (int) – the beam width to use for searching. Set to 1 to use a simple greedy search.
predict_on_batch(X, transformers=[], outputs=None)

Generates predictions for input samples, processing samples in a batch.

Parameters:
  • X (ndarray) – the input data, as a Numpy array.
  • transformers (List) – List of dc.trans.Transformers
Returns:

Return type:

A Numpy array of predictions.

predict_on_generator(generator, transformers=[], outputs=None)
Parameters:
  • generator (Generator) – Generator that constructs feed dictionaries for TensorGraph.
  • transformers (list) – List of dc.trans.Transformers.
  • outputs (object) – If outputs is None, then will assume outputs = self.outputs. If outputs is a Layer/Tensor, then will evaluate and return as a single ndarray. If outputs is a list of Layers/Tensors, will return a list of ndarrays.
Returns: y_pred – numpy ndarray of shape (n_samples, n_classes*n_tasks)
predict_proba(dataset, transformers=[], outputs=None)
Parameters:
  • dataset (dc.data.Dataset) – Dataset to make prediction on
  • transformers (list) – List of dc.trans.Transformers.
  • outputs (object) – If outputs is None, then will assume outputs = self.outputs[0] (single output). If outputs is a Layer/Tensor, then will evaluate and return as a single ndarray. If outputs is a list of Layers/Tensors, will return a list of ndarrays.
Returns:

y_pred

Return type:

numpy ndarray or list of numpy ndarrays

predict_proba_on_batch(X, transformers=[], outputs=None)

Generates predictions for input samples, processing samples in a batch.

Parameters:
  • X (ndarray) – the input data, as a Numpy array.
  • transformers (List) – List of dc.trans.Transformers
Returns:

Return type:

A Numpy array of predictions.

predict_proba_on_generator(generator, transformers=[], outputs=None)
Returns:numpy ndarray of shape (n_samples, n_classes*n_tasks)
Return type:y_pred
reload()

Reload trained model from disk.

restore(checkpoint=None)

Reload the values of all variables from a checkpoint file.

Parameters:checkpoint (str) – the path to the checkpoint file to load. If this is None, the most recent checkpoint will be chosen automatically. Call get_checkpoints() to get a list of all available checkpoints.
save()
save_checkpoint(max_checkpoints_to_keep=5)

Save a checkpoint to disk.

Usually you do not need to call this method, since fit() saves checkpoints automatically. If you have disabled automatic checkpointing during fitting, this can be called to manually write checkpoints.

Parameters:max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.
sequence_end = <object object>
set_loss(layer)
set_optimizer(optimizer)

Set the optimizer to use for fitting.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns:
Return type:self
topsort()
class deepchem.models.tensorgraph.models.seqtoseq.SeqToSeq(input_tokens, output_tokens, max_output_length, encoder_layers=4, decoder_layers=4, embedding_dimension=512, dropout=0.0, reverse_input=True, variational=False, annealing_start_step=5000, annealing_final_step=10000, **kwargs)[source]

Bases: deepchem.models.tensorgraph.tensor_graph.TensorGraph

Implements sequence to sequence translation models.

The model is based on the description in Sutskever et al., “Sequence to Sequence Learning with Neural Networks” (https://arxiv.org/abs/1409.3215), although this implementation uses GRUs instead of LSTMs. The goal is to take sequences of tokens as input, and translate each one into a different output sequence. The input and output sequences can both be of variable length, and an output sequence need not have the same length as the input sequence it was generated from. For example, these models were originally developed for use in natural language processing. In that context, the input might be a sequence of English words, and the output might be a sequence of French words. The goal would be to train the model to translate sentences from English to French.

The model consists of two parts called the “encoder” and “decoder”. Each one consists of a stack of recurrent layers. The job of the encoder is to transform the input sequence into a single, fixed length vector called the “embedding”. That vector contains all relevant information from the input sequence. The decoder then transforms the embedding vector into the output sequence.

These models can be used for various purposes. First and most obviously, they can be used for sequence to sequence translation. In any case where you have sequences of tokens, and you want to translate each one into a different sequence, a SeqToSeq model can be trained to perform the translation.

Another possible use case is transforming variable length sequences into fixed length vectors. Many types of models require their inputs to have a fixed shape, which makes it difficult to use them with variable sized inputs (for example, when the input is a molecule, and different molecules have different numbers of atoms). In that case, you can train a SeqToSeq model as an autoencoder, so that it tries to make the output sequence identical to the input one. That forces the embedding vector to contain all information from the original sequence. You can then use the encoder for transforming sequences into fixed length embedding vectors, suitable to use as inputs to other types of models.

Another use case is to train the decoder for use as a generative model. Here again you begin by training the SeqToSeq model as an autoencoder. Once training is complete, you can supply arbitrary embedding vectors, and transform each one into an output sequence. When used in this way, you typically train it as a variational autoencoder. This adds random noise to the encoder, and also adds a constraint term to the loss that forces the embedding vector to have a unit Gaussian distribution. You can then pick random vectors from a Gaussian distribution, and the output sequences should follow the same distribution as the training data.

When training as a variational autoencoder, it is best to use KL cost annealing, as described in https://arxiv.org/abs/1511.06349. The constraint term in the loss is initially set to 0, so the optimizer just tries to minimize the reconstruction loss. Once it has made reasonable progress toward that, the constraint term can be gradually turned back on. The range of steps over which this happens is configurable.
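
A minimal sketch tying these pieces together; input_tokens and output_tokens are the allowed token alphabets, and train_pairs is a hypothetical list of (input_sequence, output_sequence) tuples drawn from them:

>>> from deepchem.models.tensorgraph.models.seqtoseq import SeqToSeq
>>> input_tokens = ['a', 'b', 'c']       # alphabet of the input sequences
>>> output_tokens = ['x', 'y', 'z']      # alphabet of the output sequences
>>> model = SeqToSeq(input_tokens, output_tokens, max_output_length=10,
...                  embedding_dimension=128)
>>> model.fit_sequences(train_pairs)
>>> translations = model.predict_from_sequences([['a', 'b', 'b', 'c']], beam_width=5)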

add_output(layer)
build()
create_submodel(layers=None, loss=None, optimizer=None)

Create an alternate objective for training one piece of a TensorGraph.

A TensorGraph consists of a set of layers, and specifies a loss function and optimizer to use for training those layers. Usually this is sufficient, but there are cases where you want to train different parts of a model separately. For example, a GAN consists of a generator and a discriminator. They are trained separately, and they use different loss functions.

A submodel defines an alternate objective to use in cases like this. It may optionally specify any of the following: a subset of layers in the model to train; a different loss function; and a different optimizer to use. This method creates a submodel, which you can then pass to fit() to use it for training.

Parameters:
  • layers (list) – the list of layers to train. If None, all layers in the model will be trained.
  • loss (Layer) – the loss function to optimize. If None, the model’s main loss function will be used.
  • optimizer (Optimizer) – the optimizer to use for training. If None, the model’s main optimizer will be used.
Returns:

the newly created submodel, which can be passed to any of the fitting methods.

default_generator(dataset, epochs=1, predict=False, deterministic=True, pad_batches=True)
evaluate(dataset, metrics, transformers=[], per_task_metrics=False)

Evaluates the performance of this model on specified dataset.

Parameters:
  • dataset (dc.data.Dataset) – Dataset object.
  • metrics (deepchem.metrics.Metric) – Evaluation metric
  • transformers (list) – List of deepchem.transformers.Transformer
  • per_task_metrics (bool) – If True, return per-task scores.
Returns:

Maps tasks to scores under metric.

Return type:

dict

evaluate_generator(feed_dict_generator, metrics, transformers=[], labels=None, outputs=None, weights=[], per_task_metrics=False)
fit(dataset, nb_epoch=10, max_checkpoints_to_keep=5, checkpoint_interval=1000, deterministic=False, restore=False, submodel=None, **kwargs)

Train this model on a dataset.

Parameters:
  • dataset (Dataset) – the Dataset to train on
  • nb_epoch (int) – the number of epochs to train for
  • max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.
  • checkpoint_interval (int) – the frequency at which to write checkpoints, measured in training steps. Set this to 0 to disable automatic checkpointing.
  • deterministic (bool) – if True, the samples are processed in order. If False, a different random order is used for each epoch.
  • restore (bool) – if True, restore the model from the most recent checkpoint and continue training from there. If False, retrain the model from scratch.
  • submodel (Submodel) – an alternate training objective to use. This should have been created by calling create_submodel().
fit_generator(feed_dict_generator, max_checkpoints_to_keep=5, checkpoint_interval=1000, restore=False, submodel=None)

Train this model on data from a generator.

Parameters:
  • feed_dict_generator (generator) – this should generate batches, each represented as a dict that maps Layers to values.
  • max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.
  • checkpoint_interval (int) – the frequency at which to write checkpoints, measured in training steps. Set this to 0 to disable automatic checkpointing.
  • restore (bool) – if True, restore the model from the most recent checkpoint and continue training from there. If False, retrain the model from scratch.
  • submodel (Submodel) – an alternate training objective to use. This should have been created by calling create_submodel().
Returns:

Return type:

the average loss over the most recent checkpoint interval

fit_on_batch(X, y, w, submodel=None)
fit_sequences(sequences, max_checkpoints_to_keep=5, checkpoint_interval=1000, restore=False)[source]

Train this model on a set of sequences

Parameters:
  • sequences (iterable) – the training samples to fit to. Each sample should be represented as a tuple of the form (input_sequence, output_sequence).
  • max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.
  • checkpoint_interval (int) – the frequency at which to write checkpoints, measured in training steps.
  • restore (bool) – if True, restore the model from the most recent checkpoint and continue training from there. If False, retrain the model from scratch.
get_checkpoints()

Get a list of all available checkpoint files.

get_global_step()
get_layer_variables(layer)

Get the list of trainable variables in a layer of the graph.

get_model_filename(model_dir)

Given model directory, obtain filename for the model itself.

get_num_tasks()
get_params(deep=True)

Get parameters for this estimator.

Parameters:deep (boolean, optional) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns:params – Parameter names mapped to their values.
Return type:mapping of string to any
get_params_filename(model_dir)

Given model directory, obtain filename for the model parameters.

get_pickling_errors(obj, seen=None)
get_pre_q_input(input_layer)
get_task_type()

Currently models can only be classifiers or regressors.

load_from_dir(model_dir, restore=True)
predict(dataset, transformers=[], outputs=None)

Uses self to make predictions on provided Dataset object.

Parameters:
  • dataset (dc.data.Dataset) – Dataset to make prediction on
  • transformers (list) – List of dc.trans.Transformers.
  • outputs (object) – If outputs is None, then will assume outputs = self.outputs[0] (single output). If outputs is a Layer/Tensor, then will evaluate and return as a single ndarray. If outputs is a list of Layers/Tensors, will return a list of ndarrays.
Returns:

results

Return type:

numpy ndarray or list of numpy ndarrays

predict_embeddings(sequences)[source]

Given a set of input sequences, compute the embedding vectors.

Parameters:sequences (iterable) – the input sequences to generate an embedding vector for
predict_from_embeddings(embeddings, beam_width=5)[source]

Given a set of embedding vectors, predict the output sequences.

The prediction is done using a beam search with length normalization.

Parameters:
  • embeddings (iterable) – the embedding vectors to generate predictions for
  • beam_width (int) – the beam width to use for searching. Set to 1 to use a simple greedy search.
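
When the model was trained as a variational autoencoder (variational=True), this method can also be used generatively by decoding random latent vectors; a sketch, assuming the vector size matches the model's embedding_dimension (512 by default):

>>> import numpy as np
>>> latent = np.random.normal(size=(16, 512))    # unit-Gaussian latent vectors
>>> generated = model.predict_from_embeddings(latent, beam_width=1)
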
predict_from_sequences(sequences, beam_width=5)[source]

Given a set of input sequences, predict the output sequences.

The prediction is done using a beam search with length normalization.

Parameters:
  • sequences (iterable) – the input sequences to generate a prediction for
  • beam_width (int) – the beam width to use for searching. Set to 1 to use a simple greedy search.
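The sequence methods above are typically used together: fit on (input, output) pairs, then either embed new inputs or decode them directly. The sketch below assumes model is an already constructed instance of this class; the sample data and checkpoint interval are placeholders.

def train_and_decode(model, train_pairs, test_inputs):
    # train_pairs: iterable of (input_sequence, output_sequence) tuples.
    model.fit_sequences(train_pairs, checkpoint_interval=100)
    # Embedding vectors for new inputs...
    embeddings = model.predict_embeddings(test_inputs)
    # ...or decode output sequences directly with a beam search.
    outputs = model.predict_from_sequences(test_inputs, beam_width=5)
    return embeddings, outputs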
predict_on_batch(X, transformers=[], outputs=None)

Generates predictions for input samples, processing samples in a batch.

Parameters:
  • X (ndarray) – the input data, as a Numpy array.
  • transformers (List) – List of dc.trans.Transformers
Returns:A Numpy array of predictions.

predict_on_generator(generator, transformers=[], outputs=None)
Parameters:
  • generator (Generator) – Generator that constructs feed dictionaries for TensorGraph.
  • transformers (list) – List of dc.trans.Transformers.
  • outputs (object) – If outputs is None, then will assume outputs = self.outputs. If outputs is a Layer/Tensor, then will evaluate and return as a single ndarray. If outputs is a list of Layers/Tensors, will return a list of ndarrays.
Returns:y_pred – numpy ndarray of shape (n_samples, n_classes*n_tasks)
predict_proba(dataset, transformers=[], outputs=None)
Parameters:
  • dataset (dc.data.Dataset) – Dataset to make prediction on
  • transformers (list) – List of dc.trans.Transformers.
  • outputs (object) – If outputs is None, then will assume outputs = self.outputs[0] (single output). If outputs is a Layer/Tensor, then will evaluate and return as a single ndarray. If outputs is a list of Layers/Tensors, will return a list of ndarrays.
Returns:

y_pred

Return type:

numpy ndarray or list of numpy ndarrays

predict_proba_on_batch(X, transformers=[], outputs=None)

Generates predictions for input samples, processing samples in a batch.

Parameters:
  • X (ndarray) – the input data, as a Numpy array.
  • transformers (List) – List of dc.trans.Transformers
Returns:A Numpy array of predictions.

predict_proba_on_generator(generator, transformers=[], outputs=None)
Returns:y_pred – numpy ndarray of shape (n_samples, n_classes*n_tasks)
reload()

Reload trained model from disk.

restore(checkpoint=None)

Reload the values of all variables from a checkpoint file.

Parameters:checkpoint (str) – the path to the checkpoint file to load. If this is None, the most recent checkpoint will be chosen automatically. Call get_checkpoints() to get a list of all available checkpoints.
save()
save_checkpoint(max_checkpoints_to_keep=5)

Save a checkpoint to disk.

Usually you do not need to call this method, since fit() saves checkpoints automatically. If you have disabled automatic checkpointing during fitting, this can be called to manually write checkpoints.

Parameters:max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.
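A small sketch of manual checkpoint handling with the methods above; the function wrapper and the number of checkpoints kept are illustrative, and the model is assumed to have been built and fitted already.

def checkpoint_roundtrip(model):
    # Write a checkpoint by hand (useful when automatic checkpointing was
    # disabled during fitting), list what is available, then reload the latest.
    model.save_checkpoint(max_checkpoints_to_keep=3)
    print(model.get_checkpoints())
    model.restore()  # with no argument, loads the most recent checkpoint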
sequence_end = <object object>
set_loss(layer)
set_optimizer(optimizer)

Set the optimizer to use for fitting.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns:self
topsort()

deepchem.models.tensorgraph.models.sequence_dnn module

Implements SequenceDNNs for use in DRAGONN models.

Code adapted from the github.com/kundajelab/dragonn repository. The SequenceDNN class is useful for prediction tasks on genomic data.

class deepchem.models.tensorgraph.models.sequence_dnn.SequenceDNN(seq_length, use_RNN=False, num_tasks=1, num_filters=15, kernel_size=15, pool_width=35, L1=0, dropout=0.0, verbose=True, **kwargs)[source]

Bases: deepchem.models.tensorgraph.sequential.Sequential

Sequence DNN models.

TODO(rbharath): This model only supports one conv layer. Extend so that conv layers of greater depth can be implemented.

Parameters:
  • seq_length (int) – length of input sequence.
  • num_tasks (int, optional) – number of tasks. Default: 1.
  • num_filters (list[int] | tuple[int]) – number of convolutional filters in each layer. Default: (15,).
  • conv_width (list[int] | tuple[int]) – width of each layer’s convolutional filters. Default: (15,).
  • pool_width (int) – width of max pooling after the last layer. Default: 35.
  • L1 (float) – strength of L1 penalty.
  • dropout (float) – dropout probability in every convolutional layer. Default: 0.
  • verbose (bool) – Verbose print statements activated if true.
add(layer)

Adds a new layer to the model.

Parameters:layer (Layer) – the layer to add to this graph.
add_output(layer)
build()
create_submodel(layers=None, loss=None, optimizer=None)

Create an alternate objective for training one piece of a TensorGraph.

A TensorGraph consists of a set of layers, and specifies a loss function and optimizer to use for training those layers. Usually this is sufficient, but there are cases where you want to train different parts of a model separately. For example, a GAN consists of a generator and a discriminator. They are trained separately, and they use different loss functions.

A submodel defines an alternate objective to use in cases like this. It may optionally specify any of the following: a subset of layers in the model to train; a different loss function; and a different optimizer to use. This method creates a submodel, which you can then pass to fit() to use it for training.

Parameters:
  • layers (list) – the list of layers to train. If None, all layers in the model will be trained.
  • loss (Layer) – the loss function to optimize. If None, the model’s main loss function will be used.
  • optimizer (Optimizer) – the optimizer to use for training. If None, the model’s main optimizer will be used.
Returns:the newly created submodel, which can be passed to any of the fitting methods.
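Since create_submodel() is inherited from TensorGraph, a generic sketch is the simplest way to show it; the layer sizes, loss choice, and random data below are illustrative assumptions rather than a prescribed recipe.

import numpy as np
import deepchem as dc
from deepchem.models.tensorgraph.tensor_graph import TensorGraph
from deepchem.models.tensorgraph.layers import Feature, Dense, Label, L2Loss, ReduceMean

# Two-layer regressor where only the top Dense layer is trained via a submodel.
model = TensorGraph(use_queue=False)
features = Feature(shape=(None, 10))
labels = Label(shape=(None, 1))
hidden = Dense(out_channels=5, in_layers=features)
output = Dense(out_channels=1, in_layers=hidden)
model.add_output(output)
model.set_loss(ReduceMean(in_layers=L2Loss(in_layers=[labels, output])))

# Train only the top layer, reusing the main loss and optimizer.
top_only = model.create_submodel(layers=[output])

dataset = dc.data.NumpyDataset(np.random.rand(20, 10), np.random.rand(20, 1))
model.fit(dataset, nb_epoch=1, submodel=top_only)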

default_generator(dataset, epochs=1, predict=False, deterministic=True, pad_batches=True)
evaluate(dataset, metrics, transformers=[], per_task_metrics=False)

Evaluates the performance of this model on specified dataset.

Parameters:
  • dataset (dc.data.Dataset) – Dataset object.
  • metrics (list of deepchem.metrics.Metric) – Evaluation metrics
  • transformers (list) – List of deepchem.transformers.Transformer
  • per_task_metrics (bool) – If True, return per-task scores.
Returns:

Maps tasks to scores under metric.

Return type:

dict
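A minimal sketch of evaluate(); the choice of ROC-AUC averaged over tasks is an illustrative assumption suited to a classification model.

import numpy as np
import deepchem as dc

def score_classifier(model, dataset, transformers=[]):
    # Scores the model with ROC-AUC averaged across tasks; evaluate() returns
    # a dict mapping the metric name to its score.
    metric = dc.metrics.Metric(dc.metrics.roc_auc_score, np.mean)
    return model.evaluate(dataset, [metric], transformers)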

evaluate_generator(feed_dict_generator, metrics, transformers=[], labels=None, outputs=None, weights=[], per_task_metrics=False)
fit(dataset, loss, **kwargs)

Fits on the specified dataset.

If called for the first time, constructs the TensorFlow graph for this model. Fits this graph on the specified dataset according to the specified loss.

Parameters:
  • dataset (dc.data.Dataset) – Dataset with data
  • loss (string) – Only “binary_crossentropy” or “mse” for now.
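Putting the constructor parameters and fit() together, a minimal SequenceDNN sketch might look like the following; the one-hot input layout (n_samples, 1, 4, seq_length) and the random data are assumptions for illustration only and should be replaced by your actual featurization.

import numpy as np
import deepchem as dc
from deepchem.models.tensorgraph.models.sequence_dnn import SequenceDNN

seq_length = 100
# Assumed featurization: one-hot encoded sequences with a binary label.
X = np.random.randint(0, 2, size=(8, 1, 4, seq_length)).astype(np.float32)
y = np.random.randint(0, 2, size=(8, 1)).astype(np.float32)
dataset = dc.data.NumpyDataset(X, y)

model = SequenceDNN(seq_length, num_filters=15, kernel_size=15, pool_width=35)
model.fit(dataset, loss="binary_crossentropy")  # or "mse" for regression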
fit_generator(feed_dict_generator, max_checkpoints_to_keep=5, checkpoint_interval=1000, restore=False, submodel=None)

Train this model on data from a generator.

Parameters:
  • feed_dict_generator (generator) – this should generate batches, each represented as a dict that maps Layers to values.
  • max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.
  • checkpoint_interval (int) – the frequency at which to write checkpoints, measured in training steps. Set this to 0 to disable automatic checkpointing.
  • restore (bool) – if True, restore the model from the most recent checkpoint and continue training from there. If False, retrain the model from scratch.
  • submodel (Submodel) – an alternate training objective to use. This should have been created by calling create_submodel().
Returns:the average loss over the most recent checkpoint interval

fit_on_batch(X, y, w, submodel=None)
get_checkpoints()

Get a list of all available checkpoint files.

get_global_step()
get_layer_variables(layer)

Get the list of trainable variables in a layer of the graph.

get_model_filename(model_dir)

Given model directory, obtain filename for the model itself.

get_num_tasks()
get_params(deep=True)

Get parameters for this estimator.

Parameters:deep (boolean, optional) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns:params – Parameter names mapped to their values.
Return type:mapping of string to any
get_params_filename(model_dir)

Given model directory, obtain filename for the model parameters.

get_pickling_errors(obj, seen=None)
get_pre_q_input(input_layer)
get_task_type()

Currently models can only be classifiers or regressors.

load_from_dir(model_dir, restore=True)
predict(dataset, transformers=[], outputs=None)

Uses self to make predictions on provided Dataset object.

Parameters:
  • dataset (dc.data.Dataset) – Dataset to make prediction on
  • transformers (list) – List of dc.trans.Transformers.
  • outputs (object) – If outputs is None, then will assume outputs = self.outputs[0] (single output). If outputs is a Layer/Tensor, then will evaluate and return as a single ndarray. If outputs is a list of Layers/Tensors, will return a list of ndarrays.
Returns:

results

Return type:

numpy ndarray or list of numpy ndarrays

predict_on_batch(X, transformers=[], outputs=None)

Generates predictions for input samples, processing samples in a batch.

Parameters:
  • X (ndarray) – the input data, as a Numpy array.
  • transformers (List) – List of dc.trans.Transformers
Returns:A Numpy array of predictions.

predict_on_generator(generator, transformers=[], outputs=None)
Parameters:
  • generator (Generator) – Generator that constructs feed dictionaries for TensorGraph.
  • transformers (list) – List of dc.trans.Transformers.
  • outputs (object) – If outputs is None, then will assume outputs = self.outputs. If outputs is a Layer/Tensor, then will evaluate and return as a single ndarray. If outputs is a list of Layers/Tensors, will return a list of ndarrays.
Returns:y_pred – numpy ndarray of shape (n_samples, n_classes*n_tasks)
predict_proba(dataset, transformers=[], outputs=None)
Parameters:
  • dataset (dc.data.Dataset) – Dataset to make prediction on
  • transformers (list) – List of dc.trans.Transformers.
  • outputs (object) – If outputs is None, then will assume outputs = self.outputs[0] (single output). If outputs is a Layer/Tensor, then will evaluate and return as a single ndarray. If outputs is a list of Layers/Tensors, will return a list of ndarrays.
Returns:

y_pred

Return type:

numpy ndarray or list of numpy ndarrays

predict_proba_on_batch(X, transformers=[], outputs=None)

Generates predictions for input samples, processing samples in a batch.

Parameters:
  • X (ndarray) – the input data, as a Numpy array.
  • transformers (List) – List of dc.trans.Transformers
Returns:A Numpy array of predictions.

predict_proba_on_generator(generator, transformers=[], outputs=None)
Returns:y_pred – numpy ndarray of shape (n_samples, n_classes*n_tasks)
reload()

Reload trained model from disk.

restore(checkpoint=None)

Not currently supported.

save()
save_checkpoint(max_checkpoints_to_keep=5)

Save a checkpoint to disk.

Usually you do not need to call this method, since fit() saves checkpoints automatically. If you have disabled automatic checkpointing during fitting, this can be called to manually write checkpoints.

Parameters:max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.
set_loss(layer)
set_optimizer(optimizer)

Set the optimizer to use for fitting.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns:self
topsort()

deepchem.models.tensorgraph.models.symmetry_function_regression module

Created on Thu Jul 6 20:31:47 2017

@author: zqwu @contributors: ytz

class deepchem.models.tensorgraph.models.symmetry_function_regression.ANIRegression(n_tasks, max_atoms, layer_structures=[128, 64], atom_number_cases=[1, 6, 7, 8, 16], **kwargs)[source]

Bases: deepchem.models.tensorgraph.tensor_graph.TensorGraph

add_output(layer)
build()
build_grad()[source]
build_graph()[source]
compute_grad(dataset, upper_lim=1)[source]

Computes a batched gradients given an input dataset.

Parameters:
  • dataset (dc.Dataset) – dataset-like object whose X values will be used to compute gradients from
  • upper_lim (int) – number of samples from the dataset to use when computing gradients.
Returns:

Gradients of the input of shape (max_atoms, 4). Note that it is up to the end user to slice this matrix into the correct shape, since it’s very likely the derivatives with respect to the atomic numbers are zero.

Return type:

np.array

create_submodel(layers=None, loss=None, optimizer=None)

Create an alternate objective for training one piece of a TensorGraph.

A TensorGraph consists of a set of layers, and specifies a loss function and optimizer to use for training those layers. Usually this is sufficient, but there are cases where you want to train different parts of a model separately. For example, a GAN consists of a generator and a discriminator. They are trained separately, and they use different loss functions.

A submodel defines an alternate objective to use in cases like this. It may optionally specify any of the following: a subset of layers in the model to train; a different loss function; and a different optimizer to use. This method creates a submodel, which you can then pass to fit() to use it for training.

Parameters:
  • layers (list) – the list of layers to train. If None, all layers in the model will be trained.
  • loss (Layer) – the loss function to optimize. If None, the model’s main loss function will be used.
  • optimizer (Optimizer) – the optimizer to use for training. If None, the model’s main optimizer will be used.
Returns:the newly created submodel, which can be passed to any of the fitting methods.

default_generator(dataset, epochs=1, predict=False, deterministic=True, pad_batches=True)[source]
evaluate(dataset, metrics, transformers=[], per_task_metrics=False)

Evaluates the performance of this model on specified dataset.

Parameters:
  • dataset (dc.data.Dataset) – Dataset object.
  • metrics (list of deepchem.metrics.Metric) – Evaluation metrics
  • transformers (list) – List of deepchem.transformers.Transformer
  • per_task_metrics (bool) – If True, return per-task scores.
Returns:

Maps tasks to scores under metric.

Return type:

dict

evaluate_generator(feed_dict_generator, metrics, transformers=[], labels=None, outputs=None, weights=[], per_task_metrics=False)
fit(dataset, nb_epoch=10, max_checkpoints_to_keep=5, checkpoint_interval=1000, deterministic=False, restore=False, submodel=None, **kwargs)

Train this model on a dataset.

Parameters:
  • dataset (Dataset) – the Dataset to train on
  • nb_epoch (int) – the number of epochs to train for
  • max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.
  • checkpoint_interval (int) – the frequency at which to write checkpoints, measured in training steps. Set this to 0 to disable automatic checkpointing.
  • deterministic (bool) – if True, the samples are processed in order. If False, a different random order is used for each epoch.
  • restore (bool) – if True, restore the model from the most recent checkpoint and continue training from there. If False, retrain the model from scratch.
  • submodel (Submodel) – an alternate training objective to use. This should have been created by calling create_submodel().
fit_generator(feed_dict_generator, max_checkpoints_to_keep=5, checkpoint_interval=1000, restore=False, submodel=None)

Train this model on data from a generator.

Parameters:
  • feed_dict_generator (generator) – this should generate batches, each represented as a dict that maps Layers to values.
  • max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.
  • checkpoint_interval (int) – the frequency at which to write checkpoints, measured in training steps. Set this to 0 to disable automatic checkpointing.
  • restore (bool) – if True, restore the model from the most recent checkpoint and continue training from there. If False, retrain the model from scratch.
  • submodel (Submodel) – an alternate training objective to use. This should have been created by calling create_submodel().
Returns:the average loss over the most recent checkpoint interval

fit_on_batch(X, y, w, submodel=None)
get_checkpoints()

Get a list of all available checkpoint files.

get_global_step()
get_layer_variables(layer)

Get the list of trainable variables in a layer of the graph.

get_model_filename(model_dir)

Given model directory, obtain filename for the model itself.

get_num_tasks()
get_params(deep=True)

Get parameters for this estimator.

Parameters:deep (boolean, optional) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns:params – Parameter names mapped to their values.
Return type:mapping of string to any
get_params_filename(model_dir)

Given model directory, obtain filename for the model parameters.

get_pickling_errors(obj, seen=None)
get_pre_q_input(input_layer)
get_task_type()

Currently models can only be classifiers or regressors.

grad_one(X, atomic_nums, constraints=None)[source]

Computes gradients for a single structure.

Parameters:
  • X (np.array) – numpy array of shape (a, 3) where a <= max_atoms and dtype is float-like
  • atomic_nums (np.array) – numpy array of shape (a,) where a is the same as that of X.
  • constraints (np.array) – numpy array of indices of X used for constraining a subset of the atoms of the molecule.
Returns:

derivatives of the same shape and type as input parameter X.

Return type:

np.array

load_from_dir(model_dir, restore=True)
classmethod load_numpy(model_dir)[source]

Load from a portable numpy file.

Parameters:model_dir (str) – Location of the model directory.
minimize_structure(X, atomic_nums, constraints=None)[source]

Minimizes a structure, as defined by a set of coordinates and their atomic numbers.

Parameters:
  • X (np.array) – numpy array of shape (a, 3) where a <= max_atoms and dtype is float-like
  • atomic_nums (np.array) – numpy array of shape (a,) where a is the same as that of X.
Returns:

minimized coordinates of the same shape and type as input parameter X.

Return type:

np.array

pred_one(X, atomic_nums, constraints=None)[source]

Makes an energy prediction for a set of atomic coordinates.

Parameters:
  • X (np.array) – numpy array of shape (a, 3) where a <= max_atoms and dtype is float-like
  • atomic_nums (np.array) – numpy array of shape (a,) where a is the same as that of X.
  • constraints (unused) – this parameter exists mainly for compatibility with scipy.optimize
Returns:

Predicted energy. Note that the meaning of the returned value is dependent on the training y-values both in semantics (relative vs absolute) and units (kcal/mol vs Hartrees)

Return type:

float
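The three structure-level methods above are usually used together on a trained model; the sketch below assumes model is a fitted ANIRegression and that coords and atomic_nums follow the shapes documented above.

import numpy as np

def relax_and_score(model, coords, atomic_nums):
    # coords: (a, 3) float array with a <= max_atoms; atomic_nums: (a,) ints.
    energy = model.pred_one(coords, atomic_nums)             # predicted energy (float)
    gradient = model.grad_one(coords, atomic_nums)           # dE/dX, same shape as coords
    relaxed = model.minimize_structure(coords, atomic_nums)  # locally minimized geometry
    return energy, gradient, relaxed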

predict(dataset, transformers=[], outputs=None)

Uses self to make predictions on provided Dataset object.

Parameters:
  • dataset (dc.data.Dataset) – Dataset to make prediction on
  • transformers (list) – List of dc.trans.Transformers.
  • outputs (object) – If outputs is None, then will assume outputs = self.outputs[0] (single output). If outputs is a Layer/Tensor, then will evaluate and return as a single ndarray. If outputs is a list of Layers/Tensors, will return a list of ndarrays.
Returns:

results

Return type:

numpy ndarray or list of numpy ndarrays

predict_on_batch(X, transformers=[], outputs=None)

Generates predictions for input samples, processing samples in a batch.

Parameters:
  • X (ndarray) – the input data, as a Numpy array.
  • transformers (List) – List of dc.trans.Transformers
Returns:A Numpy array of predictions.

predict_on_generator(generator, transformers=[], outputs=None)
Parameters:
  • generator (Generator) – Generator that constructs feed dictionaries for TensorGraph.
  • transformers (list) – List of dc.trans.Transformers.
  • outputs (object) – If outputs is None, then will assume outputs = self.outputs. If outputs is a Layer/Tensor, then will evaluate and return as a single ndarray. If outputs is a list of Layers/Tensors, will return a list of ndarrays.
Returns:y_pred – numpy ndarray of shape (n_samples, n_classes*n_tasks)
predict_proba(dataset, transformers=[], outputs=None)
Parameters:
  • dataset (dc.data.Dataset) – Dataset to make prediction on
  • transformers (list) – List of dc.trans.Transformers.
  • outputs (object) – If outputs is None, then will assume outputs = self.outputs[0] (single output). If outputs is a Layer/Tensor, then will evaluate and return as a single ndarray. If outputs is a list of Layers/Tensors, will return a list of ndarrays.
Returns:

y_pred

Return type:

numpy ndarray or list of numpy ndarrays

predict_proba_on_batch(X, transformers=[], outputs=None)

Generates predictions for input samples, processing samples in a batch.

Parameters:
  • X (ndarray) – the input data, as a Numpy array.
  • transformers (List) – List of dc.trans.Transformers
Returns:A Numpy array of predictions.

predict_proba_on_generator(generator, transformers=[], outputs=None)
Returns:y_pred – numpy ndarray of shape (n_samples, n_classes*n_tasks)
reload()

Reload trained model from disk.

restore(checkpoint=None)

Reload the values of all variables from a checkpoint file.

Parameters:checkpoint (str) – the path to the checkpoint file to load. If this is None, the most recent checkpoint will be chosen automatically. Call get_checkpoints() to get a list of all available checkpoints.
save()[source]
save_checkpoint(max_checkpoints_to_keep=5)

Save a checkpoint to disk.

Usually you do not need to call this method, since fit() saves checkpoints automatically. If you have disabled automatic checkpointing during fitting, this can be called to manually write checkpoints.

Parameters:max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.
save_numpy()[source]

Save to a portable numpy file. Note that this relies on variable names being consistent across different versions. The file is saved as save_pickle.npz under the model_dir.
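A short sketch of the portable-save round trip; it assumes model is a trained ANIRegression whose model_dir will hold the save_pickle.npz file written by save_numpy().

from deepchem.models.tensorgraph.models.symmetry_function_regression import ANIRegression

def numpy_roundtrip(model):
    model.save_numpy()                                # writes save_pickle.npz under model_dir
    return ANIRegression.load_numpy(model.model_dir)  # reconstructs the model from that file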

set_loss(layer)
set_optimizer(optimizer)

Set the optimizer to use for fitting.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns:self
topsort()
class deepchem.models.tensorgraph.models.symmetry_function_regression.BPSymmetryFunctionRegression(n_tasks, max_atoms, n_feat=96, layer_structures=[128, 64], **kwargs)[source]

Bases: deepchem.models.tensorgraph.tensor_graph.TensorGraph

add_output(layer)
build()
build_graph()[source]
create_submodel(layers=None, loss=None, optimizer=None)

Create an alternate objective for training one piece of a TensorGraph.

A TensorGraph consists of a set of layers, and specifies a loss function and optimizer to use for training those layers. Usually this is sufficient, but there are cases where you want to train different parts of a model separately. For example, a GAN consists of a generator and a discriminator. They are trained separately, and they use different loss functions.

A submodel defines an alternate objective to use in cases like this. It may optionally specify any of the following: a subset of layers in the model to train; a different loss function; and a different optimizer to use. This method creates a submodel, which you can then pass to fit() to use it for training.

Parameters:
  • layers (list) – the list of layers to train. If None, all layers in the model will be trained.
  • loss (Layer) – the loss function to optimize. If None, the model’s main loss function will be used.
  • optimizer (Optimizer) – the optimizer to use for training. If None, the model’s main optimizer will be used.
Returns:the newly created submodel, which can be passed to any of the fitting methods.

default_generator(dataset, epochs=1, predict=False, pad_batches=True)[source]
evaluate(dataset, metrics, transformers=[], per_task_metrics=False)

Evaluates the performance of this model on specified dataset.

Parameters:
  • dataset (dc.data.Dataset) – Dataset object.
  • metrics (list of deepchem.metrics.Metric) – Evaluation metrics
  • transformers (list) – List of deepchem.transformers.Transformer
  • per_task_metrics (bool) – If True, return per-task scores.
Returns:

Maps tasks to scores under metric.

Return type:

dict

evaluate_generator(feed_dict_generator, metrics, transformers=[], labels=None, outputs=None, weights=[], per_task_metrics=False)
fit(dataset, nb_epoch=10, max_checkpoints_to_keep=5, checkpoint_interval=1000, deterministic=False, restore=False, submodel=None, **kwargs)

Train this model on a dataset.

Parameters:
  • dataset (Dataset) – the Dataset to train on
  • nb_epoch (int) – the number of epochs to train for
  • max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.
  • checkpoint_interval (int) – the frequency at which to write checkpoints, measured in training steps. Set this to 0 to disable automatic checkpointing.
  • deterministic (bool) – if True, the samples are processed in order. If False, a different random order is used for each epoch.
  • restore (bool) – if True, restore the model from the most recent checkpoint and continue training from there. If False, retrain the model from scratch.
  • submodel (Submodel) – an alternate training objective to use. This should have been created by calling create_submodel().
fit_generator(feed_dict_generator, max_checkpoints_to_keep=5, checkpoint_interval=1000, restore=False, submodel=None)

Train this model on data from a generator.

Parameters:
  • feed_dict_generator (generator) – this should generate batches, each represented as a dict that maps Layers to values.
  • max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.
  • checkpoint_interval (int) – the frequency at which to write checkpoints, measured in training steps. Set this to 0 to disable automatic checkpointing.
  • restore (bool) – if True, restore the model from the most recent checkpoint and continue training from there. If False, retrain the model from scratch.
  • submodel (Submodel) – an alternate training objective to use. This should have been created by calling create_submodel().
Returns:the average loss over the most recent checkpoint interval

fit_on_batch(X, y, w, submodel=None)
get_checkpoints()

Get a list of all available checkpoint files.

get_global_step()
get_layer_variables(layer)

Get the list of trainable variables in a layer of the graph.

get_model_filename(model_dir)

Given model directory, obtain filename for the model itself.

get_num_tasks()
get_params(deep=True)

Get parameters for this estimator.

Parameters:deep (boolean, optional) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns:params – Parameter names mapped to their values.
Return type:mapping of string to any
get_params_filename(model_dir)

Given model directory, obtain filename for the model parameters.

get_pickling_errors(obj, seen=None)
get_pre_q_input(input_layer)
get_task_type()

Currently models can only be classifiers or regressors.

load_from_dir(model_dir, restore=True)
predict(dataset, transformers=[], outputs=None)

Uses self to make predictions on provided Dataset object.

Parameters:
  • dataset (dc.data.Dataset) – Dataset to make prediction on
  • transformers (list) – List of dc.trans.Transformers.
  • outputs (object) – If outputs is None, then will assume outputs = self.outputs[0] (single output). If outputs is a Layer/Tensor, then will evaluate and return as a single ndarray. If outputs is a list of Layers/Tensors, will return a list of ndarrays.
Returns:

results

Return type:

numpy ndarray or list of numpy ndarrays

predict_on_batch(X, transformers=[], outputs=None)

Generates predictions for input samples, processing samples in a batch.

Parameters:
  • X (ndarray) – the input data, as a Numpy array.
  • transformers (List) – List of dc.trans.Transformers
Returns:A Numpy array of predictions.

predict_on_generator(generator, transformers=[], outputs=None)
Parameters:
  • generator (Generator) – Generator that constructs feed dictionaries for TensorGraph.
  • transformers (list) – List of dc.trans.Transformers.
  • outputs (object) – If outputs is None, then will assume outputs = self.outputs. If outputs is a Layer/Tensor, then will evaluate and return as a single ndarray. If outputs is a list of Layers/Tensors, will return a list of ndarrays.
Returns:y_pred – numpy ndarray of shape (n_samples, n_classes*n_tasks)
predict_proba(dataset, transformers=[], outputs=None)
Parameters:
  • dataset (dc.data.Dataset) – Dataset to make prediction on
  • transformers (list) – List of dc.trans.Transformers.
  • outputs (object) – If outputs is None, then will assume outputs = self.outputs[0] (single output). If outputs is a Layer/Tensor, then will evaluate and return as a single ndarray. If outputs is a list of Layers/Tensors, will return a list of ndarrays.
Returns:

y_pred

Return type:

numpy ndarray or list of numpy ndarrays

predict_proba_on_batch(X, transformers=[], outputs=None)

Generates predictions for input samples, processing samples in a batch.

Parameters:
  • X (ndarray) – the input data, as a Numpy array.
  • transformers (List) – List of dc.trans.Transformers
Returns:A Numpy array of predictions.

predict_proba_on_generator(generator, transformers=[], outputs=None)
Returns:y_pred – numpy ndarray of shape (n_samples, n_classes*n_tasks)
reload()

Reload trained model from disk.

restore(checkpoint=None)

Reload the values of all variables from a checkpoint file.

Parameters:checkpoint (str) – the path to the checkpoint file to load. If this is None, the most recent checkpoint will be chosen automatically. Call get_checkpoints() to get a list of all available checkpoints.
save()
save_checkpoint(max_checkpoints_to_keep=5)

Save a checkpoint to disk.

Usually you do not need to call this method, since fit() saves checkpoints automatically. If you have disabled automatic checkpointing during fitting, this can be called to manually write checkpoints.

Parameters:max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.
set_loss(layer)
set_optimizer(optimizer)

Set the optimizer to use for fitting.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns:self
topsort()

deepchem.models.tensorgraph.models.test_graph_models module

class deepchem.models.tensorgraph.models.test_graph_models.TestGraphModels(methodName='runTest')[source]

Bases: unittest.case.TestCase

addCleanup(function, *args, **kwargs)

Add a function, with arguments, to be called when the test is completed. Functions added are called on a LIFO basis and are called after tearDown on test failure or success.

Cleanup items are called even if setUp fails (unlike tearDown).

addTypeEqualityFunc(typeobj, function)

Add a type specific assertEqual style function to compare a type.

This method is for use by TestCase subclasses that need to register their own type equality functions to provide nicer error messages.

Parameters:
  • typeobj – The data type to call this function on when both values are of the same type in assertEqual().
  • function – The callable taking two arguments and an optional msg= argument that raises self.failureException with a useful error message when the two arguments are not equal.
assertAlmostEqual(first, second, places=None, msg=None, delta=None)

Fail if the two objects are unequal as determined by their difference rounded to the given number of decimal places (default 7) and comparing to zero, or by comparing that the difference between the two objects is more than the given delta.

Note that decimal places (from zero) are usually not the same as significant digits (measured from the most significant digit).

If the two objects compare equal then they will automatically compare almost equal.

assertAlmostEquals(*args, **kwargs)
assertCountEqual(first, second, msg=None)

An unordered sequence comparison asserting that both sequences contain the same elements, regardless of order. If the same element occurs more than once, it verifies that the elements occur the same number of times.

self.assertEqual(Counter(list(first)), Counter(list(second)))
Example:
  • [0, 1, 1] and [1, 0, 1] compare equal.
  • [0, 0, 1] and [0, 1] compare unequal.
assertDictContainsSubset(subset, dictionary, msg=None)

Checks whether dictionary is a superset of subset.

assertDictEqual(d1, d2, msg=None)
assertEqual(first, second, msg=None)

Fail if the two objects are unequal as determined by the ‘==’ operator.

assertEquals(*args, **kwargs)
assertFalse(expr, msg=None)

Check that the expression is false.

assertGreater(a, b, msg=None)

Just like self.assertTrue(a > b), but with a nicer default message.

assertGreaterEqual(a, b, msg=None)

Just like self.assertTrue(a >= b), but with a nicer default message.

assertIn(member, container, msg=None)

Just like self.assertTrue(a in b), but with a nicer default message.

assertIs(expr1, expr2, msg=None)

Just like self.assertTrue(a is b), but with a nicer default message.

assertIsInstance(obj, cls, msg=None)

Same as self.assertTrue(isinstance(obj, cls)), with a nicer default message.

assertIsNone(obj, msg=None)

Same as self.assertTrue(obj is None), with a nicer default message.

assertIsNot(expr1, expr2, msg=None)

Just like self.assertTrue(a is not b), but with a nicer default message.

assertIsNotNone(obj, msg=None)

Included for symmetry with assertIsNone.

assertLess(a, b, msg=None)

Just like self.assertTrue(a < b), but with a nicer default message.

assertLessEqual(a, b, msg=None)

Just like self.assertTrue(a <= b), but with a nicer default message.

assertListEqual(list1, list2, msg=None)

A list-specific equality assertion.

Parameters:
  • list1 – The first list to compare.
  • list2 – The second list to compare.
  • msg – Optional message to use on failure instead of a list of differences.
assertLogs(logger=None, level=None)

Fail unless a log message of level level or higher is emitted on logger_name or its children. If omitted, level defaults to INFO and logger defaults to the root logger.

This method must be used as a context manager, and will yield a recording object with two attributes: output and records. At the end of the context manager, the output attribute will be a list of the matching formatted log messages and the records attribute will be a list of the corresponding LogRecord objects.

Example:

with self.assertLogs('foo', level='INFO') as cm:
    logging.getLogger('foo').info('first message')
    logging.getLogger('foo.bar').error('second message')
self.assertEqual(cm.output, ['INFO:foo:first message',
                             'ERROR:foo.bar:second message'])
assertMultiLineEqual(first, second, msg=None)

Assert that two multi-line strings are equal.

assertNotAlmostEqual(first, second, places=None, msg=None, delta=None)

Fail if the two objects are equal as determined by their difference rounded to the given number of decimal places (default 7) and comparing to zero, or by comparing that the difference between the two objects is less than the given delta.

Note that decimal places (from zero) are usually not the same as significant digits (measured from the most significant digit).

Objects that are equal automatically fail.

assertNotAlmostEquals(*args, **kwargs)
assertNotEqual(first, second, msg=None)

Fail if the two objects are equal as determined by the ‘!=’ operator.

assertNotEquals(*args, **kwargs)
assertNotIn(member, container, msg=None)

Just like self.assertTrue(a not in b), but with a nicer default message.

assertNotIsInstance(obj, cls, msg=None)

Included for symmetry with assertIsInstance.

assertNotRegex(text, unexpected_regex, msg=None)

Fail the test if the text matches the regular expression.

assertNotRegexpMatches(*args, **kwargs)
assertRaises(expected_exception, *args, **kwargs)

Fail unless an exception of class expected_exception is raised by the callable when invoked with specified positional and keyword arguments. If a different type of exception is raised, it will not be caught, and the test case will be deemed to have suffered an error, exactly as for an unexpected exception.

If called with the callable and arguments omitted, will return a context object used like this:

with self.assertRaises(SomeException):
    do_something()

An optional keyword argument ‘msg’ can be provided when assertRaises is used as a context object.

The context manager keeps a reference to the exception as the ‘exception’ attribute. This allows you to inspect the exception after the assertion:

with self.assertRaises(SomeException) as cm:
    do_something()
the_exception = cm.exception
self.assertEqual(the_exception.error_code, 3)
assertRaisesRegex(expected_exception, expected_regex, *args, **kwargs)

Asserts that the message in a raised exception matches a regex.

Parameters:
  • expected_exception – Exception class expected to be raised.
  • expected_regex – Regex (re pattern object or string) expected to be found in error message.
  • args – Function to be called and extra positional args.
  • kwargs – Extra kwargs.
  • msg – Optional message used in case of failure. Can only be used when assertRaisesRegex is used as a context manager.
assertRaisesRegexp(*args, **kwargs)
assertRegex(text, expected_regex, msg=None)

Fail the test unless the text matches the regular expression.

assertRegexpMatches(*args, **kwargs)
assertSequenceEqual(seq1, seq2, msg=None, seq_type=None)

An equality assertion for ordered sequences (like lists and tuples).

For the purposes of this function, a valid ordered sequence type is one which can be indexed, has a length, and has an equality operator.

Parameters:
  • seq1 – The first sequence to compare.
  • seq2 – The second sequence to compare.
  • seq_type – The expected datatype of the sequences, or None if no datatype should be enforced.
  • msg – Optional message to use on failure instead of a list of differences.
assertSetEqual(set1, set2, msg=None)

A set-specific equality assertion.

Parameters:
  • set1 – The first set to compare.
  • set2 – The second set to compare.
  • msg – Optional message to use on failure instead of a list of differences.

assertSetEqual uses ducktyping to support different types of sets, and is optimized for sets specifically (parameters must support a difference method).

assertTrue(expr, msg=None)

Check that the expression is true.

assertTupleEqual(tuple1, tuple2, msg=None)

A tuple-specific equality assertion.

Parameters:
  • tuple1 – The first tuple to compare.
  • tuple2 – The second tuple to compare.
  • msg – Optional message to use on failure instead of a list of differences.
assertWarns(expected_warning, *args, **kwargs)

Fail unless a warning of class warnClass is triggered by the callable when invoked with specified positional and keyword arguments. If a different type of warning is triggered, it will not be handled: depending on the other warning filtering rules in effect, it might be silenced, printed out, or raised as an exception.

If called with the callable and arguments omitted, will return a context object used like this:

with self.assertWarns(SomeWarning):
    do_something()

An optional keyword argument ‘msg’ can be provided when assertWarns is used as a context object.

The context manager keeps a reference to the first matching warning as the ‘warning’ attribute; similarly, the ‘filename’ and ‘lineno’ attributes give you information about the line of Python code from which the warning was triggered. This allows you to inspect the warning after the assertion:

with self.assertWarns(SomeWarning) as cm:
    do_something()
the_warning = cm.warning
self.assertEqual(the_warning.some_attribute, 147)
assertWarnsRegex(expected_warning, expected_regex, *args, **kwargs)

Asserts that the message in a triggered warning matches a regexp. Basic functioning is similar to assertWarns() with the addition that only warnings whose messages also match the regular expression are considered successful matches.

Parameters:
  • expected_warning – Warning class expected to be triggered.
  • expected_regex – Regex (re pattern object or string) expected to be found in error message.
  • args – Function to be called and extra positional args.
  • kwargs – Extra kwargs.
  • msg – Optional message used in case of failure. Can only be used when assertWarnsRegex is used as a context manager.
assert_(*args, **kwargs)
countTestCases()
debug()

Run the test without collecting errors in a TestResult

defaultTestResult()
doCleanups()

Execute all cleanup functions. Normally called for you after tearDown.

fail(msg=None)

Fail immediately, with the given message.

failIf(*args, **kwargs)
failIfAlmostEqual(*args, **kwargs)
failIfEqual(*args, **kwargs)
failUnless(*args, **kwargs)
failUnlessAlmostEqual(*args, **kwargs)
failUnlessEqual(*args, **kwargs)
failUnlessRaises(*args, **kwargs)
failureException

alias of AssertionError

get_dataset(mode='classification', featurizer='GraphConv', num_tasks=2)[source]
id()
longMessage = True
maxDiff = 640
run(result=None)
setUp()

Hook method for setting up the test fixture before exercising it.

setUpClass()

Hook method for setting up class fixture before running tests in the class.

shortDescription()

Returns a one-line description of the test, or None if no description has been provided.

The default implementation of this method returns the first line of the specified test method’s docstring.

skipTest(reason)

Skip this test.

subTest(msg=<object object>, **params)

Return a context manager that will return the enclosed block of code in a subtest identified by the optional message and keyword parameters. A failure in the subtest marks the test case as failed but resumes execution at the end of the enclosed block, allowing further test code to be executed.

tearDown()

Hook method for deconstructing the test fixture after testing it.

tearDownClass()

Hook method for deconstructing the class fixture after running all tests in the class.

test_change_loss_function()[source]
test_change_loss_function_weave()[source]
test_graph_conv_error_bars()[source]
test_graph_conv_model()[source]
test_graph_conv_regression_model()[source]

deepchem.models.tensorgraph.models.test_symmetry_functions module

class deepchem.models.tensorgraph.models.test_symmetry_functions.TestANIRegression(methodName='runTest')[source]

Bases: unittest.case.TestCase

addCleanup(function, *args, **kwargs)

Add a function, with arguments, to be called when the test is completed. Functions added are called on a LIFO basis and are called after tearDown on test failure or success.

Cleanup items are called even if setUp fails (unlike tearDown).

addTypeEqualityFunc(typeobj, function)

Add a type specific assertEqual style function to compare a type.

This method is for use by TestCase subclasses that need to register their own type equality functions to provide nicer error messages.

Parameters:
  • typeobj – The data type to call this function on when both values are of the same type in assertEqual().
  • function – The callable taking two arguments and an optional msg= argument that raises self.failureException with a useful error message when the two arguments are not equal.
assertAlmostEqual(first, second, places=None, msg=None, delta=None)

Fail if the two objects are unequal as determined by their difference rounded to the given number of decimal places (default 7) and comparing to zero, or by comparing that the difference between the two objects is more than the given delta.

Note that decimal places (from zero) are usually not the same as significant digits (measured from the most significant digit).

If the two objects compare equal then they will automatically compare almost equal.

assertAlmostEquals(*args, **kwargs)
assertCountEqual(first, second, msg=None)

An unordered sequence comparison asserting that both sequences contain the same elements, regardless of order. If the same element occurs more than once, it verifies that the elements occur the same number of times.

self.assertEqual(Counter(list(first)), Counter(list(second)))
Example:
  • [0, 1, 1] and [1, 0, 1] compare equal.
  • [0, 0, 1] and [0, 1] compare unequal.
assertDictContainsSubset(subset, dictionary, msg=None)

Checks whether dictionary is a superset of subset.

assertDictEqual(d1, d2, msg=None)
assertEqual(first, second, msg=None)

Fail if the two objects are unequal as determined by the ‘==’ operator.

assertEquals(*args, **kwargs)
assertFalse(expr, msg=None)

Check that the expression is false.

assertGreater(a, b, msg=None)

Just like self.assertTrue(a > b), but with a nicer default message.

assertGreaterEqual(a, b, msg=None)

Just like self.assertTrue(a >= b), but with a nicer default message.

assertIn(member, container, msg=None)

Just like self.assertTrue(a in b), but with a nicer default message.

assertIs(expr1, expr2, msg=None)

Just like self.assertTrue(a is b), but with a nicer default message.

assertIsInstance(obj, cls, msg=None)

Same as self.assertTrue(isinstance(obj, cls)), with a nicer default message.

assertIsNone(obj, msg=None)

Same as self.assertTrue(obj is None), with a nicer default message.

assertIsNot(expr1, expr2, msg=None)

Just like self.assertTrue(a is not b), but with a nicer default message.

assertIsNotNone(obj, msg=None)

Included for symmetry with assertIsNone.

assertLess(a, b, msg=None)

Just like self.assertTrue(a < b), but with a nicer default message.

assertLessEqual(a, b, msg=None)

Just like self.assertTrue(a <= b), but with a nicer default message.

assertListEqual(list1, list2, msg=None)

A list-specific equality assertion.

Parameters:
  • list1 – The first list to compare.
  • list2 – The second list to compare.
  • msg – Optional message to use on failure instead of a list of differences.
assertLogs(logger=None, level=None)

Fail unless a log message of level level or higher is emitted on logger_name or its children. If omitted, level defaults to INFO and logger defaults to the root logger.

This method must be used as a context manager, and will yield a recording object with two attributes: output and records. At the end of the context manager, the output attribute will be a list of the matching formatted log messages and the records attribute will be a list of the corresponding LogRecord objects.

Example:

with self.assertLogs('foo', level='INFO') as cm:
    logging.getLogger('foo').info('first message')
    logging.getLogger('foo.bar').error('second message')
self.assertEqual(cm.output, ['INFO:foo:first message',
                             'ERROR:foo.bar:second message'])
assertMultiLineEqual(first, second, msg=None)

Assert that two multi-line strings are equal.

assertNotAlmostEqual(first, second, places=None, msg=None, delta=None)

Fail if the two objects are equal as determined by their difference rounded to the given number of decimal places (default 7) and comparing to zero, or by comparing that the difference between the two objects is less than the given delta.

Note that decimal places (from zero) are usually not the same as significant digits (measured from the most significant digit).

Objects that are equal automatically fail.

assertNotAlmostEquals(*args, **kwargs)
assertNotEqual(first, second, msg=None)

Fail if the two objects are equal as determined by the ‘!=’ operator.

assertNotEquals(*args, **kwargs)
assertNotIn(member, container, msg=None)

Just like self.assertTrue(a not in b), but with a nicer default message.

assertNotIsInstance(obj, cls, msg=None)

Included for symmetry with assertIsInstance.

assertNotRegex(text, unexpected_regex, msg=None)

Fail the test if the text matches the regular expression.

assertNotRegexpMatches(*args, **kwargs)
assertRaises(expected_exception, *args, **kwargs)

Fail unless an exception of class expected_exception is raised by the callable when invoked with specified positional and keyword arguments. If a different type of exception is raised, it will not be caught, and the test case will be deemed to have suffered an error, exactly as for an unexpected exception.

If called with the callable and arguments omitted, will return a context object used like this:

with self.assertRaises(SomeException):
    do_something()

An optional keyword argument ‘msg’ can be provided when assertRaises is used as a context object.

The context manager keeps a reference to the exception as the ‘exception’ attribute. This allows you to inspect the exception after the assertion:

with self.assertRaises(SomeException) as cm:
    do_something()
the_exception = cm.exception
self.assertEqual(the_exception.error_code, 3)
assertRaisesRegex(expected_exception, expected_regex, *args, **kwargs)

Asserts that the message in a raised exception matches a regex.

Parameters:
  • expected_exception – Exception class expected to be raised.
  • expected_regex – Regex (re pattern object or string) expected to be found in error message.
  • args – Function to be called and extra positional args.
  • kwargs – Extra kwargs.
  • msg – Optional message used in case of failure. Can only be used when assertRaisesRegex is used as a context manager.
assertRaisesRegexp(*args, **kwargs)
assertRegex(text, expected_regex, msg=None)

Fail the test unless the text matches the regular expression.

assertRegexpMatches(*args, **kwargs)
assertSequenceEqual(seq1, seq2, msg=None, seq_type=None)

An equality assertion for ordered sequences (like lists and tuples).

For the purposes of this function, a valid ordered sequence type is one which can be indexed, has a length, and has an equality operator.

Parameters:
  • seq1 – The first sequence to compare.
  • seq2 – The second sequence to compare.
  • seq_type – The expected datatype of the sequences, or None if no datatype should be enforced.
  • msg – Optional message to use on failure instead of a list of differences.
assertSetEqual(set1, set2, msg=None)

A set-specific equality assertion.

Parameters:
  • set1 – The first set to compare.
  • set2 – The second set to compare.
  • msg – Optional message to use on failure instead of a list of differences.

assertSetEqual uses ducktyping to support different types of sets, and is optimized for sets specifically (parameters must support a difference method).

assertTrue(expr, msg=None)

Check that the expression is true.

assertTupleEqual(tuple1, tuple2, msg=None)

A tuple-specific equality assertion.

Parameters:
  • tuple1 – The first tuple to compare.
  • tuple2 – The second tuple to compare.
  • msg – Optional message to use on failure instead of a list of differences.
assertWarns(expected_warning, *args, **kwargs)

Fail unless a warning of class warnClass is triggered by the callable when invoked with specified positional and keyword arguments. If a different type of warning is triggered, it will not be handled: depending on the other warning filtering rules in effect, it might be silenced, printed out, or raised as an exception.

If called with the callable and arguments omitted, will return a context object used like this:

with self.assertWarns(SomeWarning):
    do_something()

An optional keyword argument ‘msg’ can be provided when assertWarns is used as a context object.

The context manager keeps a reference to the first matching warning as the ‘warning’ attribute; similarly, the ‘filename’ and ‘lineno’ attributes give you information about the line of Python code from which the warning was triggered. This allows you to inspect the warning after the assertion:

with self.assertWarns(SomeWarning) as cm:
    do_something()
the_warning = cm.warning
self.assertEqual(the_warning.some_attribute, 147)
assertWarnsRegex(expected_warning, expected_regex, *args, **kwargs)

Asserts that the message in a triggered warning matches a regexp. Basic functioning is similar to assertWarns() with the addition that only warnings whose messages also match the regular expression are considered successful matches.

Parameters:
  • expected_warning – Warning class expected to be triggered.
  • expected_regex – Regex (re pattern object or string) expected to be found in error message.
  • args – Function to be called and extra positional args.
  • kwargs – Extra kwargs.
  • msg – Optional message used in case of failure. Can only be used when assertWarnsRegex is used as a context manager.
assert_(*args, **kwargs)
countTestCases()
debug()

Run the test without collecting errors in a TestResult

defaultTestResult()
doCleanups()

Execute all cleanup functions. Normally called for you after tearDown.

fail(msg=None)

Fail immediately, with the given message.

failIf(*args, **kwargs)
failIfAlmostEqual(*args, **kwargs)
failIfEqual(*args, **kwargs)
failUnless(*args, **kwargs)
failUnlessAlmostEqual(*args, **kwargs)
failUnlessEqual(*args, **kwargs)
failUnlessRaises(*args, **kwargs)
failureException

alias of AssertionError

id()
longMessage = True
maxDiff = 640
run(result=None)
setUp()[source]
setUpClass()

Hook method for setting up class fixture before running tests in the class.

shortDescription()

Returns a one-line description of the test, or None if no description has been provided.

The default implementation of this method returns the first line of the specified test method’s docstring.

skipTest(reason)

Skip this test.

subTest(msg=<object object>, **params)

Return a context manager that will return the enclosed block of code in a subtest identified by the optional message and keyword parameters. A failure in the subtest marks the test case as failed but resumes execution at the end of the enclosed block, allowing further test code to be executed.

tearDown()

Hook method for deconstructing the test fixture after testing it.

tearDownClass()

Hook method for deconstructing the class fixture after running all tests in the class.

test_gradients()[source]
test_numpy_save_load()[source]

deepchem.models.tensorgraph.models.text_cnn module

Created on Thu Sep 28 15:17:50 2017

@author: zqwu

class deepchem.models.tensorgraph.models.text_cnn.TextCNNTensorGraph(n_tasks, char_dict, seq_length, n_embedding=75, kernel_sizes=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20], num_filters=[100, 200, 200, 200, 200, 100, 100, 100, 100, 100, 160, 160], dropout=0.25, mode='classification', **kwargs)[source]

Bases: deepchem.models.tensorgraph.tensor_graph.TensorGraph

A convolutional neural network on SMILES strings. Reimplementation of the discriminator module in ORGAN (https://arxiv.org/abs/1705.10843), originally based on the text CNN of http://emnlp2014.org/papers/pdf/EMNLP2014181.pdf

This model applies multiple 1D convolutional filters to the padded strings, then max-over-time pooling is applied on all filters, extracting one feature per filter. All features are concatenated and transformed through several hidden layers to form predictions.

This model was initially developed for sentence-level classification tasks, with words represented as vectors. In this implementation, SMILES strings are dissected into characters and transformed into one-hot vectors in a similar way. The model can be used for general molecular-level classification or regression tasks. It is also used in the ORGAN model as the discriminator.

Training the model only requires SMILES strings as input; any featurized dataset that includes SMILES in its ids attribute is accepted. PDBbind, QM7 and QM7b are not supported. Before defining the model, build_char_dict should be called to build the character dictionary of the input dataset; an example can be found in examples/delaney/delaney_textcnn.py and in the sketch below.
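A minimal sketch of that workflow, using the Delaney loader as an illustrative dataset; it assumes build_char_dict returns the character dictionary together with a suitable sequence length, as in the delaney example, and the regression mode and single training epoch are arbitrary choices.

import deepchem as dc
from deepchem.models.tensorgraph.models.text_cnn import TextCNNTensorGraph

# Build the character dictionary from the training SMILES first.
tasks, (train, valid, test), transformers = dc.molnet.load_delaney()
char_dict, seq_length = TextCNNTensorGraph.build_char_dict(train)

model = TextCNNTensorGraph(n_tasks=len(tasks),
                           char_dict=char_dict,
                           seq_length=seq_length,
                           mode="regression")
model.fit(train, nb_epoch=1)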

add_output(layer)
build()
static build_char_dict(dataset, default_dict={'(': 2, 'F': 17, 'O': 21, 'c': 28, '_': 27, '[': 24, '-': 5, '6': 12, 'S': 23, '3': 9, 'C': 16, '4': 10, ')': 3, 'I': 19, 'H': 18, 'P': 22, '2': 8, '5': 11, 's': 33, 'n': 31, 'Br': 30, '7': 13, 'Cl': 29, '+': 4, '/': 6, ']': 26, '1': 7, '#': 1, '\\': 25, '8': 14, 'N': 20, '=': 15, 'o': 32})[source]

Collect all unique characters (in SMILES) from the dataset. This method should be called before defining the model to build an appropriate char_dict.

build_graph()[source]
create_submodel(layers=None, loss=None, optimizer=None)

Create an alternate objective for training one piece of a TensorGraph.

A TensorGraph consists of a set of layers, and specifies a loss function and optimizer to use for training those layers. Usually this is sufficient, but there are cases where you want to train different parts of a model separately. For example, a GAN consists of a generator and a discriminator. They are trained separately, and they use different loss functions.

A submodel defines an alternate objective to use in cases like this. It may optionally specify any of the following: a subset of layers in the model to train; a different loss function; and a different optimizer to use. This method creates a submodel, which you can then pass to fit() to use it for training.

Parameters:
  • layers (list) – the list of layers to train. If None, all layers in the model will be trained.
  • loss (Layer) – the loss function to optimize. If None, the model’s main loss function will be used.
  • optimizer (Optimizer) – the optimizer to use for training. If None, the model’s main optimizer will be used.
Returns:

the newly created submodel, which can be passed to any of the fitting methods.
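As a rough illustration of the idea (a hypothetical sketch, not taken from the DeepChem examples): a submodel can restrict training to a subset of layers with its own loss and optimizer, and is then passed to fit(). The names head_layers and alt_loss below are placeholders for layers assumed to already exist in the graph; the Adam optimizer is assumed to come from deepchem.models.tensorgraph.optimizers.

>>> from deepchem.models.tensorgraph.optimizers import Adam
>>> # `model` is an existing TensorGraph; `head_layers` (a list of its layers)
>>> # and `alt_loss` (a loss Layer in the graph) are hypothetical names.
>>> submodel = model.create_submodel(layers=head_layers, loss=alt_loss,
...                                  optimizer=Adam(learning_rate=0.0005))
>>> model.fit(train, nb_epoch=5, submodel=submodel)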

default_generator(dataset, epochs=1, predict=False, deterministic=True, pad_batches=True)[source]

Transform SMILES strings into fixed-length integer vectors

evaluate(dataset, metrics, transformers=[], per_task_metrics=False)

Evaluates the performance of this model on the specified dataset.

Parameters:
  • dataset (dc.data.Dataset) – Dataset object.
  • metrics (list of deepchem.metrics.Metric) – Evaluation metrics to compute.
  • transformers (list) – List of deepchem.transformers.Transformer
  • per_task_metrics (bool) – If True, return per-task scores.
Returns:

Maps tasks to scores under metric.

Return type:

dict
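For example, a sketch of evaluating a trained model with a single metric (Pearson R² is an illustrative choice; model, valid and transformers are carried over from the earlier Delaney sketch):

>>> metric = dc.metrics.Metric(dc.metrics.pearson_r2_score)
>>> # Maps each task to its score under the metric
>>> scores = model.evaluate(valid, [metric], transformers)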

evaluate_generator(feed_dict_generator, metrics, transformers=[], labels=None, outputs=None, weights=[], per_task_metrics=False)
fit(dataset, nb_epoch=10, max_checkpoints_to_keep=5, checkpoint_interval=1000, deterministic=False, restore=False, submodel=None, **kwargs)

Train this model on a dataset.

Parameters:
  • dataset (Dataset) – the Dataset to train on
  • nb_epoch (int) – the number of epochs to train for
  • max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.
  • checkpoint_interval (int) – the frequency at which to write checkpoints, measured in training steps. Set this to 0 to disable automatic checkpointing.
  • deterministic (bool) – if True, the samples are processed in order. If False, a different random order is used for each epoch.
  • restore (bool) – if True, restore the model from the most recent checkpoint and continue training from there. If False, retrain the model from scratch.
  • submodel (Submodel) – an alternate training objective to use. This should have been created by calling create_submodel().
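A minimal sketch of a typical call (parameter values are illustrative only; train is the dataset from the earlier sketch):

>>> # Train for 20 epochs, writing a checkpoint every 500 steps and
>>> # keeping at most 3 checkpoints on disk.
>>> model.fit(train, nb_epoch=20, checkpoint_interval=500, max_checkpoints_to_keep=3)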
fit_generator(feed_dict_generator, max_checkpoints_to_keep=5, checkpoint_interval=1000, restore=False, submodel=None)

Train this model on data from a generator.

Parameters:
  • feed_dict_generator (generator) – this should generate batches, each represented as a dict that maps Layers to values.
  • max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.
  • checkpoint_interval (int) – the frequency at which to write checkpoints, measured in training steps. Set this to 0 to disable automatic checkpointing.
  • restore (bool) – if True, restore the model from the most recent checkpoint and continue training from there. If False, retrain the model from scratch.
  • submodel (Submodel) – an alternate training objective to use. This should have been created by calling create_submodel().
Returns:

the average loss over the most recent checkpoint interval
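For instance, fit_generator() can be driven directly by the model's own default_generator() (a sketch; the epoch count is illustrative):

>>> # Feed batches produced by the model's default generator.
>>> generator = model.default_generator(train, epochs=10, deterministic=False)
>>> avg_loss = model.fit_generator(generator)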

fit_on_batch(X, y, w, submodel=None)
get_checkpoints()

Get a list of all available checkpoint files.

get_global_step()
get_layer_variables(layer)

Get the list of trainable variables in a layer of the graph.

get_model_filename(model_dir)

Given model directory, obtain filename for the model itself.

get_num_tasks()
get_params(deep=True)

Get parameters for this estimator.

Parameters:deep (boolean, optional) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns:params – Parameter names mapped to their values.
Return type:mapping of string to any
get_params_filename(model_dir)

Given model directory, obtain filename for the model parameters.

get_pickling_errors(obj, seen=None)
get_pre_q_input(input_layer)
get_task_type()

Currently models can only be classifiers or regressors.

load_from_dir(model_dir, restore=True)
predict(dataset, transformers=[], outputs=None)

Uses self to make predictions on the provided Dataset object.

Parameters:
  • dataset (dc.data.Dataset) – Dataset to make prediction on
  • transformers (list) – List of dc.trans.Transformers.
  • outputs (object) – If outputs is None, then will assume outputs = self.outputs[0] (single output). If outputs is a Layer/Tensor, then will evaluate and return as a single ndarray. If outputs is a list of Layers/Tensors, will return a list of ndarrays.
Returns:

results

Return type:

numpy ndarray or list of numpy ndarrays
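A hedged usage sketch (test and transformers carried over from the earlier Delaney sketch):

>>> # Undo the training-time transformations so predictions are returned in
>>> # the original units of the task.
>>> y_pred = model.predict(test, transformers)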

predict_on_batch(X, transformers=[], outputs=None)

Generates predictions for input samples, processing samples in a batch.

Parameters:
  • X (ndarray) – the input data, as a Numpy array.
  • transformers (List) – List of dc.trans.Transformers
Returns:

Return type:

A Numpy array of predictions.

predict_on_generator(generator, transformers=[], outputs=None)[source]
predict_proba(dataset, transformers=[], outputs=None)

Uses self to make probability predictions on the provided Dataset object.

Parameters:
  • dataset (dc.data.Dataset) – Dataset to make prediction on
  • transformers (list) – List of dc.trans.Transformers.
  • outputs (object) – If outputs is None, then will assume outputs = self.outputs[0] (single output). If outputs is a Layer/Tensor, then will evaluate and return as a single ndarray. If outputs is a list of Layers/Tensors, will return a list of ndarrays.
Returns:

y_pred

Return type:

numpy ndarray or list of numpy ndarrays

predict_proba_on_batch(X, transformers=[], outputs=None)

Generates predictions for input samples, processing samples in a batch.

Parameters:
  • X (ndarray) – the input data, as a Numpy array.
  • transformers (List) – List of dc.trans.Transformers
Returns:

Return type:

A Numpy array of predictions.

predict_proba_on_generator(generator, transformers=[], outputs=None)
Returns:y_pred
Return type:numpy ndarray of shape (n_samples, n_classes*n_tasks)
reload()

Reload trained model from disk.

restore(checkpoint=None)

Reload the values of all variables from a checkpoint file.

Parameters:checkpoint (str) – the path to the checkpoint file to load. If this is None, the most recent checkpoint will be chosen automatically. Call get_checkpoints() to get a list of all available checkpoints.
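A small sketch combining get_checkpoints() with restore() (assumes at least one checkpoint has already been written to disk):

>>> ckpts = model.get_checkpoints()
>>> # Restore from one of the listed checkpoints; calling restore() with no
>>> # argument would pick the most recent one automatically.
>>> model.restore(checkpoint=ckpts[-1])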
save()
save_checkpoint(max_checkpoints_to_keep=5)

Save a checkpoint to disk.

Usually you do not need to call this method, since fit() saves checkpoints automatically. If you have disabled automatic checkpointing during fitting, this can be called to manually write checkpoints.

Parameters:max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.
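For example, a sketch of disabling automatic checkpointing during fitting and writing one checkpoint manually afterwards (epoch count is illustrative):

>>> # Disable automatic checkpointing during fitting ...
>>> model.fit(train, nb_epoch=5, checkpoint_interval=0)
>>> # ... then write a checkpoint explicitly.
>>> model.save_checkpoint(max_checkpoints_to_keep=5)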
set_loss(layer)
set_optimizer(optimizer)

Set the optimizer to use for fitting.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns:
Return type:self
smiles_to_seq(smiles)[source]

Tokenize the characters in a SMILES string into integers

topsort()

Module contents