deepchem.trans package

Submodules

deepchem.trans.transformers module

Contains an abstract base class that supports data transformations.

class deepchem.trans.transformers.ANITransformer(max_atoms=23, radial_cutoff=4.6, angular_cutoff=3.1, radial_length=32, angular_length=8, atom_cases=[1, 6, 7, 8, 16], atomic_number_differentiated=True, coordinates_in_bohr=True, transform_X=True, transform_y=False, transform_w=False)

Bases: transformers.Transformer

Transforms 3D atomic coordinates into ANI symmetry-function features.

angular_symmetry(d_cutoff, d, atom_numbers, coordinates)

Angular Symmetry Function
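The angular term can be sketched in NumPy, assuming the ANI-1 form: for each atom i and each pair of neighbours j, k, the quantity 2^(1-zeta) * (1 + cos(theta_ijk - theta_s))^zeta * exp(-eta * ((R_ij + R_ik)/2 - R_s)^2) * f_c(R_ij) * f_c(R_ik) is accumulated over a grid of angle shifts theta_s. The values of `zeta`, `eta` and `r_s`, the summation over unordered pairs, and the absence of element resolution are assumptions for illustration; this is not the DeepChem TensorFlow implementation.

```python
import numpy as np

def cosine_cutoff(d, cutoff):
    # f_c(d) = 0.5 * cos(pi * d / cutoff) + 0.5 inside the cutoff, 0 outside
    return np.where(d <= cutoff, 0.5 * np.cos(np.pi * d / cutoff) + 0.5, 0.0)

def angular_symmetry_sketch(coords, cutoff=3.1, angular_length=8,
                            zeta=8.0, eta=8.0, r_s=1.0):
    """Assumed ANI-1-style angular features, shape (n_atoms, angular_length)."""
    n = len(coords)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    fc = cosine_cutoff(d, cutoff)
    theta_s = np.linspace(0.0, np.pi, angular_length, endpoint=False)  # angle shifts
    feats = np.zeros((n, angular_length))
    for i in range(n):
        for j in range(n):
            for k in range(j + 1, n):          # unordered neighbour pairs (j, k)
                if i in (j, k):
                    continue
                v1, v2 = coords[j] - coords[i], coords[k] - coords[i]
                cos_t = np.dot(v1, v2) / (d[i, j] * d[i, k])
                theta = np.arccos(np.clip(cos_t, -1.0, 1.0))
                radial = np.exp(-eta * ((d[i, j] + d[i, k]) / 2.0 - r_s) ** 2)
                feats[i] += (2.0 ** (1.0 - zeta) * (1.0 + np.cos(theta - theta_s)) ** zeta
                             * radial * fc[i, j] * fc[i, k])
    return feats

water_like = np.array([[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]])
features = angular_symmetry_sketch(water_like)
```

With only two atoms there is no neighbour pair, so every angular feature is zero.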

build()

Builds the TensorFlow computation graph for the transform.

distance_cutoff(d, cutoff, flags)

Generate distance matrix with trainable cutoff

distance_matrix(coordinates, flags)

Generate distance matrix
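To illustrate what these two steps compute, the sketch below builds a pairwise distance matrix and applies the cosine cutoff commonly used by ANI-style descriptors, 0.5 * cos(pi * d / R_c) + 0.5 for d <= R_c and 0 beyond. It assumes unpadded coordinates, so the `flags` mask for padded atoms is replaced by simply zeroing the diagonal (self-pairs); the trainable aspect of the cutoff is not modelled.

```python
import numpy as np

def distance_matrix_sketch(coords):
    """Pairwise Euclidean distances for an (n_atoms, 3) coordinate array."""
    diff = coords[:, None, :] - coords[None, :, :]      # (n, n, 3) displacement vectors
    return np.sqrt(np.sum(diff ** 2, axis=-1))          # (n, n)

def cosine_cutoff(d, cutoff):
    # f_c(d) = 0.5 * cos(pi * d / cutoff) + 0.5 inside the cutoff, 0 outside
    return np.where(d <= cutoff, 0.5 * np.cos(np.pi * d / cutoff) + 0.5, 0.0)

coords = np.array([[0.0, 0.0, 0.0],
                   [1.0, 0.0, 0.0],
                   [0.0, 5.0, 0.0]])
d = distance_matrix_sketch(coords)
d_cut = cosine_cutoff(d, cutoff=4.6) * (1.0 - np.eye(len(coords)))  # zero the self-pairs
```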

get_num_feats()

radial_symmetry(d_cutoff, d, atom_numbers)

Radial Symmetry Function
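The radial term can be sketched in NumPy, assuming the ANI-1 form G_i(R_s) = sum over j != i of exp(-eta * (R_ij - R_s)^2) * f_c(R_ij), computed separately for each neighbour element in `atom_cases` (mirroring atomic_number_differentiated=True) on a grid of `radial_length` shifts R_s. The value of `eta` and the evenly spaced shift grid are assumptions, not values taken from DeepChem.

```python
import numpy as np

def cosine_cutoff(d, cutoff):
    # f_c(d) = 0.5 * cos(pi * d / cutoff) + 0.5 inside the cutoff, 0 outside
    return np.where(d <= cutoff, 0.5 * np.cos(np.pi * d / cutoff) + 0.5, 0.0)

def radial_symmetry_sketch(coords, atom_numbers, atom_cases=(1, 6, 7, 8, 16),
                           radial_cutoff=4.6, radial_length=32, eta=4.0):
    """Assumed element-resolved radial features, shape (n_atoms, len(atom_cases) * radial_length)."""
    n = len(coords)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    fc = cosine_cutoff(d, radial_cutoff) * (1.0 - np.eye(n))      # drop self-pairs
    shifts = np.linspace(0.0, radial_cutoff, radial_length, endpoint=False)   # R_s grid
    # gauss[i, j, s] = exp(-eta * (d_ij - R_s)^2) * f_c(d_ij)
    gauss = np.exp(-eta * (d[:, :, None] - shifts) ** 2) * fc[:, :, None]
    blocks = []
    for z in atom_cases:
        is_z = (np.asarray(atom_numbers) == z).astype(float)    # neighbours of element z
        blocks.append(np.einsum("ijs,j->is", gauss, is_z))      # sum over neighbours j
    return np.concatenate(blocks, axis=1)

coords = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
radial = radial_symmetry_sketch(coords, atom_numbers=[1, 1])   # two hydrogens
```

Because both atoms are hydrogen, only the first block of 32 features (neighbour element 1) is non-zero.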

transform(dataset, parallel=False)

Transforms all internally stored data. Adds X-transform, y-transform columns to metadata.

transform_array(X, y, w)

transform_on_array(X, y, w)

Transforms numpy arrays X, y, and w

untransform(z)

class deepchem.trans.transformers.BalancingTransformer(transform_X=False, transform_y=False, transform_w=False, dataset=None, seed=None)

Bases: transformers.Transformer

Balances positive and negative examples by adjusting the weights.
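One common balancing scheme, sketched below under that assumption, up-weights the positives of each task by n_negatives / n_positives so that both classes carry the same total weight. Samples whose original weight is zero are ignored and keep weight zero. The function name is hypothetical and the exact formula used by BalancingTransformer may differ.

```python
import numpy as np

def balancing_weights(y, w):
    """Sketch: per task, scale positive-example weights by n_negatives / n_positives."""
    w_new = np.zeros(w.shape, dtype=float)
    for t in range(y.shape[1]):
        active = w[:, t] != 0                  # samples with zero weight are ignored
        labels = y[active, t]
        n_pos = np.count_nonzero(labels)
        n_neg = len(labels) - n_pos
        pos_weight = n_neg / n_pos if n_pos > 0 else 1.0
        w_new[:, t] = np.where(y[:, t] == 1, pos_weight, 1.0) * active
    return w_new

y = np.array([[1], [0], [0], [0]])
w = np.ones((4, 1))
w_balanced = balancing_weights(y, w)   # positives now carry as much total weight as negatives
```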

transform(dataset, parallel=False)

Transforms all internally stored data. Adds X-transform, y-transform columns to metadata.

transform_array(X, y, w)

Transform the data in a set of (X, y, w) arrays.

transform_on_array(X, y, w)

Transforms numpy arrays X, y, and w

untransform(z)

Reverses stored transformation on provided data.

class deepchem.trans.transformers.CDFTransformer(transform_X=False, transform_y=False, dataset=None, bins=2)

Bases: transformers.Transformer

Histograms the data and assigns values based on sorted list.
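As a rough picture of a binned CDF transform, the sketch below replaces each value by its column rank, scaled to [0, 1) and rounded down to one of `bins` levels. This rank-based formulation is an assumption for illustration; the binning rule in get_cdf_values may differ in detail.

```python
import numpy as np

def binned_cdf(array, bins):
    """Sketch: replace each value by the binned fraction of its column ranked below it."""
    n_rows = array.shape[0]
    out = np.zeros(array.shape, dtype=float)
    for col in range(array.shape[1]):
        ranks = np.argsort(np.argsort(array[:, col]))   # 0 = smallest value in the column
        out[:, col] = np.floor(ranks * bins / n_rows) / bins
    return out

cdf_values = binned_cdf(np.array([[3.0], [1.0], [4.0], [2.0]]), bins=2)
```

With two bins, the two smallest values map to 0.0 and the two largest to 0.5.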

transform(dataset, bins)

Performs CDF transform on data.

transform_array(X, y, w)

Transform the data in a set of (X, y, w) arrays.

transform_on_array(X, y, w)

Transforms numpy arrays X, y, and w

untransform(z)

class deepchem.trans.transformers.ClippingTransformer(transform_X=False, transform_y=False, transform_w=False, dataset=None, x_max=5.0, y_max=500.0)

Bases: transformers.Transformer

Clip large values in datasets.

Example:

>>> import numpy as np
>>> import deepchem as dc
>>> n_samples = 10
>>> n_features = 3
>>> n_tasks = 1
>>> ids = np.arange(n_samples)
>>> X = np.random.rand(n_samples, n_features)
>>> y = np.zeros((n_samples, n_tasks))
>>> w = np.ones((n_samples, n_tasks))
>>> dataset = dc.data.NumpyDataset(X, y, w, ids)
>>> transformer = dc.trans.ClippingTransformer(transform_X=True)
>>> dataset = transformer.transform(dataset)

transform(dataset, parallel=False)

Transforms all internally stored data. Adds X-transform, y-transform columns to metadata.

transform_array(X, y, w)

Transform the data in a set of (X, y, w) arrays.

Parameters:
X (np.ndarray): Features
y (np.ndarray): Tasks
w (np.ndarray): Weights

Returns:
X (np.ndarray): Transformed features
y (np.ndarray): Transformed tasks
w (np.ndarray): Transformed weights

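The effect of x_max and y_max can be pictured with np.clip. This sketch assumes symmetric bounds [-x_max, x_max] for X and [-y_max, y_max] for y, and clips both arrays unconditionally, whereas the transformer only touches the arrays selected by transform_X / transform_y. The function name is hypothetical.

```python
import numpy as np

def clip_arrays(X, y, x_max=5.0, y_max=500.0):
    """Sketch: clip X to [-x_max, x_max] and y to [-y_max, y_max]."""
    return np.clip(X, -x_max, x_max), np.clip(y, -y_max, y_max)

X = np.array([[-7.0, 0.5, 9.0]])
y = np.array([[620.0]])
X_clipped, y_clipped = clip_arrays(X, y)
```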
transform_on_array(X, y, w)

Transforms numpy arrays X, y, and w

untransform(z)

class deepchem.trans.transformers.CoulombFitTransformer(dataset)

Bases: transformers.Transformer

Performs randomization and binarization operations on batches of Coulomb Matrix features during fit.

Example:

>>> import numpy as np
>>> import deepchem as dc
>>> n_samples = 10
>>> n_features = 3
>>> n_tasks = 1
>>> ids = np.arange(n_samples)
>>> X = np.random.rand(n_samples, n_features, n_features)
>>> y = np.zeros((n_samples, n_tasks))
>>> w = np.ones((n_samples, n_tasks))
>>> dataset = dc.data.NumpyDataset(X, y, w, ids)
>>> fit_transformers = [dc.trans.CoulombFitTransformer(dataset)]
>>> model = dc.models.MultiTaskFitTransformRegressor(n_tasks,
...    [n_features, n_features], batch_size=n_samples, fit_transformers=fit_transformers, n_evals=1)
n_features after fit_transform: 12

X_transform(X)

Perform Coulomb Fit transform on features.

Parameters:
X (np.ndarray): Features

Returns:
X (np.ndarray): Transformed features

expand(X)

Binarize features.

Parameters:
X (np.ndarray): Features

Returns:
X (np.ndarray): Binarized features

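Binarization of this kind is often done with smooth step functions. The sketch below assumes a tanh expansion in which each feature column is replaced by several soft thresholds spaced `step` apart, from 0 up to that column's maximum; `max_values` and `step` are hypothetical parameters, and the exact scheme used by expand may differ.

```python
import numpy as np

def expand_sketch(X, max_values, step=1.0):
    """Sketch: replace column i by tanh((x - k) / step) for k = 0, step, ..., max_values[i]."""
    columns = []
    for i in range(X.shape[1]):
        for k in np.arange(0.0, max_values[i] + step, step):
            columns.append(np.tanh((X[:, i] - k) / step))
    return np.array(columns).T

X = np.array([[0.2, 0.7],
              [0.9, 0.1]])
X_bin = expand_sketch(X, max_values=[1.0, 1.0])   # two thresholds per column
```

Each of the two input columns becomes two soft-binary columns, so the output has four features per sample.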
normalize(X)

Normalize features.

Parameters:
X (np.ndarray): Features

Returns:
X (np.ndarray): Normalized features

realize(X)

Randomize features.

Parameters:
X (np.ndarray): Features

Returns:
X (np.ndarray): Randomized features

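Randomized Coulomb matrices are commonly produced by perturbing the row norms with Gaussian noise and permuting rows and columns in the resulting order. The sketch below assumes that scheme; `noise` is a hypothetical parameter, and any later flattening (for example to the upper triangle) is left out.

```python
import numpy as np

def randomize_coulomb(C, noise=1.0, rng=None):
    """Sketch: permute a Coulomb matrix by its row norms plus Gaussian noise."""
    rng = np.random.default_rng() if rng is None else rng
    noisy_norms = np.linalg.norm(C, axis=1) + noise * rng.standard_normal(C.shape[0])
    order = np.argsort(-noisy_norms)           # largest (noisy) norm first
    return C[order][:, order]                  # same permutation on rows and columns

C = np.array([[1.0, 0.5, 0.0],
              [0.5, 3.0, 0.2],
              [0.0, 0.2, 2.0]])
C_sorted = randomize_coulomb(C, noise=0.0)   # noise=0 gives the deterministic sorted matrix
```

Applying the same permutation to rows and columns keeps the matrix symmetric and only changes the atom ordering.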
transform(dataset, parallel=False)

Transforms all internally stored data. Adds X-transform, y-transform columns to metadata.

transform_array(X, y, w)

transform_on_array(X, y, w)

Transforms numpy arrays X, y, and w

untransform(z)

class deepchem.trans.transformers.DAGTransformer(max_atoms=50, transform_X=True, transform_y=False, transform_w=False)

Bases: transformers.Transformer

Performs transform from ConvMol adjacency lists to DAG calculation orders

UG_to_DAG(sample)

Generates the DAGs for a molecule.
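One way to picture this step is to run a breadth-first search from every atom of the molecule's adjacency list, recording for each reached atom the atom it was reached from. The sketch below assumes a plain Python adjacency list (a list of neighbour lists) rather than a ConvMol object and returns (atom, parents) pairs in BFS order; `dags_from_adjacency` is a hypothetical name. A DAG model would typically process each list in reverse, from the leaves toward the root.

```python
from collections import deque

def dags_from_adjacency(adj_list):
    """Sketch: one BFS-ordered DAG per root atom, as a list of (atom, parents) pairs."""
    dags = []
    for root in range(len(adj_list)):
        parents = {root: []}                   # the root has no parent
        order = [root]
        queue = deque([root])
        while queue:
            current = queue.popleft()
            for neighbor in adj_list[current]:
                if neighbor not in parents:    # first visit fixes the parent
                    parents[neighbor] = [current]
                    order.append(neighbor)
                    queue.append(neighbor)
        dags.append([(atom, parents[atom]) for atom in order])
    return dags

adjacency = [[1], [0, 2], [1]]                 # a three-atom chain 0-1-2
dags = dags_from_adjacency(adjacency)
```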

transform(dataset, parallel=False)

Transforms all internally stored data. Adds X-transform, y-transform columns to metadata.

transform_array(X, y, w)

Add calculation orders to ConvMol objects

transform_on_array(X, y, w)

Transforms numpy arrays X, y, and w

untransform(z)

class deepchem.trans.transformers.FeaturizationTransformer(transform_X=False, transform_y=False, transform_w=False, dataset=None, featurizer=None)

Bases: transformers.Transformer

A transformer that runs a featurizer over the X values of a dataset. The X values of any dataset used with this transformer must be RDKit Mol objects.

transform(dataset, parallel=False)

Transforms all internally stored data. Adds X-transform, y-transform columns to metadata.

transform_array(X, y, w)

transform_on_array(X, y, w)

Transforms numpy arrays X, y, and w

untransform(z)

Reverses stored transformation on provided data.

class deepchem.trans.transformers.IRVTransformer(K, n_tasks, dataset, transform_y=False, transform_x=False)

Bases: object

Transforms ECFP fingerprints into IRV features based on the K nearest neighbors.

X_transform(X_target)

Calculates the similarity between the target dataset (X_target) and the reference dataset (X) as the number of bits set in both fingerprints divided by the number of bits set in either:

similarity = |X_target ∩ X| / |X_target ∪ X|

Parameters:
X_target (np.ndarray): Fingerprints of the target dataset; must have the same length as X along the second axis.

Returns:
X_target (np.ndarray): Features of size (batch_size, 2*K*n_tasks).

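For binary fingerprints this intersection-over-union (Tanimoto) similarity can be computed for all pairs at once: the dot product of two 0/1 rows counts the shared bits, and the union is the sum of both bit counts minus the intersection. The function below is a hypothetical sketch of that formula, not the method's actual code; pairs whose union is empty are given similarity 0 by assumption.

```python
import numpy as np

def tanimoto_similarity(X_target, X):
    """Sketch: Tanimoto similarity between binary fingerprint rows of X_target and X."""
    X_target = X_target.astype(float)
    X = X.astype(float)
    intersection = X_target @ X.T                          # bits set in both
    union = X_target.sum(axis=1)[:, None] + X.sum(axis=1)[None, :] - intersection
    return np.divide(intersection, union, out=np.zeros_like(intersection), where=union > 0)

X_target = np.array([[1, 1, 0, 0]])
X_ref = np.array([[1, 0, 0, 0],
                  [1, 1, 1, 0],
                  [0, 0, 0, 0]])
sim = tanimoto_similarity(X_target, X_ref)   # shape (n_target, n_reference)
```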
static matrix_mul(X1, X2, shard_size=5000)

Calculates the product of two large matrices: X1 is sliced into pieces of shard_size rows and X2 into pieces of shard_size columns, the pieces are multiplied together, and the results are concatenated back to the full size.
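The sharding idea can be sketched as a blocked product: each row shard of X1 is multiplied by each column shard of X2, and the block is written into the matching slice of the output. The name `sharded_matmul` is hypothetical, and writing into a preallocated output (instead of concatenating) is a design choice of this sketch.

```python
import numpy as np

def sharded_matmul(X1, X2, shard_size=5000):
    """Sketch: X1 @ X2 computed one (row shard, column shard) block at a time."""
    assert X1.shape[1] == X2.shape[0]
    result = np.zeros((X1.shape[0], X2.shape[1]), dtype=np.result_type(X1, X2))
    for r in range(0, X1.shape[0], shard_size):          # row shards of X1
        for c in range(0, X2.shape[1], shard_size):      # column shards of X2
            result[r:r + shard_size, c:c + shard_size] = (
                X1[r:r + shard_size] @ X2[:, c:c + shard_size])
    return result

rng = np.random.default_rng(0)
A = rng.standard_normal((7, 4))
B = rng.standard_normal((4, 5))
product = sharded_matmul(A, B, shard_size=3)
```

Each block product only involves shard_size rows and columns at a time, which keeps the size of the individual products bounded for very large inputs.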

realize(similarity, y, w)

Finds the samples in the reference dataset with the K highest similarity values.

Parameters:
similarity (np.ndarray): Similarity values between the target dataset and the reference dataset, of size (n_samples_in_target, n_samples_in_reference).
y (np.array): Labels for a single task.
w (np.array): Weights for a single task.

Returns:
features (list): n_samples arrays of size (2*K,); each array holds K similarity values and the corresponding labels.

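A minimal sketch of the top-K selection for one task, assuming that reference samples with zero weight are masked out by setting their similarity to 0 and that each output array stores the K similarities first and the K labels second; `top_k_features` is a hypothetical name.

```python
import numpy as np

def top_k_features(similarity, y, w, K):
    """Sketch: for each target row, the K largest similarities and the labels of those references."""
    masked = similarity * (w != 0)[None, :]    # zero-weight references get similarity 0
    features = []
    for row in masked:
        top = np.argsort(row)[::-1][:K]        # indices of the K most similar references
        features.append(np.concatenate([row[top], y[top]]))   # size 2*K
    return features

similarity = np.array([[0.1, 0.9, 0.5]])
y = np.array([0.0, 1.0, 0.0])
w = np.array([1.0, 1.0, 1.0])
features = top_k_features(similarity, y, w, K=2)
```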
transform(dataset)

untransform(z)

class deepchem.trans.transformers.ImageTransformer(size, transform_X=True, transform_y=False, transform_w=False)

Bases: transformers.Transformer

Converts an image into a (width, height, channel) array.

transform(dataset, parallel=False)

Transforms all internally stored data. Adds X-transform, y-transform columns to metadata.

transform_array(X, y, w)

Transform the data in a set of (X, y, w) arrays.

transform_on_array(X, y, w)

Transforms numpy arrays X, y, and w

untransform(z)

Reverses stored transformation on provided data.

class deepchem.trans.transformers.LogTransformer(transform_X=False, transform_y=False, features=None, tasks=None, dataset=None)

Bases: transformers.Transformer

tasks = None

Initialize log transformation.

transform(dataset, parallel=False)

Transforms all internally stored data. Adds X-transform, y-transform columns to metadata.

transform_array(X, y, w)

Transform the data in a set of (X, y, w) arrays.

transform_on_array(X, y, w)

Transforms numpy arrays X, y, and w

untransform(z)

Undo transformation on provided data.
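A log transform and its inverse can be sketched as follows. The use of log(1 + x), which maps zeros to zero, is an assumption here; the offset LogTransformer actually applies, and its handling of the `features` / `tasks` column selections, may differ.

```python
import numpy as np

def log_transform(X):
    return np.log1p(X)        # log(1 + X), defined for X > -1

def log_untransform(Z):
    return np.expm1(Z)        # exp(Z) - 1, the inverse of log1p

X = np.array([[0.0, 1.0, 9.0]])
Z = log_transform(X)
```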

class deepchem.trans.transformers.NormalizationTransformer(transform_X=False, transform_y=False, transform_w=False, dataset=None, transform_gradients=False)

Bases: transformers.Transformer

transform(dataset, parallel=False)

transform_array(X, y, w)

Transform the data in a set of (X, y, w) arrays.

transform_on_array(X, y, w)

Transforms numpy arrays X, y, and w

untransform(z)

Undo transformation on provided data.
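Normalization of this kind usually means z-scoring each column with statistics computed from the dataset, with untransform inverting that mapping. The class below is a hypothetical sketch of the idea, not the NormalizationTransformer API; leaving zero-variance columns unscaled is an assumption.

```python
import numpy as np

class ZScoreSketch:
    """Hypothetical helper: per-column zero mean and unit standard deviation."""

    def __init__(self, data):
        self.mean = data.mean(axis=0)
        std = data.std(axis=0)
        self.std = np.where(std == 0, 1.0, std)   # leave constant columns unscaled

    def transform(self, data):
        return (data - self.mean) / self.std

    def untransform(self, z):
        return z * self.std + self.mean

y = np.array([[1.0, 10.0], [3.0, 10.0], [5.0, 10.0]])
norm = ZScoreSketch(y)
z = norm.transform(y)
```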

untransform_grad(grad, tasks)

Undo transformation on gradient.

class deepchem.trans.transformers.PowerTransformer(transform_X=False, transform_y=False, powers=[1])

Bases: transformers.Transformer

Applies power transforms to the data, one for each exponent n in the input list powers.
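A sketch of what a list of powers can produce: each power of the input is computed and the results are concatenated along the feature axis. The inverse shown here assumes the first block holds X raised to powers[0] and recovers X from it; both function names are hypothetical, and PowerTransformer's own layout may differ.

```python
import numpy as np

def power_features(X, powers=(1,)):
    """Concatenate X**p for every p in powers along the feature axis."""
    return np.concatenate([np.power(X, p) for p in powers], axis=1)

def undo_power_features(Z, powers=(1,)):
    """Recover X from the first block, assumed to hold X**powers[0]."""
    n_orig = Z.shape[1] // len(powers)
    return np.power(Z[:, :n_orig], 1.0 / powers[0])

X = np.array([[2.0, 3.0]])
Z = power_features(X, powers=(1, 2))
```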

transform(dataset)

Performs power transform on data.

transform_array(X, y, w)

Transform the data in a set of (X, y, w) arrays.

transform_on_array(X, y, w)

Transforms numpy arrays X, y, and w

untransform(z)

deepchem.trans.transformers.get_cdf_values(array, bins)

deepchem.trans.transformers.get_grad_statistics(dataset)

Computes and returns statistics of a dataset

This function assumes that the first task of a dataset holds the energy of an input system and that the remaining tasks hold the gradient for the system.
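The expected data layout can be illustrated with a small array: column 0 holds the energy and columns 1 onward hold its gradient. The summary statistics computed here (means and standard deviations) are an illustrative choice; the statistics get_grad_statistics actually returns may be different.

```python
import numpy as np

def split_energy_and_grad(y):
    """Sketch: first column is the energy, the remaining columns are its gradient."""
    return y[:, 0], y[:, 1:]

y = np.array([[-1.0, 0.1, -0.2, 0.3],
              [-2.0, 0.0, 0.4, -0.1]])
energy, grad = split_energy_and_grad(y)
stats = {"energy_mean": energy.mean(), "energy_std": energy.std(),
         "grad_mean": grad.mean(axis=0), "grad_std": grad.std(axis=0)}
```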

deepchem.trans.transformers.undo_grad_transforms(grad, tasks, transformers)

deepchem.trans.transformers.undo_transforms(y, transformers)

Undoes all transformations applied.
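Because transformations compose, undoing them means calling untransform in the reverse of the order in which they were applied, and only on transformers that modified y. The sketch below assumes each transformer exposes a transform_y flag and an untransform(z) method, as the classes above do; ScaleSketch, ShiftSketch and undo_transforms_sketch are hypothetical stand-ins.

```python
import numpy as np

class ScaleSketch:
    """Hypothetical transformer: y -> factor * y."""
    def __init__(self, factor):
        self.factor = factor
        self.transform_y = True

    def untransform(self, z):
        return z / self.factor

class ShiftSketch:
    """Hypothetical transformer: y -> y + shift."""
    def __init__(self, shift):
        self.shift = shift
        self.transform_y = True

    def untransform(self, z):
        return z - self.shift

def undo_transforms_sketch(y, transformers):
    """Undo y-transforms in the reverse of the order they were applied."""
    for transformer in reversed(transformers):
        if transformer.transform_y:
            y = transformer.untransform(y)
    return y

transformers = [ScaleSketch(2.0), ShiftSketch(1.0)]   # applied in this order: 2*y, then +1
y_transformed = np.array([[7.0]])                       # 2 * 3 + 1
y_original = undo_transforms_sketch(y_transformed, transformers)
```

Undoing in forward order instead would give (7 / 2) - 1 = 2.5 rather than the original 3.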

Module contents

Gathers all transformers in one place for convenient imports