deepchem.feat package

Submodules

deepchem.feat.adjacency_fingerprints module

class deepchem.feat.adjacency_fingerprints.AdjacencyFingerprint(n_atom_types=23, max_n_atoms=200, add_hydrogens=False, max_valence=4, num_atoms_feature=False)[source]

Bases: deepchem.feat.base_classes.Featurizer

__call__(mols)

Calculate features for molecules.

Parameters:mols (iterable) – RDKit Mol objects.
featurize(rdkit_mols)[source]
deepchem.feat.adjacency_fingerprints.featurize_mol(mol, n_atom_types, max_n_atoms, max_valence, num_atoms_feature)[source]
deepchem.feat.adjacency_fingerprints.get_atom_adj_matrices(mol, n_atom_types, max_n_atoms=200, max_valence=4, graph_conv_features=True, nxn=True)[source]
deepchem.feat.adjacency_fingerprints.get_atom_type(atom)[source]

deepchem.feat.atomic_coordinates module

Atomic coordinate featurizer.

class deepchem.feat.atomic_coordinates.AtomicCoordinates[source]

Bases: deepchem.feat.base_classes.Featurizer

Nx3 matrix of Cartesian coordinates [Angstrom]

__call__(mols)

Calculate features for molecules.

Parameters:mols (iterable) – RDKit Mol objects.
featurize(mols, verbose=True, log_every_n=1000)

Calculate features for molecules.

Parameters:mols (iterable) – RDKit Mol objects.
name = ['atomic_coordinates']
class deepchem.feat.atomic_coordinates.ComplexNeighborListFragmentAtomicCoordinates(frag1_num_atoms, frag2_num_atoms, complex_num_atoms, max_num_neighbors, neighbor_cutoff, strip_hydrogens=True)[source]

Bases: deepchem.feat.base_classes.ComplexFeaturizer

featurize_complexes(mol_pdbs, protein_pdbs, verbose=True, log_every_n=1000)

Calculate features for mol/protein complexes.

Parameters:
  • mol_pdbs (list) – List of PDBs for molecules. Each PDB should be a list of lines of the PDB file.
  • protein_pdbs (list) – List of PDBs for proteins. Each PDB should be a list of lines of the PDB file.
featurize_mol(coords, mol, max_num_atoms)[source]
get_Z_matrix(mol, max_atoms)[source]
class deepchem.feat.atomic_coordinates.NeighborListAtomicCoordinates(max_num_neighbors=None, neighbor_cutoff=4, periodic_box_size=None)[source]

Bases: deepchem.feat.base_classes.Featurizer

Adjacency List of neighbors in 3-space

Neighbors determined by user-defined distance cutoff [in Angstrom].

https://en.wikipedia.org/wiki/Cell_list Ref: http://www.cs.cornell.edu/ron/references/1989/Calculations%20of%20a%20List%20of%20Neighbors%20in%20Molecular%20Dynamics%20Si.pdf

Parameters:
  • neighbor_cutoff (float) – Threshold distance [Angstroms] for counting neighbors.
  • periodic_box_size (3 element array) – Dimensions of the periodic box in Angstroms, or None to not use periodic boundary conditions
__call__(mols)

Calculate features for molecules.

Parameters:mols (iterable) – RDKit Mol objects.
featurize(mols, verbose=True, log_every_n=1000)

Calculate features for molecules.

Parameters:mols (iterable) – RDKit Mol objects.
class deepchem.feat.atomic_coordinates.NeighborListComplexAtomicCoordinates(max_num_neighbors=None, neighbor_cutoff=4)[source]

Bases: deepchem.feat.base_classes.ComplexFeaturizer

Adjacency list of neighbors for protein-ligand complexes in 3-space.

Neighbors determined by user-defined distance cutoff.

featurize_complexes(mol_pdbs, protein_pdbs, verbose=True, log_every_n=1000)

Calculate features for mol/protein complexes.

Parameters:
  • mol_pdbs (list) – List of PDBs for molecules. Each PDB should be a list of lines of the PDB file.
  • protein_pdbs (list) – List of PDBs for proteins. Each PDB should be a list of lines of the PDB file.
deepchem.feat.atomic_coordinates.compute_neighbor_list(coords, neighbor_cutoff, max_num_neighbors, periodic_box_size)[source]

Computes a neighbor list from atom coordinates.
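
A neighbor list under a distance cutoff can be sketched with a brute-force scan. The helper below is hypothetical (brute_force_neighbor_list is not part of the package). It assumes non-periodic coordinates in Angstroms and skips the cell-list optimisation, so it only illustrates one plausible shape of the result: a mapping from atom index to neighbor indices.

```python
import math


def brute_force_neighbor_list(coords, neighbor_cutoff, max_num_neighbors=None):
    """Map each atom index to the indices of atoms within neighbor_cutoff.

    Hypothetical helper: an O(N^2) scan without periodic boundaries.
    """
    neighbor_list = {}
    for i, ci in enumerate(coords):
        # Collect (distance, index) pairs for every other atom inside the cutoff.
        close = []
        for j, cj in enumerate(coords):
            if i == j:
                continue
            dist = math.dist(ci, cj)
            if dist < neighbor_cutoff:
                close.append((dist, j))
        # Nearest neighbors first, optionally truncated.
        close.sort()
        if max_num_neighbors is not None:
            close = close[:max_num_neighbors]
        neighbor_list[i] = [j for _, j in close]
    return neighbor_list


coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
nbrs = brute_force_neighbor_list(coords, neighbor_cutoff=4.0)
```

The last atom sits more than 4 Angstroms from every other atom, so its neighbor list is empty.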

deepchem.feat.atomic_coordinates.get_coords(mol)[source]

Gets coordinates in Angstrom for RDKit mol.

deepchem.feat.base_classes module

Feature calculations.

class deepchem.feat.base_classes.ComplexFeaturizer[source]

Bases: object

Abstract class for calculating features for mol/protein complexes.

featurize_complexes(mol_pdbs, protein_pdbs, verbose=True, log_every_n=1000)[source]

Calculate features for mol/protein complexes.

Parameters:
  • mol_pdbs (list) – List of PDBs for molecules. Each PDB should be a list of lines of the PDB file.
  • protein_pdbs (list) – List of PDBs for proteins. Each PDB should be a list of lines of the PDB file.
class deepchem.feat.base_classes.Featurizer[source]

Bases: object

Abstract class for calculating a set of features for a molecule.

Child classes implement the _featurize method for calculating features for a single molecule.
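
The pattern described above can be sketched with a small stand-in class. This is a hypothetical illustration, not DeepChem's code: SketchFeaturizer and AtomCount are made-up names, and a "molecule" is simplified to a list of element symbols so the example runs without RDKit.

```python
class SketchFeaturizer:
    """Hypothetical stand-in for the base class; not DeepChem's code."""

    def featurize(self, mols):
        # Apply the per-molecule hook to every input and collect the results.
        return [self._featurize(mol) for mol in mols]

    def __call__(self, mols):
        # Calling the object is shorthand for featurize().
        return self.featurize(mols)

    def _featurize(self, mol):
        raise NotImplementedError("child classes implement _featurize")


class AtomCount(SketchFeaturizer):
    """Toy child class: a 'molecule' here is just a list of element symbols."""

    def _featurize(self, mol):
        return [len(mol)]


water = ["O", "H", "H"]
methane = ["C", "H", "H", "H", "H"]
features = AtomCount()([water, methane])
```

Only _featurize differs between child classes; the loop over molecules lives once in the parent.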

__call__(mols)[source]

Calculate features for molecules.

Parameters:mols (iterable) – RDKit Mol objects.
featurize(mols, verbose=True, log_every_n=1000)[source]

Calculate features for molecules.

Parameters:mols (iterable) – RDKit Mol objects.
class deepchem.feat.base_classes.UserDefinedFeaturizer(feature_fields)[source]

Bases: deepchem.feat.base_classes.Featurizer

Directs usage of user-computed featurizations.

__call__(mols)

Calculate features for molecules.

Parameters:mols (iterable) – RDKit Mol objects.
featurize(mols, verbose=True, log_every_n=1000)

Calculate features for molecules.

Parameters:mols (iterable) – RDKit Mol objects.

deepchem.feat.basic module

Basic molecular features.

class deepchem.feat.basic.MolecularWeight[source]

Bases: deepchem.feat.base_classes.Featurizer

Molecular weight.

__call__(mols)

Calculate features for molecules.

Parameters:mols (iterable) – RDKit Mol objects.
featurize(mols, verbose=True, log_every_n=1000)

Calculate features for molecules.

Parameters:mols (iterable) – RDKit Mol objects.
name = ['mw', 'molecular_weight']
class deepchem.feat.basic.RDKitDescriptors[source]

Bases: deepchem.feat.base_classes.Featurizer

RDKit descriptors.

See http://rdkit.org/docs/GettingStartedInPython.html#list-of-available-descriptors.

__call__(mols)

Calculate features for molecules.

Parameters:mols (iterable) – RDKit Mol objects.
allowedDescriptors = {'Chi0v', 'SMR_VSA3', 'PEOE_VSA14', 'Chi3v', 'VSA_EState5', 'NumValenceElectrons', 'SlogP_VSA11', 'NumHeteroatoms', 'Ipc', 'PEOE_VSA6', 'HeavyAtomMolWt', 'NHOHCount', 'SlogP_VSA4', 'MinEStateIndex', 'SlogP_VSA6', 'Chi0n', 'NumRadicalElectrons', 'EState_VSA4', 'VSA_EState2', 'SMR_VSA4', 'SMR_VSA2', 'MinAbsEStateIndex', 'BertzCT', 'EState_VSA1', 'EState_VSA11', 'TPSA', 'EState_VSA3', 'EState_VSA10', 'RingCount', 'Chi1n', 'NumAromaticCarbocycles', 'BalabanJ', 'Kappa2', 'NumSaturatedHeterocycles', 'SMR_VSA7', 'NumAliphaticRings', 'FractionCSP3', 'NumAliphaticHeterocycles', 'SlogP_VSA3', 'Kappa1', 'PEOE_VSA4', 'MaxAbsEStateIndex', 'NOCount', 'MaxPartialCharge', 'SlogP_VSA7', 'VSA_EState4', 'Chi4v', 'VSA_EState7', 'SMR_VSA1', 'EState_VSA2', 'SMR_VSA5', 'PEOE_VSA3', 'MinAbsPartialCharge', 'PEOE_VSA10', 'PEOE_VSA12', 'SlogP_VSA10', 'Chi2n', 'SlogP_VSA12', 'SMR_VSA6', 'SlogP_VSA9', 'SlogP_VSA5', 'Chi1', 'PEOE_VSA8', 'MolMR', 'VSA_EState8', 'MolLogP', 'Chi4n', 'VSA_EState6', 'ExactMolWt', 'NumRotatableBonds', 'MaxEStateIndex', 'NumSaturatedRings', 'VSA_EState1', 'SlogP_VSA2', 'HeavyAtomCount', 'NumAromaticHeterocycles', 'HallKierAlpha', 'EState_VSA7', 'VSA_EState10', 'PEOE_VSA9', 'VSA_EState3', 'NumAliphaticCarbocycles', 'NumHAcceptors', 'PEOE_VSA13', 'Chi3n', 'MolWt', 'Kappa3', 'PEOE_VSA7', 'EState_VSA8', 'PEOE_VSA11', 'Chi0', 'EState_VSA9', 'MinPartialCharge', 'VSA_EState9', 'PEOE_VSA1', 'NumHDonors', 'LabuteASA', 'SlogP_VSA1', 'NumSaturatedCarbocycles', 'NumAromaticRings', 'SMR_VSA9', 'Chi1v', 'EState_VSA5', 'Chi2v', 'MaxAbsPartialCharge', 'SMR_VSA8', 'SlogP_VSA8', 'EState_VSA6', 'PEOE_VSA5', 'PEOE_VSA2', 'SMR_VSA10'}
featurize(mols, verbose=True, log_every_n=1000)

Calculate features for molecules.

Parameters:mols (iterable) – RDKit Mol objects.
name = 'descriptors'

deepchem.feat.binding_pocket_features module

Featurizes proposed binding pockets.

class deepchem.feat.binding_pocket_features.BindingPocketFeaturizer[source]

Bases: deepchem.feat.base_classes.Featurizer

Featurizes binding pockets with information about chemical environments.

__call__(mols)

Calculate features for molecules.

Parameters:mols (iterable) – RDKit Mol objects.
featurize(protein_file, pockets, pocket_atoms_map, pocket_coords, verbose=False)[source]

Calculate atomic coordinates.

n_features = 24
residues = ['ALA', 'ARG', 'ASN', 'ASP', 'CYS', 'GLN', 'GLU', 'GLY', 'HIS', 'ILE', 'LEU', 'LYS', 'MET', 'PHE', 'PRO', 'PYL', 'SER', 'SEC', 'THR', 'TRP', 'TYR', 'VAL', 'ASX', 'GLX']

deepchem.feat.coulomb_matrices module

Generate coulomb matrices for molecules.

See Montavon et al., _New Journal of Physics_ __15__ (2013) 095003.

class deepchem.feat.coulomb_matrices.BPSymmetryFunction(max_atoms)[source]

Bases: deepchem.feat.base_classes.Featurizer

Calculate symmetry functions for each atom in the molecules. Methods are described in https://journals.aps.org/prl/pdf/10.1103/PhysRevLett.98.146401

__call__(mols)

Calculate features for molecules.

Parameters:mols (iterable) – RDKit Mol objects.
featurize(mols, verbose=True, log_every_n=1000)

Calculate features for molecules.

Parameters:mols (iterable) – RDKit Mol objects.
class deepchem.feat.coulomb_matrices.CoulombMatrix(max_atoms, remove_hydrogens=False, randomize=False, upper_tri=False, n_samples=1, seed=None)[source]

Bases: deepchem.feat.base_classes.Featurizer

Calculate Coulomb matrices for molecules.

Parameters:
  • max_atoms (int) – Maximum number of atoms for any molecule in the dataset. Used to pad the Coulomb matrix.
  • remove_hydrogens (bool, optional (default False)) – Whether to remove hydrogens before constructing Coulomb matrix.
  • randomize (bool, optional (default False)) – Whether to randomize Coulomb matrices to remove dependence on atom index order.
  • upper_tri (bool, optional (default False)) – Whether to return the upper triangular portion of the Coulomb matrix.
  • n_samples (int, optional (default 1)) – Number of random Coulomb matrices to generate if randomize is True.
  • seed (int, optional) – Random seed.
Example:

>>> featurizers = dc.feat.CoulombMatrix(max_atoms=23)
>>> input_file = 'deepchem/feat/tests/data/water.sdf' # really backed by water.sdf.csv
>>> tasks = ["atomization_energy"]
>>> featurizer = dc.data.SDFLoader(tasks, smiles_field="smiles", mol_field="mol",
...                                featurizer=featurizers, verbose=False)
>>> dataset = featurizer.featurize(input_file)
Reading structures from deepchem/feat/tests/data/water.sdf.
Featurizing sample 0
__call__(mols)

Calculate features for molecules.

Parameters:mols (iterable) – RDKit Mol objects.
conformers = True
coulomb_matrix(mol)[source]

Generate Coulomb matrices for each conformer of the given molecule.

Parameters:mol (RDKit Mol) – Molecule.
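
For orientation, the Coulomb matrix from the cited literature has diagonal entries 0.5 * Z_i^2.4 and off-diagonal entries Z_i * Z_j / |R_i - R_j|. The sketch below applies that definition to plain lists. coulomb_matrix_sketch is a hypothetical name, the zero padding to max_atoms is an assumption based on the max_atoms description above, and unit conversion and conformer handling are left out.

```python
import math


def coulomb_matrix_sketch(charges, coords, max_atoms):
    """Coulomb matrix padded with zeros to max_atoms x max_atoms.

    charges: nuclear charges Z_i; coords: 3D positions in whatever length
    unit the caller chooses (unit handling is left out of this sketch).
    """
    n = len(charges)
    m = [[0.0] * max_atoms for _ in range(max_atoms)]
    for i in range(n):
        for j in range(n):
            if i == j:
                # Diagonal: fitted self-interaction term 0.5 * Z^2.4.
                m[i][j] = 0.5 * charges[i] ** 2.4
            else:
                # Off-diagonal: Coulomb repulsion between nuclei i and j.
                m[i][j] = charges[i] * charges[j] / math.dist(coords[i], coords[j])
    return m


# Two unit charges one length unit apart, padded to 3 x 3.
cm = coulomb_matrix_sketch([1, 1], [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], max_atoms=3)
```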
featurize(mols, verbose=True, log_every_n=1000)

Calculate features for molecules.

Parameters:mols (iterable) – RDKit Mol objects.
static get_interatomic_distances(conf)[source]

Get interatomic distances for atoms in a molecular conformer.

Parameters:conf (RDKit Conformer) – Molecule conformer.
name = 'coulomb_matrix'
randomize_coulomb_matrix(m)[source]

Randomize a Coulomb matrix as described in Montavon et al., _New Journal of Physics_ __15__ (2013) 095003:

  1. Compute row norms for M in a vector row_norms.
  2. Sample a zero-mean unit-variance noise vector e with dimension equal to row_norms.
  3. Permute the rows and columns of M with the permutation that sorts row_norms + e.
Parameters:
  • m (ndarray) – Coulomb matrix.
  • n_samples (int, optional (default 1)) – Number of random matrices to generate.
  • seed (int, optional) – Random seed.
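
The three steps above can be sketched in plain Python. This is an illustration under stated assumptions, not the method's actual body: randomize_sketch is a hypothetical name, it returns a single sample, and it draws noise with the standard library instead of NumPy.

```python
import math
import random


def randomize_sketch(m, seed=None):
    """Return one randomly permuted copy of the square matrix m."""
    rng = random.Random(seed)
    # 1. Row norms of M.
    row_norms = [math.sqrt(sum(x * x for x in row)) for row in m]
    # 2. Zero-mean, unit-variance noise, one entry per row.
    e = [rng.gauss(0.0, 1.0) for _ in row_norms]
    # 3. Permutation that sorts row_norms + e, applied to rows and columns.
    noisy = [r + n for r, n in zip(row_norms, e)]
    p = sorted(range(len(m)), key=lambda i: noisy[i])
    return [[m[i][j] for j in p] for i in p]


m = [[0.5, 1.0, 0.0],
     [1.0, 36.9, 2.0],
     [0.0, 2.0, 0.5]]
shuffled = randomize_sketch(m, seed=0)
```

Because the same permutation is applied to rows and columns, the result stays symmetric and keeps the same diagonal entries in a different order.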
class deepchem.feat.coulomb_matrices.CoulombMatrixEig(max_atoms, remove_hydrogens=False, randomize=False, n_samples=1, seed=None)[source]

Bases: deepchem.feat.coulomb_matrices.CoulombMatrix

Calculate the eigenvalues of Coulomb matrices for molecules.

Parameters:
  • max_atoms (int) – Maximum number of atoms for any molecule in the dataset. Used to pad the Coulomb matrix.
  • remove_hydrogens (bool, optional (default False)) – Whether to remove hydrogens before constructing Coulomb matrix.
  • randomize (bool, optional (default False)) – Whether to randomize Coulomb matrices to remove dependence on atom index order.
  • n_samples (int, optional (default 1)) – Number of random Coulomb matrices to generate if randomize is True.
  • seed (int, optional) – Random seed.
Example:

>>> featurizers = dc.feat.CoulombMatrixEig(max_atoms=23)
>>> input_file = 'deepchem/feat/tests/data/water.sdf' # really backed by water.sdf.csv
>>> tasks = ["atomization_energy"]
>>> featurizer = dc.data.SDFLoader(tasks, smiles_field="smiles", mol_field="mol",
...                                featurizer=featurizers, verbose=False)
>>> dataset = featurizer.featurize(input_file)
Reading structures from deepchem/feat/tests/data/water.sdf.
Featurizing sample 0
__call__(mols)

Calculate features for molecules.

Parameters:mols (iterable) – RDKit Mol objects.
conformers = True
coulomb_matrix(mol)

Generate Coulomb matrices for each conformer of the given molecule.

Parameters:mol (RDKit Mol) – Molecule.
featurize(mols, verbose=True, log_every_n=1000)

Calculate features for molecules.

Parameters:mols (iterable) – RDKit Mol objects.
get_interatomic_distances(conf)

Get interatomic distances for atoms in a molecular conformer.

Parameters:conf (RDKit Conformer) – Molecule conformer.
name = 'coulomb_matrix'
randomize_coulomb_matrix(m)

Randomize a Coulomb matrix as described in Montavon et al., _New Journal of Physics_ __15__ (2013) 095003:

  1. Compute row norms for M in a vector row_norms.
  2. Sample a zero-mean unit-variance noise vector e with dimension equal to row_norms.
  3. Permute the rows and columns of M with the permutation that sorts row_norms + e.
Parameters:
  • m (ndarray) – Coulomb matrix.
  • n_samples (int, optional (default 1)) – Number of random matrices to generate.
  • seed (int, optional) – Random seed.

deepchem.feat.fingerprints module

Topological fingerprints.

class deepchem.feat.fingerprints.CircularFingerprint(radius=2, size=2048, chiral=False, bonds=True, features=False, sparse=False, smiles=False)[source]

Bases: deepchem.feat.base_classes.Featurizer

Circular (Morgan) fingerprints.

Parameters:
  • radius (int, optional (default 2)) – Fingerprint radius.
  • size (int, optional (default 2048)) – Length of generated bit vector.
  • chiral (bool, optional (default False)) – Whether to consider chirality in fingerprint generation.
  • bonds (bool, optional (default True)) – Whether to consider bond order in fingerprint generation.
  • features (bool, optional (default False)) – Whether to use feature information instead of atom information; see RDKit docs for more info.
  • sparse (bool, optional (default False)) – Whether to return a dict for each molecule containing the sparse fingerprint.
  • smiles (bool, optional (default False)) – Whether to calculate SMILES strings for fragment IDs (only applicable when calculating sparse fingerprints).
__call__(mols)

Calculate features for molecules.

Parameters:mols (iterable) – RDKit Mol objects.
featurize(mols, verbose=True, log_every_n=1000)

Calculate features for molecules.

Parameters:mols (iterable) – RDKit Mol objects.
name = 'circular'

deepchem.feat.graph_features module

class deepchem.feat.graph_features.ConvMolFeaturizer(master_atom=False, use_chirality=False)[source]

Bases: deepchem.feat.base_classes.Featurizer

__call__(mols)

Calculate features for molecules.

Parameters:mols (iterable) – RDKit Mol objects.
featurize(mols, verbose=True, log_every_n=1000)

Calculate features for molecules.

Parameters:mols (iterable) – RDKit Mol objects.
name = ['conv_mol']
class deepchem.feat.graph_features.WeaveFeaturizer(graph_distance=True, explicit_H=False, use_chirality=False)[source]

Bases: deepchem.feat.base_classes.Featurizer

__call__(mols)

Calculate features for molecules.

Parameters:mols (iterable) – RDKit Mol objects.
featurize(mols, verbose=True, log_every_n=1000)

Calculate features for molecules.

Parameters:mols (iterable) – RDKit Mol objects.
name = ['weave_mol']
deepchem.feat.graph_features.atom_features(atom, bool_id_feat=False, explicit_H=False, use_chirality=False)[source]
deepchem.feat.graph_features.atom_to_id(atom)[source]

Return a unique id corresponding to the atom type

deepchem.feat.graph_features.bond_features(bond, use_chirality=False)[source]
deepchem.feat.graph_features.features_to_id(features, intervals)[source]

Convert list of features into index using spacings provided in intervals

deepchem.feat.graph_features.find_distance(a1, num_atoms, canon_adj_list, max_distance=7)[source]
deepchem.feat.graph_features.get_feature_list(atom)[source]
deepchem.feat.graph_features.get_intervals(l)[source]

For list of lists, gets the cumulative products of the lengths
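
The idea behind features_to_id and get_intervals resembles a mixed-radix number: each feature index is multiplied by the product of the sizes of the preceding feature lists. The sketch below shows that generic scheme with hypothetical names and feature lists. The library's functions may use different offsets, so treat the numbers as illustrative only.

```python
def intervals_sketch(feature_lists):
    """Cumulative products of list lengths: the place value of each feature."""
    intervals = [1]
    for options in feature_lists[:-1]:
        intervals.append(intervals[-1] * len(options))
    return intervals


def to_id_sketch(features, intervals):
    """Combine per-feature indices into one integer."""
    return sum(f * step for f, step in zip(features, intervals))


def from_id_sketch(identifier, intervals):
    """Invert to_id_sketch by peeling off the largest place value first."""
    features = [0] * len(intervals)
    for k in reversed(range(len(intervals))):
        features[k], identifier = divmod(identifier, intervals[k])
    return features


# Hypothetical feature lists: 3 element choices, 2 aromaticity flags, 4 degrees.
feature_lists = [["C", "N", "O"], [False, True], [0, 1, 2, 3]]
intervals = intervals_sketch(feature_lists)
encoded = to_id_sketch([2, 1, 3], intervals)
```

With these lists every combination of indices maps to a distinct integer between 0 and 23, and from_id_sketch recovers the indices.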

deepchem.feat.graph_features.id_to_features(id, intervals)[source]
deepchem.feat.graph_features.one_of_k_encoding(x, allowable_set)[source]
deepchem.feat.graph_features.one_of_k_encoding_unk(x, allowable_set)[source]

Maps inputs not in the allowable set to the last element.
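
A minimal sketch of this fallback behaviour, assuming the last element of the allowable set acts as the "unknown" slot. one_hot_with_unknown is a hypothetical name and the symbol list is made up for the example.

```python
def one_hot_with_unknown(x, allowable_set):
    """One-hot encode x; values outside allowable_set map to the last slot."""
    if x not in allowable_set:
        x = allowable_set[-1]
    return [x == s for s in allowable_set]


symbols = ["C", "N", "O", "Unknown"]
known = one_hot_with_unknown("N", symbols)
unknown = one_hot_with_unknown("Xe", symbols)
```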

deepchem.feat.graph_features.pair_features(mol, edge_list, canon_adj_list, bt_len=6, graph_distance=True)[source]
deepchem.feat.graph_features.safe_index(l, e)[source]

Gets the index of e in l, providing an index of len(l) if not found

deepchem.feat.mol_graphs module

Data structures used to represent molecules for convolutions.

class deepchem.feat.mol_graphs.ConvMol(atom_features, adj_list, max_deg=10, min_deg=0)[source]

Bases: object

Holds information about a molecule.

Resorts order of atoms internally to be in order of increasing degree. Note that only heavy atoms (hydrogens excluded) are considered here.

static agglomerate_mols(mols, max_deg=10, min_deg=0)[source]

Concatenates a list of ConvMol objects into one mol object that can be fed into tensorflow placeholders. The indexing of the molecules is preserved during the combination, but the indexing of the atoms is greatly changed.

Parameters:mols (list) – ConvMol objects to be combined into one molecule.
get_adjacency_list()[source]

Returns a canonicalized adjacency list.

Canonicalized means that the atoms are re-ordered by degree.

Returns:Canonicalized form of adjacency list.
Return type:list
get_atom_features()[source]

Returns canonicalized version of atom features.

Features are sorted by atom degree, with original order maintained when degrees are same.

get_atoms_with_deg(deg)[source]

Retrieves atom_features with the specific degree

get_deg_adjacency_lists()[source]

Returns adjacency lists grouped by atom degree.

Returns:Has length (max_deg+1-min_deg). The element at position deg is itself a list of the neighbor-lists for atoms with degree deg.
Return type:list
get_deg_slice()[source]

Returns degree-slice tensor.

The deg_slice tensor allows indexing into a flattened version of the molecule’s atoms. Assume atoms are sorted in order of degree. Then deg_slice[deg][0] is the starting position for atoms of degree deg in flattened list, and deg_slice[deg][1] is the number of atoms with degree deg.

Note deg_slice has shape (max_deg+1-min_deg, 2).

Returns:deg_slice – Shape (max_deg+1-min_deg, 2)
Return type:np.ndarray
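
To make the layout concrete, the sketch below builds such a table from a list of atom degrees. deg_slice_sketch is a hypothetical helper working on plain lists, not the class's own method, and it assumes the row for degree deg sits at position deg - min_deg.

```python
def deg_slice_sketch(degrees, max_deg=10, min_deg=0):
    """Return [start, count] rows for each degree from min_deg to max_deg."""
    counts = [0] * (max_deg + 1 - min_deg)
    for d in degrees:
        counts[d - min_deg] += 1
    deg_slice = []
    start = 0
    for count in counts:
        # Atoms of this degree occupy positions start .. start + count - 1
        # once all atoms are sorted by degree.
        deg_slice.append([start, count])
        start += count
    return deg_slice


# Heavy-atom degrees of 2-methylbutane, CC(C)CC.
degrees = [1, 3, 1, 2, 1]
table = deg_slice_sketch(degrees, max_deg=3)
```

Here three degree-1 atoms start at position 0, the single degree-2 atom sits at position 3, and the degree-3 atom at position 4.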
static get_null_mol(n_feat, max_deg=10, min_deg=0)[source]

Constructs a null molecule.

Get one molecule with one atom of each degree, with all the atoms connected to themselves, and containing n_feat features.

Parameters:n_feat (int) – number of features for the nodes in the null molecule
get_num_atoms()[source]
get_num_atoms_with_deg(deg)[source]

Returns the number of atoms with the given degree

class deepchem.feat.mol_graphs.MultiConvMol(nodes, deg_adj_lists, deg_slice, membership, num_mols)[source]

Bases: object

Holds information about multiple molecules, for use in feeding information into tensorflow. Generated using the agglomerate_mols function

get_atom_features()[source]
get_deg_adjacency_lists()[source]
get_num_atoms()[source]
get_num_molecules()[source]
class deepchem.feat.mol_graphs.WeaveMol(nodes, pairs)[source]

Bases: object

Holds information about a molecule. Molecule struct used in weave models.

get_atom_features()[source]
get_num_atoms()[source]
get_num_features()[source]
get_pair_features()[source]
deepchem.feat.mol_graphs.cumulative_sum(l, offset=0)[source]

Returns cumulative sums for set of counts.

Returns the cumulative sums for a set of counts with the first returned value starting at 0. I.e [3,2,4] -> [0, 3, 5, 9]. Keeps final sum for searching. Useful for reindexing.

Parameters:l (list) – List of integers. Typically small counts.
deepchem.feat.mol_graphs.cumulative_sum_minus_last(l, offset=0)[source]

Returns cumulative sums for set of counts, removing last entry.

Returns the cumulative sums for a set of counts with the first returned value starting at 0. I.e [3,2,4] -> [0, 3, 5]. Note last sum element 9 is missing. Useful for reindexing

Parameters:l (list) – List of integers. Typically small counts.
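
Both cumulative-sum helpers can be reproduced with itertools.accumulate. The functions below are hypothetical re-implementations for illustration. They assume offset is simply the starting value of the running total, which the descriptions above do not spell out.

```python
from itertools import accumulate


def cumulative_sum_sketch(counts, offset=0):
    """Running totals starting at offset; keeps the final total."""
    return list(accumulate(counts, initial=offset))


def cumulative_sum_minus_last_sketch(counts, offset=0):
    """Same as above without the final total: the start index of each group."""
    return cumulative_sum_sketch(counts, offset)[:-1]


with_total = cumulative_sum_sketch([3, 2, 4])
starts_only = cumulative_sum_minus_last_sketch([3, 2, 4])
```

The second form gives the start index of each group when groups of sizes 3, 2 and 4 are laid end to end, which is why it is handy for reindexing.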

deepchem.feat.nnscore_utils module

Helper Classes and Functions for docking fingerprint computation.

class deepchem.feat.nnscore_utils.AromaticRing(center, indices, plane_coeff, radius)[source]

Bases: object

Holds information about an aromatic ring.

class deepchem.feat.nnscore_utils.Atom(atomname='', residue='', coordinates=<deepchem.feat.nnscore_utils.Point object>, element='', pdb_index='', line='', atomtype='', indices_of_atoms_connecting=None, charge=0, resid=0, chain='', structure='', comment='')[source]

Bases: object

Implements a container class for atoms. This class contains useful annotations about the atom.

add_neighbor_atom_indices(indices)[source]

Adds atoms with provided PDB indices as neighbors.

Parameters:indices (list) – List of indices of neighbors in PDB object.
copy_of()[source]

Make a copy of this atom.

create_pdb_line(index)[source]

Generates appropriate ATOM line for pdb file.

Parameters:index (int) – Index in associated PDB file.
number_of_neighbors()[source]

Reports number of neighboring atoms.

read_atom_pdb_line(line)[source]

TODO(rbharath): This method probably belongs in the PDB class, and not in the Atom class.

Reads an ATOM or HETATM line from PDB and instantiates fields.

Atoms in PDBs are represented by ATOM or HETATM statements, which use the following record format:

(see ftp://ftp.wwpdb.org/pub/pdb/doc/format_descriptions/Format_v33_Letter.pdf)

Columns  Data type     Field       Definition
1 - 6    Record name   "ATOM  " / "HETATM"
7 - 11   Integer       serial      Atom serial number.
13 - 16  Atom          name        Atom name.
17       Character     altLoc      Alternate location indicator.
18 - 20  Residue name  resName     Residue name.
22       Character     chainID     Chain identifier.
23 - 26  Integer       resSeq      Residue sequence number.
27       AChar         iCode       Code for insertion of residues.
31 - 38  Real(8.3)     x           Orthogonal coordinates for X in Angstroms.
39 - 46  Real(8.3)     y           Orthogonal coordinates for Y in Angstroms.
47 - 54  Real(8.3)     z           Orthogonal coordinates for Z in Angstroms.
55 - 60  Real(6.2)     occupancy   Occupancy.
61 - 66  Real(6.2)     tempFactor  Temperature factor.
77 - 78  LString(2)    element     Element symbol, right-justified.
79 - 80  LString(2)    charge      Charge on the atom.

side_chain_or_backbone()[source]

Determine whether receptor atom belongs to residue sidechain or backbone.

class deepchem.feat.nnscore_utils.Charged(coordinates, indices, positive)[source]

Bases: object

A class that represents a charged atom.

class deepchem.feat.nnscore_utils.Point(x=None, y=None, z=None, coords=None)[source]

Bases: object

Simple implementation for a point in 3-space.

as_array()[source]

Return the coordinates of this point as array.

copy_of()[source]

Return a copy of this point.

dist_to(point)[source]

Distance (in 2-norm) from this point to another.

magnitude()[source]

Magnitude of this point (in 2-norm).

deepchem.feat.nnscore_utils.angle_between_points(point1, point2)[source]

Computes the angle (in radians) between two points.

deepchem.feat.nnscore_utils.angle_between_three_points(point1, point2, point3)[source]

Computes the angle (in radians) between the three provided points.

deepchem.feat.nnscore_utils.average_point(points)[source]

Returns the point with averaged coordinates of arguments.

Parameters:points (list) – List of point objects.
Returns:pavg – Has coordinates equal to the arithmetic average of those of the input points.
Return type:Point object
deepchem.feat.nnscore_utils.cross_product(point1, point2)[source]

Calculates the cross-product of provided points.

deepchem.feat.nnscore_utils.dihedral(point1, point2, point3, point4)[source]

Compute dihedral angle between 4 points.

TODO(rbharath): Write a nontrivial test for this.

deepchem.feat.nnscore_utils.distance(point1, point2)[source]

Computes distance between two points.

deepchem.feat.nnscore_utils.dot_product(point1, point2)[source]

Dot product of points.

deepchem.feat.nnscore_utils.force_partial_charge_computation(mol)[source]

Force computation of partial charges for molecule.

Parameters:mol (RDKit Mol) – Molecule on which we compute partial charges.
deepchem.feat.nnscore_utils.hydrogenate_and_compute_partial_charges(input_file, input_format, hyd_output=None, pdbqt_output=None, protein=True, verbose=True)[source]

Outputs a hydrogenated pdb and a pdbqt with partial charges.

Takes an input file in specified format. Generates two outputs:

  • A pdb file that contains a hydrogenated (at pH 7.4) version of the original compound.
  • A pdbqt file that has computed Gasteiger partial charges. This pdbqt file is built from the hydrogenated pdb.

TODO(rbharath): Can do a bit of refactoring between this function and pdbqt_to_pdb.

Parameters:
  • input_file (String) – Path to input file.
  • input_format (String) – Name of input format.
deepchem.feat.nnscore_utils.normalized_vector(point)[source]

Normalize provided point.

deepchem.feat.nnscore_utils.pdbqt_to_pdb(input_file, output_directory)[source]

Convert pdbqt file to pdb file.

Parameters:
  • input_file (String) – Path to input file.
  • output_directory (String) – Path to desired output directory.
deepchem.feat.nnscore_utils.project_point_onto_plane(point, plane_coefficients)[source]

Finds nearest point on specified plane to given point.

Parameters:
  • point (Point) – Given point
  • plane_coefficients (list) – [a, b, c, d] where plane equation is ax + by + cz = d
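
As a worked illustration, the nearest point on the plane ax + by + cz = d is found by moving from the given point along the normal (a, b, c). The helper below is a hypothetical sketch that uses plain tuples instead of Point objects.

```python
def project_onto_plane_sketch(point, plane_coefficients):
    """Nearest point to `point` on the plane ax + by + cz = d."""
    a, b, c, d = plane_coefficients
    x, y, z = point
    # Signed offset along the normal (a, b, c), scaled by its squared length.
    t = (a * x + b * y + c * z - d) / (a * a + b * b + c * c)
    return (x - t * a, y - t * b, z - t * c)


# Project (1, 2, 5) onto the plane z = 2, i.e. [a, b, c, d] = [0, 0, 1, 2].
projected = project_onto_plane_sketch((1.0, 2.0, 5.0), [0.0, 0.0, 1.0, 2.0])
```

The x and y coordinates are unchanged and z drops to 2, as expected for a horizontal plane.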
deepchem.feat.nnscore_utils.vector_scalar_multiply(point, scalar)[source]

Multiplies the provided point by scalar.

deepchem.feat.nnscore_utils.vector_subtraction(point1, point2)[source]

Subtracts the coordinates of the provided points.

deepchem.feat.one_hot module

class deepchem.feat.one_hot.OneHotFeaturizer(charset=None, padlength=120)[source]

Bases: deepchem.feat.base_classes.Featurizer

NOTE(LESWING) Not Thread Safe in initialization of charset

__call__(mols)

Calculate features for molecules.

Parameters:mols (iterable) – RDKit Mol objects.
featurize(mols, verbose=True, log_every_n=1000)[source]
Parameters:
  • mols (obj) – List of rdkit Molecule Objects
  • verbose (bool) – How much logging
  • log_every_n – How often to log
Returns:obj – numpy array of features
one_hot_array(i)[source]

Create a one hot array with bit i set to 1.

Parameters:i (int) – bit to set to 1
Returns:One hot array of length len(self.charset)
Return type:list of int
one_hot_encoded(smile)[source]

One hot encode an entire SMILES string.

Parameters:smile – SMILES string to encode
Returns:np.array of one hot encoded arrays for each character in smile
Return type:object
one_hot_index(c)[source]

TODO(LESWING) replace with map lookup vs linear scan

Parameters:c – character whose index we want
Returns:index of c in self.charset
Return type:int
pad_smile(smile)[source]

Pad a SMILES string to self.pad_length.

Parameters:smile (str) – SMILES string to pad
Returns:smile string space padded to self.pad_length
Return type:str
untransform(z)[source]

Convert from one hot representation back to SMILES.

Parameters:z – list of one hot encoded features
Returns:SMILES strings picking MAX for each one hot encoded array
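
The padding, encoding and decoding steps described for this class can be sketched end to end on plain lists. Everything named here is hypothetical (one_hot_smiles_sketch, decode_sketch, the four-character charset), and the sketch assumes the space character is the pad symbol and part of the charset.

```python
def one_hot_smiles_sketch(smiles, charset, pad_length):
    """Pad smiles with spaces to pad_length, then one-hot encode each character."""
    padded = smiles.ljust(pad_length)
    rows = []
    for ch in padded:
        row = [0] * len(charset)
        row[charset.index(ch)] = 1  # linear scan; raises ValueError if ch is missing
        rows.append(row)
    return rows


def decode_sketch(rows, charset):
    """Pick the arg-max character of each row and strip the padding."""
    chars = [charset[row.index(max(row))] for row in rows]
    return "".join(chars).rstrip()


charset = ["C", "O", "=", " "]  # hypothetical charset; the space is the pad symbol
encoded = one_hot_smiles_sketch("C=O", charset, pad_length=5)
decoded = decode_sketch(encoded, charset)
```

Taking the maximum of each row when decoding means the same routine also works on soft, non-binary rows such as model outputs.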

deepchem.feat.raw_featurizer module

class deepchem.feat.raw_featurizer.RawFeaturizer(smiles=False)[source]

Bases: deepchem.feat.base_classes.Featurizer

__call__(mols)

Calculate features for molecules.

Parameters:mols (iterable) – RDKit Mol objects.
featurize(mols, verbose=True, log_every_n=1000)

Calculate features for molecules.

Parameters:mols (iterable) – RDKit Mol objects.

deepchem.feat.rdkit_grid_featurizer module

class deepchem.feat.rdkit_grid_featurizer.RdkitGridFeaturizer(nb_rotations=0, feature_types=None, ecfp_degree=2, ecfp_power=3, splif_power=3, box_width=16.0, voxel_width=1.0, flatten=False, verbose=True, sanitize=False, **kwargs)[source]

Bases: deepchem.feat.base_classes.ComplexFeaturizer

Featurizes protein-ligand complex using flat features or a 3D grid (in which each voxel is described with a vector of features).

featurize_complexes(mol_files, protein_pdbs, log_every_n=1000)[source]

Calculate features for mol/protein complexes.

Parameters:
  • mol_files (list) – List of PDB filenames for molecules.
  • protein_pdbs (list) – List of PDB filenames for proteins.
deepchem.feat.rdkit_grid_featurizer.angle_between(vector_i, vector_j)[source]

Returns the angle in radians between vectors “vector_i” and “vector_j”:

>>> print("%0.06f" % angle_between((1, 0, 0), (0, 1, 0)))
1.570796
>>> print("%0.06f" % angle_between((1, 0, 0), (1, 0, 0)))
0.000000
>>> print("%0.06f" % angle_between((1, 0, 0), (-1, 0, 0)))
3.141593

Note that this function always returns the smaller of the two angles between the vectors (value between 0 and pi).
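
One common way to obtain such an angle is the arccosine of the normalized dot product. The sketch below is a hypothetical stand-alone version (angle_between_sketch is not the library function) that assumes non-zero vectors and clamps the cosine so rounding error cannot leave the domain of acos.

```python
import math


def angle_between_sketch(vector_i, vector_j):
    """Angle in radians, in [0, pi], between two non-zero 3D vectors."""
    dot = sum(a * b for a, b in zip(vector_i, vector_j))
    norm_i = math.sqrt(sum(a * a for a in vector_i))
    norm_j = math.sqrt(sum(b * b for b in vector_j))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cosine = max(-1.0, min(1.0, dot / (norm_i * norm_j)))
    return math.acos(cosine)


right_angle = angle_between_sketch((1, 0, 0), (0, 1, 0))
```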

deepchem.feat.rdkit_grid_featurizer.compute_all_ecfp(mol, indices=None, degree=2)[source]

Obtain molecular fragment for all atoms emanating outward to given degree. For each fragment, compute SMILES string (for now) and hash to an int. Return a dictionary mapping atom index to hashed SMILES.

deepchem.feat.rdkit_grid_featurizer.compute_all_sybyl(mol, indices=None)[source]

Computes Sybyl atom types for atoms in molecule.

deepchem.feat.rdkit_grid_featurizer.compute_binding_pocket_cation_pi(protein, ligand, **kwargs)[source]

Finds cation-pi interactions between protein and ligand.

Parameters:
  • protein, ligand (rdkit.rdchem.Mol) – Interacting molecules
  • **kwargs – Arguments that are passed to compute_cation_pi function
Returns:protein_cation_pi, ligand_cation_pi (dict) – Dictionaries that map atom indices to the number of cations/aromatic atoms they interact with
deepchem.feat.rdkit_grid_featurizer.compute_cation_pi(mol1, mol2, charge_tolerance=0.01, **kwargs)[source]

Finds aromatic rings in mol1 and cations in mol2 that interact with each other.

Parameters:
  • mol1 (rdkit.rdchem.Mol) – Molecule to look for interacting rings
  • mol2 (rdkit.rdchem.Mol) – Molecule to look for interacting cations
  • charge_tolerance (float) – Atom is considered a cation if its formal charge is greater than 1 - charge_tolerance
  • **kwargs – Arguments that are passed to is_cation_pi function
Returns:
  • mol1_pi (dict) – Dictionary that maps atom indices (from mol1) to the number of cations (in mol2) they interact with
  • mol2_cation (dict) – Dictionary that maps atom indices (from mol2) to the number of aromatic atoms (in mol1) they interact with
deepchem.feat.rdkit_grid_featurizer.compute_centroid(coordinates)[source]

Compute the x,y,z centroid of provided coordinates

Parameters:coordinates (np.ndarray) – Shape (N, 3), where N is the number of atoms.
deepchem.feat.rdkit_grid_featurizer.compute_charge_dictionary(molecule)[source]

Create a dictionary with partial charges for each atom in the molecule.

This function assumes that the charges for the molecule are already computed (it can be done with rdkit_util.compute_charges(molecule))

deepchem.feat.rdkit_grid_featurizer.compute_ecfp_features(mol, ecfp_degree=2, ecfp_power=11)[source]

Computes ECFP features for provided rdkit molecule.

mol: rdkit molecule
Molecule to featurize.
ecfp_degree: int
ECFP radius
ecfp_power: int
Number of bits to store ECFP features (2^ecfp_power will be length of ECFP array)
ecfp_array: np.ndarray
Returns an array of size 2^ecfp_power in which index i holds a 1 if the ECFP fragment that hashes to i is found in the molecule, and a 0 otherwise.
deepchem.feat.rdkit_grid_featurizer.compute_hbonds_in_range(protein, protein_xyz, ligand, ligand_xyz, pairwise_distances, hbond_dist_bin, hbond_angle_cutoff)[source]

Find all pairs of (protein_index_i, ligand_index_j) that hydrogen bond given a distance bin and an angle cutoff.

deepchem.feat.rdkit_grid_featurizer.compute_hydrogen_bonds(protein_xyz, protein, ligand_xyz, ligand, pairwise_distances, hbond_dist_bins, hbond_angle_cutoffs)[source]

Computes hydrogen bonds between proteins and ligands.

Returns a list of sublists. Each sublist is a series of tuples of (protein_index_i, ligand_index_j) that represent a hydrogen bond. Each sublist represents a different type of hydrogen bond.

deepchem.feat.rdkit_grid_featurizer.compute_pairwise_distances(protein_xyz, ligand_xyz)[source]

Takes as input m x 3 and n x 3 np arrays of 3D coords of protein and ligand, respectively, and outputs an m x n np array of pairwise distances in Angstroms between protein and ligand atoms. Entry (i, j) is the distance between the i'th protein atom and the j'th ligand atom.
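The m x n distance matrix can be sketched with NumPy broadcasting. This is an illustration under the assumption of plain Euclidean distances; the function name pairwise_distances_sketch is hypothetical and the library may compute the same result differently.

```python
import numpy as np

def pairwise_distances_sketch(protein_xyz, ligand_xyz):
    """Return an (m, n) array of Euclidean protein-ligand distances."""
    protein_xyz = np.asarray(protein_xyz, dtype=float)
    ligand_xyz = np.asarray(ligand_xyz, dtype=float)
    # Broadcast (m, 1, 3) against (1, n, 3) to get all coordinate differences.
    diff = protein_xyz[:, None, :] - ligand_xyz[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

protein = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
ligand = np.array([[0.0, 0.0, 0.0], [0.0, 3.0, 4.0], [1.0, 0.0, 0.0]])
distances = pairwise_distances_sketch(protein, ligand)
```

Entry distances[i, j] then pairs protein atom i with ligand atom j, matching the convention described above.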

deepchem.feat.rdkit_grid_featurizer.compute_pi_stack(protein, ligand, pairwise_distances=None, dist_cutoff=4.4, angle_cutoff=30.0)[source]

Find aromatic rings in protein and ligand that form pi-pi contacts. For each atom in the contact, count number of atoms in the other molecule that form this contact.

Pseudocode:

for each aromatic ring in protein:
    for each aromatic ring in ligand:
        compute distance between centers
        compute angle between normals
        if it counts as parallel pi-pi:
            count interacting atoms
        if it counts as pi-T:
            count interacting atoms

protein, ligand: rdkit.rdchem.Mol
Two interacting molecules.
pairwise_distances: np.ndarray (optional)
Array of pairwise protein-ligand distances (Angstroms)
dist_cutoff: float
Distance cutoff. Max allowed distance between the ring centers (Angstroms).
angle_cutoff: float
Angle cutoff. Max allowed deviation from the ideal angle between rings.
protein_pi_t, protein_pi_parallel, ligand_pi_t, ligand_pi_parallel: dict
Dictionaries mapping atom indices to number of atoms they interact with. Separate dictionary is created for each type of pi stacking (parallel and T-shaped) and each molecule (protein and ligand).
deepchem.feat.rdkit_grid_featurizer.compute_ring_center(mol, ring_indices)[source]

Computes 3D coordinates of a center of a given ring.

mol: rdkit.rdchem.Mol
Molecule containing a ring
ring_indices: array-like
Indices of atoms forming a ring
ring_centroid: np.ndarray
Position of a ring center
deepchem.feat.rdkit_grid_featurizer.compute_ring_normal(mol, ring_indices)[source]

Computes normal to a plane determined by a given ring.

mol: rdkit.rdchem.Mol
Molecule containing a ring
ring_indices: array-like
Indices of atoms forming a ring
normal: np.ndarray
Normal vector
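A ring center and ring normal can be sketched from raw coordinates alone. The example below works on a NumPy array rather than an RDKit molecule, takes the center as the mean of the ring atoms, and takes the normal as the cross product of two in-ring vectors built from the first three atoms. Both function names are hypothetical, and normalizing the result to unit length is a choice of this sketch.

```python
import numpy as np

def ring_center_sketch(coords, ring_indices):
    """Mean position of the atoms that form the ring."""
    return np.asarray(coords, dtype=float)[list(ring_indices)].mean(axis=0)

def ring_normal_sketch(coords, ring_indices):
    """Unit normal of the plane through the first three ring atoms."""
    p0, p1, p2 = np.asarray(coords, dtype=float)[list(ring_indices)[:3]]
    normal = np.cross(p1 - p0, p2 - p0)
    return normal / np.linalg.norm(normal)

# A regular hexagon in the xy-plane stands in for a benzene ring.
angles = np.arange(6) * np.pi / 3.0
hexagon = np.stack(
    [1.4 * np.cos(angles), 1.4 * np.sin(angles), np.zeros(6)], axis=1)
center = ring_center_sketch(hexagon, range(6))
normal = ring_normal_sketch(hexagon, range(6))
```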
deepchem.feat.rdkit_grid_featurizer.compute_salt_bridges(protein_xyz, protein, ligand_xyz, ligand, pairwise_distances, cutoff=5.0)[source]

Find salt bridge contacts between protein and ligand.

protein_xyz, ligand_xyz: np.ndarray
Arrays with atomic coordinates
protein, ligand: rdkit.rdchem.Mol
Interacting molecules
pairwise_distances: np.ndarray
Array of pairwise protein-ligand distances (Angstroms)
cutoff: float
Cutoff distance for contact consideration
salt_bridge_contacts: list of tuples
List of contacts. Tuple (i, j) indicates that atom i from protein interacts with atom j from ligand.
deepchem.feat.rdkit_grid_featurizer.compute_splif_features_in_range(protein, ligand, pairwise_distances, contact_bin, ecfp_degree=2)[source]

Computes SPLIF features for protein atoms close to ligand atoms.

Finds all protein atoms that are > contact_bin[0] and < contact_bin[1] away from ligand atoms. Then, finds the ECFP fingerprints for the contacting atoms. Returns a dictionary mapping (protein_index_i, ligand_index_j) –> (protein_ecfp_i, ligand_ecfp_j)
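The contact-selection step can be sketched as a boolean mask over the pairwise distance matrix. Only the distance filter is shown; the ECFP lookup for each contacting atom is omitted, and contacts_in_range_sketch is a hypothetical name.

```python
import numpy as np

def contacts_in_range_sketch(pairwise_distances, contact_bin):
    """Return (protein_index, ligand_index) pairs strictly inside the bin."""
    lower, upper = contact_bin
    mask = (pairwise_distances > lower) & (pairwise_distances < upper)
    return [(int(i), int(j)) for i, j in zip(*np.nonzero(mask))]

# Rows are protein atoms, columns are ligand atoms (Angstroms).
distances = np.array([[1.5, 3.2], [4.8, 2.1]])
contacts = contacts_in_range_sketch(distances, (2.0, 4.0))
```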

deepchem.feat.rdkit_grid_featurizer.convert_atom_pair_to_voxel(molecule_xyz_tuple, atom_index_pair, box_width, voxel_width)[source]

Converts a pair of atoms to a list of i,j,k tuples.

deepchem.feat.rdkit_grid_featurizer.convert_atom_to_voxel(molecule_xyz, atom_index, box_width, voxel_width, verbose=False)[source]

Converts atom coordinates to an i,j,k grid index.

molecule_xyz: np.ndarray
Array with coordinates of all atoms in the molecule, shape (N, 3)
atom_index: int
Index of an atom
box_width: float
Size of a box
voxel_width: float
Size of a voxel
verbose: bool
Print warnings when atom is outside of a box
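One plausible way to map a coordinate to an i,j,k index is sketched below. It assumes the box is a cube centered at the origin, so coordinates are shifted by half the box width before dividing by the voxel width; that convention and the name atom_to_voxel_sketch are assumptions, not taken from the documentation above.

```python
import numpy as np

def atom_to_voxel_sketch(molecule_xyz, atom_index, box_width, voxel_width):
    """Return ((i, j, k), inside) for one atom in an origin-centered box."""
    position = np.asarray(molecule_xyz, dtype=float)[atom_index]
    # Shift so that the box spans [0, box_width) along each axis.
    shifted = position + box_width / 2.0
    indices = np.floor(shifted / voxel_width).astype(int)
    n_voxels = int(round(box_width / voxel_width))
    inside = bool(np.all((indices >= 0) & (indices < n_voxels)))
    return tuple(int(i) for i in indices), inside

xyz = np.array([[0.0, 0.0, 0.0], [-7.5, 3.2, 7.9], [9.0, 0.0, 0.0]])
center_voxel = atom_to_voxel_sketch(xyz, 0, box_width=16.0, voxel_width=1.0)
edge_voxel = atom_to_voxel_sketch(xyz, 1, box_width=16.0, voxel_width=1.0)
outside_voxel = atom_to_voxel_sketch(xyz, 2, box_width=16.0, voxel_width=1.0)
```

The inside flag plays the role that the verbose warning plays in the documented function: it marks atoms that fall outside the box.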
deepchem.feat.rdkit_grid_featurizer.featurize_binding_pocket_ecfp(protein_xyz, protein, ligand_xyz, ligand, pairwise_distances=None, cutoff=4.5, ecfp_degree=2)[source]

Computes ECFP dicts for ligand and binding pocket of the protein.

Parameters:
  • protein_xyz (np.ndarray) – Of shape (N_protein_atoms, 3)
  • protein (rdkit.rdchem.Mol) – Contains more metadata.
  • ligand_xyz (np.ndarray) – Of shape (N_ligand_atoms, 3)
  • ligand (rdkit.rdchem.Mol) – Contains more metadata
  • pairwise_distances (np.ndarray) – Array of pairwise protein-ligand distances (Angstroms)
  • cutoff (float) – Cutoff distance for contact consideration
  • ecfp_degree (int) – ECFP radius
deepchem.feat.rdkit_grid_featurizer.featurize_binding_pocket_sybyl(protein_xyz, protein, ligand_xyz, ligand, pairwise_distances=None, cutoff=7.0)[source]

Computes Sybyl dicts for ligand and binding pocket of the protein.

Parameters:
  • protein_xyz (np.ndarray) – Of shape (N_protein_atoms, 3)
  • protein (Rdkit Molecule) – Contains more metadata.
  • ligand_xyz (np.ndarray) – Of shape (N_ligand_atoms, 3)
  • ligand (Rdkit Molecule) – Contains more metadata
  • pairwise_distances (np.ndarray) – Array of pairwise protein-ligand distances (Angstroms)
  • cutoff (float) – Cutoff distance for contact consideration.
deepchem.feat.rdkit_grid_featurizer.featurize_splif(protein_xyz, protein, ligand_xyz, ligand, contact_bins, pairwise_distances, ecfp_degree)[source]

Computes SPLIF featurization of protein-ligand binding pocket.

For each contact range (i.e. 1 A to 2 A, 2 A to 3 A, etc.) compute a dictionary mapping (protein_index_i, ligand_index_j) tuples –> (protein_ecfp_i, ligand_ecfp_j) tuples. Return a list of such splif dictionaries.

deepchem.feat.rdkit_grid_featurizer.generate_random__unit_vector()[source]

Generate a random unit vector on the unit sphere in 3D. Citation: http://mathworld.wolfram.com/SpherePointPicking.html

  1. Choose random theta in [0, 2*pi]
  2. Choose random z in [-1, 1]
  3. Compute output vector u: (x,y,z) = (sqrt(1-z^2)*cos(theta), sqrt(1-z^2)*sin(theta), z)
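The three steps above can be sketched directly with NumPy. The function name and the optional rng argument are hypothetical additions for reproducibility; the documented function takes no arguments.

```python
import numpy as np

def random_unit_vector_sketch(rng=None):
    """Sample a unit vector from theta in [0, 2*pi] and z in [-1, 1]."""
    rng = np.random.default_rng() if rng is None else rng
    theta = rng.uniform(0.0, 2.0 * np.pi)
    z = rng.uniform(-1.0, 1.0)
    radius = np.sqrt(1.0 - z * z)  # radius of the circle at height z
    return np.array([radius * np.cos(theta), radius * np.sin(theta), z])

rng = np.random.default_rng(0)
samples = np.array([random_unit_vector_sketch(rng) for _ in range(100)])
```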
deepchem.feat.rdkit_grid_featurizer.generate_random_rotation_matrix()[source]
  1. Generate a random unit vector u, randomly sampled from the unit sphere in 3D (see function generate_random__unit_vector() for details).
  2. Generate a second random unit vector v.
  3. If the absolute value of u dot v > 0.99, repeat. (This is important for numerical stability. Intuition: we want them to be as linearly independent as possible, or else the orthogonalized version of v will be much shorter in magnitude compared to u. I assume in Stack they took this from Gram-Schmidt orthogonalization?)
  4. v' = v - (u dot v)*u, i.e. subtract out the component of v that's in u's direction.
  5. Normalize v' (this isn't in Stack but I assume it must be done).
  6. Find w = u cross v'.
  7. u, v', and w will form the columns of a rotation matrix, R. The intuition is that u, v' and w are, respectively, what the standard basis vectors e1, e2, and e3 will be mapped to under the transformation.
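The procedure above can be sketched as follows. The helper names and the rng argument are hypothetical; the 0.99 threshold comes from the description, and the result is checked to be orthonormal with determinant +1.

```python
import numpy as np

def random_unit_vector(rng):
    """Unit vector from a random theta in [0, 2*pi] and z in [-1, 1]."""
    theta = rng.uniform(0.0, 2.0 * np.pi)
    z = rng.uniform(-1.0, 1.0)
    radius = np.sqrt(1.0 - z * z)
    return np.array([radius * np.cos(theta), radius * np.sin(theta), z])

def random_rotation_matrix_sketch(rng=None):
    """Build a rotation matrix whose columns are u, v' and w = u x v'."""
    rng = np.random.default_rng() if rng is None else rng
    u = random_unit_vector(rng)
    v = random_unit_vector(rng)
    # Resample v while it is nearly parallel to u (numerical stability).
    while abs(np.dot(u, v)) > 0.99:
        v = random_unit_vector(rng)
    # Gram-Schmidt step: remove the component of v along u, then normalize.
    v_perp = v - np.dot(u, v) * u
    v_perp /= np.linalg.norm(v_perp)
    w = np.cross(u, v_perp)
    return np.column_stack([u, v_perp, w])

R = random_rotation_matrix_sketch(np.random.default_rng(0))
```

Because w is the cross product of the first two columns, the three columns form a right-handed orthonormal basis, so R is a proper rotation rather than a reflection.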
deepchem.feat.rdkit_grid_featurizer.get_formal_charge(atom)[source]
deepchem.feat.rdkit_grid_featurizer.get_ligand_filetype(ligand_filename)[source]

Returns the filetype of ligand.

deepchem.feat.rdkit_grid_featurizer.get_partial_charge(atom)[source]

Get partial charge of a given atom (rdkit Atom object)

deepchem.feat.rdkit_grid_featurizer.hash_ecfp(ecfp, power)[source]

Returns an int of size 2^power representing that ECFP fragment. Input must be a string.
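One way to obtain such an integer is sketched below, reading "size 2^power" as a value in the range [0, 2^power). The sketch assumes an MD5 digest of the string reduced modulo 2^power; the choice of hash and the name hash_ecfp_sketch are assumptions and may not match the library.

```python
import hashlib

def hash_ecfp_sketch(ecfp, power):
    """Map an ECFP fragment string to an int in [0, 2**power)."""
    digest = hashlib.md5(ecfp.encode("utf-8")).hexdigest()
    return int(digest, 16) % (2 ** power)

# The fragment string is an arbitrary placeholder.
index = hash_ecfp_sketch("example_fragment", 11)
```

The same string always maps to the same index, which is what allows the index to address a fixed-length feature array.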

deepchem.feat.rdkit_grid_featurizer.hash_ecfp_pair(ecfp_pair, power)[source]

Returns an int of size 2^power representing that ECFP pair. Input must be a tuple of strings.

deepchem.feat.rdkit_grid_featurizer.hash_sybyl(sybyl, sybyl_types)[source]
deepchem.feat.rdkit_grid_featurizer.is_angle_within_cutoff(vector_i, vector_j, hbond_angle_cutoff)[source]
deepchem.feat.rdkit_grid_featurizer.is_cation_pi(cation_position, ring_center, ring_normal, dist_cutoff=6.5, angle_cutoff=30.0)[source]

Check if a cation and an aromatic ring form contact.

ring_center: np.ndarray
Positions of ring center. Can be computed with the compute_ring_center function.
ring_normal: np.ndarray
Normal of ring. Can be computed with the compute_ring_normal function.
dist_cutoff: float
Distance cutoff. Max allowed distance between ring center and cation (in Angstroms).
angle_cutoff: float
Angle cutoff. Max allowed deviation from the ideal (0deg) angle between ring normal and vector pointing from ring center to cation (in degrees).
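The geometric test described by these parameters can be sketched as a distance check plus an angle check. The sketch treats angles near 180 degrees the same as angles near 0, since a ring normal has no preferred sign; that symmetry and the function names are assumptions of this example.

```python
import numpy as np

def angle_between_deg(v1, v2):
    """Angle between two vectors in degrees."""
    cosine = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return np.degrees(np.arccos(np.clip(cosine, -1.0, 1.0)))

def is_cation_pi_sketch(cation_position, ring_center, ring_normal,
                        dist_cutoff=6.5, angle_cutoff=30.0):
    """True if the cation sits close to the ring and near its normal axis."""
    cation_vec = (np.asarray(cation_position, dtype=float)
                  - np.asarray(ring_center, dtype=float))
    dist = np.linalg.norm(cation_vec)
    angle = angle_between_deg(cation_vec, ring_normal)
    aligned = angle < angle_cutoff or angle > 180.0 - angle_cutoff
    return bool(aligned and dist < dist_cutoff)

origin = np.zeros(3)
z_axis = np.array([0.0, 0.0, 1.0])
above_ring = is_cation_pi_sketch([0.0, 0.0, 4.0], origin, z_axis)
in_ring_plane = is_cation_pi_sketch([4.0, 0.0, 0.0], origin, z_axis)
too_far = is_cation_pi_sketch([0.0, 0.0, 10.0], origin, z_axis)
```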
deepchem.feat.rdkit_grid_featurizer.is_hydrogen_bond(protein_xyz, protein, ligand_xyz, ligand, contact, hbond_angle_cutoff)[source]

Determine if a pair of atoms (contact = tuple of protein_atom_index, ligand_atom_index) between protein and ligand represents a hydrogen bond. Returns a boolean result.

deepchem.feat.rdkit_grid_featurizer.is_pi_parallel(ring1_center, ring1_normal, ring2_center, ring2_normal, dist_cutoff=8.0, angle_cutoff=30.0)[source]

Check if two aromatic rings form a parallel pi-pi contact.

ring1_center, ring2_center: np.ndarray
Positions of centers of the two rings. Can be computed with the compute_ring_center function.
ring1_normal, ring2_normal: np.ndarray
Normals of the two rings. Can be computed with the compute_ring_normal function.
dist_cutoff: float
Distance cutoff. Max allowed distance between the ring centers (Angstroms).
angle_cutoff: float
Angle cutoff. Max allowed deviation from the ideal (0deg) angle between the rings (in degrees).
deepchem.feat.rdkit_grid_featurizer.is_pi_t(ring1_center, ring1_normal, ring2_center, ring2_normal, dist_cutoff=5.5, angle_cutoff=30.0)[source]

Check if two aromatic rings form a T-shaped pi-pi contact.

ring1_center, ring2_center: np.ndarray
Positions of centers of the two rings. Can be computed with the compute_ring_center function.
ring1_normal, ring2_normal: np.ndarray
Normals of the two rings. Can be computed with the compute_ring_normal function.
dist_cutoff: float
Distance cutoff. Max allowed distance between the ring centers (Angstroms).
angle_cutoff: float
Angle cutoff. Max allowed deviation from the ideal (90deg) angle between the rings (in degrees).
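The parallel and T-shaped criteria differ only in the ideal angle between ring normals (0 versus 90 degrees) and in the default distance cutoff (8.0 versus 5.5 Angstroms, per the signatures above). A combined sketch, with hypothetical names and with normals treated as sign-free, could look like this:

```python
import numpy as np

def angle_between_deg(v1, v2):
    """Angle between two vectors in degrees."""
    cosine = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return np.degrees(np.arccos(np.clip(cosine, -1.0, 1.0)))

def pi_contact_type_sketch(ring1_center, ring1_normal,
                           ring2_center, ring2_normal,
                           parallel_dist_cutoff=8.0, t_dist_cutoff=5.5,
                           angle_cutoff=30.0):
    """Return 'parallel', 't-shaped' or None for a pair of rings."""
    dist = np.linalg.norm(np.asarray(ring1_center, dtype=float)
                          - np.asarray(ring2_center, dtype=float))
    angle = angle_between_deg(ring1_normal, ring2_normal)
    near_parallel = angle < angle_cutoff or angle > 180.0 - angle_cutoff
    if near_parallel and dist < parallel_dist_cutoff:
        return "parallel"
    if abs(angle - 90.0) < angle_cutoff and dist < t_dist_cutoff:
        return "t-shaped"
    return None

z_axis = np.array([0.0, 0.0, 1.0])
x_axis = np.array([1.0, 0.0, 0.0])
stacked = pi_contact_type_sketch([0, 0, 0], z_axis, [0, 0, 4], z_axis)
edge_on = pi_contact_type_sketch([0, 0, 0], z_axis, [0, 0, 5], x_axis)
distant = pi_contact_type_sketch([0, 0, 0], z_axis, [0, 0, 6], x_axis)
```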
deepchem.feat.rdkit_grid_featurizer.is_salt_bridge(atom_i, atom_j)[source]

Check if two atoms have correct charges to form a salt bridge

deepchem.feat.rdkit_grid_featurizer.rotate_molecules(mol_coordinates_list)[source]

Rotates provided molecular coordinates.

Pseudocode:

  1. Generate random rotation matrix R. This matrix applies a random transformation to any 3-vector such that, were the random transformation repeatedly applied, it would randomly sample along the surface of a sphere with radius equal to the norm of the given 3-vector; cf. generate_random_rotation_matrix() for details.
  2. Apply R to all atomic coordinates.
  3. Return rotated molecule.
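The "apply R" step can be sketched as a matrix product over each coordinate array in the list. Unlike the documented function, this hypothetical helper takes R as an explicit argument, and a fixed 90-degree rotation about z stands in for a random matrix so the result is easy to verify.

```python
import numpy as np

def rotate_coordinates_sketch(mol_coordinates_list, R):
    """Apply the same rotation R to every (N, 3) array in the list."""
    # Each row is a position vector x, so x @ R.T equals (R @ x) per atom.
    return [np.asarray(xyz, dtype=float) @ R.T for xyz in mol_coordinates_list]

# 90-degree rotation about the z axis, a stand-in for a random R.
R = np.array([[0.0, -1.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])
protein = np.array([[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]])
ligand = np.array([[0.0, 0.0, 3.0]])
rotated_protein, rotated_ligand = rotate_coordinates_sketch([protein, ligand], R)
```

Using one R for every array in the list keeps the relative geometry of the molecules, for example a protein and its ligand, unchanged.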
deepchem.feat.rdkit_grid_featurizer.subtract_centroid(xyz, centroid)[source]

Subtracts centroid from each coordinate.

Subtracts the centroid, a numpy array of dim 3, from all coordinates of all atoms in the molecule

deepchem.feat.rdkit_grid_featurizer.unit_vector(vector)[source]

Returns the unit vector of the vector.

deepchem.feat.test_one_hot module

class deepchem.feat.test_one_hot.TestOneHotFeaturizer(methodName='runTest')[source]

Bases: unittest.case.TestCase

addCleanup(function, *args, **kwargs)

Add a function, with arguments, to be called when the test is completed. Functions added are called on a LIFO basis and are called after tearDown on test failure or success.

Cleanup items are called even if setUp fails (unlike tearDown).

addTypeEqualityFunc(typeobj, function)

Add a type specific assertEqual style function to compare a type.

This method is for use by TestCase subclasses that need to register their own type equality functions to provide nicer error messages.

Parameters:
  • typeobj – The data type to call this function on when both values are of the same type in assertEqual().
  • function – The callable taking two arguments and an optional msg= argument that raises self.failureException with a useful error message when the two arguments are not equal.
assertAlmostEqual(first, second, places=None, msg=None, delta=None)

Fail if the two objects are unequal as determined by their difference rounded to the given number of decimal places (default 7) and comparing to zero, or by comparing that the difference between the two objects is more than the given delta.

Note that decimal places (from zero) are usually not the same as significant digits (measured from the most significant digit).

If the two objects compare equal then they will automatically compare almost equal.

assertAlmostEquals(*args, **kwargs)
assertCountEqual(first, second, msg=None)

An unordered sequence comparison asserting that both sequences contain the same elements, regardless of order. If the same element occurs more than once, it verifies that the elements occur the same number of times. Equivalent to:

self.assertEqual(Counter(list(first)),
                 Counter(list(second)))

Example:
  • [0, 1, 1] and [1, 0, 1] compare equal.
  • [0, 0, 1] and [0, 1] compare unequal.
assertDictContainsSubset(subset, dictionary, msg=None)

Checks whether dictionary is a superset of subset.

assertDictEqual(d1, d2, msg=None)
assertEqual(first, second, msg=None)

Fail if the two objects are unequal as determined by the ‘==’ operator.

assertEquals(*args, **kwargs)
assertFalse(expr, msg=None)

Check that the expression is false.

assertGreater(a, b, msg=None)

Just like self.assertTrue(a > b), but with a nicer default message.

assertGreaterEqual(a, b, msg=None)

Just like self.assertTrue(a >= b), but with a nicer default message.

assertIn(member, container, msg=None)

Just like self.assertTrue(a in b), but with a nicer default message.

assertIs(expr1, expr2, msg=None)

Just like self.assertTrue(a is b), but with a nicer default message.

assertIsInstance(obj, cls, msg=None)

Same as self.assertTrue(isinstance(obj, cls)), with a nicer default message.

assertIsNone(obj, msg=None)

Same as self.assertTrue(obj is None), with a nicer default message.

assertIsNot(expr1, expr2, msg=None)

Just like self.assertTrue(a is not b), but with a nicer default message.

assertIsNotNone(obj, msg=None)

Included for symmetry with assertIsNone.

assertLess(a, b, msg=None)

Just like self.assertTrue(a < b), but with a nicer default message.

assertLessEqual(a, b, msg=None)

Just like self.assertTrue(a <= b), but with a nicer default message.

assertListEqual(list1, list2, msg=None)

A list-specific equality assertion.

Parameters:
  • list1 – The first list to compare.
  • list2 – The second list to compare.
  • msg – Optional message to use on failure instead of a list of differences.
assertLogs(logger=None, level=None)

Fail unless a log message of level level or higher is emitted on logger or its children. If omitted, level defaults to INFO and logger defaults to the root logger.

This method must be used as a context manager, and will yield a recording object with two attributes: output and records. At the end of the context manager, the output attribute will be a list of the matching formatted log messages and the records attribute will be a list of the corresponding LogRecord objects.

Example:

with self.assertLogs('foo', level='INFO') as cm:
    logging.getLogger('foo').info('first message')
    logging.getLogger('foo.bar').error('second message')
self.assertEqual(cm.output, ['INFO:foo:first message',
                             'ERROR:foo.bar:second message'])
assertMultiLineEqual(first, second, msg=None)

Assert that two multi-line strings are equal.

assertNotAlmostEqual(first, second, places=None, msg=None, delta=None)

Fail if the two objects are equal as determined by their difference rounded to the given number of decimal places (default 7) and comparing to zero, or by comparing that the difference between the two objects is less than the given delta.

Note that decimal places (from zero) are usually not the same as significant digits (measured from the most significant digit).

Objects that are equal automatically fail.

assertNotAlmostEquals(*args, **kwargs)
assertNotEqual(first, second, msg=None)

Fail if the two objects are equal as determined by the ‘!=’ operator.

assertNotEquals(*args, **kwargs)
assertNotIn(member, container, msg=None)

Just like self.assertTrue(a not in b), but with a nicer default message.

assertNotIsInstance(obj, cls, msg=None)

Included for symmetry with assertIsInstance.

assertNotRegex(text, unexpected_regex, msg=None)

Fail the test if the text matches the regular expression.

assertNotRegexpMatches(*args, **kwargs)
assertRaises(expected_exception, *args, **kwargs)

Fail unless an exception of class expected_exception is raised by the callable when invoked with specified positional and keyword arguments. If a different type of exception is raised, it will not be caught, and the test case will be deemed to have suffered an error, exactly as for an unexpected exception.

If called with the callable and arguments omitted, will return a context object used like this:

with self.assertRaises(SomeException):
    do_something()

An optional keyword argument ‘msg’ can be provided when assertRaises is used as a context object.

The context manager keeps a reference to the exception as the ‘exception’ attribute. This allows you to inspect the exception after the assertion:

with self.assertRaises(SomeException) as cm:
    do_something()
the_exception = cm.exception
self.assertEqual(the_exception.error_code, 3)
assertRaisesRegex(expected_exception, expected_regex, *args, **kwargs)

Asserts that the message in a raised exception matches a regex.

Parameters:
  • expected_exception – Exception class expected to be raised.
  • expected_regex – Regex (re pattern object or string) expected to be found in error message.
  • args – Function to be called and extra positional args.
  • kwargs – Extra kwargs.
  • msg – Optional message used in case of failure. Can only be used when assertRaisesRegex is used as a context manager.
assertRaisesRegexp(*args, **kwargs)
assertRegex(text, expected_regex, msg=None)

Fail the test unless the text matches the regular expression.

assertRegexpMatches(*args, **kwargs)
assertSequenceEqual(seq1, seq2, msg=None, seq_type=None)

An equality assertion for ordered sequences (like lists and tuples).

For the purposes of this function, a valid ordered sequence type is one which can be indexed, has a length, and has an equality operator.

Parameters:
  • seq1 – The first sequence to compare.
  • seq2 – The second sequence to compare.
  • seq_type – The expected datatype of the sequences, or None if no datatype should be enforced.
  • msg – Optional message to use on failure instead of a list of differences.
assertSetEqual(set1, set2, msg=None)

A set-specific equality assertion.

Parameters:
  • set1 – The first set to compare.
  • set2 – The second set to compare.
  • msg – Optional message to use on failure instead of a list of differences.

assertSetEqual uses ducktyping to support different types of sets, and is optimized for sets specifically (parameters must support a difference method).

assertTrue(expr, msg=None)

Check that the expression is true.

assertTupleEqual(tuple1, tuple2, msg=None)

A tuple-specific equality assertion.

Parameters:
  • tuple1 – The first tuple to compare.
  • tuple2 – The second tuple to compare.
  • msg – Optional message to use on failure instead of a list of differences.
assertWarns(expected_warning, *args, **kwargs)

Fail unless a warning of class expected_warning is triggered by the callable when invoked with specified positional and keyword arguments. If a different type of warning is triggered, it will not be handled: depending on the other warning filtering rules in effect, it might be silenced, printed out, or raised as an exception.

If called with the callable and arguments omitted, will return a context object used like this:

with self.assertWarns(SomeWarning):
    do_something()

An optional keyword argument ‘msg’ can be provided when assertWarns is used as a context object.

The context manager keeps a reference to the first matching warning as the ‘warning’ attribute; similarly, the ‘filename’ and ‘lineno’ attributes give you information about the line of Python code from which the warning was triggered. This allows you to inspect the warning after the assertion:

with self.assertWarns(SomeWarning) as cm:
    do_something()
the_warning = cm.warning
self.assertEqual(the_warning.some_attribute, 147)
assertWarnsRegex(expected_warning, expected_regex, *args, **kwargs)

Asserts that the message in a triggered warning matches a regexp. Basic functioning is similar to assertWarns() with the addition that only warnings whose messages also match the regular expression are considered successful matches.

Parameters:
  • expected_warning – Warning class expected to be triggered.
  • expected_regex – Regex (re pattern object or string) expected to be found in error message.
  • args – Function to be called and extra positional args.
  • kwargs – Extra kwargs.
  • msg – Optional message used in case of failure. Can only be used when assertWarnsRegex is used as a context manager.
assert_(*args, **kwargs)
countTestCases()
debug()

Run the test without collecting errors in a TestResult

defaultTestResult()
doCleanups()

Execute all cleanup functions. Normally called for you after tearDown.

fail(msg=None)

Fail immediately, with the given message.

failIf(*args, **kwargs)
failIfAlmostEqual(*args, **kwargs)
failIfEqual(*args, **kwargs)
failUnless(*args, **kwargs)
failUnlessAlmostEqual(*args, **kwargs)
failUnlessEqual(*args, **kwargs)
failUnlessRaises(*args, **kwargs)
failureException

alias of AssertionError

id()
longMessage = True
maxDiff = 640
run(result=None)
setUp()

Hook method for setting up the test fixture before exercising it.

setUpClass()

Hook method for setting up class fixture before running tests in the class.

shortDescription()

Returns a one-line description of the test, or None if no description has been provided.

The default implementation of this method returns the first line of the specified test method’s docstring.

skipTest(reason)

Skip this test.

subTest(msg=<object object>, **params)

Return a context manager that will run the enclosed block of code in a subtest identified by the optional message and keyword parameters. A failure in the subtest marks the test case as failed but resumes execution at the end of the enclosed block, allowing further test code to be executed.

tearDown()

Hook method for deconstructing the test fixture after testing it.

tearDownClass()

Hook method for deconstructing the class fixture after running all tests in the class.

test_featurize()[source]

Module contents

Making it easy to import in classes.