deepchem.utils package


deepchem.utils.conformers module

Conformer generation.

class deepchem.utils.conformers.ConformerGenerator(max_conformers=1, rmsd_threshold=0.5, force_field='uff', pool_multiplier=10)[source]

Bases: object

Generate molecule conformers.

  1. Generate a pool of conformers.
  2. Minimize conformers.
  3. Prune conformers using an RMSD threshold.

Note that pruning is done _after_ minimization, which differs from the protocol described in the references.


  • max_conformers (int, optional (default 1)) – Maximum number of conformers to generate (after pruning).
  • rmsd_threshold (float, optional (default 0.5)) – RMSD threshold for pruning conformers. If None or negative, no pruning is performed.
  • force_field (str, optional (default 'uff')) – Force field to use for conformer energy calculation and minimization. Options are ‘uff’, ‘mmff94’, and ‘mmff94s’.
  • pool_multiplier (int, optional (default 10)) – Factor to multiply by max_conformers to generate the initial conformer pool. Since conformers are pruned after energy minimization, increasing the size of the pool increases the chance of identifying max_conformers unique conformers.

Generate conformers for a molecule.

Parameters:mol (RDKit Mol) – Molecule.

Generate conformers, possibly with pruning.

Parameters:mol (RDKit Mol) – Molecule.

Generate conformers for a molecule.

This function returns a copy of the original molecule with embedded conformers.

Parameters:mol (RDKit Mol) – Molecule.

Calculate conformer energies.

Parameters:mol (RDKit Mol) – Molecule.
Returns:energies – Minimized conformer energies.
Return type:array_like
static get_conformer_rmsd(mol)[source]

Calculate conformer-conformer RMSD.

Parameters:mol (RDKit Mol) – Molecule.
get_molecule_force_field(mol, conf_id=None, **kwargs)[source]

Get a force field for a molecule.

  • mol (RDKit Mol) – Molecule.
  • conf_id (int, optional) – ID of the conformer to associate with the force field.
  • kwargs (dict, optional) – Keyword arguments for force field constructor.

Minimize molecule conformers.

Parameters:mol (RDKit Mol) – Molecule.

Prune conformers from a molecule using an RMSD threshold, starting with the lowest energy conformer.

Parameters:mol (RDKit Mol) – Molecule.
Returns:A new RDKit Mol containing the chosen conformers, sorted by increasing energy.
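
The pruning step can be illustrated without RDKit. The sketch below is one reading of the description above, not DeepChem's implementation: `prune_by_rmsd` is a hypothetical helper that takes precomputed energies and a pairwise RMSD matrix, visits conformers from lowest to highest energy, and keeps one only if it lies at least `threshold` away from every conformer already kept. Returning every conformer when the threshold is None or negative is an assumption.

```python
def prune_by_rmsd(energies, rmsd, threshold=0.5, max_conformers=1):
    """Return indices of the conformers to keep, lowest energy first.

    energies: list of floats, one per conformer.
    rmsd: square list of lists, rmsd[i][j] is the RMSD between conformers i and j.
    """
    by_energy = sorted(range(len(energies)), key=lambda i: energies[i])
    if threshold is None or threshold < 0:
        # Assumption: with pruning disabled, every conformer is returned.
        return by_energy
    keep = []
    for i in by_energy:
        if len(keep) >= max_conformers:
            break
        # Keep a conformer only if it is far enough from all kept ones.
        if all(rmsd[i][j] >= threshold for j in keep):
            keep.append(i)
    return keep


# Hypothetical data: three conformers, where 0 and 1 are near-duplicates.
energies = [3.0, 1.0, 2.0]
rmsd = [
    [0.0, 0.2, 0.9],
    [0.2, 0.0, 0.8],
    [0.9, 0.8, 0.0],
]
kept = prune_by_rmsd(energies, rmsd, threshold=0.5, max_conformers=3)
```

Because pruning runs after minimization, a larger initial pool gives this loop more candidates to choose from, which is the rationale given for pool_multiplier.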

deepchem.utils.evaluate module

Utility functions to evaluate models on datasets.

class deepchem.utils.evaluate.Evaluator(model, dataset, transformers, verbose=False)[source]

Bases: object

Class that evaluates a model on a given dataset.

compute_model_performance(metrics, csv_out=None, stats_out=None, per_task_metrics=False)[source]

Computes statistics of model on test data and saves results to csv.

  • metrics (list) – List of dc.metrics.Metric objects
  • csv_out (str, optional) – Filename to write CSV of model predictions.
  • stats_out (str, optional) – Filename to write computed statistics.
  • per_task_metrics (bool, optional) – If true, return computed metric for each task on multitask dataset.
output_predictions(y_preds, csv_out)[source]

Writes predictions to file.

  • y_preds (np.ndarray) – Predictions to write.
  • csv_out (str) – Filename to write CSV of model predictions.
output_statistics(scores, stats_out)[source]

Write computed stats to file.

class deepchem.utils.evaluate.GeneratorEvaluator(model, generator, transformers, labels, outputs=None, n_tasks=1, n_classes=2, weights=[])[source]

Bases: object

Partner class to Evaluator. Instead of operating over a dataset, this class operates over a generator: it evaluates a Metric over a model and that generator.

compute_model_performance(metrics, per_task_metrics=False)[source]

Computes statistics of model on test data and saves results to csv.

  • metrics (list) – List of dc.metrics.Metric objects
  • per_task_metrics (bool, optional) – If true, return computed metric for each task on multitask dataset.
deepchem.utils.evaluate.relative_difference(x, y)[source]

Compute the relative difference between x and y.
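
The normalization used is not stated here. One common convention divides the absolute difference by the larger of the two magnitudes; the sketch below assumes that convention. `relative_difference_sketch` is a stand-in name for illustration, not the library function.

```python
def relative_difference_sketch(x, y):
    """Return |x - y| / max(|x|, |y|), or 0.0 when both values are zero."""
    scale = max(abs(x), abs(y))
    if scale == 0:
        # Both values are zero, so they do not differ.
        return 0.0
    return abs(x - y) / scale


example = relative_difference_sketch(10.0, 8.0)
```

Under this convention the result is symmetric in x and y and lies between 0 and 1 for values of the same sign.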

deepchem.utils.evaluate.threshold_predictions(y, threshold)[source]

deepchem.utils.mol_xyz_util module


Utilities to compute the centroid and range of 3D coordinates.
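
As a rough illustration of the two quantities, the sketch below works on a plain list of (x, y, z) tuples. The helper names are hypothetical, and treating the range as the per-axis difference between the maximum and minimum coordinate is an assumption.

```python
def centroid(coords):
    """Mean position of a list of (x, y, z) points."""
    n = len(coords)
    return tuple(sum(p[axis] for p in coords) / n for axis in range(3))


def coordinate_range(coords):
    """Per-axis extent (max minus min) of a list of (x, y, z) points."""
    return tuple(
        max(p[axis] for p in coords) - min(p[axis] for p in coords)
        for axis in range(3)
    )


# Hypothetical coordinates for three atoms.
coords = [(0.0, 0.0, 0.0), (2.0, 4.0, 6.0), (4.0, 2.0, 0.0)]
center = centroid(coords)
extent = coordinate_range(coords)
```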


deepchem.utils.rdkit_util module

exception deepchem.utils.rdkit_util.MoleculeLoadException(*args, **kwargs)[source]

Bases: Exception


Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

class deepchem.utils.rdkit_util.PdbqtLigandWriter(mol, outfile)[source]

Bases: object

Create a torsion tree and write to pdbqt file


The single public function of this class. It converts a molecule and a pdb file into a pdbqt file stored in outfile


Add hydrogens to a molecule object.

TODO (LESWING): see if there are more flags to add here for default.

Parameters:mol – RDKit Mol
Returns:RDKit Mol


Attempt to compute Gasteiger charges on a molecule. As a side effect, the charges are calculated on the mol that is passed in, which must already have been sanitized.

Parameters:mol – RDKit molecule
Returns:molecule with charges


Returns an m x 3 NumPy array of the 3D coordinates of the given RDKit molecule.

deepchem.utils.rdkit_util.load_molecule(molecule_file, add_hydrogens=True, calc_charges=True, sanitize=False)[source]

Converts molecule file to (xyz-coords, RDKit Mol object).

Given molecule_file, returns a tuple of the xyz coordinates of the molecule and an RDKit object representing that molecule.

  • molecule_file – filename for molecule
  • add_hydrogens – should add hydrogens via pdbfixer?
  • calc_charges – should add charges via rdkit?

Returns:(xyz, mol)

deepchem.utils.rdkit_util.merge_molecules(ligand, protein)[source]
deepchem.utils.rdkit_util.merge_molecules_xyz(protein_xyz, ligand_xyz)[source]

Merges coordinates of protein and ligand.

deepchem.utils.rdkit_util.pdbqt_file_hack_ligand(mol, outfile)[source]

Hack to convert a pdb ligand into a pdbqt ligand.

  • mol – RDKit Mol object
  • outfile – filename which already has a valid pdb representation of mol

deepchem.utils.rdkit_util.pdbqt_file_hack_protein(mol, outfile)[source]

Hack to convert a pdb protein into a pdbqt protein.

  • mol – RDKit Mol of protein
  • outfile – filename which already has a valid pdb representation of mol

deepchem.utils.rdkit_util.write_molecule(mol, outfile, is_protein=False)[source]
Write molecule to a file.

  • mol – RDKit Mol object
  • outfile – filename to write mol to
  • is_protein – is this molecule a protein?

module

Simple utils to save and load from disk.

Loads fasta file and returns an array of one-hot sequences.

Parameters:fname (str) – Filename of fasta file.

Get type of input file. Must be csv/pkl.gz/sdf file.

, shard_size=None, verbose=True)[source]

Load data as pandas dataframe.

, shard_size=None, verbose=True)[source]

Loads data from disk.

For CSV files, supports sharded loading for large files.
Parameters:save_dir (str) –
Returns:
  • loaded (bool) – Whether the load succeeded
  • all_dataset (tuple of three datasets) – The train, valid, test datasets
  • transformers (list of dc.trans.Transformer) – The transformers used for this dataset

Load a dataset from file.

Load dataset from pickle file.

, clean_mols)[source]

Load SDF file into dataframe.

Load a dataset from multiple files. Each file MUST have the same column headers.

, verbose=True)[source]

Print string if verbose.

, train, valid, test, transformers)[source]

, metadata_df, data_dir)[source]

Saves the metadata for a DiskDataset.

  • tasks (list of str) – Tasks of DiskDataset
  • metadata_df (pd.DataFrame) –
  • data_dir (str) – Directory to store metadata

, filename, compress=3)[source]

Save a dataset to file.

One hot encodes list of genomic sequences.

Encoded sequences have shape (N_sequences, 4, sequence_length, 1). Here 4 is for the 4 bases (A, C, G, T) present in genomic sequences. These sequences will be processed as images with one color channel.

Parameters:sequences (np.ndarray) – Array of genetic sequences
Raises:ValueError – If sequences are of different lengths.
Returns:Array of shape (N_sequences, 4, sequence_length, 1).
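
The shape convention can be made concrete with a dependency-free sketch that uses nested lists in place of an np.ndarray. `one_hot_encode` is a hypothetical helper, and the row order A, C, G, T is an assumption; the library's own channel order is not stated here.

```python
BASES = "ACGT"  # assumed row order of the 4-row axis


def one_hot_encode(sequences):
    """Encode equal-length sequences as nested lists of shape (N, 4, L, 1)."""
    lengths = {len(seq) for seq in sequences}
    if len(lengths) > 1:
        raise ValueError("sequences must all have the same length")
    encoded = []
    for seq in sequences:
        # One row per base, one column per position, trailing axis of size 1.
        encoded.append(
            [[[1 if letter == base else 0] for letter in seq] for base in BASES]
        )
    return encoded


encoded = one_hot_encode(["ACGT", "TTGA"])
shape = (
    len(encoded),
    len(encoded[0]),
    len(encoded[0][0]),
    len(encoded[0][0][0]),
)
```

The trailing axis of size 1 plays the role of the single color channel mentioned above, so each sequence looks like a 4 x sequence_length grayscale image.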

deepchem.utils.visualization module

Module contents

Miscellaneous utility functions.

class deepchem.utils.ScaffoldGenerator(include_chirality=False)[source]

Bases: object

Generate molecular scaffolds.

Parameters:include_chirality (bool, optional (default False)) – Include chirality in scaffolds.

Get Murcko scaffolds for molecules.

Murcko scaffolds are described in DOI: 10.1021/jm9602928. They are essentially that part of the molecule consisting of rings and the linker atoms between them.

Parameters:mols (array_like) – Molecules.
deepchem.utils.download_url(url, dest_dir='/tmp', name=None)[source]

Download a file to disk.

  • url (str) – the URL to download from
  • dest_dir (str) – the directory to save the file in
  • name (str) – the file name to save it as. If omitted, it will try to extract a file name from the URL
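
How the file name is extracted from the URL is not specified above. The sketch below assumes it is the last component of the URL path and only computes the destination path; it performs no download. `destination_path` and the example.com URL are hypothetical.

```python
import os
from urllib.parse import urlparse


def destination_path(url, dest_dir="/tmp", name=None):
    """Return the path a download would be saved to."""
    if name is None:
        # Assumption: the file name is the last component of the URL path.
        name = os.path.basename(urlparse(url).path)
    return os.path.join(dest_dir, name)


inferred = destination_path("https://example.com/data/file.tar.gz")
explicit = destination_path(
    "https://example.com/data/file.tar.gz", name="copy.tar.gz"
)
```

Parsing the URL first keeps any query string or fragment out of the inferred file name.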

Get the DeepChem data directory.

deepchem.utils.pad_array(x, shape, fill=0, both=False)[source]

Pad an array with a fill value.

  • x (ndarray) – Matrix.
  • shape (tuple or int) – Desired shape. If int, all dimensions are padded to that size.
  • fill (object, optional (default 0)) – Fill value.
  • both (bool, optional (default False)) – If True, split the padding on both sides of each axis. If False, padding is applied to the end of each axis.
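
The effect of `both` is easiest to see on a small matrix. The sketch below is a dependency-free illustration on a list of lists rather than an ndarray. `pad_matrix` is a hypothetical helper, and sending any odd leftover padding to the end of the axis when `both` is True is an assumption.

```python
def pad_matrix(x, shape, fill=0, both=False):
    """Pad a 2-D list of lists with `fill` until it has the given shape."""
    n_rows, n_cols = (shape, shape) if isinstance(shape, int) else shape
    rows = len(x)
    cols = len(x[0]) if x else 0
    if n_rows < rows or n_cols < cols:
        raise ValueError("target shape is smaller than the input")

    def split(diff):
        # Assumption: with both=True, an odd leftover goes to the end.
        return (diff // 2, diff - diff // 2) if both else (0, diff)

    top, bottom = split(n_rows - rows)
    left, right = split(n_cols - cols)
    padded = [[fill] * left + list(row) + [fill] * right for row in x]
    above = [[fill] * n_cols for _ in range(top)]
    below = [[fill] * n_cols for _ in range(bottom)]
    return above + padded + below


end_padded = pad_matrix([[1, 2], [3, 4]], 3)
centered = pad_matrix([[1, 2], [3, 4]], (4, 4), both=True)
```

With both=False the original values stay in the top-left corner; with both=True they sit in the middle of the padded matrix.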
deepchem.utils.untargz_file(file, dest_dir='/tmp', name=None)[source]

Untar and unzip a .tar.gz file to disk.

  • file (str) – the filepath to decompress
  • dest_dir (str) – the directory to save the file in
  • name (str) – the file name to save it as. If omitted, it will use the file name
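
Extracting a .tar.gz archive can be sketched with the standard library alone. `untargz` below is a hypothetical stand-in, not the DeepChem function, and it leaves out the `name` argument. The example builds a small archive in a scratch directory and then extracts it.

```python
import os
import tarfile
import tempfile


def untargz(file, dest_dir):
    """Extract every member of a gzip-compressed tar archive into dest_dir."""
    with tarfile.open(file, "r:gz") as archive:
        archive.extractall(path=dest_dir)


# Round trip in a scratch directory: build a small archive, then extract it.
work_dir = tempfile.mkdtemp()
source = os.path.join(work_dir, "hello.txt")
with open(source, "w") as handle:
    handle.write("hello")

archive_path = os.path.join(work_dir, "hello.tar.gz")
with tarfile.open(archive_path, "w:gz") as archive:
    archive.add(source, arcname="hello.txt")

out_dir = os.path.join(work_dir, "out")
os.makedirs(out_dir)
untargz(archive_path, out_dir)
with open(os.path.join(out_dir, "hello.txt")) as handle:
    extracted = handle.read()
```

Note that `extractall` trusts the member paths stored in the archive, so it is best used only on archives from a trusted source.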