deepchem.utils package

Submodules

deepchem.utils.conformers module

Conformer generation.

class deepchem.utils.conformers.ConformerGenerator(max_conformers=1, rmsd_threshold=0.5, force_field='uff', pool_multiplier=10)[source]

Bases: object

Generate molecule conformers.

  1. Generate a pool of conformers.
  2. Minimize conformers.
  3. Prune conformers using an RMSD threshold.

Note that pruning is done _after_ minimization, which differs from the protocol described in the references.

References

Parameters:
  • max_conformers (int, optional (default 1)) – Maximum number of conformers to generate (after pruning).
  • rmsd_threshold (float, optional (default 0.5)) – RMSD threshold for pruning conformers. If None or negative, no pruning is performed.
  • force_field (str, optional (default 'uff')) – Force field to use for conformer energy calculation and minimization. Options are ‘uff’, ‘mmff94’, and ‘mmff94s’.
  • pool_multiplier (int, optional (default 10)) – Factor to multiply by max_conformers to generate the initial conformer pool. Since conformers are pruned after energy minimization, increasing the size of the pool increases the chance of identifying max_conformers unique conformers.
__call__(mol)[source]

Generate conformers for a molecule.

Parameters:mol (RDKit Mol) – Molecule.
embed_molecule(mol)[source]

Generate conformers, possibly with pruning.

Parameters:mol (RDKit Mol) – Molecule.
generate_conformers(mol)[source]

Generate conformers for a molecule.

This function returns a copy of the original molecule with embedded conformers.

Parameters:mol (RDKit Mol) – Molecule.
get_conformer_energies(mol)[source]

Calculate conformer energies.

Parameters:mol (RDKit Mol) – Molecule.
Returns:energies – Minimized conformer energies.
Return type:array_like
static get_conformer_rmsd(mol)[source]

Calculate conformer-conformer RMSD.

Parameters:mol (RDKit Mol) – Molecule.
get_molecule_force_field(mol, conf_id=None, **kwargs)[source]

Get a force field for a molecule.

Parameters:
  • mol (RDKit Mol) – Molecule.
  • conf_id (int, optional) – ID of the conformer to associate with the force field.
  • kwargs (dict, optional) – Keyword arguments for force field constructor.
minimize_conformers(mol)[source]

Minimize molecule conformers.

Parameters:mol (RDKit Mol) – Molecule.
prune_conformers(mol)[source]

Prune conformers from a molecule using an RMSD threshold, starting with the lowest energy conformer.

Parameters:mol (RDKit Mol) – Molecule.
Returns:A new RDKit Mol containing the chosen conformers, sorted by increasing energy.
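
The pruning step described above can be sketched in plain Python as a greedy filter. This is only an illustration under stated assumptions, not DeepChem's implementation: the helper name `prune_by_rmsd`, its inputs (a list of energies and a precomputed pairwise RMSD matrix), and the choice to return all indices when the threshold is None or negative are hypothetical.

```python
def prune_by_rmsd(energies, rmsd, threshold=0.5, max_conformers=1):
    """Greedy RMSD pruning sketch (hypothetical helper, not the DeepChem API).

    energies: list of conformer energies.
    rmsd: square matrix (list of lists) of pairwise conformer RMSDs.
    Returns the indices of the kept conformers, sorted by increasing energy.
    """
    # Visit conformers from lowest to highest energy.
    order = sorted(range(len(energies)), key=lambda i: energies[i])

    # Assumption: a None or negative threshold disables pruning entirely.
    if threshold is None or threshold < 0:
        return order

    keep = []
    for i in order:
        if len(keep) >= max_conformers:
            break
        # Keep a conformer only if it is at least `threshold` away
        # from every conformer that has already been kept.
        if all(rmsd[i][j] >= threshold for j in keep):
            keep.append(i)
    return keep


energies = [3.0, 1.0, 2.0]
rmsd = [
    [0.0, 0.1, 0.9],
    [0.1, 0.0, 0.8],
    [0.9, 0.8, 0.0],
]
kept = prune_by_rmsd(energies, rmsd, threshold=0.5, max_conformers=3)
```

Because the lowest-energy conformer is always visited first, it is always kept. A larger initial pool gives the filter more candidates to reach max_conformers distinct structures.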

deepchem.utils.evaluate module

Utility functions to evaluate models on datasets.

class deepchem.utils.evaluate.Evaluator(model, dataset, transformers, verbose=False)[source]

Bases: object

Class that evaluates a model on a given dataset.

compute_model_performance(metrics, csv_out=None, stats_out=None, per_task_metrics=False)[source]

Computes statistics of model on test data and saves results to csv.

Parameters:
  • metrics (list) – List of dc.metrics.Metric objects
  • csv_out (str, optional) – Filename to write CSV of model predictions.
  • stats_out (str, optional) – Filename to write computed statistics.
  • per_task_metrics (bool, optional) – If true, return computed metric for each task on multitask dataset.
output_predictions(y_preds, csv_out)[source]

Writes predictions to file.

Parameters:
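  • y_preds (np.ndarray) – Predictions to write.
  • csv_out – Destination for the CSV output. The docstring labels this parameter csvfile and describes it as an open file object.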
output_statistics(scores, stats_out)[source]

Write computed stats to file.

class deepchem.utils.evaluate.GeneratorEvaluator(model, generator, transformers, labels, outputs=None, n_tasks=1, n_classes=2, weights=[])[source]

Bases: object

Partner class to Evaluator. Instead of operating over datasets, this class operates over a generator. It evaluates a Metric over a model and a generator.

compute_model_performance(metrics, per_task_metrics=False)[source]

Computes statistics of the model on data from the generator.

Parameters:
  • metrics (list) – List of dc.metrics.Metric objects
  • per_task_metrics (bool, optional) – If true, return computed metric for each task on multitask dataset.
deepchem.utils.evaluate.relative_difference(x, y)[source]

Compute the relative difference between x and y.

deepchem.utils.evaluate.threshold_predictions(y, threshold)[source]

deepchem.utils.mol_xyz_util module

deepchem.utils.mol_xyz_util.get_molecule_centroid(molecule_xyz)[source]

Computes the centroid of a set of 3D coordinates.
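
As a rough illustration of what a centroid and a coordinate range mean for an array of atom positions, here is a plain-Python sketch. The helper name `centroid_and_range` is hypothetical. Defining the range as the per-axis maximum minus the per-axis minimum is an assumption, since the docstring does not say how it is computed.

```python
def centroid_and_range(xyz):
    """Hypothetical helper: xyz is a sequence of (x, y, z) rows."""
    n = len(xyz)
    # Transpose rows into three columns: all x, all y, all z values.
    columns = list(zip(*xyz))
    # Centroid: mean of each coordinate axis.
    centroid = tuple(sum(col) / n for col in columns)
    # Range (assumed definition): max minus min along each axis.
    extent = tuple(max(col) - min(col) for col in columns)
    return centroid, extent


coords = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (1.0, 3.0, 0.0)]
centroid, extent = centroid_and_range(coords)
# centroid is (1.0, 1.0, 0.0) and extent is (2.0, 3.0, 0.0)
```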

deepchem.utils.mol_xyz_util.get_molecule_range(molecule_xyz)[source]

deepchem.utils.rdkit_util module

exception deepchem.utils.rdkit_util.MoleculeLoadException(*args, **kwargs)[source]

Bases: Exception

args
with_traceback()

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

class deepchem.utils.rdkit_util.PdbqtLigandWriter(mol, outfile)[source]

Bases: object

Create a torsion tree and write to pdbqt file

convert()[source]

The single public function of this class. It converts a molecule and a pdb file into a pdbqt file stored in outfile.

deepchem.utils.rdkit_util.add_hydrogens_to_mol(mol)[source]

Add hydrogens to a molecule object.

TODO (LESWING): see if there are more flags to add here for default.

Parameters:mol (RDKit Mol) – Molecule.
Returns:RDKit Mol

deepchem.utils.rdkit_util.compute_charges(mol)[source]

Attempt to compute Gasteiger charges on a molecule. This has the side effect of computing the charges on mol itself. The mol passed into this function must already have been sanitized.

Parameters:mol (RDKit Mol) – Molecule.
Returns:Molecule with charges.

deepchem.utils.rdkit_util.get_xyz_from_mol(mol)[source]

Returns an m x 3 NumPy array of the 3D coordinates of the given RDKit molecule.

deepchem.utils.rdkit_util.load_molecule(molecule_file, add_hydrogens=True, calc_charges=True, sanitize=False)[source]

Converts a molecule file to (xyz coords, RDKit Mol object).

Given molecule_file, returns a tuple of the xyz coordinates of the molecule and an RDKit object representing that molecule.

Parameters:
  • molecule_file – Filename for the molecule.
  • add_hydrogens – Whether to add hydrogens via pdbfixer.
  • calc_charges – Whether to add charges via RDKit.
Returns:(xyz, mol)

deepchem.utils.rdkit_util.merge_molecules(ligand, protein)[source]
deepchem.utils.rdkit_util.merge_molecules_xyz(protein_xyz, ligand_xyz)[source]

Merges coordinates of protein and ligand.

deepchem.utils.rdkit_util.pdbqt_file_hack_ligand(mol, outfile)[source]

Hack to convert a pdb ligand into a pdbqt ligand.

Parameters:
  • mol – RDKit Mol object.
  • outfile – Filename which already has a valid pdb representation of mol.

deepchem.utils.rdkit_util.pdbqt_file_hack_protein(mol, outfile)[source]

Hack to convert a pdb protein into a pdbqt protein.

Parameters:
  • mol – RDKit Mol of the protein.
  • outfile – Filename which already has a valid pdb representation of mol.

deepchem.utils.rdkit_util.pdbqt_to_pdb(filename)[source]
deepchem.utils.rdkit_util.write_molecule(mol, outfile, is_protein=False)[source]
Write molecule to a file.
Parameters:
  • mol – rdkit Mol object
  • outfile – filename to write mol to
  • is_protein – is this molecule a protein?

deepchem.utils.save module

Simple utils to save and load from disk.

deepchem.utils.save.encode_fasta_sequence(fname)[source]

Loads fasta file and returns an array of one-hot sequences.

Parameters:fname (str) – Filename of fasta file.
deepchem.utils.save.get_input_type(input_file)[source]

Get type of input file. Must be csv/pkl.gz/sdf file.

deepchem.utils.save.load_csv_files(filenames, shard_size=None, verbose=True)[source]

Load data as pandas dataframe.

deepchem.utils.save.load_data(input_files, shard_size=None, verbose=True)[source]

Loads data from disk.

For CSV files, supports sharded loading for large files.

deepchem.utils.save.load_dataset_from_disk(save_dir)[source]
Parameters:save_dir (str) –
Returns:
  • loaded (bool) – Whether the load succeeded
  • all_dataset ((dc.data.Dataset, dc.data.Dataset, dc.data.Dataset)) – The train, valid, test datasets
  • transformers (list of dc.trans.Transformer) – The transformers used for this dataset
deepchem.utils.save.load_from_disk(filename)[source]

Load a dataset from file.

deepchem.utils.save.load_pickle_from_disk(filename)[source]

Load dataset from pickle file.

deepchem.utils.save.load_sdf_files(input_files, clean_mols)[source]

Load SDF file into dataframe.

deepchem.utils.save.load_sharded_csv(filenames)[source]

Load a dataset from multiple files. Each file MUST have same column headers
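
The header requirement can be illustrated with a standard-library sketch that reads several CSV shards and refuses to combine them when their column headers differ. This is not how DeepChem loads shards; the helper name `read_sharded_csv` and the choice to raise ValueError on a mismatch are assumptions made for the example.

```python
import csv


def read_sharded_csv(filenames):
    """Hypothetical helper: concatenate rows from CSV shards with equal headers."""
    header = None
    rows = []
    for fname in filenames:
        with open(fname, newline="") as handle:
            reader = csv.reader(handle)
            shard_header = next(reader)  # first row holds the column names
            if header is None:
                header = shard_header
            elif shard_header != header:
                # Assumed behavior: fail loudly instead of misaligning columns.
                raise ValueError(f"{fname} has different column headers")
            rows.extend(reader)
    return header, rows
```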

deepchem.utils.save.log(string, verbose=True)[source]

Print string if verbose.

deepchem.utils.save.save_dataset_to_disk(save_dir, train, valid, test, transformers)[source]
deepchem.utils.save.save_metadata(tasks, metadata_df, data_dir)[source]

Saves the metadata for a DiskDataset.

Parameters:
  • tasks (list of str) – Tasks of DiskDataset.
  • metadata_df (pd.DataFrame) –
  • data_dir (str) – Directory to store metadata.

deepchem.utils.save.save_to_disk(dataset, filename, compress=3)[source]

Save a dataset to file.

deepchem.utils.save.seq_one_hot_encode(sequences)[source]

One hot encodes list of genomic sequences.

Encoded sequences have shape (N_sequences, 4, sequence_length, 1). Here 4 is for the 4 bases (ACGT) present in genomic sequences. These sequences will be processed as images with one color channel.

Parameters:sequences (np.ndarray) – Array of genetic sequences
Raises:ValueError – If sequences are of different lengths.
Returns:Array of shape (N_sequences, 4, sequence_length, 1).
Return type:np.ndarray
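
The output layout can be made concrete with a small NumPy sketch. It is not the library's code: the helper name `one_hot_sketch`, the channel order A, C, G, T, and the float32 dtype are assumptions chosen for illustration.

```python
import numpy as np


def one_hot_sketch(sequences, alphabet="ACGT"):
    """Hypothetical helper: encode equal-length sequences as (N, 4, L, 1)."""
    lengths = {len(seq) for seq in sequences}
    if len(lengths) != 1:
        raise ValueError("All sequences must have the same length.")
    seq_len = lengths.pop()

    # Map each base to its row in the one-hot axis (assumed order: A, C, G, T).
    index = {base: i for i, base in enumerate(alphabet)}

    encoded = np.zeros(
        (len(sequences), len(alphabet), seq_len, 1), dtype=np.float32
    )
    for n, seq in enumerate(sequences):
        for pos, base in enumerate(seq):
            # A base outside the alphabet raises KeyError in this sketch.
            encoded[n, index[base], pos, 0] = 1.0
    return encoded


encoded = one_hot_sketch(["ACGT", "TTGA"])
# encoded.shape is (2, 4, 4, 1); the trailing axis plays the role of
# a single image color channel.
```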

deepchem.utils.visualization module

Module contents

Miscellaneous utility functions.

class deepchem.utils.ScaffoldGenerator(include_chirality=False)[source]

Bases: object

Generate molecular scaffolds.

Parameters:include_chirality (bool, optional (default False)) – Include chirality in scaffolds.
get_scaffold(mol)[source]

Get the Murcko scaffold for a molecule.

Murcko scaffolds are described in DOI: 10.1021/jm9602928. They are essentially that part of the molecule consisting of rings and the linker atoms between them.

Parameters:mol (RDKit Mol) – Molecule.
deepchem.utils.download_url(url, dest_dir='/tmp', name=None)[source]

Download a file to disk.

Parameters:
  • url (str) – the URL to download from
  • dest_dir (str) – the directory to save the file in
  • name (str) – the file name to save it as. If omitted, it will try to extract a file name from the URL
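
One plausible way to derive a file name from a URL when name is omitted is to take the last path component and ignore any query string. The sketch below uses only the standard library. The helper names `filename_from_url` and `destination_path` are hypothetical, and the library may extract the name differently.

```python
import os
import posixpath
from urllib.parse import urlparse


def filename_from_url(url):
    """Hypothetical helper: last path component of a URL, without the query."""
    # urlparse separates the path from any "?query" or "#fragment" part.
    return posixpath.basename(urlparse(url).path)


def destination_path(url, dest_dir="/tmp", name=None):
    """Hypothetical helper: where a download of `url` would be saved."""
    if name is None:
        name = filename_from_url(url)
    return os.path.join(dest_dir, name)


path = destination_path("https://example.com/data/tox21.csv.gz?raw=1")
```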
deepchem.utils.get_data_dir()[source]

Get the DeepChem data directory.

deepchem.utils.pad_array(x, shape, fill=0, both=False)[source]

Pad an array with a fill value.

Parameters:
  • x (ndarray) – Matrix.
  • shape (tuple or int) – Desired shape. If int, all dimensions are padded to that size.
  • fill (object, optional (default 0)) – Fill value.
  • both (bool, optional (default False)) – If True, split the padding on both sides of each axis. If False, padding is applied to the end of each axis.
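
The padding semantics described by these parameters can be sketched with `numpy.pad`. This is an illustration, not the library's implementation: the helper name `pad_array_sketch` is hypothetical, and placing the extra element after the data when the padding is odd and both is True is an assumption.

```python
import numpy as np


def pad_array_sketch(x, shape, fill=0, both=False):
    """Hypothetical helper mirroring the documented pad_array parameters."""
    x = np.asarray(x)
    # An int target means every dimension is padded to that size.
    if isinstance(shape, int):
        shape = (shape,) * x.ndim
    if len(shape) != x.ndim:
        raise ValueError("shape must have one entry per array dimension")

    pad_width = []
    for current, target in zip(x.shape, shape):
        total = target - current
        if total < 0:
            raise ValueError("target shape is smaller than the array")
        if both:
            # Split the padding across both sides of the axis;
            # any odd leftover goes after the data (assumption).
            before = total // 2
            pad_width.append((before, total - before))
        else:
            # All padding goes at the end of the axis.
            pad_width.append((0, total))
    return np.pad(x, pad_width, mode="constant", constant_values=fill)


end_padded = pad_array_sketch([[1, 2], [3, 4]], 3)
centered = pad_array_sketch([[1, 2], [3, 4]], (4, 5), fill=-1, both=True)
```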
deepchem.utils.untargz_file(file, dest_dir='/tmp', name=None)[source]

Untar and unzip a .tar.gz file to disk.

Parameters:
  • file (str) – the filepath to decompress
  • dest_dir (str) – the directory to save the file in
  • name (str) – the file name to save it as. If omitted, it will use the file name
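
Unpacking a .tar.gz archive into a destination directory can be done with the standard `tarfile` module, as in the sketch below. The helper name `untargz_sketch` is hypothetical, and the sketch leaves out the name parameter because the documentation does not spell out how it is applied.

```python
import tarfile


def untargz_sketch(file, dest_dir="/tmp"):
    """Hypothetical helper: extract a gzip-compressed tar archive."""
    # "r:gz" opens the archive for reading with gzip decompression.
    with tarfile.open(file, "r:gz") as archive:
        # Only extract archives from trusted sources: member paths
        # inside a tar file can point outside dest_dir.
        archive.extractall(path=dest_dir)
    return dest_dir
```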