deepchem.molnet package

Submodules

deepchem.molnet.check_availability module

deepchem.molnet.preset_hyper_parameters module

deepchem.molnet.run_benchmark module

deepchem.molnet.run_benchmark.benchmark_model(model, all_dataset, transformers, metric, test=False)[source]

Benchmark a custom model.

Parameters:
  • model (user-defined model structure) – the model to benchmark; a user-defined model should implement fit and evaluate methods
  • all_dataset (tuple of dataset structs) – (train, test, val) data tuple, as returned by the load_dataset function
  • transformers (list of dc.trans.Transformer) – transformers applied to the datasets, used during evaluation
  • metric (string) – choice of evaluation metrics
  • test (boolean, optional (default=False)) – whether to evaluate on the test set
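
A minimal sketch of the fit/evaluate contract that benchmark_model expects. The UserModel wrapper, its evaluate signature, the load_dataset return layout, and the 'auc' metric string are illustrative assumptions, not part of the documented API:

    import deepchem as dc
    from deepchem.molnet.run_benchmark import benchmark_model, load_dataset

    class UserModel:
        """Hypothetical user-defined model exposing fit and evaluate."""

        def __init__(self, n_tasks, n_features):
            # Delegate to a stock deepchem model for brevity.
            self.model = dc.models.MultitaskClassifier(n_tasks, n_features)

        def fit(self, dataset):
            self.model.fit(dataset)

        def evaluate(self, dataset, metrics, transformers):
            return self.model.evaluate(dataset, metrics, transformers)

    # Return layout assumed to mirror the dc.molnet loaders.
    tasks, all_dataset, transformers = load_dataset('tox21', featurizer='ECFP')
    model = UserModel(n_tasks=len(tasks), n_features=1024)
    # 'auc' is a placeholder; accepted metric strings are version-dependent.
    scores = benchmark_model(model, all_dataset, transformers, metric='auc')
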
deepchem.molnet.run_benchmark.load_dataset(dataset, featurizer, split='random')[source]

Load a specific dataset for benchmarking.

Parameters:
  • dataset (string) – choice of which dataset to use, should be one of: tox21, muv, sider, toxcast, pcba, delaney, kaggle, nci, clintox, hiv, pdbbind, chembl, qm7, qm7b, qm9, sampl
  • featurizer (string or dc.feat.Featurizer) – choice of featurization
  • split (string, optional (default='random')) – choice of splitter function; defaults to a random split
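
For example, loading Tox21 with ECFP fingerprints and a scaffold split; the three-way return layout shown here is an assumption based on the all_dataset description in benchmark_model above:

    from deepchem.molnet.run_benchmark import load_dataset

    tasks, all_dataset, transformers = load_dataset(
        'tox21', featurizer='ECFP', split='scaffold')
    # Unpacking order follows the benchmark_model docstring above;
    # verify against your DeepChem version.
    train, test, val = all_dataset
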
deepchem.molnet.run_benchmark.run_benchmark(datasets, model, split=None, metric=None, direction=True, featurizer=None, n_features=0, out_path='.', hyper_parameters=None, hyper_param_search=False, max_iter=20, search_range=2, test=False, reload=True, seed=123)[source]

Run a benchmark test on designated datasets with a deepchem (or user-defined) model.

Parameters:
  • datasets (list of string) – choice of which datasets to use, should be among: bace_c, bace_r, bbbp, chembl, clearance, clintox, delaney, hiv, hopv, kaggle, lipo, muv, nci, pcba, pdbbind, ppb, qm7, qm7b, qm8, qm9, sampl, sider, tox21, toxcast
  • model (string or user-defined model structure) – choice of which model to use; deepchem provides implementations of logistic regression, random forest, multitask network, bypass multitask network, irv, graph convolution; a user-defined model should implement fit and evaluate methods
  • split (string, optional (default=None)) – choice of splitter function, None = use the default splitter
  • metric (string, optional (default=None)) – choice of evaluation metrics, None = use the default metrics (AUC & R2)
  • direction (bool, optional (default=True)) – optimization direction for hyperparameter search: maximization (True) or minimization (False)
  • featurizer (string or dc.feat.Featurizer, optional (default=None)) – choice of featurization, None = use the default corresponding to the model (string only applicable to deepchem models)
  • n_features (int, optional (default=0)) – number of features; set automatically when using deepchem featurizers, must be specified for user-defined featurizers (if using deepchem models)
  • out_path (string, optional (default='.')) – path to the results file
  • hyper_parameters (dict, optional (default=None)) – hyperparameters for the designated model, None = use preset values
  • hyper_param_search (bool, optional (default=False)) – whether to perform hyperparameter search, using Gaussian process by default
  • max_iter (int, optional (default=20)) – number of optimization trials
  • search_range (int or float, optional (default=2)) – optimization over [initial value / search_range, initial value * search_range]
  • test (boolean, optional (default=False)) – whether to evaluate on the test set
  • reload (boolean, optional (default=True)) – whether to save and reload featurized datasets
  • seed (int, optional (default=123)) – random seed
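
A minimal sketch of a full benchmark run: Tox21 with graph convolutions and a scaffold split, writing results under out_path. The dataset and model strings are taken from the lists above:

    from deepchem.molnet.run_benchmark import run_benchmark

    run_benchmark(
        datasets=['tox21'],
        model='graphconv',   # any model string listed above
        split='scaffold',
        metric=None,         # None = default metrics (AUC & R2)
        out_path='./results',
        test=True,           # also report test-set scores
    )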

deepchem.molnet.run_benchmark_low_data module

deepchem.molnet.run_benchmark_low_data.run_benchmark_low_data(datasets, model, split='task', metric=None, featurizer=None, n_features=0, out_path='.', K=4, hyper_parameters=None, cross_valid=False, seed=123)[source]

Run a low-data benchmark test on designated datasets with a deepchem (or user-defined) model.

Parameters:
  • datasets (list of string) – choice of which datasets to use, should be among: muv, tox21, sider
  • model (string or user-defined model structure) – choice of which model to use, should be: siamese, attn, res
  • split (string, optional (default='task')) – choice of splitter function, only the task splitter is supported
  • metric (string, optional (default=None)) – choice of evaluation metrics, None = use the default metric (AUC)
  • featurizer (string or dc.feat.Featurizer, optional (default=None)) – choice of featurization, None = use the default corresponding to the model (string only applicable to deepchem models)
  • n_features (int, optional (default=0)) – number of features; set automatically when using deepchem featurizers, must be specified for user-defined featurizers (if using deepchem models)
  • out_path (string, optional (default='.')) – path to the results file
  • K (int, optional (default=4)) – number of folds in K-fold splitting of the datasets
  • hyper_parameters (dict, optional (default=None)) – hyperparameters for the designated model, None = use preset values
  • cross_valid (boolean, optional (default=False)) – whether to cross-validate
  • seed (int, optional (default=123)) – random seed
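
A minimal sketch of a low-data run: Tox21 with the siamese one-shot model and the default 4-fold task splitting:

    from deepchem.molnet.run_benchmark_low_data import run_benchmark_low_data

    run_benchmark_low_data(
        datasets=['tox21'],
        model='siamese',   # or 'attn', 'res'
        split='task',      # only the task splitter is supported
        K=4,
    )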

deepchem.molnet.run_benchmark_models module

deepchem.molnet.run_benchmark_models.benchmark_classification(train_dataset, valid_dataset, test_dataset, tasks, transformers, n_features, metric, model, test=False, hyper_parameters=None, seed=123)[source]

Calculate the performance of the chosen model on the specified classification dataset and tasks.

Parameters:
  • train_dataset (dataset struct) – dataset used for model training and evaluation
  • valid_dataset (dataset struct) – dataset used only for model evaluation (and hyperparameter tuning)
  • test_dataset (dataset struct) – dataset used only for model evaluation
  • tasks (list of string) – names of the prediction tasks
  • transformers (list of dc.trans.Transformer) – transformers used for model evaluation
  • n_features (integer) – number of features, or length of binary fingerprints
  • metric (list of dc.metrics.Metric objects) – metrics used for evaluation
  • model (string) – choice of which model to use, should be one of: rf, tf, tf_robust, logreg, irv, graphconv, dag, xgb, weave
  • test (boolean, optional (default=False)) – whether to evaluate on the test set
  • hyper_parameters (dict, optional (default=None)) – hyperparameters for the designated model, None = use preset values
  • seed (int, optional (default=123)) – random seed
Returns:

  • train_scores (dict) – evaluation results (AUC) on the training set
  • valid_scores (dict) – evaluation results (AUC) on the validation set
  • test_scores (dict) – evaluation results (AUC) on the test set
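
For example, scoring a random forest on Tox21 with ECFP fingerprints. Here the pre-split datasets come from the standard dc.molnet loader rather than from run_benchmark; the loader call and unpacking are assumptions about the surrounding pipeline:

    import numpy as np
    import deepchem as dc
    from deepchem.molnet.run_benchmark_models import benchmark_classification

    tasks, (train, valid, test), transformers = dc.molnet.load_tox21(
        featurizer='ECFP')
    metric = [dc.metrics.Metric(dc.metrics.roc_auc_score, np.mean)]
    train_scores, valid_scores, test_scores = benchmark_classification(
        train, valid, test, tasks, transformers,
        n_features=1024,   # ECFP fingerprint length
        metric=metric, model='rf')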

deepchem.molnet.run_benchmark_models.benchmark_regression(train_dataset, valid_dataset, test_dataset, tasks, transformers, n_features, metric, model, test=False, hyper_parameters=None, seed=123)[source]

Calculate the performance of the chosen model on the specified regression dataset and tasks.

Parameters:
  • train_dataset (dataset struct) – dataset used for model training and evaluation
  • valid_dataset (dataset struct) – dataset used only for model evaluation (and hyperparameter tuning)
  • test_dataset (dataset struct) – dataset used only for model evaluation
  • tasks (list of string) – names of the prediction tasks
  • transformers (list of dc.trans.Transformer) – transformers used for model evaluation
  • n_features (integer) – number of features, or length of binary fingerprints
  • metric (list of dc.metrics.Metric objects) – metrics used for evaluation
  • model (string) – choice of which model to use, should be one of: tf_regression, tf_regression_ft, graphconvreg, rf_regression, dtnn, dag_regression, xgb_regression, weave_regression, krr, ani, krr_ft, mpnn
  • test (boolean, optional (default=False)) – whether to evaluate on the test set
  • hyper_parameters (dict, optional (default=None)) – hyperparameters for the designated model, None = use preset values
  • seed (int, optional (default=123)) – random seed
Returns:

  • train_scores (dict) – evaluation results (R2) on the training set
  • valid_scores (dict) – evaluation results (R2) on the validation set
  • test_scores (dict) – evaluation results (R2) on the test set
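
The regression variant follows the same pattern; for example, a fully connected regression network on the Delaney solubility set (loader call and unpacking are assumptions, as above):

    import numpy as np
    import deepchem as dc
    from deepchem.molnet.run_benchmark_models import benchmark_regression

    tasks, (train, valid, test), transformers = dc.molnet.load_delaney(
        featurizer='ECFP')
    metric = [dc.metrics.Metric(dc.metrics.pearson_r2_score, np.mean)]
    train_scores, valid_scores, test_scores = benchmark_regression(
        train, valid, test, tasks, transformers,
        n_features=1024, metric=metric, model='tf_regression')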

deepchem.molnet.run_benchmark_models.low_data_benchmark_classification(train_dataset, valid_dataset, n_features, metric, model='siamese', hyper_parameters=None, seed=123)[source]

Calculate low-data benchmark performance.

Parameters:
  • train_dataset (dataset struct) – loaded dataset with ConvMol features, used for training
  • valid_dataset (dataset struct) – loaded dataset with ConvMol features, used for validation
  • n_features (integer) – number of features, or length of binary fingerprints
  • metric (list of dc.metrics.Metric objects) – metrics used for evaluation
  • model (string, optional (default='siamese')) – choice of which model to use, should be: siamese, attn, res
  • hyper_parameters (dict, optional (default=None)) – hyperparameters for the designated model, None = use preset values
  • seed (int, optional (default=123)) – random seed
Returns:

  • valid_scores (dict) – evaluation results (AUC) on the validation set
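
A sketch of the low-data call. The TaskSplitter step for producing task-disjoint train/valid sets and the per-atom ConvMol feature length of 75 are assumptions about the surrounding pipeline:

    import numpy as np
    import deepchem as dc
    from deepchem.molnet.run_benchmark_models import (
        low_data_benchmark_classification)

    tasks, (train, valid, test), transformers = dc.molnet.load_tox21(
        featurizer='GraphConv')
    # Re-split the training data by task so train/valid tasks are disjoint.
    splitter = dc.splits.TaskSplitter()
    train_dataset, valid_dataset, _ = splitter.train_valid_test_split(train)

    metric = [dc.metrics.Metric(dc.metrics.roc_auc_score, np.mean)]
    valid_scores = low_data_benchmark_classification(
        train_dataset, valid_dataset,
        n_features=75,   # per-atom ConvMol feature length (assumption)
        metric=metric, model='siamese')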

Module contents