deepchem.models.sklearn_models package

Module contents

Code for processing datasets using scikit-learn.

class deepchem.models.sklearn_models.SklearnModel(model_instance=None, model_dir=None, verbose=True, **kwargs)

Bases: deepchem.models.models.Model

Wrapper that adapts a scikit-learn model, passed in as model_instance, to the DeepChem Model interface.
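
As a rough illustration of the wrapper idea, the sketch below shows a hypothetical ToyWrapper that forwards fit(dataset) and predict_on_batch(X) to a wrapped estimator, in the way SklearnModel hands work to its model_instance. ToyDataset, MeanRegressor and ToyWrapper are invented for this example and are not part of DeepChem or scikit-learn; that fit reads only the X and y attributes of the dataset is an assumption of the sketch.

```python
from collections import namedtuple

# Hypothetical stand-in for dc.data.Dataset: only the X and y attributes are modeled.
ToyDataset = namedtuple("ToyDataset", ["X", "y"])


class MeanRegressor:
    """Toy estimator with a scikit-learn-style fit/predict interface."""

    def fit(self, X, y):
        # Remember the mean target value; the features are ignored.
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        # Predict the stored mean for every row of X.
        return [self.mean_ for _ in X]


class ToyWrapper:
    """Hypothetical wrapper that delegates all work to model_instance."""

    def __init__(self, model_instance):
        self.model_instance = model_instance

    def fit(self, dataset):
        # Unpack the dataset and hand the arrays to the wrapped estimator.
        self.model_instance.fit(dataset.X, dataset.y)

    def predict_on_batch(self, X):
        return self.model_instance.predict(X)


train = ToyDataset(X=[[0.0], [1.0], [2.0]], y=[1.0, 2.0, 3.0])
model = ToyWrapper(MeanRegressor())
model.fit(train)
print(model.predict_on_batch([[5.0], [6.0]]))  # [2.0, 2.0]
```

Any object exposing fit(X, y) and predict(X), such as a real scikit-learn estimator, could take the place of MeanRegressor here.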

evaluate(dataset, metrics, transformers=[], per_task_metrics=False)

Evaluates the performance of this model on specified dataset.

Parameters:
  • dataset (dc.data.Dataset) – Dataset object.
  • metrics (list of deepchem.metrics.Metric) – Evaluation metrics.
  • transformers (list) – List of deepchem.trans.Transformer objects.
  • per_task_metrics (bool) – If True, return per-task scores.
Returns:

Maps metric names to the corresponding scores on dataset.

Return type:

dict
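
The sketch below illustrates what per-task scoring means for a multitask model: a metric is computed separately for each task column, and the overall score is taken here as the mean over tasks. The helper evaluate_sketch and the mean_absolute_error function are hypothetical plain-Python stand-ins; DeepChem's own evaluate may aggregate and name scores differently.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between two equal-length sequences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def evaluate_sketch(y_true, y_pred, metric, per_task_metrics=False):
    """Score each task column separately, then average across tasks.

    y_true and y_pred are lists of rows with one column per task.
    """
    n_tasks = len(y_true[0])
    task_scores = []
    for task in range(n_tasks):
        true_col = [row[task] for row in y_true]
        pred_col = [row[task] for row in y_pred]
        task_scores.append(metric(true_col, pred_col))
    # Overall score: mean of the per-task scores, keyed by metric name.
    scores = {metric.__name__: sum(task_scores) / n_tasks}
    if per_task_metrics:
        return scores, {metric.__name__: task_scores}
    return scores


y_true = [[1.0, 0.0], [2.0, 1.0], [3.0, 1.0]]
y_pred = [[1.5, 0.0], [2.0, 1.0], [2.5, 0.0]]
print(evaluate_sketch(y_true, y_pred, mean_absolute_error, per_task_metrics=True))
```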

fit(dataset, **kwargs)

Fits the scikit-learn model to the data in dataset.

fit_on_batch(X, y, w)

Updates the existing model with a single batch of new data (X, y, w).

get_model_filename(model_dir)

Given model directory, obtain filename for the model itself.

get_num_tasks()

Number of tasks for this model. Defaults to 1.

get_params(deep=True)

Get parameters for this estimator.

Parameters: deep (boolean, optional) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params – Parameter names mapped to their values.
Return type: mapping of string to any

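
To make the deep flag concrete, here is a plain-Python sketch of how nested parameters are commonly flattened: parameters of a contained estimator are reported under <component>__<parameter> keys. ToyEstimator and its get_params are hypothetical stand-ins that only imitate the scikit-learn convention; they are not the scikit-learn implementation.

```python
class ToyEstimator:
    """Hypothetical estimator whose constructor arguments are its parameters."""

    def __init__(self, **params):
        self._params = dict(params)

    def get_params(self, deep=True):
        out = dict(self._params)
        if deep:
            for name, value in self._params.items():
                # Recurse into parameters that are themselves estimators.
                if hasattr(value, "get_params"):
                    for sub_name, sub_value in value.get_params(deep=True).items():
                        out[f"{name}__{sub_name}"] = sub_value
        return out


inner = ToyEstimator(n_estimators=100, max_depth=None)
outer = ToyEstimator(model_instance=inner, verbose=True)
print(sorted(outer.get_params(deep=True)))
# ['model_instance', 'model_instance__max_depth', 'model_instance__n_estimators', 'verbose']
print(sorted(outer.get_params(deep=False)))
# ['model_instance', 'verbose']
```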
get_params_filename(model_dir)

Given model directory, obtain the filename under which the model's parameters are stored.
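
Both filename helpers amount to joining a fixed file name onto model_dir. A minimal sketch follows; the functions sketch_model_filename and sketch_params_filename are hypothetical, and the file names model.joblib and model_params.joblib are assumptions made for illustration that may not match what a given DeepChem version writes.

```python
import os


def sketch_model_filename(model_dir):
    # Assumed file name for the serialized estimator itself.
    return os.path.join(model_dir, "model.joblib")


def sketch_params_filename(model_dir):
    # Assumed file name for the model's saved parameters.
    return os.path.join(model_dir, "model_params.joblib")


print(sketch_model_filename("/tmp/my_model"))
print(sketch_params_filename("/tmp/my_model"))
```

Keeping both names derived from model_dir means a single directory fully describes a saved model.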

get_task_type()

Currently models can only be classifiers or regressors.

predict(X, transformers=[])

Makes predictions on a dataset.

predict_on_batch(X, pad_batch=False)

Makes predictions on a batch of data.

Parameters:
  • X (np.ndarray) – Features
  • pad_batch (bool, optional) – Ignored for SklearnModel; only used by TensorFlow models with rigid batch-size requirements.
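
The relationship between dataset-level and batch-level prediction can be sketched as follows: the data is split into batches, each batch goes through a batch-level predict function, and the outputs are concatenated in order. iterate_batches, predict_dataset and the doubling predict_on_batch are hypothetical. Because a scikit-learn estimator accepts any number of rows, the final short batch needs no padding, which is consistent with pad_batch being ignored here.

```python
def iterate_batches(X, batch_size):
    """Yield consecutive slices of X with at most batch_size rows."""
    for start in range(0, len(X), batch_size):
        yield X[start:start + batch_size]


def predict_on_batch(X):
    # Stand-in for a wrapped estimator: predicts twice the first feature.
    return [2.0 * row[0] for row in X]


def predict_dataset(X, batch_size=2):
    """Predict batch by batch and concatenate the results in order."""
    y_pred = []
    for batch in iterate_batches(X, batch_size):
        # The last batch may be smaller than batch_size; no padding is applied.
        y_pred.extend(predict_on_batch(batch))
    return y_pred


X = [[1.0], [2.0], [3.0], [4.0], [5.0]]
print(predict_dataset(X, batch_size=2))  # [2.0, 4.0, 6.0, 8.0, 10.0]
```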

predict_proba(dataset, transformers=[], batch_size=None, n_classes=2)

Makes per-class probability predictions on a dataset.

Returns: y_pred – array of shape (n_samples, n_classes*n_tasks)
Return type: numpy.ndarray
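
The documented output shape (n_samples, n_classes*n_tasks) can be read as per-task probability blocks placed side by side. The sketch below builds such rows from hypothetical per-task outputs, assuming each task contributes n_classes adjacent columns; the exact column ordering used by DeepChem is not stated here and may differ. stack_task_probabilities is an invented helper.

```python
def stack_task_probabilities(per_task_probs):
    """Concatenate per-task probability rows into one row per sample.

    per_task_probs[t][i] is the list of class probabilities for sample i on task t.
    """
    n_samples = len(per_task_probs[0])
    return [
        [p for task_probs in per_task_probs for p in task_probs[i]]
        for i in range(n_samples)
    ]


# Two tasks, three samples, two classes each (hypothetical values).
task_a = [[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]]
task_b = [[0.7, 0.3], [0.5, 0.5], [0.1, 0.9]]
y_pred = stack_task_probabilities([task_a, task_b])
print(len(y_pred), len(y_pred[0]))  # 3 4
print(y_pred[0])  # [0.9, 0.1, 0.7, 0.3]
```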

predict_proba_on_batch(X, pad_batch=False)

Makes per-class predictions on a batch of data.

Parameters:
  • X (np.ndarray) – Features
  • pad_batch (bool, optional) – Ignored for SklearnModel; only used by TensorFlow models with rigid batch-size requirements.

reload()

Loads the scikit-learn model from a joblib file on disk.

save()

Saves the scikit-learn model to disk using joblib.
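
The save/reload pair is a serialization round trip. The sketch below uses the standard-library pickle module as a stand-in for joblib (which provides the analogous joblib.dump and joblib.load functions) and a temporary directory in place of model_dir. save_sketch, reload_sketch and the file name model.pkl are hypothetical, and a plain dict stands in for a fitted estimator.

```python
import os
import pickle
import tempfile


def save_sketch(model, model_dir):
    """Serialize model to a file inside model_dir and return its path."""
    path = os.path.join(model_dir, "model.pkl")  # hypothetical file name
    with open(path, "wb") as f:
        pickle.dump(model, f)
    return path


def reload_sketch(model_dir):
    """Deserialize the object written by save_sketch()."""
    with open(os.path.join(model_dir, "model.pkl"), "rb") as f:
        return pickle.load(f)


# A plain dict stands in for a fitted estimator's learned state.
fitted = {"coef": 0.5, "intercept": 1.0}
with tempfile.TemporaryDirectory() as model_dir:
    save_sketch(fitted, model_dir)
    restored = reload_sketch(model_dir)
print(restored == fitted)  # True
```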

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns: self – this estimator instance, with updated parameters.
Return type: SklearnModel
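
The <component>__<parameter> convention can be illustrated with a small routing method: a key without "__" sets an attribute on the object itself, while a key with "__" is forwarded to the named component. ToyComponent is hypothetical and only mimics the convention; it is not the scikit-learn implementation, which also validates parameter names.

```python
class ToyComponent:
    """Hypothetical object whose parameters are plain attributes."""

    def __init__(self, **params):
        for name, value in params.items():
            setattr(self, name, value)

    def set_params(self, **params):
        for key, value in params.items():
            if "__" in key:
                # Split on the first "__" and forward the rest to the component.
                component, sub_key = key.split("__", 1)
                getattr(self, component).set_params(**{sub_key: value})
            else:
                setattr(self, key, value)
        return self  # returning self allows chained calls


inner = ToyComponent(max_depth=3)
outer = ToyComponent(model_instance=inner, verbose=True)
outer.set_params(verbose=False, model_instance__max_depth=10)
print(outer.verbose, outer.model_instance.max_depth)  # False 10
```

Returning self matches the documented return value and lets calls such as model.set_params(...).fit(...) be chained.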