deepchem.metrics package

Module contents

Evaluation metrics.

class deepchem.metrics.Metric(metric, task_averager=None, name=None, threshold=None, verbose=True, mode=None, compute_energy_metric=False)[source]

Bases: object

Wrapper class for computing user-defined metrics.

compute_metric(y_true, y_pred, w=None, n_classes=2, filter_nans=True, per_task_metrics=False)[source]

Compute a performance metric for each task.

  • y_true (np.ndarray) – An np.ndarray containing true values for each task.
  • y_pred (np.ndarray) – An np.ndarray containing predicted values for each task.
  • w (np.ndarray, optional) – An np.ndarray containing weights for each datapoint.
  • n_classes (int, optional) – Number of classes in data for classification tasks.
  • filter_nans (bool, optional) – Remove NaN values from the computed metrics.
  • per_task_metrics (bool, optional) – If True, return the computed metric for each task in a multitask dataset.

Return type:

A NumPy ndarray containing metric values for each task.
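
As a rough illustration of what computing a metric for each task involves, the sketch below loops over task columns with plain NumPy. It is not DeepChem's implementation: the helper name `per_task_metric`, the treatment of zero weights as missing labels, and the default of mean absolute error are all assumptions made for the example.

```python
import numpy as np


def per_task_metric(y_true, y_pred, w=None, metric=None):
    """Hypothetical helper: apply `metric` to each task column separately."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if w is None:
        w = np.ones_like(y_true)  # assumption: no weights means keep every row
    w = np.asarray(w, dtype=float)
    if metric is None:
        # Assumption: default to mean absolute error for the illustration.
        metric = lambda t, p: float(np.mean(np.abs(t - p)))
    scores = []
    for task in range(y_true.shape[1]):
        keep = w[:, task] != 0  # assumption: zero weight marks a missing label
        scores.append(metric(y_true[keep, task], y_pred[keep, task]))
    return np.array(scores)


# Three datapoints, two tasks; the last label of task 0 is masked out by w.
y_true = np.array([[1.0, 0.0], [2.0, 1.0], [3.0, 0.0]])
y_pred = np.array([[1.5, 0.0], [2.0, 0.0], [2.0, 0.0]])
w = np.array([[1.0, 1.0], [1.0, 1.0], [0.0, 1.0]])
task_scores = per_task_metric(y_true, y_pred, w)
```

The result has one entry per task, which matches the shape described for the return value above.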

compute_singletask_metric(y_true, y_pred, w)[source]

Compute a metric value.

  • y_true – A list of arrays containing true values for each task.
  • y_pred – A list of arrays containing predicted values for each task.

Float metric value.


NotImplementedError – If metric_str is not in METRICS.

deepchem.metrics.balanced_accuracy_score(y, y_pred)[source]

Computes balanced accuracy score.
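
For binary labels, balanced accuracy is commonly defined as the mean of sensitivity and specificity. The sketch below shows that definition in NumPy. It assumes labels and predictions are hard 0/1 values and that both classes are present; the helper name is hypothetical and this is not necessarily how `balanced_accuracy_score` is implemented.

```python
import numpy as np


def balanced_accuracy_sketch(y, y_pred):
    """Mean of per-class recall for binary 0/1 labels (illustrative only)."""
    y = np.asarray(y)
    y_pred = np.asarray(y_pred)
    pos = y == 1
    neg = y == 0
    sensitivity = np.mean(y_pred[pos] == 1)  # recall on the positive class
    specificity = np.mean(y_pred[neg] == 0)  # recall on the negative class
    return 0.5 * (sensitivity + specificity)


# Imbalanced example: four positives, two negatives.
y = np.array([1, 1, 1, 1, 0, 0])
y_pred = np.array([1, 1, 1, 0, 0, 1])
score = balanced_accuracy_sketch(y, y_pred)
```

Plain accuracy on this example would be 4/6, while the balanced score weights each class equally regardless of its size.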

deepchem.metrics.compute_roc_auc_scores(y, y_pred)[source]

Transforms the results dict into roc-auc-scores and prints scores.

  • results (dict) –
  • task_types (dict) – dict mapping task names to output type. Each output type must be either “classification” or “regression”.

deepchem.metrics.from_one_hot(y, axis=1)[source]

Transforms label vector from one-hot encoding.

  • y (np.ndarray) – A vector of shape [n_samples, num_classes].

deepchem.metrics.kappa_score(y_true, y_pred)[source]

Calculate Cohen’s kappa for classification tasks.


Note that this implementation of Cohen’s kappa expects binary labels.

  • y_true – Numpy array containing true values.
  • y_pred – Numpy array containing predicted values.

Return type:

Numpy array containing kappa for each classification task.

AssertionError – If y_true and y_pred are not the same size, or if class labels are not in [0, 1].
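
To make the binary-label restriction concrete, here is a minimal sketch of Cohen's kappa for a single task, using the standard formula kappa = (p_o - p_e) / (1 - p_e). The helper name is hypothetical, and the sketch returns one float for one task instead of an array over tasks; it is meant to illustrate the formula, not to reproduce `kappa_score`.

```python
import numpy as np


def binary_kappa_sketch(y_true, y_pred):
    """Cohen's kappa for one task with labels in {0, 1} (illustrative only)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    assert y_true.shape == y_pred.shape, "inputs must have the same size"
    assert np.isin(y_true, [0, 1]).all() and np.isin(y_pred, [0, 1]).all()
    observed = np.mean(y_true == y_pred)  # p_o: observed agreement
    p_true_pos = np.mean(y_true == 1)
    p_pred_pos = np.mean(y_pred == 1)
    # p_e: agreement expected by chance from the marginal label frequencies.
    expected = p_true_pos * p_pred_pos + (1 - p_true_pos) * (1 - p_pred_pos)
    # Note: undefined when expected == 1 (all labels and predictions identical).
    return (observed - expected) / (1 - expected)


y_true = np.array([1, 1, 0, 0])
y_pred = np.array([1, 0, 0, 0])
kappa = binary_kappa_sketch(y_true, y_pred)
```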

deepchem.metrics.mae_score(y_true, y_pred)[source]

Computes mean absolute error (MAE).

deepchem.metrics.pearson_r2_score(y, y_pred)[source]

Computes Pearson R^2 (square of Pearson correlation).
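
The square of the Pearson correlation can be sketched with `np.corrcoef`. The helper name below is hypothetical. The example also shows a property worth keeping in mind: because the correlation is squared, perfectly anti-correlated predictions still score close to 1.

```python
import numpy as np


def pearson_r2_sketch(y, y_pred):
    """Square of the Pearson correlation coefficient (illustrative only)."""
    r = np.corrcoef(np.ravel(y), np.ravel(y_pred))[0, 1]
    return r ** 2


y = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([-2.0, -4.0, -6.0, -8.0])  # perfectly anti-correlated with y
r2 = pearson_r2_sketch(y, y_pred)
```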

deepchem.metrics.prc_auc_score(y, y_pred)[source]

Computes the area under the precision-recall curve.

deepchem.metrics.rms_score(y_true, y_pred)[source]

Computes root-mean-square (RMS) error.
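
The difference between RMS error and the MAE computed by `mae_score` is easiest to see on data with a single large residual. The snippet below uses the textbook definitions in NumPy and is only a sketch; it does not call the DeepChem functions.

```python
import numpy as np

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.0, 2.0, 3.0, 8.0])  # one prediction is off by 4

residuals = y_pred - y_true
mae = np.mean(np.abs(residuals))        # mean of |residual|
rms = np.sqrt(np.mean(residuals ** 2))  # square root of mean squared residual
```

Here the RMS error is twice the MAE, since squaring gives the outlier more weight.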

deepchem.metrics.to_one_hot(y, n_classes=2)[source]

Transforms label vector into one-hot encoding.

Turns y into a vector of shape [n_samples, n_classes], which is [n_samples, 2] for the default binary labels.

  • y (np.ndarray) – A vector of shape [n_samples, 1].
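
A round trip through both encodings can be sketched in NumPy as follows. The two helpers are hypothetical stand-ins written for this example, with signatures mirroring `to_one_hot` and `from_one_hot`; they assume integer class labels in the range [0, n_classes).

```python
import numpy as np


def to_one_hot_sketch(y, n_classes=2):
    """Encode integer labels as rows of a [n_samples, n_classes] array."""
    y = np.asarray(y).astype(int).ravel()
    one_hot = np.zeros((y.shape[0], n_classes))
    one_hot[np.arange(y.shape[0]), y] = 1  # set the column of each label
    return one_hot


def from_one_hot_sketch(y, axis=1):
    """Recover integer labels as the index of the largest entry per row."""
    return np.argmax(y, axis=axis)


labels = np.array([[0], [1], [1]])  # shape [n_samples, 1]
encoded = to_one_hot_sketch(labels)
decoded = from_one_hot_sketch(encoded)
```

Decoding with `argmax` also works on predicted class probabilities, where it picks the most likely class for each sample.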