deepchem.data package

Submodules

deepchem.data.data_loader module

Process an input dataset into a format suitable for machine learning.

class deepchem.data.data_loader.CSVLoader(tasks, smiles_field=None, id_field=None, mol_field=None, featurizer=None, verbose=True, log_every_n=1000)[source]

Bases: deepchem.data.data_loader.DataLoader

Handles loading of CSV files.

featurize(input_files, data_dir=None, shard_size=8192)

Featurize provided files and write to specified location.

For large datasets, automatically shards into smaller chunks for convenience.

Parameters:
  • input_files (list) – List of input filenames.
  • data_dir (str) – (Optional) Directory to store featurized dataset.
  • shard_size (int) – (Optional) Number of examples stored in each shard.
featurize_shard(shard)[source]

Featurizes a shard of an input dataframe.

get_shards(input_files, shard_size, verbose=True)[source]

Defines a generator which returns data for each shard
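
A minimal end-to-end sketch of this loader (the file name data.csv and the column/task names are placeholders; CircularFingerprint is one featurizer that accepts SMILES input):

import deepchem as dc

# "data.csv", "smiles", and "logp" are hypothetical names used for
# illustration only.
featurizer = dc.feat.CircularFingerprint(size=1024)
loader = dc.data.CSVLoader(tasks=["logp"], smiles_field="smiles",
                           featurizer=featurizer)
dataset = loader.featurize(["data.csv"], shard_size=8192)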

class deepchem.data.data_loader.DataLoader(tasks, smiles_field=None, id_field=None, mol_field=None, featurizer=None, verbose=True, log_every_n=1000)[source]

Bases: object

Handles loading/featurizing of chemical samples (datapoints).

Currently knows how to load CSV files, pandas DataFrames, and SDF files. Writes a dataframe object to disk as output.

featurize(input_files, data_dir=None, shard_size=8192)[source]

Featurize provided files and write to specified location.

For large datasets, automatically shards into smaller chunks for convenience.

Parameters:
  • input_files (list) – List of input filenames.
  • data_dir (str) – (Optional) Directory to store featurized dataset.
  • shard_size (int) – (Optional) Number of examples stored in each shard.
featurize_shard(shard)[source]

Featurizes a shard of an input dataframe.

get_shards(input_files, shard_size)[source]

Stub for child classes.

class deepchem.data.data_loader.FASTALoader(verbose=True)[source]

Bases: deepchem.data.data_loader.DataLoader

Handles loading of FASTA files.

featurize(input_files, data_dir=None)[source]

Featurizes fasta files.

Parameters:
  • input_files (list) – List of fasta files.
  • data_dir (str) – (Optional) Name of directory where featurized data is stored.
featurize_shard(shard)

Featurizes a shard of an input dataframe.

get_shards(input_files, shard_size)

Stub for child classes.
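
A minimal usage sketch (the input file name is a placeholder):

import deepchem as dc

# "sequences.fasta" is a hypothetical input file.
loader = dc.data.FASTALoader()
dataset = loader.featurize(["sequences.fasta"])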

class deepchem.data.data_loader.SDFLoader(tasks, clean_mols=False, **kwargs)[source]

Bases: deepchem.data.data_loader.DataLoader

Handles loading of SDF files.

featurize(input_files, data_dir=None, shard_size=8192)

Featurize provided files and write to specified location.

For large datasets, automatically shards into smaller chunks for convenience.

Parameters:
  • input_files (list) – List of input filenames.
  • data_dir (str) – (Optional) Directory to store featurized dataset.
  • shard_size (int) – (Optional) Number of examples stored in each shard.
featurize_shard(shard)[source]

Featurizes a shard of an input dataframe.

get_shards(input_files, shard_size)[source]

Defines a generator which returns data for each shard
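
A sketch of loading an SDF file with a 3-D featurizer (the file name, task name, and featurizer choice are illustrative; the featurizer is assumed to be forwarded to DataLoader via **kwargs):

import deepchem as dc

# "molecules.sdf" and "atomization_energy" are hypothetical names.
featurizer = dc.feat.CoulombMatrixEig(max_atoms=26)
loader = dc.data.SDFLoader(tasks=["atomization_energy"],
                           clean_mols=True, featurizer=featurizer)
dataset = loader.featurize(["molecules.sdf"])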

class deepchem.data.data_loader.UserCSVLoader(tasks, smiles_field=None, id_field=None, mol_field=None, featurizer=None, verbose=True, log_every_n=1000)[source]

Bases: deepchem.data.data_loader.DataLoader

Handles loading of CSV files with user-defined featurizers.

featurize(input_files, data_dir=None, shard_size=8192)

Featurize provided files and write to specified location.

For large datasets, automatically shards into smaller chunks for convenience.

Parameters:
  • input_files (list) – List of input filenames.
  • data_dir (str) – (Optional) Directory to store featurized dataset.
  • shard_size (int) – (Optional) Number of examples stored in each shard.
featurize_shard(shard)[source]

Featurizes a shard of an input dataframe.

get_shards(input_files, shard_size)[source]

Defines a generator which returns data for each shard
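
A sketch for a CSV whose feature columns were computed ahead of time (all file and column names here are placeholders; UserDefinedFeaturizer reads the named columns directly from the dataframe):

import deepchem as dc

# "features.csv", "compound_id", "activity", and the feat_* columns are
# hypothetical.
featurizer = dc.feat.UserDefinedFeaturizer(["feat_%d" % i for i in range(8)])
loader = dc.data.UserCSVLoader(tasks=["activity"], id_field="compound_id",
                               featurizer=featurizer)
dataset = loader.featurize(["features.csv"])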

deepchem.data.data_loader.convert_df_to_numpy(df, tasks, verbose=False)[source]

Transforms a dataframe containing deepchem input into numpy arrays

deepchem.data.data_loader.featurize_mol_df(df, featurizer, field, verbose=True, log_every_N=1000)[source]

Featurize individual compounds in dataframe.

Featurizes .sdf files. Since the 3-D structure should be preserved, we use the RDKit “mol” object created from the .sdf file rather than a SMILES string. Some featurizers, such as CoulombMatrix, also require a 3-D structure. Featurizing from .sdf is currently the only way to perform Coulomb matrix featurization.

deepchem.data.data_loader.featurize_smiles_df(df, featurizer, field, log_every_N=1000, verbose=True)[source]

Featurize individual compounds in dataframe.

Given a featurizer that operates on individual chemical compounds or macromolecules, compute & add features for that compound to the features dataframe

deepchem.data.data_loader.featurize_smiles_np(arr, featurizer, log_every_N=1000, verbose=True)[source]

Featurize individual compounds in a numpy array.

Given a featurizer that operates on individual chemical compounds or macromolecules, compute & add features for that compound to the features array

deepchem.data.data_loader.get_user_specified_features(df, featurizer, verbose=True)[source]

Extract and merge user specified features.

Merge features included in dataset provided by user into final features dataframe

Three types of featurization are handled here:

  1. Molecule featurization
    • SMILES string featurization
    • RDKit Mol featurization
  2. Complex featurization
    • PDB files for interacting molecules
  3. User-specified featurizations

deepchem.data.datasets module

Contains wrapper class for datasets.

class deepchem.data.datasets.Databag(datasets=None)[source]

Bases: object

A utility class to iterate through multiple datasets together.

add_dataset(key, dataset)[source]
iterbatches(**kwargs)[source]

Loop through all internal datasets in the same order.

Parameters:
  • batch_size (int) – Number of samples from each dataset to return.
  • epoch (int) – Number of times to loop through the datasets.
  • pad_batches (bool) – Whether all batches should contain exactly batch_size samples.

Returns: Generator which yields a dictionary {key: dataset.X[batch]}
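A sketch of lockstep iteration over two in-memory datasets (assuming, as add_dataset suggests, that the constructor accepts a dict mapping keys to datasets):

import numpy as np
import deepchem as dc
from deepchem.data.datasets import Databag

train = dc.data.NumpyDataset(np.random.rand(10, 4))
extra = dc.data.NumpyDataset(np.random.rand(10, 6))
databag = Databag({"train": train, "extra": extra})
for batch in databag.iterbatches(batch_size=5, pad_batches=True):
    x_train, x_extra = batch["train"], batch["extra"]
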
class deepchem.data.datasets.Dataset[source]

Bases: object

Abstract base class for datasets defined by X, y, w elements.

X

Get the X vector for this dataset as a single numpy array.

__len__()[source]

Get the number of elements in the dataset.

get_shape()[source]

Get the shape of the dataset.

Returns four tuples, giving the shape of the X, y, w, and ids arrays.

get_statistics(X_stats=True, y_stats=True)[source]

Compute and return statistics of this dataset.

get_task_names()[source]

Get the names of the tasks associated with this dataset.

ids

Get the ids vector for this dataset as a single numpy array.

iterbatches(batch_size=None, epoch=0, deterministic=False, pad_batches=False)[source]
itersamples()[source]

Get an object that iterates over the samples in the dataset.

Example:

>>> dataset = NumpyDataset(np.ones((2,2)))
>>> for x, y, w, id in dataset.itersamples():
...   print(x, y, w, id)
[1. 1.] [0.] [0.] 0
[1. 1.] [0.] [0.] 1
make_iterator(batch_size=100, epochs=1, deterministic=False, pad_batches=False)[source]

Create a tf.data.Iterator that iterates over the data in this Dataset.

The iterator’s get_next() method returns a tuple of three tensors (X, y, w) which can be used to retrieve the features, labels, and weights respectively.

Parameters:
  • batch_size (int) – the number of samples to include in each batch
  • epochs (int) – the number of times to iterate over the Dataset
  • deterministic (bool) – if True, the data is produced in order. If False, a different random permutation of the data is used for each epoch.
  • pad_batches (bool) – if True, batches are padded as necessary to make the size of each batch exactly equal batch_size.
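
A sketch of consuming the iterator (assumes TensorFlow 1.x graph mode and an existing Dataset instance named dataset):

import tensorflow as tf

iterator = dataset.make_iterator(batch_size=100, epochs=1)
next_X, next_y, next_w = iterator.get_next()
with tf.Session() as sess:
    X_batch, y_batch, w_batch = sess.run([next_X, next_y, next_w])
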
transform(fn, **args)[source]

Construct a new dataset by applying a transformation to every sample in this dataset.

The argument is a function that can be called as follows:

>> newx, newy, neww = fn(x, y, w)

It might be called only once with the whole dataset, or multiple times with different subsets of the data. Each time it is called, it should transform the samples and return the transformed data.

Parameters:fn (function) – A function to apply to each sample in the dataset
Returns:
Return type:a newly constructed Dataset object
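
For example, a sketch of a transformation that scales features while passing labels and weights through unchanged (dataset is assumed to be an existing Dataset; a multiplicative transform behaves the same whether fn sees the whole dataset at once or subsets of it):

def scale(x, y, w):
    # Double every feature value; y and w pass through unchanged.
    return 2.0 * x, y, w

scaled_dataset = dataset.transform(scale)
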
w

Get the weight vector for this dataset as a single numpy array.

y

Get the y vector for this dataset as a single numpy array.

class deepchem.data.datasets.DiskDataset(data_dir, verbose=True)[source]

Bases: deepchem.data.datasets.Dataset

A Dataset that is stored as a set of files on disk.

X

Get the X vector for this dataset as a single numpy array.

__len__()[source]

Finds number of elements in dataset.

add_shard(X, y, w, ids)[source]

Adds a data shard.

complete_shuffle(data_dir=None)[source]

Completely shuffle across all data, across all shards.

Note: this loads all the data into RAM and can be prohibitively expensive for larger datasets.

Parameters: shard_size (int) – Shard size of the resulting dataset. If None, the first shard’s shard_size will be used.
Returns: A DiskDataset with a single shard.
Return type: DiskDataset
static create_dataset(shard_generator, data_dir=None, tasks=[], verbose=True)[source]

Creates a new DiskDataset

Parameters:
  • shard_generator (Iterable) – An iterable (either a list or generator) that provides tuples of data (X, y, w, ids). Each tuple will be written to a separate shard on disk.
  • data_dir (str) – Filename for data directory. Creates a temp directory if none specified.
  • tasks (list) – List of tasks for this dataset.
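
A sketch of building a DiskDataset from a generator of (X, y, w, ids) tuples (the shapes and the task name are illustrative):

import numpy as np
import deepchem as dc

def shard_generator():
    # Two shards of 100 random samples each; each yielded tuple becomes
    # one shard on disk.
    for shard_num in range(2):
        X = np.random.rand(100, 16)
        y = np.random.rand(100, 1)
        w = np.ones((100, 1))
        ids = np.arange(100 * shard_num, 100 * (shard_num + 1))
        yield X, y, w, ids

dataset = dc.data.DiskDataset.create_dataset(shard_generator(), tasks=["task0"])
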
static from_numpy(X, y, w=None, ids=None, tasks=None, data_dir=None, verbose=True)[source]

Creates a DiskDataset object from specified Numpy arrays.

get_data_shape()[source]

Gets array shape of datapoints in this dataset.

get_label_means()[source]

Return pandas series of label means.

get_label_stds()[source]

Return pandas series of label stds.

get_number_shards()[source]

Returns the number of shards for this dataset.

get_shape()[source]

Finds shape of dataset.

get_shard(i)[source]

Retrieves data for the i-th shard from disk.

get_shard_size()[source]

Gets size of shards on disk.

get_statistics(X_stats=True, y_stats=True)

Compute and return statistics of this dataset.

get_task_names()[source]

Gets learning tasks associated with this dataset.

ids

Get the ids vector for this dataset as a single numpy array.

iterbatches(batch_size=None, epoch=0, deterministic=False, pad_batches=False)[source]

Get an object that iterates over minibatches from the dataset. It is guaranteed that the number of batches returned is math.ceil(len(dataset)/batch_size).

Each minibatch is returned as a tuple of four numpy arrays: (X, y, w, ids).

Parameters:
  • batch_size (int) – Number of elements in a batch. If None, yields batches with size equal to the size of each individual shard.
  • epoch (int) – Not used.
  • deterministic (bool) – Whether or not each shard should be shuffled before generating the batches. Note that shuffling is only local: samples are never mixed between different shards.
  • pad_batches (bool) – Whether or not the last batch should be padded, globally, so that it has exactly batch_size elements.
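
A typical training-style loop (dataset is assumed to be an existing DiskDataset):

for X_b, y_b, w_b, ids_b in dataset.iterbatches(batch_size=50, pad_batches=True):
    # With pad_batches=True every batch, including the last one,
    # has exactly 50 samples.
    pass
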
itersamples()[source]

Get an object that iterates over the samples in the dataset.

Example:

>>> dataset = DiskDataset.from_numpy(np.ones((2,2)), np.ones((2,1)), verbose=False)
>>> for x, y, w, id in dataset.itersamples():
...   print(x, y, w, id)
[1. 1.] [1.] [1.] 0
[1. 1.] [1.] [1.] 1
itershards()[source]

Return an object that iterates over all shards in dataset.

Datasets are stored in sharded fashion on disk. Each call to next() for the generator defined by this function returns the data from a particular shard. The order of shards returned is guaranteed to remain fixed.

load_metadata()[source]
make_iterator(batch_size=100, epochs=1, deterministic=False, pad_batches=False)

Create a tf.data.Iterator that iterates over the data in this Dataset.

The iterator’s get_next() method returns a tuple of three tensors (X, y, w) which can be used to retrieve the features, labels, and weights respectively.

Parameters:
  • batch_size (int) – the number of samples to include in each batch
  • epochs (int) – the number of times to iterate over the Dataset
  • deterministic (bool) – if True, the data is produced in order. If False, a different random permutation of the data is used for each epoch.
  • pad_batches (bool) – if True, batches are padded as necessary to make the size of each batch exactly equal batch_size.
static merge(datasets, merge_dir=None)[source]

Merges provided datasets into a merged dataset.

move(new_data_dir)[source]

Moves dataset to new directory.

reshard(shard_size)[source]

Reshards data to have specified shard size.

save_to_disk()[source]

Save dataset to disk.

select(indices, select_dir=None)[source]

Creates a new dataset from a selection of indices from self.

Parameters:
  • select_dir (string) – Path to new directory that the selected indices will be copied to.
  • indices (list) – List of indices to select.
set_shard(shard_num, X, y, w, ids)[source]

Writes data shard to disk

shuffle_each_shard()[source]

Shuffles elements within each shard of the dataset.

shuffle_shards()[source]

Shuffles the order of the shards for this dataset.

sparse_shuffle()[source]

Shuffling that exploits data sparsity to shuffle large datasets.

Only for 1-dimensional feature vectors (does not work for tensorial featurizations).

subset(shard_nums, subset_dir=None)[source]

Creates a subset of the original dataset on disk.

transform(fn, **args)[source]

Construct a new dataset by applying a transformation to every sample in this dataset.

The argument is a function that can be called as follows:

>> newx, newy, neww = fn(x, y, w)

It might be called only once with the whole dataset, or multiple times with different subsets of the data. Each time it is called, it should transform the samples and return the transformed data.

Parameters:
  • fn (function) – A function to apply to each sample in the dataset
  • out_dir (string) – The directory to save the new dataset in. If this is omitted, a temporary directory is created automatically
Returns:
Return type: a newly constructed Dataset object

w

Get the weight vector for this dataset as a single numpy array.

static write_data_to_disk(data_dir, basename, tasks, X=None, y=None, w=None, ids=None)[source]
y

Get the y vector for this dataset as a single numpy array.

class deepchem.data.datasets.NumpyDataset(X, y=None, w=None, ids=None, n_tasks=1)[source]

Bases: deepchem.data.datasets.Dataset

A Dataset defined by in-memory numpy arrays.
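
A minimal construction sketch; the omitted w and ids arguments are filled in with defaults:

import numpy as np
import deepchem as dc

X = np.random.rand(4, 5)   # 4 samples, 5 features
y = np.random.rand(4, 1)   # 1 task
dataset = dc.data.NumpyDataset(X, y)
print(dataset.get_shape())  # shapes of the X, y, w, and ids arrays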

X

Get the X vector for this dataset as a single numpy array.

__len__()[source]

Get the number of elements in the dataset.

static from_DiskDataset(ds)[source]
Parameters: ds (DiskDataset) – Dataset to convert.
Returns: Data of ds as a NumpyDataset.
Return type: NumpyDataset

static from_json(fname)[source]
get_shape()[source]

Get the shape of the dataset.

Returns four tuples, giving the shape of the X, y, w, and ids arrays.

get_statistics(X_stats=True, y_stats=True)

Compute and return statistics of this dataset.

get_task_names()[source]

Get the names of the tasks associated with this dataset.

ids

Get the ids vector for this dataset as a single numpy array.

iterbatches(batch_size=None, epoch=0, deterministic=False, pad_batches=False)[source]

Get an object that iterates over minibatches from the dataset.

Each minibatch is returned as a tuple of four numpy arrays: (X, y, w, ids).

itersamples()[source]

Get an object that iterates over the samples in the dataset.

Example:

>>> dataset = NumpyDataset(np.ones((2,2)))
>>> for x, y, w, id in dataset.itersamples():
...   print(x, y, w, id)
[1. 1.] [0.] [0.] 0
[1. 1.] [0.] [0.] 1
make_iterator(batch_size=100, epochs=1, deterministic=False, pad_batches=False)

Create a tf.data.Iterator that iterates over the data in this Dataset.

The iterator’s get_next() method returns a tuple of three tensors (X, y, w) which can be used to retrieve the features, labels, and weights respectively.

Parameters:
  • batch_size (int) – the number of samples to include in each batch
  • epochs (int) – the number of times to iterate over the Dataset
  • deterministic (bool) – if True, the data is produced in order. If False, a different random permutation of the data is used for each epoch.
  • pad_batches (bool) – if True, batches are padded as necessary to make the size of each batch exactly equal batch_size.
static merge(datasets)[source]
Parameters: datasets (list of deepchem.data.NumpyDataset) – List of datasets to merge.
Returns:
Return type: Single deepchem.data.NumpyDataset with data concatenated over axis 0.
select(indices, select_dir=None)[source]

Creates a new dataset from a selection of indices from self.

TODO(rbharath): select_dir is here due to dc.splits always passing in splits.

Parameters:
  • indices (list) – List of indices to select.
  • select_dir (string) – Ignored.
static to_json(fname)[source]
transform(fn, **args)[source]

Construct a new dataset by applying a transformation to every sample in this dataset.

The argument is a function that can be called as follows:

>> newx, newy, neww = fn(x, y, w)

It might be called only once with the whole dataset, or multiple times with different subsets of the data. Each time it is called, it should transform the samples and return the transformed data.

Parameters:fn (function) – A function to apply to each sample in the dataset
Returns:
Return type:a newly constructed Dataset object
w

Get the weight vector for this dataset as a single numpy array.

y

Get the y vector for this dataset as a single numpy array.

deepchem.data.datasets.densify_features(X_sparse, num_features)[source]

Expands sparse feature representation to dense feature array.

deepchem.data.datasets.pad_batch(batch_size, X_b, y_b, w_b, ids_b)[source]

Pads batch to have size precisely batch_size elements.

Fills in the batch by wrapping around samples until the whole batch is filled.

deepchem.data.datasets.pad_features(batch_size, X_b)[source]

Pads a batch of features to have precisely batch_size elements.

Version of pad_batch for use at prediction time.

deepchem.data.datasets.sparsify_features(X)[source]

Extracts a sparse feature representation from dense feature array.
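
A round-trip sketch of the two helpers (the sparse form is believed to store per-sample (indices, values) pairs):

import numpy as np
from deepchem.data.datasets import densify_features, sparsify_features

X = np.zeros((3, 8))
X[0, 2], X[1, 5] = 1.0, 3.0
X_sparse = sparsify_features(X)
X_dense = densify_features(X_sparse, num_features=8)
assert np.array_equal(X, X_dense)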

deepchem.data.supports module

Sample supports from datasets.

class deepchem.data.supports.EpisodeGenerator(dataset, n_pos, n_neg, n_test, n_episodes_per_task)[source]

Bases: object

Generates (support, test) pairs for episodic training.

Precomputes all (support, test) pairs at construction to reduce computational overhead during iteration.

__next__()

Sample next (support, test) pair.

Return from internal storage.

next()[source]

Sample next (support, test) pair.

Return from internal storage.

class deepchem.data.supports.SupportGenerator(dataset, n_pos, n_neg, n_trials)[source]

Bases: object

Generate support sets from a dataset.

Iterates over tasks and trials. For each trial, picks one support from each task and returns them in randomized order.

__next__()

Sample next support.

Supports are sampled from the tasks in a random order. Each support is drawn entirely from within one task.

next()[source]

Sample next support.

Supports are sampled from the tasks in a random order. Each support is drawn entirely from within one task.
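
A sketch of drawing supports (dataset is assumed to be a multitask dc.data.Dataset with binary labels; iteration is believed to yield (task, support) pairs):

from deepchem.data.supports import SupportGenerator

support_gen = SupportGenerator(dataset, n_pos=5, n_neg=5, n_trials=2)
for task, support in support_gen:
    # Each support is drawn entirely from one task and contains n_pos
    # positive and n_neg negative samples.
    print(task, len(support))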

deepchem.data.supports.dataset_difference(dataset, remove)[source]

Removes the compounds in remove from dataset.

Parameters:
  • dataset (dc.data.Dataset) – Source dataset.
  • remove (dc.data.Dataset) – Dataset whose overlap will be removed.
deepchem.data.supports.get_single_task_support(dataset, n_pos, n_neg, task, replace=True)[source]

Generates one support set purely for specified task.

Parameters:
  • dataset (dc.data.Dataset) – Dataset from which supports are sampled.
  • n_pos (int) – Number of positive samples in support.
  • n_neg (int) – Number of negative samples in support.
  • task (int) – Index of current task.
  • replace (bool, optional) – Whether or not to use replacement when sampling supports.
Returns:

List of NumpyDatasets, each of which is a support set.

Return type:

list

deepchem.data.supports.get_single_task_test(dataset, batch_size, task, replace=True)[source]

Gets test set from specified task.

Samples random subset of size batch_size from specified task of dataset. Ensures that sampled points have measurements for this task.

deepchem.data.supports.get_task_dataset(dataset, task)[source]

Selects out entries for a particular task.

deepchem.data.supports.get_task_dataset_minus_support(dataset, support, task)[source]

Gets data for specified task, minus support points.

Useful for evaluating model performance once trained (this ensures that test compounds are distinct from the support).

Parameters:
  • dataset (dc.data.Dataset) – Source dataset.
  • support (dc.data.Dataset) – The support dataset
  • task (int) – Task number of task to select.
deepchem.data.supports.get_task_support(dataset, n_episodes, n_pos, n_neg, task, log_every_n=50)[source]

Generates one support set purely for specified task.

Parameters:
  • dataset (dc.data.Dataset) – Dataset from which supports are sampled.
  • n_episodes (int) – Number of episodes for which supports have to be sampled from this task.
  • n_pos (int) – Number of positive samples in support.
  • n_neg (int) – Number of negative samples in support.
  • task (int) – Index of current task.
  • log_every_n (int, optional) – Prints every log_every_n supports sampled.
Returns:

List of NumpyDatasets, each of which is a support set.

Return type:

list

deepchem.data.supports.get_task_test(dataset, n_episodes, n_test, task, log_every_n=50)[source]

Gets test set from specified task.

Parameters:
  • dataset (dc.data.Dataset) – Dataset from which to sample.
  • n_episodes (int) – Number of episodes to sample test sets for.
  • n_test (int) – Number of compounds per test set.
  • task (int) – Index of the task to sample from.
  • log_every_n (int, optional) – Prints every log_every_n supports sampled.
deepchem.data.supports.remove_dead_examples(dataset)[source]

Removes compounds with no weight.

Parameters:dataset (dc.data.Dataset) – Source dataset.

deepchem.data.test_data_loader module

class deepchem.data.test_data_loader.TestCSVLoader(methodName='runTest')[source]

Bases: unittest.case.TestCase

addCleanup(function, *args, **kwargs)

Add a function, with arguments, to be called when the test is completed. Functions added are called on a LIFO basis and are called after tearDown on test failure or success.

Cleanup items are called even if setUp fails (unlike tearDown).

addTypeEqualityFunc(typeobj, function)

Add a type specific assertEqual style function to compare a type.

This method is for use by TestCase subclasses that need to register their own type equality functions to provide nicer error messages.

Parameters:
  • typeobj – The data type to call this function on when both values are of the same type in assertEqual().
  • function – The callable taking two arguments and an optional msg= argument that raises self.failureException with a useful error message when the two arguments are not equal.
assertAlmostEqual(first, second, places=None, msg=None, delta=None)

Fail if the two objects are unequal as determined by their difference rounded to the given number of decimal places (default 7) and comparing to zero, or by comparing that the difference between the two objects is more than the given delta.

Note that decimal places (from zero) are usually not the same as significant digits (measured from the most significant digit).

If the two objects compare equal then they will automatically compare almost equal.

assertAlmostEquals(*args, **kwargs)
assertCountEqual(first, second, msg=None)

An unordered sequence comparison asserting that the same elements appear in both sequences, regardless of order. If the same element occurs more than once, it verifies that the elements occur the same number of times. Equivalent to:

self.assertEqual(Counter(list(first)), Counter(list(second)))
Example:
  • [0, 1, 1] and [1, 0, 1] compare equal.
  • [0, 0, 1] and [0, 1] compare unequal.
assertDictContainsSubset(subset, dictionary, msg=None)

Checks whether dictionary is a superset of subset.

assertDictEqual(d1, d2, msg=None)
assertEqual(first, second, msg=None)

Fail if the two objects are unequal as determined by the ‘==’ operator.

assertEquals(*args, **kwargs)
assertFalse(expr, msg=None)

Check that the expression is false.

assertGreater(a, b, msg=None)

Just like self.assertTrue(a > b), but with a nicer default message.

assertGreaterEqual(a, b, msg=None)

Just like self.assertTrue(a >= b), but with a nicer default message.

assertIn(member, container, msg=None)

Just like self.assertTrue(a in b), but with a nicer default message.

assertIs(expr1, expr2, msg=None)

Just like self.assertTrue(a is b), but with a nicer default message.

assertIsInstance(obj, cls, msg=None)

Same as self.assertTrue(isinstance(obj, cls)), with a nicer default message.

assertIsNone(obj, msg=None)

Same as self.assertTrue(obj is None), with a nicer default message.

assertIsNot(expr1, expr2, msg=None)

Just like self.assertTrue(a is not b), but with a nicer default message.

assertIsNotNone(obj, msg=None)

Included for symmetry with assertIsNone.

assertLess(a, b, msg=None)

Just like self.assertTrue(a < b), but with a nicer default message.

assertLessEqual(a, b, msg=None)

Just like self.assertTrue(a <= b), but with a nicer default message.

assertListEqual(list1, list2, msg=None)

A list-specific equality assertion.

Parameters:
  • list1 – The first list to compare.
  • list2 – The second list to compare.
  • msg – Optional message to use on failure instead of a list of differences.
assertLogs(logger=None, level=None)

Fail unless a log message of the given level or higher is emitted on the given logger or its children. If omitted, level defaults to INFO and logger defaults to the root logger.

This method must be used as a context manager, and will yield a recording object with two attributes: output and records. At the end of the context manager, the output attribute will be a list of the matching formatted log messages and the records attribute will be a list of the corresponding LogRecord objects.

Example:

with self.assertLogs('foo', level='INFO') as cm:
    logging.getLogger('foo').info('first message')
    logging.getLogger('foo.bar').error('second message')
self.assertEqual(cm.output, ['INFO:foo:first message',
                             'ERROR:foo.bar:second message'])
assertMultiLineEqual(first, second, msg=None)

Assert that two multi-line strings are equal.

assertNotAlmostEqual(first, second, places=None, msg=None, delta=None)

Fail if the two objects are equal as determined by their difference rounded to the given number of decimal places (default 7) and comparing to zero, or by comparing that the difference between the two objects is less than the given delta.

Note that decimal places (from zero) are usually not the same as significant digits (measured from the most significant digit).

Objects that are equal automatically fail.

assertNotAlmostEquals(*args, **kwargs)
assertNotEqual(first, second, msg=None)

Fail if the two objects are equal as determined by the ‘!=’ operator.

assertNotEquals(*args, **kwargs)
assertNotIn(member, container, msg=None)

Just like self.assertTrue(a not in b), but with a nicer default message.

assertNotIsInstance(obj, cls, msg=None)

Included for symmetry with assertIsInstance.

assertNotRegex(text, unexpected_regex, msg=None)

Fail the test if the text matches the regular expression.

assertNotRegexpMatches(*args, **kwargs)
assertRaises(expected_exception, *args, **kwargs)

Fail unless an exception of class expected_exception is raised by the callable when invoked with specified positional and keyword arguments. If a different type of exception is raised, it will not be caught, and the test case will be deemed to have suffered an error, exactly as for an unexpected exception.

If called with the callable and arguments omitted, will return a context object used like this:

with self.assertRaises(SomeException):
    do_something()

An optional keyword argument ‘msg’ can be provided when assertRaises is used as a context object.

The context manager keeps a reference to the exception as the ‘exception’ attribute. This allows you to inspect the exception after the assertion:

with self.assertRaises(SomeException) as cm:
    do_something()
the_exception = cm.exception
self.assertEqual(the_exception.error_code, 3)
assertRaisesRegex(expected_exception, expected_regex, *args, **kwargs)

Asserts that the message in a raised exception matches a regex.

Parameters:
  • expected_exception – Exception class expected to be raised.
  • expected_regex – Regex (re pattern object or string) expected to be found in error message.
  • args – Function to be called and extra positional args.
  • kwargs – Extra kwargs.
  • msg – Optional message used in case of failure. Can only be used when assertRaisesRegex is used as a context manager.
assertRaisesRegexp(*args, **kwargs)
assertRegex(text, expected_regex, msg=None)

Fail the test unless the text matches the regular expression.

assertRegexpMatches(*args, **kwargs)
assertSequenceEqual(seq1, seq2, msg=None, seq_type=None)

An equality assertion for ordered sequences (like lists and tuples).

For the purposes of this function, a valid ordered sequence type is one which can be indexed, has a length, and has an equality operator.

Parameters:
  • seq1 – The first sequence to compare.
  • seq2 – The second sequence to compare.
  • seq_type – The expected datatype of the sequences, or None if no datatype should be enforced.
  • msg – Optional message to use on failure instead of a list of differences.
assertSetEqual(set1, set2, msg=None)

A set-specific equality assertion.

Parameters:
  • set1 – The first set to compare.
  • set2 – The second set to compare.
  • msg – Optional message to use on failure instead of a list of differences.

assertSetEqual uses ducktyping to support different types of sets, and is optimized for sets specifically (parameters must support a difference method).

assertTrue(expr, msg=None)

Check that the expression is true.

assertTupleEqual(tuple1, tuple2, msg=None)

A tuple-specific equality assertion.

Parameters:
  • tuple1 – The first tuple to compare.
  • tuple2 – The second tuple to compare.
  • msg – Optional message to use on failure instead of a list of differences.
assertWarns(expected_warning, *args, **kwargs)

Fail unless a warning of class warnClass is triggered by the callable when invoked with specified positional and keyword arguments. If a different type of warning is triggered, it will not be handled: depending on the other warning filtering rules in effect, it might be silenced, printed out, or raised as an exception.

If called with the callable and arguments omitted, will return a context object used like this:

with self.assertWarns(SomeWarning):
    do_something()

An optional keyword argument ‘msg’ can be provided when assertWarns is used as a context object.

The context manager keeps a reference to the first matching warning as the ‘warning’ attribute; similarly, the ‘filename’ and ‘lineno’ attributes give you information about the line of Python code from which the warning was triggered. This allows you to inspect the warning after the assertion:

with self.assertWarns(SomeWarning) as cm:
    do_something()
the_warning = cm.warning
self.assertEqual(the_warning.some_attribute, 147)
assertWarnsRegex(expected_warning, expected_regex, *args, **kwargs)

Asserts that the message in a triggered warning matches a regexp. Basic functioning is similar to assertWarns() with the addition that only warnings whose messages also match the regular expression are considered successful matches.

Parameters:
  • expected_warning – Warning class expected to be triggered.
  • expected_regex – Regex (re pattern object or string) expected to be found in error message.
  • args – Function to be called and extra positional args.
  • kwargs – Extra kwargs.
  • msg – Optional message used in case of failure. Can only be used when assertWarnsRegex is used as a context manager.
assert_(*args, **kwargs)
countTestCases()
debug()

Run the test without collecting errors in a TestResult

defaultTestResult()
doCleanups()

Execute all cleanup functions. Normally called for you after tearDown.

fail(msg=None)

Fail immediately, with the given message.

failIf(*args, **kwargs)
failIfAlmostEqual(*args, **kwargs)
failIfEqual(*args, **kwargs)
failUnless(*args, **kwargs)
failUnlessAlmostEqual(*args, **kwargs)
failUnlessEqual(*args, **kwargs)
failUnlessRaises(*args, **kwargs)
failureException

alias of AssertionError

id()
longMessage = True
maxDiff = 640
run(result=None)
setUp()

Hook method for setting up the test fixture before exercising it.

setUpClass()

Hook method for setting up class fixture before running tests in the class.

shortDescription()

Returns a one-line description of the test, or None if no description has been provided.

The default implementation of this method returns the first line of the specified test method’s docstring.

skipTest(reason)

Skip this test.

subTest(msg=<object object>, **params)

Return a context manager that will return the enclosed block of code in a subtest identified by the optional message and keyword parameters. A failure in the subtest marks the test case as failed but resumes execution at the end of the enclosed block, allowing further test code to be executed.

tearDown()

Hook method for deconstructing the test fixture after testing it.

tearDownClass()

Hook method for deconstructing the class fixture after running all tests in the class.

test_load_singleton_csv()[source]

Module contents

Gathers all datasets in one place for convenient imports