Python module reference

This module reference extends the manual with a comprehensive overview of the functionality built into DataLad. Each module in the package is documented with a short summary of its purpose and a list of the classes and functions it provides.

High-level user interface

Dataset operations

api.Dataset(*args, **kwargs)

Representation of a DataLad dataset/repository

api.create([path, initopts, force, …])

Create a new dataset from scratch.

api.create_sibling(sshurl[, name, …])

Create a dataset sibling on a UNIX-like machine that is accessible via a shell, either locally or over SSH

api.create_sibling_github(reponame[, …])

Create a dataset sibling on GitHub.

api.create_sibling_gitlab([path, site, …])

Create a dataset sibling at a GitLab site.

api.drop([path, dataset, recursive, …])

Drop file content from datasets

api.get([path, source, dataset, recursive, …])

Get any dataset content (files/directories/subdatasets).

api.install([path, source, dataset, …])

Install a dataset from a (remote) source.

api.publish([path, dataset, to, since, …])

Publish a dataset to a known sibling.

api.remove([path, dataset, recursive, …])

Remove components from datasets

api.save([path, message, dataset, …])

Save the current state of a dataset

api.update([path, sibling, merge, follow, …])

Update a dataset from a sibling.

api.uninstall([path, dataset, recursive, …])

Uninstall subdatasets

api.unlock([path, dataset, recursive, …])

Unlock file(s) of a dataset

Metadata handling

api.search([query, dataset, force_reindex, …])

Search dataset metadata

api.metadata([path, dataset, …])

Metadata reporting for files and entire datasets

api.aggregate_metadata([path, dataset, …])

Aggregate metadata of one or more datasets for later query.

api.extract_metadata(types[, files, dataset])

Run one or more of DataLad’s metadata extractors on a dataset or file.

Reproducible execution

api.run([cmd, dataset, inputs, outputs, …])

Run an arbitrary shell command and record its impact on a dataset.

api.rerun([revision, since, dataset, …])

Re-execute previous datalad run commands.

api.run_procedure([spec, dataset, discover, …])

Run prepared procedures (DataLad scripts) on a dataset

Plumbing commands

api.annotate_paths([path, dataset, …])

Analyze and act upon input paths

api.clean([dataset, what, recursive, …])

Clean up after DataLad (e.g., leftover temporary files)

api.clone(source[, path, dataset, …])

Obtain a dataset (copy) from a URL or local directory

api.copy_file([path, dataset, recursive, …])

Copy files and their availability metadata from one dataset to another.

api.create_test_dataset([path, spec, seed])

Create test (meta-)dataset.

api.diff([path, fr, to, dataset, annex, …])

Report differences between two states of a dataset (hierarchy)

api.download_url(urls[, dataset, path, …])

Download content

api.ls(loc[, recursive, fast, all_, long_, …])

List summary information about URLs and dataset(s)

api.push([path, dataset, to, since, data, …])

Push a dataset to a known sibling.

api.sshrun(login, cmd[, port, ipv4, ipv6, …])

Run command on remote machines via SSH.

api.siblings([action, dataset, name, url, …])

Manage sibling configuration

api.subdatasets([path, dataset, fulfilled, …])

Report subdatasets and their properties.

Miscellaneous commands

api.add_archive_content(archive[, annex, …])

Add the content of an archive under git-annex control.

api.test([module, verbose, nocapture, pdb, stop])

Run internal DataLad (unit)tests.

Plugins

DataLad can be extended with plugins. The following plugins ship with DataLad.

add_readme(dataset[, filename, existing])

Add basic information about DataLad datasets to a README file

addurls(dataset, urlfile, urlformat, …[, …])

Create and update a dataset from a list of URLs.

check_dates(paths[, reference_date, revs, …])

Find repository dates that are more recent than a reference date.

export_archive(dataset[, filename, …])

Export the content of a dataset as a TAR/ZIP archive.

export_to_figshare(dataset[, filename, …])

Export the content of a dataset as a ZIP archive to figshare

no_annex(dataset, pattern[, ref_dir, makedirs])

Configure a dataset to never put content matching a pattern into its annex

wtf([dataset, sensitive, sections, flavor, …])

Generate a report about the DataLad installation and configuration

Support functionality

auto

Proxy basic file operations

cmd

Wrapper for command and function calls, allowing for dry runs and output handling

consts

Constants for DataLad

log

utils

version

support.gitrepo

Internal low-level interface to Git repositories

support.annexrepo

Interface to git-annex by Joey Hess.

support.archives

Various handlers and functionality for different types of files

support.configparserinc

customremotes.main

customremotes.base

Base classes for custom git-annex remotes

customremotes.archives

Custom remote to support retrieving content from archives that are themselves under annex control

Configuration management

config

Test infrastructure

tests.heavyoutput

Helper to provide heavy load on stdout and stderr

Command line interface infrastructure

cmdline.main

cmdline.helpers

cmdline.common_args