Package mdp

**The Modular toolkit for Data Processing (MDP)** package is a library
of widely used data processing algorithms that can be combined into
pipelines for building more complex data processing software.

MDP has been designed to be used as-is and as a framework for
scientific data processing development.

From the user's perspective, MDP consists of a collection of *units*
that process data. These include, for example, algorithms for
supervised and unsupervised learning, principal and independent
component analysis, and classification.

These units can be chained into data processing flows, to create
pipelines as well as more complex feed-forward network
architectures. Given a set of input data, MDP takes care of training
and executing all nodes in the network in the correct order and
passing intermediate data between the nodes. This allows the user to
specify complex algorithms as a series of simpler data processing
steps.
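The "train and execute in the correct order" behaviour can be sketched in plain Python and NumPy. This is not MDP's implementation: `CenterNode`, `ScaleNode` and `ToyFlow` are hypothetical stand-ins (in MDP the corresponding class is `Flow`). The point is that node *i* is trained on data that has already passed through the trained nodes before it, and execution then chains all nodes.

```python
import numpy as np


class CenterNode:
    """Hypothetical node: learns the column means, then subtracts them."""

    def train(self, x):
        self.mean = x.mean(axis=0)

    def execute(self, x):
        return x - self.mean


class ScaleNode:
    """Hypothetical node: learns column standard deviations, then divides by them."""

    def train(self, x):
        self.std = x.std(axis=0)

    def execute(self, x):
        return x / self.std


class ToyFlow:
    """Hypothetical flow: trains nodes in order, feeding each one the
    output of the already trained nodes that precede it."""

    def __init__(self, nodes):
        self.nodes = nodes

    def train(self, x):
        for i, node in enumerate(self.nodes):
            # Propagate the raw data through the nodes trained so far.
            y = x
            for trained in self.nodes[:i]:
                y = trained.execute(y)
            node.train(y)

    def execute(self, x):
        for node in self.nodes:
            x = node.execute(x)
        return x


rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=3.0, size=(1000, 2))  # rows = observations
flow = ToyFlow([CenterNode(), ScaleNode()])
flow.train(data)
out = flow.execute(data)
print(out.mean(axis=0).round(6), out.std(axis=0).round(6))
```

The user only supplies the raw data once; the flow takes care of passing intermediate results between nodes, which is the convenience the paragraph above describes.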

The number of available algorithms is steadily increasing and includes
signal processing methods (Principal Component Analysis, Independent
Component Analysis, Slow Feature Analysis), manifold learning methods
([Hessian] Locally Linear Embedding), several classifiers,
probabilistic methods (Factor Analysis, RBM), data pre-processing
methods, and many others.

Particular care has been taken to make computations efficient in terms
of speed and memory. To reduce the memory footprint, it is possible to
perform learning using batches of data. For large data-sets, it is
also possible to specify that MDP should use single precision floating
point numbers rather than double precision ones. Finally, calculations
can be parallelised using the ``parallel`` subpackage, which offers a
parallel implementation of the basic nodes and flows.
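Batch learning works for algorithms whose training statistics can be accumulated incrementally. A minimal NumPy sketch, assuming a covariance-based algorithm such as PCA: the sum of the data and the sum of outer products are updated chunk by chunk, so only one chunk needs to be in memory at a time. The helper `chunked_covariance` is hypothetical and not part of MDP; its `dtype` argument only mimics the choice between single and double precision.

```python
import numpy as np


def chunked_covariance(chunks, dtype=np.float64):
    """Hypothetical helper: covariance matrix from an iterable of 2-D chunks
    (rows = observations, columns = variables), one chunk at a time.

    dtype mimics choosing single (float32) or double (float64) precision.
    """
    total = None
    outer = None
    n = 0
    for chunk in chunks:
        chunk = np.asarray(chunk, dtype=dtype)
        if total is None:
            dim = chunk.shape[1]
            total = np.zeros(dim, dtype=dtype)
            outer = np.zeros((dim, dim), dtype=dtype)
        total += chunk.sum(axis=0)
        outer += chunk.T @ chunk
        n += chunk.shape[0]
    mean = total / n
    # Unbiased covariance: (sum of x x^T - n * mean mean^T) / (n - 1)
    return (outer - n * np.outer(mean, mean)) / (n - 1)


rng = np.random.default_rng(1)
x = rng.normal(size=(10_000, 3))
chunks = np.array_split(x, 10)  # e.g. 10 chunks read one after another

cov_chunked = chunked_covariance(chunks)
cov_full = np.cov(x, rowvar=False)
cov_single = chunked_covariance(chunks, dtype=np.float32)
print(np.allclose(cov_chunked, cov_full))  # True
```

The single-precision result needs half the memory per element at the cost of fewer significant digits.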

From the developer's perspective, MDP is a framework that makes the
implementation of new supervised and unsupervised learning algorithms
easy and straightforward. The base class, ``Node``, takes care of tedious
tasks like numerical type and dimensionality checking, leaving the
developer free to concentrate on the implementation of the learning
and execution phases. Because of the common interface, the node then
automatically integrates with the rest of the library and can be used
in a network together with other nodes.
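The division of labour between the base class and a new algorithm can be illustrated with a small template-method sketch. It is a hypothetical simplification, not MDP's `Node`: the public `train` and `execute` methods do the generic work (2-D input, consistent dimensionality, casting to the node's dtype), and a subclass only overrides the private hooks `_train` and `_execute`. MDP nodes follow a similar split between public methods and underscore-prefixed hooks, but the real base class does considerably more.

```python
import numpy as np


class ToyNode:
    """Hypothetical base class: handles generic checks so that subclasses
    only implement the algorithm-specific hooks."""

    def __init__(self, dtype=np.float64):
        self.dtype = np.dtype(dtype)
        self.input_dim = None

    def _check_input(self, x):
        x = np.asarray(x)
        if x.ndim != 2:
            raise ValueError("x must be 2-D (observations x variables)")
        if self.input_dim is None:
            self.input_dim = x.shape[1]
        elif x.shape[1] != self.input_dim:
            raise ValueError(
                f"expected {self.input_dim} variables, got {x.shape[1]}")
        # Cast to the node's numerical type.
        return x.astype(self.dtype, copy=False)

    def train(self, x):
        self._train(self._check_input(x))

    def execute(self, x):
        return self._execute(self._check_input(x))

    # Hooks to be implemented by subclasses.
    def _train(self, x):
        raise NotImplementedError

    def _execute(self, x):
        raise NotImplementedError


class MeanRemovalNode(ToyNode):
    """Only the algorithm itself: remove the mean seen during training."""

    def _train(self, x):
        self.mean = x.mean(axis=0)

    def _execute(self, x):
        return x - self.mean


node = MeanRemovalNode(dtype=np.float32)
node.train([[1, 2], [3, 4]])
y = node.execute([[2, 3]])
print(y, y.dtype)  # [[0. 0.]] float32

try:
    node.execute([[1, 2, 3]])  # wrong dimensionality is rejected
except ValueError as err:
    print("rejected:", err)
```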

A node can have multiple training phases and even an undetermined
number of phases. Multiple training phases mean that the training data
is presented multiple times to the same node. This allows the
implementation of algorithms that need to collect some statistics on
the whole input before proceeding with the actual training, and others
that need to iterate over a training phase until a convergence
criterion is satisfied. It is possible to train each phase using
chunks of input data if the chunks are given as an iterable. Moreover,
crash recovery can be optionally enabled, which will save the state of
the flow in case of a failure for later inspection.
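A two-phase algorithm trained on chunks can be sketched as follows. `TwoPhaseVarianceNode` and `train_all_phases` are hypothetical and not MDP classes: phase 1 needs a statistic of the whole input (the mean) before phase 2 (the squared deviations from that mean) can start, so the same chunks are presented once per phase.

```python
import numpy as np


class TwoPhaseVarianceNode:
    """Hypothetical two-phase node.

    Phase 0 accumulates the mean over all chunks; phase 1 accumulates the
    squared deviations from that mean (a two-pass variance estimate).
    """

    def __init__(self):
        self.phase = 0
        self._sum = 0.0
        self._n = 0

    def train(self, chunk):
        chunk = np.asarray(chunk, dtype=float)
        if self.phase == 0:
            self._sum = self._sum + chunk.sum(axis=0)
            self._n += chunk.shape[0]
        else:
            self._sq = self._sq + ((chunk - self.mean) ** 2).sum(axis=0)

    def stop_training(self):
        if self.phase == 0:
            self.mean = self._sum / self._n
            self._sq = 0.0
        else:
            self.var = self._sq / (self._n - 1)
        self.phase += 1


def train_all_phases(node, chunks, n_phases=2):
    """Present the full iterable of chunks once per training phase."""
    for _ in range(n_phases):
        for chunk in chunks:  # must be re-iterable, e.g. a list
            node.train(chunk)
        node.stop_training()


rng = np.random.default_rng(2)
x = rng.normal(loc=10.0, scale=2.0, size=(5000, 2))
chunks = np.array_split(x, 5)  # a list, so it can be traversed twice

node = TwoPhaseVarianceNode()
train_all_phases(node, chunks)
print(np.allclose(node.var, x.var(axis=0, ddof=1)))  # True
```

If a generator expression were passed instead of a list, it would be exhausted after the first phase and the second phase in this sketch would see no chunks at all, which is why data for multi-phase training should come from an iterable that can be traversed repeatedly.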

MDP is distributed under the open source BSD license. It has been
written in the context of theoretical research in neuroscience, but it
has been designed to be helpful in any context where trainable data
processing algorithms are used. Its simplicity on the user's side, the
variety of readily available algorithms, and the reusability of the
implemented nodes also make it a useful educational tool.

http://mdp-toolkit.sourceforge.net


Version: 3.5

Author: MDP Developers

Contact: mdp-toolkit@python.org

Copyright: (c) 2003-2016 mdp-toolkit@python.org

License: BSD License, see COPYRIGHT

Classes
CheckpointFlow
Subclass of Flow class that allows user-supplied checkpoint functions to be executed at the end of each phase, for example to save the internal structures of a node for later analysis.
CheckpointFunction
Base class for checkpoint functions.
CheckpointSaveFunction
This checkpoint function saves the node in pickle format.
CircularOnlineFlow
A 'CircularOnlineFlow' is a cyclic sequence of online/non-trainable nodes that are trained and executed together to form a more complex algorithm.
CircularOnlineFlowException
Base class for exceptions in Flow subclasses.
ClassifierCumulator
A ClassifierCumulator is a Node whose training phase simply collects all input data and labels.
ClassifierNode
A ClassifierNode can be used for classification tasks that should not interfere with the normal execution flow.
CrashRecoveryException
Class to handle crash recovery
Cumulator
A specialized version of `VariadicCumulator` which only fills the field ``self.data``.
ExtensionNode
Base class for extension nodes.
ExtensionNodeMetaclass
This is the metaclass for node extension superclasses.
Flow
A 'Flow' is a sequence of nodes that are trained and executed together to form a more complex algorithm.
FlowException
Base class for exceptions in Flow subclasses.
FlowExceptionCR
Class to handle flow-crash recovery
IsNotInvertibleException
Raised when the `Node.inverse` method is called although the node is not invertible.
IsNotTrainableException
Raised when the `Node.train` method is called although the node is not trainable.
MDPDeprecationWarning
Warn about deprecated MDP API.
MDPException
Base class for exceptions in MDP.
MDPWarning
Base class for warnings in MDP.
Node
A `Node` is the basic building block of an MDP application.
NodeException
Base class for exceptions in `Node` subclasses.
NodeMetaclass
A metaclass which copies docstrings from private to public methods.
OnlineFlow
An 'OnlineFlow' is a sequence of nodes that are trained online and executed together to form a more complex algorithm.
OnlineFlowException
Base class for exceptions in Flow subclasses.
OnlineNode
An online Node (OnlineNode) is the basic building block of an online MDP application.
OnlineNodeException
Base class for exceptions in `OnlineNode` subclasses.
PreserveDimNode
Abstract base class with ``output_dim == input_dim``.
PreserveDimOnlineNode
Abstract base class with ``output_dim == input_dim``.
TrainingException
Base class for exceptions in the training phase.
TrainingFinishedException
Raised when the `Node.train` method is called although the training phase is closed.
config
Provide information about optional dependencies.
extension
Context manager for MDP extension.
Functions
 
VariadicCumulator(*fields)
A VariadicCumulator is a `Node` whose training phase simply collects all input data.
 
activate_extension(extension_name, verbose=False)
Activate the extension by injecting the extension methods.
 
activate_extensions(extension_names, verbose=False)
Activate all the extensions for the given names.
 
deactivate_extension(extension_name, verbose=False)
Deactivate the extension by removing the injected methods.
 
deactivate_extensions(extension_names, verbose=False)
Deactivate all the extensions for the given names.
 
extension_method(extension_name, node_cls, method_name=None)
Returns a decorator to register a function as an extension method.
 
extension_setup(extension_name)
Returns a decorator to register a setup function for an extension.
 
extension_teardown(extension_name)
Returns a decorator to register a teardown function for an extension.
 
fastica(x, **kwargs)
Perform Independent Component Analysis on input data using the FastICA algorithm by Aapo Hyvarinen.
 
get_extensions()
Return a dictionary of the currently registered extensions.
 
pca(x, **kwargs)
Filters multidimensional input data through its principal components.
 
with_extension(extension_name)
Return a wrapper function to activate and deactivate the extension.
Variables
  __homepage__ = 'http://mdp-toolkit.sourceforge.net'
  __medium_description__ = '**Modular toolkit for Data Processin...
  __package__ = 'mdp'
  __revision__ = ''
  __short_description__ = 'MDP is a Python library for building ...
  numx_description = 'scipy'
Function Details

VariadicCumulator(*fields)

 
A VariadicCumulator is a `Node` whose training phase simply collects
all input data. In this way it is possible to easily implement
batch-mode learning.

The data is accessible in the attributes named by the ``fields`` passed to
``VariadicCumulator``, starting from the beginning of the
`Node._stop_training` phase.
``self.tlen`` contains the number of data points collected.
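The collect-then-process behaviour can be imitated with a small class factory. The sketch is hypothetical (`make_cumulator` and the methods of the returned class are not MDP code) and only mirrors the behaviour documented above: each training call appends chunks to the named fields, and at stop-training time the chunks are concatenated and ``tlen`` holds the number of collected data points. With the fields ``"data"`` and ``"labels"`` it resembles what `ClassifierCumulator` is described as doing.

```python
import numpy as np


def make_cumulator(*fields):
    """Hypothetical factory: returns a class that stores one list of chunks
    per field name during training and concatenates them afterwards."""

    class ToyCumulator:
        def __init__(self):
            for name in fields:
                setattr(self, name, [])
            self.tlen = 0

        def train(self, *args):
            if len(args) != len(fields):
                raise ValueError(f"expected {len(fields)} arrays, got {len(args)}")
            for name, arr in zip(fields, args):
                getattr(self, name).append(np.asarray(arr))
            self.tlen += len(args[0])

        def stop_training(self):
            # From here on the fields hold complete arrays (batch mode).
            for name in fields:
                setattr(self, name, np.concatenate(getattr(self, name)))
            self._stop_training()

        def _stop_training(self):
            """Subclasses would run the actual batch algorithm here."""

    return ToyCumulator


DataLabels = make_cumulator("data", "labels")
acc = DataLabels()
acc.train(np.ones((3, 2)), np.array([0, 1, 0]))
acc.train(np.zeros((2, 2)), np.array([1, 1]))
acc.stop_training()
print(acc.data.shape, acc.labels, acc.tlen)  # (5, 2) [0 1 0 1 1] 5
```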

activate_extension(extension_name, verbose=False)

 
Activate the extension by injecting the extension methods.

activate_extensions(extension_names, verbose=False)

 
Activate all the extensions for the given names.

extension_names -- Sequence of extension names.

deactivate_extension(extension_name, verbose=False)

 
Deactivate the extension by removing the injected methods.

deactivate_extensions(extension_names, verbose=False)

 
Deactivate all the extensions for the given names.

extension_names -- Sequence of extension names.

extension_method(extension_name, node_cls, method_name=None)

 
Returns a decorator to register a function as an extension method.

:Parameters:
  extension_name
    String with the name of the extension.
  node_cls
    Node class for which the method should be registered.
  method_name
    Name of the extension method (default value is ``None``).

    If no value is provided then the name of the function is used.

Note that it is possible to directly call other extension functions, to call
extension methods in other node classes, or to use ``super`` in the normal way
(the function will be called as a method of the node class).
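The registration and injection idea behind `extension_method`, `activate_extension` and `deactivate_extension` can be sketched in plain Python. Everything below is hypothetical (`_registry`, `toy_extension_method`, `toy_activate`, `toy_deactivate` and the `Node` stand-in are illustrations, not MDP internals): registered functions are set as attributes of the node class on activation, so they are called as ordinary methods with ``self``, and the previous attributes are restored on deactivation.

```python
from collections import defaultdict

# Hypothetical registry: extension name -> {(class, method name): function}
_registry = defaultdict(dict)
# Originals saved on activation so deactivation can restore them.
_saved = {}
_MISSING = object()


def toy_extension_method(extension_name, node_cls, method_name=None):
    """Return a decorator that registers a function as an extension method."""
    def decorator(func):
        name = method_name or func.__name__  # default: the function's name
        _registry[extension_name][(node_cls, name)] = func
        return func
    return decorator


def toy_activate(extension_name):
    for (cls, name), func in _registry[extension_name].items():
        _saved[(cls, name)] = cls.__dict__.get(name, _MISSING)
        setattr(cls, name, func)  # now callable as a regular method


def toy_deactivate(extension_name):
    for (cls, name) in _registry[extension_name]:
        original = _saved.pop((cls, name))
        if original is _MISSING:
            delattr(cls, name)
        else:
            setattr(cls, name, original)


class Node:
    def execute(self, x):
        return x


@toy_extension_method("verbose", Node, "execute")
def _verbose_execute(self, x):
    return f"{type(self).__name__} got {x!r}"


n = Node()
before = n.execute(1)   # 1
toy_activate("verbose")
during = n.execute(1)   # 'Node got 1'
toy_deactivate("verbose")
after = n.execute(1)    # 1
```

Injecting into the class rather than into individual instances means every existing and future node of that class picks up the extension at once.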

extension_setup(extension_name)

 
Returns a decorator to register a setup function for an extension.

:Parameters:
  extension_name
    String with the name of the extension.

The decorated function will be called when the extension is activated.

Note that there is also the ``extension_teardown`` decorator, which should
probably be defined as well if there is a setup procedure.

extension_teardown(extension_name)

 
Returns a decorator to register a teardown function for an extension.

:Parameters:
  extension_name
    String with the name of the extension.

The decorated function will be called when the extension is deactivated.
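Setup and teardown hooks pair naturally: whatever the setup function changes when an extension is activated, the teardown function undoes when it is deactivated. A hypothetical sketch of that pairing (`toy_setup`, `toy_teardown`, `activate`, `deactivate` and the "pool" resource are illustrations, not the MDP implementation):

```python
from collections import defaultdict

_setups = defaultdict(list)
_teardowns = defaultdict(list)
_active = set()


def toy_setup(extension_name):
    """Register a function to run when the extension is activated."""
    def decorator(func):
        _setups[extension_name].append(func)
        return func
    return decorator


def toy_teardown(extension_name):
    """Register a function to run when the extension is deactivated."""
    def decorator(func):
        _teardowns[extension_name].append(func)
        return func
    return decorator


def activate(extension_name):
    if extension_name not in _active:
        for func in _setups[extension_name]:
            func()
        _active.add(extension_name)


def deactivate(extension_name):
    if extension_name in _active:
        for func in _teardowns[extension_name]:
            func()
        _active.discard(extension_name)


log = []
state = {"pool": None}


@toy_setup("pool_demo")
def _open_pool():
    state["pool"] = "open"  # e.g. acquire a resource such as worker processes
    log.append("setup")


@toy_teardown("pool_demo")
def _close_pool():
    state["pool"] = None  # release what the setup function acquired
    log.append("teardown")


activate("pool_demo")
pool_while_active = state["pool"]  # 'open'
deactivate("pool_demo")
```

Without the teardown function, the resource acquired during setup would outlive the extension, which is the reason for defining both.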

fastica(x, **kwargs)

 
Perform Independent Component Analysis on input data using the FastICA
algorithm by Aapo Hyvarinen.

Each row of ``x`` is an observation and each column is a variable.

This is a shortcut function for the corresponding node `nodes.FastICANode`.
If any keyword arguments are specified, they are passed to its constructor.

This is equivalent to ``mdp.nodes.FastICANode(**kwargs)(x)``
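To show what such a call computes, here is a compact NumPy sketch of symmetric FastICA with the ``tanh`` (log-cosh) nonlinearity, using the same layout (rows = observations, columns = variables). It is a textbook variant written for illustration, not the code of `nodes.FastICANode`, whose defaults and options may differ; the name `fastica_sketch` and its parameters are hypothetical.

```python
import numpy as np


def fastica_sketch(x, max_iter=200, tol=1e-6, seed=0):
    """Hypothetical symmetric FastICA with a tanh nonlinearity.

    x: array of shape (n_observations, n_variables).
    Returns the estimated sources with the same shape.
    """
    x = np.asarray(x, dtype=float)
    n, d = x.shape
    x = x - x.mean(axis=0)

    # Whitening: rotate and rescale so that the covariance becomes identity.
    cov = x.T @ x / n
    evals, evecs = np.linalg.eigh(cov)
    whitening = evecs @ np.diag(1.0 / np.sqrt(evals)) @ evecs.T
    z = x @ whitening

    def sym_decorrelate(w):
        # W <- (W W^T)^(-1/2) W keeps the unmixing rows orthonormal.
        s, u = np.linalg.eigh(w @ w.T)
        return u @ np.diag(1.0 / np.sqrt(s)) @ u.T @ w

    rng = np.random.default_rng(seed)
    w = sym_decorrelate(rng.normal(size=(d, d)))
    for _ in range(max_iter):
        g = np.tanh(z @ w.T)            # g(w_i^T z) for every unit i
        g_prime = 1.0 - g ** 2          # derivative of tanh
        # Fixed-point update: E{z g(w^T z)} - E{g'(w^T z)} w, for all rows.
        w_new = g.T @ z / n - np.diag(g_prime.mean(axis=0)) @ w
        w_new = sym_decorrelate(w_new)
        # Converged when no unmixing vector changes direction any more.
        converged = np.max(np.abs(np.abs(np.diag(w_new @ w.T)) - 1.0)) < tol
        w = w_new
        if converged:
            break
    return z @ w.T


rng = np.random.default_rng(42)
n = 5000
sources = np.column_stack([rng.uniform(-1, 1, n), rng.laplace(size=n)])
mixing = np.array([[1.0, 0.5], [0.7, 1.0]])
observed = sources @ mixing.T  # rows = observations, columns = mixtures

estimated = fastica_sketch(observed)
# Absolute correlation between each true source and each estimate;
# ICA recovers sources only up to order, sign and scale.
corr = np.abs(np.corrcoef(sources.T, estimated.T)[:2, 2:])
print(corr.round(3))
```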

get_extensions()

 
Return a dictionary of the currently registered extensions.

Note that this is not a copy, so if you change anything in this dict
the whole extension mechanism will be affected. If you just want the
names of the available extensions, use ``get_extensions().keys()``.

pca(x, **kwargs)

 
Filters multidimensional input data through its principal components.

Each row of ``x`` is an observation and each column is a variable.

This is a shortcut function for the corresponding node `nodes.PCANode`. If any
keyword arguments are specified, they are passed to its constructor.

This is equivalent to ``mdp.nodes.PCANode(**kwargs)(x)``
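A plain NumPy version of "filtering data through its principal components", with the same layout convention, can look like the sketch below. `pca_sketch` and its `output_dim` argument are hypothetical illustrations; `nodes.PCANode` has more options, and its exact conventions are not reproduced here.

```python
import numpy as np


def pca_sketch(x, output_dim=None):
    """Hypothetical PCA: project centered data onto the leading principal
    components. x has shape (n_observations, n_variables)."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    evals, evecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    order = np.argsort(evals)[::-1]         # sort by decreasing variance
    evecs = evecs[:, order]
    if output_dim is not None:
        evecs = evecs[:, :output_dim]       # keep only the leading components
    return centered @ evecs


rng = np.random.default_rng(3)
latent = rng.normal(size=(2000, 2)) * [5.0, 1.0]
rotation = np.array([[0.6, -0.8], [0.8, 0.6]])
x = latent @ rotation.T  # correlated observations

y = pca_sketch(x)
print(np.var(y, axis=0, ddof=1))  # variances in decreasing order
```

The projected components are uncorrelated and ordered by variance, so truncating to the first few columns keeps the directions that explain most of the data.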

with_extension(extension_name)

 
Return a wrapper function to activate and deactivate the extension.

This function is intended to be used with the decorator syntax.

The deactivation happens only if the extension was activated by
the decorator (not if it was already active before). So this
decorator ensures that the extension is active and prevents
unintended side effects.

If the decorated function is a generator function, the extension will be
in effect only while the generator object is created (that is, when the
function is called, but before its body is actually executed). When the
function body is executed (after ``next`` is called on the generator
object), the extension might not be in effect anymore. Therefore, it is
better to use the `extension` context manager inside the body of a
generator function.
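The generator caveat can be reproduced with a toy flag standing in for an active extension. In this hypothetical sketch, `with_toy_extension` activates the flag only around the *call* of the decorated function, while `toy_extension` is a context manager that can be entered inside the generator body; neither is MDP code.

```python
import contextlib
import functools

ACTIVE = set()  # stands in for the set of currently active extensions


@contextlib.contextmanager
def toy_extension(name):
    """Activate `name` for the with-block; deactivate afterwards only if
    it was not already active before (same rule as described above)."""
    was_active = name in ACTIVE
    ACTIVE.add(name)
    try:
        yield
    finally:
        if not was_active:
            ACTIVE.discard(name)


def with_toy_extension(name):
    """Decorator version: active only while the wrapped call is running."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            with toy_extension(name):
                return func(*args, **kwargs)
        return wrapper
    return decorator


@with_toy_extension("demo")
def decorated_gen():
    # This body runs only at next(), after wrapper() has already returned.
    yield "demo" in ACTIVE


def context_gen():
    with toy_extension("demo"):
        yield "demo" in ACTIVE


print(next(decorated_gen()))  # False: extension already deactivated
print(next(context_gen()))    # True: activated inside the generator body
```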


Variables Details

__homepage__

Value:
'http://mdp-toolkit.sourceforge.net'

__medium_description__

Value:
'''**Modular toolkit for Data Processing (MDP)** is a Python data processing framework.

From the user\'s perspective, MDP is a collection of supervised and unsupervised
learning algorithms and other data processing units that can be combined into
data processing sequences and more complex feed-forward network archit...

__package__

Value:
'mdp'

__revision__

Value:
''

__short_description__

Value:
'MDP is a Python library for building complex data processing software by combining widely used machine learning algorithms into pipelines and networks.'

numx_description

Value:
'scipy'