MDP

Software Details:
Version: 3.3
Upload Date: 11 May 15
Distribution Type: Freeware
Downloads: 6

MDP (Modular toolkit for Data Processing) is a library of widely used data processing algorithms that can be combined according to a pipeline analogy to build more complex data processing software.

From the user's perspective, MDP consists of a collection of supervised and unsupervised learning algorithms, and other data processing units (nodes) that can be combined into data processing sequences (flows) and more complex feed-forward network architectures. Given a set of input data, MDP takes care of successively training or executing all nodes in the network. This allows the user to specify complex algorithms as a series of simpler data processing steps in a natural way.
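The node and flow idea can be illustrated with a minimal sketch in plain Python. The classes below (`Node`, `CenterNode`, `ScaleNode`, `Flow`) are hypothetical stand-ins written for this example and are not MDP's actual implementation; they only show how a flow might train each node in turn on the output of the nodes before it.

```python
class Node:
    """Hypothetical minimal processing unit (not MDP's actual Node class)."""

    def train(self, data):
        pass  # stateless nodes have nothing to learn

    def execute(self, data):
        raise NotImplementedError


class CenterNode(Node):
    """Learns the mean of the training data and subtracts it."""

    def __init__(self):
        self.mean = 0.0

    def train(self, data):
        self.mean = sum(data) / len(data)

    def execute(self, data):
        return [x - self.mean for x in data]


class ScaleNode(Node):
    """Learns the largest absolute value and rescales data into [-1, 1]."""

    def __init__(self):
        self.scale = 1.0

    def train(self, data):
        self.scale = max(abs(x) for x in data) or 1.0

    def execute(self, data):
        return [x / self.scale for x in data]


class Flow:
    """Trains and executes a sequence of nodes in order."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def train(self, data):
        # Each node is trained on the output of the already-trained nodes before it.
        for node in self.nodes:
            node.train(data)
            data = node.execute(data)

    def execute(self, data):
        for node in self.nodes:
            data = node.execute(data)
        return data


flow = Flow([CenterNode(), ScaleNode()])
flow.train([1.0, 2.0, 3.0, 6.0])
result = flow.execute([1.0, 2.0, 3.0, 6.0])
```

The caller only builds the list of nodes and calls `train` and `execute` on the flow; the order of training follows from the order of the list.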

The base of available algorithms is steadily increasing and includes, to name but the most common, Principal Component Analysis (PCA and NIPALS), several Independent Component Analysis algorithms (CuBICA, FastICA, TDSEP, JADE, and XSFA), Slow Feature Analysis, Gaussian Classifiers, Restricted Boltzmann Machine, and Locally Linear Embedding.

Particular care has been taken to make computations efficient in terms of speed and memory. To reduce memory requirements, it is possible to perform learning using batches of data, and to define the internal parameters of the nodes to be single precision, which makes the usage of very large data sets possible. Moreover, the 'parallel' subpackage offers a parallel implementation of the basic nodes and flows.
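As a rough illustration of batch learning with reduced-precision parameters, the sketch below accumulates column means one batch at a time. `RunningMean` is a hypothetical class for this example, not part of MDP, and the standard-library `array` module stands in for a numerical backend: typecode `'f'` stores single-precision floats and `'d'` double-precision ones.

```python
from array import array


class RunningMean:
    """Hypothetical accumulator: learns per-column means from batches."""

    def __init__(self, dim, typecode="d"):
        # 'f' stores single-precision floats, 'd' double-precision floats.
        self.sums = array(typecode, [0.0] * dim)
        self.count = 0

    def train(self, batch):
        # Only the running sums are kept; the batch itself can be discarded.
        for row in batch:
            for i, value in enumerate(row):
                self.sums[i] += value
            self.count += 1

    def mean(self):
        return [s / self.count for s in self.sums]


batches = [
    [[1.0, 10.0], [2.0, 20.0]],
    [[3.0, 30.0]],
]
acc = RunningMean(dim=2, typecode="f")
for batch in batches:
    acc.train(batch)

means = acc.mean()
bytes_per_parameter = acc.sums.itemsize
```

Memory use depends on the number of parameters rather than on the size of the data set, and the single-precision typecode makes each stored parameter smaller than its double-precision counterpart.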

From the developer's perspective, MDP is a framework that makes the implementation of new supervised and unsupervised learning algorithms easy and straightforward. The basic class, 'Node', takes care of tedious tasks like numerical type and dimensionality checking, leaving the developer free to concentrate on the implementation of the learning and execution phases. Because of the common interface, the node then automatically integrates with the rest of the library and can be used in a network together with other nodes. A node can have multiple training phases and even an undetermined number of phases. This allows the implementation of algorithms that need to collect some statistics on the whole input before proceeding with the actual training, and others that need to iterate over a training phase until a convergence criterion is satisfied. The ability to train each phase using chunks of input data is maintained if the chunks are generated with iterators. Moreover, crash recovery is optionally available: in case of failure, the current state of the flow is saved for later inspection.
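The idea of several training phases fed by iterators can be sketched as follows. `VarianceNode` and `chunked` are hypothetical names chosen for this example and do not reflect MDP's real interface: the first phase collects a statistic over the whole input (the mean), and the second phase uses it to compute the variance, with both phases reading the data in chunks.

```python
class VarianceNode:
    """Hypothetical two-phase node: phase 1 learns the mean, phase 2 the variance."""

    def __init__(self):
        self.mean = None
        self.variance = None

    def _train_mean(self, chunks):
        total, count = 0.0, 0
        for chunk in chunks:
            total += sum(chunk)
            count += len(chunk)
        self.mean = total / count

    def _train_variance(self, chunks):
        # Needs the mean from phase 1, hence a second pass over the data.
        squared, count = 0.0, 0
        for chunk in chunks:
            squared += sum((x - self.mean) ** 2 for x in chunk)
            count += len(chunk)
        self.variance = squared / count

    def training_phases(self):
        return [self._train_mean, self._train_variance]


def chunked(data, size):
    """Yield successive chunks of `data`; a stand-in for reading from disk."""
    for start in range(0, len(data), size):
        yield data[start:start + size]


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
node = VarianceNode()
for phase in node.training_phases():
    # A fresh generator per phase, since a generator can be consumed only once.
    phase(chunked(data, size=3))
```

Because each phase receives its own iterator, the full data set never has to be held in memory by the node, even though the training needs two passes over it.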

MDP was written in the context of theoretical research in neuroscience, but it has been designed to be helpful in any context where trainable data processing algorithms are used. Its simplicity on the user side, together with the reusability of the implemented nodes, also makes it a valid educational tool.

What is new in this release:

  • Python 3 support.
  • New extensions: caching and gradient.
  • An improved and expanded tutorial.
  • Several improvements and bugfixes.
  • This release is under a BSD license.

What is new in version 2.5:

  • 2009-06-30: Added online detection of numerical backend, parallel python support, symeig backend and numerical backend to the output of unit tests. This should help in debugging.
  • 2009-06-12: Integration of the cutoff and histogram nodes.
  • 2009-06-12: Fixed bug in parallel flow (exception handling).
  • 2009-06-09: Fixed bug in LLENode when output_dim is a float. Thanks to Konrad Hinsen.
  • 2009-06-05: Fixed bugs in parallel flow for multiple schedulers.
  • 2009-06-05: Fixed a bug in layer inverse, thanks to Alberto Escalante.
  • 2009-04-29: Added a LinearRegressionNode.
  • 2009-03-31: PCANode no longer complains when the covariance matrix has negative eigenvalues iff svd==True or reduce==True. If output_dim has been specified as a desired variance, negative eigenvalues are ignored. Improved the error message for SFANode in case of negative eigenvalues: we now suggest prepending the node with a PCANode(svd=True) or PCANode(reduce=True).
  • 2009-03-26: Migrated from old thread package to the new threading one. Added flag to disable caching in process scheduler. There are some breaking changes for custom schedulers (parallel flow training or execution is not affected).
  • 2009-03-25: Added svn revision tracking support.
  • 2009-03-25: Removed the copy_callable flag for scheduler, this is now completely replaced by forking the TaskCallable. This has no effect for the convenient ParallelFlow interface, but custom schedulers get broken.
  • 2009-03-22: Implemented caching in the ProcessScheduler.
  • 2009-02-22: make_parallel now works completely in-place to save memory.
  • 2009-02-12: Added container methods to FlowNode.
  • 2009-03-03: Added CrossCovarianceMatrix with tests.
  • 2009-02-03: Added IdentityNode.
  • 2009-01-30: Added a helper function in hinet to directly display a flow HTML representation.
  • 2009-01-22: Allow output_dim in Layer to be set lazily.
  • 2008-12-23: Added total_variance to the nipals node.
  • 2008-12-23: Always set explained_variance and total_variance after training in PCANode.
  • 2008-12-12: Modified symrand to really return symmetric matrices (and not only positive definite). Adapted GaussianClassifierNode to account for that. Adapted symrand to return also complex hermitian matrices.
  • 2008-12-11: Fixed one problem in PCANode (when output_dim was set to input_dim the total variance was treated as unknown). Fixed var_part parameter in ParallelPCANode.
  • 2008-12-11: Added var_part feature to PCANode (filter according to variance relative to absolute variance).
  • 2008-12-04: Fixed missing axis arg in amax call in tutorial. Thanks to Samuel John!
  • 2008-12-04: Fixed the empty data iterator handling in ParallelFlow. Also added empty iterator checks in the normal Flow (raise an exception if the iterator is empty).
  • 2008-11-19: Modified pca and sfa nodes to check for negative eigenvalues in the covariance matrices.
  • 2008-11-19: symeig integrated in scipy, mdp can use it from there now.
  • 2008-11-18: Added ParallelFDANode.
  • 2008-11-18: Updated the train callable for ParallelFlow to support additional arguments.
  • 2008-11-05: Rewrite of the make parallel code, now supports hinet structures.
  • 2008-11-03: Rewrite of the hinet HTML representation creator. Unfortunately this also breaks the public interface, but the changes are pretty simple.
  • 2008-10-29: Shut off warnings coming from remote processes in ProcessScheduler.
  • 2008-10-27: Fixed problem with overwriting kwargs in the init method of ParallelFlow.
  • 2008-10-24: Fixed pretrained nodes bug in hinet.FlowNode.
  • 2008-10-20: Fixed critical import bug in parallel package when pp (parallel python library) is installed.

Requirements:

  • Python
  • NumPy
  • SciPy
