Dapper Dataflow Engine

Software Details:
Version: 0.98
Upload Date: 12 May 15
Developer: Roy Liu
Distribution Type: Freeware
Downloads: 10

Rating: 2.0/5 (Total Votes: 1)

Dapper (Distributed and Parallel Program Execution Runtime) is a tool for taming the complexities of developing for large-scale cloud and grid computing, enabling the user to create distributed computations from the essentials -- the code that will execut

Why Dapper?

We live in interesting times, where breakthroughs in the sciences increasingly depend on the growing availability and abundance of commoditized, networked computational resources. With the help of the cloud or grid, computations that would otherwise run for days on a single desktop machine now have distributed and/or parallel formulations that can churn through, in a matter of hours, input sets ten times as large on a hundred machines. As alluring as the idea of strength in numbers may be, having just physical hardware is not enough -- a programmer has to craft the actual computation that will run on it. Consequently, the high value placed on human effort and creativity necessitates a programming environment that enables, and even encourages, succinct expression of distributed computations, and yet at the same time does not sacrifice generality.

Dapper, standing for Distributed and Parallel Program Execution Runtime, is one such tool for bridging the scientist/programmer's high-level specifications, which capture the essence of a program, and the low-level mechanisms that reflect the unsavory realities of distributed and parallel computing. Under its dataflow-oriented approach, Dapper enables users to code locally in Java and execute globally on the cloud or grid. The user first writes codelets: small snippets of code that perform simple tasks and do not, in themselves, constitute a complete program. Afterwards, he or she specifies how those codelets, seen as vertices in the dataflow, transmit data to each other via edge relations. The resulting directed acyclic dataflow graph is a complete program interpretable by the Dapper server, which, upon being contacted by long-lived worker clients, can coordinate a distributed execution.
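The codelet-and-edge model described above can be illustrated with a small, single-process sketch. It is not the Dapper API: the class DataflowSketch and its methods addCodelet, addEdge, and run are hypothetical names chosen for this example, and everything runs locally rather than on worker clients. It only shows the underlying idea, that small functions become vertices, edges declare who feeds whom, and a vertex runs once all of its inputs are available.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Illustrative only: hypothetical types, NOT part of the Dapper API.
class DataflowSketch {

    // Vertex name -> codelet mapping the outputs of its upstream vertices to one output.
    private final Map<String, Function<List<Integer>, Integer>> codelets = new LinkedHashMap<>();
    // Vertex name -> names of the upstream vertices it reads from, in edge order.
    private final Map<String, List<String>> inputs = new LinkedHashMap<>();

    void addCodelet(String name, Function<List<Integer>, Integer> codelet) {
        codelets.put(name, codelet);
        inputs.put(name, new ArrayList<>());
    }

    void addEdge(String from, String to) {
        if (!codelets.containsKey(from) || !codelets.containsKey(to)) {
            throw new IllegalArgumentException("unknown vertex in edge " + from + " -> " + to);
        }
        inputs.get(to).add(from);
    }

    // Executes each codelet once all of its inputs are ready (Kahn's topological ordering).
    Map<String, Integer> run() {
        Map<String, Integer> pending = new LinkedHashMap<>();
        Map<String, List<String>> downstream = new LinkedHashMap<>();
        for (String vertex : codelets.keySet()) {
            downstream.put(vertex, new ArrayList<>());
        }
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, List<String>> entry : inputs.entrySet()) {
            pending.put(entry.getKey(), entry.getValue().size());
            for (String upstream : entry.getValue()) {
                downstream.get(upstream).add(entry.getKey());
            }
            if (entry.getValue().isEmpty()) {
                ready.add(entry.getKey());
            }
        }
        Map<String, Integer> results = new LinkedHashMap<>();
        while (!ready.isEmpty()) {
            String vertex = ready.remove();
            List<Integer> args = new ArrayList<>();
            for (String upstream : inputs.get(vertex)) {
                args.add(results.get(upstream));
            }
            results.put(vertex, codelets.get(vertex).apply(args));
            for (String next : downstream.get(vertex)) {
                if (pending.merge(next, -1, Integer::sum) == 0) {
                    ready.add(next);
                }
            }
        }
        if (results.size() != codelets.size()) {
            throw new IllegalStateException("dataflow graph contains a cycle");
        }
        return results;
    }

    public static void main(String[] args) {
        DataflowSketch flow = new DataflowSketch();
        flow.addCodelet("two", in -> 2);
        flow.addCodelet("three", in -> 3);
        flow.addCodelet("sum", in -> in.get(0) + in.get(1));
        flow.addCodelet("square", in -> in.get(0) * in.get(0));
        flow.addEdge("two", "sum");
        flow.addEdge("three", "sum");
        flow.addEdge("sum", "square");
        System.out.println(flow.run()); // prints {two=2, three=3, sum=5, square=25}
    }
}
```

In a real deployment the server, not a local loop, would decide where and when each vertex runs. The sketch deliberately leaves out distribution, data handles, and failure handling.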

Under the Dapper model, the user no longer needs to worry about traditionally ad-hoc aspects of managing the cloud or grid, which include handling data interconnects and dependencies, recovering from errors, distributing code, and starting jobs. Perhaps more importantly, it provides an entire Java-based toolchain and runtime for framing nearly all coarse-grained distributed computations in a consistent format that allows for rapid deployment and easy conveyance to other researchers.

Features:

  • A code distribution system that allows the Dapper server to transmit requisite program code over the network and have clients dynamically load it. A consequence of this is that, barring external executables, updates to Dapper programs need only happen on the server-side.
  • A powerful subflow embedding method for dynamically modifying the dataflow graph at runtime.
  • A runtime in vanilla Java, a language that many are no doubt familiar with. Aside from the requirement of a recent JVM and optionally Graphviz Dot, Dapper is self-contained.
  • A robust control protocol. The Dapper server expects any number of clients to fail, at any time, and has customizable re-execution and timeout policies to cope. Consequently, one can start and stop (long-lived) clients without fear of putting the entire system into an inconsistent state.
  • Flexible semantics that allow data transfers via files or TCP streams.
  • Interoperability with firewalls. Since your local cloud or grid probably sits behind a firewall, we have devised special semantics for streaming data transfers.
  • Liberal licensing terms. Dapper is released under the LGPL to prevent contamination of your codebase.
  • Operation as an embedded application. A user manual describes the programming API that users can follow to run the Dapper server inside an application like Apache Tomcat.
  • Operation as a standalone user interface. With it, one can run off-the-shelf demos and learn core concepts from visual examples. By following a minimal set of conventions, one can then bundle one's own Dapper programs as execution archives, and then get realtime dataflow status and debugging feedback.
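
The re-execution and timeout policies mentioned in the feature list can be pictured with a generic retry loop. This is a sketch under stated assumptions, not Dapper's control protocol or configuration API: RetrySketch and runWithRetries are hypothetical names, and the "worker" is just a local Callable that fails its first two attempts.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative only: a hypothetical retry helper, NOT Dapper's control protocol.
class RetrySketch {

    // Runs the task up to maxAttempts times; an attempt that fails or exceeds
    // timeoutMillis is abandoned and the task is executed again.
    static <T> T runWithRetries(Callable<T> task, int maxAttempts, long timeoutMillis) {
        ExecutorService pool = Executors.newCachedThreadPool();
        try {
            Exception last = null;
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                Future<T> future = pool.submit(task);
                try {
                    return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
                } catch (TimeoutException e) {
                    future.cancel(true); // interrupt the attempt that ran too long
                    last = e;
                } catch (ExecutionException e) {
                    last = e; // the task itself threw; try again
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting", e);
                }
            }
            throw new IllegalStateException("all " + maxAttempts + " attempts failed", last);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        String result = runWithRetries(() -> {
            if (calls.incrementAndGet() < 3) {
                throw new IllegalStateException("simulated worker failure");
            }
            return "done";
        }, 5, 1000);
        System.out.println(result + " after " + calls.get() + " attempts"); // prints "done after 3 attempts"
    }
}
```

The attempt limit and timeout are plain parameters here. How Dapper itself exposes such policies is described in its user manual, not in this sketch.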

What is new in this release:

  • The ServerLogic#closeIdleClients method has been changed to better match the user's intuitive notion of idleness.
  • A user option for specifying the server's hostname has been added.
  • Networking internals have been reworked to use new APIs.
  • The build process has been updated to support both 32- and 64-bit Windows cross-compilation.
  • The dapper.* hierarchy has been renamed to org.dapper.*.

What is new in version 0.96:

  • Added the FlowListener abstraction, so that users may now associate metadata with dataflows and their nodes.
  • Fixed a memory leak in the Dapper server.
  • Added apiviz doclet tags so that relationships and dependencies among classes can be better visualized in the Javadoc.
  • Fixed a small bug in the BuildAndTest executable.
  • The build process is now fully integrated with Apache Ivy. The source distribution no longer ships with the SST. Instead, dependencies are automatically downloaded; failing that, one may download the SST source, compile it, and publish it to a local repository. See the user manual for more details.
  • Updated build process and removed redundant steps.
  • Normalized copyright and license notices in all files.
  • Java 1.6 is now required to build and run.
  • Changed the build process to use Apache Ivy, which means that external dependencies no longer have to be packaged with the SST main distribution.
  • Added a 'doxygen' target to the build process so that native components can be documented.

What is new in version 0.95:

  • A new, flexible logging infrastructure has been added.
  • Initializers for logging structures have been moved out of the Server and Client classes and into drivers.
  • Finite state machines have been updated to the new annotation-driven API.
  • The source code has been normalized to use 8 spaces for indentation instead of tabs.

What is new in version 0.94:

  • Command line options for the client and server are now available, courtesy of the Apache Commons CLI library.
  • The client process lifecycle is now defined as ending when a disconnect from the server happens.
  • Stem generation functionality has moved from being a member method of OutputHandleResource to being a static method of CodeletUtilities.
  • The FlowNodeFactory class is used in favor of direct instantiation of FlowNodes.
  • Building of native components has migrated to CMake.
  • Logging has migrated to SLF4J.
  • A README has been added to all distributions.

What is new in version 0.93:

  • Greatly improved pedagogical examples.
  • Updated manual.
  • Added convenience routines in dapper.codelet.CodeletUtilities for resource querying.
  • Removed Generator, FileEdge, FileBatchGenerator, FileBatchEdge, FileResource, and FileBatchResource. They have been replaced with the concept of abstract data handles in the form of HandleEdge, InputHandleResource, and OutputHandleResource. See the manual for changes.
  • Updated dapper.codelet.Resource to export input and output streams.

Requirements:

  • Java 2 Standard Edition Runtime Environment
