- Meandre is a semantic-driven, data-intensive workflow infrastructure for distributed computing. Users assemble modular components into complex workflows (flows) either visually, in a programming workbench, or with a scripting language called ZigZag.
- Workflows are built from components, which are either executable or control components. Executable components perform computational tasks when their input data is available; control components pause a workflow to wait for user interaction. Components are described semantically using ontologies, separating their functionality from their implementation.
- Data availability drives workflow execution in Meandre. When all of a component's required inputs are available, the component fires and produces outputs that become available to downstream components. This dataflow approach aims to make workflows transparent, intuitive, and reusable across heterogeneous computing environments.
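The firing rule described above can be sketched in a few lines of Python. This is an illustrative model of the dataflow semantics, not the Meandre API: a component runs only once every declared input port holds data.

```python
# Minimal sketch (not the Meandre API): a component "fires" only when
# every declared input port has received data, mirroring dataflow semantics.

class Component:
    def __init__(self, name, inputs, fn):
        self.name = name                        # component name
        self.pending = dict.fromkeys(inputs)    # input port -> latest value
        self.fn = fn                            # computation run on firing

    def push(self, port, value):
        """Deliver data to an input port; fire when all ports are filled."""
        self.pending[port] = value
        if all(v is not None for v in self.pending.values()):
            result = self.fn(**self.pending)
            self.pending = dict.fromkeys(self.pending)  # reset for next firing
            return result                       # output for downstream components
        return None                             # not ready yet: keep waiting

# A tiny two-input component: concatenates two text fragments.
concat = Component("concat", ["left", "right"],
                   lambda left, right: left + right)

assert concat.push("left", "Hello, ") is None   # one input only: no firing
out = concat.push("right", "Meandre!")          # both inputs ready: fires
assert out == "Hello, Meandre!"
```

In a real engine the returned output would be routed along flow connections to the input ports of downstream components, repeating the same availability check there.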
The presentation covers the SEASR project and its Meandre infrastructure, sponsored by The Andrew W. Mellon Foundation. Meandre combines a dataflow execution paradigm with a semantic-web-driven approach so that modular, reusable components can be assembled into computational flows. It provides a service-oriented architecture and uses semantic-web concepts such as RDF to describe components and flows in a machine-readable way, enabling discovery, sharing, and dynamic execution across heterogeneous systems, from laptops to HPC clusters. Components have inputs, outputs, and properties, and are connected into flows that complete complex tasks. Meandre includes tools such as a visual programming workbench and the ZigZag scripting language for assembling flows from published components.
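The idea behind machine-readable component descriptors can be illustrated with a small sketch. Meandre actually uses RDF for this; here plain Python dicts stand in for the triples, and the URI and field names are invented for the example. The point is that discovery works against the declared interface, never the implementation.

```python
# Hypothetical sketch of a machine-readable component descriptor.
# Meandre uses RDF; plain dicts stand in for triples here, and the
# URI and field names below are invented for illustration.

to_upper = {
    "uri": "meandre://example.org/component/to-upper",  # invented URI
    "inputs": [{"name": "text", "type": "string"}],
    "outputs": [{"name": "text", "type": "string"}],
    "properties": {"encoding": "UTF-8"},
}

def find_components(repository, input_type, output_type):
    """Discover components by matching declared input/output types,
    without inspecting any implementation code."""
    return [
        c for c in repository
        if any(i["type"] == input_type for i in c["inputs"])
        and any(o["type"] == output_type for o in c["outputs"])
    ]

matches = find_components([to_upper], "string", "string")
assert [c["uri"] for c in matches] == [to_upper["uri"]]
```

Because the descriptor carries everything a tool needs to wire components together, a workbench or the ZigZag compiler can assemble flows from published components it has never seen before.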
Meandre: Semantic-Driven Data-Intensive Flows in the Clouds
1. Meandre: Semantic-Driven Data-Intensive Flows in the Clouds
Xavier Llorà
National Center for Supercomputing Applications
University of Illinois at Urbana-Champaign
xllora@illinois.edu
The SEASR project and its Meandre infrastructure are sponsored by The Andrew W. Mellon Foundation
3. SEASR: Design Goals
• Transparency
– From a single laptop to a HPC cluster
– Not bound to a particular computation fabric
– Allow heterogeneous development
• Intuitive programming paradigm
– Modular Components assembled into Flows
– Foster Collaboration and Sharing
• Open Source
• Service-Oriented Architecture (SOA)
4. Meandre: Infrastructure
• SEASR/Meandre Infrastructure:
– Dataflow execution paradigm
– Semantic-web driven
– Web oriented
– Supports publishing services
– Promotes reuse, sharing, and collaboration
5. Meandre: Data Driven Execution
• Execution Paradigms
– Conventional programs perform computational tasks by
executing a sequence of instructions.
– Data driven execution revolves around the idea of
applying transformation operations to a flow or stream
of data when it is available.
6. Meandre: Dataflow Example
[Diagram: Value1 and Value2 feed into a Sum component]
7. Meandre: Dataflow Example
• Dataflow Addition Example
– Logical operation ‘+’
– Requires two inputs (Value1, Value2)
– Produces one output (Sum)
• When the two inputs are available
– The logical operation can be performed
– The sum is output
• When the output is produced
– Internal values are reset
– The component waits for two new input values to become available
8. Meandre: The Dataflow Component
• Data dictates component execution semantics
[Diagram: a component with inputs (left), outputs (right), and properties (P); an RDF descriptor of its behavior accompanies the component implementation]
9. Meandre: Data Driven Execution
• Dataflow Approach
– May have zero to many inputs
– May have zero to many outputs
– Performs a logical operation when data is available
• The component defines its firing policy
10. Meandre: Component Metadata
• Describes a component
• Separates:
– Component semantics (black box)
– Component implementation (Java, Python, Lisp)
• Provides a unified framework:
– Basic building blocks or units (components)
– Complex tasks (flows)
– Standardized metadata
11. Meandre: Semantic Web Concepts
• Relies on the Resource Description Framework (RDF)
• Provides a common framework to share and reuse data
across application, enterprise, and community boundaries
• Focuses on common formats for integration and combination
of data drawn from diverse sources
• Pays special attention to the language used for recording how
the data relates to real world objects
• Allows navigation to sets of data resources that are
semantically connected.
12. Meandre: Metadata Ontologies
• Meandre's metadata relies on three ontologies:
– The RDF ontology serves as a base for defining
Meandre descriptors
– The Dublin Core Elements ontology provides basic
publishing and descriptive capabilities in the description
of Meandre descriptors
– The Meandre ontology describes a set of relationships
that model valid components, as understood by the
Meandre execution engine architecture
13. Meandre: The Dataflow Component
• Data dictates component execution semantics
[Diagram: a component with inputs (left), outputs (right), and properties (P); an RDF descriptor of its behavior accompanies the component implementation]
14. Meandre: Components Types
• Components are the basic building block of any
computational task.
• There are two kinds of Meandre components:
– Executable components
• Perform computational tasks that require no human
interaction during runtime
• Processes are initialized during flow startup and are fired
in accordance with the policies defined for them
– Control components
• Used to pause dataflow during user interaction cycles
• The WebUI may be an HTML form, applet, or other user interface
15. Wrapping With Components
• A component provides inputs, outputs, and properties
• Your code can run inside a component, or be called from:
– A WS front end
– An interactive application
– Request/response cycles
16. Meandre: Flow (Complex Tasks)
• A flow is a collection of connected components
[Diagram: Read, Get, Merge, Do, and Show components connected into a dataflow execution]
17. Meandre: Programming Paradigm
• The programming paradigm creates complex
tasks by linking together specialized
components. Meandre's publishing mechanism
allows components developed by third parties to be
assembled into new flows.
• There are two ways to develop flows:
– Meandre’s Workbench visual programming tool
– Meandre’s ZigZag scripting language
18. Meandre: Workbench Existing Flow
[Screenshot: the Workbench showing Components, Flows, and Locations panels around an existing flow]
19. Meandre: ZigZag Script Language
• ZigZag is a simple language for describing data-intensive
flows
– Modeled on Python for simplicity
– ZigZag is a declarative language for expressing the
directed graphs that describe flows
• Command-line tools allow ZigZag files to be compiled
and executed
– A compiler is provided to transform a ZigZag program
(.zz) into a Meandre archive unit (.mau)
– MAUs can then be executed by a Meandre engine
20. Meandre: ZigZag Script Language
• ZigZag code that represents example flow:
#
# Imports the three required components and creates the component aliases
#
import <http://localhost:1714/public/services/demo_repository.rdf>
alias <http://test.org/component/push_string> as PUSH
alias <http://test.org/component/concatenate-strings> as CONCAT
alias <http://test.org/component/print-object> as PRINT
#
# Creates four instances for the flow
#
push_hello, push_world, concat, print = PUSH(), PUSH(), CONCAT(), PRINT()
#
# Sets up the properties of the instances
#
push_hello.message, push_world.message = "Hello ", "world!"
#
# Describes the data-intensive flow
#
@phres, @pwres = push_hello(), push_world()
@cres = concat( string_one: phres.string; string_two: pwres.string )
print( object: cres.concatenated_string )
#
21. Meandre: ZigZag Script Language
• Automatic Parallelization
– Multiple instances of a component can be run in parallel to boost
throughput
– A specialized operator in ZigZag causes multiple
instances of a given component to be used
• Consider the simple flow shown in the diagram
• The dataflow declaration would look like:
#
# Describes the data-intensive flow
#
@pu = push()
@pt = pass( string:pu.string )
print( object:pt.string )
22. Meandre: ZigZag Script Language
• Automatic Parallelization
– Adding the operator [+AUTO] to the middle component:
# Describes the data-intensive flow
#
@pu = push()
@pt = pass( string:pu.string ) [+AUTO]
print( object:pt.string )
– [+AUTO] tells the ZigZag compiler to parallelize the “pass”
component instance by the number of cores available on the
system.
– [+AUTO] may also be written [+N], where N is a numeric
value, for example [+10].
23. Meandre: ZigZag Script Language
• Automatic Parallelization
– Adding the operator [+4] would result in a directed graph

# Describes the data-intensive flow
#
@pu = push()
@pt = pass( string:pu.string ) [+4]
print( object:pt.string )

# Describes the data-intensive flow
#
@pu = push()
@pt = pass( string:pu.string ) [+4!]
print( object:pt.string )
24. Meandre: Flows to MAU
• Flows can be executed using their RDF
descriptors
• Flows can be compiled into MAU
• A MAU is:
– Self-contained representation
– Ready for execution
– Portable
– The base of flow execution in grid environments
25. And Behind The Scenes?
• Architecture designed to scale
• Infrastructure
– Laptop
– Server
– Cluster
• Tools
– Talk to the infrastructure
– Workbench, ZigZag
26. Meandre: The Architecture
• The design of the Meandre architecture follows
three directives:
– provide a robust and transparent scalable solution from
a laptop to large-scale clusters
– create a unified solution for batch and interactive tasks
– encourage reusing and sharing components
• To ensure such goals, the designed architecture
relies on four stacked layers and builds on top of
service-oriented architectures (SOA)
27. Meandre: Basic Single Server
28. Meandre MDX: Cloud Computing
• Servers can be
– instantiated on demand
– disposed of when done or on demand
• A cluster is formed by at least one server
• The Meandre Distributed Exchange (MDX)
– Orchestrates operational integrity by managing cluster
configuration and membership using a shared database
resource.
29. Meandre MDX: The Picture
[Diagram: Meandre servers connected through the MDX Backbone]
30. Meandre MDX: The Architecture
• Virtualization infrastructure
– Provides uniform access to the underlying execution
environment. It relies on machine virtualization and
on Java for hardware abstraction.
• IO standardization
– A unified layer provides access to shared data stores,
distributed file-system, specialized metadata stores,
and access to other service-oriented architecture
gateways.
31. Meandre MDX: The Architecture
• Data-intensive flow infrastructure
– Provides the basic Meandre execution engine for
data-intensive flows, component repositories and discovery
mechanisms, extensible plugins, and web user
interfaces (webUIs).
• Interaction layer
– Can provide self-contained applications via webUIs,
create plugins for third-party services, interact with the
embedding application that relies on the Meandre
engine, or provide services to the cloud.
32. Meandre: Semantic-Driven Data-Intensive Flows in the Clouds
Xavier Llorà
National Center for Supercomputing Applications
University of Illinois at Urbana-Champaign
xllora@illinois.edu