A quick overview of the seed for the Meandre 2.0 series. It covers the main motivations moving forward and the disruptive changes introduced by the adoption of Scala and MongoDB.
The document discusses how Acquia Search improves upon Drupal's core search capabilities. It provides an overview of Acquia Search, highlighting features like faceted navigation, content recommendations, sorting results by relevance, and searching across multiple sites. Performance tests show Acquia Search returns results 3 to 16 times faster than Drupal core search. The document also outlines how Acquia Search works and is packaged and priced and compares managing Solr yourself versus using Acquia Search.
CloudStack provides an orchestration platform that abstracts physical network resources and allows third party plugins to integrate their networking services. It separates orchestration from actual provisioning, with CloudStack only handling orchestration events and notifications, while provisioning is handled by plugins. This allows services to scale independently of CloudStack. CloudStack defines common concepts like Networks, but plugins determine how these map to physical networks through interfaces like NetworkGuru. This architecture enables innovation from partners through well-defined plugin APIs and abstraction layers.
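The orchestration/provisioning split described above can be sketched in a few lines. This is a hypothetical Python illustration (the class and method names loosely echo CloudStack's NetworkGuru interface but are invented here): the orchestrator only drives lifecycle steps, while the plugin decides how a logical network maps to physical resources.

```python
from abc import ABC, abstractmethod

class NetworkGuru(ABC):
    """Plugin interface: maps a logical network to physical resources."""
    @abstractmethod
    def design(self, network_name: str) -> dict: ...
    @abstractmethod
    def implement(self, design: dict) -> str: ...

class VlanGuru(NetworkGuru):
    """One possible plugin: backs each logical network with a VLAN."""
    def __init__(self):
        self._next_vlan = 100
    def design(self, network_name):
        vlan, self._next_vlan = self._next_vlan, self._next_vlan + 1
        return {"network": network_name, "vlan": vlan}
    def implement(self, design):
        # Real provisioning would talk to switches/hypervisors here.
        return f"net {design['network']} on VLAN {design['vlan']}"

class Orchestrator:
    """Knows nothing about VLANs; only sequences plugin lifecycle events."""
    def __init__(self, guru: NetworkGuru):
        self.guru = guru
    def create_network(self, name: str) -> str:
        return self.guru.implement(self.guru.design(name))

orch = Orchestrator(VlanGuru())
print(orch.create_network("tenant-a"))  # net tenant-a on VLAN 100
```

Because the orchestrator holds only the abstract interface, a different plugin (say, an SDN controller) can be swapped in without touching orchestration code, which is the scaling property the talk emphasizes.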
1. The document provides an overview of Windows Azure offerings including Compute, Storage, SQL Azure, Virtual Network, AppFabric, and Marketplace.
2. It discusses the "7 Deadly Sins of Cloud Development", including underutilization of cloud resources, platform monogamy, poorly defined release cadence, always-connected assumptions, synchronous application design, lack of load/failover testing, and lack of cloud reading.
3. The document includes demos of various Windows Azure features to illustrate how to avoid the sins.
Service-Oriented Design and Implementation with Rails 3 - Wen-Tien Chang
The document describes a RESTful Users web service implemented with Rails 3. It customizes Rails to remove unnecessary components and optimize for a lightweight REST service. The service follows best practices for API design, including using the JSON format, placing JSON conversion in the controller, and returning appropriate HTTP status codes. Requests are designed to be stateless and atomic. Errors are returned in a standardized JSON format.
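A minimal sketch of two of the practices mentioned (standardized JSON error bodies and appropriate HTTP status codes), written in Python rather than the talk's actual Rails code, with invented handler and field names:

```python
import json

def json_response(status, payload):
    """Pair an HTTP status code with a JSON-encoded body."""
    return status, json.dumps(payload)

def create_user(params):
    if "email" not in params:
        # 422 Unprocessable Entity with a standardized error envelope
        return json_response(422, {"error": {"code": "missing_field",
                                             "field": "email"}})
    # 201 Created on success; the handler is stateless and atomic
    return json_response(201, {"user": {"email": params["email"]}})

status, body = create_user({"email": "ada@example.com"})
print(status, body)  # 201 {"user": {"email": "ada@example.com"}}
```

The point is that every outcome, success or failure, returns the same machine-readable shape, so API clients never have to parse free-form error text.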
Netflix on Cloud - combined slides for Dev and Ops - Adrian Cockcroft
This document contains slides from a presentation given by Adrian Cockcroft on Netflix's use of cloud computing on Amazon Web Services (AWS). The summary includes:
1) Netflix moved most of its infrastructure to AWS to leverage AWS's scale and features rather than building its own datacenters, as capacity growth was unpredictable and datacenters were inflexible.
2) Netflix uses many AWS services including EC2, S3, EBS, EMR and more. It deployed a large movie encoding farm on EC2, stores content on S3, uses EMR/Hadoop for log analysis, and a CDN for content delivery.
3) Netflix has learned that cloud tools don't always scale for large
This document provides an overview of a tutorial on Apache CloudStack. It outlines 3 sessions on introducing CloudStack, its architecture, and hands-on with DevCloud. Session 1 defines cloud computing and introduces CloudStack as an open-source orchestration platform for delivering infrastructure as a service clouds. It describes CloudStack's history and how to contribute to the project.
Growing in the Wild. The story by CUBRID Database Developers - CUBRID
The presentation the CUBRID team gave at the Russian Internet Technologies Conference in 2012. It covers questions such as *WHY* CUBRID was developed, *WHY* the developers did not fork existing solutions, *WHY* it was necessary to develop a new RDBMS from scratch, and *HOW* the CUBRID Database evolved over the years.
Windows Phone 7 and Windows Azure – A Match Made in the Cloud - Michael Collier
Windows Phone 7 and Windows Azure are a good match because they both provide easy and familiar development environments, connectivity through the cloud, and scalability. They are compatible in these areas. The document discusses how Windows Phone 7 and Windows Azure can be used together through features like data storage in Windows Azure tables and blobs, push notifications, and identity management with Access Control Services. It provides examples of how to integrate the platforms for storing, retrieving, and displaying data stored in the cloud.
Session presented at the 2nd IndicThreads.com Conference on Cloud Computing held in Pune, India on 3-4 June 2011.
http://CloudComputing.IndicThreads.com
Abstract: “With increasing demand, ever-growing datasets, unpredictable traffic patterns, and the need for faster response times, 'scalable architecture' has become a necessity. Here, we will see how the traditional concepts and best practices for scalability have to be adapted for the cloud. Further, we will go through the unique advantages that the Amazon AWS cloud offers for architecting scalable applications. As an architect, you need to identify the components and bottlenecks in your architecture and modify your application to leverage the underlying scalability.
We will cover the following topics:
Scalability principles for the cloud
Leveraging AWS services for application components
Shared nothing architecture
Asynchronous work queues for loosely coupled applications
Database scalability
Tools, connectors and enablers to help build, deploy and monitor your cloud environment
Scalability using Platform-as-a-Service offerings on top of AWS
An example of a horizontally scalable architecture for an enterprise application on Amazon AWS
This talk will act as a primer for a cloud architect to achieve an auto-scalable, highly available, fully-monitored edge-cached application.”
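The "asynchronous work queues for loosely coupled applications" item above can be illustrated with a minimal sketch. This is a hedged Python example: `queue.Queue` stands in for a managed queue such as Amazon SQS, and the job payloads are invented.

```python
import queue
import threading

jobs = queue.Queue()
results = []

def worker():
    """Drains the queue independently of whoever produced the jobs."""
    while True:
        job = jobs.get()
        if job is None:          # sentinel: shut the worker down
            break
        results.append(job * 2)  # placeholder for real work
        jobs.task_done()

t = threading.Thread(target=worker)
t.start()
for n in (1, 2, 3):
    jobs.put(n)                  # producer returns immediately
jobs.put(None)
t.join()
print(sorted(results))           # [2, 4, 6]
```

The producer never waits on the worker, so either side can be scaled (or fail and retry) independently, which is exactly the loose coupling the talk advocates.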
Speaker:
Kalpak Shah is the Founder & CEO of Clogeny Technologies Pvt. Ltd. and guides the overall strategic direction of the company. Clogeny is focused on niche software and product development in cloud computing and scalable applications domains. He is passionate about the ground-breaking economics and technology afforded by the cloud computing platforms. He has been leading and architecting cutting-edge product development across the cloud stack including IaaS, PaaS and SaaS vendors.
He has previously worked at organizations like Sun Microsystems and Symantec in the storage domain, primarily on distributed and disk filesystems. Kalpak has a Bachelor of Engineering degree in computer engineering from PICT, University of Pune.
Node.js (Node), the brainchild of Ryan Dahl, was released in 2009 while he worked for Joyent, Inc. Node is one of the most hyped technologies to arrive on the web development scene, though it is also one of the most misunderstood.
So what is Node? Is it a programming language like Python, Java, or C++? Is it an application framework like Django, Rails, or Symfony? Is it maybe some type of middleware that can be plugged into existing web stacks, like Memcached or RabbitMQ? Actually, it is none of the above. Node is simply a set of JavaScript language bindings to Google's powerful V8 engine. This begs the question: "what is a language binding, and what is V8?"
This presentation introduces Node from an architectural perspective by discussing its implementation, followed by a practical demonstration of how to build an application with it through a real-world example. Michael Filbin of Aspenware explains how Ryan liberated JavaScript from the browser and brought the power of event-driven, non-blocking programming to every developer by using the world's most popular programming language.
CloudStack is an open source cloud computing platform that allows users to manage their infrastructure as an automated system. It provides self-service access to computing resources like servers, storage, and networking via a web interface. CloudStack supports multiple hypervisors and public/private cloud deployment strategies. The core components include hosts, primary storage, clusters, pods, networks, secondary storage, and zones which are managed by CloudStack servers.
Cloud Computing from an Entrepreneur's Viewpoint - J Singh
Cloud computing allows users to access computing resources like servers and storage over the internet. It provides on-demand self-service, ubiquitous network access, resource pooling and rapid elasticity. Companies can start small without large capital expenditures and scale easily. Major players in cloud computing include Amazon, Google, Microsoft and IBM. Amazon EC2 allows users to launch virtual machines while S3 provides storage services. Google App Engine uses a virtual operating system and datastore for applications. Cloud computing enables massive parallelism for data-intensive tasks.
- CloudStack is an open source cloud computing platform that was donated to the Apache Software Foundation in 2012. It provides infrastructure as a service and supports various hypervisors and physical hardware.
- CloudStack has a scalable architecture designed to support thousands of hosts and VMs across multiple availability zones. It provides rich networking and storage capabilities.
- CloudStack can support both traditional server virtualization workloads as well as "Amazon-style" workloads with software defined networks and object storage.
- The CloudStack community is growing rapidly and encourages participation through mailing lists, IRC, forums and meetup groups.
JUDCon London 2011 - Elastic SOA on the Cloud, Steve Millidge - C2B2 Consulting
This document discusses deploying JBoss Enterprise Service Bus (ESB) for elasticity on the cloud. It provides an overview of service-oriented architecture (SOA) and cloud computing. The key benefits of SOA on the cloud are cloud integration and cloud bursting. Challenges to elasticity include networking, service discovery, clustering, and monitoring. JBoss Operations Network (RHQ) can help with elasticity by monitoring metrics, alerting on thresholds, and triggering the addition of cloud instances through APIs when needed. Local JMS queues provide advantages for elasticity over clustered JMS.
NoSQL and SQL - Why Choose? Enjoy the best of both worlds with MySQL - Andrew Morgan
There's a lot of excitement around NoSQL data stores, with the promise of simple access patterns, flexible schemas, scalability, and high availability. The downside comes in the form of losing ACID transactions, consistency, flexible queries, and data integrity checks. What if you could have the best of both worlds? This session shows how MySQL Cluster provides simultaneous SQL and native NoSQL access to your data, whether through a simple key-value API (Memcached), REST, JavaScript, Java, or C++. You will hear how the MySQL Cluster architecture delivers in-memory real-time performance, 99.999% availability, on-line maintenance, and linear, horizontal scalability through transparent auto-sharding.
The document summarizes CloudStack architecture plans for the future. It discusses moving to management server clusters per availability zone rather than per region. It also discusses using an object storage system for templates and snapshots rather than a separate NFS server. Finally, it discusses a possible future model where CloudStack manages existing virtualization clusters rather than deploying and managing its own system VMs.
This talk lays out the elements of an extension, including the content model, JS API, Web Scripts, Content Policies, Action Executors, and more. It draws on years of experience delivering extensions to various projects.
There is a code sample on GitHub: https://github.com/rmknightstar/devcon2018
You can see the presentation as given at the Alfresco Developer Conference here: https://youtu.be/CKRswhh-jHE?list=PLyJdWuUHM3igOUt49uiFqs-6DCQAgJ1vs&t=0
Visualizing content in metadata stores - Xavier Llorà
This document describes a system for visualizing content stored in metadata stores. It includes a query abstraction layer that separates querying logic from application logic and allows querying metadata stores via web services. Query results are transformed and then visualized using tools like Prefuse, JFreeChart and Swing. The overall system includes components for metadata storage, query transport, query abstraction, result transformation, and visualization, with the prototype called Gview wrapping these components together.
From Galapagos to Twitter: Darwin, Natural Selection, and Web 2.0 - Xavier Llorà
One hundred and fifty years have passed since the publication of Darwin's world-changing manuscript "On the Origin of Species by Means of Natural Selection". Darwin's ideas have proven their power to reach beyond the biology realm, and their ability to define a conceptual framework that allows us to model and understand complex systems. In the mid 1950s and 60s, the efforts of a scattered group of engineers proved the benefits of adopting an evolutionary paradigm to solve complex real-world problems. In the 70s, the emerging presence of computers brought us a new collection of artificial evolution paradigms, among which genetic algorithms rapidly gained widespread adoption. Currently, the Internet has fostered an exponential growth of information and computational resources that is clearly disrupting our perception and forcing us to reevaluate the boundaries between technology and social interaction. Darwin's ideas can, once again, help us understand such disruptive change. In this talk, I will review the origin of artificial evolution ideas and techniques. I will also show how these techniques are nowadays helping to solve a wide range of applications, from life science problems to Twitter puzzles, and how high-performance computing can make Darwin's ideas a routine tool to help us model and understand complex systems.
A hierarchical security framework for defending against sophisticated attacks... - redpel dot com
A hierarchical security framework for defending against sophisticated attacks on wireless sensor networks in smart cities
Large Scale Data Mining using Genetics-Based Machine Learning - Xavier Llorà
We are living in the petabyte era. We have larger and larger data to analyze, process, and transform into useful answers for the domain experts. Robust data mining tools, able to cope with petascale volumes and/or high dimensionality while producing human-understandable solutions, are key in several domain areas. Genetics-based machine learning (GBML) techniques are perfect candidates for this task, among others, due to the recent advances in representations, learning paradigms, and theoretical modeling. If evolutionary learning techniques aspire to be a relevant player in this context, they need to have the capacity to process these vast amounts of data, and they need to process this data within a reasonable time. Moreover, massive computation cycles are getting cheaper every day, giving researchers access to unprecedented degrees of parallelization. Several topics are interlaced in these two requirements: (1) having the proper learning paradigms and knowledge representations, (2) understanding them and knowing when they are suitable for the problem at hand, (3) using efficiency enhancement techniques, and (4) transforming and visualizing the produced solutions to give back as much insight as possible to the domain experts are a few of them.
This tutorial will try to answer these questions, following a roadmap that starts with what "large" means and why large is a challenge for GBML methods. Afterwards, we will discuss the different facets in which we can overcome this challenge: efficiency enhancement techniques, representations able to cope with high-dimensionality spaces, and scalability of learning paradigms. We will also review a topic interlaced with all of them: how we can model the scalability of the components of our GBML systems to better engineer them and get the best performance out of them on large datasets. The roadmap continues with examples of real applications of GBML systems and finishes with an analysis of further directions.
QCon London 2011 - Matthew Wall - Why I chose MongoDB for guardian.co.uk - Roger Xia
The document discusses the author's experience with database selection and migration for the guardian.co.uk website, describing how they evolved from a relational database to a NoSQL database (MongoDB) to address scalability and flexibility issues. It outlines the benefits of MongoDB for the author's use case, including flexible schema, simple data model mapping, and ease of queries. MongoDB was selected to build a new identity management system after issues were encountered with scaling the existing relational database solution.
MongoDB provides high performance for write-intensive workloads, with the ability to handle 80,000 inserts per second on a single node. It offers easy replication and high availability. For large datasets, MongoDB scales horizontally using a sharding architecture with query routers, config servers, and shards. MongoDB ensures data security using encryption, access control, and authentication features.
There is high demand for companies to publish and promote their content on the web. To accommodate this demand, Alfresco has provided a number of solutions covering editorial to web tier. As an example of this demand, Ixxus was commissioned by a leading business information publisher to produce a microsite for 'teaser' content to increase subscriptions to their main site. To deliver this, Ixxus utilized a number of features provided by Alfresco, such as the Transfer Service, the web scripts framework, and Surf. The majority of these features now make up the mainstay of Alfresco's Web Quick Start WCM solution. The goal of this session is to demonstrate a real-world example of how the combination of Alfresco, Surf, and CMIS offers a great platform for developers to produce content-rich websites quickly. The session will cover: Using Spring Roo to construct a Surf application, Benefits of using Spring Surf, Using the Transfer Service, OpenCMIS in Surf, Varnish your Surf application, and What's next.
Basics. Webinar 6: Production deployment - MongoDB
This is the last webinar in the Basics series, which provides an introduction to the MongoDB database. In this webinar we walk you through a production deployment.
MongoDB-as-a-Service on Pivotal Cloud Foundry - VMware Tanzu
SpringOne Platform 2016
Speakers: Mallika Iyer; Principal Software Engineer, Pivotal & Sam Weaver; Product Manager, MongoDB
Providing your organization with multiple data services on a platform like Pivotal Cloud Foundry is very powerful: it increases the agility of the organization as a whole when developers can provision data services on demand, all of it completely transparent to the system operators. This session will give a very brief overview of Pivotal Cloud Foundry and will then deep-dive into running MongoDB as a managed service on this platform. The MongoDB service for Pivotal Cloud Foundry leverages the capabilities of BOSH 2.0 for on-demand dynamic provisioning of services while maintaining an integration with MongoDB's Cloud Ops Manager, providing the best of both Pivotal Cloud Foundry and MongoDB.
This presentation covers best practices for running MongoDB on AWS. We also discuss how to utilize the automation features of MMS to spin up new clusters in minutes on AWS.
Scalable Event Analytics with MongoDB & Ruby on Rails - Jared Rosoff
The document discusses scaling event analytics applications using Ruby on Rails and MongoDB. It describes how the author's startup initially used a standard Rails architecture with MySQL, but ran into performance bottlenecks. It then explores solutions like replication, sharding, key-value stores and Hadoop, but notes the development challenges with each approach. The document concludes that using MongoDB provided scalable writes, flexible reporting and the ability to keep the application as "just a Rails app". MongoDB's sharding allows scaling to high concurrency levels with increased storage and transaction capacity across shards.
This document provides an overview of MongoDB sharding. It begins with definitions of key terms like shards, chunks, config servers, and mongos. It explains how MongoDB partitions and distributes data across shards. The roles of config servers and mongos routers are outlined. Guidelines for choosing a shard key are presented, emphasizing characteristics like cardinality, write distribution, and query isolation. Best practices for setting up and using MongoDB sharding are also covered.
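As an illustration of why high-cardinality shard keys distribute writes well, here is a hedged Python sketch of hashed routing. It is not MongoDB's actual implementation; the hash function and shard count are arbitrary choices for the example.

```python
import hashlib

def shard_for(key, num_shards):
    """Route a document deterministically by hashing its shard key."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % num_shards

# Simulate 1000 inserts keyed by a high-cardinality field.
counts = [0, 0, 0, 0]
for i in range(1000):
    counts[shard_for(f"user{i}", 4)] += 1
print(counts)  # four roughly equal buckets summing to 1000
```

The same hash-based routing is what a query router (mongos) conceptually does with the config servers' chunk map; a low-cardinality or monotonically increasing key would instead concentrate writes on one shard.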
This document discusses MongoDB and provides information on why it is useful, how it works, and best practices. Specifically, it notes that MongoDB is a noSQL database that is easy to use, scalable, and supports high performance and availability. It is well-suited for flexible schemas, embedded documents, and complex relationships. The document also covers topics like BSON, CRUD operations, indexing, map reduce, transactions, replication, and sharding in MongoDB.
Benchmarking, Load Testing, and Preventing Terrible Disasters - MongoDB
"Have you ever crossed your fingers before performing an upgrade or switching storage engines, because you weren't quite sure what would happen? Have you ever been bitten by a slight change in behavior that turned out to be unexpectedly significant for your workload? At Parse we have developed a workflow that lets us repeatedly capture and replay real production workloads offline. This has allowed us to confidently perform upgrades across a large fleet with a minimum amount of canarying, and has helped us load test a variety of storage engines with real workloads so we can compare and understand the performance tradeoffs.
In this talk we will cover best practices for upgrades and migrations, and we will walk through how to use our open-sourced tooling to demonstrate how you can do the same. We will also share some fun war stories about various disasters found and averted *before* putting them into production thanks to offline benchmarking."
Life in a Queue - Using Message Queue with django - Tareque Hossain
A brief introduction to message queues and why they are relevant in web applications
How to tell if your web application could benefit from a message queue
Common examples of tasks that could benefit from message queues
Choosing a broker/protocol
What broker/protocol PBS Education chose and why
Message queue solution architecture
Brief introduction on celery/carrot
Writing a message queue task using celery
How to invoke a message queue task
What happens when you invoke a task (walk through architecture)
How to write tasks efficiently
What are the things that are good to know when writing tasks (things we experienced at PBS Education)
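The invoke-versus-execute split that the outline walks through can be sketched without celery itself. The decorator below is a toy stand-in (not celery's API): `delay()` only enqueues the task, and a separate worker loop executes it later.

```python
import queue

broker = queue.Queue()  # stands in for RabbitMQ or another broker

def task(fn):
    """Toy celery-like decorator: delay() publishes instead of calling."""
    def delay(*args):
        broker.put((fn, args))   # a real broker would serialize this
    fn.delay = delay
    return fn

@task
def send_email(address):
    return f"sent to {address}"

send_email.delay("user@example.com")   # returns immediately, nothing sent yet

# Worker loop (normally a separate celery worker process):
outcomes = []
while not broker.empty():
    fn, args = broker.get()
    outcomes.append(fn(*args))
print(outcomes)  # ['sent to user@example.com']
```

The web request only pays the cost of the enqueue; the slow work (sending email, generating reports) happens later in the worker, which is the core benefit the talk describes.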
Building Node.js based APIs in minutes. Achieve full-stack JavaScript, Offline Sync, Geolocation, REST API / JSON, ORM and API Management in open source. Write your own connectors, work on express.js. Create MEAN stack applications connecting Angular to Node to MongoDB. Presented at the Connect.js conference in Atlanta
At Scale With Style (Erlang User Conference 2012) - Wooga
In the world of social gaming, the classic 2-tier of web application does not cut it anymore. We need new and better solutions.
Follow along the evolution of game servers at Wooga and get an in-depth look into the next-generation backend putting the combined forces of Erlang and Ruby to work. Learn how scalability, reliability, concurrency control and beautiful code do not need to be mutually exclusive.
The document discusses NoSQL databases and MongoDB, describing NoSQL as a non-relational, schema-less data storage engine that is easy to scale, lightweight, and robust. It provides an overview of MongoDB, explaining that it bridges the gap between key-value stores and SQL databases by being fast, scalable, and supporting rich functionality like map-reduce. Several myths about NoSQL are debunked, and Python ORMs for MongoDB are referenced.
Liberty: The Right Fit for Micro Profile? - Dev_Events
Kevin Sutter, Senior Technical Staff Member, IBM @kwsutter
Alasdair Nottingham, Websphere Runtime Architect, IBM @notatibm
The move to microservices is well under way, but has enterprise Java adapted to these new realities? Although some argue that enterprise Java is irrelevant, many of its tried-and-proven APIs are highly applicable to microservice architectures. And the need for new APIs to address challenges inherent in highly distributed microservices is clear. The recent announcement of the Micro Profile initiative (microprofile.io) to define new application server portable APIs means that these needs will be addressed. This session explores what Micro Profile is, how it can help with microservices, and how WebSphere Liberty's à la carte approach to Java EE can help enable microservices, using the new Micro Profile, and demos Liberty plus the microProfile-1.0 feature.
MongoSF 2011 - Using MongoDB for IGN's Social Platform - Manish Pandit
IGN uses MongoDB to power its social platform, which receives 30M API calls and 7M activities daily. MongoDB is used to store activity streams, comments, notifications, and other social data. Some challenges include large amounts of data, sorting activities in reverse chronological order, and filtering activities. Caching activity streams in Memcached improved performance. Monitoring, backups, and tools like MMS are used to manage the MongoDB deployment. Future plans include moving more data to MongoDB and sharding relationships across servers.
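The Memcached optimization mentioned above is the classic cache-aside pattern. A hedged Python sketch, with a dict standing in for Memcached and invented data, shows why repeated reads stop hitting the database:

```python
cache = {}                     # stands in for Memcached
db_reads = {"count": 0}        # instrumentation for the example

def load_stream_from_db(user_id):
    """Pretend database query for a user's activity stream."""
    db_reads["count"] += 1
    return [f"activity-{user_id}-{n}" for n in range(3)]

def get_activity_stream(user_id):
    if user_id in cache:
        return cache[user_id]          # cache hit: no DB round trip
    stream = load_stream_from_db(user_id)
    cache[user_id] = stream            # warm the cache for next time
    return stream

get_activity_stream(42)
get_activity_stream(42)
print(db_reads["count"])  # 1 -- the second call was served from cache
```

A real deployment would add expiry and invalidation when new activities arrive, but the read-path saving is the same.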
Spring Framework provides a comprehensive infrastructure to develop Java applications. It handles dependency injection and inversion of control so developers can focus on domain logic rather than plumbing. Spring promotes best practices like clean code, test-driven development, and design patterns. It includes aspects for cross-cutting concerns and supports web development through frameworks like Spring MVC. The document introduces Spring's lightweight and modular IoC container and AOP capabilities for separating concerns.
Data-Intensive Computing for Competent Genetic Algorithms: A Pilot Study us... - Xavier Llorà
Data-intensive computing has positioned itself as a valuable programming paradigm to efficiently approach problems requiring processing very large volumes of data. This paper presents a pilot study about how to apply the data-intensive computing paradigm to evolutionary computation algorithms. Two representative cases (selectorecombinative genetic algorithms and estimation of distribution algorithms) are presented, analyzed, and discussed. This study shows that equivalent data-intensive computing evolutionary computation algorithms can be easily developed, providing robust and scalable algorithms for the multicore-computing era. Experimental results show how such algorithms scale with the number of available cores without further modification.
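The map-style structure the paper exploits can be sketched with a tiny selectorecombinative GA on OneMax. This is a hedged illustration (the problem and parameters are invented, not the paper's experiments): fitness evaluation is a pure map over the population, so `multiprocessing.Pool.map` could replace the built-in `map` unchanged to use every core.

```python
import random

random.seed(1)

def fitness(ind):                  # OneMax: count the 1 bits
    return sum(ind)

def evolve(pop_size=20, length=16, generations=30):
    pop = [[random.randint(0, 1) for _ in range(length)]
           for _ in range(pop_size)]
    for _ in range(generations):
        scores = list(map(fitness, pop))          # the data-intensive step
        elite = max(range(pop_size), key=lambda i: scores[i])
        nxt = [pop[elite][:]]                     # elitism: keep the best
        def pick():                               # binary tournament
            a, b = random.randrange(pop_size), random.randrange(pop_size)
            return pop[a] if scores[a] >= scores[b] else pop[b]
        while len(nxt) < pop_size:                # uniform crossover only
            p1, p2 = pick(), pick()
            nxt.append([random.choice(g) for g in zip(p1, p2)])
        pop = nxt
    return max(map(fitness, pop))

print(evolve())  # best OneMax fitness out of a maximum of 16
```

Because each `fitness(ind)` call is independent and side-effect free, the evaluation step scales with the number of available cores, mirroring the paper's observation that such algorithms scale "without further modification".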
Scalability in GBML, Accuracy-based Michigan Fuzzy LCS, and new Trends - Xavier Llorà
The document summarizes a presentation given by Jorge Casillas on research related to scaling up genetic learning algorithms and fuzzy classifier systems. Specifically, it discusses:
1. An approach using evolutionary instance selection and stratification to extract rule sets from large datasets that balance prediction accuracy and interpretability.
2. Fuzzy-XCS, an accuracy-based genetic fuzzy system the author is developing that uses competitive fuzzy inference and represents rules as disjunctive normal forms to address challenges in credit assignment.
3. Open problems and opportunities in applying genetic learning at large scales, such as addressing chromosome size and efficient evaluation over large datasets.
Pittsburgh Learning Classifier Systems for Protein Structure Prediction: Sca... - Xavier Llorà
This document summarizes research using a Pittsburgh Learning Classifier System (LCS) called GAssist to predict protein structure by determining coordination numbers (CN). The researchers tested GAssist on a dataset of over 250,000 protein residues, comparing it to support vector machines, Naive Bayes, and C4.5 decision trees. While support vector machines achieved the best accuracy, GAssist produced more interpretable and compact rule sets at the cost of lower performance. The researchers analyzed the interpretability and scalability of GAssist for this challenging bioinformatics problem, identifying avenues for improving its accuracy while maintaining explanatory power.
Learning Classifier Systems for Class Imbalance Problems - Xavier Llorà
The document discusses learning classifier systems (LCS) for addressing class imbalance problems in datasets. It aims to enhance the applicability of LCS to knowledge discovery from real-world datasets that often exhibit class imbalance, where one class is represented by significantly fewer examples than other classes. The author proposes adapting parameters of the XCS learning classifier system, such as learning rate and genetic algorithm threshold, based on estimated class imbalance ratios within classifiers' niches in order to minimize bias towards majority classes and better handle small disjuncts representing minority classes.
XCS: Current capabilities and future challenges - Xavier Llorà
The document discusses the XCS classifier system, which uses a combination of gradient-based techniques and evolutionary algorithms to learn predictive models from complex problems. It summarizes XCS's current capabilities in classification, function approximation, and reinforcement learning tasks. However, it notes there are still challenges to improve XCS's representations and operators, niching abilities, handling of dynamic problems, solution compactness, and development of hierarchical classifier systems.
Computed Prediction: So far, so good. What now? - Xavier Llorà
This document discusses computed prediction in learning classifier systems (LCS). It addresses representing the payoff function Q(s,a) that maps state-action pairs to expected future payoffs. Specifically:
1) In computed prediction, each classifier has parameters w and the classifier prediction is computed as a parametrized function p(x,w) like a linear approximation.
2) Classifier weights are updated using the Widrow-Hoff rule online as the payoff function is learned.
3) Using a powerful approximator like tile coding to compute predictions allows the problem to potentially be solved by a single classifier, but evolution of different approximators per problem subspace may still
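Points 1 and 2 above can be made concrete with a short sketch: a linear computed prediction p(x, w) updated online with the Widrow-Hoff (delta) rule. The target payoff function and learning rate below are invented for illustration.

```python
def predict(w, x):
    """Computed prediction p(x, w) = w0 + w·x (w0 is a bias weight)."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))

def widrow_hoff_update(w, x, target, eta=0.1):
    """Move w a step of size eta along the prediction error."""
    err = target - predict(w, x)
    w[0] += eta * err
    for i, xi in enumerate(x):
        w[i + 1] += eta * err * xi
    return w

# Learn an invented payoff function Q(s) = 2*s + 1 over a 1-D state.
w = [0.0, 0.0]
for _ in range(200):
    for s in (0.0, 0.5, 1.0):
        widrow_hoff_update(w, [s], 2 * s + 1)
print(round(predict(w, [0.25]), 2))  # close to 1.5
```

The classifier never stores a payoff table: it stores only `w` and computes the prediction on demand, which is exactly what distinguishes computed prediction from a classical scalar-prediction classifier.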
This document provides information about the NCSA/IlliGAL Gathering on Evolutionary Learning (NIGEL 2006) conference. It discusses how the conference originated from a previous 2003 gathering. It thanks the organizers and participants and provides details about the agenda, which includes presentations on topics like classifier systems and discussions around applications and techniques of evolutionary learning.
Linkage Learning for Pittsburgh LCS: Making Problems Tractable - Xavier Llorà
Presentation by Xavier Llorà, Kumara Sastry, and David E. Goldberg showing how linkage learning is possible in Pittsburgh-style learning classifier systems.
Meandre: Semantic-Driven Data-Intensive Flows in the Clouds - Xavier Llorà
- Meandre is a semantic-driven data-intensive workflow infrastructure for distributed computing. It allows users to assemble modular components into complex workflows (flows) in a visual programming tool or using a scripting language called ZigZag.
- Workflows are composed of components, which can be executable or control components. Executable components perform computational tasks when data is available, while control components pause workflows for user interactions. Components are described semantically using ontologies to separate functionality from implementation.
- Data availability drives workflow execution in Meandre. When required inputs are available, components will fire and produce outputs to make data available for downstream components. This dataflow approach aims to make workflows transparent, intuitive, and reusable across
ZigZag is a new language for describing data-intensive workflows. It aims to make the Meandre infrastructure easier to use by allowing users to assemble complex data flows. The language has a new syntax and compiles workflows that can then be run on Meandre to process large datasets.
Do not Match, Inherit: Fitness Surrogates for Genetics-Based Machine Learning...Xavier Llorà
A byproduct benefit of using probabilistic model-building genetic algorithms is the creation of cheap and accurate surrogate models. Learning classifier systems---and genetics-based machine learning in general---can greatly benefit from such surrogates which may replace the costly matching procedure of a rule against large data sets. In this paper we investigate the accuracy of such surrogate fitness functions when coupled with the probabilistic models evolved by the x-ary extended compact classifier system (xeCCS). To achieve such a goal, we show the need that the probabilistic models should be able to represent all the accurate basis functions required for creating an accurate surrogate. We also introduce a procedure to transform populations of rules based into dependency structure matrices (DSMs) which allows building accurate models of overlapping building blocks---a necessary condition to accurately estimate the fitness of the evolved rules.
Towards Better than Human Capability in Diagnosing Prostate Cancer Using Infr...Xavier Llorà
Cancer diagnosis is essentially a human task. Almost universally, the process requires the extraction of tissue (biopsy) and examination of its microstructure by a human. To improve diagnoses based on limited and inconsistent morphologic knowledge, a new approach has recently been proposed that uses molecular spectroscopic imaging to utilize microscopic chemical composition for diagnoses. In contrast to visible imaging, the approach results in very large data sets as each pixel contains the entire molecular vibrational spectroscopy data from all chemical species. Here, we propose data handling and analysis strategies to allow computer-based diagnosis of human prostate cancer by applying a novel genetics-based machine learning technique ({\tt NAX}). We apply this technique to demonstrate both fast learning and accurate classification that, additionally, scales well with parallelization. Preliminary results demonstrate that this approach can improve current clinical practice in diagnosing prostate cancer.
This presentation covers a brief overview of the current stage of the DISCUS project. General overview and introduction to some of the currently available tools
Challenges and Strategies of Digital Transformation.pptxwisdomfishlee
In an era where digital innovation is ubiquitous, executives from various corporations frequently seek insights into the tangible benefits that digital transformation can offer. This document outlines a comprehensive framework that elucidates the concept of digital transformation, highlighting its multifaceted dimensions and the pivotal roles it plays in enhancing business competitiveness.
Demystifying Neural Networks And Building Cybersecurity ApplicationsPriyanka Aash
In today's rapidly evolving technological landscape, Artificial Neural Networks (ANNs) have emerged as a cornerstone of artificial intelligence, revolutionizing various fields including cybersecurity. Inspired by the intricacies of the human brain, ANNs have a rich history and a complex structure that enables them to learn and make decisions. This blog aims to unravel the mysteries of neural networks, explore their mathematical foundations, and demonstrate their practical applications, particularly in building robust malware detection systems using Convolutional Neural Networks (CNNs).
Cracking AI Black Box - Strategies for Customer-centric Enterprise ExcellenceQuentin Reul
The democratization of Generative AI is ushering in a new era of innovation for enterprises. Discover how you can harness this powerful technology to deliver unparalleled customer value and securing a formidable competitive advantage in today's competitive market. In this session, you will learn how to:
- Identify high-impact customer needs with precision
- Harness the power of large language models to address specific customer needs effectively
- Implement AI responsibly to build trust and foster strong customer relationships
Whether you're at the early stages of your AI journey or looking to optimize existing initiatives, this session will provide you with actionable insights and strategies needed to leverage AI as a powerful catalyst for customer-driven enterprise success.
The History of Embeddings & Multimodal EmbeddingsZilliz
Frank Liu will walk through the history of embeddings and how we got to the cool embedding models used today. He'll end with a demo on how multimodal RAG is used.
Smart mobility refers to the integration of advanced technologies and innovative solutions to create efficient, sustainable, and interconnected transportation systems. It encompasses various aspects of transportation, including public transit, shared mobility services, intelligent transportation systems, electric vehicles, and connected infrastructure. Smart mobility aims to improve the overall mobility experience by leveraging data, connectivity, and automation to enhance safety, reduce congestion, optimize transportation networks, and minimize environmental impacts.
Keynote : AI & Future Of Offensive SecurityPriyanka Aash
In the presentation, the focus is on the transformative impact of artificial intelligence (AI) in cybersecurity, particularly in the context of malware generation and adversarial attacks. AI promises to revolutionize the field by enabling scalable solutions to historically challenging problems such as continuous threat simulation, autonomous attack path generation, and the creation of sophisticated attack payloads. The discussions underscore how AI-powered tools like AI-based penetration testing can outpace traditional methods, enhancing security posture by efficiently identifying and mitigating vulnerabilities across complex attack surfaces. The use of AI in red teaming further amplifies these capabilities, allowing organizations to validate security controls effectively against diverse adversarial scenarios. These advancements not only streamline testing processes but also bolster defense strategies, ensuring readiness against evolving cyber threats.
BLOCKCHAIN TECHNOLOGY - Advantages and DisadvantagesSAI KAILASH R
Explore the advantages and disadvantages of blockchain technology in this comprehensive SlideShare presentation. Blockchain, the backbone of cryptocurrencies like Bitcoin, is revolutionizing various industries by offering enhanced security, transparency, and efficiency. However, it also comes with challenges such as scalability issues and energy consumption. This presentation provides an in-depth analysis of the key benefits and drawbacks of blockchain, helping you understand its potential impact on the future of technology and business.
LeadMagnet IQ Review: Unlock the Secret to Effortless Traffic and Leads.pdfSelfMade bd
Imagine being able to generate high-quality traffic and leads effortlessly. Sounds like a dream, right? Well, it’s not. It’s called LeadMagnet IQ, and it’s here to revolutionize your marketing efforts.
(Note: Download the paper about this software. After that, click on [Click for Instant Access] inside the paper, and it will take you to the sales page of the product.)
Top 12 AI Technology Trends For 2024.pdfMarrie Morris
Technology has become an irreplaceable component of our daily lives. The role of AI in technology revolutionizes our lives for the betterment of the future. In this article, we will learn about the top 12 AI technology trends for 2024.
Garbage In, Garbage Out: Why poor data curation is killing your AI models (an...Zilliz
Enterprises have traditionally prioritized data quantity, assuming more is better for AI performance. However, a new reality is setting in: high-quality data, not just volume, is the key. This shift exposes a critical gap – many organizations struggle to understand their existing data and lack effective curation strategies and tools. This talk dives into these data challenges and explores the methods of automating data curation.
Choosing the Best Outlook OST to PST Converter: Key Features and Considerationswebbyacad software
When looking for a good software utility to convert Outlook OST files to PST format, it is important to find one that is easy to use and has useful features. WebbyAcad OST to PST Converter Tool is a great choice because it is simple to use for anyone, whether you are tech-savvy or not. It can smoothly change your files to PST while keeping all your data safe and secure. Plus, it can handle large amounts of data and convert multiple files at once, which can save you a lot of time. It even comes with 24*7 technical support assistance and a free trial, so you can try it out before making a decision. Whether you need to recover, move, or back up your data, Webbyacad OST to PST Converter is a reliable option that gives you all the support you need to manage your Outlook data effectively.
Uncharted Together- Navigating AI's New Frontiers in LibrariesBrian Pichman
Journey into the heart of innovation where the collaborative spirit between information professionals, technologists, and researchers illuminates the path forward through AI's uncharted territories. This opening keynote celebrates the unique potential of special libraries to spearhead AI-driven transformations. Join Brian Pichman as we saddle up to ride into the history of Artificial Intelligence, how its evolved over the years, and how its transforming today's frontiers. We will explore a variety of tools and strategies that leverage AI including some new ideas that may enhance cataloging, unlock personalized user experiences, or pioneer new ways to access specialized research. As with any frontier exploration, we will confront shared ethical challenges and explore how joint efforts can not only navigate but also shape AI's impact on equitable access and information integrity in special libraries. For the remainder of the conference, we will equip you with a "digital compass" where you can submit ideas and thoughts of what you've learned in sessions for a final reveal in the closing keynote.
kk vathada _digital transformation frameworks_2024.pdfKIRAN KV
I'm excited to share my latest presentation on digital transformation frameworks from industry leaders like PwC, Cognizant, Gartner, McKinsey, Capgemini, MIT, and DXO. These frameworks are crucial for driving innovation and success in today's digital age. Whether you're a consultant, director, or head of digital transformation, these insights are tailored to help you lead your organization to new heights.
🔍 Featured Frameworks:
PwC's Framework: Grounded in Industry 4.0 with a focus on data and analytics, and digitizing product and service offerings.
Cognizant's Framework: Enhancing customer experience, incorporating new pricing models, and leveraging customer insights.
Gartner's Framework: Emphasizing shared understanding, leadership, and support teams for digital excellence.
McKinsey's 4D Framework: Discover, Design, Deliver, and De-risk to navigate digital change effectively.
Capgemini's Framework: Focus on customer experience, operational excellence, and business model innovation.
MIT’s Framework: Customer experience, operational processes, business models, digital capabilities, and leadership culture.
DXO's Framework: Business model innovation, digital customer experience, and digital organization & process transformation.
It's your unstructured data: How to get your GenAI app to production (and spe...Zilliz
So you've successfully built a GenAI app POC for your company -- now comes the hard part: bringing it to production. Aparavi addresses the challenges of AI projects while addressing data privacy and PII. Our Service for RAG helps AI developers and data scientists to scale their app to 1000s to millions of users using corporate unstructured data. Aparavi’s AI Data Loader cleans, prepares and then loads only the relevant unstructured data for each AI project/app, enabling you to operationalize the creation of GenAI apps easily and accurately while giving you the time to focus on what you really want to do - building a great AI application with useful and relevant context. All within your environment and never having to share private corporate data with anyone - not even Aparavi.
Computer HARDWARE presenattion by CWD students class 10
Meandre 2.0 Alpha Preview
1. Xavier Llorà
Data-Intensive Technologies and Applications
National Center for Supercomputing Applications
University of Illinois at Urbana-Champaign
xllora@illinois.edu
2. • Great feedback and lessons learned from the 1.4.X series
• Hot topics on 1.4.X
• Complex concurrency model based on traditional semaphores written in Java
• Server performance bounded by JENA’s persistent model implementation
• State caching on individual servers increases the complexity of single-image clusters
• Cloud-deployable, but not cloud-friendly
3. • How did the 1.5 efforts turn into 2.0?
• A cloud-friendly infrastructure required rethinking core functionalities
• Drastic redesign of the backend state storage
• Revisited execution engine to support distributed flow execution
• Changes to the API that render returned JSON documents incompatible with 1.4.X
5. • Rewritten from scratch in Scala
• RDBMS backend via Jena/JDBC has been dropped
• MongoDB for state management and scalability
• Meandre 2.0 server is stateless
• Meandre API revised
• Revised response documents
• Simplified API (reduced the number of services)
• New job API
6. • New HTML interaction interface
• Off-the-shelf, full-fledged single-image cluster
• Revised flow execution lifecycle: Queued, Preparing, Running, Done, Failed, Killed, Aborted
• Flow execution as a separate spawned process; multiple execution engines are available
• Running flows can be killed on demand
• Rewritten execution engine (Snowfield)
• Support for distributed flow fragment execution
8. • MongoDB bridges the gap between
• Key-value stores (which are fast and highly scalable)
• Traditional RDBMS systems (which provide rich queries and deep functionality)
• MongoDB supports replication of data between servers for failover and redundancy
• MongoDB is designed to scale horizontally via auto-sharding, permitting the development of large-scale deployments
9. • Fast REST API prototyping and development for Scala
• Built on top of Jetty (http://jetty.codehaus.org/jetty/)
• Enables quick prototyping of REST APIs
• Provides a simple DSL built on Scala
• Developed to support the development of Meandre 2.0
• http://github.com/xllora/Crochet
10. import crochet._
new Crochet {
  get("/message", "text/plain") { "Hello World!" }
} serving "./static_content" as "/static" on 8080

Get your server up and running by running

$ scala -cp crochet-0.1.4.jar:crochet-3dparty-libraries-0.1.X.jar hello-world-with-static.scala
11. • Notification fabric for distributed Scala applications
• Backed by MongoDB for scalability
• Snare monitors developed for Meandre 2.0
• Track activity via heartbeat
• Provide messaging between monitors and global broadcasting of BSON objects
• Basic monitoring over HTTP via Crochet
• http://github.com/xllora/Snare
12. scala> import snare.tools.Implicits._
scala> val monitors = (1 to 3).toList.map(
         i => snare.Snare(
           "me_" + i,
           "my_pool",
           (o) => { println(o); true }
         )
       )
scala> monitors.map(_.activity = true)
2010.01.28 16:47:05.222::INFO:[EVTL] Notification event loop engaged for 230815e0-30cc-3afe-99ac-936d497d1282
2010.01.28 16:47:05.231::INFO:[EVTL] Notification event loop engaged for baec232f-d74d-3fd1-ad3a-caf362f58b7d
2010.01.28 16:47:05.236::INFO:[EVTL] Notification event loop engaged for d057fcde-fd10-3edd-9fd2-cfe464c6971c
2010.01.28 16:47:08.136::INFO:[HRTB] Heartbeat engaged for baec232f-d74d-3fd1-ad3a-caf362f58b7d
2010.01.28 16:47:08.136::INFO:[HRTB] Heartbeat engaged for 230815e0-30cc-3afe-99ac-936d497d1282
2010.01.28 16:47:08.136::INFO:[HRTB] Heartbeat engaged for d057fcde-fd10-3edd-9fd2-cfe464c6971c
scala> monitors(0).broadcast("""{"msg":"Fooo!!!"}""")
scala> monitors(0).notifyPeer(
         "230815e0-30cc-3afe-99ac-936d497d1282",
         """{"msg":"Fooo!!!"}"""
       )
14. • Meandre 2.0 requires at least 2 separate services running
• A MongoDB instance for shared state storage and management
• A Meandre server to provide services (via Crochet) and facilitate execution (customizable execution engines)
• A single-image Meandre cluster scales horizontally by adding new Meandre servers and sharding the MongoDB store
15. • Can be broken into three basic functional units
1. The Meandre server (the main activity coordinator)
2. The MongoDB store (holds all server state, job-related information, and system information)
3. The Meandre customizable executor (in charge of flow execution, allowing selection of multiple execution engines)
16. [Diagram: Meandre server anatomy] A Crochet server exposes the State API and the Job Manager API and runs a Snare monitor. The store holds user info, profiles & roles, repositories, the unified job queue, job consoles and logs, and the Snare cluster status & heartbeat. The job manager handles execution coordination, spawns external jobs for execution, uses a customizable execution engine, runs one job per server, and allows that job to consume all server resources.
17. • A cluster is formed by one or more Meandre servers
• MongoDB scalability can support tens of Meandre servers with a single instance
• Adding more Meandre servers provides:
• Web service load balancing
• Fault tolerance
• Improved job execution throughput (the number of concurrent jobs equals the number of Meandre servers in the cluster)
18. [Diagram: single-image cluster] Multiple Meandre servers, each a Crochet server with its own State API, Snare monitor, and Job Manager API, sit behind load balancing and share one store holding user info, profiles & roles, repositories, the unified job queue, job consoles and logs, and the Snare cluster status & heartbeat.
19. • A single-image cluster can be scaled out by relying on MongoDB
• MongoDB is the key to a single-image cluster
• Starting with 1.6.X, MongoDB provides production-ready auto-sharding
• State scalability via sharded collections makes it possible to keep scaling up a single-image, large-scale Meandre cluster
22. • The response messages have been revised
• Homogenized the structure of the response contents
• Revisited execution mechanics
• Introduced a new job API that helps
• Submit jobs for execution
• Track them (monitor state, kill, etc.)
• Inspect consoles and logs in real time
23. • Repository API
Manages a user's repository of components and flows
• Location API
Manages locations from which components and flows can be imported into a user repository
• Security API
Allows administrators to manage users and their profiles and roles in a given cluster
24. • Publish API
Helps manage the components and flows that get published to the publicly shared global repository
• Cluster management & logs API
Mostly focuses on cluster monitoring (via the Snare web monitor), selective server/cluster shutdown, and access to server/cluster logs
• Job API
Allows users to submit, monitor, and control jobs submitted for execution to a cluster
25. • Public API
Miscellaneous public services providing access to the public repository, the demo repository, and pinging services (targeted to specific servers)
26. • The prefix of the REST API is configurable
• Each call specifies the response format using a simple file-extension convention
• The next few slides provide a raw list of the revisited API (further details can be found on the Meandre documentation website)
35. • As already mentioned, flows in Meandre 2.0 are spawned as a separate process
• The execution process is a wrapper
• STDIN: Reads the repository RDF to execute
• STDOUT: Outputs the console flow output
• STDERR: Outputs the logs of the flow
• Consoles and logs are streamed and archived by the Meandre server in real time
36. • Consoles and logs are linked to job submission
• Users can query for consoles and logs at any time and will get the current contents
• Once flow execution finishes, consoles and logs are compacted but remain available on demand
37. [Diagram: flow execution control] The Crochet server (State API, Snare monitor, Job Manager API) controls a spawned flow execution process: the flow & components RDF is fed to the process via STDIN, the console is read from STDOUT, and the logs from STDERR, while the server tracks the job and archives consoles and logs.
38. • Meandre 2.0 server does not provide any execution
facility. Instead, it spawns a separate process
• The process is pass a command-line parameter (the
port number for the WebUI)
• The process is assumed to read the repository to
execute (flow and required components RDF)
• Reads console (STDOUT) and logs (STDERR) and
pushes them into MongoDB
• It is able to terminate a spawned job on demand
39. • The Job API submit service accepts a parameter ("wrapper") that allows you to request a specific execution engine
• The default execution engines provided in 2.0 are
• echo: Just reprints the input RDF to the console and logs the beginning and end of execution
• 1.4.x: The latest execution engine released in the 1.4.x series
• snowfield: The revamped Meandre 2.0 execution engine (also the basic execution piece of distributed flow execution)
40. • All execution engines are placed in the <MEANDRE_HOME>/scripts directory
• All execution engines are launched via Scala scripts using the naming convention execution_<NAME>.scala
• The provided execution engines are named
• execution_echo.scala
• execution_1.4.x.scala
• execution_snowfield.scala
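Given the wrapper contract described earlier (repository RDF on STDIN, console on STDOUT, logs on STDERR), an echo-style engine script might look like the following sketch. The object name, log format, and firing logic are assumptions for illustration, not Meandre's actual scripts.

```scala
// Hypothetical sketch of an echo-style execution engine script
// (e.g. saved as execution_my_echo.scala). Assumes the wrapper
// contract from the slides: repository RDF arrives on STDIN,
// console output goes to STDOUT, and log messages go to STDERR.
object EchoEngine {
  def run(input: Iterator[String]): Seq[String] = {
    Console.err.println("[LOG] execution started")  // log on STDERR
    val echoed = input.toSeq
    echoed.foreach(println)                         // reprint RDF on STDOUT
    Console.err.println("[LOG] execution finished") // log on STDERR
    echoed
  }

  def main(args: Array[String]): Unit = {
    run(io.Source.stdin.getLines())
  }
}
```

Launching it would then follow the naming convention, e.g. submitting a job with &wrapper=my_echo.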
41. • You can add an execution engine by adding a script that follows the previous naming convention. For instance, an execution engine my_engine requires a Scala wrapper placed in the <MEANDRE_HOME>/scripts folder named execution_my_engine.scala
• You can request your customized execution engine when submitting jobs via the REST API by adding the parameter &wrapper=my_engine
43. • The introduction of the Job API has refined the flow lifecycle
• 1.4.X execution was on demand (potentially overloading the box)
• 2.0.X introduces a refined execution state model
44. [Diagram: flow lifecycle] Submitted → Preparing (server available) → Running (execution engine ready) → Done (execution successfully completed). An infrastructure failure moves a job from Submitted, Preparing, or Running to Failed; a bad-behaved flow also ends in Failed. A user request moves a queued job to Aborted and a running job to Killed.
46. • Data-driven execution
• No centralized control
• Designed with multi- and many-core systems in mind
• The underlying assumptions
• One thread per component
• Finite buffers on input ports to decouple production/consumption
• Ideal for shared-memory machines (e.g. Cobalt)
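The execution model above (one thread per component, finite input buffers decoupling production from consumption) can be sketched with a blocking queue. The component, its doubling operation, and the buffer size are hypothetical; this is the execution idea, not Meandre's actual internals.

```scala
import java.util.concurrent.ArrayBlockingQueue

// Illustrative sketch of data-driven firing: each component runs on
// its own thread and fires only when data is available on its finite
// input buffer. put() blocks when the buffer is full, so a fast
// producer is throttled by a slow consumer.
object DataflowSketch {
  def run(inputs: Seq[Int]): Seq[Int] = {
    val buffer  = new ArrayBlockingQueue[Int](4)           // finite input port
    val results = new ArrayBlockingQueue[Int](inputs.size) // downstream output

    val producer = new Thread(() => inputs.foreach(x => buffer.put(x)))
    val consumer = new Thread(() => {
      var fired = 0
      while (fired < inputs.size) {
        results.put(buffer.take() * 2) // component "fires" on available data
        fired += 1
      }
    })
    producer.start(); consumer.start()
    producer.join();  consumer.join()
    Seq.fill(inputs.size)(results.take())
  }
}
```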
48. • Two other threads are created
• Mr. Proper
• This thread monitors the status of the component threads
• If no thread is running and no data remains, the flow is done and it is time to clean up
• Mr. Probe can record
• Firing events
• Data in the buffers
• Component state
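The Mr. Proper termination condition above (no component thread running and no data left in any buffer) could be expressed as the following sketch; flowDone and the argument types are illustrative assumptions, not Meandre's code.

```scala
import java.util.concurrent.ConcurrentLinkedQueue

// Hypothetical sketch of the Mr. Proper check: the flow is done when
// every component thread has finished and every buffer is drained.
object MrProperSketch {
  def flowDone(threads: Seq[Thread],
               buffers: Seq[ConcurrentLinkedQueue[Any]]): Boolean =
    threads.forall(!_.isAlive) && buffers.forall(_.isEmpty)
}
```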
50. • Key benefits for Meandre after the Scala transition
• High-level parallel constructs
• Simple concurrency model
• Actors modeled after Erlang
• Actors are lightweight compared to threads
• Configurable scheduling for actors
51. • Actors are the primitive of concurrent computation
• Actors respond to messages they receive
• Actors perform local operations
• Actors send messages to other actors
• Actors can create new actors
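The primitives above (react to a message, do local work, send to other actors) can be illustrated with a toy mailbox-based actor. This sketch is not the Scala actor library Meandre builds on; ToyActor and pingPong are made-up names showing only the message-passing idea.

```scala
import java.util.concurrent.LinkedBlockingQueue

// Toy actor: a mailbox plus blocking receive. Real actor libraries
// multiplex many actors over few threads; here one thread per actor
// keeps the sketch minimal.
class ToyActor(val name: String) {
  private val mailbox = new LinkedBlockingQueue[String]()
  def send(msg: String): Unit = mailbox.put(msg) // communicate via messages
  def receive(): String = mailbox.take()         // block until a message arrives
}

object ActorSketch {
  def pingPong(): String = {
    val pong = new ToyActor("pong")
    val ping = new ToyActor("ping")
    val worker = new Thread(() => {
      val msg = pong.receive()   // react to the received message...
      ping.send(msg.toUpperCase) // ...perform a local op and reply
    })
    worker.start()
    pong.send("ping")
    val reply = ping.receive()
    worker.join()
    reply
  }
}
```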
52. [Diagram: components in a single JVM] Components C1–C6 each map to an actor (A1–A6) multiplexed by the actor scheduler inside one JVM, with a dedicated actor A0 handling the Mr. Probe and Mr. Proper duties.
53. • Abstraction
• Breaks the relation between components and threads
• Minimizes context switching between threads
• Main benefit
• Simple communication model
• Trivial to distribute!
54. [Diagram: distributed execution] Components and their actors are spread across several JVMs (C1/C3 with A1/A3 on JVM1, C2/C4 with A2/A4 on JVM2, C5 on JVM3, C6 on JVM4), each JVM running its own actor scheduler, while JVM0 hosts the A0 actor (Mr. Probe / Mr. Proper) coordinating the distributed flow.
55. • Now JVMs can be placed on different machines
• Questions?
• How do I group components into JVMs?
• Where do I place the JVMs?
• Scheduling and mapping rely on third parties
• Manually by the user
• Model as one job and let the grid do the allocation (e.g. Abe, Blue Waters)
• Cloud orchestrated
56. Xavier Llorà
Data-Intensive Technologies and Applications
National Center for Supercomputing Applications
University of Illinois at Urbana-Champaign
xllora@illinois.edu