This document discusses predictive maintenance using sensor data in utility industries. It describes how sensors can monitor infrastructure and predict failures by analyzing patterns in sensor data using machine learning models. An architecture is proposed that uses big data frameworks like Spark, Kafka and HBase to collect, analyze and store large volumes of real-time sensor data at scale. Predictive analytics on this data with techniques like clustering and regression can detect anomalies and predict failures to enable condition-based maintenance in utilities. Modeling uncertain sensor readings with probabilistic and autoregressive approaches is also discussed.
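The anomaly-detection idea described above can be sketched minimally in pure Python: flag any sensor reading whose z-score against a short rolling baseline exceeds a cutoff. The readings, window size, and threshold below are all hypothetical; a production system would run this kind of logic inside a streaming framework such as Spark.

```python
import statistics

def detect_anomalies(readings, window=5, threshold=3.0):
    """Flag indices of readings that deviate sharply from the preceding window."""
    anomalies = []
    for i in range(window, len(readings)):
        baseline = readings[i - window:i]
        mean = statistics.mean(baseline)
        stdev = statistics.stdev(baseline)
        # z-score of the current reading against the rolling baseline
        if stdev > 0 and abs(readings[i] - mean) / stdev > threshold:
            anomalies.append(i)
    return anomalies

# A steady vibration signal with one sudden spike at index 8
signal = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95, 1.0, 1.1, 9.0, 1.0]
print(detect_anomalies(signal))  # → [8]
```

The same structure generalizes to the clustering and regression techniques the summary mentions: the rolling baseline is simply replaced by a fitted model of normal behaviour.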
1. The document discusses how organizations can leverage data, analytics, and insights to fundamentally change and pioneer new business models.
2. It emphasizes that data analytics cannot be accomplished in a silo and must involve the entire organization. Modern cloud platforms, software methodologies, and data tools are needed.
3. Examples are provided of how various organizations have used tools like Pivotal Greenplum to gain insights from data to solve problems in areas like predictive maintenance, risk management, and national security.
AI in Healthcare and the Automobile Industry using OpenPOWER/IBM POWER9 systems (Ganesan Narayanasamy)
As the adoption of AI technologies increases and matures, the focus will shift from exploration to time to market, productivity and integration with existing workflows. Governing enterprise data, scaling AI model development, and selecting a complete, collaborative hybrid platform and tools for rapid solution deployment are key focus areas for growing data-science teams tasked with responding to business challenges. This talk will cover the challenges and innovations of AI at scale for industries such as healthcare and automotive, the AI ladder and AI life cycle, and infrastructure architecture considerations.
The document outlines an agenda for a presentation on big data. It discusses key topics like the state of big data adoption, a holistic approach to big data, five high value use cases, technical components, and the future of big data and cloud. The presentation aims to provide an overview of big data and how organizations can take a comprehensive approach to leveraging their data assets.
Big Data Real Time Analytics - A Facebook Case Study (Nati Shalom)
Building Your Own Facebook Real Time Analytics System with Cassandra and GigaSpaces.
Facebook's real time analytics system is a good reference for those looking to build their real time analytics system for big data.
The first part covers the lessons from Facebook's experience and the reason they chose HBase over Cassandra.
In the second part of the session, we learn how to build our own real-time analytics system, achieve better performance, gain real business insights and analytics from our big data, and make deployment and scaling significantly simpler using the new versions of Cassandra and GigaSpaces Cloudify.
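The core of a Facebook-style real-time analytics system is rolling raw events up into pre-aggregated counters that a store like Cassandra or HBase can serve cheaply. A minimal sketch of that aggregation step, with hypothetical event tuples and an illustrative 60-second tumbling window (not Facebook's or GigaSpaces' actual API):

```python
from collections import defaultdict

def aggregate_events(events, window_seconds=60):
    """Roll raw (timestamp, url, action) events up into per-window counters,
    the kind of aggregate a real-time analytics store would serve."""
    counters = defaultdict(int)
    for ts, url, action in events:
        # Snap the timestamp to the start of its tumbling window
        bucket = ts - (ts % window_seconds)
        counters[(bucket, url, action)] += 1
    return dict(counters)

events = [
    (100, "/home", "like"),
    (110, "/home", "like"),
    (130, "/about", "share"),
    (170, "/home", "like"),   # falls into the next 60-second window
]
print(aggregate_events(events))
```

In a real deployment each counter row would be written back to the wide-column store keyed by `(bucket, url)`, so dashboards read one row per window instead of scanning raw events.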
Big data expert and Infochimps CEO Jim Kaskade presents the Infinite Monkey Theorem at CloudCon Expo. He provides an energetic, inspiring, and practical perspective on why Big Data is disruptive. It’s more than historic data analyzed on Hadoop. It’s also more than real-time streaming data stored and queried using NoSQL. Learn more at www.Infochimps.com
Use cases for Hadoop and Big Data Analytics - InfoSphere BigInsights (Gord Sissons)
This presentation is from TDWI's event in Boston during the summer of 2014. IBM InfoSphere BigInsights is IBM's enterprise grade Hadoop offering. It combines the best of open-source Hadoop, with advanced capabilities including Big SQL that clients can optionally deploy to get to market faster with a variety of big data and analytic applications.
IBM provides two types of accelerators for big data to speed the development and implementation of specific big data solutions: 1) Analytic accelerators that address specific data types or operations with advanced analytics; and 2) Application accelerators that address specific use cases and include both industry-specific and cross-industry features. The accelerators are packaged software components that provide business logic, data processing, and visualization capabilities and help eliminate the complexity of building big data applications. Examples of capabilities provided by various accelerators include text analytics, geospatial analysis, time series prediction, data mining, finance analytics, machine data analysis, social media insights, and telecommunications event data processing.
The document summarizes the results of a benchmark report on operational analytics from December 2013. Some key findings include:
- One-third of respondents have fully or partially deployed operational analytics.
- The top functional areas that use it are operations, finance, and sales.
- Most implementations integrate data from multiple systems at an enterprise scope.
- The biggest challenges are sourcing data from complex systems and defining rules for analysis and actions.
"Empower Developers with HPE Machine Learning and Augmented Intelligence", Dr. Abdourahmane Faye, Big Data SME Lead DACH at HPE (Dataconomy Media)
Watch more from Data Natives Berlin 2016 here: http://bit.ly/2fE1sEo
Visit the conference website to learn more: www.datanatives.io
Follow Data Natives:
https://www.facebook.com/DataNatives
https://twitter.com/DataNativesConf
Stay Connected to Data Natives by Email: Subscribe to our newsletter to get the news first about Data Natives 2017: http://bit.ly/1WMJAqS
About the Author:
Abdou Faye is a Subject Matter Expert in Big Data, Predictive Analytics/Machine Learning and Business Intelligence, with more than 19 years of experience in the area in various leading and executive roles, from technical, architecture and sales perspectives. He recently joined HPE from SAP, where he had led the Predictive Analysis & Big Data CoE (Center of Excellence) business since 2010 for the DACH, CEE and CIS regions, in charge of Business Development and Sales Support. Prior to SAP, he worked four years at Microsoft as a Senior BI & SQL Server Consultant in Switzerland, after 10 years spent at Philip Morris (CH), Orange Telco (CH) and SEMA Group (FR). Abdou graduated from Paris 11 University in 2000, where he completed a PhD in Data Mining/Predictive Analytics after a Master's in Computer Science.
The document provides an overview of IBM's big data and analytics capabilities. It discusses what big data is, the characteristics of big data including volume, velocity, variety and veracity. It then covers IBM's big data platform which includes products like InfoSphere Data Explorer, InfoSphere BigInsights, IBM PureData Systems and InfoSphere Streams. Example use cases of big data are also presented.
GITEX Big Data Conference 2014 – SAP Presentation (Pedro Pereira)
Big, Fast and Predictive Data: How to Extract Real Business Value – in real time.
90% of the world’s data was created in the last two years. If you can harness it, it will revolutionize the way you do business. Big Data solutions can help extract real business value – in real time.
The document discusses reference architectures for building big data applications with Internet of Things (IoT) technologies. It describes an IoT reference architecture that includes components for device connectivity, data processing/analytics, and business connectivity. It provides examples of device types, connectivity options, and how to use Azure services for device identity/registry, stream processing, analytics, and presentation. Guiding principles are also outlined for building scalable, secure, and flexible IoT solutions.
Big Data & Analytics continues to redefine business. Data has transitioned from an underused asset to the lifeblood of the organisation, and a critical component of business intelligence, insight and strategy.
Big Data Scotland is the largest annual data analytics conference held in Scotland: it is supported by ScotlandIS and The Data Lab and free for delegates to attend. The conference is geared towards senior technologists and business leaders and aims to provide a unique forum for knowledge exchange, discussion and cross-pollination.
The programme will explore the evolution of data analytics; looking at key tools and techniques and how these can be applied to deliver practical insight and value. Presentations will span a wide array of topics from Data Wrangling and Visualisation to AI, Chatbots and Industry 4.0.
Key Topics
• Tools and techniques
• Corporate data culture, business processes, digital transformation
• Business intelligence, trends, decision making
• AI, Real-time Analytics, IoT, Industry 4.0, Robotics
• Security, regulation, privacy, consent, anonymization
• Data visualisation, interpretation and communication
• CRM and Personalisation
StreamAnalytix is a software platform that enables enterprises to analyze and respond to events in real-time at Big Data scale. It is designed to rapidly build and deploy streaming analytics applications for any industry vertical, any data format, and any use case.
Overview of analytics and big data in practice (Vivek Murugesan)
Intended to give an overview of analytics and big data in practice, with a set of industry use cases from different domains. Useful for someone trying to understand analytics and big data.
Threat Detection and Response at Scale with Dominique Brezinski (Databricks)
Security monitoring and threat response has diverse processing demands on large volumes of log and telemetry data. Processing requirements span from low-latency stream processing to interactive queries over months of data. To make things more challenging, we must keep the data accessible for a retention window measured in years. Having tackled this problem before in a massive-scale environment using Apache Spark, when it came time to do it again, there were a few things I knew worked and a few wrongs I wanted to right.
We approached Databricks with a set of challenges to collaborate on: provide a stable and optimized platform for Unified Analytics that allows our team to focus on value delivery using streaming, SQL, graph, and ML; leverage decoupled storage and compute while delivering high performance over a broad set of workloads; use S3 notifications instead of list operations; remove Hive Metastore from the write path; and approach indexed response times for our more common search cases, without hard-to-scale index maintenance, over our entire retention window. This is about the fruit of that collaboration.
How to Swiftly Operationalize the Data Lake for Advanced Analytics Using a Lo... (Denodo)
Watch full webinar here: https://bit.ly/3mfFJqb
Presented at Chief Data Officer Live Series 2021, ASEAN (August Edition)
While big data initiatives have become necessary for any business to generate actionable insights, big data fabric has become a necessity for any successful big data initiative. The best-of-breed big data fabrics should deliver actionable insights to the business users with minimal effort, provide end-to-end security to the entire enterprise data platform, and provide real-time data integration while delivering a self-service data platform to business users.
Watch this on-demand session to learn how big data fabric enabled by Data Virtualization:
- Provides lightning fast self-service data access to business users
- Centralizes data security, governance, and data privacy
- Fulfills the promise of data lakes to provide actionable insights
The document discusses challenges in building machine learning platforms and pipelines. It covers topics like data exploration challenges due to versioning issues; managing large numbers of model experiments with different hyperparameters, datasets, and performance tracking; and difficulties deploying models at scale for monitoring. The presentation demonstrates examples of machine learning applications in industries like telecommunications, manufacturing, and finance. It also discusses trends in deep learning, distributed learning, transfer learning, and edge device machine learning.
The document provides an introduction to predictive maintenance. It outlines the objectives of the course, which are to define predictive maintenance programs and various condition monitoring techniques, including vibration analysis, lubrication analysis, ultrasonic analysis, and thermographic analysis. The agenda covers topics such as predictive maintenance, maintenance planning, vibration analysis, and thermal analysis. The document then begins discussing predictive maintenance in more detail, defining preventative maintenance, predictive maintenance, and condition monitoring. It explores patterns of equipment failure and how to monitor equipment condition.
[Tutorial] building machine learning models for predictive maintenance applic... (PAPIs.io)
The document discusses using machine learning for predictive maintenance in IoT applications compared to traditional approaches. It describes using publicly available aircraft engine data to build models in Azure ML to predict remaining useful life. Models tested include regression, binary classification, and multi-class classification. An end-to-end pipeline is demonstrated, from data preparation through deploying web services with different machine learning models.
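A toy version of the regression approach to remaining useful life can be written in a few lines: fit a straight line to wear readings over operating cycles, then extrapolate to a failure threshold. The tutorial itself uses Azure ML on public aircraft engine data; the linear trend, cycle counts, wear values, and failure threshold below are all hypothetical simplifications.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b

def predict_rul(cycles, wear, failure_wear):
    """Extrapolate the wear trend to estimate cycles remaining until failure_wear."""
    a, b = fit_line(cycles, wear)
    cycles_at_failure = (failure_wear - a) / b
    return cycles_at_failure - cycles[-1]

# Hypothetical wear readings: wear grows ~0.1 per cycle, failure at 10.0
cycles = [0, 10, 20, 30, 40]
wear = [1.0, 2.0, 3.0, 4.0, 5.0]
print(predict_rul(cycles, wear, 10.0))  # → 50.0 cycles of useful life left
```

The binary and multi-class classification variants the document mentions replace the extrapolation step with a label such as "fails within the next N cycles".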
Predictive maintenance uses sensors and data analytics to predict failures in machines before they occur. It aims to replace parts that will break imminently, rather than all parts preventatively. This approach reduces costs compared to traditional preventative maintenance. Developing an effective predictive maintenance program requires understanding machine operations and economics, identifying relevant data sources, and creating predictive models. It is a collaborative process without single solutions, as each machine system presents unique challenges. The goal is finding the optimal balance of predictive efforts and resulting cost savings.
International competition, shorter product life cycles and faster technological leaps forward – these are only a few of the challenges a company's production faces in the 21st century. To survive in such an environment, resource-efficient and secure planning of production processes is necessary to guarantee consistent, high-quality output. Unforeseeable machine failures, as well as performance drops or quality deterioration caused by defective system components, can lead to supply shortages that will eventually weaken the market position of the entire organization.
To meet these requirements, organizations are increasingly focusing on improving the maintenance, repair and operations of their machinery. In recent years, the industry has shifted its focus away from purely reactive repair mechanisms towards the predictive coordination of machine maintenance.
Predictive Maintenance belongs to the future of maintenance developments. Originally developed in the course of the “Industrie 4.0” high-tech strategy of the German government, Predictive Maintenance today represents the informatization of production processes: intelligent IT-based production systems on the path towards a Smart Factory. Generating and analyzing different machine data not only improves the ability to predict the state of industrial plants, but also provides the basis for greater planning certainty and the efficient scheduling of repair and maintenance work.
The Science of Predictive Maintenance: IBM's Predictive Analytics Solution (Senturus)
Overview of IBM’s Predictive Maintenance and Quality (PMQ) solution. View the webinar video recording and download this deck: http://www.senturus.com/resources/science-predictive-maintenance/.
We show you how the PMQ solution can keep manufacturing processes, infrastructure and field equipment running to maximize use and performance, while minimizing costs.
We show how you can use powerful analytics and data integration to help: anticipate asset maintenance and product quality problems, reduce unscheduled asset downtime, spend less time solving production machinery and field asset problems, improve asset productivity and process quality, and monitor how assets are performing in real time and predict what will happen next.
Senturus, a business analytics consulting firm, has a resource library with hundreds of free recorded webinars, trainings, demos and unbiased product reviews. Take a look and share them with your colleagues and friends: http://www.senturus.com/resources/.
The document discusses predictive maintenance in oil refineries using analytics. It notes that most refinery shutdowns are unplanned and due to mechanical failures. Traditional maintenance methods are reactive or preventive. Predictive analytics uses real-time equipment data and historical maintenance records to monitor equipment health and estimate remaining lifespan. This allows refineries to schedule maintenance more efficiently to avoid breakdowns and reduce downtime. The document provides examples of predictive algorithms and dashboards that can integrate data for predictive maintenance to optimize operations and supply chain processes. It estimates that a typical 100,000 bpd refinery could save over $3.5 million annually through predictive maintenance.
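The $3.5 million figure cited above is the kind of estimate that falls out of simple back-of-the-envelope arithmetic on avoided downtime. Every number below is a hypothetical assumption for illustration, not data from the document:

```python
def annual_savings(avoided_downtime_days, barrels_per_day, margin_per_barrel,
                   avoided_repair_cost):
    """Savings = recovered production margin + emergency repairs avoided."""
    lost_margin_recovered = avoided_downtime_days * barrels_per_day * margin_per_barrel
    return lost_margin_recovered + avoided_repair_cost

# Hypothetical inputs for a 100,000 bpd refinery
print(annual_savings(
    avoided_downtime_days=3,       # unplanned shutdown days prevented per year
    barrels_per_day=100_000,
    margin_per_barrel=10.0,        # assumed refining margin, $/bbl
    avoided_repair_cost=500_000,   # assumed emergency repairs avoided, $
))  # → 3500000.0
```

Under these assumed inputs, three avoided shutdown days alone account for most of the savings, which is why predictive maintenance programs usually justify themselves on downtime rather than parts costs.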
Using the Industrial Internet to Move From Planned Maintenance to Predictive ... (Sentient Science)
Sentient Science provides Prognostics Health Management using the Industrial Internet and will show practical examples of driving down O&M costs by moving from Planned Preventative Maintenance (PPM) to Predictive Health Maintenance (PHM) for distributed assets. This presentation will outline the IIC and the practical benefits of integrating your distributed assets with prognostic, predictive models for life extension.
BA Summit 2014 – Predictive maintenance: plugging the leak with big data (Daniel Westzaan)
Predictive maintenance is one of the big data applications with enormous potential. For Vitens, the largest water utility in the Netherlands with more than 5.5 million customers, CGI and IBM demonstrated in a proof of value that locating leaks faster and more accurately could potentially save millions.
The primary task of Vitens is to ensure that customers have access to top-quality drinking water at all times. With a network of more than 49,000 km of relatively old pipeline, maintaining the network cost-efficiently is a continuous challenge. Preventive maintenance is usually chosen, which means pipeline is often replaced earlier than strictly necessary. Nevertheless, leaks occur regularly, sometimes causing major damage and threatening security of supply.
Locating leaks is done manually, which costs a great deal of time and money because the search area can often run to dozens of kilometres. Vitens asked CGI and IBM to develop a method for locating leaks using a big data application. In a proof of value, historical data was analyzed, and half of the analyzed leaks could be located to within 2.5 km.
By locating or even predicting leaks faster, Vitens can not only save directly on staff deployed for leak localization and on call-center capacity. It also becomes possible to extend the effective lifespan of pipelines or, for less critical parts of the network, even to opt for the maximum lifespan, replacing pipeline only when a leak actually occurs.
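One common localization signal in water networks is minimum night flow per metered zone: at night, demand is low and stable, so a zone whose flow sits well above its historical baseline is a leak candidate. A minimal sketch of that check (zone names, flows, and tolerance are hypothetical; this is not the CGI/IBM method itself):

```python
def flag_leaky_zones(night_flows, baselines, tolerance=0.2):
    """Flag zones whose minimum night flow exceeds the historical
    baseline by more than the given relative tolerance."""
    flagged = []
    for zone, flow in night_flows.items():
        base = baselines[zone]
        if flow > base * (1 + tolerance):
            flagged.append(zone)
    return sorted(flagged)

night_flows = {"zone-a": 12.0, "zone-b": 30.5, "zone-c": 8.1}   # tonight, m3/h
baselines   = {"zone-a": 11.5, "zone-b": 20.0, "zone-c": 8.0}   # historical, m3/h
print(flag_leaky_zones(night_flows, baselines))  # → ['zone-b']
```

Narrowing the search to one metered zone is exactly what shrinks a search area of dozens of kilometres down to a few.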
Reliability centered maintenance (RCM) is a maintenance strategy that uses failure modes and effects analysis to determine the most cost-effective maintenance tasks. It aims to perform only necessary maintenance to preserve system functions and avoid unnecessary maintenance costs. RCM shifts maintenance from reactive to condition-based, using tools like vibration analysis and oil testing to predict failures. Initial costs for RCM are higher but maintenance costs decrease over time as failures are prevented.
When working for Petrobras at PRSI (Pasadena Refining System Inc.) I had this opportunity to share my experience as a Maintenance Manager in Brazil with PRSI operators and maintenance crew.
This presentation describes unique management principles and provides a comprehensive, high-level overview of the necessary program elements for operating a mission critical facility efficiently and reliably throughout its life cycle. Practical management tips and advice are also given.
The document discusses how digital technologies can transform the chemical industry by addressing key challenges across the value chain. It outlines challenges in areas like production, supply chain, commercial operations, and maintenance. It then explains how digital interventions using mobility, IoT, cloud, big data analytics and social media can optimize asset utilization, improve supply chain agility, provide customer insights and enhance worker safety. Specific digital capabilities for the plant, supply chain, sales and marketing, and workforce are presented. The document concludes that digital will disrupt how chemical industries operate and deliver value.
Predictive maintenance, personalization of customer interactions, supply-chain optimization: how Spark and Hadoop are making their way into companies' operational activities.
This document discusses performance management and business analytics. It includes:
- A maturity model showing four steps of performance management maturity from reacting to orchestrating.
- Charts and data from surveys on the amount of useless information managers receive, time wasted searching for information, and accidentally using wrong information.
- Descriptions of what is needed for effective performance management including an enterprise platform, consistent information access, and best practice solutions.
- An overview of how Cognos and IBM solutions address business analytics needs like financial, customer, and supply chain optimization.
XMPLR Data Analytics in Power Generation (Scott Affelt)
This document discusses opportunities for using data analytics in power generation. It outlines how data analytics can help improve efficiency, reliability, emissions, and flexibility through approaches like predictive maintenance. Advanced pattern recognition and prognostics are highlighted as particularly useful methods. Predictive maintenance allows issues to be addressed before failures occur, reducing downtime and maintenance costs. Implementing these approaches faces challenges regarding data collection, expertise, accuracy, and security. Overall, data analytics can help operators better manage assets and improve generation reliability through early fault detection and useful life predictions.
One of the major challenges for gas turbine users is to ensure a high level of engine availability and reliability, and efficient operation, over the complete life cycle. For this purpose, various maintenance approaches have been introduced over the years: Breakdown Maintenance (Run to Failure), Preventive (Scheduled) Maintenance, and Condition-Based Maintenance (CBM). Here the focus is on CBM, or predictive maintenance.
Presentation by Dr. Peter Bruce, Statistics.com. Presented on April 27, 2012 at the MRA Spring Research Symposium hosted by the Mid-Atlantic Chapter of the Marketing Research Association.
Application fields of R in classical industrial analytics (eoda GmbH)
There is no doubt about it: a variety of easily accessible methods can make R a valuable tool for plenty of application scenarios in business, e.g. optimizing sales campaigns by scoring potential customers, predicting machine failures from patterns in sensor data, or forecasting weather conditions to guide the trade and supply of energy. The knowledge in these kinds of scenarios has emerged from the young discipline of data science and is based on contemporary methods of data mining and predictive analytics.
At the same time there are many more application scenarios for R which are not directly connected to data science but are substantially impacted by classical analytics used in business and particularly in industry, for example in process controls, process validation or cyclic reports. Even though R has reached incredible popularity for data science methods, companies still struggle to make R accessible for their classical analytics.
The talk will highlight the differences of data science and classical analytics and reveal the underrated potential of R in business processes which are dominated by a particular software. Furthermore, the talk will give an outlook for R in the field of analytics.
This document discusses stream computing and various real-time analytics platforms for processing streaming data. It describes key concepts of stream computing like analyzing data in motion before storing, scaling to process large data volumes, and making faster decisions. Popular open-source platforms are explained briefly, including their architecture and uses - Spark, Storm, Kafka, Flume, and Amazon Kinesis.
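The central idea of stream computing named above, analyzing data in motion before storing it, maps naturally onto generator pipelines. A minimal sketch (sensor names, CSV format, and the threshold are illustrative, not tied to any of the platforms listed):

```python
def parse(lines):
    """Turn raw CSV lines into (sensor, value) tuples as they arrive."""
    for line in lines:
        sensor, value = line.split(",")
        yield sensor, float(value)

def over_threshold(records, limit):
    """Filter in motion: only records above the limit flow downstream."""
    for sensor, value in records:
        if value > limit:
            yield sensor, value

# Records are analyzed as they flow through; nothing is stored first.
stream = iter(["t1,20.5", "t2,99.1", "t3,18.0", "t4,101.7"])
alerts = list(over_threshold(parse(stream), limit=90.0))
print(alerts)  # → [('t2', 99.1), ('t4', 101.7)]
```

Frameworks like Storm or Spark Streaming run the same parse-filter topology, but distribute each stage across many machines and add fault tolerance.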
Reactive Stream Processing for Data-centric Publish/Subscribe (Sumant Tambe)
The document discusses the Industrial Internet of Things (IIoT) and key challenges in developing a dataflow programming model and middleware for IIoT systems. It notes that IIoT systems involve large-scale distributed data publishing and processing streams in a parallel manner. Existing pub-sub middleware like DDS can handle data distribution but lack support for composable local data processing. The document proposes combining DDS with reactive programming using Rx.NET to provide a unified dataflow model for both local processing and distribution.
Users can run queries via MicroStrategy's visual interface without needing to write unfamiliar HiveQL or MapReduce scripts. In essence, any user, without Hadoop programming skills, can ask questions against vast volumes of structured and unstructured data to gain valuable business insights.
A whitepaper from Qubole with tips on how to choose the best SQL engine for your use cases and data workloads.
https://www.qubole.com/resources/white-papers/enabling-sql-access-to-data-lakes
This paper covers our experience of building real-time pipelines for financial data, the various open source libraries we experimented with and the impacts we saw in a very brief time.
Kafka vs Spark vs Impala in big data (emmadoo192)
In today's data-driven world, organizations are faced with the challenge of efficiently processing and analyzing vast amounts of data to extract valuable insights. Apache Spark has emerged as a powerful tool for processing big data, offering speed, scalability, and ease of use. This project aims to leverage the capabilities of Spark to enhance data processing efficiency and empower organizations to derive meaningful insights from their data.
Scalable Data Processing: Implement Spark to process large-scale datasets in a distributed computing environment, enabling parallel processing for enhanced scalability.
Real-time Data Analytics: Utilize Spark Streaming to perform real-time analytics on streaming data sources, enabling organizations to make timely decisions based on up-to-date information.
Advanced Analytics: Employ Spark's machine learning library (MLlib) to perform advanced analytics tasks such as predictive modeling, clustering, and classification, enabling organizations to uncover patterns and trends within their data.
Integration with Big Data Ecosystem: Integrate Spark seamlessly with other components of the big data ecosystem such as Hadoop, Kafka, and Cassandra, enabling seamless data ingestion, storage, and processing across different platforms.
Optimization and Performance Tuning: Implement optimization techniques such as partitioning, caching, and lazy evaluation to enhance the performance of Spark jobs and reduce processing time.
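Two of the optimization techniques listed, caching and lazy evaluation, can be demonstrated without Spark at all. The sketch below uses a plain generator as a stand-in for a lazy Spark transformation and `functools.lru_cache` as a stand-in for `.cache()`; the function and values are hypothetical:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def expensive_lookup(key):
    """Stand-in for a costly computation; cached so repeated keys are free."""
    expensive_lookup.calls += 1
    return key * 2

expensive_lookup.calls = 0

# Lazy: this generator only describes the work, like a Spark
# transformation before an action triggers it.
pipeline = (expensive_lookup(k) for k in [1, 2, 1, 2, 1])

assert expensive_lookup.calls == 0     # nothing computed yet
results = list(pipeline)               # the "action" forces evaluation
print(results, expensive_lookup.calls) # five results from only two computations
```

The payoff is the same as in Spark: work is deferred until an action needs it, and repeated sub-computations are paid for once.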
Methodology:
Data Exploration and Preparation: Explore and preprocess the dataset to handle missing values, outliers, and data inconsistencies, ensuring data quality and reliability.
Spark Environment Setup: Set up a Spark cluster either on-premises or on a cloud platform such as AWS or Azure, configuring the necessary resources and dependencies.
Development of Spark Applications: Develop Spark applications using Scala, Python, or Java to implement various data processing and analytics tasks according to the project requirements.
Testing and Validation: Test the Spark applications using sample datasets and validation techniques to ensure accuracy and reliability of the results.
Deployment and Integration: Deploy the Spark applications into production environment and integrate them with existing systems and workflows for seamless operation.
Deliverables:
Technical Documentation: Provide detailed documentation covering the project architecture, design decisions, implementation details, and deployment instructions.
Codebase: Deliver well-organized and documented codebase of the Spark applications developed during the project, along with unit tests and integration tests.
Performance Metrics: Present performance metrics and benchmarks demonstrating the efficiency and scalability of the Spark-based solution compared to traditional approaches.
Training and Support: Offer training sessions and support to the project stakeholders to enable them to effectively utilize and maintain the Spark-based solution.
Massive scalability with InterSystems IRIS Data Platform (Robert Bira)
Faced with the enormous and ever-growing amounts of data being generated in the world today, software architects need to pay special attention to the scalability of their solutions. They must also design systems that can, when needed, handle many thousands of concurrent users. It’s not easy, but designing for massive scalability is an absolute necessity.
This document provides an overview and comparison of RDBMS, Hadoop, and Spark. It introduces RDBMS and describes its use cases such as online transaction processing and data warehouses. It then introduces Hadoop and describes its ecosystem including HDFS, YARN, MapReduce, and related sub-modules. Common use cases for Hadoop are also outlined. Spark is then introduced along with its modules like Spark Core, SQL, and MLlib. Use cases for Spark include data enrichment, trigger event detection, and machine learning. The document concludes by comparing RDBMS and Hadoop, as well as Hadoop and Spark, and addressing common misconceptions about Hadoop and Spark.
This document discusses how to implement operations like selection, joining, grouping, and sorting in Cassandra without SQL. It explains that Cassandra uses a nested data model to efficiently store and retrieve related data. Operations like selection can be performed by creating additional column families that index data by fields like birthdate and allow fast retrieval of records by those fields. Joining can be implemented by nesting related entity data within the same column family. Grouping and sorting are also achieved through additional indexing column families. While this requires duplicating data for different queries, it takes advantage of Cassandra's strengths in scalable updates.
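The selection-by-index technique described above can be sketched with nested dicts standing in for column families. The user rows and field names are hypothetical; the point is that the data is deliberately duplicated into a second structure keyed by the query field, so a "WHERE birthdate = X" becomes one key lookup instead of a scan:

```python
# Primary "column family": user rows keyed by id
users = {
    "u1": {"name": "Ann", "birthdate": "1990-04-01"},
    "u2": {"name": "Bob", "birthdate": "1985-07-12"},
    "u3": {"name": "Cho", "birthdate": "1990-04-01"},
}

# Index column family: the same rows duplicated, keyed by birthdate
users_by_birthdate = {}
for uid, row in users.items():
    users_by_birthdate.setdefault(row["birthdate"], {})[uid] = row

# Selection without SQL: a single key lookup instead of a full scan
matches = sorted(users_by_birthdate["1990-04-01"])
print(matches)  # → ['u1', 'u3']
```

The write path pays for the duplication (every insert touches both structures), which is exactly the trade-off the document notes Cassandra is built to handle well.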
This document discusses performance analysis and fault tolerance in software environments. It begins by introducing the importance of performance analysis and fault tolerance for software, as faults can lead to losses. It then discusses different fault tolerance techniques, which generally involve some type of replication to handle node and network failures. The two main approaches are replication and coordination, which rely on modeling computation as a deterministic state machine. The document will analyze performance and fault tolerance of software environments.
A General Purpose Extensible Scanning Query Architecture for Ad Hoc AnalyticsFlurry, Inc.
We present Burst, an analytic query system with a scalable and flexible approach to performing lowlatency ad hoc analysis over large complex datasets. The architecture consists of hardwareefficient scan techniques and a language facility to transform an extensible set of ad hoc declarative queries into imperative physical scan plans. These plans are multicast across all nodes/cores of a two level sharded/distributed ingestion, storage, and execution topology and executed. The first release of this system is the query engine behind the Flurry Explorer product. Here we explore the design details of that system as well as the incremental ingestion pipeline enhancement currently being implemented for the next major release.
Dataservices - Processing Big Data The Microservice WayJosef Adersberger
We see a big data processing pattern emerging using the Microservice approach to build an integrated, flexible, and distributed system of data processing tasks. We call this the Dataservice pattern. In this presentation we'll introduce into Dataservices: their basic concepts, the technology typically in use (like Kubernetes, Kafka, Cassandra and Spring) and some architectures from real-life.
Schema-based multi-tenant architecture using Quarkus & Hibernate-ORM.pdfseo18
Architecture design is a must while developing a SaaS application to ensure its scalability and optimising infrastructure costs. In this blog, Lets discuss the implementation of one such architecture with Quarkus java framework and Hibernate ORM
Getting real-time analytics for devices/application/business monitoring from trillions of events and petabytes of data like companies Netflix, Uber, Alibaba, Paypal, Ebay, Metamarkets do.
6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...Jürgen Ambrosi
In questa sessione vedremo, con il solito approccio pratico di demo hands on, come utilizzare il linguaggio R per effettuare analisi a valore aggiunto,
Toccheremo con mano le performance di parallelizzazione degli algoritmi, aspetto fondamentale per aiutare il ricercatore nel raggiungimento dei suoi obbiettivi.
In questa sessione avremo la partecipazione di Lorenzo Casucci, Data Platform Solution Architect di Microsoft.
The document discusses experimenting with big data technologies using the Tengu platform. Tengu allows customers to easily set up environments to experiment with big data stores like Cassandra and Elasticsearch. It also supports different types of big data analysis like stream processing, batch analysis, and the Lambda architecture. Tengu handles all the deployment and configuration of these technologies so users can focus on experimenting with their applications in a big data context without having to deal with integration and setup.
ارائه در زمینه کلان داده،
کارگاه آموزشی "عصر کلان داده، چرا و چگونه؟" در بیست و دومین کنفرانس انجمن کامپیوتر ایران csicc2017.ir
وحید امیری
vahidamiry.ir
datastack.ir
This document discusses analyzing fire department call data from San Francisco using HiveQL and MapReduce. The authors cleaned the data, loaded it into HDFS, and performed queries and analysis. They found that Hive queries took less time than custom MapReduce programs for the same queries on this dataset. Visualizations of query results were created using JFreeCharts. The goal was to help improve fire department resource allocation and response based on patterns in call volume, location, and time.
2. Agenda
Sensors in IOT era
Predictive Maintenance
Predictive Maintenance with sensor data in Utilities industry
Architecture for real time distributed sensor data collection, analysis,
visualization, and storage system
Modeling imprecise sensor readings
3. Sensors in IOT era
Sensors
Sensors are a bridge between the physical world and the internet. They will
play an ever-increasing role in just about every field imaginable, powering
the "Internet of Things".
Potential Uses of Sensor Data
Sensors can be used to monitor machines, infrastructure, and the environment,
such as ventilation equipment, bridges, energy meters, airplane engines,
temperature, humidity, etc.
One use of this data is for predictive maintenance, to repair or replace the items
before they break.
4. 3 classes of Maintenance
Corrective maintenance (CM) is simply fixing things after they suffer a
breakdown; it can also be called reactive maintenance.
Preventive maintenance (PM) is about replacing or replenishing consumables
at scheduled intervals.
Predictive maintenance (PdM), or condition-based maintenance, focuses on
detecting failures before they occur.
PdM incorporates inspections of the system at predetermined intervals to
determine system condition.
Depending on the outcome of a continual inspection, either a preventive or
no maintenance activity is performed.
5. Fault Detection Method in Predictive Maintenance
PdM employs many fault or defect detection methods which compare current
sensor or inspection data with some reference data.
If the reference data are the outcome of a representation of the real system,
the fault detection method is called model-based.
Mainly, two distinct kinds of models are used: analytical models and
machine learning models.
Analytical models are limited to representing linear characteristics;
however, modern machine learning techniques based on artificial intelligence,
such as neural networks, Bayesian (belief) networks, and support vector
machines, are capable of capturing nonlinearities and complex
interdependencies. Even a relatively "simple" machine learning tool such as a
decision tree can allow for nonlinearities.
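As a rough sketch of model-based fault detection, a reference model predicts what a healthy reading should look like, and a fault is flagged when the residual exceeds a threshold. The linear model, names, and thresholds below are illustrative assumptions, not from the slides:

```python
# Model-based fault detection sketch: flag a fault when the residual between
# the sensed value and the reference model's prediction exceeds a threshold.
# The linear model and the threshold are illustrative only.

def reference_model(load_kva: float) -> float:
    """Hypothetical analytical model: expected winding temperature vs. load."""
    return 40.0 + 0.5 * load_kva  # a simple linear characteristic

def detect_fault(load_kva: float, sensed_temp: float,
                 threshold: float = 15.0) -> bool:
    """Return True when the residual indicates a possible fault."""
    residual = abs(sensed_temp - reference_model(load_kva))
    return residual > threshold

# A reading close to the model is healthy; a large residual flags a fault.
print(detect_fault(100.0, 92.0))   # residual 2.0  -> False
print(detect_fault(100.0, 120.0))  # residual 30.0 -> True
```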
6. Machine Learning in Predictive Maintenance
Data Mining and Machine Learning
allow systematic classification of
patterns contained in data sets.
Patterns of data, “attributes”,
containing information about
condition of physical assets can be
represented by “instances” with an
associated failure mode, or “class”.
Predictions can be made based on
patterns in real time data.
7. Decision tree model example
Here is an example of building a decision tree model where the strategy is to
either perform maintenance or not, based on the outcomes of several
independent measurements (variables).
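A minimal hand-coded decision rule in the same spirit can be sketched as follows; the measurement names and split points are hypothetical, chosen only for illustration:

```python
def maintenance_decision(vibration_mm_s: float, temp_c: float,
                         age_years: float) -> str:
    """Tiny hand-written decision tree: returns 'maintain' or 'no maintenance'.
    The split points are illustrative, not learned from real data."""
    if vibration_mm_s > 7.0:          # first split: excessive vibration
        return "maintain"
    if temp_c > 90.0 and age_years > 15:  # second split: hot and aging
        return "maintain"
    return "no maintenance"

print(maintenance_decision(8.2, 70.0, 3))   # high vibration -> maintain
print(maintenance_decision(3.0, 95.0, 20))  # hot and old    -> maintain
print(maintenance_decision(3.0, 60.0, 5))   # healthy        -> no maintenance
```

In practice such a tree would be learned from labeled failure data rather than written by hand; the hand-coded version only illustrates the decision structure.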
9. Predictive Maintenance in Utility
industry
By analyzing the patterns of circumstances surrounding past equipment
failures and power outages and by accessing multiple data sources including
sensors in real time, utility companies can predict and prevent future
failures.
Predictive Maintenance allows utility companies to not only prepare for
known consumption peaks, such as those caused by extreme weather
conditions, but also react quickly to unexpected problems when the warning
signs appear.
Utility companies can spot problems early on:
When some of the values of a sensor are abnormal;
When the number of abnormal values exceeds a given threshold;
Or when the values of a given sensor are significantly different from the values
of its neighbors.
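These three conditions can be sketched as simple checks; the normal range, count threshold, and neighbor-deviation limit below are illustrative assumptions:

```python
from statistics import mean

def abnormal_values(readings, low=0.0, high=100.0):
    """Condition 1: values of a sensor outside its normal range."""
    return [r for r in readings if not (low <= r <= high)]

def exceeds_count_threshold(readings, max_abnormal=2, low=0.0, high=100.0):
    """Condition 2: the number of abnormal values exceeds a threshold."""
    return len(abnormal_values(readings, low, high)) > max_abnormal

def deviates_from_neighbors(value, neighbor_values, max_delta=20.0):
    """Condition 3: a sensor differs significantly from its neighbors."""
    return abs(value - mean(neighbor_values)) > max_delta

print(abnormal_values([50, 120, -5, 80]))                 # [120, -5]
print(exceeds_count_threshold([50, 120, -5, 80]))         # False: 2 is not > 2
print(deviates_from_neighbors(95.0, [60.0, 62.0, 58.0]))  # True: 35 > 20
```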
10. Big and fast sensor data requires a
different architecture
Due to the rapid advances in sensor technologies, the
number of sensors and the amount of sensor data
have been increasing at incredible rates.
Therefore, the scalability, availability, and speed
requirements for sensor data collection, storage, and
analysis solutions call for new technologies
that can efficiently distribute data
over many servers and dynamically add new
attributes to data records.
11. Architecture for a real time distributed sensor
data collection, analysis, visualization, and
storage system
The new architecture must be able to scale to support a large number of
sensors and big data sizes.
It must be able to automatically gather and analyze large numbers of sensor
measurements over long periods of time, and to deploy statistics and
machine learning to execute computationally complex data analysis
algorithms with many influencing factors.
Open source big data frameworks can be utilized for large-scale sensor data
analysis requirements.
13. An example use case
Display all the transformers located in Houston, Texas on a map, and
when a transformer icon is clicked, display in an info window the following
details for each transformer: Transformer ID, Age, Designed Capacity, exact
location, and the current Load reading.
If a transformer is of Type "Pole-Top", with Rating 230 and Age > 20, and its
load exceeds its designed capacity by more than 10 kVA, and the air
temperature at the transformer's location is above 100 degrees,
we'll highlight the transformer icon in red.
When the user clicks on a specific transformer, we'll populate the details for
that transformer, including its Load reading. Both the transformer icon color
and the transformer Load reading (in red or green) will continuously
update every second in real time.
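The highlighting rule from this use case can be expressed directly as a predicate; the field names are illustrative, not a prescribed schema:

```python
def icon_color(transformer: dict, load_kva: float, air_temp_f: float) -> str:
    """Return 'red' when the alert rule from the use case fires, else 'green'.
    `transformer` uses illustrative keys: 'type', 'rating', 'age',
    'designed_capacity_kva'."""
    overload = load_kva - transformer["designed_capacity_kva"]
    if (transformer["type"] == "Pole-Top"
            and transformer["rating"] == 230
            and transformer["age"] > 20
            and overload > 10            # more than 10 kVA over capacity
            and air_temp_f > 100):       # hot location
        return "red"
    return "green"

t = {"type": "Pole-Top", "rating": 230, "age": 25, "designed_capacity_kva": 50}
print(icon_color(t, load_kva=65, air_temp_f=102))  # red: overload is 15 kVA
print(icon_color(t, load_kva=55, air_temp_f=102))  # green: overload only 5 kVA
```

In the real system this predicate would be evaluated every second against streaming Load and temperature readings to drive the map UI.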
14. Why Spark?
Spark presents a new distributed memory abstraction, called resilient
distributed datasets (RDDs), which provides a data structure for in-memory
computations on large clusters.
RDDs achieve fault tolerance: if a given task fails for reasons such as
hardware failure or erroneous user code, lost data can be recovered and
reconstructed automatically by the remaining tasks.
Spark has a Java high-level API for working with distributed data similar to
Hadoop and presents an in-memory processing solution.
We run Spark on Hortonworks HDP 2.2 in YARN mode, and have also made Spark
1.3.1 work on HDP 2.2 (default Spark version: 1.2).
15. Spark Streaming
Spark Streaming is an extension of the core Spark API that enables
high-throughput, fault-tolerant stream processing of live data streams.
It offers an additional abstraction called discretized streams, or
DStreams. DStreams are a continuous sequence of RDDs representing a
stream of data.
DStreams can be created from live incoming data or by transforming other
DStreams.
Spark receives data, divides it into batches, then replicates the batches for
fault tolerance and persists them in memory where they are available for
mathematical operations.
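Conceptually, a discretized stream is just a continuous stream sliced into small batches, each processed like an ordinary dataset. A toy, stdlib-only sketch of that idea (not the Spark API):

```python
from itertools import islice

def micro_batches(stream, batch_size):
    """Slice a (possibly unbounded) iterator into fixed-size batches,
    mimicking how a DStream is a sequence of RDD-like batches."""
    it = iter(stream)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

readings = [3, 1, 4, 1, 5, 9, 2, 6]
# Process each batch independently, e.g. compute a per-batch average:
averages = [sum(b) / len(b) for b in micro_batches(readings, 4)]
print(averages)  # [2.25, 5.5]
```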
Spark 1.3 offers Streaming K-means Clustering and Streaming Linear
Regression
16. Spark SQL
Spark SQL is Spark's module for working with structured data.
The foundation of Spark SQL is a type of RDD, called SchemaRDD (pre-V1.3) or
DataFrame (V1.3), an object similar to a table in a relational database.
Spark SQL can run queries against mixed types of data.
The Spark piece in detail:
17. Sensor Data Storage – HBase
NoSQL databases provide efficient alternatives for storing large amounts of sensor data.
In this example, we will use HBase, a NoSQL key/value store that runs on top of HDFS.
Unlike Hive, HBase operations run in real time against its database, rather than as
batch-based MapReduce jobs.
Each key/value pair in HBase is defined as a cell, and each key consists of row-key, column
family, column, and time-stamp. A row in HBase is a grouping of key/value mappings
identified by the row-key.
In our case, we'll store the anomalous sensor data in a table "abnormal_load" in the format:
key, Transformer_ID, Timestamp, Load, Overload, Location, Air_Temperature
We can query our HBase table by creating an external Hive table, linking the HBase table to
the Hive table, and then running HiveQL:
select Transformer_ID, Timestamp, Overload from spark_poc.abnormal_load
where Overload > 20 and Air_Temperature > 105 order by Timestamp DESC;
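For a table like this, a common HBase row-key convention is to concatenate the entity ID with a zero-padded timestamp, so that all readings for one transformer sort together in time order. This is an illustrative design choice, not prescribed in the slides:

```python
def make_row_key(transformer_id: str, epoch_seconds: int) -> str:
    """Compose an HBase-style row key: entity ID plus a zero-padded timestamp,
    so rows for one transformer sort lexicographically in time order."""
    return f"{transformer_id}#{epoch_seconds:012d}"

k1 = make_row_key("TX-0042", 1431000000)
k2 = make_row_key("TX-0042", 1431000060)
print(k1)       # TX-0042#001431000000
print(k1 < k2)  # True: lexicographic order matches time order
```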
18. Why send all source data to Kafka
In the diagrams in the next 2 slides:
The first shows what happens without Kafka.
Since each source needs a connection to each target, the setup is difficult to
maintain and can cause lots of programming and security issues.
The second diagram uses Kafka, so all sources send data to Kafka.
We only need to develop one interface/program to get all the different data into
Kafka; each kind of data is one topic.
And from the consumer side, a consumer only deals with Kafka. When we add a
new source or a new consumer, it does not affect any existing source or target
at all. Thus the pipeline is easy to maintain, clean, secure, and scalable.
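The maintenance argument is easy to quantify: point-to-point integration needs one link per source/target pair, while a broker needs only one link per endpoint. A back-of-the-envelope sketch:

```python
def point_to_point_links(n_sources: int, n_targets: int) -> int:
    """Every source connects directly to every target: N x M links."""
    return n_sources * n_targets

def broker_links(n_sources: int, n_targets: int) -> int:
    """Every endpoint connects only to the broker (e.g. Kafka): N + M links."""
    return n_sources + n_targets

# With 6 sources and 5 targets:
print(point_to_point_links(6, 5))  # 30 connections to build and maintain
print(broker_links(6, 5))          # 11 connections to build and maintain
```

The gap widens as endpoints are added: each new source or consumer adds one link with a broker, but many with point-to-point wiring.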
20. Data Pipelines with Kafka
[Diagram: Sources → Kafka → Targets (HBase, Hive, HDFS, DB)]
21. Why write the analysis result data stream to
Kafka before publishing it to the web UI
This is because if we send the data stream (the analysis result) to a queue on
the web server and then use a web socket to push it to the browser, the queue
is very tedious to maintain.
Kafka comes in handy as a distributed, persistent message queue that supports
multiple concurrent writers, as well as multiple groups of readers that
maintain their own offsets within the queue (which Kafka calls a 'topic').
This enables us to build applications that consume data from a topic at their
own pace without disrupting access from other groups of readers.
22. Sensor Data Analysis
To analyze data on the aforementioned architecture, we use distributed
machine-learning algorithms from Apache Mahout and from Spark's MLlib.
MLlib is a Spark component: a fast and flexible iterative computing
framework that implements machine-learning algorithms, including
classification, clustering, linear regression, collaborative filtering, and
decomposition, and aims to create and analyze large-scale data hosted in memory.
We use the K-means algorithm for clustering sensor data and finding anomalies.
K-means is a very popular unsupervised learning algorithm. It aims to
assign objects to groups, and all of the objects to be grouped need to be
represented as numerical features. The technique iteratively assigns points to
clusters, using distance as a similarity measure, until there is no change in which
point belongs to which cluster.
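The assign-then-update loop described above can be sketched in plain Python on 1-D load readings; this is a toy version of what MLlib's K-means does at scale:

```python
def kmeans_1d(points, centers, iters=10):
    """Toy K-means on 1-D data: assign each point to its nearest center,
    then move each center to the mean of its assigned points."""
    for _ in range(iters):
        clusters = {c: [] for c in range(len(centers))}
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda c: abs(p - centers[c]))
            clusters[nearest].append(p)
        # Recompute centers; keep a center unchanged if its cluster is empty.
        centers = [sum(ps) / len(ps) if ps else centers[c]
                   for c, ps in clusters.items()]
    return centers

# Normal loads cluster near 50; anomalous readings cluster near 200.
loads = [48, 50, 52, 49, 51, 198, 202, 200]
print(kmeans_1d(loads, centers=[0.0, 100.0]))  # [50.0, 200.0]
```

Once the clusters stabilize, readings far from every center (or those landing in the small, distant cluster) can be treated as anomalies.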
We also use Spark’s Streaming K-means.
23. Modeling imprecise sensor readings
Sensor readings are inherently imprecise because of the noise introduced by
the equipment itself.
Two main approaches have emerged for modeling uncertain data series:
In the first, a Probability Density Function (PDF) over the uncertain values is
estimated by using some a priori knowledge.
In the second, the uncertain data distribution is summarized by repeated
measurements (i.e., samples).
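The contrast between the two approaches can be shown in a small sketch: the first fits an a-priori chosen PDF (here a Gaussian, an assumption for illustration) to the readings, while the second simply keeps the repeated measurements as the uncertainty summary:

```python
from statistics import mean, stdev

samples = [20.1, 19.8, 20.3, 20.0, 19.9, 20.2]  # repeated readings of one value

# Approach 1: estimate the parameters of an a-priori chosen PDF (Gaussian).
mu, sigma = mean(samples), stdev(samples)

# Approach 2: keep the samples themselves as the empirical summary.
empirical = sorted(samples)

print(round(mu, 2))                 # estimated mean of the assumed Gaussian
print(empirical[0], empirical[-1])  # empirical range of the readings
```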
24. Dynamic probabilistic models over the
sensor readings
The KEN technique builds and maintains dynamic probabilistic models over the
sensor readings, taking into account the spatio-temporal correlations that exist
in the sensor readings.
These models organize the sensor nodes in non-overlapping groups, and are
shared by the sensor nodes and the sink.
The expected values of the probabilistic models are the values recorded by
the sink. If the sensors observe that these values are more than ε away from
the sensed values, then a model update is triggered.
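That update rule can be sketched as a simple predicate: a sensor transmits only when its reading drifts more than ε from the model's expected value. The names and values below are illustrative:

```python
def needs_model_update(expected: float, sensed: float, epsilon: float) -> bool:
    """Trigger a model update when the sensed value drifts more than
    epsilon away from the model's expected value (KEN-style check)."""
    return abs(sensed - expected) > epsilon

stream = [20.1, 20.4, 19.8, 23.5, 20.2]  # sensed values at one node
updates = [v for v in stream if needs_model_update(20.0, v, epsilon=1.0)]
print(updates)  # only the outlier forces a transmission
```

This is the communication saving: readings within ε of the shared model never need to be sent to the sink.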
The PAQ and SAF methods employ linear regression and autoregressive
models, respectively, for modeling the measurements produced by the nodes,
with SAF leading to a more accurate model than PAQ.
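A minimal autoregressive sketch in the SAF spirit: estimate an AR(1) coefficient by least squares and predict the next reading. This is a toy stdlib version, not the actual SAF algorithm:

```python
def fit_ar1(series):
    """Least-squares estimate of phi in the model x[t] ~ phi * x[t-1]."""
    num = sum(series[t] * series[t - 1] for t in range(1, len(series)))
    den = sum(series[t - 1] ** 2 for t in range(1, len(series)))
    return num / den

def predict_next(series):
    """One-step-ahead prediction from the fitted AR(1) model."""
    return fit_ar1(series) * series[-1]

# A series that decays by half each step is exactly AR(1) with phi = 0.5.
x = [8.0, 4.0, 2.0, 1.0]
print(fit_ar1(x))       # 0.5
print(predict_next(x))  # 0.5
```

As with the probabilistic models above, a node can transmit only when the sensed value deviates from such a prediction by more than an agreed threshold.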