Predictive Maintenance
with Sensors in Utilities
Tina Zhang
Agenda
 Sensors in IOT era
 Predictive Maintenance
 Predictive Maintenance with sensor data in Utilities industry
 Architecture for real time distributed sensor data collection, analysis,
visualization, and storage system
 Modeling imprecise sensor readings
Sensors in IOT era
 Sensors
Sensors are a bridge between the physical world and the internet. They will
play an ever-increasing role in just about every field imaginable, powering
the “Internet of Things”.
 Potential Uses of Sensor Data
 Sensors can be used to monitor machines, infrastructure, and the environment:
for example ventilation equipment, bridges, energy meters, airplane engines,
temperature, and humidity.
 One use of this data is for predictive maintenance, to repair or replace the items
before they break.
3 classes of Maintenance
 Corrective maintenance (CM), also called reactive maintenance, simply means
fixing things after they break down.
 Preventive maintenance (PM) means replacing or replenishing consumables
at scheduled intervals.
 Predictive maintenance (PdM), or condition-based maintenance, focuses on
detecting failures before they occur.
PdM incorporates inspections of the system at predetermined intervals to
determine system condition.
Depending on the outcome of an inspection, either a preventive maintenance
activity is performed or no maintenance is performed at all.
Fault Detection Method in Predictive Maintenance
 PdM employs many fault or defect detection methods, which compare current
sensor or inspection data with some reference data.
 If the reference data are the output of a representation of the real system,
the fault detection method is called model-based.
Two main kinds of models are used: analytical models and
machine learning models.
Analytical models are limited to representing linear characteristics, whereas
modern machine learning techniques based on artificial intelligence, such as
neural networks, Bayesian (belief) networks, or support vector machines,
are capable of including nonlinearities and complex interdependencies. Even
a relatively "simple" machine learning tool such as a decision tree can allow
for nonlinearities.
Machine Learning in Predictive Maintenance
 Data Mining and Machine Learning
allow systematic classifying of
patterns contained in data sets.
 Patterns of data, “attributes”,
containing information about
condition of physical assets can be
represented by “instances” with an
associated failure mode, or “class”.
 Predictions can be made based on
patterns in real time data.
Decision tree model example
 Here is an example of building a decision tree model, where the decision is
whether or not to perform maintenance, based on the outcome of several
independent measurements (variables).
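As a minimal sketch of how one split of such a tree could be chosen, the code below picks the measurement and threshold that best separate "maintain" from "none" using Gini impurity. The measurement names, values, and labels are hypothetical and are not taken from the slides.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure group)."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())

def best_split(rows, labels):
    """Return (feature_index, threshold, weighted_gini) of the best binary split."""
    best = None
    for f in range(len(rows[0])):
        for threshold in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= threshold]
            right = [y for r, y in zip(rows, labels) if r[f] > threshold]
            if not left or not right:
                continue  # a split must put rows on both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[2]:
                best = (f, threshold, score)
    return best

# Hypothetical measurements per asset: (vibration, temperature)
rows = [(2.0, 60), (2.5, 65), (7.0, 62), (8.0, 90), (3.0, 95), (7.5, 70)]
labels = ["none", "none", "maintain", "maintain", "none", "maintain"]

feature, threshold, score = best_split(rows, labels)
```

With this toy data the vibration measurement alone separates the two classes, so the tree needs only one decision node. A full tree would apply the same search recursively to each side of the split.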
Naïve Bayes example
Predictive Maintenance in Utility
industry
 By analyzing the patterns of circumstances surrounding past equipment
failures and power outages and by accessing multiple data sources including
sensors in real time, utility companies can predict and prevent future
failures.
 Predictive Maintenance allows utility companies to not only prepare for
known consumption peaks, such as those caused by extreme weather
conditions, but also react quickly to unexpected problems when the warning
signs appear.
 Utility companies can spot the problem early on:
 When some of a sensor's values are abnormal;
 When the number of abnormal values exceeds a given threshold;
 Or when the values of a given sensor are significantly different from the values
of its neighbors.
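The three warning signs above can be sketched as simple checks. This is only an illustration: the normal range, the count threshold, the maximum gap to neighbors, and the sample readings are all assumed values.

```python
from statistics import mean

def abnormal_values(readings, low, high):
    """Sign 1: readings that fall outside the normal range [low, high]."""
    return [x for x in readings if x < low or x > high]

def too_many_abnormal(readings, low, high, max_abnormal):
    """Sign 2: the number of abnormal readings exceeds a given threshold."""
    return len(abnormal_values(readings, low, high)) > max_abnormal

def differs_from_neighbors(value, neighbor_values, max_gap):
    """Sign 3: a reading is far from the mean reading of neighboring sensors."""
    return abs(value - mean(neighbor_values)) > max_gap

# Hypothetical load readings from one sensor
readings = [48.0, 51.0, 75.0, 49.5, 80.0, 78.0]

outliers = abnormal_values(readings, low=40.0, high=60.0)
alarm = too_many_abnormal(readings, low=40.0, high=60.0, max_abnormal=2)
isolated = differs_from_neighbors(80.0, [50.0, 52.0, 49.0], max_gap=15.0)
```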
Big and fast sensor data requires a
different architecture
 Due to rapid advances in sensor technologies, the
number of sensors and the amount of sensor data
have been increasing at an incredible rate.
 Therefore the scalability, availability, and speed
requirements for sensor data collection, storage, and
analysis solutions call for new technologies
that can efficiently distribute data
over many servers and dynamically add new
attributes to data records.
Architecture for a real time distributed sensor
data collection, analysis, visualization, and
storage system
 The new architecture must be able to scale to support a large number of
sensors and big data sizes.
 It must be able to automatically gather and analyze large numbers of sensor
measurements over long periods of time, and to apply statistics and
machine learning to execute computationally complex data analysis
algorithms with many influencing factors.
 Open source big data frameworks can be utilized for large-scale sensor data
analysis requirements.
[Architecture diagram. Components shown: data sources (socket, shared files,
web service, ...), user, Kafka, Spark Streaming & Spark SQL & MLlib, HDFS,
HBase, Hive, analysis results, Kafka, Web UI]
An example use case
 Display all the transformers located in the city of Houston, Texas on the map, and
when a transformer icon is clicked, display in an info window the following
details for that transformer: Transformer ID, Age, Designed Capacity, exact
location, and the current Load reading.
 If a transformer is of Type “Pole-Top”, with Rating 230 and Age > 20, its
load exceeds its designed capacity by more than 10 kVA, and the air
temperature at the transformer's location is > 100 degrees,
we'll highlight the transformer icon in red.
 When the user clicks on a specific transformer, we'll populate the details for the
transformer, including its Load reading. Both the transformer icon color and
the transformer Load reading (in red or green) will update continuously,
every second, in real time.
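The highlighting rule can be written as a single predicate. This is a sketch under assumptions: the field names are hypothetical, Age is taken to be in years, the temperature is compared in whatever unit the slide uses, and green is assumed to be the color for every transformer that does not match the rule.

```python
from dataclasses import dataclass

@dataclass
class Transformer:
    transformer_id: str
    kind: str                 # transformer Type, e.g. "Pole-Top"
    rating: int
    age: float                # assumed to be in years
    designed_capacity: float  # kVA
    load: float               # kVA, current Load reading
    air_temperature: float    # degrees at the transformer's location

def icon_color(t: Transformer) -> str:
    """Return "red" when every condition of the rule holds, otherwise "green"."""
    overload = t.load - t.designed_capacity
    at_risk = (
        t.kind == "Pole-Top"
        and t.rating == 230
        and t.age > 20
        and overload > 10
        and t.air_temperature > 100
    )
    return "red" if at_risk else "green"

# Hypothetical transformers: the first is overloaded by 15 kVA, the second by 5 kVA
hot_old = Transformer("T-001", "Pole-Top", 230, 25, 50.0, 65.0, 102.0)
healthy = Transformer("T-002", "Pole-Top", 230, 25, 50.0, 55.0, 102.0)
colors = [icon_color(hot_old), icon_color(healthy)]
```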
Why Spark?
 Spark presents a new distributed memory abstraction, called resilient
distributed datasets (RDDs), which provides a data structure for in-memory
computations on large clusters.
 RDDs are fault tolerant, meaning that if a given task fails for
reasons such as hardware failure or erroneous user code, lost data
can be recovered and reconstructed automatically on the remaining tasks.
 Spark has a high-level Java API for working with distributed data, similar to
Hadoop, and offers an in-memory processing solution.
 We run Spark on Hortonworks HDP2.2 in YARN mode, and have also made Spark
1.3.1 work on HDP2.2 (default Spark version: 1.2).
Spark Streaming
 Spark Streaming is an extension of the core Spark API that enables
high-throughput, fault-tolerant stream processing of live data streams.
 It offers an additional abstraction called discretized streams, or
DStreams. A DStream is a continuous sequence of RDDs representing a
stream of data.
 DStreams can be created from live incoming data or by transforming other
DStreams.
 Spark receives data, divides it into batches, then replicates the batches for
fault tolerance and persists them in memory where they are available for
mathematical operations.
 Spark 1.3 offers Streaming K-means Clustering and Streaming Linear
Regression.
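The idea of dividing incoming data into batches can be illustrated without Spark. The sketch below is plain Python, not the Spark Streaming API: it groups timestamped readings into fixed-interval batches, each of which would correspond to one RDD in a DStream. The event data and the batch interval are assumptions.

```python
def micro_batches(events, batch_interval):
    """Group (timestamp_seconds, value) events into consecutive batches.

    Events whose timestamps fall into the same interval end up in the same
    batch. Intervals that received no events produce no batch.
    """
    batches = {}
    for timestamp, value in events:
        batch_index = int(timestamp // batch_interval)
        batches.setdefault(batch_index, []).append(value)
    return [batches[index] for index in sorted(batches)]

# Hypothetical load readings arriving over about 3 seconds
events = [(0.1, 41.0), (0.9, 42.5), (1.2, 43.0), (2.5, 60.2)]
batches = micro_batches(events, batch_interval=1.0)
batch_means = [sum(b) / len(b) for b in batches]
```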
Spark SQL
 Spark SQL is Spark's module for working with structured data.
 The foundation of Spark SQL is a type of RDD, called SchemaRDD (pre-V1.3) or
DataFrame (V1.3), an object similar to a table in a relational database.
 Spark SQL can run queries against mixed types of data
Spark piece in detail:
Sensor Data Storage – HBase
 NoSQL databases provide efficient alternatives for storing large amounts of sensor data. In
this example, we will use HBase, a NoSQL key/value store which runs on top of HDFS.
 Unlike Hive, HBase operations run in real time on its database rather than as batch-based
MapReduce jobs.
 Each key/value pair in HBase is defined as a cell, and each key consists of row-key, column
family, column, and time-stamp. A row in HBase is a grouping of key/value mappings
identified by the row-key.
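The cell and row structure described above can be modeled with a plain dictionary. This is a toy model of the data layout only, not the HBase client API. The column family name "d" and the sample values are hypothetical.

```python
# Each cell is keyed by (row_key, column_family, column, timestamp)
cells = {}

def put(row_key, family, column, timestamp, value):
    """Store one cell value under its full four-part key."""
    cells[(row_key, family, column, timestamp)] = value

def get_row(row_key):
    """Group the cells of one row-key, keeping the newest version per column."""
    latest = {}
    for (rk, family, column, timestamp), value in cells.items():
        if rk != row_key:
            continue
        name = f"{family}:{column}"
        if name not in latest or timestamp > latest[name][0]:
            latest[name] = (timestamp, value)
    return {name: value for name, (_, value) in latest.items()}

put("T-001", "d", "Load", 1, 100.0)
put("T-001", "d", "Load", 2, 120.0)       # newer version of the same column
put("T-001", "d", "Location", 1, "Houston")
put("T-002", "d", "Load", 1, 80.0)

row = get_row("T-001")
```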
In our case, we’ll store the anomalous sensor data in a table “abnormal_load” in the format:
key, Transformer_ID, Timestamp, Load, Overload, Location, Air_Temperature
 We can query our HBase table by creating an external Hive table, linking the HBase table to
the Hive table, and then running HiveQL:
select Transformer_ID, Timestamp, Overload from spark_poc.abnormal_load
where Overload > 20 and Air_Temperature > 105 order by Timestamp DESC;
Why send all source data to Kafka
In the diagrams on the next 2 slides:
 The first shows what happens without Kafka.
Since each source needs a connection to each target, the pipelines are difficult to
maintain and can cause many programming and security issues.
 The second diagram uses Kafka, so all sources send data to Kafka.
We only need to develop one interface/program to get all the different data into
Kafka. Each kind of data is one topic.
On the consumer side, a consumer only deals with Kafka. When we add a
new source or a new consumer, it does not affect any existing source or target
at all. Thus the system is easy to maintain, clean, secure, and scalable.
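A quick count shows why the direct approach is harder to maintain. The sketch assumes the worst case, in which every source feeds every target. The numbers of sources and targets are made up.

```python
def direct_connections(num_sources, num_targets):
    """Without a broker: one connection per (source, target) pair."""
    return num_sources * num_targets

def brokered_connections(num_sources, num_targets):
    """With a broker: each source and each target connects only to the broker."""
    return num_sources + num_targets

# Hypothetical system with 4 sources and 4 targets
without_kafka = direct_connections(4, 4)
with_kafka = brokered_connections(4, 4)

# Adding a fifth source: extra connections needed in each design
extra_without = direct_connections(5, 4) - without_kafka
extra_with = brokered_connections(5, 4) - with_kafka
```

In this example the direct design needs 16 connections against 8 with a broker, and a new source adds 4 connections in the direct design but only 1 with a broker.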
Data Pipelines Without Kafka
[Diagram: sources connected directly to targets]
Data Pipelines With Kafka
[Diagram: sources, Kafka, targets (HBase, Hive, HDFS, DB)]
Why write the analysis result data stream to
Kafka before publishing it to the web UI
 If we sent the data stream (analysis results) to a queue on the web
server and then used a web socket to push it to the browser, the queue would be
very tedious to maintain.
 Kafka comes in handy as a distributed, persistent message queue which supports
multiple concurrent writers, as well as multiple groups of readers that
maintain their own offsets within the queue (which Kafka calls a ‘topic’).
This enables us to build applications that consume data from a topic at their
own pace without disrupting access from other groups of readers.
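The per-group offsets are what let readers move at their own pace. The toy class below imitates that behavior in memory. It is not the Kafka client API, and the group names and messages are hypothetical.

```python
class Topic:
    """Toy append-only log with an independent read offset per reader group."""

    def __init__(self):
        self.log = []
        self.offsets = {}  # group name -> index of the next message to read

    def publish(self, message):
        """Append a message to the end of the log. Messages are never removed."""
        self.log.append(message)

    def poll(self, group, max_messages=10):
        """Return the next unread messages for this group and advance its offset."""
        start = self.offsets.get(group, 0)
        batch = self.log[start:start + max_messages]
        self.offsets[group] = start + len(batch)
        return batch

topic = Topic()
for result in ["r1", "r2", "r3"]:
    topic.publish(result)

web_ui_first = topic.poll("web-ui", max_messages=2)  # slow reader takes two
archiver_all = topic.poll("archiver")                # other group sees everything
web_ui_rest = topic.poll("web-ui")                   # resumes where it left off
```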
Sensor Data Analysis
 To analyze data on the aforementioned architecture, we use the distributed
machine-learning algorithms in Apache Mahout and in MLlib from Apache Spark.
 MLlib is a Spark component: a fast and flexible iterative computing
framework for implementing machine-learning algorithms, including
classification, clustering, linear regression, collaborative filtering, and
decomposition, on large-scale data hosted in memory.
 We use the k-means algorithm to cluster sensor data and find the anomalies.
K-means is a very popular unsupervised learning algorithm. It aims to
assign objects to groups. All of the objects to be grouped need to be
represented as numerical features. The technique iteratively assigns points to
clusters, using distance as the similarity measure, until no point changes
cluster.
 We also use Spark’s Streaming K-means.
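The k-means loop described above can be sketched in plain Python. This is an illustration, not the MLlib or Mahout implementation: it seeds the centroids with the first k points, and it treats a reading as anomalous when it is far from every centroid, which is one possible reading of "find the anomalies". The points and the distance threshold are assumptions.

```python
import math

def kmeans(points, k, max_iter=100):
    """Return (centroids, assignment) once no point changes cluster."""
    centroids = [points[i] for i in range(k)]  # simple seeding: first k points
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return centroids, assignment

def is_anomaly(point, centroids, max_distance):
    """Flag a point whose nearest centroid is farther than max_distance."""
    return min(math.dist(point, c) for c in centroids) > max_distance

# Hypothetical (load, temperature) readings scaled to similar ranges
points = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (0.8, 1.1), (8.2, 7.9), (7.9, 8.3)]
centroids, assignment = kmeans(points, k=2)
suspicious = is_anomaly((4.5, 4.5), centroids, max_distance=2.0)
normal = is_anomaly((1.1, 1.0), centroids, max_distance=2.0)
```

Seeding with the first k points keeps the sketch deterministic. Real implementations usually pick the starting centroids more carefully, since a poor start can lead to poor clusters.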
Modeling imprecise sensor readings
 Sensor readings are inherently imprecise because of the noise introduced by
the equipment itself.
 Two main approaches have emerged for modeling uncertain data series:
 In the first, a Probability Density Function (PDF) over the uncertain values is
estimated by using some a priori knowledge.
 In the second, the uncertain data distribution is summarized by repeated
measurements (i.e., samples).
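Both approaches can be illustrated for a single sensor. The sketch assumes Gaussian noise, which the slides do not specify. The noise level used as a priori knowledge and the repeated measurements are made-up values.

```python
import math
from statistics import mean, stdev

def gaussian_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Approach 1: a PDF built from a priori knowledge.
# Assumption: the noise is Gaussian with a known standard deviation of 0.5
# around the reported reading.
reading = 20.0
prior_sigma = 0.5
density_at_reading = gaussian_pdf(20.0, reading, prior_sigma)
density_one_unit_off = gaussian_pdf(21.0, reading, prior_sigma)

# Approach 2: the distribution summarized by repeated measurements (samples)
samples = [19.8, 20.1, 20.3, 19.9, 20.4]
sample_mean = mean(samples)
sample_std = stdev(samples)
```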
Dynamic probabilistic models over the
sensor readings
 The KEN technique builds and maintains dynamic probabilistic models over the
sensor readings, taking into account the spatio-temporal correlations that exist
in the sensor readings.
 These models organize the sensor nodes in non-overlapping groups, and are
shared by the sensor nodes and the sink.
 The expected values of the probabilistic models are the values that are
recorded by the sink. If the sensors observe that these values are more than εVT
away from the sensed values, then a model update is triggered.
 The PAQ and SAF methods employ linear regression and autoregressive
models, respectively, for modeling the measurements produced by the nodes,
with SAF leading to a more accurate model than PAQ.
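The threshold-triggered update can be illustrated with a deliberately simplified model. This is not KEN, PAQ, or SAF: instead of a probabilistic or regression model, the sink here just keeps the last value it was sent, and the sensor sends a new value only when the sensed value drifts more than a threshold epsilon away from it. The readings and epsilon are assumed values.

```python
def transmit_with_threshold(sensed_values, epsilon):
    """Return (values recorded by the sink, number of updates sent).

    The sensor and the sink share the same model value. The sensor sends an
    update only when the sensed value is more than epsilon away from it.
    """
    model_value = sensed_values[0]  # initial model, sent once at start-up
    updates = 1
    sink_view = []
    for value in sensed_values:
        if abs(value - model_value) > epsilon:
            model_value = value  # model update triggered
            updates += 1
        sink_view.append(model_value)
    return sink_view, updates

# Hypothetical temperature readings
sensed = [20.0, 20.2, 20.4, 21.5, 21.6, 20.1]
sink_view, updates = transmit_with_threshold(sensed, epsilon=1.0)
```

In this run the sink records six values from only three transmissions, and every recorded value is within epsilon of what the sensor actually measured.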

More Related Content

What's hot

Innovating With Data and Analytics
Innovating With Data and AnalyticsInnovating With Data and Analytics
Innovating With Data and Analytics
VMware Tanzu
 
AI in healthcare and Automobile Industry using OpenPOWER/IBM POWER9 systems
AI in healthcare and Automobile Industry using OpenPOWER/IBM POWER9 systemsAI in healthcare and Automobile Industry using OpenPOWER/IBM POWER9 systems
AI in healthcare and Automobile Industry using OpenPOWER/IBM POWER9 systems
Ganesan Narayanasamy
 
02 a holistic approach to big data
02 a holistic approach to big data02 a holistic approach to big data
02 a holistic approach to big data
Raul Chong
 
Big Data Real Time Analytics - A Facebook Case Study
Big Data Real Time Analytics - A Facebook Case StudyBig Data Real Time Analytics - A Facebook Case Study
Big Data Real Time Analytics - A Facebook Case Study
Nati Shalom
 
HOW TO APPLY BIG DATA ANALYTICS AND MACHINE LEARNING TO REAL TIME PROCESSING ...
HOW TO APPLY BIG DATA ANALYTICS AND MACHINE LEARNING TO REAL TIME PROCESSING ...HOW TO APPLY BIG DATA ANALYTICS AND MACHINE LEARNING TO REAL TIME PROCESSING ...
HOW TO APPLY BIG DATA ANALYTICS AND MACHINE LEARNING TO REAL TIME PROCESSING ...
Big Data Spain
 
Infochimps + CloudCon: Infinite Monkey Theorem
Infochimps + CloudCon: Infinite Monkey TheoremInfochimps + CloudCon: Infinite Monkey Theorem
Infochimps + CloudCon: Infinite Monkey Theorem
Infochimps, a CSC Big Data Business
 
Use cases for Hadoop and Big Data Analytics - InfoSphere BigInsights
Use cases for Hadoop and Big Data Analytics - InfoSphere BigInsightsUse cases for Hadoop and Big Data Analytics - InfoSphere BigInsights
Use cases for Hadoop and Big Data Analytics - InfoSphere BigInsights
Gord Sissons
 
Ibm big data
Ibm big dataIbm big data
Ibm big data
Peter Tutty
 
Operational Analytics
Operational AnalyticsOperational Analytics
Operational Analytics
Eckerson Group
 
"Empower Developers with HPE Machine Learning and Augmented Intelligence", Dr...
"Empower Developers with HPE Machine Learning and Augmented Intelligence", Dr..."Empower Developers with HPE Machine Learning and Augmented Intelligence", Dr...
"Empower Developers with HPE Machine Learning and Augmented Intelligence", Dr...
Dataconomy Media
 
Overview - IBM Big Data Platform
Overview - IBM Big Data PlatformOverview - IBM Big Data Platform
Overview - IBM Big Data Platform
Vikas Manoria
 
GITEX Big Data Conference 2014 – SAP Presentation
GITEX Big Data Conference 2014 – SAP PresentationGITEX Big Data Conference 2014 – SAP Presentation
GITEX Big Data Conference 2014 – SAP Presentation
Pedro Pereira
 
Big Data Application Architectures - IoT
Big Data Application Architectures - IoTBig Data Application Architectures - IoT
Big Data Application Architectures - IoT
DataWorks Summit/Hadoop Summit
 
Big Data Scotland 2017
Big Data Scotland 2017Big Data Scotland 2017
Big Data Scotland 2017
Ray Bugg
 
Importance of Big Data Analytics
Importance of Big Data AnalyticsImportance of Big Data Analytics
Importance of Big Data Analytics
Impetus Technologies
 
Overview of analytics and big data in practice
Overview of analytics and big data in practiceOverview of analytics and big data in practice
Overview of analytics and big data in practice
Vivek Murugesan
 
Threat Detection and Response at Scale with Dominique Brezinski
Threat Detection and Response at Scale with Dominique BrezinskiThreat Detection and Response at Scale with Dominique Brezinski
Threat Detection and Response at Scale with Dominique Brezinski
Databricks
 
How to Swiftly Operationalize the Data Lake for Advanced Analytics Using a Lo...
How to Swiftly Operationalize the Data Lake for Advanced Analytics Using a Lo...How to Swiftly Operationalize the Data Lake for Advanced Analytics Using a Lo...
How to Swiftly Operationalize the Data Lake for Advanced Analytics Using a Lo...
Denodo
 
How IOT & Big Data will shape up Future Economies?
How IOT & Big Data will shape up Future Economies?How IOT & Big Data will shape up Future Economies?
How IOT & Big Data will shape up Future Economies?
Srinath Perera
 
Next-Gen ML/AI Platform
Next-Gen ML/AI PlatformNext-Gen ML/AI Platform
Next-Gen ML/AI Platform
Josh Yeh
 

What's hot (20)

Innovating With Data and Analytics
Innovating With Data and AnalyticsInnovating With Data and Analytics
Innovating With Data and Analytics
 
AI in healthcare and Automobile Industry using OpenPOWER/IBM POWER9 systems
AI in healthcare and Automobile Industry using OpenPOWER/IBM POWER9 systemsAI in healthcare and Automobile Industry using OpenPOWER/IBM POWER9 systems
AI in healthcare and Automobile Industry using OpenPOWER/IBM POWER9 systems
 
02 a holistic approach to big data
02 a holistic approach to big data02 a holistic approach to big data
02 a holistic approach to big data
 
Big Data Real Time Analytics - A Facebook Case Study
Big Data Real Time Analytics - A Facebook Case StudyBig Data Real Time Analytics - A Facebook Case Study
Big Data Real Time Analytics - A Facebook Case Study
 
HOW TO APPLY BIG DATA ANALYTICS AND MACHINE LEARNING TO REAL TIME PROCESSING ...
HOW TO APPLY BIG DATA ANALYTICS AND MACHINE LEARNING TO REAL TIME PROCESSING ...HOW TO APPLY BIG DATA ANALYTICS AND MACHINE LEARNING TO REAL TIME PROCESSING ...
HOW TO APPLY BIG DATA ANALYTICS AND MACHINE LEARNING TO REAL TIME PROCESSING ...
 
Infochimps + CloudCon: Infinite Monkey Theorem
Infochimps + CloudCon: Infinite Monkey TheoremInfochimps + CloudCon: Infinite Monkey Theorem
Infochimps + CloudCon: Infinite Monkey Theorem
 
Use cases for Hadoop and Big Data Analytics - InfoSphere BigInsights
Use cases for Hadoop and Big Data Analytics - InfoSphere BigInsightsUse cases for Hadoop and Big Data Analytics - InfoSphere BigInsights
Use cases for Hadoop and Big Data Analytics - InfoSphere BigInsights
 
Ibm big data
Ibm big dataIbm big data
Ibm big data
 
Operational Analytics
Operational AnalyticsOperational Analytics
Operational Analytics
 
"Empower Developers with HPE Machine Learning and Augmented Intelligence", Dr...
"Empower Developers with HPE Machine Learning and Augmented Intelligence", Dr..."Empower Developers with HPE Machine Learning and Augmented Intelligence", Dr...
"Empower Developers with HPE Machine Learning and Augmented Intelligence", Dr...
 
Overview - IBM Big Data Platform
Overview - IBM Big Data PlatformOverview - IBM Big Data Platform
Overview - IBM Big Data Platform
 
GITEX Big Data Conference 2014 – SAP Presentation
GITEX Big Data Conference 2014 – SAP PresentationGITEX Big Data Conference 2014 – SAP Presentation
GITEX Big Data Conference 2014 – SAP Presentation
 
Big Data Application Architectures - IoT
Big Data Application Architectures - IoTBig Data Application Architectures - IoT
Big Data Application Architectures - IoT
 
Big Data Scotland 2017
Big Data Scotland 2017Big Data Scotland 2017
Big Data Scotland 2017
 
Importance of Big Data Analytics
Importance of Big Data AnalyticsImportance of Big Data Analytics
Importance of Big Data Analytics
 
Overview of analytics and big data in practice
Overview of analytics and big data in practiceOverview of analytics and big data in practice
Overview of analytics and big data in practice
 
Threat Detection and Response at Scale with Dominique Brezinski
Threat Detection and Response at Scale with Dominique BrezinskiThreat Detection and Response at Scale with Dominique Brezinski
Threat Detection and Response at Scale with Dominique Brezinski
 
How to Swiftly Operationalize the Data Lake for Advanced Analytics Using a Lo...
How to Swiftly Operationalize the Data Lake for Advanced Analytics Using a Lo...How to Swiftly Operationalize the Data Lake for Advanced Analytics Using a Lo...
How to Swiftly Operationalize the Data Lake for Advanced Analytics Using a Lo...
 
How IOT & Big Data will shape up Future Economies?
How IOT & Big Data will shape up Future Economies?How IOT & Big Data will shape up Future Economies?
How IOT & Big Data will shape up Future Economies?
 
Next-Gen ML/AI Platform
Next-Gen ML/AI PlatformNext-Gen ML/AI Platform
Next-Gen ML/AI Platform
 

Viewers also liked

Predictive maintenance
Predictive maintenancePredictive maintenance
Predictive maintenance
James Shearer
 
[Tutorial] building machine learning models for predictive maintenance applic...
[Tutorial] building machine learning models for predictive maintenance applic...[Tutorial] building machine learning models for predictive maintenance applic...
[Tutorial] building machine learning models for predictive maintenance applic...
PAPIs.io
 
What is predictive maintenance?
What is predictive maintenance?What is predictive maintenance?
What is predictive maintenance?
Danko Nikolic
 
Predictive Maintenance with R
Predictive Maintenance with RPredictive Maintenance with R
Predictive Maintenance with R
eoda GmbH
 
The Science of Predictive Maintenance: IBM's Predictive Analytics Solution
The Science of Predictive Maintenance: IBM's Predictive Analytics SolutionThe Science of Predictive Maintenance: IBM's Predictive Analytics Solution
The Science of Predictive Maintenance: IBM's Predictive Analytics Solution
Senturus
 
Predictive Maintenance
Predictive MaintenancePredictive Maintenance
Predictive Maintenance
fljungbe
 
Predictive Maintenance
Predictive MaintenancePredictive Maintenance
Predictive Maintenance
Saama
 
Using the Industrial Internet to Move From Planned Maintenance to Predictive ...
Using the Industrial Internet to Move From Planned Maintenance to Predictive ...Using the Industrial Internet to Move From Planned Maintenance to Predictive ...
Using the Industrial Internet to Move From Planned Maintenance to Predictive ...
Sentient Science
 
Predictive Maintenance for Oil and Gas
Predictive Maintenance for Oil and Gas Predictive Maintenance for Oil and Gas
Predictive Maintenance for Oil and Gas
Helen Fisher
 
BA Summit 2014 Predictive maintenance: Met big data het lek dichten
BA Summit 2014  Predictive maintenance: Met big data het lek dichtenBA Summit 2014  Predictive maintenance: Met big data het lek dichten
BA Summit 2014 Predictive maintenance: Met big data het lek dichten
Daniel Westzaan
 
Reliability centred maintenance
Reliability centred maintenanceReliability centred maintenance
Reliability centred maintenance
SHIVAJI CHOUDHURY
 
Reliability centered maintenance
Reliability centered maintenanceReliability centered maintenance
Reliability centered maintenance
Rodolfo Stonner, PMP, RMP
 
Essential Elements of Data Center Facility Operations
Essential Elements of Data Center Facility OperationsEssential Elements of Data Center Facility Operations
Essential Elements of Data Center Facility Operations
Schneider Electric
 
Digital POV-Chemical Industries
Digital POV-Chemical IndustriesDigital POV-Chemical Industries
Digital POV-Chemical Industries
Ravi Shankar Sugavanam
 
DATA FORUM MICROPOLE 2015 - Atelier Talend
 DATA FORUM MICROPOLE 2015 - Atelier Talend DATA FORUM MICROPOLE 2015 - Atelier Talend
DATA FORUM MICROPOLE 2015 - Atelier Talend
Micropole Group
 
Business Insight and Predictive Analysis
Business Insight and Predictive AnalysisBusiness Insight and Predictive Analysis
Business Insight and Predictive Analysis
USAID CEED II Project Moldova
 
XMPLR Data Analytics in Power Generation
XMPLR Data Analytics in  Power GenerationXMPLR Data Analytics in  Power Generation
XMPLR Data Analytics in Power Generation
Scott Affelt
 
Predictive maintenance
Predictive maintenancePredictive maintenance
Predictive maintenance
Elena Maria Vaccher
 
Predictive Analysis
Predictive AnalysisPredictive Analysis
Predictive Analysis
Michael Bystry
 
Application fields of R in classical industrial analytics
Application fields of R in classical industrial analyticsApplication fields of R in classical industrial analytics
Application fields of R in classical industrial analytics
eoda GmbH
 

Viewers also liked (20)

Predictive maintenance
Predictive maintenancePredictive maintenance
Predictive maintenance
 
[Tutorial] building machine learning models for predictive maintenance applic...
[Tutorial] building machine learning models for predictive maintenance applic...[Tutorial] building machine learning models for predictive maintenance applic...
[Tutorial] building machine learning models for predictive maintenance applic...
 
What is predictive maintenance?
What is predictive maintenance?What is predictive maintenance?
What is predictive maintenance?
 
Predictive Maintenance with R
Predictive Maintenance with RPredictive Maintenance with R
Predictive Maintenance with R
 
The Science of Predictive Maintenance: IBM's Predictive Analytics Solution
The Science of Predictive Maintenance: IBM's Predictive Analytics SolutionThe Science of Predictive Maintenance: IBM's Predictive Analytics Solution
The Science of Predictive Maintenance: IBM's Predictive Analytics Solution
 
Predictive Maintenance
Predictive MaintenancePredictive Maintenance
Predictive Maintenance
 
Predictive Maintenance
Predictive MaintenancePredictive Maintenance
Predictive Maintenance
 
Using the Industrial Internet to Move From Planned Maintenance to Predictive ...
Using the Industrial Internet to Move From Planned Maintenance to Predictive ...Using the Industrial Internet to Move From Planned Maintenance to Predictive ...
Using the Industrial Internet to Move From Planned Maintenance to Predictive ...
 
Predictive Maintenance for Oil and Gas
Predictive Maintenance for Oil and Gas Predictive Maintenance for Oil and Gas
Predictive Maintenance for Oil and Gas
 
BA Summit 2014 Predictive maintenance: Met big data het lek dichten
BA Summit 2014  Predictive maintenance: Met big data het lek dichtenBA Summit 2014  Predictive maintenance: Met big data het lek dichten
BA Summit 2014 Predictive maintenance: Met big data het lek dichten
 
Reliability centred maintenance
Reliability centred maintenanceReliability centred maintenance
Reliability centred maintenance
 
Reliability centered maintenance
Reliability centered maintenanceReliability centered maintenance
Reliability centered maintenance
 
Essential Elements of Data Center Facility Operations
Essential Elements of Data Center Facility OperationsEssential Elements of Data Center Facility Operations
Essential Elements of Data Center Facility Operations
 
Digital POV-Chemical Industries
Digital POV-Chemical IndustriesDigital POV-Chemical Industries
Digital POV-Chemical Industries
 
DATA FORUM MICROPOLE 2015 - Atelier Talend
 DATA FORUM MICROPOLE 2015 - Atelier Talend DATA FORUM MICROPOLE 2015 - Atelier Talend
DATA FORUM MICROPOLE 2015 - Atelier Talend
 
Business Insight and Predictive Analysis
Business Insight and Predictive AnalysisBusiness Insight and Predictive Analysis
Business Insight and Predictive Analysis
 
XMPLR Data Analytics in Power Generation
XMPLR Data Analytics in  Power GenerationXMPLR Data Analytics in  Power Generation
XMPLR Data Analytics in Power Generation
 
Predictive maintenance
Predictive maintenancePredictive maintenance
Predictive maintenance
 
Predictive Analysis
Predictive AnalysisPredictive Analysis
Predictive Analysis
 
Application fields of R in classical industrial analytics
Application fields of R in classical industrial analyticsApplication fields of R in classical industrial analytics
Application fields of R in classical industrial analytics
 

Similar to Predictive maintenance withsensors_in_utilities_

CS8091_BDA_Unit_IV_Stream_Computing
CS8091_BDA_Unit_IV_Stream_ComputingCS8091_BDA_Unit_IV_Stream_Computing
CS8091_BDA_Unit_IV_Stream_Computing
Palani Kumar
 
Reactive Stream Processing for Data-centric Publish/Subscribe
Reactive Stream Processing for Data-centric Publish/SubscribeReactive Stream Processing for Data-centric Publish/Subscribe
Reactive Stream Processing for Data-centric Publish/Subscribe
Sumant Tambe
 
Hadoop Integration with Microstrategy
Hadoop Integration with Microstrategy Hadoop Integration with Microstrategy
Hadoop Integration with Microstrategy
snehal parikh
 
Enabling SQL Access to Data Lakes
Enabling SQL Access to Data LakesEnabling SQL Access to Data Lakes
Enabling SQL Access to Data Lakes
Vasu S
 
Real time data-pipeline from inception to production
Real time data-pipeline from inception to productionReal time data-pipeline from inception to production
Real time data-pipeline from inception to production
Shreya Mukhopadhyay
 
Kafka vs Spark vs Impala in bigdata .pptx
Kafka vs Spark vs Impala in bigdata .pptxKafka vs Spark vs Impala in bigdata .pptx
Kafka vs Spark vs Impala in bigdata .pptx
emmadoo192
 
Massive sacalabilitty with InterSystems IRIS Data Platform
Massive sacalabilitty with InterSystems IRIS Data PlatformMassive sacalabilitty with InterSystems IRIS Data Platform
Massive sacalabilitty with InterSystems IRIS Data Platform
Robert Bira
 
Comparison among rdbms, hadoop and spark
Comparison among rdbms, hadoop and sparkComparison among rdbms, hadoop and spark
Comparison among rdbms, hadoop and spark
AgnihotriGhosh2
 
Cassandra data modelling best practices
Cassandra data modelling best practicesCassandra data modelling best practices
Cassandra data modelling best practices
Sandeep Sharma IIMK Smart City,IoT,Bigdata,Cloud,BI,DW
 
Data Analysis In The Cloud
Data Analysis In The CloudData Analysis In The Cloud
Data Analysis In The Cloud
Monica Carter
 
A General Purpose Extensible Scanning Query Architecture for Ad Hoc Analytics
A General Purpose Extensible Scanning Query Architecture for Ad Hoc AnalyticsA General Purpose Extensible Scanning Query Architecture for Ad Hoc Analytics
A General Purpose Extensible Scanning Query Architecture for Ad Hoc Analytics
Flurry, Inc.
 
Aucfanlab Datalake - Big Data Management Platform -
Aucfanlab Datalake - Big Data Management Platform -Aucfanlab Datalake - Big Data Management Platform -
Aucfanlab Datalake - Big Data Management Platform -
Aucfan
 
Dataservices - Processing Big Data The Microservice Way
Dataservices - Processing Big Data The Microservice WayDataservices - Processing Big Data The Microservice Way
Dataservices - Processing Big Data The Microservice Way
Josef Adersberger
 
Schema-based multi-tenant architecture using Quarkus &amp; Hibernate-ORM.pdf
Schema-based multi-tenant architecture using Quarkus &amp; Hibernate-ORM.pdfSchema-based multi-tenant architecture using Quarkus &amp; Hibernate-ORM.pdf
Schema-based multi-tenant architecture using Quarkus &amp; Hibernate-ORM.pdf
seo18
 
Real time analytics
Real time analyticsReal time analytics
Real time analytics
Leandro Totino Pereira
 
6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...
6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...
6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...
Jürgen Ambrosi
 
Experimenting With Big Data
Experimenting With Big DataExperimenting With Big Data
Experimenting With Big Data
Nick Boucart
 
Big Data , Big Problem?
Big Data , Big Problem?Big Data , Big Problem?
Big Data , Big Problem?
Mohammadhasan Farazmand
 
عصر کلان داده، چرا و چگونه؟
عصر کلان داده، چرا و چگونه؟عصر کلان داده، چرا و چگونه؟
عصر کلان داده، چرا و چگونه؟
datastack
 
Final Report_798 Project_Nithin_Sharmila
Final Report_798 Project_Nithin_SharmilaFinal Report_798 Project_Nithin_Sharmila
Final Report_798 Project_Nithin_Sharmila
Nithin Kakkireni
 




Predictive Maintenance with Sensors in Utilities

  • 1. Predictive Maintenance with Sensors in Utilities Tina Zhang
  • 2. Agenda
    ▪ Sensors in the IoT era
    ▪ Predictive maintenance
    ▪ Predictive maintenance with sensor data in the utilities industry
    ▪ Architecture for a real-time distributed sensor data collection, analysis, visualization, and storage system
    ▪ Modeling imprecise sensor readings
  • 3. Sensors in the IoT era
    ▪ Sensors: Sensors are a bridge between the physical world and the internet. They will play an ever-increasing role in nearly every field and will power the “Internet of Things”.
    ▪ Potential uses of sensor data:
      ▪ Sensors can be used to monitor machines, infrastructure, and the environment, for example ventilation equipment, bridges, energy meters, airplane engines, temperature, and humidity.
      ▪ One use of this data is predictive maintenance: repairing or replacing items before they break.
  • 4. Three classes of maintenance
    ▪ Corrective maintenance (CM), also called reactive maintenance, is simply fixing things after they break down.
    ▪ Preventive maintenance (PM) is about replacing or replenishing consumables at scheduled intervals.
    ▪ Predictive maintenance (PdM), or condition-based maintenance, focuses on detecting failures before they occur. PdM incorporates inspections of the system at predetermined intervals to determine system condition. Depending on the outcome of each inspection, either a preventive maintenance activity or no maintenance activity is performed.
  • 5. Fault detection methods in predictive maintenance
    ▪ PdM employs many fault or defect detection methods, which compare current sensor or inspection data with some reference data.
    ▪ If the reference data are the output of a representation of the real system, the fault detection method is called model-based. Two distinct kinds of models are mainly used: analytical models and machine learning models. Analytical models are limited to representing linear characteristics, whereas modern machine learning techniques such as neural networks, Bayesian (belief) networks, and support vector machines can capture nonlinearities and complex interdependencies. Even a relatively "simple" machine learning tool such as a decision tree can allow for nonlinearities.
  • 6. Machine learning in predictive maintenance
    ▪ Data mining and machine learning allow systematic classification of the patterns contained in data sets.
    ▪ Patterns of data (“attributes”) containing information about the condition of physical assets can be represented by “instances” with an associated failure mode, or “class”.
    ▪ Predictions can be made based on patterns in real-time data.
  • 7. Decision tree model example
    ▪ Here is an example of building a decision tree model in which the strategy is either to perform maintenance or not, based on the outcomes of several independent measurements (variables).
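The transcript does not include the tree itself, so the following is only a minimal sketch of how such a tree is evaluated once it has been built. The measurement names (`vibration_mm_s`, `oil_temp_c`, `age_years`) and every threshold are hypothetical and are not taken from the slides.

```python
from dataclasses import dataclass


@dataclass
class Measurements:
    # Hypothetical independent measurements for one asset.
    vibration_mm_s: float
    oil_temp_c: float
    age_years: int


def perform_maintenance(m: Measurements) -> bool:
    """Walk a small hand-written decision tree; thresholds are illustrative only."""
    if m.vibration_mm_s > 7.0:       # root split
        return True
    if m.oil_temp_c > 90.0:          # second-level split
        return m.age_years > 10      # leaf depends on asset age
    return False


decision = perform_maintenance(Measurements(3.0, 95.0, 15))
```

In practice the splits would be learned from labeled failure history rather than written by hand.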
  • 9. Predictive maintenance in the utility industry
    ▪ By analyzing the patterns of circumstances surrounding past equipment failures and power outages, and by accessing multiple data sources, including sensors, in real time, utility companies can predict and prevent future failures.
    ▪ Predictive maintenance allows utility companies not only to prepare for known consumption peaks, such as those caused by extreme weather conditions, but also to react quickly to unexpected problems when the warning signs appear.
    ▪ Utility companies can spot a problem early on:
      ▪ when some of the values of a sensor are not normal;
      ▪ when the number of abnormal values exceeds a given threshold;
      ▪ or when the values of a given sensor are significantly different from the values of its neighbors.
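The three warning signs listed on this slide can be sketched as simple checks. This is an illustration under assumptions: the normal range, the count threshold, and the tolerance are hypothetical parameters, and comparing against the mean of the neighbors is one possible reading of "significantly different".

```python
from statistics import mean


def abnormal_values(values, low, high):
    """Return the readings that fall outside the assumed normal range."""
    return [v for v in values if not (low <= v <= high)]


def too_many_abnormal(values, low, high, max_abnormal):
    """True when the number of abnormal readings exceeds the given threshold."""
    return len(abnormal_values(values, low, high)) > max_abnormal


def differs_from_neighbors(value, neighbor_values, tolerance):
    """True when a reading is far from the mean of its neighbors' readings."""
    return abs(value - mean(neighbor_values)) > tolerance


readings = [50, 52, 95, 51, 99, 98]
outliers = abnormal_values(readings, low=40, high=60)
```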
  • 10. Big and fast sensor data requires a different architecture
    ▪ Due to rapid advances in sensor technologies, the number of sensors and the amount of sensor data have been increasing at a remarkable rate.
    ▪ The scalability, availability, and speed requirements for sensor data collection, storage, and analysis therefore call for new technologies that can efficiently distribute data over many servers and dynamically add new attributes to data records.
  • 11. Architecture for a real-time distributed sensor data collection, analysis, visualization, and storage system
    ▪ The new architecture must be able to scale to support a large number of sensors and large data volumes.
    ▪ It must be able to automatically gather and analyze large numbers of sensor measurements over long periods of time, and to use statistics and machine learning to execute computationally complex data analysis algorithms with many influencing factors.
    ▪ Open source big data frameworks can be used to meet large-scale sensor data analysis requirements.
  • 12. [Architecture diagram. Components shown: data sources (socket, shared files, web service, ...); Kafka; Spark Streaming, Spark SQL, and MLlib; HDFS, HBase, and Hive; analysis results; Kafka; web UI; user]
  • 13. An example use case
    ▪ Display all the transformers located in the city of Houston, Texas, on the map. When a transformer icon is clicked, display the following details for that transformer in an info window: Transformer ID, Age, Designed Capacity, exact location, and the current Load reading.
    ▪ If a transformer is of Type “Pole-Top”, with Rating 230 and Age > 20, its load exceeds its designed capacity by more than 10 kVA, and the air temperature at its location is above 100 degrees, we highlight the transformer icon in red.
    ▪ When the user clicks on a specific transformer, we populate the details for that transformer, including its Load reading. Both the transformer icon color and the transformer Load reading (shown in red or green) update every second in real time.
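The highlighting rule in this use case can be written as a single predicate. The sketch below is an assumption-laden illustration: the field names are hypothetical, the temperature is assumed to be in degrees Fahrenheit (the slide gives no unit), and green is assumed to be the color used when the rule does not fire.

```python
from dataclasses import dataclass


@dataclass
class Transformer:
    # Hypothetical field names; the slide only names the attributes informally.
    transformer_id: str
    kind: str
    rating: int
    age_years: int
    designed_capacity_kva: float
    load_kva: float
    air_temp: float  # assumed to be degrees Fahrenheit


def icon_color(t: Transformer) -> str:
    """Return "red" when every condition of the slide's rule holds, else "green"."""
    overloaded = t.load_kva - t.designed_capacity_kva > 10
    at_risk = (
        t.kind == "Pole-Top"
        and t.rating == 230
        and t.age_years > 20
        and overloaded
        and t.air_temp > 100
    )
    return "red" if at_risk else "green"


hot_old_unit = Transformer("T-001", "Pole-Top", 230, 25, 50.0, 65.0, 104.0)
color = icon_color(hot_old_unit)
```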
  • 14. Why Spark?
    ▪ Spark presents a distributed memory abstraction called resilient distributed datasets (RDDs), which provides a data structure for in-memory computations on large clusters.
    ▪ RDDs are fault tolerant: if a given task fails, for example because of a hardware failure or erroneous user code, lost data can be recovered and reconstructed automatically on the remaining tasks.
    ▪ Spark has a high-level Java API for working with distributed data, similar to Hadoop, and presents an in-memory processing solution.
    ▪ We run Spark on Hortonworks HDP 2.2 in YARN mode, and have also made Spark 1.3.1 work on HDP 2.2 (default Spark version: 1.2).
  • 15. Spark Streaming
    ▪ Spark Streaming is an extension of the core Spark API that enables high-throughput, fault-tolerant processing of live data streams.
    ▪ It offers an additional abstraction called discretized streams, or DStreams. A DStream is a continuous sequence of RDDs representing a stream of data.
    ▪ DStreams can be created from live incoming data or by transforming other DStreams.
    ▪ Spark receives data, divides it into batches, replicates the batches for fault tolerance, and persists them in memory, where they are available for mathematical operations.
    ▪ Spark 1.3 offers Streaming K-means Clustering and Streaming Linear Regression.
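The idea of dividing a live stream into batches can be illustrated without Spark. The following is a plain-Python sketch of the discretization step only, not the Spark Streaming API; the `(timestamp, value)` record shape and the one-second batch interval are assumptions.

```python
from collections import defaultdict


def discretize(records, batch_seconds):
    """Group (timestamp, value) records into consecutive fixed-length batches."""
    batches = defaultdict(list)
    for timestamp, value in records:
        batches[int(timestamp // batch_seconds)].append(value)
    # Return the batches in time order, like a sequence of small datasets.
    return [batches[index] for index in sorted(batches)]


stream = [(0.1, 1), (0.9, 2), (1.2, 3), (2.5, 4)]
micro_batches = discretize(stream, batch_seconds=1)

# A transformation applied to every batch yields a new sequence of results.
batch_sums = [sum(batch) for batch in micro_batches]
```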
  • 16. Spark SQL
    ▪ Spark SQL is Spark's module for working with structured data.
    ▪ The foundation of Spark SQL is a type of RDD called SchemaRDD (before V1.3) or DataFrame (V1.3), an object similar to a table in a relational database.
    ▪ Spark SQL can run queries against mixed types of data.
    Spark piece in detail:
  • 17. Sensor data storage: HBase
    ▪ NoSQL databases provide efficient alternatives for storing large amounts of sensor data. In this example, we use HBase, a NoSQL key/value store that runs on top of HDFS.
    ▪ Unlike Hive, HBase operations run in real time on its database rather than as batch-based MapReduce jobs.
    ▪ Each key/value pair in HBase is defined as a cell, and each key consists of row key, column family, column, and timestamp. A row in HBase is a grouping of key/value mappings identified by the row key. In our case, we store the anomalous sensor data in a table “abnormal_load” in the format: key, Transformer_ID, Timestamp, Load, Overload, Location, Air_Temperature.
    ▪ We can query our HBase table by creating an external Hive table, linking the HBase table to the Hive table, and then running HiveQL:
      select Transformer_ID, Timestamp, Overload from spark_poc.abnormal_load where Overload > 20 and Air_Temperature > 105 order by Timestamp DESC;
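The cell model described on this slide (row key, column family, column, timestamp) can be pictured with an in-memory dictionary. This is a conceptual sketch, not the HBase client API; the class name, the column family `d`, and the row-key format are hypothetical.

```python
class CellStore:
    """Toy key/value store keyed by (row key, column family, column, timestamp)."""

    def __init__(self):
        self._cells = {}

    def put(self, row_key, family, column, timestamp, value):
        self._cells[(row_key, family, column, timestamp)] = value

    def get_row(self, row_key):
        """Return the newest value of each column in the row."""
        latest = {}
        for (rk, family, column, ts), value in self._cells.items():
            if rk != row_key:
                continue
            name = f"{family}:{column}"
            if name not in latest or ts > latest[name][0]:
                latest[name] = (ts, value)
        return {name: value for name, (ts, value) in latest.items()}


store = CellStore()
store.put("T-001#1000", "d", "Load", 1000, 62.0)
store.put("T-001#1000", "d", "Overload", 1000, 12.0)
store.put("T-001#1000", "d", "Load", 1005, 64.5)  # newer version of the same cell
row = store.get_row("T-001#1000")
```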
  • 18. Why send all source data to Kafka
    In the diagrams on the next two slides:
    ▪ The first shows what happens without Kafka. Since each source needs a connection to each target, the system is difficult to maintain and can cause many programming and security issues.
    ▪ The second diagram uses Kafka, so all sources send data to Kafka. We only need to develop one interface/program to get all the different data into Kafka. Each kind of data is one topic. On the consumer side, a consumer deals only with Kafka. Adding a new source or a new consumer does not affect any existing source or target. The result is easy to maintain, clean, secure, and scalable.
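A quick count makes the maintenance argument concrete: with direct connections every source talks to every target, while with a hub each party connects once. The function names and the example sizes below are illustrative assumptions.

```python
def point_to_point_links(sources: int, targets: int) -> int:
    """Connections needed when every source connects to every target."""
    return sources * targets


def hub_links(sources: int, targets: int) -> int:
    """Connections needed when every source and target connects only to the hub."""
    return sources + targets


direct = point_to_point_links(4, 3)
via_hub = hub_links(4, 3)
# Adding a fifth source costs 3 new links directly, but only 1 through the hub.
extra_direct = point_to_point_links(5, 3) - direct
extra_hub = hub_links(5, 3) - via_hub
```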
  • 20. Data pipelines with Kafka [Diagram. Labels shown: Sources; Kafka; Targets (HBase, Hive, HDFS, DB)]
  • 21. Why write the analysis result data stream to Kafka before publishing it to the web UI
    ▪ If we sent the data stream (analysis results) to a queue on the web server and then used a web socket to push it to the browser, maintaining that queue would be very tedious.
    ▪ Kafka comes in handy as a distributed, persistent message queue that supports multiple concurrent writers as well as multiple groups of readers that maintain their own offsets within the queue (which Kafka calls a ‘topic’). This lets us build applications that consume data from a topic at their own pace without disrupting access from other groups of readers.
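The per-group offset behavior described here can be modeled in a few lines. This is an in-memory sketch of the concept, not the Kafka client API; the class name and the group names are hypothetical.

```python
class Topic:
    """Append-only log in which each reader group tracks its own offset."""

    def __init__(self):
        self._log = []
        self._offsets = {}

    def append(self, message):
        self._log.append(message)

    def poll(self, group, max_messages=10):
        """Return the next unread messages for this group and advance its offset."""
        start = self._offsets.get(group, 0)
        batch = self._log[start:start + max_messages]
        self._offsets[group] = start + len(batch)
        return batch


results = Topic()
for message in ["r1", "r2", "r3"]:
    results.append(message)

first_ui_batch = results.poll("web-ui", max_messages=2)
archiver_batch = results.poll("archiver")          # unaffected by the web-ui group
second_ui_batch = results.poll("web-ui", max_messages=2)
```

Because messages stay in the log after being read, a slow group does not hold back a fast one.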
  • 22. Sensor data analysis
    ▪ To analyze data on the aforementioned architecture, we use distributed machine-learning algorithms in Apache Mahout and in MLlib from Apache Spark.
    ▪ MLlib is a Spark component: a fast and flexible iterative computing framework for implementing machine-learning algorithms, including classification, clustering, linear regression, collaborative filtering, and decomposition, on large-scale data held in memory.
    ▪ We use the k-means algorithm to cluster sensor data and find the anomalies. k-means is a very popular unsupervised learning algorithm that aims to assign objects to groups. All of the objects to be grouped need to be represented as numerical features. The technique iteratively assigns points to clusters, using distance as a similarity measure, until there is no change in which point belongs to which cluster.
    ▪ We also use Spark’s Streaming K-means.
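The iteration described above can be sketched in plain Python for one-dimensional readings. This is not the MLlib implementation; the fixed initial centroids and the rule "a point far from its own centroid is an anomaly" with a distance threshold of 20 are assumptions made for the illustration.

```python
def kmeans_1d(points, initial_centroids, max_iter=100):
    """Assign points to the nearest centroid until assignments stop changing."""
    centroids = list(initial_centroids)
    assignment = None
    for _ in range(max_iter):
        new_assignment = [
            min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster
        assignment = new_assignment
        for i in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == i]
            if members:
                centroids[i] = sum(members) / len(members)
    return centroids, assignment


def anomalies(points, centroids, assignment, threshold):
    """Points whose distance to their own centroid exceeds the threshold."""
    return [p for p, a in zip(points, assignment) if abs(p - centroids[a]) > threshold]


loads = [10, 11, 12, 50, 51, 52, 95]
centers, labels = kmeans_1d(loads, initial_centroids=[10, 50])
outliers = anomalies(loads, centers, labels, threshold=20)
```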
  • 23. Modeling imprecise sensor readings
    ▪ Sensor readings are inherently imprecise because of the noise introduced by the equipment itself.
    ▪ Two main approaches have emerged for modeling uncertain data series:
      ▪ In the first, a probability density function (PDF) over the uncertain values is estimated using some a priori knowledge.
      ▪ In the second, the uncertain data distribution is summarized by repeated measurements (i.e., samples).
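The two approaches can be contrasted with a small sketch. It assumes, purely for illustration, that the a priori knowledge is a Gaussian noise model with a known standard deviation, and that the sample-based summary is the mean and sample standard deviation of repeated measurements; the slides do not prescribe either choice.

```python
import math
from statistics import mean, stdev


def gaussian_pdf(x, mu, sigma):
    """Approach 1: density of the true value under an assumed Gaussian noise model."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def summarize_samples(samples):
    """Approach 2: summarize repeated measurements by their mean and spread."""
    return mean(samples), stdev(samples)


# Approach 1: one reading of 10.0 with an assumed sensor noise of sigma = 1.0.
density_at_reading = gaussian_pdf(10.0, mu=10.0, sigma=1.0)

# Approach 2: three repeated measurements of the same quantity.
center, spread = summarize_samples([9.0, 10.0, 11.0])
```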
  • 24. Dynamic probabilistic models over the sensor readings
    ▪ The KEN technique builds and maintains dynamic probabilistic models over the sensor readings, taking into account the spatio-temporal correlations that exist in those readings.
    ▪ These models organize the sensor nodes into non-overlapping groups and are shared by the sensor nodes and the sink.
    ▪ The expected values of the probabilistic models are the values recorded by the sink. If the sensors observe that these values are more than εVT away from the sensed values, a model update is triggered.
    ▪ The PAQ and SAF methods employ linear regression and autoregressive models, respectively, to model the measurements produced by the nodes, with SAF leading to a more accurate model than PAQ.
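The update trigger can be illustrated with a deliberately simplified stand-in. The sketch below is not the KEN algorithm: it replaces the shared probabilistic model with a constant model (the last transmitted value) and uses a single hypothetical tolerance `epsilon`, only to show how a shared expectation lets a node stay silent while its readings remain predictable.

```python
def transmitted_updates(readings, epsilon):
    """Return the (time step, value) pairs a node would send to the sink.

    Node and sink share the same expected value, so the node transmits only
    when the sensed value deviates from it by more than epsilon.
    """
    updates = []
    expected = None
    for step, sensed in enumerate(readings):
        if expected is None or abs(sensed - expected) > epsilon:
            expected = sensed              # both sides update the shared model
            updates.append((step, sensed))
    return updates


temperatures = [20.0, 20.2, 20.4, 21.5, 21.6, 19.0]
sent = transmitted_updates(temperatures, epsilon=1.0)
```

Here only three of the six readings are transmitted; the sink assumes the expected value for the rest.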