(Go: >> BACK << -|- >> HOME <<)

SlideShare a Scribd company logo
Building an Enterprise/Cloud
Analytics Platform with Jupyter
Notebooks and Apache Spark
Fred Reiss
Chief Architect, IBM Spark Technology Center
2
Hi!
Fred Reiss
• 2014-present: Chief Architect,
IBM Spark Technology Center.
• 2006-2014: Worked for IBM
Research.
• 2006: Ph.D. from U.C.
Berkeley.
3
The Jupyter Project
• Open Source project that builds software to enable
interactive notebooks for data science
– Started in 2014
– Grew out of the IPython project
4
What is IPython?
https://upload.wikimedia.org/wikipedia/commons/4/47/IPython-shell.png
By Shishirdasika (Own work) [CC BY-SA 3.0 (https://creativecommons.org/licenses/by-sa/3.0)], via Wikimedia Commons
Interactive
console for
Python
Can open a
window to
display
graphics
5
IPython Notebooks
https://upload.wikimedia.org/wikipedia/commons/a/af/IPython-notebook.png
By Shishirdasika (Own work) [CC BY-SA 3.0 (https://creativecommons.org/licenses/by-sa/3.0)], via Wikimedia Commons
Text and
graphics in the
same browser
window
6
Jupyter Notebooks Today
https://developer.ibm.com/code/patterns/create-visualizations-to-understand-food-insecurity/
User
Input
System
Output
Tables
Graphs
Text output
Cells
7
Jupyter Notebooks
• Jupyter notebooks are widely used by data scientists, social
scientists, physical scientists, engineers, and others
• Useful for many tasks
– Analyzing data
– Developing and debugging software
– Running experiments
– Keeping track of experimental results
– Presenting results
• Jupyter is a central part of the IBM Data Science Experience
(http://datascience.ibm.com)
8
Jupyter in the Enterprise: Key Challenges
• Collaboration among multiple users
• Large-scale data analysis (problems that don’t fit in
a laptop)
– Shared cloud infrastructure like Kubernetes
– Parallel frameworks like Spark
• Security and authentication
• Auditing and data access control
9
Isn’t this just shipping strings around?
JavaScript
“1+1”
Server
“1+1”
Python
Process
“1+1”
“2”“2”“2”
10
Isn’t this just shipping strings around?
JavaScript
“1+1”
FancyNewSystem
“1+1”
Python
Process
“1+1”
“2”“2”“2”
Security
Multitenancy
Authentication
Spark
Kubernetes
11
The Five Stages of Enterprise Jupyter Deployment
12
The Five Stages of Enterprise Jupyter Deployment
1. Denial
13
Jupyter does more than just pass strings around.
• Quite a bit more!
14
Asynchronous Operations
• Queue up multiple cells for
execution
– …in arbitrary order
• Stream output while a cell is
running
• Interrupt any operation
Fifteenth cell
that executed in
this session
15
Jupyter’s Display System: Much More than Text
https://nbviewer.jupyter.org/github/ipython/ipython/bl
ob/master/examples/IPython%20Kernel/Custom%2
0Display%20Logic.ipynb
16
Profiling and Debugging
17
Magics
• Jupyter’s
standard
Python kernel
has over 90
built-in magic
commands
http://ipython.readthedocs.io/en/stable/interactive/magics.html
18
Extensions
• Many additional
extensions in the
iPython project’s
Github repository
– https://github.com/ip
ython-
contrib/jupyter_contri
b_nbextensions
19
PixieDust
https://developer.ibm.com/code/patterns/analyze-san-francisco-traffic-data-with-ibm-pixiedust-and-data-science-experience/
20
Brunel
https://developer.ibm.com/open/videos/brunel-visualization-update-tech-talk/
21
The Actual Architecture of Jupyter Notebooks
Notebook Server Process
22
The Actual Architecture of Jupyter Notebooks
JavaScript
NotebookManagement
Python Process
KernelManagement
iPythonKernel
Notebook
Server
State
KernelProxy
Shell
IOPub
stdin
control
heartbeat
Kernel
Session
State
UserCode
sklearn
Spark
Tensor
Flow
…
Local
Filesystem
23
The Five Stages of Enterprise Jupyter Deployment
1. Denial
24
The Five Stages of Enterprise Jupyter Deployment
1. Denial
2. Anger
Notebook Server Process
25
The Actual Architecture of Jupyter Notebooks
JavaScript
NotebookManagement
Python Process
KernelManagement
iPythonKernel
Notebook
Server
State
KernelProxy
Shell
IOPub
stdin
control
heartbeat
Kernel
Session
State
UserCode
sklearn
Spark
Tensor
Flow
…
Local
Filesystem
Notebook Server Process
26
The Actual Architecture of Jupyter Notebooks
JavaScript
NotebookManagement
Python Process
KernelManagement
iPythonKernel
Notebook
Server
State
KernelProxy
Shell
IOPub
stdin
control
heartbeat
Kernel
Session
State
UserCode
sklearn
Spark
Tensor
Flow
…
Local
Filesystem
Notebook Server Process
27
The Actual Architecture of Jupyter Notebooks
JavaScript
NotebookManagement
Python Process
KernelManagement
iPythonKernel
Notebook
Server
State
KernelProxy
Shell
IOPub
stdin
control
heartbeat
Kernel
Session
State
UserCode
sklearn
Spark
Tensor
Flow
…
Local
Filesystem
Notebook Server Process
28
The Actual Architecture of Jupyter Notebooks
JavaScript
NotebookManagement
Python Process
KernelManagement
iPythonKernel
Notebook
Server
State
KernelProxy
Shell
IOPub
stdin
control
heartbeat
Kernel
Session
State
UserCode
sklearn
Spark
Tensor
Flow
…
Local
Filesystem
Five ZeroMQ
message queues over
unencrypted TCP
sockets…
…per kernel
29
Third-Party Kernels
• The IPython kernel is
the most common…
• …but there is a long tail
of other Jupyter kernels
– 103 kernels currently
listed on the Jupyter
project’s wiki
Notebook Server Process
30
The Actual Architecture of Jupyter Notebooks
JavaScript
NotebookManagement
Python Process
KernelManagement
iPythonKernel
Notebook
Server
State
KernelProxy
Shell
IOPub
stdin
control
heartbeat
Kernel
Session
State
UserCode
sklearn
Spark
Tensor
Flow
…
Local
Filesystem
To share
notebooks among
users, need to
share notebook
server
Notebook Server Process
31
The Actual Architecture of Jupyter Notebooks
JavaScript
NotebookManagement
Python Process
KernelManagement
iPythonKernel
Notebook
Server
State
KernelProxy
Shell
IOPub
stdin
control
heartbeat
Kernel
Session
State
UserCode
sklearn
Spark
Tensor
Flow
…
Local
Filesystem
To use Apache
Spark™ on YARN,
need to be inside
the YARN cluster’s
network.
32
Jupyter in the Enterprise: Key Challenges
• Collaboration among multiple users
• Large-scale data analysis
– Shared cloud infrastructure like Kubernetes
– Parallel frameworks like Spark
• Security and authentication
• Auditing and data access control
Bringing these properties to the Jupyter stack is hard!
33
The Five Stages of Enterprise Jupyter Deployment
1. Denial
2. Anger
34
The Five Stages of Enterprise Jupyter Deployment
1. Denial
2. Anger
3. Bargaining
35
Bargaining
• Meeting all the enterprise requirements is expensive
• Compromise to bring down the cost
36
Compromise #1: Gigantic Server
• Find the biggest machine or container you can get
• Run the entire Jupyter stack on that one machine
• Issues:
– Machine needs to be sized for the maximum aggregate
memory of all active users’ active kernels
• Hard upper limit of 256GB-1TB in most organizations
• Very problematic if you have many users and big data
– Need to authenticate all these users to the same machine
and notebook server
37
Compromise #2: Notebook Server Per User
• Proxy server manages a pool of containers, one per active user
• Each container contains an entire Jupyter notebook stack
• JupyterHub project provides a pre-built implementation of this
approach
• Issues:
– Container needs to be big enough for all the user’s kernels
• What size container to allocate when the user logs in?
• Does a big enough container even exist?
– Disables collaboration features
– Many more moving parts  More failure modes
38
Compromise #3: Replace the Kernel
iPythonKernel
KernelProxy
Shell
IOPub
stdin
control
heartbeat
KernelProxy
Proxy
39
Compromise #3: Replace the Kernel
• Replace the IPython kernel with a proxy
• Put something enterprise-friendly on
the other side of the proxy
• Apache Livy implements this approach
– https://github.com/jupyter-
incubator/sparkmagic
• Issues:
– Breaks Jupyter’s magics and extensions
– Breaks data visualization libraries
– Breaks third-party kernels
– Less control over code execution
Shell
IOPub
stdin
control
heartbeat
RESTfulwebservice
40
The Five Stages of Enterprise Jupyter Deployment
1. Denial
2. Anger
3. Bargaining
41
The Five Stages of Enterprise Jupyter Deployment
1. Denial
2. Anger
3. Bargaining
4. Depression
42
The Five Stages of Enterprise Jupyter Deployment
1. Denial
2. Anger
3. Bargaining
4. Depression
5. Jupyter Enterprise Gateway
43
The Origins of Jupyter Enterprise Gateway
• Multiple IBM products embedding Spark on YARN
• All wanted to add Jupyter notebooks with Spark
• Usual enterprise requirements (multitenancy,
scalability, security, etc.)
• Had reached the “Bargaining” stage
– Mix of compromises 1, 2, and 3
YARN Cluster
Initial
Prototype
44
Security
Layer
YARN
Workers
YARN
Resource
Manager
Spark
ExecutorsSpark
ExecutorsSpark
Executors
Spark
ExecutorsSpark
ExecutorsSpark
Executors
Notebook Node
nb2kg
(Proxy)
nb2kg
Jupyter
Kernel
Gateway
Python
Kernel
Spark Driver
Python
Kernel
Spark Driver
Shell
IOPub
stdin
control
heartbeat
YARN Cluster
Initial
Prototype
45
Security
Layer
YARN
Workers
YARN
Resource
Manager
Spark
ExecutorsSpark
ExecutorsSpark
Executors
Spark
ExecutorsSpark
ExecutorsSpark
Executors
Notebook Node
nb2kg
(Proxy)
nb2kg
Jupyter
Kernel
Gateway
Python
Kernel
Spark Driver
Python
Kernel
Spark Driver
Shell
IOPub
stdin
control
heartbeat
Issue #2: All
Spark jobs
run as same
user ID
Issue #1: All kernels
and Spark drivers
run on a single node
Issue #1: All kernels run on a single node
8 8 8 8
0
10
20
30
40
50
60
70
80
4 Nodes 8 Nodes 12 Nodes 16 Nodes
MaxKernels(4GBHeap)
Cluster Size (32GB Nodes)
Maximum Number of Simultaneous Kernels
46
Jupyter Enterprise Gateway: Initial Goals
• Optimized Resource Allocation
– Run Spark in YARN Cluster Mode to better utilize cluster resources.
– Pluggable architecture for additional Resource Managers
• Multiuser support with user impersonation
– Enhance security and sandboxing by enabling user impersonation
when running kernels (using Kerberos).
– Individual HDFS home folder for each notebook user.
– Use the same user ID for notebook and batch jobs.
• Enhanced Security
– Secure socket communications
– Any network communication should be encrypted
47
YARN Cluster
Jupyter Enterprise Gateway
48
Security
Layer
YARN
Workers
Jupyter EnterpriseGateway
Multitenancy
Remote kernels and Kernel Lifecycle management
Spark Executors
Spark Executors
Spark Executors
Yarn Container
Jupyter Kernel
Spark Driver
Spark Executors
Spark Executors
Spark Executors
Yarn Container
Jupyter Kernel
Spark Driver
Spark Executors
Spark Executors
Spark Executors
Yarn Container
Jupyter Kernel
Spark Driver
Impersonation:
Alice’s kernel
runs under
Alice’s user ID.
Scalability Benefits
8 8 8 8
16
32
48
64
0
10
20
30
40
50
60
70
80
4 Nodes 8 Nodes 12 Nodes 16 Nodes
MaxKernels(4GBHeap)
Cluster Size (32GB Nodes)
Maximum Number of Simultaneous Kernels
Before JEG
After JEG
49
Jupyter Enterprise Gateway: Open Source
50
• Released through the
Jupyter Incubator
– BSD License
– https://github.com/jupyter-
incubator/enterprise_gatew
ay
– Current release: 0.7.0
Jupyter Enterprise Gateway: Supported Platforms
• Python/Spark 2.x using IPython kernel
– With Spark Context delayed initialization
• Scala 2.11/ Spark 2.x using Apache Toree kernel
– With Spark Context delayed initialization
• R / Spark 2.x with IRkernel
51
Jupyter Enterprise Gateway – Roadmap
• Add support for other resource managers
– Kubernetes support
• Kernel Configuration Profile
– Enable client to request different resource configuration for kernels (e.g. small,
medium, large)
– Profiles should be defined by Administrators and enabled for user/group of users.
• Administration UI
– Dashboard with running kernels and administration actions
• Time running, stop/kill, Profile Management, etc
• User Environments
• High Availability
52
Jupyter Enterprise Gateway
• Jupyter Enterprise Gateway at IBM Code
– https://developer.ibm.com/code/openprojects/jupyter-enterprise-gateway/
• Jupyter Enterprise Gateway source code at GitHub
– https://github.com/jupyter-incubator/enterprise_gateway
• Docker images
– https://github.com/jupyter-
incubator/enterprise_gateway/tree/master/etc/docker
• Jupyter Enterprise Gateway 0.7 release
– https://github.com/jupyter-incubator/enterprise_gateway/releases/tag/v0.7.0
• Jupyter Enterprise Gateway Documentation
– http://jupyter-enterprise-gateway.readthedocs.io/en/latest/
53
Free
IBM Data Science
trial
https://ibm.biz/BdZceR
54
Thank you!
And special thanks to the Jupyter
Enterprise Gateway team: Luciano Resende,
Kevin Bates, Kun Liu, Christian Kadner,
Sanjay Saxena, Alan Chin, Sherry Guo, Alex
Bozarth, Zee Chen
55
Backup
56
Building your own test
environment with
Jupyter Enterprise Gateway
Jupyter Enterprise Gateway: Deployment
57
Management
Node
Powered by
Ambari
EG
Compute Engine based on Apache Spark
Jupyter Enterprise Gateway: Deployment
• Ansible deployment scripts
– https://github.com/lresende/spark-cluster-install
• One click deployment of the Spark Cluster
– Configure your host inventory (see example on git repository)
– Run the ”setup-ambari.yml” playbook
• $ ansible-playbook --verbose setup-ambari.yml -i hosts-fyre-ambari -c paramiko
• One click deployment of the Jupyter Enterprise Engine
– Run the ”setup-enterprise-gateway.yml” playbook
• $ ansible-playbook --verbose setup-enterprise-gateway.yml -i hosts-fyre-ambari -c
paramiko
58
Jupyter Enterprise Gateway - Deployment
• Docker images
– yarn-spark: Basic one node Spark on Yarn configuration
– enterprise-gateway: Adds Anaconda and Jupyter Enterprise Gateway to the
yarn-spark image
– nb2kg: Minimal Jupyter Notebook client configured with hooks to access the
Enterprise Gateway
– https://github.com/jupyter-incubator/enterprise_gateway/tree/master/etc/docker
• Building the latest docker images
– git checkout https://github.com/jupyter-incubator/enterprise_gateway
– make docker-clean docker-images
– Note: Make also have individual targets to clean and build individual images
(type make for help)
59
Jupyter Enterprise Gateway - Deployment
• Connecting to a Spark Cluster using a docker image
docker run -t --rm 
-e KG_URL='http://<Enterprise Gateway IP>:8888' 
-p 8888:8888 
-e VALIDATE_KG_CERT='no' 
-e LOG_LEVEL=DEBUG 
-e KG_REQUEST_TIMEOUT=40 
-e KG_CONNECT_TIMEOUT=40 
-v ${HOME}/opensource/jupyter/jupyter-notebooks/:/tmp/notebooks 
-w /tmp/notebooks 
elyra/nb2kg:dev
60

More Related Content

What's hot

Flaky tests and bugs in Apache software (e.g. Hadoop)
Flaky tests and bugs in Apache software (e.g. Hadoop)Flaky tests and bugs in Apache software (e.g. Hadoop)
Flaky tests and bugs in Apache software (e.g. Hadoop)
Akihiro Suda
 
Finding and Organizing a Great Cloud Foundry User Group
Finding and Organizing a Great Cloud Foundry User GroupFinding and Organizing a Great Cloud Foundry User Group
Finding and Organizing a Great Cloud Foundry User Group
Daniel Krook
 
Puppet & Jenkins
Puppet & JenkinsPuppet & Jenkins
Puppet & Jenkins
Matthew Barr
 
Tackling non-determinism in Hadoop - Testing and debugging distributed system...
Tackling non-determinism in Hadoop - Testing and debugging distributed system...Tackling non-determinism in Hadoop - Testing and debugging distributed system...
Tackling non-determinism in Hadoop - Testing and debugging distributed system...
Akihiro Suda
 
Slack Bot: upload NUGET package to Artifactory
Slack Bot: upload NUGET package to ArtifactorySlack Bot: upload NUGET package to Artifactory
Slack Bot: upload NUGET package to Artifactory
Sergey Dzyuban
 
Continuous Deployment at Disqus (Pylons Minicon)
Continuous Deployment at Disqus (Pylons Minicon)Continuous Deployment at Disqus (Pylons Minicon)
Continuous Deployment at Disqus (Pylons Minicon)
zeeg
 
CPOSC2014: Next Generation Cloud -- Rise of the Unikernel
CPOSC2014: Next Generation Cloud -- Rise of the UnikernelCPOSC2014: Next Generation Cloud -- Rise of the Unikernel
CPOSC2014: Next Generation Cloud -- Rise of the Unikernel
The Linux Foundation
 
SCALE13x: Next Generation of the Cloud - Rise of the Unikernel
SCALE13x: Next Generation of the Cloud - Rise of the UnikernelSCALE13x: Next Generation of the Cloud - Rise of the Unikernel
SCALE13x: Next Generation of the Cloud - Rise of the Unikernel
The Linux Foundation
 
FusionInventory at LSM/RMLL 2012
FusionInventory at LSM/RMLL 2012FusionInventory at LSM/RMLL 2012
FusionInventory at LSM/RMLL 2012
Nouh Walid
 
Docker openstack-2014
Docker openstack-2014Docker openstack-2014
Docker openstack-2014
OpenCity Community
 
Training Ensimag OpenStack 2016
Training Ensimag OpenStack 2016Training Ensimag OpenStack 2016
Training Ensimag OpenStack 2016
Bruno Cornec
 
OaaS:Open as a Strategy
OaaS:Open as a StrategyOaaS:Open as a Strategy
OaaS:Open as a Strategy
OpenCity Community
 
Opening words at DockerCon Europe by Ben Golub
Opening words at DockerCon Europe by Ben Golub Opening words at DockerCon Europe by Ben Golub
Opening words at DockerCon Europe by Ben Golub
Docker, Inc.
 
Using Embedded Linux for Infrastructure Systems
Using Embedded Linux for Infrastructure SystemsUsing Embedded Linux for Infrastructure Systems
Using Embedded Linux for Infrastructure Systems
Yoshitake Kobayashi
 
Learn OpenStack from trystack.cn
Learn OpenStack from trystack.cnLearn OpenStack from trystack.cn
Learn OpenStack from trystack.cn
OpenCity Community
 
Eclipse e4
Eclipse e4Eclipse e4
Eclipse e4
Chris Aniszczyk
 
Platform for a Connected World
Platform for a Connected WorldPlatform for a Connected World
Platform for a Connected World
All Things Open
 
Containers & CaaS
Containers & CaaSContainers & CaaS
Containers & CaaS
OpenCity Community
 
Moby Open Source Summit North America 2017
Moby Open Source Summit North America 2017Moby Open Source Summit North America 2017
Moby Open Source Summit North America 2017
Patrick Chanezon
 
Neo4J with Docker and Azure - GraphConnect 2015
Neo4J with Docker and Azure - GraphConnect 2015Neo4J with Docker and Azure - GraphConnect 2015
Neo4J with Docker and Azure - GraphConnect 2015
Patrick Chanezon
 

What's hot (20)

Flaky tests and bugs in Apache software (e.g. Hadoop)
Flaky tests and bugs in Apache software (e.g. Hadoop)Flaky tests and bugs in Apache software (e.g. Hadoop)
Flaky tests and bugs in Apache software (e.g. Hadoop)
 
Finding and Organizing a Great Cloud Foundry User Group
Finding and Organizing a Great Cloud Foundry User GroupFinding and Organizing a Great Cloud Foundry User Group
Finding and Organizing a Great Cloud Foundry User Group
 
Puppet & Jenkins
Puppet & JenkinsPuppet & Jenkins
Puppet & Jenkins
 
Tackling non-determinism in Hadoop - Testing and debugging distributed system...
Tackling non-determinism in Hadoop - Testing and debugging distributed system...Tackling non-determinism in Hadoop - Testing and debugging distributed system...
Tackling non-determinism in Hadoop - Testing and debugging distributed system...
 
Slack Bot: upload NUGET package to Artifactory
Slack Bot: upload NUGET package to ArtifactorySlack Bot: upload NUGET package to Artifactory
Slack Bot: upload NUGET package to Artifactory
 
Continuous Deployment at Disqus (Pylons Minicon)
Continuous Deployment at Disqus (Pylons Minicon)Continuous Deployment at Disqus (Pylons Minicon)
Continuous Deployment at Disqus (Pylons Minicon)
 
CPOSC2014: Next Generation Cloud -- Rise of the Unikernel
CPOSC2014: Next Generation Cloud -- Rise of the UnikernelCPOSC2014: Next Generation Cloud -- Rise of the Unikernel
CPOSC2014: Next Generation Cloud -- Rise of the Unikernel
 
SCALE13x: Next Generation of the Cloud - Rise of the Unikernel
SCALE13x: Next Generation of the Cloud - Rise of the UnikernelSCALE13x: Next Generation of the Cloud - Rise of the Unikernel
SCALE13x: Next Generation of the Cloud - Rise of the Unikernel
 
FusionInventory at LSM/RMLL 2012
FusionInventory at LSM/RMLL 2012FusionInventory at LSM/RMLL 2012
FusionInventory at LSM/RMLL 2012
 
Docker openstack-2014
Docker openstack-2014Docker openstack-2014
Docker openstack-2014
 
Training Ensimag OpenStack 2016
Training Ensimag OpenStack 2016Training Ensimag OpenStack 2016
Training Ensimag OpenStack 2016
 
OaaS:Open as a Strategy
OaaS:Open as a StrategyOaaS:Open as a Strategy
OaaS:Open as a Strategy
 
Opening words at DockerCon Europe by Ben Golub
Opening words at DockerCon Europe by Ben Golub Opening words at DockerCon Europe by Ben Golub
Opening words at DockerCon Europe by Ben Golub
 
Using Embedded Linux for Infrastructure Systems
Using Embedded Linux for Infrastructure SystemsUsing Embedded Linux for Infrastructure Systems
Using Embedded Linux for Infrastructure Systems
 
Learn OpenStack from trystack.cn
Learn OpenStack from trystack.cnLearn OpenStack from trystack.cn
Learn OpenStack from trystack.cn
 
Eclipse e4
Eclipse e4Eclipse e4
Eclipse e4
 
Platform for a Connected World
Platform for a Connected WorldPlatform for a Connected World
Platform for a Connected World
 
Containers & CaaS
Containers & CaaSContainers & CaaS
Containers & CaaS
 
Moby Open Source Summit North America 2017
Moby Open Source Summit North America 2017Moby Open Source Summit North America 2017
Moby Open Source Summit North America 2017
 
Neo4J with Docker and Azure - GraphConnect 2015
Neo4J with Docker and Azure - GraphConnect 2015Neo4J with Docker and Azure - GraphConnect 2015
Neo4J with Docker and Azure - GraphConnect 2015
 

Similar to 2018 02 20-jeg_index

The Analytic Platform behind IBM’s Watson Data Platform - Big Data Spain 2017
The Analytic Platform behind IBM’s Watson Data Platform - Big Data Spain 2017The Analytic Platform behind IBM’s Watson Data Platform - Big Data Spain 2017
The Analytic Platform behind IBM’s Watson Data Platform - Big Data Spain 2017
Luciano Resende
 
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
 The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a... The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
Big Data Spain
 
ch4-Software is Everywhere
ch4-Software is Everywherech4-Software is Everywhere
ch4-Software is Everywhere
ssuser06ea42
 
Scaling notebooks for Deep Learning workloads
Scaling notebooks for Deep Learning workloadsScaling notebooks for Deep Learning workloads
Scaling notebooks for Deep Learning workloads
Luciano Resende
 
Jupyter con meetup extended jupyter kernel gateway
Jupyter con meetup   extended jupyter kernel gatewayJupyter con meetup   extended jupyter kernel gateway
Jupyter con meetup extended jupyter kernel gateway
Luciano Resende
 
Strata - Scaling Jupyter with Jupyter Enterprise Gateway
Strata - Scaling Jupyter with Jupyter Enterprise GatewayStrata - Scaling Jupyter with Jupyter Enterprise Gateway
Strata - Scaling Jupyter with Jupyter Enterprise Gateway
Luciano Resende
 
A Jupyter kernel for Scala and Apache Spark.pdf
A Jupyter kernel for Scala and Apache Spark.pdfA Jupyter kernel for Scala and Apache Spark.pdf
A Jupyter kernel for Scala and Apache Spark.pdf
Luciano Resende
 
Dd13.2013.milano.open ntf
Dd13.2013.milano.open ntfDd13.2013.milano.open ntf
Dd13.2013.milano.open ntf
Ulrich Krause
 
Introduction to Jupyter notebook and MS Azure Machine Learning Studio
Introduction to Jupyter notebook and MS Azure Machine Learning StudioIntroduction to Jupyter notebook and MS Azure Machine Learning Studio
Introduction to Jupyter notebook and MS Azure Machine Learning Studio
Muralidharan Deenathayalan
 
Introduction to Jupyter notebook and MS Azure Machine Learning Studio
Introduction to Jupyter notebook and MS Azure Machine Learning StudioIntroduction to Jupyter notebook and MS Azure Machine Learning Studio
Introduction to Jupyter notebook and MS Azure Machine Learning Studio
Muralidharan Deenathayalan
 
The Latest and Greatest from OpenNTF and the IBM Social Business Toolkit, #dd13
The Latest and Greatest from OpenNTF and the IBM Social Business Toolkit, #dd13The Latest and Greatest from OpenNTF and the IBM Social Business Toolkit, #dd13
The Latest and Greatest from OpenNTF and the IBM Social Business Toolkit, #dd13
Dominopoint - Italian Lotus User Group
 
Building analytical microservices powered by jupyter kernels
Building analytical microservices powered by jupyter kernelsBuilding analytical microservices powered by jupyter kernels
Building analytical microservices powered by jupyter kernels
Luciano Resende
 
Big analytics meetup - Extended Jupyter Kernel Gateway
Big analytics meetup - Extended Jupyter Kernel GatewayBig analytics meetup - Extended Jupyter Kernel Gateway
Big analytics meetup - Extended Jupyter Kernel Gateway
Luciano Resende
 
Using Elyra for COVID-19 Analytics
Using Elyra for COVID-19 AnalyticsUsing Elyra for COVID-19 Analytics
Using Elyra for COVID-19 Analytics
Luciano Resende
 
1_International_Google_CoLab_20220307.pptx
1_International_Google_CoLab_20220307.pptx1_International_Google_CoLab_20220307.pptx
1_International_Google_CoLab_20220307.pptx
FEG
 
Community works for muli core embedded image processing
Community works for muli core embedded image processingCommunity works for muli core embedded image processing
Community works for muli core embedded image processing
Jeongpyo Kong
 
Luciano Resende - Scaling Big Data Interactive Workloads across Kubernetes Cl...
Luciano Resende - Scaling Big Data Interactive Workloads across Kubernetes Cl...Luciano Resende - Scaling Big Data Interactive Workloads across Kubernetes Cl...
Luciano Resende - Scaling Big Data Interactive Workloads across Kubernetes Cl...
Codemotion
 
Leveraging docker for hadoop build automation and big data stack provisioning
Leveraging docker for hadoop build automation and big data stack provisioningLeveraging docker for hadoop build automation and big data stack provisioning
Leveraging docker for hadoop build automation and big data stack provisioning
Evans Ye
 
Leveraging Docker for Hadoop build automation and Big Data stack provisioning
Leveraging Docker for Hadoop build automation and Big Data stack provisioningLeveraging Docker for Hadoop build automation and Big Data stack provisioning
Leveraging Docker for Hadoop build automation and Big Data stack provisioning
DataWorks Summit
 
Iteratively introducing Puppet technologies in the brownfield; Jeffrey Miller
Iteratively introducing Puppet technologies in the brownfield; Jeffrey MillerIteratively introducing Puppet technologies in the brownfield; Jeffrey Miller
Iteratively introducing Puppet technologies in the brownfield; Jeffrey Miller
Puppet
 

Similar to 2018 02 20-jeg_index (20)

The Analytic Platform behind IBM’s Watson Data Platform - Big Data Spain 2017
The Analytic Platform behind IBM’s Watson Data Platform - Big Data Spain 2017The Analytic Platform behind IBM’s Watson Data Platform - Big Data Spain 2017
The Analytic Platform behind IBM’s Watson Data Platform - Big Data Spain 2017
 
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
 The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a... The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
 
ch4-Software is Everywhere
ch4-Software is Everywherech4-Software is Everywhere
ch4-Software is Everywhere
 
Scaling notebooks for Deep Learning workloads
Scaling notebooks for Deep Learning workloadsScaling notebooks for Deep Learning workloads
Scaling notebooks for Deep Learning workloads
 
Jupyter con meetup extended jupyter kernel gateway
Jupyter con meetup   extended jupyter kernel gatewayJupyter con meetup   extended jupyter kernel gateway
Jupyter con meetup extended jupyter kernel gateway
 
Strata - Scaling Jupyter with Jupyter Enterprise Gateway
Strata - Scaling Jupyter with Jupyter Enterprise GatewayStrata - Scaling Jupyter with Jupyter Enterprise Gateway
Strata - Scaling Jupyter with Jupyter Enterprise Gateway
 
A Jupyter kernel for Scala and Apache Spark.pdf
A Jupyter kernel for Scala and Apache Spark.pdfA Jupyter kernel for Scala and Apache Spark.pdf
A Jupyter kernel for Scala and Apache Spark.pdf
 
Dd13.2013.milano.open ntf
Dd13.2013.milano.open ntfDd13.2013.milano.open ntf
Dd13.2013.milano.open ntf
 
Introduction to Jupyter notebook and MS Azure Machine Learning Studio
Introduction to Jupyter notebook and MS Azure Machine Learning StudioIntroduction to Jupyter notebook and MS Azure Machine Learning Studio
Introduction to Jupyter notebook and MS Azure Machine Learning Studio
 
Introduction to Jupyter notebook and MS Azure Machine Learning Studio
Introduction to Jupyter notebook and MS Azure Machine Learning StudioIntroduction to Jupyter notebook and MS Azure Machine Learning Studio
Introduction to Jupyter notebook and MS Azure Machine Learning Studio
 
The Latest and Greatest from OpenNTF and the IBM Social Business Toolkit, #dd13
The Latest and Greatest from OpenNTF and the IBM Social Business Toolkit, #dd13The Latest and Greatest from OpenNTF and the IBM Social Business Toolkit, #dd13
The Latest and Greatest from OpenNTF and the IBM Social Business Toolkit, #dd13
 
Building analytical microservices powered by jupyter kernels
Building analytical microservices powered by jupyter kernelsBuilding analytical microservices powered by jupyter kernels
Building analytical microservices powered by jupyter kernels
 
Big analytics meetup - Extended Jupyter Kernel Gateway
Big analytics meetup - Extended Jupyter Kernel GatewayBig analytics meetup - Extended Jupyter Kernel Gateway
Big analytics meetup - Extended Jupyter Kernel Gateway
 
Using Elyra for COVID-19 Analytics
Using Elyra for COVID-19 AnalyticsUsing Elyra for COVID-19 Analytics
Using Elyra for COVID-19 Analytics
 
1_International_Google_CoLab_20220307.pptx
1_International_Google_CoLab_20220307.pptx1_International_Google_CoLab_20220307.pptx
1_International_Google_CoLab_20220307.pptx
 
Community works for muli core embedded image processing
Community works for muli core embedded image processingCommunity works for muli core embedded image processing
Community works for muli core embedded image processing
 
Luciano Resende - Scaling Big Data Interactive Workloads across Kubernetes Cl...
Luciano Resende - Scaling Big Data Interactive Workloads across Kubernetes Cl...Luciano Resende - Scaling Big Data Interactive Workloads across Kubernetes Cl...
Luciano Resende - Scaling Big Data Interactive Workloads across Kubernetes Cl...
 
Leveraging docker for hadoop build automation and big data stack provisioning
Leveraging docker for hadoop build automation and big data stack provisioningLeveraging docker for hadoop build automation and big data stack provisioning
Leveraging docker for hadoop build automation and big data stack provisioning
 
Leveraging Docker for Hadoop build automation and Big Data stack provisioning
Leveraging Docker for Hadoop build automation and Big Data stack provisioningLeveraging Docker for Hadoop build automation and Big Data stack provisioning
Leveraging Docker for Hadoop build automation and Big Data stack provisioning
 
Iteratively introducing Puppet technologies in the brownfield; Jeffrey Miller
Iteratively introducing Puppet technologies in the brownfield; Jeffrey MillerIteratively introducing Puppet technologies in the brownfield; Jeffrey Miller
Iteratively introducing Puppet technologies in the brownfield; Jeffrey Miller
 

More from Chester Chen

SFBigAnalytics_SparkRapid_20220622.pdf
SFBigAnalytics_SparkRapid_20220622.pdfSFBigAnalytics_SparkRapid_20220622.pdf
SFBigAnalytics_SparkRapid_20220622.pdf
Chester Chen
 
zookeeer+raft-2.pdf
zookeeer+raft-2.pdfzookeeer+raft-2.pdf
zookeeer+raft-2.pdf
Chester Chen
 
SF Big Analytics 2022-03-15: Persia: Scaling DL Based Recommenders up to 100 ...
SF Big Analytics 2022-03-15: Persia: Scaling DL Based Recommenders up to 100 ...SF Big Analytics 2022-03-15: Persia: Scaling DL Based Recommenders up to 100 ...
SF Big Analytics 2022-03-15: Persia: Scaling DL Based Recommenders up to 100 ...
Chester Chen
 
SF Big Analytics talk: NVIDIA FLARE: Federated Learning Application Runtime E...
SF Big Analytics talk: NVIDIA FLARE: Federated Learning Application Runtime E...SF Big Analytics talk: NVIDIA FLARE: Federated Learning Application Runtime E...
SF Big Analytics talk: NVIDIA FLARE: Federated Learning Application Runtime E...
Chester Chen
 
A missing link in the ML infrastructure stack?
A missing link in the ML infrastructure stack?A missing link in the ML infrastructure stack?
A missing link in the ML infrastructure stack?
Chester Chen
 
Shopify datadiscoverysf bigdata
Shopify datadiscoverysf bigdataShopify datadiscoverysf bigdata
Shopify datadiscoverysf bigdata
Chester Chen
 
SF Big Analytics 20191112: How to performance-tune Spark applications in larg...
SF Big Analytics 20191112: How to performance-tune Spark applications in larg...SF Big Analytics 20191112: How to performance-tune Spark applications in larg...
SF Big Analytics 20191112: How to performance-tune Spark applications in larg...
Chester Chen
 
SF Big Analytics 2019112: Uncovering performance regressions in the TCP SACK...
 SF Big Analytics 2019112: Uncovering performance regressions in the TCP SACK... SF Big Analytics 2019112: Uncovering performance regressions in the TCP SACK...
SF Big Analytics 2019112: Uncovering performance regressions in the TCP SACK...
Chester Chen
 
SFBigAnalytics_20190724: Monitor kafka like a Pro
SFBigAnalytics_20190724: Monitor kafka like a ProSFBigAnalytics_20190724: Monitor kafka like a Pro
SFBigAnalytics_20190724: Monitor kafka like a Pro
Chester Chen
 
SF Big Analytics 2019-06-12: Managing uber's data workflows at scale
SF Big Analytics 2019-06-12: Managing uber's data workflows at scaleSF Big Analytics 2019-06-12: Managing uber's data workflows at scale
SF Big Analytics 2019-06-12: Managing uber's data workflows at scale
Chester Chen
 
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
Chester Chen
 
SF Big Analytics_20190612: Scaling Apache Spark on Kubernetes at Lyft
SF Big Analytics_20190612: Scaling Apache Spark on Kubernetes at LyftSF Big Analytics_20190612: Scaling Apache Spark on Kubernetes at Lyft
SF Big Analytics_20190612: Scaling Apache Spark on Kubernetes at Lyft
Chester Chen
 
SFBigAnalytics- hybrid data management using cdap
SFBigAnalytics- hybrid data management using cdapSFBigAnalytics- hybrid data management using cdap
SFBigAnalytics- hybrid data management using cdap
Chester Chen
 
Sf big analytics: bighead
Sf big analytics: bigheadSf big analytics: bighead
Sf big analytics: bighead
Chester Chen
 
Sf big analytics_2018_04_18: Evolution of the GoPro's data platform
Sf big analytics_2018_04_18: Evolution of the GoPro's data platformSf big analytics_2018_04_18: Evolution of the GoPro's data platform
Sf big analytics_2018_04_18: Evolution of the GoPro's data platform
Chester Chen
 
Analytics Metrics delivery and ML Feature visualization: Evolution of Data Pl...
Analytics Metrics delivery and ML Feature visualization: Evolution of Data Pl...Analytics Metrics delivery and ML Feature visualization: Evolution of Data Pl...
Analytics Metrics delivery and ML Feature visualization: Evolution of Data Pl...
Chester Chen
 
2018 data warehouse features in spark
2018   data warehouse features in spark2018   data warehouse features in spark
2018 data warehouse features in spark
Chester Chen
 
2018 02-08-what's-new-in-apache-spark-2.3
2018 02-08-what's-new-in-apache-spark-2.3 2018 02-08-what's-new-in-apache-spark-2.3
2018 02-08-what's-new-in-apache-spark-2.3
Chester Chen
 
Index conf sparkml-feb20-n-pentreath
Index conf sparkml-feb20-n-pentreathIndex conf sparkml-feb20-n-pentreath
Index conf sparkml-feb20-n-pentreath
Chester Chen
 
Index conf sparkai-feb20-n-pentreath
Index conf sparkai-feb20-n-pentreathIndex conf sparkai-feb20-n-pentreath
Index conf sparkai-feb20-n-pentreath
Chester Chen
 

More from Chester Chen (20)

SFBigAnalytics_SparkRapid_20220622.pdf
SFBigAnalytics_SparkRapid_20220622.pdfSFBigAnalytics_SparkRapid_20220622.pdf
SFBigAnalytics_SparkRapid_20220622.pdf
 
zookeeer+raft-2.pdf
zookeeer+raft-2.pdfzookeeer+raft-2.pdf
zookeeer+raft-2.pdf
 
SF Big Analytics 2022-03-15: Persia: Scaling DL Based Recommenders up to 100 ...
SF Big Analytics 2022-03-15: Persia: Scaling DL Based Recommenders up to 100 ...SF Big Analytics 2022-03-15: Persia: Scaling DL Based Recommenders up to 100 ...
SF Big Analytics 2022-03-15: Persia: Scaling DL Based Recommenders up to 100 ...
 
SF Big Analytics talk: NVIDIA FLARE: Federated Learning Application Runtime E...
SF Big Analytics talk: NVIDIA FLARE: Federated Learning Application Runtime E...SF Big Analytics talk: NVIDIA FLARE: Federated Learning Application Runtime E...
SF Big Analytics talk: NVIDIA FLARE: Federated Learning Application Runtime E...
 
A missing link in the ML infrastructure stack?
A missing link in the ML infrastructure stack?A missing link in the ML infrastructure stack?
A missing link in the ML infrastructure stack?
 
Shopify datadiscoverysf bigdata
Shopify datadiscoverysf bigdataShopify datadiscoverysf bigdata
Shopify datadiscoverysf bigdata
 
SF Big Analytics 20191112: How to performance-tune Spark applications in larg...
SF Big Analytics 20191112: How to performance-tune Spark applications in larg...SF Big Analytics 20191112: How to performance-tune Spark applications in larg...
SF Big Analytics 20191112: How to performance-tune Spark applications in larg...
 
SF Big Analytics 2019112: Uncovering performance regressions in the TCP SACK...
 SF Big Analytics 2019112: Uncovering performance regressions in the TCP SACK... SF Big Analytics 2019112: Uncovering performance regressions in the TCP SACK...
SF Big Analytics 2019112: Uncovering performance regressions in the TCP SACK...
 
SFBigAnalytics_20190724: Monitor kafka like a Pro
SFBigAnalytics_20190724: Monitor kafka like a ProSFBigAnalytics_20190724: Monitor kafka like a Pro
SFBigAnalytics_20190724: Monitor kafka like a Pro
 
SF Big Analytics 2019-06-12: Managing uber's data workflows at scale
SF Big Analytics 2019-06-12: Managing uber's data workflows at scaleSF Big Analytics 2019-06-12: Managing uber's data workflows at scale
SF Big Analytics 2019-06-12: Managing uber's data workflows at scale
 
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
 
SF Big Analytics_20190612: Scaling Apache Spark on Kubernetes at Lyft
SF Big Analytics_20190612: Scaling Apache Spark on Kubernetes at LyftSF Big Analytics_20190612: Scaling Apache Spark on Kubernetes at Lyft
SF Big Analytics_20190612: Scaling Apache Spark on Kubernetes at Lyft
 
SFBigAnalytics- hybrid data management using cdap
SFBigAnalytics- hybrid data management using cdapSFBigAnalytics- hybrid data management using cdap
SFBigAnalytics- hybrid data management using cdap
 
Sf big analytics: bighead
Sf big analytics: bigheadSf big analytics: bighead
Sf big analytics: bighead
 
Sf big analytics_2018_04_18: Evolution of the GoPro's data platform
Sf big analytics_2018_04_18: Evolution of the GoPro's data platformSf big analytics_2018_04_18: Evolution of the GoPro's data platform
Sf big analytics_2018_04_18: Evolution of the GoPro's data platform
 
Analytics Metrics delivery and ML Feature visualization: Evolution of Data Pl...
Analytics Metrics delivery and ML Feature visualization: Evolution of Data Pl...Analytics Metrics delivery and ML Feature visualization: Evolution of Data Pl...
Analytics Metrics delivery and ML Feature visualization: Evolution of Data Pl...
 
2018 data warehouse features in spark
2018   data warehouse features in spark2018   data warehouse features in spark
2018 data warehouse features in spark
 
2018 02-08-what's-new-in-apache-spark-2.3
2018 02-08-what's-new-in-apache-spark-2.3 2018 02-08-what's-new-in-apache-spark-2.3
2018 02-08-what's-new-in-apache-spark-2.3
 
Index conf sparkml-feb20-n-pentreath
Index conf sparkml-feb20-n-pentreathIndex conf sparkml-feb20-n-pentreath
Index conf sparkml-feb20-n-pentreath
 
Index conf sparkai-feb20-n-pentreath
Index conf sparkai-feb20-n-pentreathIndex conf sparkai-feb20-n-pentreath
Index conf sparkai-feb20-n-pentreath
 

Recently uploaded

The Rise of Python in Finance,Automating Trading Strategies: _.pdf
The Rise of Python in Finance,Automating Trading Strategies: _.pdfThe Rise of Python in Finance,Automating Trading Strategies: _.pdf
The Rise of Python in Finance,Automating Trading Strategies: _.pdf
Riya Sen
 
Training on CSPro and step by steps.pptx
Training on CSPro and step by steps.pptxTraining on CSPro and step by steps.pptx
Training on CSPro and step by steps.pptx
lenjisoHussein
 
Cal Girls Hotel Safari Jaipur | | Girls Call Free Drop Service
Cal Girls Hotel Safari Jaipur | | Girls Call Free Drop ServiceCal Girls Hotel Safari Jaipur | | Girls Call Free Drop Service
Cal Girls Hotel Safari Jaipur | | Girls Call Free Drop Service
Deepikakumari457585
 
Unit 1 Introduction to DATA SCIENCE .pptx
Unit 1 Introduction to DATA SCIENCE .pptxUnit 1 Introduction to DATA SCIENCE .pptx
Unit 1 Introduction to DATA SCIENCE .pptx
Priyanka Jadhav
 
Cyber Insurance Mathematical Model & Pricing 2
Cyber Insurance Mathematical Model & Pricing 2Cyber Insurance Mathematical Model & Pricing 2
Cyber Insurance Mathematical Model & Pricing 2
BaraDaniel1
 
PRODUCT | RESEARCH-PRESENTATION-1.1.pptx
PRODUCT | RESEARCH-PRESENTATION-1.1.pptxPRODUCT | RESEARCH-PRESENTATION-1.1.pptx
PRODUCT | RESEARCH-PRESENTATION-1.1.pptx
amazenolmedojeruel
 
SFBA Splunk Usergroup meeting July 17, 2024
SFBA Splunk Usergroup meeting July 17, 2024SFBA Splunk Usergroup meeting July 17, 2024
SFBA Splunk Usergroup meeting July 17, 2024
Becky Burwell
 
Practical Research for grade 12 students
Practical Research for grade 12 studentsPractical Research for grade 12 students
Practical Research for grade 12 students
juliaaaaana10
 
Scaling_GeoServer_in_the_cloud__clustering_state_of_the_art.pdf
Scaling_GeoServer_in_the_cloud__clustering_state_of_the_art.pdfScaling_GeoServer_in_the_cloud__clustering_state_of_the_art.pdf
Scaling_GeoServer_in_the_cloud__clustering_state_of_the_art.pdf
NebojsaKuduz1
 
FINAL PROJECT WORK PORTFOLIO MANAGEMENT (2) hhh (1) (2) (5) (1) (1).pdf
FINAL PROJECT WORK PORTFOLIO MANAGEMENT (2)  hhh (1) (2) (5) (1) (1).pdfFINAL PROJECT WORK PORTFOLIO MANAGEMENT (2)  hhh (1) (2) (5) (1) (1).pdf
FINAL PROJECT WORK PORTFOLIO MANAGEMENT (2) hhh (1) (2) (5) (1) (1).pdf
bala krishna
 
Where to order Frederick Community College diploma?
Where to order Frederick Community College diploma?Where to order Frederick Community College diploma?
Where to order Frederick Community College diploma?
SomalyEng
 
Parcel Delivery - Intel Segmentation and Last Mile Opt.pptx
Parcel Delivery - Intel Segmentation and Last Mile Opt.pptxParcel Delivery - Intel Segmentation and Last Mile Opt.pptx
Parcel Delivery - Intel Segmentation and Last Mile Opt.pptx
AltanAtabarut
 
UNITEC Institute of Technology diploma
UNITEC Institute of Technology diplomaUNITEC Institute of Technology diploma
UNITEC Institute of Technology diploma
oyhka
 
Snowflake Program Acceleration Guide.pdf
Snowflake Program Acceleration Guide.pdfSnowflake Program Acceleration Guide.pdf
Snowflake Program Acceleration Guide.pdf
emailkittu1
 
Field Diary and lab record, Importance.pdf
Field Diary and lab record, Importance.pdfField Diary and lab record, Importance.pdf
Field Diary and lab record, Importance.pdf
hritikbui
 
Histology of Muscle types histology o.ppt
Histology of Muscle types histology o.pptHistology of Muscle types histology o.ppt
Histology of Muscle types histology o.ppt
SamanArshad11
 
Cyber Insurance Mathematical Model & Pricing
Cyber Insurance Mathematical Model & PricingCyber Insurance Mathematical Model & Pricing
Cyber Insurance Mathematical Model & Pricing
BaraDaniel1
 
future-of-asset-management-future-of-asset-management
future-of-asset-management-future-of-asset-managementfuture-of-asset-management-future-of-asset-management
future-of-asset-management-future-of-asset-management
Aadee4
 
Why_are_we_hypnotizing_ourselves-_ATeggin-1.pdf
Why_are_we_hypnotizing_ourselves-_ATeggin-1.pdfWhy_are_we_hypnotizing_ourselves-_ATeggin-1.pdf
Why_are_we_hypnotizing_ourselves-_ATeggin-1.pdf
Alexander Teggin
 
Acid Base Practice Test 4- KEY.pdfkkjkjk
Acid Base Practice Test 4- KEY.pdfkkjkjkAcid Base Practice Test 4- KEY.pdfkkjkjk
Acid Base Practice Test 4- KEY.pdfkkjkjk
talha2khan2k
 

Recently uploaded (20)

The Rise of Python in Finance,Automating Trading Strategies: _.pdf
The Rise of Python in Finance,Automating Trading Strategies: _.pdfThe Rise of Python in Finance,Automating Trading Strategies: _.pdf
The Rise of Python in Finance,Automating Trading Strategies: _.pdf
 
Training on CSPro and step by steps.pptx
Training on CSPro and step by steps.pptxTraining on CSPro and step by steps.pptx
Training on CSPro and step by steps.pptx
 
Cal Girls Hotel Safari Jaipur | | Girls Call Free Drop Service
Cal Girls Hotel Safari Jaipur | | Girls Call Free Drop ServiceCal Girls Hotel Safari Jaipur | | Girls Call Free Drop Service
Cal Girls Hotel Safari Jaipur | | Girls Call Free Drop Service
 
Unit 1 Introduction to DATA SCIENCE .pptx
Unit 1 Introduction to DATA SCIENCE .pptxUnit 1 Introduction to DATA SCIENCE .pptx
Unit 1 Introduction to DATA SCIENCE .pptx
 
Cyber Insurance Mathematical Model & Pricing 2
Cyber Insurance Mathematical Model & Pricing 2Cyber Insurance Mathematical Model & Pricing 2
Cyber Insurance Mathematical Model & Pricing 2
 
PRODUCT | RESEARCH-PRESENTATION-1.1.pptx
PRODUCT | RESEARCH-PRESENTATION-1.1.pptxPRODUCT | RESEARCH-PRESENTATION-1.1.pptx
PRODUCT | RESEARCH-PRESENTATION-1.1.pptx
 
SFBA Splunk Usergroup meeting July 17, 2024
SFBA Splunk Usergroup meeting July 17, 2024SFBA Splunk Usergroup meeting July 17, 2024
SFBA Splunk Usergroup meeting July 17, 2024
 
Practical Research for grade 12 students
Practical Research for grade 12 studentsPractical Research for grade 12 students
Practical Research for grade 12 students
 
Scaling_GeoServer_in_the_cloud__clustering_state_of_the_art.pdf
Scaling_GeoServer_in_the_cloud__clustering_state_of_the_art.pdfScaling_GeoServer_in_the_cloud__clustering_state_of_the_art.pdf
Scaling_GeoServer_in_the_cloud__clustering_state_of_the_art.pdf
 
FINAL PROJECT WORK PORTFOLIO MANAGEMENT (2) hhh (1) (2) (5) (1) (1).pdf
FINAL PROJECT WORK PORTFOLIO MANAGEMENT (2)  hhh (1) (2) (5) (1) (1).pdfFINAL PROJECT WORK PORTFOLIO MANAGEMENT (2)  hhh (1) (2) (5) (1) (1).pdf
FINAL PROJECT WORK PORTFOLIO MANAGEMENT (2) hhh (1) (2) (5) (1) (1).pdf
 
Where to order Frederick Community College diploma?
Where to order Frederick Community College diploma?Where to order Frederick Community College diploma?
Where to order Frederick Community College diploma?
 
Parcel Delivery - Intel Segmentation and Last Mile Opt.pptx
Parcel Delivery - Intel Segmentation and Last Mile Opt.pptxParcel Delivery - Intel Segmentation and Last Mile Opt.pptx
Parcel Delivery - Intel Segmentation and Last Mile Opt.pptx
 
UNITEC Institute of Technology diploma
UNITEC Institute of Technology diplomaUNITEC Institute of Technology diploma
UNITEC Institute of Technology diploma
 
Snowflake Program Acceleration Guide.pdf
Snowflake Program Acceleration Guide.pdfSnowflake Program Acceleration Guide.pdf
Snowflake Program Acceleration Guide.pdf
 
Field Diary and lab record, Importance.pdf
Field Diary and lab record, Importance.pdfField Diary and lab record, Importance.pdf
Field Diary and lab record, Importance.pdf
 
Histology of Muscle types histology o.ppt
Histology of Muscle types histology o.pptHistology of Muscle types histology o.ppt
Histology of Muscle types histology o.ppt
 
Cyber Insurance Mathematical Model & Pricing
Cyber Insurance Mathematical Model & PricingCyber Insurance Mathematical Model & Pricing
Cyber Insurance Mathematical Model & Pricing
 
future-of-asset-management-future-of-asset-management
future-of-asset-management-future-of-asset-managementfuture-of-asset-management-future-of-asset-management
future-of-asset-management-future-of-asset-management
 
Why_are_we_hypnotizing_ourselves-_ATeggin-1.pdf
Why_are_we_hypnotizing_ourselves-_ATeggin-1.pdfWhy_are_we_hypnotizing_ourselves-_ATeggin-1.pdf
Why_are_we_hypnotizing_ourselves-_ATeggin-1.pdf
 
Acid Base Practice Test 4- KEY.pdfkkjkjk
Acid Base Practice Test 4- KEY.pdfkkjkjkAcid Base Practice Test 4- KEY.pdfkkjkjk
Acid Base Practice Test 4- KEY.pdfkkjkjk
 

2018 02 20-jeg_index

  • 1. Building an Enterprise/Cloud Analytics Platform with Jupyter Notebooks and Apache Spark Fred Reiss Chief Architect, IBM Spark Technology Center
  • 2. 2 Hi! Fred Reiss • 2014-present: Chief Architect, IBM Spark Technology Center. • 2006-2014: Worked for IBM Research. • 2006: Ph.D. from U.C. Berkeley.
  • 3. 3 The Jupyter Project • Open Source project that builds software to enable interactive notebooks for data science – Started in 2014 – Grew out of the IPython project
  • 4. 4 What is IPython? https://upload.wikimedia.org/wikipedia/commons/4/47/IPython-shell.png By Shishirdasika (Own work) [CC BY-SA 3.0 (https://creativecommons.org/licenses/by-sa/3.0)], via Wikimedia Commons Interactive console for Python Can open a window to display graphics
  • 5. 5 IPython Notebooks https://upload.wikimedia.org/wikipedia/commons/a/af/IPython-notebook.png By Shishirdasika (Own work) [CC BY-SA 3.0 (https://creativecommons.org/licenses/by-sa/3.0)], via Wikimedia Commons Text and graphics in the same browser window
  • 7. 7 Jupyter Notebooks • Jupyter notebooks are widely used by data scientists, social scientists, physical scientists, engineers, and others • Useful for many tasks – Analyzing data – Developing and debugging software – Running experiments – Keeping track of experimental results – Presenting results • Jupyter is a central part of the IBM Data Science Experience (http://datascience.ibm.com)
  • 8. 8 Jupyter in the Enterprise: Key Challenges • Collaboration among multiple users • Large-scale data analysis (problems that don’t fit in a laptop) – Shared cloud infrastructure like Kubernetes – Parallel frameworks like Spark • Security and authentication • Auditing and data access control
  • 9. 9 Isn’t this just shipping strings around? JavaScript “1+1” Server “1+1” Python Process “1+1” “2”“2”“2”
  • 10. 10 Isn’t this just shipping strings around? JavaScript “1+1” FancyNewSystem “1+1” Python Process “1+1” “2”“2”“2” Security Multitenancy Authentication Spark Kubernetes
  • 11. 11 The Five Stages of Enterprise Jupyter Deployment
  • 12. 12 The Five Stages of Enterprise Jupyter Deployment 1. Denial
  • 13. 13 Jupyter does more than just pass strings around. • Quite a bit more!
  • 14. 14 Asynchronous Operations • Queue up multiple cells for execution – …in arbitrary order • Stream output while a cell is running • Interrupt any operation Fifteenth cell that executed in this session
  • 15. 15 Jupyter’s Display System: Much More than Text https://nbviewer.jupyter.org/github/ipython/ipython/bl ob/master/examples/IPython%20Kernel/Custom%2 0Display%20Logic.ipynb
  • 17. 17 Magics • Jupyter’s standard Python kernel has over 90 built-in magic commands http://ipython.readthedocs.io/en/stable/interactive/magics.html
  • 18. 18 Extensions • Many additional extensions in the iPython project’s Github repository – https://github.com/ip ython- contrib/jupyter_contri b_nbextensions
  • 21. 21 The Actual Architecture of Jupyter Notebooks
  • 22. Notebook Server Process 22 The Actual Architecture of Jupyter Notebooks JavaScript NotebookManagement Python Process KernelManagement iPythonKernel Notebook Server State KernelProxy Shell IOPub stdin control heartbeat Kernel Session State UserCode sklearn Spark Tensor Flow … Local Filesystem
  • 23. 23 The Five Stages of Enterprise Jupyter Deployment 1. Denial
  • 24. 24 The Five Stages of Enterprise Jupyter Deployment 1. Denial 2. Anger
  • 25. Notebook Server Process 25 The Actual Architecture of Jupyter Notebooks JavaScript NotebookManagement Python Process KernelManagement iPythonKernel Notebook Server State KernelProxy Shell IOPub stdin control heartbeat Kernel Session State UserCode sklearn Spark Tensor Flow … Local Filesystem
  • 26. Notebook Server Process 26 The Actual Architecture of Jupyter Notebooks JavaScript NotebookManagement Python Process KernelManagement iPythonKernel Notebook Server State KernelProxy Shell IOPub stdin control heartbeat Kernel Session State UserCode sklearn Spark Tensor Flow … Local Filesystem
  • 27. Notebook Server Process 27 The Actual Architecture of Jupyter Notebooks JavaScript NotebookManagement Python Process KernelManagement iPythonKernel Notebook Server State KernelProxy Shell IOPub stdin control heartbeat Kernel Session State UserCode sklearn Spark Tensor Flow … Local Filesystem
  • 28. Notebook Server Process 28 The Actual Architecture of Jupyter Notebooks JavaScript NotebookManagement Python Process KernelManagement iPythonKernel Notebook Server State KernelProxy Shell IOPub stdin control heartbeat Kernel Session State UserCode sklearn Spark Tensor Flow … Local Filesystem Five ZeroMQ message queues over unencrypted TCP sockets… …per kernel
  • 29. 29 Third-Party Kernels • The IPython kernel is the most common… • …but there is a long tail of other Jupyter kernels – 103 kernels currently listed on the Jupyter project’s wiki
  • 30. Notebook Server Process 30 The Actual Architecture of Jupyter Notebooks JavaScript NotebookManagement Python Process KernelManagement iPythonKernel Notebook Server State KernelProxy Shell IOPub stdin control heartbeat Kernel Session State UserCode sklearn Spark Tensor Flow … Local Filesystem To share notebooks among users, need to share notebook server
  • 31. Notebook Server Process 31 The Actual Architecture of Jupyter Notebooks JavaScript NotebookManagement Python Process KernelManagement iPythonKernel Notebook Server State KernelProxy Shell IOPub stdin control heartbeat Kernel Session State UserCode sklearn Spark Tensor Flow … Local Filesystem To use Apache Spark™ on YARN, need to be inside the YARN cluster’s network.
  • 32. 32 Jupyter in the Enterprise: Key Challenges • Collaboration among multiple users • Large-scale data analysis – Shared cloud infrastructure like Kubernetes – Parallel frameworks like Spark • Security and authentication • Auditing and data access control Bringing these properties to the Jupyter stack is hard!
  • 33. 33 The Five Stages of Enterprise Jupyter Deployment 1. Denial 2. Anger
  • 34. 34 The Five Stages of Enterprise Jupyter Deployment 1. Denial 2. Anger 3. Bargaining
  • 35. 35 Bargaining • Meeting all the enterprise requirements is expensive • Compromise to bring down the cost
  • 36. 36 Compromise #1: Gigantic Server • Find the biggest machine or container you can get • Run the entire Jupyter stack on that one machine • Issues: – Machine needs to be sized for the maximum aggregate memory of all active users’ active kernels • Hard upper limit of 256GB-1TB in most organizations • Very problematic if you have many users and big data – Need to authenticate all these users to the same machine and notebook server
  • 37. 37 Compromise #2: Notebook Server Per User • Proxy server manages a pool of containers, one per active user • Each container contains an entire Jupyter notebook stack • JupyterHub project provides a pre-built implementation of this approach • Issues: – Container needs to be big enough for all the user’s kernels • What size container to allocate when the user logs in? • Does a big enough container even exist? – Disables collaboration features – Many more moving parts  More failure modes
  • 38. 38 Compromise #3: Replace the Kernel iPythonKernel KernelProxy Shell IOPub stdin control heartbeat
  • 39. KernelProxy Proxy 39 Compromise #3: Replace the Kernel • Replace the IPython kernel with a proxy • Put something enterprise-friendly on the other side of the proxy • Apache Livy implements this approach – https://github.com/jupyter- incubator/sparkmagic • Issues: – Breaks Jupyter’s magics and extensions – Breaks data visualization libraries – Breaks third-party kernels – Less control over code execution Shell IOPub stdin control heartbeat RESTfulwebservice
  • 40. 40 The Five Stages of Enterprise Jupyter Deployment 1. Denial 2. Anger 3. Bargaining
  • 41. 41 The Five Stages of Enterprise Jupyter Deployment 1. Denial 2. Anger 3. Bargaining 4. Depression
  • 42. 42 The Five Stages of Enterprise Jupyter Deployment 1. Denial 2. Anger 3. Bargaining 4. Depression 5. Jupyter Enterprise Gateway
  • 43. 43 The Origins of Jupyter Enterprise Gateway • Multiple IBM products embedding Spark on YARN • All wanted to add Jupyter notebooks with Spark • Usual enterprise requirements (multitenancy, scalability, security, etc.) • Had reached the “Bargaining” stage – Mix of compromises 1, 2, and 3
  • 45. YARN Cluster Initial Prototype 45 Security Layer YARN Workers YARN Resource Manager Spark ExecutorsSpark ExecutorsSpark Executors Spark ExecutorsSpark ExecutorsSpark Executors Notebook Node nb2kg (Proxy) nb2kg Jupyter Kernel Gateway Python Kernel Spark Driver Python Kernel Spark Driver Shell IOPub stdin control heartbeat Issue #2: All Spark jobs run as same user ID Issue #1: All kernels and Spark drivers run on a single node
  • 46. Issue #1: All kernels run on a single node 8 8 8 8 0 10 20 30 40 50 60 70 80 4 Nodes 8 Nodes 12 Nodes 16 Nodes MaxKernels(4GBHeap) Cluster Size (32GB Nodes) Maximum Number of Simultaneous Kernels 46
  • 47. Jupyter Enterprise Gateway: Initial Goals • Optimized Resource Allocation – Run Spark in YARN Cluster Mode to better utilize cluster resources. – Pluggable architecture for additional Resource Managers • Multiuser support with user impersonation – Enhance security and sandboxing by enabling user impersonation when running kernels (using Kerberos). – Individual HDFS home folder for each notebook user. – Use the same user ID for notebook and batch jobs. • Enhanced Security – Secure socket communications – Any network communication should be encrypted 47
  • 48. YARN Cluster Jupyter Enterprise Gateway 48 Security Layer YARN Workers Jupyter EnterpriseGateway Multitenancy Remote kernels and Kernel Lifecycle management Spark Executors Spark Executors Spark Executors Yarn Container Jupyter Kernel Spark Driver Spark Executors Spark Executors Spark Executors Yarn Container Jupyter Kernel Spark Driver Spark Executors Spark Executors Spark Executors Yarn Container Jupyter Kernel Spark Driver Impersonation: Alice’s kernel runs under Alice’s user ID.
  • 49. Scalability Benefits 8 8 8 8 16 32 48 64 0 10 20 30 40 50 60 70 80 4 Nodes 8 Nodes 12 Nodes 16 Nodes MaxKernels(4GBHeap) Cluster Size (32GB Nodes) Maximum Number of Simultaneous Kernels Before JEG After JEG 49
  • 50. Jupyter Enterprise Gateway: Open Source 50 • Released through the Jupyter Incubator – BSD License – https://github.com/jupyter- incubator/enterprise_gatew ay – Current release: 0.7.0
  • 51. Jupyter Enterprise Gateway: Supported Platforms • Python/Spark 2.x using IPython kernel – With Spark Context delayed initialization • Scala 2.11/ Spark 2.x using Apache Toree kernel – With Spark Context delayed initialization • R / Spark 2.x with IRkernel 51
  • 52. Jupyter Enterprise Gateway – Roadmap • Add support for other resource managers – Kubernetes support • Kernel Configuration Profile – Enable client to request different resource configuration for kernels (e.g. small, medium, large) – Profiles should be defined by Administrators and enabled for user/group of users. • Administration UI – Dashboard with running kernels and administration actions • Time running, stop/kill, Profile Management, etc • User Environments • High Availability 52
  • 53. Jupyter Enterprise Gateway • Jupyter Enterprise Gateway at IBM Code – https://developer.ibm.com/code/openprojects/jupyter-enterprise-gateway/ • Jupyter Enterprise Gateway source code at GitHub – https://github.com/jupyter-incubator/enterprise_gateway • Docker images – https://github.com/jupyter- incubator/enterprise_gateway/tree/master/etc/docker • Jupyter Enterprise Gateway 0.7 release – https://github.com/jupyter-incubator/enterprise_gateway/releases/tag/v0.7.0 • Jupyter Enterprise Gateway Documentation – http://jupyter-enterprise-gateway.readthedocs.io/en/latest/ 53 Free IBM Data Science trial https://ibm.biz/BdZceR
  • 54. 54 Thank you! And special thanks to the Jupyter Enterprise Gateway team: Luciano Resende, Kevin Bates, Kun Liu, Christian Kadner, Sanjay Saxena, Alan Chin, Sherry Guo, Alex Bozarth, Zee Chen
  • 56. 56 Building your own test environment with Jupyter Enterprise Gateway
  • 57. Jupyter Enterprise Gateway: Deployment 57 Management Node Powered by Ambari EG Compute Engine based on Apache Spark
  • 58. Jupyter Enterprise Gateway: Deployment • Ansible deployment scripts – https://github.com/lresende/spark-cluster-install • One click deployment of the Spark Cluster – Configure your host inventory (see example on git repository) – Run the ”setup-ambari.yml” playbook • $ ansible-playbook --verbose setup-ambari.yml -i hosts-fyre-ambari -c paramiko • One click deployment of the Jupyter Enterprise Engine – Run the ”setup-enterprise-gateway.yml” playbook • $ ansible-playbook --verbose setup-enterprise-gateway.yml -i hosts-fyre-ambari -c paramiko 58
  • 59. Jupyter Enterprise Gateway - Deployment • Docker images – yarn-spark: Basic one node Spark on Yarn configuration – enterprise-gateway: Adds Anaconda and Jupyter Enterprise Gateway to the yarn-spark image – nb2kg: Minimal Jupyter Notebook client configured with hooks to access the Enterprise Gateway – https://github.com/jupyter-incubator/enterprise_gateway/tree/master/etc/docker • Building the latest docker images – git checkout https://github.com/jupyter-incubator/enterprise_gateway – make docker-clean docker-images – Note: Make also have individual targets to clean and build individual images (type make for help) 59
  • 60. Jupyter Enterprise Gateway - Deployment • Connecting to a Spark Cluster using a docker image docker run -t --rm -e KG_URL='http://<Enterprise Gateway IP>:8888' -p 8888:8888 -e VALIDATE_KG_CERT='no' -e LOG_LEVEL=DEBUG -e KG_REQUEST_TIMEOUT=40 -e KG_CONNECT_TIMEOUT=40 -v ${HOME}/opensource/jupyter/jupyter-notebooks/:/tmp/notebooks -w /tmp/notebooks elyra/nb2kg:dev 60

Editor's Notes

  1. Now, when I first saw these requirements, my initial reaction was, “sounds easy”. I mean, to a first approximation, all that Jupyter is doing is passing strings around.
  2. This is what I initially thought, and I’ve met a good number of other people who were in the same situation and came up with the same design. The problem with this design is that it’s actually only the first stage of a much longer process that I like to call…
  3. And in particular, the first stage of this process is called…
  4. Let me explain.
  5. All these cool features of Jupyter notebooks rely on an architecture that is substantially more baroque than the cartoon picture from ten slides back…
  6. When an enterprise architect becomes aware of all this complexity, that’s when he or she moves from stage 1 to stage 2, which is…
  7. Let me explain.
  8. This architecture was designed for an academic setting. When you try to transplant it into an enterprise environment and layer enterprise requirements on top of it, things go downhill rather quickly.
  9. …and the purpose of this talk is to help you to work through this fourth stage as quickly as possible and move on to stage 5, which is…
  10. …Jupyter Enterprise Gateway. (Bet you thought I was going to say “acceptance”). So, what is Jupyter Enterprise Gateway?
  11. Min RK
  12. Min RK
  13. Min RK