
1 7 J U N E 2 0 2 1
S C A L I N G A I I N P R O D U C T I O N
U S I N G P Y T O R C H
G E E T A C H A U H A N
PyTorch Partner Engineering, Facebook AI
@ C H A U H A N G
MLOps World 2021
A G E N D A

0 1   C H A L L E N G E S  W I T H  M L  I N  P R O D U C T I O N

0 2   T O R C H S E R V E  O V E R V I E W

0 3   B E S T  P R A C T I C E S  F O R  P R O D U C T I O N  D E P L O Y M E N T
MLOps World 2021
P Y T O R C H C O M M U N I T Y G R O W T H
Source: https://paperswithcode.com/trends
MLOps World 2021
[Diagram: an application with preprocessing, application logic, and postprocessing calling models deployed in the cloud or on-prem]

Key challenges: Performance, Ease of use, Cost efficiency, Deployment at scale
C H A L L E N G E S W I T H M L I N D E P L O Y M E N T
MLOps World 2021
INFERENCE AT SCALE
Deploying and managing models in production is difficult.


Some of the pain points include:
Loading and managing multiple models, on multiple
servers or end devices


Running pre-processing and post-processing code on
prediction requests.


How to log, monitor and secure predictions


What happens when you hit scale?
MLOps World 2021
TORCHSERVE
Easily deploy PyTorch models in production at scale


D E F A U L T  H A N D L E R S  F O R  C O M M O N  T A S K S

L O W  L A T E N C Y  M O D E L  S E R V I N G

W O R K S  W I T H  A N Y  M L  E N V I R O N M E N T
MLOps World 2021
• Default handlers for common use
cases (e.g., image segmentation,
text classification) along with
custom handlers support for other
use cases and a Model Zoo


• Multi-model serving, Model
versioning and ability to roll back
to an earlier version


• Automatic batching of individual
inferences across HTTP requests
• Logging including common
metrics, and the ability to
incorporate custom metrics


• Robust HTTP APIs - Management and Inference
[Architecture diagram: model files (model1.pth, ...) are packaged by torch-model-archiver into .mar archives (model1.mar ... model5.mar) in <path>/model_store; TorchServe, started with torchserve --start, loads and serves the models behind an HTTP Inference API (http://localhost:8080/ …), a Management API (http://localhost:8081/ …), and a Metrics API, with logging and metrics]

TORCHSERVE
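A minimal sketch of that flow end to end, assuming a packaged archive named model1.mar already sits in the model store and kitten.jpg is a sample input for an image model (both names are placeholders):

torchserve --start --model-store <path>/model_store --models model1=model1.mar

# Management API (default port 8081): list the registered models
curl http://localhost:8081/models

# Inference API (default port 8080): run a prediction
curl http://localhost:8080/predictions/model1 -T kitten.jpg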
T O R C H S E R V E  D E T A I L :  M O D E L  H A N D L E R S
TorchServe has default model handlers that
perform boilerplate data transforms for
common cases:


• Image Classification


• Image Segmentation


• Object Detection


• Text Classification


You can also create custom model handlers
for any model and inference task.
import torch


class MyModelHandler(object):

    def __init__(self):
        self.initialized = False

    def initialize(self, context):
        # get GPU status & device handle
        # load model & supporting files (vocabularies etc.)
        self.initialized = True

    def preprocess(self, data):
        # put incoming data into a tensor
        # transform as needed for your model
        return data

    def inference(self, data):
        # do predictions
        return data

    def postprocess(self, output):
        # process inference output, e.g. extracting top K
        # package output for web delivery
        return output


_service = MyModelHandler()


def handle(data, context):
    # module-level entry point TorchServe calls for each request
    if not _service.initialized:
        _service.initialize(context)
    if data is None:
        return None
    data = _service.preprocess(data)
    data = _service.inference(data)
    data = _service.postprocess(data)
    return data
M O D E L A R C H I V E
torch-model-archiver: a CLI tool for packaging all model artifacts into a single deployment unit


• model checkpoints or model definition file
with state_dict


• TorchScript and eager mode support


• Extra files like vocab, config, index_to_name
mapping


torch-model-archiver \
  --model-name BERTSeqClassification_Torchscript \
  --version 1.0 \
  --serialized-file Transformer_model/traced_model.pt \
  --handler ./Transformer_handler_generalized.py \
  --extra-files "./setup_config.json,./Seq_classification_artifacts/index_to_name.json"

setup_config.json:

{
  "model_name": "bert-base-uncased",
  "mode": "sequence_classification",
  "do_lower_case": "True",
  "num_labels": "2",
  "save_mode": "torchscript",
  "max_length": "150"
}

torchserve --start \
  --model-store model_store \
  --models <path-to model-file/s3-url/azure-blob-url>
https://github.com/pytorch/serve/tree/master/model-archiver#creating-a-model-archive
D Y N A M I C B A T C H I N G
Via Custom Handlers


• Model configuration based

• batch_size: the maximum batch size

• max_batch_delay: the maximum delay (in ms) TorchServe waits to receive batch_size requests before dispatching a partial batch

• (Coming soon) Batching support in default handlers


curl localhost:8081/models/resnet-152

{
  "modelName": "resnet-152",
  "modelUrl": "https://s3.amazonaws.com/model-server/model_archive_1.0/examples/resnet-152-batching/resnet-152.mar",
  "runtime": "python",
  "minWorkers": 1,
  "maxWorkers": 1,
  "batchSize": 8,
  "maxBatchDelay": 10,
  "workers": [
    {
      "id": "9008",
      "startTime": "2019-02-19T23:56:33.907Z",
      "status": "READY",
      "gpu": false,
      "memoryUsage": 607715328
    }
  ]
}


https://github.com/pytorch/serve/blob/master/docs/batch_inference_with_ts.md
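For reference, batching parameters are set when registering the model through the Management API; a sketch based on the batch-inference doc linked above (the archive URL and worker count are illustrative):

curl -X POST "localhost:8081/models?url=resnet-152.mar&batch_size=8&max_batch_delay=10&initial_workers=1"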
M E T R I C S
Out of box metrics with ability to extend


• CPU, Disk, Memory utilization


• Requests type count


• ts.metrics class for extension


• Types supported - Size, percentage, counter,
general metric


• Prometheus metrics support available


# Access context metrics as follows
metrics = context.metrics

# Create Dimension objects
from ts.metrics.dimension import Dimension

# Dimensions are name-value pairs
dim1 = Dimension(name, value)
...
dimN = Dimension(name_n, value_n)

# Add Distance as a metric
# dimensions = [dim1, dim2, dim3, ..., dimN]
metrics.add_metric('DistanceInKM', distance, 'km', dimensions=dimensions)

# Add Image size as a size metric
metrics.add_size('SizeOfImage', img_size, None, 'MB', dimensions)

# Add MemoryUtilization as a percentage metric
metrics.add_percent('MemoryUtilization', utilization_percent, None, dimensions)

# Create a counter with name 'LoopCount' and dimensions
metrics.add_counter('LoopCount', 1, None, dimensions)

# Log custom metrics
for metric in metrics.store:
    logger.info("[METRICS]%s", str(metric))


https://github.com/pytorch/serve/blob/master/docs/metrics.md
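TorchServe also exposes these metrics in Prometheus format on a separate Metrics API (default port 8082); a quick check, assuming default settings:

curl http://127.0.0.1:8082/metrics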
MLOps World 2021
RECENT FEATURES
+ Ensemble Model support, Captum Model Interpretability


+ Kubeflow Pipelines / KFServing Integration with Auto-scaling and Canary rollout on any cloud/on-prem


+ GCP Vertex AI Serverless pipelines


+ MLflow Integration




+ Prometheus Integration with Grafana


+ Multiple nodes on EC2, Autoscaling on SageMaker/EKS, AWS Inferentia support


+ MMF, NMT, DeepLabV3 new examples




Deployment models: Standalone, Primary/backup, Orchestration, Cloud vs. on-premises

Optimizations: Performance vs. latency, TorchScript profiling, Offline vs. real-time, Cost

Resilience: Robust endpoint, Auto-scaling, Canary deployments, A/B testing

Measurement: Metrics, Model performance, Interpretability, Feedback loop

Responsible AI: Fairness, Human-centered design
B E S T P R A C T I C E S F O R P R O D U C T I O N D E P L O Y M E N T S
MLOps World 2021
Fairness by design


• Measure skewness of data, model bias, data bias; identify relevant metrics


• Transparency, Explainable AI, inclusive design


Human-centered design


• Consider AI-driven decisions and their impact on people at the time of model design


• Provide the ability for human recourse rather than full automation - for example, a mortgage-application AI must not reject applicants of a certain category or race


• For computer vision models, measure results across demographics - for example, include support for different skin tones and age groups
R E S P O N S I B L E A I
MLOps World 2021
• Build with performance vs. latency goals in mind


• Reduce the size of the model: quantization, pruning, mixed precision training (see the sketch after this list)


• Reduce latency: TorchScript model; use SnakeViz profiler


• Evaluate GPU vs. CPU for low latency


• Evaluate REST vs. gRPC for your prediction service
O P T I M I Z A T I O N S
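A minimal sketch of two of these techniques (dynamic quantization and TorchScript tracing) on a toy model; the layer sizes and file name below are placeholders, not the benchmark models on the next slide:

import torch

model = torch.nn.Sequential(
    torch.nn.Linear(128, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, 10),
).eval()

# Dynamic quantization: weights stored as int8, activations quantized on the fly
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

# TorchScript: trace the model for lower-latency serving without a Python dependency
example = torch.randn(1, 128)
traced = torch.jit.trace(quantized, example)
traced.save("model_int8_traced.pt")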
MLOps World 2021
Model: fp32 accuracy, int8 accuracy (change), technique, CPU inference speed up

ResNet50: fp32 76.1 (Top-1, Imagenet), int8 75.9 (-0.2), Post Training quantization, 2x (214ms ➙ 102ms, Intel Skylake-DE)

MobileNetV2: fp32 71.9 (Top-1, Imagenet), int8 71.6 (-0.3), Quantization-Aware Training, 4x (75ms ➙ 18ms, OnePlus 5, Snapdragon 835)

Translate / FairSeq: fp32 32.78 (BLEU, IWSLT 2014 de-en), int8 32.78 (0.0), Dynamic quantization (weights only), 4x for encoder (Intel Skylake-SE)
These models and more available on TorchHub - https://pytorch.org/hub/
QUANTIZATION
MLOps World 2021
B E R T  M O D E L  P R O F I L I N G

Eager Mode
MLOps World 2021
B E R T  M O D E L  P R O F I L I N G

Torchscript Mode

4x speedup
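A rough sketch of how such an eager vs. TorchScript comparison can be reproduced with cProfile and SnakeViz, assuming the HuggingFace transformers package is installed (the exact notebook behind these screenshots is not shown here):

import cProfile
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", torchscript=True).eval()
inputs = tokenizer("TorchServe profiling example", return_tensors="pt")

# Eager mode
cProfile.run("model(**inputs)", "bert_eager.prof")

# TorchScript mode
traced = torch.jit.trace(model, (inputs["input_ids"], inputs["attention_mask"]))
cProfile.run('traced(inputs["input_ids"], inputs["attention_mask"])', "bert_torchscript.prof")

# Then, from a shell: snakeviz bert_eager.prof / snakeviz bert_torchscript.prof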
MLOps World 2021
Offline vs. real-time predictions


• Offline: Dynamic batching


• Online: Async processing – push/poll


• Pre-computed predictions for certain elements


Cost optimizations


• Spot Instances for offline


• Autoscaling based on metrics, on-demand cluster


• Evaluate supported AI accelerators, like AWS Inferentia, for a lower cost point


O P T I M I Z A T I O N S ( C O N T D . )
MLOps World 2021
Develop
,

Test
Production
Staging
,

Experiments
Hybrid Cloud
On-prem Cloud Managed
Install from Source
Standalone
Docker
Large Scale

Production
MLflow, Kubeflow
Kubernetes, Kubeflow/KFserving
Primary/Backup, ML Microservices
Autoscaling, Canary Rollouts
Minikub
e

Self managed Docker AWS CloudFormation
CLOUD VMs/ Containers
Microservices behind
 

API Gateway
CLOUD VMs/ Containers
AWS SageMaker
Endpoints, BYOC
AWS SageMaker
EKS/AKS/GKE
AWS SageMaker/ GCP
AI Platform
Serverless Functions
GCP Vertex AI,
 

AWS SageMaker
 

Canary Rollouts
Databricks
Managed MLflow
D E P L O Y I N G M O D E L S I N P R O D U C T I O N
MLOps World 2021
Create robust endpoint for serving, for example, SageMaker endpoint


Auto-scaling with orchestration deployments, multi-node for EC2, and other scenarios


Canary deployments: test a new version of a model on a small subset of traffic before making it the default

Shadow inference: deploy a new version of a model in parallel with the current one

A/B testing of different versions of a model
R E S I L I E N C E
MLOps World 2021
Define model performance metrics, such as accuracy, while designing the AI service;
use-case specific


Add custom metrics as appropriate


Use CloudWatch or Prometheus dashboards for monitoring model performance


Model interpretability analysis via Captum


Deploy with a feedback loop; if model accuracy drops over time or with a new version, analyze issues like concept drift, stale data, etc.
M E A S U R E M E N T
MLOps World 2021
Understand: How might the product’s goals, its policy, and its implementation affect users from different subgroups? Identify contextual definitions of fairness

Align: Stakeholder conversations to find consensus and outline measurement and mitigation plans

Measure: Analyze model performance, label bias, outcomes, and other relevant signals

Mitigate: Address observed issues in datasets, models, policies, etc.

Monitor: Monitor the effect of mitigations on subgroups, and ensure the fairness analysis holds as the product adapts
FAIRNESS BY DESIGN
CAPTUM
[Example multimodal attribution chart: Text contributions 7.54, Image contributions 11.19, Total contributions 18.73]
S U P P O R T  F O R  A T T R I B U T I O N  A L G O R I T H M S  T O  I N T E R P R E T :


• Output predictions with respect to inputs


• Output predictions with respect to layers


• Neurons with respect to inputs


• Currently provides gradient & perturbation based
approaches (e.g. Integrated Gradients)
Model interpretability library for PyTorch
https://captum.ai/
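A minimal Captum sketch using Integrated Gradients on a hypothetical toy classifier (the attribution numbers above come from a different, multimodal model):

import torch
import torch.nn as nn
from captum.attr import IntegratedGradients

# toy two-class classifier used purely for illustration
model = nn.Sequential(nn.Linear(10, 20), nn.ReLU(), nn.Linear(20, 2)).eval()

inputs = torch.randn(1, 10)
baseline = torch.zeros(1, 10)

# attribute the prediction for class 1 back to the input features
ig = IntegratedGradients(model)
attributions, delta = ig.attribute(
    inputs, baseline, target=1, return_convergence_delta=True
)
print(attributions, delta)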
MLOps World 2021
DYNABOARD & FLORES 101 WMT COMPETITION
http://www.statmt.org/wmt21/large-scale-multilingual-translation-task.html
https://github.com/facebookresearch/dynalab
https://dynabench.org/tasks/3#overall
MLOps World 2021
COMMUNITY PROJECTS  https://github.com/cceyda/torchserve-dashboard
https://github.com/Unity-Technologies/SynthDet
https://medium.com/pytorch/how-wadhwani-ai-uses-pytorch-to-empower-cotton-farmers-14397f4c9f2b
MLOps World 2021
FUTURE RELEASES
+ Improved memory and resource usage for better scalability


+ C++ Backend for lower latency


+ Enhanced profiling tools
• TorchServe: https://github.com/pytorch/serve

• Management API: https://github.com/pytorch/serve/blob/master/docs/management_api.md

• Inference API: https://github.com/pytorch/serve/blob/master/docs/inference_api.md

• Language Translation Ensemble example: https://github.com/pytorch/serve/tree/master/examples/Workflows/nmt_tranformers_pipeline

• BERT Model example: https://github.com/pytorch/serve/tree/master/examples/Huggingface_Transformers

• Model Zoo: https://github.com/pytorch/serve/blob/master/docs/model_zoo.md

• SnakeViz visualizations: https://github.com/pytorch/serve/tree/master/benchmarks#visualize-snakeviz-results

• Logging: https://github.com/pytorch/serve/blob/master/docs/logging.md

• Metrics: https://github.com/pytorch/serve/blob/master/docs/metrics.md

• Prometheus Metrics: https://github.com/pytorch/serve/blob/master/docs/metrics_api.md

• Batch Inference: https://github.com/pytorch/serve/blob/master/docs/batch_inference_with_ts.md

• Kubeflow Pipelines: https://github.com/kubeflow/pipelines/tree/master/components/PyTorch/pytorch-kfp-components

• Kubernetes support: https://github.com/pytorch/serve/blob/master/kubernetes/README.md

• TorchServe Dashboard (Community): https://cceyda.github.io/blog/torchserve/streamlit/dashboard/2020/10/15/torchserve.html

• Custom Handler community blog: https://towardsdatascience.com/deploy-models-and-create-custom-handlers-in-torchserve-fc2d048fbe91

• Captum Interpretability for BERT models: https://github.com/pytorch/serve/blob/master/captum/Captum_visualization_for_bert.ipynb

• Operationalize, Scale and Infuse Trust in AI using KFServing: https://blog.kubeflow.org/release/official/2021/03/08/kfserving-0.5.html

REFERENCES
QUESTIONS?


Contact:


Email: gchauhan@fb.com


LinkedIn: https://www.linkedin.com/in/geetachauhan/
