The Importance of Memory for Breaking the Edge AI Performance Bottleneck
Wil Florentino
Sr. Segment Marketing Manager
Micron Technology
Edge AI reveals memory as the bottleneck
Trend toward memory-bound applications
Model complexity vs. memory bandwidth (see the sketch at the end of this slide)
• Transformer size growth: 410x / 2 years
• AI HW memory bandwidth: 2x / 2 years1
Pre-processing latency in AI execution
• Data pre-processing overhead2 impacts latency
$/GB vs. scalability
• SRAM: $5,000/GB
• DRAM: $50/GB3
1 “AI and memory wall,” Medium, 2021. 2 “Rapid Data Pre-Processing with NVIDIA DALI,” NVIDIA Technical Blog, 2021. 3 “SRAM vs. DRAM: Difference between SRAM & DRAM explained,” Enterprise Storage Forum, 2023.
[Charts: $/GB of on-chip SRAM vs. off-chip DRAM against AI scalability; model memory requirements over time for other machine learning approaches, deep learning and recent GenAI; AI execution time split into pre-processing overhead, communication overhead and model inference time]
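To make the gap concrete, a minimal back-of-envelope sketch in Python: the 410x and 2x rates come from the slide, while the projection horizon and the simple compounding assumption are illustrative only.

    # Memory-wall sketch: model size growing ~410x per 2 years vs. AI HW
    # memory bandwidth growing ~2x per 2 years (rates from the slide).
    MODEL_GROWTH_PER_2YR = 410.0
    BW_GROWTH_PER_2YR = 2.0

    for years in (2, 4, 6):
        gap = (MODEL_GROWTH_PER_2YR / BW_GROWTH_PER_2YR) ** (years / 2)
        print(f"after {years} years, model size outgrows bandwidth by ~{gap:,.0f}x")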
DNN challenges relate back to memory and storage
Edge AI and Vision Alliance report on DNN implementation challenges
• Training data trade-offs between the cost of on-premise vs. cloud storage
• Complexity of on-device implementation in the target
• Type and performance of memory influence the efficiency of running the model
• Power consumption
DRAM memory bandwidth per core has been declining
• CPU core counts are increasing faster than memory bandwidth, shrinking the available bandwidth per core (see the sketch after this slide)
• New memory technologies are required to meet next-generation bandwidth-per-core requirements in multi-core CPUs
• Edge AI inference compute requires additional memory consideration
[Chart: multicore CPU architectures vs. memory bandwidth per core, declining from 6.4 GB/s per core in 2004]
Source: Micron. Bandwidth normalized to a x64 interface, 64-byte random accesses, 66% reads, dual-rank x4 simulation, 16Gb die. Best estimates; subject to change.
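A hedged illustration of this trend: per-core bandwidth is simply peak DRAM bandwidth divided by core count. The two configurations below (dual-channel DDR-400 with a single core, and an 8-channel DDR5-4800 server with 96 cores) are assumed examples, not Micron data.

    # Per-core DRAM bandwidth (GB/s) = channels x data rate (MT/s) x bus bytes / 1000 / cores.
    def bw_per_core_gbps(channels, mt_per_s, cores, bus_bytes=8):
        return channels * mt_per_s * bus_bytes / 1000 / cores

    # ~2004: dual-channel DDR-400, single core -> ~6.4 GB/s per core
    print(bw_per_core_gbps(channels=2, mt_per_s=400, cores=1))
    # Assumed modern server: 8-channel DDR5-4800, 96 cores -> ~3.2 GB/s per core
    print(bw_per_core_gbps(channels=8, mt_per_s=4800, cores=96))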
The many levers of a memory device
Complex design considerations for memory improve performance and lower costs
Configuration
• Density per die
• Die per package
• I/O width
• Bank groups
• Technology node
Performance
• Speed/pin
• Number of channels
• Prefetch size
• Burst length
• Read latency
Operational
• On-die Error Correction
• Thermal profile
• Refresh management
• Power reduction modes
• Active vs. standby power (picojoule/bit)
Application focus
• Functional safety
• Reliability/Availability/Serviceability
• Extended temperature
• Validation and testing
• Product lifecycle
• Industrial rated
• Auto validated
DDR5 for data-intensive training workloads
• Improved burst length, bank group and bank capability
DDR5 memory comparisons
• Increased bandwidth, more than 3x1: DDR5-3200 at 23.4 GB/s (1.39x), DDR5-4800 at 34.2 GB/s (2.03x), DDR5-8800 at 51.2 GB/s (3.05x) relative to DDR4-3200 (see the calculation after this slide)
• Higher bus efficiency, up to 90%1: 66% efficiency for DDR4-3200 vs. 89% efficiency for DDR5-4800
• Faster transfer speed, up to 8800 MT*/s2
• Improved overall workload performance3: cloud virtualization 40%, data center business apps 45%, HPC modeling >200%
• 128GB high-capacity RDIMM using monolithic 32Gb DRAM
1 Benchmark simulation comparison of DDR5 vs. DDR4-3200. 2 Based on defined JEDEC specification. 3 Results based on internal testing, third-party testing and/or industry workload benchmark testing. * Mega-transfers per second.
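A small sketch of how the ratios above follow from data rate x bus width x bus efficiency; the 66% and 89% efficiencies are taken from the slide, and the 8-byte (x64) bus width is a standard assumption.

    # Effective bandwidth (GB/s) = MT/s x bus bytes x bus efficiency / 1000.
    def effective_gbps(mt_per_s, efficiency, bus_bytes=8):
        return mt_per_s * bus_bytes * efficiency / 1000

    ddr4_3200 = effective_gbps(3200, efficiency=0.66)   # ~16.9 GB/s baseline
    ddr5_4800 = effective_gbps(4800, efficiency=0.89)   # ~34.2 GB/s
    print(f"DDR5-4800 vs. DDR4-3200: {ddr5_4800 / ddr4_3200:.2f}x")  # ~2.0x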
Compute bandwidth requirements by edge solution
AI TOPS* vs. number of LPDDR4 devices (scenarios)
• Sensor edge (IoT sensors and ultra low power TinyML devices): <1W, x16 SoC/ASIC IO width (typical), <4 DLA INT8 TOPS, est. 18 GB/s to saturate the accelerator**; at 4.2Gbps/pin an x16 LP4 device delivers ~8 GB/s, so 3 LP4 packaged devices
• Device edge (cameras, machines and industrial/SFF PC/server): 2W to 15W, x32, 4–20 TOPS, est. 90 GB/s; ~17 GB/s per x32 LP4 device, so 6 packaged devices
• Network edge (industrial PC/server, network equipment, NVR/VMS appliances): 15W to 75W, x64, 20–50 TOPS, est. 225 GB/s; ~33 GB/s per x64 LP4 device, so 7 packaged devices
• Compute edge (server/NVR/VMS appliances): 15W to 75W+, x128, 50–100 TOPS, est. 451 GB/s; ~33 GB/s per x64 LP4 device, so 14 packaged devices
(The device counts are reproduced in the sketch after this slide.)
* Relative reference models only; actuals will vary. ** Device-level accelerator bandwidth assumed from roofline modeling (ResNet-50). V. Sze, Y.-H. Chen, T.-J. Yang and J. S. Emer, “How to Evaluate Deep Neural Network Processors: TOPS/W (Alone) Considered Harmful,” IEEE Solid-State Circuits Magazine, vol. 12, no. 3, pp. 28–41.
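A hedged sketch of the device-count arithmetic above, assuming per-package LPDDR4 bandwidth of IO width x 4.2 Gbps/pin / 8 and rounding up to whole packages:

    import math

    # Devices needed = ceil(required accelerator bandwidth / per-package bandwidth).
    def lp4_devices_needed(required_gbps, io_width, pin_gbps=4.2):
        per_device = io_width * pin_gbps / 8          # GB/s per LP4 package
        return per_device, math.ceil(required_gbps / per_device)

    for tier, required, io_width in [("sensor", 18, 16), ("device", 90, 32),
                                     ("network", 225, 64), ("compute", 451, 64)]:
        per_device, count = lp4_devices_needed(required, io_width)
        print(f"{tier} edge: {per_device:.1f} GB/s per device -> {count} LP4 packages")

The resulting counts (3, 6, 7 and 14 packages) match the scenarios on the slide.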
LPDDR5 offers a leap in performance and possibilities
LPDDR5X bandwidth at different channel widths and pin speeds (GB/s):
Data rate (Gbps/pin)    x16 channel    x32 channel    x64 channel
6.4                     12.8           25.6           51.2
8.5                     17             34             68
9.6                     19.2           38.4           76.8
(These figures follow directly from channel width x pin rate; see the short check after this slide.)
• Reduces number of components to get to same bandwidth
• Improved architecture
• Lower power [pj/bit]
[Charts: LPDDR data rates grew from ~2Gbps (LP3, 2012~) through LP4/LP4X/LP5 to ~9Gbps (LP5X, 2021~), roughly 6x throughput and 50% faster performance; improved power-savings features cut power per bandwidth (mW/GBps index) from 1.0 for LP3 to well below that for LP5X]
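The bandwidth table above reduces to channel width (bits) x pin rate (Gbps) / 8; a short check:

    # LPDDR5X bandwidth per channel configuration at each pin speed.
    for gbps in (6.4, 8.5, 9.6):
        row = "  ".join(f"x{width}: {width * gbps / 8:.1f} GB/s" for width in (16, 32, 64))
        print(f"{gbps} Gbps/pin -> {row}")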
Memory footprint as a function of batch size
Tiling for small object detection in high-resolution vision
[Figure: a high-resolution image (Meta AI-generated, Imagine Platform) is tiled into stacked inputs, giving an effective batch size of 9 x N fed to a convolutional model]
[1] “Small object detection: An image tiling based approach,” Medium, 2021 [Link]
[2] S. Nguyen, et al., “Dynamic tiling: A model-agnostic, adaptive, scalable, and inference-data-centric approach for efficient and accurate small object detection,” arXiv:2309.11069v1, 2023
[3] F. Akyon, et al., “SAHI: Slicing aided hyper inference and fine-tuning for small object detection,” IEEE ICIP, 2022
[4] F. Unel, et al., “The power of tiling for small object detection,” CVPR, 2019
[5] “Training vs. inference – Memory consumption by neural networks” [Link]
[6] GitHub: TorchInfo [Link]
[7] Model not quantized (fp32). Memory footprint of two largest consecutive layers.
Tiling high-resolution images: a higher batch size improves results.
Batch size impacts the memory footprint: memory for YOLOv8x inference* grows with batch size, up to a 6.1GB memory requirement across the computed batch sizes.
[Chart: memory for YOLOv8x inference (MB) vs. computed batch sizes 1, 2, 4, 8, 16, 32, 64, 128]
* Parameter size: 273MB (a rough sketch of this scaling follows after this slide)
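Footnotes [6] and [7] describe the accounting behind the chart: parameters plus the activations of the two largest consecutive layers (measurable with TorchInfo), with the activation term scaling linearly with batch size. A hedged sketch follows; the 45.5 MB per-image activation cost is a hypothetical value chosen so that batch 128 lands near the ~6.1GB point, not a measured number.

    # Inference memory ~= parameters + batch x activations of the two largest
    # consecutive layers (per footnote [7]); all values fp32, no quantization.
    PARAMS_MB = 273          # YOLOv8x parameter size from the slide
    ACT_PER_IMAGE_MB = 45.5  # hypothetical per-image activation cost

    for batch in (1, 2, 4, 8, 16, 32, 64, 128):
        total_mb = PARAMS_MB + batch * ACT_PER_IMAGE_MB
        print(f"batch {batch:>3}: ~{total_mb:,.0f} MB")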
Why memory is important for generative language
• Models are very large and often need to fit in DRAM
• Bandwidth is critical to quality of service
  − Tokens/sec is highly correlated with DRAM bandwidth
Example: LLAVA 7B with 8-bit quantization*, ~5 seconds per response (see the estimate after this slide)
• LP5X 9.6Gbps (x128): 153 GB/s
• LP4 4.2Gbps (x32): 17 GB/s
Sample outputs for the same image prompt:
“The image shows a person ironing clothes on a…”
“The image depicts an unusual scene where a man is ironing clothes on an ironing board placed on the back of a moving vehicle, specifically a yellow SUV. This is not a typical activity one would expect to see on a city street, as ironing is usually done indoors in a stationary position to ensure safety and to prevent accidents. The man's actions are not only unconventional but also potentially dangerous due to the risk of falling or being hit by other vehicles or pedestrians. Additionally, the presence of a taxicab in the background adds to the urban environment, which makes the scene even more out of the ordinary.”
1 Assumes GGML quantization: ggml.ai. 2 Kim, Sehoon, et al., “Full stack optimization of transformer inference: a survey,” arXiv:2302.14017, 2023. * LLAVA (llava-vl.github.io) | Assumes 1 token/word | Excluding time to first token.
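A hedged, roofline-style estimate of why tokens/sec tracks DRAM bandwidth: a memory-bound decoder streams roughly the full weight set from DRAM for every generated token, so bandwidth divided by model size bounds the token rate. The ~7 GB weight size for a 7B-parameter model at 8 bits is an approximation, and real systems also move KV-cache and activation traffic.

    # Upper bound on decode rate for a bandwidth-bound LLM: tokens/sec <= BW / weights.
    MODEL_GB = 7.0   # ~7B parameters at 8-bit quantization (approximation)

    for name, bw_gbps in [("LP5X 9.6Gbps x128", 153), ("LP4 4.2Gbps x32", 17)]:
        print(f"{name}: ~{bw_gbps / MODEL_GB:.1f} tokens/sec upper bound")

At 20+ tokens/sec, a response on the order of 100 tokens takes about 5 seconds, consistent with the figure above, while the LP4 configuration would be nearly an order of magnitude slower.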
LPCAMM2 for AI-equipped systems
High speed
Energy efficient
Modular and serviceable
Space savings
Performance
• LPDDR5X speed of up to 9.6Gbps
• Full 128-bit, dual-channel, low-power modular memory solution
Power efficiency
• Consumes 57%-61%1 less active power and up to 80%1 less system standby power compared to DDR5 SODIMM
• Thermal efficiency for fanless computers
Modularity
• Flexibility to upgrade system memory capacity
• Single PCB for all memory configurations
Form factor
• Up to 64%2 space savings
• Space savings for industrial PCs, embedded single-board computers and AIoT systems
1 Power measurements in mW per 64-bit bus at the same LPDDR5X speed compared to SODIMM. 2 Calculation based on comparison of the total volume of a commercially available dual-stacked DDR5 SODIMM module (32,808 mm3) to an LPCAMM2 module (11,934 mm3); see the quick check below.
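A quick check of the space-savings figure in footnote 2, using the module volumes given there:

    # LPCAMM2 vs. dual-stacked DDR5 SODIMM module volume.
    sodimm_mm3, lpcamm2_mm3 = 32808, 11934
    print(f"volume reduction: {1 - lpcamm2_mm3 / sodimm_mm3:.0%}")   # ~64%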
Multiport SSD as centralized storage
Supporting multiple subsystems in a single storage device
4150AT product highlights
• Configurable multiport (single, dual, triple and quad)
• SR-IOV allowing for shared and private namespaces
• Design flexibility to match system usage models with TLC, SLC and HE-SLC endurance modes
• Up to 600K read and 100K write IOPS performance
• -40 C to 115 C Tc operating temperature range
• Fast boot with TTR <100ms
[Diagram: a single multiport NVMe SSD exposes Ports 0–3 and namespaces NS1–NS4 over PCIe; SR-IOV and SW virtualization let multiple HW and SW subsystems running different AI models (robot control, compliance vision camera, multi-camera machine vision, edge platform agent) share one centralized storage device]
Legend: SR-IOV = single root I/O virtualization, NS = namespace, PF = physical function, VF = virtual function, Tc = case temperature, TTR = time to ready, TLC = triple-level cell, SLC = single-level cell, HE-SLC = high endurance SLC, IOPS = input/output operations per second.
Micron AI memory and storage portfolio
Leadership products to enable AI workloads
Summary
AI at the edge (outside the data center) reveals memory as a bottleneck
• Disproportionate growth between transformer size and memory bandwidth
• Data pre-processing overhead impacts latency
• On-chip SRAM is cost prohibitive vs. external DRAM
Memory technology influences AI model execution performance
• Edge AI device TOPS showcase the memory bandwidth gap
• Tiling activations requires in-line memory density resources
• In generative language, bandwidth is required for quality of service
Leading memory technologies offer the best mix of solutions for edge AI applications
• DDR5 for AI training workloads
• LPDDR4 and LPDDR5 for neural network compute
• LPCAMM2 to leverage LPDDR5X performance with DIMM modularity
• Multiport SSD to support different AI models and compute in a single storage device
Micron memory enables all forms of AI embedded solutions: drones and industrial transport, smart grid and clean energy, industrial AR/VR, smart factory and robotics, AI-enabled video security and analytics, and low earth orbit (LEO) communication.
Visit us at Booth #105
Thank You
© 2024 Micron Technology
