Co-design of DL Accelerators in VEDLIoT. Muhammad Waqar Azhar. Workshop on Deep Learning for IoT (DL4IoT), co-located with HiPEAC 2022, Budapest, Hungary, June 2022.
8. Co-Design Example - Motivation
▪ Model case study: MobileNet
▪ Observation: generic HW is not efficient
▪ Challenge: depthwise convolution
● Heterogeneity at different levels:
○ Across model layers of different types (e.g. depthwise and pointwise convolution)
○ Within the same layer type (e.g. activation and filter sizes and shapes)
○ Determines: buffer sizes, reuse, parallelism
Layer-specific hardware to capture heterogeneity!
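To make the heterogeneity between depthwise and pointwise convolution more concrete, here is a minimal Python sketch that counts multiply-accumulate operations (MACs), weights, and data elements touched for one layer of each type. The layer shape (112x112 feature map, 32 input channels, 64 output channels, 3x3 depthwise kernel, stride 1) and the function names are illustrative assumptions, not values or code from the slides.

```python
def depthwise_cost(h, w, channels, k):
    """Cost of a k x k depthwise convolution (stride 1, 'same' padding)."""
    macs = h * w * channels * k * k        # one k x k filter per channel
    weights = channels * k * k
    data = 2 * h * w * channels + weights  # input + output activations + weights
    return macs, weights, data


def pointwise_cost(h, w, c_in, c_out):
    """Cost of a 1 x 1 (pointwise) convolution."""
    macs = h * w * c_in * c_out            # every output mixes all input channels
    weights = c_in * c_out
    data = h * w * c_in + h * w * c_out + weights
    return macs, weights, data


# Assumed, illustrative layer shape (not taken from the slides).
H, W, C_IN, C_OUT, K = 112, 112, 32, 64, 3

dw_macs, dw_weights, dw_data = depthwise_cost(H, W, C_IN, K)
pw_macs, pw_weights, pw_data = pointwise_cost(H, W, C_IN, C_OUT)

# MACs per data element touched: a rough proxy for achievable data reuse.
dw_intensity = dw_macs / dw_data
pw_intensity = pw_macs / pw_data

print(f"depthwise: {dw_macs} MACs, {dw_weights} weights, {dw_intensity:.1f} MACs/element")
print(f"pointwise: {pw_macs} MACs, {pw_weights} weights, {pw_intensity:.1f} MACs/element")
```

Under these assumptions the pointwise layer performs several times more MACs per element moved than the depthwise layer, which is one way to see why a single buffer and parallelism configuration is unlikely to suit both layer types.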
9. Co-Design Example: Open Questions & Approaches
Approach A: one-HW-for-all
+ Runs any model
- Suboptimal efficiency
Approach B: one-HW-per-layer-type
+ Matches layer types
- Suboptimal utilization
Approach C: one-HW-per-layer
+ Best efficiency
- Resource-hungry
10. Co-Design Example: Open Questions & Approaches
Approach A: one-HW-for-all (TVM-VTA on PYNQ-Z2)
Approach B: one-HW-per-layer-type (Unique Kernels on ZCU102)
Approach C: one-HW-per-layer (Xilinx FINN on ZCU102)
• MobileNet requires aggressive quantization (4-bit)
• Performance: 35 GOPS and 68 GOPS using MobileNetV1 1x and 0.5x
• Su, Jiang, et al. "Redundancy-reduced MobileNet acceleration on reconfigurable logic for ImageNet classification."
• Performance: approx. 90 GOPS
• ResNet-34
• Performance: approx. 8 GOPS
• (DPU performance > 20 GOPS)
11. Proposed Solution
Co-design:
▪ Approach B:
  ▪ Mapping is good but throughput is below threshold…
▪ Approach C:
  ▪ FINN requires large HW to support original model -> more aggressive quantization
  ▪ Quantized model fits in HW but accuracy is below threshold…
?
B + C
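The trade-off between aggressive quantization and accuracy can be illustrated with a minimal sketch of uniform symmetric quantization in plain Python. This is a generic textbook quantizer, not the scheme used by FINN or in the talk; the function names and the sample weight values are hypothetical and chosen only for illustration.

```python
def quantize_symmetric(values, bits):
    """Uniform symmetric quantization to signed `bits`-bit integers."""
    qmax = 2 ** (bits - 1) - 1                    # 7 for 4-bit, 127 for 8-bit
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer level and clamp to the representable range.
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Map integer levels back to approximate real values."""
    return [x * scale for x in q]


def max_abs_error(values, bits):
    """Largest reconstruction error introduced by quantizing `values`."""
    q, scale = quantize_symmetric(values, bits)
    return max(abs(v - r) for v, r in zip(values, dequantize(q, scale)))


# Hypothetical weights, for illustration only.
weights = [0.70, -0.35, 0.12, -0.05, 0.01]

q4, scale4 = quantize_symmetric(weights, 4)
err4 = max_abs_error(weights, 4)
err8 = max_abs_error(weights, 8)

print("4-bit levels:", q4)
print(f"max error at 4 bits: {err4:.4f}, at 8 bits: {err8:.4f}")
```

Fewer bits mean a coarser step size and a larger worst-case rounding error per weight, which is the mechanism behind a quantized model fitting in hardware while its accuracy drops.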
12. Conclusions
▪ Current situation:
  ▪ Zoo of DNN models
  ▪ Zoo of HW accelerators
▪ Heterogeneity in the model -> Heterogeneity in the hardware
The need for Co-Design!
Co-Design with both generic and layer-specific HW modules