Generative AI models, such as ChatGPT and Stable Diffusion, can create new and original content like text, images, video, audio, or other data from simple prompts, as well as handle complex dialogs and reason about problems with or without images. These models are disrupting traditional technologies, from search and content creation to automation and problem solving, and are fundamentally shaping the future user interface to computing devices. Generative AI can apply broadly across industries, providing significant enhancements for utility, productivity, and entertainment. As generative AI adoption grows at record-setting speeds and computing demands increase, on-device and hybrid processing are more important than ever. Just like traditional computing evolved from mainframes to today’s mix of cloud and edge devices, AI processing will be distributed between them for AI to scale and reach its full potential.
In this presentation you’ll learn about:
- Why on-device AI is key
- Full-stack AI optimizations to make on-device AI possible and efficient
- Advanced techniques like quantization, distillation, and speculative decoding
- How generative AI models can be run on device and examples of some running now
- Qualcomm Technologies’ role in scaling on-device generative AI
This talk overviews my background as a female data scientist, introduces many types of generative AI, discusses potential use cases, highlights the need for representation in generative AI, and showcases a few tools that currently exist.
Exploring Opportunities in the Generative AI Value Chain.pdf Dung Hoang
The article "Exploring Opportunities in the Generative AI Value Chain" by McKinsey & Company's QuantumBlack provides insights into the value created by generative artificial intelligence (AI) and its potential applications.
The document discusses using generative AI to improve learning products by making them better, stronger, and faster. It provides examples of using generative models for game creation, runtime design, and postmortem data analysis. It also addresses ethics and copyright challenges and considers generative AI as both a tool and potential friend. The document explores what models are, how they work, examples of applications, and resources for staying up to date on generative AI advances.
GPT-4 can pass the American state bar exam, but before you go expecting to see robot lawyers taking over the courtroom, hold your horses, cowboys: we're not quite there yet. That being said, AI is becoming increasingly human-like, and as a VC we need to start thinking about how this new wave of technology is going to affect the way we build and run businesses. What do we need to do differently? How can we make sure that our investment strategies reflect these changes? It's a brave new world out there, and we’ve got to keep the big picture in mind!
Sharing here with you what we at Cavalry Ventures found out during our Generative AI deep dive.
An overview of the most important AI capabilities in marketing, advertising and content creation. I made this presentation to inform, educate and inspire people in the creative industries to familiarise themselves with the incredible toolsets that are already here and in development. I also explain how generative AI works and explore some possible new roles and business models for agencies. Hope you enjoy it!
The Future of AI is Generative not Discriminative 5/26/2021 Steve Omohundro
The deep learning AI revolution has been sweeping the world for a decade now. Deep neural nets are routinely used for tasks like translation, fraud detection, and image classification. PwC estimates that they will create $15.7 trillion/year of value by 2030. But most current networks are "discriminative" in that they directly map inputs to predictions. This type of model requires lots of training examples, doesn't generalize well outside of its training set, creates inscrutable representations, is subject to adversarial examples, and makes knowledge transfer difficult. People, in contrast, can learn from just a few examples, generalize far beyond their experience, and can easily transfer and reuse knowledge. In recent years, new kinds of "generative" AI models have begun to exhibit these desirable human characteristics. They represent the causal generative processes by which the data is created and can be compositional, compact, and directly interpretable. Generative AI systems that assist people can model their needs and desires and interact with empathy. Their adaptability to changing circumstances will likely be required by rapidly changing AI-driven business and social systems. Generative AI will be the engine of future AI innovation.
generative-ai-fundamentals and Large language models AdventureWorld5
This presentation presents an overview of the challenges and opportunities of generative artificial intelligence in Web3. It includes a brief research history of generative AI as well as some of its immediate applications in Web3.
This document discusses AI and ChatGPT. It begins with an introduction to David Cieslak and his company RKL eSolutions, which provides ERP sales and consulting. It then provides definitions for key AI concepts like artificial intelligence, generative AI, large language models, and ChatGPT. The document discusses OpenAI's ChatGPT tool and how it works. It covers prompts, commands, and potential uses and impacts of generative AI technologies. Finally, it discusses concerns regarding generative AI and the Future of Life Institute's call for more oversight of advanced AI.
Generative AI - The New Reality: How Key Players Are Progressing Vishal Sharma
The document discusses key players in generative AI and their progress. It provides an overview of generative AI including its evolution since 1950, where the spending is focused, how the technology works, and deployment models. It then profiles several major companies leading advancements in generative AI, including their strategies, growth areas, and risks. These companies are TSMC, Nvidia, Microsoft, Google, Amazon, Tesla, Oracle, Salesforce, SAP, and Palo Alto Networks.
Generative AI Use-cases for Enterprise - First Session Gene Leybzon
In this presentation, we will delve into the exciting applications of Generative AI across various business domains. Leveraging the capabilities of artificial intelligence and machine learning, Generative AI allows for dynamic, context-aware user interfaces that adapt in real-time to provide personalized user experiences. We will explore how this transformative technology can streamline design processes, facilitate user engagement, and open the doors to new forms of interactivity.
A journey into the business world of artificial intelligence. Explore at a high-level ongoing business experiments in creating new value.
* Review AI as a priority for value generation
* Explore ongoing experimentation
* Touch on how businesses are monetising AI
* Understand the intent of adoption by industries
* Discuss the state of customer trust in AI
Part 1 of a 9 Part Research Series named "What matters in AI" published on https://www.andremuscat.com
An Introduction to Generative AI - May 18, 2023 CoriFaklaris1
For this plenary talk at the Charlotte AI Institute for Smarter Learning, Dr. Cori Faklaris introduces her fellow college educators to the exciting world of generative AI tools. She gives a high-level overview of the generative AI landscape and how these tools use machine learning algorithms to generate creative content such as music, art, and text. She then shares some examples of generative AI tools and demonstrates how she has used some of these tools to enhance teaching and learning in the classroom and to boost her productivity in other areas of academic life.
Presenting the landscape of AI/ML in 2023 with a quick summary of the last 10 years of progress, the current situation, and a look at what is happening behind the scenes.
How Does Generative AI Actually Work? (a quick semi-technical introduction to... ssuser4edc93
This document provides a technical introduction to large language models (LLMs). It explains that LLMs are based on simple probabilities derived from their massive training corpora, containing trillions of examples. The document then discusses several key aspects of how LLMs work, including that they function as a form of "lossy text compression" by encoding patterns and relationships in their training data. It also outlines some of the key elements in the architecture and training of the most advanced LLMs, such as GPT-4, focusing on their huge scale, transformer architecture, and use of reinforcement learning from human feedback.
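The idea that a language model rests on probabilities derived from a training corpus can be illustrated with a toy bigram model. This is only a sketch under stated assumptions: the tiny corpus and the function names `train_bigram` and `next_token_probs` are hypothetical, and real LLMs use transformer networks over far larger corpora rather than raw counts.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count how often each token follows each other token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def next_token_probs(counts, prev):
    """Turn raw follow-up counts for `prev` into a probability distribution."""
    total = sum(counts[prev].values())
    if total == 0:
        return {}  # token never seen with a successor
    return {tok: n / total for tok, n in counts[prev].items()}


# Hypothetical toy corpus, used only for illustration.
corpus = "the cat sat on the mat and the cat slept".split()
model = train_bigram(corpus)
probs = next_token_probs(model, "the")
```

Here `probs` holds the estimated distribution over tokens that follow "the" in the toy corpus; sampling from such a distribution repeatedly is the simplest form of text generation.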
Generative AI Use cases for Enterprise - Second Session Gene Leybzon
This document provides an overview of generative AI use cases for enterprises. It begins with addressing concerns that generative AI will replace jobs. The presentation then defines generative AI as AI that generates new content like text, images or code based on patterns learned from training data.
Several examples of generative AI outputs are shown including code, text, images and advice. Potential use cases for enterprises are then outlined, including synthetic data generation, code generation, code quality checks, customer service, and data analysis. The presentation concludes by emphasizing that people will be "replaced by someone who knows how to use AI", not AI itself.
Conversational AI and Chatbot Integrations Cristina Vidu
Conversational AI and Chatbots (or rather - and more extensively - Virtual Agents) offer great benefits, especially in combination with technologies like RPA or IDP. Corneliu Niculite (Presales Director - EMEA @DRUID AI) and Roman Tobler (CEO @Routinuum & UiPath MVP) are discussing Conversational AI and why Virtual Agents play a significant role in modern ways of working. Moreover, Corneliu will be displaying how to build a Workflow and showcase an Accounts Payable Use Case, integrating DRUID and UiPath Robots.
📙 Agenda:
The focus of our meetup is around the following areas - with a lot of room to discuss and share experiences:
- What is "Conversational AI" and why do we need Chatbots (Virtual Agents);
- Deep dive into a DRUID-UiPath Integration via an Accounts Payable Use Case;
- Discussion, Q&A
Speakers:
👨🏻💻 Corneliu Niculite, Presales Director - EMEA DRUID AI
👨🏼💻 Roman Tobler, UiPath MVP, Co-Founder & CEO Routinuum GmbH
This session streamed live on March 8, 2023, 16:00 CET.
Check out our upcoming events at: community.uipath.com
Contact us at: community@uipath.com
A non-technical overview of Large Language Models, exploring their potential, limitations, and customization for specific challenges. While this deck is tailored for an audience from the financial industry, its content remains broadly applicable.
(Note: Discover a slightly updated version of this deck at slideshare.net/LoicMerckel/introduction-to-llms.)
AI firsts: Leading from research to proof-of-concept Qualcomm Research
AI has made tremendous progress over the past decade, with many advancements coming from fundamental research from many decades ago. Accelerating the pipeline from research to commercialization has been daunting since scaling technologies in the real world faces many challenges beyond the theoretical work done in the lab. Qualcomm AI Research has taken on the task of not only generating novel AI research but also being first to demonstrate proof-of-concepts on commercial devices, enabling technology to scale in the real world. This presentation covers:
- The challenges of deploying cutting-edge research on real-world mobile devices
- How Qualcomm AI Research is solving system and feasibility challenges with full-stack optimizations to quickly move from research to commercialization
- Examples where Qualcomm AI Research has had industrial or academic firsts
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2023/06/accelerating-newer-ml-models-using-the-qualcomm-ai-stack-a-presentation-from-qualcomm/
Vinesh Sukumar, Senior Director and Head of AI/ML Product Management at Qualcomm Technologies, presents the “Accelerating Newer ML Models Using the Qualcomm AI Stack” tutorial at the May 2023 Embedded Vision Summit.
The Qualcomm AI Stack revolutionizes how Qualcomm thinks about AI software and provides the ultimate tool and user interface to enable ecosystem partners to create faster and smarter AI applications for all embedded form factors. Focusing on real user experience challenges centered around model deployment, Sukumar explains how the Snapdragon developer community leverages data types, quantization and neural architecture search, among others, to optimize complex AI architectures for emerging use cases.
AI for Manufacturing (Machine Vision, Edge AI, Federated Learning) byteLAKE
Artificial intelligence and machine learning technologies are transforming key industries like manufacturing, finance, retail, and healthcare. Edge computing and federated learning are emerging approaches that can help address challenges around data privacy, bandwidth constraints, and latency. Edge AI runs optimized models directly on devices to analyze data and only send results rather than raw data. Federated learning leverages local AI models across edge devices to improve performance while keeping sensitive data private. Together these approaches help make AI more scalable, responsive and privacy-preserving for industries.
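One common way to combine local models without moving raw data is weighted averaging of client weights. The sketch below assumes that aggregation rule (often called federated averaging); the description above does not say which rule is used, and the function name `federated_average` and the sample numbers are hypothetical.

```python
def federated_average(client_weights, client_sizes):
    """Average client weight vectors, weighted by local dataset size.

    Only the weight vectors leave each device; the raw data stays local.
    """
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


# Hypothetical example: two edge devices with 1 and 3 local samples.
clients = [[1.0, 2.0], [3.0, 4.0]]
sizes = [1, 3]
global_weights = federated_average(clients, sizes)
```

Weighting by dataset size lets devices with more local data influence the shared model more; a real deployment would add secure aggregation and repeat this step over many training rounds.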
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/deploying-large-models-on-the-edge-success-stories-and-challenges-a-presentation-from-qualcomm/
Vinesh Sukumar, Senior Director of Product Management at Qualcomm Technologies, presents the “Deploying Large Models on the Edge: Success Stories and Challenges” tutorial at the May 2024 Embedded Vision Summit.
In this talk, Dr. Sukumar explains and demonstrates how Qualcomm has been successful in deploying large generative AI and multimodal models on the edge for a variety of use cases in consumer and enterprise markets. He examines key challenges that must be overcome before large models at the edge can reach their full commercial potential. He also highlights how Qualcomm is addressing these challenges through upgraded processor hardware, improved developer tools and a comprehensive library of fully optimized AI models in the Qualcomm AI Hub.
This issue’s feature article, Tuning Autonomous Driving Using Intel® System Studio, illustrates how the tools in Intel System Studio give embedded systems and connected device developers an integrated development environment to build, debug, and tune performance and power usage. Continuing the theme of tuning edge applications, Building Fast Data Compression Code for Cloud and Edge Applications shows how to use the Intel® Integrated Performance Primitives to speed data compression.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2022/06/powering-the-connected-intelligent-edge-and-the-future-of-on-device-ai-a-presentation-from-qualcomm/
Ziad Asghar, Vice President of Product Management at Qualcomm, presents the “Powering the Connected Intelligent Edge and the Future of On-Device AI” tutorial at the May 2022 Embedded Vision Summit.
Qualcomm is leading the realization of the “connected intelligent edge,” where the convergence of wireless connectivity, efficient computing and distributed AI will power the devices and experiences that you deserve. In this talk, Asghar explores some of the key challenges in deploying AI across diverse edge products in markets including mobile, automotive, XR, IoT, robotics and PCs — and some of the important differences in the AI requirements of these applications.
Asghar identifies unique AI features that will be needed as physical and digital spaces converge in what is now called the “metaverse”. He highlights key AI technologies offered within Qualcomm products, and how the company connects them to enable the connected intelligent edge. Finally, he shares his vision of the future of on-device AI — including on-device learning, efficient models, state-of-the-art quantization, and how Qualcomm plans to make this vision a reality.
The PPT contains the following content:
1. What is Google Cloud Study Jam
2. What is Cloud Computing
3. Fundamentals of cloud computing
4. What is Generative AI
5. Fundamentals of Generative AI
6. Brief overview of Google Cloud Study Jam.
7. Networking Session.
Helixa uses serverless machine learning architectures to power an audience intelligence platform. It ingests large datasets and uses machine learning models to provide insights. Helixa's machine learning system is built on AWS serverless services like Lambda, Glue, Athena and S3. It features a data lake for storage, a feature store for preprocessed data, and uses techniques like map-reduce to parallelize tasks. Helixa aims to build scalable and cost-effective machine learning pipelines without having to manage servers.
A talk on reducing costs & increasing efficiencies by designing, testing & engineering in simulation first, plus examples of robotics & environmental capability.
Learn how recent innovation at Google allows you to produce intelligence from IoT data. We will look at some use cases and you will get an overview of the building blocks we use to build truly intelligent IoT solutions in the cloud and on the edge.
This document discusses the need for continuous delivery in software development. It defines continuous delivery as making sure software can be reliably released at any time. The document outlines some key aspects of continuous delivery including automated testing, infrastructure as code, continuous integration, and blue/green deployments. It provides an example of implementing continuous delivery for a large retail company using tools like Jenkins, Puppet, Logstash and practices like infrastructure as code and automated testing.
A confluence of events is accelerating the growth of AI in the Enterprise - (i) The COVID pandemic is accelerating the digital transformation of enterprises, (ii) increased digital sales & digital interaction is fueling interest in operationalizing AI to drive revenue and cost efficiencies and (iii) Enterprise databases and enterprise apps are infusing AI to transparently augment predictive capabilities for clients. Enterprise Power Systems are pillars of the global economy hosting our trinity of operating systems
Qualcomm Technologies conducts leading research across artificial intelligence to transform industries by connecting devices at the wireless edge. This includes on-device AI processing and sensing augmented by edge cloud computing. Qualcomm aims to scale AI through distributed processing of data close to its source using techniques like model compression, quantization, and efficient hardware and software tools. The company's decade of AI research has included work on computer vision, speech recognition, reinforcement learning and other areas to advance power efficient and personalized AI.
If you're like most of the world, you're on an aggressive race to implement machine learning applications and on a path to get to deep learning. If you can give better service at a lower cost, you will be the winners in 2030. But infrastructure is a key challenge to getting there. What does the technology infrastructure look like over the next decade as you move from Petabytes to Exabytes? How are you budgeting for more colossal data growth over the next decade? How do your data scientists share data today and will it scale for 5-10 years? Do you have the appropriate security, governance, back-up and archiving processes in place? This session will address these issues and discuss strategies for customers as they ramp up their AI journey with a long term view.
Join us to see how public-sector organizations and AWS Partners are combining Smart Devices and Artificial Intelligence to create flexible, secure and cost-effective solutions. By applying machine learning models to live video and audio, cameras can be transformed into flexible IoT devices that perform critical functions around public safety, security, property management, smart parking & environmental management. Learn how these solutions are architected using AWS services such as AWS IoT Core, AWS Greengrass, AWS DeepLens, Amazon SageMaker and Amazon Alexa.
A late upload. These slides were presented on Aug 31, 2019, when I delivered a talk for an AIoT seminar at the University of Lambung Mangkurat, Banjarbaru. It was part of the Republic of IoT 2019 event.
The document outlines the agenda for a Global AI Night event hosted by Microsoft. The event includes a welcome and keynote, followed by group sessions on using AI in Azure. There are beginner and intermediate tracks on topics like computer vision, machine learning, and deep learning. Speakers include representatives from Microsoft and SafeNet Consulting who will discuss leveraging Azure services and tools to build, train, and deploy AI models across devices and platforms.
As generative AI adoption grows at record-setting speeds and computing demands increase, hybrid processing is more important than ever. Just as traditional computing evolved from mainframes and thin clients to today’s mix of cloud and edge devices, AI processing must be distributed between the cloud and devices for AI to scale and reach its full potential. In this talk you’ll learn:
• Why on-device AI is key
• Which generative AI models can run on device
• Why the future of AI is hybrid
• Qualcomm Technologies’ role in making hybrid AI a reality
Deep Learning Neural Network Acceleration at the Edge - Andrea Gallo Linaro
Short
The growing amount of data captured by sensors and the real time constraints imply that not only big data analytics but also Machine Learning (ML) inference shall be executed at the edge. The multiple options for neural network acceleration in Arm-based platforms provide an unprecedented opportunity for new intelligent devices. It also raises the risk of fragmentation and duplication of efforts when multiple frameworks shall support multiple accelerators.
Andrea Gallo, Linaro VP of Segment Groups, will summarise the existing NN frameworks, accelerator solutions, and will describe the efforts underway in the Arm ecosystem.
Abstract
The dramatically growing amount of data captured by sensors and the ever more stringent requirements for latency and real time constraints are paving the way for edge computing, and this implies that not only big data analytics but also Machine Learning (ML) inference shall be executed at the edge. The multiple options for neural network acceleration in recent Arm-based platforms provide an unprecedented opportunity for new intelligent devices with ML inference. It also raises the risk of fragmentation and duplication of efforts when multiple frameworks shall support multiple accelerators.
Andrea Gallo, Linaro VP of Segment Groups, will summarise the existing NN frameworks, model description formats, accelerator solutions, low cost development boards and will describe the efforts underway to identify the best technologies to improve the consolidation and enable the competitive innovative advantage from all vendors.
Audience
The session will be useful to everyone from executives to engineers. Executives will gain a deeper understanding of the issues and opportunities. Engineers at NN acceleration IP design houses will take away ideas for how to collaborate in the open source community on their area of expertise, how to evaluate the performance and accelerate multiple NN frameworks without modifying them for each new IP, whether it be targeting edge computing gateways, smart devices or simple microcontrollers.
Benefits to the Ecosystem
The AI deep learning neural network ecosystem is only just getting started, and it has implications for open source similar to those that GPU and video accelerators had in the early days, with user space drivers, binary blobs, proprietary APIs and every other possible way for vendors to protect their IP. The session will outline a proposal for a collaborative ecosystem effort to create a common framework to manage multiple NN accelerators while avoiding the need to modify deep learning frameworks with multiple forks.
Faster deep learning solutions from training to inference - Michele Tameni - ... Codemotion
Intel Deep Learning SDK enables the use of optimized open source deep-learning frameworks, including Caffe and TensorFlow, through a step-by-step wizard or iPython interactive notebooks. It includes easy and fast installation of all required libraries and advanced tools for easy data pre-processing and model training, optimization and deployment, providing an end-to-end solution to the problem. In addition, it supports scale-out on multiple computers for training, as well as compression methods for deploying the models on various platforms, addressing memory and speed constraints.
Qualcomm Webinar: Solving Unsolvable Combinatorial Problems with AI Qualcomm Research
How do you find the best solution when faced with many choices? Combinatorial optimization is a field of mathematics that seeks optimal solutions for complex problems involving multiple variables. Numerous business verticals can benefit from combinatorial optimization, whether transport, supply chain, or the mobile industry.
More recently, we’ve seen gains from AI for combinatorial optimization, leading to scalability of the method, as well as significant reductions in cost. This method replaces the manual tuning of traditional heuristic approaches with an AI agent that provides a fast metric estimation.
In this presentation you will find out:
- Why AI is crucial in combinatorial optimization
- How it can be applied to two use cases: improving chip design and hardware-specific compilers
- The state-of-the-art results achieved by Qualcomm AI Research
- There is a rich roadmap of 5G technologies coming in the second half of the 5G decade with the 5G Advanced evolution
- 6G will be the future innovation platform for 2030 and beyond building on the 5G Advanced foundation
- 6G will be more than just a new radio design, expanding the role of AI, sensing and others in the connected intelligent edge
- Qualcomm is leading cutting-edge wireless research across six key technology vectors on the path to 6G
3D perception is crucial for understanding the real world. It offers many benefits and new capabilities over 2D across diverse applications, from XR and autonomous driving to IoT, camera, and mobile. 3D perception with machine learning is creating the new state of the art (SOTA) in areas such as depth estimation, object detection, and neural scene representation. Making these SOTA neural networks feasible for real-world deployment on mobile devices constrained by power, thermal, and performance has been a challenge. Qualcomm AI Research has developed not only novel AI techniques for 3D perception but also full-stack AI optimizations to enable real-world deployments and energy-efficient solutions. This presentation explores the latest research that is enabling efficient 3D perception while maintaining neural network model accuracy. You’ll learn about:
- The advantages of 3D perception over 2D and the need for 3D perception across applications
- Advancements in 3D perception research by Qualcomm AI Research
- Our future 3D perception research directions
5G is going mainstream across the globe, and this is an exciting time to harness the low latency and high capacity of 5G to enable the metaverse. A distributed-compute architecture across device and cloud can enable rich extended reality (XR) user experiences. Virtual reality (VR) and mixed reality (MR) are ready for deployment in private networks, while augmented reality (AR) for wide area networks can be enabled in the near term with Wi-Fi powered AR glasses paired with a 5G-enabled phone. Device APIs enabling application adaptation are critical for a good user experience. 5G standards are evolving to support the deployment of AR glasses at a large scale and setting the stage for the 6G era with the merging of the physical, digital, and virtual worlds. Techniques like perception-enhanced wireless offer significant potential to improve user experience. Qualcomm Technologies is enabling the XR industry with platforms, developer SDKs, and reference designs.
Check out this webinar to learn:
• How 5G and distributed-compute architectures enable the metaverse
• The latest results from our boundless XR 5G/6G testbed, including device APIs and perception-enhanced wireless
• 5G standards evolution for enhancing XR applications and the road to 6G
• How Qualcomm Technologies is enabling the industry with platforms, SDKs, and reference designs
This document summarizes a presentation given by Chirag Patel and Tijmen Blankevoort of Qualcomm AI Research on model efficiency techniques for edge AI. They discuss why model efficiency is important for on-device AI due to constraints like power and thermal limits. They give an overview of techniques like quantization, conditional compute, neural architecture search, and compilation that can shrink AI models and run them efficiently on hardware. Specifically, they find that integer quantization through techniques like post-training quantization and quantization-aware training can achieve accuracy similar to floating-point models while providing much better performance per watt. Overall, the presentation advocates integer quantization as the best approach for efficient AI inference on edge devices.
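The integer quantization mentioned above can be sketched in a few lines. This is a minimal illustration of symmetric, per-tensor post-training quantization to 8-bit integers, not the presenters' method; the function names `quantize_int8` and `dequantize` and the sample weights are hypothetical.

```python
def quantize_int8(values):
    """Map floats to int8 codes using one symmetric per-tensor scale."""
    max_abs = max(abs(v) for v in values)
    # Scale so that the largest magnitude lands on +/-127.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float values from the integer codes."""
    return [c * scale for c in codes]


# Hypothetical weights, used only for illustration.
weights = [0.5, -1.27, 0.0, 1.0]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
```

Each restored value differs from the original by at most half a quantization step (`scale / 2`), which is why accuracy can stay close to the floating-point model while storage and arithmetic move to 8-bit integers. Quantization-aware training goes further by simulating this rounding during training.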
Bringing AI research to wireless communication and sensing Qualcomm Research
AI for wireless is already here, with applications in areas such as mobility management, sensing and localization, smart signaling and interference management. Recently, Qualcomm Technologies has prototyped the AI-enabled air interface and launched the Qualcomm 5G AI Suite. These developments are possible thanks to expertise in both wireless and machine learning from over a decade of foundational research in these complementary fields.
Our approach brings together the modeling flexibility and computational efficiency of machine learning and the out-of-domain generalization and interpretability of wireless domain expertise.
In this webinar, Qualcomm AI Research presents an overview of state-of-the-art research at the intersection of the two fields and offers a glimpse into the future of the wireless industry.
Qualcomm AI Research is an initiative of Qualcomm Technologies, Inc.
Speakers:
Arash Behboodi, Machine Learning Research Scientist (Senior Staff Engineer/Manager), Qualcomm AI Research
Daniel Dijkman, Machine Learning Research Scientist (Principal Engineer), Qualcomm AI Research
How will sidelink bring a new level of 5G versatility.pdf Qualcomm Research
Today, the 5G system mainly operates on a network-to-device communication model, exemplified by enhanced mobile broadband use cases where all data transmissions are between the network (i.e., base station) and devices (e.g., smartphone). However, to fully deliver on the original 5G vision of supporting diverse devices, services, and deployment scenarios, we need to expand the 5G topology further to reach new levels of performance and efficiency.
That is why sidelink communication was introduced in 3GPP standards, designed to facilitate direct communication between devices, independent of connectivity via the cellular infrastructure. Beyond automotive communication, it also benefits many other 5G use cases such as IoT, mobile broadband, and public safety.
5G is designed to serve an unprecedented range of capabilities with a single global standard. With enhanced mobile broadband (eMBB), massive IoT (mIoT), and mission-critical IoT, the three pillars of 5G represent extremes in performance and associated complexity. For IoT services, NB-IoT and eMTC devices prioritize low power consumption and the lowest complexity for wide-area deployments (LPWA), while enhanced ultra-reliable, low-latency communication (eURLLC), along with time-sensitive networking (TSN), delivers the most stringent use case requirements. But there exists an opportunity to more efficiently address a broad range of mid-tier applications with capabilities ranging between these extremes.
In 5G NR Release 17, 3GPP introduced a new tier of reduced capability (RedCap) devices, also known as NR-Light. It is a new device platform that bridges the capability and complexity gap between the extremes in 5G today with an optimized design for mid-tier use cases. With the recent standards completion, NR-Light is set to efficiently expand the 5G universe to connect new frontiers.
Download this presentation to learn:
• What NR-Light is and why it can herald the next wave of 5G expansion
• How NR-Light is accelerating the growth of the connected intelligent edge
• Why NR-Light is a suitable 5G migration path for mid-tier LTE devices
Realizing mission-critical industrial automation with 5G (Qualcomm Research)
Manufacturers seeking better operational efficiencies, with reduced downtime and higher yield, are at the leading edge of the Industry 4.0 transformation. With mobile system components and reliable wireless connectivity between them, flexible manufacturing systems can be reconfigured quickly for new tasks, to troubleshoot issues, or in response to shifts in supply and demand.
There is a long history of R&D collaboration between Bosch Rexroth and Qualcomm Technologies for the effective application of these 5G capabilities to industrial automation use cases. At the Robert Bosch Elektronik GmbH factory in Salzgitter, Germany, this collaboration has reached new heights.
Download this deck to learn how:
• Qualcomm Technologies and Bosch Rexroth are collaborating to accelerate the Industry 4.0 transformation
• 5G technologies deliver key capabilities for mission-critical industrial automation
• Distributed control solutions can work effectively across 5G TSN networks
• A single 5G technology platform solves connectivity and positioning needs for flexible manufacturing
3GPP Release 17: Completing the first phase of 5G evolution (Qualcomm Research)
The document discusses 3GPP Release 17, which brings new system capabilities and expands 5G to new devices, applications, and deployments. Some key points:
- Release 17 completes the first phase of 5G evolution. It expands 5G to new reduced capability devices, applications in new industries, and deployment models like non-terrestrial networks.
- Release 17 enhances technologies like massive MIMO, mmWave expansion, device power savings, coverage, and ultra-reliable low latency communications. It also introduces integrated access and backhaul and simple repeaters to expand 5G mmWave coverage.
- The release further scales 5G NR to support a wide range of device classes from high-end smartphones to
Setting off the 5G Advanced evolution with 3GPP Release 18 (Qualcomm Research)
In December 2021, 3GPP reached a consensus on the scope of 5G NR Release 18. This is a significant milestone marking the beginning of 5G Advanced, the second wave of wireless innovations that will fulfill the 5G vision. Release 18 will build on the solid foundation set by Releases 15, 16, and 17, and it sets the longer-term evolution direction of 5G and beyond. This release will encompass a wide range of new and enhancement projects, ranging from improved MIMO and application of AI/ML-enabled air interface to extended reality optimizations and broader IoT support.
Cellular networks have facilitated positioning in addition to voice or data communications from the beginning, since 2G, and we’ve since grown to rely on positioning technology to make our lives safer, simpler, more productive, and even fun. Cellular positioning complements other technologies to operate indoors and outdoors, including dense urban environments where tall buildings interfere with satellite positioning. It works whether we’re standing still, walking, or in a moving vehicle. With 5G, cellular positioning breaks new ground to bring robust precise positioning indoors and outdoors, to meet even the most demanding Industry 4.0 needs.
As we look to the future, the Connected Intelligent Edge will bring a new dimension of positional insight to a broad range of devices, improving wireless use cases still under development. We’re already charting the course to 5G Advanced and beyond by working on the evolution of cellular positioning technology to include RF sensing for situational awareness.
Download the deck to learn more.
The need for intelligent, personalized experiences powered by AI is ever-growing. Our devices are producing more and more data that could help improve our AI experiences. How do we learn and efficiently process all this data from edge devices while maintaining privacy? On-device learning rather than cloud training can address these challenges. In this presentation, we’ll discuss:
- Why on-device learning is crucial for providing intelligent, personalized experiences without sacrificing privacy
- Our latest research in on-device learning, including few-shot learning, continuous learning, and federated learning
- How we are solving system and feasibility challenges to move from research to commercialization
This presentation outlines the synergistic nature of 5G and AI -- two disruptive areas of innovations that can change the world. It illustrates the benefits of adopting AI for the advancements of 5G, as well as showcases the latest progress made by Qualcomm Technologies, Inc.
AI research is enabling more efficient video and voice codecs through techniques like generative models and deep learning. Qualcomm's latest research includes a neural video codec that achieves state-of-the-art compression rates compared to other learned video compression solutions. Their work on B-frame coding also provides improved rate-distortion results by extending neural P-frame codecs to allow for B-frame coding and interpolation. Future research aims to develop more efficient on-device deployment methods and semantically aware compression focused on regions of interest.
Role of localization and environment perception in autonomous driving (Qualcomm Research)
Dheeraj Ahuja, Sr. Director at Qualcomm Technologies, discusses how localization and perception technologies are critical for enhanced autonomous driving. As autonomous levels increase from active safety to full self-driving, requirements become more complex. Key technologies discussed include radar, camera, lidar, HD maps, and Qualcomm's VEPP precise positioning. Qualcomm's approach focuses on sensor fusion from cameras, radars, lidars and 5G to provide robust perception for autonomous vehicles.
This document discusses Qualcomm's work pioneering 5G broadcast technology. It summarizes Qualcomm's vision of establishing a more efficient way to deliver mass media over cellular networks, their invention of key cellular broadcast technologies for 3G, 4G, and 5G, and their leadership in standardizing cellular broadcast through driving new system designs and collaborating on field trials.
Artificial Intelligence (AI), specifically deep learning, is revolutionizing industries, products, and core capabilities by delivering dramatically enhanced experiences. However, the deep neural networks of today use too much memory, compute, and energy. To make AI truly ubiquitous, it needs to run on the end device within tight power and thermal budgets. Advancements in multiple areas are necessary to improve AI model efficiency, including quantization, compression, compilation, and neural architecture search (NAS). In this presentation, we’ll discuss:
- Qualcomm AI Research’s latest model efficiency research
- Our new NAS research to optimize neural networks more easily for on-device efficiency
- How the AI community can take advantage of this research through our open-source projects, such as the AI Model Efficiency Toolkit (AIMET) and AIMET Model Zoo
How to build high performance 5G networks with vRAN and O-RAN (Qualcomm Research)
5G networks are poised to deliver an unprecedented amount of data from a richer set of use cases than we have ever seen. This makes efficient networking in terms of scalability, cost, and power critical for the sustainable growth of 5G. Cloud technologies such as virtualization, containerization and orchestration are now powering a surge of innovation in virtualized radio access network (vRAN) infrastructure with modular hardware and software components, and standardized interfaces. While commercial off-the-shelf (COTS) hardware platforms provide the compute capacity for running vRAN software, hardware accelerators will also play a major role in offloading real-time and complex signal processing functions. Together, COTS platforms and hardware accelerators provide the foundation for building the intelligent 5G network and facilitate innovative new use cases with the intelligent wireless edge.
This presentation takes a look at the technology roadmap for 5G NR millimeter wave (mmWave), including features such as integrated access and backhaul (IAB) and enhancements in beam management, mobility, coverage, and more. For more information, please visit www.qualcomm.com/mmwave
INDIAN AIR FORCE FIGHTER PLANES LIST.pdf (jackson110191)
These fighter aircraft have uses outside of traditional combat situations. They are essential in defending India's territorial integrity, averting dangers, and delivering aid to those in need during natural calamities. Additionally, the IAF improves its interoperability and fortifies international military alliances by working together and conducting joint exercises with other air forces.
AI_dev Europe 2024 - From OpenAI to Opensource AI (Raphaël Semeteys)
Navigating Between Commercial Ownership and Collaborative Openness
This presentation explores the evolution of generative AI, highlighting the trajectories of various models such as GPT-4, and examining the dynamics between commercial interests and the ethics of open collaboration. We offer an in-depth analysis of the levels of openness of different language models, assessing various components and aspects, and exploring how the (de)centralization of computing power and technology could shape the future of AI research and development. Additionally, we explore concrete examples like LLaMA and its descendants, as well as other open and collaborative projects, which illustrate the diversity and creativity in the field, while navigating the complex waters of intellectual property and licensing.
MYIR Product Brochure - A Global Provider of Embedded SOMs & Solutions (Linda Zhang)
This brochure introduces MYIR Electronics and its products and services.
MYIR Electronics Limited (MYIR for short), established in 2011, is a global provider of embedded System-On-Modules (SOMs) and comprehensive solutions based on various architectures such as ARM, FPGA, RISC-V, and AI. We cater to customers' needs for large-scale production, offering customized design, industry-specific application solutions, and one-stop OEM services.
MYIR, recognized as a national high-tech enterprise, is also listed among the "Specialized and Special new" Enterprises in Shenzhen, China. Our core belief is that "Our success stems from our customers' success", and we embrace the philosophy of "Make Your Idea Real, then My Idea Realizing!"
Video traffic on the Internet is constantly growing; networked multimedia applications consume a predominant share of the available Internet bandwidth. A major technical breakthrough and enabler in multimedia systems research and of industrial networked multimedia services certainly was the HTTP Adaptive Streaming (HAS) technique. This resulted in the standardization of MPEG Dynamic Adaptive Streaming over HTTP (MPEG-DASH) which, together with HTTP Live Streaming (HLS), is widely used for multimedia delivery in today’s networks. Existing challenges in multimedia systems research deal with the trade-off between (i) the ever-increasing content complexity, (ii) various requirements with respect to time (most importantly, latency), and (iii) quality of experience (QoE). Optimizing towards one aspect usually negatively impacts at least one of the other two aspects if not both. This situation sets the stage for our research work in the ATHENA Christian Doppler (CD) Laboratory (Adaptive Streaming over HTTP and Emerging Networked Multimedia Services; https://athena.itec.aau.at/), jointly funded by public sources and industry. In this talk, we will present selected novel approaches and research results of the first year of the ATHENA CD Lab’s operation. We will highlight HAS-related research on (i) multimedia content provisioning (machine learning for video encoding); (ii) multimedia content delivery (support of edge processing and virtualized network functions for video networking); (iii) multimedia content consumption and end-to-end aspects (player-triggered segment retransmissions to improve video playout quality); and (iv) novel QoE investigations (adaptive point cloud streaming). We will also put the work into the context of international multimedia systems research.
Quantum Communications Q&A with Gemini LLM. The answers are based on Shannon's noisy-channel coding theorem and discuss how the classical theory applies to the quantum world.
Sustainability requires ingenuity and stewardship. Did you know Pigging Solutions pigging systems help you achieve your sustainable manufacturing goals AND provide rapid return on investment?
How? Our systems recover over 99% of product in transfer piping. Recovering trapped product from transfer lines that would otherwise become flush-waste, means you can increase batch yields and eliminate flush waste. From raw materials to finished product, if you can pump it, we can pig it.
The Rise of Supernetwork Data Intensive Computing (Larry Smarr)
Invited Remote Lecture to SC21
The International Conference for High Performance Computing, Networking, Storage, and Analysis
St. Louis, Missouri
November 18, 2021
Blockchain and Cyber Defense Strategies in new genre times (anupriti)
Explore robust defense strategies at the intersection of blockchain technology and cybersecurity. This presentation delves into proactive measures and innovative approaches to safeguarding blockchain networks against evolving cyber threats. Discover how secure blockchain implementations can enhance resilience, protect data integrity, and ensure trust in digital transactions. Gain insights into cutting-edge security protocols and best practices essential for mitigating risks in the blockchain ecosystem.
Are you interested in learning about creating an attractive website? Here it is! Take part in the challenge that will broaden your knowledge about creating cool websites! Don't miss this opportunity, only in "Redesign Challenge"!
Implementations of Fused Deposition Modeling in real world (Emerging Tech)
The presentation showcases the diverse real-world applications of Fused Deposition Modeling (FDM) across multiple industries:
1. **Manufacturing**: FDM is utilized in manufacturing for rapid prototyping, creating custom tools and fixtures, and producing functional end-use parts. Companies leverage its cost-effectiveness and flexibility to streamline production processes.
2. **Medical**: In the medical field, FDM is used to create patient-specific anatomical models, surgical guides, and prosthetics. Its ability to produce precise and biocompatible parts supports advancements in personalized healthcare solutions.
3. **Education**: FDM plays a crucial role in education by enabling students to learn about design and engineering through hands-on 3D printing projects. It promotes innovation and practical skill development in STEM disciplines.
4. **Science**: Researchers use FDM to prototype equipment for scientific experiments, build custom laboratory tools, and create models for visualization and testing purposes. It facilitates rapid iteration and customization in scientific endeavors.
5. **Automotive**: Automotive manufacturers employ FDM for prototyping vehicle components, tooling for assembly lines, and customized parts. It speeds up the design validation process and enhances efficiency in automotive engineering.
6. **Consumer Electronics**: FDM is utilized in consumer electronics for designing and prototyping product enclosures, casings, and internal components. It enables rapid iteration and customization to meet evolving consumer demands.
7. **Robotics**: Robotics engineers leverage FDM to prototype robot parts, create lightweight and durable components, and customize robot designs for specific applications. It supports innovation and optimization in robotic systems.
8. **Aerospace**: In aerospace, FDM is used to manufacture lightweight parts, complex geometries, and prototypes of aircraft components. It contributes to cost reduction, faster production cycles, and weight savings in aerospace engineering.
9. **Architecture**: Architects utilize FDM for creating detailed architectural models, prototypes of building components, and intricate designs. It aids in visualizing concepts, testing structural integrity, and communicating design ideas effectively.
Each industry example demonstrates how FDM enhances innovation, accelerates product development, and addresses specific challenges through advanced manufacturing capabilities.
Transcript: Details of description part II: Describing images in practice - T... (BookNet Canada)
This presentation explores the practical application of image description techniques. Familiar guidelines will be demonstrated in practice, and descriptions will be developed “live”! If you have learned a lot about the theory of image description techniques but want to feel more confident putting them into practice, this is the presentation for you. There will be useful, actionable information for everyone, whether you are working with authors, colleagues, alone, or leveraging AI as a collaborator.
Link to presentation recording and slides: https://bnctechforum.ca/sessions/details-of-description-part-ii-describing-images-in-practice/
Presented by BookNet Canada on June 25, 2024, with support from the Department of Canadian Heritage.
Fluttercon 2024: Showing that you care about security - OpenSSF Scorecards fo... (Chris Swan)
Have you noticed the OpenSSF Scorecard badges on the official Dart and Flutter repos? It's Google's way of showing that they care about security. Practices such as pinning dependencies, branch protection, required reviews, continuous integration tests etc. are measured to provide a score and accompanying badge.
You can do the same for your projects, and this presentation will show you how, with an emphasis on the unique challenges that come up when working with Dart and Flutter.
The session will provide a walkthrough of the steps involved in securing a first repository, and then what it takes to repeat that process across an organization with multiple repos. It will also look at the ongoing maintenance involved once scorecards have been implemented, and how aspects of that maintenance can be better automated to minimize toil.
Hire a private investigator to get cell phone records (HackersList)
Learn what private investigators can legally do to obtain cell phone records and track phones, plus ethical considerations and alternatives for addressing privacy concerns.
How RPA Help in the Transportation and Logistics Industry.pptx (SynapseIndia)
Revolutionize your transportation processes with our cutting-edge RPA software. Automate repetitive tasks, reduce costs, and enhance efficiency in the logistics sector with our advanced solutions.
Interaction Latency: Square's User-Centric Mobile Performance Metric (ScyllaDB)
Mobile performance metrics often take inspiration from the backend world and measure resource usage (CPU usage, memory usage, etc) and workload durations (how long a piece of code takes to run).
However, mobile apps are used by humans and the app performance directly impacts their experience, so we should primarily track user-centric mobile performance metrics. Following the lead of tech giants, the mobile industry at large is now adopting the tracking of app launch time and smoothness (jank during motion).
At Square, our customers spend most of their time in the app long after it's launched, and they don't scroll much, so app launch time and smoothness aren't critical metrics. What should we track instead?
This talk will introduce you to Interaction Latency, a user-centric mobile performance metric inspired by the Web Vitals metric "Interaction to Next Paint" (web.dev/inp). We'll go over why apps need to track this, how to properly implement its tracking (it's tricky!), how to aggregate this metric and what thresholds you should target.
1. Snapdragon and Qualcomm branded products are products of Qualcomm Technologies, Inc. and/or its subsidiaries.
November 8, 2023
@QCOMResearch
Generative AI at the edge
Joseph Soriaga
Senior Director, Technology
Qualcomm Technologies, Inc.
2. Today’s agenda
• Why on-device generative AI is key
• Full-stack AI optimizations for diffusion models: Stable Diffusion
• Full-stack AI optimizations for large language models: Llama 2
• Hybrid AI technologies and architectures
• Q&A
3. Leading machine learning research for on-device AI across the entire spectrum of topics
AI research: platform research, applied research, fundamental research
• Generative AI
• G-CNN
• Self-supervised learning
• Reinforcement learning
• Causality and system-2
• Model quantization, compression, and NAS
• HW-SW co-design
• Distillation of generative models
• Power management
• AI Model Efficiency Toolkit (AIMET)
• Deep learning for 3D/geometry
• Audio and video compression
• AI for wireless and RF sensing
• Energy-efficient perception
• AI for chip design
• On-device learning
• Bayesian distributed learning
• Graph and kernel optimization
• Federated learning
• Deep learning for graphics
• Video recognition and prediction
• Virtual AI assistant (LLM)
• Diffusion-based image generation (LVM)
• Voice UI
AIMET is a product of Qualcomm Innovation Center, Inc. Qualcomm AI Research is an initiative of Qualcomm Technologies, Inc.
LLM: Large language model; LVM: Language vision model
4. Full-stack AI research & optimization
Model, hardware, and software innovation across each layer to accelerate AI applications
Early R&D and technology inventions essential to leading the ecosystem forward
Transfer tech to commercial teams and influence future research with learnings from deployment
• Vision: Identify a problem or need; establish requirements
• Ecosystem collaboration: Collaborate and drive the ecosystem toward rapid commercialization at scale
• Model quantization & optimization: Develop tech & tools to quantize weights and modify architecture to run efficiently on hardware
• Software compilation: Develop tech & tools to improve graph-level and kernel-level software compilation performance
• Proof of concept: Target teams integrate models into final application for stable and intuitive demonstration
• Invention: Invent new methods that set state-of-the-art
~2-3 years
5. World’s first on-device demo of Stable Diffusion running on an Android phone (at MWC 2023)
• 1B+ parameter generative AI model runs efficiently and interactively
• Full-stack AI optimization to achieve sub-15 second latency for 20 inference steps
• Enhanced privacy, security, reliability, and cost with on-device processing
• Fast development enabled by Qualcomm AI Research and Qualcomm® AI Stack
6. What is generative AI?
AI models that create new and original content like text, images, video, audio, or other data
Generative AI, foundational models, and large language models are sometimes used interchangeably
Text generation (ChatGPT, Bard, Llama, etc.)
• Input prompt: “Write a lullaby about cats and dogs to help a child fall asleep, include a golden shepherd”
• A great lullaby is created in seconds
• Real-life application of this platform: communications, journalism, publishing, creative writing, writing assistance
Image generation (Stable Diffusion, MidJourney, etc.)
• Input prompt: “Super cute fluffy cat warrior in armor”
• Real-life application of this platform: advertisements, published illustrations, corporate visuals, novel image generation
Code generation (Codex, etc.)
• Input prompt: “Create code for a pool cleaning website with tab for cleaning, repairs, and testimonials”
• A beautiful website is created in seconds
• Real-life application of this platform: web design, software development, coding, technology
7. The generative AI ecosystem stack is allowing many apps to proliferate
• Assistant app (using foundation models): Vertical applications for consumers and knowledge workers to assist with various tasks such as writing content, coding, designing, etc.
• Assistant app (using own model): Vertical application implementation from model (e.g., LLM) development and training to user app
• Tooling/orchestration: Developer tools and platforms for generative AI
• Foundation model, generic models: General-purpose LLM and others; functionality exposed through APIs
• Foundation model, domain-specific models: Purpose-specific model development and/or training (enterprise, pro photo/video, simulated data)
• Machine learning apps: Labeling, training, model hub, optimization, etc.
• Infrastructure, cloud: Hyperscaler datacenters, enterprise servers
8. Generative AI will impact use cases across device categories
• IoT: Gen AI can help improve customer and employee experience in retail, such as providing recommendations for inventory and store layout. Example prompt: “Suggest inventory and store layout changes to increase user satisfaction in the sports section”
• PC: Gen AI is transforming productivity by composing emails, creating presentations, and writing code. Example prompt: “Make me a status presentation for my boss based on inputs from my team”
• Phone: Gen AI can become a true digital assistant. Example prompt: “Make me reservations for a weekend getaway at the place Bob recommended”
• XR: Gen AI can help create immersive 3D virtual worlds based on simple prompts
• Automotive: Gen AI can be used for ADAS/AD to help improve drive policy by predicting the trajectory and behavior of various agents
9. Generative AI with diffusion models for robotics path planning
• Stable Diffusion: Denoising an image with a diffusion model
• Generating robot trajectories: Instead of diffusing an image, we diffuse a robot trajectory
10. On-device AI can support a variety of Gen AI models
A broad number of Gen AI capabilities can run on device using models that range from 1 to 10 billion parameters
We can run models with over 1 billion parameters on device today and anticipate this growing to over 10 billion parameters in the coming months
[Chart: model size (billions of parameters, log scale from 0.1 to 1000) for text-to-image, dialog and NLP, programming, mathematical reasoning, combinatorial optimization, image understanding, video understanding, and collaborative robotics, with 2023 and 2024 markers, assuming INT4 parameters]
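As a rough illustration of why the chart assumes INT4 parameters, the sketch below estimates weight storage for a few model sizes. The helper name `model_size_gb` is hypothetical, and the estimate counts weights only (no activations, KV cache, or quantization metadata), so the numbers are back-of-the-envelope figures rather than measured footprints.

```python
def model_size_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate weight storage in GB (10**9 bytes), weights only."""
    return num_params * bits_per_param / 8 / 1e9


# Compare FP16 and INT4 storage for the 1B to 10B range named on the slide.
for billions in (1, 7, 10):
    fp16 = model_size_gb(billions * 1e9, 16)
    int4 = model_size_gb(billions * 1e9, 4)
    print(f"{billions:>2}B params: FP16 ~{fp16:.1f} GB, INT4 ~{int4:.1f} GB")
```

Under these assumptions a 7B-parameter model needs about 14 GB of weights in FP16 and about 3.5 GB in INT4.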
11. Knowledge distillation
Training a smaller “student” model to mimic a larger “teacher” model
• Create a smaller model with fewer parameters
• Run faster inference on target deployment
• Maintain prediction quality close to the teacher
• Less training time
Match logits of the models to transfer teacher model representation and minimize distillation loss (KL divergence)
[Diagram: training data feeds both the teacher model and the student model; teacher logits (soft labels) and student logits are compared through the knowledge distillation loss, and the student output is compared with the ground truth through a cross-entropy loss]
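One common way to write the two losses named on this slide as a single objective is shown below. The weighting factor \alpha is an assumption for illustration; the slide does not state how the distillation loss and the cross-entropy loss are weighted.

```latex
\mathcal{L}_{\mathrm{KD}}
  = \alpha \, \mathrm{KL}\!\left(p_T \,\|\, p_S\right)
  + (1 - \alpha) \, \mathrm{CE}\!\left(y, p_S\right),
\qquad
\mathrm{KL}\!\left(p_T \,\|\, p_S\right) = \sum_i p_T(i) \log \frac{p_T(i)}{p_S(i)},
\qquad
\mathrm{CE}\!\left(y, p_S\right) = -\log p_S(y)
```

Here p_T and p_S are the class distributions obtained from the teacher and student logits, and y is the ground-truth label.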
13. Stable Diffusion architecture
• UNet is the biggest component model of Stable Diffusion
• Many steps, often 20 or more, are used for generating high-quality images
• Significant compute is required
What is diffusion?
• Forward diffusion (add noise)
• Reverse diffusion (subtract noise or denoise)
Stable Diffusion (1B+ parameters)
• CLIP text encoder (123M parameters)
• Scheduler
• UNet (860M parameters)
• VAE decoder (50M parameters)
[Diagram: image generation from an input prompt to an output image, one step at a time]
Input prompt: Vase in Greek style with intricate patterns and design
VAE: Variational Auto Encoder; CLIP: Contrastive Language-Image Pre-Training
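To make "add noise" and "subtract noise" concrete, here is a minimal sketch of the standard DDPM-style closed form for forward diffusion and its inversion when the noise is known. The linear beta schedule, the step count, and all function names are assumptions for illustration; this is not Stable Diffusion's actual scheduler, and a real model predicts the noise with a UNet instead of being handed it.

```python
import math
import random


def make_alpha_bar(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for an assumed linear beta schedule."""
    alpha_bar, running = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bar.append(running)
    return alpha_bar


def forward_diffuse(x0, t, alpha_bar, rng):
    """Forward diffusion: mix the clean sample with Gaussian noise at step t."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    a, s = math.sqrt(alpha_bar[t]), math.sqrt(1.0 - alpha_bar[t])
    return [a * x + s * e for x, e in zip(x0, eps)], eps


def predict_x0(xt, eps_hat, t, alpha_bar):
    """Reverse direction: subtract the (predicted) noise to estimate the clean sample."""
    a, s = math.sqrt(alpha_bar[t]), math.sqrt(1.0 - alpha_bar[t])
    return [(x - s * e) / a for x, e in zip(xt, eps_hat)]


rng = random.Random(0)
alpha_bar = make_alpha_bar(1000)
x0 = [rng.uniform(-1.0, 1.0) for _ in range(8)]  # stand-in for image latents
xt, eps = forward_diffuse(x0, 500, alpha_bar, rng)
x0_rec = predict_x0(xt, eps, 500, alpha_bar)  # exact only because eps is known here
```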
14. More efficient architecture design through pruning and knowledge distillation
Reducing UNet compute (FLOPs), model size, and peak memory usage
[Diagram: the original Stable Diffusion UNet is turned into an efficient UNet through pruning & knowledge distillation; the blocks shown are convolutional blocks and attention blocks]
15. Step distillation for the DDIM scheduler
Teach the student model to achieve in one step what the teacher achieves in multiple steps
• Teacher: 2 UNets
• Student: 1 UNet
• MSE loss
DDIM: Denoising Diffusion Implicit Models; MSE: Mean-squared error
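The idea of one student step matching two teacher steps can be sketched with a toy example. Everything below is an illustrative assumption rather than the presenters' training code: the deterministic DDIM update is the standard textbook form, the "UNets" are plain Python functions that return a noise estimate, and the noise levels are arbitrary.

```python
import math


def ddim_step(x, eps_hat, ab_t, ab_next):
    """One deterministic DDIM update from noise level ab_t to ab_next."""
    x0_hat = [(xi - math.sqrt(1.0 - ab_t) * e) / math.sqrt(ab_t)
              for xi, e in zip(x, eps_hat)]
    return [math.sqrt(ab_next) * x0 + math.sqrt(1.0 - ab_next) * e
            for x0, e in zip(x0_hat, eps_hat)]


def step_distillation_loss(teacher, student, x_t, ab_t, ab_mid, ab_end):
    """MSE between one student step and two consecutive teacher steps."""
    # Teacher: two noise-model evaluations (t -> mid -> end).
    x_mid = ddim_step(x_t, teacher(x_t, ab_t), ab_t, ab_mid)
    target = ddim_step(x_mid, teacher(x_mid, ab_mid), ab_mid, ab_end)
    # Student: a single evaluation that jumps straight from t to end.
    pred = ddim_step(x_t, student(x_t, ab_t), ab_t, ab_end)
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(pred)


CLEAN = [0.5, -0.2]  # hypothetical clean sample used by the toy "oracle" model


def oracle_eps(x, ab):
    """Toy noise model that knows the clean sample exactly."""
    return [(xi - math.sqrt(ab) * c) / math.sqrt(1.0 - ab) for xi, c in zip(x, CLEAN)]


def zero_eps(x, ab):
    """Toy untrained student that always predicts zero noise."""
    return [0.0 for _ in x]


x_t = [1.0, 0.3]
loss_matched = step_distillation_loss(oracle_eps, oracle_eps, x_t, 0.1, 0.4, 0.8)
loss_untrained = step_distillation_loss(oracle_eps, zero_eps, x_t, 0.1, 0.4, 0.8)
```

In real training the student's weights would be updated by gradient descent on this loss. The toy only shows that the loss vanishes when the student already reproduces the teacher's two-step result and is positive otherwise.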
16. Our full-stack AI optimization of Stable Diffusion significantly improves latency while maintaining accuracy
Fast Stable Diffusion:
• Efficient UNet: Reduces compute (FLOPs), model size, peak memory usage
• Guidance conditioning: Combines conditional and unconditional generation
• Step distillation: Reduces UNet forward passes to fewer than 20
• e-to-v: Reparameterization from epsilon to velocity space for robust distillation

Model               FID↓     CLIP↑    Inference latency
Baseline (SD-1.5)   17.14*   0.3037   5.05 seconds
Fast SD             20.08    0.3004   0.56 seconds

9x speedup vs baseline Stable Diffusion
*: These results are not directly comparable since baseline Stable Diffusion was trained with over 20x larger dataset than fast Stable Diffusion. SD: Stable Diffusion
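The "epsilon to velocity" reparameterization can be illustrated with the commonly used v-prediction definition, v = sqrt(alpha_bar) * eps - sqrt(1 - alpha_bar) * x0. The slide does not spell out the exact formulation used, so treat the definition, the function names, and the sample values below as assumptions.

```python
import math


def eps_to_v(x0, eps, alpha_bar):
    """Velocity target from a clean sample and noise (assumed v-prediction form)."""
    a, s = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * e - s * x for x, e in zip(x0, eps)]


def v_to_x0_and_eps(x_t, v, alpha_bar):
    """Recover both the clean sample and the noise from a noisy sample and v."""
    a, s = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    x0 = [a * x - s * vi for x, vi in zip(x_t, v)]
    eps = [s * x + a * vi for x, vi in zip(x_t, v)]
    return x0, eps


alpha_bar = 0.3                     # arbitrary noise level for the demo
x0 = [0.5, -0.2]                    # hypothetical clean values
eps = [1.2, -0.7]                   # hypothetical noise values
a, s = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
x_t = [a * x + s * e for x, e in zip(x0, eps)]

v = eps_to_v(x0, eps, alpha_bar)
x0_rec, eps_rec = v_to_x0_and_eps(x_t, v, alpha_bar)
```

Unlike an epsilon estimate, a velocity estimate gives a well-conditioned recovery of the clean sample even at very high noise levels, which is one commonly cited reason it helps when few sampling steps are used.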
17. Similar image quality between our fast implementation and baseline model
Fast Stable Diffusion and Stable Diffusion V1.5 outputs are shown for these prompts:
• Panoramic view of mountains of Vestrahorn and perfect reflection in shallow water, soon after sunrise, Stokksnes, South Iceland, Polar Regions, natural lighting
• A hyper realistic photo of a beautiful cabin inside of a forest and full of trees and plants, with large aurora borealis in the sky
• Underwater world, plants, flowers, shells, creatures, high detail, sharp focus, 4k
• High quality colored pencil sketch portrait of an anthro furry fursona blue fox, handsome eyes, sketch doodles surrounding it, photo of notebook sketch
• Japanese garden at wild life river and mountain range, highly detailed, digital illustration
18. World’s fastest AI text-to-image generative AI on a phone
• Takes less than 0.6 seconds for generating 512x512 images from text prompts
• Efficient UNet architecture, guidance conditioning, and step distillation
• Full-stack AI optimization to achieve this improvement
19. Full-stack AI optimization for LVM
• Runs completely on the device
• Significantly reduces runtime latency and power consumption
• Continuously improves the Qualcomm® AI Stack
• AI acceleration on the Qualcomm® Hexagon™ NPU of the Snapdragon® 8 Gen 3 Mobile Processor
• Qualcomm® AI Engine direct for improved performance and minimized memory spillage
• Knowledge distillation for pruning and removing of attention blocks, resulting in accurate model with improved performance and power efficiency
• Designing an efficient diffusion model through knowledge distillation for high accuracy
LVM: Language vision model
20. 20
20
LLMs are highly bandwidth limited rather than compute limited
Illustration of autoregressive language modeling
Single-token generation architecture of large languages models results in high memory bandwidth
[Illustration: for the prompt "Recite the first law of robotics", the model generates "A robot may not injure a human being" one token at a time, feeding each generated token back in as input]
Huge bandwidth
Each parameter of the
model must be read to
generate each token
(e.g., read 7B parameters
for Llama 7B to generate
a single token)
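The bandwidth claim above can be turned into a rough upper bound on token rate. The following is a back-of-the-envelope sketch only: the 64 GB/s memory bandwidth is an illustrative assumption (it does not come from the slides), and the estimate ignores the KV cache, activations, and any caching of weights.

```python
def max_tokens_per_second(num_params: float, bytes_per_param: float,
                          bandwidth_bytes_per_s: float) -> float:
    """Upper bound on decode rate if every weight is read once per token."""
    bytes_per_token = num_params * bytes_per_param
    return bandwidth_bytes_per_s / bytes_per_token


PARAMS = 7e9        # Llama 7B, as in the slide
BANDWIDTH = 64e9    # ASSUMPTION: 64 GB/s, a hypothetical figure for illustration

fp16_rate = max_tokens_per_second(PARAMS, 2.0, BANDWIDTH)  # 16-bit weights
int4_rate = max_tokens_per_second(PARAMS, 0.5, BANDWIDTH)  # 4-bit weights

print(f"FP16 bound: {fp16_rate:.1f} tokens/s")
print(f"INT4 bound: {int4_rate:.1f} tokens/s")
```

Under these assumptions the bound scales inversely with bytes per parameter, so storing weights in 4 bits instead of 16 raises the bandwidth-limited token rate by 4x.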
[Diagram labels: DRAM, DDR, NPU, TCM; LLM: embeddings, transformer layers 1 to N, LM head]
21. 21
Shrinking an LLM for increased performance while maintaining accuracy is challenging
LLM quantization motivations
• A 4x smaller model (i.e., FP16 -> INT4)
• Reduce memory bandwidth and storage
• Reduce latency
• Reduce power consumption
LLM quantization challenges
• Maintain accuracy of FP published models
• Post-training quantization (PTQ) may not be accurate enough for 4-bit
• The training pipeline (e.g., data or rewards) is not available for quantization-aware training (QAT)
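To make the FP16 -> INT4 trade-off concrete, here is a minimal sketch of plain round-to-nearest post-training quantization on toy weights. It is an illustration of the generic technique, not the method used in this work: the symmetric per-tensor scale, the toy weight distribution, and the function names are all assumptions, and the storage ratio ignores the small overhead of storing the scale.

```python
import random


def quantize_int4(weights):
    """Symmetric per-tensor round-to-nearest quantization to the INT4 range [-8, 7]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 7 if max_abs > 0 else 1.0
    codes = [max(-8, min(7, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate floating-point weights."""
    return [c * scale for c in codes]


random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(1024)]  # toy weights, not a real model

codes, scale = quantize_int4(weights)
restored = dequantize(codes, scale)

max_err = max(abs(w - r) for w, r in zip(weights, restored))
size_ratio = (16 * len(weights)) / (4 * len(weights))  # FP16 bits vs. INT4 bits

print(f"size ratio: {size_ratio:.0f}x, max abs error: {max_err:.5f}, step: {scale:.5f}")
```

Each weight can be off by up to half a quantization step. With only 16 levels that step is coarse, which is one way to see why simple PTQ may not be accurate enough at 4 bits.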
22. 22
1: Perplexity is averaged over several test sets, including wikitext and c4 (subset)
Quantization-aware training with knowledge distillation
Reduces memory footprint while solving the quantization challenges of
maintaining model accuracy and lacking the original training pipeline
<1 point increase in perplexity¹
<1% decrease in accuracy
Construct a
training loop
that can run
two models
on the same
input data
Teacher
Llama-2-Chat 7B
[FP16]
Student
Llama-2-Chat 7B
[INT4]
Dataset
true labels
Teacher logits
Student logits
Loss1: KL loss
(Teacher soft logits,
student soft logits)
Loss2: Cross entropy
loss (True labels,
student hard logits)
KD loss function combines the KL divergence
loss and hard-label based CE loss
[Chart: class probabilities for hard logits (no temperature) vs. soft logits (temperature = 4)]
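The combined KD loss described above can be sketched for a single token position as follows. This is a minimal illustration, not the training code behind these results: the slide does not say how the two terms are weighted, so the `alpha` mixing weight is an assumption, as are the toy logits. Some formulations also scale the KL term by the temperature squared; that is omitted here.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax with temperature; a higher temperature gives a flatter distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(teacher_logits, student_logits, true_label, temperature=4.0, alpha=0.5):
    """alpha * KL(teacher soft || student soft) + (1 - alpha) * CE(true label, student hard)."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # Loss1: KL divergence between the temperature-softened distributions.
    kl = sum(t * math.log(t / s) for t, s in zip(p_teacher, p_student) if t > 0)
    # Loss2: cross entropy against the true label, using logits without temperature.
    ce = -math.log(softmax(student_logits)[true_label])
    return alpha * kl + (1 - alpha) * ce  # ASSUMPTION: simple convex combination


teacher = [4.0, 1.0, 0.5]  # toy FP16 teacher logits
student = [2.0, 1.5, 0.2]  # toy INT4 student logits
print(f"KD loss: {kd_loss(teacher, student, true_label=0):.4f}")
```

The KL term pulls the student toward the teacher's full output distribution, so it needs only the teacher's logits. The CE term keeps the student anchored to the true labels in the dataset.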
23. 23
[Illustration: speculative decoding for the prompt "Recite the first law of robotics". The Llama 2 draft model proposes tokens (e.g., "A robot may not injure those human being"), and the Llama 2 target model accepts or corrects them to give "A robot may not injure a human being"]
Speculative decoding
speeds up token rate by trading
off compute for bandwidth
A good draft model predicts
with a high acceptance rate
Draft model generates a few
speculative tokens at a time
Target model decides which
to accept in one pass
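The draft-then-verify loop described above can be sketched with toy models. This is a simplified greedy variant written for illustration, not the implementation demonstrated here: the "models" are lookup functions over fixed sentences, the target's checks are done in a Python loop where real hardware would use one batched forward pass, and all names are hypothetical.

```python
from typing import Callable, List, Tuple

NextToken = Callable[[List[str]], str]  # greedy next-token function (toy stand-in for an LLM)

TARGET = "A robot may not injure a human being".split()
DRAFT = "A robot may not injure those human being".split()  # draft gets one token wrong


def make_model(sentence: List[str]) -> NextToken:
    """Toy model: emits the sentence position by position, then <eos>."""
    def next_token(context: List[str]) -> str:
        i = len(context)
        return sentence[i] if i < len(sentence) else "<eos>"
    return next_token


def speculative_step(prefix: List[str], draft_next: NextToken,
                     target_next: NextToken, k: int = 4) -> List[str]:
    # 1) The draft model proposes k tokens sequentially.
    draft: List[str] = []
    for _ in range(k):
        draft.append(draft_next(prefix + draft))
    # 2) The target model checks each position (conceptually one parallel pass).
    accepted: List[str] = []
    for i in range(k):
        expected = target_next(prefix + draft[:i])
        if draft[i] != expected:
            accepted.append(expected)  # first mismatch: keep the target's token, drop the rest
            return accepted
        accepted.append(draft[i])
    accepted.append(target_next(prefix + draft))  # all k accepted: one extra target token
    return accepted


def generate(draft_next: NextToken, target_next: NextToken,
             k: int = 4, max_tokens: int = 32) -> Tuple[List[str], int]:
    out: List[str] = []
    target_passes = 0
    while "<eos>" not in out and len(out) < max_tokens:
        out += speculative_step(out, draft_next, target_next, k)
        target_passes += 1
    if "<eos>" in out:
        out = out[:out.index("<eos>")]
    return out, target_passes


tokens, passes = generate(make_model(DRAFT), make_model(TARGET))
print(" ".join(tokens), "| target passes:", passes)
```

In this toy run the output is identical to what the target model alone would produce, but the target is invoked 3 times rather than once per token (9 times including the end-of-sequence token). Each target pass does more compute, in exchange for fewer full reads of the target's weights.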
24. Training a significantly smaller draft LLM for speculative decoding while maintaining enough accuracy is challenging
Small draft model motivations
• 10x smaller draft model than target model
• Fast results
• Reduce memory bandwidth, storage, latency, and power consumption
Small draft model challenges
• The training pipeline (e.g., data or rewards) is not available
• Cover multiple families, e.g., 7B and 13B models
• Match the distribution of the target model for higher acceptance rate
25. 25
Speculative decoding provides speedup with no accuracy loss
Using our research techniques on Llama 2-7B Chat, we achieved
up to 20 tokens per second
26. 26
AI assistant
enables basic
chat and
chat-assisted
apps on device
Orchestration across
different tasks based
on user query
Powered by
Llama 2 Chat (7B)
Voice UI with Snapdragon Voice
Activation and
Whisper-Small (244M)
Orchestrator
Task classification
Travel planner
API
Llama 2 Chat
(7B param)
User interface
Voice/text/browser
miniLM
(∼33M param)
Snapdragon Voice
Activation &
Whisper-Small
(∼244M param)
LM: Language model
28. 28
World’s fastest
Llama 2-7B
on a phone
Up to 20 tokens per second
Demonstrating both chat and
application interaction on
device
World’s first demonstration of
speculative decoding running
on a phone
At
Snapdragon
Summit
2023
29. 29
QAT: Quantization-aware training; LLM: Large language model
AI acceleration on the Qualcomm®
Hexagon™ NPU of the Snapdragon® 8
Gen 3 Mobile Processor
Full-stack AI
optimization
for LLM
Runs completely
on the device
Significantly reduces
runtime latency and
power consumption
Continuously improves
the Qualcomm® AI Stack
Qualcomm® AI Engine Direct
for improved performance and
minimized memory spillage
QAT with knowledge distillation for an accurate
INT4 target LLM with improved performance
and power efficiency
Designing a good draft model for a given
target model through knowledge distillation
for high acceptance and no accuracy loss
30. 30
1: Reuters 2023
Cloud economics will not allow generative AI to scale
Cost per query¹
Gen AI applications
Coding assistant
Copy creation
Web search
Personal assistant
Image & video creation
Text summarization
Conversational chatbots
…
Billions of users
(e.g. web search)
[Chart: cost per query, traditional vs. generative AI: ~10x]
31. 31
We are a leader in the
realization of hybrid AI
Convergence of:
Wireless connectivity
Efficient computing
Distributed AI
Unlocking the data
that will fuel our digital
future and generative AI
To scale, the center of
gravity of AI processing is
moving to the edge
Central cloud Edge cloud On device
Hybrid AI
Cost
Energy
Reliability, latency,
& performance
Privacy & security
Personalization
32. 32
Device-centric
hybrid AI
On-device neural network
or rules-based arbiter will
decide where to run the model
More complex models will
use the cloud as needed
It will be seamless to the user
On-device neural network
or rules-based arbiter
Yes
Is cloud needed?
No
The device acts as
the anchor point
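A rules-based arbiter of the kind described above could look like the sketch below. It is purely illustrative: the `Request` fields, the token threshold, and the fallback to on-device processing when no network is available are all hypothetical choices, not details from the slides.

```python
from dataclasses import dataclass


@dataclass
class Request:
    prompt_tokens: int        # size of the user's prompt
    needs_large_model: bool   # e.g., flagged by an on-device task classifier
    network_available: bool


ON_DEVICE_MAX_PROMPT_TOKENS = 2048  # ASSUMPTION: hypothetical on-device limit


def is_cloud_needed(req: Request) -> bool:
    """Rules-based stand-in for the "Is cloud needed?" decision."""
    too_complex = req.needs_large_model or req.prompt_tokens > ON_DEVICE_MAX_PROMPT_TOKENS
    # ASSUMPTION: without connectivity, fall back to the on-device model.
    return too_complex and req.network_available


def route(req: Request) -> str:
    return "cloud" if is_cloud_needed(req) else "device"


print(route(Request(prompt_tokens=120, needs_large_model=False, network_available=True)))
print(route(Request(prompt_tokens=120, needs_large_model=True, network_available=True)))
```

Because the decision runs on the device, the default path stays local and the cloud is consulted only when a rule fires. A small neural network could replace `is_cloud_needed` without changing the routing code.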
33. 33
ASR: Automatic speech recognition; CV: Computer vision; TTS: Text to speech
Device-sensing hybrid AI
The device acts as the eyes and ears
Simple
model
ASR, CV, TTS
Speech
Image/video
LLM
Text
Text answer
TTS
Speech
Image/video
LLM
Improved
prompt
Text answer
TTS
Advanced
model
ASR, CV, TTS
• Sensor and
human-machine
interface processing
run on device
• ASR, CV, TTS
• LLM runs in the cloud
• For the advanced version,
an on-device orchestrator
uses on-device learning
and personal data to
provide improved
prompts to the LLM
34. 34
Joint-processing hybrid AI
Multi-token speculative decoding as an example
• LLMs are memory-bound
and produce a single token
per inference, reading in all
the weights
• The smaller draft model
runs on device, sequentially
• The larger target model
runs on the cloud, in
parallel and speculatively
• The good tokens
are accepted
• Results in net speedup
in tokens per unit time
and energy savings
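The net speedup depends on how many draft tokens survive verification. As a rough model, assume each draft token is accepted independently with the same probability and that the first rejection discards everything after it. Both are simplifying assumptions, not claims from the slides.

```python
def expected_accepted(k: int, accept_prob: float) -> float:
    """Expected number of accepted draft tokens out of k.

    Token i is kept only if tokens 1..i were all accepted, which happens
    with probability accept_prob ** i under the independence assumption.
    """
    return sum(accept_prob ** i for i in range(1, k + 1))


for p in (0.7, 0.8, 0.9):
    print(f"accept_prob={p}: {expected_accepted(4, p):.2f} of 4 draft tokens accepted")
```

With four draft tokens, an average of 2 to 3 accepted tokens corresponds to a per-token acceptance probability of roughly 0.75 to 0.9 in this model, which illustrates why a draft model that matches the target's distribution closely matters.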
Predict (draft model): four tokens sequentially computed on device
Verify (target model): four tokens speculatively computed in parallel in cloud
Accept: on average, 2 to 3 are correct and accepted
35. 35
On-device generative AI offers many benefits
Generative AI is happening now on the device
Our on-device AI leadership is enabling generative AI to scale
Hybrid AI is the future