This document provides a technical introduction to large language models (LLMs). It explains that LLMs are based on simple probabilities derived from their massive training corpora, containing trillions of examples. The document then discusses several key aspects of how LLMs work, including that they function as a form of "lossy text compression" by encoding patterns and relationships in their training data. It also outlines some of the key elements in the architecture and training of the most advanced LLMs, such as GPT-4, focusing on their huge scale, transformer architecture, and use of reinforcement learning from human feedback.
How Does Generative AI Actually Work? (a quick semi-technical introduction to Large Language Models)
1. How Does Generative AI Actually Work?
(a quick semi-technical introduction to Large Language Models)
William H. Press
Department of Computer Science
The University of Texas at Austin
May 12, 2023
2. What word comes after "Happy…"? Is it "aardvark"? Or "birthday"?
Simple probabilities (but trillions of them) are at the root of all large language models (LLMs).
3. Me: ChatGPT, I want to explain sequential word guessing to someone. If a sentence starts with "The boy" can you tell me what your next 10 sequential guesses of words would be, one word at a time?

Sure! Here are 10 sequential word guesses, one at a time, based on the sentence starting with "The boy":
1. went, 2. to, 3. the, 4. park, 5. to, 6. play, 7. with, 8. his, 9. friends, 10. and
Keep in mind that as a language model, my guesses are based on patterns and probabilities derived from the text corpus I was trained on. The actual guesses may vary depending on the specific context and the model version being used.

What if the sentence instead begins with "The secret behind GPT-4 is"?

Here are 10 sequential word guesses:
1. "its", 2. "ability", 3. "to", 4. "understand", 5. "and", 6. "generate", 7. "human-like", 8. "text", 9. "based", 10. "on"
Again, these are just guesses based on common patterns and do not represent an actual sentence.

The distant ancestor of all chatbots is sequential word-guessing in natural languages. Hence "generative".
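The word-guessing idea above can be sketched in a few lines. This is a toy bigram model (the corpus, function names, and all details are invented for illustration; a real LLM uses a neural network trained on trillions of tokens), which always emits the most probable next word, just as in the dialogue above:

```python
from collections import Counter, defaultdict

# Tiny invented training corpus; a real LLM sees trillions of tokens.
corpus = ("the boy went to the park to play with his friends and "
          "the boy went to the store and the girl went to the park").split()

# Count bigram frequencies: how often each word follows each other word.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def guess_next(word):
    """Return the most probable next word given the previous one."""
    candidates = follows[word]
    return candidates.most_common(1)[0][0] if candidates else None

def generate(start, n=8):
    """Guess one word at a time, feeding each guess back in."""
    out = [start]
    for _ in range(n):
        nxt = guess_next(out[-1])
        if nxt is None:
            break
        out.append(nxt)
    return " ".join(out)

print(generate("the"))  # e.g. "the boy went to ..."
```

With a corpus this small the model quickly loops, which is exactly the point: the quality of an LLM's guesses comes from the scale and redundancy of its training data, not from the guessing procedure itself.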
4. LLMs as "lossy text compression"
• Represent the training corpus more compactly by finding and encoding its structures and relationships
  • eliminate redundancy at all levels: syntactic, semantic, multiple sources, etc.
  • the result is encoded in ~10^9–10^12 matrix "weights"
• "Lossy" because information is irretrievably lost
  • prompts are answered by (sort of) decompressing into highly probable responses that could have been in the training data but, in general, weren't exactly there verbatim
• The decompressed data maintains accuracy when…
  • it is "common sense" or "conventional wisdom"
  • because then there is huge redundancy in the training data
• But it can be wildly inaccurate (like "digital artifacts" in a defective video) if the query is not well represented in the compressed training corpus
  • e.g., the most probable answer comes from one (wrong) document
  • or from variants of a widely circulated conspiracy theory
  • if decompressing from no germane data at all, it just makes things up ("hallucinates") to produce the most probable response
https://www.newyorker.com/tech/annals-of-technology/chatgpt-is-a-blurry-jpeg-of-the-web
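The failure modes above can be sketched with a toy model (all data and names invented for illustration): "compress" a corpus down to answer frequencies, then "decompress" by emitting the most probable answer. Redundant facts survive; a single wrong source wins outright; and with no germane data at all the model still answers confidently:

```python
from collections import Counter, defaultdict

# Toy "training data": (question, answer) pairs; the duplicates stand in
# for the redundancy of common knowledge in a real web-scale corpus.
training = [
    ("capital of France", "Paris"), ("capital of France", "Paris"),
    ("capital of France", "Paris"), ("capital of France", "Paris"),
    ("birthplace of Einstein", "Austria"),   # one (wrong) document
]

# "Lossy compression": keep only answer frequencies, not the documents.
weights = defaultdict(Counter)
for q, a in training:
    weights[q][a] += 1

def answer(query):
    """'Decompress': emit the most probable answer, right or wrong."""
    if weights[query]:
        return weights[query].most_common(1)[0][0]
    # No germane data at all: make something up ("hallucinate").
    all_answers = Counter(a for c in weights.values() for a in c)
    return all_answers.most_common(1)[0][0]

print(answer("capital of France"))       # redundant fact survives: "Paris"
print(answer("birthplace of Einstein"))  # one wrong source wins: "Austria"
print(answer("capital of Atlantis"))     # no data: confident nonsense
```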
5. GPT-4: OpenAI's latest released Large Language Model
• OpenAI isn't actually open! It can for many purposes be thought of as practically a Microsoft subsidiary.
  • Microsoft is said to provide the unique hardware infrastructure for OpenAI algorithm development.
• GPT = Generative Pre-trained Transformer
• Thought to have >5×10^11 trainable parameters.
  • GPT-3 had 1.75×10^11.
• Trained on more than several terabytes of language data.
• Training cost claimed to be $100 million
  • but this might include amortized R&D
  • once trained, the cost per query is millicents per token
• I will highlight three key elements of the "secret sauce":
  1. transformer architecture
  2. huge scale of the parameter space and training corpus
  3. "RLHF": Reinforcement Learning from Human Feedback
    • mostly not reported on
6. Key #1: Transformer architecture. It is a distant
descendant of document query concepts
Document retrieval:
• Input text projected onto matrix of
possible queries
• Matrix multiply to cross queries
with keys (e.g., keywords)
• Matrix multiply to map result from
keys to values (e.g., documents)
• The brilliant idea of Vaswani et al.
(2017, “Attention Is All You Need”) is to
map all of Q, K, V from the same input.
• This is “Self-Attention”
• And have all of Q, K, V learned.
• Multiple attention heads per layer, stacked in
many layers, allow attending to many different
levels of structure simultaneously
• This is “Multi-head” attention
https://dugas.ch/artificial_curiosity/GPT_architecture.html
[Diagram: input-processing stack (encoder) → output-processing stack (decoder); ~10^3 layers?]
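The query/key/value description above can be sketched as a minimal single-head self-attention computation in NumPy. Dimensions and random weights are illustrative only; in a real model the projection matrices are learned.

```python
import numpy as np

# Minimal single-head self-attention sketch. Q, K, V are all
# projections of the SAME input -- that is the "self" in self-attention.
rng = np.random.default_rng(0)
seq_len, d_model, d_head = 4, 8, 8

x = rng.standard_normal((seq_len, d_model))   # one token embedding per row
W_q = rng.standard_normal((d_model, d_head))  # learned in training; random here
W_k = rng.standard_normal((d_model, d_head))
W_v = rng.standard_normal((d_model, d_head))

Q, K, V = x @ W_q, x @ W_k, x @ W_v
scores = Q @ K.T / np.sqrt(d_head)            # cross queries with keys
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # softmax
out = weights @ V                             # map attention weights onto values

assert out.shape == (seq_len, d_head)
```

Each output row is a weighted mixture of the value vectors, with weights determined by how strongly that token’s query matches every token’s key.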
7. Key #2: Sheer scale: Only a few dare to call it
emergence, but the gain-of-function is striking
• Transformer parameters:
• a trillion parameters =(?) 1000 parallel instances of a billion
• a billion parameters in an instance =(?) 10^4 each for query space, key space,
and value space (multiplied two at a time) + “glue”
• could think of this as looking at every token list 10^7 ways in formulating the
next response
• “stateless”: treats the whole previous dialogue as a new token list, maximum
length 32,768 tokens
• Training corpus parameters:
• many terabytes?
• ~1000× the number of words a human hears or reads in a lifetime
• Many proprietary tricks:
• how to propagate gradients through the huge scale?
• how to maximize parallelism in training?
• special hardware?
• rumored that GPT-5 training is on hold because not enough GPUs are
obtainable in the world.
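The back-of-envelope figures above can be checked arithmetically. These are the slide’s own rough numbers, not published GPT-4 specifications.

```python
# Back-of-envelope arithmetic for the slide's rough figures
# (illustrative only; OpenAI has not published GPT-4's actual sizes).
d = 10**4                        # ~10^4-dim query/key/value spaces
per_matrix = d * d               # one d x d projection matrix: 10^8 parameters
per_instance = 10 * per_matrix   # Q, K, V matrices plus "glue" -> order 10^9
n_instances = 1000               # ~1000 parallel instances of a billion
total = n_instances * per_instance
print(f"{total:.0e}")            # on the order of 1e12: a trillion parameters
```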
8. Key #3: Reinforcement Learning with
Human Feedback (RLHF)
• Said to involve 40 contractors with (each?) hundreds of workers.
• 1. Humans generate an initial dataset of “typical” queries and “good” responses.
• 2. Humans rank a separate dataset of responses (some good, some bad).
• 3. The reward model is trained on this data.
• 4. The language model trains itself to satisfy the reward model. (How often?)
• 5. The reward model is retrained with (free!) input from hundreds of millions of
users. (How often?)
[Diagram: user prompt → many output guesses → “most-human” output, with human feedback closing the loop]
https://aisupremacy.substack.com/p/
what-is-reinforcement-learning-with
https://www.bradknox.net/
human-reward/tamer/
9. Me: ChatGPT, When GPT-3 or GPT-4 uses RLHF, is it a separate model, or
does RLHF train the GPT weights directly?
In typical usage, RLHF doesn't directly train the base GPT model. The GPT
model is first pretrained on a large corpus of text data, learning to predict
the next word in a sentence. This pretraining process is unsupervised and
uses a variant of the Transformer architecture to learn patterns and
structures in the data.
After this pretraining, the model can be fine-tuned using reinforcement
learning from human feedback. In this process, human evaluators rank
different responses generated by the model, and these rankings are used
to create a reward model. The model is then fine-tuned to optimize this
reward model, typically using a method like Proximal Policy Optimization.
In this way, RLHF is a separate step that fine-tunes the pretrained GPT
model, rather than directly training the initial GPT weights. However,
please note that this information is based on the methodology up to
September 2021, and OpenAI's approach may have evolved since then.
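The ranking-based reward-model training described above is commonly formulated as a pairwise ranking objective. The sketch below is a generic Bradley-Terry-style illustration, not OpenAI’s actual implementation.

```python
import numpy as np

# Pairwise ranking loss commonly used to train a reward model from
# human preference data: -log sigmoid(r_chosen - r_rejected).
# The loss is small when the human-preferred response already
# scores higher than the rejected one, and large otherwise.
def reward_model_loss(r_chosen, r_rejected):
    return -np.log(1.0 / (1.0 + np.exp(-(r_chosen - r_rejected))))

# Reward model already agrees with the human ranking -> small loss
print(reward_model_loss(2.0, -1.0))
# Reward model disagrees with the human ranking -> large loss
print(reward_model_loss(-1.0, 2.0))
```

Minimizing this loss over many ranked pairs pushes the reward model’s scalar scores into agreement with the human rankings; the language model is then fine-tuned (e.g., with PPO) against those scores.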
10. “There is no moat?” (Some insiders think that the technology
can be miniaturized and will leak via open source availability to
individuals and thence startups.)
• Meta/Facebook’s entire LLM was leaked online in March, 2023.
• and later released publicly
• A May, 2023, leaked Google document (one engineer’s private
view):
• People are now running foundation models on a Pixel 6 at 5 tokens/s
• You can fine-tune a personalized AI on your laptop in an evening.
• Open-source models are [he says] faster, more customizable, more
private, and pound-for-pound more capable.
• “They are doing things with $100 and 13B params that we struggle with
at $10M and 540B.”
• LoRA (low-rank adaptation of large models) saves a factor of ~10^3
• and is composable to train new models, with no need to go back to the full model
• Big companies might be held liable for copyright infringement on
training data, or have to buy it
• Open-source leaked models might (de facto) not be
• Scenario: A bunch of (LoRA?) matrices appear on a Discord server
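The LoRA idea mentioned above (freeze the full weight matrix, train only a low-rank update) can be sketched as follows. Dimensions are illustrative; the savings factor depends on the chosen rank.

```python
import numpy as np

# Sketch of LoRA: instead of updating a large frozen weight matrix W
# (d x d), train a low-rank delta B @ A with rank r << d.
d, r = 4096, 8
rng = np.random.default_rng(0)

W = rng.standard_normal((d, d))         # frozen pretrained weights
A = rng.standard_normal((r, d)) * 0.01  # trainable, r x d
B = np.zeros((d, r))                    # trainable, d x r (zero init: no change at start)

def forward(x):
    # effective weights = frozen W + composable low-rank delta
    return x @ (W + B @ A).T

full_params = d * d       # parameters to update in full fine-tuning
lora_params = d * r * 2   # parameters in the two thin LoRA factors
print(f"savings factor: {full_params / lora_params:.0f}x")
```

Because only `A` and `B` are shared, a fine-tune fits in a few small matrices, which is exactly why “a bunch of matrices on a Discord server” is a plausible distribution channel.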
11. Me: ChatGPT, To what extent can LLMs be miniaturized, for example by low
rank factorization?
As of my last update in September 2021, there's been some promising
research showing that low-rank factorization can be used to compress neural
networks, including Transformer-based models. Weight matrices in the
Transformer's self-attention layers can be approximated well with low-rank
matrices, which can significantly reduce the model's size with minimal loss in
performance.
Low-rank factorization is just one of many techniques for model
compression. Others include knowledge distillation, where a smaller model is
trained to mimic the behavior of a larger model, and pruning, where less
important parts of the model are removed.
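The low-rank factorization described above can be sketched with a truncated SVD. A random low-rank matrix stands in for a trained weight matrix here; real trained weights are often well approximated (though not exactly reproduced) at low rank.

```python
import numpy as np

# Compress a weight matrix by truncated SVD (low-rank factorization):
# store two thin factors instead of the full matrix.
rng = np.random.default_rng(0)
W = rng.standard_normal((256, 64)) @ rng.standard_normal((64, 256))  # rank <= 64

U, s, Vt = np.linalg.svd(W, full_matrices=False)
k = 64                                    # keep the top-k singular values
W_approx = (U[:, :k] * s[:k]) @ Vt[:k]    # reconstruction from thin factors

orig = W.size                                            # 256*256 numbers
compressed = U[:, :k].size + s[:k].size + Vt[:k].size    # two thin factors + spectrum
print(orig, compressed)
assert np.allclose(W, W_approx)  # exact here, since rank(W) <= 64
```

For a genuinely low-rank matrix the factorization is lossless, as the assertion checks; for real weight matrices one picks `k` to trade model size against approximation error.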
12. Me: ChatGPT, Is LLM training data subject to copyright and thus needs to be
licensed?
The use of copyrighted material in training LLMs may be considered a
copyright infringement. The models learn by consuming vast amounts of text
data which may include copyrighted works. By using this information to
generate new text, the models are indirectly reproducing and repurposing
parts of the copyrighted works they were trained on, albeit in a non-
verbatim manner. Some argue that this constitutes a derivative work, which
is a right reserved for the copyright holder under U.S. Copyright Law and
international copyright agreements. The copyright holder has exclusive rights
to reproduce and create derivative works from their material.
The counter-argument hinges on the doctrine of fair use. Advocates can
argue that LLMs, which generate new, original text, use copyrighted material
in a transformative manner, which is a crucial consideration in fair use
determinations. The vast amount of training data also means any specific
copyrighted work constitutes an insubstantial portion of the input. Lastly, this
use doesn't harm the market for the original works but may encourage
further knowledge creation. However, as of September 2021, these complex
legal questions lack clear legal precedent and interpretations may vary across
jurisdictions.
13. The LLM Value Chain:
figure courtesy W. Dally
It may turn out that most value lies with the end-service provider,
who specializes the model and knows the end-customers’ needs.