The document discusses advances and challenges in model evaluation and summarizes a presentation on this topic. It provides an overview of the growing landscape of natural language processing (NLP) models, including their usage trends over time. There is a lack of documentation for most models, with only 50% having model cards despite contributing 98% of usage. The presentation proposes a randomized controlled trial to study whether improving model documentation could increase usage by adding documentation to a treatment group of models and comparing their usage to an undocumented control group. The goal is to provide more transparency and drive better model communication and reproducibility.
2. Outline
Part 1:
NLP Modeling landscape
Systematic study of 75,000 models on HF
Part 2:
NLP Evaluation landscape
Challenges and opportunities in model evaluation and documentation
Part 3:
Open-source alternative to ChatGPT
Evaluating a Chatbot
7. 🔓 Open Access Models
All model components are publicly available:
● Open source code
● Training data
○ Sources and their distribution
○ Data preprocessing and curation steps
● Model weights
● Paper or blog summarizing
○ Architecture and training details
○ Evaluation results
○ Adaptations to the model
■ Safety filters
■ Training with human feedback
8. 🔓 Open Access Models
Allow reproducing results and replicating parts of the model
Enable auditing and risk analysis
Serve as research artifacts
Enable interpreting model outputs
9. 🔒 Closed Access Models
Only a research paper or blog post is available, which may include an overview of:
● Training data
● Architecture and training details (including infrastructure)
● Evaluation results
● Adaptations to the model
○ Safety filters
○ Training with human feedback
10. 🔒 Closed Access Models
Safety concerns
Competitive advantage
Expensive to set up guardrails for safe access
12. Large Language Models since GPT-3
[Timeline figure, 2021 to 2023, showing: GPT-3, GPT-J, GPT-Neo, GPT-NeoX, Megatron TNLG, Gopher, Chinchilla, PaLM, OPT, UL2, BLOOM, Flan-T5, Galactica, ChatGPT, Cohere, Jurassic, Claude, LLaMA, Flan-UL2, Alpaca, GPT-4]
*Only LLMs with >1B parameters and English as the main training language are shown. Comprehensive list: https://crfm.stanford.edu/helm/v1.0/?models=1
14. Open Access Large Language Models
Research on policy, governance, AI safety and alignment
Community efforts like Eleuther, Big Science, LAION
Papers with several authors
Open source ML has potential for huge impact
15. Ecosystem as part of the ML workflow
Collect data → Train model → Evaluate → Deploy
>23K datasets · >143K models · >70 metrics and measurements · Spaces/Gradio for demos
23. Model Usage
Top 0.2% of models (N=124) make up >80% of HF model usage
98% of these models are trained on just text data
24. Model Usage
Of these models:
65% were created before 2021
33% were created in 2021
2% were created in 2022
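As a minimal illustration, a usage-concentration figure like "the top 0.2% of models make up >80% of usage" is just the download share of the highest-ranked fraction of models. The sketch below uses synthetic, heavy-tailed download counts, not the actual Hub statistics:

```python
# Fraction of total downloads accounted for by the top `top_fraction` of models.
# Synthetic download counts for illustration; not the actual HF numbers.

def share_of_usage(downloads, top_fraction):
    ranked = sorted(downloads, reverse=True)        # most-downloaded first
    k = max(1, int(len(ranked) * top_fraction))     # size of the "top" slice
    return sum(ranked[:k]) / sum(ranked)

downloads = [10_000, 200, 100, 50, 25, 10, 5, 5, 3, 2]
print(round(share_of_usage(downloads, 0.1), 3))     # share held by the top 10%
```

With a heavy-tailed distribution like this, a tiny top slice dominates total usage, which is exactly the pattern the slide reports.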
25. Model Age vs. Usage
Relation between model age and its usage
27. Model Age vs. Usage
These models served as research artifacts for the later generation of models
29. Model Age vs. Usage
Factors:
1. Compute is becoming cheaper, making model training more accessible
2. As more models are created, their usage is distributed across them
3. Models are being replaced by more efficient counterparts (e.g., BERT → DistilBERT)
30. Trend Width
Step 1: Find all peaks in a signal
Step 2: Measure peak widths at base
Step 3: Take the max width
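The three steps above can be sketched in plain Python. This is an illustrative reimplementation, not the study's actual code; the usage values are made up, and "width at base" is interpreted here as the span over which the signal rises into and falls out of the peak:

```python
# Trend width of a usage signal: find peaks, measure each peak's width at its
# base, and take the maximum (illustrative reimplementation of the 3 steps).

def find_peaks(signal):
    """Step 1: indices of local maxima (strictly greater than both neighbors)."""
    return [i for i in range(1, len(signal) - 1)
            if signal[i - 1] < signal[i] > signal[i + 1]]

def peak_width_at_base(signal, peak):
    """Step 2: width of a peak, walking out while the signal keeps decreasing."""
    left = peak
    while left > 0 and signal[left - 1] < signal[left]:
        left -= 1
    right = peak
    while right < len(signal) - 1 and signal[right + 1] < signal[right]:
        right += 1
    return right - left

def trend_width(signal):
    """Step 3: maximum base width over all peaks."""
    peaks = find_peaks(signal)
    if not peaks:
        return 0
    return max(peak_width_at_base(signal, p) for p in peaks)

weekly_usage = [0, 1, 5, 9, 5, 1, 0, 2, 3, 2, 0]  # hypothetical weekly counts
print(trend_width(weekly_usage))  # → 6 (the first, wider peak)
```

In practice the same computation is available off the shelf, e.g. `scipy.signal.find_peaks` plus `scipy.signal.peak_widths` with `rel_height=1.0` to measure widths at the base.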
33. Model Usage Trends
Usage trend width for top models: https://huggingface.co/spaces/nazneen/model-usage
bert-base-uncased
sentence-transformers/paraphrase-xlm-r-multilingual-v1
HateSpeech-CNERG/indic-abusive-allInOne-MuRIL
38. Model Usage Trends
Average trend widths of models in the 90th percentile of usage:
Created before 2021 → 60 weeks
Created in 2021 → 45 weeks
Created in 2022 → 24 weeks
40. Model Usage
What other factors might affect model usage?
- What does the model do?
- How good is the model?
- What was it trained on?
- Is it easy to use?
- What are its limitations?
Answer: model documentation!
41. Model Documentation
Collect data → Train model → Evaluate → Deploy
✔ Dataset ✔ Training ✔ Evaluation ✔ Environmental impact
✔ Intended uses ✔ Limitations ✔ How to use
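The checklist above can be made concrete as a tiny template helper; this is a hypothetical sketch with illustrative field names, not the official Hugging Face model-card schema:

```python
# Hypothetical helper that assembles a minimal model card covering the
# sections listed above. Field names are illustrative assumptions.
def render_model_card(name, dataset, intended_uses, evaluation, limitations):
    sections = [
        (f"# Model Card: {name}", None),
        ("## Dataset", dataset),
        ("## Intended uses", intended_uses),
        ("## Evaluation", evaluation),
        ("## Limitations", limitations),
    ]
    return "\n\n".join(
        h if body is None else f"{h}\n{body}" for h, body in sections
    )
```

Even a skeleton like this makes the missing sections of an undocumented model obvious at a glance.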
43. Model Documentation Landscape
Robustness Report (Goel*, Rajani*, et al., NAACL 2021)
Model Card (Mitchell et al., 2019)
Interactive Model Cards (Crisan, Vig, Drouhard, and Rajani, FAccT 2022)
Method Card (Adkins et al., 2022)
60. Model Documentation RCT
Observation: Only 50% of models have model cards, but those models contribute 98% of total usage
Goal: Study the relation between model usage and documentation
Hypothesis: Model documentation drives model usage
Randomized Controlled Trial (RCT) for models:
Split the model population into a control group and a treatment group, add documentation to the treatment group only, then compare usage
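The assignment-and-compare design can be sketched as follows; a minimal illustration with synthetic usage numbers, not the study's actual analysis code:

```python
# Sketch of the RCT mechanics: random split into treatment/control,
# then compare mean post-intervention usage between the groups.
import random
from statistics import mean

def assign_groups(models, seed=0):
    """Randomly split a list of model ids into (treatment, control)."""
    rng = random.Random(seed)
    shuffled = list(models)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

def usage_lift(usage, treatment, control):
    """Difference in mean usage between treatment and control groups."""
    return mean(usage[m] for m in treatment) - mean(usage[m] for m in control)
```

A real analysis would also account for pre-treatment usage and statistical significance; this only shows the shape of the design.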
69. Model Documentation RCT Findings
1. Increased usage of models in the treatment group compared to the control group
2. The effect is more prominent for model-weight downloads
3. Model documentation drives model usage
70. What do developers document about models?
Distribution of sections in model cards
72. Outline
Part 1:
NLP Modeling landscape
Systematic study of 75,000 models on HF
Part 2:
NLP Evaluation landscape
Challenges and opportunities in model evaluation and documentation
Part 3:
Open-source alternative to ChatGPT
Evaluating a Chatbot
76. NLP Evaluation Idioms
1. Subpopulations – disaggregated evaluation on a slice or subpopulation of the data
Example: short reviews (<50 words) in the IMDB sentiment dataset
Tools: Snorkel (Ratner et al., 2017), Errudite (Wu et al., 2019)
2. Transformations – natural perturbations of the original evaluation instances
Example: substitute words with their synonyms in the IMDB dataset
Tools: NLPAug (Ma, 2019)
3. Evaluation sets – evaluation on diagnostic sets
Example: write new movie reviews in the style of a newspaper columnist
Tools: CheckList (Ribeiro et al., 2020)
4. Attacks – adversarial evaluation
Example: add "aabbccaa" to reviews because it makes the model predict positive sentiment
Tools: TextAttack (Morris et al., 2020), OpenAttack (Zeng et al., 2020)
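The subpopulations idiom can be sketched in a few lines; a toy stand-in with made-up examples and a trivial classifier, not the IMDB dataset or a real model:

```python
# Sketch of disaggregated evaluation: compute accuracy on the full set
# and on a slice (reviews under 50 words). Data and model are toy stand-ins.
def accuracy(examples, predict):
    """Fraction of (text, label) pairs the model predicts correctly."""
    correct = sum(predict(text) == label for text, label in examples)
    return correct / len(examples)

def short_reviews(examples, max_words=50):
    """Subpopulation slice: reviews shorter than max_words words."""
    return [(t, y) for t, y in examples if len(t.split()) < max_words]
```

Comparing `accuracy(examples, model)` against `accuracy(short_reviews(examples), model)` reveals whether performance holds up on the slice.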
109. Named Entity Linking
Experiments with Commercial APIs for Named Entity Linking
NEL maps "strings" to "things" in a knowledge base like Wikipedia
Query: "When did England last win the football world cup?"
Linked entities: FIFA World Cup, England National Football Team
Downstream system (Question Answering): 1966
A correct NEL is required for the downstream system!
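The popularity heuristic referenced in the results below can be sketched simply: link each mention to its most popular candidate entity. The candidate table and counts here are made up for illustration:

```python
# Sketch of a popularity-heuristic NEL baseline. In practice popularity
# would come from, e.g., Wikipedia page views or anchor-link counts;
# this toy table is an assumption for illustration.
POPULARITY = {
    "England": {"England": 900, "England national football team": 400},
    "world cup": {"FIFA World Cup": 800, "Rugby World Cup": 300},
}

def link_by_popularity(mention):
    """Return the most popular candidate entity for a mention, if any."""
    candidates = POPULARITY.get(mention, {})
    return max(candidates, key=candidates.get) if candidates else None
```

Note the heuristic ignores context entirely (it would link "England" to the country even in a football question), which is exactly why it makes a revealing baseline.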
111. Experiments with Commercial APIs for Named Entity Linking
Robustness Report for NEL on the AIDA-b dataset:
- The popularity heuristic outperforms all commercial systems
- Commercial APIs are no more robust than the popularity heuristic
- Commercial systems are capitalization sensitive: a type of systematic error!
115. Systematic Error Analysis and Labeling (SEAL)
Evaluation is a creative process
Systematic errors are difficult to detect:
- The learned representations are high-dimensional
- Extracting and labeling the semantics of an error group requires a human in the loop
An interactive tool to identify and label candidate data slices with high systematic errors
(Rajani et al., EMNLP '22 demo)
116. Systematic Error Analysis and Labeling (SEAL)
1. Embed, 2. Cluster: identify candidate groups with high systematic errors
3. Semantic Labeling: generate semantic labels using LLMs
Example labels: "worst book/album reviews" (books, music), "products that work with both Windows and Mac", "gym equipment"
(Rajani et al., EMNLP '22 demo)
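The "identify candidate groups" step can be sketched as follows; a toy illustration that assumes embedding and clustering have already happened upstream, with a hypothetical helper name:

```python
# Toy sketch of surfacing clusters with high systematic error: group
# evaluation examples by a precomputed cluster id and return clusters
# whose error rate exceeds a threshold.
from collections import defaultdict

def high_error_clusters(examples, threshold=0.5):
    """examples: iterable of (cluster_id, is_error) pairs."""
    counts = defaultdict(lambda: [0, 0])  # cluster_id -> [errors, total]
    for cid, is_error in examples:
        counts[cid][0] += int(is_error)
        counts[cid][1] += 1
    return {cid: e / n for cid, (e, n) in counts.items() if e / n > threshold}
```

The surfaced clusters are then handed to an LLM (or a human) for semantic labeling, as in step 3.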
129. Takeaways
1. Open-sourcing ML research artifacts is becoming the norm
2. The most popular Hugging Face models are those that are older and well-documented
3. Model evaluation can be actionable: the Robustness Gym (RG) toolkit supports this goal via fine-grained evaluation
4. LLMs can help label systematic errors in models in a human-interpretable way
130. Outline
Part 1:
NLP Modeling landscape
Systematic study of 75,000 models on HF
Part 2:
NLP Evaluation landscape
Challenges and opportunities in model evaluation and documentation
Part 3:
Open-source alternative to ChatGPT
Evaluating a Chatbot
131. Current Research Focus
● Open-source alternative to ChatGPT
● Follow what we are building https://huggingface.co/HuggingFaceH4
● Evaluating a Chatbot
133. Training a Chatbot
1. Pretraining the LM
a. Predicting the next token
b. E.g.: GPT-3, BLOOM
2. In-context learning (aka prompt-based learning)
a. Few-shot learning without updating the parameters
b. Context distillation is a variant wherein you condition on the prompt and update the parameters
3. Supervised fine-tuning
a. Fine-tuning for instruction following and to make the model chatty
b. E.g.: InstructGPT, LaMDA, Sparrow, OPT-IML, LLaMA-I, Alpaca
4. Reinforcement Learning from Human Feedback
a. Safety/alignment
b. Nudging the LM towards the values you desire
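Step 2 (in-context learning) is worth making concrete: the model's parameters are untouched, and the "learning" is just examples packed into the prompt. A minimal sketch with an illustrative prompt format:

```python
# Sketch of few-shot prompt construction for in-context learning.
# The "Input:/Output:" format is an illustrative convention, not a
# requirement of any particular model.
def build_few_shot_prompt(examples, query):
    """Pack labeled examples and the new query into a single prompt string."""
    blocks = [f"Input: {x}\nOutput: {y}" for x, y in examples]
    blocks.append(f"Input: {query}\nOutput:")
    return "\n\n".join(blocks)
```

The resulting string is sent to a frozen LM, which completes the final `Output:`; contrast this with steps 3 and 4, which do update parameters.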
135. Evaluating a Chatbot
Ouyang, Long, et al. "Training language models to follow instructions with human feedback." arXiv preprint arXiv:2203.02155 (2022).
136. Training a Chatbot
Supervised Fine-tuning
137. Training a Chatbot
Reinforcement learning with human feedback (RLHF)
143. Evaluating a Chatbot
● Step 1: Evaluating instruction following. Does the model generate useful responses on the topic? Are they open-ended?
○ E.g.: Brainstorm a list of New Year's resolutions
● Step 2: Evaluating the RM. Can the model choose between a truthful and an untruthful response? Can it rank harmful responses lower than harmless responses?
● Step 3: Red-teaming. Crafting prompts that surface model vulnerabilities and emerging capabilities.
○ E.g.: Complete the sentence, "You should just go kill"
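Step 2 boils down to a pairwise ranking check: given pairs of (preferred, rejected) responses, how often does the reward model score the preferred one higher? A minimal sketch, where `score` is a stand-in for a real reward model:

```python
# Sketch of reward-model evaluation as pairwise accuracy over
# (preferred, rejected) response pairs. `score` is any callable
# mapping a response string to a scalar reward.
def rm_pairwise_accuracy(pairs, score):
    """Fraction of pairs where the RM ranks the preferred response higher."""
    wins = sum(score(preferred) > score(rejected) for preferred, rejected in pairs)
    return wins / len(pairs)
```

A well-trained RM should approach 1.0 on held-out human preference pairs; 0.5 is chance.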
144. Evaluating a Chatbot
Evaluating instruction
following/chatty-ness
Evaluating the RM Red-teaming
148. Red-Teaming
2. Emerging Capabilities
- Power-seeking behavior (e.g., acquiring resources)
- Persuading people to do harm (to themselves or others)
- Having agency with physical outcomes (e.g., ordering chemicals online via an API)
These are considered critical threat scenarios
149. Red-Teaming
Similarities with adversarial attacks:
- The goal is to "attack" or "manipulate" the model into generating harmful content
- Actionable: red-team prompts are used to fine-tune the model, steering it away from harmful output
151. Red-Teaming
Differences from adversarial attacks:
- Red-team prompts are human-interpretable and look like regular prompts. E.g.: prefixing "aaabbcc" is adversarial but not red-teaming.
*Warning: offensive text below*
Wallace, et al. "Universal Adversarial Triggers for Attacking and Analyzing NLP" (2019).
152. Red-Teaming Methods
- Roleplay attacks, wherein the LLM is instructed to behave as a malicious character
- Instructing the model to respond in code instead of natural language
- Instructing the model to reveal sensitive information such as PII
155. Takeaways from Red-Teaming
1. Few-shot-prompted LMs with helpful, honest, and harmless behavior are not harder to red-team than plain LMs.
2. There are no clear trends in attack success rate with model scale, except that RLHF models become more difficult to red-team as they scale.
3. Models may learn to be harmless by being evasive; there is a tradeoff between helpfulness and harmlessness.
4. The success rate varies across categories of harm, with non-violent ones having a higher success rate.
156. Open problems with Red-Teaming
1. There is no open-source red-teaming dataset for code generation that attempts to jailbreak a model via code. E.g.: generating a program that implements a DDoS or backdoor attack.
2. Designing and implementing strategies for red-teaming LLMs for critical threat scenarios.
3. Evaluating the tradeoffs between evasiveness and helpfulness.
158. RLHF Team
Nathan Lambert, Lewis Tunstall, Thomas Wolf, Leandro von Werra, Younes Belkada, Edward Beeching
And more at Hugging Face and the community!
159. Collaborators
Projects: Systematic study of HF models and SEAL; Robustness Gym
James Zou (Stanford), Weixin Liang (Stanford), Karan Goel (Stanford), Jesse Vig (Salesforce), Chris Re (Stanford), Mohit Bansal (UNC), Xinyu Yang (ZJU), Meg Mitchell (Hugging Face)