
Model collapse

 
=== Linear Regression ===
In the case of a linear regression model,<ref>{{cite arXiv |last=Dohmatob |first=Elvis |last2=Feng |first2=Yunzhen |last3=Kempe |first3=Julia |date=2024-02-12 |title=Model Collapse Demystified: The Case of Regression |eprint=2402.07712}}</ref><ref>{{cite arXiv |last=Dohmatob |first=Elvis |last2=Feng |first2=Yunzhen |last3=Yang |first3=Pu |last4=Charton |first4=Francois |last5=Kempe |first5=Julia |date=2024-02-10 |title=A Tale of Tails: Model Collapse as a Change of Scaling Laws |eprint=2402.07043}}</ref> analytical scaling laws and bounds on learning can be derived.
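A minimal numerical sketch of this setting (assuming ordinary least squares on Gaussian data; the dimensions, sample sizes, and noise level below are illustrative and not taken from the cited papers) shows how estimation error accumulates when each generation is trained only on labels produced by the previous generation's model:

<syntaxhighlight lang="python">
import numpy as np

# Illustrative sketch (not the cited papers' exact setup): iteratively refit
# ordinary least squares on data labelled by the previous generation's model.
# Each generation's estimation noise is inherited by the next, so the test
# error drifts upward over generations instead of staying at the noise floor.
rng = np.random.default_rng(0)
d, n, sigma, generations = 20, 200, 0.5, 10

w_true = rng.normal(size=d)          # ground-truth regressor
X_test = rng.normal(size=(2000, d))  # fixed test inputs
y_test = X_test @ w_true             # noiseless test targets

w_teacher = w_true                   # generation 0 is trained on real data
for gen in range(generations):
    X = rng.normal(size=(n, d))
    y = X @ w_teacher + sigma * rng.normal(size=n)   # labels from current teacher + noise
    w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)    # refit from scratch
    test_err = np.mean((X_test @ w_hat - y_test) ** 2)
    print(f"generation {gen}: test MSE = {test_err:.3f}")
    w_teacher = w_hat                # next generation trains on this model's outputs
</syntaxhighlight>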
 
=== Statistical Language Model ===
In the case of a linear softmax classifier for next token prediction,<ref>{{cite arXiv |last=Seddik |first=Mohamed El Amine |last2=Chen |first2=Suei-Wen |last3=Hayou |first3=Soufiane |last4=Youssef |first4=Pierre |last5=Debbah |first5=Merouane |date=2024-04-07 |title=How Bad is Training on Synthetic Data? A Statistical Analysis of Language Model Collapse |eprint=2404.05090}}</ref> exact bounds on learning can be derived even when only part of the training dataset is synthetic.[[File:Model Collapse in Generative Models Can Be Avoided By Accumulating Data.png|thumb|Model collapse in generative models can be curbed by accumulating data]]
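The effect shown in the figure can be sketched with a hypothetical one-dimensional example (fitting a Gaussian rather than a language model; all names and constants below are illustrative): when each generation is trained only on fresh samples from the previous model, the fitted distribution drifts, whereas accumulating all past data keeps it close to the original.

<syntaxhighlight lang="python">
import numpy as np

# Illustrative sketch of the "replace" vs "accumulate" regimes behind the
# figure above (hypothetical Gaussian example, not the cited paper's
# language-model setup). Under "replace", each generation fits only fresh
# samples drawn from the previous model, so the fitted parameters drift;
# under "accumulate", all past data is kept and the fit stays close to the
# original distribution.
rng = np.random.default_rng(1)
n_per_gen, generations = 100, 30

def run(mode):
    mu, std = 0.0, 1.0                  # generation-0 "model" = the real data law
    pool = []                           # accumulated training data
    for _ in range(generations):
        samples = rng.normal(mu, std, n_per_gen)   # synthetic data from current model
        if mode == "accumulate":
            pool.extend(samples)
            train = np.asarray(pool)
        else:                           # "replace": train only on the newest data
            train = samples
        mu, std = train.mean(), train.std()        # refit the Gaussian
    return mu, std

for mode in ("replace", "accumulate"):
    mu, std = run(mode)
    print(f"{mode:10s}: final mean = {mu:+.3f}, final std = {std:.3f}")
</syntaxhighlight>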
 
== Impact on large language models ==