arxiv:2005.14165

Language Models are Few-Shot Learners

Published on May 28, 2020
Authors: Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei

Abstract

Recent work has demonstrated substantial gains on many NLP tasks and benchmarks by pre-training on a large corpus of text followed by fine-tuning on a specific task. While typically task-agnostic in architecture, this method still requires task-specific fine-tuning datasets of thousands or tens of thousands of examples. By contrast, humans can generally perform a new language task from only a few examples or from simple instructions - something which current NLP systems still largely struggle to do. Here we show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches. Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and test its performance in the few-shot setting. For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model. GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic. At the same time, we also identify some datasets where GPT-3's few-shot learning still struggles, as well as some datasets where GPT-3 faces methodological issues related to training on large web corpora. Finally, we find that GPT-3 can generate samples of news articles which human evaluators have difficulty distinguishing from articles written by humans. We discuss broader societal impacts of this finding and of GPT-3 in general.
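
The few-shot setting described in the abstract amounts to in-context learning: a task description and a handful of solved demonstrations are concatenated into a single text prompt, and the model simply continues the text, with no gradient updates. The sketch below illustrates this prompt format for the 3-digit arithmetic task mentioned in the abstract; it is an illustrative reconstruction, not the paper's verbatim prompts, and the `query_model` call mentioned in the final comment is a hypothetical stand-in for whatever completion endpoint is used.

```python
# Minimal sketch of few-shot ("in-context") prompting as described in the
# abstract: the task and K demonstrations are given purely as text, and the
# model is asked to continue the prompt. No weights are updated.
# The exact prompt wording here is illustrative, not the paper's format.

def build_few_shot_prompt(task_description, demonstrations, query):
    """Concatenate a task description, K solved examples, and an unsolved query."""
    lines = [task_description, ""]
    for question, answer in demonstrations:
        lines.append(f"Q: {question}")
        lines.append(f"A: {answer}")
    lines.append(f"Q: {query}")
    lines.append("A:")  # the model's completion supplies the answer
    return "\n".join(lines)

# 3-digit addition, one of the tasks mentioned in the abstract (K = 3 shots).
demos = [
    ("What is 123 plus 456?", "579"),
    ("What is 250 plus 311?", "561"),
    ("What is 702 plus 198?", "900"),
]
prompt = build_few_shot_prompt(
    "Answer the arithmetic question.", demos, "What is 615 plus 284?"
)
print(prompt)

# A completion endpoint (hypothetical here) would then be called as, e.g.:
#   answer = query_model(prompt, max_tokens=5)   # expected continuation: " 899"
```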

Community

GPT-3: The Giant Few-Shot Learner - Explained!

Links 🔗:

👉 Subscribe: https://www.youtube.com/@Arxflix
👉 Twitter: https://x.com/arxflix
👉 LMNT (Partner): https://lmnt.com/

By Arxflix


Models citing this paper 128


Datasets citing this paper 5


Spaces citing this paper 944

Collections including this paper 23