An article from the magazine MIT Technology Review. The author explores the question of why LLMs are able to deliver the astonishing performance they show, and has interviewed a whole range of scientists about it. The common thread in the answers: we don't know… An entertaining read.
– – –
The author, Will Douglas Heaven, is senior editor for AI at MIT Technology Review. His article, titled “Large language models can do jaw-dropping things. But nobody knows exactly why”, contains, among other things, passages that show how chance-driven scientific progress can be. Here is a sample:
Two years ago, Yuri Burda and Harri Edwards, researchers at the San Francisco–based firm OpenAI, were trying to find out what it would take to get a language model to do basic arithmetic. They wanted to know how many examples of adding up two numbers the model needed to see before it was able to add up any two numbers they gave it. At first, things didn’t go too well. The models memorized the sums they saw but failed to solve new ones. By accident, Burda and Edwards left some of their experiments running far longer than they meant to—days rather than hours. The models were shown the example sums over and over again, way past the point when the researchers would otherwise have called it quits. But when the pair at last came back, they were surprised to find that the experiments had worked. They’d trained a language model to add two numbers—it had just taken a lot more time than anybody thought it should.
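As an aside: the delayed generalization described in this anecdote is usually studied on small, fully controlled tasks such as modular arithmetic rather than on full language models. The following sketch is not from the article and is not OpenAI's experiment; all names and hyperparameters are illustrative assumptions. It shows what such a setup might look like: a tiny network sees only part of all addition pairs and is deliberately trained far past the point where it has memorized them, which is where a late jump in test accuracy can appear.

```python
# Illustrative sketch of a "grokking"-style experiment (assumed setup, not the article's):
# learn (a + b) mod P from a subset of all pairs, training deliberately "too long".
import torch
import torch.nn as nn

P = 97  # modulus: the task is (a + b) mod P
pairs = torch.cartesian_prod(torch.arange(P), torch.arange(P))  # all (a, b) pairs
labels = (pairs[:, 0] + pairs[:, 1]) % P

# Train on only 40% of the pairs; the held-out rest measures true generalization.
perm = torch.randperm(len(pairs))
split = int(0.4 * len(pairs))
train_idx, test_idx = perm[:split], perm[split:]

model = nn.Sequential(
    nn.Embedding(P, 64),        # shared embedding for both operands
    nn.Flatten(start_dim=1),    # concatenate the two 64-dim embeddings
    nn.Linear(128, 256), nn.ReLU(),
    nn.Linear(256, P),          # one logit per possible result
)

opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(20001):      # far more epochs than memorization alone needs
    model.train()
    opt.zero_grad()
    loss = loss_fn(model(pairs[train_idx]), labels[train_idx])
    loss.backward()
    opt.step()

    if epoch % 1000 == 0:
        model.eval()
        with torch.no_grad():
            preds = model(pairs[test_idx]).argmax(dim=-1)
            test_acc = (preds == labels[test_idx]).float().mean().item()
        print(f"epoch {epoch}: train loss {loss.item():.3f}  test acc {test_acc:.3f}")
```

Whether and when the test accuracy eventually jumps in such a setup depends heavily on details like the weight decay and the fraction of pairs used for training, which is part of what makes the phenomenon hard to predict.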
The article covers phenomena such as “grokking” (emergent, unexpected behavior of LLMs), “overfitting”, and “double descent”. And it raises the question of whether it really matters that scientists understand, or have a theory of, why LLMs can do what they can do:
Why does it matter whether AI models are underpinned by classical statistics or not? One answer is that better theoretical understanding would help build even better AI or make it more efficient. At the moment, progress has been fast but unpredictable. Many things that OpenAI’s GPT-4 can do came as a surprise even to the people who made it. Researchers are still arguing over what it can and cannot achieve. “Without some sort of fundamental theory, it’s very hard to have any idea what we can expect from these things,”
(…)
This isn’t only about managing progress—it’s about anticipating risk, too. Many of the researchers working on the theory behind deep learning are motivated by safety concerns for future models. “We don’t know what capabilities GPT-5 will have until we train it and test it,”
– – –
Heaven, Will Douglas (2024): Large language models can do jaw-dropping things. But nobody knows exactly why. MIT Technology Review, March 4, 2024.
Featured image: DALL-E, prompted by SCIL.