Words, data, and algorithms combine,
An article about LLMs, so divine.
A glimpse into a linguistic world,
Where language machines are unfurled.
It was only natural to ask a large language model (LLM) such as ChatGPT to produce a poem about large language models, and to use that poem to introduce this article.
So how did this poem come together in a neat package of rhyming words and clever turns of phrase?
We went straight to the source: MIT Assistant Professor and CSAIL Principal Investigator Jacob Andreas, whose research focuses on advancing the field of natural language processing, both by developing cutting-edge machine learning models and by exploring the potential of language as a means of improving other forms of artificial intelligence. This includes pioneering work in areas such as using natural language to teach robots and using language to enable computer vision systems to explain their decision-making. We asked Andreas about the mechanics of the current technology, its implications, and its future prospects.
Q: Language is a rich ecosystem full of subtle nuances that people use to communicate with one another: sarcasm, irony, and other forms of figurative language. There are many ways to convey meaning beyond the literal. Can large language models grasp these complexities of context? What does it mean for a model to achieve "in-context learning"? And how do multilingual transformers handle variations and dialects across languages beyond English?
A: When we think about linguistic context, these models can reason over much, much longer documents and chunks of text than anything we've had before. But that's only one kind of context. With humans, language is produced and understood in a grounded context. For example, I know that I'm sitting at this table, and there are objects I can refer to; the language models we have now typically can't see anything when they're interacting with a person.
There's also a broader social context that informs our language use, which these models aren't, at least not obviously, sensitive to or aware of. It's not clear how to give them information about the social context in which their language generation and language modeling takes place. Another important thing is temporal context. We're recording this video at a particular moment in time when particular facts are true. The models we have now were trained on a snapshot of the internet that stopped at a particular point (for most of the models we have now, probably a few years ago), and they don't know about anything that has happened since then. They don't even know at what moment in time they're generating text. Figuring out how to provide all these different kinds of context is also an interesting question.
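A common stopgap for the missing temporal and situational context Andreas describes is simply to prepend it to the prompt. The sketch below is a naive illustration of that idea, not anything from the interview; the `generate` call is a placeholder for whatever text-generation API or local model is actually used, and it does not address the deeper grounding problems he raises.

```python
from datetime import date

def build_grounded_prompt(user_message: str, location: str) -> str:
    """Prepend simple temporal and situational context to a prompt.

    This is a naive workaround: richer grounding (vision, social context)
    remains an open research problem, as discussed above.
    """
    context_header = (
        f"Today's date is {date.today().isoformat()}.\n"
        f"This conversation is taking place in {location}.\n"
        "Answer with this context in mind.\n\n"
    )
    return context_header + user_message

prompt = build_grounded_prompt("What has changed here recently?", "Cambridge, MA")
# response = generate(prompt)  # `generate` stands in for any LLM API
```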
Perhaps one of the most surprising things about these models is the phenomenon called in-context learning. If I take a small machine-learning dataset and feed it to the model, say a handful of movie reviews, each paired with the star rating a critic gave the movie, and show it just a few examples like these, the language model develops the ability both to generate plausible-sounding movie reviews and to predict star ratings. More generally, if I have a machine-learning problem, I have my inputs and my outputs. If you give the model a few input-output pairs, then give it a new input and ask it to predict the output, it can often do this very well.
It's a very interesting, fundamentally different way of doing machine learning: I have this one big general-purpose model into which I can feed lots of small machine-learning datasets, and yet I don't have to train a new model, classifier, or generator specialized for my particular task. This is something we've been thinking about a lot in my group, in collaboration with colleagues at Google, trying to understand exactly how this in-context learning phenomenon happens.
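To make the idea concrete, here is a minimal sketch of how such an in-context prompt is typically assembled: a few input-output demonstrations followed by a new input for the model to complete. The example reviews and the placeholder `generate` call are illustrative assumptions, not code from Andreas's group.

```python
# Few-shot (in-context) prompt construction: the model sees a handful of
# labeled examples inline and is asked to continue the pattern.
examples = [
    ("A tense, beautifully shot thriller that never lets up.", 5),
    ("Flat characters and a plot that goes nowhere.", 2),
    ("Charming in places, but badly overlong.", 3),
]

def build_few_shot_prompt(new_review: str) -> str:
    lines = []
    for review, stars in examples:
        lines.append(f"Review: {review}\nStars: {stars}\n")
    lines.append(f"Review: {new_review}\nStars:")
    return "\n".join(lines)

prompt = build_few_shot_prompt("An inventive script let down by clumsy pacing.")
# prediction = generate(prompt)  # `generate` is a placeholder for any LLM API
```

No gradient updates or task-specific training happen here; the "learning" is entirely in how the model conditions on the examples in its context window.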
Q: We like to believe that people (at least sometimes) seek what is objectively and morally known to be true. Large language models, perhaps with ill-defined or not-yet-understood "moral compasses," aren't beholden to the truth. Why do large language models hallucinate facts or confidently assert inaccuracies? Does this limit their usefulness for applications where factual accuracy is critical? Is there a leading theory on how we might solve this?
A: It's well documented that these models hallucinate facts; they're not always reliable. I recently asked ChatGPT to describe some of our group's research. It cited five papers, four of which don't actually exist, and one of which is a real paper written by a colleague of mine who lives in the United Kingdom and with whom I have never co-authored. Factuality is still a big problem. Even beyond that, things that involve reasoning in a really general sense, with complex calculations and complex inferences, are still really hard for these models. There may even be fundamental limitations of this transformer architecture, and I believe a lot more modeling work is needed to make things better.
Why this happens is still partly an open question, but perhaps, just architecturally, there are reasons it's hard for these models to build coherent models of the world. They can do it a little bit. You can ask them factual questions, trivia questions, and they get them right most of the time, maybe even more often than your average person on the street. But unlike your average person, it's not really clear whether anything living inside this language model corresponds to a belief about the state of the world. I think this is both for architectural reasons, in that transformers don't obviously have anywhere to put such a belief, and for training-data reasons, in that these models were trained on the internet, which was authored by many different people at different times who believe different things about the state of the world. So it's difficult to expect the models to represent those things coherently.
That said, I don't think this is a fundamental limitation of neural language models, or of language models in general, but something that is true of today's language models. We're already starting to see models that can build representations of facts, representations of the state of the world, and I think there's still room for improvement.
Q: The pace of progress from GPT-2 to GPT-3 to GPT-4 has been dizzying. What does the trajectory look like from here? Will it stay exponential, or is it an S-curve whose progress will taper off in the near term? If so, are there limiting factors in terms of scale, compute, data, or architecture?
A: Certainly in the short term, the thing I'm most worried about has to do with the issues of truthfulness and coherence I mentioned earlier: even the best models we have today generate incorrect facts. They generate code with bugs, and because of how these models work, they do so in a way that is particularly hard for humans to detect, because the model output has all the right surface statistics. When we think about code, it's still an open question whether it's actually less work for someone to write a function by hand, or to ask a language model to generate that function and then have a person go through and verify that the implementation is actually correct.
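One practical way to do that verification, rather than trusting plausible-looking output, is to wrap any generated function in small tests that encode what "correct" means before using it. The sketch below is only an illustration of that workflow; `parse_duration` stands in for an arbitrary model-generated function and is not taken from the interview.

```python
# Suppose a language model was asked to generate a function that parses
# strings like "1h30m" into seconds. Instead of trusting the output because
# it looks right, a reviewer can pin down its behavior with explicit checks.

import re

def parse_duration(text: str) -> int:
    """Hypothetical model-generated implementation under review."""
    total = 0
    for amount, unit in re.findall(r"(\d+)([hms])", text):
        total += int(amount) * {"h": 3600, "m": 60, "s": 1}[unit]
    return total

def test_parse_duration():
    assert parse_duration("1h30m") == 5400
    assert parse_duration("45s") == 45
    assert parse_duration("2h") == 7200
    assert parse_duration("") == 0  # edge case the reviewer cares about

if __name__ == "__main__":
    test_parse_duration()
    print("all checks passed")
```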
There's a real danger in rushing to use these tools right away: we could end up in a world where everything is a little bit worse, but where it's actually very difficult for people to reliably check the outputs of these models. That said, these are problems that can be overcome. Especially at the rate things are going, there's a lot of room to improve, over the longer term, the factuality, coherence, and correctness of generated text and code. These really are tools, tools we can use to free ourselves, as a society, from many of the unpleasant tasks, chores, and drudgery that have been hard to automate, and that's something to be excited about.