AI scientist: “We need to think beyond the large language model”


Images by PM/Getty Images

The developers of generative artificial intelligence (Gen AI) continually push the boundaries of what is possible, as with Google’s Gemini 1.5, which can handle a million tokens of information at a time.

However, even this level of development is not enough to make real progress in AI, say rivals who compete directly with Google.

Also: 3 Ways Meta’s Llama 3.1 Is a Breakthrough for AI Generation

“We need to think beyond the boundaries of the LLM,” said Yoav Shoham, co-founder and co-CEO of AI21 Labs, in an interview with ZDNET.

AI21 Labs, a privately-backed startup, is competing with Google in LLMs, the large language models that are the foundation of generative AI. Shoham, once a principal scientist at Google, is also a professor emeritus at Stanford University.

Also: AI21 and Databricks show that open source can radically slim down AI

“They’re amazing in the results they produce, but they don’t really understand what they’re doing,” he said of LLMs. “I think even the most die-hard neural network experts don’t believe that just building a bigger language model is enough to solve everything.”


AI21 Labs researchers point to the basic flaws in OpenAI’s GPT-3 as an example of how models fail at basic questions. The answer, the company argues, is to supplement LLMs with something else, such as modules that can perform consistently.

AI21 Laboratories

Shoham’s company has pioneered new approaches to Gen AI that go beyond the traditional “transformer,” the centerpiece of most LLMs. For example, in April the company debuted a model called Jamba, an intriguing combination of a transformer with a second neural network called a state space model (SSM).

The combination has allowed Jamba to outperform other AI models on important metrics.

Shoham gave ZDNET a detailed explanation of one important metric: context length.

Context length is the amount of input (in tokens, usually words or word fragments) that a program can handle at once. Meta’s Llama 3.1 offers 128,000 tokens in its context window. AI21 Labs’ Jamba, which is also open-source software, has twice that: a context window of 256,000 tokens.
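To make the idea concrete, a context window is simply a hard budget on input size. The sketch below is illustrative only: it approximates tokens by splitting on whitespace, whereas real models count subword tokens, so the 128K and 256K figures above refer to true token counts.

```python
# Rough sketch of a context window as an input budget. The whitespace
# split is a crude stand-in for a real tokenizer, which counts subword
# tokens rather than words.

def fits_in_context(text: str, window_tokens: int) -> bool:
    """Return True if the (approximate) token count fits in the window."""
    return len(text.split()) <= window_tokens

def truncate_to_context(text: str, window_tokens: int) -> str:
    """Drop everything past the window; longer-context models simply lose less."""
    return " ".join(text.split()[:window_tokens])
```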


Shoham: “Even the most die-hard neural network experts don’t believe that just building a bigger language model is enough to solve everything.”

Photography by Roei Shor

In benchmark testing, using a benchmark built by Nvidia, Shoham said the Jamba model was the only one besides Gemini that could maintain that 256K context window “in practice.” A context length may be advertised as one thing, but it can fall apart in use, with a model scoring worse as the context grows.


“We’re the only ones who have truth in advertising” when it comes to context length, Shoham said. “All other models degrade as the context length increases.”

Google’s Gemini can’t be tested at more than 128K, Shoham said, given the limitations imposed on the Gemini application programming interface by Google. “They actually have a good effective context window, at least at 128K,” he said.

Jamba is also cheaper than Gemini for the same 128K window, Shoham said. “They are about ten times more expensive than us” in the cost of serving predictions, the practice known as inference, he said.

All of that, Shoham stressed, is a product of the “architectural” choice to do something different, attaching a transformer to an SSM. “You can show exactly how many [API] calls are being made” to the model, he told ZDNET. “It’s not just about cost and latency, it’s something inherent to the architecture.”

Shoham described the findings in a blog post.

However, none of that progress matters unless Jamba can do something superior. The benefits of a large context window become apparent, Shoham said, as the world moves toward things like retrieval-augmented generation (RAG), an increasingly popular approach to connecting an LLM to an external information source, such as a database.

Also: Make room for RAG: How Gen AI’s balance of power is shifting

A large context window allows the LLM to retrieve and sort more information from the RAG source to find the answer.

“At the end of the day, retrieve as much as you can [from the database], but not too much” is the right approach for RAG, Shoham said. “Now you can retrieve more than you could before, if you have a long context window, and the language model has more information to work with.”
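A minimal sketch of that retrieve-then-pack step, with illustrative names throughout (a real RAG system would use a vector store and an actual LLM client rather than these stand-ins):

```python
# Toy RAG retrieval: rank passages by a naive relevance score, then pack
# as many as the context window's token budget allows before prompting.

def score(query: str, passage: str) -> int:
    """Toy relevance score: count of shared lowercase words."""
    return len(set(query.lower().split()) & set(passage.lower().split()))

def build_prompt(query: str, passages: list[str], budget_tokens: int) -> str:
    """Pack the highest-scoring passages into the available context."""
    ranked = sorted(passages, key=lambda p: score(query, p), reverse=True)
    context, used = [], 0
    for p in ranked:
        n = len(p.split())  # crude token estimate: 1 token per word
        if used + n > budget_tokens:
            break           # "but not too much": stop at the budget
        context.append(p)
        used += n
    return "Context:\n" + "\n---\n".join(context) + f"\n\nQuestion: {query}"
```

The only thing a longer context window changes here is `budget_tokens`: a 256K window simply lets more ranked passages survive the packing loop before the budget runs out.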

Asked if there is a practical example of this effort, Shoham told ZDNET: “It’s too early to show a working system. I can tell you that we have several customers who have been frustrated with RAG solutions and are working with us now. And I’m pretty sure we’ll be able to publicly show the results, but they haven’t been at it for that long.”

Jamba, which has had 180,000 downloads since it was posted on Hugging Face, is available on Amazon’s AWS Bedrock inference service and on Microsoft Azure, and “people are doing interesting things with it,” Shoham said.

However, even an improved RAG is ultimately no salvation for the various shortcomings of Gen AI, from hallucinations to the risk of the technology’s output descending into gibberish.

“I think we’re going to see people demanding more, demanding systems that aren’t ridiculous and that have something resembling real understanding, that have answers that are close to perfect,” Shoham said, “and that’s not going to be pure LLM.”

Also: Beware of AI “model collapse”: How training with synthetic data contaminates the next generation

In an article published last month on the preprint server arXiv, written with collaborator Kevin Leyton-Brown and titled “Understanding Understanding: A Pragmatic Framework Motivated by Large Language Models,” Shoham demonstrated how, across numerous operations, such as mathematics and manipulating tabular data, LLMs produce “explanations that sound compelling but are not worth the metaphorical paper they are written on.”

“We show how, if you naively hook [an LLM] to a table, the table function will succeed 70% or 80% of the time,” Shoham told ZDNET. “That’s usually very satisfying because you get something for nothing, but if it’s a mission-critical job, you can’t do that.”

Those flaws, Shoham said, mean that any workable approach to creating intelligence will give LLMs “a role to play, but they are part of a larger AI system that brings to the table things that you can’t do with LLMs.”

Among the things needed to go beyond the LLM are the various tools that have emerged in recent years, Shoham said. Elements such as function calls allow an LLM to hand off a task to another type of software created specifically for a particular task.

“If you want to do addition, language models can do it, but they do it terribly,” Shoham said. “Hewlett-Packard gave us a calculator in 1970, so why reinvent the wheel? That’s an example of a tool.”
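In code, such a hand-off amounts to the model emitting a structured request that ordinary software executes. The sketch below is a generic illustration, not any vendor’s function-calling API; the tool registry and the JSON shape are invented for the example.

```python
# Toy dispatcher for a model-emitted "function call". The model's only job
# is to produce the structured request; the arithmetic is done by plain
# code, which never does it "terribly".
import json

def calculator(expression: str):
    # Restricted eval for demo purposes only; a real system would parse safely.
    return eval(expression, {"__builtins__": {}})

TOOLS = {"calculator": calculator}

def handle_tool_call(model_output: str):
    """Execute a JSON request like {"tool": "calculator", "args": "17 * 23"}."""
    call = json.loads(model_output)
    return TOOLS[call["tool"]](call["args"])
```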

Shoham and others have broadly grouped the use of LLMs with tools under the heading “compound AI systems.” With the help of data management company Databricks, Shoham has organized a workshop on the prospects for building such systems.

One example of how these tools are used is to give LLMs the “semantic structure” of table-based data, Shoham said. “Now, you get close to 100 percent accuracy” from the LLM, he said, “and you wouldn’t get that if you just used a language model without any additional elements.”

Beyond tools, Shoham advocates for scientific exploration in other directions outside of the pure deep learning approach that has dominated AI for more than a decade.

“You don’t get robust reasoning by just backpropagating and hoping for the best,” Shoham said, referring to backpropagation, the learning rule that most current AI is trained on.

Also: Anthropic brings Tool Use for Claude out of beta, promises sophisticated assistants

Shoham was careful not to discuss upcoming product initiatives. He did, however, hint that what may be needed is represented, at least philosophically, in a system he and his colleagues introduced in 2022 called the MRKL (Modular Reasoning, Knowledge, and Language) System.

The paper describes the MRKL system as “neural, including the huge general-purpose language model as well as smaller, specialized LMs,” and also “symbolic, for example a mathematical calculator, a currency converter, or an API call to a database.”

That blending is a neurosymbolic approach to AI. And in that sense, Shoham agrees with some prominent thinkers who have concerns about the dominant approach to Gen AI. Frequent AI critic Gary Marcus, for example, has said that AI will never reach human-level intelligence without the ability to manipulate symbols.

MRKL has been implemented as a program called Jurassic-X, which the company has tested with partners.

Also: OpenAI is training the successor to GPT-4. Here are three big improvements expected from GPT-5

An MRKL system should be able to use the LLM to analyze problems involving complicated wording, such as, “Ninety-nine bottles of beer on the wall, one fell, how many bottles of beer are on the wall?” The actual arithmetic is handled by a second neural network with access to the arithmetic logic, using arguments extracted from the text by the first model.

A “router” between the two has the difficult task of choosing what to extract from the text analyzed by the LLM, and which “module” to pass the results to so it can perform the logic.
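A minimal sketch of that routing idea, where a regex stands in for the LLM’s argument-extraction step and the module names are hypothetical (this is not the Jurassic-X implementation):

```python
# Sketch of MRKL-style routing: extract arguments from free text, then
# dispatch them to a specialized module instead of letting the language
# model guess at the arithmetic.
import re

def subtract(a: int, b: int) -> int:
    return a - b

MODULES = {"subtract": subtract}

def route(question: str) -> int:
    """Handle questions like '99 bottles of beer on the wall, 1 fell...'."""
    match = re.search(r"(\d+)\s+bottles.*?(\d+)\s+fell", question, re.IGNORECASE)
    if match is None:
        raise ValueError("no pattern matched; a real router would fall back")
    start, fell = int(match.group(1)), int(match.group(2))
    return MODULES["subtract"](start, fell)
```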

That work means that “there is no free lunch, but lunch is in many cases affordable,” Shoham and his team write.

From a product and business perspective, “we’d like to continue to offer additional functionality so people can build things,” Shoham said.

The important thing is that a system like MRKL doesn’t need to do everything to be practical, he said. “If you’re trying to build a universal LLM that understands math problems and how to generate pictures of donkeys on the moon, and how to write poems, and do all that, that can be expensive,” he observed.

“But 80% of the data in the company is text: you have tables, you have graphs, but donkeys on the moon are not that important in the company.”

Given Shoham’s skepticism about LLMs alone, is there a danger that the current generation of AI could trigger what’s known as an AI winter — a sudden collapse in activity as interest and funding completely dry up?

“It’s a valid question and I don’t really know the answer,” he said. “I think this time is different because, in the 1980s,” during the last AI winter, “there wasn’t enough value created with AI to offset the unfounded hype. Now it’s clear that there is unfounded hype, but I feel like enough value has been created for us to overcome it.”




