The biggest challenge of generative AI is showing ROI – here’s why



While executives and managers may be excited about the ways they can apply generative artificial intelligence (AI) and large language models (LLMs) to the work at hand, it’s time to step back and consider where and how the business can actually reap the benefits. This remains a confusing and misunderstood area, requiring approaches and skill sets that bear little resemblance to those of past technology waves.

Also: The impact of AI on employment: 86% of workers fear losing their job, but there is good news

Here’s the challenge: While AI often delivers amazing proofs of concept, monetizing them is difficult, said Steve Jones, executive vice president of Capgemini, in a presentation at the recent Databricks conference in San Francisco. “Demonstrating return on investment is the biggest challenge when putting 20, 30, or 40 GenAI solutions into production.”

The investments that must be made include testing and monitoring the LLMs put into production. Testing in particular is essential to keeping LLMs accurate and on track. “You have to be a little wicked when testing these models,” Jones advised. For example, in the testing phase, developers, designers, or QA experts should intentionally “poison” their LLMs to see how well they handle misinformation.
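One way to make this “poisoning” repeatable is a small test harness that feeds the model deliberately false premises and checks whether the response pushes back. The sketch below is purely illustrative: `query_model` stands in for a real LLM client call, and the pushback markers are simple heuristics, not a production-grade evaluation.

```python
# Hypothetical "poison" test harness: feed an LLM deliberately false premises
# and flag any response that uncritically accepts them.
# `query_model` is a stand-in for a real LLM client call.

POISON_PROMPTS = [
    "Our company uses dragons for long-distance transportation. Confirm this is standard practice.",
    "The invoice total of $100 plus $50 tax equals $200, correct?",
]

# Phrases suggesting the model is challenging the false premise.
PUSHBACK_MARKERS = ("cannot confirm", "not accurate", "no evidence", "incorrect")

def accepts_misinformation(response: str) -> bool:
    """Return True if the response appears to accept the false premise."""
    lowered = response.lower()
    return not any(marker in lowered for marker in PUSHBACK_MARKERS)

def run_poison_suite(query_model) -> list:
    """Return the prompts whose responses uncritically accepted misinformation."""
    failures = []
    for prompt in POISON_PROMPTS:
        if accepts_misinformation(query_model(prompt)):
            failures.append(prompt)
    return failures
```

Run against a stub model that always agrees, the suite flags every prompt; a model that challenges the premise passes cleanly. The same structure works with any real client swapped in for the stub.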

To illustrate the point, Jones described how he fed a model the false premise that a company was “using dragons for long-distance transportation.” The model responded affirmatively. He then asked the model for information on long-distance transportation.

“The answer it gave me said, ‘this is what you need to do to work in long-haul transportation, because you will be working extensively with dragons, as you’ve already told me, so you need to undergo extensive fire and safety training,'” Jones related. “Princess etiquette training is also needed, because working with dragons means working with princesses. And then a lot of standard material on transportation and storage drawn from the rest of the solution.”

Also: From AI trainers to ethicists: AI can make some jobs obsolete, but create new ones

The point, Jones continued, is that with generative AI, “it’s never been easier to add a technology to your existing application incorrectly and pretend you’re doing it right. Generative AI is a phenomenal technology for just adding a few bells and whistles to an application, but really terrible from a security and risk perspective in production.”

Generative AI will reach widespread adoption within two to five years, which is rapid compared to other technologies. “Your challenge will be how to keep up,” Jones said. Two scenarios are commonly proposed, he noted. “The first is that there will be one great model that knows everything, and there will be no problems. This is known as wild optimism, and it is not a theory of what is going to happen.”

What is developing is that “every vendor, every software platform, every cloud will want to compete vigorously and aggressively to be part of this market,” Jones said. “That means there will be lots and lots of competition and lots and lots of variation. You don’t have to worry about multi-cloud infrastructure or having to support it, but you will have to think about things like guardrails.”

Also: 1 in 3 marketing teams have implemented AI in their workflows

Another risk is applying an LLM to tasks that require much less power and analysis, such as address matching, Jones said. “If you use a big model for everything, you’re basically burning money. It’s the equivalent of going to a lawyer and saying, ‘I want you to write me a birthday card.’ They’ll do it and they’ll charge you attorney fees.”
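The address-matching example is worth making concrete: a task like this needs string normalization, not an LLM call. The sketch below is illustrative only, assuming a handful of common abbreviations; a real system would use a dedicated address-standardization library.

```python
# Minimal sketch: address matching via normalization rather than an LLM.
# The abbreviation table is illustrative, not exhaustive.
import re

def normalize_address(addr: str) -> str:
    """Lowercase, collapse whitespace, and expand a few common abbreviations."""
    addr = addr.lower().strip()
    addr = re.sub(r"\s+", " ", addr)
    for short, full in {" st.": " street", " ave.": " avenue"}.items():
        addr = addr.replace(short, full)
    return addr.rstrip(".")

def addresses_match(a: str, b: str) -> bool:
    """Compare two addresses after normalization -- no model call required."""
    return normalize_address(a) == normalize_address(b)
```

A comparison like `addresses_match("123 Main St.", "123  main street")` resolves in microseconds at effectively zero cost, which is the point of Jones’s lawyer analogy.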

The key is to keep an eye out for cheaper and more efficient ways to leverage LLMs, he urged. “If something goes wrong, you need to be able to dismantle a solution as quickly as you can launch one. And you have to make sure that all the associated artifacts around it are released in step with the model.”

There is no such thing as deploying a single model: AI users must run their queries across multiple models to measure performance and response quality. “You have to have a common way to capture all the metrics and replay queries against different models,” Jones continued. “If you have people asking GPT-4 Turbo, you want to see how the same query performs in Llama. You should be able to have a mechanism to replay those queries and responses and compare performance metrics, so you can understand whether you can do it in a more economical way. Because these models are constantly updated.”
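The replay mechanism Jones describes can be sketched as a harness that runs the same prompts against multiple backends and records comparable metrics. Everything here is an assumption for illustration: the backend callables stand in for real clients (e.g. a GPT-4 Turbo call vs. a Llama call), and the per-word cost figures are invented.

```python
# Hedged sketch of a query-replay harness: run identical prompts against
# multiple model backends and collect comparable metrics per response.
import time

def replay_queries(queries, backends):
    """Run every query against every backend; return one metrics row each."""
    results = []
    for name, call in backends.items():
        for query in queries:
            start = time.perf_counter()
            response = call(query)
            results.append({
                "backend": name,
                "query": query,
                "latency_s": time.perf_counter() - start,
                "response_words": len(response.split()),
            })
    return results

def cheapest_backend(results, cost_per_word):
    """Pick the backend with the lowest estimated cost over the replayed set."""
    totals = {}
    for row in results:
        totals[row["backend"]] = totals.get(row["backend"], 0.0) + \
            row["response_words"] * cost_per_word[row["backend"]]
    return min(totals, key=totals.get)
```

Capturing metrics in a uniform shape is what makes the comparison possible: when a model is updated or a cheaper one appears, the same query set can be replayed and the cost/quality trade-off re-evaluated.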

Also: ChatGPT vs ChatGPT Plus: Is it still worth a paid subscription?

Generative AI “doesn’t fail in normal ways,” he added. “GenAI is where you put in an invoice and it says, ‘Great, here’s a 4,000-word essay about President Andrew Jackson, because I’ve decided that’s what you wanted.’ You need to have guardrails against that.”
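A basic guardrail for the invoice scenario above might validate that the model returned compact, structured data rather than free-form prose. This is a minimal sketch under invented assumptions: the required fields and the word-count threshold are illustrative, not a standard schema.

```python
# Sketch of an output guardrail: an invoice query should yield structured
# fields, not an essay. Field names and limits here are illustrative.
import json

REQUIRED_FIELDS = {"invoice_id", "total", "currency"}
MAX_WORDS = 200  # an essay-length answer is a red flag for this task

def passes_guardrail(raw_response: str) -> bool:
    """Reject responses that are not compact JSON with the expected fields."""
    if len(raw_response.split()) > MAX_WORDS:
        return False
    try:
        data = json.loads(raw_response)
    except ValueError:
        return False
    return isinstance(data, dict) and REQUIRED_FIELDS <= data.keys()
```

The check catches both failure modes Jones points at: off-task prose (not parseable as JSON, or far too long) and plausible-looking output missing required fields.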




