Generative AI is a new attack vector that puts businesses at risk, says CrowdStrike CTO


Skynesher/Getty Images

Cybersecurity researchers have been warning for quite some time that generative artificial intelligence (GenAI) are vulnerable to a wide range of attacks, from specially designed instructions that can break through protection barriers, to data leaks that can reveal confidential information.

The deeper the research goes, the more experts discover the extent to which GenAI constitutes a very broad risk, especially for business users with extremely sensitive and valuable data.

Also: Generative AI can easily turn malicious despite security barriers, academics say

“This is a new attack vector that opens up a new attack surface,” Elia Zaitsev, chief technology officer of cybersecurity vendor CrowdStrike, told ZDNET in an interview.

“I see that with generative AI, a lot of people are rushing to use this technology and bypassing the normal controls and methods” of secure computing, Zaitsev said.

“In many ways, you can think of generative AI technology as a new operating system or a new programming language,” Zaitsev said. “A lot of people don’t know what the pros and cons are, or how to use it properly, or how to secure it properly.”

The most infamous recent example of AI raising security issues is Microsoft’s Recall feature, which was originally going to be built into All new Copilot+ PCs.

Security researchers have shown that attackers who gain access to a PC with the Recall feature can view the entire history of an individual’s interaction with the PC, similar to what happens when a keylogger or other spyware is deliberately placed on machine.

“They’ve released a feature for consumers that is basically built-in spyware that copies everything you do to an unencrypted local file,” Zaitsev explained. “It’s a gold mine for adversaries to attack, breach the system and obtain all kinds of information.”

Also: US Auto Dealers Reel After Massive Cyberattack: Three Things Customers Should Know

After a violent reaction, Microsoft said it would disable the feature by default on PCs, making it an optional feature. Security researchers said there were still risks to the feature. Subsequently, The company said would not make Recall available as a preview feature on Copilot+ PCs, and now he says Let’s remember that “it will arrive soon through a post-launch Windows update.”

However, the threat is broader than a poorly designed application. The same problem of centralizing a lot of valuable information exists with all large language model (LLM) technology, Zaitsev said.

Crowdstrike-cto-elia-zaitsev-headshot

“I see a lot of people rushing to use this technology and bypassing the normal controls and methods” of secure computing, says Crowdstrike’s Elia Zaitsev.

Crowd on strike

“I call it naked LLMs,” he said, referring to large language models. “If I train a bunch of sensitive information, put it into a big language model, and then make that big language model directly accessible to an end user, then you can use fast injection attacks where you can have it basically dump all of the training information, including the information that is sensitive.”

Enterprise technology executives have expressed similar concerns. an interview this month Writing in technology newsletter The Technology Letter, data storage provider Pure Storage CEO Charlie Giancarlo commented that LLMs are “not yet ready for enterprise infrastructure.”

Giancarlo mentioned the lack of “role-based access controls” in LLMs. The programs will allow anyone to access an LLM’s message and discover sensitive data that has been absorbed with the model training process.

Also: Cybercriminals are using Meta’s Llama 2 AI, according to CrowdStrike

“Right now there are no good controls,” Giancarlo said.

“If I asked an AI robot to write my earnings script, the problem is that it could provide data that only I could have,” the CEO explained, “but once you teach the robot, it couldn’t forget it, and then someone else, before disclosure, might ask, ‘What will Pure’s profits be?’ and I would tell him.” Revealing information about company earnings before scheduled disclosure can lead to insider trading and other securities violations.

GenAI programs, Zaitsev said, are “part of a broader category that we could call non-malware intrusions,” where there is no need to invent malicious software and place it on a target computer system.

Cybersecurity experts call this malware-free code “living off the land,” Zaitsev said, and uses vulnerabilities inherent in a software program by design. “You’re not bringing in anything external, you’re just taking advantage of what’s built into the operating system.”

A common example of living off the land includes SQL injectionwhere the structured query language used to query an SQL database can be designed with certain character sequences to force the database to take steps that would normally be blocked.

Similarly, LLMs are themselves databases, as the primary function of a model is “just super-efficient data compression” that effectively creates a new data store. “It’s very analogous to SQL injection,” Zaitsev said. “It’s a fundamental negative property of these technologies.”

However, Gen AI technology is not something we should abandon. It has its value if it can be used carefully. “I’ve seen firsthand some pretty spectacular successes with [GenAI] “The technology is very advanced,” Zaitsev said. “And we are already using it very effectively on a customer-facing basis with Charlotte AI,” Crowdstrike’s support program that can help automate some security functions.

Also: Enterprise cloud security failures ‘worrying’ as AI threats accelerate

Techniques to mitigate risk include validating the user’s message before it reaches an LLM and then validating the response before sending it back to the user.

“Users are not allowed to pass uninspected indications directly to the LLM,” Zaitsev said.

For example, a “naked” LLM can search directly in a database that it has access to through “RAG” or, Retrieval Augmented Generation, An increasingly common practice to take the user’s message and compare it to the contents of the database. This extends the LLM’s ability to reveal not only sensitive information that has been compressed by the LLM, but also the entire repository of sensitive information in those external sources.

baidu-2024-rag-outline.png

RAG is a general methodology that allows an LLM to access a database.

Baidu

The key is not to allow naked LLM to directly access data stores, Zaitsev said. In a sense, you need to tame RAG before it makes the problem worse.

“We take advantage of the property of LLMs where the user can ask an open-ended question, and then we use that to decide what they’re trying to do, and then we use more traditional programming technologies” to complete the query.

“For example, Charlotte AI, in many cases, allows the user to ask a generic question, but then what Charlotte does is identify which part of the platform, which data set has the source of truth, and then extract information to answer the question. question” via an API call instead of allowing LLM to query the database directly.

Also: AI is changing cybersecurity and companies must be aware of the threat

“We have already invested in building this robust platform with API and search capability, so we do not need to rely too much on the LLM, and we are now minimizing the risks,” Zaitsev said.

“The important thing is that you have blocked these interactions, not that they are completely open.”

Beyond point-of-notice misuses, the fact that GenAI can leak training data is a very broad concern for which appropriate controls must be found, Zaitsev said.

“Are you going to put your social security number in a message that you then send to a third party that you have no idea is now training your social security number into a new LLM that someone could leak via an injection attack?”

“Privacy, personally identifiable information, knowing where your data is stored and how it’s protected — those are all things people should keep in mind when developing Gen AI technology and using other vendors that use that technology.”





Source link