AI chatbots can be tricked by hackers into stealing your data

Security researchers have uncovered a new flaw in some AI chatbots that could have allowed hackers to steal personal information from users.

A group of researchers from the University of California, San Diego (UCSD) and Nanyang Technological University in Singapore discovered the flaw, which they have nameed "Imprompter", which uses a clever trick to hide malicious instructions within seemingly-random text.

As the "Imprompter: Tricking LLM Agents into Improper Tool Use" research paper explains, the malicious prompt looks like gibberish to humans but contains hidden commands when read by LeChat (a chatbot developed by French AI company Mistral AI) and Chinese chatbot ChatGLM.

The hidden commands instructed the AI chatbots to extract personal information the user has shared with the AI, and secretly send it back to the hacker - without the AI user realising what was happening.

The researchers discovered that their technique had a nearly 80 percent success rate at extracting personal data

In examples of possible attack scenarios described in the research paper, the malicious prompt is shared by the attacker with the promise that it will help "polish your cover letter, resume, etc..."

When a potential victim tries to use the prompt with their cover letter (in this example, a job application)...

... the user does not see the resulted they hoped for.

But, unknown to them, personal information contained in the job application cover letter (and the user's IP address) is sent to a server under the attacker's control.

"The effect of this particular prompt is essentially to manipulate the LLM agent to extract personal information from the conversation and send that personal information to the attacker’s address," Xiaohan Fu, a computer science PhD student at UCSD and the lead author of the research, told Wired. "We hide the goal of the attack in plain sight."

The good news is that there is no evidence that malicious attackers have used the technique to steal personal information from users. The bad news is that the chatbots weren't aware of the technique, until it was pointed out to them by the researchers.

Mistral AI, the company behind LeChat, were informed about the security vulnerability by the researchers last month, and described it as a "medium-severity issue" and fixed the issue on September 13, 2024.

According to the researchers, hearing back from the ChatGLM team proved to be more difficult. On 18 October 2024 "after multiple communication attempts through various channels", ChatGLM responded to the researchers to say that they had begun working on resolving the issue.

AI chatbots that allow users to input arbitrary text are prime candidates for exploitation, and as more and more users become comfortable with using large language models to follow their instructions the opportunity for AI to be tricked into performing harmful actions increases.

Users would be wise to limit the amount of personal information that they share with AI chatbots. In the above example, it would not be necessary - for instance - to use your real name, address, and contact information to have your job application cover letter rewritten.

In addition, users should be wary of copying-and-pasting prompts from untrusted sources. If you don't understand what it does, and how it does it, you might be more sensible to steer clear.