I agree with critics of the letter who say that worrying about future risks distracts us from the very real harms AI is already causing today. Biased systems are used to make decisions about people’s lives that trap them in poverty or lead to wrongful arrests. Human content moderators have to sift through mountains of traumatizing AI-generated content for only $2 a day. Language AI models use so much computing power that they remain huge polluters.
But the systems that are being rushed out today are going to cause a different kind of havoc altogether in the very near future.
I just published a story that sets out some of the ways AI language models can be misused. I have some bad news: It’s stupidly easy, it requires no programming skills, and there are no known fixes. For example, for a type of attack called indirect prompt injection, all you need to do is hide a prompt in a cleverly crafted message on a website or in an email, in white text that (against a white background) is not visible to the human eye. Once you’ve done that, you can order the AI model to do what you want.
Tech companies are embedding these deeply flawed models into all sorts of products, from programs that generate code to virtual assistants that sift through our emails and calendars.
In doing so, they are sending us hurtling toward a glitchy, spammy, scammy, AI-powered internet.
Allowing these language models to pull data from the internet gives hackers the ability to turn them into “a super-powerful engine for spam and phishing,” says Florian Tramèr, an assistant professor of computer science at ETH Zürich who works on computer security, privacy , and machine learning.
Let me walk you through how that works. First, an attacker hides a malicious prompt in a message in an email that an AI-powered virtual assistant opens. The attacker’s prompt asks the virtual assistant to send the attacker the victim’s contact list or emails, or to spread the attack to every person in the recipient’s contact list. Unlike the spam and scam emails of today, where people have to be tricked into clicking on links, these new kinds of attacks will be invisible to the human eye and automated.
This is a recipe for disaster if the virtual assistant has access to sensitive information, such as banking or health data. The ability to change how the AI-powered virtual assistant behaves means people could be tricked into approving transactions that look close enough to the real thing, but are actually planted by an attacker.