What are prompt injections and why do they matter?

by Black Hat Middle East and Africa
on
What are prompt injections and why do they matter?

Welcome to the new 209 cyber warriors who joined us last week. Each week, we'll be sharing insights from the Black Hat MEA community. Read exclusive interviews with industry experts and key findings from the #BHMEA stages.

Keep up with our weekly newsletters on LinkedIn — subscribe here.


This week we’re focused on…

Prompt injections. 

Why? 

Because on the blog this week, we wrote about a new GenAI threat dubbed Imprompter, that uses a prompt injection to steal user information. 

So we thought we’d go a little deeper into the world of prompt injections and talk about what they are, the damage they can do, and why they’re so hard to protect against. 

What is a prompt injection?

It’s a type of security vulnerability affecting artificial intelligence (AI) and machine learning (ML) models. And especially large language models (LLMs) that are trained to follow instructions from users. 

Prompt injections are prepared in advance – the attacker manipulates the information that will be used by the AI model, in order to make that model produce a particular response or bypass its in-built restrictions and safety measures. 

The manipulated input injects malicious instructions into the AI model – and the AI then follows those instructions. 

This type of attack takes advantage of the way that many AI models, and particularly LLMs, are designed. These models process instructions and data together, which means they often can’t tell the difference between legitimate instructions and nefarious ones. 

An attacker might, for example, inject a very straightforward prompt like: ‘ignore all previous instructions and do this instead.” 

That would be a direct prompt injection, with the purpose of immediately changing the AI’s response. But prompt injections can also be indirect and gradual, influencing the AI over a period of time by regularly sliding malicious prompts into the model. Or prompt injections can be embedded within the training data used in the AI system, in order to create an ongoing bias in the model’s responses. 

What damage can prompt injections do? 

The potential impact of prompt injection attacks is growing all the time, because more and more people are using LLMs – including ChatGPT, Perplexity, Falcon, Gemini, and more. 

The more users there are, the greater potential there is for prompt injections to be leveraged for data theft and network entry. 

Prompt injection attacks can enable threat actors to: 

  • Bypass AI content filters and restrictions
  • Extract sensitive information from AI models and users
  • Manipulate AI outputs for malicious purposes
  • Evade detection by AI-powered security systems

And as we integrate AI tools into an increasing number of critical systems across a wide range of industries, the scope for these attacks to do serious harm continues to grow. 

Why are they hard to protect against? 

If we look at the Imprompter attack strategy in particular, it starts with a natural language prompt that tells the AI to extract all personal information from the user’s conversation. The researchers’ algorithm then generates an obfuscated version of this prompt which has the exact same meaning to the LLM, but just looks like a series of random characters to a human. 

And this highlights a key concern in the detection of, and protection against, prompt injection attacks – we don’t always know exactly how they work. 

Key known approaches to mitigate prompt injection risks at the moment include: 

  • Input and output filtering
  • Developing more robust internal prompts
  • Implementing techniques like reinforcement learning from feedback provided by humans
  • Implementing ongoing security checks to detect injections
  • Creating separation between user inputs and system instructions 

Join the conversation 

How can the field of cybersecurity work to overcome the threat of prompt attacks? Open this newsletter on LinkedIn and share your perspective in the comment section. 

See you at Black Hat MEA 2024. 


Do you have an idea for a topic you'd like us to cover? We're eager to hear it! Drop us a message and share your thoughts. Our next newsletter is scheduled for 13 November 2024.

Catch you next week,
Steve Durning
Exhibition Director

Join us at Black Hat MEA 2024 to grow your network, expand your knowledge, and build your business

Share on

Join newsletter

Join the newsletter to receive the latest updates in your inbox.


Follow us


Topics

Sign up for more like this.

Join the newsletter to receive the latest updates in your inbox.

Related articles