Can you stop your data from being used to train AI?

by Black Hat Middle East and Africa

Can you stop AI models from using and repurposing your digital content? 

The short answer is yes – sort of. 

The reality of digital life today is that anything you’ve ever published online has probably been ingested by AI – and perhaps regurgitated by now. LLMs and AI image generators are trained on immense volumes of user data scraped from across the internet, and as newer, more advanced generative AI systems become commercially available, their appetite for data will only grow.

Companies and digital users need to learn how AI uses their content, and what they can do to prevent it from being hoovered up by hungry AI models. 

You can’t backdate your data protection 

The internet has already been scraped by AI developers. It’s highly likely that anything you’ve posted in the past has already been used in AI training.

Going forward, though, it’s worth understanding that while a growing number of AI companies do offer an opt-out that lets you say no to your data being used in AI training, many of them make that process difficult to find and complete.

Tech giants including Meta, Google, and X have included clauses in their privacy policies that say they may use your data to train their AI models. And once data has been used, it’s very difficult to get it removed from the pools of data the AI learns from. 
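
There is, however, one forward-looking control available to anyone who publishes on their own website: robots.txt. As a minimal sketch – assuming the user-agent tokens below (GPTBot for OpenAI, ClaudeBot for Anthropic, Google-Extended for Google’s AI training, and CCBot for Common Crawl) are still the ones those vendors publish, which you should verify against their current documentation – a robots.txt at your site’s root might look like this:

```
# robots.txt – ask known AI training crawlers not to scrape this site.
# User-agent tokens current as of writing; check each vendor's docs.

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /
```

You can sanity-check the result with Python’s standard library (example.com below is a placeholder for your own domain):

```python
# Verify that a site's robots.txt blocks known AI training crawlers.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser("https://example.com/robots.txt")  # placeholder domain
rp.read()

for agent in ("GPTBot", "ClaudeBot", "Google-Extended", "CCBot"):
    status = "allowed" if rp.can_fetch(agent, "https://example.com/") else "blocked"
    print(f"{agent}: {status}")
```

Bear in mind that robots.txt is a voluntary convention: well-behaved crawlers honour it, but it provides no technical enforcement, and it only affects future crawls – not data that has already been collected.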

Companies that enable you to keep your content out of AI training data 

Although their opt-out processes may be unclear, a growing number of companies do enable users to keep their content out of AI training data. 

As explained in detail by Wired, those companies include: 

  1. Adobe only analyses content to train GenAI models if it’s submitted to the Adobe Stock marketplace. And you can opt out of this via Adobe’s privacy page.
  2. Amazon enables customers to opt out of AI training – and the process has been made easier for users recently, too.
  3. Figma automatically opts you out of AI data training if you have an ‘Organisation’ or ‘Enterprise’ plan with the design software. If you have a ‘Starter’ or ‘Professional’ plan, you can opt out via the AI tab in settings.
  4. Google’s Gemini chatbot feeds conversations into AI training, but you can opt out easily by opening the LLM in your browser and heading to the ‘Activity’ tab.
  5. Grammarly enables personal accounts to opt out of AI training through their account settings.
  6. LinkedIn has been feeding user posts into its AI training data, but now enables users to opt out from new posts being used via the ‘Data Privacy’ settings in their profile. 

Will you always have to opt out of everything?  

As coherent AI governance emerges from this period of rapid growth, it’s likely that more companies will either choose, or be required, to ask users to opt in to their content being used in AI training – rather than automatically including it unless the user actively opts out. 

Companies already doing this include Anthropic, with its ‘frontier AI models backed by uncompromising integrity.’ Anthropic uses your data to train its AI model, Claude, only when you explicitly grant permission for it to do so. 

Ultimately, the future of data privacy in an AI-powered digital landscape remains uncertain. For now, awareness is crucial: if you know whether your content is being used, and how, you can make informed decisions to protect it.

Join us at Black Hat MEA 2024 and discover how to improve your organisation’s cyber resilience.
