Organizations Put Sensitive Data at Risk in AI Model Training

Real, identifiable data should never be used to train AI models, Perforce says.

Although 60% of organizations report experiencing data breaches or data theft in software development, AI, and analytics environments – up 11% from last year – most still say sensitive data should be allowed in AI training and testing, according to the 2025 State of Data Compliance and Security Report from Perforce.

Securing sensitive data is crucial for maintaining data compliance, yet the report highlights contradictions that suggest "a lack of shared understanding or consistent guidance within organizations about the safety of using sensitive data in AI model training and initiatives."

For example:

  • 91% of respondents say sensitive data should be allowed in AI training and testing.
  • 91% are very or extremely concerned about data breaches or theft of sensitive data in non-production environments.
  • 82% say it’s safe to use sensitive data in AI model training.
  • 84% of organizations allow data compliance exceptions in non-production, despite the threat of compromise.
  • 68% worry about privacy and compliance audits.

"Using sensitive data in AI poses tremendous risks,” the report notes, “including regulatory violations, data re-identification, breach and theft, and unpredictable re-sharing of data by models. All of these can cause reputational damage and financial loss."

When asked "What is preventing your organization from protecting all your sensitive data in non-production?" survey respondents cited the following reasons:

  • Slows down innovation (61%)
  • The quality of data is degraded (54%)
  • It’s too big of an effort (20%)
  • It’s too difficult to locate all the sensitive data (20%) – see the sketch below
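That last obstacle – locating sensitive data in the first place – is where automated discovery usually starts. As a minimal illustrative sketch (the patterns and record format below are assumptions for demonstration, not anything taken from the Perforce report), even a simple pattern scan can flag obvious PII before a dataset is copied into a non-production environment:

```python
import re

# Common PII patterns -- illustrative only; real discovery tools cover
# far more categories (names, addresses, national ID formats, etc.).
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "us_phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def scan_record(record: str) -> dict[str, list[str]]:
    """Return every PII match found in a single text record."""
    return {
        label: matches
        for label, pattern in PII_PATTERNS.items()
        if (matches := pattern.findall(record))
    }

if __name__ == "__main__":
    sample = "Contact Jane at jane.doe@example.com or 555-867-5309."
    for label, hits in scan_record(sample).items():
        print(f"{label}: {hits}")
```

Pattern matching of this kind only catches well-formed identifiers; purpose-built discovery tools add dictionaries, checksums, and context analysis to reduce false negatives.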

"AI models don’t forget inputs, so sensitive customer information should never enter these pipelines in the first place," the report says.

Learn more at Perforce Software.

10/06/2025
