Organizations Put Sensitive Data at Risk in AI Model Training

Real, identifiable data should never be used to train AI models, Perforce says.

Although 60% of organizations report experiencing data breaches or theft in software development, AI, and analytics environments (up 11% from last year), most still say sensitive data should be allowed in AI training and testing, according to the 2025 State of Data Compliance and Security Report from Perforce.

Securing sensitive data is crucial for maintaining data compliance, yet the report highlights contradictions that suggest "a lack of shared understanding or consistent guidance within organizations about the safety of using sensitive data in AI model training and initiatives."

For example:

  • 91% of respondents say sensitive data should be allowed in AI training and testing.
  • 91% are very or extremely concerned about data breaches or theft of sensitive data in non-production environments.
  • 82% say it’s safe to use sensitive data in AI model training.
  • 84% of organizations allow data compliance exceptions in non-production, despite the threat of compromise.
  • 68% worry about privacy and compliance audits.

"Using sensitive data in AI poses tremendous risks,” the report notes, “including regulatory violations, data re-identification, breach and theft, and unpredictable re-sharing of data by models. All of these can cause reputational damage and financial loss."

When asked "What is preventing your organization from protecting all your sensitive data in non-production?" survey respondents cited the following reasons:

  • Slows down innovation (61%)
  • The quality of data is degraded (54%) (see the pseudonymization sketch after this list)
  • It’s too big of an effort (20%)
  • It’s too difficult to locate all the sensitive data (20%)
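
The data-quality objection usually stems from naive masking, which breaks joins and lookups across datasets. Deterministic pseudonymization avoids that: the same input always maps to the same token, so referential integrity survives. The following minimal sketch illustrates the idea; the key, field names, and sample record are illustrative assumptions, not anything from the Perforce report.

```python
import hashlib
import hmac

# Hypothetical secret for illustration; in practice this would come
# from a key management service, not a hardcoded constant.
PSEUDONYM_KEY = b"replace-with-a-managed-secret"

def pseudonymize(value: str) -> str:
    """Deterministically replace a sensitive value with a stable token.

    The same input always yields the same token, so joins and
    aggregations across tables still line up. The mapping is one-way:
    without the key, the original value cannot be recovered.
    """
    digest = hmac.new(PSEUDONYM_KEY, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]

# Illustrative record; only identifying fields are tokenized.
record = {"customer_id": "C-1002", "email": "jane@example.com", "balance": 412.07}
safe_record = {
    "customer_id": pseudonymize(record["customer_id"]),
    "email": pseudonymize(record["email"]),
    "balance": record["balance"],  # non-identifying fields pass through unchanged
}
print(safe_record)
```

Where downstream systems expect realistic-looking values (valid email syntax, credit-card check digits), format-preserving techniques can be substituted for the raw hash, at the cost of more tooling.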

"AI models don’t forget inputs, so sensitive customer information should never enter these pipelines in the first place," the report says.

Learn more at Perforce Software.

10/06/2025
