
Securing AI Training Data: How Bedrock Scans AWS EFS for Sensitive Information

June 11, 2025 | 4 min read

Praveen Yarlagadda

Founding Engineer


Securing your training data is essential, not just for compliance, but for model integrity and brand protection. If you’re storing AI training datasets in Amazon EFS, how do you ensure those volumes aren’t quietly harboring PII, credentials, or other sensitive content that adds risk to the organization?

Bedrock Security’s EFS scanning architecture is purpose-built for modern cloud-native environments. Unlike other approaches, it offers ephemeral, horizontally scalable, privacy-preserving scanning that ensures data does not leave the customer’s environment. Here’s how it works, and why it’s different.

Why EFS Scanning Matters

AI training pipelines are increasingly dynamic and decentralized. Data scientists and ML engineers often mount EFS access points across multiple subnets for performance and scale, but this introduces risk and raises several questions:

  • Is there sensitive data buried in your feature stores?
  • Are files being reused from legacy workloads?
  • Who has access, and are those controls auditable?

Without contextual scanning and visibility, you’re flying blind.

How Bedrock Scans AWS EFS, and What Makes It Unique

1. Dedicated, Isolated VPC for Bedrock Scanners

Bedrock deploys scanners into a purpose-built, ephemeral VPC environment:

  • Private subnets host the scanning workloads
  • A NAT gateway securely handles outbound metadata transfer
  • VPC peering connects Bedrock to customer EFS-hosting VPCs

🔍 What’s Unique:

Most scanning tools require deployment within the customer’s VPC or demand heavy IAM privileges. Bedrock flips that model, isolating scanning infrastructure from your core environment, reducing blast radius, and simplifying compliance reviews.
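
To make the peering setup concrete, here’s a minimal boto3 sketch of how a peering connection from a dedicated scanner VPC to a customer’s EFS-hosting VPC could be requested, accepted, and routed. This is an illustration, not Bedrock’s actual provisioning code; the VPC IDs, account ID, route table, and CIDR block are hypothetical placeholders.

```python
# Illustrative sketch only: peering an isolated scanner VPC with a customer VPC.
# All IDs and the CIDR block below are hypothetical placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Request peering from the isolated scanner VPC to the EFS-hosting VPC.
peering = ec2.create_vpc_peering_connection(
    VpcId="vpc-scanner000000000",        # hypothetical scanner VPC
    PeerVpcId="vpc-customer00000000",    # hypothetical customer VPC
    PeerOwnerId="111122223333",          # hypothetical customer account ID
)
pcx_id = peering["VpcPeeringConnection"]["VpcPeeringConnectionId"]

# The customer side accepts the request (run with customer-account credentials).
ec2.accept_vpc_peering_connection(VpcPeeringConnectionId=pcx_id)

# Route traffic destined for the customer VPC's CIDR through the peering link.
ec2.create_route(
    RouteTableId="rtb-scanner0000000000",  # hypothetical private-subnet route table
    DestinationCidrBlock="10.20.0.0/16",   # hypothetical customer VPC CIDR
    VpcPeeringConnectionId=pcx_id,
)
```

Because the scanners live on the Bedrock side of the peering link, the customer’s route tables and security groups stay the boundary they already control.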

2. Scoped Access Point-Based Scanning

Each scanner mounts a single EFS access point with precise IAM scope:

  • Metadata is read recursively and securely sent to Bedrock SaaS Service
  • No full file transfer; raw data never leaves the customer’s account
  • Bedrock SaaS Service performs AI-based classification and tagging

🔍 What’s Unique:

Access-point-level scanning allows fine-grained visibility without compromising data boundaries. Traditional scanners operate at the file system level, introducing broad access risk. Bedrock’s access-point isolation maintains a tight and verifiable security posture.
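
As a rough illustration of what “precise IAM scope” can look like, here is a hypothetical boto3 sketch (not Bedrock’s implementation) that creates a read-scoped access point and an IAM policy allowing mounts only through that access point. The file system ID, directory path, POSIX IDs, and ARNs are placeholders.

```python
# Illustrative sketch: one access point per dataset directory, plus an IAM policy
# that only permits mounting through that specific access point.
import json
import boto3

efs = boto3.client("efs", region_name="us-east-1")

# A narrow access point keeps the scanner's view limited to one dataset path.
ap = efs.create_access_point(
    FileSystemId="fs-0123456789abcdef0",           # hypothetical file system
    PosixUser={"Uid": 10001, "Gid": 10001},        # unprivileged scan identity
    RootDirectory={"Path": "/training-data/featurestore-v2"},  # hypothetical path
)
ap_arn = ap["AccessPointArn"]

# IAM policy for the scanner role: mount (read-only) via this access point only.
scanner_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["elasticfilesystem:ClientMount"],  # no ClientWrite, no root access
        "Resource": "arn:aws:elasticfilesystem:us-east-1:111122223333:file-system/fs-0123456789abcdef0",
        "Condition": {"StringEquals": {"elasticfilesystem:AccessPointArn": ap_arn}},
    }],
}
print(json.dumps(scanner_policy, indent=2))
```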

3. Massively Parallel, Horizontally Scalable Architecture

Need to scan 100 access points? Bedrock spins up 100+ ephemeral and cost-optimized serverless scanners:

  • Auto-scales based on available access points and SLA needs
  • Workloads complete independently with no infrastructure left behind
  • Works seamlessly across dev, staging, and production environments

🔍 What’s Unique:

This isn’t just scalable; it’s ephemeral. Bedrock scanners are serverless functions, not persistent EC2 instances or containers. That means zero ops overhead, no long-lived credentials, and a vastly smaller attack surface than alternatives.
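
One way to picture this fan-out, purely as an illustrative sketch rather than Bedrock’s actual code, is a short script that enumerates the access points on a file system and dispatches one asynchronous Lambda invocation per access point. The function name and file system ID are hypothetical.

```python
# Illustrative fan-out: one ephemeral, fire-and-forget scanner per access point.
import json
import boto3

efs = boto3.client("efs", region_name="us-east-1")
lam = boto3.client("lambda", region_name="us-east-1")

file_system_id = "fs-0123456789abcdef0"  # hypothetical file system

# Enumerate every access point on the file system to be scanned.
access_points = efs.describe_access_points(FileSystemId=file_system_id)["AccessPoints"]

for ap in access_points:
    # Asynchronous ("Event") invocation: each scanner mounts exactly one
    # access point, reads metadata, reports, and terminates.
    lam.invoke(
        FunctionName="efs-metadata-scanner",      # hypothetical function name
        InvocationType="Event",
        Payload=json.dumps({"accessPointArn": ap["AccessPointArn"]}),
    )

print(f"Dispatched {len(access_points)} ephemeral scanners")
```

Scanning 100 access points dispatches 100 independent invocations; when the last one finishes, nothing is left running.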

Architecture Overview

Customer AWS Account Scanning Workflow

The scanning workflow:

  1. Bedrock Outpost deploys inside an isolated VPC
  2. Each scanner mounts an EFS access point
  3. File metadata is read and transmitted securely to Bedrock SaaS Service
  4. Bedrock enriches, classifies, and maps sensitivity to risk, usage, and entitlements

The architecture integrates directly with Bedrock’s Metadata Lake, enabling cross-system correlation and automated tagging for future policy enforcement or lifecycle governance.
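
For a sense of what the metadata-only step could look like inside a scanner, here is an illustrative sketch. It assumes the access point is mounted at /mnt/efs and uses example record fields, not Bedrock’s actual schema; the point is that names, sizes, and timestamps are collected while file contents are never read or transmitted.

```python
# Illustrative sketch of metadata-only collection inside an ephemeral scanner.
import os
from datetime import datetime, timezone

MOUNT_PATH = "/mnt/efs"  # hypothetical mount point for the single EFS access point

def collect_metadata(root: str):
    """Walk the mounted tree and yield one metadata record per file."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            yield {
                "path": os.path.relpath(full, root),
                "size_bytes": st.st_size,
                "modified": datetime.fromtimestamp(st.st_mtime, timezone.utc).isoformat(),
            }

records = list(collect_metadata(MOUNT_PATH))
# In the workflow described above, records like these are sent securely (via the
# NAT gateway) to the Bedrock SaaS service for classification and tagging.
print(f"Collected metadata for {len(records)} files")
```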

Why Not Just Scan Inside My VPC?

Some homegrown tools and open-source scripts do this, but they come at a cost:

  • You inherit all runtime and IAM complexity
  • There's no isolation between scanning logic and production systems
  • Managing secrets, logging, and lateral movement risks becomes your problem

Bedrock’s isolated approach removes that operational burden, without sacrificing control or visibility.

Why This is Different: Bedrock vs. Legacy Scanning Tools

Wrapping Up

Securing AI training pipelines is not just about compliance; it’s about trust. Bedrock’s EFS scanning capability offers a smarter, safer way to discover and manage sensitive data embedded in AI workloads.

If you're running multi-tenant training pipelines, staging AI data from internal systems, or inheriting legacy datasets, you need to know what's in those files. Bedrock lets you find out, at scale, without slowing anything down or risking a breach.

👉 Explore our platform or schedule a demo to see how EFS scanning fits into your broader data security strategy.


