Bedrock Data: A Data Context Engine Built for AWS

Bedrock Data is a data security platform built for AWS. It runs entirely inside the customer’s environment, where it discovers and classifies structured, semi-structured, and unstructured data across nearly every AWS data store. That inventory is the starting point, not the product.

The product is context. For every finding, Bedrock Data answers the questions that turn visibility into security. Who can actually reach this data once every IAM layer is evaluated together? Where did the data come from, and where did its copies go? Which vulnerable infrastructure can touch it? Which exposure carries the most business risk? What does policy permit? And what could an AI agent built on Amazon Bedrock reveal in a response?

Answering those questions requires modeling AWS itself: its policy evaluation logic, its storage layouts, its enforcement primitives, and its AI services. Because classification, effective access, impact scoring, and policy evaluation are built as one system that feeds itself, Bedrock Data goes beyond the point-in-time inventories and hand-written rules of incumbent tools. It tells you not just where your data is, but whether it is safe.

Why Context, Not Just Visibility

Every data security posture management (DSPM) tool can tell you that a bucket contains Social Security numbers. That is visibility, but visibility alone is not security. Security lives in everything around that fact: who can reach the data, how it got there, which copies exist, what vulnerable infrastructure can touch it, what policy says about it, and what an AI agent could surface in a response.

Context cannot be bolted on through a connector, because each of those questions requires modeling the platform itself. Take the question of who can reach a bucket. In AWS, the answer lives in the interaction of identity policies, resource policies, service control policies (SCPs), permission boundaries, and trust relationships. No single document records it. A tool that reads policy documents one at a time can summarize each document, but it cannot compute the answer. Bedrock Data models AWS at that depth.

What Bedrock Data does: Models AWS policy logic, storage layouts, enforcement primitives, and AI services directly, so context is native to the platform rather than inferred from a connector reading documents one at a time.

The Foundation: Complete, Intelligent Visibility

AWS workloads spread data widely: across managed services such as DynamoDB, DocumentDB, RDS, and Redshift, across unmanaged databases that teams stand up on EC2, and across S3 and EFS. Incumbent approaches fall short at this breadth. Amazon Macie scans S3 only, struggles to scale cost-effectively to multi-petabyte environments, and cannot interpret nuanced enterprise data.

Bedrock Data scans structured, semi-structured, and unstructured stores across nearly all AWS data workloads. It reads managed relational databases through snapshot side-scanning against a temporary clone, so production never feels the load, and reads other stores through native read-only APIs that self-throttle. All analysis runs inside the customer’s environment through the Bedrock Data Outpost, a Lambda-based architecture that uses CPU-bound techniques rather than costly GPU inference. Only metadata features travel back to the control plane, where they sustain the long-tail robustness of classification and topic detection across customers. Customer data itself never leaves the environment.

Scale is where most scanners break, because large data sources are highly partitioned. Data lakes and observability logs repeat the same structure across millions of objects, so reading every byte mostly re-confirms patterns the scanner has already seen. Bedrock Data’s patented Adaptive Sampling clusters partitioned objects by folder structure, file structure, and data type into one logical dataset, then scans a representative sample, cutting scan costs by a factor of 10 to 100 or more compared with brute-force full reads. Scan depth is tunable, so a team can still run a full read when it needs one. The approach holds at extremes: Bedrock Data was the only vendor able to scan an APAC retailer’s AWS environment of more than 200 petabytes within a week, while the retailer’s incumbent DSPM failed repeatedly attempting the same scan.
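The clustering idea can be illustrated with a small sketch. This is not Bedrock Data's patented implementation: the normalization rules, the function names partition_signature and sample_by_cluster, and the per-cluster sample size are all assumptions chosen for illustration. It groups object keys whose paths differ only in partition values, then samples a few objects from each group instead of reading them all.

```python
import random
import re
from collections import defaultdict


def partition_signature(key: str) -> str:
    """Collapse partition-specific path segments so sibling objects share one signature."""
    *folders, filename = key.split("/")
    normalized = []
    for folder in folders:
        if "=" in folder:
            # Hive-style partition (dt=2024-01-01): keep the name, drop the value.
            normalized.append(folder.split("=", 1)[0] + "=*")
        elif re.fullmatch(r"[\d\-]+", folder):
            # Purely numeric or date-like folder names vary per partition.
            normalized.append("*")
        else:
            normalized.append(folder)
    # The file extension stands in for file structure in this sketch.
    extension = filename.rsplit(".", 1)[-1] if "." in filename else ""
    normalized.append("*." + extension)
    return "/".join(normalized)


def sample_by_cluster(keys, per_cluster=2, seed=0):
    """Group keys by signature, then pick a small random sample from each group."""
    clusters = defaultdict(list)
    for key in keys:
        clusters[partition_signature(key)].append(key)
    rng = random.Random(seed)
    return {
        signature: rng.sample(members, min(per_cluster, len(members)))
        for signature, members in clusters.items()
    }


# Hypothetical object keys: three log partitions and one export.
keys = [
    "logs/dt=2024-01-01/part-0001.parquet",
    "logs/dt=2024-01-02/part-0002.parquet",
    "logs/dt=2024-01-03/part-0003.parquet",
    "exports/2024/05/users.csv",
]
samples = sample_by_cluster(keys)
```

Here the three log partitions collapse into one logical dataset and only two of them are selected for reading. Raising per_cluster to the cluster size corresponds to a full read.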

Understanding the data is the second half of the foundation, and it is where rules-based classification falters. Macie matches content against managed identifiers plus hand-written regex, and because the matching knows nothing about why the data exists, it produces false positives and false negatives alike: a rule for account numbers flags internal cost-center codes as urgently as customer financial accounts, while proprietary identifiers that match no published pattern never surface at all. Bedrock Data’s topic modeling instead infers each dataset’s data domain and business purpose, so a finding reads as data of this type, used for this purpose, likely owned by this team. A cost-center code stops resembling a customer account once the model understands the dataset around it, and inference surfaces the proprietary identifiers no rules-based classifier had a rule for.

What Bedrock Data does: Discovers data across nearly all AWS stores while customer data never leaves the environment, uses patented Adaptive Sampling to cut scan costs by a factor of 10 to 100 (proven on a 200+ petabyte environment scanned within a week), and replaces brittle regex with topic modeling that understands what each dataset is and who owns it.

Who Can Reach It?

Risk lives at the intersection of data and reach, and in AWS the reach question has no single answer document. Effective access is the net of granting layers and capping layers evaluated together. Identity-based and resource-based policies grant; SCPs, permission boundaries, and session policies cap and never grant. A bucket policy can read as a clean grant of s3:GetObject while an SCP denies it, and the effective answer is deny. The inverse trap exists too: a resource policy granting directly to a user produces an allow the user’s identity policy never mentions. Cross-account access doubles the work: AWS evaluates the calling account and the resource account separately and allows only when both agree, and every hop in an AssumeRole chain multiplies the checks.
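The grant-and-cap logic above can be sketched as a small decision function. This is a deliberately simplified, same-account model rather than AWS's full evaluation algorithm or Bedrock Data's engine: it reduces each policy to a set of action names, ignores resources, conditions, and the documented exceptions for some resource-policy grants, and every name in it (Layers, effective_access) is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Layers:
    """Each policy layer reduced to the set of actions it allows or denies."""
    identity_allow: Set[str] = field(default_factory=set)   # granting layer
    resource_allow: Set[str] = field(default_factory=set)   # granting layer
    explicit_deny: Set[str] = field(default_factory=set)    # Deny statements from any layer
    # Capping layers: None means the layer is not attached at all.
    scp_allow: Optional[Set[str]] = None
    boundary_allow: Optional[Set[str]] = None
    session_allow: Optional[Set[str]] = None


def effective_access(action: str, layers: Layers) -> str:
    # 1. An explicit deny in any layer always wins.
    if action in layers.explicit_deny:
        return "deny (explicit)"
    # 2. Every capping layer that is present must allow the action.
    caps = (
        ("SCP", layers.scp_allow),
        ("permission boundary", layers.boundary_allow),
        ("session policy", layers.session_allow),
    )
    for name, cap in caps:
        if cap is not None and action not in cap:
            return f"deny (capped by {name})"
    # 3. Caps never grant, so a granting layer must still allow the action.
    if action in layers.identity_allow or action in layers.resource_allow:
        return "allow"
    return "deny (implicit)"


# A bucket policy grants s3:GetObject, but the SCP does not allow it.
capped = effective_access(
    "s3:GetObject",
    Layers(resource_allow={"s3:GetObject"}, scp_allow={"s3:ListBucket"}),
)
# The inverse trap: a resource policy grants what the identity policy never mentions.
granted = effective_access("s3:GetObject", Layers(resource_allow={"s3:GetObject"}))
```

The first call returns a deny even though the bucket policy alone reads as a clean grant, and the second returns an allow that no identity policy records. Cross-account access would require running the same evaluation once per account and allowing only when both agree.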

Bedrock Data deploys across every account in the AWS Organization through CloudFormation StackSets and resolves effective access by evaluating all of these layers together, including cross-account AssumeRole chains, then attributes each path to the specific user or role at its end. The model holds at scales where other approaches collapse. One neobank customer’s data lake contains 33 million distinct ways principals can reach datasets; Bedrock Data alone was able to model that entitlement space, where other platforms broke down.

IAM Access Analyzer covers part of this ground well: it finds externally shared resources, flags dormant roles and keys, and validates new policies. But its model stops at the resource boundary. It can prove a principal reaches a bucket without saying whether the bucket holds payroll records or build logs, and a security team cannot prioritize access findings without sensitivity. Bedrock Data supplies the layer Access Analyzer reasons without. Access activity tracking on S3 then closes the least-privilege loop by showing who uses the access they hold; unexercised access is access a team can prune. One caution: break-glass and disaster-recovery roles are designed to sit unused, so activity-based pruning warrants a review step before any revoke.

What Bedrock Data does: Resolves true effective access across all IAM layers and cross-account AssumeRole chains for every account in the Organization, at scales of tens of millions of entitlement paths, and adds the data-sensitivity layer Access Analyzer lacks so findings can be prioritized.

Where Did It Come From, and Where Did It Go?

Access describes the present; the next question is history. A point-in-time inventory records where data sits today and nothing about how it got there. Sensitive data is copied, transformed, and re-exported across services constantly, which is how a single customer identifier ends up in four places when the team flagged three.

Bedrock Data finds similar data across the environment, inside AWS and beyond it, and uses that similarity to reconstruct lineage. A team can trace a dataset back to its origin and forward to every copy, including the unexpected fourth copy in a store nobody flagged. Similarity is the mechanism’s strength and its limit. Direct copies and exports match strongly, while heavily transformed or aggregated derivatives carry a weaker signal, so lineage reads as a map of likely flows for a team to confirm.
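The source does not say how similarity is computed, so the following sketch substitutes a plain Jaccard index over sets of values to show why direct copies match strongly and transformed derivatives match weakly. The dataset names, the sample values, and the 0.5 threshold are hypothetical, and a production system would more plausibly compare fingerprints than raw values.

```python
def jaccard(a, b) -> float:
    """Share of distinct values two datasets have in common (0.0 to 1.0)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def likely_copies(source, candidates, threshold=0.5):
    """Return (name, score) pairs at or above the threshold, strongest match first."""
    scored = [(name, jaccard(source, values)) for name, values in candidates.items()]
    matches = [pair for pair in scored if pair[1] >= threshold]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)


# Hypothetical customer identifiers in the origin dataset.
origin = {"c-1001", "c-1002", "c-1003", "c-1004"}
candidates = {
    "analytics_export": {"c-1001", "c-1002", "c-1003", "c-1004"},  # direct copy
    "support_extract": {"c-1001", "c-1002", "c-1003", "t-9"},      # partial, altered copy
    "build_logs": {"job-1", "job-2"},                              # unrelated
}
flows = likely_copies(origin, candidates)
```

The direct copy scores 1.0, the altered extract scores lower, and the unrelated store drops out. The result is a ranked list of likely flows for a team to confirm, not a verified lineage.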

What Bedrock Data does: Detects similar data everywhere it lives, inside AWS and beyond, to reconstruct likely lineage, revealing copies and exports a static inventory would miss.

What Threatens It?

AWS environments produce more vulnerability findings than any team can fix. Amazon Inspector scans EC2 instances, ECR container images, and Lambda functions for CVEs and network exposure, and its risk score usefully adjusts CVSS for reachability and exploitability. What the score lacks is data sensitivity, so two functions with identical ratings can carry very different business risk depending on what each can reach.

Bedrock Data maps each service to the data it can access, then joins Inspector’s findings to that map, producing a remediation queue ranked by data impact. A medium-severity flaw on a Lambda function with a path to a sensitive datastore outranks a high-severity flaw on infrastructure that touches nothing important. The queue inherits Inspector’s coverage, and its accuracy rests on the access map and classification beneath it: one more reason the deployment spans the full Organization.
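A minimal sketch of that join follows, assuming a simple product of severity and data sensitivity as the ranking key. The weighting, the 0-to-1 sensitivity scale, the Finding type, and the resource names are illustrative assumptions, not Bedrock Data's or Inspector's actual scoring.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    resource: str    # the service or function the finding applies to
    cve: str         # placeholder identifier, not a real CVE
    severity: float  # vulnerability score on a 0-10 scale


def rank_by_data_impact(findings, sensitivity_by_resource):
    """Order findings by severity weighted by the sensitivity of reachable data."""
    def impact(finding):
        # Resources with no known path to sensitive data get weight 0.
        return finding.severity * sensitivity_by_resource.get(finding.resource, 0.0)
    return sorted(findings, key=impact, reverse=True)


findings = [
    Finding("ec2-batch-worker", "CVE-EXAMPLE-1", 8.5),       # high severity
    Finding("lambda-billing-export", "CVE-EXAMPLE-2", 5.0),  # medium severity
]
# Hypothetical access map output: highest sensitivity (0-1) each resource can reach.
sensitivity = {"ec2-batch-worker": 0.1, "lambda-billing-export": 0.9}
queue = rank_by_data_impact(findings, sensitivity)
```

With these inputs the medium-severity Lambda finding lands at the top of the queue because it has a path to sensitive data, while the high-severity finding on the batch worker drops below it.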

What Bedrock Data does: Joins Inspector’s CVE findings to a map of which data each service can reach, producing a remediation queue ranked by business impact rather than raw severity.

Which Exposure Would Hurt Most?

Each preceding question produces findings; the synthesis question is where limited remediation time goes first. Bedrock Data assigns every datastore and dataset an Impact Score built from the volume of sensitive data present and the sensitivity of the types found, with weighting drawn from the same topic modeling that classified the data. The easy case is easy anywhere: ten thousand customer records outrank a single test file. The case that breaks naive scoring is a small store of crown-jewel data, such as credentials or deal documents, sitting beside an enormous store of low-grade marketing contacts. Business context lets the small store win that comparison, because the score reflects what the data is for.
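The source does not publish the Impact Score formula, so the sketch below is only one hypothetical way to combine volume and sensitivity so that a small crown-jewel store can outrank a very large low-grade one. The type weights and the logarithmic damping of volume are assumptions.

```python
import math

# Hypothetical per-type sensitivity weights; real weights would come from
# business context rather than a fixed table.
SENSITIVITY_WEIGHTS = {
    "credential": 10.0,
    "deal_document": 8.0,
    "marketing_contact": 1.0,
}


def impact_score(counts_by_type) -> float:
    """Sum sensitivity weight times log-damped volume over every data type found."""
    return sum(
        SENSITIVITY_WEIGHTS.get(data_type, 0.0) * math.log10(1 + count)
        for data_type, count in counts_by_type.items()
    )


small_store = impact_score({"credential": 500})
large_store = impact_score({"marketing_contact": 50_000_000})
```

Damping volume with a logarithm keeps sheer record count from swamping sensitivity, so the 500 credentials score higher than the 50 million marketing contacts. A linear volume term would reverse that ordering, which is the naive-scoring failure described above.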

The same scoring applies to identities. Each user and role receives an Impact Score tracking the volume and sensitivity of the data it can reach, so the identities carrying the largest breach liability surface first. A score remains a model, so it should inform triage before it drives automation.

What Bedrock Data does: Assigns business-context-aware Impact Scores to datastores, datasets, and identities so the exposures with the greatest breach liability rise to the top, and the pre-ranked findings can feed a SIEM or ticketing system.

From Context to Action: Policy and Enforcement

Context earns its keep when it changes what the platform allows. Bedrock Data ships detections for common anti-patterns, such as sensitive data in non-production environments, and lets organizations write their own policies in plain language, for example “no SSNs reachable by user accounts” or “sensitive data belongs only in this bucket.” Each policy becomes a live detection that alerts on violation. A translated policy is still a detection rule, so the sound practice is to run it in alert-only mode, watch the volume, and tune it before wiring it to any response.

Enforcement itself runs through AWS-native machinery. Bedrock Data pushes tags onto datastores recording what they hold and how sensitive it is, and those tags plug directly into IAM through attribute-based access control (ABAC): a condition on aws:ResourceTag can restrict S3 buckets tagged as containing PII to a defined set of service accounts. Bedrock Data does the discovery and classification; IAM itself performs the deny.
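As a sketch of what such a tag condition might look like, the following builds a deny statement keyed on aws:ResourceTag and exempts a list of approved roles through aws:PrincipalArn. The tag key DataClassification, the tag value PII, the role names, and the account ID are hypothetical. Which S3 resources and actions honor aws:ResourceTag depends on the service's ABAC support, so the policy should be checked against current AWS documentation and tested before use.

```python
import json


def pii_bucket_guard(allowed_role_arns, tag_key="DataClassification", tag_value="PII"):
    """Build a deny statement for tagged resources, exempting approved roles.

    Both condition operators must match for the deny to apply: the resource
    carries the sensitivity tag AND the caller is not on the approved list.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyTaggedDataExceptApprovedRoles",
                "Effect": "Deny",
                "Action": "s3:*",
                "Resource": "*",
                "Condition": {
                    "StringEquals": {f"aws:ResourceTag/{tag_key}": tag_value},
                    "ArnNotLike": {"aws:PrincipalArn": allowed_role_arns},
                },
            }
        ],
    }


# Hypothetical service roles in a placeholder account.
policy = pii_bucket_guard(
    [
        "arn:aws:iam::111122223333:role/pii-etl-service",
        "arn:aws:iam::111122223333:role/pii-reporting-service",
    ]
)
policy_json = json.dumps(policy, indent=2)
```

Expressing the rule as a deny with an exemption list, rather than as an allow, means it holds even when some other policy grants broad S3 access. That same property is why a wrong tag can lock out a legitimate workload.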

Tag-based enforcement carries two sharp edges. First, a principal that can write tags can rewrite its own access: whoever holds s3:PutBucketTagging alongside a tag-conditioned grant can re-tag a resource until the condition matches. AWS’s mitigations apply, namely SCPs that deny changes to authorization tags except by designated administrators, and separation of duties so no principal holds tagging and data permissions on the same resources. Second, a misclassification driving an automated deny can lock a production pipeline out of its own data. Tags should drive alerts first and automated denies only after a team has reviewed the classification, beginning with non-production resources where the blast radius of a mistake is a ticket instead of an outage.

What Bedrock Data does: Converts plain-language policies into live detections and writes sensitivity tags that plug into native IAM ABAC, letting AWS itself enforce the deny while Bedrock Data supplies the discovery and classification.

What Could AI Combine and Reveal?

The final question is where every layer above gets used at once. Enterprises building AI on AWS build it on Amazon Bedrock, and an agent’s risk is the intersection of what it can reach and how sensitive that reach is. The reach is layered: an agent acts through an execution role evaluated by standard IAM logic, invokes action-group Lambda functions that carry execution roles of their own, and queries knowledge bases fed from sources such as S3, Confluence, SharePoint, and Salesforce.

Knowledge bases deserve particular care because they ingest at sync time. The connector crawls and embeds source content into the vector store, and from then on anyone with bedrock:Retrieve permission can pull it, regardless of the per-document permissions the source system enforced. A SharePoint library restricted to a handful of executives, once synced, answers to every principal the knowledge base answers to.

Bedrock Data enumerates the agents created in Amazon Bedrock, resolves each one’s effective reach across execution roles, action-group Lambdas, and knowledge bases using the same effective-access engine described above, and shows the sensitive data behind that reach using the same classification that built the foundation. What it surfaces is precisely the data the agent could surface in a response.

It then evaluates the Amazon Bedrock Guardrails configured to constrain those responses, because two properties leave gaps. Guardrails apply per call or per agent, so any invocation path or new agent left unwired is unprotected. And the built-in PII list is fixed, so proprietary identifiers go undetected until someone writes a custom rule, while the PII filter reads text output only, leaving values passed as tool-call parameters unfiltered. Bedrock Data compares the sensitive data types each agent can reach against the filters its guardrail defines, flags types with no corresponding filter, and suggests revisions. Static analysis describes configuration coverage; the filters themselves remain probabilistic at inference time. A clean report is the start of assurance, and a red one is a finding to fix today.
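The coverage comparison reduces to a set difference, sketched below. The agent names and data-type labels are hypothetical and do not correspond to actual Amazon Bedrock Guardrails entity names. As the paragraph notes, this describes configuration coverage only and says nothing about how a filter behaves at inference time.

```python
def guardrail_gaps(reachable_types, guardrail_filters):
    """For each agent, list reachable sensitive types that no guardrail filter covers."""
    gaps = {}
    for agent, types in reachable_types.items():
        # An agent with no guardrail wired in has an empty filter set.
        covered = guardrail_filters.get(agent, set())
        gaps[agent] = sorted(types - covered)
    return gaps


# Hypothetical output of the reach analysis: sensitive types each agent can touch.
reachable = {
    "support-agent": {"ssn", "email", "internal_customer_id"},
    "hr-agent": {"ssn", "salary"},
}
# Hypothetical guardrail configuration; hr-agent has no guardrail at all.
filters = {"support-agent": {"ssn", "email"}}

gaps = guardrail_gaps(reachable, filters)
```

The proprietary internal_customer_id surfaces as a gap for the support agent because no filter names it, and every type the unwired HR agent can reach is reported as uncovered.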

What Bedrock Data does: Maps each Bedrock agent’s full effective reach across roles, Lambdas, and knowledge bases, surfaces the sensitive data behind it, then checks Guardrail coverage against that data and flags the gaps.

One System, Modeled on the Platform

The agent analysis is the proof of the thesis. It works only because classification, effective access, Impact Scores, and policy evaluation already exist and feed each other: classification feeds Impact Scores, scores prioritize entitlement analysis, entitlements drive least-privilege pruning, lineage maps the copies, Inspector findings get ranked by the data behind them, and tags turn policy into IAM enforcement.

Each layer answers a question that lives inside AWS itself, in the SCPs, trust policies, partition layouts, and guardrail configurations no connector contains. Classification tells you where the data is. Modeling the platform tells you whether it is safe.