Context for Data: The Missing Layer in Enterprise AI

Everyone in enterprise technology is talking about context right now: context engineering, context windows, semantic context and more. But there’s a version of context that is more fundamental, and almost nobody is discussing it: context about data itself.

For an enterprise, data context means everything that describes the data payload: what it is, how sensitive it is, where it comes from, who owns it, who and what can access it, how it is being used, what its lineage is, how it moves and what is at risk. This is the context that tells an AI system what it can safely use, how that data must be handled and what should be off-limits.
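To make the dimensions above concrete, here is a minimal sketch of what a context record for a single data asset could look like. The class name, field names and example values are all hypothetical, chosen only to mirror the list above; they are not a description of any particular product's schema.

```python
from dataclasses import dataclass
from enum import Enum


class Sensitivity(Enum):
    PUBLIC = "public"
    INTERNAL = "internal"
    RESTRICTED = "restricted"


@dataclass(frozen=True)
class DataContext:
    """Hypothetical context record for a single data asset."""
    asset_id: str                                # what it is
    sensitivity: Sensitivity                     # how sensitive it is
    source_system: str                           # where it comes from
    owner: str                                   # who owns it
    allowed_principals: frozenset = frozenset()  # who and what can access it
    observed_uses: tuple = ()                    # how it is being used
    lineage: tuple = ()                          # upstream assets it was derived from
    locations: tuple = ()                        # where copies of it currently live
    risks: tuple = ()                            # what is at risk


# Illustrative values only; none of these identifiers come from a real system.
payroll = DataContext(
    asset_id="warehouse.hr.payroll_2024",
    sensitivity=Sensitivity.RESTRICTED,
    source_system="hr-system-export",
    owner="hr-data-team",
    allowed_principals=frozenset({"payroll-service"}),
    observed_uses=("monthly-payroll-run",),
    lineage=("objectstore/hr/raw/payroll.csv",),
    locations=("warehouse", "objectstore"),
    risks=("salary data exposure",),
)
```

The record is frozen so that a change in context produces a new record rather than a silent mutation, which is one simple way to keep context auditable as data moves.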

The problem is that enterprise-wide data context has never been achievable. Not because it is not valuable. Because the technology to build it, derive it and gather it at scale did not exist.

Why comprehensive data context was never possible

The concept of data context is not new. Enterprises have been trying to capture it for more than a decade, mostly manually, through data catalogs, master data management platforms and data classification programs. None of these has worked at enterprise scale: the context they capture is almost immediately inaccurate and out of date.

There are three reasons.

First, the technologies that captured context could not keep up with the volume of data. Data catalogs were built for structured environments where humans curated metadata. They were never designed for the reality of modern enterprise data, where 80 percent of what organizations generate is unstructured and growing at three times the rate of structured data according to Gartner.

Second, it was too expensive. Analyzing all of an enterprise’s data to build context required compute and human resources that made the exercise cost-prohibitive for most organizations. So companies classified a fraction of their data, governed what they could and accepted the rest as dark data.

Third, data is fundamentally difficult. Data is fluid, non-discrete and moves across systems constantly. People ETL data from object stores into data warehouses, export spreadsheets, copy files across repositories and make derivatives that differ slightly from the original but carry the same risk. Putting your arms around data is one of the hardest problems in enterprise technology, and it has been for decades.

To understand data, human-like reasoning is required.

The result is that a third of businesses still lack a formal data classification policy. Most enterprises have, at best, a minimal level of context, and only for a small fraction of their data. The rest exists without context, labels, ownership or governance.

Why data context is essential now

This state of affairs was barely acceptable when the primary consumers of enterprise data were BI dashboards and SQL queries, requested and accessed by humans. Those systems were deterministic, and even though humans are not, humans operate at human speed.

AI changed that equation.

AI is non-deterministic and runs at machine speed. You cannot predict what it will do, just like you can’t predict what humans will do. But now the problem is a machine-scale problem. AI will find data wherever it lives and use it for what it is designed to do, disregarding the risk. Current AI technologies do not need you to prepare data for them. They can access your raw data exactly where it is, whether it is in a data warehouse, an object store, a file share or a document repository, often via MCP servers. All data is fair game if it’s not governed.

Without context, AI does not know whether the data is sensitive and what risk it incurs by using it. Without context, AI does not know whether it is allowed to use this data or which policies and regulations it would violate. Without context, AI does not know whether this data will bias its output. Without context, AI does not know whether the data has provenance and is safe to use to produce business outcomes. Without context, AI does not know whether it should be surfacing data to the users on whose behalf it is operating.
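The questions above can be read as a series of checks an AI system would run before touching a data asset. The following is a minimal sketch under stated assumptions: `AssetContext`, `may_use` and the sensitivity labels are hypothetical names invented for illustration, and a real policy engine would cover far more (regulations, bias signals, purpose limits). The one design choice worth noting is that missing context results in denial, not access.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class AssetContext:
    """Hypothetical, heavily simplified context for one data asset."""
    sensitivity: str            # e.g. "public", "internal", "restricted"
    allowed_agents: frozenset   # AI systems permitted to read the asset
    has_provenance: bool        # whether the origin of the data is known


def may_use(
    agent_id: str,
    user_clearances: frozenset,
    ctx: Optional[AssetContext],
) -> Tuple[bool, str]:
    """Return (decision, reason). Denies by default when context is missing."""
    if ctx is None:
        return False, "no context: asset is ungoverned"
    if agent_id not in ctx.allowed_agents:
        return False, "agent is not permitted to access this asset"
    if not ctx.has_provenance:
        return False, "provenance unknown"
    if ctx.sensitivity not in user_clearances:
        return False, "user is not cleared for this sensitivity level"
    return True, "allowed"


# Illustrative usage with made-up identifiers.
ctx = AssetContext(
    sensitivity="restricted",
    allowed_agents=frozenset({"support-bot"}),
    has_provenance=True,
)
decision = may_use("support-bot", frozenset({"public", "internal"}), ctx)
```

In this example the agent itself is permitted and the data has provenance, but the user behind the request is only cleared for public and internal data, so the check refuses to surface the restricted asset.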

McKinsey's 2025 State of AI report found that 88 percent of organizations are now using AI in at least one business function. But nearly two-thirds have not begun scaling AI across the enterprise. And 51 percent reported at least one negative AI incident in the past year, including inaccuracy, compliance failures and privacy breaches. Gartner predicts that 25 percent of enterprise breaches will trace back to AI agent abuse by 2028. Organizations are deploying AI broadly but governing it narrowly, and incidents are the predictable outcome.

Meanwhile, Gartner also projects that AI spending on data readiness will increase sevenfold between 2025 and 2029. That’s the market telling us that the data foundation is not ready for what is being built on top of it. That sevenfold increase is the enterprise learning, often the hard way, that AI without data-context-informed guardrails is a liability.

What viable data context requires

If data context at scale has been attempted for years without success, what has to be different now?

Three things.

1. Architecture that scales efficiently. The approach cannot be appliance-based. It has to be highly parallelized and built to keep pace with the volumes of data that enterprises have and continue to generate, without breaking the bank. If you cannot scale efficiently, you will cover a fraction of your environment and call it done. A fraction is not sufficient when AI can reach everything.

2. AI-native understanding of data. Unstructured data is complex. It requires a brain to understand it, not a regular expression or a rule. Classification and contextualization have to be done autonomously by AI, without requiring human input. This is the only way to keep up with data at enterprise scale.

3. An open backbone that makes context available everywhere. Data context cannot be locked inside a single product or a narrow use case. The enterprise needs a metadata layer that is openly accessible to any downstream technology that needs this context, whether through labeling, APIs or direct integration. Context that sits in a silo is not context, it is just another silo.

Bedrock Data’s platform is designed around these three principles because we saw that partial context, narrow context and siloed context all produce the same outcome: enterprises operating without a real understanding of what their AI systems can access.

Enabling AI, not blocking it

Most enterprises instinctively want to protect data from AI - blocking access and restricting usage. Their teams are habituated to locking things down.

I think about it differently. An enterprise’s data is its competitive advantage - that’s not new. The new imperative is to enable AI to access all of that proprietary data, but to do so safely. Otherwise, AI will not deliver the value the enterprise needs. This is not about blocking. It is about understanding data at a deep level so you can open it up responsibly.

That means deeply inspecting data payloads for each AI system and building what we call a Data Bill of Materials (DBOM) that maps what data feeds each system, what that data contains, what is sensitive, what could introduce bias and what guardrails are required. When you have that level of context, you do not have to choose between innovation and security. You can push data to AI systems with confidence because you understand what they are consuming.
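As a rough illustration of the idea, a Data Bill of Materials can be thought of as a per-system summary derived from the context of every asset feeding that system. The sketch below is an assumption-laden toy: the `Asset` fields, the `build_dbom` function and the guardrail names are hypothetical and are not a description of how Bedrock Data's platform represents a DBOM.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Asset:
    """Hypothetical summary of one data asset feeding an AI system."""
    asset_id: str
    contains: tuple    # kinds of content found, e.g. ("email", "salary")
    sensitive: bool    # whether inspection flagged sensitive content
    bias_risk: bool    # whether the data could skew model output


def build_dbom(system: str, assets: List[Asset]) -> Dict[str, object]:
    """Aggregate per-asset context into one record for an AI system."""
    guardrails = set()
    for asset in assets:
        if asset.sensitive:
            guardrails.add("mask-sensitive-fields")  # illustrative guardrail name
        if asset.bias_risk:
            guardrails.add("bias-review")            # illustrative guardrail name
    return {
        "system": system,
        "assets": [a.asset_id for a in assets],
        "contents": sorted({kind for a in assets for kind in a.contains}),
        "sensitive_assets": [a.asset_id for a in assets if a.sensitive],
        "bias_risk_assets": [a.asset_id for a in assets if a.bias_risk],
        "required_guardrails": sorted(guardrails),
    }


# Illustrative usage with made-up assets.
dbom = build_dbom(
    "support-assistant",
    [
        Asset("tickets.parquet", ("email", "free-text"), True, False),
        Asset("kb-articles/", ("documentation",), False, False),
        Asset("churn-history.csv", ("demographics",), False, True),
    ],
)
```

Because the summary is derived from per-asset context, it can be regenerated whenever the underlying data changes, so the record does not have to be maintained by hand.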

Our mission at Bedrock Data is to democratize safe and efficient use of all enterprise data, everywhere. That is an innovation statement. It is about accelerating AI adoption, not slowing it down.

The next decade: replatforming on AI

The next decade will be defined by enterprises replatforming on top of AI. AI is becoming the new enterprise platform. AI is the engine, data is its fuel and security is the guardrails.

When you run AI on data without security, you are in the risk zone. When you have security but are not deploying AI because you are too conservative, you are leaving value on the table. When you are running AI but not giving it the right data, you get poor outcomes. The target is the intersection of all three: using AI on the right data with the right guardrails.

Reaching that target zone requires context. It requires knowing what you have, understanding how sensitive it is, governing who and what can access it and maintaining that understanding continuously as data changes and moves. Not for a single dataset or a single system. Across the entire enterprise data estate, from databases and data warehouses to object stores and document repositories, structured and unstructured alike.

McKinsey’s latest research on agentic AI reinforces this: eight in ten companies cite data limitations as the primary roadblock to scaling AI agents. The technology is ready but the data foundation is not. Context is the missing piece of that foundation.

Enterprises that build enterprise-wide scalable data context will replatform effectively. Enterprises that do not will be operating a non-deterministic system on top of their data with no understanding of what it can access, what is at risk or what it will incur.

What’s coming up in this series

This article lays out why data context is the single most important capability for enterprises deploying AI. In three follow-up articles, I will go into specific dimensions of this problem:

1. Why data context has never scaled before and what has now changed

2. How deep data context enables safe AI adoption at scale

3. Why AI agents require data context to self-govern

The replatforming of the enterprise is well underway. Whether it succeeds depends on what we build underneath it.
