
Mapping the Agent-to-Data Chain on GCP Vertex AI

Introducing Bedrock Data’s ArgusAI for GCP Vertex AI: continuous visibility into which agents reach which data through which model.
April 21, 2026 | 8 min read

Praveen Yarlagadda

Founding Engineer



Vertex AI has become the default foundation for enterprise AI on Google Cloud. Conversational agents power customer-facing chatbots. Vertex AI Search apps deliver natural language retrieval over internal documents. Both categories of agent query, summarize and reason over institutional data at production scale.

The capability is what makes them valuable. It is also what makes governance teams pause. The question blocking faster adoption is not whether the technology works. It is whether anyone can answer a simple set of questions: what data can each agent actually reach, which model is processing that data, and whether those access patterns match policy.

Standard GCP tooling cannot answer those questions. This post explains why, walks through how Vertex AI agents actually connect to data, and describes how Bedrock Data’s ArgusAI closes the gap.

HOW VERTEX AI AGENTS REACH DATA

GCP offers two primary agent types, both backed by Discovery Engine, with distinct architectures and distinct governance implications.

Conversational agents are the assistants: customer support bots, internal helpdesks, operational copilots. Each agent can attach data store tools that provide retrieval-augmented generation (RAG) access to indexed documents. A single query triggers a multi-model pipeline. The agent-level LLM (e.g., Gemini 2.5 Flash) orchestrates the conversation, interprets intent and selects tools. When it invokes a data store tool, a separate RAG pipeline takes over: a rewriter LLM (Gemini 2.5 Flash Lite) reformulates the query, Discovery Engine retrieves relevant documents, and a summarizer LLM generates the answer from retrieved content. One user query. Three or more LLM invocations. Each independently configurable. Each processing document content from the connected data stores.
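The multi-model pipeline above can be sketched as a trace of LLM invocations per query. This is a minimal illustration, not the actual Vertex AI implementation; the stub functions and model roles only mirror the stages described in the text.

```python
from dataclasses import dataclass, field

@dataclass
class PipelineTrace:
    """Records every LLM invocation triggered by a single user query."""
    invocations: list = field(default_factory=list)

def agent_llm_select_tool(query, trace):
    # Agent-level LLM (e.g. Gemini 2.5 Flash) interprets intent, selects a tool.
    trace.invocations.append("agent-llm")
    return "data_store_tool"

def rewriter_llm(query, trace):
    # Rewriter LLM reformulates the query for retrieval.
    trace.invocations.append("rewriter-llm")
    return f"rewritten: {query}"

def retrieve(rewritten_query):
    # Discovery Engine retrieval: returns indexed documents, no LLM involved.
    return ["doc-1", "doc-2"]

def summarizer_llm(docs, trace):
    # Summarizer LLM generates the answer from retrieved document content.
    trace.invocations.append("summarizer-llm")
    return f"answer from {len(docs)} documents"

def handle_query(query):
    trace = PipelineTrace()
    if agent_llm_select_tool(query, trace) == "data_store_tool":
        docs = retrieve(rewriter_llm(query, trace))
        return summarizer_llm(docs, trace), trace

answer, trace = handle_query("What is my deductible?")
# trace.invocations → ["agent-llm", "rewriter-llm", "summarizer-llm"]
```

One user query produces three model invocations, each a separate point where document content flows through an LLM.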

☁ GCP Console: Conversational Agents

Two Conversational agents deployed in us-east1: Financial Advisor Bot and Patient Care Assistant

Vertex AI Search apps are enterprise search applications built directly on Discovery Engine. They index documents in GCS buckets, BigQuery tables or websites. When the LLM add-on is enabled, a Gemini model generates summarized answers. The model does not just retrieve documents. It reads and processes them. If a data store indexes a GCS bucket containing employee SSNs, salary data and performance reviews, the LLM has access to all of it at query time.

☁ GCP Console: Vertex AI Search Apps

Four Search apps connected to Bedrock Demo data stores across HR, Legal, Financial and Healthcare domains

Both agent types connect to Discovery Engine data stores, which index documents from GCS buckets. The chain is:

Agent (Conversational Agent or Search App)

→ Data Store (Discovery Engine index)

→ GCS Bucket (source documents: patient records, financials, contracts)

Governance lives at every link in that chain. Standard GCP tooling sees none of it as a chain.

WHY STANDARD GCP TOOLING FALLS SHORT

Standard GCP security tooling operates at the infrastructure layer. IAM policies, VPC Service Controls and Cloud Audit Logs tell you which service account has storage.objectViewer on a bucket. They do not tell you that a Conversational agent named “Patient Care Assistant” uses Gemini 2.5 Flash to query a data store that indexes patient records containing SSNs and PHI.

The gap exists because agent-to-data connections span multiple GCP services. The Conversational Agents service manages agents and tools in regions like us-east1. Discovery Engine manages data stores and search indexes in multi-regions like us. Cloud Storage holds the source documents. Vertex AI provides the foundation model. No single GCP service provides a unified view of which agent uses which model to access which data.

IAM tells you who can invoke the agent. It does not tell you what the agent can reach once invoked.

Capability                      | Standard GCP Tooling       | ArgusAI
Agent inventory                 | ✘ Manual / fragmented      | ✔ Auto-discovered
Agent → data store mapping      | ✘ Not available            | ✔ Full graph
LLM model identification        | ✘ Requires API calls       | ✔ Auto-captured
Data sensitivity classification | ✘ Not available            | ✔ Adaptive Scanning
Effective access analysis       | ✘ IAM only                 | ✔ Full chain resolution
Continuous monitoring           | ✘ Log-based, reactive      | ✔ Posture-based, proactive

HOW ARGUSAI DISCOVERS AND MAPS THE CHAIN

Bedrock Data’s ArgusAI automatically discovers both Conversational agents and Discovery Engine search apps across GCP projects, then maps the complete connection graph from agent to model to data.

Agent discovery. For each GCP project, ArgusAI scans Conversational Agent regions to discover agents and their tool configurations. Each data store tool reveals which Discovery Engine data store the agent can query. For Discovery Engine, ArgusAI discovers all search apps and their linked data stores directly. For every Conversational agent, ArgusAI fetches the generative settings to capture the exact LLM model powering the agent, not a guess but the actual model returned by the generativeSettings API (e.g., gemini-2.5-flash). For search apps, ArgusAI checks whether the LLM add-on is enabled, indicating that a Gemini model actively processes queries.
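The core of the discovery step is walking each agent's tool configurations to find the data stores it can query. The sketch below operates on a dict shaped loosely like a Conversational Agents API response; the field names (`tools`, `dataStoreSpec`, `dataStores`) are illustrative, not an exact reproduction of the API schema.

```python
def extract_data_stores(agent_config):
    """Return the Discovery Engine data store IDs an agent's tools can query."""
    stores = []
    for tool in agent_config.get("tools", []):
        spec = tool.get("dataStoreSpec")
        if spec:
            stores.extend(spec.get("dataStores", []))
    return stores

# Hypothetical agent config for illustration.
agent = {
    "displayName": "Patient Care Assistant",
    "generativeSettings": {"model": "gemini-2.5-flash"},
    "tools": [
        {"dataStoreSpec": {"dataStores": [
            "projects/p/locations/us/dataStores/healthcare-records"]}},
        {"webhook": {"uri": "https://example.internal/hook"}},  # non-data tool
    ],
}

extract_data_stores(agent)
# → ["projects/p/locations/us/dataStores/healthcare-records"]
```

Combined with the model captured from the generative settings, each agent record pairs an exact LLM with an exact set of data stores.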

Data store to bucket resolution. ArgusAI walks from each data store to its indexed documents, sampling document URIs to discover which GCS buckets back the data store. This resolves the final link in the chain: the actual storage locations that contain the data the agent’s LLM can access.
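The bucket-resolution step reduces sampled document URIs to the set of backing buckets. A minimal sketch, assuming `gs://` URIs; the bucket names are made up for illustration.

```python
def buckets_from_uris(document_uris):
    """Map sampled gs:// document URIs to their backing GCS bucket names."""
    buckets = set()
    for uri in document_uris:
        if uri.startswith("gs://"):
            # gs://<bucket>/<object path> → keep only the bucket segment
            buckets.add(uri[len("gs://"):].split("/", 1)[0])
    return buckets

sampled = [
    "gs://acme-hr-records/reviews/2025/q1.pdf",
    "gs://acme-hr-records/payroll/salaries.csv",
    "gs://acme-financials/contracts/nda-017.pdf",
]

buckets_from_uris(sampled)
# → {"acme-hr-records", "acme-financials"}
```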

The agent-to-data visualization. ArgusAI maps the complete policy flow for every agent: data sources flow through the agent to the set of sensitive data types exposed to the model, and on to the specific model processing that data.

🛡 ArgusAI: Conversational Agents Inventory

Patient Care Assistant tagged with GLBA, SPD, PII, PHI, Healthcare Data. Financial Advisor Bot tagged with GLBA, SPD, PII, Financial Data, HR Data.

🛡 ArgusAI: Financial Advisor Bot: Policy Flow Visualization

Full policy flow: two datasources (Dob, Income) → Financial Advisor Bot → Exposed to Model (Name, Address, BankAccount, SSN, Income, Dob, Email, Phone, Agreements & Contracts, Recruiting & Onboarding) → gemini-2.5-flash

The Financial Advisor Bot has access to both financial records and HR documents, including employee SSNs, salary data and 401k beneficiary information. Giving a financial advisory chatbot access to HR data is a configuration decision that should be made deliberately. The connection exists. Whether it should is a governance call that requires visibility to make.

🛡 ArgusAI: Healthcare Records Search: Policy Flow Visualization

Datasource (Dob, SSN +3) → Healthcare Records Search → Exposed to Model (Name, Dob, SSN, MedicalDiagnosis, Patient Health Records) → gemini

This is the difference between knowing an agent has access to a bucket and knowing an agent has access to PHI.

CLASSIFYING WHAT EACH AGENT CAN REACH

Discovery alone is not enough. The critical question is what kind of data each agent can access. ArgusAI’s Adaptive Scanning engine scans the GCS buckets that back each data store, identifying sensitive data types at the file and column level: SSNs, dates of birth, medical diagnoses, account numbers, salary figures, beneficiary information. This classification maps directly onto the agent graph.
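File-level classification of this kind can be pictured as a set of detectors run over document text. The two regex patterns below are deliberately simplistic examples; a production scanner like Adaptive Scanning uses far richer detection than this sketch.

```python
import re

# Illustrative detectors only: real classification uses validation logic,
# context, and many more data types than these two patterns.
DETECTORS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),   # e.g. 523-44-1187
    "DOB": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),   # e.g. 1984-07-02
}

def classify(text):
    """Return the sensitive data types detected in a document's text."""
    return {label for label, pattern in DETECTORS.items() if pattern.search(text)}

classify("Patient 523-44-1187, born 1984-07-02, diagnosis on file.")
# → {"SSN", "DOB"}
```

Attaching these labels to the buckets behind each data store is what turns "agent can read a bucket" into "agent can read PHI".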

Looking at the Patient Care Assistant agent in ArgusAI, you do not just see that it connects to a healthcare data store. You see that the data store indexes a bucket containing patient records with SSN, DOB, diagnosis and insurance IDs, lab results with physician names and medical test values, and prescription history with medication details tied to patient SSNs.

SURFACING CONFIGURATION GAPS

With the agent graph and data classifications in place, ArgusAI surfaces specific, actionable configuration risks.

Scope exceeding purpose. A financial advisor bot connected to both financial records and HR documents, including employee SSNs, salary data and 401k beneficiary information. The scope of the agent’s data access exceeds the scope of its stated purpose.

LLM-enabled search on sensitive data. A Vertex AI Search app with the LLM add-on enabled, pointed at a data store indexing compliance audit findings and NDA agreements containing tax IDs and revenue projections. Every query sends these documents through a Gemini model for answer generation.

Unreviewed agent proliferation. Multiple teams creating agents in the same project, each attaching data store tools that index overlapping or sensitive GCS buckets. Without a centralized inventory, duplicate agents with broad data access accumulate silently.

These are not theoretical risks. They are specific findings tied to named agents, identified models and classified data. That specificity is what makes targeted remediation possible without blocking deployment.
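A scope-exceeding-purpose finding like the first one reduces to a set comparison: the data tags an agent can actually reach versus an allow-list for its stated purpose. The tag names echo the screenshots above, but the allow-list format is a hypothetical sketch, not ArgusAI's policy schema.

```python
# Hypothetical purpose allow-list, for illustration only.
PURPOSE_ALLOWLIST = {
    "financial-advisor": {"PII", "Financial Data"},
}

def scope_findings(agent, purpose):
    """Return data tags the agent can reach beyond its purpose's allow-list."""
    return set(agent["reachable_tags"]) - PURPOSE_ALLOWLIST[purpose]

bot = {
    "name": "Financial Advisor Bot",
    "reachable_tags": ["PII", "Financial Data", "HR Data", "SSN"],
}

scope_findings(bot, "financial-advisor")
# → {"HR Data", "SSN"}
```

The output is the kind of specific, named finding the paragraph above describes: this agent, these tags, beyond this purpose.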

CONTINUOUS MONITORING

Agent configurations change. New data store tools get attached. Search apps enable the LLM add-on. Teams create new agents and connect them to existing data stores. A data store that indexed only public documents last week now indexes a bucket with customer PII.

ArgusAI continuously scans the Vertex AI agent inventory, detecting new agents or search apps, changes to data store tool connections, LLM add-ons being enabled, new GCS buckets being indexed by existing data stores and changes in data sensitivity as new files are added to indexed buckets. Agent governance becomes a continuous posture, not a point-in-time review. The same model Bedrock Data applies to cloud data stores, identity access and infrastructure security across the Metadata Lake.
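Posture-based monitoring amounts to diffing successive inventory snapshots rather than tailing logs. The snapshot shape below is illustrative; it shows how new agents, new bucket connections and newly enabled LLM add-ons fall out of a simple comparison.

```python
def posture_changes(previous, current):
    """Diff two agent-inventory snapshots and report posture changes."""
    changes = []
    for name, cfg in current.items():
        old = previous.get(name)
        if old is None:
            changes.append(("new_agent", name))
            continue
        # A data store started indexing a bucket it did not index before.
        for bucket in set(cfg["buckets"]) - set(old["buckets"]):
            changes.append(("new_bucket", f"{name}:{bucket}"))
        # The LLM add-on was switched on for an existing search app.
        if cfg["llm_addon"] and not old["llm_addon"]:
            changes.append(("llm_addon_enabled", name))
    return changes

before = {"HR Search": {"buckets": ["acme-hr"], "llm_addon": False}}
after = {
    "HR Search": {"buckets": ["acme-hr", "acme-payroll"], "llm_addon": True},
    "Legal Search": {"buckets": ["acme-contracts"], "llm_addon": False},
}

changes = posture_changes(before, after)
```

Run on the example snapshots, the diff surfaces a new agent, a newly indexed payroll bucket and a newly enabled LLM add-on, each before any query is served.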

THE BIGGER PICTURE

GCP Vertex AI agent governance is one piece of a broader problem. ArgusAI extends the same visibility across the full enterprise AI risk surface: the chain of connectivity between agents, MCP servers, identity roles and data. Whether agents run on GCP Vertex AI, Amazon Bedrock, Snowflake Cortex or connect through MCP servers, ArgusAI maps the complete exposure model.

The Data Bill of Materials (DBOM) provides a continuously updated inventory of every data asset connected to an AI system. Guardrail Gap Analysis compares policy intent against model exposure and produces targeted remediation plans. Natural Language Policy enables governance teams to articulate controls in plain English and enforce them across the full AI stack.

The organizations that adopt Vertex AI agents fastest are not the ones with the fewest security requirements. They are the ones that can see what each agent reaches, classify what that access represents and detect when the posture changes. Precision instead of guesswork for security teams. Guardrails instead of gatekeepers for platform teams.

ArgusAI is available today for GCP Vertex AI.

See it in action → bedrockdata.ai/demo
