The Cyber Archive

Establishing AI Governance Without Stifling Innovation | [un]prompted 2026

Learn how to build a tiered AI governance framework that balances enterprise AI security with innovation — from intake scoring to human oversight gates.

Deep dive of a talk by
Billy Norwood
16 April 2026

Billy Norwood presenting talk - Billy Norwood - Establishing AI Governance Without Stifling Innovation | [un]prompted 2026 at Unprompted 2026

Your organization just approved 40 AI projects, a consulting firm handed you a roadmap, and now the CISO is realizing that no one thought about AI governance until after the fact. That is the exact situation Billy Norwood walked into at FFF Enterprises, a $5 billion pharmaceutical distributor where patients depend on drug authorizations being handled correctly. When AI governance is bolted on after deployment, you get access control spaghetti, undefined human oversight thresholds, and shadow AI running on personal credit cards before procurement even knows a vendor exists.

This post breaks down the practical AI governance framework Norwood built at FFF — tiered committees, risk-scored intake forms, approved tooling boundaries, and where to insert human review in agentic workflows — drawing directly from a real-world deployment in a highly regulated healthcare environment.

Key Takeaways

  • You'll learn how to structure a tiered AI governance committee — from executive steering to operational center of excellence — so that risk escalation is automatic and high-stakes decisions reach the right people without bottlenecking innovation.
  • You'll be able to design an AI use case intake and risk-scoring system that separates legitimate automation candidates from shadow AI and over-engineered requests, giving you a repeatable framework to prioritize safely.
  • Apply this to avoid the common mistake of launching AI projects with vague acceptable-use policies — you'll understand how to define approved tooling, prohibited use cases, mandatory training sign-offs, and human oversight checkpoints before a single agent goes live.

Building a Tiered AI Governance Committee Structure

When a $5 billion pharmaceutical distributor kicked off 40 AI projects based on a consultant’s roadmap, the CISO walked into the final presentation and realized governance had been an afterthought. That reactive moment is where most enterprise AI governance frameworks fail — structure gets retrofitted after risk is already embedded in production systems.

The solution Billy Norwood built at FFF Enterprises separates policy authority from operational execution through two distinct committee tiers, each with a defined scope, membership, and escalation role.

Tier 1: Executive Steering Committee

The top-level governance body at FFF includes the CISO, CIO, General Counsel, and Chief Compliance Officer. This group operates at the strategic layer: publishing policy, addressing ethics, managing regulatory exposure, and making final calls on high-risk AI use cases. The composition is deliberate — legal and compliance anchor the group because AI deployments in healthcare touch HIPAA, patient safety, and California privacy law simultaneously.

This committee does not review every AI project. Its mandate is to set the guardrails and absorb escalations that operational tiers cannot resolve on their own.

Tier 2: AI Center of Excellence (CoE)

The CoE operates at the VP, director, and senior manager level. Its membership includes:

  • Data science leads responsible for building and maintaining AI models
  • Infrastructure engineers handling deployment and integration
  • The security architect, who surfaces risk findings and escalates when needed
  • HR — included specifically for change management and adoption enablement, because employee anxiety about AI-driven job displacement is a real adoption barrier that undermines rollout success

The CoE handles day-to-day operational decisions: standardizing practices, managing shared services, evaluating use cases, and maintaining the controls framework at an implementation level. For lower-risk requests, a manager can engage the CoE directly through a lightweight “agent checkout” process without triggering a full governance committee review.

Risk-Based Escalation Logic

The defining feature of this structure is automatic risk escalation. Use cases are not reviewed by a fixed committee — they are routed based on data sensitivity and process criticality:

  • Use cases touching PHI or large volumes of PII escalate to the executive committee
  • Cases involving critical business processes or significant financial exposure get elevated
  • Lower-risk, well-scoped requests can be handled entirely within the CoE

The security architect on the CoE serves as the escalation trigger — when a proposed use case crosses defined thresholds, they are responsible for pushing it up to the CISO and relevant executives (e.g., VP of Pharmacy for PHI-adjacent workflows).
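
The routing rules above can be sketched as a small deterministic function. This is a minimal illustration rather than FFF's actual implementation: the `UseCase` fields, the `PII_VOLUME_THRESHOLD` value, and the assumption that "elevated" means routed to the executive committee are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    COE = "ai_center_of_excellence"
    EXECUTIVE = "executive_steering_committee"


# Hypothetical threshold: the talk does not define a "large volume" of PII.
PII_VOLUME_THRESHOLD = 10_000


@dataclass
class UseCase:
    name: str
    touches_phi: bool = False
    pii_record_count: int = 0
    critical_process: bool = False
    significant_financial_exposure: bool = False


def route(use_case: UseCase) -> Tier:
    """Return the committee tier that must review this use case."""
    if use_case.touches_phi:
        return Tier.EXECUTIVE
    if use_case.pii_record_count >= PII_VOLUME_THRESHOLD:
        return Tier.EXECUTIVE
    if use_case.critical_process or use_case.significant_financial_exposure:
        return Tier.EXECUTIVE
    # Lower-risk, well-scoped requests stay within the CoE.
    return Tier.COE
```

Encoding the triggers as data fields keeps the routing decision reproducible: two reviewers given the same intake answers reach the same tier.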

Why Committee Structure Determines Control Effectiveness

From a security engineering standpoint, the tiered model matters because it defines where controls get mandated versus where they get implemented. Executive-level mandates (access control standards, human oversight requirements, prohibited use case lists) flow down. Implementation details (Databricks labeling, AD group configurations, monitoring thresholds) flow up through the CoE as operational feedback. Without this separation, security controls either stay abstract at the policy level or get implemented inconsistently without strategic alignment.

One structural gap Norwood acknowledges: while every other member of the governance committee sits on the executive team, the CISO does not hold a formal executive seat. This means some decisions get escalated further — to the CEO or CFO — outside the governance committee’s authority. For security engineers advising on governance design, this is a structural risk worth flagging: the CISO’s formal authority needs to match the risk decisions they are expected to own.

Tiered AI governance committee structure with Executive Steering Committee and AI Center of Excellence

Actionable Takeaways

  • Define committee membership before launching any AI projects — include Legal, Compliance, and HR alongside Security and Engineering. HR's role in adoption enablement is as operationally important as technical controls, particularly in organizations where workforce anxiety about AI is high.
  • Build explicit risk escalation thresholds into your governance model from day one. Document which data types (PHI, PII, financial records) and process criticality levels automatically trigger executive committee review, so routing decisions are deterministic rather than judgment calls.
  • Ensure the CISO holds formal authority commensurate with the risk decisions they are expected to make. If the CISO is not on the executive team, escalation paths can bypass governance entirely when senior leadership applies pressure to accelerate AI deployment.

Common Pitfalls

  • Standing up a single AI governance committee without distinguishing strategic policy authority from operational implementation leads to bottlenecks — every use case gets escalated to the top, the committee becomes a blocker, and business units route around governance entirely to avoid delays.
  • Assembling a governance committee without HR representation underestimates the change management challenge. At FFF, HR was essential for framing AI adoption positively and preventing the "AI is taking your job" narrative from undermining the rollout — a problem that existed independently of any technical security concern.

AI Use Case Intake and Risk Scoring

When FFF Enterprises received a consultant-delivered list of 40 AI projects spanning HR, pharmacy distribution, finance, and patient care, the immediate challenge was not technical — it was evaluative. Without a structured AI use case intake process, every business division was competing for priority and no one had a clear mechanism for separating high-value automation candidates from wishful thinking or high-risk overreach. The AI governance framework Norwood built addresses this directly through a two-part system: a use case intake form and a risk scorecard.

The Initial Mistake: Shallow Intake Forms

In the first iteration, the intake process was deliberately light. Teams were asked to describe their proposed use case, indicate how much technical work they expected, and state what outcomes they anticipated. Risk assessment existed in name only — reviewers made judgment calls based on informal criteria. No one was required to document their current process, how long it took, or what measurable improvement they expected AI to deliver.

This approach failed quickly. The governance committee found itself approving or deprioritizing projects without the data needed to make confident decisions. The 40 initial use cases included a significant proportion of requests that turned out to be simple data joins between Salesforce[1] and SAP[2] — not AI projects at all, just reporting tasks dressed up as transformation initiatives. Without rigorous intake, those requests consumed governance bandwidth and distorted the roadmap.

Redesigned Intake: Process-First, Metrics-Required

The revised intake form requires submitters to document their current process in detail — how long it takes, how many people are involved, what the error rate is, and what downstream systems are affected. Only after establishing the baseline does the form ask what the AI system is expected to improve, and by how much. Critically, submitters are now encouraged to work directly with the data science team to validate whether their assumptions about AI capability are realistic before the use case reaches the governance committee.

This shift accomplishes two things: it filters out requests that are a poor fit for AI (simple ETL tasks, standard report generation), and it creates pre-deployment benchmarks that can be used later to demonstrate measurable ROI. The $250,000 annual saving attributed to the medical pre-authorization agent was only quantifiable because a baseline existed before the agent went live.
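
A process-first intake form could be modeled roughly as follows. The field names and the hours-based saving calculation are assumptions made for illustration; the talk describes what the form asks for, not its schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IntakeSubmission:
    title: str
    # Baseline of the current process (all hypothetical field names).
    baseline_hours_per_week: Optional[float] = None
    baseline_headcount: Optional[int] = None
    baseline_error_rate: Optional[float] = None  # fraction between 0 and 1
    downstream_systems: Optional[list[str]] = None
    # Expected state after the AI system is in place.
    expected_hours_per_week: Optional[float] = None


def missing_baseline_fields(sub: IntakeSubmission) -> list[str]:
    """List the baseline fields the submitter still has to fill in."""
    required = [
        "baseline_hours_per_week",
        "baseline_headcount",
        "baseline_error_rate",
        "downstream_systems",
    ]
    return [name for name in required if getattr(sub, name) is None]


def projected_weekly_hours_saved(sub: IntakeSubmission) -> float:
    """Projected saving, computable only once a baseline exists."""
    if sub.baseline_hours_per_week is None or sub.expected_hours_per_week is None:
        raise ValueError("baseline and expected hours are both required")
    return sub.baseline_hours_per_week - sub.expected_hours_per_week
```

A submission with missing baseline fields would be returned to the submitter before it reaches the governance committee.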

Risk Scoring: Dimensions That Matter

Alongside the intake form, each use case is evaluated against a risk scorecard that has been iteratively expanded with every new project. The scoring covers several key dimensions:

  • Data sensitivity: Does the use case touch Protected Health Information (PHI) or Personally Identifiable Information (PII)? PHI triggers automatic escalation to the full governance committee regardless of other factors.
  • Process criticality: Does the use case affect a critical business operation — drug distribution, pre-authorization, patient billing? Higher criticality raises the risk tier.
  • Access scope: Does the AI system or agent require broader data access than the requesting team currently has? Requests that would expand access beyond existing entitlements receive additional scrutiny and are flagged for security architecture review.
  • Financial exposure: Use cases involving financial data that crosses departmental boundaries (e.g., combining Salesforce sales data with SAP financial records) are evaluated for potential unauthorized disclosure risk.
  • Regulatory surface: Healthcare use cases, particularly anything touching clinical workflows or medical trial data, are treated as prohibited by default, requiring a formal justification path through the governance committee before proceeding.

The scorecard is not static. Norwood noted it has been revised on almost every new use case as edge cases surface — which is expected behavior for a system still in early maturity. The practical implication is that the governance committee is currently hands-on for most reviews rather than only engaging at high-risk thresholds as originally designed.
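
One way to keep a scorecard revisable is to hold its dimensions as data rather than hard-coding them. The sketch below assumes a 0 to 3 rating per dimension and made-up weights; the talk names the dimensions but gives no numeric scale.

```python
# Hypothetical weights: only the dimension names come from the talk.
SCORECARD_V1 = {
    "data_sensitivity": 3,
    "process_criticality": 3,
    "access_scope": 2,
    "financial_exposure": 2,
    "regulatory_surface": 4,
}


def score(ratings: dict[str, int], weights: dict[str, int] = SCORECARD_V1) -> int:
    """Weighted sum of 0-3 ratings. Unrated dimensions raise, so gaps stay visible."""
    missing = sorted(set(weights) - set(ratings))
    if missing:
        raise KeyError(f"unrated dimensions: {missing}")
    total = 0
    for name, weight in weights.items():
        rating = ratings[name]
        if not 0 <= rating <= 3:
            raise ValueError(f"{name} rating must be 0-3, got {rating}")
        total += weight * rating
    return total


# Revising the scorecard after a new edge case surfaces (illustrative dimension).
SCORECARD_V2 = {**SCORECARD_V1, "cross_system_data_joins": 2}
```

Because a missing rating raises an error, a use case scored against an older version of the scorecard cannot silently pass a newer one.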

Prioritization and Scorecard-Driven Sequencing

The output of the risk scorecard feeds directly into project sequencing. Because every business division wants its AI initiative resourced simultaneously, the scorecard gives the governance committee an objective basis for prioritization. High-value, lower-risk use cases with clear baselines are sequenced earlier; high-risk or under-scoped requests are sent back for additional documentation or redesign before resubmission.
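
The sequencing rule can be illustrated with a simple sort. The `value_score` and `risk_score` fields and the "value minus risk" ordering are assumptions chosen for brevity, not the committee's actual formula.

```python
from dataclasses import dataclass


@dataclass
class ScoredUseCase:
    name: str
    value_score: int   # hypothetical business-value rating
    risk_score: int    # hypothetical output of the risk scorecard
    has_baseline: bool


def sequence(cases: list[ScoredUseCase]) -> tuple[list[ScoredUseCase], list[ScoredUseCase]]:
    """Split cases into an ordered work queue and a list sent back for rework."""
    ready = [c for c in cases if c.has_baseline]
    sent_back = [c for c in cases if not c.has_baseline]
    # Higher value and lower risk float to the front of the queue.
    ready.sort(key=lambda c: c.value_score - c.risk_score, reverse=True)
    return ready, sent_back
```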

This also surfaces a structural challenge: the CISO does not sit on the executive team formally, while all other governance committee members do. In cases where business division heads appeal prioritization decisions to the CEO or CFO, the scorecard-backed rationale provides the security team’s position in writing — a critical safeguard when security recommendations might otherwise be overridden by business urgency alone.

Access Control Implications at Intake

One underappreciated output of the risk intake process is early identification of access control complexity. Norwood described the organization’s current state as “security spaghetti” — Active Directory groups with excessive permissions accumulated over years of ungoverned IT growth. When an AI use case requires combining data from two systems that have never shared access boundaries, the intake process forces a reckoning with those legacy entitlements before an agent is built on top of them. This is considerably easier to fix at intake than after an agent has been deployed and is operating against a poorly scoped permission set.

Actionable Takeaways

  • Require all AI use case submissions to document the current process baseline — time, headcount, error rate, and downstream dependencies — before any AI capability is proposed. This creates measurable benchmarks for ROI validation post-deployment and filters out requests that are actually standard automation or reporting tasks, not AI candidates.
  • Build a risk scorecard with explicit, binary triggers: PHI contact, PII scope, access expansion beyond current entitlements, and regulatory surface (clinical workflows, medical trial data). Use those triggers to automatically route high-risk use cases to the full governance committee rather than relying on reviewer judgment, which does not scale.
  • Treat the intake form as a living document. Revisit and expand the risk dimensions after every new use case — especially the first time each new data type, system integration, or department is involved. Early-stage governance frameworks will have gaps; iterative refinement at intake prevents those gaps from becoming deployed agent vulnerabilities.

Common Pitfalls

  • Launching with a minimal intake form under schedule pressure — asking only for a description and expected outcome — produces a backlog of poorly scoped projects that consume governance bandwidth without generating the baseline data needed to prioritize, approve, or reject them. FFF Enterprises had to rebuild their intake process mid-deployment after discovering the 40 original use cases included a significant proportion of simple reporting tasks that should never have entered the AI project queue.
  • Treating the risk scorecard as final after the first version. Every organization will encounter use cases that expose gaps in their initial risk dimensions — access scope, financial data cross-contamination, prohibited regulatory surfaces. Failing to update the scorecard when new edge cases surface means those risks go unscored on subsequent reviews until an incident forces the issue.

AI Policy, Approved Tooling, and Acceptable Use Enforcement

Enterprise AI policy doesn’t work when it reads like a memo. FFF Enterprises learned this the hard way — their initial AI usage policy amounted to “check with IT before using AI.” That guidance sat in a document no one referenced and provided no mechanism for enforcement. When AI adoption accelerated, the policy vacuum became a liability.

The shift to a meaningful policy required moving from aspirational language to operational specificity across three dimensions: approved tooling, prohibited use cases, and user accountability.

Defining Approved Tooling

Rather than prohibiting everything and carving out exceptions, FFF took an allowlist approach. The policy explicitly names which tools employees are permitted to use:

  • Microsoft Copilot[3] — the designated end-user AI assistant, deployed as the enterprise instance. Any employee attempting to access ChatGPT or other external AI chat interfaces through a corporate browser is automatically redirected to the internal Copilot instance. This funneling is enforced at the network layer via secure browsers, not just policy language.
  • Databricks[4] — designated as the central AI control plane for any workflow touching company data. Teams building or consuming agentic pipelines must route through Databricks rather than standing up ad-hoc model integrations.

This dual-platform consolidation was a deliberate architectural choice. Being a Microsoft shop with existing enterprise licensing made Copilot a natural fit for end-user AI. Databricks absorbed the agentic and data-intensive workloads. Centralizing on two known platforms meant security controls, access management, and monitoring could be applied consistently rather than sprawled across dozens of point solutions.
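
The redirect behavior described for end-user chat tools amounts to an allowlist decision. The sketch below shows only that decision logic; the hostnames are placeholders, and how a given secure browser product is actually configured is outside what the talk covers.

```python
from urllib.parse import urlsplit

# Placeholder hostnames for illustration only.
KNOWN_EXTERNAL_AI_HOSTS = {"chat.example-ai.com"}
INTERNAL_COPILOT_URL = "https://copilot.example-corp.internal/"


def resolve_destination(url: str) -> str:
    """Return the URL the browser should load for a requested URL."""
    host = (urlsplit(url).hostname or "").lower()
    if host in KNOWN_EXTERNAL_AI_HOSTS:
        # External AI chat interfaces are funneled to the enterprise instance.
        return INTERNAL_COPILOT_URL
    return url
```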

Specifying Prohibited Use Cases

A credible AI policy doesn’t just list what’s allowed — it names what’s explicitly off-limits. For FFF, the clearest prohibition was AI involvement in medical trial work. Given the regulatory exposure and patient safety implications, any use case touching clinical trial data is categorically prohibited without explicit governance committee approval. The policy states this directly rather than relying on risk assessments to surface it case by case.

This approach — naming prohibited categories in the policy itself — accomplishes something important: it removes ambiguity for employees who might otherwise assume that a general “check with IT” clause covers everything. A developer building a trial management feature shouldn’t need to go through intake to discover AI is prohibited there; it should be stated plainly.

Training Requirements and Mandatory Sign-Off

Two accountability mechanisms were added to the updated policy:

  1. AI awareness training — every employee permitted to use AI tools must complete training before gaining access. This isn’t a one-time orientation; it’s a prerequisite tracked against individual accounts.
  2. Acceptable use sign-off — FFF expanded their existing acceptable use policy (which already covered general internet and device usage) to add an AI-specific addendum. Employees sign off on the expanded policy, creating a documented acknowledgment that they understand what is and isn’t permitted.
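
Both prerequisites can be checked at provisioning time. In this sketch the `Employee` fields and the policy version string are hypothetical, and requiring a fresh signature whenever the policy version changes is a design assumption rather than something stated in the talk.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

CURRENT_POLICY_VERSION = "2.0"  # hypothetical version label


@dataclass
class Employee:
    user_id: str
    training_completed_on: Optional[date] = None
    signed_policy_version: Optional[str] = None


def access_blockers(emp: Employee, policy_version: str = CURRENT_POLICY_VERSION) -> list[str]:
    """Return the reasons AI tool access cannot be granted yet (empty means eligible)."""
    blockers = []
    if emp.training_completed_on is None:
        blockers.append("AI awareness training not completed")
    if emp.signed_policy_version != policy_version:
        blockers.append("current acceptable use addendum not signed")
    return blockers
```

Returning reasons instead of a bare boolean lets the provisioning system tell the employee exactly what is outstanding.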

Norwood noted that legal was enthusiastic about signatures — partly as a California compliance hedge, though he was candid about the limits of that protection. The real value of the sign-off isn’t legal defensibility; it’s changing the psychological framing. When employees sign something, they read it differently than a policy buried in a company intranet.

Championing Adoption Inside Departments

Policy enforcement through restriction alone creates resistance. FFF supplemented the policy framework with a Copilot adoption program that identified and empowered champions inside each business unit. These champions weren’t just power users — they were advocates who demonstrated concrete use cases within their teams, making AI feel accessible rather than surveilled.

This mirrors a pattern Norwood had tried previously with security and risk management champions, with mixed success. AI champions landed better, likely because employees perceived AI as an opportunity rather than an obligation. The lesson for security teams: enforcement mechanisms work better when paired with enablement. Policy says what you can’t do; champions show what you can.

The Policy as a Living Document

One honest observation from the FFF experience is that the risk intake form has been revised with nearly every new use case evaluated. Edge cases surface that the original policy didn’t anticipate — financial data exposure through combined workflows, access levels that shouldn’t be granted to certain roles, use cases that seem low-risk until you realize they’re combining data sources that create new sensitivity classifications.

That iterative reality means the acceptable use policy needs version control and a process for updates, not just an initial publication. Security engineers building AI policy infrastructure should plan for this: draft the policy to be specific enough to be actionable today, but structured so that prohibited-use-case lists and approved-tooling sections can be updated without rewriting the whole document.

Actionable Takeaways

  • Replace vague "check with IT" AI usage policies with explicit allowlists of approved tools (e.g., Microsoft Copilot for end-user AI, Databricks for agentic workflows) and explicit prohibited-use-case categories (e.g., clinical trial data). Specificity is what makes policy enforceable rather than advisory.
  • Require AI awareness training completion and a signed acceptable use addendum before granting access to any approved AI tool. Tie access provisioning to these prerequisites so the requirement is enforced technically, not just procedurally.
  • Pair policy enforcement with an internal AI champion program inside each business unit. Champions demonstrate legitimate, valuable use cases that make adoption feel like an opportunity — reducing shadow AI temptation and giving employees a path to productive use that doesn't bypass governance.

Common Pitfalls

  • Launching with a policy that says "check with IT before using AI" without defining approved tools, prohibited categories, or enforcement mechanisms. This creates a false sense of governance coverage while leaving every real decision unmade — employees fill the void with their own judgment, which is how shadow AI starts.
  • Treating the acceptable use policy as a one-time publication rather than a living document. AI use cases consistently surface edge cases the original policy didn't anticipate. Without a structured process to update prohibited-use-case lists and approved-tooling sections, the policy becomes stale and employees begin to treat it as non-authoritative.

Human Oversight in Agentic AI Workflows

Human oversight in agentic AI workflows is not just a compliance checkbox — it is a deliberate security architecture decision that determines the blast radius when an agent misbehaves. At FFF Enterprises, the CISO framed the core question as: where does incorrect autonomous action produce consequences that are irreversible, costly, or regulated? Every answer to that question becomes a mandatory human review gate.

The default posture at FFF is risk-averse. An agentic pipeline is encouraged to do as much as it can end-to-end, but only up to the last step before an irreversible action; that boundary is where human confirmation must be inserted. This framing is directly actionable for security engineers designing agent architectures: map each agent action to its consequence severity, then enforce human checkpoints at every transition from reversible to irreversible.

Medical Pre-Authorization Agent with Human Review Gate

Proof of Concept

FFF Enterprises deployed an agentic AI workflow to automate the assembly of medical pre-authorization appeal packages for specialty drug patients — a process where insurer denials are common — with a mandatory physician review gate before any submission.

  1. Trigger: Denial Letter Received. A patient’s insurance claim for a specialty (often immune-suppressing) drug is automatically denied by the payer. These denials are routine in specialty pharmacy because the drugs are expensive; the insurer defaults to rejection unless a formal appeal with supporting documentation is submitted.

  2. Agent Reads the Denial Letter. An AI agent is invoked to ingest the denial letter. The agent parses the stated reason(s) for denial — lack of prior authorization, step therapy requirements not met, or missing clinical justification — extracting the specific criteria the payer requires for an appeal to succeed.

  3. Agent Pulls Supporting Documentation. Based on the denial criteria, the agent queries the relevant clinical and administrative data sources to retrieve all documentation required for the appeal: patient history, prescriber notes, clinical trial data for the drug, payer-specific formulary requirements, and any prior correspondence. This pull is automated across internal systems without manual staff involvement.

  4. Agent Assembles the Appeal Package. The agent wraps all retrieved documentation into a structured, formatted appeal submission tailored to the specific payer’s requirements and the denial reason identified in step 2. The output is a complete, ready-to-submit document package.

  5. Human Review Gate: Physician Approval Required. Before any appeal is submitted, the assembled package is routed to a physician for review. The doctor examines the agent-assembled documentation and either approves the package for submission or rejects/modifies it. This human-in-the-loop checkpoint is non-negotiable — the agent has no authority to submit independently. This gate exists because the decision affects patient care and carries PHI, regulatory, and liability exposure.

  6. Outcome and Value Realization. Once the physician approves, the appeal package is submitted to the insurer. FFF Enterprises measured the automation of the assembly step at approximately $250,000 in annual savings, attributed to staff time recaptured from manual appeal preparation.

Why this is a security and governance model, not just a UX choice: PHI is being read and processed by the agent at every step. If the agent were to act autonomously — submitting appeals without physician review — a hallucinated or incorrectly assembled package could cause patient harm, HIPAA exposure, and regulatory liability. The human gate functions as both a quality control checkpoint and a compliance enforcement point.
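
The gate described in step 5 can be enforced structurally, so that submission is impossible without a recorded approval. This is a minimal sketch with hypothetical names (`AppealPackage`, `physician_review`, `submit`); it models only the state transitions, not the agent or the payer integration.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class Status(Enum):
    ASSEMBLED = auto()
    APPROVED = auto()
    REJECTED = auto()
    SUBMITTED = auto()


@dataclass
class AppealPackage:
    denial_reason: str
    documents: list[str]
    status: Status = Status.ASSEMBLED
    reviewed_by: Optional[str] = None


def physician_review(pkg: AppealPackage, physician_id: str, approve: bool) -> None:
    """Record a physician's decision on an assembled package."""
    if pkg.status is not Status.ASSEMBLED:
        raise ValueError("only an assembled package can be reviewed")
    pkg.reviewed_by = physician_id
    pkg.status = Status.APPROVED if approve else Status.REJECTED


def submit(pkg: AppealPackage) -> None:
    """Submit to the payer. Refuses unless a physician has approved."""
    if pkg.status is not Status.APPROVED or pkg.reviewed_by is None:
        raise PermissionError("physician approval is required before submission")
    pkg.status = Status.SUBMITTED
```

Keeping the check inside `submit` means the agent cannot bypass the gate by calling the submission step directly.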

Medical pre-authorization agent workflow with mandatory physician human review gate before submission

Multi-Agent Shipping Claims Orchestration in Databricks

Proof of Concept

FFF Enterprises deployed a multi-agent pipeline inside Databricks to automate the handling of overages, shortages, and damages (OSD) claims for pharmaceutical shipments — routing customer complaints from SAP e-commerce through Salesforce and into a Databricks orchestrator agent that assembles a complete claims dossier for human review.

  1. Customer complaint ingestion via SAP e-commerce: A customer experiencing an overage, shortage, or damage with a shipped order logs a complaint directly on the FFF e-commerce portal (built on SAP). The complaint event is captured and a notification is automatically triggered downstream.

  2. Salesforce agent receives the alert: The SAP complaint notification is passed to the Salesforce AI Agent[5]. The issue is surfaced to the assigned salesperson in Salesforce, and the complaint data is handed off into the multi-agent orchestration pipeline.

  3. Databricks orchestration layer takes over: The Databricks environment acts as the central control plane for the multi-agent workflow. A Databricks-hosted orchestrator agent coordinates all subsequent steps, pulling data from multiple source systems.

  4. SAP data retrieval by sub-agent: A sub-agent within Databricks queries SAP to pull structured order data — including ship date, shipment records, and other relevant logistics details. This structured data is passed to the orchestrator.

  5. Document aggregation: The orchestrator agent pulls together all relevant documentation associated with the claim — shipping manifests, order records, and any supporting files — and begins assembling a consolidated claims package.

  6. Customer-uploaded evidence ingestion: The pipeline ingests image attachments uploaded by the customer (e.g., photos of damaged goods or incorrect quantities). A planned future capability is ingesting warehouse video footage of the packing process, which matters because some shipments contain vials of drugs valued at approximately $13,000 each.

  7. Orchestrator agent assembles the final dossier: The orchestrator combines the SAP order data, Salesforce complaint metadata, supporting documents, and customer-uploaded media into a single, comprehensive claims document formatted for human review.

  8. Human review gate: The completed dossier is presented to a human reviewer who evaluates the assembled evidence and issues a response to the customer. No autonomous resolution decisions are made by the agents — the human retains final authority.

  9. Access control and authorization context: The Databricks-to-Salesforce agent communication is governed by the authorization controls within each respective platform. FFF is in the process of tightening Active Directory group-based access controls within Databricks to enforce least-privilege across the pipeline, acknowledging current “security spaghetti” in group permissions as a known remediation item.
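
The orchestration steps above can be approximated in plain Python. The fetcher callables stand in for the SAP, Salesforce, and document sub-agents; they are injected as hypothetical parameters so the sketch makes no claim about any real platform API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ClaimsDossier:
    claim_id: str
    order_data: dict
    complaint: dict
    documents: list
    customer_media: list
    # The pipeline only produces a dossier for review; it never resolves a claim.
    status: str = "awaiting_human_review"


def assemble_dossier(
    claim_id: str,
    fetch_order: Callable[[str], dict],
    fetch_complaint: Callable[[str], dict],
    fetch_documents: Callable[[str], list],
    fetch_media: Callable[[str], list],
) -> ClaimsDossier:
    """Gather every source for one claim and package it for a human reviewer."""
    return ClaimsDossier(
        claim_id=claim_id,
        order_data=fetch_order(claim_id),
        complaint=fetch_complaint(claim_id),
        documents=fetch_documents(claim_id),
        customer_media=fetch_media(claim_id),
    )
```

Note that the sketch exposes no function that approves or denies a claim, which mirrors the rule that the human reviewer retains final authority.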

Multi-agent shipping claims orchestration pipeline: SAP → Salesforce → Databricks orchestrator → human review

Decision Logic for Placing Human Review Gates

Drawing from both deployments, a practical framework for deciding where to place human oversight checkpoints in agentic workflows emerges:

Insert mandatory human review when any of the following is true:

  • The agent’s output will directly affect a regulated process (medical, financial, legal).
  • Incorrect action cannot be undone or correction is costly — such as submitting a wrongly assembled insurance appeal or approving a fraudulent shipping claim.
  • The agent is reading or processing PHI, PII, or sensitive financial data that must be protected under specific compliance frameworks.
  • The agent’s output will be presented externally — to a customer, regulator, or partner — without a further internal review layer.

Autonomous operation is acceptable when:

  • The agent is performing data aggregation or document assembly with no external submission.
  • Errors in the agent’s intermediate output can be caught and corrected before they propagate.
  • The action falls within a clearly defined, low-risk use case scope (e.g., pulling ship dates from SAP for internal assembly).
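
The "any of the following" rule translates directly into a predicate over agent actions. The `AgentAction` fields below are hypothetical labels for the four conditions listed for mandatory review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentAction:
    name: str
    affects_regulated_process: bool = False
    irreversible_or_costly: bool = False
    handles_sensitive_data: bool = False
    externally_visible: bool = False


def review_reasons(action: AgentAction) -> list[str]:
    """Return every condition that makes human review mandatory for this action."""
    checks = [
        (action.affects_regulated_process, "affects a regulated process"),
        (action.irreversible_or_costly, "cannot be undone or is costly to correct"),
        (action.handles_sensitive_data, "reads or processes PHI, PII, or sensitive financial data"),
        (action.externally_visible, "output is presented externally without further review"),
    ]
    return [reason for triggered, reason in checks if triggered]


def requires_human_review(action: AgentAction) -> bool:
    return bool(review_reasons(action))
```

For example, an internal ship-date lookup has no triggered conditions, while submitting an insurance appeal triggers several.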

Access Controls as a Prerequisite for Human Oversight to Work

Human review gates are only as effective as the access controls that constrain what agents can do before and after that gate. At FFF, the team centralized their AI control plane through Databricks, and they are actively tightening agent-level access using Databricks’ native labeling capabilities and Active Directory group-based permissions.

The challenge acknowledged explicitly: legacy group memberships have created overly broad access — some agents effectively inherit more permissions than they should because the underlying AD groups were never scoped for agent-level access. Cleaning this up is an ongoing effort, and the CISO recommended addressing access control hygiene early, before agents go into production, rather than retrofitting controls after deployment.

The principle here maps directly to least-privilege for agentic systems: an agent should have read/write access only to the data and systems required for its specific task at each workflow stage. When agents span systems (Salesforce, SAP, Databricks), this requires explicit mapping of what each agent touches at each step and enforcing those boundaries through platform-level controls — not just governance policy.
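
The explicit mapping of what each agent touches at each step can be kept as a small table and checked against what the agent is actually granted. Agent names, stage names, and resource labels here are invented for illustration; real enforcement would live in the platform's own access controls.

```python
# Hypothetical mapping: (agent, stage) -> set of (resource, mode) pairs it needs.
REQUIRED_ACCESS = {
    ("sap_sub_agent", "retrieve_order"): {("sap.orders", "read")},
    ("orchestrator", "assemble_dossier"): {
        ("claims.documents", "read"),
        ("claims.dossiers", "write"),
    },
}


def is_allowed(agent: str, stage: str, resource: str, mode: str) -> bool:
    """True only if this access is explicitly mapped for the agent at this stage."""
    return (resource, mode) in REQUIRED_ACCESS.get((agent, stage), set())


def excess_grants(agent: str, stage: str, granted: set[tuple[str, str]]) -> set[tuple[str, str]]:
    """Return permissions the agent holds beyond what the stage requires."""
    return granted - REQUIRED_ACCESS.get((agent, stage), set())
```

Running `excess_grants` against the permissions an agent inherits from its groups gives a concrete remediation list before the agent goes into production.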

Monitoring and Audit as the Backstop

Human oversight gates address the primary decision points, but FFF also relies on Databricks’ built-in monitoring capabilities as a secondary control layer. The platform allows the team to set expected output baselines for agents and alert when outputs deviate from those norms — a practical hallucination detection mechanism. If an agent’s assembled document consistently looks different from the expected pattern, an alert fires.

The CISO noted a gap in this approach: the current alerting catches sudden deviations, but a slow drift — a gradual shift in agent output quality over time — may not trigger threshold-based alerts. A more robust audit program that periodically samples agent outputs against ground-truth expectations is planned but not yet implemented. For security engineers, this is a known weak point in behavioral monitoring for agents: point-in-time alerting is necessary but not sufficient; periodic deep audits are required to catch gradual degradation.
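
A small sketch shows why the two controls are complementary. It assumes each agent run can be given a numeric quality score, and it models the real-time alert as a check on run-to-run change; both are illustrative assumptions, not a description of how Databricks monitoring works internally.

```python
from statistics import mean


def step_alerts(scores, max_jump):
    """Return indices where a score moved more than max_jump from the previous run.

    Stand-in for real-time alerting, assumed here to compare against recent output.
    """
    return [i for i in range(1, len(scores))
            if abs(scores[i] - scores[i - 1]) > max_jump]


def audit_drift(scores, baseline, sample_size, tolerance):
    """Return True if the mean of the latest sample strays from a fixed baseline."""
    recent = scores[-sample_size:]
    return abs(mean(recent) - baseline) > tolerance


# Hypothetical quality scores (1.0 = matches expected output), decaying 0.01 per run.
scores = [round(0.95 - 0.01 * i, 2) for i in range(30)]

print(step_alerts(scores, max_jump=0.05))
print(audit_drift(scores, baseline=0.95, sample_size=10, tolerance=0.10))
```

With these made-up numbers no single run moves far enough to trip the step alert, while the periodic comparison against the fixed baseline flags the accumulated decline.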

Actionable Takeaways

  • Map every agent action to its consequence severity before deployment. For each workflow stage, explicitly answer: is this action irreversible? Does it touch PHI or regulated data? Is it customer-facing? Every "yes" answer is a candidate for a mandatory human review gate. Document this mapping and get sign-off from legal and compliance before agents go live.
  • Enforce least-privilege access controls at the agent level, not just the user level. Before launching any agentic workflow that spans multiple systems (e.g., Databricks querying SAP and Salesforce), audit the underlying service accounts and AD group memberships that agents will inherit. Scope access to the minimum required for each workflow stage — do not allow agent permissions to be as broad as the human user roles they proxy.
  • Implement both real-time alerting and periodic audit sampling for agent output quality. Configure platform-level hallucination monitoring (e.g., Databricks output baseline alerts) for immediate deviation detection, but also schedule periodic manual audits that compare a sample of agent outputs against expected results. Real-time alerts catch sudden failures; periodic audits catch slow drift that threshold-based monitoring misses.
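
The first takeaway can be captured as a small, reviewable data structure. The sketch below is one possible shape, with hypothetical action names; the three flags mirror the questions about reversibility, regulated data, and customer-facing output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentAction:
    name: str
    reversible: bool
    touches_regulated_data: bool  # PHI or other regulated data
    customer_facing: bool


def needs_human_gate(action: AgentAction) -> bool:
    """Any single risk factor is enough to require a human review gate."""
    return (not action.reversible
            or action.touches_regulated_data
            or action.customer_facing)


# Hypothetical workflow stages, for illustration only.
WORKFLOW = [
    AgentAction("pull_ship_dates", reversible=True,
                touches_regulated_data=False, customer_facing=False),
    AgentAction("assemble_dossier", reversible=True,
                touches_regulated_data=False, customer_facing=False),
    AgentAction("send_customer_response", reversible=False,
                touches_regulated_data=False, customer_facing=True),
]

for action in WORKFLOW:
    print(action.name, "gate" if needs_human_gate(action) else "autonomous")
```

Keeping the mapping in a structure like this gives legal and compliance a concrete artifact to sign off on.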

Common Pitfalls

  • Assuming a single centralized control plane eliminates the need to define per-agent access scopes. FFF centralized through Databricks and Microsoft Copilot, which simplified governance significantly — but the underlying Active Directory group memberships that agents inherited were over-permissioned due to legacy accumulation. Centralizing the AI control plane is necessary but not sufficient; agent-level access must be explicitly scoped regardless of platform.
  • Deploying human oversight gates without defining clear criteria for what constitutes a reviewable output. In FFF's medical pre-authorization workflow, the physician reviews a fully assembled document — the agent's job is complete before the human sees anything. If the assembly step is poorly scoped or the agent can partially submit before the review gate, the oversight control loses its effectiveness. Define exactly what state the agent must reach before human review occurs, and enforce that state boundary technically, not just procedurally.
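
One way to enforce that state boundary technically is a small state machine in which submission is impossible until a reviewer has approved. This is a sketch under assumed state names, not the mechanism FFF uses.

```python
from enum import Enum, auto


class State(Enum):
    ASSEMBLING = auto()
    READY_FOR_REVIEW = auto()
    APPROVED = auto()
    SUBMITTED = auto()


class ReviewGateError(RuntimeError):
    """Raised when a transition is attempted out of order."""


class Dossier:
    def __init__(self) -> None:
        self.state = State.ASSEMBLING
        self.reviewer = None

    def _require(self, expected: State) -> None:
        if self.state is not expected:
            raise ReviewGateError(
                f"in {self.state.name}, but {expected.name} is required")

    def finish_assembly(self) -> None:  # called by the agent
        self._require(State.ASSEMBLING)
        self.state = State.READY_FOR_REVIEW

    def approve(self, reviewer: str) -> None:  # called by a human reviewer
        self._require(State.READY_FOR_REVIEW)
        self.reviewer = reviewer
        self.state = State.APPROVED

    def submit(self) -> None:  # only reachable after approval
        self._require(State.APPROVED)
        self.state = State.SUBMITTED


dossier = Dossier()
dossier.finish_assembly()
try:
    dossier.submit()  # the agent tries to skip the review gate
except ReviewGateError as err:
    print("blocked:", err)
```

In practice the `approve` call would also need to be restricted to human identities, so the agent cannot approve its own output.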

Prompt Injection Defense and Shadow AI Detection

As enterprise AI governance matures, two operational security problems consistently surface as the hardest to solve: prompt injection across user-facing and agentic surfaces, and shadow AI detection before unauthorized tool adoption becomes a supply chain or data exfiltration risk. At FFF Enterprises, both challenges were addressed through a layered control strategy rather than any single silver-bullet solution.

Secure Browser Controls as the First Line of Defense

The primary prompt injection mitigation strategy at FFF centers on secure browsers deployed to all end users. When employees attempt to access external AI services such as ChatGPT, the secure browser intercepts and redirects traffic to the company’s enterprise Microsoft Copilot instance instead. This approach narrows the prompt injection surface by keeping browser-based user interactions away from untrusted third-party AI endpoints.

A secondary benefit of the secure browser layer is automatic key and credential stripping. When employees copy and paste code into AI tools for review — a common developer workflow that introduces exfiltration risk — the secure browser scrubs API keys and secrets before the content reaches the AI interface. This addresses one of the most frequent inadvertent data leakage paths in developer-facing AI deployments.
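
The talk did not describe how the secure browser detects secrets. As a rough illustration of the idea, pattern-based redaction of pasted text might look like the sketch below. The two patterns are assumptions chosen for the example, and production secret scanners use much larger rule sets.

```python
import re

# Illustrative patterns only (assumptions, not the secure browser's rules).
AWS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")  # AWS access key ID format
ASSIGNMENT = re.compile(
    r"\b((?:api[_-]?key|secret|token|password)\s*[:=]\s*)(['\"]?)([^'\"\s]+)(['\"]?)",
    re.IGNORECASE,
)


def scrub(text: str) -> str:
    """Replace likely secrets with a placeholder before text leaves the browser."""
    text = AWS_KEY_ID.sub("[REDACTED]", text)
    return ASSIGNMENT.sub(
        lambda m: f"{m.group(1)}{m.group(2)}[REDACTED]{m.group(4)}", text)


pasted = 'client = Client(api_key="sk_live_abc123")\nprint("hello")'
print(scrub(pasted))
```

A keyword-based rule like `ASSIGNMENT` will over-redact harmless assignments such as `token = get_token()`. For a control whose job is to stop keys leaving the enterprise, erring on that side is usually the safer default.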

Prompt Injection Mitigation via Secure Browser and Databricks Monitoring

Proof of Concept

FFF Enterprises deployed a layered prompt injection defense strategy combining secure browser controls at the user-facing layer with Databricks hallucination monitoring at the agentic layer — addressing two distinct attack surfaces where prompt injection could compromise enterprise AI workflows handling PHI and financial data.

  1. Identify the two attack surfaces: Distinguish between user-facing AI interactions (employees using browser-based AI assistants) and backend agentic pipelines (Databricks-orchestrated multi-agent workflows). Each surface requires a different control class. User-facing surfaces are vulnerable to employees submitting crafted inputs or inadvertently pasting malicious content; agentic surfaces are vulnerable to injected instructions propagating across agent boundaries.

  2. Deploy secure browser controls for user-facing AI access: Configure a secure browser policy that intercepts and redirects all AI assistant traffic. In FFF’s implementation, any attempt to access ChatGPT or other unauthorized AI endpoints is automatically redirected to the enterprise Microsoft Copilot instance. This prevents users from reaching uncontrolled external LLMs where injected content could exfiltrate sensitive data or bypass enterprise policy guardrails.

  3. Enforce key and credential scrubbing at the browser layer: Within the secure browser, configure content inspection policies that detect and strip API keys, credentials, or other secrets when users paste code or text into AI chat interfaces. Strictly speaking this is a data leakage control rather than a prompt injection control: it keeps secrets that are maliciously or accidentally included in LLM input from leaving the enterprise. FFF applied this specifically to code review scenarios where developers might paste source code containing embedded keys.

  4. Funnel all agentic AI through a single control plane (Databricks): Rather than allowing agent frameworks to proliferate across business units, centralize all agentic execution through Databricks. This single control plane means monitoring, access control, and anomaly detection policies only need to be applied in one place. Any prompt injection attempt that alters agent behavior will surface as an anomaly against the monitored baseline.

  5. Establish behavioral baselines and configure hallucination/anomaly alerting in Databricks: Use Databricks’ built-in monitoring capabilities to define what normal agent output looks like — expected response patterns, data access patterns, and output ranges for each agentic workflow. Configure alerts to trigger when agent outputs deviate from these baselines. This catches prompt injection attempts that cause agents to behave unexpectedly (e.g., accessing data outside their defined scope, returning responses inconsistent with their task, or exhibiting hallucination patterns indicative of injected instruction overrides).

  6. Acknowledge the detection gap and plan for audit-based slow-drift detection: Recognize that real-time alerting catches sharp deviations but may miss slow, incremental drift caused by persistent or low-intensity prompt injection attempts. Plan a supplementary audit program that periodically reviews historical agent behavior against established norms to detect gradual manipulation that stays under alerting thresholds. At the time of the talk, FFF had identified this gap but not yet implemented the full audit program.

  7. Integrate shadow AI detection via CrowdStrike endpoint telemetry: For the broader prompt injection risk surface — users bypassing controls by subscribing to external AI platforms on personal credit cards — integrate endpoint detection tooling. FFF leveraged CrowdStrike[6] signals tied to community-sourced AI detection signatures (referencing the Cloud Security Alliance’s[7] OpenClaw release) to identify unauthorized AI tool usage on managed endpoints. This limits the number of unsanctioned surfaces where injected prompts could operate outside enterprise visibility.

  8. Known limitations and open gaps: Secure browser controls cannot fully enumerate AI-like behavior for all possible tools (e.g., distinguishing a general web request to an AI API versus a legitimate SaaS API call). Detecting all shadow AI adoption remains a work in progress. The Databricks monitoring approach requires ongoing baseline maintenance as agentic workflows evolve, and the audit program for slow-drift detection was not yet operational at the time of the talk.
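
The redirect in step 2 and the limitation in step 8 can both be seen in a simple hostname-based routing rule. The host list and enterprise URL below are placeholders, and a real secure browser is configured through its vendor's policy console rather than code like this.

```python
from urllib.parse import urlsplit

# Hypothetical values for illustration.
ENTERPRISE_AI_URL = "https://copilot.example.internal/"
BLOCKED_AI_HOSTS = {"chatgpt.com", "chat.openai.com", "claude.ai"}


def route(url: str) -> str:
    """Return the enterprise AI URL for listed AI hosts, else the original URL."""
    host = (urlsplit(url).hostname or "").lower()
    for blocked in BLOCKED_AI_HOSTS:
        # Match the listed host itself or any subdomain of it.
        if host == blocked or host.endswith("." + blocked):
            return ENTERPRISE_AI_URL
    return url


print(route("https://chatgpt.com/c/123"))
print(route("https://crm.example.com/api/summarize"))
```

The second call passes through untouched. A hostname list has no way of knowing that an approved SaaS endpoint now forwards data to an external model, which is the gap the known limitations describe.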

Shadow AI Detection: The Harder Problem

Shadow AI — employees independently subscribing to AI tools on personal or corporate credit cards, outside of any procurement or security review — represents a fundamentally different risk profile than prompt injection. The threat is not a technical exploit; it is unsanctioned data flowing to unvetted models hosted in unknown environments.

FFF’s approach to shadow AI detection operates on two fronts:

1. Secure browser visibility

The same secure browser infrastructure used for prompt injection defense provides partial shadow AI telemetry. Security teams can observe where users attempt to navigate, identifying unauthorized AI platforms. The limitation is incomplete coverage: the secure browser cannot distinguish whether a specific web session involves AI interaction or general browsing without deeper content inspection.

2. Procurement and third-party risk integration

The more durable control is embedding AI disclosure requirements into vendor contracts. FFF’s procurement and legal teams now require vendors to explicitly state whether their products include AI features, where data is processed, and which models are in use. This turns the third-party risk management process into a passive shadow AI inventory mechanism — any vendor with undisclosed AI capabilities becomes a contractual compliance issue rather than an invisible data flow.
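
The three disclosures named above (AI features, processing location, models in use) lend themselves to a structured intake record that can be checked automatically. The field names in this sketch are assumptions, not FFF's template.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class VendorDisclosure:
    vendor: str
    has_ai_features: Optional[bool] = None  # None means the vendor did not answer
    processing_locations: list = field(default_factory=list)
    models_in_use: list = field(default_factory=list)


def disclosure_gaps(d: VendorDisclosure) -> list:
    """List the disclosure items still missing for this vendor."""
    gaps = []
    if d.has_ai_features is None:
        gaps.append("AI features not declared")
    elif d.has_ai_features:
        if not d.processing_locations:
            gaps.append("data processing location missing")
        if not d.models_in_use:
            gaps.append("models in use missing")
    return gaps


# Hypothetical vendor record.
record = VendorDisclosure("ExampleCRM", has_ai_features=True,
                          processing_locations=["us-east"])
print(disclosure_gaps(record))
```

An empty result means the record is complete. Anything else gives procurement a specific item to chase before the contract is signed.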

The trigger for this approach was the Cloud Security Alliance’s OpenClaw guidance, which provided a framework for AI risk disclosures in vendor relationships. FFF aligned its procurement review templates to that framework after CrowdStrike published associated detection tooling.

Remaining Gaps and Forward Priorities

Norwood was explicit that current controls do not provide complete coverage. The secure browser approach handles the majority of browser-based AI access, but detecting AI usage embedded in legitimate SaaS tools — where the AI feature is one capability among many in a subscribed platform — remains difficult. The specific challenge: when a security tool, CRM, or productivity suite quietly adds an AI feature and begins processing enterprise data through an external model, the interaction looks like normal SaaS traffic.

Shadow AI detection is identified as a priority initiative for later in the year. The planned approach involves deeper integration between endpoint telemetry (CrowdStrike), browser controls, and the procurement inventory, creating overlapping detection layers rather than relying on any single signal.
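
Since this integration was still planned at the time of the talk, the following is only a sketch of what correlating the three signals could look like. It assumes each source can be reduced to a set of tool names, which glosses over the real normalization work.

```python
def unsanctioned_tools(endpoint_seen: set, browser_seen: set, approved: set) -> dict:
    """Map each observed-but-unapproved tool to the layers that detected it."""
    findings = {}
    for tool in sorted((endpoint_seen | browser_seen) - approved):
        sources = []
        if tool in endpoint_seen:
            sources.append("endpoint")
        if tool in browser_seen:
            sources.append("browser")
        findings[tool] = sources
    return findings


# Hypothetical observations.
endpoint = {"tool-a", "enterprise-copilot"}
browser = {"tool-a", "tool-b"}
procurement_inventory = {"enterprise-copilot"}

print(unsanctioned_tools(endpoint, browser, procurement_inventory))
```

Recording which layers saw each tool also shows where coverage is thin: a tool seen only by the browser layer suggests the endpoint signatures need updating, and the reverse is equally informative.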

Key Architectural Principle: Defense Requires Overlapping Layers

The underlying pattern across FFF’s AI security posture is that no single control is sufficient. Secure browsers address user-facing injection and key exfiltration. Databricks monitoring addresses agentic behavioral drift. Procurement integration addresses undisclosed third-party AI. Each layer covers what the others miss, and the gaps between layers define the roadmap for future investment.

Actionable Takeaways

  • Deploy secure browsers as the primary prompt injection control for end users — configure them to intercept requests to external AI endpoints (ChatGPT, consumer Claude, etc.) and redirect to your enterprise AI instance. Enable automatic credential stripping for copy-paste workflows to prevent inadvertent key exfiltration through AI code review sessions.
  • Configure behavioral baseline alerting in your agentic AI platform (Databricks or equivalent) and pair it with a periodic manual audit program. Real-time alerting catches sudden anomalies; scheduled historical reviews catch slow behavioral drift that threshold-based alerts miss — both are required for complete coverage.
  • Embed AI disclosure requirements into your vendor contracts and third-party risk intake forms. Require vendors to declare AI features, data processing locations, and model identities. Align your questionnaire to published frameworks such as the Cloud Security Alliance's OpenClaw guidance to create a replicable, audit-ready shadow AI inventory through your existing procurement process.

Common Pitfalls

  • Relying solely on secure browser controls for shadow AI detection. Browser-level visibility shows where users navigate but cannot identify AI usage embedded inside legitimate SaaS platforms. Employees can interact with shadow AI through tools that are already approved and in use — procurement integration is the only control that reaches this surface.
  • Treating hallucination monitoring baselines as a complete prompt injection defense for agentic pipelines. Threshold-based alerting detects sudden output anomalies but is blind to gradual behavioral manipulation. Without a complementary audit program comparing agent outputs longitudinally, slow-drift injection techniques will not trigger any alert.

Conclusion

Billy Norwood’s framework at FFF Enterprises offers a rare view of AI governance built under real operational pressure — 40 projects already in motion, a CEO who flipped from “no AI” to “I want it now” overnight, and a security budget that had to be emergency-patched to keep up. The result is not a theoretical model but a working blueprint with acknowledged gaps, iteratively refined through production deployments.

The core lesson is structural: governance frameworks that conflate strategic policy authority with day-to-day operational review become blockers. The two-tier model — executive steering committee for high-risk escalations, AI Center of Excellence for operational throughput — keeps the committee mechanism viable without every use case triggering a CISO-level review.

The intake and risk scoring system is the mechanism that makes the two-tier structure work in practice. Without binary triggers for PHI, PII, and regulatory surface areas, routing decisions become judgment calls that don’t scale. And without process baselines in the intake form, the governance committee can’t distinguish a $250,000 automation opportunity from a reporting task dressed up in AI language.

For security engineers designing agentic systems, the human oversight decision logic from FFF’s two production deployments is directly applicable: map every agent action to consequence severity, insert mandatory human gates at every transition from reversible to irreversible, and enforce those boundaries technically rather than through procedural policy alone.

References & Tools

  1. Salesforce — CRM platform used at FFF Enterprises for customer complaint management and sales visibility in the multi-agent shipping claims pipeline.
  2. SAP — ERP and e-commerce platform used as the source system for shipping order data and customer complaint ingestion at FFF Enterprises.
  3. Microsoft Copilot — Enterprise AI assistant deployed as the approved end-user AI interface at FFF; all browser-level AI access is funneled to the enterprise Copilot instance via secure browser redirection.
  4. Databricks — Central AI control plane for agentic workflows at FFF; handles multi-agent orchestration, bronze/silver/gold data labeling standards, hallucination baseline monitoring, and access control enforcement.
  5. Salesforce AI Agent — Integrated into the shipping claims multi-agent workflow; receives e-commerce complaints from SAP and passes structured complaint data to the Databricks orchestration layer.
  6. CrowdStrike — Endpoint security platform providing shadow AI detection signals; FFF leveraged CrowdStrike endpoint telemetry tied to the Cloud Security Alliance's OpenClaw detection signatures to identify unauthorized AI tool usage.
  7. Cloud Security Alliance (OpenClaw) — Published the OpenClaw AI detection framework that FFF used to align procurement AI disclosure requirements and endpoint detection for shadow AI inventory.

Frequently Asked Questions from the Audience

What is a tiered AI governance committee and why does it matter for security?
A tiered AI governance committee separates strategic policy authority from operational implementation. The top tier — typically CISO, CIO, General Counsel, and Chief Compliance Officer — sets policy, manages ethics, and handles high-risk escalations. The lower tier, an AI Center of Excellence with VPs, directors, and the security architect, handles day-to-day use case review and controls implementation. The separation prevents bottlenecks and ensures security controls are both mandated at the top and implemented correctly at the operational level.
How should organizations score and prioritize AI use cases at intake?
Use a risk scorecard that evaluates data sensitivity (does it touch PHI or PII?), process criticality (does it affect critical operations?), access scope expansion (does it require permissions beyond current entitlements?), financial exposure, and regulatory surface (clinical workflows should be prohibited by default). Require submitters to document their current process baseline — time, headcount, error rate — so ROI can be measured post-deployment and so the intake form distinguishes real AI candidates from simple reporting tasks.
Where should human review gates be placed in agentic AI workflows?
Insert mandatory human review at every point where an agent action is irreversible, affects a regulated process, touches PHI or sensitive financial data, or produces customer-facing output. Agents can run autonomously for data aggregation and document assembly — but must stop for human confirmation before any submission, approval, or externally visible action. FFF Enterprises applies this pattern in both their medical pre-authorization agent (physician reviews before submission) and their shipping claims orchestrator (human reviews assembled dossier before customer response).
How can enterprises detect shadow AI usage before it becomes a security risk?
Use overlapping detection layers: secure browser controls that redirect unauthorized AI endpoint traffic to your enterprise instance, endpoint telemetry (such as CrowdStrike signals tied to the Cloud Security Alliance's OpenClaw framework), and procurement-integrated vendor contracts that require explicit AI feature disclosure. No single layer covers all shadow AI vectors — SaaS platforms that quietly add AI features are only caught through contractual disclosure requirements, not browser or endpoint controls.
Watch on YouTube
Establishing AI Governance Without Stifling Innovation | [un]prompted 2026
Billy Norwood · 23 min