What is AI Goat and how do I deploy it?

AI Goat is an open-source, deliberately vulnerable AI infrastructure built on AWS SageMaker that operationalizes the OWASP Top 10 Machine Learning Risks through three hands-on attack and defense scenarios. Deploy it on your own AWS account using Terraform locally or via a GitHub Actions workflow in a forked repository — both paths are documented in the project repository.

What is an AI supply chain attack and how does it differ from a traditional one?

An AI supply chain attack targets the packages, preprocessing libraries, and infrastructure components that ML models depend on rather than the model itself. Unlike traditional supply chain attacks targeting application code, AI supply chain attacks can compromise model behavior indirectly — for example, by exploiting a vulnerable image preprocessing library to achieve remote code execution on the ML serving infrastructure.

How do I protect ML training data from data poisoning attacks?

The two primary controls are strict access control and data integrity validation. Remove public access from all cloud storage holding training data and restrict write access to authorized IAM roles only. Before each retraining run, validate the dataset using cryptographic checksums and statistical anomaly detection — if the hash or distribution deviates from the approved baseline, abort the training job and alert your ML operations team.

What is an output integrity attack and what makes it hard to detect?

An output integrity attack bypasses a model's predictions without compromising the model, its weights, or its training data. The attacker manipulates the communication layer between the model and the application — for example, by supplying prediction values in the request payload that the backend trusts without re-querying the model. These attacks are hard to detect because the model continues to function correctly; manipulation occurs entirely in the application logic that consumes the output.

AI Goat: Hack ML Vulnerabilities on AWS SageMaker

A publicly accessible S3 bucket holds the training data for a live recommendation model — and any anonymous user can overwrite it. That’s one of three AI security vulnerabilities in machine learning infrastructure that AI Goat exposes hands-on: supply chain attacks that pivot through vulnerable preprocessing libraries, data poisoning via misconfigured cloud storage, and output integrity bypasses that trust client-controlled prediction payloads over the model itself.

Most ML teams are deploying on AWS SageMaker without ever stress-testing their pipelines against these attack classes. This post walks through all three AI Goat challenge scenarios from OWASP Global AppSec USA 2024 — covering the complete attack chains, the OWASP Top 10 ML Risks they map to, and the concrete mitigations across access control, input validation, and continuous monitoring that close each gap.

Key Takeaways

You'll learn how to identify and exploit three critical OWASP ML risks — supply chain attacks, data poisoning, and output integrity attacks — through hands-on scenarios against a deliberately vulnerable AWS-hosted AI infrastructure.
You'll be able to recognize insecure AI deployment patterns such as publicly accessible S3 training data buckets, unvalidated model inputs, and improperly trusted client-supplied prediction values before attackers exploit them.
You'll be able to apply a practical defense framework, covering access control, input/output validation, vulnerability monitoring, and developer education, to harden your own machine learning pipelines against real-world attack vectors.

The AI Security Threat Landscape and OWASP ML Top 10 Risks

AI security vulnerabilities in machine learning infrastructure are no longer a theoretical concern — they represent a rapidly expanding attack surface that security engineers must treat with the same rigor as traditional application and network threats.

The Scale of ML Adoption Across Cloud Platforms

According to Gartner, the AI market is projected to reach nearly $300 billion by 2027, driven by ~19% annual growth. The security implications become concrete when you look at actual deployment data:

AWS: 29% of organizations have implemented SageMaker^[2] notebooks for ML workloads.
Azure: 39% of Azure organizations have integrated OpenAI services.
GCP: 24% of GCP users are actively leveraging Vertex AI.

More significant than raw adoption is how organizations consume these services: 56% have integrated AI into custom-built applications rather than relying on vendor-managed solutions. With over 50 AI packages available — including TensorFlow^[4] and PyTorch^[5] — developers are assembling bespoke ML pipelines with multiple third-party dependencies, each carrying its own vulnerability surface.

Exposed Packages, Leaked Keys, and Public-Facing Models

The custom integration trend compounds risk in three measurable ways:

Vulnerable AI package exposure: Over 62% of organizations that have adopted AI packages are already exposed to vulnerabilities native to those packages.
Leaked API credentials: 20% of organizations using OpenAI have stored API keys in insecure locations — a leaked key can serve as an initial foothold enabling lateral movement across an entire cloud environment.
Publicly accessible AI assets: 10% of organizations running AI-powered workloads have those assets exposed to the public internet — a bidirectional risk where the public can reach models and training data, and those models may have outbound internet access that exposes them to unintended data influence.

Three Structural Challenges Unique to ML Security

Pace of innovation: New model versions and packages emerge daily, introducing attack vectors faster than most vulnerability management programs can assess them.
Shadow AI: Just as Shadow IT creates unmanaged cloud resources, Shadow AI creates unmanaged ML workloads that security teams are frequently unaware of.
Immature security documentation: ML security research, formal threat models, and established hardening protocols are sparse compared to web application or network security domains.

Introducing the OWASP Top 10 Machine Learning Risks

Published in early 2023, the OWASP Top 10 Machine Learning Risks^[3] is the authoritative starting point for any security engineer evaluating their ML environment. The three risks most directly relevant to production ML pipelines — and the focus of AI Goat^[1] — are:

AI Supply Chain Attack (ML06): Attackers compromise packages or infrastructure components the model depends on. The ShellTorch vulnerability is one documented real-world example.
Data Poisoning Attack (ML02): An attacker gains access to a model’s training data and introduces changes that cause the model to behave in ways the developer did not intend.
Output Integrity Attack (ML09): The attacker does not touch the model or its training data; instead, they intercept or manipulate the model’s output after the prediction is generated. A documented 2023 incident involving a health diagnosis system illustrates the real-world stakes.

AI Goat^[1] deploys a toy web store with intentional misconfigurations, vulnerable packages, and insecure ML implementations across three challenges of increasing difficulty. The infrastructure mirrors real ML systems — SageMaker^[2] notebooks for training, S3 for data storage, Lambda^[6] for inference invocation, and API Gateway^[7] for frontend communication.

Actionable Takeaways

Audit your organization's AI package inventory immediately. Given that over 62% of organizations using AI packages are already exposed to package-native vulnerabilities, run a dependency scan across all ML projects — including notebooks and experimental workloads — and cross-reference against current CVE databases for frameworks like TensorFlow and PyTorch. Treat AI packages with the same vulnerability management rigor as application dependencies.
Conduct an inventory of AI assets exposed to the public internet. The finding that 10% of organizations have AI-powered assets publicly accessible represents a significant attack surface. Enumerate all SageMaker endpoints, ML-serving APIs, and training data buckets to confirm whether public access is intentional, and apply least-privilege access controls to any asset where it is not.
Integrate the OWASP Top 10 Machine Learning Risks into your existing threat modeling process. For every ML system in development or production, map each component — training pipeline, data storage, inference endpoint, and API layer — against the OWASP ML risk categories to identify where supply chain, data poisoning, and output integrity exposures exist before they are exploited.

Common Pitfalls

Treating AI services as inherently more secure because they are vendor-managed. The data shows that even organizations using managed services like OpenAI, SageMaker, and Vertex AI are accumulating significant risk through insecure credential storage, public asset exposure, and vulnerable third-party package integration. Vendor management of the underlying model does not eliminate the security obligations of the teams integrating and deploying those models in custom applications.
Overlooking Shadow AI as part of your cloud security posture. Security teams that have not explicitly extended their Shadow IT discovery processes to include AI services — notebooks, endpoints, training jobs, and data pipelines spun up outside formal procurement — are operating with an incomplete view of their ML attack surface.

AI Supply Chain Attacks — Exploiting Vulnerable ML Dependencies

AI supply chain attack flow: API traffic leak to source code audit to RCE via malicious image metadata

AI security vulnerabilities in machine learning infrastructure extend far beyond the models themselves. One of the most underappreciated attack surfaces is the supply chain — the packages, dependencies, and preprocessing pipelines that ML systems rely on to function. AI Goat^[1] demonstrates this concretely through an image similarity model, a vulnerable preprocessing library, and a path to remote code execution that requires no direct interaction with the model at all.

What Is an AI Supply Chain Attack?

An AI supply chain attack follows the same fundamental logic as traditional supply chain attacks: rather than breaching the hardened target directly, the attacker compromises something the target depends on. In the ML context, this means targeting:

Third-party packages the model imports (e.g., image processing libraries, data transformation utilities)
Infrastructure misconfigurations in the environment hosting the model
Preprocessing functions that handle raw input before it reaches the model

Real-world precedents already exist. The ShellTorch vulnerability is one documented example of an AI supply chain attack occurring in the wild, demonstrating that this threat class has moved beyond theory.

The Attack Chain: RCE via Malicious Image Metadata

AI Supply Chain RCE via Malicious Image Metadata in a Vulnerable Preprocessing Package

A vulnerable AI preprocessing package in an image similarity model exposes an unauthenticated remote code execution vector via crafted image metadata, demonstrating a real-world AI supply chain attack where malicious input bypasses model logic entirely and achieves shell access on the backend server.

Step 1 — Reconnaissance via API Traffic Inspection

The attacker uses the product image search feature normally while capturing outgoing HTTP requests in browser developer tools. Among the expected API calls, an anomalous request to an image_pre-processing endpoint appears with an error response that references a public GitHub repository URL — the application is transparently leaking the location of its preprocessing code in error messages returned to the client.

Step 2 — Source Code Analysis of the Vulnerable Package

With the GitHub repository URL in hand, the attacker navigates to the public repository and inspects the source code. The repository contains a process_image function built on the Pillow^[8] (PIL) Python image processing library. The function reads metadata embedded in the uploaded image and — if metadata is present — executes that metadata as shell code, returning the output. This is a classic command injection vulnerability introduced through mishandling of user-controlled image metadata.

The vulnerability is not in the model. It lives entirely in a preprocessing utility the model depends on — the defining characteristic of a supply chain attack.
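To make the flaw concrete, here is a minimal sketch of the pattern described above. It is not the code from the AI Goat repository: it assumes the image metadata has already been extracted into a plain dictionary (Pillow exposes such a dictionary as `Image.info`), and the function names and the `Comment` key are hypothetical.

```python
import subprocess


def process_metadata_unsafe(metadata: dict) -> str:
    """Anti-pattern: runs user-controlled metadata as a shell command."""
    comment = metadata.get("Comment")
    if not comment:
        return ""
    # Command injection: whoever uploads the image fully controls `comment`.
    result = subprocess.run(comment, shell=True, capture_output=True, text=True)
    return result.stdout


def process_metadata_safe(metadata: dict, max_len: int = 256) -> str:
    """Treats metadata strictly as data: truncated, filtered, never executed."""
    comment = str(metadata.get("Comment", ""))[:max_len]
    # Keep printable characters only; nothing here ever reaches a shell.
    return "".join(ch for ch in comment if ch.isprintable())
```

The safe variant never passes metadata to an interpreter, so a value such as `ls /home/ec2-user` is just a string to store or discard.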

Step 3 — Crafting a Malicious Image Payload

The attacker prepares a Python script to construct a specially crafted image file. The image metadata field is populated with an arbitrary shell command (ls /home/ec2-user) to confirm code execution on the backend server. The malicious image is uploaded to the product search feature just as any legitimate user would upload a photo.

Step 4 — Remote Code Execution and Data Exfiltration

The application processes the uploaded image through the vulnerable preprocessing function. The embedded shell command executes on the backend server, and the directory listing is returned in the API response — revealing a file named sensitive_data. A second malicious image, embedding a file read command, returns the SageMaker recommendations bucket name from sensitive_data. This becomes the entry point for the next attack scenario (data poisoning).

This is full server-side remote code execution achieved without exploiting the ML model, purely through a vulnerable dependency in the preprocessing layer.

Why This Attack Pattern Is Particularly Dangerous

Indirect attack surface: Security teams focused on model endpoints may overlook preprocessing utilities and third-party transformation libraries.
Public exposure of internal dependencies: Error messages referencing internal GitHub repositories give attackers a roadmap to the codebase.
Metadata is rarely validated: Image EXIF fields are frequently ignored by input validation logic — they are not the “content” of the file, but they can carry executable payloads.
Transitive dependencies: ML projects routinely import dozens of packages, each with its own dependency tree that teams rarely audit comprehensively.

Defensive Mitigations

1. Vulnerability Monitoring for All Dependencies: Maintain continuous visibility into every package imported by your ML infrastructure, including the full transitive dependency graph. Integrate automated CVE scanning that alerts on newly published vulnerabilities. This is especially important in ML projects where packages like Pillow^[8], NumPy, or custom preprocessing utilities may go unmonitored for extended periods.

2. Package Signature Verification: Before any package is installed or executed in your ML environment, verify its authenticity against the signature provided by the vendor or package registry. Pin package versions and enforce hash verification in dependency lockfiles.

3. User Input Validation and Sanitization: All user-supplied data, including binary file uploads like images, must be treated as untrusted. Strip or reject EXIF metadata from all uploaded images before any processing occurs. Consider re-encoding images server-side (stripping all metadata by default) before passing them to any processing function.

4. Suppress Sensitive Information in Error Responses: Error responses returned to clients should never include internal paths, package names, repository URLs, stack traces, or any detail that could assist an attacker in mapping the application’s dependencies. Replace verbose error messages with opaque error codes.
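As an illustration of the metadata-stripping mitigation, the sketch below removes every non-essential chunk from a PNG upload using only the standard library. It is a simplified stand-in for full server-side re-encoding: it handles PNG only, the allowlist in `KEEP_CHUNKS` is an assumption, and dropping colour-management chunks may slightly change how some images render.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# Assumed allowlist: chunks needed to decode pixels; all text/EXIF chunks are dropped.
KEEP_CHUNKS = {b"IHDR", b"PLTE", b"IDAT", b"IEND", b"tRNS"}


def iter_chunks(png: bytes):
    """Yield (type, data) for each PNG chunk, validating lengths and CRCs."""
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos < len(png):
        if pos + 8 > len(png):
            raise ValueError("truncated chunk header")
        length, ctype = struct.unpack(">I4s", png[pos:pos + 8])
        end = pos + 8 + length + 4
        if end > len(png):
            raise ValueError("truncated chunk body")
        data = png[pos + 8:pos + 8 + length]
        (crc,) = struct.unpack(">I", png[end - 4:end])
        if zlib.crc32(ctype + data) != crc:
            raise ValueError("bad chunk CRC")
        yield ctype, data
        pos = end
        if ctype == b"IEND":
            break  # ignore anything appended after the final chunk


def make_chunk(ctype: bytes, data: bytes) -> bytes:
    """Serialize one chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(ctype + data)
    return struct.pack(">I", len(data)) + ctype + data + struct.pack(">I", crc)


def strip_png_metadata(png: bytes) -> bytes:
    """Rebuild the file from allowlisted chunks only."""
    kept = [make_chunk(t, d) for t, d in iter_chunks(png) if t in KEEP_CHUNKS]
    return PNG_SIGNATURE + b"".join(kept)
```

Running uploads through `strip_png_metadata` before any preprocessing means a `tEXt` or `eXIf` payload never reaches downstream code; other formats such as JPEG would need their own handling or a full re-encode.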

Actionable Takeaways

Audit your ML preprocessing pipeline as a distinct attack surface: enumerate every library, utility function, and data transformation step that runs before input reaches your model, then scan each component against current CVE databases and enforce hash-pinned, signature-verified package installs in your dependency lockfiles.
Implement a mandatory image (and file upload) sanitization step that re-encodes all uploaded files server-side — stripping EXIF and other metadata fields — before the file is passed to any preprocessing function, eliminating the metadata injection vector demonstrated in this scenario.
Audit all API error responses and application logs to ensure they do not expose internal repository URLs, package names, file paths, or infrastructure details; replace verbose error messages with opaque error codes that are mapped to detailed information only in internal logging systems.
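The opaque-error approach from the last takeaway can be sketched as follows. The error code string, the logger name, and the `incident_id` field are hypothetical; the point is only that details go to internal logs while the client receives a reference it can quote to support.

```python
import logging
import uuid

logger = logging.getLogger("preprocess")  # hypothetical internal logger name


def to_client_error(exc: Exception) -> dict:
    """Log full details internally; return only an opaque code and reference ID."""
    incident_id = uuid.uuid4().hex
    # The exception message and traceback stay in internal logs only.
    logger.error(
        "incident=%s preprocessing failed",
        incident_id,
        exc_info=(type(exc), exc, exc.__traceback__),
    )
    return {"error": "PREPROCESSING_FAILED", "incident_id": incident_id}
```

An operator can search internal logs for the `incident_id`, while the response body carries no repository URL, path, or stack trace.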

Common Pitfalls

Focusing model security efforts exclusively on the model endpoint and training pipeline while leaving preprocessing utilities and third-party transformation libraries unmonitored and unpatched — the AI Goat scenario shows that a single vulnerable preprocessing function can provide full remote code execution without the model itself being involved at all.
Failing to treat image metadata as user-controlled input subject to the same validation and sanitization rules as form fields or query parameters — metadata fields are a blind spot in most input validation implementations and can carry shell commands that reach execution if the preprocessing code handles them unsafely.

Data Poisoning Attacks — Manipulating ML Training Data at Rest

Data poisoning attack chain: anonymous S3 enumeration to CSV manipulation to automated model retraining

Among the most insidious AI security vulnerabilities in machine learning infrastructure is the data poisoning attack. Unlike exploits that target a model’s code or its runtime outputs, a data poisoning attack operates silently at the source — corrupting the training data before the model ever runs inference. The result is a model that behaves exactly as designed, just with a design that has been secretly subverted.

What Is a Data Poisoning Attack?

A data poisoning attack occurs when an attacker gains access to a model’s training dataset and introduces changes — sometimes subtle, sometimes sweeping — that cause the model to learn incorrect or adversarially skewed behavior:

“This attack happens when the attacker gets their hands on the training data, the model’s training data, and changes it — sometimes a drastic change, sometimes a very minor change. The point is that those changes make the model behave differently in a way that the developer did not expect or intended.”

In a recommendation context, this means an attacker can influence what products users see — hijacking personalization logic without ever touching the model’s weights or code.

The Attack Chain: From Anonymous Bucket Enumeration to Recommendation Override

Data Poisoning via Anonymous S3 Write Access to Override ML Recommendation Model Training Data

An attacker leverages an unauthenticated, publicly writable S3 bucket storing the training dataset for an Amazon SageMaker recommendation model to inject poisoned ratings data, triggering automatic model retraining and causing the model to surface a hidden product as the top recommendation.

Step 1 — Reconnaissance via Prior Exploit

The attack begins with a finding from the supply chain challenge: a bucket name leaked through the RCE vulnerability in the image preprocessing package. The extracted string identifies a SageMaker recommendations bucket — a critical piece of information that seeds the data poisoning attack. This cross-challenge dependency illustrates how individual misconfigurations chain together into high-impact compromises.

Step 2 — Anonymous S3 Enumeration

Using the bucket name, the attacker attempts unauthenticated aws s3 ls access — no AWS credentials required. The bucket is publicly readable and writable. Listing the bucket reveals model artifacts and, critically, product_ratings.csv — the training dataset for the SageMaker^[2] recommendation model.

Step 3 — Training Data Analysis

After downloading the CSV, the attacker inspects the schema: columns are user_id, product_id, and rating. Products currently appearing in recommendations (IDs 3, 4, 13, 19) carry ratings of 4–5, confirming high ratings drive recommendations. Product ID 2 — the hidden “Orca doll” — is present in the dataset with low ratings, explaining its absence from the storefront. Understanding the data schema is the pivot point: the attacker does not need model code, the SageMaker notebook, or the endpoint — only the data.

Step 4 — CSV Manipulation and Re-upload

The attacker modifies the CSV to elevate product ID 2: all records with product ID 3 are reassigned to product ID 2, with every rating set to the maximum value of 5. The modified file is uploaded back to the Amazon S3^[9] bucket using the --acl bucket-owner-full-control flag, which grants the bucket owner full control over the uploaded object. The upload succeeds, overwriting the legitimate training file in place.

Step 5 — Triggering Unauthorized Retraining

The AI Goat^[1] infrastructure includes an automated retraining pipeline that monitors S3 for changes. After the poisoned CSV upload, the SageMaker endpoint status transitions from InService to Updating, confirming retraining is in progress. Approximately five minutes later, the storefront surfaces the Orca doll as a top recommendation. The attacker never authenticated, never touched the model directly, and never modified any application code.

Why This Attack Surface Exists

Two root causes enable this exploit:

Public bucket with read and write access: The S3 bucket storing training data was accessible anonymously — a direct violation of least privilege.
Absence of data integrity validation before retraining: The automated retraining pipeline consumed updated data without verifying it hadn’t been tampered with — no checksums, no anomaly detection, no review gate.

Either one alone would have broken the chain. A private bucket prevents the attacker from writing poisoned data; integrity validation before retraining prevents poisoned data from influencing the model even if written.

Defensive Mitigations

1. Dataset Integrity Validation: Before any retraining pipeline consumes new data, verify that the data is what you expect it to be:

Generate a cryptographic hash (SHA-256) of the canonical training dataset and compare before each training run. Any deviation aborts the pipeline and triggers an alert.
Implement statistical profiling of the dataset (rating distributions, product ID frequencies, row counts) and flag deviations beyond acceptable thresholds.
For high-value models, require human sign-off on dataset changes before retraining is authorized.
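A minimal sketch of the hash check described above, assuming the dataset has been downloaded to a local path and the approved SHA-256 value is stored somewhere the pipeline trusts. The names `integrity_gate` and `DatasetIntegrityError` are illustrative, not part of AI Goat or SageMaker.

```python
import hashlib
import hmac


class DatasetIntegrityError(RuntimeError):
    """Raised when the training dataset does not match the approved hash."""


def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file through SHA-256 so large datasets need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def integrity_gate(dataset_path: str, approved_sha256: str) -> None:
    """Abort (by raising) unless the dataset hash equals the approved value."""
    actual = sha256_file(dataset_path)
    if not hmac.compare_digest(actual, approved_sha256.lower()):
        raise DatasetIntegrityError(f"hash mismatch for {dataset_path}: got {actual}")
```

A pipeline would call `integrity_gate` immediately before submitting the training job and treat the exception as the signal to stop and alert.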

2. Bucket Access Control

Remove all public access permissions from S3^[9] buckets storing ML training data, model artifacts, and pipeline outputs.
Apply least privilege: only the SageMaker execution role and authorized pipeline services should have write access.
Enable S3 versioning so that overwritten datasets remain recoverable, and add S3 Object Lock where prior versions must be immutable.
Enable S3 server access logging and integrate with CloudTrail to capture all PutObject and DeleteObject events on training data buckets.
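As a rough aid for the access-control review, the sketch below scans a bucket policy document (already loaded as a Python dictionary) for statements that let any principal write. It is deliberately coarse: it ignores `Condition` blocks, `NotPrincipal`, object ACLs, and account-level Block Public Access settings, and the list of write actions is an assumption rather than a complete one.

```python
from fnmatch import fnmatchcase

# Assumed subset of write-capable S3 actions, lower-cased for comparison.
WRITE_ACTIONS = ("s3:putobject", "s3:deleteobject", "s3:putobjectacl")


def _as_list(value):
    return value if isinstance(value, list) else [value]


def _is_public(principal) -> bool:
    """True for Principal "*" or {"AWS": "*"}-style wildcards."""
    if principal == "*":
        return True
    if isinstance(principal, dict):
        return any("*" in _as_list(v) for v in principal.values())
    return False


def _grants_write(actions) -> bool:
    """True if any action pattern (including s3:* or s3:Put*) covers a write action."""
    patterns = [str(a).lower() for a in _as_list(actions)]
    return any(fnmatchcase(w, p) for p in patterns for w in WRITE_ACTIONS)


def public_write_statements(policy: dict) -> list:
    """Return Allow statements that grant write access to everyone."""
    findings = []
    for stmt in _as_list(policy.get("Statement", [])):
        if stmt.get("Effect") != "Allow":
            continue
        if _is_public(stmt.get("Principal")) and _grants_write(stmt.get("Action", [])):
            findings.append(stmt)
    return findings
```

Any non-empty result for a training data bucket deserves immediate attention; an empty result does not prove the bucket is private, since ACLs and other mechanisms are outside this check.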

Actionable Takeaways

Enforce private-only access on all S3 buckets (or equivalent cloud storage) that hold ML training data, model artifacts, and pipeline outputs. Audit existing buckets using your cloud provider's public access block controls and access analyzer. Apply least-privilege IAM policies so that only the specific SageMaker execution role (or equivalent) can write to training data locations — no human users, no anonymous access, no wildcard principals.
Instrument your retraining pipeline with a pre-training data integrity gate. Before each training job is submitted, compute a cryptographic checksum (e.g., SHA-256) of the training dataset and compare it against a stored baseline held in a separate, access-controlled location. If the hash does not match an approved value, abort the pipeline, log the discrepancy, and alert the ML operations team. Pair this with statistical anomaly detection (e.g., flag if rating distributions shift beyond two standard deviations from historical baselines) to catch poisoning that arrives through an authorized dataset update, where the hash is expected to change and a checksum alone cannot distinguish legitimate edits from malicious ones.
Enable versioning and object-level logging on all ML training data buckets. S3 versioning allows rollback to a known-good dataset state immediately after detecting poisoning. Server access logs and CloudTrail integration provide the forensic trail needed to identify when unauthorized writes occurred, from which identity or IP, and which specific objects were modified — critical for post-incident recovery and pipeline hardening.
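The statistical side of the integrity gate could look roughly like the sketch below, which profiles a ratings CSV with the `user_id`, `product_id`, `rating` schema seen in the scenario and compares it with an approved baseline. The fixed thresholds are placeholders standing in for a standard-deviation rule, which would require a history of past runs; the function names are hypothetical.

```python
import csv
import io
from collections import defaultdict


def profile(csv_text: str) -> dict:
    """Return {product_id: (share_of_rows, mean_rating)} for a ratings CSV."""
    counts = defaultdict(int)
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        pid = row["product_id"]
        counts[pid] += 1
        totals[pid] += float(row["rating"])
    n = sum(counts.values()) or 1
    return {pid: (counts[pid] / n, totals[pid] / counts[pid]) for pid in counts}


def drift_findings(baseline: dict, candidate: dict,
                   max_share_delta: float = 0.05,
                   max_mean_delta: float = 0.5) -> list:
    """List products whose row share or mean rating moved past the thresholds."""
    findings = []
    for pid in sorted(set(baseline) | set(candidate)):
        b_share, b_mean = baseline.get(pid, (0.0, None))
        c_share, c_mean = candidate.get(pid, (0.0, None))
        if abs(c_share - b_share) > max_share_delta:
            findings.append(f"product {pid}: row share {b_share:.2f} -> {c_share:.2f}")
        elif b_mean is not None and c_mean is not None \
                and abs(c_mean - b_mean) > max_mean_delta:
            findings.append(f"product {pid}: mean rating {b_mean:.2f} -> {c_mean:.2f}")
    return findings
```

Applied to the scenario's poisoned file, a check like this would flag both the disappearance of product ID 3 and the sudden surge of rows for product ID 2.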

Common Pitfalls

Treating ML training data storage with lower security rigor than application databases. Engineering teams often apply strict access controls to production databases while leaving S3 buckets that feed ML pipelines with overly permissive ACLs — sometimes public, sometimes with broad write permissions granted to entire AWS accounts or roles. Because these buckets do not directly serve user traffic, they are perceived as low-risk. The AI Goat scenario demonstrates that a publicly writable training data bucket is a direct, exploitable path to live model behavior manipulation.
Automating retraining pipelines without any integrity gate between data mutation and model training. Many ML pipeline implementations trigger retraining on any change event (e.g., an S3 PutObject notification), with no validation step between the data write and the training job submission. This design means that any actor who can write to the bucket can directly and immediately influence what the model learns, with no human review and no anomaly detection standing in the way.

Output Integrity Attacks — Bypassing ML Model Predictions

One of the most underappreciated AI security vulnerabilities in machine learning infrastructure is the output integrity attack — a class of threat where the model itself is never compromised, yet an attacker successfully controls what the application believes the model said. Unlike data poisoning or supply chain attacks, output integrity attacks exploit the gap between a model’s actual prediction and the value the application ultimately acts on.

What Is an Output Integrity Attack?

In an output integrity attack, the attacker needs no access to the model, its weights, or its training data. The model continues to function exactly as designed. Instead, the attacker targets the communication layer between the model and the application — intercepting, substituting, or simply supplying prediction values that the application trusts without verification.

A real-world precedent occurred in 2023, when an attacker targeted a health diagnosis system and caused it to return incorrect diagnoses by manipulating the output pathway rather than the underlying model. The attacker positioned themselves between the model’s output and the consuming application, intercepting the legitimate prediction and replacing it with an incorrect diagnosis before delivery — likely a man-in-the-middle or web-layer attack. The model itself logged no anomaly, as its internal computation was never disturbed.

The AI Goat Scenario: Comment Filtering Bypass

The AI Goat^[1] demonstration uses a comment filtering feature backed by an ML model. The model evaluates submitted comments and returns two fields: is_offensive (a binary flag) and probability (the model’s confidence score). The word “pond” is flagged as offensive — making it the target for bypass.

Output Integrity Bypass — Overriding Comment Filtering Model Predictions with Client-Controlled Payload Fields

A comment filtering ML model’s predictions can be fully bypassed by including the model’s own output fields (is_offensive, probability) directly in the client request payload, causing the backend to consume attacker-supplied values instead of querying the model — a critical output integrity failure rooted in improper input and output validation.

Step-by-step discovery using Burp Suite^[10]:

Establish baseline — blocked word. Submit “pond”. Response shows is_offensive: 0 (model flags it as offensive), probability: 0.8, and no comment ID. Comment rejected.
Establish baseline — allowed word. Submit “great”. Response shows is_offensive: 1 (not offensive), a corresponding probability, and a comment ID confirming the comment was saved.
Identify the anomaly. Observe that is_offensive and probability — the model’s prediction outputs — are present in the client-sent request payload. These fields should be server-side-only values produced by the SageMaker endpoint via the Lambda^[6] function.
Attempt is_offensive: false (boolean). Server rejects it — type mismatch; the backend expects numeric 0/1.
Attempt is_offensive: 1 alone. Comment still rejected. The probability field has not yet been altered.
Combine both overrides. Set is_offensive: 1 and probability: 1 while keeping content: pond. Server responds with a new comment ID. Bypass complete.
Verify on the front end. Reload the product page. “Pond” appears publicly in the comment section — the ML-based filtering model was never consulted.

The Root Cause: Caching Gone Wrong

The most likely technical explanation is a caching assumption baked into the application design. To avoid repeatedly querying the model on the same word or phrase, the developer likely implemented logic that allows the client to pass along previously received prediction values, trusting that the client would not alter them. The application treats a client-supplied is_offensive value as a valid cached result, bypassing the model entirely when those fields are present in the payload.

This is a classic improper input validation failure: the application accepts fields from an untrusted source (the client) that should only ever be generated by a trusted source (the model endpoint). The schema was never enforced to reject unexpected fields.
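The suspected design can be illustrated with a hypothetical reconstruction. The real AI Goat backend is not shown in this post, so everything here is an assumption: `query_model` is a stub standing in for the SageMaker call, `ALLOWED_FLAG` follows the encoding observed in the walkthrough, and the real backend evidently also consults `probability`, which this sketch leaves out.

```python
ALLOWED_FLAG = 1  # in the walkthrough, accepted comments came back with is_offensive: 1


def query_model(content: str) -> dict:
    """Stub for the model endpoint; flags a single hard-coded word."""
    blocked = "pond" in content.lower()
    return {"is_offensive": 0 if blocked else ALLOWED_FLAG, "probability": 0.8}


def handle_comment_flawed(payload: dict) -> dict:
    """Anti-pattern: prediction fields in the request are trusted as a cached result."""
    if "is_offensive" in payload and "probability" in payload:
        # Flaw: these values come from the client, not from the model.
        prediction = {k: payload[k] for k in ("is_offensive", "probability")}
    else:
        prediction = query_model(payload["content"])
    accepted = prediction["is_offensive"] == ALLOWED_FLAG
    return {"accepted": accepted, **prediction}
```

With this handler, a request containing only `author` and `content` is classified by the model, while the same content plus attacker-chosen prediction fields is accepted without the model ever being called.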

The Two Vulnerability Classes at Play

1. Improper Input Validation: The application accepts client-controlled values (is_offensive, probability) that should be server-generated. A user submitting a comment should only be able to supply author and content. Any prediction-related fields must be stripped from inbound requests or rejected outright at the schema boundary.

2. Improper Output Validation: Even if the model generates a correct prediction, the application fails to verify that the prediction value it acts on is the same one the model returned. Between the model endpoint invocation and the final enforcement decision, the prediction traverses Lambda^[6], API Gateway^[7], and web server layers, none of which verify that the is_offensive value matches the model’s actual output for the submitted content.

Defensive Mitigations

Server-Side Prediction Integrity Checks: Every prediction used in a business logic decision must originate from a server-side model invocation tied to the specific input being evaluated. The application should invoke the model, receive the prediction, and act on that prediction in a single, unbroken server-side flow. Client-supplied prediction values must never be trusted.

Strict Input Schema Enforcement: Define and enforce a strict input schema at the API boundary. The comment submission endpoint should accept only author and content. Any additional fields should be rejected with a 400 error or silently stripped before any processing occurs.
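A dependency-free sketch of this allowlist approach follows. The `classify` callable is a hypothetical stand-in for the server-side model invocation and is assumed to return `True` when a comment may be published; a web framework would translate `SchemaError` into an HTTP 400 response.

```python
ALLOWED_FIELDS = {"author", "content"}


class SchemaError(ValueError):
    """Raised when a request does not match the comment submission schema."""


def validate_comment(payload: dict) -> dict:
    """Accept exactly the allowlisted fields, each a non-empty string."""
    unexpected = set(payload) - ALLOWED_FIELDS
    missing = ALLOWED_FIELDS - set(payload)
    if unexpected or missing:
        raise SchemaError(f"unexpected={sorted(unexpected)} missing={sorted(missing)}")
    for field in sorted(ALLOWED_FIELDS):
        value = payload[field]
        if not isinstance(value, str) or not value.strip():
            raise SchemaError(f"{field} must be a non-empty string")
    return {"author": payload["author"], "content": payload["content"]}


def handle_comment(payload: dict, classify) -> dict:
    """Validate first, then always obtain a fresh server-side prediction."""
    comment = validate_comment(payload)
    allowed = bool(classify(comment["content"]))
    return {"accepted": allowed, "comment": comment}
```

Because prediction fields are rejected before any business logic runs, the forged payload from the walkthrough fails validation instead of reaching the enforcement decision.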

End-to-End Output Validation: Implement validation checkpoints at each step of the prediction pipeline, from the moment the model endpoint returns a result to the moment that result is enforced. If the prediction value at enforcement time does not match the freshly generated model output for the given input, reject the request. This is especially important in multi-hop architectures (endpoint → Lambda → API Gateway → web server).
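One lightweight way to check that the enforced value matches what the model returned is to reconcile two logs after the fact. The sketch below assumes each record is a dictionary with `input_key` and `prediction` fields; that record format and the function names are illustrative, not something AI Goat provides.

```python
import hashlib


def input_key(content: str) -> str:
    """Stable identifier for a model input, so raw text need not be logged."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def find_divergence(model_log: list, decision_log: list) -> list:
    """Return decision records whose prediction differs from the model's output."""
    model_outputs = {rec["input_key"]: rec["prediction"] for rec in model_log}
    divergent = []
    for rec in decision_log:
        key = rec["input_key"]
        # A decision with no matching model call is as suspicious as a mismatch.
        if key not in model_outputs or model_outputs[key] != rec["prediction"]:
            divergent.append(rec)
    return divergent
```

Any record returned by `find_divergence` indicates either tampering along the path or a logic bug, and both are worth an alert.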

Actionable Takeaways

Audit every API endpoint that accepts client input and cross-reference the full request schema against what the server actually needs from the client. Any field in the request that is derived from or mirrors a model output (e.g., prediction scores, classification flags, confidence values) must be removed from the accepted input schema and regenerated server-side on every request. Use a strict allowlist-based schema validator (such as JSON Schema or Pydantic) to reject unexpected fields at the API boundary before they reach any business logic.
Instrument your model invocation pipeline to log both the raw model output and the prediction value that was ultimately used for each business decision. Periodically reconcile these logs to detect divergence — any case where the enforced prediction differs from the model's actual output for the same input is evidence of a potential output integrity attack or a logic bug. Automated anomaly detection on this divergence is a lightweight but effective control.
When implementing caching for model predictions, never use client-supplied values as the cache source. If caching is necessary to reduce model query load, implement it server-side using a keyed store (e.g., Redis) where the key is a hash of the model input and the value is the model's own output. The client must never be able to write to or influence this cache.
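The caching guidance in the last takeaway can be sketched as follows. This is a minimal illustration in which an in-memory dict stands in for a keyed store such as Redis, and invoke_model is a hypothetical callable wrapping the real endpoint invocation. The only way a value enters the cache is through a server-side model call.

```python
import hashlib


class PredictionCache:
    """Server-side prediction cache keyed by a hash of the model input."""

    def __init__(self, invoke_model):
        # invoke_model: callable taking the comment text and returning the
        # model's own prediction dict (hypothetical stand-in for the endpoint).
        self._invoke_model = invoke_model
        self._store = {}  # stand-in for Redis or another keyed store

    @staticmethod
    def _key(content):
        return hashlib.sha256(content.encode("utf-8")).hexdigest()

    def predict(self, content):
        """Return the cached model output, invoking the model on a miss."""
        key = self._key(content)
        if key not in self._store:
            self._store[key] = self._invoke_model(content)
        # Return a copy so callers cannot mutate the cached entry.
        return dict(self._store[key])
```

Note that predict takes only the content to be scored. There is no parameter through which a request could hand in a prior prediction, which is the property the takeaway asks for.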

Common Pitfalls

Designing request schemas around the response structure rather than the input structure. When developers build the comment submission payload by copying the shape of the model response (which includes is_offensive and probability), they inadvertently expose those fields to client manipulation. The request schema should be defined independently, based only on what the client legitimately needs to supply, and reviewed against the principle that clients are untrusted.
Assuming that caching optimizations are purely a performance concern with no security implications. The caching assumption in this vulnerability — that a client can supply a prior prediction value to avoid re-querying the model — is a security boundary failure dressed up as a performance feature. Any optimization that delegates trust to the client for values that affect business logic must be treated as a security design decision and reviewed accordingly.

Securing Machine Learning Infrastructure — Defense Principles and Best Practices

Addressing AI security vulnerabilities in machine learning infrastructure requires moving beyond ad hoc fixes toward a coherent, layered defense model. The three attack scenarios in AI Goat^[1] — supply chain exploitation, data poisoning, and output integrity manipulation — each expose a distinct failure domain. The defensive principles that neutralize them overlap significantly, forming a unified framework applicable to any ML pipeline.

Access Control and the Principle of Least Privilege

The data poisoning scenario made the consequences of misconfigured access control explicit: a publicly readable and writable S3^[9] bucket storing the recommendation model’s training data allowed an unauthenticated attacker to download, modify, and re-upload the dataset, triggering an automated retraining cycle that corrupted model behavior. No asset in the ML environment — storage bucket, SageMaker^[2] notebook, endpoint, or Lambda^[6] function — should carry permissions beyond what its specific function requires. Audit IAM roles and bucket policies against the principle of least privilege, and explicitly block public access on all storage resources unless a documented business requirement demands otherwise.
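A bucket policy audit of the kind described above can start with a simple static check. The sketch below assumes the policy has already been fetched and parsed into a dict in the standard IAM policy JSON shape; find_public_statements is a hypothetical helper, and it looks only at bucket policies, not ACLs or account-level Block Public Access settings.

```python
def _as_list(value):
    """IAM policy fields may hold a single value or a list; normalize to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def _is_public_principal(principal):
    """True for "*" or for forms such as {"AWS": "*"}."""
    if principal == "*":
        return True
    if isinstance(principal, dict):
        return any("*" in _as_list(v) for v in principal.values())
    return False


def find_public_statements(policy):
    """Return a summary of every Allow statement open to any principal."""
    findings = []
    for stmt in _as_list(policy.get("Statement")):
        if stmt.get("Effect") != "Allow":
            continue
        if not _is_public_principal(stmt.get("Principal")):
            continue
        findings.append({
            "sid": stmt.get("Sid"),
            "actions": _as_list(stmt.get("Action")),
            # A Condition may narrow the grant; flag it for manual review.
            "conditioned": bool(stmt.get("Condition")),
        })
    return findings
```

Any finding whose actions include a write such as s3:PutObject on a training data bucket corresponds to the exposure exploited in the data poisoning scenario.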

Input and Output Validation

Two of the three attack scenarios exploited the same vulnerability class, trusting unvalidated data that crosses a trust boundary, from different angles. In the supply chain attack, malicious shell commands were embedded in image metadata and executed because the preprocessing function performed no sanitization before passing user-supplied data to a shell. In the output integrity attack, the application accepted attacker-supplied values for is_offensive and probability from the request payload, bypassing the comment-filtering model entirely.

For ML pipelines, closing this gap means:

Strict input schemas: Define and enforce exactly what the model endpoint is permitted to receive. Reject any field not in the schema before it reaches preprocessing or inference logic.
Output integrity verification: From the moment a prediction leaves the model to the moment it is enforced, verify that the value has not been altered — whether by a caching layer, a middleware bug, or an attacker.
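For the shell injection half of this section, the safest fix is usually to avoid the shell altogether. The sketch below is an illustration only: describe_metadata is a hypothetical helper, and the child process is a Python one-liner standing in for whatever external tool a preprocessing step might call. The point it shows is that untrusted metadata is passed as a single argument in a list, so shell metacharacters in it are never interpreted.

```python
import subprocess
import sys


def describe_metadata(comment):
    """Hand an untrusted metadata string to a child process without a shell."""
    result = subprocess.run(
        # Argument list form: no shell=True, no string interpolation.
        # The untrusted value is exactly one argv entry for the child.
        [sys.executable, "-c", "import sys; print(sys.argv[1].upper())", comment],
        capture_output=True,
        text=True,
        check=True,
        timeout=10,
    )
    return result.stdout.strip()
```

Called with a value such as "photo; echo pwned", the child simply receives that literal string; the semicolon has no special meaning because no shell ever parses it.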

Vulnerability Management

The supply chain attack succeeded because a preprocessing dependency contained a remote code execution vulnerability. Continuous vulnerability scanning across all ML dependencies — with alerting on newly published CVEs and enforcement of verified package signatures — is the minimum viable control. Before importing or updating any package, validate its authenticity against vendor-supplied checksums or signatures.
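The checksum half of that validation can be sketched with the standard library alone. The example assumes the expected SHA-256 digest was obtained from the vendor over a trusted channel; verify_artifact and sha256_of_file are hypothetical helper names.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Stream the file so large artifacts do not need to fit in memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path, expected_sha256):
    """True only if the file's digest matches the published one."""
    actual = sha256_of_file(path)
    expected = expected_sha256.strip().lower()
    return hmac.compare_digest(actual.encode("ascii"), expected.encode("utf-8"))
```

For Python dependencies specifically, pip's hash-checking mode (--require-hashes with pinned requirements) applies the same idea at install time, so a hand-rolled check like this is mainly useful for artifacts that a package manager does not cover.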

Continuous Model Monitoring

Detection speed is a direct function of monitoring coverage. For ML infrastructure specifically, monitoring should capture:

Who is accessing the model and its training data
What inputs the model is receiving and what outputs it is producing
Query volume and resource consumption patterns
Changes to training datasets or model artifacts

Anomalies in any of these signals — an unexpected write to a training data bucket, a sudden shift in prediction distributions, a spike in endpoint invocations — can indicate an active data poisoning or supply chain attack in progress.
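As a rough illustration of the "shift in prediction distributions" signal, the sketch below compares the share of positive classifications in a recent window with a baseline window. The function names and the 0.15 threshold are assumptions chosen for the example, not values from the source; a production monitor would likely use a proper statistical test and tuned thresholds.

```python
def positive_rate(flags):
    """Fraction of predictions in the window that were positive (truthy)."""
    if not flags:
        raise ValueError("need at least one prediction")
    return sum(1 for flag in flags if flag) / len(flags)


def prediction_shift_alert(baseline, recent, max_abs_shift=0.15):
    """True when the positive rate moved more than max_abs_shift from baseline."""
    shift = abs(positive_rate(recent) - positive_rate(baseline))
    return shift > max_abs_shift
```

A comment filter that normally flags about a fifth of submissions and suddenly flags none would trip this check, which is the kind of change an output integrity bypass or a poisoned retraining run could produce.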

Excluding Sensitive Data from Training Sets

ML models are not secure secret stores. Training data can surface through model inversion attacks, membership inference, or simple output leakage. Any personally identifiable information, credentials, internal system details, or business-sensitive data that enters a training set should be considered potentially recoverable by a motivated attacker. Establish a data classification gate before any dataset is used for training.
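A data classification gate can begin as a pattern scan over candidate training records. The sketch below is deliberately small and assumes text records; the two patterns (email addresses and strings shaped like AWS access key IDs) are illustrative examples, not a complete definition of sensitive data, and scan_records and dataset_passes_gate are hypothetical names.

```python
import re

# Illustrative patterns only; real gates need a much broader rule set.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}


def scan_records(records):
    """Map each pattern name to the indexes of records that match it."""
    hits = {name: [] for name in PATTERNS}
    for index, text in enumerate(records):
        for name, pattern in PATTERNS.items():
            if pattern.search(text):
                hits[name].append(index)
    return hits


def dataset_passes_gate(records):
    """True only if no record matched any sensitive-data pattern."""
    return not any(scan_records(records).values())
```

Running the gate before a training job starts, and failing the job when it returns False, keeps the decision out of individual developers' hands.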

Developer Security Education as the Highest-Leverage Control

Across all three attack scenarios, the underlying vulnerabilities were introduced at development time — a shell injection in a preprocessing function, a public bucket created without access controls, an API endpoint that trusted client-supplied prediction fields. Each could have been prevented by a developer who understood the relevant risk. Security engineers should advocate for ML-specific secure development training that covers:

The OWASP Top 10 Machine Learning Risks^[3]
Hands-on practice with deliberately vulnerable AI infrastructure environments like AI Goat^[1]
Secure-by-default patterns for cloud-hosted ML pipelines
Threat modeling as a required step before deploying new model features

Actionable Takeaways

Audit every IAM role, storage bucket policy, and network exposure associated with your ML pipeline against the principle of least privilege. Explicitly enumerate what each component needs to read, write, and invoke — then remove everything else. For any storage resource holding training data or model artifacts, verify that public access is blocked and that write operations require authenticated, authorized identities.
Implement schema-enforced input validation at every model endpoint and add output integrity checks between the inference layer and any downstream consumer. Reject requests that include fields the model is not designed to receive (such as prediction confidence values supplied by the client), and verify that prediction outputs have not been modified between the model and the point of use — do not allow caching layers or middleware to substitute client-supplied values for model-generated ones.
Integrate continuous vulnerability scanning for all ML dependencies into your CI/CD pipeline, with alerts on newly published CVEs and enforcement of package signature verification. Pair this with runtime monitoring of model endpoints — logging inputs, outputs, query volumes, and data access patterns — so that anomalies indicative of supply chain compromise or data poisoning are detected quickly rather than discovered after model behavior has already degraded.

Common Pitfalls

Treating training data storage as low-sensitivity infrastructure. The data poisoning scenario demonstrated that a writable training data bucket is effectively a backdoor into the model itself — an attacker who can modify the dataset can alter model behavior without ever touching the model code. Teams that apply strong access controls to compute resources but leave storage buckets with permissive policies create an exploitable gap that is easy to overlook during standard security reviews.
Trusting client-supplied values anywhere in the inference pipeline. The output integrity attack succeeded because the application used attacker-provided is_offensive and probability fields from the request payload instead of enforcing that only the model's own output was authoritative. This pattern — where application logic accepts input fields that should only ever be generated server-side — is a recurring mistake in APIs built around ML models, particularly when caching or performance optimizations are introduced without a corresponding security review of which fields must remain server-controlled.

Conclusion

AI Goat^[1] makes the abstract concrete: three OWASP ML risks that usually exist only as documentation become exploitable, measurable attack chains against real AWS infrastructure. The patterns demonstrated (a vulnerable preprocessing dependency exposing RCE, a publicly writable S3 training bucket enabling silent model retraining, and client-supplied prediction fields bypassing ML-based filtering) are not hypothetical. They reflect misconfigurations that production ML teams make today.

The defensive framework that emerges from all three scenarios is consistent: least-privilege access controls on all ML assets, strict input and output validation at every layer, continuous vulnerability scanning across the full dependency graph, and developer education as the control that prevents these issues from being introduced in the first place.

For further reading on related topics, see: application security fundamentals, cloud security misconfigurations, and supply chain attack techniques and defenses.

References & Tools

1. AI Goat: Open-source deliberately vulnerable AI infrastructure on AWS SageMaker; operationalizes OWASP Top 10 ML Risks through three hands-on attack and defense scenarios.
2. Amazon SageMaker: AWS managed ML service used in AI Goat for notebook execution, training jobs, and endpoint deployment; representative of enterprise ML platforms.
3. OWASP Top 10 Machine Learning Risks: Authoritative framework published in 2023 categorizing the primary security risks facing machine learning systems.
4. TensorFlow: Widely adopted open-source ML framework cited as an example of AI packages carrying native vulnerabilities that represent supply chain risk when integrated into custom applications.
5. PyTorch: Widely adopted open-source ML framework cited alongside TensorFlow as an example of AI packages contributing to the finding that 62% of AI-package-using organizations are already exposed.
6. AWS Lambda: Invocation layer between API Gateway and SageMaker endpoints in the AI Goat architecture; the function handling comment submissions is the component that incorrectly accepts client-supplied prediction fields.
7. AWS API Gateway: Connects the web server to the Lambda function; its traffic is analyzed during the supply chain attack to discover the vulnerable preprocessing endpoint.
8. Pillow (PIL): Python image processing library used by the vulnerable preprocessing function in the supply chain attack; the process_image function built on Pillow contained the metadata-execution vulnerability enabling RCE.
9. Amazon S3: Cloud object storage used to host the ML training dataset and model artifacts; misconfigured public bucket with anonymous read/write access is the primary attack surface in the data poisoning scenario.
10. Burp Suite: Web application testing proxy used to intercept and modify HTTP request payloads, enabling discovery and exploitation of the output integrity vulnerability.
11. Terraform: Infrastructure-as-code tool used to automate AI Goat deployment on AWS; one of two supported deployment paths.
12. GitHub Actions: CI/CD-based alternative deployment path for AI Goat, allowing deployment via forked repository secrets without a local Terraform setup.
