The Cyber Archive

Shared-GPU Security Learnings from Fly.io

Learn how Fly.io secured shared GPU infrastructure using VFIO, IOMMU isolation, and firmware auditing — a practical guide to multi-tenant GPU security.

Deep dive of a talk by
Matthew Braun
19 April 2026
8337 words
46 min read

Matthew Braun presenting talk - Shared-GPU Security Learnings from Fly.io at fwd:cloudsec North America 2025

Plugging a GPU into a multi-tenant cloud host is functionally equivalent to handing a stranger remote physical access to your server — complete with direct memory access, nonvolatile storage, and the ability to DMA willy-nilly across host RAM. That’s the shared GPU security problem Fly.io confronted when building GPU infrastructure for customers in 2024, and the attack surface is far broader than most cloud engineers expect. When a GPU bypasses the IOMMU through PCIe peer-to-peer routing or an undisclosed NVLink cable, every isolation guarantee your hypervisor provides evaporates.

This post breaks down how Fly.io’s security team evaluated and hardened GPU passthrough using VFIO, IOMMU groups, PCIe Access Control Services (ACS), and firmware integrity checks — drawing on real-world lessons from deploying a dual-hypervisor fleet with Cloud Hypervisor and Firecracker. Security engineers running or evaluating shared GPU infrastructure will find a practical audit checklist for the attack surfaces that standard cloud security guides overlook entirely.

Key Takeaways

  • You'll learn how GPU passthrough via VFIO and IOMMU enforces memory isolation in multi-tenant cloud environments — and exactly where that boundary breaks down when PCIe peer-to-peer communication bypasses the IOMMU entirely.
  • You'll be able to identify and harden the three layered attack surfaces unique to shared GPU deployments: PCIe topology (ACS/ATS controls), NVLink physical interconnects, and GPU firmware blobs (vBIOS, GSP driver loading) that create persistent compromise vectors.
  • Apply this framework to evaluate any cloud GPU provider's security posture — or your own — by auditing IOMMU group membership, ACS enablement, NVLink cabling, and vBIOS signature verification before trusting shared hardware with sensitive workloads.

Why Shared GPU Security Is a Different Threat Model

Shared GPU Security Starts With a Counterintuitive Reality

The shared GPU security problem is not a variation on existing multi-tenant compute risk — it is a categorically different threat model. Matt Braun from Fly.io framed it directly: a data-center-grade GPU card, if handled carelessly, is “functionally equivalent to remote physical access to your host.” That statement deserves to land. These are not passive accelerator cards. They are full secondary systems.

GPUs Are Secondary Systems With Independent Attack Surface

Data-center-grade GPU cards — the kind you put in a cloud host to offer to customers — carry properties that consumer graphics cards do not:

  • Their own firmware — multiple firmware blobs loaded at card initialization, not all of them signed
  • Nonvolatile storage — an info ROM that persists across power cycles, making it a viable persistence vector
  • Unconstrained DMA capability — without explicit containment, a GPU can issue Direct Memory Access reads and writes to arbitrary locations across host RAM and connected devices

The transcript describes this directly: “these things are basically full-on secondary systems. They have their own firmwares, their own nonvolatile storage. They can do DMA willy-nilly all over your host.”

The ThunderStrike Analogy Reframed for Cloud Infrastructure

To calibrate the severity, Braun invoked the ThunderStrike family of attacks — where plugging a Thunderbolt cable into a MacBook was sufficient to unlock it and compromise firmware. The shared GPU threat model is structurally identical, but the attack is delivered over the internet rather than requiring physical proximity. A guest workload running on a shared GPU host, given sufficient privilege or a misconfigured isolation boundary, can achieve the same class of impact:

  • Compromise the hypervisor host
  • Attack co-resident tenants through DMA or side channels
  • Move laterally through the network from the host’s privileged position
  • Persist across customer VM lifecycles via firmware modification

Braun characterized this as “evil maid as a service” — a reference to the classic physical-access attack scenario, now available as a cloud primitive.

Why This Breaks Standard Cloud Security Guarantees

The transcript notes that unmitigated GPU passthrough “violates a lot of the guarantees that you get from being in an ISO 27001 data center, having person traps, all that jazz.” Physical security controls — mantrap entries, badge access, caged hardware — exist specifically to prevent untrusted parties from touching server hardware directly. Shared GPU access hands a tenant something close to that direct hardware interface, bypassing every physical control in the model.

This is the foundational insight that motivates every technical control in the sections that follow: VFIO[1], IOMMU group isolation, PCIe ACS rules, NVLink controls, and firmware integrity checks are all compensating controls for a threat that sits below the hypervisor layer — one that standard cloud security architecture was not designed to address.

Actionable Takeaways

  • Before offering or consuming shared GPU compute, explicitly model the GPU as a secondary system with DMA access — not as an accelerator card. Audit whether your isolation architecture addresses direct memory access, persistent storage on the card, and firmware integrity independently of your hypervisor controls.
  • Apply the "evil maid as a service" framing when evaluating shared GPU providers: ask whether a sufficiently privileged co-tenant could achieve the equivalent of physical hardware access to your host. If the provider cannot explain their IOMMU group configuration, PCIe ACS posture, and firmware verification process, assume the answer is yes.
  • Recognize that ISO 27001 or SOC 2 certification does not address GPU-layer DMA and firmware risks — these controls operate at the physical and organizational layer, not the hardware isolation layer. Treat shared GPU security as a distinct risk category requiring its own technical controls.

Common Pitfalls

  • Assuming hypervisor-level isolation (e.g., VM boundaries enforced by Firecracker or KVM) is sufficient for GPU multi-tenancy. GPU DMA operates below the hypervisor's visibility unless IOMMU containment is explicitly configured and verified — the hypervisor cannot intercept or block a GPU issuing DMA to arbitrary host memory addresses without it.
  • Treating data-center GPU cards as equivalent to consumer GPU cards in a threat model. Consumer cards do not typically carry independent nonvolatile storage, full firmware stacks, or the DMA capabilities of cards like the Nvidia A10 or L40S. The threat surface is fundamentally different at the hardware specification level.

GPU Virtualization Options and Why They Failed for Cloud Passthrough

The Hypervisor Problem: Why Firecracker Couldn’t Deliver GPU Passthrough

Fly.io’s primary compute platform runs on Firecracker[2], the open-source VMM that also powers AWS Lambda and Fargate. Firecracker is purpose-built for fast, lightweight microVMs — but it has a hard limitation: no PCIe passthrough support. For a team trying to expose GPU hardware directly to customer VMs, this was a blocking constraint.

Building PCIe passthrough into Firecracker from scratch was theoretically possible — it’s open source — but the engineering lift was prohibitive. Instead, Fly.io turned to Cloud Hypervisor[3], a Rust VMM that shares many of the same core components as Firecracker (inherited from the same VMM codebase) but extends the feature set to include PCIe passthrough. This trade-off came with costs: expanded attack surface and the operational complexity of running a dual-hypervisor fleet, with Firecracker handling standard workloads and Cloud Hypervisor handling GPU instances.

The Three Virtualization Options Fly.io Evaluated

Once the hypervisor problem was solved, the next question was how to virtualize the GPU itself — specifically, how to safely partition a single physical card across multiple customer workloads. Three candidate approaches existed, and only one was actually usable.

Option 1: Nvidia MIG (Multi-Instance GPU)

MIG is Nvidia’s hardware-enforced GPU partitioning solution. On data-center-grade Nvidia cards, a single physical card actually contains multiple discrete GPUs, each with its own independent memory. MIG allows the card’s hardware logic to group or split those individual GPUs, enforcing isolation at the silicon level rather than in software.

From a security standpoint, MIG is the gold standard for multi-tenant GPU isolation — the boundary is enforced by hardware, not a driver or hypervisor that could be subverted. However, MIG requires Red Hat or VMware as the host platform. Fly.io, running Cloud Hypervisor, was ineligible. MIG was ruled out entirely.

Option 2: Nvidia vGPU

vGPU is Nvidia’s software-layer partitioning solution, operating at the host driver level rather than in hardware. It offers broader hypervisor compatibility than MIG — but still no support for Cloud Hypervisor. That alone was disqualifying.

Beyond the compatibility wall, vGPU introduced a second problem: per-provisioning licensing fees. Every time a vGPU instance is provisioned, the Nvidia driver phones home to Nvidia’s licensing server and a fee is charged. For an enterprise running long-lived, stable GPU workloads, this model might be acceptable. For Fly.io — an ephemeral compute platform that spins VMs up and down in response to individual HTTP requests — it was economically unworkable. The billing model and the product model were fundamentally incompatible.

Option 3: VFIO Direct Passthrough (The Chosen Path)

With both managed virtualization options eliminated, Fly.io took a different approach: skip virtualization entirely and pass through the GPU hardware directly to the VM using VFIO[1] (Virtual Function I/O).

The key insight that made this viable is the physical structure of data-center-grade Nvidia cards. A single card contains multiple discrete GPUs with independent memory. VFIO allows each of those discrete GPUs to be packaged as a virtual function and exposed directly to a guest VM, with the IOMMU (I/O Memory Management Unit) providing the hardware isolation boundary that prevents one VM’s GPU from touching another VM’s memory.

This is not true virtualization — the guest VM receives dedicated hardware, not a time-sliced or partitioned share of a GPU. From the customer’s perspective, they get that physical GPU and nothing else. Oversubscription is not possible with this model. But for Fly.io’s use case, the trade-off was acceptable: customers got real GPU hardware, and the IOMMU provided the memory isolation that MIG and vGPU would have provided through partitioning.

Security Implications of the Architecture Choice

The choice to bypass managed virtualization in favor of VFIO passthrough has direct security consequences. MIG and vGPU both provide structured, vendor-maintained isolation guarantees — the security boundary is something Nvidia has designed, tested, and supports. With VFIO, the isolation burden shifts to the operator.

As Matt Braun noted in the talk: because Fly.io could not use MIG or vGPU, they had to “go to a lower level and put our protections closer to the metal.” That meant implementing and maintaining their own monitoring and enforcement on the IOMMU boundary, BAR access controls, PCIe topology, and firmware integrity. Each of these is an attack surface that the managed virtualization options would have partially addressed out of the box.

Security engineers evaluating cloud GPU providers should treat the underlying virtualization model as a first-order security signal. A provider using MIG on supported hardware has a fundamentally different (and more defensible) security boundary than one using raw VFIO passthrough with self-managed controls.

Actionable Takeaways

  • When evaluating a cloud GPU provider's security posture, ask directly which virtualization model they use: MIG (hardware-enforced, strongest boundary), vGPU (software driver-level), or VFIO passthrough (direct hardware, operator-managed isolation). The answer tells you where the security boundary sits and who is responsible for maintaining it.
  • If you are building GPU infrastructure and MIG or vGPU are viable options for your hypervisor stack, prefer them over raw VFIO passthrough. The managed Nvidia solutions provide vendor-maintained isolation guarantees that reduce the attack surface you must defend yourself; VFIO shifts the entire isolation burden to your own controls.
  • When Firecracker (or any hypervisor lacking PCIe passthrough) is your primary VMM, plan for a dual-hypervisor fleet if you need GPU workloads. Cloud Hypervisor shares core Rust VMM components with Firecracker and adds PCIe passthrough, but the operational overhead and expanded attack surface of a mixed fleet must be accounted for in your security model.

Common Pitfalls

  • Assuming that VFIO passthrough provides security isolation equivalent to MIG or vGPU. VFIO + IOMMU provides DMA memory isolation between VMs, but it does not provide the structured hardware partitioning, firmware boundary controls, or vendor-maintained isolation guarantees that MIG delivers. Operators using VFIO must implement additional controls (BAR monitoring, ACS configuration, firmware auditing) to approximate the security posture that managed virtualization provides out of the box.
  • Overlooking per-provisioning licensing costs when evaluating vGPU for ephemeral workloads. The vGPU model charges a fee every time a GPU instance is provisioned — a cost structure that is workable for stable, long-running enterprise workloads but economically incompatible with ephemeral compute platforms that spin VMs up and down on demand.

VFIO and IOMMU Isolation: How the Security Boundary Actually Works

With MIG and vGPU both ruled out by hypervisor compatibility, and vGPU additionally by per-provisioning licensing fees, Fly.io chose the lowest-level option available: pass individual GPUs directly to guest VMs using VFIO[1], relying on the IOMMU to enforce the security boundary. Understanding how this boundary actually works, and where it stops working, is the foundation for auditing any shared GPU deployment.

What VFIO Does and Why It Matters for PCIe Security

VFIO is a Linux kernel framework that exposes PCIe devices directly to user space in a controlled, safe manner. Rather than letting a guest driver talk to hardware through a software emulation layer, VFIO hands off actual device control — but only after verifying that the device is properly isolated in its own IOMMU group.

The critical constraint VFIO enforces before handing off control concerns group membership. The IOMMU group is the smallest unit VFIO will hand off, and a group is only usable once every device in it has been detached from its host driver, so for multi-tenant passthrough each GPU should occupy its own dedicated IOMMU group. VFIO performs control checks as part of the handoff, but the real enforcement happens at the IOMMU level. VFIO is the gatekeeper; the IOMMU is the actual wall.

For Fly.io’s architecture, each physical Nvidia data center card contains multiple discrete GPUs with independent memory. VFIO packages each of those as a virtual function, exposes it to a guest VM, and lets the IOMMU handle isolation between the virtual functions. This is the mechanism that makes GPU passthrough viable without the hardware-level partitioning that MIG provides.

How the IOMMU Enforces the Memory Isolation Boundary

The IOMMU (Input/Output Memory Management Unit) is the component that prevents a passed-through GPU from issuing DMA arbitrarily across host RAM and other devices. Without the IOMMU, a GPU assigned to a guest VM could read and write to any physical memory address on the host — including memory belonging to other VMs, the hypervisor, or other devices. This is the DMA-as-remote-code-execution threat model that makes GPU passthrough dangerous by default.

What the IOMMU does is remap device virtual addresses (DVAs/IOVAs) to specific physical memory regions on the host. DMA operations are contained to the memory ranges that have been explicitly permitted. Anything outside those ranges is blocked at the hardware level.
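The remapping described above can be illustrated with a toy model. This is a sketch for intuition only: `ToyIommu`, `Mapping`, and the addresses used are hypothetical names and values, and real IOMMUs use multi-level page tables rather than a list of ranges.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    iova: int    # start of the device-visible (I/O virtual) range
    phys: int    # start of the backing host-physical range
    length: int  # size of the window in bytes


class ToyIommu:
    """Illustrative model only: one list of permitted windows per device."""

    def __init__(self):
        self._tables = {}

    def map(self, device, iova, phys, length):
        """Permit `device` to reach `length` bytes of host memory via `iova`."""
        self._tables.setdefault(device, []).append(Mapping(iova, phys, length))

    def translate(self, device, iova, length=1):
        """Return the host-physical address, or raise if the access is unmapped."""
        for m in self._tables.get(device, []):
            # The whole access must fit inside a single permitted window.
            if m.iova <= iova and iova + length <= m.iova + m.length:
                return m.phys + (iova - m.iova)
        raise PermissionError(f"DMA fault: {device} has no mapping for {iova:#x}")
```

A device that was never given a mapping, or that reaches past the end of its window, gets a fault instead of a physical address, which is the containment property the passthrough model depends on.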

The IOMMU enforces these permissions at the granularity of IOMMU groups. An IOMMU group is a set of devices that the hardware considers to share a trust boundary — typically devices connected through the same PCIe path that cannot be fully isolated from each other at the hardware level. The key security rule is:

If more than one device shares an IOMMU group, all bets are off. A device in a group can potentially communicate with other devices in the same group outside the IOMMU’s control.

This means before passing any device through to a guest, you must verify it is the sole member of its IOMMU group. If there are unexpected devices co-grouped with your GPU, those devices are inside the trust boundary — and any guest with access to the GPU effectively has access to those co-grouped devices too.
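The sole-member rule can be checked by walking the IOMMU group tree in sysfs. The sketch below assumes the standard `/sys/kernel/iommu_groups/<group>/devices/` layout; the function names are illustrative, and the `root` parameter exists only so the logic can be pointed at a test directory.

```python
from pathlib import Path


def iommu_groups(root="/sys/kernel/iommu_groups"):
    """Map each IOMMU group ID to the sorted PCI addresses it contains."""
    groups = {}
    base = Path(root)
    if not base.is_dir():
        return groups  # IOMMU disabled, or sysfs not mounted here
    for group_dir in base.iterdir():
        devices_dir = group_dir / "devices"
        if devices_dir.is_dir():
            groups[group_dir.name] = sorted(p.name for p in devices_dir.iterdir())
    return groups


def shared_groups(groups):
    """Return only the groups that contain more than one device."""
    return {gid: devs for gid, devs in groups.items() if len(devs) > 1}
```

Any group returned by `shared_groups` that contains a GPU intended for passthrough deserves a closer look before that GPU is handed to a tenant.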

MMIO and Base Address Registers: The Device Control Interface

Beyond DMA, there is a second channel through which a guest VM interacts with a passed-through GPU: MMIO (Memory-Mapped I/O). This is the mechanism by which the guest CPU — or guest VM — actually controls the device: triggering jobs, reading status registers, initiating firmware operations.

Rather than writing to dedicated I/O ports, MMIO maps device registers and device memory into the system’s address space. A guest writes to a specific memory range, and that write is backed by a device register. For example, writing to a particular address range tells the GPU to start processing a job from the frame buffer.

The memory regions that define where these device registers and memory banks live are described by BARs (Base Address Registers). A single PCIe device can have up to six BARs, each mapping a different region:

  • Small BARs (e.g., 16 MB or smaller) typically expose control registers — configuration, status, command interfaces.
  • Large BARs (e.g., 256 MB or 32 MB) typically expose the GPU’s frame buffer memory or bulk data throughput regions.

Each BAR also carries access policy metadata: prefetchability guarantees and read/write permission constraints. This structure is security-relevant because the BARs define the full attack surface a guest can reach through MMIO — if a guest can write arbitrarily to a control BAR, it can issue commands directly to the GPU firmware.

When inspecting a GPU on a host using lspci -kvn[4], you will see the IOMMU group assignment alongside the full BAR layout. In Fly.io’s example, IOMMU group 14 contained a single Nvidia Tesla, and three BARs were visible: 16 MB (control/config), 256 MB (frame buffer), and 32 MB (additional data region). From the guest’s perspective, the same three BARs appear at different virtual memory addresses — the guest sees physical slot assignment with no visibility into the IOMMU remapping happening beneath it.
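The same BAR layout can also be read programmatically from a device's sysfs `resource` file, where each line holds a start address, end address, and flags in hexadecimal. The sketch below is a minimal parser under that assumption; the sample addresses are invented to reproduce the 16 MB, 256 MB, and 32 MB sizes mentioned above and do not come from the talk.

```python
# Resource flag bits as defined in the Linux kernel's ioport.h.
IORESOURCE_IO = 0x100
IORESOURCE_PREFETCH = 0x2000

# Hypothetical contents of /sys/bus/pci/devices/<address>/resource (BAR lines only).
SAMPLE_RESOURCE = """\
0x00000000f6000000 0x00000000f6ffffff 0x0000000000040200
0x000000e000000000 0x000000e00fffffff 0x000000000014220c
0x0000000000000000 0x0000000000000000 0x0000000000000000
0x000000f000000000 0x000000f001ffffff 0x000000000014220c
0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 0x0000000000000000
"""


def parse_bars(resource_text):
    """Parse the first six lines of a PCI `resource` file into BAR records."""
    bars = []
    for index, line in enumerate(resource_text.splitlines()[:6]):
        start, end, flags = (int(field, 16) for field in line.split())
        if start == 0 and end == 0:
            continue  # slot unused (or upper half of a 64-bit BAR)
        bars.append({
            "bar": index,
            "size": end - start + 1,
            "kind": "io" if flags & IORESOURCE_IO else "mem",
            "prefetchable": bool(flags & IORESOURCE_PREFETCH),
        })
    return bars
```

Recording the size and prefetchable flag per BAR gives a baseline to compare against when writing access policies for control regions versus bulk memory regions.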

Monitoring and Enforcing BAR Access Policies

Because Fly.io could not rely on MIG or vGPU to enforce hardware-level isolation between tenants, they had to implement BAR monitoring and access controls at a lower level. Two approaches were evaluated:

eBPF-based monitoring: Fly.io is a heavy eBPF[5] shop, and eBPF programs can be attached to kernel paths that handle BAR access, allowing detection and enforcement of which memory regions a guest is touching. eBPF gives visibility into access patterns without requiring changes to the VFIO-PCI driver itself.

VFIO-PCI driver shim: An alternative approach involves shimming the VFIO-PCI driver — intercepting calls before they reach the device and checking them against the known BAR layout. The shim can allow or deny specific access ranges, effectively enforcing a whitelist of permitted device interactions at the kernel level.

Both approaches serve the same purpose: since the guest controls a real device (not an emulated one), any guest-side misconfiguration or malicious driver behavior will manifest as real hardware operations. Monitoring BARs gives the host visibility into those operations; a driver shim can block them. Neither is as clean as hardware-enforced MIG partitioning, but both provide meaningful defense-in-depth when hardware virtualization options are unavailable.
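The decision a shim or monitor has to make for each access can be sketched in a few lines. This is not Fly.io's implementation and not kernel code; `AllowedWindow` and `is_permitted` are hypothetical names that only illustrate default-deny range checking against a known BAR layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AllowedWindow:
    bar: int        # BAR index the window applies to
    offset: int     # first permitted byte offset within that BAR
    length: int     # number of permitted bytes
    writable: bool  # whether guest writes are permitted in this window


def is_permitted(policy, bar, offset, length, is_write):
    """Default-deny: an access passes only if it fits entirely inside one window."""
    if length <= 0 or offset < 0:
        return False
    for w in policy:
        inside = (
            w.bar == bar
            and w.offset <= offset
            and offset + length <= w.offset + w.length
        )
        if inside and (w.writable or not is_write):
            return True
    return False
```

A monitor would log accesses for which `is_permitted` returns False, while a shim would refuse them; the policy table itself has to be derived from the audited BAR layout of the specific card.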

Practical Audit Steps for IOMMU Group Verification

The transcript is explicit about what can go wrong if IOMMU groups are not verified: unexpected devices inside a group silently widen the trust boundary. The recommended audit procedure is:

  1. Run lspci -kvn on the host to enumerate all PCIe devices, their IOMMU group assignments, and their BAR layouts.
  2. Verify that each GPU being passed through is the only device in its IOMMU group. Check the group directory at /sys/kernel/iommu_groups/{group_id}/devices/ — it should contain only the device you intend to pass through.
  3. Inspect BAR sizes to understand the attack surface. Control BARs (small) are a different risk profile from frame buffer BARs (large). Know which is which before writing access policies.
  4. Confirm VFIO binding — verify that the correct devices are bound to vfio-pci and not to a host driver that would give the host kernel direct access.

The underlying principle: VFIO and the IOMMU provide a strong isolation boundary for DMA and device memory access, but that boundary only holds if the IOMMU group is clean, ACS is correctly configured (covered in the next section), and BAR access is monitored. Skip any of those checks and the boundary degrades without warning.
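The binding check in step 4 can be automated by reading each device's `driver` symlink in sysfs. The sketch below assumes the usual `/sys/bus/pci/devices/<address>/driver` layout; the function names are illustrative, and the `root` parameter is there so the logic can be exercised against a test directory.

```python
from pathlib import Path


def bound_driver(bdf, root="/sys/bus/pci/devices"):
    """Return the name of the kernel driver bound to a PCI device, or None."""
    link = Path(root) / bdf / "driver"
    if not link.exists():
        return None  # no driver bound, or no such device
    # The symlink points at the driver's directory; its last component is the name.
    return link.resolve().name


def not_on_vfio(bdfs, root="/sys/bus/pci/devices"):
    """List the devices from `bdfs` that are not currently bound to vfio-pci."""
    return [bdf for bdf in bdfs if bound_driver(bdf, root) != "vfio-pci"]
```

An empty result from `not_on_vfio` for the list of GPUs intended for passthrough is the expected state; anything else means a device is unbound or still owned by a host driver.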

Actionable Takeaways

  • Before enabling GPU passthrough for any tenant workload, run `lspci -kvn` and verify that every GPU being passed through is the sole member of its IOMMU group. Check `/sys/kernel/iommu_groups/{group_id}/devices/` directly — any unexpected co-grouped device expands the trust boundary to that guest.
  • Audit the BAR layout for each passed-through GPU and document the size and purpose of each BAR. Small BARs (16 MB range) typically map control registers; large BARs map frame buffer memory. Use this mapping to write targeted eBPF monitors or VFIO-PCI driver shims that detect anomalous access patterns to control regions.
  • When MIG or vGPU hardware-enforced isolation is unavailable, implement defense-in-depth at the BAR level using either eBPF hooks on VFIO access paths or a shimmed VFIO-PCI driver that enforces an access whitelist — this compensates for the absence of hardware partitioning guarantees.

Common Pitfalls

  • Assuming IOMMU group membership is always one-device-one-group on server hardware. On real deployments, PCIe topology and platform firmware can co-group multiple devices (e.g., a GPU and an adjacent NIC sharing a PCIe switch port) without any visible warning. The IOMMU silently treats them as a shared trust domain, giving a GPU tenant indirect access to co-grouped devices.
  • Treating VFIO as the security boundary rather than the IOMMU. VFIO performs checks before handing off device control, but once handoff occurs, the IOMMU is the only mechanism enforcing DMA containment. If IOMMU groups are misconfigured or IOMMU is disabled on the host (e.g., `intel_iommu=off` or absent from kernel parameters), the entire isolation model collapses and passed-through GPUs have unrestricted DMA access to host memory.
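A first-pass check for the disabled-IOMMU pitfall is to inspect the kernel command line. The sketch below parses a command-line string (for example, the contents of `/proc/cmdline`) for parameters that switch the IOMMU off; the function name and the wording of the findings are illustrative, and a clean result is not proof that the IOMMU is active, since defaults vary by platform and kernel build.

```python
# Kernel parameters that explicitly turn an IOMMU off.
DISABLING_FLAGS = {"intel_iommu=off", "amd_iommu=off", "iommu=off"}


def iommu_cmdline_findings(cmdline):
    """Return human-readable findings for a kernel command line string."""
    params = set(cmdline.split())
    findings = [
        f"IOMMU explicitly disabled by '{p}'"
        for p in sorted(params & DISABLING_FLAGS)
    ]
    if not findings and "intel_iommu=on" not in params:
        findings.append(
            "intel_iommu=on not present; on Intel hosts confirm the kernel default"
        )
    return findings
```

Pairing this with a check that `/sys/kernel/iommu_groups/` is populated gives stronger evidence than the command line alone.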

VFIO and IOMMU isolation architecture for shared GPU passthrough in multi-tenant cloud environments


Why VFIO and IOMMU Alone Are Not Enough: The PCIe Peer-to-Peer Problem

A common assumption when deploying shared GPU security with VFIO and IOMMU is that once IOMMU group isolation is in place, the memory boundary is fully enforced. That assumption breaks down the moment you reason about how modern PCIe actually works — and it is a critical gap for any security engineer building or auditing multi-tenant GPU infrastructure.

Old-school PCI was a flat, shared bus: every device on the bus could see every other device’s traffic. Modern PCIe replaced that with a hierarchical topology of point-to-point serial links, switches, and a root complex. The root complex is essentially the interface between the PCIe fabric and host memory. Devices communicate up through switches to the root complex — and crucially, the IOMMU checks that enforce DMA isolation only fire when traffic passes through the root complex. Traffic routed laterally between devices through a shared switch never reaches the root complex, so it never gets IOMMU-validated.

PCIe traffic is carried in Transaction Layer Packets (TLPs). Each TLP identifies its requester and its target address and can be routed up to the root complex, down to endpoint devices, or sideways through switches, device to device. In high-performance computing contexts this is intentional: Nvidia GPUDirect Storage uses exactly this peer-to-peer DMA capability to let a GPU talk directly to NVMe storage without bouncing data through the CPU. That is a feature. In a multi-tenant environment with multiple privilege levels on the same host, it is an attack surface.
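The routing argument can be made concrete with a toy topology. In the sketch below the hierarchy is a dictionary of parent pointers, and a request is treated as IOMMU-validated only if it turns around at the root complex, or at a switch that is configured to force traffic upstream. All names are hypothetical, and treating redirect as a per-switch setting is a simplification of how real hardware exposes it.

```python
def path_to_root(parents, node):
    """Return [node, parent, ..., root] by following parent pointers."""
    path = [node]
    while parents[path[-1]] is not None:
        path.append(parents[path[-1]])
    return path


def turnaround_point(parents, src, dst):
    """Return the lowest common ancestor, where a src->dst TLP turns downward."""
    dst_ancestors = set(path_to_root(parents, dst))
    for hop in path_to_root(parents, src)[1:]:
        if hop in dst_ancestors:
            return hop
    raise ValueError("src and dst are not in the same hierarchy")


def reaches_iommu(parents, root, src, dst, redirecting_switches=frozenset()):
    """True if the request is seen by the root complex (and so by the IOMMU)."""
    turn = turnaround_point(parents, src, dst)
    return turn == root or turn in redirecting_switches


# Hypothetical host: two GPUs behind one switch, a NIC directly on the root.
TOPOLOGY = {
    "root": None,
    "switch0": "root",
    "gpu0": "switch0",
    "gpu1": "switch0",
    "nic0": "root",
}
```

With this topology, `reaches_iommu(TOPOLOGY, "root", "gpu0", "gpu1")` is False because the switch can route the request laterally, while the same call with `redirecting_switches={"switch0"}` is True.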

The Four ACS Bits That Define Your PCIe Security Posture

Access Control Services (ACS) is the PCIe mechanism designed to control and restrict this peer-to-peer traffic. The talk highlighted four ACS controls that security engineers should evaluate on every GPU host:

  • Source Validation — Prevents a device from spoofing a different device’s request ID in a TLP header. Without this, a compromised device can impersonate a peer to route traffic it should not be able to send.
  • Request Redirect — Forces all upstream requests to pass through the root complex, ensuring IOMMU validation is applied to every DMA request. This is the primary control for closing the peer-to-peer bypass: if every TLP goes through the root complex, the IOMMU gets to inspect it.
  • Egress Control — Controls which downstream devices on a switch are permitted to receive forwarded traffic. This allows fine-grained isolation between devices sharing the same PCIe switch.
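  • Direct Translated — Presented in the talk as the control that disables peer-to-peer entirely, so that no lateral TLP routing is permitted. The PCIe specification names this bit Direct Translated P2P and defines it in terms of peer-to-peer requests that carry already-translated addresses, so confirm its effect on your hardware before relying on it.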

Fly.io’s security consultants recommended enabling both Request Redirect and Direct Translated — the “belt and suspenders” posture. There is overlap between the two (both push toward root-complex-mediated communication), but having both ensures that no path for lateral DMA remains open. Source Validation should also be enabled to prevent request ID spoofing regardless of routing policy.

The critical caveat: ACS is not universally available. Even server-grade, data-center-class hardware may lack ACS support. Before treating ACS as your primary peer-to-peer control, verify that the PCIe switches and endpoint devices in your specific host configuration actually expose these bits. If ACS is absent, you must compensate through IOMMU group design — ensuring no two tenant-controlled devices share a switch subtree where lateral routing could occur.
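One way to verify ACS state is to parse the `ACSCtl:` line that `lspci -vvv` prints for ports exposing the capability (reading capabilities usually requires root). The sketch below assumes the flag names pciutils uses, such as `SrcValid` and `ReqRedir`; the sample text is illustrative rather than captured from a Fly.io host, and the exact format should be confirmed against your pciutils version.

```python
import re

ACS_FLAG = re.compile(r"(\w+)([+-])")

# Illustrative excerpt of `lspci -vvv` output for one switch port.
SAMPLE_LSPCI = """\
    Capabilities: [148 v1] Access Control Services
        ACSCap: SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans+
        ACSCtl: SrcValid+ TransBlk- ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans-
"""


def parse_acs_ctl(lspci_text):
    """Extract the flags from the first `ACSCtl:` line as {name: enabled}."""
    for line in lspci_text.splitlines():
        label, _, rest = line.strip().partition(":")
        if label == "ACSCtl":
            return {name: sign == "+" for name, sign in ACS_FLAG.findall(rest)}
    return {}  # no ACS control line found for this device


def missing_controls(flags, required=("SrcValid", "ReqRedir")):
    """List required controls that are absent or disabled."""
    return [name for name in required if not flags.get(name)]
```

An empty dictionary from `parse_acs_ctl` is itself a finding: it suggests the port does not expose ACS, which is the case where IOMMU group design has to carry the isolation.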

Address Translation Services (ATS): A Complementary Attack Vector

Alongside ACS, Address Translation Services (ATS) introduces a related but distinct risk. ATS is a PCIe feature that lets a device cache virtual-to-physical address mappings locally. The device issues an address translation request, receives a physical address from the IOMMU, and stores that mapping on-device for use in future TLP headers. The intended benefit is performance: subsequent DMA operations can use the cached physical address without a round-trip translation.

The security problem is straightforward: requests that a device marks as already translated are trusted to carry valid physical addresses. If a compromised device can place an arbitrary physical address into such a TLP header, for example by reusing a previously resolved address in a future DMA request, it gains a mechanism to target host memory regions it was never authorized to access. ATS effectively gives a compromised device a cache of “pre-approved” physical addresses that it can reuse, potentially pointing at memory belonging to other tenants.

Unless your workload has a specific, verified need for ATS-accelerated address translation, the recommendation is clear: disable ATS. The performance benefit rarely justifies the attack surface in multi-tenant GPU deployments.
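Whether ATS is currently enabled on a device can be read from the `ATSCtl:` line in `lspci -vvv` output. The sketch below assumes the `Enable+` / `Enable-` notation pciutils uses for this capability; the sample text is illustrative, and the function name is hypothetical.

```python
import re

ATS_ENABLE = re.compile(r"ATSCtl:\s*Enable([+-])")

# Illustrative excerpt of `lspci -vvv` output for one device.
SAMPLE_ATS = """\
    Capabilities: [100 v1] Address Translation Service (ATS)
        ATSCap: Invalidate Queue Depth: 00
        ATSCtl: Enable+, Smallest Translation Unit: 00
"""


def ats_enabled(lspci_text):
    """Return True/False for the ATS enable bit, or None if ATS is not reported."""
    match = ATS_ENABLE.search(lspci_text)
    return None if match is None else match.group(1) == "+"
```

A result of True on a tenant-facing GPU is the condition the recommendation above is meant to eliminate; None means the device did not report the capability in the text provided.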

NVLink: A DMA Path Outside the PCIe Fabric

NVLink is Nvidia’s high-bandwidth physical interconnect for ganging multiple GPUs together directly, bypassing the PCIe fabric entirely. For single-tenant high-performance computing this is a desirable capability: it enables GPU-to-GPU DMA at far higher bandwidth than PCIe allows. In a multi-tenant environment, it is a third DMA bypass path that sits entirely outside the IOMMU.

If NVLink is physically installed and enabled between two GPUs assigned to different tenants, an attacker with sufficient privilege on one VM can issue DMA through the NVLink interconnect into the memory space of the adjacent GPU — bypassing every IOMMU group boundary and ACS control that protects the PCIe path.

What makes NVLink particularly dangerous in practice is that it is a physical cable that can be installed without software-level awareness. Fly.io discovered exactly this on at least one production host: remote hands had installed an NVLink cable between GPUs because it was included with the hardware shipment. No one had asked them to. The cable was in place. Had an attacker achieved sufficient privilege on a tenant VM, they could have enabled NVLink programmatically and used it to DMA into the adjacent GPU’s address space.

The mitigation requires two layers: physical audit (verify that NVLink cables are not installed on any host where tenants share a physical machine) and software audit (verify that NVLink interfaces are not enumerable or enableable from within a guest VM). Do not assume that remote hands followed your hardware configuration intent — verify it.
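The software side of that audit can be partly automated by scanning the GPU topology matrix printed by `nvidia-smi topo -m` for `NV#` entries, which mark NVLink connections. The sketch below assumes the matrix lists GPUs as its first columns and that it was captured from an environment where the Nvidia driver is loaded; the sample output is illustrative, and this check complements rather than replaces a physical inspection.

```python
import re

# Illustrative `nvidia-smi topo -m` output for a two-GPU host.
SAMPLE_TOPO = """\
        GPU0    GPU1    CPU Affinity    NUMA Affinity
GPU0     X      NV2     0-15            0
GPU1    NV2      X      0-15            0
"""


def nvlink_pairs(topo_text):
    """Return sorted (gpu_a, gpu_b) pairs that the matrix shows as NVLink-connected."""
    rows = [
        line.split()
        for line in topo_text.splitlines()
        if re.match(r"GPU\d+\s", line)
    ]
    names = [row[0] for row in rows]
    pairs = set()
    for row in rows:
        # Columns 1..len(names) of each row are the GPU-to-GPU cells.
        for col, cell in enumerate(row[1:len(names) + 1]):
            if re.fullmatch(r"NV\d+", cell):
                pairs.add(tuple(sorted((row[0], names[col]))))
    return sorted(pairs)
```

On a host where tenants share the machine, the expected result is an empty list; any pair returned points at a link that should be traced back to physical cabling.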

Proof of Concept

  1. Physical prerequisite — cable present but not requested: Remote hands staff received hardware that shipped with NVLink cables included. Without being instructed otherwise, they installed the cable connecting multiple GPUs on the host. Fly.io discovered the cable was in place after the fact — it had not been requested and was not part of the planned configuration.
  2. NVLink as a PCIe bypass: NVLink is a high-bandwidth physical interconnect designed to gang multiple GPUs together for high-performance computing workloads. Unlike PCIe traffic, NVLink communication does not route through the PCIe switch hierarchy and therefore does not pass through the root complex where IOMMU enforcement occurs. The IOMMU isolation boundary that VFIO relies upon to contain guest VM DMA activity is rendered ineffective for traffic traversing NVLink.
  3. Attacker precondition — sufficient privilege on one GPU: An attacker would first need to achieve a level of privilege on one GPU instance sufficient to interact with the NVLink interface. In a VFIO passthrough model where the guest VM controls its GPU directly, a guest with deep access to the GPU’s firmware, driver stack, or control registers is the relevant attacker position.
  4. Enabling NVLink programmatically: With the cable physically present, the NVLink interface can potentially be enabled via GPU driver or firmware commands from within the guest. Because NVLink is a feature of the GPU hardware and its firmware, and because the guest loads its own driver stack (including the GSP driver ELF loaded from the guest side), a malicious or compromised guest could attempt to activate the NVLink interface without host-level authorization.
  5. Cross-GPU DMA via NVLink: Once NVLink is active between two GPUs, the attacker-controlled GPU can issue DMA reads and writes directly into the memory of the adjacent GPU — which may belong to a different tenant’s VM. This bypasses all IOMMU group isolation, VFIO containment, PCIe ACS rules, and any eBPF-based BAR monitoring the host has in place, because none of those controls apply to the NVLink path.
  6. Detection gap: Because the cable installation was unplanned and undocumented, Fly.io had no automated check that would have flagged it. The NVLink capability would not be visible through standard IOMMU group inspection (lspci, VFIO group enumeration) — it requires explicit auditing of physical cabling and GPU topology to detect.
  7. Mitigation: Audit physical GPU cabling on every host — do not assume hardware shipped in a default configuration matches your intended security posture. Explicitly document and enforce NVLink disable policies with data center remote hands. If NVLink is not required, verify it is not physically connected and cannot be enabled via firmware from the guest.
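
A software cross-check can complement the physical audit described above. The following is a minimal sketch, assuming the host has the NVIDIA driver loaded (for example during hardware intake, before the GPUs are bound to vfio-pci) and that `nvidia-smi topo -m` prints its usual matrix, in which NVLink-connected pairs appear as NV followed by a link count. The function names are illustrative and not from the talk, and a clean result does not replace looking at the cabling.

```python
import re
import subprocess

ANSI = re.compile(r"\x1b\[[0-9;]*m")  # some versions underline the header row


def find_nvlink_pairs(topo_text):
    """Return sorted (gpu_a, gpu_b) pairs whose topology cell is NV<n>."""
    lines = [ln for ln in ANSI.sub("", topo_text).splitlines() if ln.strip()]
    if not lines:
        return []
    # Header tokens that name GPU columns, in order (GPU0, GPU1, ...).
    columns = [t for t in lines[0].split() if re.fullmatch(r"GPU\d+", t)]
    pairs = set()
    for line in lines[1:]:
        tokens = line.split()
        if not re.fullmatch(r"GPU\d+", tokens[0]):
            continue  # skip NIC rows and the legend
        for col, cell in zip(columns, tokens[1:]):
            if re.fullmatch(r"NV\d+", cell):
                pairs.add(tuple(sorted((tokens[0], col))))
    return sorted(pairs)


def nvlink_pairs_on_host():
    """Run nvidia-smi on the local host and report NVLink-connected pairs."""
    out = subprocess.run(
        ["nvidia-smi", "topo", "-m"],
        capture_output=True, text=True, check=True,
    ).stdout
    return find_nvlink_pairs(out)
```

On a host where NVLink is not part of the intended configuration, any non-empty result from `nvlink_pairs_on_host()` would be worth treating as an intake failure.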

PCIe Peer-to-Peer DMA Attack Bypassing IOMMU via Switch Routing

Proof of Concept

  1. Understand the PCIe topology: Unlike legacy PCI (which used a flat shared bus), PCIe is hierarchical. Devices connect via point-to-point serial links through switches that route traffic upward toward the root complex (the interface to host memory) or laterally between devices on the same switch. This routing behavior is the precondition for the attack.
  2. Understand the IOMMU enforcement boundary: The IOMMU enforces DMA address validation only when Transaction Layer Packets (TLPs) are routed up to the root complex. A TLP carries either an I/O Virtual Address (IOVA) or a physical address, and the IOMMU’s job is to translate IOVAs into physical memory ranges that the issuing device is authorized to access. When a TLP is routed laterally (device to device) via a PCIe switch, it never reaches the root complex and therefore never undergoes IOMMU validation.
  3. Identify the attack scenario: In a multi-tenant GPU host, multiple GPUs belonging to different tenants may be connected to the same PCIe switch. If a tenant can issue DMA operations from their GPU (which they can — GPU direct operations generate TLPs), those TLPs can be addressed to a neighboring device’s memory region and routed by the switch without passing through the IOMMU. The IOMMU, which was trusted as the core isolation layer for the VFIO passthrough architecture, is completely bypassed.
  4. Recognize this as a known legitimate feature being exploited: The speaker notes that PCIe peer-to-peer communication is intentionally useful in high-performance computing — Nvidia GPU Direct Storage is exactly this capability used legitimately. The threat model is the same mechanism applied in a multi-tenant context where the “peer” belongs to a different security principal.
  5. Identify the intended mitigation — Access Control Services (ACS): ACS is a PCIe capability designed to close this gap. The relevant ACS control bits are: Source Validation (prevents request ID spoofing), Request Redirect (forces all requests through the root complex, where IOMMU validation is applied), Egress Control (isolates which downstream devices can receive forwarded traffic), and Direct Translate Disable (disables peer-to-peer transactions entirely). The speaker recommends enabling both Request Redirect and Direct Translate Disable together for defense-in-depth.
  6. Identify the limitation: ACS is not universally available. The speaker explicitly warns that even server-grade hosts may not expose ACS controls. Organizations must audit whether ACS is present and correctly configured — it cannot be assumed. If ACS is absent, there is no software-layer mitigation for peer-to-peer attacks on the current hardware.
  7. Note the complementary risk from ATS: ATS allows a device to request resolution of a virtual address to a physical address and cache that mapping locally for use in future TLP headers. If a device can write a physical address into a TLP header, there is nothing stopping it from inserting an arbitrary physical address — effectively allowing physical address spoofing in DMA operations. ATS should be disabled unless explicitly required.
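
The topology reasoning in the steps above can be sketched in code. On Linux, each entry under /sys/bus/pci/devices is a symlink whose resolved path lists every bridge between the root complex and the device, so two devices that share a bridge in that chain sit under a common switch or root port. This is a rough sketch rather than Fly.io's tooling: the function names and the example addresses are made up for illustration.

```python
import os
import re

BDF = re.compile(r"[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]")


def bridge_chain(sysfs_path):
    """Bridges above a device, outermost first, taken from its sysfs path."""
    parts = [p for p in sysfs_path.split("/") if BDF.fullmatch(p)]
    return parts[:-1]  # the last component is the device itself


def shared_bridges(path_a, path_b):
    """Bridges common to both devices; empty means they meet only at the root."""
    shared = []
    for a, b in zip(bridge_chain(path_a), bridge_chain(path_b)):
        if a != b:
            break
        shared.append(a)
    return shared


def resolve(bdf):
    """Resolve a PCI address such as 0000:43:00.0 to its full sysfs path."""
    return os.path.realpath(f"/sys/bus/pci/devices/{bdf}")
```

For two tenant GPUs, a non-empty result from `shared_bridges(resolve(a), resolve(b))` means a lateral path may exist below the root complex, which is exactly where the ACS settings on those bridges matter.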

PCIe and ACS:

  • Enumerate PCIe topology on every GPU host and identify all devices that share a PCIe switch subtree with tenant-controlled GPUs.
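  • Check ACS capability registers on the switch and endpoint devices. Run lspci as root, because it hides capability details from unprivileged users: lspci -vvv | grep -A 20 "Access Control".
  • Verify that Source Validation, Request Redirect, and Egress Control are enabled where supported, and that Direct Translated P2P is left disabled so translated requests cannot skip the redirect. In lspci output these appear on the ACSCtl line as SrcValid+, ReqRedir+, EgressCtrl+, and DirectTrans-.
  • If ACS is not available on a switch, do not place GPUs belonging to different tenants under that switch. Devices behind a switch without ACS end up in the same IOMMU group, and a group is the smallest unit VFIO can safely assign.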
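
The ACS check can be automated by parsing the ACSCtl line that lspci prints for each device with the capability. The sketch below is one possible approach, not a description of Fly.io's tooling. The policy dictionary is an assumption chosen for illustration and should be adjusted to what the hardware supports.

```python
import re

# Assumed policy for illustration: flag name -> required state.
POLICY = {"SrcValid": True, "ReqRedir": True, "DirectTrans": False}


def parse_acs_ctl(lspci_text):
    """Map each device address to its ACSCtl flags as {flag: bool}."""
    result, device = {}, None
    for line in lspci_text.splitlines():
        if line and not line[0].isspace():
            device = line.split()[0]  # device header line, e.g. "41:00.0 ..."
        elif device and line.strip().startswith("ACSCtl:"):
            flags = re.findall(r"(\w+)([+-])", line.split(":", 1)[1])
            result[device] = {name: sign == "+" for name, sign in flags}
    return result


def acs_violations(lspci_text, policy=POLICY):
    """List (device, flag, actual_state) for every flag that breaks policy."""
    problems = []
    for device, flags in parse_acs_ctl(lspci_text).items():
        for name, wanted in policy.items():
            if flags.get(name) != wanted:
                problems.append((device, name, flags.get(name)))
    return problems
```

Devices that do not expose ACS at all never appear in the parsed result, so a bridge that is missing from `parse_acs_ctl` output is itself a finding to follow up on.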

ATS:

  • Check whether ATS is enabled on GPU devices: lspci -vvv | grep "Address Translation".
  • Disable ATS in BIOS/UEFI or via PCIe capability registers unless there is a documented and reviewed requirement.
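
The same lspci output can be scanned for devices that currently have ATS switched on. This is a small sketch that assumes lspci reports the control register on a line beginning with ATSCtl and marks the enabled state as Enable+. The function name is illustrative.

```python
import re


def ats_enabled_devices(lspci_text):
    """Return device addresses whose ATSCtl line reports Enable+."""
    enabled, device = [], None
    for line in lspci_text.splitlines():
        if line and not line[0].isspace():
            device = line.split()[0]  # device header line
        elif device and re.match(r"\s*ATSCtl:\s*Enable\+", line):
            enabled.append(device)
    return enabled
```

An empty list is the expected result on hosts where no reviewed requirement for ATS exists.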

NVLink:

  • Physically inspect all GPU hosts for NVLink bridge cables or NVSwitch backplanes.
  • Audit host-side driver interfaces: confirm that NVLink cannot be enumerated or enabled from within a guest VM.
  • Add NVLink cable status to your physical server intake checklist so remote hands installation does not go undetected.

Actionable Takeaways

  • Enable Source Validation, Request Redirect, and Egress Control on every PCIe switch and endpoint that hosts tenant-controlled GPUs, and keep Direct Translated P2P disabled so translated requests cannot skip the redirect. Use `lspci -vvv` to verify that ACS capability registers are present and set correctly. Do not assume server-grade hardware includes ACS support.
  • Disable Address Translation Services (ATS) on GPU hosts unless there is a specific, reviewed requirement for it. ATS-cached physical address mappings give a compromised device a mechanism to target host memory outside its authorized range, which undermines IOMMU isolation.
  • Add NVLink physical cabling to your hardware intake and audit checklist. Inspect production GPU hosts for undisclosed NVLink bridges, and verify at the driver level that NVLink interfaces cannot be enumerated or activated from within a guest VM. Do not rely on provisioning intent — verify physical state.

Common Pitfalls

  • Assuming IOMMU group isolation fully contains DMA when PCIe peer-to-peer routing is still enabled. IOMMU checks only apply to traffic that passes through the root complex. TLPs routed laterally between devices through a shared PCIe switch bypass the IOMMU entirely, leaving a direct DMA path between tenant GPUs that IOMMU grouping cannot prevent.
  • Trusting that NVLink is absent because you did not provision it. Physical infrastructure teams (remote hands, data center staff) may install NVLink bridge cables when they are included with hardware shipments, without operator instruction or awareness. Physical audit of GPU hosts is required — software-level enumeration alone is insufficient if the cable is installed but not yet enabled.

PCIe peer-to-peer DMA bypass path circumventing IOMMU via switch routing and NVLink


GPU Firmware Persistence and Supply Chain Risk

GPU Firmware as a Persistence Surface in Shared Cloud Infrastructure

When most security engineers think about persistent compromise, they think about modified boot loaders, rogue kernel modules, or tampered system images. For shared GPU infrastructure, the persistence surface extends into the GPU card itself — and it is harder to audit, harder to reset, and harder to trust than anything running on the host OS.

Matt Braun’s presentation at fwd:cloudsec North America 2025 identifies three distinct firmware-level attack surfaces on data-center-grade Nvidia cards: the vBIOS, the info ROM, and the GSP driver loading model. Each creates a different risk profile, and together they represent a hardware security supply chain risk that most cloud security programs have never formally assessed.

vBIOS: A Signed Blob You Can’t Fully Trust

The vBIOS (video BIOS) is a large collection of firmware blobs stored on the GPU card itself. These blobs are loaded when the card initializes. Not all of them are signed — some are, some are not — and a privileged attacker who gains sufficient access to the card can modify unsigned blobs to establish persistence that survives host OS reinstallation.

Fly.io’s approach is to dump the vBIOS and verify the signatures of the signed components. This provides some confidence that the firmware has not been tampered with. However, there is a fundamental limitation: the card returns the vBIOS content you ask for, and there is no guarantee that what it reports is what it is actually running. A sophisticated attacker with persistent firmware-level access could return a clean copy on demand while executing modified code. Signature verification is a meaningful control — it raises the bar — but it is not a root of trust.

The practical takeaway is to treat vBIOS integrity checks as a detective control, not a preventive one. Establish a baseline dump at card provisioning time, re-verify periodically, and flag deviations for investigation.
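
A baseline workflow like this can be sketched with nothing more than a hash comparison. The example below assumes the vBIOS dump is already available as bytes. How it is obtained from the card is outside the scope of the sketch, and the card identifiers and function names are hypothetical. It also covers only whole-image drift and does not verify Nvidia's signatures on individual blobs.

```python
import hashlib


def digest(blob: bytes) -> str:
    """SHA-256 of a firmware dump, as a hex string."""
    return hashlib.sha256(blob).hexdigest()


def check_vbios(card_id, dump, baseline):
    """Return 'enrolled', 'match', or 'deviation' for one dump.

    baseline maps a card identifier to the digest recorded at provisioning.
    """
    current = digest(dump)
    known = baseline.get(card_id)
    if known is None:
        baseline[card_id] = current  # first sighting: record, do not trust blindly
        return "enrolled"
    return "match" if known == current else "deviation"
```

Because the card itself produces the dump, a "match" here only shows that the card returned the expected bytes, which is why the result is best treated as a detective signal.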

Info ROM: Metadata That Can Carry Malicious State

The info ROM is the metadata store for the GPU card — it logs temperature data, usage history, and other operational telemetry. It is also writable by a sufficiently privileged process on the card. Like the vBIOS, it represents a nonvolatile storage area that survives host reboots and OS reinstallation.

An attacker who has achieved code execution on the GPU — through a driver vulnerability, a malicious workload, or a compromised guest VM with passthrough access — can write attacker-controlled data into the info ROM. This data can be read back on subsequent boots, enabling a two-stage persistence mechanism: write a trigger or payload fragment to the info ROM, then use that data during a future boot sequence or driver initialization to re-establish access.

The info ROM threat is particularly subtle because it is not part of most firmware auditing workflows. Teams that verify vBIOS signatures often do not inspect info ROM contents, creating a blind spot that a patient attacker can exploit.
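
One way to bring the info ROM into an audit workflow is to snapshot whatever fields your tooling can read before and after each tenant, then flag changes outside the fields that are expected to move. The sketch below assumes snapshots are plain dictionaries. All field names are hypothetical, since the source does not describe the info ROM layout.

```python
# Hypothetical field names; telemetry like this changes during normal use.
EXPECTED_TO_CHANGE = frozenset({"temperature_log", "usage_counters"})


def unexpected_changes(before, after, volatile=EXPECTED_TO_CHANGE):
    """Return {field: (old, new)} for fields that changed but should not have."""
    changed = {}
    for key in sorted(set(before) | set(after)):
        if key in volatile:
            continue
        if before.get(key) != after.get(key):
            changed[key] = (before.get(key), after.get(key))
    return changed
```

Keeping the allowlist of volatile fields short is the design choice that matters here: anything not explicitly expected to change is reported, including fields that appear or disappear between snapshots.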

GSP Driver Loading: The Guest Controls the Firmware

The GSP (GPU System Processor) is the on-card processor that handles boot logic for the GPU. What makes the GSP unusual — and from a security standpoint, alarming — is where the GSP driver comes from: it is loaded not from the host, but from the guest VM.

In Fly.io’s VFIO passthrough architecture, user-controlled guest VMs supply the ELF file that contains the GSP driver. Nvidia signs this driver, so arbitrary driver injection is not straightforwardly possible. However, the attack surface is broader than signature validation alone:

  • Parsing surface: The GSP driver is an ELF file. ELF parsing is a notoriously complex operation with a long history of vulnerabilities. A malformed or specially crafted ELF file could exploit a parser bug in the GSP’s ELF loader, achieving code execution on the GSP itself. As Braun put it: “It’s an ELF file. There’s parsing volume.”
  • Downgrade attacks: If an older, signed version of the GSP driver contains known vulnerabilities, a guest can supply that older version. Signature validity does not imply security — it only confirms the file came from Nvidia. Version pinning and minimum-version enforcement at the host level are necessary to close this gap.

The GSP loading model inverts the trust assumption most engineers apply to firmware. The host does not control what firmware the GPU runs on a per-provisioning basis — the guest does. In a multi-tenant environment where the guest is controlled by an untrusted customer, this is a significant architectural risk.
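
Minimum-version enforcement reduces to a comparison against a floor. The sketch below shows only that comparison. How the host observes which GSP driver version a guest is offering is not described in the source, so that part is left out, and the version strings in the usage note are illustrative.

```python
def parse_version(text):
    """Turn a dotted version such as '535.104.05' into a comparable tuple."""
    return tuple(int(part) for part in text.strip().split("."))


def gsp_version_allowed(offered, floor):
    """True only if the offered version parses and is at or above the floor."""
    try:
        return parse_version(offered) >= parse_version(floor)
    except ValueError:
        return False  # fail closed on anything that does not look like a version
```

With a floor of "535.104.05", an offered "550.54.15" would pass and "525.60.13" would be rejected. Failing closed on unparseable input keeps a malformed version string from slipping past the check.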

Guest-Controlled GSP Driver Loading as a Firmware Attack Surface

Proof of Concept

  1. Understand the GSP architecture: The GSP is an embedded processor on the Nvidia GPU card that handles the GPU’s boot sequence and low-level initialization logic. Unlike the vBIOS, the GSP firmware driver is not permanently stored on the card — it is loaded at runtime.
  2. Identify the trust boundary violation: In a standard (non-passthrough) deployment, the host kernel and its Nvidia driver load the GSP firmware from a trusted path on the host filesystem. In VFIO direct passthrough, the guest VM controls the driver that is passed to the GSP. The speaker explicitly states: “The driver for this is loaded not from the host but from the guest. So the user-controlled guests are passing the driver down.”
  3. Assess the signing check as a partial mitigation: The GSP driver is signed by Nvidia, which means a guest cannot arbitrarily substitute a completely fabricated binary. However, the signature check only establishes authenticity of the binary — it does not prevent downgrade attacks (loading an older, signed but vulnerable version of the driver) or exploiting parsing vulnerabilities in the ELF loader that processes the signed file before the signature is fully verified.
  4. Evaluate the ELF parsing surface: The GSP driver is an ELF (Executable and Linkable Format) binary. ELF parsing in firmware and low-level contexts has a documented history of vulnerabilities. A crafted or downgraded ELF could trigger parser bugs in the GSP’s ELF loader — a component that runs at firmware privilege level, not OS privilege level.
  5. Assess the downgrade risk: Because the guest controls which signed GSP driver version is supplied, a malicious guest could deliberately load an older, signed Nvidia GSP firmware binary that contains known vulnerabilities, bypassing the security improvements present in the current version.
  6. Understand the privilege and persistence implications: A successful exploit at the GSP level would execute at GPU firmware privilege — below the guest OS, below the hypervisor’s visibility, and potentially persistent across VM resets if the GSP state is not fully re-initialized between guest allocations.
  7. Recommended mitigations: Trail of Bits[6] and Tetro, Fly.io’s external security consultants, recommended continued fuzzing of the driver as a priority finding, establishing an external root of trust for GPU firmware, and monitoring GPU state continuously.
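
The fuzzing recommendation can be illustrated with a small structure-aware mutator. The sketch below assumes a little-endian ELF64 seed file and rewrites individual header fields to boundary values, producing inputs that a loader's parser would have to reject safely. It is a generic illustration of header mutation, not the consultants' harness, and it does not model how inputs are delivered to the GSP.

```python
import struct

# ELF64 file header layout, little-endian, 64 bytes in total.
ELF64_HDR = struct.Struct("<16sHHIQQQIHHHHHH")
FIELDS = (
    "e_ident", "e_type", "e_machine", "e_version", "e_entry", "e_phoff",
    "e_shoff", "e_flags", "e_ehsize", "e_phentsize", "e_phnum",
    "e_shentsize", "e_shnum", "e_shstrndx",
)
# Boundary values for the offset and count fields parsers tend to trust.
BOUNDARY = {
    "e_phoff": (0, 2**64 - 1),
    "e_shoff": (0, 2**64 - 1),
    "e_phentsize": (0, 0xFFFF),
    "e_phnum": (0, 0xFFFF),
    "e_shnum": (0, 0xFFFF),
    "e_shstrndx": (0xFFFF,),
}


def header_mutations(elf: bytes):
    """Yield (field, value, mutated_bytes) with one header field changed."""
    if len(elf) < ELF64_HDR.size or elf[:6] != b"\x7fELF\x02\x01":
        raise ValueError("expected a little-endian ELF64 file")
    base = dict(zip(FIELDS, ELF64_HDR.unpack_from(elf)))
    for name, values in BOUNDARY.items():
        for value in values:
            if base[name] == value:
                continue  # identical to the seed, nothing to test
            fields = dict(base, **{name: value})
            header = ELF64_HDR.pack(*(fields[f] for f in FIELDS))
            yield name, value, header + elf[ELF64_HDR.size:]
```

Each mutated file keeps the seed's body and differs in exactly one header field, which makes it easier to attribute a crash to a specific parsing assumption.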

vBIOS Tampering and Info ROM Persistence via Privileged Guest Access

Proof of Concept

  1. Understand the target firmware surface: The GPU card hosts multiple firmware blobs collectively referred to as the vBIOS. These blobs are loaded when the card initializes. Additionally, the card contains an info ROM — a nonvolatile metadata store used for logging, temperature data, and card identification. Both regions are writable under certain privilege conditions.
  2. Establish the attacker’s position: In Fly.io’s VFIO passthrough architecture, the guest VM has direct, non-virtualized access to the physical GPU (packaged as a virtual function). While IOMMU enforces DMA boundaries, the guest retains MMIO access to the GPU’s BAR-mapped control registers. A guest operating at sufficient privilege — for example, a guest running as root or kernel-level code — can interact with the GPU’s firmware interface through these mapped memory regions.
  3. Target the vBIOS blob: The vBIOS consists of firmware blobs resident on the GPU card. “Not all of them are signed, but some are.” An attacker with privileged guest access can attempt to overwrite unsigned vBIOS blobs with a modified payload. Because the guest interacts with the GPU through MMIO BARs that include control registers, crafted writes to the appropriate BAR regions can trigger firmware update operations on the card.
  4. Target the info ROM for stealth persistence: The info ROM is a separate nonvolatile storage region holding card metadata such as logging data and temperature records. Because it is not a code execution target in the traditional sense, it may receive less scrutiny from integrity monitoring tools. An attacker can write malicious data or a persistent backdoor marker into the info ROM, which survives VM termination and persists across card resets.
  5. Surviving VM teardown: Unlike memory-resident malware that is cleared when a VM is destroyed, modifications to vBIOS blobs or the info ROM are stored in nonvolatile storage on the card itself. When the next customer VM is assigned the same physical GPU, the tampered firmware or info ROM data is already present, enabling cross-tenant persistence and potential lateral impact.
  6. Detection challenge — vBIOS signature verification caveat: “You can dump the vBIOS and verify the signatures, but that assumes that the card is actually returning the one that’s actually running.” A compromised vBIOS could intercept the dump command and return a clean copy while executing malicious code — a classic evil-maid-style attestation bypass. Despite this limitation, vBIOS dumping and signature verification remains a recommended detective control.

The Audit and Hardening Checklist

Fly.io engaged external security consultants — Trail of Bits[6] and Tetro — to evaluate the GPU passthrough architecture. The engagement confirmed that certain parts of the GPU stack are exposed and produced a practical hardening checklist that security engineers can apply to any shared GPU deployment:

  1. Dump and baseline vBIOS on provisioning. Verify signatures of signed components. Re-verify on a defined schedule. Treat deviations as security incidents.
  2. Inspect info ROM contents at provisioning and periodically thereafter. Any unexpected data in a nonvolatile metadata store on a card that has hosted untrusted workloads is a red flag.
  3. Enforce GSP driver version minimums at the host level. Do not allow guests to load GSP driver versions below a defined floor. This closes the downgrade attack path.
  4. Continue fuzzing the driver stack. The GSP’s ELF parsing surface is unaudited in most deployments. Fuzzing the driver against malformed ELF inputs is the only way to surface parser bugs before an attacker does.
  5. Monitor BAR access patterns. As covered in earlier sections, eBPF-based BAR monitoring detects anomalous MMIO access patterns that may indicate exploitation attempts.
  6. Establish an external root of trust. Hyperscalers have custom silicon and attestation chains that can verify firmware integrity end-to-end. For operators running commodity server hardware, the best available option is a combination of TPM-backed host attestation and vBIOS signature verification — imperfect, but meaningful.
  7. If you do not need GPUs, do not deploy them. Fly.io’s own retrospective assessment was that most customers ultimately wanted API-level LLM access, not raw GPU hardware. The attack surface reduction from not deploying passthrough GPUs is the most effective control of all.
  8. If you must offer GPUs, default to dedicated hardware per tenant. Shared hardware between untrusted tenants is the most difficult configuration to secure. Where pricing allows, dedicated GPU cards eliminate the cross-tenant threat model entirely.

Supply Chain Implications

The NVLink discovery covered in the previous section — where remote hands installed a cable Fly.io had not requested — is a supply chain story as much as a configuration story. You cannot fully trust the state of hardware you did not personally provision from factory to rack. Data-center-grade GPU cards arrive with firmware pre-loaded, info ROMs pre-written, and sometimes with physical interconnects installed by third parties. Each of these represents an opportunity for pre-compromise that standard cloud security controls will never detect.

For security engineers building or evaluating GPU infrastructure, the firmware and supply chain surface requires explicit threat modeling, explicit audit procedures, and explicit assumptions about what you cannot verify — not just what you can.

Actionable Takeaways

  • Establish a vBIOS baseline at card provisioning time by dumping and verifying signatures of all signed firmware blobs. Re-verify on a defined schedule and treat any deviation as a security incident requiring card quarantine and investigation.
  • Enforce a minimum GSP driver version at the host layer to prevent guests from loading older, signed-but-vulnerable ELF files. Pair this with fuzzing the GSP's ELF parsing surface against malformed inputs to surface parser vulnerabilities before an attacker can exploit them.
  • Inspect info ROM contents at provisioning and after any untrusted workload completes. Nonvolatile metadata stores on GPU cards are a persistence vector that most firmware auditing workflows overlook entirely — add them explicitly to your hardware security checklist.

Common Pitfalls

  • Treating vBIOS signature verification as a root of trust rather than a detective control. The card returns whatever content it chooses in response to a dump request — a compromised card can return a clean copy while running modified firmware. Signature checks raise the bar but do not provide cryptographic proof of what is actually executing.
  • Assuming that GSP driver signing by Nvidia makes the driver loading model safe. Signature validity only confirms provenance, not security. Downgrade attacks using older signed drivers and ELF parser exploitation remain valid attack paths that signature checks do not address.

GPU firmware attack surfaces: vBIOS, info ROM persistence, and guest-controlled GSP driver loading


Conclusion

Fly.io’s journey to securely offer shared GPU compute surfaces a threat model that the cloud security industry has barely begun to address. The core insight — that data-center-grade GPU cards are effectively secondary systems with independent DMA capability, nonvolatile storage, and firmware stacks — should reframe how every security engineer thinks about multi-tenant GPU infrastructure.

The security controls that matter here operate at three distinct layers. At the PCIe level: IOMMU group isolation, ACS configuration, ATS disablement, and NVLink physical auditing. At the kernel level: VFIO binding verification, BAR access monitoring via eBPF or driver shims, and IOMMU enablement confirmation. At the firmware level: vBIOS baselining, info ROM inspection, GSP driver version enforcement, and ELF parsing surface fuzzing. None of these are optional if you are sharing GPU hardware across untrusted tenants.

The hardest lesson from Fly.io’s experience is that the most effective control is often architectural: don’t deploy shared GPUs unless you must, and when you must, default to dedicated hardware per tenant. The attack surface reduction from not sharing physical hardware is more reliable than any software or firmware control you can layer on top.

For further reading on related topics in hardware security and cloud security, explore related talks on thecyberarchive.com. For engineers building or auditing GPU infrastructure, the virtualization security fundamentals covered here provide a practical audit framework applicable to any cloud GPU provider.


References & Tools

  1. VFIO (Virtual Function I/O) — Linux kernel framework for safe PCIe device passthrough to user space, relying on IOMMU for memory isolation.
  2. Firecracker — Open-source VMM powering AWS Lambda and Fargate; lacks PCIe passthrough support.
  3. Cloud Hypervisor — Rust VMM sharing core components with Firecracker but extending functionality to include PCIe passthrough support.
  4. lspci — Linux utility for inspecting PCIe device topology, IOMMU group membership, BAR layouts, and ACS/ATS capability registers.
  5. eBPF — Linux kernel technology used by Fly.io to monitor and enforce GPU BAR access policies as an alternative to hardware-enforced MIG/vGPU isolation.
  6. Trail of Bits — Security consultancy engaged by Fly.io to evaluate the GPU passthrough architecture; recommended ACS/ATS hardening, continued fuzzing, and establishing an external root of trust.

Frequently asked

Questions from the audience

What makes shared GPU security fundamentally different from standard cloud VM isolation?
Data-center-grade GPU cards are effectively secondary systems with independent firmware, nonvolatile storage, and the ability to issue DMA across host RAM without constraint. Standard hypervisor isolation — even with VM boundaries enforced by Firecracker or KVM — does not contain GPU DMA unless IOMMU containment is explicitly configured and verified. This makes GPU multi-tenancy a qualitatively different threat model requiring hardware-level controls that standard cloud security architecture was never designed to address.
Why did Fly.io choose VFIO passthrough over Nvidia MIG or vGPU for GPU multi-tenancy?
MIG requires Red Hat or VMware as the host platform, which ruled it out for Fly.io's Cloud Hypervisor stack. vGPU offered broader hypervisor support but still excluded Cloud Hypervisor, and its per-provisioning licensing model was economically incompatible with Fly.io's ephemeral compute platform. VFIO direct passthrough was the only viable option — but it shifts the entire isolation burden to the operator rather than relying on vendor-maintained partitioning guarantees.
How does PCIe peer-to-peer routing bypass IOMMU isolation, and what controls close the gap?
IOMMU checks only apply to DMA traffic that passes through the root complex. In PCIe's hierarchical topology, devices on the same switch can communicate laterally via Transaction Layer Packets (TLPs) without ever reaching the root complex, so those packets bypass IOMMU validation entirely. Access Control Services (ACS) is the PCIe mechanism designed to close this gap: enabling Request Redirect forces all traffic through the root complex, and keeping Direct Translated P2P disabled prevents devices from sending translated requests directly to a peer. Both settings should be applied together for defense-in-depth.
What is the GSP driver loading risk in VFIO GPU passthrough, and how should it be mitigated?
In Fly.io's VFIO passthrough architecture, the guest VM — not the host — supplies the ELF file containing the Nvidia GSP (GPU System Processor) driver. Nvidia signs the driver, but signing alone does not prevent downgrade attacks (loading an older signed but vulnerable version) or exploitation of parser bugs in the GSP's ELF loader. Mitigations include enforcing a minimum GSP driver version at the host layer, fuzzing the ELF parsing surface against malformed inputs, and establishing an external root of trust for GPU firmware.
Shared-GPU Security Learnings from Fly.io
Matthew Braun · 22 min