What is the RPI loop and why does it matter for AI-assisted security tooling?

The Research-Plan-Implement (RPI) loop is a structured prompting pattern that forces explicit planning before code generation. Instead of issuing a single prompt asking AI to write a script, you first ask it to research constraints and requirements, then produce a written design, then implement. This prevents AI from faithfully producing functional code that is missing every production-critical requirement — caching, rate limiting, idempotency, parallelization — that the naive prompt never specified. The loop can be encoded in harnesses like Claude's Plan Mode so it does not depend on operator discipline.

How did the feedback loop between AI analysis and deterministic scripts work in the Shai-Hulud investigation?

McCarthy used AI as a one-time pattern discovery engine on data samples, not as a recurring analysis runtime. AI would identify new signals — environment variable patterns, JWT claim structures, self-hosted Git instance identifiers — and those signals were then encoded into deterministic Python scripts. The scripts ran at scale against the full 30 GB dataset, producing consistent, auditable, cost-efficient results. Gaps identified by the deterministic pass were fed back to AI for a new signal discovery cycle, compounding coverage with each iteration.

What are the main credulity failure modes of reasoning models in victim attribution?

Three concrete examples from the Shai-Hulud investigation: (1) spurious name matching — the string 'nucleus' was linked to a specific company because a platform called Nucleus exists, despite it being an extremely common name; (2) platform-vendor conflation — any machine running Azure DevOps pipelines was attributed to Microsoft rather than the actual Azure customer; (3) consumer credential misattribution — personal Microsoft account JWTs found on CI/CD runners were treated as evidence of Microsoft infrastructure compromise. The common pattern: reasoning models present coincidental connections with the same confidence as strong ones.

How do you scale victim attribution beyond what a reasoning model can enumerate in one pass?

You do not ask the reasoning model to enumerate all victims — it will shortcut to a representative sample. Instead, ask high-signal bounded queries ('find the top 10 major companies' or 'which match Fortune 100 membership?') to get high-confidence candidates quickly. Separately, build a deterministic attribution engine from the signal types AI identifies — environment variables, JWT issuers, API slugs — and run that engine exhaustively across the full dataset. Use AI for discovery on samples and for creative signal generation; use deterministic scripts for coverage at scale.
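One of the signal types named above, the JWT issuer, can be pulled out deterministically. The following is a minimal sketch, assuming the leaked tokens are standard three-part JWTs; the function names are illustrative and not taken from McCarthy's tooling. It reads the payload without verifying the signature, which is enough for attribution but says nothing about whether the token is authentic.

```python
import base64
import json
from typing import Optional


def jwt_claims(token: str) -> dict:
    """Decode the payload segment of a JWT without verifying the signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("not a three-part JWT")
    payload = parts[1]
    # base64url segments omit padding; restore it before decoding.
    payload += "=" * (-len(payload) % 4)
    claims = json.loads(base64.urlsafe_b64decode(payload))
    if not isinstance(claims, dict):
        raise ValueError("JWT payload is not a JSON object")
    return claims


def issuer_signal(token: str) -> Optional[str]:
    """Return the 'iss' claim, or None if the token cannot be parsed."""
    try:
        return jwt_claims(token).get("iss")
    except ValueError:  # also covers base64 and JSON decoding errors
        return None
```

An issuer only identifies the identity provider. As the platform-vendor conflation example shows, mapping it to a victim organization still needs a further, validated step.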

AI-Assisted Supply Chain Attack Investigation

When the Shai-Hulud supply chain attack hit the npm ecosystem in 2025, it silently exfiltrated secrets from over 30,000 GitHub repositories — and the data was disappearing faster than any human team could chase it. AI-assisted supply chain attack investigation turned a two-week manual effort that found 200 impacted companies into a two-day agentic workflow that confirmed over 2,400. The catch: you have to know how to use AI correctly, or it will give you confident, completely wrong answers.

This post breaks down how security researcher Rami McCarthy used AI throughout the full investigation lifecycle — from scraping ephemeral breach data to fingerprinting CI/CD machines, attributing victims with reasoning models, and building a 69-method agentic attribution engine. Security engineers will come away with concrete patterns for combining LLM-powered analysis with deterministic code, and the specific failure modes to guard against.

Key Takeaways

You'll learn how to structure AI-assisted workflows using a research-plan-implement (RPI) loop to avoid naive prompting pitfalls when building security tooling at scale.
You'll be able to apply feedback loops that distill AI's probabilistic analysis into deterministic scripts — improving consistency, coverage, and scalability across large breach datasets.
You'll know how to apply skepticism injection and human-in-the-loop validation to counter LLM credulity, preventing false attribution in high-stakes victim notification scenarios.

The Challenge of Ephemeral Breach Data and Why It Is AI-Shaped

Supply Chain Attacks and the Race Against Disappearing Data

AI-assisted supply chain attack investigation begins with a fundamental constraint that manual workflows simply cannot overcome: the evidence is designed to vanish. When the Shai-Hulud and Singularity attacks swept through the npm ecosystem in 2025, they silently exfiltrated secrets from over 30,000 GitHub repositories — CI/CD tokens, API keys, environment files, TruffleHog^[1] scan results — and pushed them to attacker-controlled public repositories. The window to capture, analyze, and act on that data was narrow. GitHub actively cleaned up the repositories. Victims cleaned up their own exposed data. Once the data was gone, answering the core question — “Was I impacted?” — became extremely difficult.

This is not a problem of scale alone. It is a problem of three overlapping pressures converging simultaneously:

Velocity: Data was actively leaking and actively disappearing at the same time. Every hour of delay was permanent data loss.
Volume: Tens of thousands of repositories, gigabytes of heterogeneous files, no structured schema.
Attribution complexity: The raw data contained secrets and environment artifacts but no obvious owner labels. Determining who each dataset belonged to required inference, enrichment, and pattern matching across thousands of entries.

Why These Challenges Are “AI-Shaped”

Rami McCarthy’s framing from the talk is precise: these challenges are AI-shaped. That is not marketing language — it is a description of the problem geometry. The tasks required are:

Scaling collection across thousands of repositories faster than cleanup can occur
Analyzing heterogeneous data (env files, CI artifacts, encoded secrets, JWT tokens) without a fixed schema
High-speed pattern recognition to identify signals (CI/CD platform indicators, company identifiers, secret prefixes) across massive datasets
Victim attribution — connecting raw leaked data to the organization it belongs to — which requires both broad knowledge and contextual inference

These are precisely the capabilities where AI provides leverage. A human analyst working linearly cannot outpace automated cleanup. A deterministic script cannot perform the kind of open-ended inference required to map an unknown JWT claim or an Azure DevOps slug back to a Fortune 100 company. The combination of speed, breadth, and contextual reasoning is what makes AI the right tool for this phase of incident response.

The Stakes: Victim Notification at Scale

The goal driving all of this is not academic. McCarthy’s team needed to identify impacted organizations so that security contacts could be notified and companies could begin remediation. Without AI-assisted investigation workflows, the team spent two weeks manually analyzing data and identified approximately 200 impacted companies. With the agentic attribution engine built over two days, that number grew to over 2,400 confirmed impacted companies — including at least 37 of the Fortune 100, all manually verified.

That delta — 200 vs. 2,400 — is the practical argument for investing in AI-assisted breach investigation workflows. The data is ephemeral. The problem is AI-shaped. The rest of this post covers how to build the workflows that actually capture that advantage.

Actionable Takeaways

When a supply chain attack produces publicly accessible breach data, treat the collection window as a hard deadline — establish automated scraping workflows immediately rather than beginning with manual triage, because data deletion is ongoing and irreversible.
Assess your incident response problems against the "AI-shaped" profile: if a task requires high-speed collection, heterogeneous data analysis, or open-ended attribution inference across thousands of records, that is a signal to build an AI-assisted workflow rather than a purely deterministic one.
Set a concrete victim notification goal before building your analysis pipeline — the need to identify and contact affected organizations is what determines which attribution signals matter and how much precision you need, which in turn drives workflow design decisions.

Common Pitfalls

Assuming manual triage is sufficient for ephemeral breach data: the talk makes clear that two weeks of manual analysis produced 200 identifications while two days of AI-assisted agentic work produced 2,400. Starting with manual workflows when the data window is closing means permanently losing attribution coverage.
Underestimating the attribution problem by treating it as a simple lookup: victim attribution for leaked supply chain data is not a matter of reading a label off a file. It requires multi-step inference across environment variables, JWT claims, repository slugs, and API lookups — a heterogeneous reasoning task that manual or purely deterministic approaches handle poorly at scale.

Building Effective AI Workflows: The RPI Loop and Composable Tooling

Why Naive Prompting Fails in AI-Assisted Security Research

The most important lesson from McCarthy’s AI-assisted supply chain attack investigation is not that AI is powerful — it’s that power without structure produces incomplete results. When McCarthy needed to scrape thousands of Singularity attack repositories from GitHub, the naive approach was to open Claude^[2] and type: “Use the GitHub CLI^[3] to grab all of these repositories by name.” The code worked. But that’s the problem with stopping there.

A simple prompt generates functional code that is missing everything that makes it production-ready:

Caching and idempotency — re-running the scraper pulls duplicate data
Search rate limiting and backoff — no handling for GitHub’s API throttling
Parallelization — sequential requests against time-sensitive ephemeral data
Error recovery — no retry logic when network calls fail mid-run

AI will not tell you these requirements are missing. It answered the question you asked, not the question you should have asked. The gap between “AI generates code that works” and “AI generates code you can rely on at scale” is the workflow gap.
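The requirements listed above can be sketched in a few dozen lines. This is a hedged illustration, not McCarthy's scraper: `fetch` is a hypothetical callable (for instance, a wrapper around `gh repo clone`), the checkpoint is assumed to be a plain text file with one repository name per line, and the backoff is generic rather than tuned to GitHub's rate-limit headers.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path
from typing import Callable, Iterable, Set


def with_backoff(fn: Callable[[], None], retries: int = 5, base: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> None:
    """Call fn, retrying on any exception with exponential backoff plus jitter."""
    for attempt in range(retries):
        try:
            fn()
            return
        except Exception:
            if attempt == retries - 1:
                raise  # out of retries: surface the error to the caller
            sleep(base * 2 ** attempt + random.uniform(0, base))


def collect(repos: Iterable[str], fetch: Callable[[str], None], checkpoint: Path,
            workers: int = 8, sleep: Callable[[float], None] = time.sleep) -> Set[str]:
    """Fetch repos in parallel, skipping any already listed in the checkpoint."""
    done = set(checkpoint.read_text().split()) if checkpoint.exists() else set()
    # dict.fromkeys drops duplicate names while keeping input order.
    todo = [r for r in dict.fromkeys(repos) if r not in done]

    def work(repo: str) -> str:
        with_backoff(lambda: fetch(repo), sleep=sleep)
        return repo

    with ThreadPoolExecutor(max_workers=workers) as pool, checkpoint.open("a") as log:
        for fut in as_completed([pool.submit(work, r) for r in todo]):
            try:
                repo = fut.result()
            except Exception:
                continue  # not checkpointed, so the next run retries it
            done.add(repo)
            log.write(repo + "\n")
            log.flush()  # persist progress so a crash loses little work
    return done
```

Appending to the checkpoint as each repository completes is what makes a re-run idempotent: a second invocation with the same list fetches only what is missing.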

The Research-Plan-Implement (RPI) Loop

McCarthy’s core prescriptive pattern is the research-plan-implement (RPI) loop: before writing a single line of code, ask AI to research the problem space, then ask it to produce a plan, and only then move to implementation.

The three stages force explicit handling of what a naive single-shot prompt omits:

Research — what are the constraints? What does the API support? What are the known failure modes?
Plan — what is the architecture? How do we handle rate limits, failures, restarts?
Implement — now write the code against a spec, not a vague task description

This isn’t a novel idea, but the insight is that it can and should be encoded in the harness itself. Claude’s built-in Plan Mode is a minimal implementation of this — a few lines of system prompt that instruct the model to stop, plan, and not write code yet. Engineers who treat Plan Mode as an obstacle are leaving the most leverage on the table.

For a more structured approach, McCarthy references the Superpowers repository^[4], which encodes a seven-step process for AI-assisted code development. Stage one is explicitly brainstorming — the same forcing function as the RPI loop. The point is not that any single harness is the answer. The point is that the harness encodes the discipline, and you should be deliberately choosing or building a harness rather than prompting ad hoc.

The Research-Plan-Implement loop: Research → Plan → Implement cycle for structured AI-assisted security tooling

Naive GitHub Scraper Prompt vs. Production-Ready Harness: What AI Omits Without RPI

Proof of Concept

Establish the naive prompt baseline: McCarthy reproduced the original August prompt in a fresh Claude session to confirm the behavior was not model-version-specific. The prompt was approximately: “Use the GitHub CLI to grab all of these repositories by name.” Claude generated syntactically correct shell or Python code that invoked the GitHub CLI and iterated over a supplied list of repository names.
Observe what the naive output omits: The working but incomplete script lacked the following non-functional and functional requirements — all of which matter at incident scale:
- Caching / idempotency: No mechanism to skip repositories already downloaded. Re-running the script re-fetches everything, wasting time and burning API quota.
- Search rate-limit handling: GitHub’s search API enforces strict rate limits (10 unauthenticated requests/minute, 30 authenticated). The naive script makes no attempt to detect HTTP 403 or 429 rate-limit responses, respect Retry-After headers, or implement token rotation.
- Parallelization: Repositories are cloned sequentially. At 30,000 repositories, sequential cloning is a bottleneck measured in hours, not minutes.
- Exponential backoff: No retry logic with jitter for transient failures — a single network error terminates the run.
- Creative scraping approaches: No use of GitHub’s search API pagination, repository archive endpoints, or bulk download strategies that could reduce total request count.
Understand why the gap exists: McCarthy’s key insight is that AI never questions the abstraction you hand it. A prompt that says “grab these repositories” gives the model a flat-file, sequential mental model. It will implement that model faithfully and completely — without inferring that the actual goal is resilient, high-throughput data collection.
Apply the RPI loop to the same task: Structure the interaction as three phases:
- Research: Ask Claude to enumerate all requirements for a production-grade GitHub scraper operating under API rate limits against a target of ~30,000 repositories. Let it surface caching strategies, API quota constraints, parallelization options, and failure modes before writing a single line of code.
- Plan: Ask Claude to produce a written design — data structures, retry policy, concurrency model, checkpoint format — without generating executable code yet. Claude Plan Mode enforces this by injecting a system-level instruction: “Stop and plan. Do not write any code yet.”
- Implement: Only after the plan is reviewed and agreed upon does code generation begin.
Encode RPI into the harness (not just the prompt): Options include:
- Claude Plan Mode: Activates with a few lines of system prompt; forces planning phase before code emission.
- Superpowers repository: A community harness that encodes a seven-step code development process, with brainstorming/research as stage one.
- Custom harnesses: Any agentic scaffolding that separates a “no-code” planning conversation from a subsequent “implementation” session achieves the same effect.
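A custom harness of the kind described in the last option can be very small. The sketch below is an assumption-laden illustration: `ask` stands in for whatever function sends a prompt to a model and returns its reply, `approve` stands in for human review of the plan, and the phase wording is invented for this example rather than taken from Plan Mode or Superpowers.

```python
from typing import Callable, Dict

# Hypothetical phase instructions; the wording is illustrative only.
PHASES = {
    "research": "List constraints, API limits, and failure modes for: {task}. "
                "Do not write code.",
    "plan": "Using these findings:\n{research}\n"
            "Write a design for: {task}. Do not write code.",
    "implement": "Implement this approved plan for: {task}\n{plan}",
}


def run_rpi(task: str, ask: Callable[[str], str],
            approve: Callable[[str], bool]) -> Dict[str, str]:
    """Run research, plan, implement in order; stop if the plan is rejected."""
    out = {"research": ask(PHASES["research"].format(task=task))}
    out["plan"] = ask(PHASES["plan"].format(task=task, research=out["research"]))
    if not approve(out["plan"]):
        return out  # no code is generated until the plan is signed off
    out["implement"] = ask(PHASES["implement"].format(task=task, plan=out["plan"]))
    return out
```

The design choice worth noting is that the implementation prompt cannot be built without a plan string, so skipping the planning phase is structurally impossible rather than merely discouraged.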

Composable Utilities: Hoarding What You Build

A second structural challenge with AI-assisted workflows is memory — not the model’s context window, but the engineer’s organizational memory of what has been built.

McCarthy frames this as “hoarding things you know how to do.” When you’re working rapidly with AI — what practitioners call vibe coding — it’s easy to end up with a working script you can’t find again, a utility that solved the problem but was never named or organized, a solution scattered across temp directories.

The concept of composable utilities addresses this. As you build small tools during an investigation or a sprint, treat them as reusable assets:

Give them explicit names and clear interfaces
Store them where you can reference them in future sessions
Think about them as a compounding investment, not throwaway code

McCarthy acknowledges the friction: “People are telling you code is free. People are telling you you can throw away code. I somewhat agree, but it took you effort to get there.” The effort to produce a working utility, even with AI doing most of the work, decreases over time but never reaches zero. A library of composable utilities means that investment compounds. An unorganized disk full of scripts means it resets to zero every project.

Just-in-Time Tooling as a Core Skill

The flip side of composable utilities is just-in-time tooling — recognizing when the right move is to build a throwaway, purpose-specific tool rather than manually inspecting data or trying to fit the problem into a general-purpose utility.

McCarthy’s example: when manually reviewing breach data and noticing patterns AI wasn’t surfacing, instead of laboriously going through files by hand, he prompted up a custom UI tool that let him skim the data, tag attributes with signal, and feed those signals back to the model for iterative analysis cycles.

“Being able to build these throwaway tools is something you should be reaching for all the time. It’s one of those things in your tool belt that once you know it exists, all of a sudden you find a lot of hammer-shaped nails.” This is the practical complement to composable utilities: know when to build for reuse and when to build for this-exact-problem-right-now.

Actionable Takeaways

Adopt the RPI loop as a non-negotiable step before any AI-generated code task: prompt for research first, then a plan, then implementation. Use Claude's Plan Mode or a structured harness like Superpowers to encode this discipline rather than relying on willpower.
Treat every useful utility you build with AI as a composable asset — name it, document its interface, and store it somewhere you can reference it in the next project. An organized library of small tools compounds in value; scattered temp files do not.
Default to just-in-time tooling for data inspection tasks: if you're about to spend an hour manually reviewing data, spend 10 minutes building a purpose-built tool first. The resulting speed and accuracy gain almost always justifies the setup cost.

Common Pitfalls

Treating a working naive prompt as a complete solution. AI answers the question asked, not all the questions that should have been asked. A script that scrapes data without caching, rate limiting, or backoff is a script that will fail in production under the exact conditions — scale and speed — where you need it most.
Allowing composable utilities to accumulate as unorganized vibe-coding output. If the tools you build with AI aren't organized and referenceable, each investigation starts from scratch and the compounding benefit of prior work is lost.

AI-Driven Data Analysis: Fingerprinting, Signal Extraction, and the Shift to Determinism

After scraping the Singularity and Shai-Hulud breach data, McCarthy was left with a poor man’s data lake: roughly 250,000 flat files totaling 30 gigabytes. By typical data engineering standards, this is not “big data” — it fits on a laptop. But it is large enough that naive linear scanning becomes I/O-bound, and the shape of the data introduces an important AI-specific trap worth understanding before touching any analysis tooling.

The Flat Files Trap: Why AI Never Questions Your Abstraction

One of the more counterintuitive observations from this investigation is that AI will never question the abstraction you hand it. Feed Claude flat files on disk, and it will assume flat files are the correct and final data structure. It will not ask whether the data should be indexed, stored in SQLite, or serialized to Parquet for columnar access. It will not raise the question of whether a B-tree index would reduce your query time by three orders of magnitude.

This is a genuine gap in AI-assisted development. Research from Vercel and BrainTrust has begun to formalize when AI works well with bash and flat files versus when a transition to indexed database query patterns is warranted. The practical implication: before starting any AI-driven analysis workflow, explicitly plan the data abstraction layer yourself. Do not let AI inherit your defaults.

For the breach dataset, flat files were ultimately workable at this scale, but the lesson stands: the decision about how data is stored and accessed is yours to make before the analysis loop begins.
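To make the alternative concrete, here is a minimal sketch of moving from "files on disk" to an indexed catalog using Python's built-in `sqlite3`. It assumes a layout of `<root>/<repo>/<files...>`, which is a guess at how a scrape might be organized, and the table and column names are invented for illustration.

```python
import sqlite3
from pathlib import Path


def build_index(root: Path, db_path: str = ":memory:") -> sqlite3.Connection:
    """Record one row per file so later queries avoid rescanning the tree."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        "path TEXT PRIMARY KEY, repo TEXT, name TEXT, size INTEGER)"
    )
    # Index the column we expect to filter on most often.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_files_name ON files(name)")
    rows = (
        # The first path component under root is treated as the repository name.
        (str(p), p.relative_to(root).parts[0], p.name, p.stat().st_size)
        for p in root.rglob("*") if p.is_file()
    )
    conn.executemany("INSERT OR REPLACE INTO files VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn
```

With the catalog in place, a question like "which repositories contain a `.env` file?" becomes `SELECT repo FROM files WHERE name = '.env'` instead of another walk over 250,000 files.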

Fingerprinting: Deduplicating 30,000 Repositories Down to 13,000 Unique Machines

The first major analytical win was fingerprinting — and it came as a near-instant one-shot from Claude. The raw dataset appeared to contain 30,000 breached repositories, suggesting a comparable number of victims. But on closer inspection, CI/CD pipelines produce duplicate data. A single machine running 50 pipeline jobs generates 50 repositories with materially identical environment files and secrets.

McCarthy used Claude to derive a deduplication fingerprint: a signal extracted from the file contents that could identify when two repositories originated from the same physical or virtual machine, rather than two distinct victims. The result was a collapse from 30,000 apparent records to approximately 13,000 unique machines — meaning the raw count was overstating victim scope by more than 2x.

This is the kind of insight that is straightforward once you see it, but implementing it quickly requires pattern recognition across heterogeneous file contents, a task AI is genuinely well suited to. Once derived, the fingerprinting logic was deterministic enough to be encoded in a script and run at scale.
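The source does not say which fields made up the fingerprint, so the sketch below uses a hypothetical set of identity keys (`HOSTNAME`, `USER`, `HOME`) purely to show the shape of the technique: hash a stable subset of each record's environment and group repositories that share the hash.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical identity fields; the signal actually used is not given in the source.
IDENTITY_KEYS = ("HOSTNAME", "USER", "HOME")


def fingerprint(env: Dict[str, str]) -> str:
    """Hash a stable subset of environment values into a machine fingerprint."""
    material = "\n".join(f"{k}={env.get(k, '')}" for k in IDENTITY_KEYS)
    return hashlib.sha256(material.encode()).hexdigest()


def group_by_machine(
    records: Iterable[Tuple[str, Dict[str, str]]]
) -> Dict[str, List[str]]:
    """Map each fingerprint to the repositories that share it."""
    machines = defaultdict(list)
    for repo, env in records:
        machines[fingerprint(env)].append(repo)
    return dict(machines)
```

The choice of keys is the whole problem: values that change per pipeline run would split one machine into many, while values shared across hosted runners would merge distinct victims. That selection is the part delegated to the model.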

CI/CD Fingerprinting: Deduplicating 30,000 Repositories to 13,000 Unique Machines

Proof of Concept

Data collection baseline: After scraping the leaked Singularity-style attack repositories, McCarthy had approximately 30,000 repositories containing environment files, TruffleHog scan results, and other exfiltrated machine data. The raw number suggested 30,000 victims, but the data structure hinted at duplication — CI/CD pipeline runs on the same machine would produce multiple repositories with overlapping or near-identical content.
Identifying the duplication signal: By examining the shape of the data, McCarthy observed that CI/CD runs produce duplicate outputs — the same machine executing a pipeline multiple times would leak similar environment variables and secrets each time, generating separate repositories for each run. This structural artifact meant the 30,000 repository count significantly overstated the number of unique compromised machines.
AI-generated fingerprinting logic (one-shot): McCarthy described this as a “cool one-shot from AI” — he prompted Claude to derive a fingerprinting signal capable of deduplicating machine records. Rather than manually reasoning through which environment variables or file attributes would uniquely identify a machine across multiple CI/CD runs, he delegated that signal-derivation task directly to the model.
Signal extraction for CI/CD classification: In parallel with deduplication, McCarthy used Claude to identify environment variables that indicate specific CI/CD platforms. The model produced approximately 25 different environment variable names associated with various CI/CD systems — including platforms McCarthy had not previously encountered. This classification step allowed each deduplicated machine record to be labeled as a CI/CD runner or a developer endpoint.
Codifying into a deterministic script: Rather than re-running the AI analysis repeatedly against the full 30 GB dataset, McCarthy used the AI-derived fingerprinting logic as a starting point and then encoded it into a deterministic script. This followed his broader principle of using AI for pattern discovery and signal extraction, then distilling those insights into consistent, scalable, re-runnable code.
Results: After running the deterministic fingerprinting script across the full dataset:
- 30,000 repositories → 13,000 unique machines (a ~57% reduction confirming significant duplication from repeated CI/CD runs)
- 77% of those 13,000 machines were identified as CI/CD runners
- This reshaped the investigation: the attack had primarily targeted automated build infrastructure rather than individual developer workstations, which influenced downstream attribution and victim notification strategy.

CI/CD Signal Extraction: Classifying What Was Breached

With deduplication in place, the next analysis question was victimology: what kind of machines were these? This is where CI/CD security context matters, and where LLM-driven signal extraction provides disproportionate leverage. A human analyst reviewing environment variable dumps would recognize common CI/CD signals such as GITHUB_ACTIONS, CIRCLECI, and BUILDKITE_AGENT_ID, but would have a narrow baseline. Claude enumerated approximately 25 environment variables signifying CI/CD platforms, including platforms McCarthy had never encountered, dramatically broadening the classification coverage in a single pass.

The key analytical capabilities AI brings to this kind of work:

Deduplication and fingerprinting — identifying which records represent the same underlying machine
Signal extraction — pulling structured attributes from unstructured or semi-structured file contents
Pattern matching — recognizing formats, encodings, and schema variants across heterogeneous data
Metadata identification — inferring useful attributes (platform type, language ecosystem, containerization) from environmental signals

The outcome from this classification pass: approximately 77% of the 13,000 unique machines were CI/CD runners, not user workstations. This fundamentally shaped the downstream attribution and notification strategy.
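Once the marker variables are known, the classification itself is a lookup. The sketch below uses a small subset of marker variables that the respective platforms document; it is not the roughly 25-name list from the investigation, and the `classify` and `ci_share` names are illustrative.

```python
from typing import Dict, Iterable, Optional

# Marker variables set by each platform's runners. Illustrative subset only.
CI_MARKERS = {
    "GITHUB_ACTIONS": "GitHub Actions",
    "GITLAB_CI": "GitLab CI",
    "CIRCLECI": "CircleCI",
    "BUILDKITE": "Buildkite",
    "JENKINS_URL": "Jenkins",
    "TF_BUILD": "Azure Pipelines",
    "TRAVIS": "Travis CI",
    "TEAMCITY_VERSION": "TeamCity",
}


def classify(env: Dict[str, str]) -> Optional[str]:
    """Return the CI platform name, 'generic CI', or None for a likely workstation."""
    for var, platform in CI_MARKERS.items():
        if var in env:
            return platform
    # Many CI systems also set a bare CI=true flag.
    return "generic CI" if env.get("CI", "").lower() == "true" else None


def ci_share(envs: Iterable[Dict[str, str]]) -> float:
    """Fraction of records classified as CI/CD runners."""
    envs = list(envs)
    if not envs:
        return 0.0
    return sum(classify(e) is not None for e in envs) / len(envs)
```

Run over the deduplicated machine records, a function like `ci_share` is what yields a headline figure such as the 77% reported here.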

The Feedback Loop: From Probabilistic AI Output to Deterministic Scripts

The most important architectural pattern in the entire investigation is what McCarthy calls the feedback loop toward determinism — and it is worth dwelling on because it applies far beyond this specific case.

The naive approach to AI-assisted data analysis is to repeatedly ask the LLM to analyze your data. Feed it samples, get back observations, feed it more samples, get back more observations. This approach has a fundamental flaw: you get different small samples every time, consistency is not guaranteed, and you cannot iterate or back-test against a fixed implementation.

The correct pattern flips this: use AI as a one-time reasoning engine to identify patterns and signals, then encode those learnings into a deterministic script that you run at scale. The AI generates the insight; the script operationalizes it.

Concretely, the workflow looked like this:

Sample analysis — Feed a representative subset of data to Claude, ask it to identify patterns, signals, and distinguishing characteristics
Signal extraction — Claude surfaces specific environment variables, file path patterns, JWT claim structures, or other indicators that correlate with meaningful attributes
Codification — Translate those signals into a Python script or rule set that can process the full 30 GB dataset deterministically
Validation and backtesting — Run the script against the full dataset, measure coverage, compare against manual spot-checks
Feedback — Take cases the deterministic script missed back to Claude, identify additional signals, extend the rule set, repeat

This loop produces compounding returns. Each cycle increases the coverage and accuracy of the deterministic layer while keeping the expensive, probabilistic LLM calls limited to new pattern discovery rather than routine execution.
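The steps above can be sketched as a small rule runner. Everything here is illustrative: the two rules are hypothetical examples built on environment variables that GitHub Actions and GitLab CI set, and `sample_gaps` simply selects which unlabeled records to take back to the model in the next cycle.

```python
import random
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Rule = Callable[[Dict[str, str]], Optional[str]]


def github_owner(env: Dict[str, str]) -> Optional[str]:
    """Hypothetical rule: owner portion of GITHUB_REPOSITORY ("owner/repo")."""
    repo = env.get("GITHUB_REPOSITORY", "")
    return repo.split("/")[0] if "/" in repo else None


def gitlab_namespace(env: Dict[str, str]) -> Optional[str]:
    """Hypothetical rule added in a later cycle after reviewing gaps."""
    return env.get("CI_PROJECT_NAMESPACE")


def run_rules(records: Sequence[Dict[str, str]],
              rules: Sequence[Rule]) -> Tuple[Dict[int, str], List[int]]:
    """Apply rules in order; return labels by record index plus unlabeled gaps."""
    labels, gaps = {}, []
    for i, rec in enumerate(records):
        for rule in rules:
            label = rule(rec)
            if label is not None:
                labels[i] = label
                break
        else:
            gaps.append(i)  # no rule fired: candidate for the next AI pass
    return labels, gaps


def sample_gaps(gaps: List[int], k: int = 20, seed: int = 0) -> List[int]:
    """Pick a reproducible sample of unlabeled records for review."""
    rng = random.Random(seed)
    return sorted(rng.sample(gaps, min(k, len(gaps))))
```

Because the rule list is ordinary code, each cycle's additions can be back-tested against the full dataset, and coverage is just `len(labels) / len(records)`.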

AI to deterministic feedback loop: AI pattern discovery feeds deterministic script execution, gaps cycle back to AI for new signal discovery

Why Determinism Is the Goal, Not the Starting Point

A common misconception about AI-assisted security tooling is that the LLM is the analysis engine. The more accurate mental model is that the LLM is a pattern discovery accelerator that feeds a deterministic execution layer.

This distinction matters for several practical reasons:

Scalability — Deterministic scripts process 30 GB without token costs or rate limits
Consistency — The same script run twice on the same data produces the same output
Auditability — A Python script can be reviewed, version-controlled, and shared
Iteration — You can back-test changes to detection logic against the full historical dataset
Cost — AI-driven analysis of the full corpus iteratively would have been prohibitively expensive; deterministic scripts eliminate that cost after the discovery phase

McCarthy frames this broadly: whether you are building intrusion detections, configuration audit rules, or secret scanning detectors, the same loop applies. Use AI for the things it is uniquely good at — pattern recognition, signal extraction, creative hypothesis generation — and then distill those outputs into deterministic methods that run at scale.

Secret Scanning Rule Portability: Porting 58 TruffleHog Detectors to a New Engine in 30 Minutes

Proof of Concept

Identify the need for a different engine: McCarthy was working with ~250,000 flat files of breach data and needed to enrich and validate secrets found within them. TruffleHog was the natural candidate given its large detector library (~800 detectors), but it had three concrete problems at this scale: it is commonly fingerprinted by security tooling and generates alerts (a noise risk during a covert investigation), prior research had observed a high rate of false negatives in its secret validation, and it was relatively slow.
Scope the portable rule subset: Rather than attempting to port all ~800 TruffleHog detectors, McCarthy instructed the LLM to identify only the trivial rules — specifically, detectors where validation logic consists of a secret prefix check followed by a simple GET or POST HTTP request. These rules have no complex logic, no multi-step OAuth flows, and no unusual dependencies. This scoping step is critical: it converts an intractable porting problem into a well-bounded, automatable one.
Prompt the LLM to enumerate matching detectors: The LLM was given access to or knowledge of the TruffleHog detector definitions and asked to find all rules that match the pattern: secret has a known prefix + validation is a single HTTP GET or POST. The LLM enumerated 58 such detectors from the TruffleHog codebase.
Port the 58 detectors to the target engine: The LLM translated each of the 58 detector definitions into the rule format expected by the new scanning engine. Because the underlying logic was simple (prefix match + one HTTP call), the LLM could perform this translation reliably — no ambiguous logic to interpret, no edge cases to resolve. The full porting operation completed in approximately 30 minutes of wall-clock time.
Select the new engine based on desired characteristics: With the rules now portable, McCarthy was free to choose the engine based on operational requirements rather than rule availability. The key decision dimensions he identified are speed (throughput on large flat-file datasets), false positive rate (precision of detections), and false negative rate (recall, which is critical for victim notification because missing a hit has real consequences).
Apply enriched secret validation for victim attribution: The ported detectors were run against the breach dataset to validate secrets and extract metadata — owner identifiers, service names, account information — that could be fed into the broader attribution pipeline.
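What makes the "trivial" detectors portable is that each one reduces to data: a prefix pattern plus one HTTP request. The sketch below shows that shape under stated assumptions. It is not TruffleHog's rule format or the format of the target engine (neither is given in the source), the `Detector` fields are invented, and the GitHub token pattern and endpoint are included only as a familiar example. The HTTP call is injected so the sketch runs offline.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# (method, url, headers) -> HTTP status code; injected so no network is needed.
HttpFn = Callable[[str, str, Dict[str, str]], int]


@dataclass(frozen=True)
class Detector:
    name: str
    pattern: str  # regex anchored on the secret's known prefix
    method: str   # single validation request: GET or POST
    url: str
    header: str   # Authorization template with a {secret} placeholder

    def find(self, text: str) -> List[str]:
        return re.findall(self.pattern, text)

    def is_live(self, secret: str, http: HttpFn) -> bool:
        headers = {"Authorization": self.header.format(secret=secret)}
        return http(self.method, self.url, headers) == 200


# Illustrative example entry, not one of the 58 ported rules.
GITHUB_PAT = Detector(
    name="github_pat",
    pattern=r"\bghp_[A-Za-z0-9]{36}\b",
    method="GET",
    url="https://api.github.com/user",
    header="Bearer {secret}",
)


def scan(text: str, detectors: List[Detector],
         http: HttpFn) -> List[Tuple[str, str, bool]]:
    """Return (detector name, secret, validated) for every candidate found."""
    return [(d.name, s, d.is_live(s, http)) for d in detectors for s in d.find(text)]
```

With rules expressed this way, translating between engines is a mechanical mapping from one record layout to another, which is why an LLM can do it reliably.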

Key Insight — Rules Are Becoming Fungible: McCarthy frames this as a broader industry shift: the era of picking a security tool because of its rule set is ending. The same dynamic applies beyond secret scanning — Sigma rules ported to proprietary SIEMs, YARA rules migrated between platforms, static analysis signatures moved between engines. As LLMs make rule translation trivial, the strategic investment should go into evaluating and selecting engines with the right performance profile for your threat model, then assuming you can migrate the content.

Actionable Takeaways

Before starting any AI-driven data analysis, explicitly decide your data abstraction layer (flat files, SQLite, Parquet, indexed database) rather than letting AI inherit your defaults. AI will never question the storage model you hand it — that architectural decision must be yours.
Use the feedback loop pattern: run AI on data samples to extract signals and patterns, then encode those findings into a deterministic script. Run that script at scale, identify gaps, feed gaps back to AI for new signal discovery, and repeat. This approach gives you scalable, auditable, consistent results instead of expensive repeated LLM calls.
Treat detection rules (secret scanning detectors, Sigma rules, SIEM signatures) as fungible content independent of the engine. Evaluate tools on operational characteristics (speed, false-positive rate, detection footprint) and use LLMs to port rule sets between engines rather than accepting toolchain lock-in.
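The feedback loop in the second takeaway can be sketched as follows. This is an illustration under stated assumptions, not McCarthy's code: the record format, the `INTERNAL_GIT_URL` variable name, and the helper names `regex_rule` and `run_pass` are hypothetical, and the step where a model reviews the gaps is represented only by a comment.

```python
import re
from typing import Callable, Optional

Rule = Callable[[str], Optional[str]]


def regex_rule(pattern: str) -> Rule:
    """Build a deterministic rule that returns the first capture group."""
    compiled = re.compile(pattern)

    def rule(record: str) -> Optional[str]:
        m = compiled.search(record)
        return m.group(1) if m else None

    return rule


def run_pass(records: list[str], rules: list[Rule]) -> tuple[dict[int, str], list[int]]:
    """Run every rule over every record; return hits and uncovered indexes."""
    hits: dict[int, str] = {}
    gaps: list[int] = []
    for i, record in enumerate(records):
        for rule in rules:
            value = rule(record)
            if value:
                hits[i] = value
                break
        else:
            gaps.append(i)  # no rule matched this record
    return hits, gaps


records = [
    "GITHUB_REPOSITORY=acme/widgets",
    "INTERNAL_GIT_URL=https://git.corp.example/team/app",
    "PATH=/usr/bin",
]

# Pass 1: only the rule known up front.
rules = [regex_rule(r"GITHUB_REPOSITORY=([^/\s]+)/")]
hits, gaps = run_pass(records, rules)

# The gap records are the sample a model would be asked to inspect.
# Suppose that review suggests a self-hosted Git variable (hypothetical
# name); encode it as a rule and rerun the full deterministic pass.
rules.append(regex_rule(r"INTERNAL_GIT_URL=https?://([^/\s]+)"))
hits2, gaps2 = run_pass(records, rules)
```

The model only ever sees the gap sample; every full-dataset pass is plain code, so results are repeatable and the uplift from each new rule is measurable as the shrinkage of `gaps`.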

Common Pitfalls

Repeatedly querying an LLM to analyze the full dataset iteratively instead of distilling AI-derived patterns into a deterministic script. This produces inconsistent results across runs, incurs unnecessary token costs, and cannot be back-tested — fundamentally the wrong architecture for any repeatable analytical workflow.
Accepting the raw record count from scraped data without deduplication or fingerprinting. In the Shai-Hulud investigation, 30,000 raw repositories collapsed to 13,000 unique machines after fingerprinting — treating the raw count as victim count would have generated false urgency and misdirected notification effort.
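To make the deduplication pitfall concrete, here is a minimal fingerprinting sketch. The field names (`hostname`, `os_user`, `runner_name`) are assumptions chosen for illustration; which attributes are actually stable per machine depends on the dataset, and the source does not specify the fields used in the investigation.

```python
import hashlib
import json


def fingerprint(record: dict, keys: tuple[str, ...]) -> str:
    """Hash a stable subset of fields so repeat uploads collapse together."""
    stable = {k: record.get(k, "") for k in keys}
    payload = json.dumps(stable, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def dedupe(records: list[dict], keys: tuple[str, ...]) -> dict[str, list[dict]]:
    """Group raw records by fingerprint; one group per presumed machine."""
    groups: dict[str, list[dict]] = {}
    for record in records:
        groups.setdefault(fingerprint(record, keys), []).append(record)
    return groups


# Hypothetical fields; treating them as machine-stable is an assumption.
KEYS = ("hostname", "os_user", "runner_name")
raw = [
    {"repo": "a1", "hostname": "build-01", "os_user": "ci", "runner_name": "r1"},
    {"repo": "b7", "hostname": "build-01", "os_user": "ci", "runner_name": "r1"},
    {"repo": "c3", "hostname": "laptop-9", "os_user": "dev", "runner_name": ""},
]
machines = dedupe(raw, KEYS)
```

Here three raw records collapse to two machines; the victim count to act on is `len(machines)`, not `len(raw)`.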

Victim Attribution at Scale Using Reasoning Models

The Attribution Problem: 13,000 Leaked Datasets, Unknown Owners

After deduplication, McCarthy was left with 13,000 unique compromised machines and a central question that manual analysis could never answer at pace: whose data was this, and how do you notify them before the window closes? This is where AI-assisted victim attribution moves from a convenience to an operational necessity.

The attribution challenge has a few dimensions that make it hard. Repository names contain implicit signals — company slugs, internal tool names, project identifiers — but they are noisy, abbreviated, and often ambiguous. The secrets themselves carry attribution signal if you know how to read them. And the sheer scale means that any approach requiring human review of each record is dead on arrival.

McCarthy’s approach evolved through several layers of increasing sophistication, ultimately producing a 69-method agentic attribution engine that identified over 2,400 impacted organizations.

Layer 1: Reasoning Model Sampling on Raw Repository Names

The first and most accessible attribution technique was feeding bulk repository name data directly to a reasoning model. In McCarthy’s words, this was “being abusive towards Gemini”^[5] — he took every GitHub repository name from the dataset, accidentally merged them into a single string without newlines, fed the concatenated blob to Gemini, and got back actionable results.

The key insight here is how to frame the query correctly. You cannot ask a reasoning model to enumerate every company in a dataset — it will take shortcuts and sample. But you can ask it high-signal, scoped questions:

“Find me the 10 major companies in this list” — returns high-confidence attributions from recognizable name patterns
“Which of these match Fortune 100 companies?” — leverages the model’s encoded knowledge of well-known enterprises
“Which appear to be government entities?” — exploits domain knowledge baked into the model’s training
“Which match the top VC-funded startups of the last year?” — a creative slice that would not occur to most analysts

The reason this works at all is what McCarthy describes as the approximate knowledge of many things encoded in large language models. The model doesn’t just analyze patterns — it knows which company names, domain conventions, and project naming schemes are associated with which organizations.
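One way to keep such queries bounded is to generate them from a template. The sketch below is an assumption about how that might look, not the prompt McCarthy used: `build_scoped_prompt` is a hypothetical helper, the wording of the instructions is invented for illustration, and unlike the accidental merged string described above it joins names with newlines.

```python
def build_scoped_prompt(names: list[str], question: str, limit: int = 10) -> str:
    """Wrap a bounded question around a cleaned, newline-separated name list."""
    cleaned = sorted({n.strip() for n in names if n.strip()})
    return (
        f"{question}\n"
        f"Return at most {limit} answers. For each one, quote the exact "
        "repository name you relied on and rate the match as strong or weak.\n\n"
        "Repository names:\n" + "\n".join(cleaned)
    )


prompt = build_scoped_prompt(
    ["acme/widgets", " acme/widgets ", "", "globex/api"],
    "Which of these repository names match Fortune 100 companies?",
    limit=10,
)
```

Asking the model to quote the evidence it relied on makes later validation easier, because a reviewer can check the cited string instead of re-deriving the match.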

Reasoning Model Victim Attribution: Extracting Fortune 100 Companies from a Merged String of Repository Names

Proof of Concept

Collect repository names from the leaked dataset. During the Shai-Hulud response, McCarthy had approximately 30,000 GitHub repository names scraped from repositories containing exfiltrated secrets and environment data. These names were sourced directly from the GitHub API responses captured during scraping.
Accidentally merge all names into a single string. McCarthy forgot to add newlines between the repository names when constructing the input. Rather than a clean line-separated list, the entire set of 30,000 names was delivered to the model as one continuous merged string — effectively garbage input from a formatting standpoint.
Feed the merged string to a reasoning model (Gemini) and prompt for high-confidence samples. Instead of asking for every company (which would cause the model to sample and shortcut), McCarthy asked a bounded, high-signal question: “Find me the 10 major companies.” This framing exploits the reasoning model’s ability to pattern-match against its embedded knowledge of known company names, branding conventions, and repository naming patterns even within a degraded, unstructured input.
Leverage the model’s approximate world knowledge for victim classification. Beyond simple name matching, McCarthy demonstrated that reasoning models carry significant embedded context about the business world. This enabled several creative attribution query patterns against the same dataset:
- Fortune 100 membership: “Which of these repository names are associated with Fortune 100 companies?” — the model already knows which companies are in the Fortune 100.
- Government entity association: The model knows which domains and naming patterns are associated with government entities, enabling attribution queries like “which of these look like government systems?”
- Thematic slices: McCarthy noted you could also query for “AI top 50 companies” or “top VC-funded startups from the last year,” using the model’s embedded knowledge of current business landscapes to slice the dataset along dimensions no deterministic rule could easily encode.
Observe creative signal extraction beyond explicit names. A notable side effect: when Claude analyzed the same dataset, it identified that encoded JWT strings present in the leaked data likely contained extractable claims. The model recognized an encoded string as a JWT and reasoned that the claims inside could contain identity or company information — a signal McCarthy acknowledged he would not have thought to extract manually.
Recognize the credulity risk and apply bounded queries. McCarthy explicitly called out the double-edged nature of this technique. In the same investigation, the model made overconfident attributions: a repository containing the string “nucleus” was linked to a specific company that uses a platform named Nucleus, even though Nucleus is an extremely common platform name. Machines running Azure DevOps were attributed to Microsoft rather than to the Azure customer that owned them. JWTs issued for consumer Microsoft accounts were treated as evidence that Microsoft itself was a victim. The lesson: reasoning model attribution is high-signal for top-N sampling but unreliable for exhaustive or definitive attribution. Use it for discovery, not for ground truth.
Output and downstream use. The successful queries returned a shortlist of major companies identifiable within the merged repository name string. These results fed directly into the broader attribution pipeline — flagging high-confidence candidates for manual confirmation and seeding the deterministic attribution rules that would later be encoded into the 69-method agentic attribution engine. McCarthy manually confirmed all 37 Fortune 100 attributions that became the headline finding of the Shai-Hulud investigation.

Layer 2: JWT Claim Extraction as Attribution Signal

One of the more surprising attribution vectors was JWT analysis. Leaked datasets frequently included environment files, CI/CD configs, and token caches containing encoded JWTs. A reasoning model can:

Recognize a base64-encoded JWT from its structural pattern even in a noisy environment file
Decode the header and payload to extract claims: iss (issuer), sub (subject), aud (audience), and custom organizational claims
Map the issuer domain to a known organization or identity provider
Use the subject claim to identify whether this is a human user account, a service account, or a CI/CD identity

This is a non-obvious enrichment step that required the model’s generalist knowledge to surface. McCarthy’s example of Claude identifying an Alibaba AI developer tool from a dotfile found inside a TruffleHog secrets output is representative: the model connected a configuration file pattern to a specific internal tool, which mapped to a specific division of a specific company — a chain of reasoning no deterministic rule would have captured.
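Once a model has pointed out that JWTs are worth decoding, the extraction itself is deterministic. The following sketch finds JWT-shaped strings and decodes the header and payload without verifying the signature, which is sufficient for attribution but not for trust decisions. The helper names and the synthetic demo token are illustrative assumptions; the claim names `iss`, `sub`, and `aud` are the standard registered claims.

```python
import base64
import json
import re

# Three base64url segments; a JSON object header/payload encodes to "eyJ...".
JWT_RE = re.compile(r"\beyJ[A-Za-z0-9_-]+\.eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]*")


def _b64url_json(segment: str) -> dict:
    """Decode one base64url segment (padding restored) into a dict."""
    padded = segment + "=" * (-len(segment) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def extract_claims(text: str) -> list[dict]:
    """Decode (not verify) each JWT-looking token and keep attribution claims."""
    results = []
    for token in JWT_RE.findall(text):
        header_b64, payload_b64, _signature = token.split(".")
        try:
            header = _b64url_json(header_b64)
            payload = _b64url_json(payload_b64)
        except ValueError:
            continue  # looked like a JWT but did not decode cleanly
        results.append({
            "alg": header.get("alg"),
            "iss": payload.get("iss"),
            "sub": payload.get("sub"),
            "aud": payload.get("aud"),
        })
    return results


def _encode(obj: dict) -> str:
    raw = json.dumps(obj, separators=(",", ":")).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


# Synthetic token built locally for the demo; the signature is a placeholder.
demo = ".".join([
    _encode({"alg": "RS256", "typ": "JWT"}),
    _encode({"iss": "https://idp.example.invalid", "sub": "svc-ci", "aud": "api"}),
    "c2ln",
])
claims = extract_claims(f"export CI_TOKEN={demo}\n")
```

The issuer is usually the strongest attribution signal of the three, but it can name a shared identity provider rather than the victim, so it still needs the validation described later in this article.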

Layer 3: Azure DevOps Slug Traversal

Not all attribution signals came from AI. McCarthy’s human-in-the-loop contribution was noticing Azure DevOps organization slugs in the leaked data: identifiers that appear in Azure DevOps URLs and pipeline configurations. This is a signal the model missed entirely, and recognizing it took domain expertise.

The enrichment chain he built from this signal:

Extract Azure DevOps organization slug from the leaked environment or pipeline config
Query the Azure DevOps API using the slug to retrieve the organization’s tenant ID
Use the tenant ID to query the Microsoft OpenID Connect discovery endpoint
Derive the primary domain associated with that Azure AD tenant
Map the domain to the organization’s public identity

This six-stage enrichment pipeline, which traverses from an internal DevOps slug through poorly documented Microsoft APIs to a verified company identity, was then encoded as one of the 69 attribution methods and fed back into the agentic engine. The point McCarthy makes is that the creative signal discovery was human, while the execution and scale amplification came from AI.
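The shape of such a traversal can be sketched as a chain of lookups. Only the slug extraction below is concrete: it matches the two URL forms in which Azure DevOps organization names commonly appear. The later stages are injected as plain callables, and the demo uses dictionary stand-ins with fictional values; the actual Azure DevOps and Microsoft identity calls are not reproduced here because the source describes them only as poorly documented.

```python
import re
from typing import Callable, Optional

# Azure DevOps organization slugs show up in these two URL shapes.
SLUG_RE = re.compile(
    r"https?://(?:dev\.azure\.com/([A-Za-z0-9][A-Za-z0-9-]*)"
    r"|([A-Za-z0-9][A-Za-z0-9-]*)\.visualstudio\.com)",
    re.IGNORECASE,
)

Lookup = Callable[[str], Optional[str]]


def extract_slugs(text: str) -> list[str]:
    """First stage: pull unique organization slugs out of leaked text."""
    return sorted({(a or b).lower() for a, b in SLUG_RE.findall(text)})


def resolve(slug: str, stages: list[tuple[str, Lookup]]) -> dict:
    """Feed each stage the previous stage's output; a miss ends the chain."""
    trail: dict = {"slug": slug}
    value: Optional[str] = slug
    for name, lookup in stages:
        value = lookup(value) if value else None
        trail[name] = value
    return trail


# Stand-in lookups with fictional data; real stages would make network calls.
tenants = {"contoso-build": "00000000-0000-0000-0000-000000000001"}
domains = {"00000000-0000-0000-0000-000000000001": "contoso.example"}
companies = {"contoso.example": "Contoso (fictional)"}
stages = [
    ("tenant_id", tenants.get),
    ("domain", domains.get),
    ("company", companies.get),
]

leaked = (
    "PIPELINE_URL=https://dev.azure.com/contoso-build/_build\n"
    "LEGACY_URL=https://fabrikam.visualstudio.com/project\n"
)
slugs = extract_slugs(leaked)
resolved = [resolve(s, stages) for s in slugs]
```

Keeping every intermediate value in the returned trail means a reviewer can see exactly which stage produced an attribution, or where the chain broke.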

Azure DevOps slug traversal enrichment pipeline: 6-stage process from internal DevOps slug to verified company identity

The 69-Method Agentic Attribution Engine

The final attribution system was an agentic analysis tool built iteratively over roughly two days, with the majority of the 69 attribution methods generated by AI. The architecture followed a signal extraction and resolution pattern:

Signal extraction layer: For each machine dataset, extract all available attribution signals. These included:

GitHub organization names and repository naming conventions
GitHub Enterprise instance hostnames
JWT issuer and audience claims
Azure DevOps organization slugs (with the six-stage traversal above)
Self-hosted Git instance hostnames (discovered via the AI feedback loop — a new environment variable type that pointed to internal Git servers)
CI/CD platform-specific identifiers (environment variables from 25+ CI platforms)
Secret prefixes and API endpoint patterns that map to known SaaS vendors
Domain names extracted from config files, certificates, and service URLs

Resolution layer: Take the extracted signals and apply probabilistic matching against known organization identifiers, then enrich with:

GitHub API lookups against organization metadata
Secret validation and enrichment (to retrieve account identity from live API endpoints where safe to do so)
Cross-referencing multiple weak signals to build a confidence score
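The source does not say how the confidence score was computed. One common way to combine independent weak signals is a noisy-OR, sketched below as an assumption rather than as the engine's actual method. The signal names and weights are hypothetical and would need calibration against manually confirmed attributions.

```python
from collections import defaultdict

# Hypothetical per-signal reliabilities in [0, 1]; not calibrated values.
SIGNAL_WEIGHT = {
    "github_org": 0.6,
    "jwt_issuer_domain": 0.7,
    "config_domain": 0.4,
    "repo_name_keyword": 0.15,
}


def score_candidates(signals: list[tuple[str, str]]) -> dict[str, float]:
    """Noisy-OR: agreeing signals for the same organization reinforce each other."""
    miss: dict[str, float] = defaultdict(lambda: 1.0)
    for signal_type, org in signals:
        # Probability that every signal seen so far for this org is wrong.
        miss[org] *= 1.0 - SIGNAL_WEIGHT.get(signal_type, 0.0)
    return {org: round(1.0 - m, 4) for org, m in miss.items()}


observed = [
    ("github_org", "acme"),
    ("config_domain", "acme"),
    ("repo_name_keyword", "nucleus-vendor"),
]
scores = score_candidates(observed)
```

Two moderate signals that agree on `acme` outrank a single keyword match, which is the behavior wanted for cases like the generic “nucleus” string. Noisy-OR assumes the signals are independent; correlated signals would overstate confidence.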

Agentic Attribution Engine: 69 Attribution Methods, Azure DevOps Slug Traversal, and 2,400 Identified Victims

Proof of Concept

Signal extraction across 13,000 unique machines: After deduplicating 30,000 repositories down to 13,000 unique machines, the engine extracted a broad set of attribution signals from each — including environment variables, CI/CD identifiers, leaked secrets, Git configuration fields, JWT claims, and self-hosted Git instance URLs.
Initial reasoning model sampling: Repository names and extracted signals were fed to reasoning models (Gemini and Claude) to perform high-signal sampling. McCarthy fed all repository names — inadvertently merged into a single string without newlines — into Gemini and prompted it to identify the top major companies. Despite the malformed input, the reasoning model returned accurate results.
JWT claim extraction as a creative attribution signal: During AI-assisted analysis, the models identified that Base64-encoded JWTs present in the leaked data contained claims (such as issuer, audience, and subject fields) that could be decoded and used as attribution signals. This was flagged as a creative insight McCarthy credited to the AI — something he noted he might not have thought to do himself.
Azure DevOps slug traversal (human-discovered, AI-enriched): McCarthy identified that several leaked datasets contained Azure DevOps organization slugs. He knew from recent personal research — not from the AI — that Azure DevOps slugs can be traversed through a series of poorly documented APIs to resolve tenant IDs and then map to actual customer domains. He built a six-stage enrichment pipeline: extract slug → call Azure DevOps API to resolve tenant ID → perform OIDC discovery lookup → extract domain from OIDC metadata → use an LLM to identify the most likely company match.
Feedback loop: AI samples → deterministic rules → coverage uplift: The engine followed a structured feedback loop. AI-based analysis would identify new potential attribution signals from sample data. Those signals were reviewed, validated, and then encoded into deterministic scripts. As one example, the AI identified a new environment variable that appeared to host self-hosted Git instance URLs; after encoding this as a deterministic rule and running it across the full dataset, the engine surfaced a major Russian e-commerce company, a Portuguese company, a Thai fintech, and a US Fortune 500 — all from a single new signal.
Skepticism injection to counter LLM credulity: The engine included explicit skepticism mechanisms to counter false attribution. McCarthy cited two concrete credulity failures: (a) the string “nucleus” in leaked data caused a model to attribute the victim to a company that uses a platform named Nucleus; (b) any Azure DevOps usage was incorrectly attributed to Microsoft rather than the actual Azure customer. To counter this, McCarthy injected skepticism prompts during attribution analysis (“let’s slow down, are these true positives?”) and ran attribution through a validation skill before finalizing results.
Scale result: The agentic attribution engine, built over approximately two days with heavy AI assistance, identified over 2,400 impacted companies. Prior manual analysis over two weeks had found 200. Among the identified victims, McCarthy manually confirmed that 37 of the Fortune 100 were concretely impacted by the Shai-Hulud 2.0 incident — none of those 37 attributions relied on LLM output alone.

Actionable Takeaways

Use reasoning models for high-signal, scoped attribution queries ("find me the top 10 companies" or "which match Fortune 100") rather than asking for exhaustive enumeration — the model will sample by default, so design your prompts around that constraint and run the loop repeatedly to build coverage.
Encode every AI-discovered attribution signal as a deterministic rule before scaling: validate the signal on a sample, write the rule, measure uplift on the full dataset, and repeat. This feedback loop is what turns probabilistic AI output into a scalable, repeatable attribution engine.
Inject explicit skepticism at every attribution decision point — review reasoning model outputs for false positives like generic name matches (common platform names attributed to specific vendors), platform-vendor conflation (Azure DevOps users attributed to Microsoft), and consumer credential misattribution before acting on any victim notification.

Common Pitfalls

Trusting reasoning model attribution at face value without validation: models will confidently link a generic string or common platform name to a specific organization. In victim notification scenarios, this produces false positives that damage trust and create legal exposure. Always validate high-stakes attributions through a deterministic secondary check or human review.
Asking the model for complete coverage in a single pass: reasoning models will shortcut to a representative sample and present it as comprehensive. If you need full coverage across 13,000 records, you need a harness that loops the model with explicit exit criteria, or you need to convert the AI's output into a deterministic rule that can be run exhaustively. Coverage is not a default behavior — it must be engineered.
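A harness with explicit exit criteria, as described in the second pitfall, might look like the sketch below. The `ask_model` callable is a placeholder for whatever model client is in use, and `lazy_model` is a stand-in that deliberately answers only part of each batch to mimic sampling behavior; both are assumptions for illustration.

```python
from typing import Callable


def cover_all(
    records: list[str],
    ask_model: Callable[[list[str]], dict[str, str]],
    batch_size: int = 50,
    max_rounds: int = 3,
) -> tuple[dict[str, str], list[str]]:
    """Loop until every record has an answer or the round budget is spent."""
    answers: dict[str, str] = {}
    pending = list(records)
    for _ in range(max_rounds):
        if not pending:
            break  # exit criterion: full coverage reached
        for start in range(0, len(pending), batch_size):
            batch = pending[start:start + batch_size]
            reply = ask_model(batch)
            # Keep only answers for records that were actually in the batch.
            answers.update({k: v for k, v in reply.items() if k in batch})
        pending = [r for r in pending if r not in answers]
    return answers, pending


def lazy_model(batch: list[str]) -> dict[str, str]:
    """Stand-in that mimics sampling: it only answers the first two items."""
    return {name: "unknown" for name in batch[:2]}


names = [f"repo-{i}" for i in range(5)]
answers, leftover = cover_all(names, lazy_model, batch_size=5, max_rounds=3)
partial, still_pending = cover_all(names, lazy_model, batch_size=5, max_rounds=1)
```

The harness, not the model, tracks what is still unanswered. Anything left in the second return value after the round budget is a known gap rather than a silent omission.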

Injecting Skepticism and Human-in-the-Loop Validation in AI Security Workflows

The Credulity Problem in AI-Assisted Security Research

One of the most dangerous characteristics of large language models in high-stakes supply chain attack investigations is their baseline tendency toward credulity: a willingness to make confident, plausible-sounding connections that are simply wrong. McCarthy observed this failure mode directly during victim attribution: the string “nucleus” appearing in leaked data was connected by the LLM to a specific company that uses a platform named Nucleus, despite “Nucleus” being an extraordinarily common platform name. In another case, the presence of Azure DevOps artifacts led the model to conclude that the victim was Microsoft rather than the Azure customer, an unsupported leap that would have generated false victim notifications.

This credulity is the shadow side of the same capability that makes reasoning models useful. The creativity that lets Claude identify that an encoded JWT likely contains extractable claims is the same mechanism that generates confident but incorrect attribution. You cannot turn off one without losing the other. The answer is not to stop using AI for attribution — the productivity gains are too significant — but to build human-in-the-loop skepticism into the workflow itself.

Skepticism Injection: The Most Direct Countermeasure

The simplest form of skepticism injection requires no architectural changes. During a coding agent session or data analysis workflow, pause deliberately and ask: “Are these true positives?” Run attribution results through an explicit validation pass before acting on them. This single intervention — stopping to interrogate the model’s conclusions rather than accepting them — catches the most egregious hallucinations.

At a more sophisticated level, skepticism can be encoded into the harness itself. McCarthy describes the concept of writing a skeptical persona — a second agent instantiated specifically to challenge and critique the outputs of the primary analysis agent. Running a mix of agents that balance each other introduces structured adversarial review into the pipeline. This is not a theoretical pattern; it is a direct engineering response to the known failure mode of baseline LLM alignment being too credulous for security-critical use cases.

Practical skepticism injection approaches:

Session-level: After any AI attribution run, prompt the model to explicitly enumerate its weakest attributions and the alternative explanations for each signal.
Harness-level: Implement a dedicated validation step that checks attribution outputs against a defined confidence threshold before they flow to the next stage.
Persona-level: Instantiate a separate agent with an explicit instruction to find flaws in the primary agent’s conclusions before results are accepted.
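The harness-level approach can be partly deterministic. The sketch below is a hypothetical validation gate that flags the failure patterns named in this article before an attribution moves downstream. The term lists, the `Attribution` fields, and the default threshold are illustrative assumptions, not values from the investigation.

```python
from dataclasses import dataclass

# Illustrative lists only; a real deployment would curate and extend these.
GENERIC_TERMS = {"nucleus", "core", "platform", "internal", "demo"}
PLATFORM_VENDORS = {"microsoft", "github", "gitlab", "amazon", "google"}


@dataclass
class Attribution:
    org: str
    evidence: str      # the literal string the model relied on
    signal_type: str   # e.g. "repo_name", "ci_platform", "jwt_issuer"
    confidence: float


def review(attr: Attribution, threshold: float = 0.7) -> list[str]:
    """Return reasons to distrust the attribution; empty means it passes."""
    reasons = []
    if attr.confidence < threshold:
        reasons.append("below confidence threshold")
    if attr.evidence.lower() in GENERIC_TERMS:
        reasons.append("evidence is a generic term")
    if attr.signal_type == "ci_platform" and attr.org.lower() in PLATFORM_VENDORS:
        reasons.append("platform usage attributed to the platform vendor")
    return reasons


generic = review(Attribution("SomeCo", "nucleus", "repo_name", 0.9))
conflated = review(Attribution("Microsoft", "dev.azure.com", "ci_platform", 0.95))
clean = review(Attribution("Acme", "acme.example", "jwt_issuer", 0.8))
```

A gate like this does not replace the skeptical second agent or human review; it removes the cheapest false positives first so that review time goes to the ambiguous cases.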

Human-in-the-Loop as a Signal Discovery Mechanism

Human-in-the-loop is not just a validation gate — it is also a signal discovery mechanism. McCarthy found patterns in the data that the AI was not pulling out: short terms and domain-specific identifiers that required human contextual knowledge to recognize as meaningful. The AI will work within the abstractions and signals you give it. It will not volunteer domain knowledge you have not encoded.

The correct response is not to abandon AI but to build a tight feedback loop between human observation and AI execution. When a human identifies a new signal the AI missed, the next step is not to manually process every instance — it is to feed that signal back into the AI for a new analysis cycle. McCarthy describes noticing Azure DevOps organization slugs in the data and recognizing, from his own prior research, that these slugs could be traversed through poorly documented APIs to resolve tenant IDs, which then resolve to actual domains. This was not something the AI surfaced. Once identified by the human, it became one of 69 attribution methods in the agentic engine — a six-stage enrichment pipeline traversing OpenID lookups and API calls to distill the most likely match.

Just-in-Time Throwaway Tools for Human-Guided Analysis

A concrete pattern for enabling effective human-in-the-loop analysis is the just-in-time throwaway tool. Instead of manually skimming through gigabytes of flat files looking for new signals, build a disposable data browsing tool on the spot. The cost is minutes of prompting; the payoff is a clean interface for human review that feeds discoveries directly back to the LLM for scale processing.

McCarthy describes building exactly this kind of tool during the attribution phase: a throwaway utility that let him skim data, identify attributes he suspected had signal, and then feed those attributes to the language model for iterative analysis cycles. The tool was not engineered for reuse — it was engineered to get him to a decision point faster than manual review, then hand off to the AI.

This pattern — build throwaway tool → human skims → human identifies signal → LLM scales it — is a direct counter to the limitation that AI will not question the abstractions you give it. The human changes the abstraction; the AI processes the new abstraction at scale.
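A throwaway tool of this kind can be very small. The sketch below, which is an illustration and not the utility McCarthy built, counts how many dumps mention each environment variable name and shows a few example values for a chosen key, so a human can skim for unfamiliar variables worth handing to the model. It takes in-memory strings to stay self-contained; reading from files would be a one-line change.

```python
import re
from collections import Counter

ENV_LINE = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*)=(.*)$")


def key_frequencies(dumps: list[str]) -> Counter:
    """Count how many dumps mention each environment variable name."""
    counts: Counter = Counter()
    for dump in dumps:
        keys = set()
        for line in dump.splitlines():
            m = ENV_LINE.match(line.strip())
            if m:
                keys.add(m.group(1))
        counts.update(keys)  # each dump contributes at most once per key
    return counts


def examples_for(dumps: list[str], key: str, limit: int = 3) -> list[str]:
    """Collect up to `limit` distinct values seen for one variable name."""
    seen: list[str] = []
    for dump in dumps:
        for line in dump.splitlines():
            m = ENV_LINE.match(line.strip())
            if m and m.group(1) == key and m.group(2) not in seen:
                seen.append(m.group(2))
                if len(seen) == limit:
                    return seen
    return seen


dumps = [
    "CI=true\nGIT_HOST=git.a.example\n",
    "CI=true\nHOME=/root\n",
    "CI=true\nGIT_HOST=git.b.example\n",
]
freq = key_frequencies(dumps)
values = examples_for(dumps, "GIT_HOST")
```

Variables that appear in a moderate share of dumps and carry hostnames or URLs are natural candidates to feed back into the next analysis cycle.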

Manual Confirmation as the Final Gate

For the most consequential outputs — victim notifications, public disclosures, Fortune 100 breach confirmations — AI attribution is not the final word. McCarthy explicitly notes that while the agentic analysis found over 2,400 impacted companies, the headline claim of 37 Fortune 100 companies was based on manually confirmed cases. The LLM attribution engine surfaces candidates and prioritizes review; human validation closes the loop before any claim is made.

This is the appropriate architecture for security operations where errors have real-world consequences. AI provides coverage and speed. Human review provides accountability and accuracy at the threshold that matters most.

Actionable Takeaways

Encode skepticism as a workflow step, not an afterthought: after any AI attribution or triage run, prompt the model to enumerate its weakest conclusions and alternative explanations before those outputs flow downstream into notifications or reports.
Build just-in-time throwaway tools to enable effective human-in-the-loop signal discovery — when you notice a pattern the AI missed, create a minimal browsing utility in minutes, identify the signal manually, then feed it back to the LLM to process at scale across the full dataset.
Reserve manual confirmation for the highest-stakes outputs (victim notifications, public breach claims): use AI attribution to generate a prioritized candidate list, but close the loop with human verification before any consequential action is taken.

Common Pitfalls

Accepting AI attribution outputs at face value without a validation pass: LLMs will make confident, plausible connections from weak signals — a common platform name, a single cloud provider artifact — that produce false positives in victim identification. The baseline alignment of most models is too credulous for security-critical attribution workflows without explicit skepticism injection.
Relying solely on AI to surface new signals in unfamiliar data: AI will not question the abstractions you provide or volunteer domain knowledge you have not encoded. Patterns that require specialized contextual knowledge — like knowing that a specific API slug can be traversed to resolve a tenant identity — will go unidentified unless a human is actively reviewing the data and feeding discoveries back into the pipeline.

Conclusion

The Shai-Hulud and Singularity investigations make a concrete case for what AI-assisted security workflows actually look like in practice: not AI as a magic oracle, but AI as a pattern discovery engine embedded inside a disciplined engineering process. The productivity gains are real (roughly 200 victim attributions from two weeks of manual analysis versus over 2,400 from the agentic engine built in about two days), but they depend entirely on applying the right architecture: RPI loops before code generation, feedback loops from probabilistic AI output to deterministic scripts, and explicit skepticism injection before any result of consequence leaves the pipeline.

The three structural patterns from this investigation transfer directly to other security engineering domains. Feedback loops that convert AI signal discovery into deterministic rules apply to detection engineering, configuration auditing, and threat intelligence enrichment. The determinism-first principle — use AI for discovery, deterministic code for execution — applies anywhere you need scalable, auditable, cost-efficient analysis. And the credulity problem is universal: any workflow that surfaces AI-generated conclusions without a skepticism pass is one false positive away from a notification to the wrong company.

For engineers building supply chain security response capabilities, the highest-leverage investment is not picking the right LLM — it is building harnesses that encode structured workflow discipline so the right patterns get applied whether or not the analyst remembers to apply them. Plan Mode, composable utilities, and throwaway tools are all implementations of that same principle. McCarthy’s talk is a case study in what it looks like when that engineering investment pays off.

For related techniques on agentic AI in security contexts, see how other researchers have extended reasoning model patterns for offensive and defensive security tooling. On secret scanning rule portability, the broader point stands: detection content is now fungible across engines, and that is reshaping how teams approach toolchain selection.

References & Tools

TruffleHog: Open-source secret scanning tool with ~800 detectors, used as the source ruleset for the portability demonstration.
Claude (Anthropic): AI assistant used throughout the investigation for code generation, data analysis, signal extraction, and building the agentic attribution engine.
GitHub CLI: Command-line interface for GitHub, used in the naive scraping example to pull Singularity attack repositories by name.
Superpowers (repository): A seven-step structured AI code development process that encodes brainstorming as stage one, referenced as an example of a harness that bakes RPI-style discipline into the workflow.
Gemini (Google): Reasoning model used for bulk victim attribution; fed a merged string of 30,000 repository names and prompted to identify major companies from the dataset.
