![Daniel Miessler presenting talk - Daniel Miessler - Anatomy of an Agentic Personal AI Infrastructure | [un]prompted 2026 at unprompted 2026](https://thecyberarchive.com/assets/teasers/personal-ai-infrastructure-for-security-engineers-daniel-miessler.webp)
What happens when every company becomes an API and your competitors — whether McKinsey consultants or rival enterprises — have already converted their institutional knowledge into automated, AI-driven processes? Personal AI infrastructure for security engineers is no longer a productivity experiment; it is a defensive posture. The practitioners who define their own systems first will set the terms for everyone else.
This post unpacks Daniel Miessler’s agentic PAI (Personal AI Infrastructure) stack presented at [un]prompted 2026 — the Council multi-agent debate system, the iterative depth research technique, the PAI algorithm’s scientific-method loop, and the Arbo modular pipeline framework — giving security engineers a concrete blueprint for building AI systems that amplify rather than replace their expertise.
Key Takeaways
- You'll learn how to architect a unified personal AI infrastructure that centralizes all your tools, context, and skills so every capability you build compounds — nothing is built twice.
- You'll be able to implement multi-agent debate (Council) and iterative-depth research techniques to produce dramatically higher-quality outputs than single-shot LLM prompting.
- You'll be able to apply the PAI algorithm's scientific-method loop, with ideal-state criteria serving as both generation target and verification gate, to any open-ended task (not just code) to get more reliable results from agentic systems.
The Agentic Future: Companies as APIs and the End of Manual Interfaces
AI security and the tooling models around it are shifting faster than most practitioners realize. Daniel Miessler opens with a deceptively simple observation: when Excalidraw[1] shipped AI-assisted diagram generation, the team described the feature as “log in, go to the tool, type what you want, and the diagram appears.” For most users that sounds convenient. For practitioners operating agentic workflows, it is a dead end: a tool that cannot be reached programmatically by an agent is, from Miessler’s perspective, effectively non-existent.
Companies as APIs: What It Actually Means
The thesis Miessler advances is that companies must expose their capabilities as machine-readable APIs or risk becoming invisible to the agents that increasingly mediate human interaction with the world. The corollary is that the consumer-facing GUI is becoming a secondary interface — a fallback for humans who have not yet delegated that task to an agent.
He extrapolates this to a near-term future that resembles a ratings marketplace for software services — analogous to IMDb or Rotten Tomatoes — where an agent, given a task, queries a capability registry, evaluates competing services across multiple dimensions (reliability, cost, latency, security posture), and selects the best one autonomously. The human never opens a browser tab.
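The following is a minimal sketch of the selection step an agent might run against such a registry. It assumes each service publishes scores normalised to 0..1 for the four dimensions above. The `Service` fields, the weights, and the service names are all hypothetical, and no real capability registry or API is implied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    """Hypothetical registry entry; every score is normalised to 0..1."""
    name: str
    reliability: float       # higher is better
    cost: float              # higher means more expensive
    latency: float           # higher means slower
    security_posture: float  # higher is better


# Hypothetical weights an agent might derive from the task at hand.
WEIGHTS = {"reliability": 0.35, "cost": 0.2, "latency": 0.15, "security_posture": 0.3}


def score(s: Service) -> float:
    """Weighted score: benefits add directly, costs (price, latency) add as 1 - x."""
    return (WEIGHTS["reliability"] * s.reliability
            + WEIGHTS["cost"] * (1 - s.cost)
            + WEIGHTS["latency"] * (1 - s.latency)
            + WEIGHTS["security_posture"] * s.security_posture)


def select(candidates: list[Service]) -> Service:
    """Pick the highest-scoring service with no human in the loop."""
    return max(candidates, key=score)


registry = [
    Service("diagram-api-a", reliability=0.9, cost=0.6, latency=0.3, security_posture=0.8),
    Service("diagram-api-b", reliability=0.7, cost=0.2, latency=0.2, security_posture=0.5),
]
print(select(registry).name)
```

A security team would express its own priorities by changing `WEIGHTS`, for example by raising the weight on `security_posture`.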
For security engineers, the implications are immediate:
- Attack surface shifts. When agents mediate service consumption, the attack surface migrates from the UI layer to the API and agent-to-agent communication layer. API injection, credential theft, and prompt injection all become higher-value attack paths.
- Tooling assumptions break. Security testing workflows built around manual UI interaction — browser-based scanners, point-and-click assessment tools — face obsolescence. The target environment may be entirely agent-operated, with no human-navigable interface at all.
- Custom software proliferates. In agentic environments, assessors may encounter completely custom software stacks: code generated on demand for a specific task, never reused, never publicly documented. Signature-based detection offers little against software that has never been seen before, because there is no known sample to derive a signature from.
The Personalized Reality Problem
The second major consequence Miessler identifies is epistemic: when every person’s agent filters their information environment, individuals begin experiencing divergent realities. Your agent surfaces news, research, and signals tuned to your context and goals. Your neighbour’s agent does the same — differently. The shared information substrate that makes coordination and shared situational awareness possible begins to fragment.
For security teams, this has a practical dimension. Threat intelligence, industry alerts, and vulnerability disclosures will increasingly arrive pre-filtered through each practitioner’s personal AI stack. Teams that have not intentionally aligned their agents’ information sources risk operating from incompatible threat pictures — a coordination problem that adversaries can exploit.
From Human Processes to Algorithmic Graphs
Drawing on years of consulting across industries, Miessler observes that most companies run on informal human processes — recommendations that function as suggestions rather than enforced rules, with humans making ad-hoc decisions at every step. A small class of highly regulated industries (large banks, energy companies, defence contractors) operate differently: they run on formal process graphs where the only question after an incident is “did we follow the process?”
His prediction is that AI will push all companies toward the latter model. CEOs and CFOs will demand that institutional knowledge be encoded as explicit, machine-readable SOPs — because that is the only form AI can act on reliably. A company whose processes live in employees’ heads cannot be automated, audited by an AI system, or defended systematically.
The security implication is significant. Miessler points to the long-standing ambition to shift left — embedding security earlier in the development lifecycle. He argues this goal becomes achievable when AI is already writing the code: if the AI writes all the code, and the AI can be given a secure-by-default SOP to write against, then security is not bolted on after the fact. It is the default output of a correctly specified process.
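Here is a minimal sketch that makes “did we follow the process?” concrete. It assumes an SOP is encoded as an ordered list of required steps and that an incident review replays the execution log against it. The step names, including the secure-by-default review step, and the `followed_process` helper are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SOP:
    """Hypothetical machine-readable SOP: required steps in a fixed order."""
    name: str
    steps: list[str] = field(default_factory=list)


def followed_process(sop: SOP, log: list[str]) -> tuple[bool, list[str]]:
    """Replay an execution log against the SOP.

    Returns (ok, missing). ok is True only if every SOP step appears in the
    log in the prescribed order. missing lists steps not found in order.
    """
    missing = []
    position = 0
    for step in sop.steps:
        try:
            # Search only after the previous matched step, so order is enforced.
            position = log.index(step, position) + 1
        except ValueError:
            missing.append(step)
    return (not missing, missing)


# Hypothetical secure-by-default SOP for AI-written code changes.
sop = SOP("code-change", ["threat_model", "secure_code_review", "sast_scan", "deploy"])
print(followed_process(sop, ["threat_model", "sast_scan", "deploy"]))
```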
What This Means for the Practitioner Right Now
Miessler frames the shift explicitly as a competitive race: McKinsey-style consultancies and large enterprises are already converting institutional knowledge into automated processes. Security practitioners who wait will find their expertise commoditised and their workflows replicated by others.
The core move is straightforward: make your knowledge explicit. Any workflow, assessment methodology, or repeatable task that lives only in your head is, in Miessler’s framing, an attack surface. Encode it, automate it, and own it — before someone else does.
Actionable Takeaways
- Audit your current security tooling for API accessibility. Any tool that requires manual UI interaction and exposes no programmatic interface should be flagged for replacement or supplementation — in agentic workflows, tools without APIs are effectively invisible.
- When planning security assessments for agentic environments, build for the assumption that no human-navigable interface exists. The target stack may be entirely custom, agent-generated software — prepare assessment methodologies that do not rely on signature databases or prior documentation of the target system.
- Begin encoding your repeatable security workflows as explicit, machine-readable SOPs now. The shift from informal human process to formal algorithmic process is already underway in well-resourced organisations; practitioners who document and systematise their expertise first retain control over how that expertise is applied and automated.
Common Pitfalls
- Treating AI-assisted tooling as a UI enhancement rather than an API-first architectural requirement. Adding a chat interface to an existing tool does not make it agentic-ready. If the capability cannot be invoked programmatically by an external agent, it remains a manual tool — and Miessler's thesis is that manual tools will be bypassed entirely by agents that can find equivalent capability through an API-accessible alternative.
- Assuming that current threat intelligence and situational awareness processes will survive the information-filtering shift intact. When every practitioner's agent curates their information diet differently, teams that do not explicitly align their intelligence sources will lose the shared threat picture that makes coordinated response possible.
Designing a Unified Personal AI Infrastructure Stack
The foundation of personal AI infrastructure for security engineers is a deceptively simple architectural decision: put the human — not the AI, not the code — at the center of the system. Miessler is explicit that his stack is “Claude Code-based but code is not the center of it. It’s more so the magnification of the human.” Whether that human is a security researcher, an entrepreneur, or an artist, the system’s purpose is to amplify what they are already capable of — not to replace them.
This distinction matters practically. A stack built around a tool is brittle: it breaks whenever the tool changes, and every new capability you discover has to be manually wired into your workflow. A stack built around a person persists and compounds because the human’s goals, context, and preferences remain the stable core that all capabilities serve.
One Unified Harness, Not a Collection of Prompts
The second architectural principle is unification. Miessler describes deliberately building “one single unified AI system” and bringing everything into it:
“I like to have one single unified AI system and anything I do inside of AI I bring into it. If I learn something from some other AI, I bring that in as a module.”
The practical implication: when you operate a unified harness, every new capability you build is immediately available to every other part of the system. The PAI algorithm, for example, “instantly knows how to use anything custom that I built.” This is a compounding advantage — each skill added multiplies the value of all existing skills because the orchestrating agent has full visibility into the available toolkit.
Contrast this with the fragmented alternative: a Slack integration here, a custom GPT there, a shell script that pipes to a Python script that calls an API. In that model, every capability lives in isolation. Reusing it requires remembering it exists, finding it, and manually connecting it. Nothing compounds.
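The difference is easiest to see in code. The sketch below uses a single in-process registry as a stand-in for the harness. Every skill registers into one table that the orchestrator can enumerate, and a new skill can call existing skills by name. The `skill` decorator and the skill names are hypothetical, not PAI's actual mechanism.

```python
from typing import Callable

# Hypothetical unified registry: anything registered here is visible to the orchestrator.
SKILLS: dict[str, Callable[[str], str]] = {}


def skill(name: str):
    """Decorator that adds a function to the unified registry (hypothetical)."""
    def register(fn: Callable[[str], str]) -> Callable[[str], str]:
        SKILLS[name] = fn
        return fn
    return register


@skill("recon")
def recon(target: str) -> str:
    return f"recon report for {target}"


@skill("threat_model")
def threat_model(target: str) -> str:
    # Reuses an existing skill instead of re-implementing it: the compounding effect.
    return f"threat model built on [{SKILLS['recon'](target)}]"


def available_tools() -> list[str]:
    """What the orchestrating agent would see when planning a task."""
    return sorted(SKILLS)


print(available_tools())
print(SKILLS["threat_model"]("api.example.com"))
```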
Built on Markdown and Claude Code
The implementation choice is deliberate: PAI is built entirely on AI code generation tooling — specifically Claude Code[2] — using Markdown for skills, agents, and context. This is not just a tooling preference — it is a strategic decision with two benefits:
- Portability and enterprise compatibility. As Miessler notes: “this is built on top of Claude Code, right? Guess what it is? It’s Markdown just like Claude Code, which means everyone’s allowed to use [Claude Code] at work anyway. So you just can just like put this on top and you’re good to go.” If your organization permits Claude Code (and increasingly they do), your personal AI infrastructure runs inside that approved boundary without any additional policy exceptions.
- Determinism at the edges. Using an established execution environment for all skill invocation means the probabilistic reasoning happens inside a predictable shell. The agent reasons; the harness executes. This is the same separation of concerns that makes agentic systems reliable at scale.
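Here is a minimal sketch of that separation. It assumes the model emits its proposed action as a JSON object that names a skill and its arguments. The JSON shape and the two skills are hypothetical and do not reflect Claude Code's actual tool-call format. The point is that the harness, not the model, decides what actually runs.

```python
import json
from typing import Callable

# Deterministic side: a fixed, hypothetical table of skills the harness will run.
REGISTRY: dict[str, Callable[..., str]] = {
    "whois": lambda domain: f"whois lookup queued for {domain}",
    "truncate": lambda text, limit=40: text[:limit],
}


def execute(model_output: str) -> str:
    """Validate a model-proposed call and run it; reject anything off-registry.

    The model (probabilistic) only proposes; this function (deterministic)
    decides whether and how the call runs.
    """
    try:
        call = json.loads(model_output)
    except json.JSONDecodeError:
        return "rejected: output is not valid JSON"
    if not isinstance(call, dict):
        return "rejected: expected a JSON object"
    name, args = call.get("skill"), call.get("args", {})
    if not isinstance(name, str) or name not in REGISTRY:
        return f"rejected: unknown skill {name!r}"
    if not isinstance(args, dict):
        return "rejected: args must be an object"
    try:
        return REGISTRY[name](**args)
    except TypeError as exc:  # wrong or missing argument names
        return f"rejected: bad arguments ({exc})"


print(execute('{"skill": "whois", "args": {"domain": "example.com"}}'))
print(execute('{"skill": "rm_rf", "args": {}}'))
```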
The Compounding Returns Argument
The strongest case for a unified personal AI infrastructure is economic, not technical. Miessler frames it directly:
“When you have a unified AI system like this, you can basically take everything you’re doing and put it inside the system. You put your context in here, you put all your tools… What that means is you end up only doing anything once and then you incorporate it into your harness and then it’s just part of your system.”
For agentic workflow orchestration, this translates to a concrete rule: build once, deploy everywhere. A recon pipeline you write for one engagement is immediately reusable for all future engagements. A threat-modeling prompt template you refine becomes a permanent, context-aware skill that the system can invoke on any future task without you having to remember it exists.
The compounding dynamic also applies to context. Because everything flows through one harness, the agent accumulates a rich, persistent understanding of who you are, what you care about, and how you work. Miessler illustrates this with the PAI algorithm: “it knows everything about me. So I could give a single sentence and it can say well he probably meant this because of all this context.” That richness is only possible in a unified system — it cannot be replicated by a loose collection of one-off tools.
The Strategic Risk of Staying Fragmented
Miessler closes his talk with an explicit threat model for practitioners who do not build a unified system:
“You do not want to have anything that you care about be like an amorphous sort of blob because I feel like that is an attack point for somebody in AI, somebody [like] McKinsey, a giant team of smiling 22-year-olds to come and attack that thing and extract everything out of it and basically turn that into a process which basically outsources you.”
The implication for security automation is direct: the consultancies and enterprises that systematize your domain knowledge before you do will define the workflows, set the quality bar, and capture the leverage. The unified PAI approach is not just a productivity tool — it is a defensive posture against institutional displacement.
PAI Upgrade Skill: Automated AI Stack Currency Monitoring
Proof of Concept
- Define source targets. The skill is configured to scrape engineering blog posts from Anthropic (primary) and OpenAI, plus GitHub repository release notes for both providers. Miessler notes the focus is “mostly Anthropic” given his Claude Code-based stack.
- Timestamp-gated fetch. Each run records when it last checked. On the next execution the skill retrieves only content published after that timestamp — covering all new releases since the previous check without re-processing already-seen content.
- Content consolidation. Raw blog posts, release announcements, and GitHub release notes are aggregated into a unified change set for the current run window.
- Stack introspection. The skill reads the user’s existing PAI system — all current skills, agents, goals, and custom tools that have been built and incorporated into the unified harness — to build a representation of what is already in use.
- Gap matching. The skill compares the consolidated upstream changes against the current stack state and identifies which new features, APIs, or model capabilities are most relevant to the user’s specific configuration and goals.
- Recommendation output. The skill returns a structured list of actionable upgrade suggestions — for example, “update this skill to use the new API parameter,” “add this agent capability now available in the latest release,” or “replace this workaround now that the provider ships native support.” The output is scoped to what the system judges best for the user’s particular setup, not a generic changelog summary.
Note: Miessler demonstrates the output format visually in the talk but does not walk through the full implementation code or the exact prompt/agent wiring that drives the skill. The conceptual logic and input/output contract are clear; the internal agent orchestration details are not fully disclosed in this presentation.
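The implementation was not shown in the talk. What follows is only a minimal sketch of the two deterministic parts of the stated contract: timestamp-gated filtering and gap matching against the current stack. The `Item` shape, the skill names, and the tag sets are hypothetical. Keyword overlap stands in for the relevance judgement that the real skill would presumably delegate to a model.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Item:
    """One upstream change: a blog post, release announcement, or release note."""
    source: str
    published: datetime
    title: str
    keywords: frozenset[str]


def new_since(items: list[Item], last_checked: datetime) -> list[Item]:
    """Timestamp gate: keep only content published after the previous run."""
    return [i for i in items if i.published > last_checked]


def recommend(changes: list[Item], stack: dict[str, set[str]]) -> list[str]:
    """Gap matching: pair each change with the skills whose tags it touches.

    `stack` maps a (hypothetical) skill name to tags describing what it depends on.
    """
    out = []
    for change in changes:
        for skill_name, tags in sorted(stack.items()):
            if change.keywords & tags:
                out.append(f"review {skill_name}: {change.title} ({change.source})")
    return out


last_run = datetime(2026, 1, 1, tzinfo=timezone.utc)
items = [
    Item("anthropic-blog", datetime(2025, 12, 20, tzinfo=timezone.utc), "old post", frozenset({"agents"})),
    Item("github-release", datetime(2026, 1, 5, tzinfo=timezone.utc), "new hooks API", frozenset({"hooks"})),
]
stack = {"council": {"agents", "subagents"}, "upgrade_monitor": {"hooks", "releases"}}
for line in recommend(new_since(items, last_run), stack):
    print(line)
# After a successful run, persist datetime.now(timezone.utc) as the new last_run.
```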
Actionable Takeaways
- Audit your current AI usage and identify all one-off scripts, custom prompts, and ad-hoc tools you have built. Migrate them into a single unified harness (e.g., PAI on Claude Code) so the orchestrating agent has full visibility into your entire toolkit and can invoke any capability on any task without manual wiring.
- Define your own context explicitly as a first-class input to your AI system — your goals, domain expertise, preferred working style, and recurring task types. A unified system with rich personal context produces dramatically better outputs because the agent can resolve ambiguous requests against what it knows about you rather than making generic assumptions.
- Treat every new capability you build as a permanent module, not a throwaway script. Write it to be reusable, document its interface in Markdown, and add it to your harness immediately. Each addition multiplies the value of all existing skills through the compounding dynamic Miessler describes.
Common Pitfalls
- Maintaining a fragmented collection of tools and prompts across multiple platforms (separate GPTs, standalone scripts, ad-hoc API calls) instead of a unified harness. This eliminates the compounding advantage: the agent cannot reuse or compose tools it does not know exist, and every new capability requires manual integration work rather than being immediately available system-wide.
- Centering the system around a specific AI tool or model rather than around the human and their goals. Tool-centric architectures break whenever the tool changes (new model, new API, new policy) and do not accumulate context over time. Human-centric architectures persist because the person's goals and context are the stable core that all tooling serves.
Multi-Agent Debate and Advanced Prompting Techniques
The leap from single-shot prompting to multi-agent systems is one of the most consequential upgrades a security engineer can make to their personal AI infrastructure. Instead of asking one model to reason through a complex problem in isolation, you provision a committee of expert agents — each with a distinct perspective — and let them argue. The result is qualitatively different from anything a single prompt can produce.
Council: Spinning Up a Multi-Agent Debate System
Council is the centerpiece of this capability in Miessler’s PAI stack. The mechanic is straightforward but powerful: given a task, the system dynamically spins up between 2 and 16 custom expert agents, each configured as a specialist in the relevant domain. Those agents then debate — aggressively — about the correct approach.
The flow works like this:
- Task intake. You provide a problem statement (e.g., “should this service use test-driven development or spec-driven development?”). The parent agent determines the appropriate expert profiles and the number of debating agents to provision.
- Parallel argumentation. Each agent forms its own position and constructs its best argument in favor of it. Agents are not trying to reach consensus — they are trying to win the argument.
- Multi-round debate. A configurable parameter sets how many rounds the agents debate back and forth, each round pushing each agent to respond to the prior arguments and refine or defend its position.
- Parent agent arbitration. A supervising parent agent watches the entire exchange, synthesizes the best arguments, and either makes a final recommendation directly or surfaces the distilled recommendation to the human in the loop for a final decision. In Miessler’s current implementation, the recommendation goes to the human, who chooses the direction.
For security engineers, this pattern is directly applicable to threat modeling sessions, where competing risk perspectives (attacker, defender, compliance officer, engineer) rarely surface organically in a single LLM call. Council formalizes that adversarial lens. It is equally powerful for architecture review — spinning up agents representing cryptography, network security, IAM, and supply chain risk, then letting them debate a proposed design before a single line of code is written.
Council Multi-Agent Debate: Parallel Expert Agents Arguing System Design
Proof of Concept
- Define the problem task. The user submits a system-design or architecture question to the PAI harness, for example, “Should we adopt test-driven development or spec-driven development for this service?” The task description is passed to the Council skill as the debate topic.
- Dynamic agent instantiation. Based on the task, the Council skill spins up between 2 and 16 custom expert agents. Each agent is configured with a distinct expert persona relevant to the domain (e.g., a TDD advocate, a spec-driven development advocate, a security architect, a performance engineer). The exact number of agents is a tunable parameter set by the user at invocation time.
- Parallel expert argumentation, round 1. All instantiated agents independently generate their strongest argument for their assigned position. Each agent produces a structured opinion: the approach it advocates, the supporting rationale, and the specific trade-offs it highlights.
- Multi-round debate exchange. The agents engage in back-and-forth debate for a configurable number of rounds (the `rounds` parameter). In each round, agents read the previous positions of opposing agents and sharpen or rebut their arguments. This iterative adversarial exchange forces each agent to address weaknesses in its own position and respond to the strongest counter-arguments from peers.
- Parent agent observation and arbitration. Throughout all rounds, a parent orchestrator agent watches the full debate log without participating. It observes which arguments hold up under scrutiny, which positions collapse, and where consensus or irreconcilable divergence emerges. The parent agent applies meta-level reasoning across the entire debate to identify the most defensible direction.
- Recommendation output. After the final debate round, the parent agent generates a structured recommendation: the preferred direction, the primary reasons it won the debate, and the key trade-offs acknowledged. In Miessler’s current implementation, this recommendation is surfaced to the human operator, who makes the final selection.
- Human-in-the-loop final decision. The human reviews the parent agent’s recommendation alongside a summary of the debate output and selects the direction to proceed. This preserves human oversight while offloading the exhaustive multi-perspective analysis to the multi-agent system.
Note: Miessler demonstrated Council’s output format and described its mechanics, but did not expose the full prompt templates used to instantiate each expert agent, the exact schema for debate round messages, or the parent agent’s arbitration logic. A complete implementation would require engineering those components from the described behavior.
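Because the prompts and arbitration logic were not disclosed, the code below is only a structural sketch of the flow described above. `ask` is a hypothetical stand-in for whatever model call you use, and the `echo` stub lets the control flow run offline. The persona names, the prompt wording, and the log format are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable

Ask = Callable[[str], str]  # hypothetical model call: prompt in, text out


@dataclass
class Debate:
    topic: str
    personas: list[str]
    rounds: int = 2
    log: list[tuple[int, str, str]] = field(default_factory=list)  # (round, persona, text)


def run_council(d: Debate, ask: Ask) -> str:
    if not 2 <= len(d.personas) <= 16:
        raise ValueError("Council uses between 2 and 16 expert agents")
    # Round 1: every expert argues its own position independently.
    for p in d.personas:
        d.log.append((1, p, ask(f"As {p}, argue your best position on: {d.topic}")))
    # Later rounds: each expert reads its rivals' previous-round arguments and rebuts them.
    for r in range(2, d.rounds + 1):
        previous = [(p, t) for (rr, p, t) in d.log if rr == r - 1]
        for p in d.personas:
            rivals = "\n".join(f"{q}: {t}" for q, t in previous if q != p)
            d.log.append((r, p, ask(f"As {p}, rebut these arguments:\n{rivals}")))
    # The parent agent never debates; it reads the full log once the rounds end.
    transcript = "\n".join(f"[round {r}] {p}: {t}" for r, p, t in d.log)
    return ask(f"Recommend a direction, its reasons, and trade-offs:\n{transcript}")


def echo(prompt: str) -> str:
    """Offline stub so the control flow can be exercised without a model."""
    return prompt.splitlines()[0][:60]


d = Debate("TDD vs spec-driven development",
           ["TDD advocate", "spec-driven advocate", "security architect"], rounds=3)
rec = run_council(d, echo)
print(len(d.log), rec)
```

In keeping with Miessler's current setup, `rec` is handed to a human who chooses the direction. Nothing in the sketch acts on it automatically.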
Iterative Depth: The Same Question From Multiple Angles
Iterative depth is drawn from a published research paper, and the mechanism is deceptively simple: pose the same core question repeatedly, each time from a different perspective or with slightly different framing. The AI encounters the same conceptual surface area multiple times but from distinct vantage points, producing far more thorough coverage of the problem space than any single prompt achieves.
Miessler reports “really good results” with this technique. The key insight is that large language models are sensitive to framing — the same underlying question, asked differently, activates different reasoning pathways. By deliberately cycling through those framings, you systematically surface information and inferences that a single-pass prompt would miss.
For security engineers conducting red-team hypothesis generation, this is especially valuable. Asking “what are the highest-risk attack vectors for this system?” from the perspective of a network attacker, then an insider threat, then a supply chain adversary, then a social engineer, produces a richer threat landscape than any single framing. Each pass illuminates blind spots left by the previous one.
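Here is a minimal sketch of iterative depth as described in this section. It assumes a generic `ask(prompt) -> str` model call, which is hypothetical; the stub below lets the code run offline. The perspective list mirrors the paragraph above. Feeding earlier findings into later passes is a design choice of this sketch, not a documented detail of the paper.

```python
from typing import Callable, Sequence

# Hypothetical perspective list, taken from the red-team example above.
PERSPECTIVES = ("network attacker", "insider threat", "supply chain adversary", "social engineer")


def iterative_depth(question: str, system: str, ask: Callable[[str], str],
                    perspectives: Sequence[str] = PERSPECTIVES) -> list[str]:
    """Ask the same core question once per perspective.

    Each pass also sees earlier findings so it can extend or challenge them
    rather than repeat them.
    """
    findings: list[str] = []
    for view in perspectives:
        prior = "\n".join(f"- {f}" for f in findings) or "- none yet"
        prompt = (f"Take the perspective of the {view}. {question}\n"
                  f"System: {system}\n"
                  f"Earlier findings (extend or challenge, do not repeat):\n{prior}")
        findings.append(f"{view}: {ask(prompt)}")
    return findings


prompts: list[str] = []


def stub(prompt: str) -> str:
    """Offline stand-in for a model call that records every prompt it receives."""
    prompts.append(prompt)
    return f"answer {len(prompts)}"


out = iterative_depth("What are the highest-risk attack vectors?", "internal payroll API", stub)
print(out)
```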
First-Principles Reverse Engineering
The first-principles skill in PAI approaches troubleshooting and analysis from the opposite direction of most debugging workflows. Rather than diagnosing a specific suspected cause, it reverse-engineers the root cause space — identifying every plausible root cause that could produce the observed symptom, before any one cause is investigated.
This is particularly relevant for incident response and post-mortems in security contexts. When a detection fires, the instinct is to follow the most obvious path. First-principles analysis forces a broader enumeration first, reducing the risk of anchoring on the wrong cause and missing the actual attack vector.
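The sketch below captures the enumerate-first discipline. It assumes candidate causes come from a catalog; in PAI they would presumably be generated by the model from the symptom and context. The symptom, the causes, and the `check` callback are hypothetical. The property the sketch illustrates is structural: the full hypothesis list exists before any single cause is tested, and testing does not stop at the first hit.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Hypothesis:
    cause: str
    evidence: str         # what would confirm or rule this cause out
    status: str = "open"  # open | ruled_out | confirmed


def enumerate_causes(symptom: str, catalog: dict[str, list[tuple[str, str]]]) -> list[Hypothesis]:
    """Breadth first: list every plausible root cause before investigating any."""
    return [Hypothesis(cause, evidence) for cause, evidence in catalog.get(symptom, [])]


def investigate(hyps: list[Hypothesis], check: Callable[[Hypothesis], bool]) -> list[str]:
    """Test every hypothesis rather than stopping at the first plausible one."""
    for h in hyps:
        h.status = "confirmed" if check(h) else "ruled_out"
    return [h.cause for h in hyps if h.status == "confirmed"]


# Hypothetical catalog for one detection.
catalog = {
    "impossible-travel login alert": [
        ("VPN egress change", "compare source IP to VPN egress ranges"),
        ("stolen session token", "look for token reuse from two ASNs"),
        ("credential stuffing success", "check for a failed-login burst before success"),
        ("log pipeline clock skew", "compare event time to ingest time"),
    ]
}
hyps = enumerate_causes("impossible-travel login alert", catalog)
confirmed = investigate(hyps, lambda h: h.cause in {"stolen session token", "log pipeline clock skew"})
print(confirmed)
```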
Creative Divergence: Manipulating Output Probability for Novel Results
The creative divergence technique originates from a published paper (credited to Zang and collaborators in the transcript). Where standard approaches to increasing LLM creativity rely on raising temperature — a blunt instrument that introduces noise as much as novelty — creative divergence works by manipulating the output probability distribution more precisely. The result is outputs that are “wildly more creative” than temperature tuning alone produces, without the degradation in coherence that comes with high-temperature sampling.
For security engineers, this technique is most useful in offensive security brainstorming — generating novel attack hypotheses, crafting unconventional phishing narratives, or producing creative bypass attempts for a detection rule under review. When the goal is to think outside familiar patterns, creative divergence provides a mechanism for systematically escaping the model’s default probability mass.
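The talk does not spell out the paper's mechanism, so the sketch below only illustrates the contrast this section draws, using a made-up four-option distribution. Temperature scaling reshapes every probability at once, including the incoherent tail. A targeted edit can instead remove the most typical options while keeping a coherence floor. `tail_window` is an illustrative assumption, not the creative divergence technique itself.

```python
def with_temperature(probs: dict[str, float], t: float) -> dict[str, float]:
    """Temperature scaling: p ** (1/t), renormalised. Every option moves at once."""
    scaled = {k: p ** (1 / t) for k, p in probs.items()}
    z = sum(scaled.values())
    return {k: v / z for k, v in scaled.items()}


def tail_window(probs: dict[str, float], skip_top: int, floor: float) -> dict[str, float]:
    """Hypothetical targeted edit: drop the `skip_top` most likely (most typical)
    options, drop anything below `floor` (likely incoherent), renormalise the rest."""
    ranked = sorted(probs, key=probs.get, reverse=True)
    keep = [o for o in ranked[skip_top:] if probs[o] >= floor]
    z = sum(probs[o] for o in keep)
    return {o: probs[o] / z for o in keep} if keep else {}


# Toy distribution over attack-hypothesis "ideas"; the numbers are made up.
ideas = {"password spray": 0.55, "phishing email": 0.30,
         "OAuth consent abuse": 0.10, "incoherent output": 0.05}

hot = with_temperature(ideas, t=3.0)
print({k: round(v, 3) for k, v in hot.items()})
print(tail_window(ideas, skip_top=2, floor=0.08))
```

At `t=3.0` the incoherent option roughly triples in probability, while the most typical idea remains the single most likely draw. The tail window instead concentrates all mass on the unusual but coherent option.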
Combining These Techniques in a Security Workflow
These four capabilities — Council debate, iterative depth, first-principles enumeration, and creative divergence — are not mutually exclusive. A mature agentic AI workflow for security might chain them:
- Use creative divergence to generate an unconventional initial threat hypothesis.
- Feed that hypothesis into first-principles analysis to enumerate its root cause tree.
- Run iterative depth from multiple attacker perspectives to stress-test the hypothesis.
- Convene a Council of specialized agents (red team, blue team, compliance, architecture) to debate the priority of the identified risks and recommend mitigations.
The human sits at the end of that chain as decision-maker, not as the bottleneck doing each analytical step manually.
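Here is a minimal sketch of that chain as plain function composition. It assumes each technique is wrapped as a `str -> str` stage. The lambdas are hypothetical stand-ins for model-backed skills. The trace is kept so that the human decision-maker at the end can see how the recommendation was reached.

```python
from typing import Callable

Stage = Callable[[str], str]


def chain(stages: list[tuple[str, Stage]], seed: str) -> tuple[str, list[str]]:
    """Run analysis stages in order, feeding each output into the next stage."""
    trace: list[str] = []
    current = seed
    for name, stage in stages:
        current = stage(current)
        trace.append(f"{name}: {current}")
    return current, trace


# Hypothetical stand-ins for the four techniques; in practice each would call a model.
pipeline: list[tuple[str, Stage]] = [
    ("creative divergence", lambda s: f"unusual hypothesis about {s}"),
    ("first principles", lambda s: f"root-cause tree for ({s})"),
    ("iterative depth", lambda s: f"stress-tested {s}"),
    ("council", lambda s: f"prioritised mitigations for {s}"),
]

recommendation, trace = chain(pipeline, "the SSO integration")
# The human reviews `trace` and `recommendation`; nothing is applied automatically.
print(recommendation)
```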
Why Single-Shot Prompting Is No Longer Sufficient
Miessler’s framing is direct: if you are still issuing single prompts and accepting the first response, you are leaving substantial analytical depth on the table. Each of the techniques above addresses a specific failure mode of single-shot prompting:
| Failure mode | Technique |
|---|---|
| One-sided reasoning | Council (multi-agent debate) |
| Surface-level coverage | Iterative depth |
| Anchoring on obvious causes | First-principles |
| Repetitive, low-novelty output | Creative divergence |
Understanding which failure mode applies to your current task — and selecting the appropriate technique — is the core skill that distinguishes engineers who get dramatically better outputs from those who remain stuck at the average capability of the underlying model.
Actionable Takeaways
- Implement Council-style multi-agent debate for your next threat model or architecture review: provision at least three specialized expert agents (e.g., network attacker, insider threat, compliance auditor) with a configurable number of debate rounds, and use a parent agent to synthesize the arguments before you make a final risk prioritization decision.
- Apply iterative depth to red-team hypothesis generation: take a single attack surface or detection gap and ask the same core question — "what could an attacker do here?" — from at least four distinct adversary perspectives (nation-state, opportunistic criminal, insider, supply chain). Each pass should build on and challenge the conclusions of the previous one.
- Map your current analytical bottlenecks to the four techniques: if your threat models are repetitive, switch to creative divergence for hypothesis generation; if incidents keep surprising you with root causes, use first-principles enumeration before investigating; if your architecture reviews miss cross-domain risks, Council is the right tool. Match the technique to the failure mode, not to habit.
Common Pitfalls
- Treating Council's parent-agent recommendation as final without human review. Miessler explicitly keeps the human in the loop at the arbitration stage — the parent agent makes a recommendation and the human chooses the direction. Automating past that decision point removes the accountability layer that catches cases where all agents converged on a plausible but wrong conclusion.
- Conflating increased temperature with creative divergence. Raising temperature is a blunt instrument that increases noise alongside novelty. The creative divergence technique described in the transcript operates on output probability distributions in a more targeted way. Using high temperature as a substitute will degrade output coherence without delivering the same quality of novel results.
The PAI Algorithm: Scientific-Method Loop for Open-Ended AI Tasks
The dominant reason agentic AI workflows work so well for code is the existence of a binary verification signal — either the code runs and tests pass, or it does not. Miessler frames this directly: software has “handholds” that let an AI agent grab onto something concrete, namely whether the output works or does not work.
The problem emerges the moment you leave that domain. Ask an agent to write a threat model, draft a detection rule narrative, produce a penetration test report, or compose a short story, and the verification signal collapses. As Miessler put it: “What are you going to do if you say write me an essay about mind-body transfer in Arkansas in 2038 or write me a short story? How does it know what a good short story is?”
This is not a niche edge case. The majority of security engineering work — policy drafting, architectural review write-ups, detection logic documentation, red-team reports — falls squarely in this open-ended category. Without a verification gate, agentic loops either produce unchecked output or stall.
The Algorithm’s Central Insight: Ideal-State Criteria as Both Target and Gate
The PAI algorithm solves this by doing two things simultaneously with a single artifact: discrete, testable ideal-state criteria.
The algorithm first reverse-engineers a vague request into a set of criteria that represent the ideal completed state. Crucially, these same criteria then serve as the verification gate at the end of the process. Generation target and acceptance test are the same object.
Miessler’s description is precise on this point: “The ideal state criteria are the same exact ones as the verification criteria.” This dual-use design is what makes the loop self-contained. The agent does not need an external oracle or human checkpoint between generation and verification — the criteria embedded at the start of the task define what “done” means.
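The sketch below illustrates the dual-use idea, assuming each criterion carries a programmatic check. In PAI the checks are presumably judged by the model against the output. The `Criterion` type, the example criteria, the stub generator, and the retry loop are all hypothetical. The sketch does reproduce the structure described: one list of criteria is rendered into the generation brief, then reused unchanged as the acceptance test.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Criterion:
    """One discrete, testable ideal-state criterion."""
    description: str
    check: Callable[[str], bool]  # True if the output is in the required state


def brief(criteria: list[Criterion]) -> str:
    """Generation target: the criteria become the spec handed to the generator."""
    return "Produce output meeting ALL of:\n" + "\n".join(f"- {c.description}" for c in criteria)


def verify(output: str, criteria: list[Criterion]) -> list[str]:
    """Verification gate: the same criteria, now used to list what failed."""
    return [c.description for c in criteria if not c.check(output)]


def run(criteria: list[Criterion], generate: Callable[[str, list[str]], str],
        max_iters: int = 3) -> str:
    failures: list[str] = []
    for _ in range(max_iters):
        output = generate(brief(criteria), failures)
        failures = verify(output, criteria)
        if not failures:
            return output
    raise RuntimeError(f"unmet criteria after {max_iters} attempts: {failures}")


# Hypothetical criteria for a short threat-model write-up.
criteria = [
    Criterion("names the STRIDE category 'Spoofing'", lambda o: "Spoofing" in o),
    Criterion("lists at least 3 threats", lambda o: o.count("THREAT:") >= 3),
    Criterion("stays under 200 words", lambda o: len(o.split()) < 200),
]

draft_v1 = "THREAT: token replay\nTHREAT: log tampering"
draft_v2 = "Spoofing\nTHREAT: token replay\nTHREAT: log tampering\nTHREAT: SSRF"


def generate(spec: str, failures: list[str]) -> str:
    """Stub generator: returns the revised draft once it is told what failed."""
    return draft_v2 if failures else draft_v1


print(run(criteria, generate) == draft_v2)
```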
Writer’s Blindness and Context-Aware Request Expansion
A second problem the algorithm addresses is what Miessler calls writer’s blindness: the gap between what a person intends and what they actually articulate. “You have something in your mind but you’re not communicating it clearly. And I think this is like a central theme for all AI. Clarity of thought, clarity of articulation is like absolutely everything.”
The algorithm compensates for this by exploiting context. Because the PAI system holds comprehensive knowledge about the user — their preferences, prior work, goals, and operating environment — a single-sentence request carries far more inferential weight than it would in a context-free prompt.
Miessler illustrated this with a concrete example: a request like “build me an entire role playing game system — history, languages, terrain, combat system” would be expanded by the algorithm into a full set of inferred intent criteria. The system reasons: given everything it knows about the requester, what did they definitely mean, what did they definitely not mean, and what unstated requirements are implied? Each inference is then converted into a discrete and testable ideal-state criterion.
The strict rule on criterion quality is explicit: criteria must be discrete and testable. Vague aspirational criteria (“should be good”) are disallowed. Each criterion must represent a state that can be verified as present or absent.
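The sketch below shows one way to enforce the "discrete and testable" rule. It assumes that each proposed criterion either carries an executable check or is rejected. The RPG criteria and the dictionary shape of the generated design are invented for illustration, because the talk does not show PAI's criterion schema.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Criterion:
    statement: str
    # None means nobody can say how to test it, so it cannot act as a gate.
    check: Optional[Callable[[dict], bool]]


def admit(proposed: list[Criterion]) -> tuple[list[Criterion], list[str]]:
    """Split proposed criteria into testable ones and rejected aspirational ones."""
    accepted = [c for c in proposed if c.check is not None]
    rejected = [c.statement for c in proposed if c.check is None]
    return accepted, rejected


# Hypothetical criteria inferred from "build me an entire role-playing game system".
proposed = [
    Criterion("Design has a non-empty history section", lambda d: bool(d.get("history"))),
    Criterion("Design defines at least two languages", lambda d: len(d.get("languages", [])) >= 2),
    Criterion("Combat rules specify how damage is resolved", lambda d: "damage" in d.get("combat", {})),
    Criterion("The combat system should feel right", None),  # aspirational: rejected
]

accepted, rejected = admit(proposed)
draft = {
    "history": "Three empires fell...",
    "languages": ["Old Tongue", "Trade Cant"],
    "combat": {"initiative": "d20", "damage": "weapon die + strength"},
}
unmet = [c.statement for c in accepted if not c.check(draft)]
print(rejected)  # ['The combat system should feel right']
print(unmet)     # []
```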
The Seven-Phase Scientific Method Structure
Miessler confirmed during the Q&A that the algorithm contains seven phases and that its overall structure is “roughly based on the scientific method.” The talk did not name all seven phases, so the mapping below is this post’s reconstruction from the scientific-method framing, not Miessler’s own labels:
- Observation / Request Intake — ingest the raw request in full context
- Hypothesis Formation / Ideal-State Derivation — reverse-engineer intent into discrete testable criteria representing the ideal completed state
- Experimental Design — decompose the task into sub-tasks that each address a subset of the criteria
- Execution — run agents against each sub-task
- Measurement — evaluate outputs against the ideal-state criteria
- Analysis — identify gaps, contradictions, or unmet criteria
- Verification / Acceptance — confirm all criteria are satisfied before the result is surfaced to the user
The algorithm does not converge by gradient descent or any mathematical optimization (Miessler explicitly rejected this framing in response to an audience question). It is a structured reasoning loop — deterministic in structure, probabilistic in execution at each phase.
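The following Python sketch models the loop as a control structure. The phase names follow the scientific-method mapping above, not Miessler's own labels. `derive` and `execute` are hypothetical hooks standing in for model calls. The point is that the criteria produced in phase 2 decide when the loop may stop and when it must report failure.

```python
from enum import Enum
from typing import Callable

Checks = dict[str, Callable[[str], bool]]


class Phase(Enum):
    OBSERVE = 1
    DERIVE_CRITERIA = 2
    DESIGN = 3
    EXECUTE = 4
    MEASURE = 5
    ANALYZE = 6
    VERIFY = 7


def run_loop(request: str,
             derive: Callable[[str], Checks],
             execute: Callable[[str, list[str]], str],
             max_rounds: int = 3) -> tuple[str, list[str], list[Phase]]:
    """Return (output, unmet criteria, phase trace); the output is verified only if unmet is empty."""
    trace = [Phase.OBSERVE]
    criteria = derive(request)          # ideal state doubles as the acceptance test
    trace.append(Phase.DERIVE_CRITERIA)
    unmet = list(criteria)              # design: one sub-task per unmet criterion
    trace.append(Phase.DESIGN)
    output = ""
    for _ in range(max_rounds):
        output = execute(request, unmet)
        trace.append(Phase.EXECUTE)
        results = {name: check(output) for name, check in criteria.items()}
        trace.append(Phase.MEASURE)
        unmet = [name for name, ok in results.items() if not ok]
        trace.append(Phase.ANALYZE)
        if not unmet:
            trace.append(Phase.VERIFY)
            return output, unmet, trace
    return output, unmet, trace         # budget exhausted: never surfaced as done


def make_toy_agent() -> Callable[[str, list[str]], str]:
    """Toy stand-in for an agent: addresses one unmet criterion per pass."""
    parts: list[str] = []

    def execute(request: str, unmet: list[str]) -> str:
        if unmet:
            parts.append(unmet[0])
        return " ".join(parts)

    return execute


def derive(request: str) -> Checks:
    # Hypothetical derived criteria for a test-plan request.
    return {"scope": lambda o: "scope" in o, "hosts": lambda o: "hosts" in o}


output, unmet, trace = run_loop("write the test plan", derive, make_toy_agent())
print(output, unmet, trace[-1])  # scope hosts [] Phase.VERIFY
```

The loop is structured and deterministic, while each `execute` call can be as probabilistic as the underlying model.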
Integration with Claude Code’s Task System
The criteria are not held in a prompt string. Once the algorithm reverse-engineers a request and produces its ideal-state criteria, those criteria are written into the Claude Code task system as maintained task objects. Claude Code then manages progress tracking across the criteria.
This integration means the criteria persist across agent turns, are visible to all sub-agents working on the task, and provide a shared definition of completion that any agent in the pipeline can test against. During the live demo, Miessler showed a monitoring view where multiple active tasks were visible simultaneously, with their positions inside the algorithm’s phases displayed in real time.
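The talk does not show how criteria are stored inside Claude Code's task system. The sketch below therefore uses a plain JSON file as a hypothetical stand-in. `CriteriaBoard` is an invented name, not a Claude Code API. It illustrates only the properties described above: the criteria persist outside any single prompt, every sub-agent reads the same list, and "done" means every criterion is marked satisfied.

```python
import json
import tempfile
from pathlib import Path


class CriteriaBoard:
    """Hypothetical shared task list: one JSON record per ideal-state criterion."""

    def __init__(self, path: Path):
        self.path = path

    def publish(self, criteria: list[str]) -> None:
        self._save([{"criterion": c, "status": "pending"} for c in criteria])

    def load(self) -> list[dict]:
        return json.loads(self.path.read_text())

    def mark(self, criterion: str, status: str) -> None:
        tasks = self.load()
        for task in tasks:
            if task["criterion"] == criterion:
                task["status"] = status
        self._save(tasks)

    def done(self) -> bool:
        # The shared definition of completion: every criterion satisfied.
        return all(task["status"] == "satisfied" for task in self.load())

    def _save(self, tasks: list[dict]) -> None:
        self.path.write_text(json.dumps(tasks, indent=2))


board = CriteriaBoard(Path(tempfile.mkdtemp()) / "criteria.json")
board.publish(["All in-scope hosts covered", "Each finding cites an evidence artifact"])
board.mark("All in-scope hosts covered", "satisfied")  # e.g. written by sub-agent A
print(board.done())  # False: one criterion is still pending
```

This sketch ignores concurrent writes. A real multi-agent setup would need locking or a proper task service.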
Why This Matters for Security Engineering Work
Security engineers routinely produce outputs that resist binary verification:
- Penetration test reports — must meet criteria like “all in-scope hosts covered,” “each finding tied to a specific evidence artifact,” “remediation guidance is actionable,” none of which reduce to a pass/fail test without an explicit criterion set
- Detection rule documentation — a YARA rule or Sigma rule may be syntactically valid but fail to cover the intended adversary behavior; only pre-defined behavioral criteria can catch this gap
- Policy and procedure documents — completeness and accuracy cannot be verified without a reference checklist derived from the document’s stated purpose
- Threat model narratives — “covers all relevant attack surfaces” is untestable without first enumerating what surfaces qualify as relevant
The PAI algorithm provides a generalized pattern for all of these: before generation begins, force the system to produce a criterion set; after generation ends, verify against that same set. The cost of this discipline is one additional reasoning step at the start of each task. The benefit is a self-contained verification gate that does not require human review at every iteration.
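The sketch below applies this pattern to the penetration test report example. The report structure (`hosts_tested`, `findings`, `evidence`, `remediation`) is a hypothetical schema chosen for illustration. "Actionable remediation" is approximated as "remediation text is present," which is weaker than a human reviewer's judgment.

```python
def report_criteria(scope: set[str]) -> dict:
    """Build named pass/fail checks for a report dict, fixed BEFORE the report is written."""
    return {
        "All in-scope hosts covered":
            lambda r: scope <= set(r["hosts_tested"]),
        "Each finding tied to an evidence artifact":
            lambda r: all(f.get("evidence") for f in r["findings"]),
        "Each finding has remediation guidance":
            lambda r: all(f.get("remediation") for f in r["findings"]),
    }


def gate(criteria: dict, report: dict) -> list[str]:
    """Return the names of failed criteria; an empty list means the report may ship."""
    return [name for name, check in criteria.items() if not check(report)]


scope = {"10.0.0.5", "10.0.0.6"}
report = {
    "hosts_tested": ["10.0.0.5"],
    "findings": [{
        "host": "10.0.0.5",
        "title": "Default admin credentials",
        "evidence": "evidence/10.0.0.5-admin-login.png",
        "remediation": "Rotate the admin password and disable the default account.",
    }],
}
print(gate(report_criteria(scope), report))  # ['All in-scope hosts covered']
```

The gate catches the untested host even though every individual finding looks complete.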
PAI Algorithm: Reverse-Engineering a Vague Request into Testable Ideal-State Criteria
Proof of Concept
- Submit a minimal, vague request to the algorithm. The practitioner provides a single sentence, e.g., “Build me an entire role-playing game system. I want history, languages, terrain, and a combat system.” No further detail is required at this stage; the algorithm is designed to operate on sparse input.
- Ingest context and expand intent. Because the PAI system maintains a persistent profile of the practitioner (goals, prior work, preferences, tool inventory), the algorithm reads that full context and asks: what did this person almost certainly mean? It surfaces implicit constraints the practitioner never stated: things they “definitely didn’t want” and things they “definitely did want.”
- Decompose intent into discrete, testable ideal-state criteria. Each inferred want and constraint is translated into a criterion that is discrete (covers exactly one verifiable property) and testable (has a binary pass/fail evaluation path). Vague notions like “the combat system should feel right” are rejected; only criteria that can be checked unambiguously qualify. Miessler’s explicit rule: “the strict rule is it has to be discrete and testable.”
- Inject the criteria into the Claude Code task system. The finalized ideal-state criteria are written directly into the Claude Code task/todo framework, so every downstream agent working on the problem can see the full acceptance set. Claude Code maintains the criteria throughout the lifecycle of the task.
- Execute the seven-phase scientific-method loop. The algorithm proceeds through phases modeled on the scientific method: framing the problem, forming hypotheses (candidate outputs or approaches), running experiments (generation passes), and tracking progress toward the ideal state.
- Use the ideal-state criteria as the verification gate. At the final phase, each generated artifact is checked against the same ideal-state criteria produced in the decomposition step. Because the criteria are discrete and testable, the verification step is deterministic: the output either satisfies each criterion or it does not. This is the core insight: “the ideal state criteria are the same exact ones as the verification criteria.”
- Observe multi-agent progress in real time. While the algorithm runs, the practitioner can view a dashboard showing all active agents, which phase of the algorithm each is in, and how far each has progressed toward satisfying the ideal-state criteria.
Note: Miessler explicitly flags this as “highly theoretical and experimental.” The internal mechanics of the seven phases, the precise data structures used to represent ideal-state criteria, and the evaluation logic for non-code outputs are not fully detailed in the talk. The PoC is therefore partial — the conceptual loop is clear and reproducible in principle, but practitioners implementing it independently will need to design the criterion schema and evaluation harness themselves.
Actionable Takeaways
- Before running any open-ended agentic task (report writing, policy drafting, detection logic documentation), explicitly enumerate discrete, testable acceptance criteria in advance. These criteria should represent the ideal completed state of the output. Use them as both the generation target given to the agent and the verification gate applied after generation completes.
- Apply context-augmented request expansion to compensate for writer's blindness. When constructing a task for an agent, provide not just the request but explicit "what I definitely want" and "what I definitely do not want" items alongside the acceptance criteria. This mirrors the PAI algorithm's reverse-engineering step and dramatically reduces vague, incomplete, or misaligned outputs.
- Integrate the scientific-method loop structure into any multi-turn agentic workflow for non-code outputs: intake → ideal-state derivation → sub-task decomposition → execution → measurement against criteria → gap analysis → verification. Skipping the derivation step (phase 2) removes the verification gate and produces unverifiable results regardless of how capable the underlying model is.
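The first two takeaways can be packaged as a small task spec. The `TaskSpec` class and its field names below are hypothetical. The sketch refuses to render a prompt when no acceptance criteria exist, which mirrors the point that skipping derivation leaves nothing to verify against.

```python
from dataclasses import dataclass, field


@dataclass
class TaskSpec:
    request: str
    definitely_want: list[str] = field(default_factory=list)
    definitely_not: list[str] = field(default_factory=list)
    acceptance: list[str] = field(default_factory=list)  # discrete, testable

    def to_prompt(self) -> str:
        if not self.acceptance:
            raise ValueError("no acceptance criteria: nothing to verify the output against")
        sections = [
            ("Request", [self.request]),
            ("I definitely want", self.definitely_want),
            ("I definitely do NOT want", self.definitely_not),
            ("Done only when all of these are true", self.acceptance),
        ]
        blocks = []
        for title, items in sections:
            if items:
                blocks.append(title + ":\n" + "\n".join(f"- {i}" for i in items))
        return "\n\n".join(blocks)


spec = TaskSpec(
    request="Draft a Sigma rule write-up for suspicious PowerShell download cradles",
    definitely_want=["ATT&CK technique ID for each detection", "known false-positive sources"],
    definitely_not=["vendor-specific query syntax"],
    acceptance=["Each detection lists at least one ATT&CK technique ID",
                "A false-positive section names at least two benign sources"],
)
print(spec.to_prompt())
```

The same `acceptance` list can then be turned into checks and run after generation, as in the verification-gate sketches earlier in this section.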
Common Pitfalls
- Treating open-ended tasks the same as code generation — assuming the agent will self-verify without an explicit criterion set. Code agents have a binary test signal; document, policy, and narrative agents do not. Without pre-defined ideal-state criteria, the agent has no basis for knowing when the output is complete or correct, and outputs will be inconsistent across runs.
- Using vague or aspirational criteria that cannot be discretely tested ("the report should be comprehensive," "the policy should be clear"). The PAI algorithm's strict rule is that each criterion must be discrete and testable — a state that is either present or absent. Vague criteria cannot function as a verification gate and will silently pass incomplete or inaccurate output.
Arbo Pipelines and the Modular Agentic Workflow Architecture
Arbo (the name evokes árbol, Spanish for “tree”) is the composable pipeline layer inside PAI that Miessler describes as something he “has been trying to do for 10 years.” The core concept is deceptively simple: any discrete AI action, or any deterministic function, is wrapped into a self-contained unit. Each unit can run locally on the command line or be deployed remotely on Cloudflare Workers[3]. Units are then chained together into pipelines. When a pipeline is connected to a source and a destination, it becomes a flow.
The design philosophy maps directly onto how security automation actually works in practice. Security engineers already think in pipelines: enumerate targets, resolve subdomains, port-scan, fingerprint services. Arbo formalizes that mental model into reusable, independently testable components that can be wired together without rebuilding shared infrastructure each time.
Discrete Actions as First-Class Objects
The key architectural decision in Arbo is discreteness. Each action is:
- Self-contained — it does one thing and exposes a consistent interface
- Portable — it runs identically on the local CLI or as a Cloudflare Worker
- Composable — its output becomes the input of the next action via standard piping
Miessler demonstrates this with a recon pipeline: run a command to find all top-level domains for a target (e.g., Tesla), pipe that output into a “get subdomains” action, then pipe the result into Nmap to retrieve open ports. Each step is an independently maintained Arbo action, and none of them needs to know anything about the others. If a better subdomain enumeration tool appears, you swap that single action without touching the rest of the pipeline.
This is Unix’s “do one thing well” philosophy applied to AI-augmented security tooling.
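The following Python sketch assumes that each action takes a list of strings and returns a list of strings. The actions are hypothetical stubs with canned `example.com` data rather than real DNS or Nmap calls. `pipeline` only composes the actions, so replacing one of them leaves the others untouched.

```python
from functools import reduce
from typing import Callable

Action = Callable[[list[str]], list[str]]


def pipeline(*actions: Action) -> Action:
    """Chain actions left to right: each action's output becomes the next one's input."""
    return lambda items: reduce(lambda acc, act: act(acc), actions, items)


# Hypothetical stub actions with canned data; real ones would wrap DNS tooling,
# a subdomain enumerator, or an LLM call behind the same list-in/list-out contract.
def root_domains(targets: list[str]) -> list[str]:
    known = {"example": ["example.com", "example.org"]}
    return [d for t in targets for d in known.get(t, [])]


def subdomains(domains: list[str]) -> list[str]:
    return [f"{p}.{d}" for d in domains for p in ("www", "vpn")]


def dedupe(hosts: list[str]) -> list[str]:
    return sorted(set(hosts))


def better_subdomains(domains: list[str]) -> list[str]:
    # Drop-in replacement: same contract, so nothing else in the pipeline changes.
    return [f"{p}.{d}" for d in domains for p in ("www", "vpn", "mail")]


recon = pipeline(root_domains, subdomains, dedupe)
recon_v2 = pipeline(root_domains, better_subdomains, dedupe)
print(recon(["example"]))
```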
Building Pipelines and Flows
Miessler distinguishes between two composition levels:
- Pipeline — a linear chain of actions. Input flows through each step in sequence and produces an output. Used for recon chains, data transformation sequences, and multi-step API orchestration.
- Flow — a pipeline connected to a persistent source (e.g., an RSS feed, a social media account, an intelligence feed) and a persistent output (e.g., a database, a UI, a notification channel). Flows run continuously or on a schedule.
The distinction matters for agentic workflow orchestration: pipelines are invoked on demand, flows run autonomously as standing intelligence-collection infrastructure.
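The pipeline/flow distinction can be expressed as one extra wrapper. A flow is a pipeline plus a source to poll and a sink to write to, and it is invoked by a scheduler rather than by a person. The function names and the in-memory `seen` set used for de-duplication are assumptions for illustration. A standing flow would keep that state in durable storage and be triggered by cron or a similar scheduler.

```python
from typing import Callable


def run_flow_tick(source: Callable[[], list[str]],
                  pipeline: Callable[[list[str]], list[str]],
                  sink: Callable[[list[str]], None],
                  seen: set[str]) -> int:
    """One scheduled run: pull from the source, process only unseen items, push to the sink."""
    new_items = [item for item in source() if item not in seen]
    seen.update(new_items)
    if new_items:
        sink(pipeline(new_items))
    return len(new_items)


def tag(items: list[str]) -> list[str]:
    # Placeholder pipeline; any composed Arbo-style pipeline fits here.
    return [f"triaged:{i}" for i in items]


feed = ["advisory-101", "advisory-102"]   # hypothetical persistent source
store: list[str] = []                     # hypothetical persistent output
seen: set[str] = set()

first = run_flow_tick(lambda: list(feed), tag, store.extend, seen)
feed.append("advisory-103")
second = run_flow_tick(lambda: list(feed), tag, store.extend, seen)
print(first, second, store)
```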
Surface: Arbo in Production as an Intelligence Aggregator
Miessler’s primary worked example of Arbo in practice is Surface, a personal intelligence aggregation system he built for himself. Surface demonstrates every layer of the architecture:
Source layer: Approximately 4,000 individual sources — intelligence feeds, OSINT accounts, YouTube channels, RSS subscriptions, and Bluesky follows — are ingested continuously.
Processing layer: Each item passes through a shared Arbo action called label and rate, which uses an LLM to assess the quality and relevance of the content. This single reusable action is the key differentiator: it applies a consistent quality bar regardless of who produced the content.
Output layer: Processed items surface in a ranked, labeled interface — sorted by quality score, not by follower count, publication date, or source authority.
The practical security implication Miessler highlights: authority is decoupled from quality. A prominent analyst writing a low-signal post does not appear. An unknown researcher writing a technically precise analysis does. For security engineers trying to maintain situational awareness across a noisy threat intelligence landscape, this represents a structural improvement over algorithm-driven feeds that optimize for engagement rather than signal.
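The label-and-rate step can be sketched in Python as follows. In Surface, the rating comes from an LLM. Here, `toy_rate` is a keyword heuristic standing in for that call, and the `Item` fields and threshold are invented. The sketch preserves the key design point: the rater sees only the text, never the author or the follower count.

```python
from dataclasses import dataclass
from typing import Callable

Rater = Callable[[str], tuple[float, list[str]]]


@dataclass
class Item:
    author: str
    followers: int
    text: str


SIGNALS = {"CVE-": "vulnerability", "PoC": "exploit", "IOC": "indicators", "mitigation": "defense"}


def toy_rate(text: str) -> tuple[float, list[str]]:
    """Stand-in for the LLM call: labels are matched signal types, score is the share matched."""
    labels = [label for marker, label in SIGNALS.items() if marker in text]
    return len(labels) / len(SIGNALS), labels


def label_and_rate(items: list[Item], rate: Rater, threshold: float):
    """Rate each item on its text alone, suppress low scores, return best-first."""
    kept = []
    for item in items:
        score, labels = rate(item.text)  # author and followers are never passed in
        if score >= threshold:
            kept.append((score, labels, item))
    return sorted(kept, key=lambda entry: entry[0], reverse=True)


items = [
    Item("famous_analyst", 500_000, "Huge week in security. Stay safe out there!"),
    Item("unknown_researcher", 40, "Root cause of CVE-YYYY-NNNN, a working PoC, an IOC list, and mitigation steps"),
    Item("mid_account", 12_000, "New PoC dropped for an old bug"),
]
for score, labels, item in label_and_rate(items, toy_rate, threshold=0.5):
    print(f"{score:.2f} {item.author} {labels}")
```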
Arbo + Surface: Chaining Modular AI Actions into an Intelligence Aggregation Pipeline
Proof of Concept
- Define a discrete action node. Each Arbo unit encapsulates exactly one AI capability or function, for example “resolve top-level domains for a target,” “enumerate subdomains,” or “run a port scan.” The node has a single input contract and a single output format. It can run locally on the command line or be deployed as a Cloudflare Worker for cloud execution without a persistent local runtime.
- Chain nodes into a pipeline. Because every node is independently discrete and exposes a consistent interface, output from one node pipes directly into the next. A concrete recon chain: find all TLDs for Tesla → pipe output into get subdomains → pipe into Nmap/port scanner. There is no shared state and no monolithic script, just composable units wired together.
- Attach a source and a sink to form a flow. When a pipeline is connected to a live data source on one end and a persistent output on the other, it becomes a continuously running flow. The Surface aggregator is built on this pattern: the input source is a curated list of roughly 4,000 feeds (RSS, Bluesky accounts, YouTube channels, OSINT analysts, intelligence sources), and the output sink is a ranked, labeled interface.
- Insert a “label and rate” quality-scoring node. Inside the Surface flow, every ingested item passes through a dedicated AI action that assesses content quality on its own merits, not on the author’s name or follower count. The node assigns quality labels and a numeric rating, so a well-argued post from an unknown author scores higher than a low-effort post from a high-profile name. The AI action applies the scoring criteria consistently, without human curation bias.
- Render output through a sorted, labeled interface. Downstream of the quality-scoring node, Surface presents only the highest-rated items in a ranked view with labels. Items that fall below the quality threshold are suppressed entirely.
- Reuse nodes across unrelated pipelines. Because each Arbo action is standalone, the same “get subdomains” node used in the recon pipeline can be called from a different context (e.g., asset discovery for an engagement scope). There is no duplicated logic: every capability built once is immediately available to every future pipeline.
Note: Miessler does not publish the full node definitions or Cloudflare Worker deployment manifests in this talk. The architectural pattern — discrete action → chain → flow — is demonstrated conceptually and through the Surface use case, but the underlying node code, rating model prompt, and deployment configuration are part of the broader PAI open-source project rather than being fully enumerated here.
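Because the node definitions are not published, the following is only a sketch of the reuse property. It assumes a simple in-process registry keyed by action name. The `action` decorator, the `ACTIONS` dict, and both action names are hypothetical. Two unrelated pipelines resolve the same registered function instead of copying it.

```python
from functools import reduce
from typing import Callable

ActionFn = Callable[[list[str]], list[str]]
ACTIONS: dict[str, ActionFn] = {}


def action(name: str) -> Callable[[ActionFn], ActionFn]:
    """Register a function under a stable name so any pipeline can reference it."""
    def register(fn: ActionFn) -> ActionFn:
        ACTIONS[name] = fn
        return fn
    return register


def build(*names: str) -> ActionFn:
    """Assemble a pipeline from registered action names (KeyError if one is missing)."""
    steps = [ACTIONS[n] for n in names]
    return lambda items: reduce(lambda acc, step: step(acc), steps, items)


@action("normalize-hosts")
def normalize_hosts(hosts: list[str]) -> list[str]:
    # Lowercase, strip whitespace and trailing dots, de-duplicate, sort.
    return sorted({h.strip().lower().rstrip(".") for h in hosts if h.strip()})


@action("in-scope-only")
def in_scope_only(hosts: list[str]) -> list[str]:
    # Hypothetical engagement scope: keep example.com and its subdomains.
    return [h for h in hosts if h == "example.com" or h.endswith(".example.com")]


recon = build("normalize-hosts", "in-scope-only")
asset_inventory = build("normalize-hosts")  # same node, unrelated pipeline

raw = ["WWW.Example.com.", "vpn.example.com", "mail.other.org", "vpn.example.com"]
print(recon(raw))            # ['vpn.example.com', 'www.example.com']
print(asset_inventory(raw))  # ['mail.other.org', 'vpn.example.com', 'www.example.com']
```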
Actionable Pattern for Security Engineers
The Arbo pattern is directly applicable to common security automation scenarios without requiring PAI’s full stack. The reusable pattern is:
- Identify a multi-step security workflow you run repeatedly (asset discovery, CVE triage, log enrichment, detection engineering).
- Decompose each step into a discrete, self-contained action with a defined input schema and output schema.
- Deploy individual actions as CLI scripts or lightweight Cloudflare Workers.
- Chain them into pipelines using standard stdin/stdout or a simple message-passing contract.
- Attach persistent sources and outputs to convert pipelines into flows for standing automation.
Every action you build is immediately reusable across any other pipeline in the system. The compounding advantage applies here in its most concrete form: you build the subdomain enumeration action once and it is available to every future recon pipeline, every future flow, and every other agent in your harness.
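One way to honor a stdin/stdout contract is JSON Lines: each action reads one JSON object per line and writes one object per line, so actions written in any language can be piped together. The sketch below is a hypothetical log-enrichment action that adds the hostname of a `url` field. The field names are assumptions.

```python
import json
import sys
from typing import Callable, Iterable
from urllib.parse import urlsplit


def process(record: dict) -> dict:
    """The single thing this action does: add the lowercase hostname of record['url']."""
    enriched = dict(record)
    enriched["host"] = urlsplit(record.get("url", "")).hostname or ""
    return enriched


def main(lines: Iterable[str], write: Callable[[str], object]) -> None:
    """JSON Lines in, JSON Lines out; blank lines are skipped."""
    for line in lines:
        line = line.strip()
        if line:
            write(json.dumps(process(json.loads(line))) + "\n")


# Demo on in-memory input; as a CLI you would call main(sys.stdin, sys.stdout.write).
main(['{"url": "https://Login.Example.net/reset?id=7", "src": "proxy"}\n'], sys.stdout.write)
```

Suppose the script is saved as `add_host.py` (a hypothetical name) with `main(sys.stdin, sys.stdout.write)` under an `if __name__ == "__main__":` guard. It could then sit in a shell pipeline such as `cat proxy.jsonl | python add_host.py | python next_action.py`, where the other file names are also placeholders for any actions that honor the same contract.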
Why Modular Agentic Architecture Matters for Security
The fragmentation problem in security tooling is well understood: organizations accumulate dozens of scripts, each solving one problem, none of them composable with the others. Every new project rebuilds the same plumbing. Arbo’s answer is not a monolithic platform — it is a naming convention and composition contract that makes existing tools first-class citizens in a unified pipeline architecture.
As Miessler notes, the end state of this approach is a system where anything you build once is instantly available to everything else. The algorithm knows how to use custom Arbo actions. Council agents can invoke them. The PI Upgrade Skill can recommend which actions need updating. The system compounds rather than fragments.
Actionable Takeaways
- Decompose your existing recon and automation scripts into discrete Arbo-style actions with defined input/output contracts. Deploy each as a local CLI script or Cloudflare Worker so it can be chained into any future pipeline without modification.
- Build a "label and rate" quality-filtering action for your threat intelligence feeds. Apply it as a shared Arbo node across all your intelligence sources so that content quality — not source authority or recency — determines what surfaces in your workflow.
- Connect your most-used security pipelines to persistent sources and outputs to convert them into standing flows. Asset discovery pipelines, CVE monitoring, and detection feed ingestion are all strong candidates for promotion from on-demand pipelines to always-on flows.
Common Pitfalls
- Treating each pipeline as a monolith rather than a composition of discrete actions. When steps are tightly coupled, you cannot swap out a single component (e.g., replacing one subdomain enumerator with a better one) without rewriting the entire pipeline. Discreteness is the property that makes the system compounding rather than brittle.
- Skipping the "label and rate" quality gate and relying on source reputation or volume as a proxy for signal quality. Miessler's Surface example demonstrates that this produces worse outcomes — high-authority low-signal content crowds out high-signal content from unknown sources. Quality scoring must be applied uniformly at the action level, not inferred from metadata.
Conclusion
Daniel Miessler’s PAI framework is built on a single compounding insight: when every capability you build flows through one unified harness, nothing is ever built twice. The Council debate system, the PAI algorithm’s ideal-state verification loop, the Arbo pipeline architecture, and the PI Upgrade Skill are not independent tools — they are components of a system that gets more powerful with every addition.
For security engineers, the timing argument is urgent. The consultancies and enterprises that encode practitioner knowledge into automated processes first will capture the leverage. PAI is the individual practitioner’s response: a structured, open-source framework that turns personal expertise into a compounding, automatable asset before someone else does it for you.
The three moves that matter most: unify your existing tools into a single harness, replace single-shot prompting with Council debate and iterative depth for high-stakes analysis, and start encoding your repeatable workflows as discrete Arbo actions now.
References & Tools
- [1] Excalidraw: open-source virtual whiteboard for diagrams; cited as an example of AI feature delivery that lacks programmatic API access.
- [2] Claude Code: Anthropic's agentic coding tool; the execution backbone on which PAI's skills, agents, and algorithm run as Markdown context and tasks.
- [3] Cloudflare Workers: serverless execution platform used to host Arbo action nodes in the cloud, making modular pipeline components remotely callable without a local runtime.