Security topic

LLM Evaluation

All deep dives for infosec conference talks covering LLM evaluation. Talks analyzed in full.

4 deep dives

2 conferences

Latest deep dives

Unprompted 2026

Tenderizing the Target | [un]prompted 2026

Learn how NVIDIA's Project Marinade uses LLM coding agents to inject realistic, tunable vulnerabilities into real codebases, giving you ground-truth benchmarks to evaluate your security tools.

Aaron Grattafiori, Skyler Bingham | 22 April 2026

Unprompted 2026

Guardrails beyond Vibes | [un]prompted 2026

Learn how Stripe built and deployed two production AI security agents with a multi-agent architecture, LLM-as-judge eval pipelines, and a phased rollout.

Jeffrey Zhang, Siddh Shah | 3 April 2026

Unprompted 2026

Security Guidance as a Service | [un]prompted 2026

Learn how Adobe built a RAG-powered security guidance platform delivering org-specific recommendations across Jira, Slack, and the IDE at scale.

Shruti Datta Gupta, Chandrani Mukherjee | 1 April 2026

Unprompted 2026

The Hard Part Isn't Building the Agent: Measuring Effectiveness

Learn why precision and recall fail for autonomous AI security agents, and how rubric-based LLM judge evaluation gives your team a reliable deployment bar.

Joshua Saxe | 31 March 2026