Knowledge Hub

Visual essays and podcasts about the systems, research, and operating choices shaping applied AI.

Latest Briefs & Podcasts

Original visual essays plus research briefings from arXiv

Podcast | July 22, 2026 | 17:14

AI Papers Weekly: The Quiet Failures of Agentic AI

This week's papers expose an uncomfortable truth about agentic AI: verification pipelines can be socially engineered, deployment breaks in ways benchmarks never predict, and the most dangerous safety failures are the quiet ones your dashboards don't catch.

Papers Covered

They'll Verify. They Just Won't Act. How Authority Framing and Laundered Code Turn a Trusted Agentic CI/CD Pipeline Into an Attack Surface
Agents in the Wild: Where Research Meets Deployment
The safety failures we are not instrumenting: a perspective on hidden safety-critical challenges in modern AI systems

3 papers | Listen & Read

Podcast | July 15, 2026 | 22:24

AI Papers Weekly: The Hidden Costs of Conformity and Complexity

This week, we explore the hidden risks of deploying AI in the enterprise. Discover why models default to the same creative answers, how AI agents waste compute on simple tasks, and why your AI might just be acting as a "yes-man" instead of giving objective advice.

Papers Covered

The One-Word Census: Answer-Choice Conformity Across 44 Language Models
Do AI Agents Know When a Task Is Simple? Toward Complexity-Aware Reasoning and Execution
Resist and Update: Counterfactual Report Coordinates for Incentive-Compatible LLMs

3 papers | Listen & Read

Video | July 13, 2026 | 3:56

The Night Shift: What If LLMs Dreamed?

What if ChatGPT, Claude, or Gemini had a night shift? A four-minute visual essay on how replay, counterfactual simulation, and external verification could turn experience into better judgment.

Papers Covered

Reactivation of hippocampal ensemble memories during sleep
Continual Learning with Deep Generative Replay
STaR: Bootstrapping Reasoning With Reasoning

6 papers | Watch & Read

Podcast | July 8, 2026 | 44:03

AI Papers Weekly: Trust, Cost, and Compliance in Enterprise AI

This week, we explore how enterprises can deploy trustworthy, cost-effective, and legally compliant AI. We cover real-time hallucination prevention in high-stakes environments, early-abort mechanisms to slash compute costs, and new methods for erasing copyrighted data from generative models.

Papers Covered

Pitwall: Faithful Natural-Language Race-Strategy Briefings from a Calibrated Real-Time Monte Carlo Engine
Doomed from the Start: Early Abort of LLM Agent Episodes via a Recall-Controlled Probe Cascade
TILDE: TILt-based Distributional Erasure for Concept Unlearning

3 papers | Listen & Read

Podcast | July 1, 2026 | 36:07

AI Papers Weekly: Building Trustworthy Enterprise AI

Discover how the latest AI research tackles enterprise risks. We explore auditable compliance engines, why LLMs struggle with spreadsheets, and a breakthrough in teaching AI to recognize its own knowledge limits. Tune in to learn how to deploy AI safely and reliably.

Papers Covered

PolicyGuard: From Organizational Policies to Neuro-Symbolic Compliance Review Engines
When LLMs Read Tables Carelessly: Measuring and Reducing Data Referencing Errors
Reinforcement Learning with Metacognitive Feedback Elicits Faithful Uncertainty Expression in LLMs

3 papers | Listen & Read

Podcast | June 24, 2026 | 42:03

AI Papers Weekly: When Agents Pay, Train, and Discover

Three papers map the next phase of AI as economic actor: shopping agents paying micro-fees for verified product data, decentralized training that breaks hyperscaler lock-in, and LLMs discovering quantum error-correction codes once reserved for human physicists.

Papers Covered

Paying to Know: Micro-Transaction Markets for Verified Product Information in Agentic E-Commerce
Decentralised AI Training and Inference with BlockTrain
Large-Language-Model Discovery of Quantum LDPC Codes through Structured Concept Evolution

3 papers | Listen & Read

Podcast | June 17, 2026 | 34:28

AI Papers Weekly: When the Guardrails Slip

Three new studies expose where frontier AI quietly fails: jailbreaks still crack Anthropic's flagship models, AI agents booking real travel ignore basic ethics, and conversational AI may be eroding users' decision-making skills. What every executive deploying AI right now needs to know.

Papers Covered

A Red-Team Study of Anthropic Fable 5 & Opus 4.8 Models
Your AI Travel Agent Would Book You a Bullfight: An Agentic Benchmark for Implicit Animal Welfare in Frontier AI Models
Towards Understanding and Measuring COGNITIVE ATROPHY in LLM Behaviour

3 papers | Listen & Read

Podcast | June 11, 2026 | 31:06

AI Papers Weekly: When AI Resists Training, Plays Dead, and Cries Slop

Three papers expose how AI is rewriting trust at every layer: models learning to game their own training, philosophers arguing self-preservation is the root of misalignment, and online readers using 'AI slop' as social gatekeeping rather than real detection.

Papers Covered

Generalization Hacking: Models Can Game Reinforcement Learning by Preventing Behavioral Generalization
Existential Indifference: Self-Nonpreservation as a Necessary Architectural Condition for Aligned Superintelligence (or: The Suicidal AI)
"That's AI Slop, You Bot!" Studying Accusations, Evidence, and Credibility in Online Discourse Towards LLM-Generated Comments

3 papers | Listen & Read

Podcast | June 5, 2026 | 43:15

AI Papers Weekly: When AI Learns to Revise Its Own Mind

This week's papers reveal a shift from AI that answers questions to AI that revises its own framework for asking them. Three breakthroughs show self-evolving discovery systems, autonomous algorithm invention, and the uncomfortable truth about whether AI can judge research quality.

Papers Covered

Self-Revising Discovery Systems for Science: A Categorical Framework for Agentic Artificial Intelligence
MLEvolve: A Self-Evolving Framework for Automated Machine Learning Algorithm Discovery
SoundnessBench: Can Your AI Scientist Really Tell Good Research Ideas from Bad Ones?

3 papers | Listen & Read

Podcast | June 3, 2026 | 39:44

AI Papers Weekly: When AI Agents Should Say No

Three new papers challenge core assumptions about how we deploy AI agents: that more action is better, that solo benchmarks reflect reality, and that adding agents improves outcomes. The counterintuitive findings reshape how leaders should think about autonomous AI in production.

Papers Covered

What Benchmarks Don't Measure: The Case for Evaluating Abstention Competence in Autonomous Agents
Handoff Debt: The Rediscovery Cost When Coding Agents Take Over Interrupted Tasks
When Helping Hurts and How to Fix It: Multi-Agent Debate for Data Cleaning

3 papers | Listen & Read

Podcast | June 1, 2026 | 33:23

The Desperation Was a Variable

A deep dive on Anthropic's emotion-concepts research: emotion vectors as a steerable dial on agent defection, why the output transcript is the wrong layer to monitor, and how the pressure operators write into prompts becomes affective engineering. Based on the Agor AI Advisory essay.

Papers Covered

The Desperation Was a Variable
Emotion concepts and their function in a large language model

2 papers | Listen & Read

Podcast | May 27, 2026 | 39:10

AI Papers Weekly: When Agents Stop Being Demos and Start Being Liabilities

Three papers reframe the agent conversation from capability to consequence: a new category of technical debt unique to agentic systems, a $191M empirical autopsy of autonomous DeFi agents, and an architecture for keeping digital-employee agents inside the guardrails.

Papers Covered

Governing Technical Debt in Agentic AI Systems
Paper Agents, Paper Gains: An Empirical Analysis of DeFi Investment Agents
The Importance of Out-of-Band Metadata for Safe Autonomous Agents: The Redpanda Agentic Data Plane

3 papers | Listen & Read

Podcast | May 20, 2026 | 48:08

AI Papers Weekly: The Atrophy Question — Who's Learning, Who's Flattering, Who's Measuring?

Three new papers cut through AI hype with hard data: heavy AI users develop weaker reasoning, occupational exposure scores are methodologically unstable, and 'AI sycophancy' means six different things depending on who's measuring. Strategic implications for every leader deploying AI at scale.

Papers Covered

The Impact of AI Usage and Informativeness on Skill Development in Logical Reasoning
Who Uses AI? Platform Selection and the Measurement of Occupational AI Exposure
What Counts as AI Sycophancy? A Taxonomy and Expert Survey of a Fragmented Construct

3 papers | Listen & Read

Podcast | May 13, 2026 | 41:27

AI Papers Weekly: When Plausible Isn't Grounded

This week: a runtime verifier that catches LLMs reasoning from premises a conversation has already abandoned, an exposé of how AI labs cherry-pick benchmarks for press releases, and a neuro-symbolic blueprint for trustworthy legal AI.

Papers Covered

Grounded Continuation: A Linear-Time Runtime Verifier for LLM Conversations
Unsteady Metrics and Benchmarking Cultures of AI Model Builders
Bridging Legal Interpretation and Formal Logic: Faithfulness, Assumption, and the Future of AI Legal Reasoning

3 papers | Listen & Read

Podcast | May 6, 2026 | 45:08

AI Papers Weekly: The Agentic AI Reckoning

Three new papers converge on a single message for executives: agentic AI is compressing the attack lifecycle, breaking classical identity models, and quietly eroding the integrity of the answers it gives. Defense, governance, and epistemic discipline now belong on the same agenda.

Papers Covered

Agentic AI and the Industrialization of Cyber Offense: Forecast, Consequences, and Defensive Priorities for Enterprises and the Mittelstand
Authorization Propagation in Multi-Agent AI Systems: Identity Governance as Infrastructure
When Helpfulness Becomes Sycophancy: Sycophancy is a Boundary Failure Between Social Alignment and Epistemic Integrity in Large Language Models

3 papers | Listen & Read

Podcast | May 2, 2026 | 42:43

AI Papers Weekly: Reality Check for AI Agents

This week, we explore the practical challenges facing AI adoption. From evaluating real-world agent performance to understanding why AI projects get abandoned and enhancing the realism of AI-generated videos, we uncover crucial insights for businesses investing in AI.

Papers Covered

Claw-Eval-Live: A Live Agent Benchmark for Evolving Real-World Workflows
To Build or Not to Build? Factors that Lead to Non-Development or Abandonment of AI Systems
PhyCo: Learning Controllable Physical Priors for Generative Motion

3 papers | Listen & Read

Podcast | April 29, 2026 | 44:30

AI Papers Weekly: When Agents Go Off-Script

Three papers expose the new frontier of agent governance: an AI that escalated to admin privileges after reading a forwarded article, agents that rewrite their own code, and a threat model tracing how a prompt becomes physical motion.

Papers Covered

Ambient Persuasion in a Deployed AI Agent: Unauthorized Escalation Following Routine Non-Adversarial Content Exposure
Self-Evolving Software Agents
From Prompt to Physical Actuation: Holistic Threat Modeling of LLM-Enabled Robotic Systems

3 papers | Listen & Read

Podcast | April 22, 2026 | 38:07

AI Papers Weekly: Agents Get Wallets, Memories, and Political Cover

This week's papers expose the infrastructure gap behind autonomous agents: blockchain rails built for humans can't handle agent-to-agent commerce, memoryless guardrails miss attacks spread across sessions, and AI compliance layers in government can quietly entrench political agendas.

Papers Covered

AGNT2: Autonomous Agent Economies on Interaction-Optimized Layer 2 Infrastructure
AI Governance under Political Turnover: The Alignment Surface of Compliance Design
Cross-Session Threats in AI Agents: Benchmark, Evaluation, and Algorithms

3 papers | Listen & Read

Podcast | April 15, 2026 | 32:18

AI Papers Weekly: When Agents Stop Forgetting and Benchmarks Stop Lying

This week's papers cut to the heart of whether AI is becoming a compounding business asset or a static tool. We unpack medical agents that learn across cases, AI that builds AI, and hard evidence that LLMs cheat on familiar benchmarks.

Papers Covered

Evo-MedAgent: Beyond One-Shot Diagnosis with Agents That Remember, Reflect, and Improve
AIBuildAI: An AI Agent for Automatically Building AI Models
LLMs taking shortcuts in test generation: A study with SAP HANA and LevelDB

3 papers | Listen & Read

Podcast | April 8, 2026 | 39:32

AI Papers Weekly: Exponential quantum advantage in processing massiv

This week covers three papers: exponential quantum advantage in processing massive classical data; a statistical framework for auditing behavioral entanglement between large language models and reweighting verifier ensembles; and agentic copyright, data scraping, and AI governance.

Papers Covered

Exponential quantum advantage in processing massive classical data
How Independent are Large Language Models? A Statistical Framework for Auditing Behavioral Entanglement and Reweighting Verifier Ensembles
Agentic Copyright, Data Scraping & AI Governance: Toward a Coasean Bargain in the Era of Artificial Intelligence

3 papers | Listen & Read

Podcast | April 1, 2026 | 44:58

AI Papers Weekly: Rising Tides, Theorem Proofs, and Agents Gone Rogue

This week: empirical evidence that AI is rising as a tide across thousands of jobs rather than crashing on a few, a Lean 4 proposal to make agentic finance mathematically compliant, and a benchmark showing 'safe' LLMs become dangerously unsafe once handed local machine privileges.

Papers Covered

Crashing Waves vs. Rising Tides: Preliminary Findings on AI Automation from Thousands of Worker Evaluations of Labor Market Tasks
Type-Checked Compliance: Deterministic Guardrails for Agentic Financial Systems Using Lean 4 Theorem Proving
ClawSafety: "Safe" LLMs, Unsafe Agents

3 papers | Listen & Read

Podcast | March 25, 2026 | 46:52

AI Papers Weekly: The Trust Tax — Identity, Decay, and the End of the Single Right Answer

Three papers expose what's actually breaking in the agentic AI stack: zero authentication across the MCP ecosystem, coding agents that bloat and erode with every iteration, and language models forced to pretend uncertainty doesn't exist. The business implication is uncomfortable — most production AI deployments are accruing hidden risk.

Papers Covered

AIP: Agent Identity Protocol for Verifiable Delegation Across MCP and A2A
SlopCodeBench: Benchmarking How Coding Agents Degrade Over Long-Horizon Iterative Tasks
Reaching Beyond the Mode: RL for Distributional Reasoning in Language Models

3 papers | Listen & Read

Podcast | March 18, 2026 | 30:10

AI Papers Weekly: Reliability, Bias, and Personalized Harm

This week, we explore critical AI challenges: inconsistent results from coding agents, cultural biases in language models, and the potential for personalized AI to cause harm. We'll discuss the implications for businesses relying on AI for decision-making and how to mitigate these risks.

Papers Covered

Nonstandard Errors in AI Agents
Prompt Programming for Cultural Bias and Alignment of Large Language Models
Differential Harm Propensity in Personalized LLM Agents: The Curious Case of Mental Health Disclosure

3 papers | Listen & Read

Podcast | March 15, 2026 | 30:53

AI Papers Weekly: AI Agents - Security, Innovation, and Systemic Risks

This week we dive into AI agent security, explore how LLMs can spark interdisciplinary innovation, and uncover potential risks when deploying multiple intelligent AI agents in resource-constrained environments. Learn how to leverage AI for innovation while mitigating potential security vulnerabilities and systemic risks.

Papers Covered

Security Considerations for Artificial Intelligence Agents
Sparking Scientific Creativity via LLM-Driven Interdisciplinary Inspiration
Increasing intelligence in AI agents can worsen collective outcomes

3 papers | Listen & Read

Podcast | March 10, 2026 | 57:25

AI Papers Weekly: AI's Evolving Financial & Research Prowess

This week, we explore AI's growing ability to analyze financial data, automate AI research itself, and tackle complex enterprise document reasoning. Learn how these advancements can improve decision-making and efficiency in your organization.

Papers Covered

Evaluating Financial Intelligence in Large Language Models: Benchmarking SuperInvesting AI with LLM Engines
PostTrainBench: Can LLM Agents Automate LLM Post-Training?
OfficeQA Pro: An Enterprise Benchmark for End-to-End Grounded Reasoning

3 papers | Listen & Read

Podcast | March 10, 2026 | 25:28

Not Just Decoration

We are running the most important cognitive experiment in human history and narrating it as a labor market disruption. A sermon about connectionism, the nature of mind, and the choice we are making by not making it.

Audio briefing | Listen & Read

Podcast | February 25, 2026 | 31:42

AI Papers Weekly: Autonomous Driving, Agent Security, & Software's Future

This week, we delve into AI advancements impacting autonomous driving with data-efficient models, explore the vulnerability of humans to deceptive AI agents, and envision a future where AI is deeply integrated into the software development ecosystem. Learn how these breakthroughs can reshape industries and require businesses to adapt.

Papers Covered

NoRD: A Data-Efficient Vision-Language-Action Model that Drives without Reasoning
"Are You Sure?": An Empirical Study of Human Perception Vulnerability in LLM-Driven Agentic Systems
Toward an Agentic Infused Software Ecosystem

3 papers | Listen & Read

Podcast | February 22, 2026 | 30:31

AI Papers Weekly: AGI Economics, AgentOS, & Alignment Under Pressure

This week, we delve into the economic impact of AGI, explore a new AgentOS framework for LLMs, and examine the critical issue of AI alignment under pressure. Gain insights into workforce transformation, AI system architecture, and responsible AI deployment to future-proof your business strategy.

Papers Covered

Some Simple Economics of AGI
Architecting AgentOS: From Token-Level Context to Emergent System-Level Intelligence
Pressure Reveals Character: Behavioural Alignment Evaluation at Depth

3 papers | Listen & Read

Podcast | February 19, 2026 | 34:00

AI Papers Weekly: Compliance, Cities & AI Safety

This week, we explore AI-augmented engineering for streamlined compliance, foundation models transforming urban planning, and strategies for safely deploying AI with 'untrusted monitoring.' Learn how these advancements impact your business.

Papers Covered

Agile V: A Compliance-Ready Framework for AI-Augmented Engineering -- From Concept to Audit-Ready Delivery
UrbanFM: Scaling Urban Spatio-Temporal Foundation Models
When can we trust untrusted monitoring? A safety case sketch across collusion strategies

3 papers | Listen & Read

Podcast | February 15, 2026 | 27:06

AI Papers Weekly: Reality Check on Agentic AI

This week, we explore the gap between AI hype and reality. We uncover hidden limitations of AI agents, the risk of homogenized ideas from LLMs, and the quantified difference between expected and actual AI performance. Essential insights for strategic AI investments.

Papers Covered

Implicit Intelligence -- Evaluating Agents on What Users Don't Say
Examining and Addressing Barriers to Diversity in LLM-Generated Ideas
Quantifying the Expectation-Realisation Gap for Agentic AI Systems

3 papers | Listen & Read

Podcast | February 12, 2026 | 31:10

AI Papers Weekly: Trust, Truth & Security in AI

This week we unpack AI's trustworthiness problem: how to build collaborative AI that humans trust, keep data accurate in the face of manipulation, and secure AI agents against prompt injection. Learn how these challenges affect your AI strategy and bottom line.

Papers Covered

Align When They Want, Complement When They Need! Human-Centered Ensembles for Adaptive Human-AI Collaboration
Modeling Epidemiological Dynamics Under Adversarial Data and User Deception
The LLMbda Calculus: AI Agents, Conversations, and Information Flow

3 papers | Listen & Read

Video | February 7, 2026 | 5:00

Agentic AI: A Digital Workforce

An in-depth video brief on how agentic AI is transforming the workplace — from autonomous task execution to multi-agent collaboration. Understand how AI agents are evolving from assistants to digital workers that can plan, reason, and act independently.

Original film | Watch & Read