The AI Fundamentalists

Dr. Andrew Clark & Sid Mangalik
Jan 6, 2026 • 40min

Why validity beats scale when building multi‑step AI systems

In this episode, Dr. Sebastian (Seb) Benthall joins us to discuss research from his and Andrew's paper, “Validity Is What You Need,” on agentic AI that actually works in the real world. Our discussion connects systems engineering, mechanism design, and requirements engineering to multi‑step AI that delivers measurable enterprise outcomes.

• Defining agentic AI beyond LLM hype
• Limits of scale and the need for multi‑step control
• Tool use, compounding errors, and guardrails
• Systems engineering patterns for AI reliability
• Principal–agent framing for governance
• Mechanism design for multi‑stakeholder alignment
• Requirements engineering as the crux of validity
• Hybrid stacks: LLM interface, deterministic solvers
• Regression testing through model swaps and drift
• Moving from universal copilots to fit‑for‑purpose agents

You can also catch more of Seb's research on our podcast. Tune in to Contextual integrity and differential privacy: Theory versus application.

What did you think? Let us know.

Do you have a question or a discussion topic for the AI Fundamentalists? Connect with them to comment on your favorite topics:
• LinkedIn - Episode summaries, shares of cited articles, and more.
• YouTube - Was it something that we said? Good. Share your favorite quotes.
• Visit our page - see past episodes and submit your feedback! It continues to inspire future episodes.
Dec 22, 2025 • 42min

2025 AI review: Why LLMs stalled and the outlook for 2026

Here it is! We review the year when scaling large AI models hit its ceiling, Google reclaimed momentum with efficient vertical integration, and the market shifted from hype to viability. Join us as we talk about why human-in-the-loop is failing, why generative AI agents validating other agents compounds errors, and how small expert data quietly beat the big models.

• Google’s resurgence with Gemini 3.0 and TPU-driven efficiency
• Monetization pressures and ads in co-pilot assistants
• Diminishing returns from LLM scaling
• Human-in-the-loop pitfalls and incentives
• Agents vs validation and compounding error
• Small, high-quality data outperforming synthetic
• Expert systems, causality, and interpretability
• Research trends return toward statistical rigor
• 2026 outlook for ROI, governance, and trust

We remain focused on the responsible use of AI. And while the market continues to adjust expectations for return on investment from AI, we're excited to see companies exploring "return on purpose" as the new foray into transformative AI systems for their business. What are you excited about for AI in 2026?
Dec 9, 2025 • 50min

Big data, small data, and AI oversight with David Sandberg

In this episode, we look at the actuarial principles that make models safer: parallel modeling, small data with provenance, and real-time human supervision. To help us, long-time insurtech and startup advisor David Sandberg, FSA, MAAA, CERA, joins us to share more about his actuarial expertise in data management and AI. We also challenge the hype around AI by reframing it as a prediction machine and putting human judgment at the beginning, middle, and end. By the end, you might think about “human-in-the-loop” in a whole new way.

• Actuarial valuation debates and why parallel models win
• AI’s real value: enhance and accelerate the growth of human capital
• Transparency, accountability, and enforceable standards
• Prediction versus decision and learning from actual-to-expected
• Small data as interpretable, traceable fuel for insight
• Drift, regime shifts, and limits of regression and LLMs
• Mapping decisions, setting risk appetite, and enterprise risk management (ERM) for AI
• Where humans belong: the beginning, middle, and end of the system
• Agentic AI complexity versus validated end-to-end systems
• Training judgment with tools that force critique and citation

Cultural references:
• Foundation, AppleTV
• The Feeling of Power, Isaac Asimov
• Player Piano, Kurt Vonnegut

For more information, see Actuarial and data science: Bridging the gap.
Nov 11, 2025 • 38min

Metaphysics and modern AI: What is space and time?

We explore how space and time form a single fabric, testing our daily beliefs through questions about free-fall, black holes, speed, and momentum to reveal what models get right and where they break. To help us, we’re excited to have our friend David Theriault, a science and sci-fi aficionado, and our resident astrophysicist, Rachel Losacco, to talk about practical exploration in space and time. They'll even unpack a few concerns they have about how space and time were depicted in the movie Interstellar (2014).

Highlights:
• Introduction: Why fundamentals beat shortcuts in science and AI
• Time as experience versus physical parameter
• Plato’s ideals versus Aristotle’s change as framing tools
• Free-fall, G-forces, and what we actually feel
• Gravity wells, curvature, and moving through space-time
• Black holes, tidal forces, and spaghettification
• Momentum and speed: Laser probe, photon momentum, and braking limits
• Doppler shifts, time dilation, and length contraction
• Why light’s speed stays constant across frames
• Modeling causality and preparing for the next paradigm

This episode about space and time is the second in our series about metaphysics and modern AI. Each topic in the series is leading to the fundamental question, "Should AI try to think?" Step away from your keyboard and enjoy this journey with us.

Previous episodes:
• Introduction: Metaphysics and modern AI
• What is reality?
Oct 27, 2025 • 39min

Metaphysics and modern AI: What is reality?

In the first episode of our series on metaphysics, Michael Herman, our guest from Episode #14 on “What is consciousness?”, joins us to discuss reality, and more specifically the question of objects in reality. The team explores Plato’s forms, Aristotle’s realism, emergence, and embodiment to determine whether AI models can approximate what humans uniquely experience.

• Defining objects via properties, perception, and persistence
• Banana and circle examples for identity and ideals
• Plato versus Aristotle on forms and realism
• Ship of Theseus and continuity through change
• Samples, complexes, and emergence in systems
• Embodiment, consciousness, and why LLMs lack lived unity
• Existentialist focus on subjective reality and meaning
• Why metaphysics matters for AI governance and safety

Join us for the next part of the metaphysics series to explore space and time. Subscribe now.

What we're reading:
• [Mumford's] Metaphysics: A Very Short Introduction (Andrew)
Oct 7, 2025 • 16min

Metaphysics and modern AI: What is thinking? - Series Intro

This episode is the intro to a special project by The AI Fundamentalists’ hosts and friends. We hope you're ready for a metaphysics mini‑series to explore what thinking and reasoning really mean and how those definitions should shape AI research. Join us for thought-provoking discussions as we tackle basic questions: What is metaphysics and its relevance to AI? What constitutes reality? What defines thinking? How do we understand time? And perhaps most importantly, should AI systems attempt to "think," or are we approaching the entire concept incorrectly?

Show notes:
• Why metaphysics matters for AI foundations
• Definitions of thinking from peers and what they imply
• Mixture‑of‑experts, ranking, and the illusion of reasoning
• Turing test limits versus deliberation and causality
• Towers of Hanoi, agentic workflows, and brittle stepwise reasoning
• Math, context, and multi‑component system failures
• Proposed plan for the series and areas to explore
• Invitation for resources, critiques, and future guests

We hope you enjoy this philosophical journey to examine the intersection of ancient philosophical questions and cutting-edge technology.
Sep 30, 2025 • 35min

AI in practice: Guardrails and security for LLMs

In this episode, we talk about practical guardrails for LLMs with data scientist Nicholas Brathwaite. We focus on how to stop PII leaks, retrieve data, and evaluate safety with real limits. We weigh managed solutions like AWS Bedrock against open-source approaches and discuss when to skip LLMs altogether.

• Why guardrails matter for PII, secrets, and access control
• Where to place controls across prompt, training, and output
• Prompt injection, jailbreaks, and adversarial handling
• RAG design with vector DB separation and permissions
• Evaluation methods, risk scoring, and cost trade-offs
• AWS Bedrock guardrails vs open-source customization
• Domain-adapted safety models and policy matching
• When deterministic systems beat LLM complexity

This episode is part of our "AI in Practice" series, where we invite guests to talk about the reality of their work in AI. From hands-on development to scientific research, be sure to check out other episodes under this heading in our listings.

Related research:
• Building trustworthy AI: Guardrail technologies and strategies (N. Brathwaite)
• Nic's GitHub
Sep 4, 2025 • 42min

AI in practice: LLMs, psychology research, and mental health

We’re excited to have Adi Ganesan, a PhD researcher at Stony Brook University, the University of Pennsylvania, and Vanderbilt, on the show. We’ll talk about how large language models (LLMs) are being tested and used in psychology, citing examples from mental health research. Fun fact: Adi was Sid's research partner during his Ph.D. program.

Discussion highlights
• Language models struggle with certain aspects of therapy, including being over-eager to solve problems rather than building understanding
• Current models are poor at detecting psychomotor symptoms from text alone but are oversensitive to suicidality markers
• Cognitive reframing assistance represents a promising application where LLMs can help identify thought traps
• Proper evaluation frameworks must include privacy, security, effectiveness, and appropriate engagement levels
• Theory of mind remains a significant challenge for LLMs in therapeutic contexts; example: The Sally-Anne Test
• Responsible implementation requires staged evaluation before patient-facing deployment

Resources
To learn more about Adi's research and topics discussed in this episode, check out the following resources:
• Large language models could change the future of behavioral healthcare: A proposal for responsible development and evaluation | npj Mental Health Research
• Therapist Behaviors paper: [2401.00820] A Computational Framework for Behavioral Assessment of LLM Therapists
• Cognitive reframing paper: Cognitive Reframing of Negative Thoughts through Human-Language Model Interaction - ACL Anthology
• Faux Pas paper: Testing theory of mind in large language models and humans | Nature Human Behaviour
• READI: Readiness Evaluation for Artificial Intelligence-Mental Health Deployment and Implementation (READI): A Review and Proposed Framework
• GPT-4’s Schema of Depression: Explaining GPT-4’s Schema of Depression Using Machine Behavior Analysis
• Adi’s Profile: Adithya V Ganesan - Google Scholar
Aug 19, 2025 • 23min

LLM scaling: Is GPT-5 near the end of exponential growth?

The release of OpenAI GPT-5 marks a significant turning point in AI development, but maybe not the one most enthusiasts had envisioned. The latest version seems to reveal the natural ceiling of current language model capabilities with incremental rather than revolutionary improvements over GPT-4. Sid and Andrew call back to some of the model-building basics that have led to this point to give their assessment of the early days of the GPT-5 release.

• AI's version of Moore's Law is slowing down dramatically with GPT-5
• OpenAI appears to be experiencing an identity crisis, uncertain whether to target consumers or enterprises
• Running out of human-written data is a fundamental barrier to continued exponential improvement
• Synthetic data cannot provide the same quality as original human content
• Health-related usage of LLMs presents particularly dangerous applications
• Users developing dependencies on specific model behaviors face disruption when models change
• Model outputs are now being verified rather than just inputs, representing a small improvement in safety
• The next phase of AI development may involve revisiting reinforcement learning and expert systems

* Review the GPT-5 system card for further information

Follow The AI Fundamentalists on your favorite podcast app for more discussions on the direction of generative AI and building better AI systems.

This summary was AI-generated from the original transcript of the podcast that is linked to this episode.
Jul 22, 2025 • 37min

AI governance: Building smarter AI agents from the fundamentals, part 4

Sid Mangalik and Andrew Clark explore the unique governance challenges of agentic AI systems, highlighting the compounding error rates, security risks, and hidden costs that organizations must address when implementing multi-step AI processes.

Show notes:
• Agentic AI systems require governance at every step: perception, reasoning, action, and learning
• Error rates compound dramatically in multi-step processes: a model that is 90% accurate per step becomes only about 65% accurate over four steps
• Two-way information flow creates new security and confidentiality vulnerabilities. For example, targeted prompting to improve awareness comes at the cost of performance. (arXiv, May 24, 2025)
• Traditional governance approaches are insufficient for the complexity of agentic systems
• Organizations must implement granular monitoring, logging, and validation for each component
• Human-in-the-loop oversight is not a substitute for robust governance frameworks
• The true cost of agentic systems includes governance overhead, monitoring tools, and human expertise

Make sure you check out Part 1: Mechanism design, Part 2: Utility functions, and Part 3: Linear programming. If you're building agentic AI systems, we'd love to hear your questions and experiences. Contact us.

What we're reading:
We took a reading "break" this episode to celebrate Sid! This month, he successfully defended his Ph.D. thesis on "Psychological Health and Belief Measurement at Scale Through Language." Say congrats!
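
The compounding-error figure in the show notes (a 90%-accurate step repeated four times) can be checked with a few lines of arithmetic. This is a minimal sketch, assuming each step succeeds independently and that a single failed step fails the whole chain; the function name `chain_accuracy` is hypothetical and does not come from the episode.

```python
def chain_accuracy(per_step_accuracy: float, steps: int) -> float:
    """Chance that every step in a multi-step chain succeeds.

    Assumption (not from the episode): steps fail independently, and one
    failed step makes the whole chain fail, so accuracies multiply.
    """
    if not 0.0 <= per_step_accuracy <= 1.0:
        raise ValueError("per_step_accuracy must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step_accuracy ** steps


if __name__ == "__main__":
    # End-to-end accuracy for chains of one to four steps at 90% per step.
    for steps in range(1, 5):
        print(f"{steps} step(s): {chain_accuracy(0.9, steps):.1%}")
```

Under these assumptions, four steps at 90% come out to roughly 65.6%, in line with the figure quoted in the notes. Real pipelines may do better or worse, since errors across steps are often correlated.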
