Dev and Doc: AI For Healthcare Podcast

Dev and Doc
Dec 27, 2025 • 43min

#32 2025 in Review: Our AI Healthcare Predictions and Hot Takes

Reviewing Dev & Doc's 2024/2025 AI Healthcare Predictions. What a year it's been! In this episode of Dev & Doc, we look back at the predictions we made almost 2 years ago. What did we get right? (And what AI developments did we completely overlook that occurred in 2025?)
📺 Watch where it all began: Our Original 2024 AI Predictions Episode
It's going to be a fun one :) What are your predictions for 2026? Let us know!
👋 Hey! If you are enjoying our conversations, reach out, share your thoughts and journey with us. Don't forget to subscribe whilst you're here!
Timestamps:
00:00 Highlight
01:06 Ambient: Biggest game changer
04:01 Open Source will catch up to closed source
09:20 Big AI companies will fail
10:52 There will be more trials involving large language models
13:55 Industry will lead progress
19:36 LLMs are not going to replace therapists or doctors
22:17 AI psychosis and big tech
27:19 People with AI replace people without AI
29:12 Radiology AI will become more widespread
31:00 Dev was way too optimistic about OpenAI; Google is coming for you
33:05 Predictions we missed: GOOGLE KILLED EVERYONE
37:10 Uprising of China Open source, xAI
39:15 RAG-based search products like OpenEvidence, MedWise (UK), Prof Valmed
The Team:
👨🏻‍⚕️ Doc - Dr. Joshua Au Yeung: LinkedIn
🤖 Dev - Zeljko Kraljevic: Twitter/X
References:
• Nuraxi: https://www.nuraxi.ai/
• EU's Earth twin: https://destination-earth.eu/
• Blog on language representation of biology: Read here
• Foresight GPT: The Lancet
Connect With Us:
📺 YouTube
🍎 Apple Podcasts
✉️ Substack
📧 Enquiries: Devanddoc@gmail.com
Credits:
🎞️ Editor: Dragan Kraljević (Instagram)
🎨 Brand Design: Ana Grigorovici (Behance)
Dec 19, 2025 • 53min

#31 AI & Digital Twins: The Next Evolution for Personalised Medicine

In this episode of Dev and Doc, we deep dive into the world of Digital Twins. Popularised in engineering, we explore key concepts and ideas before looking to the future: how we can combine digital twins with today's powerful AI/GPT-based models (LLMs) and healthcare data to bring on a new revolution of healthcare to the world.
This means the chance for every single person to create digital twins of themselves where they can understand their personal health, risks, disease trajectories, and treatment outcomes by simulating the future. This is the true promise of precision medicine for all. Crazy, right?
Dev and Doc recently joined forces to build this exact vision in their start-up, Nuraxi.
🚀 Nuraxi is a deep-tech company focused on advancing health and precision medicine through artificial intelligence and digital twin technology. https://www.nuraxi.ai/
👋 Hey! If you are enjoying our conversations, reach out, share your thoughts and journey with us. Don't forget to subscribe whilst you're here :)
Timestamps:
00:00 - Intro: Digital Twin (DT)
01:22 - Start / Introduction to DT
08:28 - Levels of DTs
18:33 - Using natural language to capture biology complexities and scales
26:45 - First time in humanity: Combination of AI, compute, healthcare data, and wearables
33:15 - Building Agentic Health Twins at Nuraxi
38:15 - Combining AI and Digital Twins: GPT-based simulations of the future
44:20 - To change healthcare, we must be able to predict the future
49:10 - Future directions: From molecular and organ twins to Population Twins
The Hosts:
👨🏻‍⚕️ Doc - Dr. Joshua Au Yeung - LinkedIn Profile
🤖 Dev - Zeljko Kraljevic - Twitter Profile
References:
• Nuraxi: Website
• EU's Earth Twin: Destination Earth
• Blog on language representation of biology: Read here
• Foresight GPT (The Lancet): Read Paper
Listen & Subscribe:
📺 YouTube
🎧 Spotify
🍏 Apple Podcasts
📝 Substack
Credits:
📧 Enquiries: Devanddoc@gmail.com
🎞️ Editor: Dragan Kraljević (Instagram)
🎨 Brand Design: Ana Grigorovici (Behance)
Oct 22, 2025 • 37min

#30 The Age of AI agents in healthcare (Live Podcast at HETT 2025)

Join Josh and Zeljko live at HETT 2025 in London, covering the most exciting upcoming topics and highlights in AI for healthcare. The duo live and breathe AI for healthcare and, together, have worked across every area of HealthTech: from the hospital frontlines, to university research, to NHS implementation, to building industry-grade agents including AI scribes, computer control and digital twins, to product and compliance. This is one not to miss!
00:00 start and intro
2:15 What are AI agents? (and why they're different from chatbots)
3:52 AI scribes: the 150 company sprint to "scribe plus" features
8:02 AI psychosis and mental health - all LLMs reinforce delusional beliefs
9:34 Computer control: Automating hospital workflows by mimicking human actions
13:42 Digital twins for health are the future: A safer path forward?
18:40 How does the national health service become AI enabled?
22:22 closing remarks - Is AI in healthcare a hype or hope?
25:12 questions - digital twins for individuals or for cohorts?
26:52 questions - Lessons from building AVTs and digital twins for consumer space
29:02 questions - LLM clinical summarisation - risks and benefits
31:17 questions - ethics of AI vs Human errors. is it the same?
33:02 questions - challenges and barriers to AI deployment in NHS
👋 Hey! If you are enjoying our conversations, reach out, share your thoughts and journey with us. Don't forget to subscribe whilst you're here :)
👨🏻‍⚕️ Doc - Dr. Joshua Au Yeung - https://www.linkedin.com/in/dr-joshua-auyeung/
🤖 Dev - Zeljko Kraljevic - https://twitter.com/zeljkokr
Follow us:
YT - https://youtube.com/@DevAndDoc
Spotify - https://podcasters.spotify.com/pod/show/devanddoc
Apple - https://podcasts.apple.com/gb/podcast/dev-and-doc-ai-for-healthcare-podcast/id1751495120
Substack - https://aiforhealthcare.substack.com/
For enquiries: 📧 Devanddoc@gmail.com
Credits:
🎞️ Editor - Dragan Kraljević - https://www.instagram.com/dragan_kraljevic/
🎨 Brand design and art direction - Ana Grigorovici - https://www.behance.net/anagrigorovici027d
Aug 22, 2025 • 55min

Everything you need to know about LLM benchmarks- Turing Test, OpenAI's Healthbench, ARC prize, LM arena

Wherever there has been AI, there have been benchmarks. From the Turing test, to society-changing benchmarks like MNIST and ImageNet, to modern problems like the ARC prize, benchmarks have served a vital purpose: measuring the performance of AI models. But something has shifted. In the LLM era, have benchmarks lost their utility and become mere advertisements for big tech? Even seemingly more sophisticated benchmarks like LM Arena can be gamed by tech giants. We also deep dive into healthcare benchmarks like OpenAI's Healthbench (deeply problematic) and Microsoft's AI-DXO orchestrator agent for diagnosis. Where is this all going? How do we make the perfect benchmark? Or is the real work to be done afterwards in the real world?
👋 Hey! If you are enjoying our conversations, reach out, share your thoughts and journey with us. Don't forget to subscribe whilst you're here :)
Timestamps:
00:00 Intro - The OG benchmarks - Turing test, MNIST, ImageNet
06:40 Are large language model benchmarks similar to humans taking tests?
10:05 Are we testing model capability vs production ready?
12:00 LLM era - data contamination
15:30 LM Arena - The leaderboard illusion paper - how big tech games benchmarks
28:35 Goodhart's law - When a measure becomes a target, it ceases to be a good measure
32:05 Some good benchmarks - games - Pokemon, ARC prize, Minecraft
34:35 Medical benchmarks - OpenAI's Healthbench has some big problems
46:50 Microsoft AI-DXO orchestrator for case reports
Connect with Us
Your Hosts:
👨🏻‍⚕️ Doc - Dr. Joshua Au Yeung - LinkedIn
🤖 Dev - Zeljko Kraljevic - Twitter
Follow & Subscribe:
YT: https://youtube.com/@DevAndDoc
Spotify: Follow us on Spotify
Apple Podcasts: Listen on Apple Podcasts
Substack: https://aiforhealthcare.substack.com/
For enquiries: 📧 Devanddoc@gmail.com
Production Credits
🎞️ Editor: Dragan Kraljević - Instagram
🎨 Brand & Art: Ana Grigorovici - Behance
May 9, 2025 • 1h 1min

#28 AI agents explained - Manus AI, computer control, Agentic workflows (healthcare)

AI agents are here, but how did we get here in the first place? How do we build and leverage AI agents for high stakes domains like healthcare? In this episode of Dev and Doc, we go deep into the forest that is AI agents and computer control - starting from the "caveman" era of LLMs discovering tools, to cultivating intelligent models and agentic workflows. We dissect everyday agents like MANUS AI, and deep dive into how, where and when AI agents should be used. Are these agents hype or hope? Is this actually the second DeepSeek moment?
👋 Hey! If you are enjoying our conversations, reach out, share your thoughts and journey with us. Don't forget to subscribe whilst you're here :)
Episode Timestamps:
00:00 Highlight
3:13 start / intro
5:20 LLM's caveman era - tool usage
6:46 Agents have autonomy and interact with environment
11:15 workflows and agentic flows
15:30 when should you be using an agent?
24:27 vibe coding is like driving a car
29:07 Demo - MANUS gathering financial trends, computer control
35:55 Demo MANUS AI - website creation for Autism Assessment
49:05 computer control factions - Freedom vs Process automation
55:00 Autism website testing
59:13 summary + end
Hosts:
👨🏻‍⚕️ Doc - Dr. Joshua Au Yeung - https://www.linkedin.com/in/dr-joshua-auyeung/
🤖 Dev - Zeljko Kraljevic - https://twitter.com/zeljkokr
Find us on:
YT - https://youtube.com/@DevAndDoc
Spotify - https://podcasters.spotify.com/pod/show/devanddoc
Apple - https://podcasts.apple.com/gb/podcast/dev-and-doc-ai-for-healthcare-podcast/id1751495120
Substack - https://aiforhealthcare.substack.com/
For enquiries: 📧 Devanddoc@gmail.com
Credits:
🎞️ Editor - Dragan Kraljević https://www.instagram.com/dragan_kraljevic/
🎨 Brand design and art direction - Ana Grigorovici https://www.behance.net/anagrigorovici027d
Feb 26, 2025 • 58min

#27 Exploring Claude Sonnet 3.7 for healthcare

Can Claude perform a range of complex clinical tasks? Dev and Doc are here to investigate.
Claude Sonnet 3.7 was released less than 48 hours ago. The model is highly intelligent and is one of the best we have seen in recent memory. Definitely passes the vibe check.
We give some amazing examples of coding with Claude using few-shot prompts, cover technical and clinical evaluations, and share our first thoughts. We even tested Claude on taking a patient history!
NB - PLEASE don't do this at home. Obviously this is a demo and we do not in any way condone or recommend using an LLM as your doctor or healthcare provider; we are just demonstrating what the future could be. If you are sick, please seek a medical professional.
👋 Hey! If you are enjoying our conversations, reach out, share your thoughts and journey with us. Don't forget to subscribe whilst you're here :)
TIMESTAMPS
00:00 start + highlights
01:54 Introduction
08:54 Benchmarks, state of the art
14:44 guardrails, refusals, AI safety and catastrophic risks
22:36 show and tell - great for coding and making video games!
26:54 example hospital runner
30:17 Medical use cases - clinical coding, biomedical entity extraction
37:04 only medical example in Claude model card - still hallucinating citations
38:37 making an anatomy app
40:10 forecasting clinical diagnoses
43:36 taking a medical history from a patient
53:33 wrap up
👨🏻‍⚕️ Doc - Dr. Joshua Au Yeung - linkedin.com/in/dr-joshua-auyeung
🤖 Dev - Zeljko Kraljevic - twitter.com/zeljkokr
YT: youtube.com/@DevAndDoc
Spotify: podcasters.spotify.com/pod/show/devanddoc
Apple: podcasts.apple.com/gb/podcast/dev-and-doc-ai-for-healthcare-podcast/id1751495120
Substack: aiforhealthcare.substack.com
For enquiries - 📧 Devanddoc@gmail.com
🎞️ Editor - Dragan Kraljević instagram.com/dragan_kraljevic
🎨 Brand design - Ana Grigorovici behance.net/anagrigorovici027d
Feb 21, 2025 • 57min

#26 Is it still worth doing a PhD in 2025? (Computer Science / Machine Learning)

Is it still worth doing a PhD in 2025? Is the academic system broken in this publish-or-perish landscape? When is a PhD not worth pursuing? About this Episode In this Dev and Doc episode, Zeljko (now associate professor!) and Josh (doctor, PhD drop out) talk about the good and the bad of PhD life. They provide insight into the academic world with a focus on computer science and machine learning. 👋 Connect With Us! Hey! If you are enjoying our conversations, reach out, share your thoughts and journey with us. Don't forget to subscribe whilst you're here :) 🎙️ Hosts 👨🏻‍⚕️ Doc - Dr. Joshua Au Yeung - LinkedIn 🤖 Dev - Zeljko Kraljevic - Twitter ⏳ Timestamps 00:00 - Start and highlight 01:42 - Intro 03:11 - What made you pursue PhD in the first place 05:05 - Industry or PhD first 10:00 - Positives - Moonshots 17:03 - Positives - Access to world experts and collaboration 20:55 - Positives - Open source and open science 24:49 - Positives - A good environment enables a smooth PhD 27:04 - Negatives - You are a one-man show 31:33 - Negatives - Publish or Perish 45:44 - Bring your research closer to the audience through blogs and other media, journals are legacy media 51:20 - Verdict - Is a PhD still worth it in 2025? 📢 Follow Us LinkedIn Newsletter YouTube Spotify Apple Podcasts Substack 📧 Contact Us For enquiries - devanddoc@gmail.com 🎞️ Video Production 🎬 Editor - Dragan Kraljević - Instagram 🎨 Brand Design & Art Direction - Ana Grigorovici - Behance
Feb 7, 2025 • 1h 21min

#25 Testing Deepseek R1 on Complex Medical Tasks. Here's what we found. (GRPO explainer)

Dev and Doc put Deepseek R1 to the test in a technical and clinical deep dive.
👋 Hey! If you are enjoying our conversations, reach out, share your thoughts and journey with us. Don't forget to subscribe whilst you're here :)
👨🏻‍⚕️ Doc - Dr. Joshua Au Yeung - https://www.linkedin.com/in/dr-joshua-au-yeung/
🤖 Dev - Zeljko Kraljevic - https://twitter.com/zeljkokr
TIMESTAMPS
00:00 Highlights
04:36 Intro
08:29 Response from OpenAI, Anthropic - model training costs, tightening restrictions on China, pricing wars
13:13 What an open-source Deepseek means for the world
15:38 Sam Altman and Dario Amodei feeling the pressure
23:10 TECHNICAL deep dive - RLHF, PPO, DPO
37:08 GRPO, R1's secret sauce
45:02 The aha moment, learning like a human?
50:25 Deepseek R1 training and controversy
59:08 Deepseek healthcare evaluation - Ethnic Bias
1:06:17 The diagnostic acid test (fail)
1:12:46 Coding clinical data / Medical billing (shout out SNOMED)
LinkedIn Newsletter - https://www.linkedin.com/build-relation/newsletter-follow?entityUrn=7216474068085026817
YT - https://youtube.com/@DevAndDoc
Spotify - https://podcasters.spotify.com/pod/show/devanddoc
Apple - https://podcasts.apple.com/gb/podcast/dev-and-doc-ai-for-healthcare-podcast/id1751495120
Substack - https://aiforhealthcare.substack.com/
For enquiries - 📧 Devanddoc@gmail.com
🎞️ Editor - Dragan Kraljević https://www.instagram.com/dragan_kraljevic/
🎨 Brand design and art direction - Ana Grigorovici https://www.behance.net/anagrigorovici027d
Jan 10, 2025 • 58min

#24 Significantly advancing LLMs with RAG (Google's Gemini 2.0, Deep Research, notebookLM)

Dev and Doc - Latest News
It's 2025, and Dev and Doc cover the latest news, including Google's Deep Research and NotebookLM, DeepMind's Promptbreeder, and Anthropic's new RAG approach. We also go through what retrieval augmented generation (RAG) is, and how this technique is advancing LLM performance.
👋 Hey! If you are enjoying our conversations, reach out, share your thoughts and journey with us. Don't forget to subscribe whilst you're here :)
Meet the Team
👨🏻‍⚕️ Doc - Dr. Joshua Au Yeung - LinkedIn
🤖 Dev - Zeljko Kraljevic - Twitter
Where to Follow Us
LinkedIn Newsletter
YouTube
Spotify
Apple Podcasts
Substack
Contact Us
📧 For enquiries - Devanddoc@gmail.com
Credits
🎞️ Editor - Dragan Kraljević - Instagram
🎨 Brand Design and Art Direction - Ana Grigorovici - Behance
Episode Timeline
00:00 Highlights
00:53 News - NotebookLM, OpenAI 12 days of Christmas
07:44 Change in the meta - post-training
11:34 Optimizing prompts with DeepMind Promptbreeder
13:20 Is OpenAI losing their lead against Google?
16:45 Deep Research vs Perplexity
24:18 AMIE and oncology
26:00 Deep Research results
30:20 RAG intro
33:14 Second pass RAG
36:20 RAG didn't take off
38:40 Wikichat
39:16 How do we improve on RAG?
41:11 Semantic/topic chunking, cross-encoders, agentic RAG
51:15 Google's Problem Decomposition
53:32 Anthropic's Contextual Retrieval Processing
56:07 Summary and wrap up
References
Cross Encoders
Wikichat
Google's Problem Decomposition
Anthropic's Contextual Retrieval
Google AMIE in Oncology
DeepMind's Promptbreeder
Sep 20, 2024 • 40min

#23 Can OpenAI's GPT o1 solve complex medical problems?

The discussion kicks off with the features of OpenAI's GPT o1 and its ability to tackle complex medical issues like diabetes. The hosts examine how AI models reason through challenges using a Tetris-like game, highlighting the hurdles of image processing. Their evaluations showcase GPT o1's strengths and weaknesses in healthcare diagnostics, prompting a critical look at AI's reliability. The conversation also covers opioid dose conversions and the importance of accuracy in medical coding.
