In this episode, James Zou, an assistant professor at Stanford University, discusses how ChatGPT's behavior has changed over time, comparing the March 2023 and June 2023 versions of GPT-3.5 and GPT-4. He also shares his thoughts on how surgical, CRISPR-like editing could change LLMs and AI systems, on monitoring behavioral changes in models, and on using Twitter data for pathology image analysis.
Podcast summary created with Snipd AI
Quick takeaways
The performance of ChatGPT varied significantly across different tasks, highlighting the need for more precise edits and control in large language models.
Leveraging data from social media platforms like Twitter can be valuable for training AI systems in specific domains, such as pathology image analysis.
Deep dives
ChatGPT and Changes in Behavior
The podcast episode discusses research on how the behavior of ChatGPT, a popular language model, has changed over time. The researchers conducted a systematic assessment, comparing the March 2023 and June 2023 versions of ChatGPT across a range of tasks. Surprisingly, they found that while the later version performed better on some tasks, it performed substantially worse on others, including seemingly simpler tasks like identifying prime numbers. The researchers explored possible reasons for this change in behavior, such as conflicting objectives and the concept of "pleiotropy", in which a change aimed at one behavior also affects others. They highlighted the need for more precise, surgical edits to large language models to improve control and transparency.
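A comparison of this kind can be sketched in a few lines. The example below is only an illustration of scoring two model snapshots on a primality task, not the researchers' actual evaluation code. The functions `snapshot_a` and `snapshot_b` are hypothetical stand-ins for real model calls, and their answers are invented for the demonstration.

```python
from typing import Callable, Dict, List


def is_prime(n: int) -> bool:
    """Ground truth by trial division."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def accuracy(answer_fn: Callable[[int], bool], numbers: List[int]) -> float:
    """Fraction of numbers on which answer_fn agrees with the ground truth."""
    correct = sum(1 for n in numbers if answer_fn(n) == is_prime(n))
    return correct / len(numbers)


# Hypothetical stand-ins for two snapshots of the same model.
# In practice each would send a prompt to a model and parse the reply.
def snapshot_a(n: int) -> bool:
    return is_prime(n)  # assumed: always answers correctly


def snapshot_b(n: int) -> bool:
    return False  # assumed: always answers "not prime"


numbers = list(range(2, 22))  # 20 numbers, 8 of them prime
report: Dict[str, float] = {
    "snapshot_a": accuracy(snapshot_a, numbers),
    "snapshot_b": accuracy(snapshot_b, numbers),
}
print(report)
```

Running the same fixed question set against each snapshot is what makes the two scores comparable. A snapshot that always gives the same answer can still score above 50% if the question set is unbalanced, so the mix of primes and composites matters when reading the numbers.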
Visual Language Model for Pathology
Another focus of the podcast episode was a research project on visual language models for pathology image analysis. The researchers used data from medical Twitter to build OpenPath, a large dataset of medical images paired with text descriptions. Using this curated dataset, they trained a vision-language model called PLIP, which combines visual understanding with medical language understanding. The model can generate descriptions for medical images, perform image searches, and be used as an assistant for pathologists. The researchers emphasized the potential of these models as assistants rather than replacements for human pathologists, with a focus on augmenting decision-making.
Leveraging Social Media for Data Collection
The episode also discussed the potential of leveraging social media platforms like Twitter for data collection in specific domains. The researchers highlighted the abundance of useful scientific and medical knowledge shared on social media and the possibility of curating this data to train AI systems. They showcased the example of data collection from medical Twitter to create a large dataset for training a visual language model for pathology image analysis. The researchers noted the importance of robust evaluation and validation processes to ensure the performance and quality of AI models trained on such data.
Implications and Challenges
The podcast touched on the implications and challenges of using AI models in medical applications. While the models showed promise in assisting pathologists and providing valuable insights, there are still risks of errors, biases, and hallucinations. The researchers emphasized the need for human-AI collaboration and cautioned against relying solely on AI models. They also highlighted the importance of continuous monitoring of model behavior and the development of robust software stacks to adapt to changing model outputs.
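One simple form of continuous monitoring is to rerun a fixed set of tasks whenever the model changes and flag any task whose score has dropped. The sketch below assumes per-task scores have already been computed. The function name `drift_report`, the task names, the tolerance, and all scores are hypothetical and are not results from the episode or the underlying research.

```python
from typing import Dict


def drift_report(
    baseline: Dict[str, float],
    current: Dict[str, float],
    tolerance: float = 0.05,
) -> Dict[str, float]:
    """Return the tasks whose score dropped by more than `tolerance`."""
    regressions: Dict[str, float] = {}
    for task, old_score in baseline.items():
        new_score = current.get(task)
        if new_score is None:
            continue  # task was not rerun, so nothing to compare
        drop = old_score - new_score
        if drop > tolerance:
            regressions[task] = round(drop, 4)
    return regressions


# Made-up scores, for illustration only.
baseline_scores = {"task_a": 0.90, "task_b": 0.50}
current_scores = {"task_a": 0.60, "task_b": 0.55}

flagged = drift_report(baseline_scores, current_scores)
print(flagged)  # {'task_a': 0.3}
```

A check like this only reports that behavior has shifted, not why. Its value is in giving downstream software an early signal to review prompts or output parsing before users notice a regression.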
Today we’re joined by James Zou, an assistant professor at Stanford University. In our conversation with James, we explore the differences in ChatGPT’s behavior over the last few months. We discuss the issues that can arise from inconsistencies in generative AI models, how he tested ChatGPT’s performance on various tasks, drawing comparisons between March 2023 and June 2023 for both GPT-3.5 and GPT-4 versions, and the possible reasons behind the declining performance of these models. James also shares his thoughts on how surgical AI editing akin to CRISPR could potentially revolutionize LLM and AI systems, and how adding monitoring tools can help in tracking behavioral changes in these models. Finally, we discuss James' recent paper on pathology image analysis using Twitter data, in which he explores the challenges of collecting large medical datasets and details the model’s architecture, training, and evaluation process.
The complete show notes for this episode can be found at twimlai.com/go/645.