In this podcast, Alex Ratner discusses labeling and training LLMs, the challenges LLMs face today, and the importance of data labeling for training. The episode also explores the process of fine-tuning foundational AI models and the significance of purpose-built AI models in scaling AI.
ANECDOTE
Snorkel's Academic Origins
Snorkel spun out of the Stanford AI Lab and emphasizes data-centric development.
The company and academic work focus on programmatic labeling and production-ready enterprise use cases.
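Programmatic labeling replaces one-by-one hand annotation with small heuristic "labeling functions" whose votes are combined into training labels. The sketch below is a minimal illustration of that idea in plain Python, not Snorkel's actual API; the labeling functions, label values, and majority-vote combiner are all assumptions made for the example (Snorkel combines votes with a learned label model rather than a simple majority).

```python
# Illustrative sketch of programmatic labeling: write heuristic labeling
# functions over raw text, then aggregate their votes into weak labels.
# (Hypothetical spam/ham task; not Snorkel's real API.)

ABSTAIN, SPAM, HAM = -1, 1, 0

def lf_contains_link(text):
    # Heuristic: messages containing URLs are often spam.
    return SPAM if "http" in text else ABSTAIN

def lf_short_greeting(text):
    # Heuristic: very short messages are usually legitimate.
    return HAM if len(text.split()) < 4 else ABSTAIN

def lf_buy_now(text):
    # Heuristic: classic spam phrasing.
    return SPAM if "buy now" in text.lower() else ABSTAIN

LABELING_FUNCTIONS = [lf_contains_link, lf_short_greeting, lf_buy_now]

def majority_label(text):
    """Combine labeling-function votes by majority; abstain on ties or no votes."""
    votes = [lf(text) for lf in LABELING_FUNCTIONS if lf(text) != ABSTAIN]
    spam = sum(v == SPAM for v in votes)
    ham = sum(v == HAM for v in votes)
    if spam > ham:
        return SPAM
    if ham > spam:
        return HAM
    return ABSTAIN

labels = [majority_label(t) for t in [
    "buy now at http://example.com",               # two SPAM votes
    "hi there",                                    # short greeting -> HAM
    "let's meet to discuss the quarterly report",  # no votes -> ABSTAIN
]]
```

The point of the pattern is that editing a labeling function re-labels the whole dataset at once, which is what makes data, rather than model weights, the thing developers iterate on.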
INSIGHT
Data Is The New Interface For Models
AI development is shifting from model-centric to data-centric workflows as models grow opaque and large.
Labeling, sampling, filtering and curating data become the primary interface to edit model behavior.
INSIGHT
Ingredients Matter More Than The Recipe
Foundation models and training algorithms are rapidly standardizing while the data and ingredients determine performance.
Careful data sampling and filtering alone can produce state-of-the-art gains even with fixed models.
Alex Ratner (@ajratner, CEO @SnorkelAI) talks about labeling and training of LLMs. We go over Foundational Models and how to take an “off the shelf” model and fine-tune it for private use.
Reduce the complexities of protecting your workloads and applications in a multi-cloud environment. Panoptica provides comprehensive cloud workload protection integrated with API security to protect the entire application lifecycle. Learn more about Panoptica at panoptica.app
Find "Breaking Analysis Podcast with Dave Vellante" on Apple, Google and Spotify
Topic 1 - Welcome back to the show. We last spoke two years ago. A lot has changed, so we thought it would be a great time to talk about updates. For those who aren't familiar, give everyone a quick background.
Topic 2 - Let’s start with when we last spoke. We talked a lot about data scientists and how data labeling works for training LLMs. For those who aren’t familiar, can you give everyone a quick intro to data labeling and why it is important for training?
Topic 3 - When we last spoke, LLMs weren't as mainstream as they are today. How has this impacted how you think about AI/ML in general? What are the big challenges for LLMs today that you see?
Topic 4 - Many organizations are trying “off the shelf” models, but this may or may not be a good idea. On one side they don’t have to build a model, but on the other they still have to fine-tune it to their specific needs and use case. What are your recommendations for organizations to get started and be as effective as possible?
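The trade-off above, reusing an off-the-shelf model versus adapting it, comes down to continuing training from pretrained weights on a small, organization-specific dataset rather than training from scratch. The toy sketch below illustrates only that pattern, with a two-parameter logistic model standing in for a foundation model; the "pretrained" weights and the dataset are invented for the example and bear no relation to any real model.

```python
import numpy as np

# Conceptual sketch of fine-tuning: start from weights that came from a
# (pretend) general-purpose pretraining run and take a few gradient steps
# on in-house data. Toy logistic model, not a real foundation model.

pretrained_w = np.array([0.5, -0.2])  # assumed "pretrained" weights
pretrained_b = 0.0

# Small organization-specific dataset: 2 features, binary labels.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]])
y = np.array([1, 0, 1, 0])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fine_tune(w, b, X, y, lr=0.5, steps=300):
    """Continue gradient descent from the pretrained weights on new data."""
    for _ in range(steps):
        p = sigmoid(X @ w + b)
        w = w - lr * (X.T @ (p - y)) / len(y)
        b = b - lr * np.mean(p - y)
    return w, b

w, b = fine_tune(pretrained_w.copy(), pretrained_b, X, y)
preds = (sigmoid(X @ w + b) > 0.5).astype(int)
```

In practice the same shape applies at scale: the expensive pretraining is done once by someone else, and the organization pays only for the short adaptation run on its own labeled data, which is where labeling quality comes back in.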
Topic 5 - In addition to organization-specific models, are teams building purpose-built models for specific functions (i.e., are they running multiple models, each with a different task)? How do labeling and training come into play here?
Topic 6 - Another challenge I’m seeing is security and data privacy concerns. What do folks getting started need to be aware of to make sure company data is safe?