In this podcast, Alex Ratner discusses labeling and training LLMs, the challenges LLMs face today, and the importance of data labeling for training. The episode also explores the process of fine-tuning foundational AI models and the significance of purpose-built AI models in scaling AI.
ANECDOTE
Snorkel's Academic Origins
Snorkel spun out of the Stanford AI Lab and emphasizes data-centric development.
The company and academic work focus on programmatic labeling and production-ready enterprise use cases.
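Programmatic labeling replaces one-by-one hand annotation with small heuristic "labeling functions" whose votes are combined into training labels. The sketch below is a minimal illustration of that idea in plain Python, not Snorkel's actual API; the labeling functions, label values, and majority-vote combiner are all assumptions made for the example (Snorkel combines votes with a learned label model rather than a simple majority).

```python
# Illustrative sketch of programmatic labeling: write heuristic labeling
# functions over raw text, then aggregate their votes into weak labels.
# (Hypothetical spam/ham task; not Snorkel's real API.)

ABSTAIN, SPAM, HAM = -1, 1, 0

def lf_contains_link(text):
    # Heuristic: messages containing URLs are often spam.
    return SPAM if "http" in text else ABSTAIN

def lf_short_greeting(text):
    # Heuristic: very short messages are usually legitimate.
    return HAM if len(text.split()) < 4 else ABSTAIN

def lf_buy_now(text):
    # Heuristic: classic spam phrasing.
    return SPAM if "buy now" in text.lower() else ABSTAIN

LABELING_FUNCTIONS = [lf_contains_link, lf_short_greeting, lf_buy_now]

def majority_label(text):
    """Combine labeling-function votes by majority; abstain on ties or no votes."""
    votes = [lf(text) for lf in LABELING_FUNCTIONS if lf(text) != ABSTAIN]
    spam = sum(v == SPAM for v in votes)
    ham = sum(v == HAM for v in votes)
    if spam > ham:
        return SPAM
    if ham > spam:
        return HAM
    return ABSTAIN

labels = [majority_label(t) for t in [
    "buy now at http://example.com",               # two SPAM votes
    "hi there",                                    # short greeting -> HAM
    "let's meet to discuss the quarterly report",  # no votes -> ABSTAIN
]]
```

The point of the pattern is that editing a labeling function re-labels the whole dataset at once, which is what makes data, rather than model weights, the thing developers iterate on.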
INSIGHT
Data Is The New Interface For Models
AI development is shifting from model-centric to data-centric workflows as models grow opaque and large.
Labeling, sampling, filtering and curating data become the primary interface to edit model behavior.
INSIGHT
Ingredients Matter More Than The Recipe
Foundation models and training algorithms are rapidly standardizing while the data and ingredients determine performance.
Careful data sampling and filtering alone can produce state-of-the-art gains even with fixed models.
Alex Ratner (@ajratner, CEO @SnorkelAI) talks about labeling and training of LLMs. We go over Foundational Models and how to take an “off the shelf” model and fine-tune it for private use.
Reduce the complexities of protecting your workloads and applications in a multi-cloud environment. Panoptica provides comprehensive cloud workload protection integrated with API security to protect the entire application lifecycle. Learn more about Panoptica at panoptica.app
Find "Breaking Analysis Podcast with Dave Vellante" on Apple, Google and Spotify
Topic 1 - Welcome back to the show. We last spoke two years ago. A lot has changed, so we thought it would be a great time to talk about updates. For those who aren't familiar, give everyone a quick background.
Topic 2 - Let’s start with when we last spoke. We talked a lot about data scientists and how data labeling works for training LLMs. For those who aren’t familiar, can you give everyone a quick intro to data labeling and why it is important for training?
Topic 3 - When we last spoke, LLMs weren't as mainstream as they are today. How has this impacted how you think about AI/ML in general? What are the big challenges for LLMs today that you see?
Topic 4 - Many organizations are trying “off the shelf” models, but this may or may not be a good idea. On one side they don’t have to build a model, but on the other they still have to fine-tune it to their specific needs and use case. What are your recommendations for organizations to get started and be as effective as possible?
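The trade-off above, reusing an off-the-shelf model versus adapting it, comes down to continuing training from pretrained weights on a small, organization-specific dataset rather than training from scratch. The toy sketch below illustrates only that pattern, with a two-parameter logistic model standing in for a foundation model; the "pretrained" weights and the dataset are invented for the example and bear no relation to any real model.

```python
import numpy as np

# Conceptual sketch of fine-tuning: start from weights that came from a
# (pretend) general-purpose pretraining run and take a few gradient steps
# on in-house data. Toy logistic model, not a real foundation model.

pretrained_w = np.array([0.5, -0.2])  # assumed "pretrained" weights
pretrained_b = 0.0

# Small organization-specific dataset: 2 features, binary labels.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]])
y = np.array([1, 0, 1, 0])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fine_tune(w, b, X, y, lr=0.5, steps=300):
    """Continue gradient descent from the pretrained weights on new data."""
    for _ in range(steps):
        p = sigmoid(X @ w + b)
        w = w - lr * (X.T @ (p - y)) / len(y)
        b = b - lr * np.mean(p - y)
    return w, b

w, b = fine_tune(pretrained_w.copy(), pretrained_b, X, y)
preds = (sigmoid(X @ w + b) > 0.5).astype(int)
```

In practice the same shape applies at scale: the expensive pretraining is done once by someone else, and the organization pays only for the short adaptation run on its own labeled data, which is where labeling quality comes back in.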
Topic 5 - In addition to organization-specific models, are teams building purpose-built models for specific functions (i.e., are they running multiple models, each with a different task)? How do labeling and training come into play here?
Topic 6 - Another challenge I’m seeing is security and data privacy concerns. What do folks getting started need to be aware of to make sure company data is safe?